Cases

Engineering breakdowns of real deployments: what we tried, what we threw away, and why the final system was harder than it looked at first.

Deployments

Each case study shows the problem, the system architecture, and the approaches we tried along the way, including the ones we discarded. The full engineering maps sit under spoiler blocks for readers who want the component-level view.

We deliberately write about mistakes and dead ends. In our experience, those explain why the system ended up the way it did and why the project took as long as it did. Without that context, the final result looks either trivial or unbelievable.

AI agent replaces the personal manager for small investors

The agent initiates the sale inside the mobile app. Seventeen triggers decide when to start the conversation, and a multi-layer compliance stack keeps the flow inside regulatory boundaries. Almost half of all changes across ten weeks came from real conversations with real clients.

Public Sector · Tech Support

Operator of an Urban Transport System

One hallucination is enough for a driver to believe the platform is down and skip a shift.

A confidence formula with 30+ parameters and eight hallucination markers. Every rule came from a concrete production failure.

Insurance · VHI Service

Luchi: a decision system for the VHI service workflow

A patient case could stretch close to two hours because the operator had to stitch together chat, program rules, clinics, approvals, and documents by hand.

Not a narrow copilot but a decision system: one global chat, temporal reasoning, booking, guarantee letters, and controlled CRM side effects.

Public Sector · Legal AI

AI assistant for a government lawyer in procurement disputes

A lawyer remembers the most dangerous rule only in the courtroom, when the other side names it.

On its own, the engine adds the rules the other side will strike with. Phrase-level filters stop the law from being inverted, and quality is measured against benchmark answers written by lawyers.

Railways · Operations Assistant

AI Assistant for Railway Engineers and Dispatchers

A dispatcher asks about one node, but the answer is spread across several systems where the same station is spelled differently and means something different in each.

A router instead of one global search: for each question the system picks the right source and never confuses a plan with a fact. Read-only access, and the assistant is not wired to movement control.
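The routing idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the deployed system: the source names, the keyword rules, and the `Kind` labels are all hypothetical. It shows two properties from the description above: each question goes to a single source, and every answer carries a plan-or-fact label so the two are never merged.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Kind(Enum):
    PLAN = "plan"  # scheduled or intended state
    FACT = "fact"  # observed or recorded state


@dataclass(frozen=True)
class Source:
    name: str                      # hypothetical source name
    kind: Kind                     # what this source is allowed to assert
    keywords: Tuple[str, ...]      # hypothetical routing hints
    lookup: Callable[[str], str]   # read-only query, stubbed here


@dataclass(frozen=True)
class Answer:
    source: str
    kind: Kind
    text: str


def route(question: str, sources: List[Source]) -> Optional[Source]:
    """Pick the single source whose keywords best match the question."""
    q = question.lower()
    best, best_score = None, 0
    for s in sources:
        score = sum(k in q for k in s.keywords)
        if score > best_score:
            best, best_score = s, score
    return best  # None when nothing matches: refuse rather than guess


def ask(question: str, sources: List[Source]) -> Optional[Answer]:
    source = route(question, sources)
    if source is None:
        return None
    # The answer inherits the kind of its source, so a plan is never
    # presented as a fact.
    return Answer(source.name, source.kind, source.lookup(question))


# Hypothetical sources with stubbed lookups.
SOURCES = [
    Source("timetable", Kind.PLAN, ("scheduled", "planned", "timetable"),
           lambda q: "stub: planned value"),
    Source("movement_log", Kind.FACT, ("actual", "arrived", "departed"),
           lambda q: "stub: recorded value"),
]
```

A real router would likely use something richer than keyword counts; the point of the sketch is only that the plan-or-fact label is attached at the source level and travels with the answer.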

Accounting SaaS · LLM Observability

Moe Delo: observability for LLM agents inside the client's own perimeter

The client was already running LLM agents in production and couldn't see inside: which prompt went to the model, what an answer cost, and whether it was even correct.

An observability layer with per-call tracing, token-level cost accounting, and step-by-step quality scoring, shipped as a working reference inside the client's own Kubernetes cluster. Plain OpenTelemetry, no vendor-SDK lock-in.
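Token-level cost accounting can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `CallRecord` fields, the model name, and the prices are hypothetical, and the code is plain Python rather than the OpenTelemetry API. It shows only the arithmetic: each call is priced from its input and output token counts, and costs are summed per trace.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Iterable

# Hypothetical prices in USD per 1,000 tokens; not real vendor pricing.
PRICES = {
    "model-a": {"input": Decimal("0.50"), "output": Decimal("1.50")},
}


@dataclass(frozen=True)
class CallRecord:
    trace_id: str       # groups all calls made for one user request
    step: str           # which agent step issued the call
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(rec: CallRecord) -> Decimal:
    """Cost of one model call, priced separately for input and output."""
    price = PRICES[rec.model]
    total = rec.input_tokens * price["input"] + rec.output_tokens * price["output"]
    return total / 1000


def cost_per_trace(records: Iterable[CallRecord]) -> Dict[str, Decimal]:
    """Sum call costs by trace, so one answer has one price."""
    totals: Dict[str, Decimal] = {}
    for rec in records:
        totals[rec.trace_id] = totals.get(rec.trace_id, Decimal(0)) + call_cost(rec)
    return totals
```

`Decimal` is used instead of floats so that many small per-call amounts add up without rounding drift.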

Entertainment · Development harness

PARTYstation: a development harness for a live legacy platform

A live party-game platform on PHP and Node with several years of legacy. Technical debt grew faster than new features, and a large team could no longer keep it under control.

We are building a development harness: the legacy code is frozen as an executable spec, wrapped in tests, and rewritten one endpoint at a time so that a couple of people with agents can run it. Part one: authorization is already in the new code.

Cross-platform layer

Four of our deployments produced more than client systems. Out of the repeated pain around requirements, test sets, and regression loops came a separate platform layer.

Platform deep dive · Quality loop

AI Quality Loop

A breakdown of the layer that grew out of four deployments and turns documents, live conversations, and production failures into requirements, test sets, and a regression loop. Inside the platform we call it datasetgen.

In progress

We are preparing the full write-ups and will publish them as they are ready.

Infrastructure · Level 1

Urban Services Operator

AI agents for level-1 support in the categories where the deterministic bot kept failing. Ten high-frequency request types were automated at above 80% quality, with speech analytics added on top for voice and chat QA.

Retail · Oracle CMS

Largest Beauty Retailer

An AI agent layer on top of Oracle CMS for twelve level-2 request categories. Agents follow the operator workflow: receive the request, categorize it, clarify details, query internal systems, and contact counterparties without losing context between iterations.

Tell us which process you want to break down.

We will tell you whether the task fits AI agents and, if it does, outline a concrete plan.

Or write directly to ilya@manaraga.ai