Engineering breakdowns of real deployments: what we tried, what we threw away, and why the final system was harder than it looked at first.
Each case study shows the problem, the system architecture, and the approaches we tried along the way, including the ones we discarded. The spoiler blocks hold the full engineering maps for readers who want the component-level view.
We deliberately write about mistakes and dead ends. In our experience, those explain why the system ended up the way it did and why the project took as long as it did. Without that context, the final result looks either trivial or unbelievable.
The agent initiates the sale inside the mobile app. Seventeen triggers decide when to start the conversation, and a multi-layer compliance stack keeps the flow inside regulatory boundaries. Almost half of all changes across ten weeks came from real conversations with real clients.
One hallucination is enough for a driver to believe the platform is down and skip a shift.
A confidence formula with 30+ parameters and eight hallucination markers. Every rule came from a concrete production failure.
A patient case could stretch close to two hours because the operator had to stitch together chat, program rules, clinics, approvals, and documents by hand.
Not a narrow copilot but a decision system: one global chat, temporal reasoning, booking, guarantee letters, and controlled CRM side effects.
A lawyer remembers the most dangerous rule only in the courtroom, when the other side names it.
On its own, the engine adds the rules the other side will strike with. Phrase-level filters stop the law from being inverted, and quality is measured against benchmark answers from lawyers.
A dispatcher asks about one node, but the answer is spread across several systems where the same station is spelled differently and means something different in each.
A router instead of one global search: for each question the system picks the right source and never confuses a plan with a fact. Read-only access, and the assistant is not wired to movement control.
The client was already running LLM agents in production and couldn't see inside: which prompt went to the model, what an answer cost, and whether it was even correct.
An observability layer (per-call tracing, token-level cost accounting, and step-by-step quality scoring) shipped as a working reference inside the client's own Kubernetes cluster. Plain OpenTelemetry, no vendor-SDK lock-in.
A live party-game platform on PHP and Node with several years of legacy. Technical debt grew faster than new features, and a large team could no longer keep it under control.
We are building a development harness: the legacy code is frozen as an executable spec, wrapped in tests, and rewritten one endpoint at a time so a couple of people with agents can run it. Part one: authorization is already in the new code.
These four deployments produced more than client systems. Out of the repeated pain around requirements, test sets, and regression loops came a separate platform layer.
A breakdown of the layer that grew out of four deployments and turns documents, live conversations, and production failures into requirements, test sets, and a regression loop. Inside the platform we call it datasetgen.
We are preparing the full write-ups and will publish them as they are ready.
AI agents for level-1 support in the categories where the deterministic bot kept failing. Ten high-frequency request types were automated at above 80% quality, with speech analytics added on top for voice and chat QA.
An AI agent layer on top of Oracle CMS for twelve level-2 request categories. Agents repeat the operator workflow: receive the request, categorize it, clarify details, query internal systems, and contact counterparties without losing context between iterations.
We will tell you whether the task fits AI agents and, if it does, outline a concrete plan.
or write directly to ilya@manaraga.ai