Swift and iOS dev with Igor
The ride of the iOS developer

Building a Pluggable AI Brain: A Modern Chatbot Architecture

Building an AI chatbot is deceptively simple at first. You grab a library for Telegram or a chatbot builder like Typebot, wire it up to an LLM, and you have a working prototype in an afternoon. But this initial velocity often hides a critical architectural flaw that leads to significant technical debt: platform lock-in.

I've been thinking about a more robust, scalable approach. What if we could build a central, intelligent "brain" once, and then simply "plug it in" to any frontend we choose—Telegram today, a web app tomorrow, maybe even a Slack bot next month?

This isn't a fantasy. It's achievable with a clean, decoupled architecture.

The Core Problem: The Platform Trap

The common approach is to write your backend logic to directly serve a specific frontend. If you're building for Telegram, your code is filled with logic for handling Telegram Update objects and calling the sendMessage API. If you're building for a web UI, your code is handling HTTP requests and returning JSON.

The problem is that this hard-wires your core business logic—your RAG implementation, your complex prompting, your very "secret sauce"—to a specific communication protocol. Migrating to a new platform means a significant rewrite. This is not a sustainable model.
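As a concrete sketch of this trap, here is what a hard-wired handler tends to look like. All names and the webhook shape below are hypothetical illustrations (a stand-in `generate_answer` instead of a real RAG pipeline), not a real library API:

```python
# A sketch of the "platform trap": the core AI logic is tangled with
# Telegram-specific request and response shapes. All names here are
# hypothetical illustrations, not a real library API.

def generate_answer(question: str) -> str:
    """Stand-in for the real RAG + prompting + LLM pipeline."""
    return f"You asked: {question}"

def handle_telegram_update(update: dict) -> dict:
    # Business logic is forced to know Telegram's webhook schema...
    chat_id = update["message"]["chat"]["id"]
    text = update["message"]["text"]

    answer = generate_answer(text)  # the actual "secret sauce"

    # ...and Telegram's reply payload. Moving to a web UI means
    # rewriting all of this plumbing around the core logic.
    return {"method": "sendMessage", "chat_id": chat_id, "text": answer}

update = {"message": {"chat": {"id": 42}, "text": "Hello"}}
print(handle_telegram_update(update))
# → {'method': 'sendMessage', 'chat_id': 42, 'text': 'You asked: Hello'}
```

Every line of the handler above would have to change if the frontend changed, even though only two of them are actually about AI.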

The Architectural Shift: A Decoupled Brain

The solution is to enforce a strict separation of concerns. We think of our system not as two parts (frontend and backend), but as three:

  1. The Frontend: This is the "face" of our application. It could be Typebot for a structured web chat or Telegram for a native mobile experience. Its only job is to manage the UI and communicate with our brain. It is completely interchangeable.
  2. The Brain (The Backend): This is our protocol-agnostic, core logic service. It contains all the complex AI processing, but it has zero knowledge of whether it's talking to Telegram or a web browser.
  3. The Deployment Platform: The infrastructure that runs our brain. A modern PaaS like Railway is perfect here. It takes our code, deploys it, and gives it a stable, secure address on the internet.

The Secret Sauce: The Adapter Pattern

So, how does the Brain remain ignorant of the outside world while still communicating with it? The answer is the classic Adapter Pattern.

The idea is simple: you wouldn't rewire your laptop just to plug it into a foreign power outlet; you'd use an adapter. We do the same with our code.

The Brain defines a single, standardized, internal format for requests and responses. Then, for each frontend we want to support, we build a lightweight Adapter.
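That internal format can be as small as two dataclasses. The field names below are illustrative assumptions, not a fixed spec, and `brain_handle` is a stand-in for the real core:

```python
from dataclasses import dataclass

# A minimal sketch of the Brain's standardized internal contract.
# Field names are illustrative assumptions, not a fixed spec.

@dataclass
class BrainRequest:
    user_id: str          # stable identity for long-term memory / RAG
    text: str             # the user's message, stripped of platform framing
    session_id: str = ""  # optional conversation grouping

@dataclass
class BrainResponse:
    text: str             # the answer to render, in plain text

def brain_handle(request: BrainRequest) -> BrainResponse:
    """The protocol-agnostic core: RAG, prompting, LLM calls go here."""
    # Stand-in logic so the sketch runs end to end.
    return BrainResponse(text=f"Echo for {request.user_id}: {request.text}")

print(brain_handle(BrainRequest(user_id="u1", text="Hi")))
```

Nothing in this contract mentions chat IDs, webhooks, or HTTP, and that is exactly the point.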

An Adapter has two jobs:

  1. Translate Inbound: It takes a platform-specific request (like a Telegram webhook) and translates it into the Brain's standard internal format.
  2. Translate Outbound: It takes the Brain's standard internal response and translates it into the specific API calls or HTTP responses that the platform expects.

This creates a clean, insulated core.

+----------------+      +---------------------+      +---------------------+
|   Telegram     |----->|  Telegram Adapter   |      |                     |
+----------------+      +---------------------+      |                     |
                                     |               |                     |
                                     +-------------->|     Core "Brain"    |
                                     |               | (Protocol Agnostic) |
+----------------+      +---------------------+      |    - RAG Logic      |
|  Typebot Web UI|----->|    Web Adapter      |      |    - LLM Calls      |
+----------------+      +---------------------+      |                     |
                                                     +---------------------+

The core Brain only ever sees a standard request object. It doesn't care if that request originated from a Telegram user or a Typebot session. It does its work and returns a standard response, leaving the adapter to handle the platform-specific translation.
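A Telegram adapter can then be a few lines of pure translation. The internal shapes, the webhook fields used, and the stand-in core below are all illustrative assumptions:

```python
from dataclasses import dataclass

# A sketch of a Telegram adapter wrapped around a protocol-agnostic core.
# The internal shapes and the core function are illustrative assumptions.

@dataclass
class BrainRequest:
    user_id: str
    text: str

@dataclass
class BrainResponse:
    text: str

def brain_handle(req: BrainRequest) -> BrainResponse:
    """Stand-in for the real core (RAG, prompting, LLM calls)."""
    return BrainResponse(text=f"Answer for {req.user_id}")

def telegram_adapter(update: dict) -> dict:
    # 1. Translate inbound: Telegram webhook -> standard internal request.
    req = BrainRequest(
        user_id=str(update["message"]["from"]["id"]),
        text=update["message"]["text"],
    )
    # The core only ever sees the standard request object.
    res = brain_handle(req)
    # 2. Translate outbound: standard response -> Telegram reply payload.
    return {
        "method": "sendMessage",
        "chat_id": update["message"]["chat"]["id"],
        "text": res.text,
    }
```

A web adapter would reuse `brain_handle` unchanged and only swap the translation code on either side of it.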

A Note on Security & State

This model handles production concerns gracefully. Security becomes layered. The first layer is a static M2M (machine-to-machine) token to authenticate the frontend application itself. The second is a dynamic user token (like a JWT) passed inside the request body to identify the end-user for long-term memory access (RAG). The adapter is responsible for extracting these details and passing them to the core in a standardized way.
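The two layers can be sketched as a single adapter-side check. `SERVICE_TOKEN`, the header/body layout, and the token-decoding stub are all illustrative assumptions (a real deployment would verify a signed JWT with a proper library):

```python
import hmac

# A sketch of the two security layers handled at the adapter boundary.
# SERVICE_TOKEN and the decoding stub are illustrative assumptions.

SERVICE_TOKEN = "static-m2m-secret"  # shared with the frontend application

def decode_user_token(token: str) -> str:
    """Stand-in for real JWT verification (signature + expiry checks)."""
    # For this sketch we simply treat the token as the user id.
    return token

def authenticate(headers: dict, body: dict) -> str:
    # Layer 1: static M2M token proves the *frontend application* is ours.
    supplied = headers.get("Authorization", "")
    if not hmac.compare_digest(supplied, f"Bearer {SERVICE_TOKEN}"):
        raise PermissionError("unknown frontend")
    # Layer 2: dynamic user token identifies the *end-user* for RAG memory.
    return decode_user_token(body["user_token"])

user_id = authenticate(
    {"Authorization": "Bearer static-m2m-secret"},
    {"user_token": "user-123"},
)
print(user_id)  # → user-123
```

The core then receives only the resolved `user_id`, never raw headers or tokens.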

Why This Architecture Matters

Adopting this decoupled, adapter-driven approach provides clear, immediate benefits:

  • Flexibility: Adding a new frontend like Discord or Slack is no longer a rewrite. It's simply the work of creating a new adapter.
  • Reusability: The most complex and valuable part of your system—the AI brain—is written once and reused everywhere.
  • Testability: You can write comprehensive unit tests for your core logic without needing to mock complex web requests or external APIs. You test your brain in complete isolation.
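The testability point can be shown directly: because the core is a pure function of a standard request, its tests are bare assertions with no HTTP or platform mocks. The shapes and stand-in logic below are illustrative assumptions:

```python
from dataclasses import dataclass

# Sketch: the core is a pure function of a standard request, so tests
# need no web framework or platform mocks. Shapes are illustrative.

@dataclass
class BrainRequest:
    user_id: str
    text: str

@dataclass
class BrainResponse:
    text: str

def brain_handle(req: BrainRequest) -> BrainResponse:
    """Stand-in core; the real one would do RAG + LLM calls."""
    if not req.text.strip():
        return BrainResponse(text="Please ask a question.")
    return BrainResponse(text=f"Answer to: {req.text}")

# "Unit tests" as bare assertions: no Telegram, no HTTP, no mocks.
assert brain_handle(BrainRequest("u1", "")).text == "Please ask a question."
assert brain_handle(BrainRequest("u1", "Hi")).text == "Answer to: Hi"
print("all brain tests passed")
```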

This isn't just a theoretical exercise. It's a pragmatic blueprint for building serious, multi-platform AI applications that are designed to evolve, not break.

Published on: Jul 26, 2025