Applied Research · April 2026 · 20 min read
Synapse: Designing a Research-Driven AI System Beyond the Base Model

A research-first look at how Synapse combines retrieval, routing, memory, verification, and runtime controls into a dependable applied intelligence system.

By Taneem Ullah Jan
ErisAI Research

Our Synapse is not a thin wrapper around a large language model. It is a system for applied intelligence: a research-driven AI system that treats retrieval, structured execution, conversational state, knowledge-base isolation, knowledge ingestion, verification, and operational reliability as part of the intelligence stack rather than as peripheral infrastructure. This distinction matters. In practical deployments, the useful behavior of an AI product is rarely determined by the base model alone. It emerges from the interaction between models, memory, routing, data, evidence control, and runtime policy.

The following article presents Synapse from a research-first perspective, with the engineering detail necessary to explain the system's behavior. The core argument is simple: if we want dependable AI for knowledge-intensive, multi-turn, production workloads, we must move from model-centric design to system-centric design. At ErisAI, Synapse is our expression of that idea.

1. Introduction

Public discussion around AI often collapses everything into the model. Which model was used? How large is it? How well does it benchmark? No doubt, these are important questions, but they are incomplete. In real-world deployments, useful AI behavior depends just as much on how the model is embedded inside a broader system: how it retrieves knowledge, how it routes different query types, how it handles ambiguity, how it preserves conversational continuity, how it behaves under load, and how safely it updates its knowledge sources.

These questions define the design space in which Synapse operates.

At ErisAI, we approach Synapse as a full-stack applied intelligence system rather than a single generative endpoint. It combines retrieval, memory, structured execution paths, knowledge-base orchestration, and scale-aware runtime controls into one coherent architecture. This article reflects a close architectural review of Synapse together with the current state of the literature on the main design patterns it draws on. It explains the scientific and systems principles behind the system without exposing sensitive implementation details.

The rest of this article explains the research ideas behind Synapse, how those ideas appear in the architecture, and why this design is stronger for production knowledge systems than the increasingly common pattern of wrapping a powerful model in minimal orchestration.


2. From Model-Centric AI to System-Centric AI

One of the clearest lessons from modern AI research is that parametric knowledge alone is not enough for knowledge-intensive work. Retrieval-Augmented Generation (RAG) helped formalize this shift by showing that generation can be grounded in external information rather than relying purely on what a model stores in its weights [1]. Information retrieval research had already established that different retrieval methods capture different signals. Lexical methods such as BM25 remain powerful for exact or sparse matching [2], while dense retrieval captures semantic similarity. Rank-fusion methods such as Reciprocal Rank Fusion (RRF) show that combining signals often outperforms trusting any single ranker [3]. Retrieve-and-rerank pipelines further refine this idea by separating broad candidate discovery from more precise reordering [8].

More recent work sharpens the same lesson. Surveys of RAG systems now commonly distinguish between naïve, advanced, and modular forms of retrieval-augmented design, which helps make the move from model-centric to system-centric AI more explicit [5]. Self-RAG [6] argues that retrieval should not be blindly applied in a fixed way for every query. It should be adaptive, and generation should be able to critique its own evidence use. CRAG shows that retrieval quality itself should be assessed, and that weak retrieval should trigger corrective actions rather than being accepted as valid context by default [7].

The implication is larger than retrieval itself. Once we admit that useful AI behavior depends on external knowledge and intermediate control layers, we also have to admit that memory, verification, orchestration, and runtime policy are not extra engineering. They are essential and part of the intelligence architecture.

Synapse is built around that premise.


3. The Central Research Thesis Behind Synapse

Synapse is grounded in five broad hypotheses.

3.1. Useful AI systems must externalize and organize knowledge

A model can be impressive and still be fragile on enterprise or domain-specific work. Synapse therefore treats knowledge as external, indexed, filterable, version-aware, and refreshable rather than as something we expect the model to 'memorize'. This allows the system to evolve with the data, isolate domains into separate knowledge bases, and preserve stronger control over what context may influence an answer.

3.2. No single retrieval signal is sufficient

Semantic retrieval is excellent when a user’s wording differs from the source text. Lexical retrieval is often stronger when the task hinges on exact entities, filenames, formulas, or sparse terminology. Synapse therefore treats retrieval as plural. It combines semantic and lexical search, then uses fusion, reranking, and post-retrieval verification to narrow context toward the evidence most likely to support a grounded response.

3.3. Not every question should be answered generatively

Many AI applications overuse generation. In practice, some user intents are better handled by structured execution paths: counting entities, locating specific files, extracting exact spans, or answering follow-up questions whose semantics are narrow and explicit. Synapse routes such requests early when possible, reducing latency, lowering hallucination risk, and reserving free-form generation for the parts of the problem that actually benefit from it.

3.4. Conversation is a state problem, not just a prompt-length problem

A multi-turn system should not merely append previous messages to the next prompt and hope for the best. It should track what the user is asking about, what entities remain active, whether the current turn is a clarification or continuation, and which pieces of earlier context remain relevant. Conversational retrieval work supports this view from two complementary angles. Conversational dense retrieval shows why retrieval quality degrades when follow-up questions depend on omitted context [9], while CONQRR shows that rewriting follow-up turns into standalone retrieval-ready queries can materially improve downstream retrieval [10]. Memory research such as LongMem and MemGPT further supports the broader claim that useful continuity requires explicit memory design rather than raw transcript accumulation [11, 12].

3.5. Reliability under load is part of model quality in practice

An AI system that becomes inconsistent under concurrency, leaks partial ingestion states into retrieval, or behaves unpredictably when sessions collide does not merely have an operations problem. It has a quality problem. Synapse therefore incorporates bounded generation concurrency, session protection, staged document visibility, asynchronous job handling, and readiness logic as first-class mechanisms for preserving system behavior at scale.


4. What Synapse Actually Is

At a high level, Synapse is a multi-knowledge-base AI orchestration system that coordinates the following layers:

  • knowledge-base lifecycle management
  • session-aware query routing
  • structured query handling
  • hybrid retrieval and fusion
  • reranking and relevance verification
  • conversational memory and topic continuity
  • grounded generation
  • carefully scoped caching
  • online and offline indexing
  • concurrency and safety controls

The simplest way to understand Synapse is as a pipeline in which not every request takes the same path.

Figure 1. Synapse routes requests through different execution paths instead of forcing every query through one generic generation stack.

This architecture matters because it avoids two common failure modes:

  1. sending every query through the same expensive generative path
  2. assuming retrieval alone is enough without state, routing, or verification

Synapse explicitly rejects both assumptions.


5. The Research Layers of Synapse

5.1. Knowledge as a controlled external substrate

Synapse organizes information into distinct knowledge bases, each with its own configuration snapshot and operational identity. This has two research advantages.

First, it separates knowledge domains in a controlled way. A system can be specialized without requiring a separate monolithic deployment for every use case. Second, it makes behavior more reproducible. If a knowledge base has a materialized configuration, then routing, grounding, and generation policy can be reasoned about as versioned state rather than an implicit set of runtime accidents.

This turns knowledge-base design into something closer to experimental control. Different domains can vary in carefully bounded ways without losing the benefits of shared infrastructure.

5.2. Hybrid retrieval instead of retrieval monoculture

Synapse uses hybrid retrieval because user queries are heterogeneous. Some are conceptual. Some are lexical. Some are entity-heavy. Some are follow-up questions that omit the most important noun because the noun is already alive in the conversation.

In retrieval research terms, this is exactly the kind of setting where heterogeneous evidence helps. BM25-style retrieval remains valuable for sparse or exact matching [2]. Dense retrieval captures paraphrase and semantic similarity, with Sentence-BERT remaining one of the canonical references for practical sentence embedding retrieval [4]. Fusion methods such as RRF help synthesize rankings from both worlds [3]. Cross-encoder reranking then improves local precision by scoring question-document pairs jointly, as shown in passage reranking work built on BERT [8].

Synapse adopts this philosophy in system form. Rather than committing to a single search paradigm, it treats retrieval as a staged evidence-gathering process:

  1. produce candidates from different retrieval signals
  2. fuse the candidate sets
  3. rerank toward question-specific relevance
  4. verify grounding before generation

This is one of the clearest places where Synapse behaves like an AI system and not just an LLM endpoint.

5.3. Grounding is more than retrieval

Basic RAG is often described as retrieve documents, append them to the prompt, and generate. That is useful, but simply not sufficient. Retrieved context can be redundant, weakly relevant, contradictory, or noisy. Synapse therefore inserts a verification layer between retrieval and final answer generation.

This design is also aligned with newer research. Self-RAG and CRAG both reinforce the idea that retrieval should be assessed rather than blindly trusted [6, 7]. Evaluation work such as RAGAs and ARES also supports the broader framing that grounded systems must be judged along multiple dimensions, including faithfulness and context quality rather than answer fluency alone [13, 14]. Conceptually, this makes Synapse less willing to treat raw retrieval as 'truth'. The system narrows context through reranking, relevance checks, and citation-aware response assembly before the generative model produces the final response.
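As a concrete sketch, a verification gate between retrieval and generation can be as simple as scoring each candidate chunk against the question and discarding weak evidence before it ever reaches the prompt. The overlap score below is a toy stand-in for the learned relevance and faithfulness checks a production system would use; `verify_contexts` and its threshold are illustrative, not Synapse's actual implementation:

```python
def verify_contexts(question_terms, chunks, min_overlap=0.3):
    """Post-retrieval verification sketch: keep a chunk only if enough of the
    question's content words are actually present in the chunk text.
    (A toy lexical-overlap score stands in for a learned relevance check.)"""
    q = set(question_terms)
    verified = []
    for chunk_id, text in chunks:
        overlap = len(q & set(text.split())) / max(len(q), 1)
        if overlap >= min_overlap:
            verified.append((chunk_id, overlap))
    # strongest evidence first, so citation assembly can trim from the tail
    return sorted(verified, key=lambda c: c[1], reverse=True)
```

The important property is structural, not the particular score: weakly supported context is dropped before generation, rather than handed to the model as if it were evidence.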

5.4. Structured execution as an antidote to unnecessary generation

A subtle but important design choice in Synapse is that it detects when a query is better handled deterministically than generatively. Some user requests reduce to operations such as:

  • counting or enumerating known entities in the knowledge base
  • locating files or documents tied to those entities
  • pulling exact passages or snippets
  • resolving ambiguous follow-up references

These cases do not always require a long-form generated answer path. Routing them into structured handlers improves both speed and reliability. In research terms, this reflects a broader principle: intelligence systems should allocate the most powerful and expensive reasoning path only when the task actually requires it.
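A minimal sketch of this kind of early routing: deterministic intents are matched first and dispatched to structured handlers, and only the remainder falls through to generation. The intent names and regex patterns here are hypothetical illustrations, not Synapse's actual route table:

```python
import re

# Illustrative patterns for intents that are better served deterministically.
STRUCTURED_PATTERNS = [
    (re.compile(r"\bhow many\b|\bcount\b", re.I), "count_entities"),
    (re.compile(r"\bwhich file\b|\bfind the (file|document)\b", re.I), "locate_file"),
    (re.compile(r"\bquote\b|\bexact (passage|text)\b", re.I), "extract_span"),
]

def route(query):
    """Send deterministic intents to structured handlers; everything else
    falls through to the retrieval-plus-generation path."""
    for pattern, handler in STRUCTURED_PATTERNS:
        if pattern.search(query):
            return handler
    return "retrieval_generation"
```

A real router would use classifiers and session state rather than regexes, but the architectural point is the same: the expensive generative path is a fallback, not a default.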

5.5. Conversation as memory, topic, and reformulation

Synapse treats dialogue as a persistent state machine rather than a sliding window of recent text. The system keeps track of session history, topic continuity, extracted entities, follow-up intent, and pending conversational threads. When needed, it reformulates the active query into a more retrieval-ready standalone representation before search begins.

This matters because enterprise-style conversations are rarely isolated one-shot prompts. Users ask for comparisons, follow-ups, clarifications, expansions, and returns to earlier topics. A useful AI system must be able to preserve coherence without overcommitting to irrelevant history. Synapse addresses that by combining memory with routing and selective reformulation rather than relying on raw context accumulation alone.
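A toy illustration of the reformulation idea, substituting a tracked entity for an elliptical pronoun before search begins. This trivial heuristic stands in for a learned rewriter in the spirit of CONQRR [10]; the `SessionState` fields and the substitution rule are hypothetical, not Synapse's actual memory model:

```python
from dataclasses import dataclass, field

@dataclass
class SessionState:
    """Minimal slice of conversational state: active topic and entities."""
    topic: str = ""
    entities: list = field(default_factory=list)

PRONOUNS = {"it", "its", "that", "this", "they", "them"}

def reformulate(query, state):
    """Rewrite an elliptical follow-up into a standalone, retrieval-ready
    query by splicing in the most recently active entity."""
    words = query.split()
    if state.entities and any(w.lower().strip("?.,") in PRONOUNS for w in words):
        entity = state.entities[-1]
        words = [entity if w.lower().strip("?.,") in PRONOUNS else w for w in words]
        return " ".join(words)
    return query
```

The payoff is that the retriever sees "How fast is the ingestion pipeline" instead of "How fast is it?", which is the difference between a searchable query and an unanchored one.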

5.6. Caching as scoped efficiency, not blind reuse

Caching is often described purely as a latency optimization. In Synapse it also serves a conceptual role: if two sufficiently similar requests, under the same relevant knowledge state and policy conditions, should follow the same grounded answer path, then carefully scoped cache reuse can reduce repeated work without weakening evidence discipline.

That caveat matters. Cache reuse is only safe when it is tied to the correct knowledge version, routing conditions, and context assumptions. Used carelessly, it can blur freshness and state boundaries. Used carefully, it becomes part of a more stable and economical system behavior. Work such as GPTCache helps justify this design space by treating semantic caching as a first-class systems technique for reducing repeated model calls while preserving usefulness when similarity checks are applied carefully [15].
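One way to make that scoping concrete is to fold every answer-affecting condition into the cache key itself, so an entry produced under one knowledge version or policy can never be served under another. The field choices below are illustrative assumptions, not Synapse's actual key schema:

```python
import hashlib

def cache_key(normalized_query, kb_id, kb_version, route, policy_hash):
    """Scope a cache entry to everything that could change the grounded
    answer: the knowledge-base identity AND version, the chosen route, and
    the active policy snapshot. A hit is only valid when all of them match."""
    material = "|".join([normalized_query, kb_id, str(kb_version), route, policy_hash])
    return hashlib.sha256(material.encode()).hexdigest()
```

With this shape, re-indexing a knowledge base bumps `kb_version` and silently invalidates every stale entry, which is exactly the freshness boundary the paragraph above describes.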


6. Formalizing Synapse as a Hybrid AI System

A useful abstraction for Synapse is:

y_t = F(x_t, S_t, K, C_K)

where x_t is the current user turn, S_t is the session state, K is the active knowledge base, and C_K is the configuration snapshot associated with that knowledge base. This is a conceptual formalism. Its purpose is to show that Synapse behavior is not just a function of the prompt. It is a function of prompt, state, knowledge partition, and runtime policy together.

6.1. Routing as Conditional Computation

Rather than applying one universal path to every request, Synapse approximates conditional computation over a route set R:

r_t = argmax_{r ∈ R} score(r; x_t, S_t)

where R includes paths such as structured execution, retrieval-generation, clarification, and formatting behavior. A better public-safe reading is not that Synapse claims an optimal mathematical router in the formal learning sense, but that it treats route selection as a first-order systems problem rather than a hidden implementation detail.

6.2 Retrieval-grounded answering as a latent evidence model

The classic RAG view can be adapted into a Synapse-style abstraction as:

p(y | x_t, S_t) ≈ Σ_{c ∈ C_t} p_ret(c | q'_t) · p_gen(y | q'_t, c)

where q'_t is a reformulated retrieval query and C_t is the verified context set selected for the turn. This should be read as a conceptual lens, not as a claim that Synapse implements the exact training objective from Lewis et al. [1]. Rather, it follows the same scientific principle: response quality depends jointly on retrieval quality and generation quality.

6.3 Hybrid retrieval in explicit scoring terms

For lexical retrieval, the BM25 family gives the canonical sparse retrieval form [2]:

score(D, Q) = Σ_{q_i ∈ Q} IDF(q_i) · f(q_i, D) · (k1 + 1) / ( f(q_i, D) + k1 · (1 − b + b · |D| / avgdl) )

For fusion, a standard RRF formulation is:

RRF(d) = Σ_{c ∈ C} 1 / (k + rank_c(d))

where C is the set of retrieval channels and rank_c(d) is the rank assigned to document d by channel c. The significance for Synapse is architectural: dense retrieval and sparse retrieval are not competing dogmas here. They are complementary estimators whose outputs are reconciled before reranking and verification.
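Both scoring families translate directly into code. The sketch below implements the canonical BM25 and RRF formulas over toy tokenized documents; the parameter defaults (k1 = 1.5, b = 0.75, k = 60) are conventional choices from the literature, not Synapse-specific values:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25
    (Robertson-Zaragoza probabilistic relevance form)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    # document frequency of each query term across the corpus
    df = {t: sum(1 for d in docs if t in d) for t in set(query_terms)}
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            denom = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / denom
        scores.append(s)
    return scores

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: RRF(d) = sum over channels of 1/(k + rank_c(d)).
    Each element of `rankings` is one channel's ranked list of doc ids."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)
```

Note that `rrf_fuse` never compares raw scores across channels; it reconciles ranks only, which is precisely why RRF is robust when sparse and dense retrievers produce incomparable score scales.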

6.4 Conversational continuity as a state transition

The state update can be written as:

S_{t+1} = U(S_t, x_t, y_t, E_t, Z_t)

where E_t denotes extracted entities or salient references from the turn and Z_t denotes evidence or citations attached to the answer. This is a more faithful representation of the architectural idea than the usual append-prior-messages story. The system updates topic, entities, and reusable conversational context after each turn rather than treating history as undifferentiated text.

6.5 Safe serving as a visibility constraint

One of the most important engineering-research connections in Synapse is that retrieval is constrained to serve only committed knowledge:

R(q, K, t) ⊆ { d ∈ K : v(d, t) = committed }

where v(d, t) is the visibility state of a document or chunk at time t. This is the formal version of staged ingestion. It prevents partially processed material from being treated as production-ready evidence.
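In code, the constraint is simply a filter applied before any chunk can enter the candidate set. The visibility states below are illustrative, not Synapse's actual lifecycle labels:

```python
from enum import Enum

class Visibility(Enum):
    PENDING = "pending"      # uploaded, not yet chunked or indexed
    INDEXING = "indexing"    # embeddings still being written
    COMMITTED = "committed"  # fully indexed and safe to serve

def servable(index):
    """Retrieval may only see chunks whose document reached COMMITTED,
    i.e. R(q, K, t) is restricted to {d : v(d, t) = committed}."""
    return [chunk for chunk, vis in index if vis is Visibility.COMMITTED]
```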

Taken altogether, these equations describe Synapse more accurately than the phrase 'an LLM application'. Synapse is better understood as a routed, stateful, retrieval-grounded, visibility-constrained AI system.


7. The Knowledge Lifecycle: Why Ingestion Design Matters

The quality of an AI system is deeply affected by how knowledge enters the system. If ingestion is unsafe, slow, partial, or opaque, the answering layer inherits those weaknesses.

Synapse therefore treats document processing as a staged lifecycle rather than a one-step upload:

Figure 2. Synapse keeps ingestion separate from live serving so partially processed material does not leak into production retrieval.

First, it enables both bulk and online ingestion, which means Synapse can support large indexing runs as well as live document updates. Second, it reduces the chance that partially indexed content appears in live retrieval before the ingest process has completed. In other words, the system preserves a separation between 'currently being processed' and 'safe to serve'.

For production AI, that distinction is not a minor implementation detail. It is part of epistemic hygiene.
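One way to enforce that separation is to model ingestion as an explicit state machine in which 'committed' is the only state retrieval may ever observe. The stage names and transition table below are illustrative assumptions, not Synapse's actual pipeline:

```python
# Legal lifecycle transitions for a document in a hypothetical staged pipeline.
TRANSITIONS = {
    "uploaded":  {"parsing"},
    "parsing":   {"chunking", "failed"},
    "chunking":  {"embedding", "failed"},
    "embedding": {"committed", "failed"},
    "committed": set(),   # terminal: now visible to retrieval
    "failed":    set(),   # terminal: never served; retried as a new job
}

def advance(state, new_state):
    """Move a document forward only along legal edges; anything else is
    treated as a bug rather than silently accepted."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state
```

The design choice worth noticing is that skipping stages is impossible by construction: a document cannot jump from "parsing" to "committed", so half-processed material can never become servable evidence.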

8. Why Synapse Is Stronger Than a Typical LLM Wrapper

It is tempting to compare AI systems only by the base model they call. That misses the larger question: what behavior does the surrounding system make possible?

The table below offers a more meaningful comparison than raw model marketing. Note that this should be read as an architectural comparison, not as a claim of universal benchmark superiority.

Dimension | Single-Model Assistant | Basic RAG Wrapper | Synapse
Knowledge source | Mostly parametric | External docs plus model weights | External knowledge bases plus session state, metadata, and grounded generation
Query handling | One generic path | Usually one retrieval-plus-generation path | Routed structured, retrieval, conversational, and generative paths
Retrieval strategy | Often none | Often one main retriever | Hybrid retrieval, fusion, reranking, and verification
Conversational continuity | Prompt history only | Prompt history plus retrieved context | Memory, topic tracking, reformulation, and citation-aware persistence
Knowledge-base isolation | Limited | Sometimes collection-level | Explicit multi-knowledge-base orchestration with per-KB configuration state
Operational trust | Dependent on prompt quality | Better than prompt-only | Stronger control over evidence flow, state, isolation, and production behavior

Synapse is not claimed to dominate every benchmark or every use case. Rather, it is designed to offer stronger controllability, inspectability, domain isolation, conversational continuity, grounding discipline, and robustness under concurrency than minimalist prompt-only or basic retrieval wrappers.

Those are system virtues, not just model virtues.


9. How an Expert Would Evaluate Synapse

From an expert standpoint, the right question is not is the base model strong, but does the composed system preserve correctness, coherence, and controllability across realistic workloads. That changes the evaluation frame considerably.

Evaluation Axis | Expert Question | Why It Matters
Grounding fidelity | Does the answer remain supported after retrieval, reranking, and verification? | A grounded system should reduce unsupported generation, not merely decorate it with citations.
Route quality | Does the system choose structured execution when the task is deterministic? | Misrouting exact tasks into free-form generation is a preventable source of error and cost.
Session coherence | Do follow-up turns preserve topic and entity continuity without dragging in irrelevant history? | Multi-turn usefulness depends on selective state, not raw transcript length.
Knowledge isolation | Does one knowledge base remain behaviorally distinct from another while sharing infrastructure? | Controlled domain specialization depends on strong isolation boundaries.
Ingestion consistency | Can the system update knowledge without leaking partial state into live retrieval? | Freshness without visibility discipline weakens trust.

This is also why a purely anecdotal demo is a weak evaluation method for a system like Synapse. The stronger evaluation protocol is multidimensional: route correctness, retrieval quality, citation quality, session continuity, ingestion safety, and behavior under concurrency all need to be measured together.

Recent RAG evaluation work reinforces this point. RAGAs proposes reference-free metrics across retrieval and generation quality [13]. ARES likewise targets automated evaluation for retrieval-grounded generation pipelines [14]. The research direction is clear: evaluating RAG systems requires more than checking fluency or answer plausibility in isolation.


10. Engineering at Scale

The engineering story of Synapse matters because scale changes the nature of an AI system. A pipeline that works in a notebook or in single-user evaluation can degrade rapidly when many requests, sessions, and document updates happen at once.

Synapse addresses this through a set of runtime controls that are as much about preserving behavior as they are about preserving uptime:

  • bounded admission into expensive generation paths
  • per-session protections so overlapping turns do not corrupt conversational state
  • distributed coordination and contention management through coordination layers and lock services, with patterns documented in the distributed-systems literature, and with an explicit awareness that locking semantics depend on the underlying failure model and consistency assumptions [16]
  • vector and metadata-aware retrieval infrastructure of the kind documented by modern vector-database systems, where filtering, indexing, and structured retrieval constraints materially affect retrieval behavior
  • staged async job execution for long-running work
  • readiness, liveness, and watchdog mechanisms for self-protective runtime behavior

On the serving side, Synapse is also designed around practical LLM systems knowledge. vLLM’s PagedAttention work is a strong example of why model-serving architecture matters [17]. Its current automatic prefix caching design further illustrates how repeated prompt prefixes can be reused without changing outputs when the prefix truly matches [18]. In Synapse, the broader lesson is simple: model throughput, concurrency boundaries, and response isolation must be treated explicitly, not optimistically.

A simple systems decomposition of per-turn latency is:

T_turn = T_route + T_retrieve + T_rerank + T_generate + T_post

In most real deployments, T_generate is the most volatile term. That is why bounded generation concurrency, backpressure, and asynchronous job segregation are not implementation trivia. They are the mechanisms that stop one expensive stage from destabilizing the whole system.
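The admission-control idea behind bounded generation concurrency can be sketched with a counting semaphore around the generation path. Everything here is illustrative (`bounded_generate`, the fake generator, the bound of 2); it shows the mechanism, not Synapse's serving stack:

```python
import asyncio

async def bounded_generate(prompt, generate_fn, slots, timeout=30.0):
    """Admit a request into the expensive generation path only when a slot
    is free; excess requests queue instead of overloading the model server."""
    async with slots:
        return await asyncio.wait_for(generate_fn(prompt), timeout)

async def demo(bound=2, n=5):
    """Drive n concurrent turns through a fake generator and record the peak
    number of simultaneous generations actually admitted."""
    slots = asyncio.Semaphore(bound)
    in_flight = peak = 0

    async def fake_generate(prompt):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for model latency
        in_flight -= 1
        return f"answer:{prompt}"

    answers = await asyncio.gather(
        *(bounded_generate(f"q{i}", fake_generate, slots) for i in range(n))
    )
    return answers, peak
```

Running `asyncio.run(demo())` admits all five requests but never lets more than two generations execute at once; the rest wait in the semaphore queue, which is the backpressure behavior described above.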

That may sound operational, but it has research consequences. If a system cannot preserve its intended semantics under load, then the semantics were never fully specified in the first place.


11. Research Implications

Synapse suggests a broader view of AI progress.

The next leap in useful AI systems may not come only from bigger base models. It may come from stronger composition of: better routing, better knowledge interfaces, better state management, better evidence control, better memory architectures, and better runtime discipline. In other words, it may come from turning isolated model capability into dependable system capability.

That shift has several implications:

  1. evaluation should increasingly measure systems, not just models
  2. reliability under concurrency should be treated as part of AI quality
  3. knowledge freshness and ingestion safety are core research concerns for deployed AI
  4. memory and conversational continuity deserve explicit architectures
  5. structured execution should coexist with generative flexibility
  6. evaluation should track the interaction of routing, retrieval, memory, and serving behavior rather than judging each in isolation

Synapse is our current answer to those ideas. Indeed, it will evolve. But for now, it should be understood less as a finished endpoint and more as a strong systems blueprint for building dependable AI around, rather than underneath, a foundation model.


12. Limitations and Future Work

No production AI architecture is ever complete. Synapse opens several directions for future work that are intellectually interesting and practically important:

  • richer evaluation frameworks for grounded multi-turn behavior
  • deeper adaptive routing based on uncertainty and evidence quality
  • stronger personalization and memory controls
  • more explicit calibration for when the system should abstain, ask, or escalate
  • tighter feedback loops between retrieval quality and answer quality

These are not patches around the edges. They are part of the long arc of moving from impressive models to dependable AI systems.


13. Conclusion

Synapse was built from a straightforward conviction: an AI product should be designed as a system, not mistaken for a model. The model matters, but so do the knowledge substrate, retrieval stack, structured handlers, memory model, orchestration layer, ingestion lifecycle, and concurrency controls that determine whether intelligence remains useful outside a demo.

That is why Synapse is better understood as an applied and research-driven AI system. Its strength comes from composition. It is grounded where simpler systems are loose, stateful where simpler systems are stateless, and operationally disciplined where simpler systems rely on best effort.

For ErisAI, that is the real meaning of Synapse: not a single neural component, but a carefully shaped composition of many of them into one coherent intelligence system. In that sense, Synapse is where knowledge meets intelligence.

References

  1. Lewis, P. et al. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS. 2020. [arXiv:2005.11401]
  2. Robertson, S. and Zaragoza, H. “The Probabilistic Relevance Framework: BM25 and Beyond.” 2009. [Foundations and Trends]
  3. Cormack, G. V., Clarke, C. L. A., and Buettcher, S. “Reciprocal Rank Fusion Outperforms Condorcet and Individual Rank Learning Methods.” SIGIR. 2009. [SIGIR 2009]
  4. Reimers, N. and Gurevych, I. “Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.” EMNLP. 2019. [arXiv:1908.10084]
  5. Gao, Y. et al. “Retrieval-Augmented Generation for Large Language Models: A Survey.” 2024. [arXiv:2312.10997]
  6. Asai, A. et al. “Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection.” ICLR. 2024. [arXiv:2310.11511]
  7. Yan, S.-Q. et al. “Corrective Retrieval Augmented Generation.” 2024. [arXiv:2401.15884]
  8. Nogueira, R. and Cho, K. “Passage Re-ranking with BERT.” 2019. [arXiv:1901.04085]
  9. Lin, S.-C. et al. “Conversational Dense Retrieval for Conversational Question Answering.” 2020. [arXiv:2005.11768]
  10. Wu, Z. et al. “CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning.” EMNLP. 2022. [arXiv:2112.08558]
  11. Wang, W. et al. “Augmenting Language Models with Long-Term Memory.” 2023. [arXiv:2306.07174]
  12. Packer, C. et al. “MemGPT: Towards LLMs as Operating Systems.” 2023. [arXiv:2310.08560]
  13. Es, S. et al. “RAGAs: Automated Evaluation of Retrieval Augmented Generation.” EACL Demo Track. 2024. [EACL 2024 Demo Track]
  14. Saad-Falcon, J. et al. “ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems.” NAACL. 2024. [arXiv:2311.09476]
  15. Bang, F. et al. “GPTCache: An Open-Source Semantic Cache for LLM Applications Enabling Faster Answers and Cost Savings.” NLP-OSS. 2023. [NLP-OSS 2023]
  16. Kleppmann, M. “How to do distributed locking.” 2016. [How to do distributed locking - Martin Kleppmann]
  17. Kwon, W. et al. “Efficient Memory Management for Large Language Model Serving with PagedAttention.” 2023. [arXiv:2309.06180]
  18. vLLM Documentation. “Automatic Prefix Caching.” [vLLM Documentation: Automatic Prefix Caching]
