The Definitive RAG Architecture

RAG architecture connecting relational data to vector databases via n8n, with chunking strategies and embedding model selection.

Overview

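Retrieval-Augmented Generation (RAG) combines semantic search with LLM generation. In production, most teams already store structured data in PostgreSQL, MySQL, or a similar relational database, but language models consume unstructured text and have limited context windows. The relevant rows therefore have to be converted to text, indexed, and retrieved selectively at query time.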

The recommended architecture uses n8n as the orchestration layer between your relational database and a vector store (Pinecone, Qdrant, Weaviate). The pipeline has four stages:

  1. Extraction — incremental queries against the relational source (by updated_at, cursor, or event queue).
  2. Normalization and chunking — transform raw content into indexable units.
  3. Embedding — call an embedding model (OpenAI, Cohere, or local models via Ollama).
  4. Vector upsert — persist vectors + metadata for later retrieval.

This separation keeps the relational database as the source of truth while the vector index serves semantic retrieval only.
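
The extraction stage can be sketched as a watermark-based incremental read. This is a minimal illustration, assuming each row carries an updated_at ISO timestamp; the function name selectChanged and the in-memory row array are hypothetical stand-ins for a real SQL query against the source table.

```javascript
// Hypothetical sketch: pick rows changed since the last sync and advance the watermark.
// In a real pipeline the filter would run in SQL (WHERE updated_at > $1 ORDER BY updated_at).
function selectChanged(rows, watermark, limit = 100) {
  const since = Date.parse(watermark);
  const changed = rows
    .filter((row) => Date.parse(row.updated_at) > since)
    .sort((a, b) => Date.parse(a.updated_at) - Date.parse(b.updated_at))
    .slice(0, limit);
  // The new watermark is the newest timestamp actually processed,
  // so a partial batch does not jump past unprocessed rows.
  const nextWatermark = changed.length
    ? changed[changed.length - 1].updated_at
    : watermark;
  return { changed, nextWatermark };
}

const rows = [
  { id: 1, updated_at: "2024-01-01T00:00:00Z" },
  { id: 2, updated_at: "2024-01-03T00:00:00Z" },
  { id: 3, updated_at: "2024-01-02T00:00:00Z" },
];
const result = selectChanged(rows, "2024-01-01T00:00:00Z");
console.log(result.changed.map((r) => r.id)); // [ 3, 2 ]
console.log(result.nextWatermark);            // 2024-01-03T00:00:00Z
```

Rows that share the exact timestamp of the last processed row can be missed at a batch boundary with a strict comparison; a compound cursor such as (updated_at, id) is one common way to handle that.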

Document chunking strategies

Chunking defines search granularity. Large chunks preserve context but dilute precision; small chunks improve relevance but fragment ideas.

Fixed-size: splits text every N tokens or characters, often with overlap (e.g. 512 tokens, 64 overlap). Easy to implement in n8n with a Code node. Risk: cuts mid-sentence or mid-paragraph.
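
A minimal sketch of fixed-size chunking with overlap follows. It counts whitespace-separated words as a rough stand-in for tokens, which is an assumption; a real tokenizer would produce different counts. The function name chunkFixed is hypothetical.

```javascript
// Fixed-size chunking with overlap. Units are whitespace-separated words,
// used here as an approximation of tokens.
function chunkFixed(text, size = 512, overlap = 64) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  const step = size - overlap;
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window already reached the end
  }
  return chunks;
}

console.log(chunkFixed("a b c d e f g h i j", 4, 1));
// [ 'a b c d', 'd e f g', 'g h i j' ]
```

Each chunk repeats the last `overlap` words of the previous one, which softens the mid-sentence cuts but does not remove them.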

Semantic: uses similarity between sentences or local embeddings to group related paragraphs before splitting. Better thematic coherence, but slower and dependent on an auxiliary model.
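
One way to sketch semantic chunking is to start a new chunk whenever the similarity between consecutive sentences drops below a threshold. The example below assumes sentences are already split and takes the embedding function as a parameter; toyEmbed is a keyword-count stand-in used only for demonstration, not a real model, and the 0.5 threshold is arbitrary.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Group consecutive sentences; start a new chunk when similarity to the
// previous sentence falls below the threshold.
function chunkSemantic(sentences, embed, threshold = 0.5) {
  const chunks = [];
  let current = [];
  let prev = null;
  for (const sentence of sentences) {
    const vec = embed(sentence);
    if (prev && cosine(prev, vec) < threshold) {
      chunks.push(current.join(" "));
      current = [];
    }
    current.push(sentence);
    prev = vec;
  }
  if (current.length) chunks.push(current.join(" "));
  return chunks;
}

// Toy embedding for demonstration only: presence of a few keywords.
const VOCAB = ["invoice", "payment", "server", "deploy"];
const toyEmbed = (s) => VOCAB.map((w) => (s.toLowerCase().includes(w) ? 1 : 0));

console.log(chunkSemantic(
  ["The invoice is due.", "Payment closes the invoice.", "Deploy the server tonight."],
  toyEmbed
));
// [ 'The invoice is due. Payment closes the invoice.', 'Deploy the server tonight.' ]
```

In practice the embed parameter would call the auxiliary model, which is where the extra latency and cost of this strategy come from.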

Recursive: applies hierarchical separators (\n\n → \n → . → space) until the target size is reached. Standard in libraries like LangChain; works well for technical docs with headings.
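
The recursive idea can be sketched as follows. This is a simplified illustration, not LangChain's implementation: sizes are in characters, there is no overlap, and separators at split boundaries are dropped. The function name chunkRecursive is hypothetical.

```javascript
// Recursive splitting: try coarse separators first and fall back to finer
// ones only for pieces that are still too long. Sizes are in characters.
function chunkRecursive(text, maxLen, separators = ["\n\n", "\n", ". ", " "]) {
  if (text.length <= maxLen) return text ? [text] : [];
  if (separators.length === 0) {
    // No separator left: hard cut at maxLen.
    const cuts = [];
    for (let i = 0; i < text.length; i += maxLen) cuts.push(text.slice(i, i + maxLen));
    return cuts;
  }
  const [sep, ...finer] = separators;
  const chunks = [];
  let buffer = "";
  for (const piece of text.split(sep)) {
    const candidate = buffer ? buffer + sep + piece : piece;
    if (candidate.length <= maxLen) {
      buffer = candidate; // keep merging small neighbours
      continue;
    }
    if (buffer) chunks.push(buffer);
    if (piece.length <= maxLen) {
      buffer = piece;
    } else {
      chunks.push(...chunkRecursive(piece, maxLen, finer));
      buffer = "";
    }
  }
  if (buffer) chunks.push(buffer);
  return chunks;
}

const doc = "Intro line.\n\nFirst paragraph is long. It has two sentences.\n\nEnd.";
console.log(chunkRecursive(doc, 30));
// [ 'Intro line.', 'First paragraph is long', 'It has two sentences.', 'End.' ]
```

Paragraph boundaries are respected where possible; only the oversized middle paragraph is split further, at the sentence level.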

Strategy     Precision   Compute cost   Best for
Fixed-size   Medium      Low            Logs, tickets, CSVs
Semantic     High        High           Articles, contracts
Recursive    High        Medium         Structured Markdown/HTML docs

Embedding model selection

Practical criteria for choosing an embedding model in n8n/Make automations:

  • Dimensionality: larger vectors (1536d, 3072d) capture more nuance but increase storage cost and search latency. For catalogs under 1M chunks, 768–1536d is usually sufficient.
  • Cost: hosted APIs charge per token; open-source models (e5, bge) reduce per-token cost if you self-host inference, in exchange for running the infrastructure yourself.
  • Latency: synchronous pipelines (RAG chat) need P95 < 300ms on the embedding call; batch pipelines tolerate seconds.
  • Domain fit: generalist models can underperform on vertical jargon (legal, finance). Evaluate with a golden set of 20–50 real queries before scaling.
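
The golden-set evaluation mentioned in the last criterion can be sketched as a recall@k check. This is a minimal illustration under assumptions: each golden query has exactly one expected chunk id, and search is a hypothetical function (query, k) that returns ranked chunk ids from whichever model and index are under test. The stubbed index below stands in for a real vector search.

```javascript
// recall@k over a golden set: fraction of queries whose expected chunk id
// appears among the top-k retrieved ids.
function recallAtK(goldenSet, search, k = 5) {
  if (goldenSet.length === 0) return 0;
  let hits = 0;
  for (const { query, expectedId } of goldenSet) {
    if (search(query, k).slice(0, k).includes(expectedId)) hits++;
  }
  return hits / goldenSet.length;
}

// Stubbed retriever standing in for a real vector search.
const fakeIndex = {
  "refund policy": ["c7", "c2", "c9"],
  "late fee clause": ["c4", "c1", "c3"],
};
const search = (query, k) => (fakeIndex[query] || []).slice(0, k);

const golden = [
  { query: "refund policy", expectedId: "c2" },
  { query: "late fee clause", expectedId: "c3" },
];
console.log(recallAtK(golden, search, 2)); // 0.5
console.log(recallAtK(golden, search, 3)); // 1
```

Running the same golden set against two candidate models gives a like-for-like comparison before committing to a full reindex.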

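Operational rule: version the embedding model in each vector's metadata (embedding_model: "text-embedding-3-small@v1"). Reindex the full collection when switching models: vectors produced by different models live in different embedding spaces, so mixing versions in one index degrades retrieval quality.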
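
A small sketch of this rule: every record carries its model label, and a helper lists the records that no longer match the current model. The record shape (id, values, metadata), the function names, and the second model label are assumptions made for illustration.

```javascript
// Label format taken from the operational rule above.
const CURRENT_MODEL = "text-embedding-3-small@v1";

// Build a vector record that carries its embedding model version in metadata.
function buildRecord(chunkId, values, model = CURRENT_MODEL) {
  return { id: chunkId, values, metadata: { embedding_model: model } };
}

// Return ids of records embedded with a different model; these need reindexing.
function staleIds(records, model = CURRENT_MODEL) {
  return records
    .filter((r) => r.metadata.embedding_model !== model)
    .map((r) => r.id);
}

const records = [
  buildRecord("doc1#0", [0.1, 0.2]),
  buildRecord("doc1#1", [0.3, 0.4], "text-embedding-ada-002@v1"), // hypothetical older label
];
console.log(staleIds(records)); // [ 'doc1#1' ]
```

Storing the label also makes it possible to filter by embedding_model at query time while a reindex is still in progress, provided the vector store supports metadata filters.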

Reference n8n workflow

The JSON below shows the minimum structure of a RAG ingestion workflow: scheduled trigger, PostgreSQL read, chunking in a Code node, embedding via HTTP, and upsert to the vector store. Adapt credentials, batch size, and error branches to your environment.

{
  "name": "RAG Ingestion Pipeline",
  "nodes": [
    {
      "id": "trigger-1",
      "name": "Schedule Trigger",
      "type": "n8n-nodes-base.scheduleTrigger",
      "typeVersion": 1,
      "position": [240, 300],
      "parameters": { "rule": { "interval": [{ "field": "hours", "hoursInterval": 6 }] } }
    },
    {
      "id": "postgres-1",
      "name": "Fetch Source Rows",
      "type": "n8n-nodes-base.postgres",
      "typeVersion": 2,
      "position": [480, 300],
      "parameters": { "operation": "executeQuery", "query": "SELECT id, title, body FROM docs WHERE synced_at IS NULL LIMIT 100;" }
    },
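    {
      "id": "code-1",
      "name": "Chunk Documents",
      "type": "n8n-nodes-base.code",
      "typeVersion": 2,
      "position": [720, 300],
      "parameters": { "language": "javaScript", "jsCode": "// Split each row's body into overlapping character chunks.\n// SIZE and OVERLAP are illustrative; tune them for your content.\nconst SIZE = 2000;\nconst OVERLAP = 200;\nconst out = [];\nfor (const item of $input.all()) {\n  const { id, title, body } = item.json;\n  const text = body || '';\n  for (let start = 0, n = 0; start < text.length; start += SIZE - OVERLAP, n++) {\n    out.push({ json: { doc_id: id, title, chunk_index: n, text: text.slice(start, start + SIZE) } });\n    if (start + SIZE >= text.length) break;\n  }\n}\nreturn out;" }
    },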
    {
      "id": "http-1",
      "name": "Create Embeddings",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4,
      "position": [960, 300],
      "parameters": { "method": "POST", "url": "https://api.openai.com/v1/embeddings", "authentication": "predefinedCredentialType" }
    },
    {
      "id": "http-2",
      "name": "Upsert Vectors",
      "type": "n8n-nodes-base.httpRequest",
      "typeVersion": 4,
      "position": [1200, 300],
      "parameters": { "method": "POST", "url": "https://YOUR-INDEX.pinecone.io/vectors/upsert" }
    }
  ],
  "connections": {
    "Schedule Trigger": { "main": [[{ "node": "Fetch Source Rows", "type": "main", "index": 0 }]] },
    "Fetch Source Rows": { "main": [[{ "node": "Chunk Documents", "type": "main", "index": 0 }]] },
    "Chunk Documents": { "main": [[{ "node": "Create Embeddings", "type": "main", "index": 0 }]] },
    "Create Embeddings": { "main": [[{ "node": "Upsert Vectors", "type": "main", "index": 0 }]] }
  },
  "settings": { "executionOrder": "v1" },
  "meta": { "templateCredsSetupCompleted": false }
}
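
The workflow above leaves the request bodies of the two HTTP nodes open. The sketch below shows one way the data could be shaped between them, assuming the OpenAI embeddings request format (model plus an input array) and a vector upsert body of the form { vectors: [{ id, values, metadata }] }. The chunk fields (doc_id, chunk_index, title, text), the id scheme, and the function names are hypothetical, and the embeddings response is stubbed rather than fetched.

```javascript
// Request body for POST /v1/embeddings: one call can embed a batch of inputs.
function buildEmbeddingRequest(chunks, model = "text-embedding-3-small") {
  return { model, input: chunks.map((c) => c.text) };
}

// Pair each returned embedding with its chunk (matching on `index`)
// and shape the result as a vector upsert body.
function buildUpsertBody(chunks, embeddingResponse, model = "text-embedding-3-small") {
  const vectors = embeddingResponse.data.map((item) => {
    const chunk = chunks[item.index];
    return {
      id: `${chunk.doc_id}#${chunk.chunk_index}`, // hypothetical id scheme
      values: item.embedding,
      metadata: { doc_id: chunk.doc_id, title: chunk.title, embedding_model: model },
    };
  });
  return { vectors };
}

const chunks = [
  { doc_id: 42, chunk_index: 0, title: "Refunds", text: "Refunds are issued within 14 days." },
  { doc_id: 42, chunk_index: 1, title: "Refunds", text: "Contact support to start a refund." },
];
// Stubbed response in the shape returned by the embeddings endpoint.
const response = {
  data: [
    { index: 0, embedding: [0.1, 0.2] },
    { index: 1, embedding: [0.3, 0.4] },
  ],
};
console.log(buildEmbeddingRequest(chunks).input.length); // 2
console.log(buildUpsertBody(chunks, response).vectors.map((v) => v.id)); // [ '42#0', '42#1' ]
```

Deriving the vector id from the source row id and chunk index keeps upserts idempotent: re-running ingestion for the same document overwrites its vectors instead of duplicating them.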