Between late 2023 and early 2024 we built our first semantic document search system for a professional-services client: 8,000 documents (contracts, regulations, circulars), searchable in natural language.
Chosen stack
- Postgres + pgvector for embedding storage (no Pinecone, no Weaviate).
- OpenAI text-embedding-3-small for embeddings (1536 dims).
- Anthropic Claude for answer synthesis (top-5 chunks → answer).
- Next.js for UI and ingestion Server Actions.
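On the storage side, here is a minimal sketch of what the pgvector table can look like; the table and column names are illustrative, not the client's actual schema.

```ts
// Hypothetical DDL for the chunk store: a pgvector column sized to
// text-embedding-3-small's 1536 dimensions, plus the metadata we keep.
import { Client } from "pg";

const ddl = `
  CREATE EXTENSION IF NOT EXISTS vector;

  CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,          -- originating document
    page      int,                    -- page number, if known
    section   text,                   -- section heading, if known
    content   text NOT NULL,          -- the chunk text itself
    embedding vector(1536) NOT NULL   -- text-embedding-3-small output
  );
`;

export async function migrate(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  await client.query(ddl);
  await client.end();
}
```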
Ingestion pipeline
- Document → chunk at 800 tokens with 100-token overlap.
- Each chunk → embedding via OpenAI batch API.
- Save in Postgres with metadata (source, page, section).
- ivfflat index with lists = sqrt(N).
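Putting the pipeline together, a sketch of the embed-and-store step (the real ingestion ran through the async OpenAI Batch API; for brevity this version sends one synchronous batched request, which accepts the same input shape):

```ts
// Ingestion sketch: embed a batch of pre-chunked text, then insert each
// chunk with its metadata. Chunking itself is sketched in the Lessons
// section below.
import OpenAI from "openai";
import { Client } from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface Chunk {
  page: number | null;
  section: string | null;
  content: string;
}

export async function ingest(db: Client, source: string, chunks: Chunk[]): Promise<void> {
  // One request can embed many inputs at once.
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks.map((c) => c.content),
  });

  for (let i = 0; i < chunks.length; i++) {
    const c = chunks[i];
    await db.query(
      `INSERT INTO chunks (source, page, section, content, embedding)
       VALUES ($1, $2, $3, $4, $5::vector)`,
      // pgvector accepts the JSON array literal, e.g. '[0.1,0.2,...]'
      [source, c.page, c.section, c.content, JSON.stringify(res.data[i].embedding)],
    );
  }
}

// Once the corpus is loaded, build the ivfflat index. With 60k chunks,
// lists = sqrt(N) ≈ 245:
//   CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
//     WITH (lists = 245);
```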
Real costs
- One-off ingestion: ~12 USD for 8,000 docs (60k chunks).
- Storage: 110 MB of embeddings in the client's self-hosted Postgres, so effectively zero marginal cost.
- Queries/month: ~22 USD for 6,000 queries (query embedding + Claude completion).
Lessons
Chunking is 70% of result quality. We went through four chunking iterations before landing on the right combination: semantic boundaries (paragraphs, numbered sections) beat fixed token-count splits.
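A sketch of that final strategy: split on paragraph and numbered-section boundaries, then greedily pack paragraphs into the 800-token budget with roughly 100 tokens of overlap. The token counter below is a word-count heuristic for readability; in production you would use a real tokenizer.

```ts
const MAX_TOKENS = 800;
const OVERLAP_TOKENS = 100;

// Rough heuristic: ~1.3 tokens per word. Swap in a real BPE tokenizer
// for production use.
const countTokens = (text: string): number =>
  Math.ceil(text.split(/\s+/).filter(Boolean).length * 1.3);

export function chunkDocument(text: string): string[] {
  // Split on blank lines, or just before numbered headings ("3.", "3.1", ...).
  const paragraphs = text
    .split(/\n\s*\n|(?=\n\d+(?:\.\d+)*\s)/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const para of paragraphs) {
    const t = countTokens(para);
    if (currentTokens + t > MAX_TOKENS && current.length > 0) {
      chunks.push(current.join("\n\n"));
      // Carry trailing paragraphs forward until ~100 tokens of overlap.
      const overlap: string[] = [];
      let overlapTokens = 0;
      for (let i = current.length - 1; i >= 0 && overlapTokens < OVERLAP_TOKENS; i--) {
        overlap.unshift(current[i]);
        overlapTokens += countTokens(current[i]);
      }
      current = overlap;
      currentTokens = overlapTokens;
    }
    current.push(para); // note: oversized single paragraphs pass through unsplit
    currentTokens += t;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```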
An end-of-pipeline rerank with a small model improved precision by 18%, at a cost of roughly 1 ms of added latency per query.
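The shape of that step: over-retrieve the top 20 by cosine distance, rescore each candidate against the query, and keep the top 5 for Claude. The scorer below is a trivial lexical placeholder with the right signature; the production scorer was a small model, which we don't reproduce here.

```ts
import { Client } from "pg";

// Placeholder scorer: term overlap between query and chunk. Swap in a
// real small reranking model here; only the signature matters.
function scoreRelevance(query: string, chunk: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const words = chunk.toLowerCase().split(/\W+/).filter(Boolean);
  return words.filter((w) => q.has(w)).length / Math.max(words.length, 1);
}

export async function search(db: Client, query: string, queryEmbedding: number[]) {
  // <=> is pgvector's cosine-distance operator.
  const { rows } = await db.query(
    `SELECT id, source, page, section, content
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT 20`,
    [JSON.stringify(queryEmbedding)],
  );

  return rows
    .map((r) => ({ ...r, score: scoreRelevance(query, r.content) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5); // top-5 chunks go to Claude for synthesis
}
```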
pgvector is enough for datasets under 5M vectors; beyond that, evaluate Qdrant or Pinecone. For this client, Postgres will stay in-house for years to come.
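One knob worth knowing at this scale: ivfflat trades recall for speed through the `ivfflat.probes` setting (it scans that many lists per query; the default is 1). A per-session sketch; 10 is an illustrative value, not one we benchmarked here.

```ts
import { Client } from "pg";

// More probes = better recall, slower queries; applies to the session.
export async function setRecall(db: Client, probes: number): Promise<void> {
  await db.query(`SELECT set_config('ivfflat.probes', $1, false)`, [String(probes)]);
}
```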