Between late 2023 and early 2024 we built our first semantic document search system for a professional-services client: 8,000 documents (contracts, regulations, circulars), searchable in natural language.
Chosen stack
- Postgres + pgvector for embedding storage (no Pinecone, no Weaviate).
- OpenAI text-embedding-3-small for embeddings (1536 dims).
- Anthropic Claude for answer synthesis (top-5 chunks → answer).
- Next.js for UI and ingestion Server Actions.
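On the storage side, here is a minimal sketch of what the pgvector table can look like; the table and column names are illustrative, not the client's actual schema.

```ts
// Hypothetical DDL for the chunk store: a pgvector column sized to
// text-embedding-3-small's 1536 dimensions, plus the metadata we keep.
import { Client } from "pg";

const ddl = `
  CREATE EXTENSION IF NOT EXISTS vector;

  CREATE TABLE IF NOT EXISTS chunks (
    id        bigserial PRIMARY KEY,
    source    text NOT NULL,          -- originating document
    page      int,                    -- page number, if known
    section   text,                   -- section heading, if known
    content   text NOT NULL,          -- the chunk text itself
    embedding vector(1536) NOT NULL   -- text-embedding-3-small output
  );
`;

export async function migrate(connectionString: string): Promise<void> {
  const client = new Client({ connectionString });
  await client.connect();
  await client.query(ddl);
  await client.end();
}
```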
Ingestion pipeline
- Document → chunk at 800 tokens with 100-token overlap.
- Each chunk → embedding via OpenAI batch API.
- Save in Postgres with metadata (source, page, section).
- ivfflat index with lists = sqrt(N).
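Putting the pipeline together, a sketch of the embed-and-store step (the real ingestion ran through the async OpenAI Batch API; for brevity this version sends one synchronous batched request, which accepts the same input shape):

```ts
// Ingestion sketch: embed a batch of pre-chunked text, then insert each
// chunk with its metadata. Chunking itself is sketched in the Lessons
// section below.
import OpenAI from "openai";
import { Client } from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

interface Chunk {
  page: number | null;
  section: string | null;
  content: string;
}

export async function ingest(db: Client, source: string, chunks: Chunk[]): Promise<void> {
  // One request can embed many inputs at once.
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks.map((c) => c.content),
  });

  for (let i = 0; i < chunks.length; i++) {
    const c = chunks[i];
    await db.query(
      `INSERT INTO chunks (source, page, section, content, embedding)
       VALUES ($1, $2, $3, $4, $5::vector)`,
      // pgvector accepts the JSON array literal, e.g. '[0.1,0.2,...]'
      [source, c.page, c.section, c.content, JSON.stringify(res.data[i].embedding)],
    );
  }
}

// Once the corpus is loaded, build the ivfflat index. With 60k chunks,
// lists = sqrt(N) ≈ 245:
//   CREATE INDEX ON chunks USING ivfflat (embedding vector_cosine_ops)
//     WITH (lists = 245);
```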
Real costs
- One-off ingestion: ~12 USD for 8,000 docs (60k chunks).
- Storage: 110 MB of embeddings in the client's self-hosted Postgres, so effectively zero marginal cost.
- Queries/month: ~22 USD for 6,000 queries (query embedding + Claude completion).
Lessons
Chunking is 70% of result quality. We went through four chunking iterations before landing on the right combination: semantic boundaries (paragraphs, numbered sections) beat fixed token-count splits.
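A sketch of that final strategy: split on paragraph and numbered-section boundaries, then greedily pack paragraphs into the 800-token budget with roughly 100 tokens of overlap. The token counter below is a word-count heuristic for readability; in production you would use a real tokenizer.

```ts
const MAX_TOKENS = 800;
const OVERLAP_TOKENS = 100;

// Rough heuristic: ~1.3 tokens per word. Swap in a real BPE tokenizer
// for production use.
const countTokens = (text: string): number =>
  Math.ceil(text.split(/\s+/).filter(Boolean).length * 1.3);

export function chunkDocument(text: string): string[] {
  // Split on blank lines, or just before numbered headings ("3.", "3.1", ...).
  const paragraphs = text
    .split(/\n\s*\n|(?=\n\d+(?:\.\d+)*\s)/)
    .map((p) => p.trim())
    .filter(Boolean);

  const chunks: string[] = [];
  let current: string[] = [];
  let currentTokens = 0;

  for (const para of paragraphs) {
    const t = countTokens(para);
    if (currentTokens + t > MAX_TOKENS && current.length > 0) {
      chunks.push(current.join("\n\n"));
      // Carry trailing paragraphs forward until ~100 tokens of overlap.
      const overlap: string[] = [];
      let overlapTokens = 0;
      for (let i = current.length - 1; i >= 0 && overlapTokens < OVERLAP_TOKENS; i--) {
        overlap.unshift(current[i]);
        overlapTokens += countTokens(current[i]);
      }
      current = overlap;
      currentTokens = overlapTokens;
    }
    current.push(para); // note: oversized single paragraphs pass through unsplit
    currentTokens += t;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}
```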
An end-of-pipeline rerank with a small model improved precision by 18%, at a cost of roughly 1 ms of added latency per query.
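The shape of that step: over-retrieve the top 20 by cosine distance, rescore each candidate against the query, and keep the top 5 for Claude. The scorer below is a trivial lexical placeholder with the right signature; the production scorer was a small model, which we don't reproduce here.

```ts
import { Client } from "pg";

// Placeholder scorer: term overlap between query and chunk. Swap in a
// real small reranking model here; only the signature matters.
function scoreRelevance(query: string, chunk: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const words = chunk.toLowerCase().split(/\W+/).filter(Boolean);
  return words.filter((w) => q.has(w)).length / Math.max(words.length, 1);
}

export async function search(db: Client, query: string, queryEmbedding: number[]) {
  // <=> is pgvector's cosine-distance operator.
  const { rows } = await db.query(
    `SELECT id, source, page, section, content
       FROM chunks
      ORDER BY embedding <=> $1::vector
      LIMIT 20`,
    [JSON.stringify(queryEmbedding)],
  );

  return rows
    .map((r) => ({ ...r, score: scoreRelevance(query, r.content) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5); // top-5 chunks go to Claude for synthesis
}
```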
pgvector is enough for datasets under 5M vectors; beyond that, evaluate Qdrant or Pinecone. For this client, Postgres will stay in-house for years to come.
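One knob worth knowing at this scale: ivfflat trades recall for speed through the `ivfflat.probes` setting (it scans that many lists per query; the default is 1). A per-session sketch; 10 is an illustrative value, not one we benchmarked here.

```ts
import { Client } from "pg";

// More probes = better recall, slower queries; applies to the session.
export async function setRecall(db: Client, probes: number): Promise<void> {
  await db.query(`SELECT set_config('ivfflat.probes', $1, false)`, [String(probes)]);
}
```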