Retrieval Augmented Generation (RAG) on Postgres
I recently learnt that Postgres works quite well for RAG applications. Here is the flow I followed.
Pre-prep
Install the pgvector extension for my Postgres db.
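Once the pgvector package is installed on the database server, the extension still has to be enabled in each database. A minimal sketch from Node, assuming a node-postgres style `pool` like the one used in the query further down; `ensurePgvector` is a hypothetical helper name:

```javascript
// Hypothetical helper: enables pgvector in the current database.
// `pool` is assumed to be a node-postgres Pool (anything with a compatible
// async query(sql) method works).
async function ensurePgvector(pool) {
  // pgvector registers itself under the extension name "vector".
  await pool.query("CREATE EXTENSION IF NOT EXISTS vector");
}
```

Running it more than once is harmless because of `IF NOT EXISTS`.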
Prep
Upload a file
Extract text
Break it into smaller chunks
Generate embeddings for each chunk using the really cheap OpenAI API
Store embeddings, chunks, and the file_id in a table
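The prep steps can be sketched roughly as follows. This is not my exact code: the chunk size, the overlap, the `text-embedding-3-small` model name, the column types, and the helper names `chunkText` and `indexFile` are all assumptions for illustration. The table and column names match the query in the retrieval section, and `openai` is assumed to be a client from the official `openai` Node package.

```javascript
// Assumed schema; vector(1536) matches the default embedding length.
const CREATE_CHUNKS_TABLE = `
  CREATE TABLE IF NOT EXISTS document_chunks (
    id          bigserial PRIMARY KEY,
    file_id     bigint NOT NULL REFERENCES files (id),
    chunk_index integer NOT NULL,
    content     text NOT NULL,
    embedding   vector(1536) NOT NULL
  )`;

// Split text into fixed-size chunks with some overlap, so a sentence cut at a
// chunk boundary still appears whole in one of the neighbouring chunks.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // the last chunk reached the end
  }
  return chunks;
}

// Embed every chunk of one file and store the rows.
async function indexFile(pool, openai, fileId, text) {
  const chunks = chunkText(text);
  if (chunks.length === 0) return 0;
  // A single request can embed an array of inputs; very large files may need batching.
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumed model
    input: chunks,
  });
  for (const item of response.data) {
    await pool.query(
      `INSERT INTO document_chunks (file_id, chunk_index, content, embedding)
       VALUES ($1, $2, $3, $4)`,
      // pgvector accepts the text form "[0.1,0.2,...]", which is what
      // JSON.stringify produces for an array of numbers.
      [fileId, item.index, chunks[item.index], JSON.stringify(item.embedding)]
    );
  }
  return chunks.length;
}
```

Chunking by characters is the simplest option; splitting on paragraphs or sentences is a common alternative.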
Retrieval / Search
Get OpenAI to call a function that returns the search terms extracted from the user's query. For example, “Tell me about Tesla Roadster” returns “Tesla Roadster”.
Generate embeddings for the search term.
Query the DB for embeddings that are closest to the search term.
const similarChunks = await pool.query(
`SELECT chunks.content, chunks.chunk_index, f.original_name, chunks.embedding <=> $1 AS distance
FROM document_chunks chunks
JOIN files f ON chunks.file_id = f.id
WHERE chunks.embedding <=> $1 < $2
ORDER BY distance ASC
LIMIT 10`,
[queryEmbedding, 0.3] // <=> is cosine distance; a maximum distance of 0.3 means a cosine similarity of at least 0.7
);
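The first two retrieval steps produce the `queryEmbedding` parameter passed to the query. A rough sketch, assuming the official `openai` Node client and its Chat Completions tool-calling interface; the tool name `search_documents`, its schema, the model names, and the helper names are made up for illustration:

```javascript
// Hypothetical tool definition: the model "calls" it with the search terms
// it extracted from the user's question.
const searchTool = {
  type: "function",
  function: {
    name: "search_documents",
    description: "Search the uploaded documents for passages relevant to the user's question.",
    parameters: {
      type: "object",
      properties: {
        search_terms: { type: "string", description: "Key terms from the user's question" },
      },
      required: ["search_terms"],
    },
  },
};

async function extractSearchTerms(openai, userQuery) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model
    messages: [{ role: "user", content: userQuery }],
    tools: [searchTool],
    // Force a call to the tool instead of a direct answer.
    tool_choice: { type: "function", function: { name: "search_documents" } },
  });
  const call = response.choices[0].message.tool_calls[0];
  // The arguments arrive as a JSON string.
  return JSON.parse(call.function.arguments).search_terms;
}

async function embedQuery(openai, searchTerms) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small", // should be the same model used for the chunks
    input: searchTerms,
  });
  // Serialize to pgvector's text form "[0.1,0.2,...]" for use as $1.
  return JSON.stringify(response.data[0].embedding);
}
```

With these, `queryEmbedding` would be `await embedQuery(openai, await extractSearchTerms(openai, userQuery))`.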
Generation
The results are sent back to OpenAI along with the user’s query.
The prompt tells OpenAI to use the chunks as context to answer the user’s query.
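A sketch of how the retrieved rows might be packed into the prompt. The prompt wording, the model name, and the helper names `buildMessages` and `generateAnswer` are assumptions; the row fields (`content`, `chunk_index`, `original_name`) are the columns selected by the query above, so `chunks` would be `similarChunks.rows`.

```javascript
// Turn the retrieved rows into a chat prompt: context first, then the question.
function buildMessages(userQuery, chunks) {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.original_name}, chunk ${c.chunk_index})\n${c.content}`)
    .join("\n\n");
  return [
    {
      role: "system",
      content:
        "Answer the user's question using only the context below. " +
        "If the context does not contain the answer, say so.\n\nContext:\n" + context,
    },
    { role: "user", content: userQuery },
  ];
}

async function generateAnswer(openai, userQuery, chunks) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumed model
    messages: buildMessages(userQuery, chunks),
  });
  return response.choices[0].message.content;
}
```

Labelling each chunk with its file name makes it possible to ask the model to cite where an answer came from.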
Tunables
Limit - the number of results you want.
Vector length - by default OpenAI returns vector arrays of length 1536. For very large datasets, smaller vectors might perform better for memory and compute reasons.
Chunk length - the smaller the chunks, the more granular the resulting embeddings. It is a memory, compute, and tokens tradeoff.
Cosine similarity - the threshold defines a cone around the query vector rather than a single angle, I think. The query above filters on cosine distance (1 minus cosine similarity), so a larger threshold means results from a wider cone are returned.
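To make the cone idea concrete, here is a small sketch that computes cosine distance by hand and converts a distance threshold into the half-angle of the cone it allows. The function names are mine, and this plain JavaScript version only mirrors what the `<=>` operator computes inside Postgres.

```javascript
// Cosine distance = 1 - cosine similarity.
function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Largest angle (in degrees) between query and result that still passes
// a "distance < threshold" filter.
function maxAngleDegrees(distanceThreshold) {
  return (Math.acos(1 - distanceThreshold) * 180) / Math.PI;
}

console.log(maxAngleDegrees(0.3).toFixed(1)); // prints "45.6"
console.log(cosineDistance([1, 0], [1, 1]) < 0.3); // prints true: 45 degrees is inside the cone
console.log(cosineDistance([1, 0], [0, 1]) < 0.3); // prints false: 90 degrees is outside
```

So a threshold of 0.3 accepts anything within roughly 45.6 degrees of the query vector, and raising the threshold widens the cone.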
Post Script
All databases use indexing to speed up lookup and retrieval of information. However, for really tiny amounts of data, indexing didn't work very well for me: I got irrelevant results, or even no results.
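One plausible explanation, assuming the index was one of pgvector's approximate types (IVFFlat or HNSW): these trade some recall for speed, and an IVFFlat index built on very few rows tends to have poor recall. Without any index, Postgres scans every row and the results are exact, which is fine for small tables. A sketch of deferring the index until the table has grown; the helper name, the index name, and the 10,000 row cutoff are arbitrary assumptions:

```javascript
// Hypothetical helper: build an approximate index only once the table is
// large enough for it to pay off. Returns true if the index statement ran.
async function maybeCreateIndex(pool, minRows = 10000) {
  const { rows } = await pool.query("SELECT count(*)::int AS n FROM document_chunks");
  // Small table: a sequential scan is exact and fast enough, so skip the index.
  if (rows[0].n < minRows) return false;
  await pool.query(
    `CREATE INDEX IF NOT EXISTS document_chunks_embedding_idx
     ON document_chunks USING hnsw (embedding vector_cosine_ops)`
  );
  return true;
}
```

The `vector_cosine_ops` operator class matches the `<=>` operator used in the search query; an index built for a different distance would not be used by it.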
pgvector offers multiple distance measures for finding the nearest neighbours, including L2 distance (also known as Euclidean distance) and cosine distance.
Cosine similarity is better suited for semantic search. Semantic search means searching for similar meaning (direction). Cosine similarity finds neighbours that are at similar angles, instead of neighbours that are closest but could be at very different angles.
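A tiny two-dimensional example of the difference, with made-up vectors. One candidate points in exactly the same direction as the query but is much longer; the other is nearby but points elsewhere.

```javascript
function l2Distance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

function cosineDistance(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const query = [1, 1];
const sameDirectionFar = [10, 10]; // same angle as the query, far away
const otherDirectionNear = [1, 0]; // close by, but 45 degrees off

// L2 prefers the nearby vector...
console.log(l2Distance(query, otherDirectionNear) < l2Distance(query, sameDirectionFar)); // prints true
// ...while cosine distance prefers the one pointing the same way.
console.log(cosineDistance(query, sameDirectionFar) < cosineDistance(query, otherDirectionNear)); // prints true
```

When all vectors are normalized to unit length, the two measures rank neighbours identically, so the choice matters mostly for unnormalized vectors.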
PPS
This is my attempt to document my recent learnings. Feel free to comment or point out mistakes. I will learn something new.
Satyajeet Jadhav