Satyajeet Jadhav

Retrieval Augmented Generation (RAG) on Postgres

So I recently learnt that Postgres works quite well for RAG applications. Here is the flow I followed.

Pre-prep

Install the pgvector extension on the Postgres DB.
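For reference, here's roughly what that one-time setup looks like. The table layout is my guess at a minimal schema that matches the search query further down, and vector(1536) matches the default length of OpenAI embeddings:

-- run once against the database, e.g. via psql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE files (
  id SERIAL PRIMARY KEY,
  original_name TEXT NOT NULL
);

CREATE TABLE document_chunks (
  id SERIAL PRIMARY KEY,
  file_id INTEGER REFERENCES files(id),
  chunk_index INTEGER NOT NULL,
  content TEXT NOT NULL,
  embedding vector(1536) -- one embedding per chunk
);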

Prep

  1. Upload a file

  2. Extract text

  3. Break it into smaller chunks

  4. Generate embeddings for each chunk using the really cheap OpenAI API

  5. Store the embeddings, chunks, and the file_id in a table (sketched below)
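Here's a rough sketch of steps 3 to 5. The fixed-size chunking, the model name, and the helper names are my assumptions, not necessarily what you'd want in production:

import OpenAI from "openai";
import pg from "pg";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Naive fixed-size chunking. Real splitters usually respect
// sentence and paragraph boundaries and overlap the chunks a bit.
function chunkText(text, size = 1000) {
  const chunks = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

async function storeChunks(fileId, text) {
  const chunks = chunkText(text);
  // A single API call can embed a whole batch of chunks.
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  for (let i = 0; i < chunks.length; i++) {
    await pool.query(
      `INSERT INTO document_chunks (file_id, chunk_index, content, embedding)
       VALUES ($1, $2, $3, $4)`,
      // pgvector accepts the '[0.1,0.2,...]' text format, which is
      // exactly what JSON.stringify produces for a flat array of numbers.
      [fileId, i, chunks[i], JSON.stringify(res.data[i].embedding)]
    );
  }
}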

Retrieval / Search

  1. Get OpenAI to call a function that returns the search terms extracted from the user's query (sketched below). For example, “Tell me about Tesla Roadster” returns “Tesla Roadster”

  2. Generate embeddings for the search term

  3. Query the DB for embeddings that are closest to the search term's embedding.
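Steps 1 and 2, sketched with OpenAI's tool-calling API. The tool name and its schema are my own illustrative choices:

import OpenAI from "openai";
const openai = new OpenAI(); // same client setup as in the prep sketch

const userQuery = "Tell me about Tesla Roadster";

async function extractSearchTerm(query) {
  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: query }],
    tools: [{
      type: "function",
      function: {
        name: "extract_search_terms", // hypothetical tool name
        description: "Extract the search terms from the user's query.",
        parameters: {
          type: "object",
          properties: { search_term: { type: "string" } },
          required: ["search_term"],
        },
      },
    }],
    // Force the model to call the tool instead of answering directly.
    tool_choice: { type: "function", function: { name: "extract_search_terms" } },
  });
  const call = completion.choices[0].message.tool_calls[0];
  return JSON.parse(call.function.arguments).search_term;
}

const searchTerm = await extractSearchTerm(userQuery); // "Tesla Roadster"

const embeddingRes = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: searchTerm,
});
const queryEmbedding = embeddingRes.data[0].embedding;

Step 3 is the query itself: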

const similarChunks = await pool.query(
  `SELECT dc.content, dc.chunk_index, f.original_name, dc.embedding <=> $1 AS distance
   FROM document_chunks dc
   JOIN files f ON dc.file_id = f.id
   WHERE dc.embedding <=> $1 < $2
   ORDER BY distance ASC
   LIMIT 10`,
  // pgvector expects the vector in its '[...]' text format.
  [JSON.stringify(queryEmbedding), 0.3]  // 0.3 (distance) represents a cosine similarity of 0.7
);

Generation

  1. The results are sent back to OpenAI along with the user’s query. 

  2. The prompt tells OpenAI to use the chunks as context to answer the user's query (sketched below).
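A minimal sketch of the generation step, reusing the openai client and the similarChunks result from above. The prompt wording is my own:

// Stitch the retrieved chunks into a single context string.
const context = similarChunks.rows
  .map((row) => `[${row.original_name}, chunk ${row.chunk_index}] ${row.content}`)
  .join("\n\n");

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content:
        "Answer the user's question using only the context below. " +
        "If the answer is not in the context, say so.\n\n" + context,
    },
    { role: "user", content: userQuery },
  ],
});

console.log(completion.choices[0].message.content);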

Tunables

  1. Limit - the number of results you want

  2. vector length - by default OpenAI returns vectors of length 1536. For very large datasets, shorter vectors might perform better for memory and compute reasons (see the sketch after this list).

  3. chunk length - the smaller the chunks, the more granular the resulting embeddings. It's a memory, compute, and tokens tradeoff.

  4. cosine similarity threshold - the query above only returns chunks whose cosine distance to the query embedding is below a threshold. Geometrically, this keeps results inside a cone around the query vector: a larger threshold means results from a wider cone are returned.
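For the vector length tunable, the text-embedding-3 models accept a dimensions parameter, so you can ask OpenAI for shorter vectors up front (the column would then be declared vector(512) instead of vector(1536)):

// Request 512-dimensional embeddings instead of the default 1536.
const res = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Tell me about Tesla Roadster",
  dimensions: 512,
});
console.log(res.data[0].embedding.length); // 512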

Post Script

  • All databases use indexing to speed up lookup and retrieval of information. However, for really tiny amounts of data, indexing actually doesn't work very well: I got irrelevant results, or even no results. (Approximate indexes like pgvector's IVFFlat build their clusters from your data, so they need a reasonable number of rows to work well.)

  • pgvector offers multiple distance metrics for finding the nearest neighbors, including L2 distance (also known as Euclidean distance, the <-> operator) and cosine distance (the <=> operator used in the query above).

  • Cosine similarity is better suited for semantic search. Semantic search means searching for similar meaning (direction). Cosine similarity finds neighbours that point in similar directions, instead of neighbours that are closest in absolute distance but could point in very different directions.
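Here's a tiny, made-up example of that difference:

// Two vectors pointing the same way but with different magnitudes,
// and a third vector that is nearby but points elsewhere.
const a = [1, 0];
const b = [10, 0]; // same direction as a, but far away in L2 terms
const c = [0, 1];  // close to a in L2 terms, but a different direction

const dot = (x, y) => x.reduce((sum, xi, i) => sum + xi * y[i], 0);
const norm = (x) => Math.sqrt(dot(x, x));
const cosineSimilarity = (x, y) => dot(x, y) / (norm(x) * norm(y));
const l2Distance = (x, y) => Math.hypot(...x.map((xi, i) => xi - y[i]));

console.log(cosineSimilarity(a, b)); // 1     -> same meaning
console.log(cosineSimilarity(a, c)); // 0     -> unrelated meaning
console.log(l2Distance(a, b));       // 9     -> L2 calls b a distant neighbour
console.log(l2Distance(a, c));       // ~1.41 -> L2 calls c a close neighbour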

Older Relevant Posts

Semantic Search, aka Magic

Embeddings eli5 version

PPS

This is my attempt to document my recent learnings. Feel free to comment or point out mistakes. I will learn something new.
