Semantic Search, aka Magic
In this post I write about the Related Notes feature on thinkdeli: my journey of implementing a deceptively simple feature using the latest advances in AI and browser tech.
The feature
Any decent writing app lets you organize and link notes. But every app expects you, the writer, to do all the work. What if your writing app could link your thoughts automatically, in a fast and private manner? You could find new connections and patterns in your thinking and discover new insights. That question is how the Related Notes feature was born.
The Related Notes feature searches all your notes to find the ones closest in meaning to your current note. Searching for text that is similar in meaning to your query is called semantic search. In other words, we are building a semantic search engine.
Semantic means connected with the meaning of words and sentences. Traditional text search, by contrast, is lexical: it relies on finding patterns in the text. In lexical search, the computer doesn’t understand the meaning of the words; it relies on the user repeating the same patterns and keywords. Like other traditional things, it lacks serendipity. There is no magic.
Breaking it down
Finding related notes is a two-part problem. The first part is understanding the meaning of your writing. The second part is searching through your notes by that meaning. But what does it take for a computer to understand the meaning of your writing? And what does it mean to search by meaning?
🤔 Making the computer understand
Computers don’t understand text like humans do. Computers understand numbers. For a computer to understand text, you need to convert the text to numbers. This numerical representation of text is called an embedding in the AI world. Embeddings transform words and sentences into points on a graph. Each note of yours becomes a point on that graph.
I have tried to explain embeddings in simple terms here - Embeddings eli5 version.
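To make this concrete, here is what an embedding looks like in code. The numbers below are made up for illustration; a real model produces hundreds of them per note.

```ts
// An embedding is just an array of numbers: a coordinate in a
// high-dimensional space. These values are invented for illustration.
const noteA = "How do I link my notes automatically?";
const noteB = "Automatically connecting related thoughts";

const embeddingA = [0.12, -0.43, 0.88, 0.05]; // imagine ~384 numbers
const embeddingB = [0.1, -0.4, 0.91, 0.02]; // close to A: similar meaning
```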
🔍️ Search
With embeddings, the computer now understands your notes as points on a graph. Finding similar words or sentences then becomes equivalent to finding the distance between two points on the graph. In other words, finding the nearest neighbors on the graph.
The distance between two points on the graph can be found using a measure like cosine similarity. But calculating the distance from your current note to every single point (every note) on the graph is not efficient.
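Cosine similarity is simple enough to write by hand. Here is a minimal TypeScript sketch, along with the brute-force search it implies: comparing the query against every note, which is exactly the inefficiency described above.

```ts
// Cosine similarity: the cosine of the angle between two vectors.
// Close to 1 means similar meaning; close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force nearest neighbor: one comparison per note, per query.
function nearestNote(query: number[], notes: number[][]): number {
  let best = -1;
  let bestScore = -Infinity;
  notes.forEach((note, i) => {
    const score = cosineSimilarity(query, note);
    if (score > bestScore) {
      bestScore = score;
      best = i;
    }
  });
  return best; // index of the closest note
}
```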
Most language models use a very high number of dimensions. This simply means that the embeddings graph has a lot of axes, not just two or three. Finding the nearest neighbors on an n-dimensional graph is a non-trivial problem.
Thankfully, advanced algorithms like Hierarchical Navigable Small World (HNSW) and the k-d tree are built exactly for this.
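As a sketch of what using such an index looks like, here is hnswlib-node (one of the libraries mentioned below) building and querying an HNSW index. The dimension and the data are placeholders.

```ts
import { HierarchicalNSW } from "hnswlib-node";

const dim = 384; // e.g. the output size of a small sentence model
const randomVector = () => Array.from({ length: dim }, () => Math.random());
const noteEmbeddings = Array.from({ length: 100 }, randomVector); // stand-ins

// Build the index once; queries are then far cheaper than a full scan.
const index = new HierarchicalNSW("cosine", dim);
index.initIndex(noteEmbeddings.length);
noteEmbeddings.forEach((vec, i) => index.addPoint(vec, i));

// Find the 5 nearest neighbors of a query embedding.
const { neighbors, distances } = index.searchKnn(randomVector(), 5);
console.log(neighbors, distances);
```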
🪄 The result, aka Magic
When all of this comes together, it creates magic. When I open a note about embeddings, the Related Notes section surfaces notes like “Cosine Similarity”, “Automatic Mind Map”, and “Personal AI Assistant”. It is amazing how close these notes are in meaning and context. Imagine discovering new insights and connections where you thought none existed, just by writing every day!
💻️ The solution
At this point, you must be wondering: how do you create embeddings? And where do you run the search algorithms?
Speed and privacy were the top two requirements for the related notes feature. That is why I wanted a solution that could work locally in the user’s browser.
The amazing transformers.js library lets you create embeddings locally. And libraries like Voy and hnswlib-node implement the advanced search algorithms. Of these, Voy can run in your browser.
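Putting the two together, here is a minimal sketch of the whole pipeline in the browser: embed notes with transformers.js, index them with Voy, and search. It assumes the Xenova/all-MiniLM-L6-v2 model, and the exact resource shape Voy expects may differ between versions.

```ts
import { pipeline } from "@xenova/transformers";
import { Voy } from "voy-search";

// Load a small sentence-embedding model (downloaded once, then cached).
const embed = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

const embedText = async (text: string): Promise<Float32Array> => {
  // Mean-pool the token vectors and normalize to unit length.
  const output = await embed(text, { pooling: "mean", normalize: true });
  return output.data as Float32Array;
};

const notes = [
  "Cosine similarity measures the angle between two vectors",
  "HNSW makes nearest neighbor search fast",
  "My grocery list for the week",
];

// Build the Voy index from the note embeddings.
const vectors = await Promise.all(notes.map(embedText));
const index = new Voy({
  embeddings: notes.map((title, i) => ({
    id: String(i),
    title,
    url: `/note/${i}`,
    embeddings: Array.from(vectors[i]),
  })),
});

// Related notes = nearest neighbors of the current note's embedding.
const query = await embedText("how do I compare two embeddings?");
const { neighbors } = index.search(query, 2);
neighbors.forEach((n) => console.log(n.title));
```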
Voy is written in Rust and compiled to WASM (WebAssembly). WASM is a fascinating development of the last few years: it lets you run near-native code in your browser. This means code written in languages like C, Rust, and Go can be compiled and used inside the browser. With WASM, the efficiency of low-level compiled code comes to your browser applications. Without it, there is no way to do an effective and efficient n-dimensional nearest neighbor search in your browser.
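Under the hood, the browser's WebAssembly API is what loads such a compiled module. A quick sketch (the .wasm path here is hypothetical; libraries like Voy wrap this step for you):

```ts
// Fetch and compile a WASM binary in one streaming step.
// "/search.wasm" is a hypothetical path to a compiled Rust module.
const { instance } = await WebAssembly.instantiateStreaming(
  fetch("/search.wasm")
);
// The module's exported functions then run at near-native speed.
console.log(Object.keys(instance.exports));
```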
Running language models in the browser is just one way of approaching this problem. There are also inexpensive APIs that generate text embeddings for you, and equally capable vector databases that let you store and search those embeddings. I have tried to list the various scenarios here - AI - Local or Cloud?
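For comparison, here is a sketch of the cloud route using OpenAI's embeddings endpoint. Model names and pricing change over time, so treat this as illustrative:

```ts
// Cloud alternative: ask a hosted API for the embedding.
// Assumes an OPENAI_API_KEY environment variable.
const res = await fetch("https://api.openai.com/v1/embeddings", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "text-embedding-3-small",
    input: "The text of your note",
  }),
});
const { data } = await res.json();
console.log(data[0].embedding.length); // a vector of e.g. 1536 numbers
```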
Beyond related notes
Semantic search opens up a lot of possibilities. Just like Related Notes, we also built the Related Posts feature.
Semantic search is also the first step of RAG (Retrieval-Augmented Generation). RAG is how you build a truly personal AI assistant.
Sign up for thinkdeli and try out the Related Notes feature.
Thanks to Srijan Shukla for helping discover the basic tools for this job - transformers.js and Voy. Thanks also to Sanika Joshi, Saurabh Hirani, and Manan Dedhia for reviewing drafts of this post and for their valuable feedback.