Embeddings eli5 version

Computers don’t understand text like humans do. Computers understand numbers. To get computers to understand text, you need to convert the text to numbers.

But simply transforming text to numbers won’t make a lot of sense. Words make sense when they are next to each other. So the numerical representation should capture this relation.

One way to achieve this is by plotting these numbers on a graph.

As a simplistic example, consider a line. Higher numbers on the line represent more sweetness. The word apple can be represented by the number 1. The word juice can be represented by the number 2. The distance between apple and juice is 2 - 1 = 1. Since that distance is small, we could say that the apple and the juice are similarly sweet.
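The line example can be sketched in a few lines of Python. The scores are the made-up sweetness numbers from above, not values from any real model, and the name `distance_1d` is just for illustration.

```python
# Hypothetical 1-D "sweetness" scores taken from the example above.
sweetness = {"apple": 1.0, "juice": 2.0}

def distance_1d(a: str, b: str) -> float:
    """Absolute difference between two words on the sweetness line."""
    return abs(sweetness[a] - sweetness[b])

print(distance_1d("apple", "juice"))  # 1.0
```

A smaller result means the two words sit closer together on the line.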

But words have more complex attributes than just sweetness. It could be shape, size, color, or any other arbitrary attribute. Each of these could become a line, or a dimension, by itself. What if we could use two dimensions instead of one? For example, the x dimension is for sweetness and the y dimension is for sourness. We could then plot apple at (1,1) and juice at (2,2). Orange could be (1,2). Orange Juice could be (2,1). We could arbitrarily keep adding more such dimensions. A language model could come up with its own characteristics for a dimension. Though one can’t really visualize hundreds of dimensions, a computer can work with them just fine. Thus, words and sentences become points on a graph with n dimensions.
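Here is a minimal sketch of the two-dimensional version. It uses the made-up (sweetness, sourness) coordinates from above, and it assumes straight-line (Euclidean) distance as the way to measure closeness, which is one common choice among several. The same function works unchanged if you add more dimensions to each point.

```python
import math

# Hypothetical (sweetness, sourness) coordinates from the example above.
points = {
    "apple": (1.0, 1.0),
    "juice": (2.0, 2.0),
    "orange": (1.0, 2.0),
    "orange juice": (2.0, 1.0),
}

def distance(a: str, b: str) -> float:
    """Straight-line distance between two words, in any number of dimensions."""
    return math.dist(points[a], points[b])

print(distance("apple", "orange"))  # 1.0
print(distance("apple", "juice"))   # about 1.414
```

In this toy graph, apple ends up closer to orange than to juice, because apple and orange differ on only one of the two dimensions.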

There is an excellent explainer by Dharmesh Shah on what embeddings are. I don’t think anyone can explain it better. Please go and read that if you want to understand embeddings better - https://simple.ai/p/guide-vector-embeddings

A lot of models are available today that let you convert words to embeddings. There are a few categories of models here.

Companies like OpenAI, Cohere, etc. have powerful and large models that run on their servers. These companies expose their embeddings API for a small fee. You can send your text to the API and get back embeddings as the response. Recent advancements in AI technology have made these models really cheap.

The other category of models is Llama, Mistral, etc. These too are large and powerful. But these are open. This means anyone can download these models and run them on their own servers. Since these are large models, they can’t typically run from within every user’s browser.

The third category is models like Xenova and Nomic. These are small models. They are open. They can run in your users’ browsers! But the obvious tradeoff is less accuracy.
