Semantic Emoji Search: Vector Search in the Browser
Have you ever gotten frustrated searching for the right emoji? You know you've seen the emoji but you don't know the exact name.
The Problem with Text Search
Most search, even on iOS, uses simple text matching:
Typing “cook” gets you the “🍪 Cookie” emoji because the letters of one are contained in the other. Even better if they match exactly.
It can also do clever things, like matching “cooking” or “cooked”, and even use synonyms, but it’s limited to words the developer can predict.
This is why when you search “shout” on the iOS emoji keyboard, not only do you miss emojis like 🗣️, 📢, or 📣; you get nothing at all.
You can try it now in the comments of this post.
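Under the hood, text matching like this is roughly a substring check. Here’s a minimal sketch in TypeScript (the emoji list, keywords, and function names are made up for illustration; Apple’s actual implementation isn’t public):

```typescript
// A toy version of text-only emoji search. Each emoji has a name plus
// some developer-chosen keywords.
type Emoji = { char: string; name: string; keywords: string[] };

const emojis: Emoji[] = [
  { char: "🍪", name: "cookie", keywords: ["dessert", "sweet"] },
  { char: "📣", name: "megaphone", keywords: ["announce", "loud"] },
  { char: "🗣️", name: "speaking head", keywords: ["talk", "speak"] },
];

// Substring matching: a query only hits if it literally appears
// inside a name or keyword.
function textSearch(query: string, list: Emoji[]): string[] {
  const q = query.toLowerCase();
  return list
    .filter(e => e.name.includes(q) || e.keywords.some(k => k.includes(q)))
    .map(e => e.char);
}

console.log(textSearch("cook", emojis));  // "cook" is inside "cookie" → ["🍪"]
console.log(textSearch("shout", emojis)); // no name/keyword contains "shout" → []
```

“Shout” fails not because the matching is buggy, but because nobody typed “shout” into the keyword list.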
But what if you could make search smarter? What if search could understand emojis the way we do?
Enter Vector Search and Embedding Models
I’ve been digging into the boring parts of AI that don’t change as much: Semantic Search, Retrieval-Augmented Generation (RAG), and vector databases such as pgvector. Along the way I’ve been learning how they store not just letters but meaning, encoded as data that can be stored and searched.
These are built on pretty humble, narrow-task tools, such as EmbeddingGemma by Google, which converts words and sentences into “meaning maps” called vector embeddings.
To simplify how it works: these models score a list of concepts such as “Royalty” from 0.0 (🧑 not royal) to 1.0 (🤴 the most royal you can get).
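To make that concrete, here’s a toy version with just three hand-labeled concepts. Real embeddings have hundreds of unlabeled dimensions (GTE-Small uses 384), and the numbers below are invented, but the similarity math is the same idea:

```typescript
// Toy “meaning maps”: each position scores a human-readable concept.
// Real embedding dimensions are learned, not labeled like this.
//                                [royalty, food, loudness]
const vectors: Record<string, number[]> = {
  "🤴 prince":     [1.0, 0.0, 0.1],
  "🍪 cookie":     [0.0, 1.0, 0.0],
  "📣 megaphone":  [0.0, 0.0, 1.0],
  "shout (query)": [0.05, 0.0, 0.9],
};

// Cosine similarity: 1.0 = same direction of meaning, 0 = unrelated.
function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (mag(a) * mag(b));
}

const query = vectors["shout (query)"];
console.log(cosine(query, vectors["📣 megaphone"])); // ≈ 1: very related
console.log(cosine(query, vectors["🍪 cookie"]));    // 0: unrelated
```

Search then becomes “find the emojis whose maps point in the same direction as the query’s map,” with no exact letters required.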
Getting It to Work in the Browser
There are lots of parts to this, and it usually requires a server to ingest and search against the list of documents (in this case, emojis). However, I wanted to do something slightly unhinged:
Get the whole thing working inside the web browser. No cloud, just the device in your hands.
This can be pretty tricky; however, a good jumping-off point helps a ton.
I started with a template from Supabase (a Postgres database platform) and an embedding model called GTE-Small. This solved everything but two big things:
- Getting Postgres in the browser
- Getting the embedding model in the browser
Postgres in the browser wasn’t too bad since we have PGlite by ElectricSQL, which lets us take Postgres and use it in all kinds of funny places, including the browser.
This solves the issue of where our Vectors or “meaning maps” are stored and searched.
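Concretely, the storage and search side boils down to SQL. Here’s a sketch of the kind of schema and nearest-neighbor query we could run through PGlite with the pgvector extension loaded (table and column names are my own, not from the project):

```typescript
// SQL we'd hand to PGlite (with its pgvector extension enabled).
// Illustrative schema, not the actual fetchmoji source.
const schemaSql = `
  CREATE TABLE emojis (
    char TEXT PRIMARY KEY,
    embedding vector(384)  -- GTE-Small outputs 384 dimensions
  );
`;

// pgvector's <=> operator is cosine distance (smaller = more similar),
// so ordering by it returns the closest "meaning maps" first.
const searchSql = `
  SELECT char FROM emojis
  ORDER BY embedding <=> $1
  LIMIT 5;
`;

// pgvector expects vectors as a '[0.1,0.2,...]' text literal.
function toPgVector(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Usage sketch: await db.query(searchSql, [toPgVector(queryEmbedding)]);
console.log(toPgVector([0.1, 0.2])); // "[0.1,0.2]"
```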
Next the AI Model part.
Typical AI models that we think of, like Llama, GPT, and Claude, take up dozens of GBs or more.
However, those have a ton of extra features that we don’t need. That’s why we have more purpose-built models like GTE-Small, which is less than 100 MB.
That solves downloading a big file. Next we have a compatibility challenge.
When AI models like Llama are stored as files, they usually end up in a format like GGUF, but those files aren’t exactly built for use in web browsers.
ONNX in the Browser
This is where the ONNX format comes in.
Microsoft has been doing a ton of work on running AI on mobile devices, including its Phi family of LLMs.
Part of that work is a format to make models more portable called ONNX.
Unlike GGUF, ONNX is a widely used model format that runs well in the browser via ONNX Runtime and transformers.js.
Try It
With that we’ve got all the pieces we need for our Emoji Search.
Did it all work together? See for yourself: fetchmoji.com
It encodes all of the roughly 4,000 emojis as vector embedding “maps,” then does the same with your search query, such as “shout,” and matches it to the emojis it thinks are related.
This means our emoji search can match emojis to words it’s never seen before. No exact match required.
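The search step itself amounts to: embed the query, then rank every emoji by similarity. Here’s an in-memory sketch with made-up 3-dimensional vectors (the real app uses 384-dimensional GTE-Small embeddings and lets Postgres do the ordering):

```typescript
// In-memory sketch of the search step. The embeddings here are tiny
// invented vectors; fetchmoji gets real ones from the embedding model.
const emojiEmbeddings: Record<string, number[]> = {
  "📣": [0.1, 0.9, 0.0],
  "🗣️": [0.2, 0.8, 0.1],
  "🍪": [0.9, 0.0, 0.1],
};

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((sum, x, i) => sum + x * b[i], 0);
  const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (mag(a) * mag(b));
}

// Pretend the model produced this for the query "shout".
const queryEmbedding = [0.15, 0.85, 0.05];

// Score every emoji against the query and sort best-first.
const ranked = Object.entries(emojiEmbeddings)
  .map(([char, emb]) => ({ char, score: cosine(queryEmbedding, emb) }))
  .sort((a, b) => b.score - a.score);

console.log(ranked.map(r => r.char)); // 📣 and 🗣️ rank above 🍪
```

Because the ranking comes from meaning rather than letters, “shout” lands on 📣 even though that word appears nowhere in the data.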
It’s not perfect; in fact, at the time of writing, it crashes a lot on mobile. But it’s a lot of progress for the constraints we’re working under.
You can also try making it work yourself.
The code is completely open source and you can see all the details of how it works here: https://github.com/ThatGuySam/emoji-search