What is a Vector Database?
Let’s imagine you’re a foodie who loves to try out new food. You’ve just come back from a trip to Thailand and you’re craving that delicious salad you had in Phuket. You can’t remember the name of the dish, but you remember it had shrimp and chilies.
You try a traditional keyword search, but all you get are results for Pad Thai and Tom Yum. You’re scrolling through pages of irrelevant recipes, but your salad is nowhere to be found. You’re starting to lose hope and consider settling for a bowl of Phở.
But wait! You remember vector database search. What does searching in a vector database mean? It’s a type of search that understands the ‘context’ of your query. Instead of just matching the exact words you typed, like “spicy”, “salad”, and “shrimp”, it converts your query into a numeric vector (an embedding) that captures its meaning and looks for the items whose vectors lie closest to it. It’s like having a super smart assistant who not only listens to your words but also understands your intent.
So, you input your ingredients into this vector-based search. The AI understands you’re not just looking for any dish with these ingredients thrown in, but a specific combination that matches your craving. And voila! There it is, your long-lost salad dish: “Som Tum Goong Yang”.
So, in the battle of man versus Thai food, thanks to vector database search, man wins! And the stomach rejoices!
In a vector database, Thai dishes like pad Thai and tom yum sit closer to each other than pad Thai does to a burger or a pizza. Source: Joon Solutions
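To make “closer” concrete, here is a minimal sketch that compares dishes with cosine similarity. The three-dimensional vectors and their axis meanings are made up for illustration; a real embedding model produces hundreds of dimensions with no hand-assigned meaning.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Hypothetical toy embeddings: [spiciness, noodles-and-herbs, bread-and-cheese]
dishes = {
    "pad thai": [0.6, 0.9, 0.1],
    "tom yum": [0.9, 0.7, 0.0],
    "burger": [0.1, 0.1, 0.9],
    "pizza": [0.2, 0.1, 0.9],
}

# Compare pad Thai with every other dish
for name in ("tom yum", "burger", "pizza"):
    score = cosine_similarity(dishes["pad thai"], dishes[name])
    print(f"pad thai vs {name}: {score:.2f}")
```

With these toy numbers, pad Thai scores much higher against tom yum than against the burger or the pizza, which is the relationship the figure describes.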
How to apply vector database search
In early 2023, one of our clients launched a platform designed to bridge the gap between people seeking mental health support and the providers who offer it. However, being a new platform, it had a limited database of mental health providers, which could potentially discourage users.
But what if we could build a crawler bot to extract information from mental health providers’ websites, vectorize this data, and store it in a database? This would allow users to express their needs via a form or problem statement, and we could then perform a semantic search on the vector database using that vectorized problem statement.
A workflow for semantic search using a vector database
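The same workflow can be sketched in memory before bringing in a real database: vectorize each crawled text, store the vectors, vectorize the user’s problem statement, and rank by similarity. Everything here is hypothetical, including the provider names and texts, and the `embed` function is a deliberately crude word-count stand-in for a learned embedding model, so it only matches shared words rather than meaning.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a sparse vector of word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class TinyVectorStore:
    """Keeps (id, vector) pairs and returns the ids closest to a query."""

    def __init__(self):
        self.items = []

    def add(self, doc_id, text):
        self.items.append((doc_id, embed(text)))

    def search(self, query, k=5):
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

# Hypothetical crawled provider descriptions
store = TinyVectorStore()
store.add("provider-a", "therapy for anxiety and panic attacks")
store.add("provider-b", "couples counselling and family therapy")
store.add("provider-c", "support for sleep problems and insomnia")

# The user's problem statement is vectorized the same way and matched
top = store.search("I keep having panic attacks", k=1)
print(top)
```

A vector database plays the role of `TinyVectorStore` at scale, and an embedding model replaces `embed` so that related wording (for example “anxious” and “panic”) also lands close together.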
Demo
Let’s explore a workflow for semantic search using a vector database through a demo. In this demo, we’ll use Weaviate, an open-source vector database, and a model hosted on Hugging Face, a platform for open-source NLP models, to create a recommendation model for the platform.
First, we’ll establish a database on the Weaviate Console:
Next, we’ll create an account to access the Hugging Face model.
Once these are set up, we’ll connect the two using their API keys in a local notebook:
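A minimal sketch of that connection step, assuming the v3-style Weaviate Python client (`pip install "weaviate-client<4"`); newer client versions use a different connection API. The function names and the environment variable names in the usage comment are hypothetical.

```python
def huggingface_headers(hf_api_key):
    """Header that lets Weaviate call the Hugging Face inference API on our behalf."""
    return {"X-HuggingFace-Api-Key": hf_api_key}

def connect(cluster_url, weaviate_api_key, hf_api_key):
    """Return a client for a Weaviate cluster (assumes the v3-style Python client)."""
    import weaviate  # imported here so the helper above works without the package

    return weaviate.Client(
        url=cluster_url,
        auth_client_secret=weaviate.AuthApiKey(api_key=weaviate_api_key),
        additional_headers=huggingface_headers(hf_api_key),
    )

# Usage in a notebook (assumed environment variable names):
#   import os
#   client = connect(
#       os.environ["WEAVIATE_URL"],
#       os.environ["WEAVIATE_API_KEY"],
#       os.environ["HUGGINGFACE_API_KEY"],
#   )
#   print(client.is_ready())
```

Reading the keys from environment variables rather than pasting them into the notebook keeps them out of version control.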
We’ll then create a class of paragraphs to store the information we’ve crawled and load the crawled JSON file into the database:
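A sketch of what this step might look like with the v3-style Weaviate Python client. The class name `Paragraph` follows the text above; the property names, the JSON layout (a list of objects with `provider`, `url`, and `content` keys), and the choice of `sentence-transformers/all-MiniLM-L6-v2` as the model are assumptions made for illustration.

```python
import json

# Schema for the crawled paragraphs; Weaviate vectorizes them via Hugging Face
PARAGRAPH_CLASS = {
    "class": "Paragraph",
    "vectorizer": "text2vec-huggingface",
    "moduleConfig": {
        "text2vec-huggingface": {
            "model": "sentence-transformers/all-MiniLM-L6-v2",  # assumed model
            "options": {"waitForModel": True},
        }
    },
    "properties": [
        {"name": "provider", "dataType": ["text"]},
        {"name": "url", "dataType": ["text"]},
        {"name": "content", "dataType": ["text"]},
    ],
}

def to_data_object(record):
    """Keep only the fields declared in the schema (assumed JSON keys)."""
    return {key: record[key] for key in ("provider", "url", "content")}

def load_paragraphs(client, json_path):
    """Create the class, then batch-import every record from the crawled JSON file."""
    client.schema.create_class(PARAGRAPH_CLASS)
    with open(json_path, encoding="utf-8") as handle:
        records = json.load(handle)
    client.batch.configure(batch_size=50)
    with client.batch as batch:
        for record in records:
            batch.add_data_object(data_object=to_data_object(record), class_name="Paragraph")
    return len(records)
```

Because the class declares a vectorizer, Weaviate computes the embedding for each object at import time, so the loader never handles vectors itself.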
Using a keyword as the query, we can generate recommendations:
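A sketch of such a query, again assuming the v3-style Weaviate Python client and the hypothetical `Paragraph` class with `provider`, `url`, and `content` properties. The helper names are illustrative.

```python
def extract_hits(response):
    """Pull the matched objects out of a Weaviate GraphQL response."""
    if "errors" in response:
        raise RuntimeError(response["errors"])
    return response["data"]["Get"]["Paragraph"]

def recommend(client, keyword, limit=5):
    """Semantic search: return the `limit` paragraphs closest in meaning to `keyword`."""
    response = (
        client.query
        .get("Paragraph", ["provider", "url", "content"])
        .with_near_text({"concepts": [keyword]})
        .with_limit(limit)
        .do()
    )
    return extract_hits(response)

# Usage, given a connected client:
#   for hit in recommend(client, "anxiety"):
#       print(hit["provider"], hit["url"])
```

The `with_near_text` filter vectorizes the keyword with the same model used at import time, so the comparison happens in one shared embedding space.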
This will return the top 5 most suitable mental health providers.
In just a few lines of code, we’ve created a recommendation system that, until recently, would have been a complex, labor-intensive task. Astonishing, isn’t it?
Despite the risks associated with the rapid growth of AI, I personally believe it broadens our horizons and offers immense potential for innovation.
- Understanding Vector Databases: A Comprehensive Guide - December 8, 2023