Full Text Search in applications


Just like how our preferred method of searching the internet evolved from web directories to search engines (and now to AI), modern applications strive to offer the most efficient ways to access stored data.

Why force users to remember where a specific piece of content is located, when we can simply offer search functionality that shows everything neatly in one place?
Let’s briefly go through a few common approaches to implementing search in databases.

SELECT LIKE searching

The first idea is often to create search queries based on the existing data model. It doesn't require anything complicated, just a few SQL queries over the relevant columns. However, as the stored data grows and the number of searched tables increases, the queries become too slow. A pattern with a leading wildcard, such as LIKE '%sea%', generally cannot use an ordinary B-tree index, so the database has to scan every row; as a rough example, using LIKE to search a text column in a table with millions of rows can take several minutes. What’s more, users start demanding more: fuzzy matching for typos, synonym support (searching for “sea” and getting results for “ocean”), and other expectations shaped by their experience with web search engines.

At first, we can try to provide these features by adding different indexes and improving the logic of our search code, but sooner or later we realize that we might need an entirely different data model for the search than for the application business logic itself.
The main disadvantages of this solution are low performance and limited flexibility.
On the other hand, it places very few demands on the data architecture: we usually search existing records, and there is no need to duplicate data in additional indexes or specialized tables.

For small datasets or when users don’t need advanced search features like typo tolerance, ranking, or synonym support, this approach may be good enough.
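The LIKE-based approach from this section can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the articles table, its columns, and the like_search helper are hypothetical names chosen for the example, not part of any particular application.

```python
import sqlite3

# In-memory database with a hypothetical "articles" table (illustrative names only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [
        ("Sailing basics", "How to cross the sea in a small boat."),
        ("Deep water", "Life in the ocean below 1000 metres."),
        ("Mountain trails", "Hiking routes for the summer season."),
    ],
)


def like_search(term: str) -> list[str]:
    """Return titles of rows whose title or body contains `term` as a substring."""
    # Escape LIKE wildcards so "%" and "_" typed by the user are matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    rows = conn.execute(
        "SELECT title FROM articles "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY id",
        (pattern, pattern),
    )
    return [title for (title,) in rows]


print(like_search("sea"))
```

Searching for "sea" here also returns the hiking article, because "season" contains "sea", and it misses the article about the ocean. That is the kind of limitation that pushes users toward word-aware matching and synonym support.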

Full-text search - database

If the application uses a database with built-in full-text search (e.g., MS SQL or PostgreSQL), we can (and should) take advantage of that. Well-configured indexes can greatly speed up search and, depending on the database, offer additional features like prefix matching, synonym support (thesaurus), typo tolerance, multilingual support, and more.

The behavior of full-text search is not unified across databases. Usually it is necessary to create a catalog and fill it with full-text indexes based on existing tables or views. A suitable synchronization strategy also needs to be chosen: either each change is propagated to the full-text indexes immediately, or changes are reflected a little later. The delayed option causes a slight lag in search results, but it can help application performance when data changes frequently.

A big advantage of this solution is that it is integrated into the existing database and therefore does not complicate deployment and maintenance. Another advantage is the automatic synchronization of indexes.

The disadvantage is that we are still partially bound by the application data model. While we can help ourselves by creating various specialized views, from a purist standpoint these views do not belong in the model and can unnecessarily complicate it.
Another potential disadvantage is higher disk space requirements, as the full-text index duplicates and transforms the searched data.
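To make the idea of a full-text index more concrete, here is a toy inverted index in plain Python. It is only a conceptual sketch and not how MS SQL or PostgreSQL implement their indexes; the InvertedIndex class and its methods are made up for this illustration. It shows both why lookups are fast (words map directly to row ids) and why the index takes extra space (every word is stored a second time).

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index: maps each lowercase word to the ids of rows containing it."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase and split into runs of ASCII letters and digits.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, row_id: int, text: str) -> None:
        for token in self.tokenize(text):
            self.postings[token].add(row_id)

    def search(self, query: str) -> list[int]:
        """Ids of rows that contain every word of the query (AND semantics)."""
        tokens = self.tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        return sorted(hits)

    def prefix_search(self, prefix: str) -> list[int]:
        """Ids of rows containing any word that starts with `prefix`."""
        # A real index would use a sorted structure instead of scanning all words.
        prefix = prefix.lower()
        ids: set[int] = set()
        for token, rows in self.postings.items():
            if token.startswith(prefix):
                ids |= rows
        return sorted(ids)


index = InvertedIndex()
index.add(1, "How to cross the sea in a small boat.")
index.add(2, "Life in the ocean below 1000 metres.")
index.add(3, "Hiking routes for the summer season.")

print(index.search("sea"))         # whole-word match
print(index.prefix_search("sea"))  # prefix match also finds "season"
```

Unlike a substring scan, the whole-word lookup does not confuse "sea" with "season", while prefix matching remains available as an explicit option.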

Full-text search - specialized search engines

If the database we use doesn’t support full-text search, or if we have demanding users or need lightning-fast search (e.g., for autocomplete), specialized engines like Typesense, Elasticsearch, or Meilisearch are a strong option.

They work similarly to the database full-text search described in the previous section, except that they offer many more configuration and search options. Unlike relational databases, these engines are schema-flexible and work like document databases: we store documents in collections and define rules over those collections to facilitate searches.

We have to keep each collection up to date as the searched data changes. On the one hand, this is an advantage, because we can transform the data as needed before placing it in the collection. On the other hand, it can be a challenge, because we need to control every place where the data changes if we want to search over live data.

When performing a search, we specify which collections to search, define search rules (sorting, ranking, etc.), and get results in the same format as the original indexed documents. This lets us include extra contextual data for display or further processing.

So far, it might seem that specialized search engines are no different from full-text database searches, apart from arguably faster search speeds. Do they offer any extra features?

Faceting

Faceting is a technique that lets users filter and narrow search results using categories or attributes. While searching, we refine the scope with interactive filters on criteria that help us target results more precisely. In a laptop e-shop, for example, facets can be the manufacturer's brand, screen size, color, or other criteria that make the customer's choice easier.
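The laptop e-shop example can be sketched in a few lines of Python. This only illustrates the concept; the sample data and the facet_counts and apply_filters helpers are hypothetical, and real search engines compute facet counts inside the index rather than by looping over documents.

```python
from collections import Counter

# Hypothetical catalogue of laptops (made-up brands and values).
laptops = [
    {"name": "A13", "brand": "Acme", "screen": 13, "color": "silver"},
    {"name": "A15", "brand": "Acme", "screen": 15, "color": "black"},
    {"name": "G15", "brand": "Globex", "screen": 15, "color": "silver"},
    {"name": "G17", "brand": "Globex", "screen": 17, "color": "black"},
    {"name": "I15", "brand": "Initech", "screen": 15, "color": "black"},
]


def facet_counts(items: list[dict], fields: list[str]) -> dict[str, dict]:
    """For each facet field, count how many items carry each value."""
    return {field: dict(Counter(item[field] for item in items)) for field in fields}


def apply_filters(items: list[dict], filters: dict) -> list[dict]:
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item[field] == value for field, value in filters.items())
    ]


# The user picks the 15-inch facet; the remaining facets are recounted on the result.
selected = apply_filters(laptops, {"screen": 15})
print([item["name"] for item in selected])
print(facet_counts(selected, ["brand", "color"]))
```

The counts next to each facet value are what lets the interface show how many results each further filter would leave.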

Vector search

We can index vector embeddings from machine learning models and perform nearest neighbor searches. We can use this approach to create similarity-based search, semantic search, visual search, recommendation, and other features.
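As a rough sketch of what a nearest neighbor search does, the following Python snippet ranks a few documents by cosine similarity to a query vector. The three-dimensional vectors are invented for illustration (real embeddings come from a model and have hundreds of dimensions), and the brute-force scan stands in for the approximate nearest neighbor indexes that search engines typically use.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest(query: list[float], vectors: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        vectors.items(),
        key=lambda pair: cosine_similarity(query, pair[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up embeddings: "sea" and "ocean" point in almost the same direction.
embeddings = {
    "sea": [0.90, 0.10, 0.00],
    "ocean": [0.85, 0.15, 0.05],
    "mountain": [0.30, 0.90, 0.10],
    "laptop": [0.00, 0.20, 0.95],
}

print(nearest([0.88, 0.12, 0.02], embeddings, k=2))
```

Because similarity is computed on meaning-carrying vectors rather than on characters, a query close to "sea" also surfaces "ocean" without any hand-written synonym list.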

Geolocation search

If the documents contain geographic coordinates, the search engine lets us search for data within a given distance of a GPS coordinate, or within a defined bounding rectangle. For instance, we can easily find all companies near our location whose names end with “Ltd.”
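The two kinds of geographic filter can be sketched in plain Python. The company names and the helper functions are hypothetical, the radius search uses the haversine great-circle formula, and the bounding-box check is deliberately simple (it ignores boxes that cross the 180° meridian). Search engines use spatial indexes instead of checking every document.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Made-up companies with approximate city-centre coordinates.
companies = [
    {"name": "Thames Bakery Ltd.", "lat": 51.5074, "lon": -0.1278},   # London
    {"name": "City Coffee Inc.", "lat": 51.5155, "lon": -0.0922},     # London
    {"name": "Northern Tools Ltd.", "lat": 53.4808, "lon": -2.2426},  # Manchester
]


def companies_near(lat: float, lon: float, radius_km: float, suffix: str) -> list[str]:
    """Names ending with `suffix` within `radius_km` of the point, nearest first."""
    hits = []
    for company in companies:
        distance = haversine_km(lat, lon, company["lat"], company["lon"])
        if distance <= radius_km and company["name"].endswith(suffix):
            hits.append((distance, company["name"]))
    return [name for _, name in sorted(hits)]


def in_bounding_box(company: dict, south: float, west: float, north: float, east: float) -> bool:
    """True if the company lies inside the rectangle (no antimeridian handling)."""
    return south <= company["lat"] <= north and west <= company["lon"] <= east


print(companies_near(51.5074, -0.1278, radius_km=10, suffix="Ltd."))
print([c["name"] for c in companies if in_bounding_box(c, 51.0, -1.0, 52.0, 0.5)])
```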

If we need to answer questions like:

What are the most frequently searched terms?
Are there any search terms returning no results? If so, what are they?
Where would adding synonyms improve results?
What items appear most frequently in search results?

...and many others, specialized search engines offer APIs that help collect and analyze such data, allowing you to refine your search parameters.
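To show what such analytics boil down to, here is a small sketch over a hypothetical search log, where each entry is a query and the number of results it returned. The log format and the helper names are assumptions for this example; actual engines expose their own analytics APIs with their own data shapes.

```python
from collections import Counter

# Hypothetical log: (search term, number of results returned).
search_log = [
    ("laptop", 42),
    ("notebok", 0),
    ("laptop", 40),
    ("sea", 3),
    ("notebok", 0),
    ("laptop", 41),
    ("ultrabook", 0),
]


def top_terms(log: list[tuple[str, int]], n: int = 3) -> list[tuple[str, int]]:
    """The n most frequently searched terms with their search counts."""
    return Counter(term for term, _ in log).most_common(n)


def zero_result_terms(log: list[tuple[str, int]]) -> list[str]:
    """Terms that returned no results at least once, in alphabetical order."""
    return sorted({term for term, hits in log if hits == 0})


print(top_terms(search_log, 2))
print(zero_result_terms(search_log))
```

A recurring zero-result term such as the misspelled "notebok" is a hint to enable typo tolerance, and "ultrabook" might be a candidate for a synonym of "laptop".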

Promotion and Result Curation

Specialized search engines let us sort results not only by ranking or similarity, but also "fix" specific listings at specific positions in the search results, just like sponsored links in Google. This can be used to prioritize items by season, popularity, or any business-specific logic.
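The idea of fixing items at given positions can be sketched like this. The apply_pins function is a hypothetical helper written for this example, not the API of any engine, and it assumes that each position holds at most one pinned item.

```python
def apply_pins(ranked: list[str], pins: dict[str, int]) -> list[str]:
    """Merge relevance-ranked ids with pinned ids.

    `pins` maps an item id to its fixed 1-based position. Items that are not
    pinned keep their relevance order and fill the remaining slots.
    """
    remaining_pins = {position: item for item, position in pins.items()}
    organic = iter([item for item in ranked if item not in pins])
    result: list[str] = []
    position = 1
    while True:
        if position in remaining_pins:
            result.append(remaining_pins.pop(position))
        else:
            item = next(organic, None)
            if item is None:
                break
            result.append(item)
        position += 1
    # Pins whose position lies beyond the end of the list go last, in position order.
    result.extend(remaining_pins[p] for p in sorted(remaining_pins))
    return result


# Summer promotion: sunscreen is pinned to the top, whatever its relevance score.
print(apply_pins(["tent", "boots", "stove"], {"sunscreen": 1}))
```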

Speed

Speed is likely the biggest advantage. Even with millions of records, these engines can search, sort by the desired criteria, and filter using facets, similarity, and other filters, usually in under 50 ms.
Of course, nothing is perfect, and these engines have their drawbacks. In particular, they place greater demands on resources: disk capacity, as data is often duplicated, and RAM, where they keep as much data as possible for the fastest possible searches.

Conclusion

If your application needs to support data search, and you're dealing with large datasets or want to provide advanced search features, definitely consider one of the available search engines, be it Elasticsearch, Meilisearch, Typesense, or any other.
