SpringBoot: Fuzzy Match With Postgres

George Berar
Dev Genius
Published in
5 min readDec 23, 2021

--

In today’s article I will talk about how to use fuzzy match support from Postgres in order to do partial searches using a REST API.

Disclaimer: The purpose of this article is to give you a practical example of using fuzzy match and not to explain in details all concepts.

Photo by Clay Banks on Unsplash

Before we start you can find the entire code here.

Background

Is Sunday night and you want to watch a movie recommended by one of your friends but you don’t remember the entire name only the fact it has something to do with ‘bats’. So you open IMDB or Rotten Tomatoes and type ‘bat’ in the search input and you see a list of movies already showing up like this:

Example search on Rotten Tomatoes for ‘bat’

And there you have it! The movie you are looking for: ‘Batman’. What is happening behind that search input is called fuzzy matching and it’s the foundation stone of many search engine frameworks.

But what is fuzzy matching?

It’s a technique to identify two elements of text, strings or entries that are approximately similar but are not exactly the same.

If you look closely you see other movies showing up (e.g. Battered or Bat) because of this ‘approximately similar’ matching. Typing more characters gives us a more accurate result:

Example search on Rotten Tomatoes for ‘batman’

The Problem

Let’s suppose we build a TODO API (with Postgres database) for managing daily tasks and we have some persisted TODOs:

Example TODOs

Note: the second TODO record has a typo in ‘title’ (Klean instead of Clean) but I did it on purpose and we’ll see later why.

Now, we want our API to provide an endpoint for allowing users to search TODOs based on their titles. Using JPA we can implement the search in the database really simple and quick by leveraging the well-known LIKE syntax:

JPA Repository for searching TODO using LIKE ‘%%’ approach

This is similar to SQL statement: SELECT * FROM t_todo WHERE title LIKE '%partialTitle%'.

Doing the request in Postman using ‘Eat’ as a search string gives us the expected output:

Searching TODOs using ‘Eat’

Now let’s see what happens when we try to search using ‘Clean’:

Searching TODOs using ‘Clean’

As you can see, it gives me back the Clean House TODO and works as expected, right? Well…not really. Imagine the scenario when you are not paying enough attention to creating a new TODO and you do a typo, Klean instead of Clean. For us, it makes sense to get back the Klean Car TODO also because the only wrong character is ‘K’ instead of ‘C’ and the rest of the search string matches. So why doesn’t work then? Let’s see what happens when we do the same request but his time using ‘Klean’ instead of ‘Clean’:

Searching TODOs using ‘Klean’

We get back only the TODO containing the typo and yet again we expected both of them.

The thing is, LIKE syntax works with exact matches and it fails completely at handling typos. Even if only one character is different and the rest are the same, for LIKE this is a mismatch. In real-life scenarios this is a problem because it’s a well-known fact — people make typos on a frequent basis; and we can’t expect our users to remember they made a typo or give the exact search strings every single time in order to provide a meaningful response.

The Solution

Fuzzy Match! This is a pretty obvious answer I agree :) but let’s see how we can use Postgres to implement it in 2 easy steps.

Step 1. pg_trgm

Postgres provides a module called pg_trgm which ships different functions to work with trigrams matching. A trigram is a group of three consecutive characters taken from a string. Based on this we can measure the similarity of two strings by counting the number of trigrams they share and estimating how similar they are on a scale between 0 and 1. For example, the string “hello” would be represented by the following set of trigrams:

  • “ h”, “ he”, “hel”, “ell”, “llo”, “lo “

In order to make use of this module we need to enable it using the following statement:

Enable PostgreSQL’s trigram module support

Step 2. JPA Query

Next we use the % operator to compare against elements of an array, so we can match against any part of the title:

Sample SQL query using the % operator

The query uses Postgres’ STRING_TO_ARRAY function to split the TODO's title into arrays of separate strings.

Adjusting the native query for JPA gives us:

Now we can search our TODOs by partial titles the right way. Here are some working examples:

Fuzzy match on ‘Hou’ value
Fuzzy match on ‘Clean’ value

As you can see for ‘Clean’ search string we get both of them now. Trying to use ‘Klean’ gives us the same result:

Fuzzy match on ‘Klean’ value

One more thing…the % operator uses the default similarity threshold having a value of 0.3 but if we need to adjust it (e.g 0.5) we can use the SIMILARITY function:

Sample SQL query using the SIMILARITY function

That’s it!

Conclusion

As always please keep in mind this approach might or might not suit your project context or needs and I’m not in the position to say there’s no other way to do it differently or better. I really hope you enjoyed it and had fun reading it.

Stay safe and remember you can find the code here!

--

--