Enhancing Search Relevance with NER and Multi-Vectors

General-purpose embedding models often struggle to extract domain-specific features from the input and represent them effectively in embeddings. They have no guidance on what matters in a given domain, so domain-critical aspects are not prioritised appropriately and the results are suboptimal.

For instance, location information might be crucial in one domain but irrelevant in another.

Another important aspect is how relevant a feature is for a particular search query in that domain. A search query could contain multiple features, but only one of them might really matter.

Take the query “park in London with animals”. If we don’t have any documents that match exactly, should we prioritise “London”, “parks” or “animals”? The answer again depends on what the end-user expects. I would imagine that a tourist searching for parks with animals in London would not expect to see a park in “Dallas”, since it is not relevant at all.

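Yet a general-purpose embedding model can rank a sentence about the Dallas Zoo above the London documents for this query. Given the situation described above, this is not what the end-user (a tourist in London) would expect, even though the Dallas Zoo sentence is more similar in the vector space.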

Common approaches to solving this issue would include fine-tuning models, using cross-encoders, or implementing hybrid search. Another flexible approach would be to use feature (entity) extraction in combination with multi-vectors.

Feature extraction using NER models

In our example above, the intention is to prioritise location. To do that, we first need to reliably extract location information from the search query and documents.

The NER model extracts the locations accurately, but it also extracts “Africa” as a location, which is not what we want in this particular case. If a tourist is searching for something in Africa, we don’t want to show a zoo in Dallas.

Additionally, NER models ship with a limited set of feature (entity) categories by default, which means that without fine-tuning you might not have everything you need.

Feature extraction using LLMs

Another approach would be to use an LLM to extract the important features.
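As a rough illustration of what LLM-based extraction could look like, here is a minimal sketch. It is not the code used for this post: the prompt wording, the `extract_location` helper and the single-field JSON schema are all assumptions, and the model call is passed in as a plain callable so that any client could be plugged in. The `fake_llm` function and the example sentence are made up for the demo.

```python
import json
from typing import Callable, Optional

# Hypothetical prompt: ask for one main location and a strict JSON reply.
PROMPT_TEMPLATE = (
    "Extract the main location the text is about. "
    "Ignore locations that are only mentioned in passing. "
    'Reply with JSON only, in the form {{"location": "<name>"}} '
    'or {{"location": null}} if there is none.\n\n'
    "Text: {text}"
)


def extract_location(text: str, llm: Callable[[str], str]) -> Optional[str]:
    """Ask the LLM for the main location and validate its JSON reply."""
    reply = llm(PROMPT_TEMPLATE.format(text=text))
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None  # treat unparseable replies as "no location found"
    if not isinstance(data, dict):
        return None
    location = data.get("location")
    if isinstance(location, str) and location.strip():
        return location.strip()
    return None


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model call, used only for this demo."""
    return '{"location": "Dallas"}'


if __name__ == "__main__":
    # Made-up example sentence; a real run would pass an actual LLM client.
    print(extract_location("The Dallas Zoo is home to animals from Africa.", fake_llm))
```

Validating the reply instead of trusting it keeps a malformed answer from ending up in the index; the same function can be applied to both documents and search queries.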

As we can see, it solved the issue with related locations and focused only on the main location, which is exactly what we want.

Multi-vector approach

Now let’s use the extracted locations and see how they rank.

And now let’s combine the scores accordingly.

As we can see, we have achieved the desired result. Hyde Park in London would be the first result, followed by the Science Museum in London, with the Dallas Zoo coming in third with a combined score of 0.6655 (which is quite far from the first two results).

This is a simplified example, and the scores could also be adjusted using feature weights. For instance, the location could be boosted or reduced depending on the search query. We could also assign relative weights to each feature for the document itself and use them at the time of search to increase or decrease the final score.
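To make the weighting idea concrete, here is a small sketch of one way to combine per-feature similarities. The post does not spell out the exact formula, so the weighted mean below is an assumption, as are the names `cosine` and `combined_score` and the toy two-dimensional vectors (real embeddings would come from a model).

```python
import math
from typing import Dict, Sequence

Vectors = Dict[str, Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def combined_score(query: Vectors, doc: Vectors, weights: Dict[str, float]) -> float:
    """Weighted mean of per-feature cosine similarities.

    Only features with a positive weight that exist on both sides are counted.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        if weight <= 0 or name not in query or name not in doc:
            continue
        total += weight * cosine(query[name], doc[name])
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


if __name__ == "__main__":
    # Toy vectors only; they are not real embeddings.
    query = {"content": [1.0, 0.0], "location": [0.0, 1.0]}
    docs = {
        "london_doc": {"content": [0.6, 0.8], "location": [0.0, 1.0]},
        "dallas_doc": {"content": [1.0, 0.0], "location": [1.0, 0.0]},
    }
    for weights in ({"content": 1.0, "location": 1.0}, {"content": 1.0, "location": 2.0}):
        ranked = sorted(docs, key=lambda d: combined_score(query, docs[d], weights), reverse=True)
        print(weights, ranked)
```

With equal weights this reduces to a plain average of the content and location scores. Raising the location weight pushes documents with a mismatched location further down the ranking.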

One concern with this approach is LLM latency at search time, and it is a valid concern. If the latency is not acceptable, one option is to skip feature extraction at search time.

The scores still represent expected relevance for the location embeddings, but in some situations, this could lead to unexpected results. This is where fine-tuning a NER model for the search query feature extraction could be a good approach, and LLMs can help to generate a good training data set.  

Final thoughts 

The concept of defining a feature schema for your documents and also using it for your search queries is a simple and yet powerful approach to improve semantic search relevance. It can also be combined with hybrid search (even for filtering). It also enables the possibility to use per-field (feature) embedding models (if necessary).

I'm not a machine learning engineer, and this post is a result of my experience building semantic search systems. If you have any great suggestions, please let me know on x.com/aivisSilins

Thanks for reading!

PHP max_execution_time explained


A crucial part of maintaining a PHP application and mitigating performance issues is understanding the PHP and server limits (configuration). However, they are not as straightforward as we tend to think. In this article, I will explain the PHP max_execution_time configuration limit, with examples, because in some situations it behaves differently than you would expect.

From the documentation 

The maximum execution time is not affected by system calls, stream operations etc. Please see the set_time_limit() function for more details.

Today this paragraph matters because PHP applications spend most of their time on exactly these kinds of operations. It means your script can take longer to finish than the configured limit, that is, longer than you expect.
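The distinction at play is between wall-clock time and the time a process spends actually executing its own code. The sketch below is not PHP and does not show how PHP implements the limit; it is only a language-neutral illustration, written in Python, of why waiting and computing can look very different to a timer. The helper names `measure`, `wait` and `busy` are made up for this example.

```python
import time


def measure(work):
    """Run work() and return (wall_seconds, cpu_seconds) it took."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def wait():
    """Waiting: the process is idle, much like waiting on a slow external call."""
    time.sleep(0.5)


def busy():
    """Computing: spin until roughly 0.2 seconds of CPU time have been used."""
    deadline = time.process_time() + 0.2
    while time.process_time() < deadline:
        pass


if __name__ == "__main__":
    for name, work in (("wait", wait), ("busy", busy)):
        wall, cpu = measure(work)
        print(f"{name}: wall={wall:.2f}s cpu={cpu:.2f}s")
```

The waiting case takes about half a second on the clock while using almost no CPU time, which is the same kind of gap the examples in this article show between a script's total run time and its max_execution_time limit.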

The max_execution_time limit in action

The following script sets max_execution_time to 5 seconds (the PHP default is 30 seconds) and runs an endless loop to use up the execution time.

As we can see, the script was allowed to run for ~5 seconds until it resulted in the 500 status code triggered by the max_execution_time limit:

In the following examples, I'm going to extend this code.

1) SQL queries, Redis, Memcached or any other TCP connection

The time spent executing SQL queries is not counted. For example, suppose a script runs a SQL query that takes 10 seconds to execute. If max_execution_time is set to 5 seconds, the script will be allowed to run for up to 15 seconds.

The following example is the way to test it:

As we can see, the script execution time does not include the SQL query execution time.

The same would apply to any other external TCP calls such as:

  • Memcached
  • MongoDB
  • Redis

2) 3rd party HTTP API calls

I made this a separate section to show clearly that the time spent on an HTTP request (TCP communication) does not count towards the limit.

3) System calls

The total execution time is 20 seconds because 15 seconds of the system call does not count towards the max_execution_time limit.

4) Filesystem calls

Interaction time (such as fwrite) with the filesystem is not limited by max_execution_time.

5) Sleep methods

As we can see, time spent in the sleep or usleep functions is not counted towards the max_execution_time limit.

Closing Note 

I hope this article gave you an idea of how the max_execution_time setting behaves and what to expect from it. With that in mind, when you need to limit total PHP execution time, consider using server or PHP-FPM limits (for example, PHP-FPM's request_terminate_timeout) instead of max_execution_time.