The full palette of search queries – how to control recall with a smart query strategy

In a recent post, we introduced Evalueserve’s Problem-Solution-Applications (PSA) Framework, why it’s helpful in Research & Development and IP Intelligence, and how it can be used to build search queries founded on a clear perspective of the question that we want to answer. In this post, we’ll extend the framework and the resulting query sets, mix them together like paints on an artist’s palette, and visualize how we can control recall using such techniques.

The PSA Palette – Which shade should you choose?
Primary Colours – Problem, Solution, Application
Secondary Colours – (Application AND Solution), (Problem AND Solution), (Application AND Problem)
Tertiary Colour – (Application AND Problem AND Solution)
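
One way to picture the palette is to treat each primary colour as the set of documents its query returns, and each mixed colour as the intersection of those sets. The sketch below is only an illustration: the document IDs are invented, and the helper name `palette` is hypothetical rather than part of the PSA framework.

```python
from itertools import combinations

# Invented result sets: document IDs returned by each primary query.
primary = {
    "Problem": {1, 2, 3, 4, 5, 6},
    "Solution": {4, 5, 6, 7, 8, 9},
    "Application": {1, 4, 5, 8, 10},
}

def palette(primary):
    """Return every AND-combination of the primary queries with its result set."""
    shades = {}
    names = sorted(primary)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            label = " AND ".join(combo)
            # AND between queries corresponds to intersecting their result sets.
            shades[label] = set.intersection(*(primary[n] for n in combo))
    return shades

shades = palette(primary)
for label, docs in shades.items():
    print(f"{label}: {sorted(docs)}")
```

Three primaries give seven shades in total: three primary, three secondary, and one tertiary. Each extra AND can only keep or shrink the result set, which is why the tertiary colour is the narrowest.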

 

Applying the PSA framework to define queries for a novelty search

In a novelty search, the searcher starts from an invention, which effectively means there should be a defined application area, an accurate understanding of the problem, and an invention conceived as a solution to that problem. The searcher then tries to establish whether the invention (particularly the solution) is novel. Let’s pick three of the combinations that suit a novelty search:

  1. Solution only,
  2. (Application AND Solution) OR (Problem AND Solution), or
  3. Application AND Problem AND Solution.

As you will see, these three combinations go from a broad query, to a medium-sized query, to a narrow query, where the size is based on the number of documents each one retrieves. This is the second component of the art of query building (after using the PSA framework!): a clear understanding of how broad a query is, given how you have built it.
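
As a rough sketch, the three novelty-search combinations can be assembled from keyword groups like this. The invention, the keywords, and the helper `group` are all made up for illustration, and the generic AND/OR syntax is an assumption: real patent and literature databases each have their own query syntax, field codes, and classification operators.

```python
# Invented keyword groups for an illustrative invention; not taken from a real search.
concepts = {
    "application": ["drone", "unmanned aerial vehicle"],
    "problem": ["battery drain", "short flight time"],
    "solution": ["solar panel", "photovoltaic cell"],
}

def group(terms):
    """OR together the synonyms of one concept, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

A = group(concepts["application"])
P = group(concepts["problem"])
S = group(concepts["solution"])

# The three combinations, from broad to narrow.
novelty_queries = {
    "broad": S,
    "medium": f"({A} AND {S}) OR ({P} AND {S})",
    "narrow": f"{A} AND {P} AND {S}",
}

for name, query in novelty_queries.items():
    print(f"{name}: {query}")
```

Keeping the synonyms of each concept in one group means the same three building blocks can be recombined into any shade of the palette without retyping keywords.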

A Solution-only query, if drafted properly (with all the possible combinations of keywords, classes etc., and searched over a collection in which every document is clearly defined and has suitable text), should contain all the relevant documents. So the Solution-only query can, in principle, reach 100% recall. But it will also have the most noise: the share of relevant documents among those retrieved will be very low, meaning the precision of the query will also be low.

The (Application AND Solution) OR (Problem AND Solution) query is likely to have lower recall than the Solution-only query, but higher precision.

The Application AND Problem AND Solution query will most likely have the lowest recall of the three queries, but the highest precision.
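
The trade-off can be made concrete with a toy calculation. Recall is the share of all relevant documents that a query retrieves, and precision is the share of retrieved documents that are relevant. In the sketch below every document ID and relevance judgment is invented, and the Solution-only set is assumed to contain every relevant document, which is the ideal case described above rather than a guarantee.

```python
def recall(retrieved, relevant):
    """Share of all relevant documents that the query retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    """Share of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

# Invented result sets (document IDs) for the three primary queries.
solution = set(range(1, 41))
application = set(range(1, 7)) | set(range(20, 26)) | set(range(50, 60))
problem = set(range(1, 5)) | {7} | set(range(26, 31)) | set(range(60, 66))

# Invented ground truth: the documents a reviewer would judge relevant.
relevant = set(range(1, 9))

queries = {
    "Solution only": solution,
    "(A AND S) OR (P AND S)": (application & solution) | (problem & solution),
    "A AND P AND S": application & problem & solution,
}

scores = {
    name: (recall(docs, relevant), precision(docs, relevant))
    for name, docs in queries.items()
}
for name, (r, p) in scores.items():
    print(f"{name}: {len(queries[name])} docs, recall={r:.2f}, precision={p:.2f}")
```

With this toy data the result sets shrink from 40 to 18 to 4 documents, recall falls from 1.0 to 0.875 to 0.5, and precision rises from 0.2 to about 0.39 to 1.0. Real searches will not be this tidy, but the direction of the trade-off is the same.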

The advantage of understanding the breadth of queries and its relationship with recall and precision is that a user can choose which query to use depending on specific requirements. Where particularly high recall is needed (e.g. to mitigate risk), going through the entire Solution query is advisable. For a quick novelty search, or where high recall is not required, the Application AND Problem AND Solution query may be enough, supplemented by some secondary searches (we’ll discuss this concept in later posts).

Three-search strategy to optimize recall

The above discussion introduces a third component of query creation: a single query is often insufficient on its own for any use case. At Evalueserve we aim to follow a three-search strategy.

  1. Big query – This is the broadest form of query, ideally with 100% recall, and with low precision. It’s important to note that a big query always starts with the entire universe of literature and is reduced depending on the use case and required recall levels.
  2. Affordable query – This is a single query or a combination of queries that a user can afford to review (the available time and money are sufficient to review this set of results). The affordable query is always a subset of the big query.
  3. Secondary queries – These queries use details or attributes of the relevant and irrelevant results from the affordable query to surface further results from the big query that may be relevant.
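
The three steps above can be sketched as follows. This is one possible reading, not Evalueserve’s actual method: it assumes each document carries a set of classification codes (the labels here are made up), that a reviewer has already judged the affordable set, and that a secondary query simply pulls unreviewed documents sharing a code that appears more often among relevant than irrelevant results. The function name `secondary_candidates` is hypothetical.

```python
from collections import Counter

# Invented big-query results: document ID -> set of classification codes.
big_query = {
    1: {"C1", "C2"}, 2: {"C1"}, 3: {"C3"}, 4: {"C2", "C4"},
    5: {"C1", "C5"}, 6: {"C5"}, 7: {"C2"}, 8: {"C6"},
}

# The affordable query is a subset of the big query that can be fully reviewed.
affordable_ids = {1, 2, 3, 4}
assert affordable_ids <= set(big_query)

# Assumed reviewer judgments on the affordable set.
reviewed_relevant = {1, 2}

def secondary_candidates(big, affordable, relevant):
    """Pick unreviewed big-query documents that share a 'signal' code.

    A code counts as signal if it occurs in more relevant than irrelevant
    reviewed documents (a deliberately simple, assumed heuristic).
    """
    irrelevant = affordable - relevant
    rel_codes = Counter(code for doc in relevant for code in big[doc])
    irr_codes = Counter(code for doc in irrelevant for code in big[doc])
    signal = {code for code, n in rel_codes.items() if n > irr_codes[code]}
    return {
        doc for doc, codes in big.items()
        if doc not in affordable and codes & signal
    }

candidates = secondary_candidates(big_query, affordable_ids, reviewed_relevant)
print(sorted(candidates))
```

Here only code C1 separates relevant from irrelevant results, so the secondary query surfaces document 5 from the unreviewed part of the big query. In practice the attributes could just as well be keywords, assignees, or citations.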

So, picking up on our novelty search example above, in a typical use case the Solution-only query may be treated as the ‘big query’. The Application AND Problem AND Solution query is a more likely starting point as the ‘affordable query’, probably supplemented by some secondary searches within the big query set.

Very much like choosing between an ‘Old Master’ painting and your child’s latest artistic offering, each search query has its intrinsic value, and beauty is in the eye of the beholder! Creating successful search queries can come down to a flash of inspiration or insight from the data, but only if you’ve properly mixed the PSA Palette to begin with! More on that next time: how to use secondary searches to raise recall while ensuring precision requirements are met!

Ashutosh Pande
Vice President, IP and R&D Solutions

Ashutosh Pande is an expert in intellectual property, mathematics, data analytics, and data science. He is Vice President of Evalueserve’s IP and R&D Solutions, responsible for conducting and managing research on IP and R&D research services, aimed at improving their outputs for customers. He’s realized that to find the simplicity in anything, you just need to continually break down the complex ‘big picture’ into ever-simpler component parts, and search out the commonalities. Via the Information Adventurers blog, and in collaboration with its readers, Ashutosh looks forward to honing his own thought processes about challenges in IP and R&D research, and creating knowledge that will impact the industry.
