Get ready – it’s time for a solid theoretical framework that helps us to discuss and later measure the quality of patent search! To do this, I’d like to introduce Ashutosh Pande, a key member of our team at Evalueserve IPR&D who is passionate about high-quality patent search, using artificial intelligence, and other service improvements internally at Evalueserve. Ashutosh has researched these topics for several years now and has been working with a core team on questions such as: how to define a good patent search, and how to ensure that we can deliver a great patent search service that flows into related processes. It’s a great pleasure to write this post together with him!
Everyday example – doing an internet search
Let’s start with a generic example to understand what we call the searcher’s dilemma. Assume you are visiting a new city and it’s dinner time. You want to try some authentic local food, in a pleasant atmosphere in a comfortably sized restaurant. So, you search on your favorite search engine to find a nearby restaurant that also has good reviews and matches all your criteria. Off you go...
On reflection, in finding a suitable restaurant, you really want the search engine to do three things:
- Give you a list of ALL the authentic local restaurants in the city (without missing any that fit your criteria).
- Give you ONLY authentic local restaurants as output (and no other restaurants, e.g. international chains).
- Present the information in such a way that it is easy for you to assess the facts and choose one.
Our daily experience shows that the process of searching for a restaurant – or anything else – on the internet is far from perfect. The reality is that we get a long list of results with some potentially good hits on the first page, lost among many less relevant options and with some real treasures missing. There is also a huge variety of sources – some less valid and reliable than others, although it can be difficult to tell which!
The patent search or technology landscape example, although of course very different, shares some of these characteristics, and our thinking on high-quality search in an intellectual property or research & development environment is quite similar. As the customer of such a project, e.g. a freedom-to-operate search, you need a very similar output:
- Ideally, a list of ALL relevant patent hits or non-patent literature (i.e. documents) – one measurement criterion for this is the percentage of all relevant documents in the entire search universe that the search actually provides (termed RECALL).
- Plus, a list of ONLY relevant documents – which can be measured as the percentage of the documents provided that are relevant (termed PRECISION).1
- Presented in such a way that the next step in the analysis is straightforward.
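To make the two measures concrete, here is a minimal Python sketch. The document identifiers and the `precision_recall` helper are hypothetical and chosen only for illustration. The sketch also assumes that the full set of relevant documents is known, which is rarely the case in a real search, where recall can usually only be estimated.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents.

    retrieved: documents the search delivered
    relevant:  all relevant documents in the search universe (assumed known here)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 5 relevant documents exist, the search returns 4 documents.
relevant_docs = {"US-A", "US-B", "EP-C", "EP-D", "WO-E"}
retrieved_docs = {"US-A", "US-B", "EP-C", "JP-X"}

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")
# prints: precision = 75%, recall = 60%
```

Three of the four delivered documents are relevant (precision of 75%), but only three of the five relevant documents were found (recall of 60%).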
In patent searching and technology landscapes, the good news is that all of this is possible. An experienced information professional can find most of the relevant documents with very high precision, and present the results in a suitable way for the IP or R&D end-user.
So what is the ‘searcher’s dilemma’?
When we say above that this is possible, we are ignoring the fact that achieving high recall and precision requires resources to be allocated to the search.
And here lies the ‘searcher’s dilemma’ – you can maximise recall (finding all documents) but only by putting in effort (either time spent, better database used, more experienced searcher hired, etc.) or paying in the form of low precision (accept irrelevant documents in the output, or ‘noise’).
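The trade-off can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the ranked result list, the figure of five relevant documents in the search universe, and the hypothetical `precision_recall_at` helper, which uses the number of results reviewed as a simple stand-in for effort.

```python
# Hypothetical ranked result list: True marks a relevant document.
ranked_results = [True, True, False, True, False, False, True, False, False, False]
total_relevant = 5  # assumed number of relevant documents in the search universe


def precision_recall_at(k, results, total_relevant):
    """Precision and recall if only the top-k results are reviewed."""
    hits = sum(results[:k])  # relevant documents among the first k results
    return hits / k, hits / total_relevant


for k in (2, 5, 10):
    p, r = precision_recall_at(k, ranked_results, total_relevant)
    print(f"review top {k:>2}: precision = {p:.0%}, recall = {r:.0%}")
```

In this toy list, reviewing more results raises recall from 40% to 80% while precision falls from 100% to 40%, and one relevant document is still missing even after all ten results have been reviewed.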
The upshot: in the daily routine, missing some relevant documents is part of every searcher’s life.
This is best illustrated by looking at the United States Patent and Trademark Office (USPTO) and the European Patent Office (EPO), where you will certainly find some of the best patent searchers in the world but where, inevitably, resources are limited. A considerable proportion of granted patents are, or could be, successfully challenged in post-grant oppositions or validity procedures, because investing more resources identifies additional prior art.
In the case of parallel search in our earlier post, high recall resulted from over 20 patent search companies working on a pilot, leading to potentially lower precision, but also greater effort in harmonizing and understanding the results.
Coming up …
Over the next few posts we will discuss the questions this raises:
- What can we learn from simple economics and market price to understand that the quality requirement (and with it recall, precision, and output) varies by use case – why can the market price for different types of searches be 500, 5,000, or 50,000 USD?
- Although difficult to measure, why does Evalueserve aim to measure recall and define output quality? We believe it is ultimately the key to quantifying search quality!
- High precision and recall are important factors and we will dissect these concepts and go into detail on how to maximize them by looking at the typical errors that lead to lower values.
- What other factors are important for search quality? We will introduce such concepts.
You may ask why we are so passionate about a business where it seems impossible to deliver perfection. Well, we both feel that this is exactly why it is one of the most interesting fields to work in these days, with huge opportunities for individuals and companies with a clear view on high-quality patent search or patent landscapes. At the end of the day, we do so to help meet our clients’ challenges in a complex, transforming, and imperfect world!
The next time you visit a city and want to find a good restaurant, we hope you find one where you have a great evening and enjoy the food – and think about search quality.
1 For those interested in more detail now, we recommend checking out Wikipedia for more general discussion, or blogs on data science; or look out for a paper that we will soon publish on this topic.