I got introduced to tagging early in life as my parents managed our village library. My long summer vacations were full of what felt like a never-ending supply of books and publications, each with a unique, color-coded, Dewey Decimal Classification ‘tag’ on its spine.
This meant that each book could be tagged into a library catalogue (structure, taxonomy), by classifying and marking according to features such as: reader age group; genre (adventure, romance, non-fiction, crime, novel, science, etc.); sub-genre (e.g. chemistry, biology, physics, under the tag science); but also bibliographical information, such as author, number of pages, etc. While most visitors browsed the physical library – our upstairs neighbour headed straight for the ‘police stories for adults’ shelf, the farmer’s wife next door looked for local literature with a romantic edge – the DDC library system, first created in 1876, was used in the pre-digital days to locate and retrieve publications.
Tagging in the field of IP and R&D Search and Intelligence
The concept of tagging is well known both in information science and the field of patent and scientific literature searching. All patents are categorized into patent classifications for easier retrieval and allocation to examiners during prosecution. Some of the databases (or retrieval systems) use proprietary classification systems to tag/categorize literature into a taxonomy. Examples are Derwent WPI, Chemical Abstracts, Medline, and many more.
Tagging can be manual, automated using text analytics or similar systems, or a combination of both. It can follow a custom taxonomy built for a particular project or a standard one, such as the taxonomy of a database product or a patent classification.
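To make the automated option concrete, here is a minimal sketch of rule-based tagging in Python. The taxonomy, its keywords and the function name `tag_document` are all hypothetical and chosen only for illustration; real text analytics systems go well beyond exact keyword matching.

```python
import re
from typing import Dict, List, Set

# Hypothetical custom taxonomy: each tag is triggered by a set of keywords.
TAXONOMY: Dict[str, Set[str]] = {
    "brush": {"brush", "bristle"},
    "motor": {"motor", "rotor"},
    "filter": {"filter", "hepa"},
}


def tag_document(text: str, taxonomy: Dict[str, Set[str]] = TAXONOMY) -> List[str]:
    """Return the sorted tags whose keywords appear as whole words in the text."""
    # Lower-case the text and split it into alphabetic words.
    # Matching is exact, so "bristles" would NOT match the keyword "bristle".
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(tag for tag, keywords in taxonomy.items() if words & keywords)


print(tag_document("A rotating brush with a HEPA filter"))  # ['brush', 'filter']
```

A manual reviewer could then confirm or correct the suggested tags, which is one way to combine the automated and manual approaches.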
Tagging can improve and/or facilitate retrieval and lead to high recall
Tagging facilitates retrieval. In theory, a patent or literature search query built on such classifications could achieve both high recall (few relevant documents missed) and high precision (little noise in the results). In practice, however, generic tags (e.g. patent classifications, commercial deep indexing) are typically not specific enough for precise searching and are of limited use for retrieving R&D literature.
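For readers who prefer numbers, the two measures can be sketched in a few lines of Python. Recall is the share of all relevant documents that the search found; precision is the share of the retrieved documents that are actually relevant. The document identifiers below are invented for illustration.

```python
from typing import Iterable, Tuple


def recall_precision(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (recall, precision) for one search result set."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set  # relevant documents that were found
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision


# Hypothetical search: 4 documents retrieved, 3 documents truly relevant.
recall, precision = recall_precision(
    retrieved=["P1", "P2", "P3", "P4"],
    relevant=["P1", "P2", "P5"],
)
print(f"recall={recall:.2f}, precision={precision:.2f}")  # recall=0.67, precision=0.50
```

In this made-up case the search misses one relevant document (P5) and returns two irrelevant ones (P3, P4), which is the kind of trade-off a too-generic tag tends to produce.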
Hidden value: tagging provides an overview and a basis for IPR&D Intelligence
Every year, village leaders inspected the village library: without opening a single book, they could walk through the library and immediately understand its overall composition – the percentage of crime versus romance, non-fiction versus fiction, etc. They got this instant overview, perhaps we could call it insight, from the structure/taxonomy of the library.
The same principle applies to scoping a patent or technology landscape simply using patent classifications or other indices. In many domains, patent classifications give a good overview of technology development and the structure of the field. Chemical Abstracts/Registry is the tool of choice for any information professional in the field of chemicals. And Derwent World Patent Index is certainly an interesting addition to patent classifications alone. But as mentioned above, in many cases these classifications are not specific enough, and for many areas they do not cover non-patent literature.
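The overview the village leaders got by walking the shelves amounts to counting tags. A minimal Python sketch, assuming each document carries exactly one tag (the tag values here are made up and could just as well be patent classification codes):

```python
from collections import Counter
from typing import Dict, Iterable


def composition(tags: Iterable[str]) -> Dict[str, float]:
    """Return each tag's share of the collection in percent, largest first."""
    counts = Counter(tags)
    total = sum(counts.values())
    # An empty collection yields an empty overview (no division happens).
    return {tag: round(100 * n / total, 1) for tag, n in counts.most_common()}


# Hypothetical shelf: one tag per book.
shelf = ["crime", "crime", "romance", "non-fiction"]
print(composition(shelf))  # {'crime': 50.0, 'romance': 25.0, 'non-fiction': 25.0}
```

The same count over a patent data set gives a first, coarse picture of a technology landscape without opening a single document.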
Custom tagging brings clarity and understanding
If the classification systems discussed above were perfect, we wouldn’t need information professionals, and there would be no need for technology-powered automated retrieval and analytics systems like Patsnap or Innography. They fall short mainly because every use case is different. In most cases, the questions are one or two levels deeper than the available classifications and indices; as a consequence, precision is low, which leads to wrong conclusions and an unclear overview. In other cases, the classification system does not address the question at all. To use the library example again: there are too many books in a specific classification, and more granular sub-categories aren’t available.
Custom tagging of patent and non-patent literature solves these problems. In our daily routine at Evalueserve IPR&D, we prepare ‘customized patent classifications’ or a ‘customized world patent index’ for our clients. We do this for landscapes and for IPR&D Alerts and Watches, and we also license customized data sets to our clients. A typical client could be the R&D or product management director responsible for ‘vacuum cleaning brush development’ at a large vacuum cleaner manufacturer. We would prepare a custom data set containing all patents and literature of interest, categorized into a custom taxonomy and updated on a weekly or monthly basis. The data set would be delivered on our digital platform, Insightloupe, giving the customer any-time access to competitive intelligence reports, technology landscape reports, technology scouting and open innovation opportunities, and licensing opportunities, as well as a highly efficient retrieval system for quick searching.
So the customer would get the ‘perfect database’ for their specific purpose: custom-tagged data (patent and non-patent literature) on a powerful digital platform.
We will go deeper into this subject in future blog posts and explore topics such as the power and limits of tech analytics, query formulation and information retrieval systems, taxonomy systems, visualizations, and insight generation from structured datasets.
As with many of us, today I’ve stepped into my parents’ shoes: I have a small library at home, carefully structured so that I can easily retrieve books. But even more important, it acts like an extended memory, offering a partial overview of human thinking and reminding me every day that there is more to life than just my perspective!