Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons
1 : Inria de Paris
Institut National de Recherche en Informatique et en Automatique
2 : Qatent
Qatent
3 : Laboratoire Interdisciplinaire des Sciences du Numérique
Institut National de Recherche en Informatique et en Automatique, CentraleSupélec, Université Paris-Saclay, Centre National de la Recherche Scientifique
4 : Institut national de la propriété industrielle
* : Corresponding author
INPI
This paper proposes a data-centric approach to French patent classification. We compare several approaches for the two deepest levels of the International Patent Classification (IPC) hierarchy: the group and the subgroup. Our experiments show that while simple ensemble strategies work for shallower levels, deeper levels require more sophisticated techniques such as data augmentation, clustering, and negative sampling. Our results highlight the importance of language-specific features and data-centric strategies for accurate and reliable French patent classification, and they provide a baseline and practical guidance for researchers and practitioners in the field.