5-9 June 2023, Paris (France)
Exploring Data-Centric Strategies for French Patent Classification: A Baseline and Comparisons
You Zuo 1, 2, *, Benoît Sagot 1, *, Kim Gerdes 2, 3, Houda Mouzoun 4, Samir Ghamri Doudane 4
1 : Inria de Paris
Institut National de Recherche en Informatique et en Automatique
2 : Qatent
Qatent
3 : Laboratoire Interdisciplinaire des Sciences du Numérique
Institut National de Recherche en Informatique et en Automatique, CentraleSupélec, Université Paris-Saclay, Centre National de la Recherche Scientifique
4 : Institut national de la propriété industrielle
INPI
* : Corresponding author

This paper proposes a novel approach to French patent classification that leverages data-centric strategies. We compare different approaches for the two deepest levels of the International Patent Classification (IPC) hierarchy: the group and the subgroup. Our experiments show that while simple ensemble strategies work for shallower levels, deeper levels require more sophisticated techniques such as data augmentation, clustering, and negative sampling. Our research highlights the importance of language-specific features and data-centric strategies for accurate and reliable French patent classification, and offers insights and solutions for researchers and practitioners in the field.

