Improving Performance in AI-based Automatic Classification through Feature Augmentation: A Case Study of KDC

Chul, Jung; Soo-Sang, Lee; Jee-Hyun, Rho

Files

Performance Changes in Automatic KCD Classifications_Chul_WLIC2025.pdf (2.39 MB)

Date

2025-10

Authors

Chul, Jung

Soo-Sang, Lee

Jee-Hyun, Rho

Publisher

International Federation of Library Associations and Institutions (IFLA)

Abstract

The objective of this study is to empirically examine how the performance of an AI-based Korean Decimal Classification (KDC) automatic classification model changes as classification features are augmented, with the aim of identifying strategies that improve the consistency and accuracy of classification-number assignment in automated subject cataloguing. Experiments were conducted on 5,882 bibliographic records, in which library-domain metadata were supplemented with publishing metadata by integrating the independent attributes of both sources. Core features (title, author) and KDC numbers extracted from the National Library of Korea’s database were enriched with external features (keywords, book summary, tables of contents) collected from the BNK database of the Korea Publication Industry Promotion Agency. The features were organized into three sets: Feature Set A (title, author), Feature Set B (title, author, keywords), and Feature Set C (title, author, keywords, book summary, tables of contents). A multi-class classification model based on KLUE-BERT was developed for each set, and the resulting performance differences were analyzed systematically. The findings show that feature enrichment produced progressive improvements across all KDC main classes. The Arts (6XX) class showed the largest improvement, with a 124.24% increase in F1-score from Feature Set A to Feature Set C. Notable gains were also observed in several other classes, including Science and Technology (57.14%), Social Sciences (40.00%), History (34.04%), and Literature (25.37%). Further analysis of the 61 divisions showed that 28 divisions improved continuously, 20 improved only to a limited extent, 7 showed performance degradation, and 6 showed no significant change.
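Each of the three feature sets amounts to a different way of building one text input per record, and the reported gains read as relative F1 changes. The sketch below illustrates both under several assumptions. It assumes the fields are simply concatenated into one string. It assumes the percentages are computed as (F1_C − F1_A) / F1_A × 100. The field names, the `[SEP]` separator (borrowed from BERT-style inputs) and the helpers `build_input` and `relative_change` are all hypothetical and are not taken from the study. The F1 values in the usage lines are made up for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical field names for one merged NLK + BNK record; the study's
# actual schema is not given in the abstract.
FEATURE_SETS: Dict[str, List[str]] = {
    "A": ["title", "author"],
    "B": ["title", "author", "keywords"],
    "C": ["title", "author", "keywords", "summary", "toc"],
}


def build_input(record: Dict[str, Optional[str]], feature_set: str,
                sep: str = " [SEP] ") -> str:
    """Join the fields of one feature set into a single text input.

    Missing or empty fields are skipped, so a record without, say, a table
    of contents still yields a usable string for Feature Set C.
    """
    values = ((record.get(name) or "").strip() for name in FEATURE_SETS[feature_set])
    return sep.join(v for v in values if v)


def relative_change(f1_before: float, f1_after: float) -> float:
    """Percentage change in F1 from a baseline set (e.g. A) to an enriched set (e.g. C)."""
    if f1_before == 0:
        raise ValueError("baseline F1 must be non-zero")
    return round((f1_after - f1_before) / f1_before * 100, 2)


record = {"title": "Sample title", "author": "Hong Gildong", "keywords": "library; cataloguing"}
print(build_input(record, "A"))  # Sample title [SEP] Hong Gildong
print(build_input(record, "C"))  # Sample title [SEP] Hong Gildong [SEP] library; cataloguing
print(relative_change(0.50, 0.70))  # 40.0
```

Skipping empty fields rather than inserting placeholders prevents Feature Set C inputs from filling up with empty segments when BNK lacks a summary or table of contents. The abstract does not say whether the study handled missing fields this way.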
These findings underscore the critical importance of feature augmentation for improving the performance of the KDC automatic classification model, while indicating that its effectiveness may vary with the interaction between classification divisions and feature attributes. Further improvement in classification performance will require not only feature enrichment but also more advanced strategies, such as hierarchical classification structures, data refinement techniques, and more sophisticated data augmentation methods. (Presented on 15 August 2025 in the session "Pushing Boundaries to Next Generation Cataloguing: Experiments at the Edge of AI and Metadata".)
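The abstract recommends hierarchical classification structures but does not specify one. One common top-down scheme predicts the main class first and then chooses a division only from among that class's divisions. The sketch below assumes that KDC's decimal notation supplies the labels for each level: the hundreds digit gives the main class and the tens digit gives the division. The function names are illustrative only. The probability dictionaries stand in for the outputs of two hypothetical classifiers.

```python
import re
from typing import Dict, NamedTuple, Tuple

# KDC class numbers: three digits, optionally followed by a decimal extension.
KDC_PATTERN = re.compile(r"^(\d)(\d)(\d)(?:\.\d+)?$")


class KdcLabels(NamedTuple):
    main_class: str  # hundreds level, e.g. "600"
    division: str    # tens level, e.g. "650"
    section: str     # three-digit level, e.g. "651"


def hierarchical_labels(kdc: str) -> KdcLabels:
    """Derive one label per hierarchy level from a KDC class number."""
    m = KDC_PATTERN.match(kdc.strip())
    if m is None:
        raise ValueError(f"not a KDC class number: {kdc!r}")
    d1, d2, d3 = m.groups()
    return KdcLabels(d1 + "00", d1 + d2 + "0", d1 + d2 + d3)


def top_down_division(main_probs: Dict[str, float],
                      division_probs: Dict[str, float]) -> Tuple[str, str]:
    """Pick the main class first, then the best division within that class.

    main_probs and division_probs are hypothetical classifier outputs keyed
    by labels such as "600" and "650".
    """
    main = max(main_probs, key=main_probs.get)
    # Keep only divisions whose hundreds digit matches the chosen main class.
    candidates = {d: p for d, p in division_probs.items() if d[0] == main[0]}
    if not candidates:
        raise ValueError(f"no division scores under main class {main}")
    division = max(candidates, key=candidates.get)
    return main, division


print(hierarchical_labels("651.3"))
# KdcLabels(main_class='600', division='650', section='651')
print(top_down_division({"600": 0.7, "800": 0.3},
                        {"810": 0.5, "650": 0.3, "670": 0.2}))
# ('600', '650')
```

In this example, a flat argmax over divisions would pick 810, which contradicts the predicted main class 600. The top-down rule returns 650 instead. The design guarantees that the division and main-class predictions are consistent. Whether it also improves accuracy across the 61 divisions is an empirical question.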