Multi-Label Topic Classification on the Qur'an using the K-Nearest Neighbor and Latent Semantic Analysis Methods

Authors

  • Ghina Annisa Shabrina, Universitas Telkom, Indonesia
  • Kemas Muslim Lhaksmana, Universitas Telkom, Indonesia

DOI:

https://doi.org/10.59141/jist.v5i12.1340

Keywords:

Qur’an, KNN, LSA, Hamming Loss

Abstract

The Qur'an, comprising over 80,000 words, 6,236 verses, and 114 surahs, is a multifaceted and deeply significant text whose understanding demands knowledge of historical context, classical Arabic, and exegesis. Various methods have been used to analyze and classify its content, including K-Nearest Neighbor (KNN) and Latent Semantic Analysis (LSA). This research investigates the effectiveness of combining KNN with LSA for multi-label topic classification of Qur'anic verses. KNN alone achieved a micro-average F1-score of 0.49, performing most reliably on topics such as "aqidah" (creed) and "worldly matters." Applying LSA with 100 components reduced performance: the micro-average F1-score dropped to 0.43 and the Hamming loss rose to 0.1657. As the number of LSA components increased to 200 and 300, performance improved, with micro-average F1-scores rising to 0.45 and 0.47 and Hamming loss falling to 0.1507 and 0.1466, respectively. This indicates that while LSA can enhance KNN performance, its best results are achieved with a higher number of components, although the reported micro-average F1-score at 300 components (0.47) remains below that of KNN alone (0.49).
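
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes each verse is represented as a binary vector with one slot per topic; the toy labels and the function names `hamming_loss` and `micro_f1` are hypothetical and are not taken from the study's data or code.

```python
# Toy multi-label data: 3 verses x 4 topics (hypothetical values, not from the paper).
# Each row is one verse; each column is one topic (1 = topic applies, 0 = it does not).
y_true = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 0]]


def hamming_loss(true_rows, pred_rows):
    """Fraction of (verse, topic) slots whose predicted label is wrong. Lower is better."""
    total = sum(len(row) for row in true_rows)
    wrong = sum(
        t != p
        for t_row, p_row in zip(true_rows, pred_rows)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


def micro_f1(true_rows, pred_rows):
    """F1 computed from counts pooled over all topics and verses. Higher is better."""
    tp = fp = fn = 0
    for t_row, p_row in zip(true_rows, pred_rows):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1  # topic correctly assigned
            elif t == 0 and p == 1:
                fp += 1  # topic assigned but not present
            elif t == 1 and p == 0:
                fn += 1  # topic present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


print(hamming_loss(y_true, y_pred))  # 3 wrong slots out of 12
print(micro_f1(y_true, y_pred))      # 2*3 / (2*3 + 1 + 2)
```

Because Hamming loss counts wrong slots and micro-average F1 pools true positives across topics, the two move in opposite directions when a classifier improves, which matches the pattern reported for 100, 200, and 300 LSA components.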

Published

2024-12-27

How to Cite

Shabrina, G. A., & Lhaksmana, K. M. (2024). Multi-Label Topic Classification on the Qur’an using the K-Nearest Neighbor and Latent Semantic Analysis Methods. Jurnal Indonesia Sosial Teknologi, 5(12), 5717–5726. https://doi.org/10.59141/jist.v5i12.1340