pISSN: 2723 - 6609 e-ISSN: 2745-5254
Vol. 5, No. 10, October 2024 http://jist.publikasiindonesia.id/
Indonesian Journal of Social Technology, Vol. 5, No. 10, October 2024 4593
Sentiment Analysis Towards the KitaLulus Application Using the Naive Bayes Method from Google Play Store Reviews
Nadia Amalia Putri1*, Agustina Srirahayu2, Nugroho Arif Sudibyo3
Universitas Duta Bangsa, Indonesia
*Correspondence
ABSTRACT
Keywords: sentiment analysis, naive Bayes, kitalulus.
Job search apps such as KitaLulus help graduates find jobs that match
their skills and interests. Sentiment analysis is needed to understand
user opinions about the KitaLulus application. The Naive Bayes method is
used in this analysis because of its efficiency and accuracy. This
research used 597 reviews and achieved an accuracy of 91%. For positive
sentiment, the precision, recall, and f1-score are 0.99, 0.94, and 0.97
respectively. Model performance is lower for negative and neutral
sentiment: the "neutral" category has an f1-score of 0.46 despite a
relatively high recall of 0.75, and the "negative" category has
precision, recall, and f1-score of 0.56 each. This research aims to
improve understanding of how users perceive the KitaLulus application
and to help developers improve the quality of the application. In
summary, most reviews are positive, which suggests that the KitaLulus
app has a beneficial impact on the community, and the model identifies
these positive reviews effectively.
Introduction
In an era of continually advancing digital technology, mobile applications have
become an important part of everyday life. Among the increasingly popular
categories are applications dedicated to job searching and CV creation. Given the
growing competitiveness of the job market, such apps provide a new and effective
avenue for job seekers to identify openings that align with their abilities and
preferences.
Unfortunately, many recent graduates struggle to identify relevant job
opportunities, and most graduates end up switching career paths, prioritizing
securing a job over finding one that matches their skill set. As a result, the skills
acquired during their education remain underutilized and fail to reach their full
potential. This job vacancy problem poses significant challenges for individuals from
various backgrounds (Suprastiyo & Airlangga, 2021). It is important to find effective
ways to support new graduates in securing job opportunities that align with their
skills and passions. Therefore, there is an urgent need for proactive measures to
address the mismatch between job seekers and available positions in the workforce.
The KitaLulus app is designed to help job seekers identify opportunities that
align with their skills and interests. In addition to job search, the app allows users
to create and customize their resumes online. Because the app serves as an important
intermediary between job seekers and potential employers, it is essential to
understand user feedback and engagement with it. Understanding user responses and
interactions can greatly contribute to improving the effectiveness and user
experience of such platforms.
In this context, sentiment analysis is an important tool for understanding the
perspectives, emotions, and experiences of users of the KitaLulus app. Sentiment
analysis is a technique for processing text data to extract opinion information. It is
used to detect opinions about a topic or an object (such as an individual,
organization, or product) in a data set (Hasri & Alita, 2022). Accurate sentiment
analysis can provide in-depth insights into user satisfaction, allowing developers to
take appropriate actions to improve the quality and competitiveness of their
applications, as well as respond to dynamic market demands (Gumilar et al., 2024).
The Naive Bayes method applied to sentiment analysis classifies an app review
into a sentiment category such as positive, negative, or neutral. Naive Bayes is a
simple statistical classification method based on Bayes' theorem, which assumes that
the presence or absence of a feature in a class is unrelated to the other features
(Azhar et al., 2022). It is widely used for sentiment analysis. The Naive Bayes
classifier estimates the probability of class membership under the assumption that
the predictors are independent, so the probability of a future outcome can be
estimated from previously observed data. The advantage of the Naive Bayes method is
its efficiency, which shortens the sentiment analysis process. In addition, the Naive
Bayes method often achieves high accuracy even with little training data (Tanggraeni
& Sitokdana, 2022).
The Naive Bayes method is known for achieving high accuracy. In a 2020 study by
Fikri et al. entitled "Comparison of Naive Bayes Method and Support Vector Machine
in Twitter Sentiment Analysis", both methods performed well, but the Naive Bayes
method outperformed the SVM
method in terms of accuracy, precision, recall, and f1-score (Fikri et al., 2020). Another
study, by Pattiiha and Hendry in 2022, compared the K-Nearest Neighbor, Naive
Bayes, and Decision Tree methods for analyzing sentiment in tweets expressing
opinions about PT PAL Indonesia. The results show that the Naive Bayes method had
the highest accuracy of the three methods, 84.08% with a standard deviation of
+/- 1.28% (Pattiiha & Hendry, 2022).
Given that previous studies confirm the good performance of the Naive Bayes
method in sentiment analysis, this study applies it to KitaLulus reviews. The study is
expected not only to provide deeper insight into user satisfaction with the KitaLulus
application but also to provide additional input for developers to improve the
quality of the application.
Method
The research methodology is designed to ensure accuracy and consistency in data
collection and analysis.
Data Collection
Data collection is the first stage of this research: KitaLulus user reviews are
obtained from the Google Play Store. The data were collected by web scraping, which
allowed the researchers to download user reviews from the app's Google Play Store
page automatically. Using the Python library google-play-scraper, the review data
are automatically extracted and compiled into a CSV file (Firmansyah &
Lestariningsih, 2024).
Dataset Labeling
Data labeling is the process of assigning a sentiment class to each user review of
the KitaLulus application. In this study, reviews were labeled according to their
rating: ratings of 1 and 2 are labeled "negative", a rating of 3 is labeled "neutral",
and all other ratings are labeled "positive". Accurate and consistent labeling is
important to ensure that the data are organized and can be used as intended (Wijaya
et al., 2024).
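The labeling rule above can be sketched as a small function. This is a minimal illustration, assuming integer ratings from 1 to 5; the function name `label_sentiment` and the sample reviews are hypothetical and not taken from the study.

```python
def label_sentiment(score: int) -> str:
    """Map a star rating to a sentiment label using the rule described in the text."""
    if score in (1, 2):
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"  # "the other scores", assumed here to be 4 and 5


# Hypothetical (review text, rating) pairs, for illustration only.
sample_reviews = [
    ("aplikasi sangat membantu", 5),
    ("biasa saja", 3),
    ("sering error", 1),
]

labels = [label_sentiment(score) for _, score in sample_reviews]
print(labels)  # ['positive', 'neutral', 'negative']
```

Because the label is derived from the rating alone, a review whose text disagrees with its rating still receives the rating's label.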
Preprocessing
Data preprocessing is an important step in sentiment analysis, in which review
texts are prepared for use in building the Naive Bayes model. Preprocessing is carried
out first so that the researchers avoid problems caused by irregular data. Its purpose
is a classification with a high level of accuracy (Anwar, 2022). The preprocessing
stages in this study are word normalization, case folding, cleansing, tokenizing,
stopword removal, and stemming. Word normalization converts words that deviate from
standard language norms into words that comply with linguistic conventions (Yutika et
al., 2021). Case folding converts all letters to a single case, either lowercase or
uppercase, so that variants of the same word are treated identically. Cleansing is the
process of removing unnecessary elements to reduce noise in the data. Tokenizing
separates text into segments delimited by spaces or special characters. Stopword
removal discards frequent, generic words that contribute little to the analysis.
Stemming is the process of returning words containing suffixes to their base form
(Mastan & Toni, 2020).
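The six stages can be chained as in the following sketch, which uses only the Python standard library. The normalization dictionary, stopword list, and stem lookup are tiny hypothetical stand-ins; the study does not list its actual resources, and a real Indonesian stemmer would replace the lookup table.

```python
import re

# Toy resources, assumed for illustration only.
NORMALIZATION = {"yg": "yang", "tp": "tapi"}
STOPWORDS = {"yang", "dan", "di", "tapi"}
STEMS = {"membantu": "bantu", "lowongannya": "lowongan"}  # stand-in for a stemmer


def preprocess(text: str) -> list[str]:
    # 1. Word normalization: replace non-standard words with standard forms.
    text = re.sub(
        r"\b\w+\b",
        lambda m: NORMALIZATION.get(m.group(0).lower(), m.group(0)),
        text,
    )
    # 2. Case folding: convert everything to lowercase.
    text = text.lower()
    # 3. Cleansing: drop URLs, HTML tags, then anything that is not a letter.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    # 4. Tokenizing: split on whitespace.
    tokens = text.split()
    # 5. Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # 6. Stemming (here a simple lookup).
    return [STEMS.get(t, t) for t in tokens]


example = "Aplikasi yg Membantu, tp lowongannya sedikit! https://example.com"
print(preprocess(example))  # ['aplikasi', 'bantu', 'lowongan', 'sedikit']
```

Duplicate and blank-review removal operate on the whole data set rather than on a single text, so they are left out of this per-review function.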
TF-IDF Weighting
TF-IDF (Term Frequency-Inverse Document Frequency) is a technique used to
convert text into a numerical representation that can be processed by the Naive Bayes
model. It assigns each word or token a weight that reflects its importance in the
document. The weight is calculated from the number of times a word appears in a
document, scaled by how common the word is across all documents (Rozi et al., 2021).
The TF-IDF equations are:
$$\text{TF-IDF}(t, d) = \text{TF}(t, d) \times \text{IDF}(t) \tag{1}$$
where:
$$\text{TF}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \tag{2}$$
$$\text{IDF}(t) = \log \frac{N}{\text{df}(t)} \tag{3}$$
and:
t = word
d = document
f_{t,d} = number of times t appears in d
N = total number of documents
df(t) = number of documents that contain t
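A small worked example may make the weighting concrete. The sketch below assumes the common definitions in which TF is the relative frequency of a term in a document and IDF is the natural logarithm of the number of documents divided by the number of documents containing the term. The three toy documents are hypothetical, and library implementations often use smoothed or normalized variants, so their numbers can differ.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} dictionary per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical tokenized reviews.
docs = [
    ["aplikasi", "bagus", "bagus"],
    ["aplikasi", "error"],
    ["aplikasi", "bagus"],
]
w = tf_idf(docs)
for i, doc_weights in enumerate(w):
    print(i, {t: round(v, 3) for t, v in doc_weights.items()})
```

In this toy corpus "aplikasi" appears in every document, so its IDF is log(3/3) = 0 and it receives zero weight, while "error", which appears in only one document, receives the largest weight.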
Model Creation
After TF-IDF weighting, the next stage is applying the Naive Bayes method to
classify the sentiment of user reviews. Model creation is an important stage in
sentiment analysis, in which the Naive Bayes model is trained to classify the
sentiment of KitaLulus app user reviews. The implementation is carried out in the
Python programming language. The Naive Bayes classifier is a classification technique
that uses probability as the basis for predicting the class of a data item. The method
is based on Bayes' theorem and is considered very effective for classification (Muflih
et al., 2023). The general equation of the Naive Bayes classification algorithm is
given in (4).
$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)} \tag{4}$$
where:
H: the hypothesis that the data belong to a specific class.
X: data whose class is not yet known.
P(H): probability of H (prior).
P(X): probability of X.
P(H|X): probability of hypothesis H given the observed data X (posterior).
P(X|H): probability of the data X given hypothesis H (likelihood).
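To illustrate how Bayes' theorem becomes a text classifier, the following is a minimal from-scratch sketch. It assumes a multinomial model over raw token counts with add-one (Laplace) smoothing; the source does not state which Naive Bayes variant or smoothing was used, and the study trains on TF-IDF weights rather than counts. The class name `TinyNaiveBayes` and the training reviews are hypothetical.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Multinomial Naive Bayes on token counts with add-one smoothing (a sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        n = len(labels)
        # log P(H): prior probability of each class.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        return self

    def log_posteriors(self, doc):
        """Return log P(H) + sum of log P(x_i | H); P(X) is the same for every
        class, so it can be dropped when comparing classes."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            score = self.log_prior[c]
            for t in doc:
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[c][t] + 1) / (total + v))
            scores[c] = score
        return scores

    def predict(self, doc):
        scores = self.log_posteriors(doc)
        return max(scores, key=scores.get)


# Hypothetical tokenized training reviews.
train_docs = [["bagus", "membantu"], ["bagus", "mudah"], ["error", "lambat"], ["biasa"]]
train_labels = ["positive", "positive", "negative", "neutral"]

model = TinyNaiveBayes().fit(train_docs, train_labels)
print(model.predict(["bagus"]))            # positive
print(model.predict(["error", "lambat"]))  # negative
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which is why the products in equation (4) appear here as sums.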
Evaluation and Visualization
The final step is evaluation, which measures the accuracy, recall, precision, and
f1-score of the model. In the visualization stage, a confusion matrix gives a more
detailed picture of the model's performance in classifying each label. To provide a
clearer visual representation of the prediction distribution, a pie chart shows the
percentage of each label.
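The four metrics can be derived from the confusion matrix as sketched below. The label lists are hypothetical toy data, not the study's predictions, and the helper name `classification_metrics` is an assumption made for illustration.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Compute accuracy and per-class precision, recall, and f1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    # confusion[(actual, predicted)] = number of samples in that cell.
    confusion = Counter(zip(y_true, y_pred))
    report = {}
    for label in labels:
        tp = confusion[(label, label)]
        predicted = sum(confusion[(t, label)] for t in labels)  # column total
        actual = sum(confusion[(label, p)] for p in labels)     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": actual,
        }
    accuracy = sum(confusion[(l, l)] for l in labels) / len(y_true)
    return accuracy, report


# Hypothetical true and predicted labels.
y_true = ["pos", "pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "pos", "neg", "neg", "pos", "neu"]

accuracy, report = classification_metrics(y_true, y_pred)
print(round(accuracy, 2))
for label, row in report.items():
    print(label, {k: round(v, 2) for k, v in row.items()})
```

Precision asks how many predictions of a class were right, recall asks how many actual members of a class were found, and the f1-score is their harmonic mean.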
Results and Discussion
This research uses Google Colab to download reviews of the KitaLulus application
from the Google Play Store and save them in CSV form. The researchers collected 597
reviews considered relevant, each consisting of a comment and a review score. The
data obtained are then saved to a .csv file. Figure 2 shows the data scraping
process.
Figure 2. Scraping Data
Figure 2 shows the program that collects reviews of the KitaLulus application
from the Google Play Store using google_play_scraper, converts them to a DataFrame
with pandas, and saves them to a CSV file named 'kitalulus_reviews.csv'.
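Since the program itself appears only as a figure, the following sketch shows what such a collection step might look like. It is not the authors' code: the application ID is a placeholder, the `lang`, `country`, and `sort` settings are assumptions, and the standard `csv` module stands in for pandas so that the saving step has no third-party dependency. Fetching requires the google-play-scraper package and network access.

```python
import csv


def fetch_reviews(app_id: str, count: int = 100) -> list[dict]:
    """Fetch reviews with google-play-scraper and keep the comment and score."""
    # Third-party import kept inside the function so the rest of the sketch
    # runs without the package installed.
    from google_play_scraper import Sort, reviews

    result, _continuation_token = reviews(
        app_id,
        lang="id",         # assumed: Indonesian-language reviews
        country="id",      # assumed: Indonesian store
        sort=Sort.NEWEST,  # assumed ordering
        count=count,
    )
    return [{"content": r["content"], "score": r["score"]} for r in result]


def save_reviews(rows: list[dict], path: str) -> None:
    """Write review rows to a CSV file with 'content' and 'score' columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["content", "score"])
        writer.writeheader()
        writer.writerows(rows)


# Example usage (placeholder app ID; replace with the real package name):
# rows = fetch_reviews("com.example.app", count=597)
# save_reviews(rows, "kitalulus_reviews.csv")
```

Keeping only the comment text and the score mirrors the two fields the study reports using.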
Dataset Labeling
The data labeling phase classifies each user review of the KitaLulus app as
positive, negative, or neutral; these labels are used to train the Naive Bayes model
to predict sentiment.
Figure 3. Data Labeling
Each review is assigned a sentiment label based on its score: "negative" for
scores 1 and 2, "neutral" for score 3, and "positive" for all other scores.
Preprocessing
The first step in preprocessing is the normalization of the text. In this step, the
text in the data frame is normalized to standardize abbreviations, for example
changing "yg" to "yang" ("which"), "tp" to "tapi" ("but"), and similar abbreviations of
common words such as "with". In addition, informal variants of words meaning
"already" and "no" are mapped to a single standard form because they have the same
meaning (Nehe et al., 2024).
Figure 4. Text Normalization
After normalizing the text, the next action is case folding, where all letters in the
user's review text are changed to lowercase, and then the cleansing stage is carried out.
Figure 5. Case Folding
Text cleansing is carried out using a variety of techniques, including removal of
duplicate data, blank text, characters and punctuation, Uniform Resource Locators
(URLs), and HyperText Markup Language (HTML), as well as the removal of
unrelated symbols and references (Ramadhani & Suryono, 2024).
Model Creation
The next stage is model creation, in which a machine-learning model is built to
analyze the sentiment of user reviews. At this stage, the Naive Bayes model is used for
text classification. The model is trained on the preprocessed, TF-IDF-weighted data.
Figure 6. Model Creation Process
In Figure 6, the model is created by dividing the data into training data and test
data using an 80:20 ratio. The SMOTE method is then applied to the training data to
deal with class imbalance. SMOTE is an effective technique for addressing imbalance
in data sets, ensuring a more even distribution between classes (Utami, 2022).
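The two steps can be illustrated with a simplified standard-library sketch. The split function rounds the test size up, which is an assumption about the procedure; with 597 items it yields 120 test items, consistent with the total support reported in Table 1. The `smote_like` function shows only the core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbours, and is not a substitute for a full implementation. All names and the toy feature vectors are hypothetical.

```python
import math
import random


def train_test_split_80_20(items, seed=42):
    """Shuffle and split into 80% training and 20% test (test size rounded up)."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = math.ceil(0.2 * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def smote_like(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


train, test = train_test_split_80_20(range(597))
print(len(train), len(test))  # 477 120

# Hypothetical two-dimensional minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5
```

Oversampling only the training portion keeps the test set untouched, so the reported metrics reflect the original class distribution.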
The evaluation at this stage uses a confusion matrix and a pie chart for
visualization. The Naive Bayes model is evaluated in terms of accuracy, recall,
precision, and f1-score.
Table 1. Evaluation Results
               Precision  Recall  F1-score  Support
Positive       0.99       0.94    0.97      107
Neutral        0.33       0.75    0.46      4
Negative       0.56       0.56    0.56      9
Accuracy       -          -       0.91      120
Macro avg      0.63       0.75    0.66      120
Weighted avg   0.94       0.91    0.92      120
Figure 8. Pie Chart
The classification results show that the Naive Bayes model achieves an accuracy
of 91%. Looking at each category, the model excels at classifying the positive class,
with a precision of 0.99, recall of 0.94, and f1-score of 0.97. However, its
performance is considerably lower on the negative and neutral classes. The negative
class has precision, recall, and f1-score of 0.56 each, while the neutral class has a
precision of 0.33, recall of 0.75, and f1-score of 0.46. The macro average is 0.63 for
precision, 0.75 for recall, and 0.66 for f1-score. The weighted average, by contrast,
remains high (precision 0.94, recall 0.91, f1-score 0.92), mainly because the positive
class dominates the sample. The confusion matrix shows that the model is very good at
classifying positive reviews, with 101 correct predictions, but less accurate at
classifying neutral and negative reviews.
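The gap between the macro and weighted averages follows directly from the class sizes, and can be checked by recomputing both from the per-class values in Table 1. The sketch below uses only the figures reported there; the dictionary layout and function names are illustrative.

```python
# Per-class values as reported in Table 1.
table = {
    "positive": {"precision": 0.99, "recall": 0.94, "f1": 0.97, "support": 107},
    "neutral": {"precision": 0.33, "recall": 0.75, "f1": 0.46, "support": 4},
    "negative": {"precision": 0.56, "recall": 0.56, "f1": 0.56, "support": 9},
}


def macro_avg(metric: str) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(row[metric] for row in table.values()) / len(table)


def weighted_avg(metric: str) -> float:
    """Mean weighted by support: large classes dominate."""
    total = sum(row["support"] for row in table.values())
    return sum(row[metric] * row["support"] for row in table.values()) / total


for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_avg(metric), 2), round(weighted_avg(metric), 2))
# precision 0.63 0.94
# recall 0.75 0.91
# f1 0.66 0.92

# Recall times support approximates the number of correct positive predictions.
print(round(table["positive"]["recall"] * table["positive"]["support"]))  # 101
```

The positive class holds 107 of the 120 test reviews, so the weighted average is pulled close to the positive-class scores, while the macro average exposes the weaker neutral and negative results.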
In this study, the Naive Bayes method was applied to analyze the sentiment of
Google Play Store reviews obtained by scraping. TF-IDF is used to convert the review
text into numerical representations. The data are randomly divided into training and
test sets to assess the model's performance, and the SMOTE method is applied to the
training set to increase the number of instances in underrepresented categories while
leaving the majority class unchanged. The classification analysis shows that the
model achieves a fairly high overall accuracy.
Conclusion
The findings of this study show that the Naive Bayes classification model
performs effectively in evaluating user sentiment towards the KitaLulus application
based on the latest review data. A total of 597 reviews were collected during the data
collection phase, each consisting of a comment and a corresponding rating. Each
review is categorized as "negative" for ratings 1 and 2, "neutral" for rating 3, and
"positive" for the remaining rating values. The TF-IDF weighting technique is then
used to convert the review text into a numerical format. To overcome the class
imbalance, the SMOTE method was applied to the training data. The classification
results show a model accuracy of 91%. The "positive" sentiment category shows
excellent performance, with a precision of 0.99, a recall of 0.94, and an f1-score of
0.97. In contrast, the "neutral" category has a lower f1-score of 0.46, despite a
relatively high recall of 0.75, and the "negative" category has precision, recall, and
f1-score of 0.56 each. In summary, most reviews are positive, which suggests that the
KitaLulus app has had a beneficial impact on the community, and the model identifies
these positive reviews effectively.
Bibliography
Anwar, K. (2022). Analisa sentimen Pengguna Instagram Di Indonesia Pada Review
Smartphone Menggunakan Naive Bayes. KLIK: Kajian Ilmiah Informatika Dan
Komputer, 2(4), 148–155.
Azhar, R., Surahman, A., & Juliane, C. (2022). Analisis Sentimen Terhadap
Cryptocurrency Berbasis Python TextBlob Menggunakan Algoritma Naïve Bayes.
J-SAKTI (Jurnal Sains Komputer Dan Informatika), 6(1), 267–281.
Fikri, M. I., Sabrila, T. S., & Azhar, Y. (2020). Perbandingan metode naïve bayes dan
support vector machine pada analisis sentimen twitter. SMATIKA Jurnal: STIKI
Informatika Jurnal, 10(02), 71–76.
Firmansyah, D. R., & Lestariningsih, E. (2024). Analisis Sentimen Ulasan Aplikasi Smart
Campus Unisbank di Google Playstore Menggunakan Algoritma Naive Bayes.
Jurnal JTIK (Jurnal Teknologi Informasi Dan Komunikasi), 8(2), 498–507.
Gumilar, T. S., Astuti, R., & Wijaya, Y. A. (2024). Analisis Sentimen Ulasan Aplikasi
Lita Di Play Store Menggunakan Algoritma Naive Bayes. Jati (Jurnal Mahasiswa
Teknik Informatika), 8(1), 543–550.
Hasri, C. F., & Alita, D. (2022). Penerapan Metode Naïve Bayes Classifier Dan Support
Vector Machine Pada Analisis Sentimen Terhadap Dampak Virus Corona Di
Twitter. Jurnal Informatika Dan Rekayasa Perangkat Lunak, 3(2), 145–160.
Mastan, I. A., & Toni, Y. (2020). Analisis Sentimen Terhadap Tempat Kuliner Ayam
Gedebuk Dari Komentar Pengunjung Dengan Menggunakan Metode Naïve Bayes
Classifier. JBASE-Journal of Business and Audit Information Systems, 3(1).
Muflih, H. Z., Abdillah, A. R., & Hasan, F. N. (2023). Analisis Sentimen Ulasan
Pengguna Aplikasi Ajaib Menggunakan Metode Naïve Bayes. KLIK: Kajian Ilmiah
Informatika Dan Komputer, 4(3), 1613–1621.
Nehe, P. H., Berutu, S. S., & Budiati, H. (2024). Analisis Sentimen Opini Masyarakat
Terhadap Presiden Jokowi Sebelum Dan Sesudah Pilpres 2024 Menggunakan
Metode Naive Bayes Classification. Jutisi: Jurnal Ilmiah Teknik Informatika Dan
Sistem Informasi, 13(1), 451–463.
Pattiiha, F. S., & Hendry, H. (2022). Perbandingan Metode K-NN, Naïve Bayes, Decision
Tree untuk Analisis Sentimen Tweet Twitter Terkait Opini Terhadap PT PAL
Indonesia. JURIKOM (Jurnal Riset Komputer), 9(2), 506–514.
Ramadhani, B., & Suryono, R. R. (2024). Komparasi Algoritma Naïve Bayes dan
Logistic Regression Untuk Analisis Sentimen Metaverse. Jurnal Media
Informatika Budidarma, 8(2), 714–725.
Rozi, F., Sukmana, F., & Adani, M. N. (2021). Pengelompokkan Judul Buku dengan
Menggunakan Algoritma K-Nearest Neighbor (K-NN) dan Term Frequency
Inverse Document Frequency (TF-IDF). JIMP-Jurnal Informatika Merdeka
Pasuruan, 6(3).
Suprastiyo, W. T., & Airlangga, P. (2021). Penyebaran Informasi Lowongan Pekerjaan
Berbasis Website (Studi Kasus Mahasiswa Dan Alumni Unwaha). Jurnal Tekinkom
(Teknik Informasi Dan Komputer), 4(2), 280–287.
Tanggraeni, A. I., & Sitokdana, M. N. N. (2022). Analisis Sentimen Aplikasi E-
Government pada Google Play Menggunakan Algoritma Naïve Bayes. JATISI
(Jurnal Teknik Informatika Dan Sistem Informasi), 9(2), 785–795.
Utami, H. (2022). Analisis Sentimen dari Aplikasi Shopee Indonesia Menggunakan
Metode Recurrent Neural Network. Indonesian Journal of Applied Statistics, 5(1),
31–38.
Wijaya, D., Saputra, R. A., & Irwiensyah, F. (2024). Analisis Sentimen Ulasan Aplikasi
Samsat Digital Nasional Pada Google Playstore Menggunakan Algoritma Naïve
Bayes. KLIK: Kajian Ilmiah Informatika Dan Komputer, 4(4), 2369–2380.
Yutika, C. H., Adiwijaya, A., & Al Faraby, S. (2021). Analisis Sentimen Berbasis Aspek
pada Review Female Daily Menggunakan TF-IDF dan Naïve Bayes. Jurnal Media
Informatika Budidarma, 5(2), 422–430.