p-ISSN: 2723-6609 e-ISSN: 2745-5254

Vol. 5, No. 10, October 2024 http://jist.publikasiindonesia.id/

Indonesian Journal of Social Technology, Vol. 5, No. 11, November 2024 4593

Sentiment Analysis Towards the KitaLulus Application Using the Naive Bayes Method from Google Play Store Reviews

Nadia Amalia Putri, Agustina Srirahayu, Nugroho Arif Sudibyo

Universitas Duta Bangsa, Indonesia

Email: [email protected], [email protected]

*Correspondence

ABSTRACT

Keywords: sentiment analysis, naive Bayes, KitaLulus.

Job search apps like KitaLulus help graduates find jobs that match their skills and interests. Sentiment analysis is needed to understand user opinions about the KitaLulus application. The Naive Bayes method is used in this analysis because of its efficiency and accuracy. This research used 597 reviews and achieved an accuracy of 91%. The "positive" sentiment category performs best, with a precision of 0.99, a recall of 0.94, and an f1-score of 0.97. In contrast, model performance is low for the other two categories: the "neutral" category has an f1-score of only 0.46 despite a relatively high recall of 0.75, and the "negative" category has a precision, recall, and f1-score of 0.56 each. This research aims to improve understanding of user opinion about the KitaLulus application and to help developers improve the quality of the application. In summary, the majority of reviews are positive, which suggests that the KitaLulus app has a beneficial impact on the community, and the model identifies these positive reviews effectively.

Introduction

In an era of continually advancing digital technology, mobile applications have become an important part of everyday life. Among the increasingly popular categories are applications dedicated to job search and CV creation. Given the growing competitiveness of the job market, such apps give job seekers a new and effective way to identify openings that match their abilities and preferences.

Unfortunately, many recent graduates struggle to identify relevant job opportunities. As a result, most graduates switch career paths, prioritizing securing any job over finding one that matches their skill set, so the skills acquired during their education remain underutilized. This job vacancy problem poses significant challenges for individuals from various backgrounds (Suprastiyo & Airlangga, 2021). Effective ways are therefore urgently needed to support new graduates in securing jobs that align with their skills and passions and to address the mismatch between job seekers and available positions.

The KitaLulus app is designed to help job seekers identify opportunities that align with their skills and interests. In addition to supporting the exploration of job openings, the app allows users to create and customize their resumes online. Because the app serves as an important intermediary between job seekers and potential employers, it is essential to understand how users respond to and engage with it. Understanding user responses and interactions can contribute greatly to improving the effectiveness and user experience of the platform.

In this context, sentiment analysis is an important tool for understanding the perspectives, emotions, and experiences of users of the KitaLulus app. Sentiment analysis is a technique for processing text datasets to extract information; it is used to detect opinions about a topic or an object (such as an individual, organization, or product) in a data set (Hasri & Alita, 2022). Accurate sentiment analysis can provide in-depth insights into user satisfaction, allowing developers to take appropriate actions to improve the quality and competitiveness of their applications, as well as to respond to dynamic market demands (Gumilar et al., 2024).

Applied to sentiment analysis, the Naive Bayes method classifies an app review into a sentiment category such as positive, negative, or neutral. Naive Bayes is a simple statistical classification method based on Bayes' theorem, which assumes that the presence or absence of one feature in a class is unrelated to that of any other feature (Azhar et al., 2022). It is widely used for sentiment analysis. The classifier estimates the probability of membership in a class under the assumption that the predictors are independent, so the probability of a future outcome can be estimated from previously observed data. The advantage of the Naive Bayes method is that it is efficient, which shortens the sentiment analysis process. In addition, it often achieves high accuracy even with little training data (Tanggraeni & Sitokdana, 2022).

The Naive Bayes method is well known for achieving high accuracy. In their study "Comparison of Naive Bayes Method and Support Vector Machine in Twitter Sentiment Analysis", Fikri et al. found that both methods perform well, but that Naive Bayes outperforms the SVM method in terms of accuracy, precision, recall, and f1-score (Fikri et al., 2020). Another study, by Pattiiha and Hendry, compared the K-Nearest Neighbor, Naive Bayes, and Decision Tree methods for analyzing the sentiment of tweets expressing opinions about PT PAL Indonesia. Naive Bayes had the highest accuracy of the three methods, at 84.08% with a standard deviation of +/- 1.28% (Pattiiha & Hendry, 2022).

Given that previous studies confirm the good performance of the Naive Bayes method in sentiment analysis, this study applies it to reviews of the KitaLulus application. The study is expected not only to provide deeper insight into user satisfaction with the KitaLulus application but also to provide additional input for developers to improve the quality of the application.

Method

The research methodology is designed to ensure accuracy and consistency in data collection and analysis.

Data Collection

Data collection is the first stage of this research: KitaLulus user review data is obtained from the Google Play Store. The data was collected by web scraping, which allowed the researchers to download user reviews automatically from the app's Google Play Store page. Using the Python library google-play-scraper, the review data is automatically extracted and compiled into CSV form (Firmansyah & Lestariningsih, 2024).

Dataset Labeling

Data labeling is the process of assigning a sentiment label to each user review of the KitaLulus application. In this study, reviews were labeled according to their star rating: reviews with ratings of 1 and 2 are labeled "negative", those with a rating of 3 "neutral", and those with any other rating "positive". Accurate and consistent labeling is important to ensure that the data is well organized and can be used as intended (Wijaya et al., 2024).
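The labeling rule can be illustrated with a minimal Python sketch. The function name `label_from_score` and the sample review dictionaries are assumptions made for illustration; only the mapping from ratings to labels comes from the text.

```python
def label_from_score(score: int) -> str:
    """Map a 1-5 star rating to a sentiment label using the rule in the text."""
    if score in (1, 2):
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"  # the remaining ratings (4 and 5)


# Hypothetical sample rows; real rows would come from the scraped CSV file.
reviews = [
    {"content": "sample review A", "score": 1},
    {"content": "sample review B", "score": 3},
    {"content": "sample review C", "score": 5},
]
labels = [label_from_score(r["score"]) for r in reviews]
print(labels)
```

Because the label is derived from the rating rather than from the text, a review whose wording disagrees with its star rating will still receive the rating-based label.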

Preprocessing

Data preprocessing is an important step in sentiment analysis, in which the review texts are prepared for use in building the Naive Bayes model. Preprocessing is carried out first so that the researchers avoid the problems caused by irregular data; the goal is a classification with a high level of accuracy (Anwar, 2022). The preprocessing stages in this study are word normalization, case folding, cleansing, tokenizing, stopword removal, and stemming. Word normalization converts words that deviate from standard language norms into their standard forms (Yutika et al., 2021). Case folding converts all letters to a single case, either lowercase or uppercase, so that words differing only in capitalization are treated as the same word. Cleansing removes unnecessary characters and words to reduce noise in the data. Tokenizing splits the text into segments (tokens) separated by spaces or special characters. Stopword removal discards frequent, generic words that contribute little to the analysis. Stemming returns words containing suffixes to their base form (Mastan & Toni, 2020).
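The last three stages (tokenizing, stopword removal, and stemming) can be sketched as follows. This is only an illustration: the stopword set and the suffix list are tiny hypothetical examples, and the suffix-stripping rule is a toy stand-in for the dictionary-based Indonesian stemmer that a real pipeline would normally use.

```python
# Hypothetical, deliberately tiny resources for illustration only.
STOPWORDS = {"yang", "dan", "di", "ini"}
SUFFIXES = ("nya", "kan", "an")


def tokenize(text: str) -> list[str]:
    """Split a cleaned, lowercased text into tokens on whitespace."""
    return text.split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop frequent, generic words."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token: str) -> str:
    """Toy stemmer: strip one known suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    return [stem(t) for t in remove_stopwords(tokenize(text))]


print(preprocess("aplikasi ini bagus dan fiturnya membantu"))
```

A naive suffix rule like this one over-stems some words, which is why it should be read as a sketch of the stage rather than a usable stemmer.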

TF-IDF Weighting

TF-IDF (Term Frequency-Inverse Document Frequency) is a technique used to convert text into a numerical representation that can be processed by the Naive Bayes model. It assigns a weight to each word or token based on its importance in a document. The weight is calculated from the number of times the word appears in the document and from how many documents in the whole collection contain it (Rozi et al., 2021). The TF-IDF equation is as follows:

$$\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t) \qquad (1)$$

where:

$$\mathrm{TF}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \qquad (2)$$

$$\mathrm{IDF}(t) = \log \frac{N}{\mathrm{df}(t)} \qquad (3)$$

and:

t = word

d = document

f_{t,d} = number of times word t appears in document d

N = total number of documents

df(t) = number of documents that contain word t
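As a worked illustration, the weighting can be computed by hand in a few lines of Python. The sketch assumes the common definitions TF = (occurrences of the word in the document) / (number of tokens in the document) and IDF = log(N / number of documents containing the word); the exact variant used in the study, and the three sample documents, are assumptions.

```python
import math


def term_frequency(term, doc):
    """TF(t, d): occurrences of `term` divided by the number of tokens in `doc`."""
    return doc.count(term) / len(doc)


def inverse_document_frequency(term, docs):
    """IDF(t): log of (number of documents / documents containing `term`)."""
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)


def tfidf_matrix(docs):
    """Return the sorted vocabulary and one row of TF-IDF weights per document."""
    vocab = sorted({term for doc in docs for term in doc})
    weights = [
        [term_frequency(t, d) * inverse_document_frequency(t, docs) for t in vocab]
        for d in docs
    ]
    return vocab, weights


# Hypothetical tokenized reviews.
docs = [["aplikasi", "bagus"], ["aplikasi", "lambat"], ["bagus", "sekali", "bagus"]]
vocab, weights = tfidf_matrix(docs)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy corpus, "lambat" appears in only one of the three documents, so it receives a larger weight in its document than "aplikasi", which appears in two.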

Model Creation

After TF-IDF weighting, the next stage is to train a Naive Bayes model to classify the sentiment of KitaLulus app user reviews. The implementation is carried out in the Python programming language. The Naive Bayes classifier is a classification technique that uses probability as the basis for predicting the class of a data item. The method is based on Bayes' theorem and is considered very effective for classification (Muflih et al., 2023). The general equation of the Naive Bayes classification algorithm is given in (4).

$$P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)} \qquad (4)$$

where:

H: hypothesis that the data belongs to a specific class.

X: data whose class is not yet known.

P(H): probability of hypothesis H (prior probability).

P(X): probability of X.

P(H|X): probability of hypothesis H given X (posterior probability).

P(X|H): probability of X given hypothesis H.
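To make Bayes' theorem concrete for text, the following is a minimal sketch of a multinomial Naive Bayes classifier in plain Python. It is not the study's implementation: it works on raw token counts rather than TF-IDF weights, uses add-one (Laplace) smoothing, drops P(X) because it is the same for every class, and the training reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(docs, labels):
    """Count documents per class and token occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {t for tokens in docs for t in tokens}
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class H that maximizes log P(H) + sum of log P(x_i | H)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior, log P(H)
        total_words = sum(word_counts[label].values())
        for t in tokens:
            if t in vocab:  # tokens never seen in training are ignored
                # log likelihood with add-one smoothing
                score += math.log(
                    (word_counts[label][t] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical tokenized training reviews.
train_docs = [["bagus", "mantap"], ["bagus", "membantu"],
              ["lambat", "error"], ["error", "terus"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train(train_docs, train_labels)
print(predict(model, ["bagus"]))
print(predict(model, ["error", "lambat"]))
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow when a review contains many tokens.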

Evaluation and Visualization

The final step is evaluation, which measures the accuracy, recall, precision, and f1-score of the model. In the visualization stage, a confusion matrix gives a more detailed picture of the model's performance in classifying each label, and a pie chart shows the percentage of each label to give a clearer view of the prediction distribution.
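The relationship between the confusion matrix and the four metrics can be sketched in plain Python. The labels and predictions below are hypothetical toy data, not the study's results; rows of the matrix are true labels and columns are predicted labels.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where entry [i][j] counts true class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} computed from the matrix."""
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total (support)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


labels = ["positive", "neutral", "negative"]
y_true = ["positive", "positive", "positive", "negative", "negative", "neutral"]
y_pred = ["positive", "positive", "neutral", "negative", "positive", "neutral"]
matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_metrics(matrix, labels)
print(matrix, accuracy(matrix), report)
```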

Results and Discussion

This research collected data using Google Colab to download reviews of the KitaLulus application from the Google Play Store. The researchers obtained 597 reviews that were considered relevant, each consisting of a comment and a review score, and saved them to a .csv file. Figure 2 shows the data scraping process.

Figure 2. Scraping Data

Figure 2 shows the program that collects reviews of the KitaLulus application from the Google Play Store using google_play_scraper, converts them to a DataFrame with pandas, and saves them to a CSV file named 'kitalulus_reviews.csv'.
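The program itself is not reproduced in the text, so the following is only a sketch of what such a script might look like. It assumes the `reviews` function and `Sort` enum of the third-party google-play-scraper package, uses the standard `csv` module in place of pandas to stay dependency-free, and takes the app ID as a parameter because the ID is not given in the source. The language and country codes are also assumptions.

```python
import csv


def fetch_reviews(app_id: str, count: int = 600) -> list[dict]:
    """Download reviews with google-play-scraper (third-party, needs network)."""
    # Imported lazily so the rest of the sketch works without the package.
    from google_play_scraper import Sort, reviews

    result, _continuation_token = reviews(
        app_id,
        lang="id",       # assumed: Indonesian-language reviews
        country="id",    # assumed: Indonesian store
        sort=Sort.NEWEST,
        count=count,
    )
    return result


def save_reviews(rows: list[dict], path: str) -> None:
    """Write only the comment text and the star rating to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["content", "score"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)


# Usage (requires network access and the google-play-scraper package):
# save_reviews(fetch_reviews("<KitaLulus app id>"), "kitalulus_reviews.csv")
```

Keeping the download and the file writing in separate functions makes it possible to check the CSV step offline with a handful of hand-written rows.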

Dataset Labeling

The data labeling phase classifies each user review of the KitaLulus app as positive, negative, or neutral; these labels are then used to train the Naive Bayes model.

Figure 3. Data Labeling

Each review is assigned a sentiment label based on its score: "negative" for scores of 1 and 2, "neutral" for a score of 3, and "positive" for other scores.

Preprocessing

The first step in preprocessing is text normalization. In this step, the text in the data frame is normalized to standardize abbreviations, for example changing "yg" to "yang" (who/which), the abbreviated form of "with" to its full form, and "tp" to "tapi" (but), among others. In addition, variant spellings of the same word, such as the different informal forms of "already" and of "no", are mapped to a single form so that they are treated as having the same meaning (Nehe et al., 2024).

Figure 4. Text Normalization
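A dictionary-based normalization step of this kind might be sketched as below. The mapping is a hypothetical two-entry example: the abbreviations "yg" and "tp" are mentioned in the text, while their Indonesian expansions "yang" and "tapi" and the sample sentence are assumptions. The study's full dictionary is not reproduced here.

```python
import re

# Hypothetical mini-dictionary of abbreviations and their standard forms.
NORMALIZATION = {"yg": "yang", "tp": "tapi"}


def normalize(text: str, mapping: dict[str, str] = NORMALIZATION) -> str:
    """Replace every whole word found in `mapping` with its standard form."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Look the word up case-insensitively; keep it unchanged if unknown.
        return mapping.get(word.lower(), word)

    return re.sub(r"\w+", swap, text)


print(normalize("aplikasi yg bagus tp lambat"))
```

Matching whole words with a regular expression, rather than calling `str.replace`, prevents an abbreviation from being rewritten when it happens to occur inside a longer word.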

After normalizing the text, the next step is case folding, in which all letters in the user's review text are changed to lowercase, followed by the cleansing stage.

Figure 5. Case Folding

Text cleansing uses a variety of techniques, including removal of duplicate data, blank text, special characters and punctuation, Uniform Resource Locators (URLs), and HyperText Markup Language (HTML) tags, as well as unrelated symbols and references (Ramadhani & Suryono, 2024).
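Case folding and the cleansing operations listed above could be combined as in the following sketch. The regular expressions are assumptions, as is the choice to keep only the letters a-z (which also removes digits and emoji); the study's exact rules may differ.

```python
import html
import re


def case_fold(text: str) -> str:
    """Convert all letters to lowercase."""
    return text.lower()


def cleanse(text: str) -> str:
    """Remove URLs, HTML tags and entities, punctuation, and other symbols."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # URLs
    text = re.sub(r"<[^>]+>", " ", text)                # HTML tags
    text = html.unescape(text)                          # entities such as &amp;
    text = re.sub(r"[^a-z\s]", " ", text)               # keep lowercase letters only
    return re.sub(r"\s+", " ", text).strip()            # collapse whitespace


def clean_corpus(texts: list[str]) -> list[str]:
    """Fold and cleanse each text, then drop blank and duplicate results."""
    seen, result = set(), []
    for raw in texts:
        cleaned = cleanse(case_fold(raw))
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result


samples = ["Bagus!! <b>Mantap</b> http://x.co", "bagus mantap", "   ", "Lambat :("]
print(clean_corpus(samples))
```

Duplicates are detected after cleansing, so two reviews that differ only in capitalization, punctuation, or markup are treated as the same review.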

Model Creation

The next stage is model creation, in which a machine-learning model is built to analyze the sentiment of user reviews. At this stage, the Naive Bayes model is used for text classification and is trained on the preprocessed, TF-IDF-weighted data.

Figure 6. Model Making Process

As shown in Figure 6, the model is created by first dividing the data into training and test sets in an 80:20 ratio. The SMOTE method is then applied to the training data to deal with class imbalance. SMOTE is an effective technique for addressing imbalance in data sets, ensuring a more even distribution between classes (Utami, 2022).
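The two steps can be illustrated with a simplified, pure-Python sketch. It is not the study's code: the split helper, the value k = 2, the random seed, and the three sample vectors are assumptions, and the `smote` function implements only the core idea of SMOTE, namely creating a synthetic sample at a random point on the line between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of `rows` and split it; the test size is rounded up."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=2, seed=42):
    """Create `n_new` synthetic vectors from the minority-class vectors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        # k nearest minority neighbours of the chosen sample
        neighbours = sorted(others, key=lambda s: squared_distance(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


train_rows, test_rows = train_test_split(list(range(597)))
print(len(train_rows), len(test_rows))

minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical feature vectors
new_points = smote(minority, n_new=4)
print(new_points)
```

With 597 reviews, rounding the 20% test share up gives 120 test reviews, which matches the total support of 120 reported in the evaluation. Oversampling only the training portion keeps synthetic samples out of the test set, so the evaluation reflects real reviews only.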

The evaluation at this stage uses a confusion matrix and a pie chart for visualization. The Naive Bayes model is evaluated on accuracy, recall, precision, and f1-score.

Table 1. Evaluation Results

              Precision  Recall  F1-score  Support
Positive           0.99    0.94      0.97      107
Neutral            0.33    0.75      0.46
Negative           0.56    0.56      0.56
Accuracy                             0.91      120
Macro avg          0.63    0.75      0.66      120
Weighted avg       0.94    0.91      0.92      120

Figure 7. Confusion Matrix

Figure 8. Pie Chart

The classification results show that the Naive Bayes model achieves an accuracy of 91%. Looking at each category, the model excels at classifying the positive class, with a precision of 0.99, a recall of 0.94, and an f1-score of 0.97. Its performance is considerably lower on the negative and neutral classes: the negative class has a precision, recall, and f1-score of 0.56 each, while the neutral class has a precision of 0.33, a recall of 0.75, and an f1-score of 0.46. The macro average, which weights every class equally, is correspondingly modest (precision 0.63, recall 0.75, f1-score 0.66). The weighted average, in contrast, remains high (precision 0.94, recall 0.91, f1-score 0.92), mainly because the positive class dominates the test set with 107 of 120 samples. The confusion matrix shows that the model is very good at classifying positive reviews, with 101 correct predictions, but less accurate at classifying neutral and negative reviews.
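The two averages can be checked with a short calculation from the reported per-class values. The macro average uses only numbers stated in the results. The weighted average additionally needs the support of every class, and the supports of the neutral and negative classes are not stated in the source: the values 4 and 9 used below are an inference (they add up to 120 with the positive support of 107 and are consistent with the reported recalls), not reported data.

```python
# Per-class (precision, recall, f1-score) as reported in the results.
reported = {
    "positive": (0.99, 0.94, 0.97),
    "neutral": (0.33, 0.75, 0.46),
    "negative": (0.56, 0.56, 0.56),
}


def macro_average(metrics):
    """Unweighted mean of each metric over the classes."""
    n = len(metrics)
    return tuple(round(sum(v[i] for v in metrics.values()) / n, 2) for i in range(3))


def weighted_average(metrics, support):
    """Mean of each metric weighted by the number of samples per class."""
    total = sum(support.values())
    return tuple(
        round(sum(metrics[k][i] * support[k] for k in metrics) / total, 2)
        for i in range(3)
    )


# ASSUMPTION: only the positive support (107) and the total (120) are reported;
# 4 and 9 are inferred values used for illustration.
assumed_support = {"positive": 107, "neutral": 4, "negative": 9}

print(macro_average(reported))
print(weighted_average(reported, assumed_support))
```

The gap between the two averages shows how strongly the weighted figures are driven by the positive class, which makes the macro average the more informative summary of performance on the minority classes.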

In this study, the Naive Bayes method was applied to analyze the sentiment of Google Play Store reviews obtained by scraping. TF-IDF was used to convert the review text into numerical representations. The dataset was randomly split into training and testing sets, and the SMOTE method was applied to the training set to increase the number of instances in the underrepresented categories while leaving the majority class unchanged; the testing set was then used to assess the model's performance. The classification analysis shows that the model achieves a fairly high level of accuracy.

Conclusion

The findings of the study show that the Naive Bayes classification model performs effectively in evaluating user sentiment towards the KitaLulus application based on the latest review data. A total of 597 reviews were collected during the data collection phase, each consisting of a review comment and its rating. Each review is labeled "negative" for ratings of 1 and 2, "neutral" for a rating of 3, and "positive" for the remaining rating values. The TF-IDF weighting technique is then used to convert the review text into a numerical format. To overcome the class imbalance, the SMOTE method was applied to the training data. The classification results revealed a model accuracy of 91%. The "positive" sentiment category shows the strongest performance, with a precision of 0.99, a recall of 0.94, and an f1-score of 0.97. In contrast, the "neutral" category has a lower f1-score of 0.46, despite a relatively high recall of 0.75, and the "negative" category has a precision, recall, and f1-score of 0.56 each. In summary, the majority of reviews are positive, which suggests that the KitaLulus app has had a beneficial impact on the community, and the model identifies these positive reviews effectively.

Bibliography

Anwar, K. (2022). Analisa sentimen Pengguna Instagram Di Indonesia Pada Review

Smartphone Menggunakan Naive Bayes. KLIK: Kajian Ilmiah Informatika Dan

Komputer, 2(4), 148–155.

Azhar, R., Surahman, A., & Juliane, C. (2022). Analisis Sentimen Terhadap

Cryptocurrency Berbasis Python TextBlob Menggunakan Algoritma Naïve Bayes.

J-SAKTI (Jurnal Sains Komputer Dan Informatika), 6(1), 267–281.

Fikri, M. I., Sabrila, T. S., & Azhar, Y. (2020). Perbandingan metode naïve bayes dan

support vector machine pada analisis sentimen twitter. SMATIKA Jurnal: STIKI

Informatika Jurnal, 10(02), 71–76.

Firmansyah, D. R., & Lestariningsih, E. (2024). Analisis Sentimen Ulasan Aplikasi Smart

Campus Unisbank di Google Playstore Menggunakan Algoritma Naive Bayes.

Jurnal JTIK (Jurnal Teknologi Informasi Dan Komunikasi), 8(2), 498–507.

Gumilar, T. S., Astuti, R., & Wijaya, Y. A. (2024). Analisis Sentimen Ulasan Aplikasi

Lita Di Play Store Menggunakan Algoritma Naive Bayes. Jati (Jurnal Mahasiswa

Teknik Informatika), 8(1), 543–550.

Hasri, C. F., & Alita, D. (2022). Penerapan Metode Naïve Bayes Classifier Dan Support

Vector Machine Pada Analisis Sentimen Terhadap Dampak Virus Corona Di

Twitter. Jurnal Informatika Dan Rekayasa Perangkat Lunak, 3(2), 145–160.

Mastan, I. A., & Toni, Y. (2020). Analisis Sentimen Terhadap Tempat Kuliner Ayam

Gedebuk Dari Komentar Pengunjung Dengan Menggunakan Metode Naïve Bayes

Classifier. JBASE-Journal of Business and Audit Information Systems, 3(1).

Muflih, H. Z., Abdillah, A. R., & Hasan, F. N. (2023). Analisis Sentimen Ulasan

Pengguna Aplikasi Ajaib Menggunakan Metode Naïve Bayes. KLIK: Kajian Ilmiah

Informatika Dan Komputer, 4(3), 1613–1621.

Nehe, P. H., Berutu, S. S., & Budiati, H. (2024). Analisis Sentimen Opini Masyarakat

Terhadap Presiden Jokowi Sebelum Dan Sesudah Pilpres 2024 Menggunakan

Metode Naive Bayes Classification. Jutisi: Jurnal Ilmiah Teknik Informatika Dan

Sistem Informasi, 13(1), 451–463.

Pattiiha, F. S., & Hendry, H. (2022). Perbandingan Metode K-NN, Naïve Bayes, Decision

Tree untuk Analisis Sentimen Tweet Twitter Terkait Opini Terhadap PT PAL

Indonesia. JURIKOM (Jurnal Riset Komputer), 9(2), 506–514.

Ramadhani, B., & Suryono, R. R. (2024). Komparasi Algoritma Naïve Bayes dan

Logistic Regression Untuk Analisis Sentimen Metaverse. JURNAL MEDIA

INFORMATIKA BUDIDARMA, 8(2), 714–725.

Rozi, F., Sukmana, F., & Adani, M. N. (2021). Pengelompokkan Judul Buku dengan

Menggunakan Algoritma K-Nearest Neighbor (K-NN) dan Term Frequency–

Inverse Document Frequency (TF-IDF). JIMP-Jurnal Informatika Merdeka

Pasuruan, 6(3).

Suprastiyo, W. T., & Airlangga, P. (2021). Penyebaran Informasi Lowongan Pekerjaan

Berbasis Website (Studi Kasus Mahasiswa Dan Alumni Unwaha). Jurnal Tekinkom

(Teknik Informasi Dan Komputer), 4(2), 280–287.

Tanggraeni, A. I., & Sitokdana, M. N. N. (2022). Analisis Sentimen Aplikasi E-

Government pada Google Play Menggunakan Algoritma Naïve Bayes. JATISI

(Jurnal Teknik Informatika Dan Sistem Informasi), 9(2), 785–795.

Utami, H. (2022). Analisis Sentimen dari Aplikasi Shopee Indonesia Menggunakan

Metode Recurrent Neural Network. Indonesian Journal of Applied Statistics, 5(1),

31–38.

Wijaya, D., Saputra, R. A., & Irwiensyah, F. (2024). Analisis Sentimen Ulasan Aplikasi

Samsat Digital Nasional Pada Google Playstore Menggunakan Algoritma Naïve

Bayes. KLIK: Kajian Ilmiah Informatika Dan Komputer, 4(4), 2369–2380.

Yutika, C. H., Adiwijaya, A., & Al Faraby, S. (2021). Analisis Sentimen Berbasis Aspek

pada Review Female Daily Menggunakan TF-IDF dan Naïve Bayes. Jurnal Media

Informatika Budidarma, 5(2), 422–430.