p-ISSN: 2723-6609 e-ISSN: 2745-5254
Vol. 5, No. 11, November 2024 http://jist.publikasiindonesia.id/
Indonesian Journal of Social Technology, Vol. 5, No. 11, November 2024 5250
Sentiment Analysis on Skincare Product Reviews Using Lexicon-Based and Comparison of SVM Kernel
Lutfiana Sinta Lestari1*, Tri Sutrisno2, Irvan Lewenusa3
Universitas Tarumanegara, Indonesia
Email: [email protected]1*, [email protected]2, [email protected]3
*Correspondence
ABSTRACT
Keywords: sentiment analysis, SVM kernel, lexicon-based, skincare products.
The skincare industry has seen remarkable growth in recent years, fueled by increasing public awareness of skincare and beauty. As awareness of the importance of skincare grows, skincare products are becoming more popular, and the skincare brands available on the market today are diverse. However, not all skincare products offer the same quality, and some are more suitable for specific skin types or concerns, depending on the ingredients used and the product formulation. To help consumers understand skincare reviews, this study conducts sentiment analysis on skincare product reviews, identifying whether reviews tend to be positive, negative, or neutral. The sentiment analysis uses a lexicon-based approach and compares several SVM kernels: linear, polynomial, RBF, and sigmoid. Additionally, this research applies Term Frequency-Inverse Document Frequency (TF-IDF) for word weighting. The results indicate that the best performance was achieved with the sigmoid and linear kernels when no oversampling technique was applied. The linear kernel without balancing achieved 81.83% accuracy, 77.46% precision, 81.83% recall, and a 79.53% F1-score. The sigmoid kernel yielded 81.83% accuracy, 77.39% precision, 81.83% recall, and a 79.53% F1-score.
Introduction
The skincare industry has experienced significant growth in recent years, driven by increasing public awareness of skincare and beauty. Everyone wants to have an attractive and pleasing appearance. An attractive appearance comes not only from the clothes or accessories worn; healthy skin also supports it. Skincare is a facial skin treatment that can be done to maintain a healthy skin condition. Facial skincare is a form of appreciation and concern for yourself and others that supports an attractive appearance (Sinaga & Hutapea, 2022). The use of skincare is one of the efforts to achieve an attractive appearance through skin health.
With increasing public awareness of the importance of skincare, skincare products are becoming increasingly popular, and the skincare brands on the market today are very diverse. However, not all skincare products have the same quality, and some are more suitable for particular skin types or concerns, depending on the ingredients used and how the product is formulated. An individual's skin type and condition affect how the skin reacts to a product. Before deciding to buy a skincare product, consumers should therefore know their own skin type and condition and check whether the product suits it. One way to do this is to read reviews from other consumers who have purchased the product.
According to a survey conducted in the US in December 2019, the majority of internet users (76%) trust online reviews as much as recommendations from family or friends when making purchase decisions. Reviews of skincare products can help consumers assess whether a skincare brand is worth using, because not all skincare brands have good quality and everyone has a different skin type. Even people with the same skin type are not necessarily compatible with the same skincare (Kamal, 2021). Figure 1 shows sales data for the best-selling local skincare brands online.
Figure 1
Sales Data of the Best-Selling Local Skincare Brand Online
Based on sales data from Compas.id, compiled from various e-commerce platforms for the April-June 2022 period, sales of local skincare brands were strong. Over this period, total sales on the marketplaces reached IDR 292.4 billion across 3.8 million transactions. Somethinc was the best-selling skincare brand in e-commerce with total revenue of IDR 53.2 billion, followed by Scarlett with IDR 40.9 billion and MS Glow with IDR 29.4 billion. This data shows the dominance of local brands in the Indonesian skincare industry.
Reviews and opinions about several skincare brands are the focus of this research, which identifies whether they tend toward positive, negative, or neutral sentiment. Classifying reviews can help consumers assess product quality and find products that match their skin type. This study therefore focuses on sentiment analysis of skincare product reviews. Figure 2 shows an example of a review of a skincare product.
Figure 2
Skincare Review on Female Daily Forum
The reviews above show that product reviews express positive or negative feelings through many kinds of phrases. Phrases like "bounce is a brightening effect" indicate user satisfaction with the quality of the product, especially in terms of its scent and effect on the skin. In a lexicon-based approach, such phrases are processed with Natural Language Processing (NLP) techniques to understand user sentiment. A sentiment lexicon can be used to determine whether a word or phrase is positive or negative: it is a list of lexical features that are generally labeled according to their positive or negative semantic orientation (Bonta, Kumaresh, & Janardhan, 2019). Because sentiment words are essential for describing and understanding the emotional aspects of written communication, this study uses lexicon-based techniques to label the existing data as positive, negative, or neutral.
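As a minimal sketch of how lexicon-based labeling can work, the Python snippet below sums word polarities and maps the total to one of the three labels. The tiny `LEXICON` dictionary, the function name `label_review`, and the scoring rule are illustrative assumptions; the paper does not specify which sentiment dictionary or scoring scheme was used.

```python
# Hypothetical mini-lexicon: word -> polarity score. A real study would load
# a full sentiment dictionary; these entries are illustrative assumptions.
LEXICON = {
    "good": 1, "love": 1, "brightening": 1, "smooth": 1,
    "bad": -1, "breakout": -1, "sticky": -1, "irritated": -1,
}


def label_review(text, lexicon=LEXICON):
    """Sum the polarity of known words and map the total to a label."""
    tokens = text.lower().split()
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # no sentiment words, or positives and negatives cancel


print(label_review("love the brightening effect"))   # positive
print(label_review("sticky and caused a breakout"))  # negative
print(label_review("bought it last week"))           # neutral
```

Reviews without any lexicon words, or whose positive and negative words cancel out, fall into the neutral class under this rule.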
Oktaviana et al. (2022) also studied text classification with a lexicon-based approach (Oktaviana, Sari, & Indriati, 2022). The study discusses sentiment analysis of online lecture policies during the pandemic. The background is that the government issued policies to minimize the spread of COVID-19, and these policies drew reactions from the public, many of whom voiced their opinions on social media. To find out the polarity of public opinion regarding online lecture policies during the pandemic, the study conducted sentiment analysis using lexicon-based features and a Support Vector Machine (SVM). With lexicon-based features, the classifier reached an accuracy of 60%, a precision of 56%, a recall of 75%, and an f1-score of 64%, compared with an accuracy of 48%, a precision of 46%, a recall of 58%, and an f1-score of 52% without them. Accuracy and f1-score thus each increased by 12 percentage points over the SVM that did not use the lexicon-based approach.
Arsi & Waluyo (2021) discussed sentiment analysis of the discourse on the relocation of the Indonesian capital. The background is that the relocation drew both support and opposition on social media such as Twitter (X). The study therefore conducted sentiment analysis of this discourse using the SVM algorithm and obtained an accuracy of 96.68%, a precision of 95.82%, a recall of 94.04%, and an AUC of 0.979.
Based on the description above, this study conducts sentiment analysis using a lexicon-based approach and compares the linear, polynomial, RBF, and sigmoid kernels of SVM. SVM is used for classification because previous studies have shown that it is one of the effective techniques for text classification, including sentiment analysis. SVM works by separating data into different classes using a hyperplane (Annur, Murtopo, & Fadilah, 2022). Word weighting uses Term Frequency-Inverse Document Frequency (TF-IDF), which assigns each word (term) a weight that reflects its relationship to a document (Lestandy, Abdurrahim, & Syafa’ah, 2021). The advantage of TF-IDF is its ability to give appropriate weight to the terms in a document: by combining the TF and IDF scores, it can identify the most relevant terms in the document (Annisa & Kalifia, 2024). The lexicon-based approach is used for labeling: opinion sentences are extracted automatically using a dictionary of opinion words, and the resulting labels serve as the reference for classification (Roiqoh, Zaman, & Kartono, 2023). This research is expected to add insight into sentiment analysis of skincare product reviews and to show which SVM kernel performs best.
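To illustrate how combining TF and IDF scores highlights relevant terms, the following Python sketch computes the textbook form tf(t, d) × log(N / df(t)) for a few tokenized reviews. The function name `tf_idf` and the sample reviews are assumptions made for illustration; the paper does not state which TF-IDF variant or library it used, and library implementations typically add smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized reviews.
docs = [
    ["serum", "makes", "skin", "bright"],
    ["serum", "feels", "sticky"],
    ["toner", "makes", "skin", "smooth"],
]
weights = tf_idf(docs)
# "serum" occurs in 2 of 3 reviews and "bright" in only 1, so "bright"
# receives the larger weight within the first review.
print(weights[0]["bright"] > weights[0]["serum"])  # True
```

A term that appears in every document would receive a weight of zero under this form, since log(N / N) = 0.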
Method
The classification system is website-based and is built with the Python programming language and the Flask framework. The first step in creating the system is to identify the requirements it must meet.
The system compares the linear, polynomial, RBF, and sigmoid kernels of SVM. The aim is to determine which kernel performs best in sentiment analysis of skincare product review data totaling 3000 reviews.
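The four kernels differ only in how they measure similarity between two feature vectors. The sketch below writes out the commonly used definitions in plain Python. The parameter names `gamma`, `coef0`, and `degree` and their default values are assumptions chosen for illustration; the paper does not report the hyperparameters it used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)


# Two hypothetical TF-IDF-like vectors.
x = [1.0, 0.0, 2.0]
y = [0.5, 1.0, 1.0]
for name, kernel in [
    ("linear", linear_kernel),
    ("polynomial", polynomial_kernel),
    ("rbf", rbf_kernel),
    ("sigmoid", sigmoid_kernel),
]:
    print(name, round(kernel(x, y), 4))
```

The linear kernel is a plain inner product, while the other three apply a nonlinear function to it (or to the distance between the vectors), which is what lets an SVM draw non-linear class boundaries.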
From the collected dataset, only the review feature is used. The reviews are labeled as positive, negative, or neutral using the lexicon-based approach and are preprocessed. The dataset is then split into 80% training data and 20% testing data. Classification is run in two scenarios, with and without handling the class imbalance; the method used to handle the imbalance is SMOTE. Classification is then carried out with each of the SVM kernels being compared. Finally, the models are evaluated using a confusion matrix, from which accuracy, precision, recall, and f1-score are derived. The system is designed using the waterfall method, which consists of requirement analysis, design, development, and testing.
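The split and oversampling steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `train_test_split` and `smote` are hypothetical helper names written here in plain Python, the seed and `k` values are arbitrary, and the SMOTE function only captures the core idea of interpolating between a minority-class sample and one of its nearest minority-class neighbours. In practice a library implementation would normally be used.

```python
import math
import random


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of the samples and hold out test_ratio of them."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def smote(minority, n_new, k=2, seed=42):
    """Create n_new synthetic points between minority samples.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


reviews = list(range(3000))  # stand-ins for the 3000 reviews
train, test = train_test_split(reviews)
print(len(train), len(test))  # 2400 600

# Hypothetical minority-class feature vectors from the training portion.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=4)
print(len(new_points))  # 4
```

In this sketch, oversampling is applied only to the training portion, so the held-out test reviews remain real data.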
The classification system for sentiment analysis was then built according to this design. The stages are as follows:
1. Dataset collection is carried out before creating the system. The dataset consists of user reviews of skincare products, obtained from the Female Daily website, with a total of 3000 reviews.
2. The user interface is created based on the previously prepared interface design, which covers the admin and user modules.
3. Sentiment classification of skincare product reviews is carried out. The first process is preprocessing, which consists of case folding, data cleaning, tokenizing, normalization, stopword removal, and stemming. The data is then labeled using the lexicon-based approach, weighted using TF-IDF, and split into 80% training data and 20% testing data. Oversampling is then applied using SMOTE, and the SVM kernels are implemented. Once the models have been created, the performance of each kernel is evaluated. The models are implemented in the Python programming language with the Flask framework. The last stage is to test the system and validate the results using the confusion matrix evaluation metrics, namely accuracy, precision, recall, and f1-score.
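The six preprocessing steps named above can be chained as in the following sketch. The slang map, stopword list, and suffix-stripping stemmer are small hypothetical stand-ins written for English examples; the paper does not list the resources it used, and a real system would rely on full dictionaries and a proper stemmer for the review language.

```python
import re

NORMALIZATION = {"gr8": "great", "luv": "love"}    # hypothetical slang map
STOPWORDS = {"the", "is", "it", "my", "and", "a"}  # hypothetical stopword list
SUFFIXES = ("ing", "ed", "s")                      # toy stemmer rules


def stem(word):
    """Strip one known suffix if at least three letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    text = text.lower()                                 # case folding
    text = re.sub(r"[^a-z0-9\s]", " ", text)            # data cleaning
    tokens = text.split()                               # tokenizing
    tokens = [NORMALIZATION.get(t, t) for t in tokens]  # normalization
    tokens = [t for t in tokens if t not in STOPWORDS]  # stopword removal
    return [stem(t) for t in tokens]                    # stemming


print(preprocess("LUV it!!! My skin is glowing and the pores looked smaller :)"))
# ['love', 'skin', 'glow', 'pore', 'look', 'smaller']
```

The order matters in this sketch: normalization runs before stopword removal so that expanded slang is checked against the stopword list, and stemming runs last so the dictionaries can be written with unstemmed words.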
Results and Discussion
The sentiment analysis of skincare product reviews using the lexicon-based approach and SVM kernel comparison, implemented as a website, is tested using confusion matrix evaluation, black box testing, and user acceptance testing (UAT). The confusion matrix yields the values of accuracy, precision, recall, and f1-score. Black box testing checks the functionality of the system so that it runs properly with minimal errors. UAT ensures that the application meets user needs and is easy to use for sentiment analysis of skincare product reviews.
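As a sketch of how the four metrics follow from a multi-class confusion matrix, the function below computes accuracy together with support-weighted precision, recall, and f1-score. The weighted averaging is an assumption: the paper does not name its averaging scheme, although the reported recall equals the reported accuracy for every kernel, which is what support-weighted recall always produces. The matrix values are hypothetical.

```python
def weighted_metrics(matrix):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(n_classes)) / total
    precision = recall = f1 = 0.0
    for i in range(n_classes):
        tp = matrix[i][i]
        support = sum(matrix[i])                   # true samples of class i
        predicted = sum(row[i] for row in matrix)  # samples predicted as i
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total                   # class share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical confusion matrix; rows are true negative, neutral, positive.
matrix = [
    [30, 5, 15],
    [4, 10, 16],
    [6, 4, 110],
]
accuracy, precision, recall, f1 = weighted_metrics(matrix)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```

With this averaging, a class with few samples contributes little to the totals, so strong results on the majority class can mask weak results on a minority class.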
The dataset is tested by comparing kernels; each kernel is run both with and without data balancing. Figure 3 to Figure 10 show the results obtained for each kernel with and without balancing.
Figure 3 shows the classification result of the linear kernel without data balancing.
Figure 3
Linear Kernel Results Without Using Data Balancing
Figure 4 shows the classification result of the polynomial kernel without data balancing.
Figure 4
Polynomial Kernel Results Without Using Data Balancing
Figure 5 shows the classification result of the RBF kernel without data balancing.
Figure 5
RBF Kernel Results Without Using Data Balancing
Figure 6 shows the classification result of the sigmoid kernel without data balancing.
Figure 6
Sigmoid Kernel Results Without Using Data Balancing
Figure 7 shows the classification result of the linear kernel using data balancing.
Figure 7
Linear Kernel Results Using Data Balancing
Figure 8 shows the classification result of the polynomial kernel using data balancing.
Figure 8
Polynomial Kernel Results Using Data Balancing
Figure 9 shows the classification result of the RBF kernel using data balancing.
Figure 9
RBF Kernel Results Using Data Balancing
Figure 10 shows the classification result of the sigmoid kernel using data balancing.
Figure 10
Sigmoid Kernel Results Using Data Balancing
For black box testing, a test scenario is designed for each model of the skincare product sentiment analysis system. In addition, the system is evaluated through questionnaires filled out by application users.
This study analyzes the sentiment of skincare product reviews using an SVM kernel comparison. The dataset comes from the Female Daily website and contains 3000 reviews. Three labels are used: positive, negative, and neutral. After the dataset is input, the implementation performs preprocessing, labeling, and TF-IDF weighting, then splits the data into 80% training data and 20% test data. Classification is then run both with and without oversampling; the method used to overcome the data imbalance is the SMOTE technique. The SVM classification compares four kernels, namely RBF, linear, sigmoid, and polynomial, and is followed by an evaluation using a confusion matrix, which yields the values of accuracy, precision, recall, and f1-score. The results vary, as shown in Table 1 below.
Table 1
Comparison Results
Kernel       Metric      Without Balancing   With Balancing
Linear       Accuracy    81.83%              79%
             Precision   77.46%              80.09%
             Recall      81.83%              79%
             F1-score    79.56%              79.43%
Polynomial   Accuracy    68.33%              73.17%
             Precision   69.24%              69.73%
             Recall      68.33%              73.17%
             F1-score    64.25%              70.72%
RBF          Accuracy    80%                 80.67%
             Precision   75.6%               76.22%
             Recall      80%                 80.67%
             F1-score    77.74%              78.38%
Sigmoid      Accuracy    81.83%              76.67%
             Precision   77.39%              79.6%
             Recall      81.83%              76.67%
             F1-score    79.54%              77.99%
Based on Table 1, the best performance is obtained with the sigmoid and linear kernels when no oversampling technique is used. Without balancing, the linear kernel achieved 81.83% accuracy, 77.46% precision, 81.83% recall, and a 79.53% f1-score, while the sigmoid kernel achieved 81.83% accuracy, 77.39% precision, 81.83% recall, and a 79.53% f1-score. The polynomial and RBF kernels, in contrast, achieved their highest results when oversampling was applied: the polynomial kernel reached 73.17% accuracy, 69.73% precision, 73.17% recall, and a 70.72% f1-score, and the RBF kernel reached 80.67% accuracy, 76.22% precision, 80.67% recall, and a 78.38% f1-score.
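The effect of balancing on each kernel can be checked with a short calculation over the accuracy values reported in Table 1. The snippet below is only an arithmetic aid; the variable names are illustrative.

```python
# Accuracy values (%) copied from Table 1: (without balancing, with balancing).
accuracy = {
    "Linear": (81.83, 79.00),
    "Polynomial": (68.33, 73.17),
    "RBF": (80.00, 80.67),
    "Sigmoid": (81.83, 76.67),
}

# Change in accuracy, in percentage points, when balancing is applied.
changes = {
    kernel: round(balanced - unbalanced, 2)
    for kernel, (unbalanced, balanced) in accuracy.items()
}
for kernel, change in changes.items():
    print(f"{kernel}: {change:+.2f} points")

improved = [kernel for kernel, change in changes.items() if change > 0]
print(improved)  # ['Polynomial', 'RBF']
```

Balancing raises accuracy by 4.84 points for the polynomial kernel and 0.67 points for the RBF kernel, and lowers it by 2.83 points for the linear kernel and 5.16 points for the sigmoid kernel.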
Conclusion
Based on the implementation and discussion, this study compared SVM kernels with lexicon-based labeling in a website-based sentiment analysis system for skincare product reviews, implemented in the Python programming language. After the dataset was input, the steps included preprocessing, labeling, TF-IDF weighting, and splitting the data into 80% training data and 20% test data. Classification was then run both with and without oversampling, where the method used to overcome the data imbalance was the SMOTE technique. After that, SVM classification was carried out by comparing four kernels, namely RBF, linear, sigmoid, and polynomial, followed by evaluation using a confusion matrix.
The performance results of each SVM kernel show that the best performance is obtained with the sigmoid and linear kernels when no oversampling technique is used. The linear kernel produces an accuracy of 81.83%, a precision of 77.46%, a recall of 81.83%, and an f1-score of 79.53%, while the sigmoid kernel produces an accuracy of 81.83%, a precision of 77.39%, a recall of 81.83%, and an f1-score of 79.53%.
As a suggestion for future research, other oversampling methods such as the Adaptive Synthetic Sampling Approach (ADASYN) can be applied. The best classification model for sentiment analysis can then be identified by comparing oversampling techniques across the various SVM kernels.
Bibliography
Annisa, Luthfiyah, & Kalifia, Anna Dina. (2024). Analisis Teknik TF-IDF Dalam
Identifikasi Faktor-Faktor Penyebab Depresi Pada Individu. January, 2, 302–307.
Annur, Ahsinil Amal, Murtopo, Aang Alim, & Fadilah, Nurul. (2022). Analisis
Sentimen Aplikasi E-Learning Selama Pandemi Covid-19 Dengan Menggunakan
Metode Support Vector Machine Dan …. IJIR, 3(2), 9–17.
Arsi, Primandani, & Waluyo, Retno. (2021). Analisis Sentimen Wacana Pemindahan
Ibu Kota Indonesia Menggunakan Algoritma Support Vector Machine (SVM).
Jurnal Teknologi Informasi Dan Ilmu Komputer, 8(1), 147.
https://doi.org/10.25126/jtiik.0813944
Bonta, Venkateswarlu, Kumaresh, Nandhini, & Janardhan, N. (2019). A Comprehensive
Study on Lexicon-Based Approaches for Sentiment Analysis. Asian Journal of
Computer Science and Technology, 8(S2), 1–6.
https://doi.org/10.51983/ajcst-2019.8.s2.2037
Kamal, Willy Wildan. (2021). Analisis Sentimen Ulasan Produk Skincare
Menggunakan Metode Support Vector Machine (Studi Kasus: Forum Female
Daily). 65.
Lestandy, Merinda, Abdurrahim, Abdurrahim, & Syafa’ah, Lailis. (2021). Analisis
Sentimen Tweet Vaksin COVID-19 Menggunakan Recurrent Neural Network dan
Naïve Bayes. Jurnal RESTI (Rekayasa Sistem Dan Teknologi Informasi), 5(4),
802–808. https://doi.org/10.29207/resti.v5i4.3308
Oktaviana, Natasya Eldha, Sari, Yuita Arum, & Indriati, Indriati. (2022). Analisis
Sentimen terhadap Kebijakan Kuliah Daring Selama Pandemi Menggunakan
Pendekatan Lexicon Based Features dan Support Vector Machine. Jurnal
Teknologi Informasi Dan Ilmu Komputer, 9(2), 357–362.
https://doi.org/10.25126/jtiik.2022925625
Roiqoh, Salsabila, Zaman, Badrus, & Kartono, Kartono. (2023). Analisis Sentimen
Berbasis Aspek Ulasan Aplikasi Mobile JKN dengan Lexicon Based dan Naïve
Bayes. Jurnal Media Informatika Budidarma, 7(3), 1582–1592.
https://doi.org/10.30865/mib.v7i3.6194
Sinaga, Ricka Putri Yani Br, & Hutapea, Joan Yuliana. (2022). Analisis pengaruh brand
image, harga, dan review product terhadap keputusan pembelian skincare wardah
pada mahasiswa unai. Jurnal Ekonomi, Sosial & Humaniora, 3(08), 12–25.