p–ISSN: 2723 - 6609 e-ISSN: 2745-5254
Vol. 5, No. 10, October 2024 http://jist.publikasiindonesia.id/
Implementation of the YOLO Method for Detection of
Human Emotions Based on Facial Mimics
Thomas Felison¹*, Erwin Conery Firtan², Steven³, Willyam Chandra⁴, Saut Dohot Siregar⁵
Universitas Prima Indonesia, Indonesia
*Correspondence
ABSTRACT
Keywords: emotion detection, facial expressions, digital imagery, YOLO method
The specific purpose of this study is to test the accuracy of the YOLO method in recognizing human facial expressions through tests involving several types of expressions, such as angry, surprised, happy, neutral, and afraid. Emotion detection through facial expression recognition plays an important role in everyday life, for example in responding correctly to emotional expressions during social interactions in order to establish and build verbal and nonverbal communication with other people. Facial expressions are changes in the face in response to a person's emotional state, intentions, or social communication. Face detection is the first step that must be taken in facial analysis, including facial expression recognition. Many methods can be used to carry out face detection, one of which is the YOLO method. YOLO reframes object detection as a single regression problem, mapping image pixels directly to bounding box coordinates and class probabilities. With YOLO, the process only needs to look once at the input image to predict which objects are in the image and where they are located. Based on the tests carried out, the YOLO method can detect human facial expressions with a success rate of 80%; neutral, surprised, and disgusted expressions are detected with good accuracy, while fearful expressions are detected with poor accuracy. The YOLO method can also detect the facial expressions of people wearing accessories such as glasses.
Introduction
Digital images have been widely used to identify objects, one application being the recognition of human facial expressions. Research on facial expression analysis has many uses, such as detecting facial expressions in attendance systems, using facial feature points to control the movement of 3D Augmented Reality objects, synthesizing human faces in human-computer dialogue systems, or detecting subtle micro-expression movements in the face (Rosiani et al., 2018). One real application of human facial expression analysis is recognizing the facial expressions of e-learning users, where learner interaction is a weakness that must be considered in e-learning (Husdi, 2016). Facial expression recognition is also applied in the MOODSIC music player application, which plays music according to the user's emotion as obtained by detecting the user's facial expressions (Wijaya et al., 2018). Emotion detection through facial expression recognition plays an important role in daily life, for example in responding properly to emotional expressions during social interactions so as to establish and build verbal and nonverbal communication with others. Another advantage is being able to see and understand the intention of the interlocutor, which minimizes deception and falsehood. The inability to recognize facial emotional expressions can lead to inaccuracies in interpreting other people's emotions and feelings, which in turn leads to ambiguity and inaccurate decision responses (Hartanto, 2019).
Facial expressions are changes in the face in response to a person's emotional state, intentions, or social communication. Face detection is the first step that must be done in facial analysis, including facial expression recognition. Face detection aims to determine whether there is a face in an image and, if there is, the location and size of each face in the image (Budiyanta et al., 2021). Face detection involves several challenges, such as faces not directly facing the camera, face scale, facial expressions, faces obstructed by other objects, and lighting conditions (Prasetyawan, 2020). Many methods can be used to carry out the face detection process. Jatmoko implemented the Viola-Jones algorithm for facial recognition, with all experiments obtaining an average accuracy of 65% (Jatmoko et al., 2020). Another study, by Putra and Krishna, used the eigenface and Haar cascade classifier methods for facial recognition, obtaining an accuracy of 63% at a maximum distance of 3 meters from the camera (Putra et al., 2023). To address this problem, the You Only Look Once (YOLO) method can be used. YOLO reframes object detection as a single regression problem, mapping image pixels directly to bounding box coordinates and class probabilities. With YOLO, the process only needs to look once at the input image to predict which objects the image contains and where they are located (Redmon, 2016).
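To make the single-look idea concrete, the sketch below runs a YOLO model over an image in Python. It assumes the ultralytics package is installed; the weights file face_expressions.pt is a hypothetical model fine-tuned on expression classes, not the application built in this paper.

```python
# A minimal sketch of YOLO's single-pass detection (assumptions noted above).
from ultralytics import YOLO

model = YOLO("face_expressions.pt")  # hypothetical fine-tuned weights

# One forward pass over the image yields boxes, classes, and confidence scores.
results = model("test_face.jpg")

for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()  # bounding box coordinates
        label = result.names[int(box.cls)]     # predicted expression class
        score = float(box.conf)                # confidence score
        print(f"{label} ({score:.2f}) at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```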
Some similar studies that have been conducted before can be detailed as follows:
Table 1
Previous Research

Author's Name (Year): Rizki Rafiif Amaanullah, Gracia Rizka Pasfica, Satria Adi Nugraha, Mohammad Rifqi Zein, Faisal Dharma Subordination (2022)
Title: Implementation of a convolutional neural network for emotion detection through faces. The study obtained an accuracy of 81.92% for training and 81.69% for testing.
Novelty: The maximum accuracy was only 81.92%, so the YOLO method is expected to achieve better accuracy.

Author's Name (Year): Evan Tanuwijaya, Timotius, David Christian Kartamihardja, Timothy Leonardo Lianoto (2021)
Title: Detection of Human Facial Expressions Using a Convolutional Neural Network on Online Learning Images. An accuracy of 0.94 was obtained during training.
Novelty: The CNN method has a long execution process, so the YOLO method is expected to have a faster execution time.
The purpose of this research is to implement the YOLO method in the process of recognizing human facial expressions. The YOLO method is then tested so that its accuracy in recognizing human facial expressions can be determined (Bamba et al., 2022).
The benefit of this study is to enrich the literature related to the detection of human
emotions through facial expression analysis using the YOLO method, as well as to
provide a reference for further research in the development of more efficient and accurate
methods of object detection and facial expressions. Practically, the results of this research
have the potential to be applied in various fields, such as human-computer interaction
(HCI), supervision and monitoring systems in work or education environments, as well
as the entertainment and technology industries. The implementation of the YOLO method
in facial expression detection can help improve interaction between humans and
machines, develop monitoring systems that are more responsive to human emotions, and
be used in entertainment applications that adjust content based on user expressions and
emotions. Thus, this research is expected not only to contribute to the development of
science but also to have a real impact on various technological and social sectors.
Method
Type of Research
This study is quantitative research that adopts an experimental approach. Data were collected from various online sources and then analyzed to produce relevant conclusions.
Time and Place of Research
This research took approximately half a year, carried out starting in January 2024 in the city of Medan, Indonesia.
Table 2
Research Schedule (January-June 2024)

Activities: Topic Discussion; Research Reference Collection; Topic Determination; Proposal Preparation; Data Collection; Data Analysis; Method Testing; Evaluation and Revision; Report Writing; Journal Publication; Dissemination.
The following are the steps of the working procedure in this study:
1. Preliminary Stage
This research begins by looking for references from previous studies that are
relevant to the research topic to be conducted.
2. Problem Determination Stage
This stage formulates the problems that occur in identifying types of facial expressions. In addition, the study establishes problem boundaries to focus the scope of the research.
3. Data Collection Stage
This stage is carried out by collecting data regarding the You Only Look Once
(YOLO) method.
4. Analysis Stage
This stage analyzes the working process of the method used in detecting facial expressions. After that, an application is designed that implements the method applied in this research.
5. Testing Stage
In this step, the collected data will be tested using a pre-prepared application.
6. Report Stage
In the last stage, the report is prepared in accordance with the established research writing guidelines.
The YOLO algorithm is a deep learning method that uses a Convolutional Neural Network (CNN) to detect objects in images. The algorithm divides the image into a grid of a certain size, where each grid cell predicts bounding boxes and class probabilities for the objects it contains. If a grid cell predicts the existence of an object, a bounding box surrounding that object is predicted. A confidence score is calculated for each bounding box, and selection is made based on the confidence value (Nasution & Kartika, 2022).
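As an illustration of this grid-based prediction, the sketch below decodes a YOLO-style output tensor. The grid size, tensor layout, threshold, and random values are assumptions made for illustration; a real network output, trained on the seven expression classes, would take the tensor's place, followed by non-maximum suppression.

```python
import numpy as np

# Illustrative decoding of a YOLO-style output: an S x S grid where each
# cell predicts one box (x, y, w, h, objectness) plus 7 class probabilities.
S, NUM_CLASSES = 7, 7
CLASSES = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]

rng = np.random.default_rng(0)
pred = rng.random((S, S, 5 + NUM_CLASSES))  # stand-in for a network output

CONF_THRESHOLD = 0.5
detections = []
for row in range(S):
    for col in range(S):
        x, y, w, h, objectness = pred[row, col, :5]
        class_probs = pred[row, col, 5:]
        scores = objectness * class_probs  # class-specific confidence
        best = int(np.argmax(scores))
        if scores[best] >= CONF_THRESHOLD:
            # (x, y) are offsets inside the cell; convert to image-relative.
            cx, cy = (col + x) / S, (row + y) / S
            detections.append((CLASSES[best], float(scores[best]), (cx, cy, w, h)))

print(detections)
```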
The data sources for this study come from journal literature and books that support
the research topic. To ensure the success of the research, the tools used are as follows.
Hardware Requirements
In the process of designing and testing the system, a set of computer hardware is
used that has the following specifications.
a. Intel Core i3.
b. 4 GB RAM.
c. 2 TB hard drive.
d. LCD monitor with a minimum screen resolution of 1024 x 768.
Software Requirements
The software used to design this program is:
a. Windows 10 operating system.
b. Visual Studio C# 2013.
c. Microsoft SQL Server 2012.
Results and Discussion
The working process of the implementation of the YOLO method for detecting
human emotions consists of two parts, namely:
The training process by entering the dataset
The user can select the expression type and the image file, then click the Save button to save the data. The dataset used in this study consists of 455 images: 315 training images (approximately 70%) and 140 testing images (approximately 30%), so that each expression has 45 training images and 20 testing images. The result of the training process is a confidence value for each training image, which will be compared with the confidence value of the testing image. In the Expression Type Dataset Input form, there are seven types of expressions: angry, disgusted, fearful, happy, neutral, sad, and surprised. The display of the Expression Type Dataset Input form after data storage can be seen in the following image:
Figure 1 Input Dataset
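For illustration, the 70/30 split described above (45 training and 20 testing images per expression, 455 images in total) could be produced by a script such as the sketch below. The dataset/<expression> folder layout is an assumption; in the paper's application, the data is stored through the input form instead.

```python
import os
import random
import shutil

# A sketch of the per-expression 70/30 split: 45 training + 20 testing
# images per class (65 x 7 = 455 images in total).
EXPRESSIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]
random.seed(42)  # fixed seed so the split is reproducible

for expr in EXPRESSIONS:
    files = sorted(os.listdir(os.path.join("dataset", expr)))
    random.shuffle(files)
    train, test = files[:45], files[45:65]
    for split, names in (("train", train), ("test", test)):
        out_dir = os.path.join(split, expr)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join("dataset", expr, name), out_dir)
```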
The testing process to detect human emotions based on facial mimics
The user can select the image file to be recognized by clicking the Browse button, which makes the system display the Browse dialog box. The user then selects the desired image file and clicks the Open button, after which the system reads the contents of the selected image file and displays it on the monitor screen (Hutauruk et al., 2020). The display of the Human Face Expression Recognition form after data input can be seen in the following image:
Figure 2 Detection of Human Emotion
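The paper does not spell out how the stored training confidence values are compared with a test image's value, so the following is only a plausible sketch of that matching step, in which the class of the closest stored training value is reported. All names and numbers here are hypothetical.

```python
# Hypothetical matching step: report the class whose stored training value
# lies closest to the value computed for the test image.
def classify(test_value: float, train_values: dict[str, list[float]]) -> tuple[str, float]:
    best_label, best_distance = "", float("inf")
    for label, values in train_values.items():
        for value in values:
            distance = abs(test_value - value)
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label, best_distance

train_values = {"angry": [1031.5, 988.2], "sad": [1122.9]}  # illustrative numbers
print(classify(1030.0, train_values))  # -> ('angry', 1.5)
```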
The next test detects human facial expressions in input images. In this test, there are seven types of human facial expressions, each with 20 testing images (140 testing images in total). The results of the tests carried out can be detailed as follows:
Table 3
Angry Expression

No.  Types of Expressions  Detection Results  Value      Information
1    Angry                 Angry              0          Succeed
2    Angry                 Angry              0          Succeed
3    Angry                 Angry              0          Succeed
4    Angry                 Angry              0          Succeed
5    Angry                 Angry              0          Succeed
6    Angry                 Angry              0          Succeed
7    Angry                 Sad                1692.7865  Fail
8    Angry                 Angry              0          Succeed
9    Angry                 Angry              0          Succeed
10   Angry                 Angry              0          Succeed
11   Angry                 Angry              0          Succeed
12   Angry                 Neutral            1710.5479  Fail
13   Angry                 Angry              0          Succeed
14   Angry                 Angry              0          Succeed
15   Angry                 Angry              0          Succeed
16   Angry                 Fear               1568.9315  Fail
17   Angry                 Angry              0          Succeed
18   Angry                 Angry              0          Succeed
19   Angry                 Surprise           1718.3687  Fail
20   Angry                 Neutral            1730.9356  Fail
The number of correct facial expression detections is 15 and the number of incorrect detections is 5, giving an accuracy of 75% for the angry expression.
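The per-class accuracy follows directly from these counts, as a quick check shows:

```python
# Accuracy for the angry expression from the counts above: 15 of 20 correct.
correct, total = 15, 20
print(f"Angry expression accuracy: {correct / total:.0%}")  # -> 75%
```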
Table 4
Disgust Expression

No.  Types of Expressions  Detection Results  Value      Information
1    Disgust               Disgust            1438.016   Succeed
2    Disgust               Disgust            1161.1985  Succeed
3    Disgust               Disgust            1515.1654  Succeed
4    Disgust               Disgust            1609.9096  Succeed
5    Disgust               Disgust            1245.4341  Succeed
6    Disgust               Disgust            1634.6125  Succeed
7    Disgust               Disgust            1760.2670  Succeed