pISSN: 2723 - 6609 e-ISSN: 2745-5254
Vol. 5, No. 5 Mei 2024 http://jist.publikasiindonesia.id/
Jurnal Indonesia Sosial Teknologi, Vol. 5, No. 5, Mei 2024 2201
Optimization of Early Detection of Tuberculosis: Use of
Multilayer Perceptron and Extreme Learning Machine with
Clinical Data
Ammar Waliyuddin Jannah1*, Berlian Al Kindhi2
Institut Teknologi Sepuluh Nopember, Indonesia
*Correspondence
Keywords: Tuberculosis; Early Detection; Machine Learning; Multilayer Perceptron; Extreme
Learning Machine.
ABSTRACT
This research takes an innovative step in the fight against
Tuberculosis (TB), one of Indonesia's prominent public
health challenges, by developing and evaluating Multilayer
Perceptron (MLP) and Extreme Learning Machine (ELM)
models in machine learning for early detection of TB using
clinical data. The main result of this study was the discovery
that the MLP model, when applied without the Synthetic
Minority Over-sampling Technique (SMOTE), achieved an
impressive accuracy of 95.00%, signalling significant
progress in TB early detection efforts. This discovery not
only highlights the great potential of applying machine
learning technology in improving the accuracy of TB
diagnosis but also paves the way for the possible application
of advanced technology in the health sector to deal with
infectious diseases. This research illustrates how machine
learning technology can be integrated into clinical practice
to detect TB cases early, thus enabling faster and more
precise treatment, ultimately reducing the spread of the
disease. This is particularly important given TB's significant
impact on public health, especially in developing countries.
The results also open up opportunities for further research
into applying machine learning techniques to other
infectious diseases, promising a paradigm shift in how we
detect and manage various health conditions.
Introduction
Tuberculosis (TB) remains one of the deadliest infectious diseases in the world,
with millions of new cases and deaths each year. According to a World Health
Organization (WHO) report, TB is included in the top 10 leading causes of death globally
(Alwarthan, Aslam, & Khan, 2022). The situation in Indonesia is very concerning: the
country ranks third in the world for the number of TB cases. Factors such as HIV
prevalence, poor socioeconomic conditions, and limited access to health services
exacerbate the situation. Low awareness and limited knowledge about TB among the
public often lead to inappropriate or late treatment. In addition, increasing resistance to
TB drugs is an additional challenge in controlling this disease (Al Amien, Rizki, &
Nasution, 2022). This situation requires innovative and effective detection and treatment
strategies to reduce the burden of TB in Indonesia.
Early detection of TB is essential to reduce the spread of the disease and ensure
effective treatment. Challenges in the early detection of TB are often associated with the
need for more accurate and affordable diagnostic tools, especially in developing countries
(Arumnisaa & Wijayanto, 2023). Many TB cases go undiagnosed or are diagnosed late due
to nonspecific early symptoms. Therefore, advanced technologies such as machine
learning can help overcome these challenges, enabling more accurate and rapid analysis
of patient clinical data (Baharuddin, Azis, & Hasanuddin, 2019). This approach can help
identify cases of TB at an early stage, which is crucial to prevent transmission and initiate
appropriate treatment. Implementing this technology could be a paradigm shift in how we
detect and manage TB, especially in regions with limited resources (World Health
Organization, 2022).
Machine learning, as a branch of artificial intelligence, offers the possibility of
complex health data analysis. In the context of TB, machine learning algorithms can
identify patterns in clinical data that may not be seen with traditional methods (Ahmed &
Prakasam, 2023). Techniques such as Multilayer Perceptron (MLP) and Extreme
Learning Machine (ELM) hold promise in TB research because they process non-linear
data effectively. Machine learning can potentially improve TB diagnosis accuracy, reduce the
time it takes to detect cases and facilitate TB identification at an earlier stage compared
to conventional methods (Charles et al., 2016). The utilisation of this technology in
research and clinical practice could revolutionise how we deal with TB (Yeo,
Balakrishnan, Selvaperumal, & Nor, 2022).
Multilayer Perceptron (MLP) is a model of feedforward artificial neural networks
that relies on a layered structure that includes an input layer, one or more hidden layers,
and an output layer (Bikku, 2020). MLP's main advantage lies in its ability to model
complex non-linear relationships between variables, a common characteristic of clinical
data. MLP's multi-layered structure and adaptability make it suitable for analysing and
interpreting complex and varied patient data (Huang, Wang, & Lan, 2011).
In contrast, the Extreme Learning Machine offers a different approach. With its
main characteristic of randomly generating weights and biases for hidden layers, ELM
eliminates the need for weight adjustment during the learning process. The speed and
efficiency of ELM learning are key advantages, especially in the context of limited
datasets (Erlin, Desnelita, Nasution, Suryati, & Zoromi, 2022). ELM's ability to produce
fast and accurate results, even with smaller datasets, makes it a valuable method in
this study (Peralez-González, Pérez-Rodríguez, & Durán-Rosal, 2023).
These two methods were chosen to compare the effectiveness of two different
approaches in processing and analysing limited data. MLP and ELM, each with its
strengths and uniqueness, offer valuable insights into how machine learning can be
applied in this research.
In machine learning research for TB, the available data is often imbalanced between
classes. To overcome this, this study uses data synthesis techniques to balance the dataset (Rashidi et al.,
2022). This will allow machine learning models to learn from more balanced datasets,
reduce prediction bias, and improve accuracy (Kavvas et al., 2018). Data synthesis not
only helps in providing a better representation of minority cases but also enables model
testing across multiple scenarios, improving model reliability in real-world conditions
(Gao et al., 2023).
This research aims to develop an effective machine-learning model for TB
prediction, with a particular focus on classifying two categories: TB positive and TB
negative. Researchers will collect patient clinical data from a hospital lung clinic (Lung Poly), then
develop and test Multilayer Perceptron (MLP) and Extreme Learning Machine (ELM)
models, with and without data balancing techniques. The evaluation will focus on the
accuracy of classifying these two labels, with the hope that this study will make an
important contribution to the early detection of TB.
Table 1
Previous Research
Research 1
Title: Evaluating the performance of multilayer perceptron algorithm for tuberculosis
disease Raman data
Year: 2022
Researchers: Rahatullah, Suranjam Khan, Zahra Ali, Heena Ali, Adnan Ahmed, & Iftikhar
Ahmed
Research Review: This study evaluated the use of Multilayer Perceptron (MLP) in analysing
Raman spectroscopic data for tuberculosis (TB) patients. The study examined peak Raman
intensity related to the concentration of specific biomolecules in blood serum samples.
Research Results: MLP's performance in classifying samples from active TB patients, cured
TB patients, and healthy controls showed sensitivity and specificity of 62-92% and 81-88%,
respectively.
Relationship with the Present Research: This study provides insight into the potential use
of MLP in TB diagnosis. In the context of the researchers' analysis, which aimed to detect
early TB using machine learning by comparing ELM and MLP on 115 patient clinical datasets,
Ullah et al.'s study offers valuable insights into MLP implementation in a health context.
Researchers can utilise their methodology and results as a reference to strengthen aspects
of data analysis in this study, specifically in comparing the accuracy of MLP with ELM based
on various clinical parameters such as temperature, weight loss, pulse, appetite, cough,
shortness of breath, etc. This will help in evaluating the effectiveness of these two methods
in diagnosing TB, as well as enable more accurate identification of positive/negative TB.
Improvements in Research: Ullah et al., 2022
Based on the problem formulation that has been set previously, the research
objectives are as follows:
1. Build and test machine learning models that accurately classify the two TB diagnostic
labels, in order to improve disease detection and management.
2. Assess and compare the effectiveness of Multilayer Perceptron and Extreme Learning
Machine in TB diagnosis using clinical data of patients in hospitals.
3. Develop models that can overcome the problem of data imbalance in TB studies using
data synthesis techniques.
4. Determine the most efficient and effective machine learning approach for TB
detection, considering factors such as accuracy and data variability.
Research Methods
The research methodology aims to detect cases of pulmonary TB using machine learning
models. The researchers adopt the "IBM Data Science Methodology" framework shown in
Figure 3.1, which consists of ten stages. This chapter details how these stages are
applied in the context of this study.
This research was conducted with the understanding that early detection of TB can
play a crucial role in limiting the spread of the disease, improving treatment outcomes,
and reducing the economic burden associated with advanced disease care. By referring to
clinical data from the Lung Poly of RSUD Prof. Dr W Z Johannes Kupang, the study aims
to develop data-based solutions to optimise TB detection. The proposed machine learning
model seeks to recognise patterns indicative of TB from clinical data, enabling faster and
more accurate detection, which could facilitate timely and effective medical
interventions.
This research will investigate how existing clinical data can be leveraged to inform
and train machine-learning models. This will involve collecting, cleaning, and analysing
data to identify significant variables that might predict the presence of TB. The success
of this approach is measured not only in terms of the technical performance of the
resulting model but also in its effect on improving early detection of TB and decreasing
TB disease prevalence and mortality rates in the region.
Analytic Approach
In the context of this study, the analytics approach is the primary focus in
determining how patient clinical data will be used to develop predictive models using
machine learning techniques. The model to be developed will utilize a dataset consisting
of a basic physical examination of the patient that includes the following variables:
Temperature, Age, Sex, Weight, Pulse, Oxygen Saturation, Presence of Cough, Cough
with Phlegm, Ronchi, Wheezing, Alcohol Consumption, Presence of Fever, Weight Loss,
and Cigarettes Consumed.
Data Requirements
This research requires comprehensive and specific data collection for an accurate
classification model. The data required should reflect various aspects of the patient's
essential physical examination related to TB symptoms and risk factors.
Data collection in this study was carried out ethically and with patient privacy in mind.
All data is anonymised before analysis to protect patients' personal information. Data
quality is also maintained to protect the model's integrity: the data must be complete,
consistent, and accurate. Data checks include validation for unreasonable values or
outliers, handling of missing data, and verification of documented values.
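The checks described above could be sketched roughly as follows. This is only an illustration: the field names and plausibility ranges are hypothetical and are not taken from the study's assessment form.

```python
from typing import Any

# Hypothetical plausibility ranges, chosen only for illustration;
# they are not the limits used in the study.
PLAUSIBLE_RANGES = {
    "temperature_c": (30.0, 43.0),
    "pulse_bpm": (30, 220),
    "oxygen_saturation_pct": (50, 100),
    "age_years": (0, 120),
}


def check_record(record: dict[str, Any]) -> list[str]:
    """Return a list of data-quality problems found in one patient record."""
    problems = []
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{field}: not numeric")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems


records = [
    {"temperature_c": 36.8, "pulse_bpm": 84, "oxygen_saturation_pct": 98, "age_years": 42},
    # A likely data-entry slip (368 instead of 36.8) and a missing value.
    {"temperature_c": 368, "pulse_bpm": 84, "oxygen_saturation_pct": None, "age_years": 42},
]
for index, record in enumerate(records):
    print(index, check_record(record))
```

Flagged records would then be corrected against the source form or excluded, depending on the study protocol.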
Data was collected at the Lung Poly of RSUD Prof. Dr W Z Johannes Kupang. The
main objective was to collect comprehensive and high-quality data, which supported the
development of machine-learning classification models in the early detection of
tuberculosis. Figure 3.2 above shows the assessment form used at the Lung Poly of RSUD
Prof. Dr W Z Johannes Kupang, which is the main instrument in collecting patient clinical
data.
In the data collection process, selecting the right subjects is key to ensuring the
validity and reliability of the research results. Inclusion and exclusion criteria were
therefore set carefully to choose the most suitable subjects. The criteria of this study
are as follows:
Inclusion Criteria:
1. Patients undergoing assessment at the Lung Poly of RSUD Prof. Dr W Z Johannes
Kupang in 2023.
2. Patients who seek outpatient treatment or a check-up related to lung disease at the
clinic.
3. Patients diagnosed with or suspected of having tuberculosis, and those who have
recovered from tuberculosis and attend only for routine examinations.
Exclusion Criteria:
1. Patients who do not undergo an assessment at the Lung Poly of RSUD Prof. Dr W Z
Johannes Kupang in 2023.
2. Patients with medical conditions unrelated to lung disease, such as cognitive
impairment or severe comorbidities that may affect assessment results.
The researchers used purposive sampling to select patients who met the inclusion
criteria. Subjects were selected based on their relationship with the study's purpose,
namely, early detection of tuberculosis. This ensures that the sample obtained is highly
relevant to the research objectives and produces data that can provide significant insight
into the research problem.
Results and Discussion
Data Understanding Phase
This sub-chapter will describe the data understanding phase, an integral part of the
data analysis process. This phase concerns the collection, examination, and initial
understanding of the dataset used in this study. The main focus of this phase is to gain
deep initial insight into the characteristics and structure of the data, which will help in the
analysis and interpretation of subsequent results.
Data Description
This study relied on data from the assessment form provided by the Lung Poly of
RSUD Prof. Dr W Z Johannes Kupang. This assessment form is specifically designed to
record extensive and detailed clinical information from patients visiting the pulmonary clinic.
The data collected includes:
1. Clinical Information: This includes symptoms experienced, history of smoking and
alcohol, and results of physical examination. This information helps to obtain a
comprehensive picture of the patient's lung health status and represents an essential
variable in the clinical assessment of lung conditions.
2. Relevance to Lung Conditions: These data include common symptoms such as
coughing or shortness of breath and specific physical examination results such as
pulmonary auscultation revealing Ronchi or wheezing. Information about weight loss
or appetite is also collected, which can indicate chronic lung diseases such as
Tuberculosis.
Table 2
Variables and data types
Variable                   Data Type
Gender                     Demographic
Age                        Demographic
Fever                      Symptom
The presence of a cough    Symptom
Coughing up phlegm         Symptom
Coughing up blood          Symptom
Chest Pain                 Symptom
Weight Loss                Symptom
Decreased appetite         Symptom
Shortness of breath        Symptom
Ronchi                     Symptom
Wheezing                   Symptom
Weight                     Physical Examination
Weight                     Physical Examination
Pulse                      Physical Examination
Oxygen Saturation          Physical Examination
Body Temperature           Physical Examination
Blood pressure             Physical Examination
Alcohol Consumption        Medical History
Cigarette Consumption      Medical History
Table 2 presents the variables analysed in the study and their relevance to early
detection of Tuberculosis. These variables include:
1. Gender and Age (Demographic)
2. Fever, Presence of Cough, Cough with Phlegm/Blood, Chest Pain, Weight
Loss/Appetite, Shortness of Breath, Ronchi, and Wheezing (Symptoms)
3. Body Weight, Pulse, Oxygen Saturation, Body Temperature, and Blood Pressure
(Physical Examination)
4. Alcohol and Cigarette Consumption (Medical History)
Each of these variables has a vital role in understanding and identifying TB early,
increasing the chance for successful treatment and reducing the spread of the disease.
Gender and age can determine which groups are more susceptible to TB. Symptoms such
as coughing up phlegm and fever are the leading clinical indicators that trigger further
examination. Physical examination provides objective data about the patient's physical
condition, while medical history, such as alcohol and cigarette consumption, provides
context regarding risk factors.
Data Transformation
Data transformation converts data from one form or representation to another to support
better analysis. The following data transformations are carried out:
1. Age Grouping
Age grouping in clinical data aims to help machine learning models recognise
patterns related to tuberculosis risk. By reducing variability in age data, models can learn
from general trends related to specific age groups without being distracted by irrelevant
individual fluctuations. This facilitates learning and improves the accuracy of model
predictions.
Age in the researchers' clinical dataset is simplified into four categories to help
machine learning algorithms identify important patterns more efficiently. With unnecessary
age variation removed, models are easier to train and tend to be more accurate in their
predictions. These age categories are:
1. 0 = Children and Adolescents (0-18 years): This group is in a phase of growth and
development in which the immune response to TB and its manifestations differ from those
of adults.
2. 1 = Young Adults (19-35 years): This group was studied to evaluate the influence of
social and behavioural risk factors in early adult life.
3. 2 = Adults (36-60 years): This phase is analysed by considering working conditions,
lifestyle, and the presence of comorbidities that can affect TB risk.
4. 3 = Elderly (61 years and over): This age group focuses on TB in older people,
including decreased immune function and potential complications.
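The four age categories can be expressed as a small mapping function. The sketch below assumes age is recorded in whole years; the function name is illustrative and not taken from the study's code.

```python
def age_group(age_years: int) -> int:
    """Map an age in whole years to the study's four age-group codes (0-3)."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 18:
        return 0  # Children and Adolescents (0-18 years)
    if age_years <= 35:
        return 1  # Young Adults (19-35 years)
    if age_years <= 60:
        return 2  # Adults (36-60 years)
    return 3      # Elderly (61 years and over)


for age in (7, 18, 19, 35, 36, 60, 61, 83):
    print(age, age_group(age))
```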
2. Categorical Variables
The categorical variables in the researchers' dataset were originally in text form.
They needed to be converted into a numeric format to allow machine learning algorithms
to process and learn the data more effectively. Here are the changes made:
1. Gender was changed using '0' for females and '1' for males.
2. Features that indicate the presence or absence of a condition, such as 'cough',
'fever', 'chest pain', 'Ronchi', or 'wheezing', were encoded with only two possible
values (0 for 'no' and 1 for 'yes').
3. Ordinal variables, such as "Weight Loss" with sequential categories, are coded
according to the order shown in Table 3.
Table 3
Ordinal Variable Encoding
Weight Loss                               Code
No drop                                   0
There is a decrease of 1-5 kg             1
There is a decrease of 6-10 kg            2
There is a decrease of 11-15 kg           3
There is a decrease of more than 15 kg    4
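The encoding rules for gender, yes/no features, and weight loss could be implemented with lookup tables, as in the sketch below. The field names and the exact text labels of the raw data are assumptions made for illustration; only the numeric codes follow the text.

```python
# Codes follow the text; the raw text labels and field names are hypothetical.
GENDER_CODES = {"female": 0, "male": 1}
YES_NO_CODES = {"no": 0, "yes": 1}
WEIGHT_LOSS_CODES = {
    "none": 0,
    "1-5 kg": 1,
    "6-10 kg": 2,
    "11-15 kg": 3,
    ">15 kg": 4,
}
BINARY_FIELDS = ("cough", "fever", "chest_pain", "ronchi", "wheezing")


def encode_record(raw: dict[str, str]) -> dict[str, int]:
    """Convert one text-valued record into numeric codes.

    A KeyError is raised for any unexpected label, so typos in the
    source data surface immediately instead of being silently encoded.
    """
    encoded = {"gender": GENDER_CODES[raw["gender"].strip().lower()]}
    for field in BINARY_FIELDS:
        encoded[field] = YES_NO_CODES[raw[field].strip().lower()]
    encoded["weight_loss"] = WEIGHT_LOSS_CODES[raw["weight_loss"].strip().lower()]
    return encoded


example = {
    "gender": "Male", "cough": "Yes", "fever": "No", "chest_pain": "No",
    "ronchi": "Yes", "wheezing": "No", "weight_loss": "6-10 kg",
}
print(encode_record(example))
```

Keeping the ordinal codes in ascending order preserves the ranking of the weight-loss categories, which a one-hot encoding would discard.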
Data distribution before and after SMOTE
The Synthetic Minority Over-sampling Technique (SMOTE) is applied in this study to
overcome the class imbalance discussed in subchapter 4.1.2. The total dataset includes
101 records divided into TB- and TB+ classes. SMOTE was chosen for its effectiveness in
generating synthetic samples of the minority class, which helps achieve a more balanced
class distribution for model training. Figure 4.3.3 shows the distribution of data in the
first fold before and after the implementation of SMOTE, illustrating the balance obtained
between the two classes in the training set after SMOTE.
Figure 1 Stratified cross-validation k-fold data distribution
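To make the idea behind SMOTE concrete, the following is a simplified sketch of its core step: each synthetic sample is a random point on the line segment between a minority sample and one of its nearest minority neighbours. It is not the implementation used in the study (a library such as imbalanced-learn would normally be used), and the two-dimensional data is random placeholder data.

```python
import math
import random


def smote(minority: list[list[float]], n_new: int, k: int = 5, seed: int = 0) -> list[list[float]]:
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # A random point on the segment between the base and the neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Placeholder data: 17 minority training samples, oversampled to match
# 63 majority samples (the first-fold training sizes reported in the study).
data_rng = random.Random(1)
minority_train = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(17)]
new_points = smote(minority_train, n_new=63 - 17)
print(len(minority_train) + len(new_points))
```

Because the synthetic values are interpolated, coded categorical features can take non-integer values after oversampling, which is worth keeping in mind for datasets made up largely of binary and ordinal codes.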
For stratified k-fold cross-validation with five folds, the data was divided so that 80%
of the data served as the training set and 20% as the test set in each fold. This implies
that in each fold, approximately 80 records (63 TB+ and 17 TB-) are used as the training
set and 21 (16 TB+ and 5 TB-) as the test set. SMOTE is applied only to the training
sets, increasing minority class representation without changing the natural distribution
of the test sets. This approach ensures that model evaluation can reflect predictive
capabilities on data not involved in the training process, confirming the accuracy and
reliability of the model under realistic test conditions.
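The fold sizes quoted above can be reproduced with a minimal stratified split. The sketch below assumes overall class counts of 79 TB+ and 22 TB-, which are inferred from the reported fold sizes (63 + 16 and 17 + 5); the function itself is illustrative and not the study's code.

```python
import random
from collections import defaultdict


def stratified_folds(labels: list[int], n_folds: int = 5, seed: int = 0) -> list[list[int]]:
    """Split sample indices into n_folds test folds that preserve class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so fold sizes differ by at most one per class.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


# 1 = TB+, 0 = TB-; counts inferred from the fold sizes reported in the text.
labels = [1] * 79 + [0] * 22
folds = stratified_folds(labels)
for fold_number, test_idx in enumerate(folds, start=1):
    test_set = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in test_set]
    pos_train = sum(labels[i] for i in train_idx)
    pos_test = sum(labels[i] for i in test_idx)
    # Oversampling such as SMOTE would be applied to train_idx only, here.
    print(f"fold {fold_number}: train {len(train_idx)} ({pos_train} TB+), "
          f"test {len(test_idx)} ({pos_test} TB+)")
```

Restricting oversampling to the training indices inside the loop is what keeps synthetic samples out of the test folds.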
Modelling
This chapter describes the process of creating models using two different methods:
Extreme Learning Machine (ELM) and Multi-Layer Perceptron (MLP). Both methods are
tested with and without synthetic data and then compared to determine which is more
effective in the early detection of tuberculosis.
Implementation of Extreme Learning Machine (ELM)
In this study, Extreme Learning Machine (ELM) was implemented in Python and applied to
a dataset of 101 samples with 20 features. To determine the optimal number of neurons in
the hidden layer, configurations ranging from 5 to 50 neurons were tested. The aim was to
find the configuration with the best balance between the model's capacity to learn
complex patterns in the data and the risk of overfitting, while keeping enough learning
capacity for a classification task with two classes.
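A minimal ELM can be written in a few lines of NumPy. The sketch below is an assumption about what such an implementation might look like, not the study's code: hidden weights are drawn randomly and left fixed, a sigmoid activation is used, and the output weights are obtained with the Moore-Penrose pseudo-inverse. The data is random placeholder data with the same shape as the study's dataset (101 samples, 20 features).

```python
import numpy as np


class ELMClassifier:
    """Single-hidden-layer ELM for binary labels in {0, 1}."""

    def __init__(self, n_hidden: int = 20, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of the fixed random projection.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELMClassifier":
        n_features = X.shape[1]
        # Hidden weights and biases are drawn once and never updated.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: least-squares solution via the pseudo-inverse.
        self.beta = np.linalg.pinv(H) @ y.astype(float)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return (self._hidden(X) @ self.beta >= 0.5).astype(int)


# Placeholder data; labels follow an arbitrary rule for demonstration only.
rng = np.random.default_rng(42)
X = rng.normal(size=(101, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Sweep the hidden-layer size over the range tested in the study.
for n_hidden in range(5, 51, 5):
    model = ELMClassifier(n_hidden=n_hidden).fit(X, y)
    train_acc = (model.predict(X) == y).mean()
    print(f"{n_hidden:2d} hidden neurons: training accuracy {train_acc:.2f}")
```

Training accuracy is printed here only to keep the example short; choosing the neuron count would normally rely on held-out folds, since training accuracy tends to rise with the number of hidden neurons regardless of generalisation.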
Evaluation
Evaluation of model results is a critical step in measuring the effectiveness and
accuracy of the algorithms applied in predicting data. In this sub-chapter, we will analyze
and compare the performance of the two methods used in this study: Extreme Learning
Machine (ELM) and Multi-Layer Perceptron (MLP), both with the application of data
synthesis techniques using SMOTE and without data synthesis.
Comparison and Overall Analysis
Table 4
Comparison of ELM and MLP without and with SMOTE
Accuracy (%)    Without SMOTE    With SMOTE
ELM             67.33            81.33
MLP             95.00            94.00
This sub-chapter presents a synthesis of evaluations that have been performed on
Extreme Learning Machine (ELM) and Multi-Layer Perceptron (MLP) in the context of
using SMOTE data synthesis techniques and without SMOTE. This analysis aims to
explore more profound insights into how these two algorithms react to class imbalance
handling techniques and their implications for classification performance.
In Figure 4.8, ELM shows a significant improvement in accuracy from 67.33% to
81.33% after the implementation of SMOTE, which confirms the effectiveness of
SMOTE in overcoming class imbalances. This is reflected in increased precision for the
minority class (TB-) and a more balanced F1 score between the two classes, indicating
fairer classification across classes.
Conversely, MLP, which showed excellent performance without SMOTE with an
accuracy of 95.00%, only experienced a slight decrease in accuracy to 94.00% after the
implementation of SMOTE. This suggests that MLP is more resistant to class imbalance
and that SMOTE does not provide significant improvements in this context. However,
the decrease in recall for class TB- following SMOTE adoption requires further
consideration in medical applications, where minimising false negatives is critical.
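Per-class precision, recall, and F1 score, as discussed above, can be computed directly from the predictions. The sketch below uses hypothetical predictions on a test fold with the reported composition (16 TB+ and 5 TB-); the numbers are illustrative and are not results from the study.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int) -> dict[str, float]:
    """Accuracy, precision, recall and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall, "f1": f1}


# Hypothetical fold: 1 = TB+, 0 = TB-. One TB+ case and one TB- case are misclassified.
y_true = [1] * 16 + [0] * 5
y_pred = [1] * 15 + [0] + [0] * 4 + [1]

print("TB+:", binary_metrics(y_true, y_pred, positive=1))
print("TB-:", binary_metrics(y_true, y_pred, positive=0))
```

In this illustration the overall accuracy is 19/21, yet recall for the smaller TB- class is only 4/5, which shows why accuracy alone can hide weaker performance on the minority class.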
From the comparative analysis that has been carried out, it becomes clear that the
decision of model selection depends not only on the evaluation metrics but also on the
context of the practical application in which the model will be applied. ELM, with the
application of SMOTE, showed significant improvements in addressing class imbalances,
while MLP showed robustness and high performance without the support of data
synthesis techniques. In clinical contexts, where precision and reliability of detection are
priorities, the model with the highest accuracy will take precedence for deployment.
Therefore, given the results that have been obtained, MLP without SMOTE, which shows
the highest accuracy, will be the top choice for further implementation. This step reflects
a commitment to the utilization of models that are not only statistically superior but also
that best fit real, immediate diagnostic needs.
Deployment
Deployment is an essential step in the machine learning model development cycle,
where the trained model is integrated into a production environment for use in real
applications. In the context of this research, the deployment process will focus on
deploying machine learning models developed using Python for early detection of
tuberculosis.
Deployment Preparation
Before the deployment process, it is necessary to ensure that the model has been
comprehensively tested and tuned to achieve optimal performance. Models selected based
on the evaluations in the previous subchapter should be serialized or saved in a reusable
format, such as using the pickle or joblib libraries in Python.
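Serialization with pickle might look like the sketch below. A plain dictionary of parameters stands in for the trained model, and the file name is hypothetical; a fitted estimator object would be passed to `pickle.dump` in the same way.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model: hypothetical parameters, not the study's weights.
model_state = {"weights": [0.4, -0.2, 0.7], "bias": -0.1, "n_hidden": 20}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "tb_model.pkl"  # hypothetical file name

    # Serialize the model to disk after training.
    with model_path.open("wb") as handle:
        pickle.dump(model_state, handle)

    # Later, for example when the API starts, load it back.
    with model_path.open("rb") as handle:
        restored = pickle.load(handle)

print(restored == model_state)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.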
Deployment Techniques
The deployment technique is to create an API (Application Programming
Interface) using the Flask framework in Python. This API will enable easy and flexible
integration with various front-end applications, both web-based and mobile.
The steps include:
1. API creation: Build an API that accepts input data, processes that data using a saved
model, and returns prediction results.
2. Integration with Models: This involves loading serialized models into the API so that
they can make predictions based on the inputs provided.
3. API testing: Perform comprehensive testing to ensure that the API works
correctly, including error handling for invalid input.
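The request-handling logic behind these steps can be sketched independently of the web framework. In the example below the field names, the label convention (1 = TB+), and the placeholder predictor are all assumptions for illustration; the function returns a response body and an HTTP status code that a Flask view could pass on to the client.

```python
from typing import Any, Callable

# Hypothetical feature names; the real assessment form has its own field list.
REQUIRED_FIELDS = ("age_group", "gender", "cough", "fever", "weight_loss")


def handle_predict(
    payload: Any, predict: Callable[[list[float]], int]
) -> tuple[dict[str, Any], int]:
    """Validate a JSON-like payload and return (response body, HTTP status)."""
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}, 400
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return {"error": f"missing fields: {', '.join(missing)}"}, 400
    try:
        features = [float(payload[name]) for name in REQUIRED_FIELDS]
    except (TypeError, ValueError):
        return {"error": "all fields must be numeric"}, 400
    label = predict(features)
    # Assumed label convention: 1 = TB+, 0 = TB-.
    return {"prediction": "TB+" if label == 1 else "TB-"}, 200


def dummy_predict(features: list[float]) -> int:
    """Placeholder for the loaded model; not the study's classifier."""
    return int(features[2] == 1 and features[4] >= 2)


valid = {"age_group": 2, "gender": 1, "cough": 1, "fever": 1, "weight_loss": 3}
print(handle_predict(valid, dummy_predict))
print(handle_predict({"age_group": 2}, dummy_predict))
print(handle_predict("not json", dummy_predict))
```

In a Flask application, a view registered for POST requests could call this function with the parsed JSON body and the loaded model's prediction function, then return the body and status code as the response.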
API Publication
Once the API has been successfully created and tested, the next step is to publish
or host it so that users can access it. This is done using cloud services.
Publication steps include:
1. Server Setup: Configure and set up servers in the cloud service.
2. API Deployment: Upload API code to the server and configure the server environment
to run it.
3. Security and Access Settings: Set up security protocols such as HTTPS and set up API
access.
4. Monitoring and Maintenance: Monitor API performance and perform maintenance
regularly to ensure that the API continues to operate optimally.
Integration with Client Applications
The published API can be integrated with client applications, in this case a website
platform. This integration allows users to enter the data on which the model will make a
prediction and receive the prediction results directly through the user interface.
Conclusion
Machine Learning Model Development (MLP and ELM): This research
successfully developed two machine learning models, Multilayer Perceptron (MLP) and
Extreme Learning Machine (ELM), for the classification of Tuberculosis (TB) based on
clinical data. The MLP model shows strong performance with an accuracy of
95.00% without SMOTE implementation and 94.00% with SMOTE. On the other hand,
the ELM model showed a significant improvement in its performance after the
implementation of SMOTE, with accuracy increasing from 67.33% to 81.33%. These
findings signal success in developing effective models for early detection of TB.
Model Effectiveness in TB Detection: In the context of TB detection effectiveness,
MLP models without SMOTE show superior results. This model successfully achieves
high precision and recall balance for both classes (TB- and TB+), which is very important
in medical contexts. ELM, although showing improvement with SMOTE, remains less
effective than MLP. These findings underscore the importance of selecting appropriate
models in clinical applications for optimal outcomes.
Data Synthesis to Address Data Imbalance: This study also explores the
effectiveness of the Synthetic Minority Over-sampling Technique (SMOTE) in
addressing data imbalance. SMOTE significantly improved the performance of the ELM
model, demonstrating its effectiveness in overcoming class imbalance in the dataset.
Although the application of SMOTE to the MLP model resulted in a slight decrease in
accuracy, the model still showed outstanding performance, confirming the importance of
techniques such as SMOTE in the development of machine learning models.
MLP and ELM Performance Comparison: In the comparison between MLP and ELM models, the
findings show that MLP without SMOTE has the best performance with 95.00%
accuracy, followed by MLP with SMOTE (94.00%) and ELM with SMOTE (81.33%).
Meanwhile, ELM without SMOTE showed the lowest accuracy (67.33%), which
highlights the importance of addressing class imbalances in model development. These
conclusions clarify the advantages of MLP in TB classification and its potential for
implementation in clinical environments.
Bibliography
Ahmed, S. Nafees, & Prakasam, P. (2023). A systematic review on intracranial aneurysm
and hemorrhage detection using machine learning and deep learning techniques.
Progress in Biophysics and Molecular Biology.
Al Amien, Januar, Rizki, Yoze, & Nasution, Mukhlis Ali Rahman. (2022). Implementasi
Adasyn Untuk Imbalance Data Pada Dataset UNSW-NB15 Adasyn
Implementation For Data Imbalance on UNSW-NB15 Dataset. Jurnal CoSciTech
(Computer Science and Information Technology), 3(3), 242-248.
Alwarthan, Sarah, Aslam, Nida, & Khan, Irfan Ullah. (2022). An explainable model for
identifying at-risk student at higher education. IEEE Access, 10, 107649-107668.
https://doi.org/10.1109/ACCESS.2022.3211070
Arumnisaa, Ressa Isnaini, & Wijayanto, Arie Wahyu. (2023). Comparison of Ensemble
Learning Method: Random Forest, Support Vector Machine, AdaBoost for
Classification Human Development Index (HDI). Sistemasi: Jurnal Sistem
Informasi, 12(1), 206-218.
Baharuddin, Mus Mulyadi, Azis, Huzain, & Hasanuddin, Tasrif. (2019). Analisis
Performa Metode K-Nearest Neighbor Untuk Identifikasi Jenis Kaca. ILKOM
Jurnal Ilmiah, 11(3), 269-274.
Bikku, Thulasi. (2020). Multi-layered deep learning perceptron approach for health risk
prediction. Journal of Big Data, 7(1), 50.
Charles, M. Katherine, Lindegren, Mary Lou, Wester, C. William, Blevins, Meridith,
Sterling, Timothy R., Dung, Nguyen Thi, Dusingize, Jean Claude, Avit-Edi, Divine,
Durier, Nicolas, & Castelnuovo, Barbara. (2016). Implementation of tuberculosis
intensive case finding, isoniazid preventive therapy, and infection control (“three
I’s”) and HIV-tuberculosis service integration in lower income countries. PloS One,
11(4), e0153243.
Erlin, Erlin, Desnelita, Yenny, Nasution, Nurliana, Suryati, Laili, & Zoromi, Fransiskus.
(2022). Dampak SMOTE terhadap Kinerja Random Forest Classifier berdasarkan
Data Tidak seimbang. MATRIK: Jurnal Manajemen, Teknik Informatika Dan
Rekayasa Komputer, 21(3), 677-690.
Gao, Cong, Killeen, Benjamin D., Hu, Yicheng, Grupp, Robert B., Taylor, Russell H.,
Armand, Mehran, & Unberath, Mathias. (2023). Synthetic data accelerates the
development of generalizable learning-based algorithms for X-ray image analysis.
Nature Machine Intelligence, 5(3), 294-308.
Huang, Guang Bin, Wang, Dian Hui, & Lan, Yuan. (2011). Extreme learning machines:
a survey. International Journal of Machine Learning and Cybernetics, 2, 107-122.
Kavvas, Erol S., Catoiu, Edward, Mih, Nathan, Yurkovich, James T., Seif, Yara, Dillon,
Nicholas, Heckmann, David, Anand, Amitesh, Yang, Laurence, & Nizet, Victor.
(2018). Machine learning and structural analysis of Mycobacterium tuberculosis
pan-genome identifies genetic signatures of antibiotic resistance. Nature
Communications, 9(1), 4306.
Peralez-González, Carlos, Pérez-Rodríguez, Javier, & Durán-Rosal, Antonio M. (2023).
Boosting ridge for the extreme learning machine globally optimised for
classification and regression problems. Scientific Reports, 13(1), 11809.
Rashidi, Hooman H., Khan, Imran H., Dang, Luke T., Albahra, Samer, Ratan, Ujjwal,
Chadderwala, Nihir, To, Wilson, Srinivas, Prathima, Wajda, Jeffery, & Tran, Nam
K. (2022). Prediction of tuberculosis using an automated machine learning platform
for models trained on synthetic data. Journal of Pathology Informatics, 13, 100172.
https://doi.org/10.4103/jpi.jpi_75_21
Yeo, Yee Jin, Balakrishnan, Arun Seeralan, Selvaperumal, Sathish Kumar, & Nor, Illanur
Muhaini Binti Mohd. (2022). Android Controlled Fire Fighter Robot Using IoT.
2022 Sixth International Conference on I-SMAC (IoT in Social, Mobile, Analytics
and Cloud)(I-SMAC), 3242. IEEE.