p–ISSN: 2723 - 6609 e-ISSN: 2745-5254

Vol. 5, No. 5 Mei 2024 http://jist.publikasiindonesia.id/

Jurnal Indonesia Sosial Teknologi, Vol. 5, No. 5, Mei 2024 2311

Classification of Malaria Types Using Naïve Bayes Classification

Hadin La Ariandi, Arief Setyanto, Sudarmawan

Universitas Amikom Yogyakarta, Indonesia

Email: [email protected], [email protected], [email protected]

*Correspondence

ABSTRACT

This study was conducted to determine the accuracy of the naïve Bayes classification method in determining the type of malaria. The method predicts the malaria category based on the symptoms displayed. The dataset was divided into 60% for training and 40% for testing. The results showed that the naïve Bayes algorithm had an accuracy of 99.8% in predicting malaria categories. Model performance evaluation using the confusion matrix and the ROC curve also showed promising results, with a classification accuracy of 0.998, an error of 0.002, and an AUC of 0.999. The classification report shows that the Quartana, Tertiana, and Tropica categories obtain higher precision and F1-score values than the Ovale category. These results show that the naïve Bayes classification method is effective in classifying types of malaria and can be used to support malaria diagnosis.

Keywords: Naive Bayes Classification; Malaria Type Classification; Expert System for Malaria Diagnosis.

Introduction

Malaria is a disease caused by infection with protozoa of the genus Plasmodium and is easily recognised by signs of fever, cold, and recurring chills (Dinata, 2018). It is one of the most widespread mosquito-borne diseases (Madhusudan, 2020) and is transmitted through mosquito vectors of the genus Anopheles (Alviyanil’Izzah et al., 2021). Malaria is still a threat to public health, especially for people living in remote areas. This is reflected in Presidential Regulation Number 2 of 2015 concerning the National Medium-Term Development Plan (RPJMN) for 2015-2019, in which malaria is a priority disease to be overcome. RPJMN IV for 2020-2024 likewise states that the prevalence of major infectious diseases, malaria among them, is still high and is accompanied by the threat of emerging diseases due to high population mobility, which affects the degree of public health (Ramadhan & Khoirunnisa, 2021). This commitment to malaria control is expected to be a shared concern nationally, regionally, and globally, as resolved at the 60th World Health Assembly (WHA) in Geneva in 2007 on malaria elimination (Prajarini, 2016).

According to the World Health Organization (WHO), malaria can be classified into five types by the parasite that causes it: Plasmodium falciparum, which causes tropical malaria; Plasmodium vivax, which causes tertian malaria; Plasmodium ovale, which causes ovale malaria; Plasmodium malariae, which causes quartan malaria; and Plasmodium knowlesi, which also causes malaria (Madhusudan, 2020). Malaria is categorised as a disease with a reasonably high mortality rate. The WHO recorded 229 million malaria cases and 409,000 deaths in 2019. The areas at risk are mainly in Africa, but Southeast Asia, the Western Pacific, and the Mediterranean are also listed as areas at risk. Each country strives to overcome malaria cases by referring to the commitment made at the 60th World Health Assembly (WHA) in 2007 regarding malaria elimination (Jiang et al., 2021).

The objectives of this study are:

1. To determine the accuracy of the naïve Bayes classification method in determining the type of malaria.

2. To measure the accuracy and performance of malaria type classification using the naïve Bayes algorithm.

3. To establish whether the naïve Bayes classification method classifies malaria types effectively.

Research Benefits

It is hoped that the results of this research will be useful and will add to scientific insight. The benefits of conducting this research are as follows:

1. Easing and assisting the work of medical professionals in classifying types of malaria.

2. Providing information on the level of accuracy of the malaria classification process.

3. Adding insight for readers who want to learn naïve Bayes classification.

Research Methods

The researchers use quantitative research, in which mathematical calculations are used to achieve the desired results. In this case, the Naïve Bayes algorithm was applied to the dataset to classify the malaria cases recorded at each Puskesmas (community health centre) in Irian Jaya.

Nature of Research

The research is experimental. It conducts experiments with the Naïve Bayes algorithm under different data splits to obtain accuracy results and parameters. The accuracy results obtained can then be used to support decisions about classifying the type of malaria.

Research Approach

The research approach is quantitative, and the researchers conduct the research according to the stages of the research flow that has been defined.

Data Collection Methods

The data used in this study were obtained directly from the Darun Nahdla Capita Sharia Cooperative and are private data that have not been used in previous studies. They consist of cooperative customer records from 2020 to 2022, totalling 166 data points with 10 variables: gender, marital status, occupation, dependents, income, loan amount, term, interest, instalments, and categories.

Data Analysis Methods

The data analysis in this study is quantitative and follows the stages of the Knowledge Discovery in Databases (KDD) process, using Excel and Orange as software tools, as follows:

Research Flow

Figure 1 Research Flow

Results and Discussion

Preprocessing Data

The data preprocessing stage cleans duplicate data, missing values, and outliers from the dataset so that the data are valid for processing. At this stage, data transformation is also carried out by identifying variables that contribute no information to the prediction and by converting object-type data into integer form to make the data easier to process. The preprocessing uses Jupyter Notebook with the Python programming language (Lestari et al., 2018).
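As an illustration of the cleaning and transformation steps just described, the following is a minimal sketch using pandas. The small data frame, its column names and its values are hypothetical and are not taken from the research dataset; the sketch only shows one possible way to drop duplicates, count missing values and convert object-type columns to integers.

```python
import pandas as pd

# Hypothetical toy data; the real dataset is not reproduced here.
toy = pd.DataFrame({
    "Gender": ["L", "P", "L", "L"],
    "Symptom1": ["Fever", "Chills", "Fever", None],
    "Humidity": [80, 75, 80, 90],
})

# 1. Remove exact duplicate rows (the third row repeats the first).
toy = toy.drop_duplicates().reset_index(drop=True)

# 2. Count missing values per column.
missing = toy.isnull().sum()

# 3. Convert each non-numeric (object-type) column to integer codes.
#    pd.factorize assigns -1 to missing values.
for col in toy.columns:
    if not pd.api.types.is_numeric_dtype(toy[col]):
        codes, _ = pd.factorize(toy[col])
        toy[col] = codes

print(missing)
print(toy)
```

Integer codes produced this way carry no natural order, so whether they suit a given classifier is a separate modelling decision.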

The first step is to import the libraries used to load and display the dataset, namely NumPy and pandas, along with Matplotlib and Seaborn, as shown in the code below.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

The second step is to load the CSV dataset into a data frame with the pd.read_csv function and display it. The code is shown below and the output in Figure 2.

filecsv = 'Dataset_Patient_Malaria.CSV'
teks = pd.read_csv(filecsv, header=0, delimiter=';', encoding='utf-8')
df = pd.DataFrame(teks)
print(df)
df.head()

output:

Figure 2 Import Research Dataset

Figure 2 shows the 37 variables in the dataset used in this study. Several of them are unnecessary, such as No, province, district, health facility, and patient name.

The third step deletes the columns that are not needed in the subsequent steps.

# columns that carry no information for the classification
columns = ['No.', 'Provinsi ', 'Kabupaten', 'Fasyankes', 'Nama Pasien']
# drop() returns a new data frame, so reassign it instead of using inplace=True
df = df.drop(columns, axis=1)
list(df.columns)

After the unneeded columns are deleted, the following columns are used in the next steps: type of discovery, number, month/year, gender, pregnant/not pregnant, hamlet address, village/kelurahan, type of parasite, symptoms 1 to 10, livestock sheds, leaving the house at night, use of mosquito repellent, ventilation gauze, puddles, history of living in endemic areas, use of mosquito nets, walls, condition of the house ceiling, mosquito breeding grounds, air temperature (°C), humidity (%), rainfall (mm), and malaria diagnosis (Shofia, Putri, & Arwan, 2017).

The fourth step separates the variables into categorical and numerical variables using the following code:

# define the categorical variables (object dtype)
categorical = [var for var in df.columns if df[var].dtype == 'O']

Output:

['Discovery Type', 'Month/Year', 'Gender', 'Pregnant/Not Pregnant', 'Dusun_Alamat',

'Village Village', 'Parasite Type', 'Symptoms1', 'Symptoms2', 'Symptoms3', 'Symptoms4',

'Symptoms5', 'Symptoms6', 'Symptoms7', 'Symptoms8', 'Symptoms9', 'Symptoms10',

'Kandang_Ternak', 'Night rumah_pada Exit', 'Mosquito Obat_Anti Use',

'Kassa_Ventilasi', 'Genangan_Air', 'History of tinggal_di endemic areas',

'Penggunaan_Kelambu', 'Walls', 'House sky conditions', 'Mosquito Breeding Sites',

'Diagnosa_Malaria']

# define the numerical variables (any dtype other than object)
numerical = [var for var in df.columns if df[var].dtype != 'O']

output:

['Number', 'Air Temperature (°C)', 'Humidity (%)', 'Rainfall (mm)']

Next, data cleaning is performed to handle duplicate data, unused variables, missing values and outliers. The code is shown below and the output in Figures 3 and 4.

df[categorical].isnull().sum()

df[numerical].isnull().sum()

Figure 3 Missing-Value Check for the Categorical Variables

Figure 3 shows that the categorical variables contain no null values except in the symptom variables, which may be empty because a patient may present only some of the symptoms.

Figure 4 Missing-Value Check for the Numerical Variables

Figure 4 shows that the numerical variables contain no null values: every column has zero nulls. Because the only nulls are in the symptom variables, the dataset is reasonably clean and requires no special steps to handle missing values (Fajar et al., 2018).

Next, the predictor (independent) variables and the target (dependent) variable are defined. The predictors are type of discovery, number, month/year, gender, pregnant/not pregnant, hamlet address, village/kelurahan, type of parasite, symptoms 1 to 10, livestock shed, leaving the house at night, use of mosquito repellent, ventilation gauze, puddles, history of living in endemic areas, use of mosquito nets, walls, condition of the house ceiling, mosquito breeding site, air temperature (°C), humidity (%), and rainfall (mm). The target is the malaria diagnosis. The variables are selected by column: X contains all predictor variables and y contains the target variable. The code is shown below and the output in Figures 5 and 6.

# Define the predictor variables (X) and the target variable (y)
X = df.drop(['Diagnosa_Malaria'], axis=1)
y = df['Diagnosa_Malaria']

# Display the predictor variables and the target variable
print(X)
print(y)

Output X:

Figure 5 Predictor Variables (X)

Output y:

Figure 6 Target Variable (y)

The output above shows that the predictor set (X) consists of 31 variables and that the target variable (y) is the malaria diagnosis.

1. Correlation of the predictor variables with the target variable

The correlation between each predictor variable and the target variable is calculated to determine how much influence the predictor has on the target (Shofia et al., 2017). The correlation of each predictor with the target can be seen in Table 1 below.

Table 1
Correlation Between Variables

No.  Variable                              Result
1    Type of Discovery
2    Month/Year                            -0.043122
3    Gender                                -0.045759
4    Pregnant/Not Pregnant                 0.013941
5    Hamlet Address                        -0.014255
6    Village/Kelurahan                     -0.02373
7    Type of Parasite                      0.246237
8    Symptom 1
9    Symptom 2                             -0.071117
10   Symptom 3                             0.208725
11   Symptom 4                             0.037009
12   Symptom 5                             -0.078445
13   Symptom 6                             0.038522
14   Symptom 7                             -0.127901
15   Symptom 8                             0.121262
16   Symptom 9                             -0.23095
17   Symptom 10                            -0.656569
18   Livestock Shed                        -0.03218
19   Leaving the House at Night            -0.016352
20   Use of Mosquito Repellent             -0.027067
21   Ventilation Gauze                     -0.002299
22   Puddles                               -0.031197
23   History of Living in Endemic Areas    -0.016352

Based on Table 1 above, the variables discovery type, month/year, gender, hamlet address, village, symptom 1, symptom 2, symptom 5, symptom 7, symptom 9, symptom 10, livestock sheds, leaving the house at night, use of mosquito repellent, ventilation gauze, puddles, history of living in endemic areas, use of mosquito nets, mosquito breeding sites and rainfall (mm) have negative correlation values with the target variable. Most of these values are close to zero, which indicates only a weak linear relationship with the target (Setiawan & Prihandono, 2019). A negative sign describes the direction of the relationship, not its strength: among the values listed, symptom 10 (-0.656569) has the largest magnitude and is therefore the variable most strongly correlated with the target.
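To make the reading of Table 1 concrete, the sketch below computes a correlation coefficient by hand. It is an illustration only: the paper does not state which correlation measure was used, so the Pearson coefficient is an assumption, and the toy feature columns and the integer-encoded target are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical integer-encoded target (0..3 for four classes).
target = [0, 1, 2, 3, 0, 1, 2, 3]
# A feature that falls as the target rises: perfect negative correlation.
falling = [3, 2, 1, 0, 3, 2, 1, 0]
# A feature unrelated to the target: correlation of zero.
unrelated = [1, 1, 1, 1, 0, 0, 0, 0]

r_falling = pearson(falling, target)
r_unrelated = pearson(unrelated, target)
print(round(r_falling, 3), round(r_unrelated, 3))
```

A coefficient near -1 is as informative for prediction as one near +1; only coefficients near 0 suggest a weak linear relationship.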

Model Testing

The model tested on the research dataset is the naïve Bayes algorithm. Model testing produces the classification report of the model, which gives the classification evaluation metrics precision, recall, F1-score, and accuracy.
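For reference, the decision rule behind a naïve Bayes classifier can be written as follows. The notation is introduced here for illustration and does not appear in the source: C_k denotes one of the malaria classes and x_1, ..., x_n the predictor values of one patient. The Gaussian form of the likelihood is an assumption, suggested only by the variable name gnb in the code listings.

```latex
P(C_k \mid x_1, \dots, x_n) \;\propto\; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k),
\qquad
\hat{y} = \arg\max_{k} \; P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k)

% Gaussian likelihood (assumed), with per-class mean \mu_{ik} and variance \sigma_{ik}^{2}:
P(x_i \mid C_k) = \frac{1}{\sqrt{2\pi\sigma_{ik}^{2}}}
\exp\!\left(-\frac{(x_i - \mu_{ik})^{2}}{2\sigma_{ik}^{2}}\right)
```

The product over the predictors is the "naïve" part: each predictor is treated as conditionally independent of the others given the class.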

Naïve Bayes Algorithm Model Testing

The dataset is tested with the Naïve Bayes algorithm to obtain the classification report and the accuracy of its classifications or predictions. The testing uses Jupyter Notebook with the Python programming language.
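The code listings in this section are truncated in the source: the random_state value and the lines that create and fit gnb are missing. As a hedged illustration of what a complete test of this kind could look like, the sketch below uses scikit-learn's GaussianNB on synthetic data. The synthetic data, the random_state value of 0, the stratified split and the choice of GaussianNB are all assumptions made for the example, not details confirmed by the paper.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the malaria dataset (assumption): 4 classes.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# 90/10 split; random_state=0 and stratify=y are hypothetical choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0, stratify=y
)

gnb = GaussianNB()          # assumed model behind the name `gnb`
gnb.fit(X_train, y_train)   # estimate per-class means and variances
y_pred = gnb.predict(X_test)

akurasi = accuracy_score(y_test, y_pred)
print(classification_report(y_test, y_pred))
print('Accuracy of the model: %0.2f %%' % (akurasi * 100))
```

Changing test_size to 0.20, 0.30, 0.40 or 0.50 gives the other splits used in the tests.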

The code and output for testing the naïve Bayes algorithm with a 90/10 train/test split are shown below.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.10, random_state =

y_pred = gnb.predict(X_test)

from sklearn import metrics

from sklearn.metrics import classification_report

cr1 = classification_report(y_test, y_pred)

akurasi = metrics.accuracy_score(y_test, y_pred)

print(cr1)

print ('The value of accuracy possessed by the model: %0.2f ' %(akurasi*100),'%')

              precision    recall  f1-score   support

       Ovale       1.00      1.00      1.00         5
    Quartana       1.00      0.86      0.92         7
    Tertiana       0.98      1.00      0.99        61
     Tropica       1.00      1.00      1.00        56

    accuracy                           0.99       129
   macro avg       1.00      0.96      0.98       129
weighted avg       0.99      0.99      0.99       129

Accuracy value possessed by the model: 99.22 %

Table 1 (continued)

No.  Variable                              Result
24   Use of Mosquito Nets                  -0.016352
25   Walls                                 0.041548
26   Condition of the House Ceiling        0.02998
27   Mosquito Breeding Place               -0.03218
28   Number                                0.024472
29   Air Temperature (°C)                  0.075598
30   Humidity (%)                          0.050986
31   Rainfall (mm)                         -0.02056

The classification report for the 90/10 split can be interpreted as follows. Precision is the ratio of correctly predicted positive observations to all predicted positives. The precision for the Ovale, Quartana, and Tropica classes is 1.00, which means every instance predicted as one of those classes is correct. The precision for the Tertiana class is 0.98, which means that 98% of the instances predicted as Tertiana are actually Tertiana. Recall is the ratio of correctly predicted positive observations to all actual positives. The recall for the Ovale, Tertiana, and Tropica classes is 1.00, indicating that the model identifies all instances of those classes. The recall for the Quartana class is 0.86, which means the model captures 86% of the actual Quartana instances. The F1-score is the harmonic mean of precision and recall and ranges from 0 to 1, where 1 is the best score. The F1-score for the Ovale and Tropica classes is 1.00, reflecting perfect precision and recall. The F1-score is 0.92 for the Quartana class and 0.99 for the Tertiana class, indicating a somewhat lower balance between precision and recall for these two classes. Support is the actual number of occurrences of each class in the test set: 5 Ovale, 7 Quartana, 61 Tertiana, and 56 Tropica. The overall accuracy is 99.22%, the ratio of correctly predicted instances to all instances. Overall, the model performs well, especially for the Ovale, Tertiana, and Tropica classes. For the Quartana class the precision is perfect but the recall is lower, showing some difficulty in capturing all Quartana instances (Shen & Shafiq, 2020).
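The figures in the 90/10 report can be checked by hand. The sketch below recomputes precision, recall and F1-score from a confusion matrix. The matrix itself is not given in the paper; the one used here is an assumption, chosen as the simplest matrix consistent with the reported supports and scores (a single Quartana case predicted as Tertiana).

```python
# Rows are actual classes, columns are predicted classes.
labels = ["Ovale", "Quartana", "Tertiana", "Tropica"]
# Assumed confusion matrix: one Quartana case predicted as Tertiana.
cm = [
    [5, 0, 0, 0],
    [0, 6, 1, 0],
    [0, 0, 61, 0],
    [0, 0, 0, 56],
]

def class_metrics(cm, k):
    """Return (precision, recall, f1) for class index k."""
    tp = cm[k][k]
    predicted = sum(row[k] for row in cm)   # column total
    actual = sum(cm[k])                     # row total (support)
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

report = {name: class_metrics(cm, i) for i, name in enumerate(labels)}
total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(cm))) / total

for name, (p, r, f) in report.items():
    print(f"{name:9s} {p:.2f} {r:.2f} {f:.2f}")
print(f"accuracy  {accuracy * 100:.2f} %")
```

With this matrix, the Quartana recall is 6/7 and the Tertiana precision is 61/62, which round to the 0.86 and 0.98 shown in the report.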

The code and output for testing the naïve Bayes algorithm with an 80/20 train/test split are shown below.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state =

y_pred = gnb.predict(X_test)

from sklearn import metrics

from sklearn.metrics import classification_report

cr1 = classification_report(y_test, y_pred)

akurasi = metrics.accuracy_score(y_test, y_pred)

print(cr1)

print ('The accuracy value possessed by the model: %0.2f ' %(akurasi*100),'%')

              precision    recall  f1-score   support

       Ovale       0.95      1.00      0.97        18
    Quartana       1.00      1.00      1.00        15
    Tertiana       1.00      0.99      1.00       118
     Tropica       1.00      1.00      1.00       106

    accuracy                           1.00       257
   macro avg       0.99      1.00      0.99       257
weighted avg       1.00      1.00      1.00       257

Accuracy value possessed by the model: 99.61%

The results for the 80% training and 20% testing split can be interpreted as follows. The precision for the Ovale class is 0.95, which means that 95% of the instances predicted as Ovale are actually Ovale. The precision for the Quartana, Tertiana, and Tropica classes is 1.00, meaning every instance predicted as one of those classes is correct. The recall for the Ovale, Quartana, and Tropica classes is 1.00, indicating that the model identifies all instances of those classes. The recall for the Tertiana class is 0.99, which means the model captures 99% of the actual Tertiana instances. The F1-score is the harmonic mean of precision and recall and ranges from 0 to 1, where 1 is the best score. The F1-score for the Quartana, Tertiana and Tropica classes is 1.00, reflecting a good balance between precision and recall. The F1-score for the Ovale class is 0.97, indicating a somewhat lower balance for that class. Support is the actual number of occurrences of each class in the test set: 18 Ovale, 15 Quartana, 118 Tertiana and 106 Tropica. The overall accuracy is 99.61%, the ratio of correctly predicted instances to all instances. Overall, the model performs well, especially for the Tropica and Quartana classes, where precision and recall are both perfect. For the Tertiana class the precision is perfect but the recall is slightly lower, showing some difficulty in capturing all Tertiana instances.

The code and output for testing the naïve Bayes algorithm with a 70/30 train/test split are shown below.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state =

y_pred = gnb.predict(X_test)

from sklearn import metrics

from sklearn.metrics import classification_report

cr1 = classification_report(y_test, y_pred)
akurasi = metrics.accuracy_score(y_test, y_pred)
print(cr1)
print('The value of accuracy possessed by the model: %0.2f ' % (akurasi*100), '%')

              precision    recall  f1-score   support

       Ovale       0.96      1.00      0.98        27
    Quartana       1.00      1.00      1.00        18
    Tertiana       1.00      0.99      1.00       178
     Tropica       1.00      1.00      1.00       163

    accuracy                           1.00       386
   macro avg       0.99      1.00      0.99       386
weighted avg       1.00      1.00      1.00       386

Accuracy value possessed by the model: 99.74%

The results for the 70% training and 30% testing split can be interpreted as follows. The precision for the Ovale class is 0.96, which means that 96% of the instances predicted as Ovale are actually Ovale. The precision for the Quartana, Tertiana, and Tropica classes is 1.00, meaning every instance predicted as one of those classes is correct. The recall for the Ovale, Quartana and Tropica classes is 1.00, indicating that the model identifies all instances of those classes. The recall for the Tertiana class is 0.99, which means the model captures 99% of the actual Tertiana instances. The F1-score is the harmonic mean of precision and recall and ranges from 0 to 1, where 1 is the best score. The F1-score for the Quartana, Tertiana and Tropica classes is 1.00, reflecting a good balance between precision and recall, while the F1-score for the Ovale class is 0.98. Support is the actual number of occurrences of each class in the test set: 27 Ovale, 18 Quartana, 178 Tertiana and 163 Tropica. The overall accuracy is 99.74%, the ratio of correctly predicted instances to all instances. The model performs well, especially for the Quartana and Tropica classes, where precision and recall are both perfect. For the Tertiana class the precision is perfect but the recall is slightly lower, showing some difficulty in capturing all Tertiana instances.

The code and output for testing the naïve Bayes algorithm with a 60/40 train/test split are shown below.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.40, random_state =

y_pred = gnb.predict(X_test)

from sklearn import metrics

from sklearn.metrics import classification_report

cr1 = classification_report(y_test, y_pred)

akurasi = metrics.accuracy_score(y_test, y_pred)

print(cr1)

print('The value of accuracy possessed by the model: %0.2f ' %(akurasi*100),'%')

              precision    recall  f1-score   support

       Ovale       0.97      1.00      0.99        36
    Quartana       1.00      1.00      1.00        23
    Tertiana       1.00      1.00      1.00       242
     Tropica       1.00      1.00      1.00       213

    accuracy                           1.00       514
   macro avg       0.99      1.00      1.00       514
weighted avg       1.00      1.00      1.00       514

The accuracy value possessed by the model is 99.81 %

The results for the 60% training and 40% testing split can be interpreted as follows. The precision for the Ovale class is 0.97, which means that 97% of the instances predicted as Ovale are actually Ovale. The precision for the Quartana, Tertiana, and Tropica classes is 1.00, meaning every instance predicted as one of those classes is correct. The recall for the Ovale, Quartana, Tertiana, and Tropica classes is 1.00, indicating that the model identifies all instances of those classes. The F1-score is the harmonic mean of precision and recall and ranges from 0 to 1, where 1 is the best score. The F1-score for the Quartana, Tertiana and Tropica classes is 1.00, reflecting a good balance between precision and recall, while the F1-score for the Ovale class is 0.99. Support is the actual number of occurrences of each class in the test set: 36 Ovale, 23 Quartana, 242 Tertiana and 213 Tropica. The overall accuracy is 99.81%, the ratio of correctly predicted instances to all instances. Overall, the model performs well, especially for the Quartana, Tertiana and Tropica classes, where precision and recall are both perfect.

The code and output for testing the naïve Bayes algorithm with a 50/50 train/test split are shown below.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.50, random_state =

y_pred = gnb.predict(X_test)

from sklearn import metrics

from sklearn.metrics import classification_report

cr1 = classification_report(y_test, y_pred)

akurasi = metrics.accuracy_score(y_test, y_pred)

print(cr1)

print ('The value of accuracy possessed by the model: %0.2f ' %(akurasi*100),'%')

              precision    recall  f1-score   support

       Ovale       0.98      1.00      0.99        44
    Quartana       1.00      1.00      1.00        31
    Tertiana       1.00      1.00      1.00       299
     Tropica       1.00      1.00      1.00       268

    accuracy                           1.00       642
   macro avg       0.99      1.00      1.00       642
weighted avg       1.00      1.00      1.00       642

Accuracy value possessed by the model: 99.84%

The results for the 50% training and 50% testing split can be interpreted as follows. The precision for the Ovale class is 0.98, which means that 98% of the instances predicted as Ovale are actually Ovale. The precision for the Quartana, Tertiana, and Tropica classes is 1.00, meaning every instance predicted as one of those classes is correct. The recall for the Ovale, Quartana, Tertiana, and Tropica classes is 1.00, indicating that the model identifies all instances of those classes. The F1-score is the harmonic mean of precision and recall and ranges from 0 to 1, where 1 is the best score. The F1-score for the Quartana, Tertiana and Tropica classes is 1.00, reflecting a good balance between precision and recall, while the F1-score for the Ovale class is 0.99. Support is the actual number of occurrences of each class in the test set: 44 Ovale, 31 Quartana, 299 Tertiana and 268 Tropica. The overall accuracy is 99.84%, the ratio of correctly predicted instances to all instances. Overall, the model performs well, especially for the Quartana, Tertiana and Tropica classes, where precision and recall are both perfect.

Based on the classification results of the Naïve Bayes algorithm, the classification reports show that the Quartana, Tertiana and Tropica categories are classified more reliably than the Ovale category: in the 80/20, 70/30, 60/40 and 50/50 splits, the precision and F1-score values of the Quartana, Tertiana and Tropica categories are higher than those of the Ovale category. The highest accuracy, 99.84%, was obtained in the fifth test with a 50/50 data division. More details can be seen in Table 2 below.

Table 2
Classification Report Naïve Bayes

Algorithm             Category   Precision  Recall  F1-Score  Support  Accuracy
Naïve Bayes (90/10)   Ovale      1.00       1.00    1.00      5        99.22%
                      Quartana   1.00       0.86    0.92      7
                      Tertiana   0.98       1.00    0.99      61
                      Tropica    1.00       1.00    1.00      56
Naïve Bayes (80/20)   Ovale      0.95       1.00    0.97      18       99.61%
                      Quartana   1.00       1.00    1.00      15
                      Tertiana   1.00       0.99    1.00      118
                      Tropica    1.00       1.00    1.00      106
Naïve Bayes (70/30)   Ovale      0.96       1.00    0.98      27       99.74%
                      Quartana   1.00       1.00    1.00      18
                      Tertiana   1.00       0.99    1.00      178
                      Tropica    1.00       1.00    1.00      163
Naïve Bayes (60/40)   Ovale      0.97       1.00    0.99      36       99.81%
                      Quartana   1.00       1.00    1.00      23
                      Tertiana   1.00       1.00    1.00      242
                      Tropica    1.00       1.00    1.00      213
Naïve Bayes (50/50)   Ovale      0.98       1.00    0.99      44       99.84%
                      Quartana   1.00       1.00    1.00      31
                      Tertiana   1.00       1.00    1.00      299
                      Tropica    1.00       1.00    1.00      268

Table 2 shows that the naïve Bayes algorithm obtained its highest accuracy, 99.84%, in the fifth test, with a 50/50 data division.
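A quick arithmetic check helps put the differences between the five splits in perspective. The sketch below takes the accuracy and test-set size reported for each split and computes the implied number of misclassified test cases. It assumes that accuracy is the fraction of correctly classified test cases, as the text defines it; the variable names are illustrative.

```python
# (split, reported accuracy in %, test-set size) as reported for each test.
results = [
    ("90/10", 99.22, 129),
    ("80/20", 99.61, 257),
    ("70/30", 99.74, 386),
    ("60/40", 99.81, 514),
    ("50/50", 99.84, 642),
]

implied_errors = {}
for split, accuracy_pct, support in results:
    # misclassified = (1 - accuracy) * number of test cases, rounded
    errors = round((1 - accuracy_pct / 100) * support)
    implied_errors[split] = errors
    print(f"{split}: about {errors} misclassified out of {support}")
```

Under this assumption every split corresponds to roughly one misclassified test case, so the rise from 99.22% to 99.84% appears to reflect mainly the growing size of the test set.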

Evaluation

At this stage, the Naïve Bayes algorithm was evaluated using the confusion matrix and the Receiver Operating Characteristic (ROC) curve to determine the model's performance for each data split, using Jupyter Notebook with the Python programming language.

Based on the confusion matrix evaluation for the 90/10 split, the accuracy of the naïve Bayes model is 0.992 and the classification error is 0.008. The model was then evaluated using the ROC curve, which visualises classification performance by plotting the True Positive Rate against the False Positive Rate and so summarises the performance of the model in general.

Based on the ROC curve for this split, the Area Under the Curve (AUC) of the naïve Bayes algorithm is 0.976, which falls in the excellent classification category.

Based on the confusion matrix evaluation for the 60/40 and 50/50 splits, the accuracy of the naïve Bayes model is 0.998 and the classification error is 0.002. The ROC curve was again used to visualise the performance of the classification model in terms of the True Positive Rate and False Positive Rate and to summarise the performance of the model in general.
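To illustrate how an AUC value relates to the True Positive Rate and False Positive Rate, the sketch below computes the AUC for a single class treated one-vs-rest, using the equivalent rank formulation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The scores and labels are hypothetical, and the paper does not state how the multi-class AUC was aggregated, so this illustrates the idea only and is not the authors' procedure.

```python
def auc_one_vs_rest(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for lab, s in zip(labels, scores) if lab == 1]
    negatives = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities for one class (1 = that class, 0 = rest).
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.90, 0.40, 0.60, 0.30, 0.20, 0.10, 0.05]

auc = auc_one_vs_rest(labels, scores)

# Classification error is the complement of classification accuracy.
accuracy = 0.998
error = round(1 - accuracy, 3)
print(auc, error)
```

In this toy case 14 of the 15 positive-negative pairs are ranked correctly, so the AUC is 14/15; an AUC of 0.999 would mean that almost every such pair is ranked correctly.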

Table 3 below shows the results of the performance evaluation of the Naïve Bayes

algorithm model using the confusion matrix and the ROC curve.

Table 3
Confusion Matrix and ROC Curve Evaluation of Naïve Bayes

                       Confusion Matrix                                  ROC Curve
Algorithm              Classification Accuracy   Classification Error   AUC
Naïve Bayes (90/10)    0.992                     0.008                  0.976
Naïve Bayes (80/20)    0.996                     0.004                  0.999
Naïve Bayes (70/30)    0.997                     0.003                  0.999
Naïve Bayes (60/40)    0.998                     0.002                  0.999
Naïve Bayes (50/50)    0.998                     0.002                  0.999

Best Results

Based on the data processing carried out in Jupyter Notebook with the Python programming language, in which the naïve Bayes algorithm classifies the Tertiana, Tropica, Quartana and Ovale categories of malaria diagnosis, the 60/40 and 50/50 dataset divisions give the highest accuracy, at 99.8%. In the confusion matrix evaluation, these divisions obtained a classification accuracy of 0.998 and a classification error of 0.002. The ROC curve evaluation, which is based on the True Positive Rate and False Positive Rate, gave an AUC of 0.999.

Conclusion

Based on the results of this research on malaria diagnosis using the Naïve Bayes algorithm, the following conclusions can be drawn:

1. The classification results of the Naïve Bayes algorithm have an accuracy of 99.8%.

2. The performance evaluation of the Naïve Bayes model using the confusion matrix and the ROC curve gives a classification accuracy of 0.998, an error of 0.002, and an AUC of 0.999.

3. The classification report shows that the Quartana, Tertiana and Tropica categories are classified more reliably than the Ovale category, because the precision and F1-score values in the Quartana, Tertiana and Tropica categories are higher than those in the Ovale category in most of the splits tested.

Bibliography

Alviyanil’Izzah, Nur, Martia, Dina Yeni, Imaculata, Maria, Hidayatullah, Moh Iqbal,

Pradana, Andhika Bagus, Setiyani, Diyah Ayu, & Sapuri, Enes. (2021). Analisis

Teknikal Pergerakan Harga Saham Dengan Menggunakan Indikator Stochastic

Oscillator Dan Weighted Moving Average. Keunis, 9(1), 36–53.

https://doi.org/10.32497/keunis.v9i1.2307

Dinata, A. (2018). Bersahabat dengan Nyamuk: Jurus Jitu Atasi Penyakit Bersumber

Nyamuk. Arda Publishing House.

Fajar, Riyant, Perdana, Rizal Setya, & Indriati, Indriati. (2018). Implementasi Metode

Naïve Bayes Dengan Perbaikan Missing Value Menggunakan Metode Nearest

Neighbor Imputation Studi Kasus: Penyakit Malaria Di Kabupaten Malang. Jurnal

Pengembangan Teknologi Informasi Dan Ilmu Komputer, 2(8), 2430–2434.

Jiang, G., Liu, Fen, Liu, Wenping, Shan, Chen, Yufeng, & Xu, Dongming. (2021). Effects

of information quality on information adoption on social media review platforms:

The moderating role of perceived risk. Data Science and Management, 1(1), 13–22.

Lestari, Indri Dwi, Setiadi, Tedy, & Zahrotun, Lisna. (2018). Penerapan Data Mining

Menggunakan Metode Naïve Bayes Untuk Klasifikasi Tindakan Jenis Abortus Di

Rsud Duta Mulya. Jurnal Sarjana Teknik Informatika, 6(2), 60–68.

Madhusudan, Desai Mitesh. (2020). Stock Closing Price Prediction Using Machine

Learning SVM Model. International Journal for Research in Applied Science and

Engineering Technology.

Prajarini, D. (2016). Perbandingan Algoritma Klasifikasi Data Mining Untuk Prediksi

Penyakit Kulit. INFORMAL: Informatics Journal, 1(3), 137–141.

Ramadhan, Nur Ghaniaviyanto, & Khoirunnisa, Azka. (2021). Klasifikasi Data Malaria

Menggunakan Metode Support Vector Machine. Jurnal Media Informatika

Budidarma, 5(4), 1580–1584. https://doi.org/10.30865/mib.v5i4.3347

Setiawan, Aries, & Prihandono, Adi. (2019). Klasifikasi Tingkat Kerentanan Malaria Pada Suatu Wilayah Menggunakan Naïve Bayes Data Mining. VISIKES: Jurnal Kesehatan Masyarakat, 18(1).

Shen, Jingyi, & Shafiq, M. Omair. (2020). Short-term stock market price trend prediction

using a comprehensive deep learning system. Journal of Big Data, 7, 1–33.

Shofia, Elsa Nuramilus, Putri, Rekyan Regasari Mardi, & Arwan, Achmad. (2017).

Sistem Pakar Diagnosis Penyakit Demam: DBD, Malaria dan Tifoid Menggunakan

Metode K-Nearest Neighbor-Certainty Factor. Jurnal Pengembangan Teknologi

Informasi Dan Ilmu Komputer, 1(5), 426–435.