p–ISSN: 2723 - 6609 e-ISSN: 2745-5254

Vol. 5, No. 7 July 2024 http://jist.publikasiindonesia.id/

Jurnal Indonesia Sosial Teknologi, Vol. 5, No. 7, July 2024 3392

Shallot Production Prediction System Using the C4.5 Decision

Tree Algorithm

Aghnie Kurnia Fadhila

Universitas Jendral Achmad Yani, Indonesia

Email:

[email protected]

*Correspondence

ABSTRACT

Keywords: C4.5

algorithm; shallot;

decision tree; data mining.

This research applies the C4.5 algorithm, which is a machine

learning algorithm for classification using decision trees, in

a case study for predicting the performance of shallot

production. The data used includes attributes such as

production yield, land area, and productivity. The C4.5

Decision Tree algorithm is utilized to build an accurate

prediction model after going through data cleaning and

training processes. This study results in an application that

can perform the entire process from initial data processing to

data analysis using the aforementioned technique, making it

efficient and effective in analyzing large amounts of data to

obtain optimal prediction results.

Introduction

Shallots (Allium ascalonicum L) are vegetables that have high commercial value

both in terms of economy and nutrition (Damayanti, 2022). The health benefits of shallots

have been widely felt, and the related industry is booming, leading to an increase in

demand in the domestic market (Nurhayati, Sibuea, Kusbiantoro, Silaban, & Wanto,

2022). The demand for shallots in Indonesia, both as a vegetable and as a seed, continues

to increase by 5% every year as the population grows and consumer interest increases

(Baihaqi, Handayani, & Pujianto, 2019).

Shallot production is one of the crucial agricultural sectors for the Indonesian

economy. Shallots, as one of the main horticultural commodities, have a significant role

in meeting the food needs of the community and making an important contribution to

farmers' income (Priyaungga, Aji, Syahroni, Aji, & Saifudin, 2020). However,

fluctuations in production caused by various factors such as climate change, pest attacks,

and suboptimal cultivation techniques are often a major challenge for farmers and

stakeholders (Hana, 2020).

To overcome these problems, a production prediction system is needed that can

provide accurate and reliable information. With a prediction system, farmers can better

plan their cultivation activities, optimize the use of resources, and minimize the risk of

losses when production deviates from estimates (Kesuma & Kholifah, 2019). Decision

Tree C4.5 works by dividing the dataset into several subsets based on the attributes that

are most significant in influencing the target variable, in this case, the production of

shallots. Each branch in the decision tree represents a specific condition or decision that

leads to the outcome of the prediction.

Based on the trend of increasing demand for shallots in the domestic market, it is

important to analyze this problem to predict production that is likely to increase or

decrease (Wajhillah & Yulianti, 2017). Previous research (Zulkarnain & Marsisno, 2022)

has been conducted on the prediction of shallot harvest production using a simple

linear regression method. The results show a Mean Squared Error (MSE) of 2073311,

a Root Mean Squared Error (RMSE) of 45533, and an R² score of 0.98, or 98%

accuracy (Ghozali & Wibowo, 2019). Although previous studies predicted increased

production, evaluations with different analysis methods could provide a new perspective

on accuracy. Therefore, this study will compare the prediction results with previous

studies to determine significant differences in accuracy or other factors that need to be

considered (Maulana, Martanto, & Ali, 2023).

Research Methods

This research involves several stages in the data mining process. First, data is

collected as the research object. Next, preprocessing is carried out to prepare the data.

Then, the prediction process is carried out by applying the C4.5 Decision Tree algorithm

for testing. Finally, the results are analyzed and reported in the form of publications. The

diagram in Figure 1 shows the overall stages of the research.

Figure 1 Description of the research flow

Data Collection

In this study, data was obtained from BPS (Central Statistics Agency).

Pre-Processing

At this stage, the existing data is transformed into a suitable format to be used as an

object in the research. There are two stages of pre-processing carried out in this study,

namely Data Cleaning and Data Selection.

1. Data Cleaning

Data Cleaning is an important first step to maintaining the quality of data from non-

compliant data sources. This step involves correcting and removing inappropriate data to

make the information obtained more relevant. Incomplete, redundant, or inconsistent data

will be eliminated to make the analysis process easier.

2. Data Selection

The second stage after Data Cleaning is Data Selection, where data that already has

complete information is selected according to the needs of the necessary information. In

this stage, the data is selected from the overall 8 required attributes.

Prediction Process using Decision Tree C4.5

At this stage, data derived from the latest BPS and assessment sources that have

been processed into a data set during the previous pre-processing stage will be examined.

The process and steps of this method are as follows:

Data Training

Record data from BPS and assessments that have gone through the pre-processing

stage and combined into one dataset will be used as training data to train the C4.5 decision

tree algorithm. It aims to produce an accurate model to make predictions.

C4.5 Decision Tree Algorithm

In this stage, the rules of the C4.5 Decision Tree algorithm are used to form a

decision tree model based on training data.

Decision Tree Model Creation

The decision tree model is built from the classification results of the C4.5

Decision Tree algorithm, according to the predetermined tree formation rules.

Data Testing

The data is used to test the performance of a previously trained algorithm when

faced with new data that has never been seen before. The test data will be classified using

the tree model that has been created.

Decision Tree Model Performance Testing

Testing is carried out on the decision tree model by entering test data into the tree

model that has been created, to evaluate the performance of the algorithm that has been

trained.

Test Data Labelling

Once the test data has been classified using the prediction model, the next step is to

label the test data results with one of the categories "less", "adequate", or "very".

Software Design and Development

This stage of software design and development is carried out by applying a

prediction tree model to the software creation process.

Testing and Evaluation

The evaluation is carried out by calculating the level of accuracy and assessing the

extent to which the system has succeeded in generating correct information based on the

model that has been created.
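The accuracy calculation mentioned above can be illustrated with a minimal sketch. It assumes that accuracy means the share of test records whose predicted label equals the actual label; the function name `accuracy` and the sample labels are hypothetical and are not taken from the study.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for five test records (illustration only).
actual_labels = ["less", "adequate", "very", "very", "less"]
predicted_labels = ["less", "adequate", "very", "adequate", "less"]
score = accuracy(actual_labels, predicted_labels)
print(f"accuracy = {score:.2f}")
```

Four of the five hypothetical predictions match, so the sketch reports an accuracy of 0.80.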

Results and Discussion

Analysis of Prediction of Shallot Production Using Decision Tree C4.5 Algorithm

Analysis using the Decision Tree C4.5 algorithm was carried out to predict the yield

of shallots based on historical data and factors that affect it such as weather, soil type, and

cultivation techniques. The methodology includes data collection, data cleaning, splitting

the data into training and test sets, C4.5 model development, model evaluation, optimization, and

interpretation of results. This analysis aims to identify the main factors that affect shallot

production so that it can help farmers and stakeholders in making decisions related to

agricultural practices to increase production efficiently.

Data Collection

In this study, secondary data is taken from various websites to find data that is the

focus of the research. The purpose of these efforts is to ensure that the data used is valid

and supports the smooth and successful conduct of the research. After various search

efforts, finally, the dataset used consisted of 243 data records taken from the Central

Statistics Agency (BPS). By using this dataset, it is hoped that research can be carried out

well and produce accurate and relevant results.

Shallot Dataset

The following is a view of the dataset taken from the Central Statistics Agency, which

has 8 attributes, as shown in Table 1 (Shallot Dataset).

Table 1

Shallot Dataset

Columns: Province code | Province name | City/district code | Area | Shallot production | Year | Parameter

(Each row lists only the cell values that are legible in the source; the remaining cells are omitted.)

West Java | 2013 | Less
West Java | 285 | 2013 | Very
West Java | 183 | 2013 | Enough
West Java | 2915 | 31682 | 2013 | Very
West Java | 1967 | 19728 | 2013 | Very
West Java | 2013 | Enough
West Java | 2013 | Less
West Java | 237 | 2218 | 2013 | Enough
West Java | 3658 | 36449 | 2013 | Enough
West Java | 2150 | 23683 | 2013 | Very
West Java | 204 | 2013 | Enough
West Java | 197 | 950 | 2013 | Less
West Java | 2013 | Less
West Java | 2013 | Less
West Java | 2013 | Less
West Java | 2013 | Enough
West Java | 2013 | Enough
West Java | 2013 | Less
West Java | 2013 | Less
West Java | 2013 | Less

Data Cleaning

In the data cleaning stage, identifying problem values is an important step to ensure the quality of the data used in the analysis. In this study, the data-cleaning process is therefore carried out by identifying empty values. The shallot dataset obtained from the Central Statistics Agency was checked for missing values, and attributes that are not needed were removed from it. This data-cleaning process makes the data used in the analysis more accurate and reliable. In addition, a proper data cleaning process ensures that the results of the analysis are of good quality and can be accounted for.

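Columns: City/district code | Area | Shallot production | Parameter

(Each row lists only the cell values that are legible in the source; the remaining cells are omitted.)

less
285 | very
183 | enough
2915 | 31682 | very
1967 | 19728 | very
enough
237 | 2218 | enough
3658 | 36449 | enough
2150 | 23683 | very
204 | enough
197 | 950 | less
enough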
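The cleaning step described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the study's implementation: records are held as dictionaries, a missing value is `None` or a blank string, and the attribute names `id`, `luas`, `tahun`, and `parameter` as well as all sample values are hypothetical (only `nama_provinsi`, `kode_kabupaten_kota`, and `produksi_bawang_merah` are modelled on names that appear in the text).

```python
# Attributes kept after cleaning (assumed to match the four columns shown above).
KEPT_ATTRIBUTES = ["kode_kabupaten_kota", "luas", "produksi_bawang_merah", "parameter"]


def is_missing(value):
    """Treat None and blank strings as missing values."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def clean(records, kept=KEPT_ATTRIBUTES):
    """Keep only the selected attributes and drop records with any missing value."""
    cleaned = []
    for record in records:
        row = {name: record.get(name) for name in kept}
        if not any(is_missing(v) for v in row.values()):
            cleaned.append(row)
    return cleaned


# Hypothetical raw records (illustration only, not the BPS data).
raw = [
    {"id": 1, "nama_provinsi": "West Java", "kode_kabupaten_kota": 3201,
     "luas": 120, "produksi_bawang_merah": 1100, "tahun": 2013, "parameter": "enough"},
    {"id": 2, "nama_provinsi": "West Java", "kode_kabupaten_kota": 3202,
     "luas": None, "produksi_bawang_merah": 300, "tahun": 2013, "parameter": "less"},
]
cleaned = clean(raw)
print(cleaned)
```

In this sketch the second record is dropped because its `luas` value is missing, and the remaining record keeps only the four selected attributes. A value of 0 is deliberately not treated as missing.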

Data Transformation

In this data analysis process, Data Transformation is carried out to facilitate data

processing and analysis more effectively. Data Transformation is carried out by changing

data variables into numerical data forms.

The dataset used in this study consists of 8 attributes which include ID, province

code, province name, city district code, Area, shallot production, year, and Parameters.

Of the datasets, 6 attributes have numerical data types, while the other two attributes have

nominal data types. To overcome this, this study transforms data by converting two

nominal attributes into numerical ones.

Both the nama_provinsi attribute and the Parameter attribute were changed to numeric.

For the Parameter attribute, which originally consisted of less, sufficient, and very, less

was replaced with a value of 0, sufficient with a value of 1, and very with a value of 2.

This altered data is then used in the research to facilitate analysis and testing.
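The mapping of the Parameter attribute can be sketched as a simple lookup. The codes 0, 1, and 2 come from the text; the function name `encode_parameter`, the case-insensitive matching, and the treatment of "enough" (the label used in the tables) as an alias of "sufficient" are assumptions made for illustration.

```python
# Codes stated in the text: less -> 0, sufficient -> 1, very -> 2.
# "enough" is assumed to be the same category as "sufficient".
PARAMETER_CODES = {"less": 0, "sufficient": 1, "enough": 1, "very": 2}


def encode_parameter(label):
    """Map a nominal Parameter label to its numeric code."""
    key = label.strip().lower()
    if key not in PARAMETER_CODES:
        raise ValueError(f"unknown Parameter label: {label!r}")
    return PARAMETER_CODES[key]


labels = ["less", "very", "sufficient", "Enough"]
codes = [encode_parameter(label) for label in labels]
print(codes)  # [0, 2, 1, 1]
```

Raising an error for unknown labels, rather than silently assigning a code, keeps typos in the source data from turning into valid-looking classes.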

Table 3

Data Transformation

Columns: City/district code | Area | Shallot production | Parameter

(Each row lists only the cell values that are legible in the source; the remaining cells are omitted.)

285
183
2915 | 31682
1967 | 19728
237 | 2218
3658 | 36449
2150 | 23683
204
197 | 950

Table 3 (Data Transformation) shows an example of the dataset after pre-processing

with Data Cleaning and Data Transformation. The dataset is now ready for the

prediction process with the C4.5 model.

Prediction Process with Model C4.5

Of all the data generated through preprocessing, this study divided the data into two

parts, namely training and testing data, where 171 or 70% was partitioned for training

data and 72 or 30% for testing data.
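The 70/30 partition can be reproduced with a short sketch. The text does not say how records were assigned to each part, so the random shuffle, the fixed seed, and the rounding rule (the test size is rounded down) are assumptions; with them, 243 records yield the 171 training and 72 testing records reported above.

```python
import random


def split_train_test(records, test_fraction=0.3, seed=42):
    """Shuffle the records and split them into training and testing lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)  # rounds down: 243 * 0.3 -> 72
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(243))  # stand-ins for the 243 pre-processed records
train, test = split_train_test(records)
print(len(train), len(test))  # 171 72
```

Fixing the seed makes the split repeatable, which matters when accuracy figures from different runs are compared.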

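Table 4

Proportion of each class

(A dash marks a Sum value that is not legible in the source.)

Status | Code kabupaten_kota: Sum | Proportion | Area: Sum | Proportion | Shallot production: Sum | Proportion | Parameter: Sum | Proportion
Less | - | 0.37 | - | 0.30 | - | 0.23 | - | 0.25
Enough | - | 0.27 | - | 0.35 | 134 | 0.55 | - | 0.25
Very | - | 0.36 | - | 0.35 | - | 0.22 | 121 | 0.50
Total(s) | 243 | 1.00 | 243 | 1.00 | 243 | 1.00 | 243 | 1.00

The entropy of a set S is calculated as:

\[ \mathrm{Entropy}(S) = \sum_{i=1}^{n} -p_i \log_2 p_i \]

where n is the number of classes in S and p_i is the proportion of S that belongs to class i. For the Parameter classes 0, 1, and 2, with N_k the number of records in class k and |S| the total number of records:

\[ \mathrm{Entropy}(\text{Parameter } 0,1,2) = \left(-\frac{N_0}{|S|}\log_2\frac{N_0}{|S|}\right) + \left(-\frac{N_1}{|S|}\log_2\frac{N_1}{|S|}\right) + \left(-\frac{N_2}{|S|}\log_2\frac{N_2}{|S|}\right) = 1.557 \]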

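1. Calculating Information Gain

This calculation is performed for each attribute used, with S as the set of records labelled with the classes less, sufficient, and very. The class less has code 0, the class sufficient has code 1, and the class very has code 2. The information gain of an attribute A is obtained as:

\[ \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \]

where n is the number of partitions of attribute A, |S_i| is the number of records in partition i, and |S| is the number of records in S.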

Table 5

Entropy (City District Code)

City district code | Less | Enough | Very

(The cell values of Table 5 are not legible in the source.)

For each partition S_i of the kode_kabupaten_kota attribute (Less, Enough, and Very), with N_{i,k} the number of records of class k in S_i, the entropy is:

\[ \mathrm{Entropy}(S_i) = \left(-\frac{N_{i,0}}{|S_i|}\log_2\frac{N_{i,0}}{|S_i|}\right) + \left(-\frac{N_{i,1}}{|S_i|}\log_2\frac{N_{i,1}}{|S_i|}\right) + \left(-\frac{N_{i,2}}{|S_i|}\log_2\frac{N_{i,2}}{|S_i|}\right) \]

The information gain of the attribute is then:

\[ \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Entropy}(S_i) \]

\[ \mathrm{Gain}(S, \text{kode\_kabupaten\_kota}) = \mathrm{Entropy}(S) - \left(\frac{|S_{\text{Less}}|}{|S|}\,\mathrm{Entropy}(S_{\text{Less}})\right) - \left(\frac{|S_{\text{Enough}}|}{|S|}\,\mathrm{Entropy}(S_{\text{Enough}})\right) - \left(\frac{|S_{\text{Very}}|}{|S|}\,\mathrm{Entropy}(S_{\text{Very}})\right) \]

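2. Calculating Gain Ratio

\[ \mathrm{SplitInfo}(S, A) = -\sum_{i=1}^{n} \frac{|S_i|}{|S|}\log_2\frac{|S_i|}{|S|} \]

\[ \mathrm{SplitInfo}(S, \text{kode\_kabupaten\_kota}) = \left(-\frac{|S_{\text{Less}}|}{|S|}\log_2\frac{|S_{\text{Less}}|}{|S|}\right) + \left(-\frac{|S_{\text{Enough}}|}{|S|}\log_2\frac{|S_{\text{Enough}}|}{|S|}\right) + \left(-\frac{|S_{\text{Very}}|}{|S|}\log_2\frac{|S_{\text{Very}}|}{|S|}\right) \]

\[ \mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)} \]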

From the calculation process, the gain ratio of the kode_kabupaten_kota attribute

was -1.471. The same calculation is carried out for all attributes, and the results

are shown in Table 6.

Table 6

Attribute calculation results

No. | Attribute | Info Gain | Split Info | Gain Ratio
1 | kode_kabupaten_kota | -2.319 | 1.557 | -1.471
2 | Area | -0.012 | 0.981 | -0.012
3 | Produksi_bawang_merah | -1.739 | 1.482 | -1.735
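The three quantities in Table 6 can be computed with the standard C4.5 definitions, as in the following minimal sketch. The function names and the toy data are hypothetical and are not the study's records. Under these standard definitions information gain is never negative, so the sketch is not expected to reproduce the negative values reported in Table 6.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_ratio(attribute_values, labels):
    """Return (information gain, split info, gain ratio) for one attribute."""
    total = len(labels)
    # Partition the class labels by attribute value.
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    weighted = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - weighted
    split_info = sum(-(len(s) / total) * math.log2(len(s) / total)
                     for s in subsets.values())
    # A single-valued attribute has zero split info; report a ratio of 0.0.
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, split_info, ratio


# Hypothetical toy data: binned attribute values and Parameter class codes.
area_bin = ["less", "less", "enough", "enough", "very", "very"]
parameter = [0, 0, 1, 1, 2, 1]
gain, split_info, ratio = gain_ratio(area_bin, parameter)
print(round(gain, 3), round(split_info, 3), round(ratio, 3))
```

C4.5 selects the attribute with the highest gain ratio as the split at each node, which penalizes attributes that fragment the data into many small partitions.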

Based on the construction of the model that has been made, a tree model is created

that depicts the relationship between attributes.

Figure 2

Tree model

The results of the decision tree algorithm for the classification of shallot production

show that shallot production can be predicted using the parameters of land area and

production amount. This decision tree classifies production into three categories: less,

adequate, and very. In this tree, the main variable that affects the classification is the

amount of shallot production, with the initial node dividing the data based on whether the

shallot production is less than or equal to 1.5. This node has a high entropy, indicating

great uncertainty in the early stages. The division continues with additional parameters

such as land area, which helps to clarify the classification. For example, shallot

production below 1.5 is generally classified as "less", while production above 1.5 but

below 4013.5 is generally classified as "adequate". Overall, this model shows that the

increase in the amount of shallot production and the variation in land area significantly

affect the classification of shallot production levels.
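The two production thresholds described above can be written as a small rule function. This is only a sketch of the stated thresholds, not the full tree in Figure 2: it ignores the land-area splits, treats both thresholds as "less than or equal to" tests (an assumption), and returns `None` above 4013.5 because the text does not state the rule for that range. The function name `classify_production` is hypothetical.

```python
def classify_production(production):
    """Apply the two production thresholds stated in the text."""
    if production <= 1.5:
        return "less"
    if production <= 4013.5:
        return "adequate"
    return None  # the text does not state the rule beyond 4013.5


for value in (1, 950, 31682):
    print(value, classify_production(value))
```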

Conclusion

The problem of predicting shallot production from the data above can be addressed with

the C4.5 algorithm, which uses classification rules to build a prediction model for shallot

production levels and to identify the important attributes that affect them.

Bibliography

Baihaqi, Dimas Imam, Handayani, Anik Nur, & Pujianto, Utomo. (2019). Perbandingan

metode Naive Bayes dan C4.5 untuk memprediksi mortalitas pada peternakan ayam

broiler. Simetris: Jurnal Teknik Mesin, Elektro Dan Ilmu Komputer, 10(1), 383–390.

Damayanti, Desi. (2022). Implementasi Algoritma C4.5 Prediksi Produksi Komoditas

Tanaman Perkebunan Berdasarkan Luas Lahan. Tin: Terapan Informatika

Nusantara, 2(10), 571–579.

Ghozali, Muhammad Rizal, & Wibowo, Rudi. (2019). Analisis Risiko Produksi

Usahatani Bawang Merah di Desa Petak Kecamatan Bagor Kabupaten Nganjuk.

Jurnal Ekonomi Pertanian Dan Agribisnis, 3(2), 294–310.

Hana, Fida Maisa. (2020). Klasifikasi Penderita Penyakit Diabetes Menggunakan

Algoritma Decision Tree C4.5. Jurnal SISKOM-KB (Sistem Komputer Dan

Kecerdasan Buatan), 4(1), 32–39.

Kesuma, Chandra, & Kholifah, Desiana Nur. (2019). Sistem Informasi Akademik

Berbasis Web Pada Lkp Rejeki Cilacap. EVOLUSI: Jurnal Sains Dan Manajemen,

7(1).

Maulana, Alfin, Martanto, Martanto, & Ali, Irfan. (2023). Prediksi Hasil Produksi Panen

Bawang Merah Menggunakan Metode Regresi Linier Sederhana. JATI (Jurnal

Mahasiswa Teknik Informatika), 7(4), 2884–2888.

Nurhayati, Nurhayati, Sibuea, Mhd Buhari, Kusbiantoro, Dedi, Silaban, Martina, &

Wanto, Anjar. (2022). Implementasi Algoritma Resilient untuk Prediksi Potensi

Produksi Bawang Merah di Indonesia. Building of Informatics, Technology and

Science (BITS), 4(2), 1051–1060. https://doi.org/10.47065/bits.v4i2.2269

Priyaungga, Bayu Aji, Aji, Dwi Bayu, Syahroni, Mukron, Aji, Nurul Tri Sukma, &

Saifudin, Aries. (2020). Pengujian Black Box pada Aplikasi Perpustakaan

Menggunakan Teknik Equivalence Partitions. Jurnal Teknologi Sistem Informasi

Dan Aplikasi ISSN, 2654, 3788.

Wajhillah, Rusda, & Yulianti, Ita. (2017). Penerapan algoritma C4.5 untuk prediksi

penggunaan jenis kontrasepsi berbasis web. Klik-Kumpul. J. Ilmu Komput, 4(2), 160.

Zulkarnain, Muhammad, & Marsisno, Waris. (2022). Penerapan Pembelajaran Mesin

Untuk Estimasi Luas Lahan Bawang Merah Berdasarkan Data Citra Satelit Resolusi

Menengah. Seminar Nasional Official Statistics, 2022(1), 1005–1016.

https://doi.org/10.34123/semnasoffstat.v2022i1.1307