Text Classification Services Using Naïve Bayes for Bahasa Indonesia

05 Dec 2020


Information Systems Department, Bina Nusantara University, Jakarta, Indonesia

Abstract:

Today, it is estimated that over 80% of data is unstructured and mostly in text format. Contributing to this is the rise of Internet usage and of content produced by individuals, media, and academic, commercial, industrial, and financial organizations [2], [3], [11]. There is thus an indispensable need for text classification, or categorization, to distill information from this overwhelming amount of data. Additionally, as more corporations outsource their services, web services have become preferable due to their versatility and ease of integration with existing hardware and software. This project aims to create a reusable text classification service for Bahasa Indonesia, a language spoken by over 200 million people, yet for which text processing is still uncommon, inaccessible, or costly. The text classifier uses Naïve Bayes classification, a method favored for its computational simplicity and effectiveness. To test the functionality and efficacy of the service, the study used data from web articles, reaching an accuracy of 83.75%.
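To illustrate the kind of classifier the abstract refers to, the following is a minimal sketch of multinomial Naïve Bayes with Laplace smoothing. It is not the authors' implementation: the class name `NaiveBayesTextClassifier`, the whitespace tokenizer, the smoothing value `alpha=1.0`, and the four toy Indonesian sentences with their labels are all assumptions made for illustration, and the paper's preprocessing and web-service layer are not shown.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase and split on whitespace only.
    # A production system would likely also strip punctuation and stem.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocabulary = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Work in log space: log prior plus the sum of log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # skip tokens never seen during training
                count = self.word_counts[label][token]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch, not from the paper).
train_docs = [
    "tim sepak bola menang pertandingan",
    "pemain mencetak gol di pertandingan",
    "harga saham naik di bursa",
    "bank menaikkan suku bunga",
]
train_labels = ["olahraga", "olahraga", "ekonomi", "ekonomi"]

classifier = NaiveBayesTextClassifier().fit(train_docs, train_labels)
prediction = classifier.predict("pemain bola mencetak gol")
```

The per-class score is a sum of logarithms rather than a product of probabilities, which avoids numerical underflow on longer documents. The smoothing term keeps a single unseen word from driving a class probability to zero.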

Published in: 2018 International Conference on Information Management and Technology (ICIMTech)

Date of Conference: 3-5 Sept. 2018

Date Added to IEEE Xplore: 12 November 2018


INSPEC Accession Number: 18233615

DOI: 10.1109/ICIMTech.2018.8528258

Conference Location: Jakarta, Indonesia