Text Classification Services Using Naïve Bayes for Bahasa Indonesia

Information Systems Department, Bina Nusantara University, Jakarta, Indonesia

Abstract:

Today, it is estimated that over 80% of data is unstructured, most of it in text format. The rise of Internet usage and of content from individuals, media, and academic, commercial, industrial, and financial organizations and corporations contributes to this volume [2], [3], [11]. Text classification, or categorization, has therefore become indispensable for distilling information from this overwhelming amount of data. Additionally, as more corporations outsource their services, web services have become preferable because of their versatility and ease of integration with existing hardware and software. This project aims to create a reusable text classification service for Bahasa Indonesia, a language spoken by over 20 million people for which text processing remains uncommon, inaccessible, or costly. The classifier uses Naïve Bayes classification, a simple method favored for its computational simplicity and effectiveness. To test the functionality and efficacy of the classifier, the study used data from web articles and reached an accuracy of 83.75%.
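As an illustration of the Naïve Bayes classification named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a multinomial model with add-one (Laplace) smoothing and whitespace tokenization, none of which the abstract specifies. The function names `train` and `classify`, the labels, and the four toy sentences are hypothetical and chosen only for the example.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """Count labels and per-label word frequencies from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        tokens = text.lower().split()  # assumption: simple whitespace tokenization
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
    }


def classify(model, text):
    """Return the label with the highest log posterior for the given text."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["label_counts"].items():
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_count / total_docs)
        total_words = sum(model["word_counts"][label].values())
        for token in text.lower().split():
            if token not in model["vocabulary"]:
                continue  # ignore words never seen during training
            count = model["word_counts"][label][token]
            # Add-one smoothing keeps unseen word/label pairs from zeroing the product.
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data in Bahasa Indonesia (not from the study).
training = [
    ("harga saham naik di bursa", "ekonomi"),
    ("bank menaikkan suku bunga", "ekonomi"),
    ("tim sepak bola menang pertandingan", "olahraga"),
    ("pemain mencetak gol di pertandingan", "olahraga"),
]
model = train(training)
print(classify(model, "harga saham bank"))
print(classify(model, "pemain gol pertandingan"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied. A service like the one described would presumably add language-specific preprocessing, such as stemming or stop-word removal, which this sketch omits.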
Date of Conference: 3-5 Sept. 2018
Date Added to IEEE Xplore: 12 November 2018
INSPEC Accession Number: 18233615
Publisher: IEEE
Conference Location: Jakarta, Indonesia