BAYESIAN OPTIMIZED RANDOM FOREST FOR SENTIMENT CLASSIFICATION: A PROBABILISTIC HYPERPARAMETER TUNING FRAMEWORK
Keywords:
Bayesian Optimization; Random Forest; Sentiment Classification; Hyperparameter Tuning; Gaussian Processes; IMDB Dataset
Abstract
Sentiment classification is a fundamental task in natural language processing (NLP), with applications including social media monitoring, customer feedback analysis, and market research. Although Random Forest classifiers perform strongly on text classification, their performance is highly sensitive to hyperparameter settings. This paper presents a new Bayesian optimization method for tuning Random Forest hyperparameters in sentiment classification. We model the hyperparameter search with Gaussian process (GP) regression and use an acquisition function to balance exploration and exploitation, so that optimal parameter configurations are found efficiently. The proposed method is evaluated on three benchmark sentiment analysis datasets: the Large Movie Review Dataset (IMDB), Yelp reviews, and Amazon product reviews. Experimental results show that the Bayesian-optimized Random Forest achieves 84.3% accuracy on the IMDB dataset, outperforming the default Random Forest (77.7%) by 6.6 percentage points and remaining competitive with state-of-the-art deep learning methods while maintaining interpretability and computational efficiency. The optimization framework converges in fewer than 50 iterations, a 10x improvement over grid search.
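As an illustration of the search loop summarized above, the following is a minimal sketch of GP-based Bayesian optimization with an expected-improvement acquisition function, written in plain Python. It is not the implementation evaluated in this paper. It assumes a single hyperparameter rescaled to [0, 1], a fixed squared-exponential kernel, and a synthetic objective `toy_cv_accuracy` that stands in for the cross-validated accuracy of a Random Forest. All function names, the kernel length scale, the initial points, and the candidate grid are hypothetical choices made for the example.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel on a 1-D input (length scale is assumed)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(max(A[i][i] - s, 1e-12))
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def forward_sub(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_sub(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, jitter=1e-6):
    """Fit a zero-mean GP to standardized observations."""
    mean = sum(y) / len(y)
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y)) or 1.0
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_sub(L, forward_sub(L, [(v - mean) / std for v in y]))
    return X, L, alpha, mean, std

def predict(gp, x):
    """Posterior mean and standard deviation at x, in the original units."""
    X, L, alpha, mean, std = gp
    k = [rbf(a, x) for a in X]
    mu = mean + std * sum(ki * ai for ki, ai in zip(k, alpha))
    v = forward_sub(L, k)
    var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
    return mu, std * math.sqrt(var)

def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the incumbent for a maximization problem."""
    z = (mu - best - xi) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best - xi) * cdf + sigma * pdf

def toy_cv_accuracy(x):
    """Synthetic stand-in for cross-validated accuracy; NOT real results."""
    return 0.70 + 0.10 * math.exp(-((x - 0.62) / 0.15) ** 2)

def bayes_opt(objective, init=(0.1, 0.5, 0.9), n_iter=15):
    """Evaluate initial points, then repeatedly query the EI maximizer."""
    X = list(init)
    y = [objective(x) for x in X]
    grid = [i / 100 for i in range(101)]  # candidate hyperparameter values
    for _ in range(n_iter):
        gp = fit_gp(X, y)
        best = max(y)
        x_next = max(grid, key=lambda x: expected_improvement(*predict(gp, x), best))
        X.append(x_next)
        y.append(objective(x_next))
    i = max(range(len(y)), key=y.__getitem__)
    return X[i], y[i], list(zip(X, y))

best_x, best_y, history = bayes_opt(toy_cv_accuracy)
print(f"best x = {best_x:.2f}, objective = {best_y:.4f}, evaluations = {len(history)}")
```

In a real tuning run, `toy_cv_accuracy` would be replaced by a function that trains a Random Forest with the proposed hyperparameters and returns its validation accuracy, and the search space would cover several hyperparameters rather than one. The grid-based maximization of the acquisition function is a simplification that works only in low dimensions.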