Classification Techniques

Out of the many available supervised machine learning and deep learning algorithms, one algorithm can be chosen from each of the four most-used categories: Generalized Linear Models (GLM), Naïve Bayes (NB), Support Vector Machines (SVM), and Neural Networks (NN). From the GLM family we choose the Logistic Regression algorithm, from NB we choose Bernoulli Naïve Bayes, and from the SVMs we choose the Linear SVC algorithm.

Logistic Regression (LR)

Despite its name, Logistic Regression is a popular algorithm that belongs to the Generalized Linear Models family; it is also known as Maximum Entropy. In this model, the probabilities describing the possible outcomes of a single trial are modeled with a logistic function. The earlier studies of Lin, Mao, and Zeng (2017) and Wu, Huang, and Yuan (2017) used Logistic Regression for sentiment classification in microblogging.
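A minimal sketch of this setup, using scikit-learn's `LogisticRegression` on a tiny invented microblog-style corpus (the documents and labels below are made up purely for illustration):

```python
# Sketch: Logistic Regression for sentiment classification with scikit-learn.
# The corpus and labels are hypothetical, chosen only to illustrate the idea.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["good great film", "good fine story", "bad awful film", "bad poor story"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)  # term-count features

clf = LogisticRegression()
clf.fit(X, labels)

# The logistic function maps the linear score to a class probability.
test = vectorizer.transform(["good film"])
print(clf.predict(test)[0])           # predicted class
print(clf.predict_proba(test)[0, 1])  # estimated P(positive)
```

The model's `predict_proba` output makes the "probabilities of a single trial" interpretation concrete: each prediction comes with a class probability from the fitted logistic function.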

Bernoulli Naïve Bayes (BNB)

Naïve Bayes algorithms are among the simplest probabilistic classification algorithms and are widely used in Sentiment Analysis. They are based on Bayes' Theorem, combined with the "naïve" assumption of complete independence between features. The Bernoulli algorithm is a variant of Naïve Bayes in which the weight of each term is 1 if it appears in the sentence and 0 if it does not. Its difference from a plain Boolean Naïve Bayes is that it also takes into account the terms that do not appear in the sentence. It is a fast algorithm that deals well with high dimensionality.
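The binary weighting can be sketched with scikit-learn's `BernoulliNB`, using `CountVectorizer(binary=True)` so each term is encoded as 1 if present and 0 if absent (the toy corpus is invented for illustration):

```python
# Sketch: Bernoulli Naive Bayes on binary presence/absence features.
# The corpus and labels are hypothetical, for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["good great film", "good fine story", "bad awful film", "bad poor story"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# binary=True gives the Bernoulli weighting: 1 if the term occurs, 0 if not.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(docs)

# BernoulliNB explicitly factors in terms that are *absent* from a sentence,
# which is the difference from a plain Boolean model noted above.
clf = BernoulliNB()
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["good story"]))[0])
```

Internally, each absent term contributes a factor of (1 − P(term | class)) to the class likelihood, so a sentence missing the typical vocabulary of a class is penalized for that class even before its present terms are considered.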

Linear SVC (LSVC)

SVMs are among the most popular machine learning methods for the classification of linear problems (Cherkassky, 1997). They try to find a set of hyperplanes that separate the feature space into regions representing the classes. These hyperplanes are chosen so as to maximize the distance to the nearest data point of each class. Linear SVC is the simplest and fastest SVM algorithm, assuming a linear separation between the classes.
