Introduction to Sentiment Analysis:
Opinion mining, also termed sentiment analysis, is the mining of individuals’ opinions, appraisals, and feelings toward particular objects, facts, and their attributes.
In sentiment analysis, the analysis of opinionated text supports decision making. So when collecting opinionated text as input for an opinion mining system, one has to understand the basic terminology of opinion mining.
The following are some key terms:
- Object:
An object is an entity that can be anything in the real world, e.g., a person, organization, event, product, or topic.
The object itself can also be treated as a special feature: it is the root of the tree that represents the object’s components and attributes. An opinion expressed on the object as a whole is therefore called a general opinion on the object.
- Explicit and implicit features:
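When a user expresses an opinion about a feature of an object, one has to carefully categorize the opinion according to whether the feature is expressed explicitly or implicitly. For example, in “The picture quality of this camera is amazing,” the feature (picture quality) is named explicitly, whereas in “This camera is too large,” the feature (size) is only implied.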
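A minimal sketch of the explicit/implicit distinction, assuming a small hand-made vocabulary of explicit feature words and a hypothetical map from cue words (such as “large”) to the features they imply. Real systems learn such mappings from data; the names EXPLICIT_FEATURES, IMPLICIT_CUES, and find_features are illustrative only.

```python
# Hypothetical domain knowledge for camera reviews (illustrative only).
EXPLICIT_FEATURES = {"battery", "picture", "lens", "price", "size", "weight"}
IMPLICIT_CUES = {"large": "size", "small": "size", "heavy": "weight", "expensive": "price"}


def find_features(sentence):
    """Return (feature, 'explicit' | 'implicit') pairs found in a sentence."""
    tokens = [t.strip(".,!?;:").lower() for t in sentence.split()]
    found = []
    for token in tokens:
        if token in EXPLICIT_FEATURES:
            # The feature word itself appears in the text
            found.append((token, "explicit"))
        elif token in IMPLICIT_CUES:
            # A cue word implies the feature without naming it
            found.append((IMPLICIT_CUES[token], "implicit"))
    return found


print(find_features("The picture quality of this camera is amazing."))
# [('picture', 'explicit')]
print(find_features("This camera is too large."))
# [('size', 'implicit')]
```

A lookup table like this cannot resolve ambiguous cues (“small” could describe the screen or the whole camera), which is why the categorization has to be done carefully.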
Use Cases of Sentiment Analysis:
- Public reviews of movies, products, etc.
- Public sentiment toward a political party.
- Public sentiment toward a particular person or organization.
- Election prediction based on public sentiment.
The baseline sentiment analysis algorithm detects polarity, that is, whether a review is positive or negative. In this type of algorithm, the choice of which words to use as features is crucial. One option is to use only adjectives, but using all words turns out to work better.
Negation has to be handled explicitly by adding the prefix NOT_ to every word that follows a negation word such as “didn’t”, up to the next punctuation mark. Without this step, “I didn’t like this movie” and “I like this movie” look almost the same to a bag-of-words model, even though their sentiments are opposite. So we rewrite the first sentence as follows to handle negation:
“I didn’t NOT_like NOT_this NOT_movie.”
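The NOT_ transformation can be sketched as follows. This is a simplified version, assuming a regular-expression tokenizer, a small hypothetical set of negation words plus any token ending in “n't”, and a negation scope that ends at the next punctuation mark. Curly apostrophes are normalized first so that “didn’t” is recognized.

```python
import re

# Hypothetical negation cues; any token ending in "n't" also counts.
NEGATION_WORDS = {"not", "no", "never"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def mark_negation(text):
    """Prefix NOT_ to every token after a negation word, up to the next punctuation."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes
    tokens = re.findall(r"[\w']+|[.,!?;:]", text)
    result = []
    negating = False
    for token in tokens:
        if token in PUNCTUATION:
            negating = False  # punctuation closes the negation scope
            result.append(token)
        elif negating:
            result.append("NOT_" + token)
        else:
            result.append(token)
            lower = token.lower()
            if lower in NEGATION_WORDS or lower.endswith("n't"):
                negating = True
    return result


print(" ".join(mark_negation("I didn’t like this movie, but I liked the music.")))
# I didn't NOT_like NOT_this NOT_movie , but I liked the music .
```

Ending the scope at punctuation keeps the second clause (“but I liked the music”) from being wrongly negated.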
In the baseline algorithm, the polarity of each word is extracted, and a score for the sentence (for example, a percentage) is calculated from the total polarity of its words.
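A minimal sketch of this lexicon-based scoring, assuming a tiny hypothetical lexicon and tokens that have already been negation-marked with NOT_ (which here simply flips a word’s polarity). The label comes from the sign of the total, and the percentage is the share of opinion words that are positive; both scoring choices are assumptions for illustration.

```python
# Tiny hypothetical polarity lexicon: +1 positive, -1 negative.
LEXICON = {"like": 1, "love": 1, "great": 1, "excellent": 1,
           "hate": -1, "boring": -1, "bad": -1, "poor": -1}


def word_polarity(token):
    """Lexicon polarity of a token; a NOT_ prefix flips the sign."""
    if token.startswith("NOT_"):
        return -LEXICON.get(token[4:].lower(), 0)
    return LEXICON.get(token.lower(), 0)


def sentence_polarity(tokens):
    """Return (label, total polarity, percentage of opinion words that are positive)."""
    scores = [word_polarity(t) for t in tokens]
    total = sum(scores)
    opinion = [s for s in scores if s != 0]
    if opinion:
        percent_positive = 100.0 * sum(1 for s in opinion if s > 0) / len(opinion)
    else:
        percent_positive = 0.0
    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, total, percent_positive


print(sentence_polarity(["I", "like", "this", "great", "movie"]))
# ('positive', 2, 100.0)
print(sentence_polarity(["I", "didn't", "NOT_like", "NOT_this", "NOT_movie"]))
# ('negative', -1, 0.0)
```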
Semi-supervised lexicon learning starts from a small amount of information, such as a few labeled examples and a few hand-built patterns, and uses it to bootstrap a larger lexicon.
Hatzivassiloglou and McKeown’s intuition for identifying word polarity:
In everyday English, adjectives conjoined by “and” tend to have the same polarity, as in these phrases:
- Fair and legitimate
- Corrupt and brutal
By contrast, adjectives conjoined by “but” do not have the same polarity, as in:
- Fair but brutal
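A rough sketch of how such conjoined pairs might be harvested from raw text. It uses a regular expression and a small hypothetical adjective list in place of the part-of-speech tagging a real system would need, and it only finds non-overlapping “X and Y” or “X but Y” matches.

```python
import re

# Hypothetical adjective list standing in for a part-of-speech tagger.
ADJECTIVES = {"fair", "legitimate", "corrupt", "brutal", "nice", "helpful"}
CONJOINED = re.compile(r"\b(\w+) (and|but) (\w+)\b", re.IGNORECASE)


def conjoined_pairs(text):
    """Return (adj1, adj2, 'same' | 'opposite') for adjectives joined by and/but."""
    pairs = []
    for first, conj, second in CONJOINED.findall(text):
        first, second = first.lower(), second.lower()
        if first in ADJECTIVES and second in ADJECTIVES:
            # "and" suggests same polarity, "but" suggests opposite polarity
            relation = "same" if conj.lower() == "and" else "opposite"
            pairs.append((first, second, relation))
    return pairs


text = "The ruling was fair and legitimate. The regime was corrupt and brutal. It was fair but brutal."
print(conjoined_pairs(text))
# [('fair', 'legitimate', 'same'), ('corrupt', 'brutal', 'same'), ('fair', 'brutal', 'opposite')]
```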
An alternative family of methods defines lexicons by propagating labels on a graph, an idea suggested in early work by Hatzivassiloglou and McKeown. Simple sentiment propagation has four steps:
- Label a seed set of adjectives as positive or negative
- Expand the seed set to adjectives conjoined with the seeds
- Then a supervised classifier assigns a polarity similarity to each word pair, resulting in a graph
- Then clustering partitions the graph into two clusters (positive and negative)
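The sketch below is a deliberately simplified stand-in for the last two steps. Instead of a trained classifier and graph clustering, it treats each “and” pair as a same-polarity edge and each “but” pair as an opposite-polarity edge, then spreads labels outward from the seeds with breadth-first search. It assumes the edges do not conflict (the first label assigned to a word wins); the input format of (word, word, relation) triples is hypothetical.

```python
from collections import deque


def propagate(seeds, pairs):
    """Spread +1/-1 labels from seed words over a signed conjunction graph."""
    # Build a signed graph: +1 edge for "same" polarity, -1 for "opposite"
    graph = {}
    for a, b, relation in pairs:
        sign = 1 if relation == "same" else -1
        graph.setdefault(a, []).append((b, sign))
        graph.setdefault(b, []).append((a, sign))
    labels = dict(seeds)  # word -> +1 (positive) or -1 (negative)
    queue = deque(labels)
    while queue:
        word = queue.popleft()
        for neighbor, sign in graph.get(word, []):
            if neighbor not in labels:  # first label wins; conflicts are ignored
                labels[neighbor] = labels[word] * sign
                queue.append(neighbor)
    return labels


pairs = [("fair", "legitimate", "same"), ("corrupt", "brutal", "same"),
         ("fair", "brutal", "opposite"), ("legitimate", "honest", "same")]
print(propagate({"fair": 1, "corrupt": -1}, pairs))
# {'fair': 1, 'corrupt': -1, 'legitimate': 1, 'brutal': -1, 'honest': 1}
```

Clustering, as in the original method, is more robust than this greedy spreading because it weighs all edges at once rather than trusting whichever path reaches a word first.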
Instead of building a lexicon of individual words or adjectives, Turney’s algorithm extracts a phrasal lexicon from the text. It learns the polarity of each phrase and rates a review by the average polarity of its phrases.
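Turney’s method first picks candidate phrases using part-of-speech patterns. The sketch below implements only two of them (an adjective followed by a noun, and an adverb followed by an adjective that is not itself followed by a noun) and assumes the input is already tagged with Penn Treebank tags; the example sentences and tags are hand-written for illustration.

```python
NOUNS = {"NN", "NNS"}
ADVERBS = {"RB", "RBR", "RBS"}


def extract_phrases(tagged):
    """Return two-word phrases matching two of Turney-style POS patterns."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        if t1 == "JJ" and t2 in NOUNS:
            # Pattern: adjective + noun (third word unrestricted)
            phrases.append(f"{w1} {w2}")
        elif t1 in ADVERBS and t2 == "JJ" and t3 not in NOUNS:
            # Pattern: adverb + adjective, not followed by a noun
            phrases.append(f"{w1} {w2}")
    return phrases


tagged = [("The", "DT"), ("bank", "NN"), ("offers", "VBZ"), ("low", "JJ"),
          ("fees", "NNS"), ("and", "CC"), ("is", "VBZ"), ("very", "RB"),
          ("handy", "JJ"), (".", ".")]
print(extract_phrases(tagged))
# ['low fees', 'very handy']
```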
Turney’s algorithm measures polarity through co-occurrence: positive phrases co-occur more often with “excellent”, and negative phrases co-occur more often with “poor” or “bad”.
To measure how strongly two words co-occur, pointwise mutual information (PMI) is used. Its formula is as follows:
$$\operatorname{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$
That is, PMI compares the probability that x and y actually occur together with the probability that they would occur together if they were independent.
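PMI and a Turney-style phrase polarity can be computed from co-occurrence counts as below. The polarity of a phrase is taken as its PMI with “excellent” minus its PMI with “poor”. The counts in the example are invented and the helper names are hypothetical; a zero co-occurrence count makes the logarithm undefined, so a real implementation needs some smoothing.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """PMI(x, y) = log2(P(x, y) / (P(x) P(y))), estimated from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))  # undefined if count_xy == 0


def phrase_polarity(phrase_count, with_excellent, with_poor,
                    excellent_count, poor_count, total):
    """Turney-style polarity: PMI(phrase, "excellent") - PMI(phrase, "poor")."""
    return (pmi(with_excellent, phrase_count, excellent_count, total)
            - pmi(with_poor, phrase_count, poor_count, total))


# Invented counts: the phrase co-occurs with "excellent" 8 times and with "poor" twice.
score = phrase_polarity(phrase_count=20, with_excellent=8, with_poor=2,
                        excellent_count=20, poor_count=20, total=100)
print(round(score, 6))  # about 2.0, so the phrase leans positive
```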
Sentiment analysis is generally modeled as a classification or regression task that predicts binary or ordinal labels. It is essential nowadays because many systems depend on it: governments and organizations use it to predict what is going to happen next and what will be profitable. It is still not fully effective, however; for example, it cannot reliably handle offensive statements, for which the result often has to be hand-coded. For feature selection, handling negation is necessary; using all words works well for some tasks, but finding a subset of words may help more. Furthermore, polarity lexicons can be built by hand, or induced from seeds with semi-supervised learning to make the process more efficient.
A generative model tries to learn the process that generates the data behind the scenes by estimating the model’s distributions under its assumptions. It then uses this model to predict unseen data, assuming that the learned model captures the real data-generating process.
A discriminative classifier models the decision directly from the observed data. It makes fewer assumptions about the distributions but depends heavily on the quality of the data.
A generative (joint) model works on the principle of joint probability: it places probabilities over both the data and the class, P(c, d) = P(c) P(d | c), and tries to maximize this joint likelihood. A conditional (discriminative) model instead gives probabilities P(c | d): it takes the data as given and models only the conditional probability of the class, so we seek to maximize the conditional likelihood.
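To make the contrast concrete, here is a small sketch pairing two standard representatives: multinomial Naive Bayes as the generative model (its count-based estimates of P(c) and P(w | c) are the maximum-likelihood solution for the joint likelihood, softened here by add-one smoothing) and logistic regression as the conditional model, trained by gradient ascent on the conditional log-likelihood log P(c | d). The toy documents, learning rate, and epoch count are arbitrary assumptions.

```python
import math
from collections import Counter

# Toy labelled data (hypothetical): each document is a list of words; 1 = positive.
train = [
    (["good", "great", "fun"], 1),
    (["great", "acting"], 1),
    (["bad", "boring"], 0),
    (["awful", "bad", "plot"], 0),
]
vocab = sorted({w for doc, _ in train for w in doc})


def fit_naive_bayes(data):
    """Generative: estimate log P(c) and log P(w | c) by counting."""
    prior = Counter(c for _, c in data)
    word_counts = {c: Counter() for c in prior}
    for doc, c in data:
        word_counts[c].update(doc)
    model = {}
    for c in prior:
        total = sum(word_counts[c].values())
        log_lik = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab}
        model[c] = (math.log(prior[c] / len(data)), log_lik)
    return model


def nb_predict(model, doc):
    # argmax_c log P(c) + sum_w log P(w | c); unknown words are skipped
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=score)


def features(doc):
    # Binary bag-of-words vector over the training vocabulary
    return [1.0 if w in doc else 0.0 for w in vocab]


def fit_logistic_regression(data, learning_rate=0.5, epochs=200):
    """Discriminative: gradient ascent on sum log P(c | d); P(d) is never modelled."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    for _ in range(epochs):
        for doc, c in data:
            x = features(doc)
            z = bias + sum(wi * xi for wi, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))  # P(c = 1 | d)
            err = c - p  # derivative of log P(c | d) with respect to z
            bias += learning_rate * err
            weights = [wi + learning_rate * err * xi for wi, xi in zip(weights, x)]
    return weights, bias


def lr_predict(params, doc):
    weights, bias = params
    z = bias + sum(wi * xi for wi, xi in zip(weights, features(doc)))
    return 1 if z > 0 else 0


nb = fit_naive_bayes(train)
lr_model = fit_logistic_regression(train)
print(nb_predict(nb, ["great", "fun"]), lr_predict(lr_model, ["great", "fun"]))
print(nb_predict(nb, ["boring", "plot"]), lr_predict(lr_model, ["boring", "plot"]))
```

Naive Bayes gets its parameters in closed form from counts, while logistic regression must be optimized iteratively; in exchange, logistic regression does not assume the words are conditionally independent given the class.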
- Bayes net diagrams draw circles for random variables and lines for direct dependencies.
- Some variables are observed, and some are hidden.
- Each node is a little classifier (conditional probability table) based on incoming arcs.
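These conventions can be mirrored in code. The sketch below encodes a hypothetical three-node network (a hidden Sentiment node with two observed word-indicator children), stores one conditional probability table per node keyed by its parents’ values, computes the joint probability as the product of the nodes’ table entries, and infers the hidden node by summing over any unobserved variables. All probabilities are made up.

```python
from itertools import product

# Hypothetical network: Sentiment -> HasGreat, Sentiment -> HasBoring.
parents = {
    "Sentiment": [],
    "HasGreat": ["Sentiment"],
    "HasBoring": ["Sentiment"],
}

# Each node's CPT: P(node = True | parent values). Numbers are made up.
cpt = {
    "Sentiment": {(): 0.5},  # P(Sentiment = positive)
    "HasGreat": {(True,): 0.6, (False,): 0.1},
    "HasBoring": {(True,): 0.05, (False,): 0.4},
}


def node_prob(node, value, assignment):
    """Look up P(node = value | parents) in the node's CPT."""
    key = tuple(assignment[p] for p in parents[node])
    p_true = cpt[node][key]
    return p_true if value else 1.0 - p_true


def joint(assignment):
    """Joint probability of a full assignment: product of every node's CPT entry."""
    result = 1.0
    for node, value in assignment.items():
        result *= node_prob(node, value, assignment)
    return result


def posterior(query, evidence):
    """P(query | evidence) by summing out hidden variables, then normalizing."""
    hidden = [n for n in parents if n != query and n not in evidence]
    scores = {}
    for q in (True, False):
        total = 0.0
        for values in product((True, False), repeat=len(hidden)):
            assignment = {**evidence, query: q, **dict(zip(hidden, values))}
            total += joint(assignment)
        scores[q] = total
    z = sum(scores.values())
    return {q: s / z for q, s in scores.items()}


print(posterior("Sentiment", {"HasGreat": True, "HasBoring": False}))
print(posterior("Sentiment", {"HasGreat": True}))  # HasBoring is hidden here
```

Enumeration is exponential in the number of hidden variables, which is fine for a toy network like this but not for large ones.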