Encoding classes for sequence labeling:
There are two types of sequence encoding:
- IO encoding
- IOB encoding
IO stands for inside-outside encoding. In this encoding, each token is tagged with its entity class (person, place, time, etc.), or with O if it lies outside any entity. It is the simplest scheme: tokens are tagged one after another, with no marker for where an entity begins.
However, IO encoding has a problem: as shown in the figure, a second name that directly follows a person's name is treated as part of the same named entity. IO encoding gives no reliable boundary around an entity, so adjacent entities of the same class are merged. IOB (inside-outside-beginning) encoding tackles this by marking the first token of every entity with a B tag, which provides a clear boundary around each entity.
IO encoding is faster than IOB encoding because it needs fewer labels (C + 1 for C entity classes, against 2C + 1 for IOB), but IOB encoding can be more accurate at entity boundaries.
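To make the difference concrete, here is a minimal Python sketch that tags the same sentence under both schemes. The function name `encode`, the `(start, end, label)` span format, and the example sentence are assumptions made for illustration.

```python
def encode(tokens, entities, scheme="IOB"):
    """Tag each token, given entity spans as (start, end_exclusive, label)."""
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        for i in range(start, end):
            if scheme == "IO":
                # IO: every token inside an entity just gets the class name.
                tags[i] = label
            else:
                # IOB: the first token gets B-, the remaining tokens get I-.
                prefix = "B" if i == start else "I"
                tags[i] = f"{prefix}-{label}"
    return tags


tokens = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
# Three separate people; the last two are adjacent in the sentence.
entities = [(0, 1, "PER"), (2, 3, "PER"), (3, 5, "PER")]

io_tags = encode(tokens, entities, "IO")
iob_tags = encode(tokens, entities, "IOB")
print(io_tags)   # ['PER', 'O', 'PER', 'PER', 'PER', 'O', 'O', 'O']
print(iob_tags)  # ['B-PER', 'O', 'B-PER', 'B-PER', 'I-PER', 'O', 'O', 'O']
```

In the IO output, tokens 2 to 4 all carry PER, so "Sue" and "Mengqiu Huang" cannot be told apart. The B- tags in the IOB output restore the boundary.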
Uses of named entity recognition:
- Named entities can be indexed and linked.
- Sentiment can be attributed to a company, an industrial product, a movie, etc.
- Many information extraction relations are associations between named entities.
- In question answering, the answers are most often named entities.
- Many web pages tag entities with links to bio or topic pages.
ML sequence model approach to NER:
- Collect a set of representative training documents.
- Label each token of the data with its class name.
- Convert the text to feature vectors, for example with a count vectorizer and TF-IDF.
- Choose a classifier and train it on the labeled data.
- Load the test data.
- Convert it into vectors in the same way as the training data.
- Predict the class of each token by calling the classifier's predict function.
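The steps above can be sketched end to end using only the Python standard library. This is a toy illustration under stated assumptions, not the pipeline the text has in mind: hand-built sparse string features stand in for the count vectorizer and TF-IDF step, a simple multiclass perceptron stands in for the classifier, and the names `featurize`, `train`, and `predict` as well as the three training sentences are hypothetical.

```python
from collections import defaultdict


def featurize(tokens, i):
    """Turn token i into a set of sparse features (stand-in for a vectorizer)."""
    word = tokens[i]
    return {
        f"word={word.lower()}",
        f"is_title={word.istitle()}",
        f"prev={tokens[i - 1].lower() if i > 0 else '<s>'}",
        "bias",
    }


def predict(weights, labels, feats):
    """Return the label whose feature weights sum to the highest score."""
    return max(labels, key=lambda y: sum(weights[(f, y)] for f in feats))


def train(sentences, epochs=5):
    """Perceptron training: on a mistake, reward the gold label, penalize the guess."""
    labels = sorted({tag for _, tags in sentences for tag in tags})
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, tags in sentences:
            for i, gold in enumerate(tags):
                feats = featurize(tokens, i)
                guess = predict(weights, labels, feats)
                if guess != gold:
                    for f in feats:
                        weights[(f, gold)] += 1.0
                        weights[(f, guess)] -= 1.0
    return weights, labels


# Steps 1-2: a tiny labeled training set (hypothetical data).
train_data = [
    (["John", "lives", "in", "Paris"], ["PER", "O", "O", "LOC"]),
    (["Mary", "visited", "London"], ["PER", "O", "LOC"]),
    (["Paris", "is", "lovely"], ["LOC", "O", "O"]),
]
# Steps 3-4: featurize and train.
weights, labels = train(train_data)

# Steps 5-7: featurize the test sentence the same way and predict each token.
test_tokens = ["John", "visited", "Paris"]
predicted = [
    predict(weights, labels, featurize(test_tokens, i))
    for i in range(len(test_tokens))
]
print(predicted)  # ['PER', 'O', 'LOC']
```

The important design point is that the test data goes through exactly the same feature function as the training data; otherwise the learned weights do not line up with the test features.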
Sequence Models for named entity recognition:
MEMM inference in systems:
In a conditional Markov model (CMM), also known as a maximum entropy Markov model (MEMM), the classifier makes a single decision at a time, conditioned on evidence from the observations and from previous decisions. In POS tagging, for example, we take the labels already assigned to earlier positions as given, and we use features of those labels and of the observed data (which can include the current, previous, and next words) to predict the current label.
POS tagging features can include:
- Current, previous, and next words, in isolation or together.
- The previous one, two, or three tags.
- Word-internal features: word types, suffixes, dashes, etc.
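The feature list above could be turned into a feature function along the following lines. This is a minimal sketch; the function name `pos_features`, the feature keys, and the padding symbols `<s>` and `</s>` are assumptions chosen for illustration.

```python
def pos_features(words, i, prev_tags):
    """Features for position i from nearby words, earlier tags, and word shape."""
    word = words[i]
    return {
        # Current, previous, and next words.
        "word": word.lower(),
        "prev_word": words[i - 1].lower() if i > 0 else "<s>",
        "next_word": words[i + 1].lower() if i + 1 < len(words) else "</s>",
        # Tags already assigned to earlier positions.
        "prev_tag": prev_tags[-1] if prev_tags else "<s>",
        "prev_two_tags": "+".join(prev_tags[-2:]) if len(prev_tags) >= 2 else "<s>",
        # Word-internal features.
        "suffix3": word[-3:].lower(),
        "has_dash": "-" in word,
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
    }


features = pos_features(["The", "well-known", "runner", "stopped"], 1, ["DT"])
print(features["prev_word"], features["suffix3"], features["has_dash"])
# the own True
```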
Greedy inference:
- We start from the left and label the tokens one at a time.
- Each decision of the classifier depends on the labels assigned previously.
- Fast, with no need for extra or large memory space.
- The method is straightforward to implement.
- With rich features, it can perform very well in practice.
- It does not guarantee an optimal result, because an early decision cannot be revised later.
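The left-to-right labeling described in the bullets above can be sketched as a short loop. The scoring function `toy_score` below is a hypothetical stand-in for a trained classifier, and the tag set and weights are made up for the example.

```python
def greedy_decode(words, score, tags):
    """Label left to right, committing to the best tag at each position."""
    history = []
    for i in range(len(words)):
        best = max(tags, key=lambda t: score(words, i, history, t))
        history.append(best)  # committed: later positions cannot change it
    return history


def toy_score(words, i, history, tag):
    """Hypothetical local scorer using the current word and the previous tag."""
    word = words[i]
    prev = history[-1] if history else "<s>"
    s = 0.0
    if tag == "DT" and word.lower() in {"the", "a"}:
        s += 2.0
    if tag == "NN" and prev == "DT":
        s += 1.5
    if tag == "VB" and prev == "NN":
        s += 1.0
    if tag == "VB" and word.endswith("s"):
        s += 0.5
    return s


result = greedy_decode(["The", "dog", "barks"], toy_score, ["DT", "NN", "VB"])
print(result)  # ['DT', 'NN', 'VB']
```

Only the list of tags chosen so far is kept, which is why the method needs so little memory.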
Viterbi inference:
- Uses dynamic programming or memoization.
- Requires a small window of state influence (for example, only the previous one or two labels).
- Exact: the globally best sequence is returned.
- Long-distance state-to-state interactions are harder to implement.
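The dynamic-programming bullets above correspond to Viterbi decoding. A minimal sketch follows, assuming each local score depends only on the position, the previous tag, and the current tag (the small window of state influence). The names `viterbi` and `local_score` and the toy score tables are assumptions for illustration.

```python
def viterbi(n, tags, local_score):
    """Return the highest-scoring tag sequence of length n.

    local_score(i, prev_tag, tag) is the score of `tag` at position i
    given the previous tag (prev_tag is None at position 0).
    """
    if n == 0:
        return []
    # best[i][t]: score of the best sequence ending in tag t at position i.
    best = [{t: local_score(0, None, t) for t in tags}]
    back = [{}]
    for i in range(1, n):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + local_score(i, p, t))
            best[i][t] = best[i - 1][prev] + local_score(i, prev, t)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[n - 1][t])]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy scores: "A" looks slightly better at position 0, but "B" followed by "B"
# earns a large transition bonus.
emit = [{"A": 1.0, "B": 0.9}, {"A": 0.0, "B": 0.0}]
trans = {("B", "B"): 2.0}


def local_score(i, prev_tag, tag):
    bonus = trans.get((prev_tag, tag), 0.0) if prev_tag is not None else 0.0
    return emit[i][tag] + bonus


path = viterbi(2, ["A", "B"], local_score)
print(path)  # ['B', 'B']
```

A greedy decoder would commit to "A" at position 0 (1.0 against 0.9) and finish with a total of 1.0, whereas the exact search finds the sequence B, B with a total of 2.9.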
CRF stands for conditional random field. It is another type of sequence model, which uses a single conditional model over the whole sequence rather than a chain of local models. Mathematically it can be represented as:
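One standard form of such a whole-sequence conditional model is the log-linear one below. The notation is an assumption made here: c is the complete label sequence, d is the observed data, the f_i are feature functions with weights λ_i, and the sum in the denominator runs over every candidate label sequence c'.

```latex
P(c \mid d, \lambda) =
  \frac{\exp\left(\sum_{i} \lambda_i f_i(c, d)\right)}
       {\sum_{c'} \exp\left(\sum_{i} \lambda_i f_i(c', d)\right)}
```

Because the normalization is over whole sequences rather than over the labels at a single position, no individual local decision is forced to sum to one on its own.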
Training a CRF is slower than training a chain of local models, but a CRF avoids causal-competition biases. A variant of these whole-sequence models that is trained with a max-margin criterion is also used.
As the figure above shows, relation extraction is used to extract the relationship between two entities. For example, if the data contains the sentence “Cyanide is a drug”, we can say that cyanide is related to drug by an is-a relation. In this way we can find what is said about what, and in which context.
The figure above also gives a string that describes one woman running, two men running, three people walking, and so on.
Similarly, we can extract relations from the data according to our need.
Why Relation Extraction:
It creates new structured knowledge bases, which are useful for any application. It augments existing knowledge bases, for example by adding words to the WordNet thesaurus or facts to Freebase or DBpedia. It also supports question answering.
Automatic Content Extraction (ACE) relation types, with examples:
- Physical-Located (PER-GPE)
He was in Tennessee
- Part-Whole-Subsidiary (ORG-ORG)
XYZ, the parent company of ABC
- Person-Social-Family (PER-PER)
John’s wife, Yoko.
- Org-AFF-Founder (PER-ORG)
Steve Jobs is a co-founder of Apple
How to build relation extractors:
- Hand-Written patterns
- Supervised machine learning
- Semi-supervised and unsupervised
- Unsupervised learning from the web
Hearst's Patterns for extracting IS-A relations:
These are lexical cues such as “such as”, “including”, “and other”, and “especially”, which are commonly used to relate a class to its members. Patterns of this type can be hand-coded to extract IS-A relations from text.
Moreover, some other relations often hold between specific types of entities, such as located-in (organization and location), founded (person and organization), and cures (drug and disease).
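Hand-written patterns of the kind described above can be sketched as regular expressions. This is a deliberately simplified illustration: real systems match noun phrases found by a chunker or parser, whereas the two regexes here work on single words and only behave well on simple sentences. The names `SUCH_AS`, `AND_OTHER`, `split_items`, and `extract_is_a` are assumptions.

```python
import re

# "<class> such as <members>" and "<class> including <members>"
SUCH_AS = re.compile(r"(\w+),? (?:such as|including) ([\w ,]+)")
# "<members> and other <class>" and "<members> or other <class>"
AND_OTHER = re.compile(r"([\w ,]+?),? (?:and|or) other (\w+)")


def split_items(text):
    """Split 'a, b and c' (or 'a, b, and c') into ['a', 'b', 'c']."""
    parts = re.split(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+", text)
    return [p.strip() for p in parts if p.strip()]


def extract_is_a(sentence):
    """Return (member, class) pairs found by the two hand-written patterns."""
    pairs = []
    for m in SUCH_AS.finditer(sentence):
        pairs += [(item, m.group(1)) for item in split_items(m.group(2))]
    for m in AND_OTHER.finditer(sentence):
        pairs += [(item, m.group(2)) for item in split_items(m.group(1))]
    return pairs


print(extract_is_a("He studies diseases such as malaria, cholera and typhoid."))
# [('malaria', 'diseases'), ('cholera', 'diseases'), ('typhoid', 'diseases')]
print(extract_is_a("Temples, treasuries and other buildings were restored."))
# [('Temples', 'buildings'), ('treasuries', 'buildings')]
```

Such patterns tend to be precise but cover only the phrasings someone thought to write down, which is one reason the learned approaches in the following sections exist.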
Supervised Machine Learning for Relations:
To train a relation extractor with supervised machine learning, we first choose a set of relevant named entities and relations, label them in a training corpus, and then train a classifier on the labeled data. Gazetteer and trigger-word features can also be included for relation extraction. A trigger list for family relations contains words such as parent, wife, husband, and grandparent. A gazetteer may be a list of countries or other geopolitical entities.
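The trigger-word and gazetteer features mentioned above might be computed for a candidate entity pair as follows. This is a sketch under assumptions: the function name `relation_features`, the `(start, end, type)` span format, and the three gazetteer entries are hypothetical; only the family trigger words come from the text.

```python
FAMILY_TRIGGERS = {"parent", "wife", "husband", "grandparent"}  # from the text
COUNTRY_GAZETTEER = {"france", "japan", "brazil"}  # hypothetical sample entries


def relation_features(tokens, e1, e2):
    """Features for a candidate pair; e1 and e2 are (start, end, type) spans."""
    between = [t.lower() for t in tokens[e1[1]:e2[0]]]
    e2_text = " ".join(tokens[e2[0]:e2[1]]).lower()
    return {
        "type_pair": f"{e1[2]}-{e2[2]}",
        "words_between": tuple(between),
        "family_trigger": any(w in FAMILY_TRIGGERS for w in between),
        "e2_in_country_gazetteer": e2_text in COUNTRY_GAZETTEER,
    }


tokens = ["John", "'s", "wife", "Yoko", "moved", "to", "Japan"]
feats = relation_features(tokens, (0, 1, "PER"), (3, 4, "PER"))
print(feats["type_pair"], feats["family_trigger"])  # PER-PER True
```

A classifier trained on such feature dictionaries would then decide, for each entity pair in a sentence, whether a relation holds and which one.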
Semi-Supervised Machine Learning:
The bootstrapping method requires no labeled training set, only a few seed tuples. It uses the seeds to learn to populate a relation directly: it takes seed pairs that are known to stand in the relation and iterates over the data to find sentences that contain them. For example, with the seed (George Washington, Virginia) it will find sentences such as the following.
- George Washington is buried in Virginia.
- George Washington was born in Virginia.
The patterns found in these sentences (X is buried in Y, X was born in Y) are then used to search for new tuples, and the process repeats.
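The loop described above, from seed tuples to patterns to new tuples, can be sketched in a few lines. This is a toy version under assumptions: the pattern is simply the text between the two entities, entities are approximated as runs of capitalized words, and the function name `bootstrap` and the four-sentence corpus are made up for the example.

```python
import re

# Approximate an entity as one or more capitalized words (an assumption).
NAME = r"([A-Z]\w+(?: [A-Z]\w+)*)"


def bootstrap(corpus, seeds, rounds=1):
    """Grow a set of (x, y) tuples by alternating pattern and tuple discovery."""
    tuples = set(seeds)
    patterns = set()
    for _ in range(rounds):
        # Step 1: every sentence containing a known pair yields a pattern.
        for x, y in list(tuples):
            for sent in corpus:
                if x in sent and y in sent and sent.index(x) < sent.index(y):
                    patterns.add(sent[sent.index(x) + len(x):sent.index(y)])
        # Step 2: apply each pattern to the corpus to harvest new pairs.
        for middle in patterns:
            regex = re.compile(NAME + re.escape(middle) + NAME)
            for sent in corpus:
                for m in regex.finditer(sent):
                    tuples.add((m.group(1), m.group(2)))
    return tuples, patterns


corpus = [
    "George Washington is buried in Virginia.",
    "George Washington was born in Virginia.",
    "Mark Twain is buried in Elmira.",
    "Abraham Lincoln was born in Kentucky.",
]
tuples, patterns = bootstrap(corpus, {("George Washington", "Virginia")})
print(sorted(patterns))  # [' is buried in ', ' was born in ']
print(sorted(tuples))
# [('Abraham Lincoln', 'Kentucky'), ('George Washington', 'Virginia'), ('Mark Twain', 'Elmira')]
```

Note that the single seed produced patterns for two different relations (burial place and birthplace). Practical systems score patterns and tuples for confidence to limit this kind of drift.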
Unsupervised Machine Learning:
- Extract information from the web with no training data and no predefined list of relations.
- Use parsed data to train a “trustworthy tuple” classifier.
- Since it extracts new relations from the web, there is no gold set of correct relation instances, so we cannot compute precision exactly. Instead, we can only approximate precision by drawing a random sample of relations from the output and checking them manually, using the formula:
P = (# of correctly extracted relations in the sample) / (total # of extracted relations in the sample)
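The sampling estimate above can be written as a small helper. In this sketch the function name `estimate_precision` and the example extractions are assumptions, and the `is_correct` callback stands in for the manual judgment a human annotator would make.

```python
import random


def estimate_precision(extractions, is_correct, sample_size, seed=0):
    """Estimate precision from a random sample judged by `is_correct`."""
    if not extractions:
        raise ValueError("no extractions to sample from")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = rng.sample(extractions, min(sample_size, len(extractions)))
    correct = sum(1 for item in sample if is_correct(item))
    return correct / len(sample)


# Hypothetical system output; the set below plays the role of manual judgments.
extractions = [
    ("Paris", "capital-of", "France"),
    ("Einstein", "born-in", "Ulm"),
    ("Moon", "made-of", "cheese"),
    ("Tokyo", "capital-of", "Japan"),
]
judged_correct = {extractions[0], extractions[1], extractions[3]}

p = estimate_precision(extractions, lambda r: r in judged_correct, sample_size=4)
print(p)  # 0.75
```

Recall cannot be estimated this way, because the full set of correct relations on the web is unknown.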