Things to Remember
Probability
Joint and conditional probability: $p(A,B) = p(A \mid B)\,p(B)$; $p(A \mid B) = \frac{p(A,B)}{p(B)}$
Bayes Rule: $p(A \mid B) = \frac{p(B \mid A)\,p(A)}{p(B)}$
Chain Rule: $p(A_1, A_2, \ldots, A_n) = p(A_1 \mid A_2, \ldots, A_n)\,p(A_2 \mid A_3, \ldots, A_n)\cdots p(A_{n-1} \mid A_n)\,p(A_n)$
The Golden Rule (of stat. NLP): $\hat{A} = \operatorname{argmax}_A p(A \mid B) = \operatorname{argmax}_A p(B \mid A)\,p(A)$, since $p(B)$ does not depend on $A$
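A minimal numeric sketch of Bayes rule and the golden rule; the two-outcome prior and likelihood below are made-up illustration values, not from the notes:

```python
# Toy prior p(A) and likelihood p(B=wet | A); values are illustrative only.
p_A = {"rain": 0.3, "sun": 0.7}
p_B_given_A = {"rain": 0.9, "sun": 0.2}

# p(B) by marginalization, then the posterior via Bayes rule
p_B = sum(p_B_given_A[a] * p_A[a] for a in p_A)
posterior = {a: p_B_given_A[a] * p_A[a] / p_B for a in p_A}

# Golden rule: argmax_A p(A|B) == argmax_A p(B|A) p(A), since p(B) is constant in A
best = max(p_A, key=lambda a: p_B_given_A[a] * p_A[a])
print(posterior, best)   # {'rain': 0.658..., 'sun': 0.341...} rain
```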
Information Theory
Entropy: $H(p) = -\sum_x p(x) \log_2 p(x)$
Perplexity: $G(p) = 2^{H(p)}$
Conditional entropy: $H(Y \mid X) = -\sum_{x,y} p(x,y) \log_2 p(y \mid x)$
Chain Rule: $H(X,Y) = H(X) + H(Y \mid X)$
Kullback-Leibler distance: $D(p \parallel q) = \sum_x p(x) \log_2 \frac{p(x)}{q(x)}$
Mutual Information: $I(X;Y) = D(p(x,y) \parallel p(x)\,p(y)) = \sum_{x,y} p(x,y) \log_2 \frac{p(x,y)}{p(x)\,p(y)}$
Cross Entropy: $H_{p'}(p) = -\sum_x p'(x) \log_2 p(x)$, where $p'$ is the true (data) distribution and $p$ the model
conditional: $H_{p'}(p) = -\sum_{x,y} p'(x,y) \log_2 p(y \mid x)$
conditional over data: $H_{p'}(p) = -\frac{1}{|T|} \sum_{i=1}^{|T|} \log_2 p(y_i \mid x_i)$ over test data $T$
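A small Python sketch of these quantities over a toy three-symbol alphabet; the distributions p and q are illustrative assumptions:

```python
import math

p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.4, "b": 0.4, "c": 0.2}

def entropy(p):
    # H(p) = -sum_x p(x) log2 p(x)
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def perplexity(p):
    # G(p) = 2^H(p)
    return 2 ** entropy(p)

def kl(p, q):
    # D(p || q) = sum_x p(x) log2 (p(x) / q(x))
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def cross_entropy(p_true, p_model):
    # H_{p'}(p) = -sum_x p'(x) log2 p(x) = H(p') + D(p' || p)
    return entropy(p_true) + kl(p_true, p_model)

print(entropy(p), perplexity(p))        # 1.5  2.828...
print(kl(p, q), cross_entropy(p, q))    # KL >= 0, so cross entropy >= H(p)
```

Mutual information is the same KL quantity applied to the joint $p(x,y)$ versus the product of marginals $p(x)\,p(y)$.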
Language Modeling
The Golden Rule (again): $\hat{A} = \operatorname{argmax}_A p(B \mid A)\,p(A)$, where
$p(B \mid A)$ – application-specific model
$p(A)$ – the language model
Markov Chain (n-gram LM): $p(w_1, \ldots, w_T) \approx \prod_{i=1}^{T} p(w_i \mid w_{i-n+1}, \ldots, w_{i-1})$
Maximum Likelihood Estimate (3-grams): $p(w_3 \mid w_1, w_2) = \frac{c(w_1, w_2, w_3)}{c(w_1, w_2)}$
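A sketch of the trigram MLE from raw counts; the tiny corpus is an illustrative stand-in for real training data:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()

tri = Counter(zip(corpus, corpus[1:], corpus[2:]))   # c(w1, w2, w3)
bi = Counter(zip(corpus, corpus[1:]))                # c(w1, w2)

def p_mle(w3, w1, w2):
    # p(w3 | w1, w2) = c(w1, w2, w3) / c(w1, w2)
    return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0

print(p_mle("sat", "the", "cat"))   # 0.5: "the cat" is followed once by "sat", once by "ate"
```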
Smoothing
Adding 1: $p'(w \mid h) = \frac{c(h,w) + 1}{c(h) + |V|}$
Adding less than 1: $p'(w \mid h) = \frac{c(h,w) + \lambda}{c(h) + \lambda|V|}$, with $0 < \lambda < 1$
Good-Turing: $p_r(w) = \frac{(c(w)+1)\,N(c(w)+1)}{|T|\,N(c(w))}$, where $N(c)$ is the number of words occurring $c$ times in the data $T$
normalize: $p'(w) = \frac{p_r(w)}{\sum_{w'} p_r(w')}$
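A sketch of add-lambda and unigram Good-Turing smoothing; the toy data, the vocabulary V, and the fallback used for the highest counts are assumptions made for illustration:

```python
from collections import Counter

data = "a a a b b c".split()
V = {"a", "b", "c", "d"}           # vocabulary including an unseen word "d"
counts = Counter(data)
T = len(data)

def p_add_lambda(w, lam=1.0):
    # (c(w) + lambda) / (|T| + lambda * |V|); lam = 1 is add-one, 0 < lam < 1 is "adding less than 1"
    return (counts[w] + lam) / (T + lam * len(V))

N = Counter(counts.values())       # N[c] = number of word types with count c
def p_good_turing_raw(w):
    c = counts[w]
    if c == 0:
        # unseen words share the probability mass of things seen once
        return N[1] / T
    if N[c + 1] == 0:
        # for the highest counts N(c+1) = 0; keep the plain MLE estimate
        # (a common practical fallback, not spelled out in the notes)
        return c / T
    # adjusted count c* = (c+1) * N(c+1) / N(c), divided by |T|
    return (c + 1) * N[c + 1] / (T * N[c])

# normalize so the smoothed unigram distribution sums to 1
Z = sum(p_good_turing_raw(w) for w in V)
p_gt = {w: p_good_turing_raw(w) / Z for w in V}
print(p_add_lambda("d"), p_gt)
```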
Linear Interpolation using MLE: $p'_\lambda(w_i \mid w_{i-2}, w_{i-1}) = \lambda_3 p_3(w_i \mid w_{i-2}, w_{i-1}) + \lambda_2 p_2(w_i \mid w_{i-1}) + \lambda_1 p_1(w_i) + \lambda_0 \frac{1}{|V|}$, with $\sum_j \lambda_j = 1$ (EM sketch below)
minimize entropy: find the $\lambda_j$ minimizing $-\frac{1}{|H|} \sum_{i=1}^{|H|} \log_2 p'_\lambda(w_i \mid w_{i-2}, w_{i-1})$ on heldout data $H$
compute expected counts for lambdas: $c(\lambda_j) = \sum_{i=1}^{|H|} \frac{\lambda_j\,p_j(w_i \mid \cdot)}{p'_\lambda(w_i \mid w_{i-2}, w_{i-1})}$
compute next lambdas: $\lambda_{j,\mathrm{next}} = \frac{c(\lambda_j)}{\sum_k c(\lambda_k)}$
Bucketed Smoothing – divide heldout data into buckets according to frequency and use LI+MLE separately within each bucket
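A sketch of the EM re-estimation of the interpolation weights on heldout data; the component models p3, p2, p1, the heldout format, and vocab_size are assumed inputs. Bucketed smoothing would run the same loop separately for each bucket:

```python
def em_lambdas(heldout, p3, p2, p1, vocab_size, iters=20):
    # heldout: list of ((w_{i-2}, w_{i-1}), w_i) pairs; p3/p2/p1: given component models
    lambdas = [0.25, 0.25, 0.25, 0.25]           # [l3, l2, l1, l0], must sum to 1
    for _ in range(iters):
        expected = [0.0, 0.0, 0.0, 0.0]
        for (w2, w1), w in heldout:
            comps = [p3(w, w2, w1), p2(w, w1), p1(w), 1.0 / vocab_size]
            p_interp = sum(l * c for l, c in zip(lambdas, comps))
            # expected counts: c(lambda_j) += lambda_j * p_j / p'_lambda
            for j in range(4):
                expected[j] += lambdas[j] * comps[j] / p_interp
        total = sum(expected)
        lambdas = [e / total for e in expected]  # next lambdas
    return lambdas
```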
Mutual Information and Word Classes
Word Classes
3-gram LM using classes: $p(w_i \mid w_{i-2}, w_{i-1}) = p(w_i \mid c_i)\,p(c_i \mid c_{i-2}, c_{i-1})$
Which classes (words) to merge - objective function: $I(D,E) = \sum_{d,e} p(d,e) \log_2 \frac{p(d,e)}{p(d)\,p(e)}$, where $D$ and $E$ are LHS and RHS classes of the bigrams in the data
Greedy Algorithm
Start with each word in a separate class
Merge the two classes $k$ and $l$ so that the loss of mutual information $I(D,E)$ caused by the merge is minimal
Repeat the previous step until the number of classes is as small as desired
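A brute-force sketch of the greedy merging above: start with one class per word and repeatedly merge the pair whose merge loses the least bigram mutual information. The toy corpus and the exhaustive pair search are illustrative, not an efficient implementation:

```python
from collections import Counter
import math

def mutual_information(bigrams, cls):
    # I(D, E) over class bigrams induced by the word -> class map `cls`
    pairs = Counter((cls[a], cls[b]) for a, b in bigrams)
    total = sum(pairs.values())
    left, right = Counter(), Counter()
    for (d, e), n in pairs.items():
        left[d] += n
        right[e] += n
    return sum(n / total * math.log2((n * total) / (left[d] * right[e]))
               for (d, e), n in pairs.items())

def greedy_merge(words, bigrams, target_classes):
    cls = {w: w for w in set(words)}                 # each word starts in its own class
    while len(set(cls.values())) > target_classes:
        classes = sorted(set(cls.values()))
        best, best_mi = None, float("-inf")
        for i, k in enumerate(classes):
            for l in classes[i + 1:]:
                merged = {w: (k if c == l else c) for w, c in cls.items()}
                mi = mutual_information(bigrams, merged)
                if mi > best_mi:                     # smallest loss = largest remaining MI
                    best, best_mi = (k, l), mi
        k, l = best
        cls = {w: (k if c == l else c) for w, c in cls.items()}
    return cls

corpus = "the cat sat the dog sat the cat ran".split()
bigrams = list(zip(corpus, corpus[1:]))
print(greedy_merge(corpus, bigrams, 3))              # word -> class map after merging down to 3 classes
```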