Prior and Posterior
Prior probability × test evidence (likelihood) ---> posterior probability: multiply the prior by the likelihood of the evidence, then normalize.
cancer test:
prior: P(C) = 0.01, sensitivity P(pos | C) = 0.9, specificity P(neg | ¬C) = 0.9
joint: P(C, pos) = 0.01 × 0.9 = 0.009, P(¬C, pos) = 0.99 × 0.1 = 0.099
normalize: P(pos) = 0.009 + 0.099
p(pos) = 0.108
posterior: P(C | pos) = 0.009 / 0.108 ≈ 0.083, P(¬C | pos) = 0.099 / 0.108 ≈ 0.917
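A minimal sketch of the same computation in Python, using the numbers assumed above (1% prior, 90% sensitivity, 90% specificity):

# hypothetical numbers from the cancer-test example above
p_c = 0.01          # prior P(C)
p_pos_c = 0.9       # P(pos | C)
p_pos_not_c = 0.1   # P(pos | not C)

joint_c = p_c * p_pos_c                 # P(C, pos) = 0.009
joint_not_c = (1 - p_c) * p_pos_not_c   # P(not C, pos) = 0.099
p_pos = joint_c + joint_not_c           # normalizer P(pos) = 0.108

print(joint_c / p_pos)       # posterior P(C | pos) ~ 0.083
print(joint_not_c / p_pos)   # posterior P(not C | pos) ~ 0.917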
Bayes Rule For Classification
The classifier considers only which words appear; it does not consider word length.
Bayesian Learning
For each hypothesis h in H, calculate P(h | D) = P(D | h) P(h) / P(D); output the MAP hypothesis h_MAP = argmax_h P(h | D).
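A minimal brute-force sketch of this idea; the function name and the prior/likelihood arguments are illustrative, not from the lecture:

def map_hypothesis(hypotheses, prior, likelihood, data):
    # prior: dict h -> P(h); likelihood: function (data, h) -> P(D | h)
    # P(D) is the same for every h, so the argmax of P(D | h) * P(h) is the MAP hypothesis
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior[h])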
Minimum Description Length
Decision tree height (hypothesis size): length(h)
Errors on the data: length(D|h)
h_MDL = argmin_h [ length(h) + length(D|h) ]: prefer the hypothesis that minimizes the description length of the hypothesis plus the description length of the data given the hypothesis.
Bayesian Classification
The best (MAP) hypothesis is not the same as the best label.
Instead, take a weighted vote over all hypotheses: v* = argmax_v Σ_h P(v | h) P(h | D) (the Bayes optimal classifier).
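A minimal sketch of the weighted vote, assuming hypothetical dictionaries posterior (h -> P(h | D)) and label_prob (h -> {label: P(label | h)}); the names are illustrative:

def bayes_optimal_label(labels, posterior, label_prob):
    # for each candidate label, sum P(label | h) weighted by the hypothesis posterior P(h | D)
    votes = {v: sum(posterior[h] * label_prob[h][v] for h in posterior) for v in labels}
    return max(votes, key=votes.get)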
Bayesian Inference
Conditional Independence
Definition: X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given the value of Z; that is, if
P(X = x | Y = y, Z = z) = P(X = x | Z = z) for all values x, y, z.
More compactly, we write P(X | Y, Z) = P(X | Z).
Independence: P(X | Y) = P(X); equivalently, P(X, Y) = P(X) P(Y).
Chain rule: P(X, Y) = P(X | Y) P(Y); more generally, P(x1, ..., xn) = Π_i P(xi | x1, ..., x_{i-1}).
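A tiny numeric sketch (made-up numbers) of building a joint distribution from the chain rule together with a conditional-independence assumption, which is exactly what the belief networks below exploit:

# made-up conditional tables: Z is a root, Y depends on Z, and X depends only on Z
p_z = 0.5
p_y_given_z = {True: 0.8, False: 0.3}   # P(Y=1 | Z)
p_x_given_z = {True: 0.9, False: 0.2}   # P(X=1 | Z); X is conditionally independent of Y given Z

joint = {}
for z in (True, False):
    for y in (True, False):
        for x in (True, False):
            pz = p_z if z else 1 - p_z
            py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
            px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
            joint[(x, y, z)] = px * py * pz   # chain rule: P(x,y,z) = P(x|z) P(y|z) P(z)

print(sum(joint.values()))   # 1.0: the factored tables define a valid joint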
Belief Network
Also called a Bayesian network or Bayes net.
a, b, c are conditionally independent (each node is conditionally independent of its non-descendants given its parents).
Once these conditional probability tables are known, every other probability in the network can be computed.
The structure is a directed acyclic graph (DAG).
boolean: for 5 boolean variables, the full joint distribution needs 2^5 − 1 = 31 parameters, while the network needs only 1 + 1 + 4 + 4 + 4 = 14 (one parameter per root node, four per node with two boolean parents).
normalize: to answer a query, sum the relevant entries of the joint and divide by their total.
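A small sketch of the parameter counting, assuming a hypothetical 5-node boolean network (two roots, three nodes with two parents each; not necessarily the exact structure from the lecture):

# parents of each node in a hypothetical boolean belief network
parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["a", "b"], "e": ["c", "d"]}

# each boolean node needs one parameter per joint setting of its parents
network_params = sum(2 ** len(p) for p in parents.values())   # 1 + 1 + 4 + 4 + 4 = 14
full_joint_params = 2 ** len(parents) - 1                     # 2^5 - 1 = 31

print(network_params, full_joint_params)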
Naive Bayes
Why Naive Bayes is cool
- Inference is cheap
- few parameters
- estimate parameters with labeled data
- connects inference and classification
- empirically successful
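A minimal word-count Naive Bayes sketch with add-one smoothing; the tiny labeled corpus and all names here are made up for illustration (it is not the memo data used in the quizzes below):

from collections import Counter, defaultdict
import math

# hypothetical labeled corpus
docs = [("meeting sunday sunday", "work"), ("shirt jeans friday", "casual")]

word_counts = defaultdict(Counter)
class_counts = Counter()
for text, label in docs:
    class_counts[label] += 1
    word_counts[label].update(text.split())

def classify(text):
    # score each class by log P(class) + sum of log P(word | class), with add-one smoothing
    vocab = {w for c in word_counts.values() for w in c}
    scores = {}
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("sunday meeting"))   # -> "work" under this toy corpus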
sample_memo = '''
Milt, we're gonna need to go ahead and move you downstairs into storage B. We have some new people coming in, and we need all the space we can get. So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?
Oh, and remember: next Friday... is Hawaiian shirt day. So, you know, if you want to, go ahead and wear a Hawaiian shirt and jeans.
Oh, oh, and I almost forgot. Ahh, I'm also gonna need you to go ahead and come in on Sunday, too...
Hello Peter, whats happening? Ummm, I'm gonna need you to go ahead and come in tomorrow. So if you could be here around 9 that would be great, mmmk... oh oh! and I almost forgot ahh, I'm also gonna need you to go ahead and come in on Sunday too, kay. We ahh lost some people this week and ah, we sorta need to play catch up.
'''
#
# Maximum Likelihood Hypothesis
#
#
# In this quiz we will find the maximum likelihood word based on the preceding word
#
# Fill in the NextWordProbability procedure so that it takes in sample text and a word,
# and returns a dictionary with keys the set of words that come after, whose values are
# the number of times the key comes after that word.
#
# Just use .split() to split the sample_memo text into words separated by spaces.
def NextWordProbability(sampletext, word):
    samplewords = sampletext.split()
    next_words = {}
    for i in range(len(samplewords) - 1):
        if samplewords[i] == word:
            next_words[samplewords[i + 1]] = next_words.get(samplewords[i + 1], 0) + 1
    return next_words
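# A quick sanity check on the memo above (not part of the quiz): every occurrence
# of "go" in sample_memo is followed by "ahead", so that count dominates.
print(NextWordProbability(sample_memo, "go"))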
#------------------------------------------------------------------
#
# Bayes Optimal Classifier
#
# In this quiz we will compute the optimal label for a second missing word in a row
# based on the possible words that could be in the first blank
#
# Finish the procedure, LaterWords(), below
#
# You may want to import your code from the previous programming exercise!
#
sample_memo = '''
Milt, we're gonna need to go ahead and move you downstairs into storage B. We have some new people coming in, and we need all the space we can get. So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?
Oh, and remember: next Friday... is Hawaiian shirt day. So, you know, if you want to, go ahead and wear a Hawaiian shirt and jeans.
Oh, oh, and I almost forgot. Ahh, I'm also gonna need you to go ahead and come in on Sunday, too...
Hello Peter, whats happening? Ummm, I'm gonna need you to go ahead and come in tomorrow. So if you could be here around 9 that would be great, mmmk... oh oh! and I almost forgot ahh, I'm also gonna need you to go ahead and come in on Sunday too, kay. We ahh lost some people this week and ah, we sorta need to play catch up.
'''
corrupted_memo = '''
Yeah, I'm gonna --- you to go ahead --- --- complain about this. Oh, and if you could --- --- and sit at the kids' table, that'd be ---
'''
data_list = sample_memo.strip().split()
words_to_guess = ["ahead","could"]
def NextWordProbability(sampletext, word):
    samplewords = sampletext.split()
    next_words = {}
    for i in range(len(samplewords) - 1):
        if samplewords[i] == word:
            next_words[samplewords[i + 1]] = next_words.get(samplewords[i + 1], 0) + 1
    # normalize the counts into conditional probabilities P(next word | word)
    total = sum(next_words.values())
    for w in next_words:
        next_words[w] = next_words[w] * 1.0 / total
    return next_words
def get_probability(first_word_dict, second_word_dict):
    # combine P(first word | word) with P(second word | first word) to get the
    # total probability of each candidate second word
    second_word_probs = {}
    for first_word in second_word_dict:
        for second_word in second_word_dict[first_word]:
            first_word_prob = first_word_dict.get(first_word)
            second_word_prob = second_word_dict.get(first_word).get(second_word)
            second_word_probs[second_word] = second_word_probs.get(second_word, 0) + first_word_prob * second_word_prob
    return second_word_probs
def LaterWords(sample, word, distance):
    '''@param sample: a sample of text to draw from
    @param word: a word occurring before a corrupted sequence
    @param distance: how many words later to estimate (i.e. 1 for the next word, 2 for the word after that)
    @returns: a single word which is the most likely possibility
    '''
    # Given a word, collect the relative probabilities of possible following words
    # from @sample (reusing NextWordProbability from the maximum likelihood exercise).
    first_word_dict = NextWordProbability(sample, word)
    # For each distance beyond 1, evaluate the words that might come after each first word,
    # and combine them, weighted by relative probability, into an estimate of what appears next.
    second_word_dict = {}
    for first_word in first_word_dict:
        second_word_dict[first_word] = NextWordProbability(sample, first_word)
    if distance == 1:
        return sorted(first_word_dict, key=first_word_dict.get, reverse=True)[0]
    elif distance == 2:
        second_word_probs = get_probability(first_word_dict, second_word_dict)
        return sorted(second_word_probs, key=second_word_probs.get, reverse=True)[0]
print(LaterWords(sample_memo, "ahead", 2))