Princeton University

Computer Science 302
Introduction to Artificial Intelligence
Problem Set 8 Answer Key

Fall 2001


(a) What are the parameters of the model?

We have 3 independent features, so we would have one number for each bit/feature for each class, totally 6 numbers. Let's write p(i,j) to be the probability of bit i under class j.

This assumes that the two classes are equally likely. It is also possible to have a separate parameter which is the probability of selecting class I.

(b) Imagine we are given data consisting of the three(a typo made this "two" in the original statement) feature values for each sample from the model. We are not given the class label. Describe an "expectation" procedure to compute class labels for the data given a model.

Ok, we have probabilities p(i,j) and a triple of bits. We want to know the probability that this triple came from class I vs. class II.

Pr(triple t | class j) = product_{i} p(i,j) if bit i is on in the triple t, 1-p(i,j) otherwise

Pr(class j) = 1/2 (or the parameter)

Pr(triple t) = sum_{j} Pr(triple t | class j) Pr(class j)

Pr(class j | triple t) = Pr(triple t | class j) Pr(class j) / Pr(triple t)

Pr(class j | triple t) is the (fractional) class label j for the triple t.

(c) How do you use this procedure to learn a maximum likelihood model for the data?

Use EM: Start by setting up the parameters, doing expectation to get the labels. Use fractional counting (maximization) to get new parameter values. Repeat until things settle down.

The new parameters are computed by the following formula:

p(i,j) = (sum_{triple t in which feature i is on} Pr (class j | triple t)) / ( sum_{triple t} Pr(class j | triple t})