For this homework, do the following things:
Compute the maximum likelihood estimates (by hand) for the parameters of a Gaussian distribution modeling the following continuous datasets:
Thrun explains how to compute these estimates in his Gaussian learning videos. The one-dimensional estimators are given in equation 20.4 of the text.
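To check your by-hand answers, here is a small sketch of the standard one-dimensional maximum-likelihood estimators, μ = (1/N) Σ x_j and σ = sqrt((1/N) Σ (x_j − μ)²). The function name and sample values are made up for illustration; substitute the datasets from the assignment.

```python
import math

def gaussian_mle(xs):
    """Return the maximum-likelihood estimates (mu, sigma) for a 1-D Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    # The ML variance divides by N, not N - 1, so it differs slightly from
    # the "sample variance" that many calculators and spreadsheets report.
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, math.sqrt(var)

# Hypothetical data, for illustration only.
mu, sigma = gaussian_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mu, sigma)  # 5.0 2.0
```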
Load the data points pickled in data.txt and use EM to recover the structure of the Gaussian mixture model we used to generate the data. You need to determine the number of clusters and each component's weight, mean, and covariance.
You can load the data using this code:
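    import pickle

    # Open in binary mode: pickled data is bytes under Python 3.
    # For a Python 2 pickle containing NumPy arrays, use pickle.load(f, encoding='latin1').
    with open('data.txt', 'rb') as f:
        data = pickle.load(f)
    print(data)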
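The EM loop itself can be sketched in NumPy as below, for a fixed number of components k. This is a minimal illustration, not the reference solution: the function names `gaussian_pdf` and `em_gmm`, the initialization (random data points as means, the overall data covariance for every component), and the small `reg` term added to each covariance for numerical stability are all assumptions.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Multivariate normal density N(x | mean, cov) for each row x of X."""
    d = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance diff_i^T cov^{-1} diff_i for every row i
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * maha) / norm

def em_gmm(X, k, n_iter=200, tol=1e-6, reg=1e-6, seed=0):
    """Fit a k-component, full-covariance GMM with EM.

    Returns (weights, means, covs) with shapes (k,), (k, d), (k, d, d).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat 1-D data as n samples of dimension 1
    n, d = X.shape
    rng = np.random.default_rng(seed)
    # Initialization (one simple choice): k distinct data points as means,
    # the overall data covariance for every component, equal weights.
    means = X[rng.choice(n, size=k, replace=False)]
    base_cov = np.cov(X, rowvar=False).reshape(d, d) + reg * np.eye(d)
    covs = np.array([base_cov] * k)
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted densities, then responsibilities r[i, j] = P(j | x_i)
        dens = np.column_stack(
            [weights[j] * gaussian_pdf(X, means[j], covs[j]) for j in range(k)])
        totals = dens.sum(axis=1)
        r = dens / totals[:, None]
        # M-step: re-estimate weights, means, covariances from the soft counts
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
        # Stop once the log-likelihood (of the E-step parameters) stops improving
        ll = np.log(totals).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs
```

Assuming `data` converts to an array of shape (n_samples, n_features), you could call `em_gmm(data, k)` for each candidate k. EM only reaches a local maximum, so it is worth rerunning with several seeds. This sketch also works with raw densities, which can underflow for far-out points; library implementations such as scikit-learn's `GaussianMixture` work with log densities instead.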
Determine the appropriate k for this data using scikit-learn's BIC implementation, as described in the "Gaussian Mixture Model Selection" example.
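A minimal sketch of that model-selection loop, assuming the data is (or converts to) a 2-D array of shape (n_samples, n_features). The helper name `select_k_by_bic` and the defaults `k_max=8` and `n_init=5` are illustrative choices; unlike the scikit-learn example, it only tries full covariances rather than comparing covariance types.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, k_max=8, seed=0):
    """Fit full-covariance GMMs for k = 1..k_max; return the lowest-BIC k and model."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # scikit-learn expects shape (n_samples, n_features)
    bics, models = {}, {}
    for k in range(1, k_max + 1):
        # n_init restarts EM from several initializations and keeps the best fit
        gm = GaussianMixture(n_components=k, covariance_type='full',
                             n_init=5, random_state=seed).fit(X)
        bics[k], models[k] = gm.bic(X), gm
    best_k = min(bics, key=bics.get)  # lower BIC is better
    return best_k, models[best_k], bics
```

For example, `best_k, gm, bics = select_k_by_bic(data)`; the fitted mixture's parameters are then in `gm.weights_`, `gm.means_` and `gm.covariances_`. Plotting `bics` against k shows how clear-cut the choice is.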
Final project suggestion: Figure out how to learn Bayesian network parameters and/or structure, and apply that to an interesting problem. The text covers this subject in some detail in Sections 20.2 and 20.3. Thrun also mentions a number of practical applications of unsupervised learning.
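For the parameter-learning half of that suggestion, the simplest case is a known structure with fully observed discrete data: the maximum-likelihood estimate of each conditional probability table entry is a ratio of counts, N(child = v, parents = u) / N(parents = u). A minimal sketch, assuming samples are stored as a list of dicts mapping variable names to values (a hypothetical format); the helper `learn_cpt` and the Rain/WetGrass variables are illustrative only.

```python
from collections import Counter

def learn_cpt(samples, child, parents):
    """ML estimate of P(child | parents) from fully observed samples.

    Returns {parent-value tuple: {child value: probability}}.
    """
    joint = Counter()          # counts of (parent values, child value)
    parent_counts = Counter()  # counts of parent values alone
    for s in samples:
        pa = tuple(s[p] for p in parents)
        joint[(pa, s[child])] += 1
        parent_counts[pa] += 1
    cpt = {}
    for (pa, value), count in joint.items():
        cpt.setdefault(pa, {})[value] = count / parent_counts[pa]
    return cpt

samples = [
    {'Rain': True, 'WetGrass': True},
    {'Rain': True, 'WetGrass': True},
    {'Rain': True, 'WetGrass': False},
    {'Rain': False, 'WetGrass': False},
]
cpt = learn_cpt(samples, 'WetGrass', ['Rain'])
print(cpt)
```

Combinations never seen in the data get no entry, which amounts to assigning them zero probability; adding pseudocounts (Laplace smoothing) is a common fix. Learning with missing values or hidden variables needs EM, and structure learning is a separate search problem.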
Submit the files specified above in Moodle under homework 9. We will grade your work according to the following criteria: