July 21st (Tuesday)
July 23rd (Thursday)
The goal of the course project is for you to gain in-depth experience with specific machine learning techniques and algorithms by applying them to interesting problem domains. In addition, you should gain an appreciation for what it means to have a result in the field. The projects should have a research component and/or attempt to solve a large-scale problem.
Deliverables
| Date | Deliverable |
|---|---|
| May 22 | Team Formation and Project Abstract |
| June 10 | Informal Project Proposal |
| June 10 to July 8 | One-on-one discussions |
| July 8 | Project Progress Report |
| July 17 (extended to July 24) | Final Project Report |
| July 21 | Project Presentations |
Projects must be done in teams of three or four students.
Team Formation and Project Abstract. Submit a one-page document listing your team members and an abstract that describes your project.
Informal Project Proposal. Submit a two- to three-page project proposal. Things to include:
Project Progress Report. Submit a four- to five-page report on the current status of your project. Discuss the problem description, research goals, and project plan in more detail. Elaborate on completed tasks and on any changes made since the initial proposal. In addition, include:
Final Project Report. Submit a six- to seven-page final report on your project using the NIPS conference style. The focus should be on the design of the machine learning algorithm, the results, and the associated analysis.
Project Presentations. Each group will give a 15-minute presentation describing the problem, the approach, the experiments, the results obtained, the analysis, and the conclusions.
Yelp Dataset Challenge. Predict ratings and user behavior from the Yelp dataset.
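A sensible first baseline for rating prediction, before any learned model, is the bias model r_hat(u, b) = mu + b_u + b_b: the global mean plus a per-user and a per-business offset. The sketch below assumes reviews have already been loaded as hypothetical `(user_id, business_id, stars)` triples. It does not parse the Yelp files, and `reg` is a made-up shrinkage constant that you would tune on held-out data.

```python
from collections import defaultdict

def fit_baseline(ratings, reg=5.0):
    """Fit r_hat(u, b) = mu + b_user + b_business from (user, business, stars) triples.

    Biases are shrunk toward zero by `reg` (a hypothetical regularization strength).
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Business bias: average deviation from the global mean, shrunk by reg.
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, b, r in ratings:
        item_sum[b] += r - mu
        item_cnt[b] += 1
    b_item = {b: item_sum[b] / (item_cnt[b] + reg) for b in item_sum}

    # User bias: average residual after removing mu and the business bias.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for u, b, r in ratings:
        user_sum[u] += r - mu - b_item[b]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (user_cnt[u] + reg) for u in user_sum}

    def predict(user, business):
        # Unseen users or businesses fall back to a zero bias.
        pred = mu + b_user.get(user, 0.0) + b_item.get(business, 0.0)
        return min(5.0, max(1.0, pred))  # clip to the 1-5 star range
    return predict
```

Any model you propose, such as matrix factorization or one using review text, should be reported against a baseline like this one.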
Airfare prediction. Track current airfares for various flights and build a system that predicts the best time to buy a ticket.
Handwriting Recognition. Use Hidden Markov Models to design a handwriting recognition system, or build something as cool as Detexify.
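The core inference step in an HMM recognizer is Viterbi decoding. Given a sequence of observed stroke features, it finds the most likely sequence of hidden states (for example, letters). The sketch below assumes the observations have already been quantized into discrete symbols. The model probabilities are passed in as plain dictionaries, and any state or symbol names you use with it are your own.

```python
import math

def _log(p):
    # log(0) is undefined; treat impossible events as -infinity.
    return math.log(p) if p > 0 else -math.inf

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations `obs`.

    Scores are summed in log space so long sequences do not underflow.
    """
    # best[t][s] = (best log-probability of any path ending in state s at time t,
    #               predecessor state on that path)
    best = [{s: (_log(start_p[s]) + _log(emit_p[s][obs[0]]), None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p][0] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + _log(emit_p[s][obs[t]]), prev)
        best.append(column)
    # Backtrack from the highest-scoring final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = best[t][state][1]
        path.append(state)
    return path[::-1]
```

For example, with two hypothetical states `"o"` and `"l"` emitting the stroke symbols `"curve"` and `"line"`, `viterbi(["curve", "line", "curve"], ...)` returns one state per observation. Estimating the probabilities themselves (for instance with Baum-Welch) is a separate step not shown here.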
Moneyball. Use player statistics to predict player performance, then see how well your predictions hold up by testing them on a tournament dataset. In short, build an algorithm that can play fantasy baseball, football, or soccer. See Soccermetrics.
Sound denoising or Shazam. Audio processing projects. You can use unsupervised learning techniques to remove noise from an audio signal. You can also work on a semi-supervised learning algorithm for speech recognition, or develop an application like Shazam.
Quantified Self and Data Analysis. Use any of the many available quantified-self devices to record a dataset, and try to make sense of it. Many such devices can gather the data you are interested in. You can use your phone to track your runs and bike rides, or use devices such as Fitbit and ECG monitors. I am especially interested in this project because of this competition.
Image Completion. Use graphical models for content-aware completion, filling in missing regions of images.
Reinforcement Learning Challenges. Solve a problem from the ICML 2013 Reinforcement Learning competition.
Kaggle competitions. Participate in a Kaggle competition.
Given a web page that (probably) contains glossary entries and definitions, extract the fields.
Given multiple databases containing addresses, create a unified database of places.
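This is a record-linkage (entity-resolution) problem. A minimal, non-learned baseline normalizes each address and then merges near-duplicates by string similarity, using Python's standard `difflib.SequenceMatcher`. The abbreviation table and the 0.9 threshold are assumptions for illustration. An ML approach would instead learn the match decision from labeled pairs.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; extend it to fit your data sources.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}

def normalize(address):
    """Lowercase, drop punctuation, and expand common street abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)

def same_place(a, b, threshold):
    """Treat two normalized addresses as one place if their numbers match exactly
    and the remaining text is nearly identical."""
    if re.findall(r"\d+", a) != re.findall(r"\d+", b):
        return False  # "12 main street" and "14 main street" are different places
    return SequenceMatcher(None, a, b).ratio() >= threshold

def merge_places(*databases, threshold=0.9):
    """Greedily merge address lists into (canonical key, original spellings) entries."""
    unified = []
    for db in databases:
        for address in db:
            key = normalize(address)
            for canon, variants in unified:
                if same_place(key, canon, threshold):
                    variants.append(address)
                    break
            else:
                unified.append((key, [address]))
    return unified
```

The exact-number check matters. On similarity alone, "12 main street" and "14 main street" score above 0.9 and would be merged. The greedy loop is also quadratic in the number of places, so for real databases you would first restrict comparisons to plausible candidates (for example, addresses sharing a postal code).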
Create a more accurate battery power indicator.
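A learned indicator needs a simple heuristic to beat. One such baseline divides the remaining charge by a smoothed recent discharge rate. The sketch below assumes hypothetical `(minute, percent)` samples and an exponentially weighted moving average with a made-up `alpha`. A "more accurate" model could, for example, also condition on what the device is doing.

```python
def minutes_remaining(readings, alpha=0.3):
    """Estimate minutes of battery life left from (minute, percent) readings.

    The per-minute discharge rate is smoothed with an exponentially weighted
    moving average; `alpha` is a hypothetical smoothing factor.
    """
    rate = None
    for (t0, p0), (t1, p1) in zip(readings, readings[1:]):
        current = (p0 - p1) / (t1 - t0)  # percent drained per minute
        rate = current if rate is None else alpha * current + (1 - alpha) * rate
    if rate is None or rate <= 0:
        return None  # too few readings, or the battery is charging
    return readings[-1][1] / rate
```

A larger `alpha` reacts faster to changes in load but makes the estimate jumpier. Measuring that trade-off against actual time-to-empty is one way to define "accurate" for this project.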
Extract titles, authors, and references from PDF files.
Self organization of a peer-to-peer network.
Predict server response time for nodes in a wireless network.
In RL, several algorithms trade off exploration and exploitation in a theoretically motivated way. Evaluate them empirically.
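A small simulation harness makes such a comparison concrete. The sketch below pits UCB1, a theoretically motivated algorithm with a logarithmic regret bound, against epsilon-greedy as a simple baseline on a Bernoulli multi-armed bandit. The arm probabilities, horizon, and seed are hypothetical. A real study would average over many seeds and report regret curves rather than a single total.

```python
import math
import random

def run_bandit(arm_probs, choose, steps=5000, seed=0):
    """Simulate a Bernoulli multi-armed bandit; return the total reward collected."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    means = [0.0] * len(arm_probs)  # running mean reward of each arm
    total = 0
    for _ in range(steps):
        arm = choose(counts, means, rng)
        reward = 1 if rng.random() < arm_probs[arm] else 0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        total += reward
    return total

def epsilon_greedy(epsilon):
    """Explore a uniformly random arm with probability epsilon, else exploit."""
    def choose(counts, means, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(counts))
        return max(range(len(counts)), key=lambda a: means[a])
    return choose

def ucb1(counts, means, rng):
    """UCB1: play each arm once, then maximize mean + sqrt(2 ln n / n_a)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    n = sum(counts)
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2 * math.log(n) / counts[a]))

if __name__ == "__main__":
    arms = [0.2, 0.5, 0.7]  # hypothetical success probabilities
    for name, strategy in [("epsilon-greedy(0.1)", epsilon_greedy(0.1)), ("UCB1", ucb1)]:
        print(name, run_bandit(arms, strategy))
```

Every strategy shares the `choose(counts, means, rng)` signature, so other algorithms (Thompson sampling, softmax exploration) can be dropped into the same harness.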
Compare existing RL techniques on "mountain car" or Tetris.
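Mountain car is small enough to write down directly, which keeps comparisons reproducible. The sketch below implements the classic dynamics from Sutton and Barto's textbook: velocity bounded to [-0.07, 0.07], a wall at position -1.2 that zeroes the velocity, and the goal at position 0.5. It also includes a hand-coded "push in the direction of motion" policy as a non-learning baseline. The function names are hypothetical. Learned agents (for example, SARSA with tile coding) could be evaluated with the same `run_episode` loop.

```python
import math

def mountain_car_step(position, velocity, action):
    """Advance the classic mountain-car dynamics by one step; action is -1, 0 or +1."""
    # Velocity update uses the old position; gravity term is -0.0025*cos(3x).
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    if position < -1.2:
        position, velocity = -1.2, 0.0  # the left wall stops the car dead
    return position, velocity, position >= 0.5  # done once the car reaches the goal

def run_episode(policy, start=-0.5, max_steps=1000):
    """Return the number of steps the policy needs to reach the goal, or None."""
    position, velocity = start, 0.0
    for step in range(1, max_steps + 1):
        position, velocity, done = mountain_car_step(position, velocity,
                                                     policy(position, velocity))
        if done:
            return step
    return None

def push_with_motion(position, velocity):
    """Baseline policy: always accelerate in the direction the car is moving."""
    return 1 if velocity >= 0 else -1
```

The engine alone is too weak to drive straight up the hill, so the baseline succeeds by building momentum. A policy that never accelerates just oscillates in the valley and `run_episode` returns `None`.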
Figure out how to beat a fixed set of TAC agents.
Use ML to find errors and bugs in code.
Theoretically compare techniques for merging probability distributions.
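Two standard combination rules to analyze are the linear opinion pool, a weighted arithmetic mean of the distributions, and the logarithmic opinion pool, a normalized weighted geometric mean. The sketch below implements both for discrete distributions over the same outcomes. This gives concrete objects against which to check conjectures before proving them. The weights are assumed to be non-negative and to sum to 1.

```python
import math

def linear_pool(dists, weights):
    """p(x) = sum_i w_i * p_i(x): a mixture of the input distributions."""
    return {x: sum(w * d[x] for d, w in zip(dists, weights)) for x in dists[0]}

def log_pool(dists, weights):
    """p(x) proportional to prod_i p_i(x) ** w_i, renormalized to sum to 1."""
    unnorm = {x: math.prod(d[x] ** w for d, w in zip(dists, weights)) for x in dists[0]}
    z = sum(unnorm.values())  # zero (and the division fails) if the supports are disjoint
    return {x: v / z for x, v in unnorm.items()}
```

Consider p = {yes: 0.9, no: 0.1}, a uniform q, and equal weights. The linear pool gives yes 0.7, while the logarithmic pool gives yes 0.75. The geometric mean is sharper. In addition, any outcome ruled out by a positively weighted input gets zero probability under the logarithmic pool, whereas the linear pool keeps it.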
Use ML to predict the language of a string of text from its adjacent letters.
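A classic baseline here is a naive Bayes classifier over character bigrams. Each language gets a bigram frequency table, and a string is assigned to the language under which its bigrams are most probable. The sketch below uses add-one smoothing, with the vocabulary taken from all training text plus one slot for unseen bigrams. A real project would train on much larger corpora than any toy sample and might use longer n-grams.

```python
import math
from collections import Counter

def bigrams(text):
    """Character bigrams, with spaces marking the start and end of the string."""
    padded = f" {text.lower()} "
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def train(samples):
    """Map {language: training text} to add-one-smoothed bigram models."""
    counts = {lang: Counter(bigrams(text)) for lang, text in samples.items()}
    vocab_size = len(set().union(*counts.values())) + 1  # +1 slot for unseen bigrams
    return {lang: (c, sum(c.values()), vocab_size) for lang, c in counts.items()}

def identify(models, text):
    """Return the language whose model gives `text` the highest log-likelihood."""
    def log_likelihood(model):
        counts, total, vocab_size = model
        return sum(math.log((counts[b] + 1) / (total + vocab_size)) for b in bigrams(text))
    return max(models, key=lambda lang: log_likelihood(models[lang]))
```

Summing log-probabilities instead of multiplying raw probabilities keeps long strings from underflowing to zero.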
For solving multiple-choice synonym questions, we've shown that combining multiple experts is an effective approach. Training is done using supervised data. Can the multiple modules be used to train each other? ("Labeling via collectives of sufficiently accurate modules").
What about modules for RL? Is there an advantage to combining policy search, tables, and neural nets?
Applications:
Natural language dialog
Robotics
Financial trading
Network diagnosis
Object recognition
Combining speech, commands, and images
Problem solving (Sokoban)
Video games
"Comparing Kernel-based Learning Method with Application to Face Recognition"
"Image-based Stress Recognition from a Model-based Face Tracking System"
"Chessman Position Recognition Using Artificial Neural Networks"
"Latent Learning in Agents"
"LLE and ISOMAP Analysis of Robot Images"
"Human Identification using Silhouette Gait Data"
"Reconstruction of Walking People Images by Principal Component Analysis"
"Empirical Analysis of Predictive Algorithms for Recommender Systems"
"Estimating Constraint Costs using Regression Trees"
"Extending Implicit Negotiation to Repeated Grid Games"
"A New Evolutionary Algorithm for Multi-objective Optimization Problems"
"Evolutionary Learning Networks"
"Learning Change Patterns in Software Engineering, Practicable or Not?"
"An Empirical Comparison between ANN and SVM as Classifiers"
"Dynamic Topic Analysis: Classification Without Established Classes using Distance Thresholds"
"Evaluation of Kernel function Modification in Text Classification Using SVM"
"Using TF-IDF to Determine Word Relevance in Document Queries"
"Stock Price Prediction from Natural Language Understanding of News Headlines"
"An Evaluation of Kea: An Automatic Keyphrase Extraction Algorithm"
"ID Identification in Online Communities"
"Document Quality Prediction with Statistical Textual Features"