CS 4641

Schedule of Presentations

July 21st (Tuesday)

12:03pm: Facebook Recruiting: Human or Robot
12:20pm: Learning from Chess Endgames
12:37pm: Handwriting Recognition
12:54pm: S&P Stock Data Clustering Analysis
1:11pm: Machine Learning for Text Classification

July 23rd (Thursday)

12:03pm: Category Inference on Yelp Data
12:20pm: Decoding APRS Modem Signals
12:37pm: Automatic Traffic Magic

Group Project Guidelines

The goal of the course project is for you to gain in-depth experience with specific machine learning techniques and algorithms by applying them to some interesting problem domains. In addition, you should gain an appreciation for what it means to have a result in the field. The projects should have a research component and/or attempt to solve a large-scale problem.

Deliverables

May 22:	Team Formation and Project Abstract
June 10:	Informal Project Proposal
June 10 to July 8:	One on one discussions
July 8:	Project Progress Report
July 17 (extended to July 24):	Final Project Report
July 21:	Project Presentations

Details

Projects must be done in teams of three or four students.

Team Formation and Project Abstract. Submit a one-page document listing your team members and an abstract that describes your project.

Informal Project Proposal. Submit a two- to three-page project proposal. Things to include:

Problem description. A concise problem description, including a description of the data sets you will use (if applicable). Make sure you communicate why this project is interesting.
Research goals. A discussion of what the different outcomes might be. In particular tell us how you will know when the project is successful.
Project Plan. A project plan outlining in detail the methods used, the types of experiments to be performed, and when they will be performed (i.e., a timeline).
Individual Tasks. A description of the tasks that will be performed by the individual team members. Each member should contribute to the machine learning.

Project Progress Report. Submit a four- to five-page report about the current status of your project. Discuss the problem description, research goals, and project plan in more detail. Elaborate on completed tasks and any changes that have been made since the initial proposal. In addition, include:

Related Work. Provide details about existing work relevant to your problem and highlight similarities and differences to your approach.
Approach. Provide details on the algorithms you used, modifications made, data preparation methods and most importantly the reasons behind the choices made.
Implementation Details. Provide software implementation details for your project, i.e. programming languages used, libraries and packages used, communication protocols, etc.
Preliminary Results. A series of graphs, tables, or other figures that show the results of your initial experiments and the lessons learned.

Final Project Report. Submit a six- to seven-page final report of your project using a NIPS-style conference format. The focus will be on the machine learning algorithm design, results, and associated analysis.

Project Presentations. Each group will give a 15-minute project presentation describing the problem, the approach, experiments, results obtained, analysis, and conclusions.

Project Ideas

Here are some project ideas (borrowed from Pushkar Kolhe, Ph.D. student at GT).

Yelp Dataset Challenge. Predict ratings or user behavior from the Yelp dataset.
Airfare prediction. Look at the current airfare for various flights and build a system that predicts the best time to buy a ticket.
Handwriting Recognition. Use Hidden Markov Models for designing a handwriting recognition system or something as cool as Detexify.
Moneyball. Use player statistics to predict player performance. Then see how well your prediction works by testing it on a tournament dataset. Basically make an algorithm that can play fantasy baseball, football or soccer. See Soccermetrics.
Sound denoising or Shazam. Audio processing projects. You can use unsupervised learning techniques to remove noise from an audio signal, work on a semi-supervised learning algorithm for speech recognition, or develop an application like Shazam.
Quantified Self and Data Analysis. Use any of the many available quantified-self devices to record a dataset and try to make sense of it. You can use your phone to track your runs and bike rides, or devices such as a Fitbit or an ECG monitor, to gather the data you are interested in. I am especially interested in this project because of this competition.
Image Completion. Use graphical models for content-aware completion (fill-in) of images.
Reinforcement Learning Challenges. Solve a problem from the ICML 2013 Reinforcement Learning competition.
Kaggle competitions. Participate in a Kaggle competition.

Here are some project ideas (borrowed from Prof. Charles Isbell at GT).

Given a web page that (probably) contains glossary entries and definitions, extract the fields.
Given multiple databases with addresses, create a unified database of places.
Create a more accurate battery power indicator.
Extract titles, authors, and references from PDF files.
Self organization of a peer-to-peer network.
Predict server response time for nodes in a wireless network.
In RL, there are several algorithms that trade off exploration and exploitation in a theoretically motivated way. Evaluate them empirically.
Compare existing RL techniques for "mountain car" or Tetris.
Figure out how to beat a fixed set of TAC agents.
Use ML to find errors and bugs in code.
Compare techniques for merging probability distributions theoretically.
Use ML to predict the language based on adjacent letters in strings of text.
To solve multiple-choice synonym questions, we've shown that combining multiple experts is a smart approach. Training is done using supervised data. Can the multiple modules be used to train each other? ("Labeling via collectives of sufficiently accurate modules").
How about modules for RL? Is there an advantage to using policy search, tabular methods, and neural nets all together?
Applications:
- Natural language dialog
- Robotics
- Financial trading
- Network diagnosis
- Object recognition
- Combining speech, commands, and images
- Problem solving (Sokoban)
- Video games
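
One of the ideas above, predicting the language of a string from its adjacent letters, can be prototyped in a few lines. The sketch below is only an illustration under stated assumptions: it scores text with a character-bigram model and add-alpha smoothing, which is one possible technique and not one prescribed by the course. The function names (bigrams, train, predict) and the two toy training sentences are hypothetical. A real project would need much larger corpora and a proper held-out evaluation.

```python
import math
from collections import Counter


def bigrams(text):
    """Return the adjacent-character pairs of a lowercased string."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def train(samples):
    """Count bigrams per language.

    samples: dict mapping a language name to a training text.
    Returns a dict mapping each language name to a Counter of bigrams.
    """
    return {lang: Counter(bigrams(text)) for lang, text in samples.items()}


def predict(models, text, alpha=1.0):
    """Return the language whose smoothed bigram model gives the text
    the highest log-likelihood (add-alpha smoothing)."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen bigrams

    best_lang, best_score = None, -math.inf
    for lang, counts in models.items():
        total = sum(counts.values())
        score = sum(
            math.log((counts[bg] + alpha) / (total + alpha * vocab_size))
            for bg in bigrams(text)
        )
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


# Hypothetical toy data, far too small for a real experiment.
TOY_SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y luego el perro duerme",
}

if __name__ == "__main__":
    models = train(TOY_SAMPLES)
    print(predict(models, "the dog"))
    print(predict(models, "el perro"))
```

Counting bigrams per language and comparing smoothed log-likelihoods keeps the model simple enough to inspect by hand, which makes it a reasonable baseline before trying richer features or classifiers.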
"Comparing Kernel-based Learning Method with Application to Face Recognition"
"Image-based Stress Recognition from a Model-based Face Tracking System"
"Chessman Position Recognition Using Artificial Neural Networks"
"Latent Learning in Agents"
"LLE and ISOMAP Analysis of Robot Images"
"Human Identification using Silhouette Gait Data"
"Reconstruction of Walking People Images by Principal Component Analysis"
"Empirical Analysis of Predictive Algorithms for Recommender Systems"
"Estimating Constraint Costs using Regression Trees"
"Extending Implicit Negotiation to Repeated Grid Games"
"A New Evolutionary Algorithm for Multi-objective Optimization Problems"
"Evolutionary Learning Networks"
"Learning Change Patterns in Software Engineering, Practicable or Not?"
"An Empirical Comparison between ANN and SVM as Classifiers"
"Dynamic Topic Analysis: Classification Without Established Classes using Distance Thresholds"
"Evaluation of Kernel function Modification in Text Classification Using SVM"
"Using TF-IDF to Determine Word Relevance in Document Queries"
"Stock Price Prediction from Natural Language Understanding of News Headlines"
"An Evaluation of Kea: An Automatic Keyphrase Extraction Algorithm"
"ID Identification in Online Communities"
"Document Quality Prediction with Statistical Textual Features"