Project #1: Scale Invariant Feature Transform
Objective
The objective of this project is to get familiar with the scale invariant feature transform (SIFT). You will download David Lowe's code and test its capabilities by designing and running a simple object recognition algorithm. You will compare the performance of the algorithm on different types of images, and discuss your results.
Procedure
- Download the code for SIFT from the SIFT home page. Read the README, look through the MATLAB files, and try out the code on the sample images. You should try out both programs (sift.m and match.m), and understand their inputs and outputs. If you are using Windows, you will find it easiest to run the SIFT code using MATLAB. If you are using Linux, you can also run the code from the command line.
- Download and unpack the image dataset. You will find a set of 5 object images (named [name].pgm, where [name] = {book1,book2,kit,ball,juice} is the object shown), and two sets of 10 cluttered scene images. One set is the training set, and its images are named Img0[i].pgm, where [i]=1...10. The other set is the test set, and its images are named TestImg0[i].pgm, where [i]=1...10. Every image (in both the training and test sets) contains 0-5 of the objects represented in the object images. Each object is contained in exactly five images in each set (training and test), and is not present in the other five. You will also find a file gt.txt, which contains the ground truth for the cluttered images: it shows which of the five objects are present in each image. Look through the images, compare them to the ground truth, and make sure you understand how the two are related.
- Using the SIFT code, compute the number of matches between each object image and each training image. You should compute a 5x10 matrix of integers. Although you can do this by hand (calling match.m 50 times), you may find it easier to write a short MATLAB script to automate the process.
- Design a simple classifier for each object separately (based only on the training data) that tells whether the object is present in an image by thresholding the number of SIFT matches. Evaluate your classifier on each image in the training set.
Note: Designing a classifier means coming up with a method for computing a threshold based only on the training data, that will eventually work well on test data. An example of such a method is to set the threshold to the largest number of matches for an image that did not contain the object. Another is to set the threshold to the smallest number of matches for an image that did contain the object.
- Now compute the number of SIFT key matches between each object image and each test image. Again, you should compute a 5x10 matrix of integers.
- Using your classifiers, classify each test image as either containing or not containing each object.
- Compare your classifications to the ground truth. You should compute the number of misses (number of images that contain the object but were classified as not containing it) and the number of false positives (number of images that do not contain the object but were classified as containing it). Ideally, you want zero in both.
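The thresholding and evaluation steps above can be sketched as follows. This is only an illustration in Python with NumPy, not part of the SIFT distribution: the function names are hypothetical, the match counts and ground-truth flags are made up, and the choice of strict versus inclusive comparison for each threshold rule is an assumption that the procedure leaves open.

```python
import numpy as np

def threshold_from_negatives(counts, present):
    """Largest match count among training images that do NOT contain the object."""
    return int(counts[~present].max())

def threshold_from_positives(counts, present):
    """Smallest match count among training images that DO contain the object."""
    return int(counts[present].min())

def classify(counts, threshold, inclusive):
    """Predict 'object present' where the count exceeds (or reaches) the threshold."""
    return counts >= threshold if inclusive else counts > threshold

def misses_and_false_positives(predicted, present):
    """Count present-but-rejected images and absent-but-accepted images."""
    misses = int(np.sum(present & ~predicted))
    false_positives = int(np.sum(~present & predicted))
    return misses, false_positives

# Made-up match counts for ONE object over 10 training and 10 test images.
# In the project these would be one row of each 5x10 matrix.
train_counts = np.array([12, 3, 15, 2, 9, 4, 20, 1, 11, 5])
train_present = np.array([True, False] * 5)  # made-up ground truth
test_counts = np.array([4, 14, 6, 10, 2, 8, 0, 13, 7, 3])
test_present = np.array([False, True, True, True, False,
                         True, False, True, False, False])

# Rule 1 (assumed strict): present if count > largest negative training count.
t_neg = threshold_from_negatives(train_counts, train_present)
result_neg = misses_and_false_positives(
    classify(test_counts, t_neg, inclusive=False), test_present)

# Rule 2 (assumed inclusive): present if count >= smallest positive training count.
t_pos = threshold_from_positives(train_counts, train_present)
result_pos = misses_and_false_positives(
    classify(test_counts, t_pos, inclusive=True), test_present)

print("rule 1: threshold", t_neg, "(misses, false positives) =", result_neg)
print("rule 2: threshold", t_pos, "(misses, false positives) =", result_pos)
```

On this made-up data the two rules give thresholds of 5 and 9. The first produces no misses and one false positive on the test row; the second produces two misses and no false positives. That is the kind of trade-off you should look for in your own results.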
Write up
Write a paper describing your experiment. Your paper should give all the information necessary to someone who would like to try to repeat your experiment. You should include the following elements:
- Introduction: describe the problem you are solving, and the method you are using to solve it (SIFT). Give appropriate references.
- Data: describe the data set you are using. Show example images.
- Experiments: describe the experiments you performed, and give the results in tables. You should show the two 5x10 matrices of matches, your five thresholds, and also the miss and false positive rates for each of the 5 objects (so another 10 numbers, for a total of 115 numbers).
- Discussion: discuss your results. You should consider the following questions in your discussion:
  - Are SIFT features good for object recognition? In what cases do they fail (show example images)? In what cases do they succeed (again show examples)?
  - Is it possible to set a single threshold to get all training data right for a particular object? If so, does the threshold then classify all test data correctly?
  - Given an object with many SIFT keys, is this simple type of thresholding classifier sufficient to do object recognition? Under what conditions would (or does) your classifier fail? Under what conditions does it do well?
  - How could the shortcomings of this classification method be overcome?
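For the question about a single threshold, one way to check is to test whether every training image that contains the object has more matches than every training image that does not. The sketch below is a hedged illustration in Python with NumPy; the function name and both rows of counts are made up, and it assumes the rule "present if count > threshold".

```python
import numpy as np

def separating_threshold_exists(counts, present):
    """True if all images with the object have more matches than all images without it."""
    return bool(counts[present].min() > counts[~present].max())

# Made-up ground truth: the object is present in every other image.
present = np.array([True, False] * 5)

# Made-up row where positives (min 9) and negatives (max 5) do not overlap.
separable_counts = np.array([12, 3, 15, 2, 9, 4, 20, 1, 11, 5])

# Made-up row where one positive (4) falls below one negative (6).
overlapping_counts = np.array([12, 3, 15, 2, 4, 6, 20, 1, 11, 5])

print(separating_threshold_exists(separable_counts, present))
print(separating_threshold_exists(overlapping_counts, present))
```

When the check succeeds, any threshold from the largest negative count up to, but not including, the smallest positive count classifies the whole training row correctly. When it fails, no single threshold can, and at least one miss or false positive is unavoidable on the training data.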
Your paper should be roughly 5 double-spaced pages in length including all figures, or about 1000-2000 words, 2 or 3 tables, and 2 or 3 images or other figures. Please submit a printed version of the paper in class. Remember to indicate your name clearly on the paper.
Each student is expected to write their own paper. Duplicate papers will receive no marks and will be subject to disciplinary action. Please see the University's plagiarism code.