EE32b: Projects and Datasets
The students will complete an experimental project in the general area
spanned by the syllabus of this class. Ideas for projects:
- Analysis of time series: stock market, neurons, speech ...
- Analysis/synthesis of 2D / nD signals: images, spectrograms ...
- Feedback stabilization of a dynamical system ...
A typical project will take 20-40 hours and will either propose and test
a new idea, or implement and explore an idea/algorithm from the literature.
Teams of two students per project are encouraged.
The students will complete a report on the project. The report is due
on March 10. Each student will write her/his report individually. The report
will be delivered either on paper or as a web page/site. It will typically
consist of the following sections:
- Cover page: Name of author, name of team-mate, project title, project abstract
(150 words or less summarizing the aim of the project, the main idea(s)
and the main findings). Possibly a pretty picture that exemplifies the
aims/results of the project.
- Introduction: What is the aim/goal of the project, why is this aim important
(or why it is not important), what are the possible approaches to achieve
the goal, which approach was chosen and why.
- Technical approach: Detailed description of the technical ideas.
- Datasets used: What do the data look like, how many samples, who collected
them, ...
- Experiments: For each experiment: what was the aim of the experiment, how
was it conducted, what are the main findings. Put plenty of pictures, plots ...
- Discussion: What does one learn from the experiments? Did the proposed
technique work as well as expected?
- Conclusions: What are the main questions that are left open? Other experiments
you would do next?
Ideas for projects
- Digital watermarking (either images or music)
Design and implement an algorithm that adds approximately 100 bytes of
information (typically text) to any image or any piece of music (in
discrete form). Also design and implement an algorithm that, given a
watermarked image or sound, recovers the watermark. The process should
not change the image/sound perceptibly for a human observer.
The ideal watermark is robust with respect to cropping, resampling,
adding small amounts of noise, compression/decompression and all other
typical manipulations that do not alter the image/sound perceptibly.
Variations on the theme: work on other types of data that are typically
exchanged over the internet: text files, streaming images/sound ...
- Analysis and synthesis of birdsong
Analyze zebra finch songs coming from the Konishi laboratory.
Ideas for projects: Devise a method for describing each chirp (e.g.
as an oriented line in time-frequency). Design and implement a method (manual
or automatic; start with manual) for calculating the parameters of each
chirp. Perform unsupervised clustering (e.g. using the k-means algorithm;
see the Matlab command kmeans) to discover how many types of chirps
there are. Collect statistics on the frequency and order of chirps and
study the variability of songs within individual birds and across
different birds. Write code that produces synthetic songs and see if
birds are fooled.
- Model of the cochlea
- Nonlinear dimensionality reduction
- Analysis and synthesis of handwritten signatures
- Reverse-engineering of music. Write a computer program that takes
a simple musical piece and extracts the score (notes, durations, tempo,
stresses).
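As a possible starting point for the digital watermarking idea in the list above, here is a minimal Python sketch of least-significant-bit (LSB) embedding. It is only a baseline and an assumption of this note, not a recommended design: it treats the image or sound as a flat list of 8-bit samples, and the function names `embed` and `extract` are made up for illustration. LSB embedding changes each sample by at most 1, so it is imperceptible, but it does not survive compression, resampling or added noise, which is exactly what a project would need to improve on.

```python
def embed(samples, message):
    """Hide the bytes of `message` in the lowest bit of 8-bit samples."""
    # Most significant bit of each byte first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(samples):
        raise ValueError("cover signal too short for this message")
    marked = list(samples)
    for k, bit in enumerate(bits):
        marked[k] = (marked[k] & ~1) | bit  # clear the low bit, then set it
    return marked


def extract(samples, n_bytes):
    """Recover `n_bytes` bytes written by `embed`."""
    out = bytearray()
    for b in range(n_bytes):
        byte = 0
        for k in range(8):
            byte = (byte << 1) | (samples[8 * b + k] & 1)
        out.append(byte)
    return bytes(out)


# Synthetic "image": 1000 pixel values in 0..255 (made-up data).
cover = [(37 * i) % 256 for i in range(1000)]
message = b"EE32b watermark"
marked = embed(cover, message)
recovered = extract(marked, len(message))
```

A 100-byte message needs 800 samples with this scheme, so even a small image has room for it.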
Each project team should write up a description of their project and post
it to the class newsgroup by Wed, Feb. 21 at noon.
Datasets
HANDWRITING (SIGNATURES)
Collected by Mario E. Munich at Caltech.
Each file name is the concatenation of the string `s' and two 3-digit
numbers. The first number is the identity of the person who signed and
the second number identifies the signature. For example: s022001
is the 2nd signature given by subject 22 (the first signature is s022000).
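The naming convention above can be decoded mechanically. The following Python sketch (the function name `parse_signature_name` is hypothetical) splits a file name into subject and signature index, assuming the name carries no extension, as in the example given.

```python
import re


def parse_signature_name(name):
    """Split a name like 's022001' into (subject, signature_index)."""
    match = re.fullmatch(r"s(\d{3})(\d{3})", name)
    if match is None:
        raise ValueError("not a signature file name: %r" % name)
    # The signature index is zero-based: 000 is the first signature.
    return int(match.group(1)), int(match.group(2))


subject, index = parse_signature_name("s022001")  # → (22, 1)
```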
The format of each file is simple: the first line indicates the number
of samples in the signature; the following lines contain (X,Y) pairs for
each of the samples in order of time acquired. I.e.:
N
X1 Y1
X2 Y2
...
XN YN
The `y' coordinate is inverted, i.e. it increases top-down rather than
bottom-up.
A simple Matlab function for reading and displaying the contents of
a signature file is provided in the same directory (display_signature.m).
These signatures were collected at 60 samples per second.
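For those who prefer to work outside Matlab, the file format described above is easy to read by hand. The sketch below is an illustration in Python, not a port of display_signature.m; the function name `read_signature` and its parameters are assumptions. It takes the file contents as a string, attaches a time stamp to each sample using the 60 samples per second rate, and optionally negates y so that the signature appears upright when plotted with a bottom-up axis.

```python
def read_signature(text, rate_hz=60.0, flip_y=True):
    """Parse 'N' followed by N (X, Y) pairs into a list of (t, x, y)."""
    tokens = text.split()
    n = int(tokens[0])
    values = [float(v) for v in tokens[1:]]
    if len(values) != 2 * n:
        raise ValueError("expected %d samples, found %d values" % (n, len(values)))
    samples = []
    for i in range(n):
        x, y = values[2 * i], values[2 * i + 1]
        if flip_y:
            y = -y  # file y grows top-down; negate for a bottom-up plot
        samples.append((i / rate_hz, x, y))
    return samples


# Made-up three-sample file, for illustration only.
example = "3\n10 5\n11 7\n13 4\n"
signature = read_signature(example)
```

To read an actual file, pass `open(path).read()` as `text`.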
HUMAN VOICE
Dataset collected by various people. Used for the Mus
Silicium competition (you will find a few samples in a local
directory).
The format is `.wav', which Matlab can read (type `help
wavread' in Matlab; in recent Matlab releases `audioread' replaces `wavread').
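The same files can also be read without Matlab. As a sketch, the Python standard library module `wave` handles uncompressed PCM `.wav' files. The helper below (its name `read_wav_mono16` is hypothetical) assumes 16-bit samples, which has not been checked against the actual dataset, and keeps only the first channel.

```python
import struct
import wave


def read_wav_mono16(path):
    """Return (sample_rate, samples) with samples scaled to [-1, 1)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        n_channels = wav.getnchannels()
        raw = wav.readframes(wav.getnframes())
    # Samples are little-endian signed 16-bit, interleaved by channel.
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    samples = [v / 32768.0 for v in values[::n_channels]]
    return rate, samples
```

The duration of a recording in seconds is then `len(samples) / rate`.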
MATLAB SOUNDS
Run the command `which sound' in Matlab. This will give you the path
of the file `sound.m'. In the same directory you will find six files that
end in `.mat' (chirp.mat, train.mat, ...). Those files contain sounds.
You may load them into Matlab; type `help sound' to find out more.
STOCK MARKET
Many sources, for example the NASDAQ web page. Pick a stock, obtain its
time chart and click on it (example).
BIRDSONGS
Songs from zebra finches collected by Anthony Leonardo in the Konishi Lab
at Caltech. Load the file `Songdata-20k.mat' into Matlab. Type `who' to list
six vectors whose names start with `i'. The vectors contain 3 songs each for
two birds. They are sampled at 20 kHz.
Also check out the Cornell Birdsong web page. It includes a tutorial on
spectrum analysis and birdsong analysis software. You will find lots of
interesting data, including songs from whales.
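For the chirp clustering step suggested in the birdsong project idea, Matlab's kmeans is the obvious tool. To show what the algorithm does, here is a plain Python sketch of k-means (Lloyd's iteration). The feature values are invented for illustration: each chirp is assumed to be summarized by two numbers, a sweep slope and a center frequency, which is one possible reading of "an oriented line in time-frequency".

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=50, seed=0):
    """Cluster tuples of floats; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct data points
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(c) / len(members) for c in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


# Hypothetical chirp features: (slope in kHz/s, center frequency in kHz).
chirps = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (10.0, 10.0), (10.1, 10.0), (10.0, 10.1)]
centers, labels = kmeans(chirps, k=2)
```

To estimate how many chirp types there are, one option is to run this for several values of k and compare the total within-cluster squared distance.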
ROCK MUSIC
Search the web and download your favorite piece. Take only a 3-10s
segment.
If you hate rock then take classical. Any music will do.
You may want to compare the Fourier spectrum of different instruments,
e.g. compare a percussion instrument with a wind instrument.
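As a sketch of the spectrum comparison suggested above, the Python code below computes the magnitude of the discrete Fourier transform directly from its definition and reports the strongest frequency. The function names are made up, and the direct sum costs O(n^2) operations, so it is only suitable for short excerpts. For a full 3-10 s segment a fast Fourier transform (e.g. Matlab's fft) is the practical choice.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitudes of DFT bins 0 .. n/2 of a real signal x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def dominant_frequency(x, rate_hz):
    """Frequency in Hz of the strongest non-DC bin."""
    mags = dft_magnitude(x)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return k * rate_hz / len(x)


# Synthetic test tone (not real music): 440 Hz sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(400)]
peak = dominant_frequency(tone, rate)
```

With 400 samples at 8 kHz the bins are 20 Hz apart, so a longer excerpt is needed to separate nearby notes.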