I have another couple of posts half-written that are more reflective about the structure of this data, and about what it means to use different batch sizes, input lengths, and sequence lengths.
I was thinking a lot about this kind of data representation stuff, and what it would mean for the kinds of models I should use and how I should go about training… This is the kind of thing that is interesting to me, and I really have trouble seeing how the model is learning any structure from the kinds of input/target data we’re giving it … batch splits that generate just a couple hundred or thousand training examples, on sections of audio that don’t share much patterning as far as I can see…
But I tend to get lost in details easily, and it seems like great audio is being generated by a lot of people in the class, so I decided I should be doing less reading and thinking, and more random-decision-making and bad-audio-generation!
In fact my decisions have been heavily conditioned on some of the great work of other students, particularly Chris, Melvin, & Ryan. I’ve also found Andrej Karpathy’s posts and code very helpful. My LSTM and knowledge of Blocks/Fuel is based mostly on Mohammed’s very clear code.
In point form, this is what I’ve done so far:
- Data exploration: Used scipy.io.wavfile to read the wav file and matplotlib to do some visualizations
- Data preprocessing: Used numpy and raw python/Theano to split the data into train, test, and validation sets at an 80:10:10 split (about 140mil:17mil:17mil frames), and subtracted the mean and normalized to [-1,1] *
- Example creation: Cut up the data into examples as follows (I was going to use Bart’s transformer, but it doesn’t let you create overlapping examples.)**
  - Examples with:
    - window_shift of 1000 frames (i.e. about 140 000 training examples and 17 000 each for test and validation)
    - x_length of 8000 frames (i.e. half a second of data per timestep)
    - seq_length of 25 (i.e. a sequence of 25 steps, 200 000 frames, about 12 seconds)
  - Mini-batches of 100 examples
  - Truncated BPTT of 100 timesteps
- Set up an LSTM (with tanh activations) using Blocks
- Attempted to train using squared-error loss
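As a rough illustration of the preprocessing step, here is a minimal numpy sketch. It assumes a single 1-D int16 signal (the toy data below is made up, standing in for the scipy.io.wavfile output) and computes the normalization statistics on the training portion only, which is just one of the possible orderings the first footnote below asks about:

```python
import numpy as np

# Toy stand-in for the raw int16 audio; in the real pipeline this
# would come from scipy.io.wavfile.read.
rng = np.random.RandomState(0)
audio = rng.randint(-2**15, 2**15, size=100000).astype(np.int16)

# 80:10:10 split by position in the recording
n = len(audio)
train = audio[: int(0.8 * n)].astype(np.float64)
valid = audio[int(0.8 * n): int(0.9 * n)].astype(np.float64)
test = audio[int(0.9 * n):].astype(np.float64)

# Subtract the mean, then scale into [-1, 1]. Using train-set
# statistics for all three splits is one defensible choice; whether
# to normalize before or after splitting is exactly the open
# question in the footnote.
mean = train.mean()
scale = np.abs(train - mean).max()
train, valid, test = [(x - mean) / scale for x in (train, valid, test)]
```

Note that with train-set statistics the validation and test portions can fall slightly outside [-1, 1]; normalizing before the split avoids that at the cost of letting test data influence the statistics.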
*I asked a question about this on the course website and made a blog post about it – how are people doing their mean subtraction and normalization? Before, or after the test/train/validation split? Is there a better way? Does it matter?
**I started making my own version of a Fuel transformer, but realized I don’t know anything about how it iterates, and that it may not be trivial to start the next example at an index other than the end of the current one. I made my own data slicer instead, similar to Chris’s, and fed the examples to Fuel afterwards.
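For what it’s worth, an overlapping slicer of the kind described above takes only a few lines of numpy. This is a simplified sketch rather than the exact slicer I used; the function name make_examples is invented, but its arguments mirror the window_shift, x_length, and seq_length parameters from the list above:

```python
import numpy as np

def make_examples(signal, window_shift, x_length, seq_length):
    """Cut overlapping training examples out of a 1-D signal.

    Each example is a sequence of seq_length steps of x_length frames;
    consecutive examples start window_shift frames apart, so they can
    overlap heavily (which the stock Fuel windowing transformer does
    not allow).
    """
    example_frames = x_length * seq_length
    starts = range(0, len(signal) - example_frames + 1, window_shift)
    # Stack one (seq_length, x_length) array per start position.
    examples = np.stack([
        signal[s:s + example_frames].reshape(seq_length, x_length)
        for s in starts
    ])
    return examples  # shape: (n_examples, seq_length, x_length)

# Tiny demonstration with made-up sizes (the real run uses
# window_shift=1000, x_length=8000, seq_length=25 on ~175M frames).
sig = np.arange(100.0)
ex = make_examples(sig, window_shift=10, x_length=5, seq_length=4)
```

Because window_shift is much smaller than seq_length * x_length, the number of examples is roughly len(signal) / window_shift, which is how 140 million training frames become about 140 000 training examples.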