Making noise: first attempt

I have another couple posts half-written that are more reflective about the structure of this data, and what it means to take differently sized batches and input lengths and sequences.

I was thinking a lot about this kind of data representation stuff, and what it would mean for the kinds of models I should use and how I should go about training… This is the kind of thing that is interesting to me, and I really have trouble seeing how the model is learning any structure from the kinds of input/target data we’re giving it … batch splits that generate just a couple hundred or thousand training examples, on sections of audio that don’t share much patterning as far as I can see…

But I tend to get lost in details easily, and it seems like great audio is being generated by a lot of people in the class, so I decided I should be doing less reading and thinking, and more random-decision-making and bad-audio-generation!

In fact my decisions have been heavily conditioned on some of the great work of other students, particularly Chris, Melvin, & Ryan. I’ve also found Andrej Karpathy’s posts and code very helpful. My LSTM and knowledge of Blocks/Fuel is based mostly on Mohammed’s very clear code.

In point form, this is what I’ve done so far:

  1. Data exploration: read the wave file and used matplotlib to do some visualizations
  2. Data preprocessing: Used numpy and raw Python/Theano to split the data into train, test, and validation sets at an 80:10:10 ratio (about 140M:17M:17M frames), then subtracted the mean and normalized to [-1, 1] *
  3. Example creation: Cut the data up into examples as follows (I was going to use Bart’s transformer, but it doesn’t let you create overlapping examples.)**
    1. Examples with
      • window_shift of 1000 frames (i.e. roughly 140,000 training examples and 17,000 each for test and validation)
      • x_length of 8000 frames (i.e. half a second of data per timestep)
      • seq_length of 25 (i.e. a sequence of 25 steps, or 200,000 frames, about 12 seconds)
    2. Mini-batches of 100 examples
    3. Truncated BPTT of 100 timesteps
  4. Set up an LSTM (with tanh activations) using Blocks
  5. Attempted to train using squared-error loss
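The example-creation step above can be sketched in plain numpy. This is only my illustration of the scheme described (overlapping windows shifted by window_shift, each example a sequence of seq_length windows of x_length frames, with the target at each step being the next window); the function name and the exact input/target pairing are my own assumptions, not the code actually used:

```python
import numpy as np

def make_examples(signal, window_shift=1000, x_length=8000, seq_length=25):
    """Cut a 1-D audio signal into overlapping sequence examples.

    Each example is `seq_length` steps; each step is a window of
    `x_length` frames. Consecutive examples start `window_shift`
    frames apart, so they overlap heavily.
    """
    span = x_length * seq_length  # frames covered by one example's inputs
    # need one extra window at the end for the final step's target
    starts = range(0, len(signal) - span - x_length + 1, window_shift)
    X, Y = [], []
    for s in starts:
        chunk = signal[s : s + span + x_length]
        inputs = chunk[:span].reshape(seq_length, x_length)
        # target at each step: the window one step ahead
        targets = chunk[x_length : span + x_length].reshape(seq_length, x_length)
        X.append(inputs)
        Y.append(targets)
    return np.array(X), np.array(Y)

# small stand-in for the real ~140M-frame training signal
signal = np.arange(100_000, dtype=np.float32)
X, Y = make_examples(signal, window_shift=1000, x_length=8000, seq_length=10)
print(X.shape)  # (13, 10, 8000)
```

With the real training split and the parameters listed above, the same arithmetic gives on the order of 140,000 examples, which can then be grouped into mini-batches of 100.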

*I asked a question about this on the course website and made a blog post about it: how are people doing their mean subtraction and normalization? Before or after the train/test/validation split? Is there a better way? Does it matter?
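For what it’s worth, the standard answer to the question above is to compute the statistics on the training split only and apply them to all three splits, so no information leaks from test or validation into training. A minimal sketch (the peak-based scaling to [-1, 1] here is my own assumption about how the normalization was done):

```python
import numpy as np

def normalize_splits(train, valid, test):
    """Center and scale all splits using statistics from `train` only."""
    mean = train.mean()
    centered = [x - mean for x in (train, valid, test)]
    # scale by the training peak so train lands exactly in [-1, 1];
    # valid/test may poke out slightly, which is fine
    peak = np.abs(centered[0]).max()
    return [x / peak for x in centered]

rng = np.random.default_rng(0)
data = rng.normal(size=1000).astype(np.float32)  # stand-in for real audio
train, valid, test = data[:800], data[800:900], data[900:]
train_n, valid_n, test_n = normalize_splits(train, valid, test)
print(train_n.min(), train_n.max())  # within [-1, 1] by construction
```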

**I started making my own version of a Fuel transformer, but realized I don’t know enough about how it iterates, and that it may not be trivial to start the next example at an index other than the end of the current one. Instead I made my own data slicer, similar to Chris’s, and fed the examples to Fuel afterwards.


