That title is very accurate – I’ve now succeeded in making pretty pure noise, although it does seem to have a little bit of pulsation in it.
[UPDATE: Alex pointed out that calling mysequence.astype('int16') does not modify the array in place (it returns a converted copy), so my audio wasn’t really encoded properly. Proper version below]
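For anyone hitting the same issue, here is a minimal sketch of the pitfall. NumPy's astype returns a new array, so the result has to be assigned to something. The helper name to_int16 and the clipping step are assumptions added for illustration, not part of the original script, and the samples are assumed to be already scaled to the int16 range.

```python
import numpy as np

def to_int16(samples):
    """Return a 16-bit PCM copy of `samples`.

    Assumes the values are already scaled to roughly [-32768, 32767];
    anything outside that range is clipped rather than wrapped around.
    """
    clipped = np.clip(samples, -32768, 32767)
    # astype returns a NEW array; `samples` itself is left untouched.
    return clipped.astype("int16")

# Hypothetical stand-in for a generated waveform.
mysequence = np.array([0.0, 1000.7, -2000.2, 40000.0])

mysequence.astype("int16")           # result discarded: mysequence is still float64
mysequence = to_int16(mysequence)    # rebinding the name keeps the converted copy
```

The key line is the last one: the converted array only survives if the return value is kept.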
I started using bokeh to do live-plotting of learning curves and thought I was saving them as PNGs, but it turns out I was not, so I only have the first couple of batches of the latest experiment I ran to put here for now:
[UPDATE: Here’s the training loss for the model described below]
But the good news is I’m now truly set up with Blocks, Fuel, and bokeh to run and log multiple experiments.
I’m currently using a 2-layer LSTM, with the following hyperparameters:
- frame length: 4000 samples
- sequence length: 50 frames
- batch size: 128
- hidden layer size: 200
- learning rate: 0.002
- epochs: 30
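To make the first two numbers concrete: one training sequence covers 50 × 4000 = 200,000 samples. Below is a rough sketch of how a 1-D waveform could be cut into sequences of frames with those sizes. The function name make_sequences and the non-overlapping layout are assumptions for illustration; the actual Fuel pipeline may slice the data differently.

```python
import numpy as np

FRAME_LENGTH = 4000   # samples per frame (from the list above)
SEQ_LENGTH = 50       # frames per sequence (from the list above)

def make_sequences(waveform, frame_length=FRAME_LENGTH, seq_length=SEQ_LENGTH):
    """Cut a 1-D waveform into non-overlapping sequences of frames.

    Returns an array of shape (n_sequences, seq_length, frame_length).
    Trailing samples that do not fill a whole sequence are dropped.
    """
    waveform = np.asarray(waveform)
    samples_per_seq = frame_length * seq_length
    n_sequences = len(waveform) // samples_per_seq
    usable = waveform[: n_sequences * samples_per_seq]
    return usable.reshape(n_sequences, seq_length, frame_length)

# Hypothetical input: 450,000 samples gives 2 full sequences; 50,000 samples are dropped.
sequences = make_sequences(np.arange(450000))
```

A batch of 128 would then be 128 such sequences stacked along a new leading axis.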
I’m also using gradient clipping. Most of these decisions are fairly arbitrary at the moment – I’ve just been working on getting plots and audio made so that now I can look at the details of my experiments.
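For reference, one common form of gradient clipping rescales all gradients together whenever their combined norm exceeds a threshold. The sketch below shows that variant in plain NumPy. The function name and the threshold value are hypothetical, and this is not necessarily the exact variant or setting used in these experiments.

```python
import numpy as np

def clip_by_global_norm(grads, threshold):
    """Rescale a list of gradient arrays so their joint L2 norm is at most `threshold`.

    Gradients whose joint norm is already within the threshold are returned unchanged
    (as copies); otherwise every array is multiplied by the same scale factor, which
    preserves the direction of the overall update.
    """
    total_norm = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total_norm <= threshold:
        return [g.copy() for g in grads]
    scale = threshold / total_norm
    return [g * scale for g in grads]

# Hypothetical gradients with joint norm 5, clipped down to norm 1.
grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]
clipped = clip_by_global_norm(grads, threshold=1.0)
```

Clipping like this is mainly a guard against the occasional exploding gradient that recurrent networks are prone to.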
The model trains and everything, but the generated audio has a lot of noise in it and does not sound very much like the vocal data it was trained on – I’m going to try a better overlapping scheme to smooth out the transitions.
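One possible way to do that kind of overlapping is a linear crossfade between consecutive frames. This is a sketch under the assumption that neighbouring generated frames cover overlapping stretches of time; the function name crossfade_concat and the overlap size are hypothetical, and it is not the method currently used here.

```python
import numpy as np

def crossfade_concat(frames, overlap):
    """Join equal-length 1-D frames, linearly crossfading `overlap` samples at each seam.

    `overlap` must not exceed the frame length. With overlap == 0 this is
    plain concatenation.
    """
    frames = [np.asarray(f, dtype="float64") for f in frames]
    if overlap == 0:
        return np.concatenate(frames)
    fade_in = np.linspace(0.0, 1.0, overlap)
    fade_out = 1.0 - fade_in   # the two weights sum to 1 at every sample
    out = frames[0].copy()
    for frame in frames[1:]:
        # Blend the tail of the audio so far with the head of the next frame.
        out[-overlap:] = out[-overlap:] * fade_out + frame[:overlap] * fade_in
        out = np.concatenate([out, frame[overlap:]])
    return out

# Hypothetical frames: silence followed by a constant signal, blended over 4 samples.
audio = crossfade_concat([np.zeros(10), np.ones(10)], overlap=4)
```

Because the fade weights always sum to one, a signal that is identical in both frames passes through the seam unchanged, while a hard jump is turned into a ramp.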