First assignment: 1-hidden-layer network on MNIST

I was hoping there would be an easy way to embed Jupyter notebooks in WordPress, but I haven’t found one.

So instead, I’ll try to duplicate some visual parts of the notebooks here, and mostly talk kind of meta about the class, the tools I’m using, and problems I’ve run into. The actual work (notebooks with diagrams and derivations, code, results, etc.) will all be in the GitHub repo.

This is the first section of my notebook for this assignment. I used a web-based tool for the diagram – it’s great for drawings and diagrams. I’ve found Michael Nielsen’s online deep learning textbook really helpful.


I find diagrams really helpful to keep the dimensions of everything straight. This is a typical one-hidden-layer network:


And this is kind of a functional diagram of the network described in the assignment. Chris Beckham started doing something like this and I found it really helpful for making the connection between the loss derivatives done “on paper”, and the actual matrices and functions we need to code.



Input layer:

x is the n by p input vector

Hidden layer:

W is the m by n weight matrix
b is the m by p bias vector
h’ = Wx + b is the m by p vector of preactivations
h = f(h’) is the m by p vector of activations
f is the activation function (often σ, the sigmoid, or tanh)

Output layer:

V is the q by m weight matrix
c is the q by p bias vector
y’ = Vh + c is the q by p vector of preactivations
y = s(y’) is the q by p vector of activations
s is the activation function (often softmax)


n is the number of input units
p is usually 1, i.e. the input is a vector not a matrix
m is the number of hidden units
q is the number of output units
Bold denotes vectors/matrices
h’ and y’, the preactivations, are often denoted z
h and y, the activations, are often denoted a or o
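As a sanity check on the dimensions above, here’s a minimal numpy sketch of the forward pass. The sizes are assumptions I picked to match the notation (n = 784 inputs as in MNIST, m = 100 hidden units, q = 10 outputs, p = 1):

```python
import numpy as np

n, m, q, p = 784, 100, 10, 1  # assumed sizes: MNIST-style input, 100 hidden units, 10 classes

rng = np.random.RandomState(0)
x = rng.randn(n, p)         # input, n by p
W = rng.randn(m, n) * 0.01  # hidden-layer weights, m by n
b = np.zeros((m, p))        # hidden-layer bias, m by p
V = rng.randn(q, m) * 0.01  # output-layer weights, q by m
c = np.zeros((q, p))        # output-layer bias, q by p

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=0, keepdims=True)

h_pre = W @ x + b   # h' = Wx + b, shape (m, p)
h = sigmoid(h_pre)  # h = f(h')
y_pre = V @ h + c   # y' = Vh + c, shape (q, p)
y = softmax(y_pre)  # y = s(y'), each column sums to 1

print(h.shape, y.shape)  # (100, 1) (10, 1)
```

If the matrix shapes in the list above are right, the products line up and the output column sums to 1, which is a quick way to catch a transposed weight matrix before writing any backprop code.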

Getting set up for research

I’m creating this blog for Yoshua Bengio’s deep learning class, IFT6266, offered at Université de Montréal in the winter semester of 2016.

We don’t have coding assignments yet, but I thought I’d share some of the tools and things that I use and/or have heard are useful, in preparation for actually using them later in the semester. Any comments, complementary software or workflows, etc. all welcome!


Python is the main language I’ve used for any project whose description includes the words “data”, “parse”, “scrape”, or “quick”. If you’re just getting started, the Anaconda distribution bundles a bunch of useful libraries (including numpy, scipy, scikit-learn, etc.). I’ve mostly coded in IPython – it gives you an enhanced shell environment that’s pretty useful in a lot of ways. Since my last big Python project, IPython has been merging into Jupyter, which is kind of like an even GUI-er layer over Python/IPython that lets you mix research notes and code (including equations and stuff). I’ll be trying to use that this semester.

Another reason to use python is Theano, a library developed at UdeM for deep learning. It does automatic differentiation, which is really cool. I’ve used Theano pretty out-of-the-box, and the tutorials are great, but they’re like tiny boats in a very large ocean … I’m going to try to understand a lot more about how Theano works this semester and hopefully use it in a more sophisticated way.


My background is in biology/ecology, so R is one of the first coding environments I ever encountered. The documentation for different packages ranges from cryptically sparse to overwhelmingly comprehensive, but most of the popular ones are fortunately somewhere in between. RStudio is a good IDE (I linked to the website, but I’ve only ever installed it through R’s package manager). R can be great for running stats and generating plots – it’s what I’m used to – but this semester I’ll be trying to figure out how to do some of the things I usually do in R in Python instead.


I was introduced to Weka in a data mining course, and because it’s Java-based I think it’s used pretty extensively in industry. It’s really great for visualizing and exploring a dataset, and for running preliminary analyses before coding something more custom. Again, I think I’ll try to replicate in Python some of the things I’ve done in Weka before – but for cross-validation, and for seeing 16 plots at once … we’ll see how it goes 🙂


Most of my thesis research was done in Matlab. I installed Octave at first (the open-source alternative) but had trouble getting a couple packages to work. It looks like the newest version has a GUI, so maybe I’ll try using that the next time I need to do something in Matlab. I’ve used the very well-documented matconvnet package for convolutional neural networks – it made it really easy to use CUDA libraries for GPU processing.