I was hoping there would be an easy way to embed Jupyter notebooks in wordpress, but I haven’t found it.
So instead, I’ll try to duplicate some of the visual parts of the notebooks here, and mostly talk kind of meta about the class, the tools I’m using, and the problems I’ve run into, while the actual work (notebooks with diagrams and derivations, code, results, etc.) will all be in the github repo.
This is the first section of my notebook for this assignment. I used draw.io for the diagram; it’s a great web-based tool for drawings and diagrams. I’ve also found Michael Nielsen’s online deep learning textbook really helpful.
I find diagrams really helpful to keep the dimensions of everything straight. This is a typical one-hidden-layer network:
And this is kind of a functional diagram of the network described in the assignment. Chris Beckham started doing something like this and I found it really helpful for making the connection between the loss derivatives done “on paper”, and the actual matrices and functions we need to code.
x is the n by p input vector
W is the m by n weight matrix
b is the m by p bias vector
h’ = Wx+b is the m by p vector of preactivations
h = f(h’) is the m by p vector of activations
f is the activation function (often σ, the sigmoid, or tanh)
V is the q by m weight matrix
c is the q by p bias vector
y’ = Vh+c is the q by p vector of preactivations
y = s(y’) is the q by p vector of activations
s is the output activation function (often softmax)
n is the number of input units
p is usually 1, i.e. the input is a vector not a matrix
m is the number of hidden units
q is the number of output units
Bold denotes vectors/matrices
h’ and y’, the preactivations, are often denoted z
h and y, the activations, are often denoted a or o
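As a quick sanity check on the dimensions above, here’s a minimal NumPy sketch of the forward pass (sigmoid hidden layer, softmax output). The sizes n, m, q and the random weights are just illustrative, not from the assignment:

```python
import numpy as np

n, m, q, p = 4, 3, 2, 1  # input units, hidden units, output units, p = 1

rng = np.random.default_rng(0)
x = rng.standard_normal((n, p))   # n by p input
W = rng.standard_normal((m, n))   # m by n weight matrix
b = np.zeros((m, p))              # m by p bias
V = rng.standard_normal((q, m))   # q by m weight matrix
c = np.zeros((q, p))              # q by p bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=0, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=0, keepdims=True)

h_pre = W @ x + b       # h' = Wx + b, shape (m, p)
h = sigmoid(h_pre)      # h  = f(h'), shape (m, p)
y_pre = V @ h + c       # y' = Vh + c, shape (q, p)
y = softmax(y_pre)      # y  = s(y'), shape (q, p)

print(y.shape)          # (q, p)
print(y.sum(axis=0))    # each softmax column sums to 1
```

If any of the shapes in the comments don’t line up, NumPy raises an error at the corresponding matrix multiply, which is exactly why I like keeping the dimensions straight in a diagram first.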