Guide to CMU Deep Learning
These GitBook notes are maintained by zealscott.
Course material
Notes
- L02 What can a network represent
- As a universal Boolean function / classifier / approximator
- Depth and width of networks
- L03 Learning the network
- Empirical Risk
- Optimization problem statement
- L03.5 A brief note on derivatives
- Multiple variables
- Minimization
- L04 Backpropagation
- Chain rule / Subgradient
- Backpropagation / Vector formulation
- L05 Convergence
- Backpropagation prefers consistency over perfection (which is good)
- Problems with second-order methods / choosing the learning rate
- L06 Optimization
- Rprop / Quickprop
- Momentum / Nesterov's Accelerated Gradient
- Batch / Stochastic / Mini-batch gradient descent
- L07 Optimizers and regularizers
- Second moments: RMS Prop / Adam
- Batch normalization
- Regularizer / dropout
- L08 Motivation of CNN
- The need for shift invariance
- Scanning networks / Why distribute the scan / Receptive Field / Stride / Pooling
- L09 Cascade Correlation
- Why Is Backprop So Slow?
- The advantages of cascade correlation
- L10 CNN architecture
- Architecture / size of parameters / convolution layer / maxpooling
- L11 Using CNNs to understand the neural basis of vision (guest lecture)
- L12 Backpropagation in CNNs
- Computing the derivatives w.r.t. the activation maps and the filters
- Regular convolution running on shifted derivative maps using the flipped filter (a 1-D numerical check of this appears after the notes list)
- Derivative of Max pooling / Mean pooling
- Transposed Convolution / Depth-wise convolution
- LeNet-5 / AlexNet / VGGNet / GoogLeNet / ResNet / DenseNet
- L13 Recurrent Networks
- Model / Architecture
- Back Propagation Through Time
- Bidirectional RNN
- L14 Stability analysis and LSTMs
- Stability: memory behaviour / saturation / different activations
- Vanishing gradient
- LSTM: architecture / forward / backward
- Gated Recurrent Units (GRU)
- L15 Sequence prediction
- One to one / Many to many / Many to one / Seq2seq divergence
- Language modelling: Representing words
- L16 Connectionist Temporal Classification
- Sequence to sequence model / time synchronous / order synchronous
- Iteratively estimating the output table: Viterbi algorithm / expected divergence
- Repetitive decoding problem / Beam search
- L17 Seq2seq & Attention
- Autoencoder / attention weight / beam search
- L18 Representation
- Autoencoder / non-linear manifold
- L19 Hopfield network
- Loopy network / energy / content-addressable memory
- Store a specific pattern / orthogonal patterns
- L20 Training Hopfield networks
- Training Hopfield nets: Geometric approach / Optimization
- Boltzmann Distribution
- L21 Boltzmann machines
- Stochastic system: Boltzmann machines
- Training and sampling for this model, as well as Restricted Boltzmann Machines
- L22 Variational Autoencoders 1
- Generative models: PCA, Gaussian mixture, Factor analysis, Autoencoder
- EM algorithm for generative model
- L23 Variational Autoencoders 2
- Non-linear Gaussian Model
- VAEs
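The L12 item above, "regular convolution running on shifted derivative maps using the flipped filter", can be checked numerically in one dimension. The sketch below is not course code; the helper `xcorr_valid`, the array sizes, and the linear stand-in loss are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # toy input map
w = rng.standard_normal(3)            # toy filter of length K
K = len(w)

def xcorr_valid(a, f):
    """CNN-style 'convolution' (really a cross-correlation), valid mode."""
    return np.array([np.dot(f, a[i:i + len(f)]) for i in range(len(a) - len(f) + 1)])

y = xcorr_valid(x, w)                 # forward pass
dy = rng.standard_normal(len(y))      # stand-in for the upstream gradient dL/dy

# Backward pass for dL/dx: the same scan, run over the zero-padded
# derivative map, with the filter flipped.
dx = xcorr_valid(np.pad(dy, K - 1), w[::-1])

# Numerical check, using the linear loss L = dy . y so that dL/dy is exactly dy.
eps = 1e-6
dx_num = np.zeros_like(x)
for j in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    dx_num[j] = (dy @ xcorr_valid(xp, w) - dy @ xcorr_valid(xm, w)) / (2 * eps)

assert np.allclose(dx, dx_num, atol=1e-4)
print("flipped-filter backprop matches the numerical gradient")
```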
References
- Optimizer comparison
- BGD / SGD / MBGD / Momentum / NAG / Adagrad / Adadelta / RMSprop / Adam (see the update-rule sketch after this list)
- Activation Functions
- Sigmoid / ReLU / Leaky ReLU
- Vanishing gradient problem
- A good illustration of NN
- Difference and connection between KL divergence and cross-entropy (identity recalled below)
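For the optimizer-comparison reference, a minimal NumPy sketch of three of the update rules it covers (plain SGD, SGD with momentum, and Adam) is given below. The hyper-parameter values and the toy quadratic are illustrative assumptions, not taken from the reference.

```python
import numpy as np

def sgd(w, g, lr=0.1):
    """Plain gradient descent step."""
    return w - lr * g

def sgd_momentum(w, g, v, lr=0.1, beta=0.9):
    """Gradient descent with a velocity (momentum) term."""
    v = beta * v + g
    return w - lr * v, v

def adam(w, g, m, s, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: first and second moment estimates with bias correction."""
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    s = beta2 * s + (1 - beta2) * g ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Toy usage: minimise f(w) = w**2 (gradient 2w) starting from w = 5.
w, v = 5.0, 0.0
for t in range(1, 201):
    w, v = sgd_momentum(w, 2 * w, v)
print("momentum:", w)                         # converges close to 0

w, m, s = 5.0, 0.0, 0.0
for t in range(1, 501):
    w, m, s = adam(w, 2 * w, m, s, t)
print("adam:", w)                             # also driven towards the minimum at 0
```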
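The last reference hinges on one standard identity relating cross-entropy and KL divergence. For a true distribution $p$ and a model distribution $q$:

$$
H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
$$

Since $H(p)$ does not depend on $q$, minimising the cross-entropy over $q$ is equivalent to minimising the KL divergence.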