Guide to CMU Deep Learning
These GitBook notes are maintained by zealscott.
Course material
Notes
- L02 What can a network represent
- As a universal Boolean function / classifier / approximator
- Depth and width of networks
- L03 Learning the network
- Empirical Risk
- Optimization problem statement
- L03.5 A brief note on derivatives
- Multiple variables
- Minimization
- L04 Backpropagation
- Chain rule / Subgradient
- Backpropagation / Vector formulation
- L05 Convergence
- Backpropagation prefers consistency over perfection (which is good)
- Problems with second-order methods / choosing the learning rate
- L06 Optimization
- Rprop / Quickprop
- Momentum / Nesterov's Accelerated Gradient
- Batch / Stochastic / Mini-batch gradient descent
- L07 Optimizers and regularizers
- Second moments: RMS Prop / Adam
- Batch normalization
- Regularizer / dropout
- L08 Motivation of CNN
- The need for shift invariance
- Scanning networks / Why distribute the scan / Receptive Field / Stride / Pooling
- L09 Cascade Correlation
- Why Is Backprop So Slow?
- The advantages of cascade correlation
- L10 CNN architecture
- Architecture / size of parameters / convolution layer / maxpooling
- L11 Using CNNs to understand the neural basis of vision (guest lecture)
- L12 Backpropagation in CNNs
- Computing the derivatives w.r.t. the activation maps and the filters
- Regular convolution running on shifted derivative maps using the flipped filter (a 1-D numerical check of this appears after the notes list)
- Derivative of Max pooling / Mean pooling
- Transposed Convolution / Depth-wise convolution
- LeNet-5 / AlexNet / VGGNet / GoogLeNet / ResNet / DenseNet
- L13 Recurrent Networks
- Model / Architecture
- Back Propagation Through Time
- Bidirectional RNN
- L14 Stability analysis and LSTMs
- Stability: memory behaviour / saturation / different activations
- Vanishing gradient
- LSTM: architecture / forward / backward
- Gated Recurrent Units (GRU)
- L15 Sequence prediction
- One to one / Many to many / Many to one / Seq2seq divergence
- Language modelling: Representing words
- L16 Connectionist Temporal Classification
- Sequence to sequence model / time synchronous / order synchronous
- Iteratively estimating the output table: Viterbi algorithm / expected divergence
- Repetitive decoding problem / Beam search
- L17 Seq2seq & Attention
- Autoencoder / attention weight / beam search
- L18 Representation
- Autoencoder / non-linear manifold
- L19 Hopfield network
- Loopy network / energy / content-addressable memory
- Store a specific pattern / orthogonal patterns
- L20 Training Hopfield networks
- Training Hopfield nets: Geometric approach / Optimization
- Boltzmann Distribution
- L21 Boltzmann machines
- Stochastic system: Boltzmann machines
- Training and sampling for this model, as well as Restricted Boltzmann Machines
- L22 Variational Autoencoders 1
- Generative models: PCA, Gaussian mixture, Factor analysis, Autoencoder
- EM algorithm for generative model
- L23 Variational Autoencoders 2
- Non-linear Gaussian Model
- VAEs
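The L12 item above, "regular convolution running on shifted derivative maps using the flipped filter", can be checked numerically in one dimension. The sketch below is not course code; the helper `xcorr_valid`, the array sizes, and the linear stand-in loss are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # toy input map
w = rng.standard_normal(3)            # toy filter of length K
K = len(w)

def xcorr_valid(a, f):
    """CNN-style 'convolution' (really a cross-correlation), valid mode."""
    return np.array([np.dot(f, a[i:i + len(f)]) for i in range(len(a) - len(f) + 1)])

y = xcorr_valid(x, w)                 # forward pass
dy = rng.standard_normal(len(y))      # stand-in for the upstream gradient dL/dy

# Backward pass for dL/dx: the same scan, run over the zero-padded
# derivative map, with the filter flipped.
dx = xcorr_valid(np.pad(dy, K - 1), w[::-1])

# Numerical check, using the linear loss L = dy . y so that dL/dy is exactly dy.
eps = 1e-6
dx_num = np.zeros_like(x)
for j in range(len(x)):
    xp, xm = x.copy(), x.copy()
    xp[j] += eps
    xm[j] -= eps
    dx_num[j] = (dy @ xcorr_valid(xp, w) - dy @ xcorr_valid(xm, w)) / (2 * eps)

assert np.allclose(dx, dx_num, atol=1e-4)
print("flipped-filter backprop matches the numerical gradient")
```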
References
- Optimizer comparison
- BGD / SGD / MBGD / Momentum / NAG / Adagrad / Adadelta / RMSprop / Adam (see the update-rule sketch after this list)
- Activation Functions
- Sigmoid / ReLU / Leaky ReLU
- Vanishing gradient problem
- A good illustration of NN
- Difference and connection between KL divergence and cross-entropy (identity recalled below)
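For the optimizer-comparison reference, a minimal NumPy sketch of three of the update rules it covers (plain SGD, SGD with momentum, and Adam) is given below. The hyper-parameter values and the toy quadratic are illustrative assumptions, not taken from the reference.

```python
import numpy as np

def sgd(w, g, lr=0.1):
    """Plain gradient descent step."""
    return w - lr * g

def sgd_momentum(w, g, v, lr=0.1, beta=0.9):
    """Gradient descent with a velocity (momentum) term."""
    v = beta * v + g
    return w - lr * v, v

def adam(w, g, m, s, t, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: first and second moment estimates with bias correction."""
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    s = beta2 * s + (1 - beta2) * g ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# Toy usage: minimise f(w) = w**2 (gradient 2w) starting from w = 5.
w, v = 5.0, 0.0
for t in range(1, 201):
    w, v = sgd_momentum(w, 2 * w, v)
print("momentum:", w)                         # converges close to 0

w, m, s = 5.0, 0.0, 0.0
for t in range(1, 501):
    w, m, s = adam(w, 2 * w, m, s, t)
print("adam:", w)                             # also driven towards the minimum at 0
```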
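The last reference hinges on one standard identity relating cross-entropy and KL divergence. For a true distribution $p$ and a model distribution $q$:

$$
H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
$$

Since $H(p)$ does not depend on $q$, minimising the cross-entropy over $q$ is equivalent to minimising the KL divergence.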