Guideline for CMU Deep Learning

These GitBook notes are maintained by zealscott.

Course material

Notes

  • L02 What can a network represent
    • As a universal Boolean function / classifier / approximator
    • The role of depth and width in networks
  • L03 Learning the network
    • Empirical Risk
    • Optimization problem statement
  • L03.5 A brief note on derivatives
    • Multiple variables
    • Minimization
  • L04 Backpropagation
    • Chain rule / Subgradient
    • Backpropagation / Vector formulation
  • L05 Convergence
    • Backpropagation prefers consistency over perfection (which is good)
    • Issues with second-order methods / choosing the learning rate
  • L06 Optimization
    • Rprop / Quickprop
    • Momentum / Nesterov’s Accelerated Gradient
    • Batch / Stochastic / Mini-batch gradient descent
  • L07 Optimizers and regularizers
    • Second moments: RMSProp / Adam
    • Batch normalization
    • Regularizer / dropout
  • L08 Motivation of CNN
    • The need for shift invariance
    • Scanning networks / why distribute the scan / receptive field / stride / pooling
  • L09 Cascade Correlation
    • Why Is Backprop So Slow?
    • The advantages of cascade correlation
  • L10 CNN architecture
    • Architecture / parameter counts / convolution layers / max pooling
  • L11 Using CNNs to understand the neural basis of vision (guest lecture)
  • L12 Backpropagation in CNNs
    • Computing $\nabla_{Z^{(l)}} Div$ / $\nabla_{Y^{(l-1)}} Div$ / $\nabla_{w^{(l)}} Div$
      • Regular convolution running on shifted derivative maps using flipped filter
    • Derivative of Max pooling / Mean pooling
    • Transposed Convolution / Depth-wise convolution
    • LeNet-5 / AlexNet / VGGNet / GoogLeNet / ResNet / DenseNet
  • L13 Recurrent Networks
    • Model / Architecture
    • Back Propagation Through Time
    • Bidirectional RNN
  • L14 Stability analysis and LSTMs
    • Stability: the ability to remember / saturation / different activations
    • Vanishing gradient
    • LSTM: architecture / forward / backward
    • Gated Recurrent Units (GRU)
  • L15 Divergence of RNN
    • One to one / Many to many / Many to one / Seq2seq divergence
    • Language modelling: Representing words
  • L16 Connectionist Temporal Classification
    • Sequence to sequence model / time synchronous / order synchronous
    • Iteratively estimating the output table: Viterbi algorithm / expected divergence
    • The repeated-symbol decoding problem / beam search
  • L17 Seq2seq & Attention
    • Autoencoder / attention weight / beam search
  • L18 Representation
    • Autoencoder / non-linear manifold
  • L19 Hopfield network
    • Loopy network / energy / content-addressable memory
    • Store a specific pattern / orthogonal patterns
  • L20 Boltzmann machines 1
    • Training Hopfield nets: geometric approach / optimization
    • Boltzmann Distribution
  • L21 Boltzmann machines 2
    • Stochastic system: Boltzmann machines
    • Training and sampling the model / Restricted Boltzmann Machines
  • L22 Variational Autoencoders 1
    • Generative models: PCA, Gaussian mixtures, factor analysis, autoencoders
    • EM algorithm for generative models
  • L23 Variational Autoencoders 2
    • Non-linear Gaussian Model
    • VAEs


Last updated: 5/11/2021
