Discrete

Bernoulli distribution

  • pmf

f_{X}(x)=P(X=x)=\left\{\begin{array}{cl} p^{x}(1-p)^{1-x} & \text{for } x=0 \text{ or } 1 \\ 0 & \text{otherwise} \end{array}\right.

  • expectation
    • E(X) = p
  • variance
    • var(X) = (1-p)p

Binomial distribution

  • pmf

f_{X}(k)=P(X=k)=\left\{\begin{array}{cl} C_{n}^{k} p^{k}(1-p)^{n-k} & \text{for } k=0,1,\ldots,n \\ 0 & \text{otherwise} \end{array}\right.

  • expectation
    • E(X) = np
  • variance
    • var(X) = np(1-p)
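A quick sanity check of the Bernoulli and binomial moment formulas, as a minimal sketch assuming scipy is available (the parameter values n, p are arbitrary):

```python
# Sanity-check the Bernoulli and binomial moment formulas with scipy.stats.
from scipy import stats

n, p = 10, 0.3

B = stats.bernoulli(p)
assert abs(B.mean() - p) < 1e-12               # E(X) = p
assert abs(B.var() - (1 - p) * p) < 1e-12      # var(X) = (1-p)p

X = stats.binom(n, p)
assert abs(X.mean() - n * p) < 1e-12           # E(X) = np
assert abs(X.var() - n * p * (1 - p)) < 1e-12  # var(X) = np(1-p)

print(X.pmf(3))  # C_10^3 * 0.3^3 * 0.7^7 ≈ 0.2668
```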

Geometric distribution

  • pmf

f_{X}(k)=P(X=k)=\left\{\begin{array}{cl} p(1-p)^{k-1} & \text{for } k=1,2,3,\ldots \\ 0 & \text{otherwise} \end{array}\right.

  • expectation
    • E(X) = \frac{1}{p}
  • variance
    • var(X) = \frac{1-p}{p^2}
  • memoryless
    • P(X>m+n \mid X>m) = P(X>n)
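The memoryless property can be checked numerically via the survival function; a minimal sketch assuming scipy, with arbitrary p, m, n:

```python
# Verify P(X > m+n | X > m) = P(X > n) for the geometric distribution.
from scipy import stats

p, m, n = 0.25, 4, 7
X = stats.geom(p)  # scipy's geom is supported on k = 1, 2, 3, ...

lhs = X.sf(m + n) / X.sf(m)  # sf(k) = P(X > k) = (1-p)^k
rhs = X.sf(n)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)  # both equal (1-p)^n ≈ 0.1335
```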

Negative binomial distribution (Pascal)

  • The negative binomial distribution arises as a generalization of the geometric distribution.
  • Suppose that a sequence of independent trials, each with probability of success p, is performed until there are r successes in all.
    • The r-th success occurs on trial k exactly when the first k-1 trials contain r-1 successes and trial k is a success, so the probability can be written as p \cdot C_{k-1}^{r-1} p^{r-1}(1-p)^{(k-1)-(r-1)}
      • X \sim NB(r,p)
  • pmf

f_{X}(k)=P(X=k)=\left\{\begin{array}{cl} C_{k-1}^{r-1} p^{r}(1-p)^{k-r} & \text{for } k=r, r+1, r+2, \ldots \\ 0 & \text{otherwise} \end{array}\right.

  • expectation
    • E(X) = \frac{r}{p}
  • variance
    • var(X) = \frac{r(1-p)}{p^2}
  • the derivation follows from the decomposition above; a quick numerical check is sketched below.
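The quick check mentioned above, as a sketch assuming scipy; note that scipy's nbinom counts failures before the r-th success rather than total trials, so values are shifted by r:

```python
# Check the NB(r, p) formulas. scipy's nbinom counts the number of FAILURES
# before the r-th success, so the total number of trials X = failures + r.
from scipy import stats

r, p = 3, 0.4
failures = stats.nbinom(r, p)

assert abs((failures.mean() + r) - r / p) < 1e-12       # E(X) = r/p
assert abs(failures.var() - r * (1 - p) / p**2) < 1e-9  # shift leaves var unchanged

# pmf at k = 5 total trials: C_4^2 p^3 (1-p)^2
print(failures.pmf(5 - r))  # 6 * 0.4^3 * 0.6^2 ≈ 0.1382
```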

Hypergeometric distribution

  • Suppose that an urn contains n balls, of which r are black and n-r are white. Let X denote the number of black balls drawn when taking m balls without replacement.
  • denoted as X \sim h(m,n,r)
  • pmf

f_{X}(k)=P(X=k)=\left\{\begin{array}{cl} \frac{C_{r}^{k} C_{n-r}^{m-k}}{C_{n}^{m}} & \text{for } \max(0,\, m-(n-r)) \leq k \leq \min(m, r) \\ 0 & \text{otherwise} \end{array}\right.

  • expectation
    • E(X) = m\frac{r}{n}
  • variance
    • var(X) = \frac{mr(n-m)(n-r)}{n^2(n-1)}
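A sanity check of these two formulas against scipy.stats.hypergeom (a sketch; scipy's argument order maps to this section's notation as noted in the comments):

```python
# Check the hypergeometric mean and variance against scipy.stats.hypergeom.
# scipy's hypergeom(M, K, N): M = population size, K = black balls, N = draws,
# i.e. M = n, K = r, N = m in this section's notation.
from scipy import stats

n, r, m = 20, 8, 5
X = stats.hypergeom(n, r, m)

assert abs(X.mean() - m * r / n) < 1e-12
var_formula = m * r * (n - m) * (n - r) / (n**2 * (n - 1))
assert abs(X.var() - var_formula) < 1e-9
print(X.mean(), X.var())  # 2.0, ≈0.947
```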

Poisson distribution

  • The Poisson distribution can be derived as the limit of a binomial distribution as the number of trials approaches infinity and the probability of success on each trial approaches zero in such a way that np = \lambda; here \lambda can be interpreted as the expected number of successes. This limit is illustrated below.
  • pmf
    • P(X = k) = \frac{\lambda^k}{k!} e^{-\lambda}, \quad k = 0,1,2,\ldots
  • expectation
    • E(X) = \lambda
  • variance
    • var(X) = \lambda
  • Property
    • Let X and Y be independent Poisson r.v.s with parameters \theta_1 and \theta_2; then X+Y \sim Poisson(\theta_1 + \theta_2)
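A small illustration of the binomial-to-Poisson limit mentioned above, assuming scipy; λ and k are arbitrary:

```python
# Binomial(n, lambda/n) pmf approaches Poisson(lambda) pmf as n grows
# with np = lambda held fixed.
from scipy import stats

lam, k = 3.0, 2
for n in (10, 100, 10_000):
    print(n, stats.binom(n, lam / n).pmf(k))
print("Poisson:", stats.poisson(lam).pmf(k))  # 3^2 e^{-3} / 2! ≈ 0.2240
```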

Continuous

Uniform distribution

  • A uniform r.v. on the interval [a,b] is a model for what we mean when we say "choose a number at random between a and b".
  • pdf

f_{X}(x)=\left\{\begin{array}{cl} \frac{1}{b-a} & a \leq x \leq b \\ 0 & \text{otherwise} \end{array}\right.

  • cdf (easy to get by integrating the pdf)

F_{X}(x)=\left\{\begin{array}{cl} 0 & x < a \\ \frac{x-a}{b-a} & a \leq x \leq b \\ 1 & x > b \end{array}\right.

  • expectation
    • E(X) = \frac{a+b}{2}
  • variance
    • var(X) = \frac{(b-a)^2}{12}
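A simulation sketch of the uniform moments, assuming numpy; a and b are arbitrary:

```python
# Estimate the uniform mean and variance by simulation.
import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(0)
x = rng.uniform(a, b, size=1_000_000)

print(x.mean(), (a + b) / 2)       # ≈ 3.5
print(x.var(), (b - a) ** 2 / 12)  # ≈ 0.75
```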

Exponential distribution

  • The exponential distribution is often used to model lifetimes or waiting times, in which context it is conventional to replace x by t.
  • pdf

f_{X}(x)=\left\{\begin{array}{cl} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & \text{otherwise} \end{array}\right.

  • cdf (again by integrating the pdf)

F_{X}(x)=\left\{\begin{array}{cl} 1-e^{-\lambda x} & x \geq 0 \\ 0 & \text{otherwise} \end{array}\right.

  • expectation
    • E(X) = \frac{1}{\lambda}
  • variance
    • var(X) = \frac{1}{\lambda^2}

Property

  • Memoryless

    • P(X > s+t \mid X > s) = P(X > t)
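The exponential memoryless property checked directly from the survival function P(X > x) = e^{-\lambda x}; a standard-library-only sketch with arbitrary λ, s, t:

```python
# Verify P(X > s+t | X > s) = P(X > t) using the exponential survival
# function P(X > x) = e^{-lambda x}.
import math

lam, s, t = 0.5, 2.0, 3.0
sf = lambda x: math.exp(-lam * x)

lhs = sf(s + t) / sf(s)
rhs = sf(t)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)  # both e^{-lambda t} ≈ 0.2231
```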

Gamma distribution

  • pdf

g(t)=\left\{\begin{array}{cl} \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha-1} e^{-\lambda t} & t \geq 0 \\ 0 & \text{otherwise} \end{array}\right.

  • \Gamma(x) = \int_0^\infty u^{x-1}e^{-u}\,du, \quad x>0 (the gamma function)
  • expectation
    • E(X) = \frac{\alpha}{\lambda}
  • variance
    • Var(X) = \frac{\alpha}{\lambda^2}

Property

  • Ga(1,\lambda) = \text{Exp}(\lambda), the exponential distribution
  • Ga(\frac{n}{2},\frac{1}{2}) = \chi^2(n)
    • E(X) = n
    • Var(X) = 2n
  • X\sim Ga(\alpha,\lambda) \Rightarrow kX\sim Ga(\alpha,\frac{\lambda}{k}),\; k>0
  • if X\sim Ga(\alpha,\lambda) and Y\sim Ga(\beta,\lambda) are independent, then X+Y \sim Ga(\alpha+\beta,\lambda)
  • derivation (that the pdf integrates to 1)
    • \because \Gamma(\alpha) = \int_{0}^{\infty} x^{\alpha-1}e^{-x}\,dx
    • substituting x = \lambda t (so dx = \lambda\,dt) gives \Gamma(\alpha) = \lambda^{\alpha} \int_{0}^{\infty} t^{\alpha-1}e^{-\lambda t}\,dt
    • \therefore \frac{\lambda^{\alpha}}{\Gamma(\alpha)} \int_{0}^{\infty} t^{\alpha-1}e^{-\lambda t}\,dt = 1
    • \therefore g(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)} t^{\alpha-1}e^{-\lambda t} is a valid density
  • \alpha is called a shape parameter for the gamma density
    • Varying α\alpha changes the shape of the density
  • λ\lambda is called a scale parameter
    • Varying λ\lambda corresponds to changing the units of measurement and does not affect the shape of the density
  • how to understand gamma? For integer \alpha, Ga(\alpha,\lambda) is the distribution of the waiting time until the \alpha-th arrival in a Poisson process with rate \lambda, i.e., a sum of \alpha independent \text{Exp}(\lambda) r.v.s; the properties above are checked numerically below.
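A sketch checking two of the properties above by simulation, assuming numpy/scipy; note that numpy and scipy parameterize the gamma by scale = 1/λ rather than by the rate λ:

```python
# Check Ga(1, lam) = Exp(lam) and the additivity of independent gammas
# with a common rate. numpy/scipy use scale = 1/lambda, not the rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, lam = 2.0, 3.0, 0.5

x = rng.gamma(shape=a, scale=1 / lam, size=500_000)
y = rng.gamma(shape=b, scale=1 / lam, size=500_000)
s = x + y  # should behave like Ga(a + b, lam)
print(s.mean(), (a + b) / lam)    # ≈ 10.0
print(s.var(), (a + b) / lam**2)  # ≈ 20.0

# Ga(1, lam) and Exp(lam) agree, e.g. at a few quantiles.
print(stats.gamma.ppf([0.25, 0.5, 0.9], a=1, scale=1 / lam))
print(stats.expon.ppf([0.25, 0.5, 0.9], scale=1 / lam))
```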

Normal distribution

  • pdf

f_{X}(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-(x-\mu)^{2} /\left(2 \sigma^{2}\right)}, \quad -\infty < x < \infty

  • \mu is the mean
  • \sigma is the standard deviation
  • If X \sim N(\mu, \sigma^2) and Y = aX + b, then Y \sim N(a\mu+b,\, a^2\sigma^2)
    • in particular, if X \sim N(\mu,\sigma^2), then Z = \frac{X-\mu}{\sigma}\sim N(0,1)
  • if X and Y are jointly normal, aX+bY \sim N(a\mu_X+b\mu_Y,\, a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho \sigma_X\sigma_Y)

Property

  • if X,Y \sim N(0,1) are independent, then U = \frac{X}{Y} is a Cauchy r.v. (lec3)
    • f_U(u) = \frac{1}{\pi (u^2+1)}
  • if X_1,\ldots,X_n\sim N(0,1) i.i.d., then
    • X_1^2 + \cdots + X_n^2 \sim \chi^2(n) (both facts are simulated below)
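A simulation sketch of both facts, assuming numpy/scipy; the χ² moments E = n and Var = 2n come from the Ga(n/2, 1/2) property above, and the median of |Cauchy| is 1:

```python
# Simulate both facts: sum of n squared standard normals is chi^2(n),
# and the ratio of independent standard normals is Cauchy.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 200_000

z = rng.standard_normal((reps, n))
chi2 = (z**2).sum(axis=1)
print(chi2.mean(), n)     # E = n
print(chi2.var(), 2 * n)  # Var = 2n

u = rng.standard_normal(reps) / rng.standard_normal(reps)
print(np.median(np.abs(u)), stats.cauchy.ppf(0.75))  # both ≈ 1.0
```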

Logistic distribution

  • consider the standard logistic distribution, Logistic(0,1):
    • F_X(x) = \frac{1}{1+e^{-x}}

Exponential family

  • A family of pdfs or pmfs is called an exponential family if it can be expressed as:
    • p(x,\theta) = H(x)\exp(\theta^T \phi(x) - A(\theta))
    • H(x) is the base measure and A(\theta) is the log-partition function that normalizes the density
  • This unified form is very helpful for modeling heterogeneous data in the era of big data, since the same inference machinery applies to every member of the family.
  • Bernoulli, Gaussian, Binomial, Poisson, Exponential, Weibull, Laplace, Gamma, Beta, Multinomial, and Wishart distributions are all exponential families (some, such as Weibull and Laplace, only with certain parameters held fixed)
  • for Bernoulli:
    • X has pmf p^x(1-p)^{1-x} for x\in \{0,1\}
    • p^x(1-p)^{1-x} = \exp\{x\ln p + (1-x)\ln (1-p)\} = \exp\{\ln \frac{p}{1-p}\, x + \ln (1-p)\}
    • \theta =\ln \frac{p}{1-p},\; \phi(x) = x,\; A(\theta) = \ln\frac{1}{1-p} = \ln(1+e^{\theta}),\; H(x) = 1
  • the explanation can be seen here; a small numerical check follows.
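The check mentioned above: evaluating the Bernoulli pmf both directly and in exponential-family form (standard library only; p is arbitrary):

```python
# Evaluate the Bernoulli pmf directly and in exponential-family form.
import math

p = 0.3
theta = math.log(p / (1 - p))      # natural parameter (log-odds)
A = math.log(1 + math.exp(theta))  # log-partition, = ln(1/(1-p))

for x in (0, 1):
    direct = p**x * (1 - p) ** (1 - x)
    expfam = 1 * math.exp(theta * x - A)  # H(x) = 1, phi(x) = x
    assert abs(direct - expfam) < 1e-12
    print(x, direct, expfam)
```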

Sample

  • Var(\bar{X}) = \frac{\sigma^2}{n}

  • (n-1)S^2 = \sum X_i^2 - n\bar{X}^2

  • \bar{X} and S^2 are independent (for samples from a normal population)

  • \bar{X} \sim N(\mu,\frac{\sigma^2}{n})

  • \frac{(n-1)S^2}{\sigma^2}\sim \chi^2(n-1)
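A simulation sketch of the sampling results above, assuming numpy; μ, σ, and n are arbitrary:

```python
# Simulate normal samples and check (n-1)S^2/sigma^2 against chi^2(n-1)
# moments, and Var of the sample mean against sigma^2/n.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 1.0, 2.0, 8, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)  # sample variance S^2
stat = (n - 1) * s2 / sigma**2

print(stat.mean(), n - 1)       # chi^2(n-1) mean: ≈ 7
print(stat.var(), 2 * (n - 1))  # chi^2(n-1) variance: ≈ 14
print(samples.mean(axis=1).var(), sigma**2 / n)  # ≈ 0.5
```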

Property

  • E(X) = E(E(X|Y))
    • This can be understood as first taking expectations within groups and then averaging over groups, which gives the same result as taking the expectation directly
  • Var(X) = E(Var(X|Y)) + Var(E(X|Y))
    • This can be understood as the expected within-group variance plus the between-group variance
  • if r.v.s X and Y are independent, then E(X|Y) = E(X)
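A simulation sketch of the two tower laws on a two-group mixture, assuming numpy; the group distributions are arbitrary:

```python
# Check E(X) = E(E(X|Y)) and Var(X) = E(Var(X|Y)) + Var(E(X|Y)) on a
# two-group mixture, where Y is the group label.
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(1_000_000) < 0.6  # P(Y=1) = 0.6
# X|Y=0 ~ N(0,1), X|Y=1 ~ N(3,4)
x = np.where(y, rng.normal(3, 2, y.size), rng.normal(0, 1, y.size))

w = np.array([(~y).mean(), y.mean()])      # group weights
m = np.array([x[~y].mean(), x[y].mean()])  # E(X|Y=y)
v = np.array([x[~y].var(), x[y].var()])    # Var(X|Y=y)

print(x.mean(), w @ m)                        # law of total expectation
print(x.var(), w @ v + w @ (m - w @ m) ** 2)  # law of total variance
```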

Inequality

Markov's inequality

For a nonnegative r.v. X and any a > 0: P(X\ge a) \le \frac{E(X)}{a}

Chebyshev's inequality

For any a > 0: P(|X-E(X)| \ge a) \le \frac{Var(X)}{a^2}

Chernoff bounds

The generic Chernoff bound requires only the moment generating function of X, defined as M_X(t) = E(e^{tX}), provided it exists.

For every t > 0: P(X\ge a) \le \frac{E(e^{tX})}{e^{ta}}; the bound is then minimized over t, as in the comparison below.
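A worked comparison of the three bounds on a distribution where everything is available in closed form, X ~ Exp(1) (so E(X) = Var(X) = 1 and M_X(t) = 1/(1-t) for t < 1); standard library only:

```python
# Compare the three bounds with the exact tail P(X >= a) for X ~ Exp(1):
# E(X) = Var(X) = 1 and M_X(t) = 1/(1-t) for t < 1.
import math

a = 10.0
exact = math.exp(-a)
markov = 1.0 / a                # E(X)/a
chebyshev = 1.0 / (a - 1) ** 2  # via P(|X - 1| >= a - 1)
t = 1 - 1 / a                   # t minimizing M_X(t) e^{-ta}
chernoff = math.exp(-t * a) / (1 - t)

print(f"exact={exact:.2e}  chernoff={chernoff:.2e}  "
      f"chebyshev={chebyshev:.2e}  markov={markov:.2e}")
# exact=4.54e-05  chernoff=1.23e-03  chebyshev=1.23e-02  markov=1.00e-01
```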

Other inequalities can be seen here.
