Basics of Probability Theory

The Calculus of Probabilities

Probability operators


Probability properties

  • If $P$ is a probability function and $A$ and $B$ are any two sets in $\mathcal{B}$, then
    • $P(\emptyset) = 0$, where $\emptyset$ is the empty set
    • $P(A) \le 1$
    • $P(B \cap A^c) = P(B) - P(A \cap B)$
    • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
    • If $A \subset B$, then $P(A) \le P(B)$
  • Events $A_1$ and $A_2$ are pairwise independent (statistically independent) if and only if
    • $P(A_1 \cap A_2) = P(A_1)P(A_2)$
  • mutually independent: for every subcollection $A_{i_1}, \dots, A_{i_k}$ (not just the full collection),
    • $P(A_{i_1} \cap A_{i_2} \cap \dots \cap A_{i_k}) = P(A_{i_1})P(A_{i_2}) \dots P(A_{i_k})$
  • Note the difference between independent and mutually exclusive
    • mutually exclusive: $P(A \cap B) = 0$ (the events cannot occur together)
    • independent: $P(A \cap B) = P(A)P(B)$; for r.v.s, $P(X,Y) = P(X)P(Y)$
    • note also that uncorrelated ($\mathrm{cov}(X,Y) = 0$) is weaker than independent
  • Let $A$ and $B$ be events with $P(B) > 0$. The conditional probability of $A$ given $B$, denoted by $P(A|B)$, is defined as
    • $P(A|B) = \frac{P(A \cap B)}{P(B)}$
  • Total probability theorem (for events $B_1, \dots, B_n$ partitioning the sample space):
    • $P(A) = P(B_1)P(A|B_1) + P(B_2)P(A|B_2) + P(B_3)P(A|B_3)$ (case $n = 3$)
    • $P(A) = \sum\limits_{i=1}^{n} P(B_i)P(A|B_i)$
  • Bayes' Theorem (see the sketch below)
    • $P(B_i|A) = \frac{P(A|B_i)P(B_i)}{\sum\limits_{k=1}^{n} P(A|B_k)P(B_k)}$
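
As a quick numerical illustration, here is a minimal Python sketch of the total probability theorem and Bayes' theorem on a hypothetical three-event partition; the priors and likelihoods below are made-up numbers, not from the lecture.

```python
# Total probability and Bayes' theorem on a hypothetical 3-part partition.
# The priors P(B_i) and likelihoods P(A|B_i) are made-up numbers.
prior = [0.5, 0.3, 0.2]          # P(B_1), P(B_2), P(B_3) -- must sum to 1
likelihood = [0.1, 0.4, 0.8]     # P(A|B_1), P(A|B_2), P(A|B_3)

# Total probability theorem: P(A) = sum_i P(B_i) P(A|B_i)
p_a = sum(p * l for p, l in zip(prior, likelihood))

# Bayes' theorem: P(B_i|A) = P(A|B_i) P(B_i) / P(A)
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]

print(p_a)        # 0.33
print(posterior)  # posteriors sum to 1
```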

Counting

  • inclusion-exclusion
    • $|A \cup B| = |A| + |B| - |A \cap B|$
  • Permutations and combinations (see the sketch below)
    • $P(n,m) = \frac{n!}{(n-m)!}$
    • $C(n,m) = \frac{n!}{m!(n-m)!}$
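
A minimal sketch checking these counting formulas with Python's standard library (`math.perm` and `math.comb`, available since Python 3.8); the sets and values are arbitrary examples.

```python
import math

n, m = 5, 2
# Permutations P(n, m) = n! / (n-m)!  -- ordered selections
assert math.perm(n, m) == math.factorial(n) // math.factorial(n - m)          # 20
# Combinations C(n, m) = n! / (m!(n-m)!)  -- unordered selections
assert math.comb(n, m) == math.factorial(n) // (math.factorial(m) * math.factorial(n - m))  # 10

# Inclusion-exclusion on two concrete sets: |A ∪ B| = |A| + |B| - |A ∩ B|
A, B = {1, 2, 3}, {3, 4}
assert len(A | B) == len(A) + len(B) - len(A & B)   # 4 = 3 + 2 - 1
```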

Random Variable

  • A random variable (r.v.) $X$ is a function from the sample space of an experiment to the set of real numbers $\mathbb{R}$:
    • $\forall \omega \in \Omega,\ X(\omega) = x \in \mathbb{R}$
  • Note that a random variable is a function; it is neither a variable nor random.

Cumulative distribution function

  • The cdf of a r.v. $X$, denoted by $F_X(x)$, is defined by:
    • $F_X(x) = P_X(X \le x)$
  • $\lim\limits_{x \rightarrow -\infty} F_X(x) = 0$
  • $\lim\limits_{x \rightarrow \infty} F_X(x) = 1$
  • $F(x)$ is a nondecreasing function of $x$
  • $F(x)$ is right-continuous
  • Two r.v.s that are identically distributed are not necessarily equal.

Probability mass function

  • The pmf of a discrete r.v. $X$ is given by $f_X(x) = P(X = x)$

Probability density function

  • The probability density function or pdf, $f_X(x)$, of a continuous r.v. $X$ is the function that satisfies:
    • $F_X(x) = \int_{-\infty}^x f_X(t)\,dt$ (see the numerical check below)
  • "$X$ has a distribution given by $F_X(x)$" is abbreviated symbolically by $X \sim F_X(x)$ or $X \sim f_X(x)$.
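
A small numerical check of the pdf-cdf relationship, using the standard normal as an arbitrary example distribution (assuming scipy is available):

```python
# Checking F_X(x) = ∫_{-∞}^{x} f_X(t) dt numerically for a standard normal,
# using scipy's pdf/cdf as the reference distribution (an arbitrary choice).
from scipy import stats
from scipy.integrate import quad

x = 1.3
integral, _ = quad(stats.norm.pdf, -float("inf"), x)  # ∫ f_X(t) dt up to x
print(integral, stats.norm.cdf(x))  # the two values agree to high precision
```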

Joint distribution

  • $P((X,Y) \in A) = \sum_{(x,y) \in A} f(x,y)$
  • $P((X,Y) \in A) = \int\int_A f(x,y)\,dx\,dy$
  • $f_X(x) = \int_{-\infty}^{+\infty} f_{X,Y}(x,y)\,dy$
  • $\frac{\partial^2 F(x,y)}{\partial x \partial y} = f(x,y)$
  • $f(x|y) = \frac{f(x,y)}{f_Y(y)}$
  • if $f(x,y) = f_X(x)f_Y(y)$, then $X, Y$ are independent.
    • If the joint density separates into a function of $x$ times a function of $y$, independence can be concluded directly, without computing the marginal distributions.

Bivariate function

  • Let $(X,Y)$ be a bivariate r.v., and consider a new bivariate r.v. $(U,V)$ defined by $U = g_1(X,Y)$ and $V = g_2(X,Y)$

Transformation of discrete

  • $B = \{(u,v) \mid u = g_1(x,y),\ v = g_2(x,y),\ (x,y) \in A\}$
  • $A_{uv} = \{(x,y) \in A \mid u = g_1(x,y),\ v = g_2(x,y)\}$
  • $f_{U,V}(u,v) = P(U = u, V = v) = P((X,Y) \in A_{uv}) = \sum_{(x,y) \in A_{uv}} f_{X,Y}(x,y)$ (see the sketch below)
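
A minimal sketch of this discrete transformation recipe: sum the joint pmf over each preimage set $A_{uv}$. The joint pmf below (two independent fair 0/1 variables) and the maps $g_1, g_2$ are made-up illustrations.

```python
from collections import defaultdict

# Push the joint pmf of (X, Y) through (u, v) = (g1(x, y), g2(x, y)) by
# accumulating f_{X,Y} over each preimage set A_uv.
f_xy = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}  # made-up joint pmf

def g1(x, y): return x + y   # U = X + Y
def g2(x, y): return x - y   # V = X - Y

f_uv = defaultdict(float)
for (x, y), p in f_xy.items():
    f_uv[g1(x, y), g2(x, y)] += p   # sum over (x, y) in A_uv

print(dict(f_uv))
# {(0, 0): 0.25, (1, -1): 0.25, (1, 1): 0.25, (2, 0): 0.25}
```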

Transformation of continuous

J=xuxvyuyv J=\left|\begin{array}{ll} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{array}\right|

  • $f_{U,V}(u,v) = f_{X,Y}(h_1(u,v), h_2(u,v))\,|J|$, where $x = h_1(u,v)$ and $y = h_2(u,v)$ are the inverse transformations
  • This is the inverse-function method; when the transformation cannot be inverted, substitute into the cumulative distribution function and compute from there instead.
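
A symbolic check of the Jacobian computation (assuming sympy is available), using the polar-coordinate inverse map as a standard worked example:

```python
import sympy as sp

# Jacobian of the polar-coordinate inverse map
# x = h1(u, v) = u*cos(v), y = h2(u, v) = u*sin(v)  (u = radius, v = angle).
u, v = sp.symbols("u v", positive=True)
x = u * sp.cos(v)
y = u * sp.sin(v)

J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
               [sp.diff(y, u), sp.diff(y, v)]]).det()
print(sp.simplify(J))  # u, so the factor |J| = u enters f_{U,V}(u, v)
```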

Expectation & covariance

Expectation value

  • denoted as $E(g(X))$:

E(g(X)) = \begin{cases} \int_{-\infty}^{+\infty} g(x) f_X(x)\,dx & \text{if } X \text{ is continuous} \\ \sum\limits_{x} g(x) P(X = x) & \text{if } X \text{ is discrete} \end{cases}

  • Note: the expectation does not always exist
    • e.g. a Cauchy r.v., with pdf:
      • $f_X(x) = \frac{1}{\pi(1+x^2)}$
    • $E|X| = \infty$, so $E(X)$ does not exist (see the simulation below)
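
A quick simulation illustrating the point: Cauchy sample averages fail to settle down as the sample size grows, while normal sample averages shrink toward 0 (the printed numbers depend on the seed).

```python
import numpy as np

# Sample means of a Cauchy r.v. do not converge as n grows, reflecting
# that E(X) does not exist (E|X| = ∞). Compare with a standard normal.
rng = np.random.default_rng(0)
for n in (10**2, 10**4, 10**6):
    print(n, rng.standard_cauchy(n).mean(), rng.standard_normal(n).mean())
# The Cauchy column keeps jumping around; the normal column tends to 0.
```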

Linearity of expectations

  • $E(ag_1(X) + bg_2(X) + c) = aE(g_1(X)) + bE(g_2(X)) + c$
  • if $a \le g_1(x) \le b$ for all $x$, then $a \le E(g_1(X)) \le b$

Uniform exponential relationship

  • The uniform distribution can be used to generate other distributions (exponential, normal, ...); this is in fact how computers generate random samples (see the sketch below)
  • suppose $X \sim U(0,1)$, let $Y = g(X) = -\log X$
    • $F_Y(y) = P(Y \le y) = P(-\log X \le y) = P_X(X \ge e^{-y}) = 1 - e^{-y}$
    • $f_Y(y) = e^{-y}$
    • so $Y \sim \exp(1)$
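
A minimal sketch of this inverse-transform construction (assuming numpy), checking that $-\log X$ with $X \sim U(0,1)$ behaves like an $\exp(1)$ sample:

```python
import numpy as np

# Inverse-transform sampling: if X ~ U(0,1) then Y = -log(X) ~ exp(1).
rng = np.random.default_rng(0)
x = 1.0 - rng.random(1_000_000)   # uniform on (0, 1], avoids log(0)
y = -np.log(x)

print(y.mean(), y.var())   # both ≈ 1, matching an exp(1) distribution
# Empirical CDF at y0 = 1 should be close to 1 - e^{-1} ≈ 0.632
print((y <= 1.0).mean())
```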

Moment

  • For each integer $n$, the $n$-th moment of $X$ is $\mu_n' = E(X^n)$
  • The $n$-th central moment of $X$ is $\mu_n = E(X - \mu)^n$

Variance

  • The variance of a r.v. $X$ is its second central moment:
    • $\mathrm{var}(X) = E(X - \mu)^2$
  • $\mathrm{var}(X) = E(X^2) - (E(X))^2$

Nonlinearity of variance

  • $\mathrm{var}(aX + b) = a^2\,\mathrm{var}(X)$
  • if $X$ and $Y$ are two independent r.v.s on a sample space $\Omega$, then:
    • $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$

Independence

  • if $X$ and $Y$ are independent r.v.s on a sample space $\Omega$, then:
    • $E(XY) = E(X)E(Y)$
    • $\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$
    • $\mathrm{var}(X - Y) = \mathrm{var}(X) + \mathrm{var}(Y)$

Moment Generating Function

  • can be used to calculate moments
  • the moment generating function of $X$, denoted by $M_X(t)$, is:
    • $M_X(t) = E(e^{tX})$
  • $M_{aX+b}(t) = e^{bt} M_X(at)$
  • it is applied in the Chernoff bound
  • if the expectation does not exist, the moment generating function does not exist
  • $X$ is continuous: $M_X(t) = \int_{-\infty}^{+\infty} e^{tx} f_X(x)\,dx$
  • $X$ is discrete: $M_X(t) = \sum_x e^{tx} P(X = x)$

Theorem

  • if $X$ has moment generating function $M_X(t)$, then:
    • $E(X^n) = M_X^{(n)}(0)$
  • where we define:
    • $M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t)\big|_{t=0}$
  • this can be used to calculate $E(X)$ for the Gamma distribution, for example (see the sketch below)
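
A symbolic sketch of this theorem for the Gamma distribution (shape-scale parameterization, assuming sympy is available); the MGF $(1 - \beta t)^{-\alpha}$ used below is the standard one, valid for $t < 1/\beta$:

```python
import sympy as sp

# Moments from the MGF: for a Gamma(alpha, beta) r.v. (shape-scale form),
# M_X(t) = (1 - beta*t)^(-alpha), so E(X^n) = d^n/dt^n M_X(t) at t = 0.
t = sp.symbols("t")
alpha, beta = sp.symbols("alpha beta", positive=True)
M = (1 - beta * t) ** (-alpha)

EX = sp.diff(M, t).subs(t, 0)        # first moment
EX2 = sp.diff(M, t, 2).subs(t, 0)    # second moment
print(sp.simplify(EX))               # alpha*beta
print(sp.simplify(EX2 - EX**2))      # variance: alpha*beta**2
```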

Property

  • $M_{aX+b}(t) = e^{bt} M_X(at)$

Covariance

  • The covariance and correlation of $X$ and $Y$ are the numbers defined by:
    • $\mathrm{Cov}(X,Y) = E((X - \mu_X)(Y - \mu_Y))$
    • $\rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$
  • $\mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y$
  • if $X, Y$ are independent r.v.s, then $\mathrm{Cov}(X,Y) = 0$ and $\rho_{XY} = 0$
  • $\mathrm{Var}(aX + bY) = a^2\,\mathrm{Var}(X) + b^2\,\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X,Y)$
  • The correlation coefficient only detects linear relationships; $\rho_{XY} = 0$ does not mean $X$ and $Y$ are unrelated (see the demo below).
    • However, quantities such as $\rho(X^2, Y)$ can still measure such dependence.
    • Since any function can be approximated by polynomials, such relationships can in principle be measured via correlations with polynomial terms.
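
A small demo of this point (assuming numpy): with $X$ standard normal and $Y = X^2$, the plain correlation is near zero even though $Y$ is a function of $X$, while $\rho(X^2, Y)$ recovers the relationship exactly.

```python
import numpy as np

# ρ only detects linear dependence: with X symmetric and Y = X², the
# correlation ρ(X, Y) is ~0 although Y is determined by X, while
# ρ(X², Y) captures the (perfect) polynomial relationship.
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x ** 2

print(np.corrcoef(x, y)[0, 1])        # ≈ 0: no *linear* relationship
print(np.corrcoef(x ** 2, y)[0, 1])   # = 1: dependence via the square
```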

Bivariate normal pdf

  • $f(x,y) = \left(2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}\right)^{-1} \exp\left(-\frac{1}{2(1-\rho^2)}\left(\left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right)\right)$
  • marginal distributions
    • $X \sim N(\mu_X, \sigma_X^2)$
    • $Y \sim N(\mu_Y, \sigma_Y^2)$
  • $\rho = \rho_{XY}$
  • $aX + bY \sim N(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2 + 2ab\rho\sigma_X\sigma_Y)$

Conditional expectation

Theorem

  • $E(X) = E(E(X|Y))$
    • Intuition: computing the expectation group by group and then averaging gives the same result as computing the expectation directly.
  • $\mathrm{Var}(X) = E(\mathrm{Var}(X|Y)) + \mathrm{Var}(E(X|Y))$
    • Intuition: total variance = the expected within-group variance + the between-group variance (see the simulation below).
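
A Monte Carlo check of both identities on a hypothetical two-group mixture (the group means and standard deviations below are made-up numbers):

```python
import numpy as np

# Check E(X) = E(E(X|Y)) and Var(X) = E(Var(X|Y)) + Var(E(X|Y)) on a
# two-group mixture: Y picks a group, X|Y=k ~ Normal(mu[k], sigma[k]).
rng = np.random.default_rng(0)
mu, sigma, n = np.array([0.0, 3.0]), np.array([1.0, 2.0]), 1_000_000

y = rng.integers(0, 2, size=n)            # group labels, P(Y=k) = 1/2
x = rng.normal(mu[y], sigma[y])           # X | Y

print(x.mean(), mu.mean())                # both ≈ 1.5
within = (sigma ** 2).mean()              # E(Var(X|Y)) = 2.5
between = ((mu - mu.mean()) ** 2).mean()  # Var(E(X|Y)) = 2.25
print(x.var(), within + between)          # both ≈ 4.75
```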

Mixture distribution

Binomial-Poisson hierarchy

  • if $X|Y \sim \mathrm{Binomial}(Y, p)$ and $Y \sim \mathrm{Poisson}(\lambda)$:
  • $P(X = x) = \sum_y P(X = x, Y = y) = \sum_y P(X = x|Y = y)P(Y = y) = \frac{(\lambda p)^x}{x!} e^{-\lambda p}$
  • $\therefore X \sim \mathrm{Poisson}(\lambda p)$
  • using $E(X) = E(E(X|Y))$, we easily get $E(X) = E(pY) = p\lambda$ (see the simulation below)
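
A simulation sketch of this hierarchy (assuming numpy; the values of $\lambda$ and $p$ are arbitrary): the sample mean and variance of $X$ should both be close to $\lambda p$, as a $\mathrm{Poisson}(\lambda p)$ marginal predicts.

```python
import numpy as np

# Binomial-Poisson hierarchy: Y ~ Poisson(lam), X|Y ~ Binomial(Y, p).
# The marginal of X should match Poisson(lam * p).
rng = np.random.default_rng(0)
lam, p, n = 10.0, 0.3, 1_000_000

y = rng.poisson(lam, size=n)
x = rng.binomial(y, p)              # numpy draws per-sample Binomial(Y, p)

print(x.mean(), x.var(), lam * p)   # mean ≈ var ≈ 3.0, as for Poisson(3)
```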

Beta-binomial hierarchy

  • if $X|P \sim \mathrm{Binomial}(n, P)$ and $P \sim \mathrm{Beta}(\alpha, \beta)$

  • so $E(X) = E(E(X|P)) = E(nP) = \frac{n\alpha}{\alpha + \beta}$ (see the simulation below)
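
And the analogous simulation for the Beta-binomial hierarchy (assuming numpy; the parameters are arbitrary):

```python
import numpy as np

# Beta-binomial hierarchy: P ~ Beta(alpha, beta), X|P ~ Binomial(n, P).
# E(X) should match n * alpha / (alpha + beta).
rng = np.random.default_rng(0)
alpha, beta, n, trials = 2.0, 3.0, 10, 1_000_000

p = rng.beta(alpha, beta, size=trials)
x = rng.binomial(n, p)                        # per-sample success probability

print(x.mean(), n * alpha / (alpha + beta))   # both ≈ 4.0
```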
