Probability and Bayesian Networks

Uncertainty

Why is uncertainty important?

Probability

Frequentist vs. Bayesian

Frequentist view: probability is the long-run relative frequency of an event over repeated trials.
Bayesian view: probability is a degree of belief in a proposition, updated as evidence arrives.

Probability Measure

If $X$ is a random variable (feature, attribute) with domain $Dom(X)$, a probability measure assigns a number $P(X = x) \in [0, 1]$ to each value $x \in Dom(X)$, with $\sum_{x \in Dom(X)} P(X = x) = 1$.

Axioms of Probability

The axioms are the basic assumptions any probability measure must satisfy:
- $0 \le P(a) \le 1$ for any proposition $a$
- $P(\text{true}) = 1$ and $P(\text{false}) = 0$
- $P(a \lor b) = P(a) + P(b) - P(a \land b)$

Independence

Describing a system with $n$ binary features requires $2^n - 1$ probabilities for the full joint distribution; if the features are independent, only $n$ probabilities are needed.
Pasted image 20250418111454.png
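As a quick worked example (the choice of $n = 10$ is mine, not from the notes): with $10$ binary features the full joint needs

$$2^{10} - 1 = 1023$$

probabilities, whereas $10$ mutually independent features need only $10$ numbers.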

Conditional Probability

If $X$ and $Y$ are random variables, then the conditional probability of $X$ given $Y$ is $P(X \mid Y) = \frac{P(X, Y)}{P(Y)}$ (defined when $P(Y) > 0$). Bayes' rule follows directly, for example:

$$P(\text{is\_bird} \mid \text{flies}) = \frac{P(\text{flies} \mid \text{is\_bird})\, P(\text{is\_bird})}{P(\text{flies})}$$

Sum Rule

We know that summing the joint over all values of $X$ yields the marginal:

$$\sum_x P(X = x, Y) = P(Y)$$

We call P(Y) the marginal distribution over Y

Conditional Independence

$X$ and $Y$ are independent iff $P(X, Y) = P(X)\,P(Y)$, equivalently $P(X \mid Y) = P(X)$. $X$ and $Y$ are conditionally independent given $Z$ iff $P(X \mid Y, Z) = P(X \mid Z)$.

Expected Values

Expected value of a function $V(X)$ on $X$:

$$E(V) = \sum_{x \in Dom(X)} P(x)\, V(x)$$

where $P(x)$ is the probability that $X = x$.
This is useful in decision making:

$$E(V(\text{decision})) = \sum_{\text{outcome}} P(\text{outcome} \mid \text{decision})\, V(\text{outcome})$$
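A minimal Python sketch of this decision rule; the decisions, outcomes, probabilities, and utilities below are hypothetical, not from the notes:

```python
# Expected value of each decision: E[V(decision)] = sum over outcomes of
# P(outcome | decision) * V(outcome). All numbers below are made up.
P_outcome_given_decision = {
    "treat": {"recover": 0.9, "no_recover": 0.1},
    "wait":  {"recover": 0.6, "no_recover": 0.4},
}
value = {"recover": 100, "no_recover": -50}  # utility of each outcome

def expected_value(decision):
    """Sum P(outcome | decision) * V(outcome) over all outcomes."""
    return sum(p * value[o] for o, p in P_outcome_given_decision[decision].items())

best = max(P_outcome_given_decision, key=expected_value)
print({d: expected_value(d) for d in P_outcome_given_decision})  # {'treat': 85.0, 'wait': 40.0}
print("best decision:", best)  # treat
```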

Value of Independence

Bayesian Networks

Bayesian Networks (also called Belief Networks) exploit independence and conditional independence to represent a joint distribution compactly.

Example

Pasted image 20250418114123.png

Correlation and Causality

Directed links in a Bayes' net are often read as causal, but the network itself only encodes (conditional) dependence, not causation.

Example

A Bayesian Network (BN) over variables $\{X_1, X_2, \ldots, X_N\}$ consists of:
- a directed acyclic graph (DAG) whose nodes are the variables, and
- for each variable $X_i$, a conditional probability distribution $P(X_i \mid \text{Parents}(X_i))$.

Semantics of a Bayes’ Net

The structure of the BN means that:

every $X_i$ is conditionally independent of all its non-descendants given its parents:

$$P(X_i \mid S, \text{Parents}(X_i)) = P(X_i \mid \text{Parents}(X_i))$$

for any subset $S \subseteq \text{NonDescendants}(X_i)$

The BN defines a factorization of the joint probability distribution

$$P(X_1, X_2, \ldots, X_n) = \prod_i P(X_i \mid \text{parents}(X_i))$$
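A minimal Python sketch of the factorization, assuming a hypothetical two-node network Rain → WetGrass with made-up CPT entries:

```python
# Hypothetical CPTs for a tiny network Rain -> WetGrass (numbers are made up).
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True: {True: 0.9, False: 0.1},   # P(WetGrass | Rain=True)
                    False: {True: 0.1, False: 0.9}}  # P(WetGrass | Rain=False)

def joint(rain, wet):
    """Joint probability via the BN factorization: P(Rain) * P(WetGrass | Rain)."""
    return P_rain[rain] * P_wet_given_rain[rain][wet]

# The factorized joint sums to 1 over all assignments.
total = sum(joint(r, w) for r in (True, False) for w in (True, False))
print(joint(True, True), total)  # 0.18  1.0
```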

Constructing Belief Networks

To represent a domain in a belief network, you need to consider:
- which variables are relevant and what their possible values (domains) are,
- the graph structure: which variables directly influence which others,
- how each variable depends on its parents, i.e. the conditional probabilities $P(X_i \mid \text{Parents}(X_i))$.

Three Basic Bayesian Networks

Pasted image 20250418134600.png

Updating Belief: Bayes’ Rule

The agent has a prior belief $P(h)$ in a hypothesis $h$; after observing evidence $e$, the belief is updated to the posterior $P(h \mid e)$:

$$P(h \mid e) = \frac{P(e \mid h)\, P(h)}{P(e)} = \frac{P(e \mid h)\, P(h)}{\sum_{h'} P(e \mid h')\, P(h')}$$
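A minimal Python sketch of this update; the two hypotheses and all numbers are hypothetical:

```python
# Hypothetical prior and likelihoods (all numbers are made up).
prior = {"disease": 0.01, "healthy": 0.99}
P_positive_given_h = {"disease": 0.95, "healthy": 0.05}  # P(e | h) for e = positive test

# Bayes' rule: P(h | e) = P(e | h) P(h) / sum over h' of P(e | h') P(h').
evidence_prob = sum(P_positive_given_h[h] * prior[h] for h in prior)
posterior = {h: P_positive_given_h[h] * prior[h] / evidence_prob for h in prior}
print(posterior)  # disease ~ 0.161, healthy ~ 0.839
```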

Probabilistic Inference

Simple Forward Inference (Chain)

Pasted image 20250418143030.png

Computing a marginal requires simple forward propagation of probabilities.
The same idea applies when there is "upstream" evidence, e.g. COVID = true.
With multiple parents, the evidence is "pooled".

Pasted image 20250418143958.png

$$P(\text{Fev}) = \sum_{\text{Flu}, M, \text{TS}, \text{ET}} P(\text{Fev}, \text{Flu}, M, \text{TS}, \text{ET}) = \sum_{\text{Flu}, M} P(\text{Fev} \mid M, \text{Flu}) \left[\sum_{\text{TS}} P(\text{Flu} \mid \text{TS})\, P(\text{TS})\right] \left[\sum_{\text{ET}} P(M \mid \text{ET})\, P(\text{ET})\right]$$

Also works with "upstream" evidence:

$$P(\text{Fev} \mid ts, m) = \sum_{\text{Flu}} P(\text{Fev}, \text{Flu} \mid m, ts) = \sum_{\text{Flu}} P(\text{Fev} \mid \text{Flu}, m, ts)\, P(\text{Flu} \mid m, ts) = \sum_{\text{Flu}} P(\text{Fev} \mid \text{Flu}, m)\, P(\text{Flu} \mid ts)$$
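A Python sketch of this forward propagation on the same structure (TS → Flu, ET → M, {Flu, M} → Fev); all CPT numbers are made up:

```python
# Hypothetical CPTs for TS -> Flu, ET -> M, {Flu, M} -> Fev (all numbers made up).
P_TS = {True: 0.05, False: 0.95}
P_ET = {True: 0.10, False: 0.90}
P_Flu_given_TS = {True: {True: 0.30, False: 0.70}, False: {True: 0.02, False: 0.98}}
P_M_given_ET = {True: {True: 0.40, False: 0.60}, False: {True: 0.01, False: 0.99}}
P_Fev_given_Flu_M = {(True, True): 0.95, (True, False): 0.80,
                     (False, True): 0.70, (False, False): 0.05}  # P(Fev=True | Flu, M)

# Forward propagation: marginalize out each root cause, then pool at Fev.
P_Flu = sum(P_Flu_given_TS[ts][True] * P_TS[ts] for ts in (True, False))
P_M = sum(P_M_given_ET[et][True] * P_ET[et] for et in (True, False))

# Flu and M have no common ancestors here, so P(Flu, M) = P(Flu) P(M).
P_Fev = sum(P_Fev_given_Flu_M[(flu, m)]
            * (P_Flu if flu else 1 - P_Flu)
            * (P_M if m else 1 - P_M)
            for flu in (True, False) for m in (True, False))
print(round(P_Fev, 4))
```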

Simple Backward Inference

When evidence is downstream of the query, we must reason "backwards":

$$\begin{aligned}
P(B \mid r) &= P(r \mid B)\,P(B)/P(r) \propto P(r, B) &&\text{(proportional to the joint probability)}\\
P(r, B) &= P(r \mid B)\,P(B) &&\text{(chain rule)}\\
&= \sum_{m, c} P(m, c, B, r) &&\text{(marginalization)}\\
&= \sum_{m, c} P(m)\,P(c \mid m)\,P(B \mid m, c)\,P(r \mid B, m, c) &&\text{(chain rule)}\\
&= \sum_{m, c} P(m)\,P(c)\,P(B \mid m, c)\,P(r \mid B) &&\text{(independence and conditional independence)}
\end{aligned}$$

The normalizing constant is $1/P(r)$, which can be computed as $$P(r) = \sum_b P(r, b)$$
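A Python sketch of the same backward calculation on the structure implied above ({m, c} → B → r); all CPT numbers are made up:

```python
# Hypothetical CPTs for a network where m and c are parents of B, and B is the
# parent of r (all numbers are made up).
P_m = {True: 0.1, False: 0.9}
P_c = {True: 0.2, False: 0.8}
P_B_given_mc = {(True, True): 0.9, (True, False): 0.7,
                (False, True): 0.6, (False, False): 0.05}  # P(B=True | m, c)
P_r_given_B = {True: 0.8, False: 0.1}                       # P(r=True | B)

def joint_r_B(b):
    """P(r=True, B=b) = sum over m, c of P(m) P(c) P(B=b | m, c) P(r=True | B=b)."""
    total = 0.0
    for m in (True, False):
        for c in (True, False):
            p_b = P_B_given_mc[(m, c)] if b else 1 - P_B_given_mc[(m, c)]
            total += P_m[m] * P_c[c] * p_b * P_r_given_B[b]
    return total

# Normalize over B to get the posterior P(B | r=True).
norm = joint_r_B(True) + joint_r_B(False)
print({b: joint_r_B(b) / norm for b in (True, False)})
```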

Variable Elimination

Variable elimination is a more general inference algorithm that works by manipulating factors.

Factors

A factor is a representation of a function from a tuple of random variables into a number.

Multiplying Factors

The product of factors $f_1(X, Y)$ and $f_2(Y, Z)$, where $Y$ are the variables in common, is the factor $(f_1 \times f_2)(X, Y, Z)$ defined by:

$$(f_1 \times f_2)(X, Y, Z) = f_1(X, Y)\, f_2(Y, Z)$$
Summing out variables

We can sum out a variable, say $X_1$ with domain $\{v_1, \ldots, v_k\}$, from factor $f(X_1, \ldots, X_j)$, resulting in a factor on $X_2, \ldots, X_j$ defined by:

$$\sum_{X_1} f(X_1, X_2, \ldots, X_j) = f(X_1 = v_1, \ldots, X_j) + \cdots + f(X_1 = v_k, \ldots, X_j)$$
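A minimal Python sketch of these two factor operations; the (vars, table) representation and the example numbers are my own choices, not from the notes:

```python
from collections import defaultdict

# A factor is (vars, table): vars is a tuple of variable names, table maps a
# tuple of values (in the same order as vars) to a number.

def multiply(f1, f2):
    """Pointwise product of two factors, matching assignments on shared variables."""
    (v1, t1), (v2, t2) = f1, f2
    out_vars = tuple(v1) + tuple(v for v in v2 if v not in v1)
    table = {}
    for a1, p1 in t1.items():
        d1 = dict(zip(v1, a1))
        for a2, p2 in t2.items():
            d2 = dict(zip(v2, a2))
            if all(d1[v] == d2[v] for v in v1 if v in v2):
                merged = {**d1, **d2}
                table[tuple(merged[v] for v in out_vars)] = p1 * p2
    return out_vars, table

def sum_out(f, var):
    """Sum a variable out of a factor, producing a factor on the remaining variables."""
    vs, t = f
    out_vars = tuple(v for v in vs if v != var)
    table = defaultdict(float)
    for a, p in t.items():
        d = dict(zip(vs, a))
        table[tuple(d[v] for v in out_vars)] += p
    return out_vars, dict(table)

# Tiny usage example with boolean variables (made-up numbers):
f1 = (("X", "Y"), {(True, True): 0.3, (True, False): 0.7,
                   (False, True): 0.6, (False, False): 0.4})
f2 = (("Y", "Z"), {(True, True): 0.9, (True, False): 0.1,
                   (False, True): 0.2, (False, False): 0.8})
product = multiply(f1, f2)        # a factor on (X, Y, Z)
marginal = sum_out(product, "Y")  # a factor on (X, Z)
print(marginal)
```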

Evidence

If we want to compute the posterior probability of $Z$ given evidence $Y_1 = v_1 \wedge \cdots \wedge Y_j = v_j$:

$$P(Z \mid Y_1 = v_1, \ldots, Y_j = v_j) = \frac{P(Z, Y_1 = v_1, \ldots, Y_j = v_j)}{P(Y_1 = v_1, \ldots, Y_j = v_j)} = \frac{P(Z, Y_1 = v_1, \ldots, Y_j = v_j)}{\sum_Z P(Z, Y_1 = v_1, \ldots, Y_j = v_j)}$$

The computation reduces to computing the joint probability $P(Z, Y_1 = v_1, \ldots, Y_j = v_j)$ and normalizing at the end.

Probability of a conjunction

Suppose the variables of the belief network are $X_1, \ldots, X_n$, and let $Z_1, \ldots, Z_k$ be the variables other than the query $Z$ and the evidence variables. Then:

$$P(Z, Y_1 = v_1, \ldots, Y_j = v_j) = \sum_{Z_k} \cdots \sum_{Z_1} P(X_1, \ldots, X_n)_{Y_1 = v_1, \ldots, Y_j = v_j} = \sum_{Z_k} \cdots \sum_{Z_1} \prod_{i=1}^{n} P(X_i \mid \text{parents}(X_i))_{Y_1 = v_1, \ldots, Y_j = v_j}$$
Computing sums of products

Computation in belief networks reduces to computing the sums of products

Variable Elimination Algorithm

To compute $P(Z \mid Y_1 = v_1 \wedge \cdots \wedge Y_j = v_j)$:
- construct a factor for each conditional probability,
- restrict (condition) the factors on the observed evidence,
- sum out the remaining non-query, non-evidence variables one at a time,
- multiply the remaining factors and normalize over $Z$.
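A self-contained Python sketch of these steps (it repeats the multiply/sum_out helpers from the earlier factor sketch and adds restriction by evidence); the tiny A → B → C network and its numbers are hypothetical:

```python
from collections import defaultdict

# A factor is (vars, table): vars is a tuple of variable names, table maps a
# tuple of values (ordered as in vars) to a number. All example numbers are made up.

def multiply(f1, f2):
    """Pointwise product of two factors (join on shared variables)."""
    (v1, t1), (v2, t2) = f1, f2
    out = tuple(v1) + tuple(v for v in v2 if v not in v1)
    table = {}
    for a1, p1 in t1.items():
        d1 = dict(zip(v1, a1))
        for a2, p2 in t2.items():
            d2 = dict(zip(v2, a2))
            if all(d1[v] == d2[v] for v in v1 if v in v2):
                m = {**d1, **d2}
                table[tuple(m[v] for v in out)] = p1 * p2
    return out, table

def sum_out(f, var):
    """Sum a variable out of a factor."""
    vs, t = f
    out = tuple(v for v in vs if v != var)
    table = defaultdict(float)
    for a, p in t.items():
        d = dict(zip(vs, a))
        table[tuple(d[v] for v in out)] += p
    return out, dict(table)

def restrict(f, var, val):
    """Condition a factor on an observed value of a variable."""
    vs, t = f
    if var not in vs:
        return f
    out = tuple(v for v in vs if v != var)
    table = {}
    for a, p in t.items():
        d = dict(zip(vs, a))
        if d[var] == val:
            table[tuple(d[v] for v in out)] = p
    return out, table

def variable_elimination(factors, query, evidence, elim_order):
    """Compute P(query | evidence) given one factor per conditional probability."""
    # 1. Restrict every factor by the evidence.
    for var, val in evidence.items():
        factors = [restrict(f, var, val) for f in factors]
    # 2. Sum out the non-query, non-evidence variables one at a time.
    for var in elim_order:
        with_var = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        if not with_var:
            continue
        prod = with_var[0]
        for f in with_var[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]
    # 3. Multiply what is left and normalize; only the query variable remains.
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    total = sum(result[1].values())
    return {k[0]: v / total for k, v in result[1].items()}

# Tiny example network A -> B -> C with made-up CPTs; query P(A | C=True).
fA = (("A",), {(True,): 0.3, (False,): 0.7})
fB = (("A", "B"), {(True, True): 0.8, (True, False): 0.2,
                   (False, True): 0.1, (False, False): 0.9})
fC = (("B", "C"), {(True, True): 0.9, (True, False): 0.1,
                   (False, True): 0.2, (False, False): 0.8})
print(variable_elimination([fA, fB, fC], "A", {"C": True}, elim_order=["B"]))
```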

Summing out a Variable

To sum out a variable $Z_j$ from a product $f_1, \ldots, f_k$ of factors, partition the factors into those that do not contain $Z_j$ (say $f_1, \ldots, f_i$) and those that do (say $f_{i+1}, \ldots, f_k$); then

$$\sum_{Z_j} f_1 \times \cdots \times f_k = f_1 \times \cdots \times f_i \times \left(\sum_{Z_j} f_{i+1} \times \cdots \times f_k\right)$$
Notes on Variable Elimination

Variable Ordering

Polytrees

Pasted image 20250418165015.png

Relevance

Pasted image 20250418184906.png
Certain variables have no impact on the query posterior and can be pruned before running inference.

Probability and Time

Pasted image 20250418214157.png

Markov Assumption

$$P(S_{t+1} \mid S_1, \ldots, S_t) = P(S_{t+1} \mid S_t)$$
This distribution gives the dynamics of the Markov chain.
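A small Python sketch of these dynamics, assuming a hypothetical two-state weather chain with made-up transition probabilities:

```python
# Hypothetical Markov chain over states {"sunny", "rainy"}; numbers are made up.
# dynamics[s][s2] = P(S_{t+1} = s2 | S_t = s)
dynamics = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(belief):
    """Propagate a distribution over states one time step forward."""
    return {s2: sum(belief[s1] * dynamics[s1][s2] for s1 in belief) for s2 in dynamics}

belief = {"sunny": 1.0, "rainy": 0.0}  # start in "sunny" with certainty
for t in range(3):
    belief = step(belief)
print(belief)  # drifts toward the chain's stationary distribution over time
```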

Hidden Markov Models (HMMs)

Pasted image 20250418214404.png
Add: observations $O_t$ that depend only on the current state, via $P(O_t \mid S_t)$.

Speech Recognition

Pasted image 20250418214659.png

Dynamic Bayesian Networks (DBNs)

In general, any Bayesian network can be repeated over time; the result is a Dynamic Bayesian Network (DBN).

Stochastic Simulation

Idea: turn probabilities into samples, and estimate probabilities from the proportions observed in the samples.

Generating Samples from a distribution

Pasted image 20250418222206.png
For a variable $X$ with a discrete domain or a one-dimensional real domain: totally order the values, build the cumulative probability distribution, draw $y$ uniformly from $[0, 1]$, and return the value of $X$ whose cumulative probability first reaches $y$ (inverse-CDF sampling).
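A minimal Python sketch of this sampling step for a discrete variable; the example distribution is made up:

```python
import random

def sample_discrete(dist):
    """Inverse-CDF sampling: draw y ~ Uniform[0,1] and walk the cumulative distribution."""
    y = random.random()
    cumulative = 0.0
    for value, p in dist.items():
        cumulative += p
        if y <= cumulative:
            return value
    return value  # guard against floating-point round-off

# Hypothetical distribution over a three-valued variable.
dist = {"low": 0.2, "medium": 0.5, "high": 0.3}
samples = [sample_discrete(dist) for _ in range(10000)]
print({v: samples.count(v) / len(samples) for v in dist})  # close to the original probabilities
```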

Forward Sampling in a Belief Network

Example

Pasted image 20250418223034.png
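A sketch of forward sampling in a belief network (sample each variable in topological order from its CPT given the already-sampled parents, then estimate probabilities from sample proportions), reusing the hypothetical Rain → WetGrass numbers from earlier:

```python
import random

# Hypothetical CPTs for Rain -> WetGrass (numbers are made up).
P_rain = 0.2
P_wet_given_rain = {True: 0.9, False: 0.1}

def forward_sample():
    """Sample variables in topological order: parents before children."""
    rain = random.random() < P_rain
    wet = random.random() < P_wet_given_rain[rain]
    return {"Rain": rain, "WetGrass": wet}

# Estimate P(WetGrass = True) from many samples.
samples = [forward_sample() for _ in range(10000)]
print(sum(s["WetGrass"] for s in samples) / len(samples))  # close to 0.2*0.9 + 0.8*0.1 = 0.26
```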