20p
2.1 Introduction

state of nature

prior (probability)

http://en.wikipedia.org/wiki/Prior_probability
a marginal probability, interpreted as a description of what is known about a variable in the absence of some evidence
(The posterior probability is then the conditional probability of the variable taking the evidence into account. The posterior probability is computed from the prior and the likelihood function via Bayes' theorem.)

decision rule

probability mass function = pmf

http://en.wikipedia.org/wiki/Probability_mass_function
a function that gives the probability that a discrete random variable is exactly equal to some value
(A pmf differs from a probability density function (abbreviated pdf) in that the values of a pdf, defined only for continuous random variables, are not probabilities as such. Instead, the integral of a pdf over a range of possible values (a, b] gives the probability of the random variable falling within that range.)

probability density function = pdf

http://en.wikipedia.org/wiki/Probability_density_function
a function that represents a probability distribution in terms of integrals

class-conditional probability density function = state-conditional probability density
: the probability density function for x given that the state of nature is w

http://en.wikipedia.org/wiki/Conditional_probability
the probability of some event A, given the occurrence of some other event B
(Conditional probability is written P(A|B), and is read "the probability of A, given B".)


Bayes formula:
posterior = likelihood * prior / evidence

P(w_j) -- (x) --> P(w_j|x)
: By observing the value of x, we can convert the prior probability P(w_j) into the posterior probability P(w_j|x), the probability of the state of nature being w_j given the feature value x

likelihood

evidence
: scale factor (to guarantee the posterior probabilities sum to one)

http://en.wikipedia.org/wiki/Bayes%27_Theorem

http://www.aistudy.com/pattern/parametric_gose.htm#_bookmark_3c54af0
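
A minimal numerical sketch of the formula above, assuming two states of nature and a single observed feature value x; the prior and likelihood values are made up purely for illustration:

import numpy as np

# priors P(w_j) for two states of nature (illustrative values)
prior = np.array([0.7, 0.3])

# class-conditional likelihoods p(x|w_j) for the observed x (illustrative)
likelihood = np.array([0.2, 0.6])

# evidence p(x): the scale factor that makes the posteriors sum to one
evidence = np.sum(likelihood * prior)

# Bayes formula: posterior = likelihood * prior / evidence
posterior = likelihood * prior / evidence
print(posterior, posterior.sum())  # the posteriors sum to 1.0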


Bayesian decision rule (for minimizing the probability of error)



24p
2.2 Bayesian Decision Theory - Continuous Features

feature vector

feature space

http://en.wikipedia.org/wiki/Feature_space
an abstract space where each pattern sample is represented as a point in n-dimensional space
(Its dimension is determined by the number of features used to describe the patterns. Similar samples are grouped together, which allows the use of density estimation for finding patterns.)

loss function (for an action)
cost function (for classification mistakes)

a probability determination -- loss function --> decision

risk: an expected loss

conditional risk

decision function

The decision rule specifies the action.

Bayes decision procedure -> optimal performance
Bayes decision rule:
to minimize the overall risk, select the action with the minimum conditional risk
The resulting minimum overall risk R* is called the Bayes risk: the best performance that can be achieved.
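
For reference, the conditional risk and the overall risk in the standard notation (reconstructed, not quoted from the text):

R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid w_j) P(w_j \mid x)

R = \int R(\alpha(x) \mid x) \, p(x) \, dx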


25p
2.2.1 Two-Category Classification

Ordinarily, the loss incurred for making an error is greater than the loss incurred for being correct.


likelihood ratio

The Bayes decision rule can be interpreted as calling for deciding w_1 if the likelihood ratio exceeds a threshold value that is independent of the observation x.
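
In symbols (a reconstruction in the standard notation, writing \lambda_{ij} = \lambda(\alpha_i \mid w_j)):

decide w_1 if \frac{p(x \mid w_1)}{p(x \mid w_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(w_2)}{P(w_1)}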


26p
2.3 Minimum-error-rate Classification

to seek a decision rule that minimizes the probability of error, the error rate

symmetrical / zero-one loss function

for minimum error rate,
decide w_i if P(w_i|x) > P(w_j|x) for all j ≠ i
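
The zero-one loss charges nothing for a correct decision and a unit loss for any error, so the conditional risk becomes one minus the posterior; this is why maximizing the posterior minimizes the error rate (reconstructed, standard notation):

\lambda(\alpha_i \mid w_j) = 0 \text{ if } i = j, \quad 1 \text{ if } i \neq j

R(\alpha_i \mid x) = \sum_{j \neq i} P(w_j \mid x) = 1 - P(w_i \mid x)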

2.3.1 Minimax Criterion
2.3.2 Neyman-Pearson Criterion


29p
2.4 Classifiers, Discriminant Functions, and Decision Surfaces

2.4.1 The Multicategory Case

Fig. 2.5 The functional structure of a general statistical pattern classifier
input x -> discriminant functions g(x) + costs -> action (classification)

classifier
: a network or machine that computes c discriminant functions and selects the category corresponding to the largest discriminant

Bayes classifier
i) the maximum discriminant fn. <=> the minimum conditional risk
ii) for the minimum-error-rate,
the maximum discriminant fn. <=> the maximum posterior probability
iii) replacing each discriminant fn. by a monotonically increasing fn. of it does not change the classification

(28)

The decision rule divides the feature space into c decision regions, separated by decision boundaries: surfaces in feature space where ties occur among the largest discriminant functions.
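
A minimal sketch of the Fig. 2.5 structure: compute c discriminant functions and select the category with the largest one. The univariate Gaussian class-conditionals and the priors below are illustrative assumptions, not taken from the text:

import numpy as np

# illustrative setup: c = 3 categories with univariate Gaussian class-conditionals
means = np.array([-2.0, 0.0, 2.0])
sigma = 1.0
priors = np.array([0.3, 0.4, 0.3])

def discriminants(x):
    # g_i(x) = ln p(x|w_i) + ln P(w_i), a monotone transform of the posterior
    log_likelihood = -0.5 * ((x - means) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return log_likelihood + np.log(priors)

def classify(x):
    # the classifier selects the category corresponding to the largest discriminant
    return np.argmax(discriminants(x))

print(classify(-1.5), classify(0.1), classify(3.0))  # -> 0 1 2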


2.4.2 The Two-Category Case

dichotomizer


31p
2.5 Normal Density

the multivariate normal / Gaussian density

expected value

2.5.1 Univariate Density

expected value of x (: an average over the feature space)
(35)

expected squared deviation = variance
(36)
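
The referenced equations, reconstructed (the standard definitions for a continuous density p(x)):

\mu \equiv \mathcal{E}[x] = \int_{-\infty}^{\infty} x \, p(x) \, dx

\sigma^2 \equiv \mathcal{E}[(x - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 \, p(x) \, dx

and the univariate normal density itself:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right]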


The entropy measures the fundamental uncertainty in the values of points selected randomly from a distribution.

The normal distribution has the maximum entropy of all distributions having a given mean and variance.

http://en.wikipedia.org/wiki/Central_limit_theorem
The central limit theorem (CLT) states that the sum of a sufficiently large number of identically distributed independent random variables each with finite mean and variance will be approximately normally distributed (Rice 1995). Formally, a central limit theorem is any of a set of weak-convergence results in probability theory. They all express the fact that any sum of many independent identically distributed random variables will tend to be distributed according to a particular "attractor distribution".

33p
2.5.2 Multivariate Density

covariance matrix
The covariance matrix allows us to calculate the dispersion of the data in any direction, or in any subspace.

http://en.wikipedia.org/wiki/Covariance_matrix

http://en.wikipedia.org/wiki/Covariance
covariance is a measure of how much two variables change together (the variance is a special case of the covariance when the two variables are identical).
If two variables tend to vary together (that is, when one of them is above its expected value, the other tends to be above its expected value too), the covariance between the two variables is positive. On the other hand, if when one of them is above its expected value the other tends to be below its expected value, the covariance between the two variables is negative.
 
the center of the cluster - the mean vector
the shape of the cluster - the covariance matrix
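
For reference, the multivariate normal density in d dimensions (reconstructed, standard form):

p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]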


Whitening Transform
making the spectrum of eigenvalues of the transformed distribution uniform

http://en.wikipedia.org/wiki/Whitening_transform
The whitening transformation is a decorrelation method that converts the covariance matrix S of a set of samples into the identity matrix I. This effectively creates new random variables that are uncorrelated and each have unit variance. The method is called the whitening transform because it transforms the input data closer towards white noise.
This can be expressed as  A_w = \Phi \Lambda^{-\frac{1}{2}}
where Φ is the matrix with the eigenvectors of "S" as its columns and Λ is the diagonal matrix of non-increasing eigenvalues.
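
A minimal numpy sketch of the transform above, assuming row-vector samples and a synthetic correlated dataset (the covariance and mean below are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)

# synthetic correlated 2-D samples (illustrative parameters)
S_true = np.array([[2.0, 0.8],
                   [0.8, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -1.0], cov=S_true, size=5000)

# sample covariance S and its eigendecomposition S = Phi Lambda Phi^T
S = np.cov(X, rowvar=False)
eigvals, Phi = np.linalg.eigh(S)

# whitening matrix A_w = Phi Lambda^{-1/2}
A_w = Phi @ np.diag(eigvals ** -0.5)

# transformed samples: their covariance becomes (approximately) the identity
Y = (X - X.mean(axis=0)) @ A_w
print(np.round(np.cov(Y, rowvar=False), 2))  # ~ identity matrix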


hyperellipsoids
- principal axes

- Mahalanobis distance (see the formula after this list)
http://en.wikipedia.org/wiki/Mahalanobis_distance

- volume
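
The squared Mahalanobis distance from x to the mean, in the standard form (reconstructed):

r^2 = (\mathbf{x} - \boldsymbol{\mu})^t \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})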


36p
2.6 Discriminant Functions for the Normal Density

2.6.1 case 1: covariance matrix = a constant times the identity matrix (Σ_i = σ²I)

equal-size hyperspherical clusters

linear machine

The hyperplane decision boundary is the perpendicular bisector of the line between the means (when the priors are equal).

minimum-distance classifier

template-matching -> the nearest-neighbor algorithm
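
A minimal sketch of the case-1 linear machine, assuming equal priors so that classification reduces to choosing the nearest mean; the template means below are illustrative:

import numpy as np

# illustrative class means (template vectors), one row per category
means = np.array([[0.0, 0.0],
                  [3.0, 0.0],
                  [0.0, 3.0]])

def classify(x):
    # with Sigma_i = sigma^2 I and equal priors, the Bayes classifier
    # assigns x to the category with the nearest mean (minimum distance)
    d2 = np.sum((means - x) ** 2, axis=1)  # squared Euclidean distances
    return np.argmin(d2)

print(classify(np.array([2.5, 0.4])))  # -> 1, the nearest template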

2.6.2 case 2: covariance matrices = identical but arbitrary
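
For this case the quadratic term is identical for every class and cancels, leaving a linear discriminant (reconstructed in the standard form):

g_i(\mathbf{x}) = \mathbf{w}_i^t \mathbf{x} + w_{i0}, \quad \mathbf{w}_i = \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i, \quad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^t \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i + \ln P(w_i)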



2.6.3 case 3: covariance matrix = arbitrary
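
In the general case the discriminant remains quadratic (reconstructed in the standard form):

g_i(\mathbf{x}) = \mathbf{x}^t \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^t \mathbf{x} + w_{i0}, \quad \mathbf{W}_i = -\frac{1}{2} \boldsymbol{\Sigma}_i^{-1}, \quad \mathbf{w}_i = \boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i

w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^t \boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(w_i)

The resulting decision boundaries are hyperquadrics.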



2.9 Bayes Decision Theory - Discrete Features
