pattern recognition
: classifying based on the features
http://en.wikipedia.org/wiki/Pattern_recognition
Knowledge about objects and their classes gives the necessary information for object classification.
381p
artificial intelligence (AI)
http://en.wikipedia.org/wiki/Artificial_intelligence
http://en.wikipedia.org/wiki/Knowledge_representation
- simple control strategies
- a rich, well structured representation of a large set of a priori data and hypotheses
syntax
-> the symbols that may be used and the ways that they may be arranged
semantics
-> how meaning is embodied in the symbols and the symbol arrangement allowed by the syntax
representation
-> a set of syntactic and semantic conventions that make it possible to describe things
formal grammars and languages
predicate logic
production rules
semantic nets
frames
-> data structures (lists, trees, graphs, tables, hierarchies, sets, rings, nets, and matrices)
Descriptions, features
: scalar properties of objects -> feature vectors
Grammars, languages
- structures from primitives (elementary structural properties of the objects) and the relations between them
-> rules defining how the chains, trees, or graphs can be constructed from a set of symbols (primitives)
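A minimal sketch of the idea above (the grammar and primitives are invented for illustration, not taken from the book): one recursive production builds chains of primitive symbols, describing the structural class a^n b^n that a plain feature vector cannot capture.

```python
# Toy grammar over primitives 'a' and 'b' (illustrative, not the book's example):
# S -> a S b | a b   generates the chains a^n b^n.
RULES = {"S": (["a", "S", "b"], ["a", "b"])}   # (recursive rule, terminating rule)

def derive(symbol="S", depth=3):
    """Expand non-terminals into primitives; depth bounds the recursion."""
    if symbol not in RULES:
        return [symbol]                          # terminal primitive
    body = RULES[symbol][0] if depth > 0 else RULES[symbol][1]
    return [s for part in body for s in derive(part, depth - 1)]

print("".join(derive()))                         # 'aaaabbbb'
```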
Predicate logic
- combinations of logic variables (true/false), quantifiers, and logic operators
- modus ponens and resolution
- PROLOG
- 'pure truth'
-> logic conditions and constraints into knowledge processing
Production rules
- condition action pairs
- procedural character of knowledge
=> handling uncertain information -> expert systems
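A minimal sketch of condition-action pairs applied by forward chaining (the rule and fact names are made up for illustration; real expert systems also attach uncertainty factors to rules, which is omitted here).

```python
# Production rules as condition-action pairs, fired by simple forward chaining.
facts = {"region_is_elongated", "region_is_dark"}

rules = [
    ({"region_is_elongated", "region_is_dark"}, "candidate_is_road"),
    ({"candidate_is_road", "connects_to_road_network"}, "label_road"),
]

changed = True
while changed:                      # fire rules until no new fact is produced
    changed = False
    for conditions, action in rules:
        if conditions <= facts and action not in facts:
            facts.add(action)       # the 'action' asserts a new fact
            changed = True
# rule 2 stays dormant until 'connects_to_road_network' is asserted
print(facts)
```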
Fuzzy logic
- some reasonable threshold
- fuzzy rule
- linguistic variables
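A minimal sketch of a fuzzy membership function and a linguistic variable replacing a hard threshold (the break-points and variable names are illustrative assumptions, not values from the book).

```python
# Trapezoidal fuzzy membership and a 'brightness' linguistic variable.
def trapezoid(x, a, b, c, d):
    """0 below a, rises to 1 on [b, c], falls back to 0 above d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Fuzzy values of the linguistic variable instead of one reasonable threshold.
brightness = {
    "dark":   lambda g: trapezoid(g, -1, 0, 60, 100),
    "medium": lambda g: trapezoid(g, 60, 100, 150, 190),
    "bright": lambda g: trapezoid(g, 150, 190, 255, 256),
}

g = 80
print({name: round(mu(g), 2) for name, mu in brightness.items()})
# g = 80 belongs partly to 'dark' and partly to 'medium' (degrees in [0, 1]).
```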
Semantic nets
- objects, their description, and a description of relations between objects
- local information and local knowledge
- relations between partial representations described at all appropriate hierarchical levels (nodes: objects / arcs: relations between objects)
- degrees of closeness
Frames, scripts
- a substitute for missing information
-> to overcome the lack of continuous information
- a tool for organizing knowledge in prototypical objects, and for description of mutual influences of objects using stereotypes of behavior in specific situations
386p
pattern
: formal description of sensed objects
The classes form clusters in the feature space, which can be separated by a discrimination curve (or hyper-surface in a multi-dimensional feature space).
If the discrimination hyper-surfaces are hyper-planes, it is called a linearly separable task.
387p
9.2.1 Classification principles
output - decision about the class of the object
class identifiers - generated output symbols
decision rule - the function describing relations between the classifier inputs and the output
discrimination functions - scalar functions defining the discrimination hyper-surfaces
The determination of discrimination hyper-surfaces (or definition of the decision rule) is the goal of classifier design.
decision rule of a classifier
(1) find a maximum discrimination function -> linear classifier
(2) calculate the minimum distance from exemplars (sample patterns) -> piecewise linear discrimination hyper-planes
Φ-classifier
non-linear discrimination hyper-surfaces of the original feature space -- non-linear transformation --> hyper-planes in the transformed feature space
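A minimal sketch of decision rule (1) and of the Φ-classifier idea (the weights and the feature map are illustrative assumptions, not trained values): the class is the index of the maximal discrimination function, and a fixed non-linear map Φ turns a quadratic boundary into a hyper-plane in the transformed space.

```python
import numpy as np

# Linear discrimination functions g_r(x) = w_r . x + w_r0, one per class.
W = np.array([[1.0, -1.0], [-0.5, 2.0]])      # rows: weight vectors w_r
w0 = np.array([0.0, -1.0])                     # offsets w_r0

def classify_linear(x):
    return int(np.argmax(W @ x + w0))          # decide for the maximal g_r(x)

# Φ-classifier idea: map x = (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2) so that a
# quadratic boundary in the original space becomes a hyper-plane in Φ-space.
def phi(x):
    x1, x2 = x
    return np.array([x1, x2, x1 * x1, x2 * x2, x1 * x2])

print(classify_linear(np.array([2.0, 1.0])), phi(np.array([2.0, 1.0])))
```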
390p
9.2.2 Classifier setting
The setting of the optimal classifier should be probabilistic.
optimality criteria: the value of the mean loss caused by classification
optimum decision rule:
minimum mean loss + the vector of optimal parameters
(1) minimum error criterion (Bayes criterion, maximum likelihood)
minimum-loss optimality criterion
Bayes formula (written out after this list)
- a posteriori probabilities of classes, the conditional probability densities of objects, a priori probability, the mixture density
theoretical optimum
: "no other classifier setting can give a lower probability of the decision loss."
(2) best approximation criterion
: the best approximation of discrimination functions by linear combinations of pre-determined functions
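For reference, the Bayes formula mentioned under (1), written in standard notation (the symbols ω_r and R are chosen here and may differ from the book's):

```latex
% A posteriori probability of class \omega_r given the observed pattern x:
P(\omega_r \mid x) = \frac{p(x \mid \omega_r)\, P(\omega_r)}{p(x)},
\qquad
p(x) = \sum_{s=1}^{R} p(x \mid \omega_s)\, P(\omega_s)
% p(x | \omega_r): conditional probability density of objects from class \omega_r
% P(\omega_r): a priori class probability;  p(x): the mixture density.
% The minimum-error (Bayes) rule assigns x to the class with maximal P(\omega_r | x).
```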
The information obtained from the elements of the training set must be generalized to cover the whole feature space.
(=> The classifier should be able to recognize even those objects that it has never seen before.)
sequential increase in training set size -> sequential processing of information -> classification correctness
learning: the process of automated system optimization based on the sequential presentation of examples, to minimize the optimality criterion
object description
- permissible classification error
- complexity of the classifier construction
- time required for classification
-> determination of informativity and discriminativity of measured features
393p
9.2.3 Classifier learning
(1) probability density estimation
-> minimum error criterion -> discriminant functions
a priori information -> the shape of probability density functions => parametric learning
e.g. normal distribution - dispersion (covariance) matrix, mean value vector (see the sketch below)
+ a priori estimate + confidence (weight)
(2) direct loss minimization
-> best approximation criterion -> decision rule
no prior information -> stochastic approximations
http://en.wikipedia.org/wiki/Stochastic_approximation
Jack Sklansky
Pattern Classifiers and Trainable Machines
minimum distance classifier
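A minimal sketch of parametric learning under (1) (assumes numpy; the data and class layout are invented): estimate a mean vector and dispersion matrix per class, then classify with the plug-in Bayes rule. With equal priors and identity covariance this degenerates to the minimum distance (nearest-mean) classifier.

```python
import numpy as np

def fit_gaussians(X, y):
    """Per class: a priori probability, mean vector, dispersion (covariance) matrix."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def classify(x, params):
    best, best_score = None, -np.inf
    for c, (prior, mu, cov) in params.items():
        diff = x - mu
        # log of (prior * Gaussian density), dropping the constant term
        score = (np.log(prior) - 0.5 * np.log(np.linalg.det(cov))
                 - 0.5 * diff @ np.linalg.inv(cov) @ diff)
        if score > best_score:
            best, best_score = c, score
    return best

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)), rng.normal([3, 3], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(classify(np.array([2.5, 2.0]), fit_gaussians(X, y)))   # expected: 1
```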
396p
9.2.4 Support Vector Machines
http://en.wikipedia.org/wiki/Support_vector_machines
maximizing the width of the empty area (margin) between the two classes
-> An optimal hypersurface has the lowest capacity.
The support vectors specify the discrimination function.
To maximize the margin, we need to minimize ∥w∥, which leads to a quadratic programming optimization problem.
If we want to solve:
- maximize $f(x, y)$
- subject to $g(x, y) = c$
We introduce a new variable (λ) called a Lagrange multiplier to rewrite the problem as:
- maximize $\Lambda(x, y, \lambda) = f(x, y) + \lambda\,(g(x, y) - c)$
Solving this new unconstrained problem for x, y, and λ will give us the solution (x, y) for our original constrained problem. The gradients of f and g are parallel vectors at the maximum, since the gradients are always normal to the contour lines. This occurs just when the gradient of f has no component tangential to the level sets of g. This condition is equivalent to $\nabla f(x, y) = \lambda\,\nabla g(x, y)$ for some λ,
and solve
- $\nabla_{x, y, \lambda}\,\Lambda(x, y, \lambda) = 0$.
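A small worked instance of the condition above (example chosen here, not from the book): maximize f(x, y) = x + y subject to g(x, y) = x² + y² = 1.

```latex
\Lambda(x, y, \lambda) = x + y + \lambda\,(x^2 + y^2 - 1)
% Setting the gradient to zero:
\partial_x \Lambda = 1 + 2\lambda x = 0, \quad
\partial_y \Lambda = 1 + 2\lambda y = 0, \quad
\partial_\lambda \Lambda = x^2 + y^2 - 1 = 0
% The first two equations give x = y = -1/(2\lambda); the constraint then gives
% x = y = \pm 1/\sqrt{2}, and the maximum of f is attained at
% (x, y) = (1/\sqrt{2}, 1/\sqrt{2}) with f = \sqrt{2}.
```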
Lagrange duality
Given a convex optimization problem in standard form
- minimize $f_0(x)$
- subject to $f_i(x) \le 0,\ i = 1, \dots, m$ and $h_i(x) = 0,\ i = 1, \dots, p$,
with the domain having non-empty interior, the Lagrangian function is defined as
$L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{p} \nu_i h_i(x).$
The vectors λ and ν are called the dual variables or Lagrange multiplier vectors associated with the problem. The Lagrange dual function is defined as
$g(\lambda, \nu) = \inf_x L(x, \lambda, \nu).$
The dual function g is concave, even when the initial problem is not convex. The dual function yields lower bounds on the optimal value $p^*$ of the initial problem; for any $\lambda \succeq 0$ and any ν we have $g(\lambda, \nu) \le p^*$. If a constraint qualification such as Slater's condition holds and the original problem is convex, then we have strong duality, i.e. $d^* = p^*$.
ref.
Dan Klein, "Lagrange Multipliers without Permanent Scarring" (Computer Science Division, University of California at Berkeley)
http://en.wikipedia.org/wiki/Karush-Kuhn-Tucker_conditions
a generalization of the method of Lagrange multipliers to inequality constraints.
Each non-zero Lagrange multiplier corresponds to one of the support vectors; only the support vectors enter the solution.
The discrimination hyperplane in the feature space can be obtained from vectors in the input space and dot products in the feature space.
ref.
Dmitriy Fradkin and Ilya Muchnik, "Support Vector Machines for Classification", in J. Abello and G. Cormode (Eds), "Discrete Methods in Epidemiology", DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 70, pp. 13-20, 2006.
Succinctly describes theoretical ideas behind SVM.
soft margin -> support vector machine
kernel trick to determine the similarity of two patterns
http://en.wikipedia.org/wiki/Kernel_trick
(eg. Gaussian radial basis fn. kernel -> Hilbert space)
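A minimal sketch of a soft-margin SVM with a Gaussian RBF kernel (assumes numpy and scikit-learn are installed; the data and the parameter values C and gamma are illustrative choices): C softens the margin, and the kernel supplies the dot products in the implicit feature space.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 0.5).astype(int)   # circular class boundary

clf = SVC(kernel="rbf", C=1.0, gamma=2.0)             # soft margin + kernel trick
clf.fit(X, y)

print("support vectors:", len(clf.support_vectors_))  # only these define the boundary
print("prediction at (0, 0):", clf.predict([[0.0, 0.0]])[0])   # expected: 1
```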
402p
9.2.5 Cluster analysis
http://en.wikipedia.org/wiki/Cluster_analysis
1) MacQueen k-means
http://en.wikipedia.org/wiki/K-means_algorithm
within-cluster variance
2) ISODATA cluster analysis
3) mean shift algorithm
4) fuzzy c-means algorithm
5) expectation-maximization algorithm
http://en.wikipedia.org/wiki/Expectation-maximization_algorithm
EM alternates between performing an expectation (E) step, which computes an expectation of the likelihood by including the latent variables as if they were observed, and a maximization (M) step, which computes the maximum likelihood estimates of the parameters by maximizing the expected likelihood found on the E step. The parameters found on the M step are then used to begin another E step, and the process is repeated.
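A minimal sketch of EM for a two-component 1-D Gaussian mixture (assumes numpy; the data and initialization are illustrative): the E step computes soft responsibilities, the M step re-estimates weights, means, and variances from them.

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)          # initial means
    var = np.full(k, x.var())                          # initial variances
    pi = np.full(k, 1.0 / k)                           # initial mixing weights
    for _ in range(iters):
        # E step: responsibilities = posterior probability of each component
        dens = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M step: maximum likelihood re-estimation from the weighted data
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        pi = nk / len(x)
    return pi, mu, var

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])
print(em_gmm_1d(x))   # mixing weights near 0.5/0.5, means near 0 and 5
# Hard 0/1 responsibilities (with a shared, fixed variance) reduce this to k-means.
```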
404p
ref.
Ben Krose & Patrick van der Smagt, "An Introduction to Neural Networks"
9.3.1 Feed-forward networks
9.3.2 Unsupervised learning
9.3.3 Hopfield neural nets
http://en.wikipedia.org/wiki/Hopfield_neural_network
410p
9.4.1 Grammars and languages
formal language
9.4.2 Syntactic analysis, syntactic classifier
9.4.3 Syntactic classifier learning, grammar inference
418p
9.5.1 Isomorphism of graphs and sub-graphs
9.5.2 Similarity of graphs
424p
9.6.1 Genetic algorithms
Reproduction
Crossover
Mutation
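A minimal sketch of the three operators on bit-string chromosomes (the fitness function and parameter values are illustrative assumptions): roulette-wheel reproduction, single-point crossover, and bit-flip mutation.

```python
import random

def fitness(bits):                      # toy objective: count of 1-bits
    return sum(bits)

def evolve(pop_size=20, length=16, generations=40, p_mut=0.02):
    random.seed(0)
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # reproduction: fitness-proportional (roulette-wheel) selection
        parents = random.choices(pop, weights=[fitness(b) + 1 for b in pop], k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = random.randrange(1, length)            # single-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                child = [bit ^ (random.random() < p_mut) for bit in child]   # mutation
                nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

print(evolve())   # tends toward the all-ones string
```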
9.6.2 Simulated annealing
430p
9.7.1 Fuzzy sets and fuzzy membership functions