'@GSMC/박래홍: Computer Vision' 카테고리의 글 목록

일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30	31

'@GSMC/박래홍: Computer Vision'에 해당되는 글 6건

2008.12.15 Ch.10 Image Understanding
2008.11.25 Ch.9 Object Recognition
2008.10.16 2.4 Color images & 2.5 Cameras
2008.09.20 Ch. 6 Segmentation I
2008.09.10 Ch. 2 The image, its representations and properties
2008.09.04 Ch.1 Introduction

Ch.10 Image Understanding

2008. 12. 15. 17:45 @GSMC/박래홍: Computer Vision

464p

10.3 Point distribution models

http://en.wikipedia.org/wiki/Point_distribution_model

Autostitch

1. Aligning the training data
the transformations to reduce (in a least-squares sense) the difference between an aligned shape and a 'mean' shape

2. Deriving the model
The eigen-vectors of then 2N*2N covariance matrix S will provides a basis, meaning that we can represent any vector x as a linear combination of the 2N different p_i.

http://en.wikipedia.org/wiki/Principle_Component_Analysis
the components of a basis vactor indicate how much variation is exhibited with respect to each of the eigenvectors

-> dimensinal compression of the representation

3. Fitting models to data

http://en.wikipedia.org/wiki/Eigenvalue,_eigenvector_and_eigenspace

http://en.wikipedia.org/wiki/Active_shape_model

Tim Cootes <An Introduction to Active Shape Models>

asm_overview.pdf

- overview paper about ASMs and AAMs (a useful introduction)

asm_aam_overview[1].pdf

4. Extensions

475p

10.4 Active Appearance Models

1. Modeling image appearance

http://en.wikipedia.org/wiki/Hidden_Markov_model

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.9 Object Recognition (0)	2008.11.25
2.4 Color images & 2.5 Cameras (0)	2008.10.16
Ch. 6 Segmentation I (0)	2008.09.20
Ch. 2 The image, its representations and properties (0)	2008.09.10
Ch.1 Introduction (0)	2008.09.04

posted by maetel

Ch.9 Object Recognition

2008. 11. 25. 22:10 @GSMC/박래홍: Computer Vision

380p

pattern recognition
: classifying based on the features

http://en.wikipedia.org/wiki/Pattern_recognition

Knowledge about objects and their classes gives the necessary information for object classification.

381p

9.1 Knowledge representation

artificial intelligence (AI)
http://en.wikipedia.org/wiki/Artificial_intelligence

http://en.wikipedia.org/wiki/Knowledge_representation

knowledge representation design
- simple control strategies
- a rich, well structured representation of a large set of a priori data and hypotheses

syntax
-> the symbols that may be used and the ways that they may be arranged

semantics
-> how meaning is embodied in the symbols and the symbol arrangement allowed by the syntax

representation
-> a set of syntactic and semantic conventions that make it possible ot describe things

formal grammars and languages
predicate logic
production rules
semantic nets
frames
-> data structures (lists, trees, graphs, tables, hierarchies, sets, rings, nets, and matrices)

Descriptions, features
: scalar properties of objects -> feature vectors

Grammars, languages
- structures from primitives (elementary structural properties of the objects) and the relations between them
-> rules defining how the chains, trees, or graphs can be constructed from a set of symbols (primitives)

Predicate logic (술어논리)
- combinations of logic variables (true/false), quantifiers, and logic operators
- modus ponens and resolution
- PROLOG
- 'pure truth'
-> logic conditions and constraints into knowledge processing

Production rules
- condition action pairs
- procedural character of knowledge
=> handling uncertainty information -> expert systems

Fuzzy logic
- some reasonable threshold
- fuzzy rule
- linguistic variables

Semantic nets
- objects, their description, and a description of relations between objects
- local information and local knowledge
- relations between partial representations described at all appropriate hierarchical levels (nodes: objects / arcs: relations between objects)
- degrees of closeness

Frames, scripts
- a substitute for missing information
-> to overcome the lack of continuous information
- a tool for organizing knowledge in prototypical objects, and for description of mutual influences of objects using stereotypes of behavior in specific situations

386p

9.2 Statistical pattern recognition

object recognition -- classifier --> assigning classes to objects

pattern
: formal description of sensed objects

The classes form clusters in the feature space, which can be separated by a discrimination curve (or hyper-surface in a multi-dimensional feature space).

If the discrimination hyper-surfaces are hyper-planes, it is called a linearly separable task.

387p
9.2.1 Classification principles

input - information about features from an object
output - decision about the class of the object
class identifiers - generated output symbols
decision rule - the function describing relations between the classifier inputs and the output
discrimination functions - scalar functions defining the discrimination hyper-surfaces

The determination of discrimination hyper-surfaces (or definition of the decision rule) is the goal of classifier design.

decision rule of a classifier
(1) find a maximum discrimination function -> linear classifier
(2) calculate the minimum distance from exemplars (sample patterns) -> piecewise linear discrimination hyper-planes

Φ-classifier
non-linear discrimination hypersurfaces of the original feature space -- non-linear transformation --> hyper-planes in the transformed freature space

390p
9.2.2 Classifier setting

Settings of the optimal classifier should be probabilistic.

optimality criteria: the value of the mean loss caused by classification

optimum decision rule:
minimum mean loss + the vector of optimal parameters

(1) minimum error criterion (Bayes criterion, maximum likelihood)

minimum-loss optimality criterion

Bayes formula
- a posteriori probabilities of classes, the conditional probability densities of objects, a priori probability, the mixture density

theoretical optimum
: "no other classifier setting can give a lower probability of the decision loss."

posteriori probabilities -- a classifier --> corresponding discrimination hyper-surfaces

(2) best approximation criterion
: the best approximation of discrimination functions by linear combinations of pre-determined functions

classifier learning: training set --> classification parameters

The information obtained from the elements of the training set must be generalized to cover the whole feature space.
(=> The classifier should be able to recognize even those objects that it has never seen before.)

sequential increase in training set size -> sequential processing of information -> classification correctness

learning: the process of automated system optimization based on the sequential presentation of examples, to minimize the optimality criterion

object description
- permissible classification error
- complexity of the classifier construction
- time required for classification
-> determination of informativity and discriminativity of measured features

393p
9.2.3 Classifier learning

(1) probability density estimation
-> minimum error criterion -> discriminant functions

priori information -> the shape of probability density functions => parametric learning

eg. normal distribution - dispersion matrix, mean value vector
+ priori estimate + confidence (weight)

(2) direct loss minimization
-> best approximation criterion -> decision rule

no prior information -> stochastic approximations
http://en.wikipedia.org/wiki/Stochastic_approximation

Jack Sklansky
Pattern Classifiers and Trainable Machines

minimum distance classifier

396p
9.2.4 Support Vector Machines

http://en.wikipedia.org/wiki/Support_vector_machines

maximizing the width of the empty area (margin) between the two classes

The margin width is defined as the distance between the discrimination hypersurface in n-dimensional feature space and the closest training patterns called support vectors.
-> An optimal hypersurface has the lowest capacity.

The support vectors specify the discrimination function.

To maximize the margin, we need to minimize ∥w∥ for a quadratic programming optimization problem.

( http://en.wikipedia.org/wiki/Lagrangian 아니고 이거 http://en.wikipedia.org/wiki/Lagrange_multipliers )

If we want to solve:

maximize $f\left( x, y \right)$

subject to $g\left( x, y \right) = c$

We introduce a new variable ( $λ$ ) called a Lagrange multiplier to rewrite the problem as:

maximize $f\left( x, y \right) + \lambda\left( g\left( x, y \right) - c \right)$

Solving this new unconstrained problem for x, y, and $λ$ will give us the solution (x, y) for our original constrained problem. The gradients of $f$ and $g$ are parallel vectors at the maximum, since the gradients are always normal to the contour lines. This occurs just when the gradient of $f$ has no component tangential to the level sets of $g$ . This condition is equivalent to $\nabla_{x,y} f(x,y) = \lambda \nabla_{x,y} g(x,y)$ for some $λ$ .

$F \left( x , y, \lambda \right) = f \left(x, y \right) + \lambda \left(g \left(x, y \right) - c \right),$

and solve

$\nabla_{x,y,\lambda} F \left( x , y, \lambda \right)=0$ .

Lagrange duality

Given a convex optimization problem in standard form

$\begin{align} \text{minimize } &f_0(x) \\ \text{subject to } &f_i(x) \leq 0,\ i \in \left \{1,\dots,m \right \} \\ &h_i(x) = 0,\ i \in \left \{1,\dots,p \right \} \end{align}$

with the domain $\mathcal{D} \subset \mathbb{R}^n$ having non-empty interior, the Lagrangian function $L: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is defined as

$L(x,\lambda,\nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x).$

The vectors $λ$ and $ν$ are called the dual variables or Lagrange multiplier vectors associated with the problem. The Lagrange dual function $g:\mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$ is defined as

$g(\lambda,\nu) = \inf_{x\in\mathcal{D}} L(x,\lambda,\nu) = \inf_{x\in\mathcal{D}} \left ( f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right ).$

The dual function $g$ is concave, even when the initial problem is not convex. The dual function yields lower bounds on the optimal value $p *$ of the initial problem; for any $\lambda \geq 0$ and any $ν$ we have $g(\lambda,\nu) \leq p^*$ . If a constraint qualification such as Slater's condition holds and the original problem is convex, then we have strong duality, i.e. $d^* = \max_{\lambda \ge 0, \nu} g(\lambda,\nu) = \inf f_0 = p^*$ .

ref.
Lagrange Multipliers without Permanent Scarring

lagrange-multipliers[1].pdf

Dan Klein (Assistant Professor, Computer Science Division, University of California at Berkeley)

http://en.wikipedia.org/wiki/Karush-Kuhn-Tucker_conditions
a generalization of the method of Lagrange multipliers to inequality constraints.

Each of the Lagrange multipliers shares the corresponding support vectors.

The discrimination hyperplane in the feature space can be obrained from vectors in the input space and dot products in the feature space.

ref.
Dmitriy Fradkin and Ilya Muchnik "Support Vector Machines for Classification" in J. Abello and G. Carmode (Eds) "Discrete Methods in Epidemiology", DIMACS Series in Discrete Mathematics and Theoretical Computer Science, volume 70, pp. 13-20, 2006.

svm[1].pdf

Succinctly describes theoretical ideas behind SVM.

soft margin -> support vector machine

kernel trick to determine the similarity of two patterns
http://en.wikipedia.org/wiki/Kernel_trick

(eg. Gaussian radial basis fn. kernel -> Hilbert space)

402p
9.2.5 Cluster analysis

http://en.wikipedia.org/wiki/Cluster_analysis

1) MacQueen k-means
http://en.wikipedia.org/wiki/K-means_algorithm

within-cluster variance

2) ISODATA cluster analysis

3) mean shift algorithm

4) fuzzy c-means algorithm

5) expectation-maximazation algorithm
http://en.wikipedia.org/wiki/Expectation-maximization_algorithm
EM alternates between performing an expectation (E) step, which computes an expectation of the likelihood by including the latent variables as if they were observed, and a maximization (M) step, which computes the maximum likelihood estimates of the parameters by maximizing the expected likelihood found on the E step. The parameters found on the M step are then used to begin another E step, and the process is repeated.

404p

9.3 Neural nets

http://en.wikipedia.org/wiki/Neural_net

ref.
Ben Krose & Patrick van der Smagt <An introduction to Neural Networks>

nn_intro[1].pdf

9.3.1 Feed-forward networks

9.3.2 Unsupervised learning

9.3.3 Hopfield neural nets

http://en.wikipedia.org/wiki/Hopfield_neural_network

410p

9.4 Syntactic pattern recognition

9.4.1 Grammers and languages

formal language

9.4.2 Syntactic analysis, syntactic classifier

9.4.3 Syntactic classifier learning, grammar inference

418p

9.5 Recognition as graph matching

9.5.1 Isomorphism of graphs and sub-graphs

9.5.2 Similarity of graphs

424p

9.6 Optimization techniques in recognition

9.6.1 Genetic algorithms

Reproduction
Crossover
Mutation

9.6.2 Simulated annealing

430p

9.7 Fuzzy systems

9.7.1 Fuzzy sets and fuzzy membership functions

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.10 Image Understanding (0)	2008.12.15
2.4 Color images & 2.5 Cameras (0)	2008.10.16
Ch. 6 Segmentation I (0)	2008.09.20
Ch. 2 The image, its representations and properties (0)	2008.09.10
Ch.1 Introduction (0)	2008.09.04

posted by maetel

2.4 Color images & 2.5 Cameras

2008. 10. 16. 20:42 @GSMC/박래홍: Computer Vision

2.4 Color images

2.4.1 Physics of color

2.4.2 Color perceived by humans

XYZ color space
http://en.wikipedia.org/wiki/Xyz_color_space

color matching functions
: numerical description of the chromatic response of the observer

color gamut:
subsapce of colors perceived by humans

http://en.wikipedia.org/wiki/Gamut
Digitizing a photograph, converting a digitized image to a different color space, or outputting it to a given medium using a certain output device generally alters its gamut, in the sense that some of the colors in the original are lost in the process.

37p

2.4.3 Color spaces

http://en.wikipedia.org/wiki/Color_space

http://en.wikipedia.org/wiki/Rounding_error

(1) RGB color space
- CRT
- relative color standard
- additive color mixing
- sRGB, Adobe RGB, Adobe Wide Gamut RGB

http://en.wikipedia.org/wiki/RGB_color_space
The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve.

(2) YIQ color space

- additive color mixing
- luminance (the perceived energy ot a light source)

http://en.wikipedia.org/wiki/YIQ
The Y component represents the luma information, and is the only component used by black-and-white television receivers. I and Q represent the chrominance information.

(3) YUV color space

http://en.wikipedia.org/wiki/YUV
Y' stands for the luma component (the brightness) and U and V are the chrominance (color) components.

(4) CMY color space
- subtractive color mixing (in printing process)
- sets of inks, substrates, and press characteristics

http://en.wikipedia.org/wiki/CMYK_color_model
masking certain colors on the typically white background (that is, absorbing particular wavelengths of light)

(5) HSV color space
- Hue, Saturation, Value (; HSB, Hue, Saturation, Brightness ; HSI; Hue, Saturation, Intensity)
- => image enhancement algorithms

http://en.wikipedia.org/wiki/HSL_and_HSV
Because HSL and HSV are simple transformations of device-dependent RGB, the color defined by a (h, s, l) or (h, s, v) triplet depends on the particular color of red, green, and blue “primaries” used. Each unique RGB device therefore has unique HSL and HSV spaces to accompany it. An (h, s, l) or (h, s, v) triplet can however become definite when it is tied to a particular RGB color space, such as sRGB.

39p

2.4.4 Palette images

palette images; indexed images

lookup table; color table; color map; index register; palette

The number of colors in the image exceeds the number of entries in the lookup table and a subset of colors has to be chosen, and a loss of information occurs.

http://en.wikipedia.org/wiki/Palette_(computing)

http://en.wikipedia.org/wiki/List_of_palettes

http://en.wikipedia.org/wiki/Indexed_color

vector quantization (-> 14.4)
1) to check which colors appear in the image by creating histograms for all three color components
2) to quantize them to provide more shades for colors which occur in the image frequently
3) to find the nearest color in the lookup table to represent the color

=> to analyze large multi-dimensional datasets

http://en.wikipedia.org/wiki/Vector_quantization

http://en.wikipedia.org/wiki/Cluster_analysis
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure.

http://en.wikipedia.org/wiki/K_means
The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.

pseudocolor

Palette selection depends on the semantics of the image and is an interactive process.

2.4.5 Color constancy

The same surface color under diffrent illumination

von Kries coefficients
http://en.wikipedia.org/wiki/Color_vision

> color compensations

- to scale the sensitivity of each sensor type

=> automatic white balancing

- to assume that the brightest point in the image has the color of the illumination

2.5 Cameras: an overview

2.5.1 Photosensitive sensors

> photosensitive sensors in cameras
(1) photo-emission

photoelectric effect (; Hertz Effect)
http://en.wikipedia.org/wiki/Photoelectric_effect

photomultipliers
vacuum tube TV cameras

(2) photovoltanic

http://en.wikipedia.org/wiki/Solar_cell

photodiode
night vision camera
photoresistor
Schottky photodiode

> semiconductor photoresistive sensors
(1) CCDs (charge-coupled devices)
- arranged into a matrix-like grid of pixels (CCD chip)
- blooming effect
- shift registers
- SNR fropped

(2) CMOS (complementary metal oxide semiconductor)
- all of the pixel area devoted to light capture
- matrix-like sensors

2.5.2 A monochromatic camera

2.5.3 A color camera

2.6 Summary

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.10 Image Understanding (0)	2008.12.15
Ch.9 Object Recognition (0)	2008.11.25
Ch. 6 Segmentation I (0)	2008.09.20
Ch. 2 The image, its representations and properties (0)	2008.09.10
Ch.1 Introduction (0)	2008.09.04

posted by maetel

Ch. 6 Segmentation I

2008. 9. 20. 17:11 @GSMC/박래홍: Computer Vision

175p

> to divide an image into parts with a strong correlation with objects or areas of the real world contained in the image
- complete segmentation
: lower-level (context independent processing using no object-related model)
- partial segmentation
: dividing an image into separate regions that are homogeneous with respect to a chosen property (such as brightness, color, reflectivity, texture, etc)

http://en.wikipedia.org/wiki/Segmentation_(image_processing)

> segmentation methods
(1) global knowledge - a histogram of image features
(2) edge-based segmentations <- edge detection
(3) region-based segmentations <- region growing

eg. (2) + (3) => a region adjacency graph
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=16938&objectType=FILE

cf. topological data structure

176p

6.1 Thresholding

Thresholding
: the transformation of an input image to an output (segmented) binary image

interactive threshold selection / threshold detection method

> gray-level
global thresholding
using a single threshold for the whole image
adaptive thresholding
using variable thresholds, dep. on local image characteristics
(<- non-uniform lighting, non-uniform input device parameters)
band thresholding
using limited gray-level subsets
(-> blood cell segmentations, border detector)
+ multiple thresholds
semi-thresholding
human-assisted analysis

> gradient, a local texture property, an image decomposition criterion

http://en.wikipedia.org/wiki/Thresholding_(image_processing)

179p

6.1.1 Threshold detection methods

p-tile thresholding
<- prior information about area ratios

histogram shape analysis
: gray values between the two peaks probably result from border pixels between objects and background

multi-thresholding

mode method
: to find the highest local maxima first and detect the threshold as a minimum between them

to build a histogram with a better peak-to-valley ratio
(1) to weight histogram contributions to suppress the influence of pixels with a high image gradient
(2) to use only high-gradient pixels to form the gray-level histogram which should be unimodal corresponding to the gray-level of borders

histogram transformation

hitogram concavity analysis

entropic methods

relaxation methods

multi-thresholding methods

ref. A survey of thresholding techniques
Computer Vision, Graphics, and Image Processing, 1988
PK Sahoo, S Soltani, AKC Wong, YC Chen

180p

6.1.2 Optimal thresholding

optimal thresholding
: using a weighted sum of two or more probability densities with normal distribution

"estimating normal distribution parameters together with the uncertainty that the distribution may be considered normal"

ref.
Rosenfeld and Kak, 1982
Gonzalez and Wintz, 1987

minimization of variance of the histogram
sum of square errors
spatial entropy
average clustering

ref. An analysis of histogram-based thresholding algorithms
CVGIP: Graphical Models and Image Processing, 1993
CA Glasbey

The threshold is set to give minimum probability of segmentation error.

The method of a combination of optimal and adaptive thresholding:
(1) to determine optimal gray-level segmentation parameters in local sub-regions for which local histograms are constructed
(2) to model each local histogram as a sum of n Gaussian distribution
(3) to determine the optimal parameters of the Gaussian distributions by minimizaing the fit function.

Levenberg-Marquardt minimization
http://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm

183p

6.1.3 Multi-spectral thresholding

to determine thresholds independently in each spectral band and combine them into a single segmented image

to analyze multi-dimensional histograms (instead of histograms for each spectral band)

pre-processing - adjusting region shape (eg. boundar stretching)

Regions are formed from pixels with similar properties in all spectral bands, with similar n-dimensional description vectors.

184p

6.2 Edge-based segmentation

Edges mark image locations of discontinuities in gray-level, color, texture, etc.

to combine edges into edge chains that correspond better with borders in the image

partial segmentation
- to group local edges into an image where only edge chains with a correspondence to existing objects or image parts are present

185p

6.2.1 Edge image thresholding

quantization noise, small lighting irregularities => edge image

p-tile thresholding
using orthogonal basis functions (Flynn, 1972)

Sobel Mask (<- 5.3.2)
http://en.wikipedia.org/wiki/Sobel_operator

Canny edge detection (<- 5.3.5)
http://en.wikipedia.org/wiki/Canny_edge_detector

simple detectors => thickening - (directional information) -> non-maximal suppression (Canny edge detection <- 5.3.5)

Non-maximal suppression of directional edge data:
(1) quantize edge directions
(2) inspect the two adjacent pixels indicated by the direction of its edge
(3) if the edge magnitude of either of these two exceeds that of the pixel under inspection, delete it

Hysteresis approach (<- 5.3.5)
http://100.naver.com/100.nhn?docid=227115
http://en.wikipedia.org/wiki/Hysteresis
http://en.wikipedia.org/wiki/Chaotic_hysteresis

Hysteresis to filter output of an edge detector:
- if a pixel with suitable edge magnitude borders another already marked as an edge, then mark it too

Canny reports choosing the ratio of higher to lower threshold to be in the range 2 to 3.

A computational approach to edge detection
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986
J Canny

188p

6.2.2 Edge relaxation

edge context evaluation -> continuous border construction:
Based on the strength of edges in a specified local neighborhood, the confidence of each edge is either increased or decreased.

ref. Extracting and labeling boundary segments in natural scenes
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1980
Prager, J. M.

ref. Hanson and Riseman, 1978

vertex type -> type of edge => possible border continuations:
the number of edges emanating from the vertex of an edge represent the type of the edge.

edge relaxation
: an iterative method, with edge confidences converging either to zero (edge termination) or one (the edge forms a border)

production system
http://en.wikipedia.org/wiki/Production_system

ref. Ballard and Brown's Computer Vision, 1982
3 Early Processing - 3.3 Finding Local Edges - 3..3.5 Edge Relaxation

bandb3_3.pdf

parallel implimentation
http://en.wikipedia.org/wiki/Parallel_adoption

191p

6.2.3 Border tracing

(1) inner boundary tracing
(2) outer boundary tracing

The outer region border is useful for deriving properties such as perimeter, compactness, etc.

inter-pixel boundary of adjacent regions -> extended borders:
The existence of a common border between regions makes it possible to incorporate into the boundary tracing a boundary description process.
-> an evaluated graph consisting of border segments and vertices

extended boundary (Fig. 6.18)
: obtained by shifting all the UPPER outer boundary points one pixel down and right,
shifting all the LEFT outer boundary points one pixel ot the right,
shifting all the RIGHT outer boundary points one pixel down,
and the LOWER outer boundary point positions remain unchanged

Extended boundary has the same shape and size as the natural object boundary.

(3) extended boundary tracing

Detecting common boundary segments between adjacent regions and vertex points in boundary segment connections is based on a look-up table depending on the previous detected direction of boundary and on the status of window pixels which can be inside or outside a region.

chain code
double-linked lists

ref. A contour tracing algorithm that preserves common boundaries between regions
CVGIP: Image Understanding
Yuh-Tay Liow, 1991

(4) multi-dimensional gradients
Edge gradient magnitudes and directions are computed in pixels of probable border continuation.

(5) finite topological spaces and cell complexes
-> component labeling (-> 8.1), object filling, shape description

197p

6.2.4 Border detection as graph searching

graph
: a general structure consisting of a set of nodes and arcs between the nodes.

costs
: weights of arcs

The border detection process ->
a search for the optimal path in the weighted graph + cost minimization

pixels - graph nodes weighted by a value of edge magnitude
edge diretions - an arcs matched with local border direction

(1)

A-algorithm graph search:
1. put all successors of the starting node with pointers to an OPEN list
2. remove the node with lowest associated cost
3. expand the node and put its successors on the OPEN list with pointers
4. if the node is the ending point, stop

http://en.wikipedia.org/wiki/A*_search_algorithm
Nils J. Nilsson

http://en.wikipedia.org/wiki/Monotonic
a monotonic function (or monotone function) is a function which preserves the given order.

oriented weighted-graph expansion

OPEN list

The optimal path is defined by back-tracking.

(2)

pre-processing (straightening the image data) for thin, elongated objects:
The edge image is geometrically warped by re-sampling the image along profile lines perpendicular to the approximate position of the sought border.

(3)
gradient field transform (based on a graph-search)

The estimate of the cost of the path from the current node to the end node has a substantial influence on the search behavior.

breadth-first search

raw cost function; inverted edge image

optimal graph search vs. heuristic graph search

> the evaluation cost functions (for graph-search border detection)
1) strength of edges forming a border
"If a border consists of strong edges, the cost of that border is small."
2) border curvature
"Borders with a small curvature are preferered."
3) proximity to an approximate border location
"The distance from the approximate boundary has additive or multiplicative influence on the cost."
4) estimates of the distance to the goal (end point)

Gaussian cost transformation:
the mean of the Gaussian distribution - the desired edge strength
the standard deviation - the interval of acceptable edge strengths

a good path with a higher cost vs. worse paths with lower costs
=> expansion of 'bad' nodes representing shorter paths with lower total costs
(* But, a good estimate of the path cost from the current node to the goal is not usually available.)

> modifications
- pruning the solution tree
- least maximum cost
- branch and bound
- lower bound
- multi-resolution processing
- incorporation of higher-level knowledge

Graph searching techniques ensure global optimality of the detected contour.

> detection of approximately stright contours
1) geometrical transformation (: polar-torectangular co-ordinate transformation)
-> to straighten the contour
2) dividing line (to divide the image with the non-convex parts of the contour)
-> to search in opposite directions to separate

> searching without knowledge of the start and end points
- based on magnitudes and directions of edges
- to merge edge into edge chains (partial borders)
- bi-directional heuristic search
- bottom-up control strategy (->ch.10)

ref. Edge and Line Feature Extraction Based on Covariance Models
IEEE Transactions on Pattern Analysis and Machine Intelligence
Ferdinand van der Heijden, 1995

207p

6.2.5 Border detection as dynamic programming

http://en.wikipedia.org/wiki/Dynamic_programming
: a method of solving problems exhibiting the properties of overlapping subproblems and optimal substructure (described below) that takes much less time than naive methods

Dynamic programming is an optimization method based on the principle of optimality.

(6.22) (8-connected border with n nodes, m-th graph layer)
C(x_k(m+1)) = min ( C(x_i(m) + g(i,k)_(m) )

the number of cost combination computations for each layer = 3n
the total number of cost combination computatios = 3n(M-1) + n

- A-algorithm-based graph search does not require explicit definition of the graph.
- DP presents an efficient way of searching for optimal paths from multiple or unknown starting and ending points.
- Which approach is more efficient depends on evaluation functions and on the quality of heuristics for an A-algorithm.
- DP is faster and less memory demanding for a word recoginition
- DP is more computationally efficient than edge relaxation.
- DP is more flexible and less restrictive than the Hough transform.
- DP is powerful in the presence of noise and in textured images.

live wire; intelligent scissors
: an interactive real-time border detection method combines automated border detection with manual definition of the boundary start point and interactive positioning of the end point.
http://en.wikipedia.org/wiki/Livewire_Segmentation_Technique

In DP, the graph that is searched is always completely constructed at the beginning of the search process.

live lane
: tracing the border by moving a square window whose size is adaptively defined from the speed and acceleration of the manual tracing

+ automated determination of optimal border features from examples of the correct borders
+ specification of optimal parameters of cost transforms

212p

6.2.6 Hough transforms

Hough transform:
objects with known shape and size -- (data processing) --> shape distortion, rotation, zoom -- (moving a mask) --> correlation (determining the pixel with the highest frequency of occurrence) + known information

- images with incomplete information about the searched objects
- additional structures and noise

http://en.wikipedia.org/wiki/Hough_transform

The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

http://en.wikipedia.org/wiki/Generalised_Hough_Transform
In the Generalized Hough Transform, the problem of finding the model's position is transformed into a problem of finding the transformation parameter that maps the model onto the image. As long as we know the value of the transformation parameter, then the position of the model in the image can be determined.

original Hough transform (if analytic equations of object borderlines are known - no prior knowledge of region position is necessary)
--> generalized Hough transform (even if an analytic expression of the border is not known)

Any straight line in the image is represented by a single point in the parameter space and any part of this straight line is transformed into the same point.

Hough transform
1) to determine all the possible line pixels in the image
2) to transform all lines that can go through these pixels into corresponding points in the parameter space
3) to detect the points in the parameter space

The possible directions of lines define a discretization of the parameter.

parameter space -> rectangular structure of cells -> accumulator array (whose elements are accumulator cells)

Lines existing in the image may be detected as high-values accumulator cells in the accumulator array, and the parameters of the detected line are specified by the accumulator array co-ordinates.
-> Line detection in the image is transformed to detection of local maxima in the accumulator space.
-> A noisy or approximately straight line will not be transformed into a point in the parameter space, but rather will result in a cluster of points, (and the cluster center of gravity can be considered the straight line representation.)

- missing parts, image noise, other non-line structures co-existing in the image, data imprecision

Generalized Hough transform
- arbitrary shapes
- partial or slightly deformed shapes, occluded objects
- measuring similarity between a model and a detected object
- robust to image noise
- search for several occurrences of a shape

Randomized Hough transform

221p

6.2.7 Border detection using border location informationd

(1) the location of significant edges positioned close to an assumed border
the edge directions of the significant edges match the assumed boundary direction
-> an approximate curve is computed to result in a new, more accurate border

(2) prior knowledge of end points (assuming low image noise and straight boundaries)
search for the strongest edge located on perpendiculars to the line connecting end points of each partition (; perpendiculars are located at the center of the connecting straight line)

(3) contour detection - active contour models (snakes) (-> 7.2)
user-provided knowledge about approximate position and shape of the required contour

222p

6.2.8 Region construction from borders

methods to construct regions from partial borders

(1) superslice method
thresholding for which the detected boundaries best coincide with assumed boundary segments

(2) probabilities that pixels are located inside a region closed by the partial borders
A pixel is a potential region member if it is on a straight line connecting two opposite edge pixels.

223p

6.3 Region-based segmentation

to divide an image into zones of maximum homogeneity

225p

6.3.1 Region merging

http://en.wikipedia.org/wiki/State_space_search
The set of states form a graph where two states are connected if there is an operation that can be performed to transform the first state into the second.

6.3.2 Region splitting

6.3.3 Splitting and merging

6.3.4 Watershed segmentation

6.3.5 Region growing post-processing

6.4 Matching

6.4.1 Matching criteria

6.4.2 Control strategies of matching

6.5 Evaluation issues in segmentation

6.5.1 Supervised evaluation

6.5.2 Unsupervised evaluation

6.6 Summary

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.10 Image Understanding (0)	2008.12.15
Ch.9 Object Recognition (0)	2008.11.25
2.4 Color images & 2.5 Cameras (0)	2008.10.16
Ch. 2 The image, its representations and properties (0)	2008.09.10
Ch.1 Introduction (0)	2008.09.04

posted by maetel

Ch. 2 The image, its representations and properties

2008. 9. 10. 23:15 @GSMC/박래홍: Computer Vision

ltlslChapter 2. The image, its representations and properties

2.1 Image representations, a few concepts

mathematical models
signals
scalar (monochrome image) / vector (color image) function

The domain of a given function is the set of "input" values for which the function is defined.
The range of a function is the set of all "output" values produced by that function.

12p
Image Functions
f(x,y) or f(x,y,t)
- perspective: parallel (or orthographic) projection
- static/dynamic, monochrome/color
- resolution: spatial/spectral/radiometric/time
- deterministic/stochastic

The 2D intensity image is the result of a perspective projection of the 3D scene.

13p
Parallel/Orthographic projection
http://en.wikipedia.org/wiki/Orthographic_projection
http://en.wikipedia.org/wiki/Orthographic_projection_(geometry)
http://en.wikipedia.org/wiki/Orthogonal_projection
http://en.wikipedia.org/wiki/Graphical_projection

A full 3D representation is
(1) independent of the viewpoint
(2) expressed in the co-ordinate system of the object (rather than of the viewer)

=> Any intensity image view of the objects may be synthesized by standard computer graphics techniques.

microstructure
http://en.wikipedia.org/wiki/Microstructure

14p

Quality of a digital image
1. spatial resolution
2. spectral resolution
3. radiometric resolution
4. time resolution

Images f(x,y)
: deterministic functions / realizations of stochastic processes

linear system theory
integral transforms
discrete mathematics
theory of stochastic processes

A 'well-behaved' image function f(x,y) is integrable, has an invertible Fourier transform, etc.

2.2 Image digitization

sampled - a matrix with M rows and N columns
quantization - K interval (an integer value)

Image quantization assigns to each continuous sample an integer value - the continuous range of the image function f(x,y) is split into K intervals.

2.2.1 Sampling

Shannon's theorem (->3.2.5)

http://en.wikipedia.org/wiki/Shannon%27s_theorem
In information theory, the noisy-channel coding theorem establishes that however contaminated with noise interference a communication channel may be, it is possible to communicate digital data (information) nearly error-free up to a given maximum rate through the channel. This surprising result, sometimes called the fundamental theorem of information theory, or just Shannon's theorem, was first presented by Claude Shannon in 1948.The Shannon limit or Shannon capacity of a communications channel is the theoretical maximum information transfer rate of the channel, for a particular noise level.

TV - 512 * 512
PAL - 768 * 576
NTSC - 640 * 480

15p
raster
: the grid on which a neighborhood relation between points is defined
http://en.wikipedia.org/wiki/Rasterisation

Dirac impulses
http://en.wikipedia.org/wiki/Dirac_delta_function

The pixel captured by a real digitization device has finite size, since the sampling function is not a collection of ideal Dirac impulses but a collection of limited impulses.
-> 3.2.5

2.2.2 Quantization

Quantization is the transition between continuous values of the image function (brightness) and its digital equivalent.

The number of brightness of displays normally provide a range of at least 100 intensity levels.

average local brightness => gray-scale transformation techniques
(-> 5.1.2)

2.3 Digital image properties

17p
2.3.1 Metric and topological properties of digital images

Distance - identity, symmetry, triangular inequality

1) Euclidean distance, D_E
Pythagorean metric
http://en.wikipedia.org/wiki/Euclidean_distance
2) 'city block' distance (; L1 metric; Manhattan distance), D_4
rectilinear distance, L1 distance or L1 norm (see Lp space), city block distance, Manhattan distance, or Manhattan length
http://en.wikipedia.org/wiki/City_block_distance
3) 'chessboard' distance, D_8
Chebyshev distance (or Tchebychev distance), or L∞ metric
http://en.wikipedia.org/wiki/Chebyshev_distance
4) quasi-Euclidean distance D_QE

18p

region
: a connected set (in a set theory)
: a set of pixels in which there is a path between any pair of its pixels, all of whose pixels also belong to the set
: a set of pixels in which each pair of pixels is contiguous

object (image data interpretation: segmentation)

hole
: points which do not belong to the object and are surrounded by the object

background

The relation 'to be contiguous' decomposes an image into individual regions.

19p

contiguity paradox
paradoxes of crossing lines
connectivity problems

ref.
Digital Geometry: Geometric Methods for Digital Picture Analysis
Reinhard Klette, Azriel Rosenfeld (Morgan Kaufmann, 2004)

topology based on cellular complexes
http://en.wikipedia.org/wiki/Bernhard_Riemann
http://en.wikipedia.org/wiki/Differential_geometry

ref.
Algorithms in Digital Geometry Based on Cellular Topology

kovalevski.pdf

V. Kovalevsky
University of Applied Sciences Berlin
http://www.kovalevsky.de/; kovalev@tfh-berlin.de
Abstract. The paper presents some algorithms in digital geometry based on the
topology of cell complexes. The paper contains an axiomatic justification of the
necessity of using cell complexes in digital geometry. Algorithms for solving
the following problems are presented: tracing of curves and surfaces,
recognition of digital straight line segments (DSS), segmentation of digital
curves into longest DSS, recognition of digital plane segments, computing the
curvature of digital curves, filling of interiors of n-dimensional regions
(n=2,3,4), labeling of components (n=2,3), computing of skeletons (n=2, 3).

20p

distance transform; distance function; chamfering algorithm
woodcarving operation
http://en.wikipedia.org/wiki/Distance_transform

ref.
Distance Transform

DistanceTransform.pdf

David Coeurjolly, Laboratoire LIRIS, France, 2006

A Method for Obtaining Skeletons Using a Quasi-Euclidean Distance

p600-montanari.pdf

U Montanari - Journal of the ACM (JACM), 1968

A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions

maurer2003short.pdf

Maurer, C.R., Jr.; Rensheng Qi; Raghavan, V

> applications of the distance transformation
discrete geometry
path planning and obstacle avoidance in mobile robotics
finding the closest feature in the image
skeletonization (mathematical morphology methods)

21p

edge
: a local property of a pixel and its immediate neighborhood

The edge tells us how fast the image intensity varies in a small neighborhood of a pixel.

The gradient of the image function is used to compute edges.

The edge direction is perpendicular to the gradient direction which points in the direction of the fastest image function growth.

crack edge

22p

border (boundary)
: the set of pixels within the region that have one or more neighbors outside
: inner/outer

The border is a global concept related to a region, while edge expresses local properties of an image function.

23p

convex
: If any two points within a region are connected by a straight line segment, and the whole line lies within the region, then this region is convex.

convex hull
: the smallest convex region containing the input region

deficit of convexity - lakes & bays

topology
topological invariant = topological invariant
http://en.wikipedia.org/wiki/Topological_property
a property of a topological space which is invariant under homeomorphisms. That is, a property of spaces is a topological property if whenever a space X possesses that property every space homeomorphic to X possesses that property. Informally, a topological property is a property of the space that can be expressed using open sets.

homeomorphism = topological isomorphism
http://en.wikipedia.org/wiki/Homeomorphism
the mappings which preserve all the topological properties of a given space. Two spaces with a homeomorphism between them are called homeomorphic, and from a topological viewpoint they are the same.

rubber sheet transform:
Stretching does not change contiguity of the object parts and does not change the number of holes in regions.

Euler-Poincaré Characteristi
http://en.wikipedia.org/wiki/Euler_characteristic
http://mathworld.wolfram.com/EulerCharacteristic.html

24p

2.3.2 Histograms

brightness histogram
: the freqency of the brightness value in the image

The histogram is usually the only global information about the image which is available.

> applications of histogram
finding optimal illumination conditions for capturing an image
gray-scale transformations
image segmentation to objects and background

A change of the object position on a constant background does not affect the histogram.

> local smoothing of the histogram
(1) local averaging of neighboring histogram elements + boundary adjustment
(2) Gaussian blurring: 1-d simplification of the 2-d Gaussian blur

25p

2.3.3 Entropy

information entropy
: the amount of uncertainty about an event associated with a given probability distribution

As the level of disorder rises, entropy increases and events are less predictable.

The uncertainty for such set of $\displaystyle n$ outcomes is defined by

$\displaystyle u = \log_b (n)$

since the probability of each event is

1 / n

, we can write

$\displaystyle u = \log_b \left(\frac{1}{p(x_i)}\right) = - \log_b (p(x_i)) \ , \ \forall i = 1, \cdots , n$

The average uncertainty $\displaystyle \langle u \rangle$ , with $\displaystyle \langle \cdot \rangle$ being the average operator, is obtained by

$\displaystyle \langle u \rangle = \sum_{i=1}^{n} p(x_i) u_i = - \sum_{i=1}^{n} p(x_i) \log_b (p(x_i))$

gray-level histogram -> probability density p(x_k) -> entropy

http://en.wikipedia.org/wiki/Entropy_(information_theory)

Shannon's source coding theorem shows that, in the limit, the average length of the shortest possible representation to encode the messages in a given alphabet is their entropy divided by the logarithm of the number of symbols in the target alphabet.

C.E. Shannon, "A Mathematical Theory of Communication",

shannon1948.pdf

Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October, 1948

2.3.4 Visual perception of the image

Contrast

http://en.wikipedia.org/wiki/Contrast_(vision)

Acuity 예민함

http://en.wikipedia.org/wiki/Visual_acuity

visual illusions

http://en.wikipedia.org/wiki/Optical_illusion

http://en.wikipedia.org/wiki/Ebbinghaus_illusion

perceptual grouping -> image segmentation

Gestalt theory

http://en.wikipedia.org/wiki/Gestalt_psychology

Patterns take precedence over elements and have properties that are not inherent in the elements themselves.

28p

2.3.5 Image quality

Azriel Rosenfeld, Avinash C. Kak, Digital Picture Processing, Academic Press, 1982

parameter optimization
correlation between images
resolution of small or proximate objects in the image
measures of image similarity (retrieval from image databases)

2.3.6 Noise in image

white noise
http://en.wikipedia.org/wiki/White_noise

-> Gaussian noise

additive noise
http://en.wikipedia.org/wiki/Additive_white_Gaussian_noise

multiplicative noise

quantization noise
http://en.wikipedia.org/wiki/Quantization_noise

impulse noise

-> salt-and-pepper noise
http://en.wikipedia.org/wiki/Salt_and_pepper_noise

unknown about noise properties -> local pre-processing methods
known noise parameters -> image restoration techniques

SNR = signal-to-noise ration
http://en.wikipedia.org/wiki/Signal-to-noise_ratio

2.4 Color images

2.4.1 Physics of color
2.4.2 Color perceived by humans
2.4.3 Color spaces
2.4.4 Palette images
2.4.5 Color constancy

2.5 Cameras: an overview

2.5.1 Photosensitive sensors
2.5.2 A monochromatic camera
2.5.3 A color camera

2.6 Summary

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.10 Image Understanding (0)	2008.12.15
Ch.9 Object Recognition (0)	2008.11.25
2.4 Color images & 2.5 Cameras (0)	2008.10.16
Ch. 6 Segmentation I (0)	2008.09.20
Ch.1 Introduction (0)	2008.09.04

posted by maetel

Ch.1 Introduction

2008. 9. 4. 21:49 @GSMC/박래홍: Computer Vision

Image Representation (4 levels):
objects/ a scene - (2-d projection, sampling/quantizing)-> digital image - (regions, edges, scale, interest points, texture, motion) -> image with features -> objects

Image Analysis - computer vision techniques
- low level
1) image acquisition (captaured/sensed, digitized)
2) preprocessing (noise suppression, enhancement of some object features - eg. edge extraction)
3) image segmentation (objects from the background)

> higher level
1) reducing the data quantity
2) knowledge about the image content - object size, shape, etc
3) expressed in symbolic form (http://en.wikipedia.org/wiki/Formal_language)
4) feedback from high-level to low-level

digital manipulation -> preprocessing -> segmentation -> recognition -> understanding

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.10 Image Understanding (0)	2008.12.15
Ch.9 Object Recognition (0)	2008.11.25
2.4 Color images & 2.5 Cameras (0)	2008.10.16
Ch. 6 Segmentation I (0)	2008.09.20
Ch. 2 The image, its representations and properties (0)	2008.09.10

posted by maetel

돛단배

Tag

Notice

Recent Post

Recent Comment

Recent Trackback

Archive

My Link

calendar

Category

'@GSMC/박래홍: Computer Vision'에 해당되는 글 6건

Ch.10 Image Understanding

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.9 Object Recognition

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

2.4 Color images & 2.5 Cameras

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch. 6 Segmentation I

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch. 2 The image, its representations and properties

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.1 Introduction

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

티스토리툴바

돛단배

Search

Tag

Notice

Recent Post

Recent Comment

Recent Trackback

Archive

My Link

calendar

Category

'@GSMC/박래홍: Computer Vision'에 해당되는 글 6건

Ch.10 Image Understanding

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.9 Object Recognition

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

2.4 Color images & 2.5 Cameras

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch. 6 Segmentation I

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch. 2 The image, its representations and properties

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

Ch.1 Introduction

'@GSMC > 박래홍: Computer Vision' 카테고리의 다른 글

티스토리툴바