radiant intensity

> radiant
1) Warn's Spotlight
2) isotropic point source
3) distant directional source

projected solid angle
hemispherical source
disk source
spherical source


> reflection equation
1) perfectly diffuse reflection


2) common diffuse reflection

3) perfectly specular reflection

> diffuse interface reflection
non-Lambertian (Specular) diffuse
Cook-Torrance Model (Torrance-Sparrow Model)

Shape Recovery from Photometry

reflectance map

photometric stereo

Projective reconstruction
= Affine upgrade (H_p) + Euclidean upgrade (H_a)
=> Auto-calibration

Kruppa equations

Kruppa’s Equations Derived from the Fundamental Matrix
Richard I. Hartley

Model Building
Image-based Rendering (IBR)

image rectification
epipolar rectification




Range Imaging
Range Sensor Calibration

calibrated vs. uncalibrated

essential matrix -  calibrated cameras (K=I)

fundamental matrix - uncalibrated cameras

reconstruction via factorization

Olivier Faugeras, Quang-Tuan Luong, Theo Papadopoulo
The Geometry of Multiple Images: The Laws That Govern the Formation of Multiple Images of a Scene and Some of Their Applications
View the e-book (Sogang University)

An Invitation to 3-D Vision
Yi Ma, Stefano Soatto, Jana Kosecka, Shankar Sastry
Springer Verlag, 2003





Fundamental Matrix

In computer vision, the fundamental matrix \mathbf{F} is a 3 \times 3 matrix of rank 2 that relates corresponding points in stereo images. In epipolar geometry, with homogeneous image coordinates \mathbf{y_1} and \mathbf{y_2} of corresponding points in a stereo image pair, \mathbf{F y_1} describes a line (an epipolar line) on which the corresponding point \mathbf{y_2} in the other image must lie. Hence every pair of corresponding points satisfies

 \mathbf{ y_2^T  F y_1} = 0.

Being of rank two and determined only up to scale, the fundamental matrix can be estimated given at least seven point correspondences. Its seven parameters represent the only geometric information about cameras that can be obtained through point correspondences alone.
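The constraint above can be checked numerically. The following is a minimal sketch, assuming normalized cameras (K = I), in which case F reduces to the essential matrix [t]_x R. The relative pose R, t and the sample 3D points are hypothetical values chosen only for illustration.

```python
import math

def mat_vec(A, x):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) applied to v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]

def det3(M):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Assumed relative pose (illustrative values): X2 = R X1 + t
angle = math.radians(10.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [1.0, 0.1, 0.2]

# With K = I the fundamental matrix equals the essential matrix [t]_x R.
F = mat_mul(skew(t), R)

def epipolar_residual(X1):
    """Return y2^T F y1 for a 3D point X1 given in the first camera frame."""
    X2 = [r + ti for r, ti in zip(mat_vec(R, X1), t)]
    y1 = [X1[0] / X1[2], X1[1] / X1[2], 1.0]
    y2 = [X2[0] / X2[2], X2[1] / X2[2], 1.0]
    return sum(a * b for a, b in zip(y2, mat_vec(F, y1)))

points = [[0.5, -0.2, 4.0], [-1.0, 0.3, 6.0], [0.2, 0.8, 5.0]]
residuals = [epipolar_residual(X) for X in points]
```

Each residual is zero up to rounding, and det3(F) vanishes because [t]_x is singular, which is the rank-2 property mentioned above.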

epipolar line
epipolar plane


RANdom SAmple Consensus




Learning Epipolar Geometry
The Java code for this page was created by Sylvain Bougnoux.

image filtering
image warping

LSI (Linear Shift Invariance) - convolution

Convolution is a mathematical operation on two functions f and g, producing a third function that is typically viewed as a modified version of one of the original functions.
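For sampled signals the operation becomes a finite sum. A minimal sketch of the full discrete convolution follows; the input values are made up for illustration.

```python
def convolve(f, g):
    """Full discrete convolution: (f * g)[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]
result = convolve(signal, kernel)  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

Swapping the two arguments gives the same result, since convolution is commutative.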


Mathematically, a Gaussian filter modifies the input signal by convolution with a Gaussian function; this transformation is also known as the Weierstrass transform.
Gaussian filters are designed to give no overshoot to a step function input while minimizing the rise and fall time (which leads to the steepest possible slope). This behavior is closely connected to the fact that the Gaussian filter has the minimum possible group delay.
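The no-overshoot behavior can be observed on a sampled step. This is a rough sketch, assuming a truncated, normalized Gaussian kernel and edge replication at the borders; sigma, the radius and the step length are arbitrary illustrative choices.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalized so that the weights sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(signal, kernel):
    """Filter a signal with a symmetric kernel, replicating the edge samples."""
    radius = len(kernel) // 2
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), last)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out

step = [0.0] * 10 + [1.0] * 10
response = smooth(step, gaussian_kernel(sigma=1.5, radius=4))
```

Because the weights are nonnegative and sum to 1, every output sample is a weighted average of input samples, so the response rises from 0 to 1 without exceeding either level.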


Structure from Stereo

two or more images, calibrated cameras (K(f), R, T known)

UCR stereographs
The Art of Stereo Photography
History of Stereo Photography
Double Exposure
Stereo Photography
3D Photography links
National Stereoscopic Association
Books on Stereo Photography



Computing Rectifying Homographies for Stereo Vision
Charles Loop, Zhengyou Zhang

Microsoft Research, April 8, 1999 

cross correlation

S. M. Seitz and C. R. Dyer, View Morphing, Proc. SIGGRAPH 96, 1996, pp. 21-30.

L. McMillan and G. Bishop. Plenoptic Modeling: An Image-Based Rendering System, Proc. of SIGGRAPH 95, 1995, pp. 39-46.

> Stereo Reconstruction
1) Calibrate cameras
2) Rectify images
3) Compute disparity
4) Estimate depth
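For the last step, a minimal sketch assuming a rectified pair and the standard relation Z = f B / d (focal length f in pixels, baseline B, disparity d in pixels). The relation is not stated in these notes, and the numbers are hypothetical.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical rig: 700 px focal length, 12 cm baseline, 35 px disparity.
depth = depth_from_disparity(focal_px=700.0, baseline_m=0.12, disparity_px=35.0)
```

With these values the depth comes out at about 2.4 m. Depth is inversely proportional to disparity, so distant points are the most sensitive to disparity errors.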

SSD (Sum of Squared Differences) to choose the baseline
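As an illustration of SSD as a matching cost, here is a small sketch that searches one scanline of a rectified pair for the disparity with the lowest SSD. The function names, window size and pixel rows are assumptions made for this example.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized windows."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def best_disparity(left_row, right_row, x, half, max_disparity):
    """Disparity in [0, max_disparity] that minimizes the SSD around column x."""
    window = left_row[x - half:x + half + 1]
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        start = x - d - half
        if start < 0:
            break  # the candidate window would leave the image
        cost = ssd(window, right_row[start:start + 2 * half + 1])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d, best_cost

# Synthetic scanlines: the right row is the left row shifted by 3 pixels.
left_row = [0, 0, 0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
right_row = left_row[3:] + [0, 0, 0]
disparity, cost = best_disparity(left_row, right_row, x=6, half=1, max_disparity=4)
```

Here the search returns a disparity of 3 with zero cost, matching the shift used to build the right row.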

Structure from Motion

Incremental Motion: Kinematic relation  

7 unknowns with respect to motion
 translation:  V = (V_X, V_Y, V_Z)
 rotation:     W = (W_X, W_Y, W_Z)
 depth:        Z
2 equations for one point
6 points: 6 motion unknowns + 6 depths = 12 unknowns -> 12 equations



optical flow, image flow

Horn-Schunck Algorithm

Determining Optical Flow

Berthold K.P. Horn and Brian G. Schunck
Artificial Intelligence Laboratory, Massachusetts Institute of Technology

Larger Motion - feature matching

Lucas Kanade style motion estimation

Good Features to Track
Jianbo Shi (Computer Science Department, Cornell University)
Carlo Tomasi (Computer Science Department, Stanford University)


cross ratio

measuring height: algebraic derivation from the cross ratio

Planar Perspective Map (homography) H
H maps reference plane X-Y coordinates to image plane x-y coordinates
Fully determined from 4 known points on ground plane
Option A:  physically measure 4 points on ground
Option B:  find a square, guess the size
Option C:  Note  H = [a v_X  b v_Y  l]  (columns 1, 2, 4 of P)
              Play with scale factors a and b until the model “looks right”
Given x-y, we can find X-Y using H^{-1}:  (X, Y, W)^T = H^{-1} (x, y, 1)^T, then divide by W
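Option A can be sketched in code: four measured ground points and their image positions determine H through an 8-by-8 linear system. This is an illustrative sketch, not the method of any particular paper. It fixes H[2][2] = 1, which assumes the plane origin does not map to infinity, and the point coordinates are hypothetical. The inverse mapping is obtained by fitting the reverse direction, which equals H^{-1} up to scale.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def homography_from_4_points(src, dst):
    """3x3 H (with H[2][2] fixed to 1) mapping each src point to its dst point."""
    A, b = [], []
    for (X, Y), (x, y) in zip(src, dst):
        A.append([X, Y, 1.0, 0.0, 0.0, 0.0, -x * X, -x * Y])
        b.append(x)
        A.append([0.0, 0.0, 0.0, X, Y, 1.0, -y * X, -y * Y])
        b.append(y)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, point):
    """Map a 2D point through H and divide by the third coordinate."""
    X, Y = point
    u, v, w = (row[0] * X + row[1] * Y + row[2] for row in H)
    return (u / w, v / w)

# Hypothetical measurements: a unit square on the ground and its image corners.
ground = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image = [(100.0, 400.0), (300.0, 380.0), (260.0, 220.0), (120.0, 230.0)]

H = homography_from_4_points(ground, image)      # X-Y  ->  x-y
H_inv = homography_from_4_points(image, ground)  # x-y  ->  X-Y

pixel = apply_homography(H, (0.5, 0.5))
back_on_ground = apply_homography(H_inv, pixel)
```

Mapping the centre of the square into the image and back returns (0.5, 0.5) up to rounding.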


To compute camera projection matrix, we can measure

- Positions within a plane
- Height (More generally — distance between any two parallel planes)
- Camera position

Single View Metrology

A. Criminisi, I. Reid and A. Zisserman
Department of Engineering Science, University of Oxford

> Assumptions
1)   3 orthogonal sets of parallel lines
2)   4 known points on ground plane
3)   1 height in the scene

> Complete approach
- Load in an image
- Click on parallel lines defining X, Y, and Z directions
- Compute vanishing points
- Specify points on reference plane, ref. height
- Compute 3D positions of several points
- Create a 3D model from these points
- Extract texture maps
- Output a VRML model

VRML (Virtual Reality Modeling Language, pronounced vermal or by its initials, originally — before 1995 — known as the Virtual Reality Markup Language) is a standard file format for representing 3-dimensional (3D) interactive vector graphics, designed particularly with the World Wide Web in mind. It has been superseded by X3D.

3-D Metrology

single view reconstruction


radial distortion


plane projective transformation
(Z_w = 0 -> 3-by-3 matrix representing a general plane to plane projective transformation)

rotation about the optical center
: depth-independent transformation
(H does not depend on 3D structure.)
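The depth independence can be checked numerically. This sketch assumes the standard relation H = K R K^{-1} for a camera rotating about its optical center, which is not written out in these notes. The intrinsics K, the rotation angle and the test points are hypothetical.

```python
import math

def mat_mul(A, B):
    """Multiply two 3x3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def mat_vec(A, x):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def to_pixel(v):
    """Dehomogenize a 3-vector into pixel coordinates."""
    return (v[0] / v[2], v[1] / v[2])

# Assumed intrinsics: focal length f and principal point (cx, cy), in pixels.
f, cx, cy = 800.0, 320.0, 240.0
K = [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]
K_inv = [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]

# Assumed rotation of 5 degrees about the camera's vertical axis.
angle = math.radians(5.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]

H = mat_mul(mat_mul(K, R), K_inv)

def compare(X):
    """Return (pixel predicted by warping with H, pixel seen by the rotated camera)."""
    before = to_pixel(mat_vec(K, X))
    warped = to_pixel(mat_vec(H, [before[0], before[1], 1.0]))
    after = to_pixel(mat_vec(K, mat_vec(R, X)))
    return warped, after

near_point = [0.4, -0.3, 2.0]
far_point = [2.0, -1.5, 10.0]  # same viewing ray, five times farther away
near_warped, near_after = compare(near_point)
far_warped, far_after = compare(far_point)
```

Both points lie on one viewing ray, project to the same pixel before the rotation, and are sent to the same pixel by H, which agrees with what the rotated camera sees.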

synthetic rotations
projective warping

null space
The null space of a matrix with n columns is a linear subspace of n-dimensional Euclidean space.
The null space of A is the same as the solution set to the homogeneous system.
(cf. http://en.wikipedia.org/wiki/Kernel_(mathematics)
http://en.wikipedia.org/wiki/Kernel_(linear_operator) )
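A small worked case: for a 2-by-3 matrix of rank 2 the null space is a line, and the cross product of the two rows spans it. The matrix below is an arbitrary example.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mat_vec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]

# A vector orthogonal to both rows solves the homogeneous system A x = 0.
null_vector = cross(A[0], A[1])          # [2.0, -1.0, 1.0]
residual = mat_vec(A, null_vector)       # [0.0, 0.0]
scaled = [5.0 * c for c in null_vector]  # any multiple is also a solution
```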

> 2d transformation
- Euclidean transformation
- affine transformation
- similarity transformation
- projective transformation

> projective plane

homogeneous coordinates

perspective projection

projective space

A point in the image is a ray in projective space.

=> All points on the ray are equivalent.

3D Projective Geometry

These concepts generalize naturally to 3D
- Homogeneous coordinates
Projective 3D points have four coords:  X = (X,Y,Z,W)
- Duality
A plane L is also represented by a 4-vector
Points and planes are dual in 3D: L^T X = 0
- Projective transformations
Represented by 4x4 matrices T:  X’ = TX,    L’ = T^{-T} L

- Can’t use cross-products in 4D.  We need new tools
Grassmann-Cayley Algebra
: generalization of cross product, allows interactions between points, lines, and planes via “meet” and “join” operators
- Or just use inhomogeneous representation in 3D.


vanishing point

vanishing line


Cameras using small apertures and the human eye in bright light both act like a pinhole camera. The smaller the hole, the sharper the image, but the dimmer the projected image. Optimally, the size of the aperture should be 1/100 or less of the distance between it and the screen.

A projection is a linear transformation P from a vector space to itself such that P^2 = P. It leaves its image unchanged. Though abstract, this definition of "projection" formalizes and generalizes the idea of graphical projection.
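A concrete instance is the orthogonal projection onto a line, P = v v^T / (v^T v). The direction v below is an arbitrary choice for illustration.

```python
def line_projection(v):
    """Matrix P = v v^T / (v^T v), the orthogonal projection onto span(v)."""
    norm_sq = sum(c * c for c in v)
    return [[a * b / norm_sq for b in v] for a in v]

def mat_mul(A, B):
    """Multiply two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(A, x):
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

v = [1.0, 2.0, 2.0]
P = line_projection(v)
P_squared = mat_mul(P, P)   # equals P up to rounding: applying P twice changes nothing
image_of_v = mat_vec(P, v)  # v lies in the image of P, so it is left unchanged
```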

Increasing the focal length and distance of the camera to infinity in a perspective projection results in an orthographic projection. It is a form of parallel projection, where the view direction is orthogonal to the projection plane.
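The limit can be illustrated with a one-dimensional pinhole model x = f X / Z. In this sketch the focal length is set equal to the camera distance so that the magnification stays at 1; this choice and the sample point are assumptions made for illustration.

```python
def perspective_x(X, Z, focal):
    """Pinhole projection of the X coordinate: x = f * X / Z."""
    return focal * X / Z

def project_from_distance(X, depth_offset, distance):
    """Image x of a point at depth (distance + depth_offset), with focal == distance."""
    return perspective_x(X, distance + depth_offset, focal=distance)

# A point at X = 1 that sits 0.5 units behind the reference plane.
near = project_from_distance(1.0, 0.5, distance=10.0)   # about 0.952
far = project_from_distance(1.0, 0.5, distance=1.0e6)   # very close to 1
```

As the distance grows, the projected x approaches X itself, independent of the depth offset, which is the behavior of an orthographic projection.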

The homogeneous coordinates of a point of projective space of dimension n are usually written as (x : y : z : ... : w), a row vector of length n + 1, other than (0 : 0 : 0 : ... : 0). Two sets of coordinates that are proportional denote the same point of projective space: for any non-zero scalar c from the underlying field K, (cx : cy : cz : ... : cw) denotes the same point. Therefore this system of coordinates can be explained as follows: if the projective space is constructed from a vector space V of dimension n + 1, introduce coordinates in V by choosing a basis, and use these in P(V), the equivalence classes of proportional non-zero vectors in V.

In geometry, an affine transformation or affine map or an affinity (from the Latin, affinis, "connected with") between two vector spaces (strictly speaking, two affine spaces) consists of a linear transformation followed by a translation:
              x \mapsto A x+ b
In the finite-dimensional case each affine transformation is given by a matrix A and a vector b, satisfying certain properties described below.
Physically, an affine transformation is one that preserves

1. Collinearity between points, i.e., three points which lie on a line continue to be collinear after the transformation
2. Ratios of distances along a line, i.e., for distinct collinear points p1, p2, p3, the ratio | p2 − p1 | / | p3 − p2 | is preserved

In general, an affine transform is composed of zero or more linear transformations (rotation, scaling or shear) and a translation (or "shift"). Several linear transformations can be combined into a single matrix, thus the general formula given above is still applicable.
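Both preserved properties can be checked on a small example. The matrix A (a scaling combined with a shear), the vector b and the three points are arbitrary illustrative values.

```python
import math

def affine(A, b, p):
    """Apply x -> A x + b to a 2D point."""
    return tuple(sum(a * x for a, x in zip(row, p)) + bi
                 for row, bi in zip(A, b))

def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def ratio(p1, p2, p3):
    """The ratio |p2 - p1| / |p3 - p2| for three collinear points."""
    return distance(p1, p2) / distance(p2, p3)

def collinearity_defect(p1, p2, p3):
    """2D cross product of (p2 - p1) and (p3 - p1); zero when the points are collinear."""
    return ((p2[0] - p1[0]) * (p3[1] - p1[1])
            - (p2[1] - p1[1]) * (p3[0] - p1[0]))

A = [[2.0, 1.0], [0.0, 3.0]]
b = [1.0, -1.0]
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
mapped = [affine(A, b, p) for p in points]  # (1, -1), (4, 2), (10, 8)

ratio_before = ratio(*points)
ratio_after = ratio(*mapped)
```

The individual distances change under the map, but the ratio stays at 0.5 and the mapped points remain on one line.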

The idea of a projective space relates to perspective, more precisely to the way an eye or a camera projects a 3D scene to a 2D image. All points which lie on a projection line (i.e. a "line-of-sight"), intersecting with the focal point of the camera, are projected onto a common image point. In this case the vector space is R^3 with the camera focal point at the origin and the projective space corresponds to the image points.
( For example, in the standard geometry for the plane two lines always intersect at a point except when the lines are parallel. In a projective representation of lines and points, however, such an intersection point exists even for parallel lines, and it can be computed in the same way as other intersection points. )
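In homogeneous coordinates a line a x + b y + c = 0 is the 3-vector (a, b, c), and the intersection of two lines is their cross product. The sketch below uses example lines to show that the same computation handles the parallel case, where the result has a zero last coordinate (a point at infinity).

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def intersect(line1, line2):
    """Homogeneous intersection point of two lines given as (a, b, c)."""
    return cross(line1, line2)

# Two crossing lines: x = 2 and y = 3.
crossing = intersect((1, 0, -2), (0, 1, -3))  # (2, 3, 1), i.e. the point (2, 3)

# Two parallel lines: y = x and y = x + 1.
parallel = intersect((1, -1, 0), (1, -1, 1))  # (-1, -1, 0), direction (1, 1) at infinity
```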

Projective geometry is the most general and least restrictive in the hierarchy of fundamental geometries, i.e. Euclidean - metric (similarity) - affine - projective. It is an intrinsically non-metrical geometry, whose facts are independent of any metric structure. Under the projective transformations, the incidence structure and the cross-ratio are preserved. It is a non-Euclidean geometry. In particular, it formalizes one of the central principles of perspective art: that parallel lines meet at infinity and therefore are to be drawn that way. In essence, a projective geometry may be thought of as an extension of Euclidean geometry in which the "direction" of each line is subsumed within the line as an extra "point", and in which a "horizon" of directions corresponding to coplanar lines is regarded as a "line". Thus, two parallel lines will meet on a horizon line in virtue of their possessing the same direction.
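The invariance of the cross-ratio can be verified exactly on a line. This sketch uses one common definition of the cross-ratio of four collinear points and an arbitrary 1D projective map x -> (p x + q) / (r x + s); the coefficients are illustrative.

```python
from fractions import Fraction

def cross_ratio(a, b, c, d):
    """Cross-ratio ((a - c)(b - d)) / ((a - d)(b - c)) of four points on a line."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

def projective_map(x, p, q, r, s):
    """1D projective transformation x -> (p x + q) / (r x + s), with p s - q r != 0."""
    return (p * x + q) / (r * x + s)

points = [Fraction(n) for n in (0, 1, 2, 3)]
mapped = [projective_map(x, 2, 1, 1, 3) for x in points]  # 1/3, 3/4, 1, 7/6

before = cross_ratio(*points)  # 4/3
after = cross_ratio(*mapped)   # 4/3

# A plain ratio of distances is not preserved by the same map.
simple_before = (points[1] - points[0]) / (points[2] - points[1])  # 1
simple_after = (mapped[1] - mapped[0]) / (mapped[2] - mapped[1])   # 5/3
```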

When a camera is used, light from the environment is focused on an image plane and captured. This process reduces the dimensions of the data taken in by the camera from three to two (light from a 3D scene is stored on a 2D image). Each pixel on the image plane therefore corresponds to a shaft of light from the original scene. Camera resectioning determines which incoming light is associated with each pixel on the resulting image.

Since the camera matrix is involved in the mapping between elements of two projective spaces, it too can be regarded as a projective element. This means that it has only 11 degrees of freedom (12 entries of the 3-by-4 matrix minus one for overall scale), since any multiplication by a non-zero scalar results in an equivalent camera matrix.
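The scale equivalence is easy to confirm: a 3-by-4 camera matrix and any non-zero multiple of it project every point to the same pixel. The matrix entries and the 3D point below are hypothetical.

```python
def project(P, X):
    """Project a 3D point with a 3x4 camera matrix and dehomogenize."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return (u / w, v / w)

# Hypothetical camera matrix (12 entries, defined only up to scale).
P = [[800.0, 0.0, 320.0, 10.0],
     [0.0, 800.0, 240.0, -5.0],
     [0.0, 0.0, 1.0, 2.0]]
P_scaled = [[3.0 * e for e in row] for row in P]

X = (0.5, -0.25, 4.0)
pixel = project(P, X)
pixel_scaled = project(P_scaled, X)
```

The common factor cancels in the division by the third homogeneous coordinate, so both matrices describe the same camera.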

An n-dimensional space with notions of distance and angle that obey the Euclidean relationships is called an n-dimensional Euclidean space.
An essential property of a Euclidean space is its flatness. Other spaces exist in geometry that are not Euclidean.

The Euclidean group E(n) is a subgroup of the affine group for n dimensions, and in such a way as to respect the semidirect product structure of both groups. Its elements can therefore be written explicitly in two ways:
1. by a pair (A, b), with A an n×n orthogonal matrix, and b a real column vector of size n; or
2. by a single square matrix of size n + 1, as explained for the affine group.

An orthogonal matrix Q satisfies Q^T Q = Q Q^T = I.
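As a sketch of the two notations for an element of E(2), the code below builds a rotation Q (an orthogonal matrix) and a translation b, applies them once as the pair (Q, b) and once as a single 3-by-3 matrix, and checks that distances are preserved. The angle, translation and test points are arbitrary.

```python
import math

def mat_mul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

angle = math.radians(30.0)
Q = [[math.cos(angle), -math.sin(angle)],
     [math.sin(angle), math.cos(angle)]]
b = [2.0, -1.0]

QtQ = mat_mul(transpose(Q), Q)  # identity up to rounding

def move_pair(p):
    """Apply the element written as the pair (Q, b): p -> Q p + b."""
    return [sum(q * x for q, x in zip(row, p)) + bi for row, bi in zip(Q, b)]

# The same element written as one (n + 1) x (n + 1) matrix.
T = [Q[0] + [b[0]], Q[1] + [b[1]], [0.0, 0.0, 1.0]]

def move_matrix(p):
    """Apply T to the homogeneous point (p, 1) and drop the last coordinate."""
    ph = list(p) + [1.0]
    return [sum(t * x for t, x in zip(row, ph)) for row in T][:2]

p, q = [1.0, 0.0], [4.0, 4.0]
p2, q2 = move_pair(p), move_pair(q)
dist_before = math.hypot(q[0] - p[0], q[1] - p[1])      # 5.0
dist_after = math.hypot(q2[0] - p2[0], q2[1] - p2[1])   # 5.0 up to rounding
```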


Any m-by-n matrix M over a field K (the real or complex numbers) can be factorized as

M = U\Sigma V^*,

where U is an m-by-m unitary matrix over K, Σ is an m-by-n diagonal matrix with nonnegative numbers on the diagonal, and V* denotes the conjugate transpose of V, an n-by-n unitary matrix over K. Such a factorization is called a singular-value decomposition of M.

  • The matrix V thus contains a set of orthonormal "input" or "analysing" basis vector directions for M
  • The matrix U contains a set of orthonormal "output" basis vector directions for M
  • The matrix Σ contains the singular values, which can be thought of as scalar "gain controls" by which each corresponding input is multiplied to give a corresponding output.

A common convention is to order the values Σ_{i,i} in non-increasing fashion. In this case, the diagonal matrix Σ is uniquely determined by M (though the matrices U and V are not).

total least squares = errors in variables = rigorous least squares = orthogonal regression


