- low-level process: both inputs and outputs are images
e.g. image preprocessing to reduce noise, contrast enhancement, image sharpening
- mid-level process: inputs are generally images, but outputs are attributes extracted from them, such as edges, contours, and the identity of individual objects
e.g. segmentation, description of objects, classification (recognition)
- higher-level process: "making sense" of an ensemble of recognized objects and, ultimately, performing the cognitive functions associated with vision
"modern digital computer"
with the introduction by John von Neumann of two key concepts: (1) a memory to hold a stored program and data, and (2) conditional branching; these are the foundation of a CPU
+
mass storage & display systems
=> digital image processing
> the birth of digital image processing
- space probe
Work on using computer techniques for improving images from a space probe began at the Jet Propulsion Laboratory (Pasadena, California) in 1964 when pictures of the moon transmitted by Ranger 7 were processed by a computer to correct various types of image distortion inherent in the on-board television camera.
- medical diagnosis
Tomography consists of algorithms that use the sensed data to construct an image that represents a "slice" through the object. Motion of the object in a direction perpendicular to the ring of detectors produces a set of such slices, which constitute a three-dimensional rendition of the inside of the object. Tomography was invented independently by Sir Godfrey N. Hounsfield and Professor Allan M. Cormack, who shared the 1979 Nobel Prize in Medicine for their invention.
Section 1 - Defines digital image processing and discusses how its domain is delimited from the related fields of image analysis and computer vision.
digital image + digital computer => digital image processing
image processing -> image analysis -> computer vision
Section 2 - Reviews the developmental stages of the digital image through examples, and introduces the birth of the computer and the definition of the modern digital computer. The history of digital image processing began with space exploration and medical diagnosis in the 1960s. It has since been applied widely: (1) as an aid to human interpretation, in biology, geography, archaeology, experimental physics (high-energy plasmas and electron microscopy), astronomy, nuclear medicine, law enforcement, the defense industry, and so on; and (2) as a means of machine perception, in automatic character recognition, industrial machines for product assembly and inspection, military reconnaissance, automatic fingerprint processing, screening of X-rays and blood samples, and aerial/satellite image processing for weather prediction and environmental assessment.
Section 3 - Today images come mainly from the electromagnetic spectrum. Electromagnetic waves can be thought of as (1) propagating sinusoidal waves of varying wavelength or (2) a stream of massless particles traveling at the speed of light. Application examples are given for gamma rays, X-rays, ultraviolet, visible/infrared, microwaves, and radio waves. Other sources include sound, ultrasound, and electron beams.
Section 4 - Briefly introduces each stage of digital image processing: image acquisition, image enhancement, image restoration, color image processing, wavelets (-> image data compression / pyramidal representation), compression, morphological processing, segmentation, (boundary/regional) representation & description (feature selection), recognition
Section 5 - An image processing system consists of sensing, specialized hardware, a general-purpose computer, software, mass storage (short-term/on-line/archival), a frame buffer (zoom/scroll/pan functions), an image display (color monitor), hardcopy devices (laser printers, film cameras, heat-sensitive devices, ink-jet units, digital media, etc.), and networking.
Michael I. Jordan & Christopher M. Bishop, "Neural Networks", In Tucker, A. B. (Ed.) CRC Handbook of Computer Science, Boca Raton, FL: CRC Press, 1997.
Neural network methods have had their greatest impact in problems where statistical issues dominate and where data are easily obtained.
"conjunction of graphical algorithms and probability theory":
A neural network is first and foremost a graph with patterns represented in terms of numerical values attached to the nodes of the graph and transformations between patterns achieved via simple message-passing algorithms. Many neural network architectures, however, are also statistical processors, characterized by making particular probabilistic assumptions about data.
Based on a source of training data, the aim is to produce a statistical model of the process from which the data are generated so as to allow the best predictions to be made for new data.
statistical modeling - density estimation (unsupervised learning), classification & regression
density estimation ("unsupervised learning")
: to model the unconditional distribution of data described by some vector
- training samples and a network model are used to build a representation of the probability density
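The density-estimation idea above can be illustrated with the simplest possible case. A minimal sketch (not from the source text): fitting a univariate Gaussian to samples by maximum likelihood.

```python
import math

def fit_gaussian(samples):
    """Return (mean, variance) maximizing the Gaussian likelihood."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var

def density(x, mean, var):
    """Evaluate the fitted probability density at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Four invented training samples; the fitted density can then score new data.
mean, var = fit_gaussian([1.0, 2.0, 3.0, 2.0])
```

Real networks model far richer densities (e.g. mixtures), but the workflow is the same: fit parameters to training data, then evaluate the density on new data.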
Michael I. Jordan, Generic constraints on underspecified target trajectories, Proceedings of international conference on neural networks, (1989), 217-225
"The state space approach is more general than the "classical" Laplace and Fourier transform theory. Consequently, state space theory is applicable to all systems that can be analyzed by integral transforms in time, and is applicable to many systems for which transform theory breaks down"
(1) Linear systems with time-varying parameters can be analyzed in essentially the same manner as time-invariant linear systems.
(2) Problems formulated by state space methods can easily be programmed on a computer.
(3) High-order linear systems can be analyzed.
(4) Multiple input - multiple output systems can be treated almost as easily as single input - single output linear systems.
(5) State space theory is the foundation for further study in such areas as nonlinear systems, stochastic systems, and optimal control.
"Because state space theory describes the time behaviors of physical systems in a mathematical manner, the reader is assumed to have some knowledge of differential equations and of Laplace transform theory."
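Points (1) and (2) above follow from the fact that a state space model is just a matrix recursion, which programs directly. A minimal sketch of a discrete-time linear system x[k+1] = A x[k] + B u[k], y[k] = C x[k]; the matrices are arbitrary illustrative values, not from the source.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def simulate(A, B, C, x0, inputs):
    """Return the output sequence y[k] for a given scalar input sequence."""
    x = x0
    ys = []
    for u in inputs:
        ys.append(mat_vec(C, x))              # read out y[k] = C x[k]
        x = vec_add(mat_vec(A, x), mat_vec(B, [u]))  # advance the state
    return ys

# A stable two-state system driven by a unit step input.
A = [[0.9, 0.1],
     [0.0, 0.8]]
B = [[0.0],
     [1.0]]
C = [[1.0, 0.0]]
outputs = simulate(A, B, C, [0.0, 0.0], [1.0] * 5)
```

Time-varying parameters would only require letting A, B, C depend on k inside the loop, which is why point (1) holds.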
FFMV-03M2M-CS
6-pin right-angle IEEE-1394 connector
Max 752×480 at 60 FPS
1/3" Micron CMOS, BW
Progressive Scan
Plastic Case Included
FFMV Metal Case
LM5NCL - F1.4~1.6, C-Mount
Adapter for NMV-4/5WA Lens Filter
IR Longpass - 830nm
1394a PCI Adapter
1394b FWB-LDR-CAT5 Repeater SET
The Point Grey Image Filter Driver (PGRGIGE.sys) was developed for use with GigE Vision cameras. This driver operates as a network service between the camera and the Microsoft built-in UDP stack to filter out GigE vision stream protocol (GVSP) packets.
The filter driver is installed and enabled by default as part of the FlyCapture SDK installation process. Use of the filter driver is recommended, as it can reduce CPU load and improve image streaming performance.
Point Grey GigE Vision cameras can operate without the filter driver, by communicating directly with the Microsoft UDP stack. GigE Vision cameras operating on Linux systems can communicate directly with native Ubuntu drivers.
Location of the image-saving example code in the FlyCapture SDK: Program Files > Point Grey Research Inc. > FlyCapture2 > Examples > SaveImageToAviEx. Description: Demonstrates saving a series of images to an AVI file
You can install the CMU 1394 driver for your camera and then use the API of this driver to capture video from the camera. In this way, you can avoid the use of DirectX. See example for details.
David G. Stork (Ricoh Silicon Valley) & Elad Yom-Tov, Computer Manual in MATLAB to Accompany Pattern Classification, 2nd Edition, John Wiley & Sons, 2004
David G. Stork - PhD, Chief Scientist at Ricoh Innovations, Inc., and Consulting Professor of Electrical Engineering at Stanford University. A graduate of MIT and the University of Maryland, he is the founder and leader of the Open Mind Initiative and the coauthor, with Richard Duda and Peter Hart, of Pattern Classification, Second Edition, as well as four other books.
Elad Yom-Tov - PhD, research scientist at the IBM Research Lab in Haifa, working on applications of machine learning to search technologies, bioinformatics, and hardware verification (among others). He is a graduate of Tel-Aviv University and the Technion.
Preface
"(Our purpose is) to give a systematic account of the major topics in pattern recognition, based on fundamental principles"
pattern recognition
speech recognition
optical character recognition
signal classification
pattern classification
scene analysis
machine learning
handwriting & gesture recognition
lipreading
geological analysis
document searching
recognition of bubble chamber tracks of subatomic particles
human-machine interface - eg. pen-based computing
human and animal nervous systems
neurobiology
psychology
"We address a specific class of problems - pattern recognition problems - and consider the wealth of different techniques that can be applied to it."
"We discuss the relative strengths and weaknesses of various classification techniques"
adaptive thresholding (option: CV_CALIB_CB_ADAPTIVE_THRESH) or thresholding by "image mean - 10" or "10"
3. dilate the binarized image -> save "imgThresh"
4. find rectangles to draw white lines around the image edge -> save "imgRect"
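The "image mean - 10" global threshold mentioned above can be sketched as follows (pure Python, no OpenCV; the tiny grayscale image and its values are invented for illustration):

```python
def mean_minus_10_threshold(img):
    """Binarize img (list of rows of 0..255 values) at (mean - 10)."""
    flat = [p for row in img for p in row]
    t = sum(flat) / len(flat) - 10       # global threshold from the image mean
    return [[255 if p > t else 0 for p in row] for row in img]

# Dark squares (~10-20) and bright squares (~200-220), as on a chessboard.
img = [[ 10,  20, 200],
       [ 15, 210, 220],
       [ 12,  18, 205]]
binary = mean_minus_10_threshold(img)
```

The real detector then dilates this binary image and looks for quadrangles, as the steps above describe.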
1. check if a chessboard is in the input image
1) erode and dilate ( cvErode() & cvDilate() )
2) find a threshold value to make contours ( "flag" )
3) select contours to make quadrangles
4) check if there are many hypotheses with similar sizes ( floodfill-style algorithm )
2. (if pattern was not found using binarization) multi-level quads extraction
3. draw white lines around the image edges ( "thresh_image" )
4. compute corners in clockwise order
5. find the quadrangle's neighbors
6. find connected quadrangles to order them and the corners
7. remove extra quadrangles to make a nice square pattern
8. check if each row and column of the chessboard is monotonous
9. refine corner locations
In the code, CV_THRESH_OTSU has the same effect as CV_THRESH_BINARY | CV_THRESH_OTSU.
With CV_THRESH_OTSU, the function ignores the initial value of its "threshold" argument, computes a threshold internally from the input image, and assigns max_value to the pixels selected by that threshold.
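The threshold that CV_THRESH_OTSU computes internally comes from Otsu's method, which picks the level maximizing between-class variance. A self-contained sketch of that method (not OpenCV's actual implementation):

```python
def otsu_threshold(pixels):
    """Return the level t (0..255) maximizing between-class variance.
    Pixels with value > t would then receive max_value, as the note says."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    w_bg, sum_bg = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]                      # background weight up to t
        if w_bg == 0:
            continue
        w_fg = total - w_bg                  # foreground weight above t
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (total_sum - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A clearly bimodal sample: the threshold lands between the two clusters.
sample = [10] * 50 + [11] * 50 + [200] * 50 + [201] * 50
t = otsu_threshold(sample)
```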
Janne Heikkila, Olli Silven, "A Four-step Camera Calibration Procedure with Implicit Image Correction," Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'97), p. 1106, 1997
4-step
1) DLT (direct linear transformation) for initial parameter values [Tsai 1987, Abdel-Aziz 1971, Melen 1994]
2) Nonlinear parameter estimation for final parameter values [Slama 1980]
3) Correcting errors arising from feature extraction
4) image correction: new implicit model to interpolate the correct image points
2. Explicit camera calibration
The pinhole camera model is based on the principle of collinearity, where each point in the object space is projected by a straight line through the projection center into the image plane.
ref.
Trucco 1998
Jaehne 1995;1997
Hartley and Zisserman 2006
Forsyth and Ponce 2003
Shapiro and Stockman 2002
Xu and Zhang 1996
projective geometry
lens distortion
camera calibration
1) to correct mathematically for the main deviations from the simple pinhole model with lenses
2) to relate camera measurements with measurements in the real, 3-dimensional world
3-D scene reconstruction
: Camera Model (371p)
camera calibration => model of the camera's geometry & distortion model of the lens : intrinsic parameters
pinhole camera:
single ray -> image plane (projective plane)
(...) the size of the image relative to the distant object is given by a single parameter of the camera: its focal length. For our idealized pinhole camera, the distance from the pinhole aperture to the screen is precisely the focal length.
The point in the pinhole is reinterpreted as the center of projection.
The point at the intersection of the image plane and the optical axis is referred to as the principal point.
(...) the individual pixels on a typical low-cost imager are rectangular rather than square. The focal length (fx) is actually the product of the physical focal length (F) of the lens and the size (sx) of the individual imager elements. (*sx converts physical units to pixel units.)
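The relations above can be sketched numerically: fx = F*sx and fy = F*sy express the focal length in (possibly different) horizontal and vertical pixel units, and (cx, cy) is the principal point. All numeric values below are invented for illustration.

```python
def project(point_3d, F, sx, sy, cx, cy):
    """Project a 3D point (camera coordinates, Z > 0) onto the image plane."""
    X, Y, Z = point_3d
    fx = F * sx          # focal length in horizontal pixel units
    fy = F * sy          # rectangular pixels make fy differ from fx
    u = fx * X / Z + cx  # principal point (cx, cy) shifts the image origin
    v = fy * Y / Z + cy
    return (u, v)

# A point 2 m in front of the camera and 0.1 m to the right.
u, v = project((0.1, 0.0, 2.0), F=8.0, sx=100.0, sy=110.0, cx=320.0, cy=240.0)
```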
: Basic Projective Geometry (373p)
projective transform
homogeneous coordinates
The homogeneous coordinates associated with a point in a projective space of dimension n are typically expressed as an (n+1)-dimensional vector with the additional restriction that any two points whose values are proportional are equivalent.
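A tiny sketch of that equivalence (example values are my own): proportional homogeneous vectors map to the same point after dividing by the last coordinate.

```python
def to_inhomogeneous(h):
    """Divide an (n+1)-vector by its last coordinate (assumed nonzero)."""
    w = h[-1]
    return tuple(c / w for c in h[:-1])

p = (4.0, 6.0, 2.0)    # homogeneous coordinates of a 2D point
q = (10.0, 15.0, 5.0)  # proportional to p (scale factor 2.5)
# Both represent the same projective point:
assert to_inhomogeneous(p) == to_inhomogeneous(q) == (2.0, 3.0)
```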
For a camera to form images at a faster rate, we must gather a lot of light over a wider area and bend (i.e., focus) that light to converge at the point of projection. To accomplish this, we use a lens. A lens can focus a large amount of light on a point to give us fast imaging, but it comes at the cost of introducing distortions.
: Lens Distortions (375p)
Radial distortions arise as a result of the shape of the lens, whereas tangential distortions arise from the assembly process of the camera as a whole.
radial distortion:
External points on a front-facing rectangular grid are increasingly displaced inward as the radial distance from the optical center increases.
"barrel" or "fish-eye" effect -> barrel distortion
tangential distortion:
due to manufacturing defects resulting from the lens not being exactly parallel to the imaging plane.
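Both effects can be sketched with the distortion model OpenCV uses, with radial coefficients k1, k2, k3 and tangential coefficients p1, p2 applied to ideal normalized coordinates (x, y); the coefficient values below are arbitrary illustrations.

```python
def distort(x, y, k1, k2, k3, p1, p2):
    """Map an ideal (undistorted) normalized point to its distorted location."""
    r2 = x * x + y * y                                   # squared radius
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3   # radial scaling
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return (x_d, y_d)

# With all coefficients zero the mapping is the identity (pure pinhole).
assert distort(0.3, -0.2, 0, 0, 0, 0, 0) == (0.3, -0.2)
# A negative k1 pulls outer points inward: the barrel effect described above.
x_d, y_d = distort(0.5, 0.5, k1=-0.2, k2=0.0, k3=0.0, p1=0.0, p2=0.0)
```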
cvCalibrateCamera2() or cv::calibrateCamera
The method of calibration is to target the camera on a known structure that has many individual and identifiable points. By viewing this structure from a variety of angles, it is possible to then compute the (relative) location and orientation of the camera at the time of each image as well as the intrinsic parameters of the camera.
Ultimately, a rotation is equivalent to introducing a new description of a point's location in a different coordinate system.
Using a planar object, we'll see that each view fixes eight parameters. Because the six parameters of rotation and translation change between views, each view leaves constraints on two additional parameters that we use to resolve the camera intrinsic matrix. We'll then need at least two views to solve for all the geometric parameters.
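The counting above can be checked in a few lines (my arithmetic, following the usual derivation): one view of a planar target fixes a homography's 8 numbers, the view's 6 extrinsics consume 6 of them, and the leftover 2 constrain the 4 shared intrinsics (fx, fy, cx, cy).

```python
import math

homography_params = 8     # numbers fixed by one view of a planar target
extrinsics_per_view = 6   # rotation + translation change with every view
intrinsics = 4            # fx, fy, cx, cy are shared across all views

leftover_per_view = homography_params - extrinsics_per_view  # 2 per view
views = math.ceil(intrinsics / leftover_per_view)            # minimum views
```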
: Chessboards (381p)
OpenCV opts for using multiple views of a planar object (a chessboard) rather than one view of a specially constructed 3D object. We use a pattern of alternating black and white squares, which ensures that there is no bias toward one side or the other in measurement. Also, the resulting grid corners lend themselves naturally to the subpixel localization function.
Use a chessboard grid that is asymmetric and of even and odd dimensions - for example, (5,6). Using such even-odd asymmetry yields a chessboard that has only one symmetry axis, so the board orientation can always be defined uniquely.
The chessboard interior corners are simply a special case of the more general Harris corners; the chessboard corners just happen to be particularly easy to find and track.
cf. Chapter 10: Tracking and Motion: Subpixel Corners
319p: If you are processing images for the purpose of extracting geometric measurements, as opposed to extracting features for recognition, then you will normally need more resolution than the simple pixel values supplied by cvGoodFeaturesToTrack().
fitting a curve (a parabola)
ref. newer techniques: [Lucchese02], [Chen05]
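The parabola fit above reduces to a closed-form vertex offset computed from three samples around the strongest response; a sketch with invented sample values:

```python
def subpixel_peak(y_left, y_center, y_right):
    """Offset of the interpolating parabola's vertex from the center
    sample, in fractional-pixel units (expected in (-1, 1))."""
    denom = y_left - 2 * y_center + y_right
    if denom == 0:
        return 0.0   # degenerate (flat) neighborhood: no refinement
    return 0.5 * (y_left - y_right) / denom

# Samples symmetric about the center: the peak stays at offset 0.
assert subpixel_peak(1.0, 5.0, 1.0) == 0.0
# A stronger right-hand neighbor shifts the estimated peak to the right.
offset = subpixel_peak(1.0, 5.0, 3.0)
```

The same one-dimensional refinement is applied along each axis (or, as in cvFindCornerSubPix, replaced by a 2D least-squares fit) to move a detected corner off the integer pixel grid.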