Category

50 posts in the 'Computer Vision' category

  1. 2010.02.09 Sola & Monin & Devy & Lemaire, "Undelayed initialization in bearing only SLAM"
  2. 2010.01.25 Kragic & Vincze <Vision for Robotics>
  3. 2010.01.14 RoSEC 2010 winter school
  4. 2009.12.02 Joan Solà - 6DOF SLAM toolbox for Matlab
  5. 2009.11.23 Particle filter for 2D object tracking 연습 5
  6. 2009.11.08 Branislav Kisačanin & Vladimir Pavlović & Thomas S. Huang <Real-Time Vision for Human-Computer Interaction>
  7. 2009.10.29 M. Armstrong & A. Zisserman <Robust object tracking>
  8. 2009.10.27 R. L. Thompson et al. <Providing synthetic views for teleoperation using visual pose tracking in multiple cameras> 1
  9. 2009.10.27 C. Harris & C. Stennett, <Rapid - a video rate object tracker>
  10. 2009.10.26 Somkiat Wangsiripitak & David W Murray <Avoiding moving outliers in visual SLAM by tracking moving objects>
  11. 2009.09.16 Chekhlov et al < Ninja on a Plane: Automatic Discovery of Physical Planes for Augmented Reality Using Visual SLAM>
  12. 2009.08.24 Andrew J. Davison, Ian Reid, Nicholas Molton & Olivier Stasse <MonoSLAM: Real-Time Single Camera SLAM> 1
  13. 2009.08.19 Richard Hartley <In defense of the eight-point algorithm>
  14. 2009.08.18 Five-Point algorithm
  15. 2009.08.05 PTAM test log on Mac OS X 7
  16. 2009.08.04 SLAM related generally
  17. 2009.07.21 임현, 이영삼 <이동로봇의 동시간 위치인식 및 지도작성(SLAM)> 3
  18. 2009.07.14 Trends in Augmented Reality Tracking, Interaction and Display: A Review of Ten Years of ISMAR
  19. 2009.06.08 Photometric stereo 2009-06-07
  20. 2009.03.31 A. J. Davison <Real-time simultaneous localisation and mapping with a single camera>
  21. 2009.03.27 people in SLAM
  22. 2009.02.14 Special Issue on Visual SLAM (IEEE Transactions on Robotics, Vol. 24, No. 5)
  23. 2008.10.16 2.4 Color images & 2.5 Cameras
  24. 2008.09.20 Ch. 6 Segmentation I
  25. 2008.09.10 Ch. 2 The image, its representations and properties
2010. 2. 9. 17:50 Computer Vision

Undelayed initialization in bearing only SLAM


Sola, J.   Monin, A.   Devy, M.   Lemaire, T.  
CNRS, Toulouse, France;

This paper appears in: Intelligent Robots and Systems, 2005. (IROS 2005). 2005 IEEE/RSJ International Conference on
Publication Date: 2-6 Aug. 2005
On page(s): 2499- 2504
ISBN: 0-7803-8912-3
INSPEC Accession Number: 8750433
Digital Object Identifier: 10.1109/IROS.2005.1545392
Current Version Published: 2005-12-05


ref. http://homepages.laas.fr/jsola/JoanSola/eng/bearingonly.html




 If we replace the range-and-bearing sensors used in conventional SLAM (e.g. laser range scanners) with a camera, which gives rich information about the scene, we lose one dimension (the distance to the recognized object, i.e. depth), and the problem becomes bearing-only SLAM.

EKF requires Gaussian representations for all the involved random variables that form the map (the robot pose and all landmarks' positions). Moreover, their variances need to be small to be able to approximate all the nonlinear functions with their linearized forms.
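
As a reminder of what that linearization is (standard EKF material, not something specific to this paper), the nonlinear measurement function h is replaced by its first-order expansion around the current estimate, which is only accurate while the state uncertainty stays small:

h(\mathbf{x}) \approx h(\hat{\mathbf{x}}) + \mathbf{H}\,(\mathbf{x} - \hat{\mathbf{x}}),
\qquad
\mathbf{H} = \left. \frac{\partial h}{\partial \mathbf{x}} \right|_{\mathbf{x} = \hat{\mathbf{x}}}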

A landmark's position can only be determined once there is enough viewpoint difference between two input frames to form a baseline, so time is needed to accumulate that baseline before the landmark can be initialized.
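
A quick sketch of why the depth is lost and why a baseline is needed (standard pinhole geometry, my notation rather than the paper's): a single bearing observation only constrains the landmark to lie on a ray,

\mathbf{z} = h(X, Y, Z) = \begin{pmatrix} X/Z \\ Y/Z \end{pmatrix},
\qquad
h(\lambda X, \lambda Y, \lambda Z) = \mathbf{z} \quad \text{for all } \lambda > 0,

so the depth Z is unobservable from one view; with two views separated by a baseline b and a parallax angle \theta between the two rays, triangulation gives roughly Z \approx b / \theta for small \theta, which is why enough camera motion has to accumulate before a landmark can be (classically) initialized.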

http://en.wikipedia.org/wiki/Structure_from_motion
  1. Extract features from images
  2. Find an initial solution for the structure of the scene and the motion of the cameras
  3. Extend the solution and optimise it
  4. Calibrate the cameras
  5. Find a dense representation of the scene
  6. Infer geometric, textural and reflective properties of the scene.

sequential probability ratio test
http://en.wikipedia.org/wiki/Sequential_probability_ratio_test
http://www.agrsci.dk/plb/bembi/africa/sampling/samp_spr.html
http://eom.springer.de/S/s130240.htm

EKF (extended Kalman filter) - inconsistency and divergence
GSF (Gaussian sum filter) - computation load
FIS (Federated Information Sharing)


posted by maetel
2010. 1. 25. 02:50 Computer Vision

Foundations and Trends® in
Robotics

Vol. 1, No. 1 (2010) 1–78
© 2009 D. Kragic and M. Vincze
DOI: 10.1561/2300000001

Vision for Robotics

Danica Kragic1 and Markus Vincze2
1 Centre for Autonomous Systems, Computational Vision and Active Perception Lab, School of Computer Science and Communication, KTH, Stockholm, 10044, Sweden, dani@kth.se
2 Vision for Robotics Lab, Automation and Control Institute, Technische Universitat Wien, Vienna, Austria, vincze@acin.tuwien.ac.at

SUGGESTED CITATION:
Danica Kragic and Markus Vincze (2010) “Vision for Robotics”,
Foundations and Trends® in Robotics: Vol. 1: No. 1, pp 1–78.
http://dx.doi.org/10.1561/2300000001


Abstract

Robot vision refers to the capability of a robot to visually perceive the environment and use this information for execution of various tasks. Visual feedback has been used extensively for robot navigation and obstacle avoidance. In recent years, there have also been examples that include interaction with people and manipulation of objects. In this paper, we review some of the work that goes beyond using artificial landmarks and fiducial markers for the purpose of implementing vision-based control in robots. We discuss different application areas, both from the systems perspective and individual problems such as object tracking and recognition.


1 Introduction 2
1.1 Scope and Outline 4

2 Historical Perspective 7
2.1 Early Start and Industrial Applications 7
2.2 Biological Influences and Affordances 9
2.3 Vision Systems 12

3 What Works 17
3.1 Object Tracking and Pose Estimation 18
3.2 Visual Servoing–Arms and Platforms 27
3.3 Reconstruction, Localization, Navigation, and Visual SLAM 32
3.4 Object Recognition 35
3.5 Action Recognition, Detecting, and Tracking Humans 42
3.6 Search and Attention 44

4 Open Challenges 48
4.1 Shape and Structure for Object Detection 49
4.2 Object Categorization 52
4.3 Semantics and Symbol Grounding: From Robot Task to Grasping and HRI 54
4.4 Competitions and Benchmarking 56

5 Discussion and Conclusion 59

Acknowledgments 64
References 65


posted by maetel
2010. 1. 14. 17:27 Footmarks
RoSEC international summer/winter school
Robotics-Specialized Education Consortium for Graduates sponsored by MKE

Organized by the Robotics-Specialized Graduate Program Consortium
2010 RoSEC International Winter School
Monday, January 11 - Saturday, January 16, 2010
Hanyang University, HIT (Hanyang Institute of Technology), 6th floor, Seminar Room 1 (Room 606)



Robot mechanism
Byung-Ju Yi (Hanyang University, Korea)
Prof. Byung-Ju Yi, Human Robotics Lab, Hanyang University  bj@hanyang.ac.kr
- Classification of robotic mechanism and Design consideration of robotic mechanism
- Design Issue and application examples of master slave robotic system
- Trend of robotic mechanism research

Actuator and Practical PID Control
Youngjin Choi (Hanyang University, Korea)
Prof. Youngjin Choi, Humanoid Robotics Lab, Hanyang University  cyj@hanyang.ac.kr
- Operation Principle of DC/RC/Stepping Motors & Its Practice
- PID Control and Tuning
- Stability of PID Control and Application Examples

Coordination of Robots and Humans
Kazuhiro Kosuge (Tohoku University, Japan)
Prof. Kazuhiro Kosuge, System Robotics Lab, Tohoku University, Japan
- Robotics as systems integration
- Multiple Robots Coordination
- Human Robot Coordination and Interaction

Robot Control
Rolf Johansson (Lund University, Sweden)
Robotics Lab, Lund University, Sweden  Rolf.Johansson@control.lth.se
- Robot motion and force control
- Stability of motion
- Robot obstacle avoidance

Lecture from Industry or Government
(S. -R. Oh, KIST)

Special Talk from Government
(Y. J. Weon, MKE)

Mobile Robot Navigation
Jae-Bok Song (Korea University, Korea)
Prof. Jae-Bok Song, Intelligent Robotics Lab, Korea University  jbsong@korea.ac.kr
- Mapping
- Localization
- SLAM

3D Perception for Robot Vision
In Kyu Park (Inha University, Korea)
Prof. In Kyu Park, Image Media Lab, Inha University  pik@inha.ac.kr
- Camera Model and Calibration
- Shape from Stereo Views
- Shape from Multiple Views

Lecture from Industry or Government
(H. S. Kim, KITECH)

Roboid Studio
Kwang Hyun Park (Kwangwoon University, Korea)
Prof. Kwang Hyun Park, Dept. of Information and Control Engineering, Kwangwoon University  akaii@kw.ac.kr
- Robot Contents
- Roboid Framework
- Roboid Component

Software Framework for LEGO NXT
Sanghoon Lee (Hanyang University, Korea)
Prof. Sanghoon Lee, Robotics Lab, Hanyang University
- Development Environments for LEGO NXT
- Programming Issues for LEGO NXT under RPF of OPRoS
- Programming Issues for LEGO NXT under Roboid Framework

Lecture from Industry or Government
(Robomation/Mobiletalk/Robotis)

Robot Intelligence : From Reactive AI to Semantic AI
Il Hong Suh (Hanyang University, Korea)
Prof. Il Hong Suh, Robot Intelligence/Communication Lab, Hanyang University
- Issues in Robot Intelligence
- Behavior Control: From Reactivity to Proactivity
- Use of Semantics for Robot Intelligence

AI-Robotics
Henrik I. Christensen (Georgia Tech., USA)

- Semantic Mapping
- Physical Interaction with Robots
- Efficient object recognition for robots

Lecture from Industry or Government
(M. S. Kim, Director of CIR, 21C Frontier Program)

HRI
Dongsoo Kwon (KAIST, Korea)

- Introduction to human-robot interaction
- Perception technologies of HRI
- Cognitive and emotional interaction

Robot Swarm for Environmental Monitoring
Nak Young Chong (JAIST, Japan)

- Self-organizing Mobile Robot Swarms: Models
- Self-organizing Mobile Robot Swarms: Algorithms
- Self-organizing Mobile Robot Swarms: Implementation


posted by maetel
2009. 12. 2. 21:33 Computer Vision
Joan Solà

6DOF SLAM toolbox for Matlab http://homepages.laas.fr/jsola/JoanSola/eng/toolbox.html

References

[1] J. Civera, A.J. Davison, and J.M.M Montiel. Inverse depth parametrization for monocular SLAM. IEEE Trans. on Robotics, 24(5), 2008.

[2] J. Civera, Andrew Davison, and J. Montiel. Inverse Depth to Depth Conversion for Monocular SLAM. In IEEE Int. Conf. on Robotics and Automation, pages 2778 –2783, April 2007.

[3] A. J. Davison. Real-time simultaneous localisation and mapping with a single camera. In Int. Conf. on Computer Vision, volume 2, pages 1403–1410, Nice, October 2003.

[4] Andrew J. Davison. Active search for real-time vision. Int. Conf. on Computer Vision, 1:66–73, 2005.

[5] Andrew J. Davison, Ian D. Reid, Nicholas D. Molton, and Olivier Stasse. MonoSLAM: Real-time single camera SLAM. Trans. on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, June 2007.

[6] Ethan Eade and Tom Drummond. Scalable monocular SLAM. IEEE Int. Conf. on Computer Vision and Pattern Recognition, 1:469–476, 2006.

[7] Thomas Lemaire and Simon Lacroix. Monocular-vision based SLAM using line segments. In IEEE Int. Conf. on Robotics and Automation, pages 2791–2796, Rome, Italy, 2007.

[8] Nicholas Molton, Andrew J. Davison, and Ian Reid. Locally planar patch features for real-time structure from motion. In British Machine Vision Conference, 2004.

[9] J. Montiel, J. Civera, and A. J. Davison. Unified inverse depth parametrization for monocular SLAM. In Robotics: Science and Systems, Philadelphia, USA, August 2006.

[10] L. M. Paz, P. Piniés, J. Tardós, and J. Neira. Large scale 6DOF SLAM with stereo-in-hand. IEEE Trans. on Robotics, 24(5), 2008.

[11] J. Solà, André Monin, Michel Devy, and T. Vidal-Calleja. Fusing monocular information in multi-camera SLAM. IEEE Trans. on Robotics, 24(5):958–968, 2008.

[12] Joan Solà. Towards Visual Localization, Mapping and Moving Objects Tracking by a Mobile Robot: a Geometric and Probabilistic Approach. PhD thesis, Institut National Polytechnique de Toulouse, 2007.

[13] Joan Solà, André Monin, and Michel Devy. BiCamSLAM: Two times mono is more than stereo. In IEEE Int. Conf. on Robotics and Automation, pages 4795–4800, Rome, Italy, April 2007.

[14] Joan Solà, André Monin, Michel Devy, and Thomas Lemaire. Undelayed initialization in bearing only SLAM. In IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pages 2499–2504, Edmonton, Canada, 2005.

[15] Joan Solà, Teresa Vidal-Calleja, and Michel Devy. Undelayed initialization of line segments in monocular SLAM. In IEEE Int. Conf. on Intelligent Robots and Systems, Saint Louis, USA, 2009. To appear.



slamtb.m



Plücker line (HighLevel/userDataLin.m) http://en.wikipedia.org/wiki/Pl%C3%BCcker_coordinates http://www.cgafaq.info/wiki/Plucker_line_coordinates
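
For quick reference, the standard definition behind those links (not taken from the toolbox code): a 3D line through a point \mathbf{p} with direction \mathbf{d} can be written in Plücker coordinates as the 6-vector

\mathbf{L} = (\mathbf{d},\; \mathbf{m}), \qquad \mathbf{m} = \mathbf{p} \times \mathbf{d},

and a point \mathbf{x} lies on the line iff \mathbf{x} \times \mathbf{d} = \mathbf{m}; the moment \mathbf{m} does not depend on which point \mathbf{p} of the line is chosen.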


direct observation model
http://vismod.media.mit.edu/tech-reports/TR-451/node8.html
inverse observation model
http://vismod.media.mit.edu/tech-reports/TR-451/node9.html
( source: MIT Media Laboratory's Vision and Modeling group )
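
Roughly, in monocular SLAM terms (my summary of the linked notes using a standard pinhole model, not code from the toolbox): the direct observation model predicts a measurement from the state, while the inverse observation model initializes a landmark from a measurement plus an assumed (inverse) depth.

\text{direct:}\quad \mathbf{z} = h(\mathbf{R}, \mathbf{t}, \mathbf{p}) = \operatorname{proj}\!\big( \mathbf{R}^{\top} (\mathbf{p} - \mathbf{t}) \big),
\qquad \operatorname{proj}(X, Y, Z) = (X/Z,\; Y/Z)

\text{inverse:}\quad \mathbf{p} = g(\mathbf{R}, \mathbf{t}, \mathbf{z}, \rho) = \mathbf{t} + \tfrac{1}{\rho}\, \mathbf{R}\, \mathbf{m}(\mathbf{z}),

where (\mathbf{R}, \mathbf{t}) is the camera pose, \mathbf{m}(\mathbf{z}) the unit ray through pixel \mathbf{z}, and \rho the inverse depth.
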
posted by maetel
2009. 11. 23. 11:48 Computer Vision
Applying a particle filter to object tracking:
practice implementing a contour-tracking algorithm that tracks an object (a ball) with a three-dimensional particle filter


IplImage* cvRetrieveFrame(CvCapture* capture)

Gets the image grabbed with cvGrabFrame.

Parameter: capture – video capturing structure.

The function cvRetrieveFrame() returns the pointer to the image grabbed with the GrabFrame function. The returned image should not be released or modified by the user. In the event of an error, the return value may be NULL.
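
A minimal grab/retrieve loop with the legacy OpenCV C API, just to show the calling pattern (the file name is a placeholder; the full program further down follows the same structure):

#include <OpenCV/OpenCV.h> // legacy C API, as used in the program below

int main()
{
    CvCapture* capture = cvCaptureFromAVI("video.avi"); // placeholder file name
    if (!capture) return -1;

    cvNamedWindow("frame");
    while (cvGrabFrame(capture))                       // grab the next frame
    {
        IplImage* frame = cvRetrieveFrame(capture);    // decode it; do not release this pointer
        if (!frame) break;
        cvShowImage("frame", frame);
        if (cvWaitKey(10) == 27) break;                // stop on ESC
    }
    cvReleaseCapture(&capture);
    cvDestroyAllWindows();
    return 0;
}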



Canny edge detection

Canny edge detection in OpenCV image processing functions
void cvCanny(const CvArr* image, CvArr* edges, double threshold1, double threshold2, int aperture_size=3)

Implements the Canny algorithm for edge detection.

Parameters:
  • image – Single-channel input image
  • edges – Single-channel image to store the edges found by the function
  • threshold1 – The first threshold
  • threshold2 – The second threshold
  • aperture_size – Aperture parameter for the Sobel operator (see cvSobel())

The function cvCanny() finds the edges on the input image image and marks them in the output image edges using the Canny algorithm. The smallest value between threshold1 and threshold2 is used for edge linking, the largest value is used to find the initial segments of strong edges.
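
And a minimal sketch of a Canny call on a single grayscale image, using the same thresholds as the program below (the input file name is a placeholder):

#include <OpenCV/OpenCV.h>

int main()
{
    // load directly as a single-channel (grayscale) image
    IplImage* grey = cvLoadImage("input.png", CV_LOAD_IMAGE_GRAYSCALE); // placeholder file name
    if (!grey) return -1;

    IplImage* edges = cvCreateImage(cvGetSize(grey), IPL_DEPTH_8U, 1);
    cvCanny(grey, edges, 50, 110, 3); // low/high thresholds, Sobel aperture 3

    cvNamedWindow("edges");
    cvShowImage("edges", edges);
    cvWaitKey(0);

    cvReleaseImage(&edges);
    cvReleaseImage(&grey);
    return 0;
}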



source code
// 3-D Particle filter algorithm + Computer Vision exercise
// : object tracking - contour tracking
// lym, VIP Lab, Sogang Univ.
// 2009-11-23
// ref. Probabilistic Robotics: 98p

#include <OpenCV/OpenCV.h> // matrix operations & Canny edge detection
#include <iostream>
#include <cstdlib> // RAND_MAX
#include <ctime> // time as a random seed
#include <cmath>
#include <algorithm>
using namespace std;

#define PI 3.14159265
#define N 100 //number of particles
#define D 3 // dimension of the state

// uniform random number generator
double uniform_random(void) {
   
    return (double) rand() / (double) RAND_MAX;
   
}

// Gaussian random number generator
double gaussian_random(void) {
   
    static int next_gaussian = 0;
    static double saved_gaussian_value;
   
    double fac, rsq, v1, v2;
   
    if(next_gaussian == 0) {
       
        do {
            v1 = 2.0 * uniform_random() - 1.0;
            v2 = 2.0 * uniform_random() - 1.0;
            rsq = v1 * v1 + v2 * v2;
        }
        while(rsq >= 1.0 || rsq == 0.0);
        fac = sqrt(-2.0 * log(rsq) / rsq);
        saved_gaussian_value = v1 * fac;
        next_gaussian = 1;
        return v2 * fac;
    }
    else {
        next_gaussian = 0;
        return saved_gaussian_value;
    }
}

double normal_distribution(double mean, double standardDeviation, double state) {
   
    double variance = standardDeviation * standardDeviation;
   
    return exp(-0.5 * (state - mean) * (state - mean) / variance ) / sqrt(2 * PI * variance);
}
////////////////////////////////////////////////////////////////////////////

// distance between measurement and prediction
double distance(CvMat* measurement, CvMat* prediction)
{
    double distance2 = 0;
    double differance = 0;
    for (int u = 0; u < 3; u++)
    {
        differance = cvmGet(measurement,u,0) - cvmGet(prediction,u,0);
        distance2 += differance * differance;
    }
    return sqrt(distance2);
}

double distanceEuclidean(CvPoint2D64f a, CvPoint2D64f b)
{
    double d2 = (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
    return sqrt(d2);
}

// likelihood based on multivariate normal distribution (ref. 15p eqn(2.4))
double likelihood(CvMat *mean, CvMat *covariance, CvMat *sample) {
   
    CvMat* diff = cvCreateMat(3, 1, CV_64FC1);
    cvSub(sample, mean, diff); // sample - mean -> diff
    CvMat* diff_t = cvCreateMat(1, 3, CV_64FC1);
    cvTranspose(diff, diff_t); // transpose(diff) -> diff_t
    CvMat* cov_inv = cvCreateMat(3, 3, CV_64FC1);
    cvInvert(covariance, cov_inv); // inverse(covariance) -> cov_inv
    CvMat* tmp = cvCreateMat(3, 1, CV_64FC1);
    CvMat* dist = cvCreateMat(1, 1, CV_64FC1);
    cvMatMul(cov_inv, diff, tmp); // cov_inv * diff -> tmp   
    cvMatMul(diff_t, tmp, dist); // diff_t * tmp -> dist
   
    double likeliness = exp( -0.5 * cvmGet(dist, 0, 0) );
    double bound = 0.0000001;
    if ( likeliness < bound )
    {
        likeliness = bound;
    }
    return likeliness;
//    return exp( -0.5 * cvmGet(dist, 0, 0) );
//    return max(0.0000001, exp(-0.5 * cvmGet(dist, 0, 0)));   
}

// likelihood based on normal distribution (ref. 14p eqn(2.3))
double likelihood(double distance, double standardDeviation) {
   
    double variance = standardDeviation * standardDeviation;
   
    return exp(-0.5 * distance*distance / variance ) / sqrt(2 * PI * variance);
}

int main (int argc, char * const argv[]) {
   
    srand(time(NULL));
   
    IplImage *iplOriginalColor; // image to be captured
    IplImage *iplOriginalGrey; // grey-scale image of "iplOriginalColor"
    IplImage *iplEdge; // image detected by Canny edge algorithm
    IplImage *iplImg; // resulting image to show tracking process   
    IplImage *iplEdgeClone;

    int hours, minutes, seconds;
    double frame_rate, Codec, frame_count, duration;
    char fnVideo[200], titleOriginal[200], titleEdge[200], titleResult[200];
   
    sprintf(titleOriginal, "original");
    sprintf(titleEdge, "Edges by Canny detector");
//    sprintf(fnVideo, "E:/AVCHD/BDMV/STREAM/00092.avi");   
    sprintf(fnVideo, "/Users/lym/Documents/VIP/2009/Nov/volleyBall.mov");
    sprintf(titleResult, "3D Particle filter for contour tracking");
   
    CvCapture *capture = cvCaptureFromAVI(fnVideo);
   
    // stop the process if capture is failed
    if(!capture) { printf("Can NOT read the movie file\n"); return -1; }
   
    frame_rate = cvGetCaptureProperty(capture, CV_CAP_PROP_FPS);
//    Codec = cvGetCaptureProperty( capture, CV_CAP_PROP_FOURCC );
    frame_count = cvGetCaptureProperty( capture, CV_CAP_PROP_FRAME_COUNT);
   
    duration = frame_count/frame_rate;
    hours = duration/3600;
    minutes = (duration-hours*3600)/60;
    seconds = duration-hours*3600-minutes*60;
   
    //  stop the process if grabbing is failed
    //    if(cvGrabFrame(capture) == 0) { printf("Can NOT grab a frame\n"); return -1; }
   
    cvSetCaptureProperty(capture, CV_CAP_PROP_POS_FRAMES, 0); // go to frame #0
    iplOriginalColor = cvRetrieveFrame(capture);
    iplOriginalGrey = cvCreateImage(cvGetSize(iplOriginalColor), 8, 1);
    iplEdge = cvCloneImage(iplOriginalGrey);
    iplEdgeClone = cvCreateImage(cvSize(iplOriginalColor->width, iplOriginalColor->height), 8, 3);
    iplImg = cvCreateImage(cvSize(iplOriginalColor->width, iplOriginalColor->height), 8, 3);   
   
    int width = iplOriginalColor->width;
    int height = iplOriginalColor->height;
   
    cvNamedWindow(titleOriginal);
    cvNamedWindow(titleEdge);
   
    cout << "image width : height = " << width << "  " << height << endl;
    cout << "# of frames = " << frame_count << endl;   
    cout << "capture finished" << endl;   
   
   
    // set the system   
   
    // set the process noise
    // covariance of Gaussian noise to control
    CvMat* transition_noise = cvCreateMat(D, D, CV_64FC1);
    // assume the transition noise
    for (int row = 0; row < D; row++)
    {   
        for (int col = 0; col < D; col++)
        {
            cvmSet(transition_noise, row, col, 0.0);
        }
    }
    cvmSet(transition_noise, 0, 0, 3.0);
    cvmSet(transition_noise, 1, 1, 3.0);
    cvmSet(transition_noise, 2, 2, 0.3);
   
    // set the measurement noise
/*
    // covariance of Gaussian noise to measurement
     CvMat* measurement_noise = cvCreateMat(D, D, CV_64FC1);
     // initialize the measurement noise
     for (int row = 0; row < D; row++)
     {   
        for (int col = 0; col < D; col++)
        {
            cvmSet(measurement_noise, row, col, 0.0);
        }
     }
     cvmSet(measurement_noise, 0, 0, 5.0);
     cvmSet(measurement_noise, 1, 1, 5.0);
     cvmSet(measurement_noise, 2, 2, 5.0); 
 */
    double measurement_noise = 2.0; // standard deviation of Gaussian noise to measurement
   
    CvMat* state = cvCreateMat(D, 1, CV_64FC1);    // state of the system to be estimated   
//    CvMat* measurement = cvCreateMat(2, 1, CV_64FC1); // measurement of states
   
    // declare particles
    CvMat* pb [N]; // estimated particles
    CvMat* pp [N]; // predicted particles
    CvMat* pu [N]; // temporary variables to update a particle
    CvMat* v[N]; // estimated velocity of each particle
    CvMat* vu[N]; // temporary variable to update the velocity
    double w[N]; // weight of each particle
    for (int n = 0; n < N; n++)
    {
        pb[n] = cvCreateMat(D, 1, CV_64FC1);
        pp[n] = cvCreateMat(D, 1, CV_64FC1);
        pu[n] = cvCreateMat(D, 1, CV_64FC1);   
        v[n] = cvCreateMat(D, 1, CV_64FC1);   
        vu[n] = cvCreateMat(D, 1, CV_64FC1);           
    }   
   
    // initialize the state and particles    
    for (int n = 0; n < N; n++)
    {
        cvmSet(state, 0, 0, 258.0); // center-x
        cvmSet(state, 1, 0, 406.0); // center-y       
        cvmSet(state, 2, 0, 38.0); // radius   
       
//        cvmSet(state, 0, 0, 300.0); // center-x
//        cvmSet(state, 1, 0, 300.0); // center-y       
//        cvmSet(state, 2, 0, 38.0); // radius       
       
        cvmSet(pb[n], 0, 0, cvmGet(state,0,0)); // center-x
        cvmSet(pb[n], 1, 0, cvmGet(state,1,0)); // center-y
        cvmSet(pb[n], 2, 0, cvmGet(state,2,0)); // radius
       
        cvmSet(v[n], 0, 0, 2 * uniform_random()); // center-x
        cvmSet(v[n], 1, 0, 2 * uniform_random()); // center-y
        cvmSet(v[n], 2, 0, 0.1 * uniform_random()); // radius       
       
        w[n] = (double) 1 / (double) N; // equally weighted particle
    }
   
    // initialize the image window
    cvZero(iplImg);   
    cvNamedWindow(titleResult);
   
    cout << "start filtering... " << endl << endl;
   
    float aperture = 3,     thresLow = 50,     thresHigh = 110;   
//    float aperture = 3,     thresLow = 80,     thresHigh = 110;   
    // for each frame
    int frameNo = 0;   
    while(frameNo < frame_count && cvGrabFrame(capture)) {
        // retrieve color frame from the movie "capture"
        iplOriginalColor = cvRetrieveFrame(capture);        
        // convert color pixel values of "iplOriginalColor" to grey scales of "iplOriginalGrey"
        cvCvtColor(iplOriginalColor, iplOriginalGrey, CV_RGB2GRAY);               
        // extract edges with Canny detector from "iplOriginalGrey" to save the results in the image "iplEdge" 
        cvCanny(iplOriginalGrey, iplEdge, thresLow, thresHigh, aperture);

        cvCvtColor(iplEdge, iplEdgeClone, CV_GRAY2BGR);
       
        cvShowImage(titleOriginal, iplOriginalColor);
        cvShowImage(titleEdge, iplEdge);

//        cvZero(iplImg);
       
        cout << "frame # " << frameNo << endl;
       
        double like[N]; // likelihood between measurement and prediction
        double like_sum = 0; // sum of likelihoods
       
        for (int n = 0; n < N; n++) // for "N" particles
        {
            // predict
            double prediction;
            for (int row = 0; row < D; row++)
            {
                prediction = cvmGet(pb[n],row,0) + cvmGet(v[n],row,0)
                            + cvmGet(transition_noise,row,row) * gaussian_random();
                cvmSet(pp[n], row, 0, prediction);
            }
            if ( cvmGet(pp[n],2,0) < 2) { cvmSet(pp[n],2,0,0.0); }
//            cvLine(iplImg, cvPoint(cvRound(cvmGet(pp[n],0,0)), cvRound(cvmGet(pp[n],1,0))),
//             cvPoint(cvRound(cvmGet(pb[n],0,0)), cvRound(cvmGet(pb[n],1,0))), CV_RGB(100,100,0), 1);           
            cvCircle(iplEdgeClone, cvPoint(cvRound(cvmGet(pp[n],0,0)), cvRound(cvmGet(pp[n],1,0))), cvRound(cvmGet(pp[n],2,0)), CV_RGB(255, 255, 0));
//            cvCircle(iplImg, cvPoint(iplImg->width *0.5, iplImg->height * 0.5), 100, CV_RGB(255, 255, 0), -1);
//            cvSaveImage("a.bmp", iplImg);

            double cX = cvmGet(pp[n], 0, 0); // predicted center-x of the object
            double cY = cvmGet(pp[n], 1, 0); // predicted center-y of the object
            double cR = cvmGet(pp[n], 2, 0); // predicted radius of the object           

            if ( cR < 0 ) { cR = 0; }
           
            // measure
            // search measurements
            CvPoint2D64f direction [8]; // 8 searching directions
            // define 8 starting points in each direction
            direction[0].x = cX + cR;    direction[0].y = cY;      // East
            direction[2].x = cX;        direction[2].y = cY - cR; // North
            direction[4].x = cX - cR;    direction[4].y = cY;      // West
            direction[6].x = cX;        direction[6].y = cY + cR; // South
            int cD = cvRound( cR/sqrt(2.0) );
            direction[1].x = cX + cD;    direction[1].y = cY - cD; // NE
            direction[3].x = cX - cD;    direction[3].y = cY - cD; // NW
            direction[5].x = cX - cD;    direction[5].y = cY + cD; // SW
            direction[7].x = cX + cD;    direction[7].y = cY + cD; // SE       
           
            CvPoint2D64f search [8];    // searched point in each direction         
            double scale = 0.4;
            double scope [8]; // scope of searching
   
            for ( int i = 0; i < 8; i++ )
            {
//                scope[2*i] = cR * scale;
//                scope[2*i+1] = cD * scale;
                scope[i] = 6.0;
            }
           
            CvPoint d[8];
            d[0].x = 1;        d[0].y = 0; // E
            d[1].x = 1;        d[1].y = -1; // NE
            d[2].x = 0;        d[2].y = 1; // N
            d[3].x = 1;        d[3].y = 1; // NW
            d[4].x = 1;        d[4].y = 0; // W
            d[5].x = 1;        d[5].y = -1; // SW
            d[6].x = 0;        d[6].y = 1; // S
            d[7].x = 1;        d[7].y = 1; // SE           
           
            int count = 0; // number of measurements
            double dist_sum = 0;
           
            for (int i = 0; i < 8; i++) // for 8 directions
            {
                double dist = scope[i] * 1.5;
                for ( int range = 0; range < scope[i]; range++ )
                {
                    int flag = 0;
                    for (int turn = -1; turn <= 1; turn += 2) // reverse the searching direction
                    {
                        search[i].x = direction[i].x + turn * range * d[i].x;
                        search[i].y = direction[i].y + turn * range * d[i].y;
                       
//                        cvCircle(iplImg, cvPoint(cvRound(search[i].x), cvRound(search[i].y)), 2, CV_RGB(0, 255, 0), -1);
//                        cvShowImage(titleResult, iplImg);
//                        cvWaitKey(100);

                        // detect measurements   
//                        CvScalar s = cvGet2D(iplEdge, cvRound(search[i].y), cvRound(search[i].x));
                        unsigned char s = CV_IMAGE_ELEM(iplEdge, unsigned char, cvRound(search[i].y), cvRound(search[i].x));
//                        if ( s.val[0] > 200 && s.val[1] > 200 && s.val[2] > 200 ) // bgr color               
                        if (s > 250) // bgr color                           
                        { // when the pixel value is white, that means a measurement is detected
                            flag = 1;
                            count++;
//                            cvCircle(iplEdgeClone, cvPoint(cvRound(search[i].x), cvRound(search[i].y)), 3, CV_RGB(200, 0, 255));
//                            cvShowImage("3D Particle filter", iplEdgeClone);
//                            cvWaitKey(1);
/*                            // get measurement
                            cvmSet(measurement, 0, 0, search[i].x);
                            cvmSet(measurement, 1, 0, search[i].y);   
                            double dist = distance(measurement, pp[n]);
*/                            // evaluate the difference between predictions of the particle and measurements
                            dist = distanceEuclidean(search[i], direction[i]);
                            break; // break for "turn"
                        } // end if
                    } // for turn
                    if ( flag == 1 )
                    { break; } // break for "range"
                } // for range
               
                dist_sum += dist; // for all searching directions of one particle 

            } // for i direction
           
            double dist_avg; // average distance of measurements from the one particle "n"
//            cout << "count = " << count << endl;
            dist_avg = dist_sum / 8;
//            cout << "dist_avg = " << dist_avg << endl;
           
//            estimate likelihood with "dist_avg"
            like[n] = likelihood(dist_avg, measurement_noise);
//            cout << "likelihood of particle-#" << n << " = " << like[n] << endl;
            like_sum += like[n];   
        } // for n particle
//        cout << "sum of likelihoods of N particles = " << like_sum << endl;
       
        // estimate states       
        double state_x = 0.0;
        double state_y = 0.0;
        double state_r = 0.0;
        // estimate the state with predicted particles
        for (int n = 0; n < N; n++) // for "N" particles
        {
            w[n] = like[n] / like_sum; // update normalized weights of particles           
//            cout << "w" << n << "= " << w[n] << "  ";               
            state_x += w[n] * cvmGet(pp[n], 0, 0); // center-x of the object
            state_y += w[n] * cvmGet(pp[n], 1, 0); // center-y of the object
            state_r += w[n] * cvmGet(pp[n], 2, 0); // radius of the object           
        }
        if (state_r < 0) { state_r = 0; }
        cvmSet(state, 0, 0, state_x);
        cvmSet(state, 1, 0, state_y);       
        cvmSet(state, 2, 0, state_r);
       
        cout << endl << "* * * * * *" << endl;       
        cout << "estimation: (x,y,r) = " << cvmGet(state,0,0) << ",  " << cvmGet(state,1,0)
        << ",  " << cvmGet(state,2,0) << endl;
        cvCircle(iplEdgeClone, cvPoint(cvRound(cvmGet(state,0,0)), cvRound(cvmGet(state,1,0)) ),
                 cvRound(cvmGet(state,2,0)), CV_RGB(255, 0, 0), 1);

        cvShowImage(titleResult, iplEdgeClone);
        cvWaitKey(1);

   
        // update particles       
        cout << endl << "updating particles" << endl;
        double a[N]; // portion between particles
       
        // define cumulative weights of the particles; 0 < a[0] < a[1] < ... < a[N-1] = 1
        a[0] = w[0];
        for (int n = 1; n < N; n++)
        {
            a[n] = a[n - 1] + w[n];
//            cout << "a" << n << "= " << a[n] << "  ";           
        }
//        cout << "a" << N << "= " << a[N] << "  " << endl;           
       
        for (int n = 0; n < N; n++)
        {   
            // select a particle from the distribution:
            // draw one uniform sample and take the first particle whose
            // cumulative weight exceeds it
            int pselected = N - 1;
            double r = uniform_random();
            for (int k = 0; k < N; k++)
            {
                if ( r < a[k] )               
                {
                    pselected = k;
                    break;
                }
            }
//            cout << "p " << n << " => " << pselected << "  ";       
           
            // retain the selection 
            for (int row = 0; row < D; row++)
            {
                cvmSet(pu[n], row, 0, cvmGet(pp[pselected],row,0));
                cvSub(pp[pselected], pb[pselected], vu[n]); // pp - pb -> vu
            }
        }
       
        // updated each particle and its velocity
        for (int n = 0; n < N; n++)
        {
            for (int row = 0; row < D; row++)
            {
                cvmSet(pb[n], row, 0, cvmGet(pu[n],row,0));
                cvmSet(v[n], row , 0, cvmGet(vu[n],row,0));
            }
        }
        cout << endl << endl ;
       
//      cvShowImage(titleResult, iplImg);  
//        cvWaitKey(1000);       
        cvWaitKey(1);       
        frameNo++;
    }
   
    cvWaitKey();   
   
    return 0;
}








posted by maetel
2009. 11. 8. 16:31 Computer Vision
Branislav Kisačanin & Vladimir Pavlović & Thomas S. Huang
Real-Time Vision for Human-Computer Interaction
(RTV4HCI)
Springer, 2005
(google book's overview)

2004 IEEE CVPR Workshop on RTV4HCI - Papers
http://rtv4hci.rutgers.edu/04/


Computer vision and pattern recognition continue to play a dominant role in the HCI realm. However, computer vision methods often fail to become pervasive in the field due to the lack of real-time, robust algorithms, and novel and convincing applications.

Keywords:
head and face modeling
map building
pervasive computing
real-time detection

Contents:
RTV4HCI: A Historical Overview.
- Real-Time Algorithms: From Signal Processing to Computer Vision.
- Recognition of Isolated Fingerspelling Gestures Using Depth Edges.
- Appearance-Based Real-Time Understanding of Gestures Using Projected Euler Angles.
- Flocks of Features for Tracking Articulated Objects.
- Static Hand Posture Recognition Based on Okapi-Chamfer Matching.
- Visual Modeling of Dynamic Gestures Using 3D Appearance and Motion Features.
- Head and Facial Animation Tracking Using Appearance-Adaptive Models and Particle Filters.
- A Real-Time Vision Interface Based on Gaze Detection -- EyeKeys.
- Map Building from Human-Computer Interactions.
- Real-Time Inference of Complex Mental States from Facial Expressions and Head Gestures.
- Epipolar Constrained User Pushbutton Selection in Projected Interfaces.
- Vision-Based HCI Applications.
- The Office of the Past.
- MPEG-4 Face and Body Animation Coding Applied to HCI.
- Multimodal Human-Computer Interaction.
- Smart Camera Systems Technology Roadmap.
- Index.




RTV4HCI: A Historical Overview
Matthew Turk (mturk@cs.ucsb.edu)
University of California, Santa Barbara
http://www.stanford.edu/~mturk/
http://www.cs.ucsb.edu/~mturk/

The goal of research in real-time vision for human-computer interaction is to develop algorithms and systems that sense and perceive humans and human activity, in order to enable more natural, powerful, and effective computer interfaces.

Computers in the Human Interaction Loop (CHIL)

perceptual interfaces
multimodal interfaces
post-WIMP(windows, icons, menus, pointer) interfaces

implicit user awareness or explicit user control

The user interface
- the software and devices that implement a particular model (or set of models) of HCI

Computer vision technologies must ultimately deliver a better "user experience".

B Shneiderman, Designing the User Interface: Strategies for Effective Human-Computer Interaction, Third Edition, Addison-Wesley, 1998.
: 1) time to learn 2) speed of performance 3) user error rates 4) retention over time 5) subjective satisfaction

- Presence and location (Face and body detection, head and body tracking)
- Identity (Face recognition, gait recognition)
- Expression (Facial feature tracking, expression modeling and analysis)
- Focus of attention (Head/face tracking, eye gaze tracking)
- Body posture and movement (Body modeling and tracking)
- Gesture (Gesture recognition, hand tracking)
- Activity (Analysis of body movement)

eg.
VIDEOPLACE (M W Krueger, Artificial Reality II, Addison-Wesley, 1991)
Magic Morphin Mirror / Mass Hallucinations (T Darrell et al., SIGGRAPH Visual Proc, 1997)

Principal Component Analysis (PCA)
Linear Discriminant Analysis (LDA)
Gabor Wavelet Networks (GWNs)
Active Appearance Models (AAMs)
Hidden Markov Models (HMMs)

Identix Inc.
Viisage Technology Inc.
Cognitec Systems


- MIT Medial Lab
ALIVE system (P Maes et al., The ALIVE system: wireless, full-body interaction with autonomous agents, ACM Multimedia Systems, 1996)
PFinder system (C R Wren et al., Pfinder: Real-time tracking of the human body, IEEE Trans PAMI, pp 780-785, 1997)
KidsRoom project (A Bobick et al., The KidsRoom: A perceptually-based interactive and immersive story environment, PRESENCE: Teleoperators and Virtual Environments, pp 367-391, 1999)




Flocks of Features for Tracking Articulated Objects
Mathias Kolsch (kolsch@nps.edu)
Computer Science Department, Naval Postgraduate School, Monterey
Matthew Turk (mturk@cs.ucsb.edu)
Computer Science Department, University of California, Santa Barbara




Visual Modeling of Dynamic Gestures Using 3D Appearance and Motion Features
Guangqi Ye (grant@cs.jhu.edu), Jason J. Corso, Gregory D. Hager
Computational Interaction and Robotics Laboratory
The Johns Hopkins University



Map Building from Human-Computer Interactions
http://groups.csail.mit.edu/lbr/mars/pubs/pubs.html#publications
Artur M. Arsenio (arsenio@csail.mit.edu)
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology



Vision-Based HCI Applications
Eric Petajan (eric@f2f-inc.com)
face2face animation, inc.



The Office of the Past
Jiwon Kim (jwkim@cs.washington.edu), Steven M. Seitz (seitz@cs.washington.edu)
University of Washington
Maneesh Agrawala (maneesh@microsoft.com)
Microsoft Research
Proceedings of the 2004 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'04), Volume 10, Page 157, Year of Publication: 2004
http://desktop.google.com
http://grail.cs.washington.edu/projects/office/
http://www.realvnc.com/



Smart Camera Systems Technology Roadmap
Bruce Flinchbaugh (b-flinchbaugh@ti.com)
Texas Instruments

posted by maetel
2009. 10. 29. 19:03 Computer Vision
M. Armstrong and A. Zisserman,
“Robust object tracking,”
in Proc 2nd Asian Conference on Computer Vision, 1995, vol. I.
Springer, 1996, pp. 58–62.


Abstract
We describe an object tracker robust to a number of ambient conditions which often severely degrade performance, for example partial occlusion. The robustness is achieved by describing the object as a set of related geometric primitives (lines, conics, etc.), and using redundant measurements to facilitate the detection of outliers. This improves the overall tracking performance. Results are given for frame rate tracking on image sequences.


posted by maetel
2009. 10. 27. 23:31 Computer Vision
R. L. Thompson, I. D. Reid, L. A. Munoz, and D. W. Murray,
“Providing synthetic views for teleoperation using visual pose tracking in multiple cameras,”
IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 31, no. 1, pp. 43–54, 2001.

Abstract - This paper describes a visual tool for teleoperative experimentation involving remote manipulation and contact tasks. Using modest hardware, it recovers in real-time the pose of moving polyhedral objects, and presents a synthetic view of the scene to the teleoperator using any chosen viewpoint and viewing direction. The method of line tracking introduced by Harris is extended to multiple calibrated cameras, and afforced by robust methods and iterative filtering. Experiments are reported which determine the static and dynamic performance of the vision system, and its use in teleoperation is illustrated in two experiments, a peg in hole manipulation task and an impact control task.


Line tracking 
http://en.wikipedia.org/wiki/Passive_radar#Line_tracking
The line-tracking step refers to the tracking of target returns from individual targets, over time, in the range-Doppler space produced by the cross-correlation processing. A standard Kalman filter is typically used. Most false alarms are rejected during this stage of the processing.


- Three difficulties using the Harris tracker
First it was found to be easily broken by occlusions and changing lighting. Robust methods to mitigate this problem have been investigated monocularly by Armstrong and Zisserman [20], [21]. Although this has a marked effect on tracking performance, the second problem found is that the accuracy of the pose recovered in a single camera was poor, with evident correlation between depth and rotation about axes parallel to the image plane. Maitland and Harris [22] had already noted as much when recovering the pose of a pointing device destined for neurosurgical application [23].
They reported much improved accuracy using two cameras; but the object was stationary, had an elaborate pattern drawn on it and was visible at all times to both cameras. The third difficulty, or rather uncertainty, was that the convergence properties and dynamic performances of the monocular and multicamera methods were largely unreported.


"Harris' RAPiD tracker included a constant velocity Kalman filter."


posted by maetel
2009. 10. 27. 14:40 Computer Vision
Harris' RAPiD
C. Harris and C. Stennett, “Rapid - a video rate object tracker,” in Proc 1st British Machine Vision Conference, Sep 1990, pp. 73–77.


ref.
C. Harris, “Tracking with rigid models,” in Active Vision, A. Blake and A. Yuille, Eds. MIT Press, 1992, pp. 59–73.

RAPID (Real-time Attitude and Position Determination) is a real-time model-based tracking algorithm for a known three dimensional object executing arbitrary motion, and viewed by a single video-camera. The 3D object model consists of selected control points on high contrast edges, which can be surface markings, folds or profile edges.
The use of either an alpha-beta tracker or a Kalman filter permits large object motion to be tracked and produces more stable tracking results. The RAPID tracker runs at video-rate on a standard minicomputer equipped with an image capture board.
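
Loosely, the core of the RAPiD update (my paraphrase of the standard description, not an equation copied from the paper): each control point contributes a one-dimensional measurement l_i, the perpendicular offset from the projected control point to the nearest image edge, and to first order that offset is linear in the small pose change, so all control points together give a linear least-squares problem:

l_i \approx \mathbf{a}_i^{\top}\, \delta\mathbf{p},
\qquad
\delta\hat{\mathbf{p}} = \arg\min_{\delta\mathbf{p}} \sum_i \big( l_i - \mathbf{a}_i^{\top} \delta\mathbf{p} \big)^2,

where \delta\mathbf{p} is the six-vector of small rotation and translation and \mathbf{a}_i a known coefficient vector determined by the projection geometry at control point i.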

alpha-beta tracker
http://en.wikipedia.org/wiki/Alpha_beta_filter

Kalman filter
http://en.wikipedia.org/wiki/Kalman_filter
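
A minimal one-dimensional alpha-beta tracker with a constant-velocity model and fixed gains, as a sketch of the generic filter behind the first link (the gains and toy data are made up for illustration, not taken from the RAPiD paper):

#include <cstdio>

// One-dimensional alpha-beta tracker: constant-velocity prediction plus
// fixed-gain correction from each new measurement.
struct AlphaBeta {
    double x, v;        // estimated position and velocity
    double alpha, beta; // fixed gains

    void update(double z, double dt) {
        double x_pred = x + v * dt; // predict assuming constant velocity
        double r = z - x_pred;      // innovation (measurement residual)
        x = x_pred + alpha * r;     // correct position
        v = v + (beta / dt) * r;    // correct velocity
    }
};

int main() {
    AlphaBeta f = {0.0, 0.0, 0.85, 0.005};
    const double zs[] = {0.1, 1.2, 1.9, 3.2, 4.1}; // toy measurements
    for (double z : zs) {
        f.update(z, 1.0);
        std::printf("x = %.3f  v = %.3f\n", f.x, f.v);
    }
    return 0;
}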




posted by maetel
2009. 10. 26. 21:35 Computer Vision

Avoiding moving outliers in visual SLAM by tracking moving objects


Wangsiripitak, S.   Murray, D.W.  
Dept. of Eng. Sci., Univ. of Oxford, Oxford, UK;

This paper appears in: Robotics and Automation, 2009. ICRA '09. IEEE International Conference on
Publication Date: 12-17 May 2009
On page(s): 375-380
ISSN: 1050-4729
ISBN: 978-1-4244-2788-8
INSPEC Accession Number: 10748966
Digital Object Identifier: 10.1109/ROBOT.2009.5152290
Current Version Published: 2009-07-06


http://www.robots.ox.ac.uk/~lav//Research/Projects/2009somkiat_slamobj/project.html

Abstract

parallel implementation of monoSLAM with a 3D object tracker
information to register objects to the map's frame
the recovered geometry

I. Introduction

approaches to handling movement in the environment
segmentation between static and moving features
outlying moving points

1) active search -> sparse maps
2) robust methods -> multifocal tensors
3-1) tracking known 3D objects in the scene
  -2) determining whether they are moving
  -3) using their convex hulls to mask out features

"Knowledge that they are occluded rather than unreliable avoids the need to invoke the somewhat cumbersome process of feature deletion, followed later perhaps by unnecessary reinitialization."

[15] H. Zhou and S. Sakane, “Localizing objects during robot SLAM in semi-dynamic environments,” in Proc of the 2008 IEEE/ASME Int Conf on Advanced Intelligent Mechatronics, 2008, pp. 595–601.

"[15] noted that movement is likely to associated with objects in the scene, and classified them according to the likelihood that they would move."

the use of 3D objects for reasoning about motion segmentation and occlusion

occlusion masks

II. Underlying Processes
A. Visual SLAM

Monocular visual SLAM - EKF

idempotent 멱등(冪等)
http://en.wikipedia.org/wiki/Idempotence
Idempotence describes the property of operations in mathematics and computer science that means that multiple applications of the operation do not change the result.
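In symbols (a generic example, not from the paper): f(f(x)) = f(x) for all x, e.g. a projection matrix P with P^2 = P.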

http://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation
http://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles
http://en.wikipedia.org/wiki/Quaternion
http://en.wikipedia.org/wiki/Euler_Angles
Berthold K.P. Horn, "Some Notes on Unit Quaternions and Rotation"

"Standard monocular SLAM takes no account of occlusion."

B. Object pose tracking

Harris' RAPiD
[17] C. Harris and C. Stennett, “Rapid - a video rate object tracker,” in Proc 1st British Machine Vision Conference, Sep 1990, pp. 73–77
[20] C. Harris, “Tracking with rigid models,” in Active Vision, A. Blake and A. Yuille, Eds. MIT Press, 1992, pp. 59–73.

"(RAPiD makes the assumption that the pose change required between current and new estimates is sufficiently small, first, to allow a linearization of the solution and, second, to make trivial the problem of inter-image correspondence.) The correspondences used are between predicted point to measured image edge, allowing search in 1D rather than 2D within the image. This makes very sparing use of image data — typically only several hundred pixels per image are addressed."

aperture problem
http://en.wikipedia.org/wiki/Motion_perception
http://focus.hms.harvard.edu/2001/Mar9_2001/research_briefs.html

[21] R. L. Thompson, I. D. Reid, L. A. Munoz, and D. W. Murray, “Providing synthetic views for teleoperation using visual pose tracking in multiple cameras,” IEEE Transactions on Systems, Man and
Cybernetics, Part A, vol. 31, no. 1, pp. 43–54, 2001.
- "Three difficulties using the Harris tracker":
(1)First it was found to be easily broken by occlusions and changing lighting. Robust methods to mitigate this problem have been investigated monocularly by Armstrong and Zisserman. (2)Although this has a marked effect on tracking performance, the second problem found is that the accuracy of the pose recovered in a single camera was poor, with evident correlation between depth and rotation about axes parallel to the image plane. Maitland and Harris had already noted as much when recovering the pose of a pointing device destined for neurosurgical application. They reported much improved accuracy using two cameras; but the object was stationary, had an elaborate pattern drawn on it and was visible at all times to both cameras. (3)The third difficulty, or rather uncertainty, was that the convergence properties and dynamic performances of the monocular and multicamera methods were largely unreported.
(3) : little solution
(2) => [21] "recovered pose using 3 iterations of the pose update cycle per image"
(1) => [21], [22] : search -> matching -> weighting

[22] M. Armstrong and A. Zisserman, “Robust object tracking,” in Proc 2nd Asian Conference on Computer Vision, 1995, vol. I. Springer, 1996, pp. 58–62.

RANSAC
[23] M. Fischler and R. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, June 1981.

Least median of squares as the underlying standard deviation is unknown
[24] P. J. Rousseeuw, “Least median of squares regression,” Journal of the American Statistical Association, vol. 79, no. 388, pp. 871–880, 1984.



III. MonoSLAM with Tracked Objects
A. Information from SLAM to the object tracker


B. Information from the object tracker to SLAM


"The convex hull is uniformly dilated by an amount that corresponds to the projection of the typical change in pose."




posted by maetel
2009. 9. 16. 22:04 Computer Vision
Denis Chekhlov, Andrew Gee, Andrew Calway, Walterio Mayol-Cuevas
Ninja on a Plane: Automatic Discovery of Physical Planes for Augmented Reality Using Visual SLAM
http://dx.doi.org/10.1109/ISMAR.2007.4538840

demo: Ninja on A Plane: Discovering Planes in SLAM for AR

http://www.cs.bris.ac.uk/Publications/pub_master.jsp?id=2000745
posted by maetel
2009. 8. 24. 16:47 Computer Vision
Davison, A. J., Reid, I. D., Molton, N. D., and Stasse, O. 2007.
MonoSLAM: Real-Time Single Camera SLAM.
IEEE Trans. Pattern Anal. Mach. Intell. 29, 6 (Jun. 2007), 1052-1067.
DOI= http://dx.doi.org/10.1109/TPAMI.2007.1049

posted by maetel
2009. 8. 19. 00:35 Computer Vision
In defense of the eight-point algorithm
Hartley, R.I.  
Corp. Res. & Dev., Gen. Electr. Co., Schenectady, NY, USA;
This paper appears in: Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publication Date: June 1997
Volume: 19 , Issue: 6
On page(s): 580 - 593






Zhengyou Zhang

posted by maetel
2009. 8. 18. 21:29 Computer Vision
Nistér, D. 2004. An Efficient Solution to the Five-Point Relative Pose Problem. IEEE Trans. Pattern Anal. Mach. Intell. 26, 6 (Jun. 2004), 756-770. DOI= http://dx.doi.org/10.1109/TPAMI.2004.17

This paper appears in: Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publication Date: June 2004
Volume: 26,  Issue: 6
On page(s): 756-770

David Nistér
Sarnoff Corporation
Center for Visualization and Virtual Environments and the Computer Science Department of University of Kentucky


H. Stewénius, C. Engels, and D. Nistér. Recent developments on direct relative orientation.
ISPRS Journal of Photogrammetry and Remote Sensing, 60:284-294, June 2006.


Calibrated Fivepoint Solver
http://www.vis.uky.edu/~dnister/Executables/RelativeOrientation/


Dhruv Batra, Bart Nabbe, and Martial Hebert. An Alternative Formulation for the Five Point Relative Pose Problem. IEEE Workshop on Motion and Video Computing 2007 (WMVC '07).
http://www.ece.cmu.edu/~dbatra/research/fivept/fivept.html


Five-Point Motion Estimation Made Easy
Hongdong Li and Richard Hartley (RSISE, The Australian National University. Canberra Research Labs, National ICT Australia.)








SfM = structure from motion
http://en.wikipedia.org/wiki/Structure_from_motion

eight-point algorithm
http://en.wikipedia.org/wiki/Eight-point_algorithm
algorithm used in computer vision to estimate the essential matrix or the fundamental matrix related to a stereo camera pair from a set of corresponding image points
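
The constraint those eight points are used to solve, written out (standard epipolar geometry, cf. Hartley & Zisserman ch. 9): each correspondence gives one linear equation in the entries of F, so eight correspondences determine F up to scale by a linear solve; Hartley's point in the paper above is that normalizing the image coordinates first (translate to zero mean, scale so the average distance from the origin is sqrt(2)) makes this linear method numerically well behaved.

\mathbf{x}'^{\top} \mathbf{F}\, \mathbf{x} = 0
\quad\Longrightarrow\quad
\big( x'x,\; x'y,\; x',\; y'x,\; y'y,\; y',\; x,\; y,\; 1 \big)\, \mathbf{f} = 0,

with \mathbf{f} the nine entries of \mathbf{F} stacked; stacking eight (or more) correspondences gives A\mathbf{f} = 0, solved via SVD and followed by enforcing \operatorname{rank}(\mathbf{F}) = 2.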

Richard Hartley and Andrew Zisserman (2003). Multiple View Geometry in computer vision. Cambridge University Press.
http://www.robots.ox.ac.uk/~vgg/hzbook/
ch.8 - ch.10

Richard I. Hartley (June 1997). "In Defense of the Eight-Point Algorithm". IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (6): 580–593. doi:10.1109/34.601246.

Richard Szeliski, Microsoft Research
Computer Vision: Algorithms and Applications
ch.7 Structure from Motion

Epipolar geometry - Fundamental matrix


posted by maetel
2009. 8. 5. 14:36 Computer Vision
PTAM (Parallel Tracking and Mapping for Small AR Workspaces), developed by the Active Vision Group at the University of Oxford
http://www.robots.ox.ac.uk/~gk/PTAM/

Questions? E-mail ptam@robots.ox.ac.uk


0. Checking the requirements

Checking the processor and graphics card as the readme file suggests.

Specs of the machine I am installing on:
Model Name:    Mac mini
  Model Identifier:    Macmini3,1
  Processor Name:    Intel Core 2 Duo
  Processor Speed:    2 GHz
  Number Of Processors:    1
  Total Number Of Cores:    2
  L2 Cache:    3 MB
  Memory:    1 GB
  Bus Speed:    1.07 GHz
  Boot ROM Version:    MM31.0081.B00

Graphics card:
NVIDIA GeForce 9400

"Intel Core 2 Duo processors 2.4GHz+ are fine."이라고 했는데, 2.0이면 되지 않을까? 그래픽 카드는 동일한 것이니 문제 없고.


1. Checking the library dependencies

1. TooN - a header library for linear algebra
2. libCVD - a library for image handling, video capture and computer vision
3. Gvars3 - a run-time configuration/scripting library, this is a sub-project of libCVD.
None of the three is installed, so:

1-1. Downloading TooN

TooN (Tom's Object-oriented Numerics) is a C++ library for linear algebra (vector and matrix operations), developed by the Cambridge Machine Intelligence lab.

ref. TooN Documentation (better organized than the official home page)

Download it with the following command:
%% cvs -z3 -d:pserver:anonymous@cvs.savannah.nongnu.org:/sources/toon co TooN

Output:

Go into the generated TooN folder and run
%%% ./configure

Output:


1-1-1. To get a more stable(?) version

%% cvs -z3 -d:pserver:anonymous@cvs.savannah.nongnu.org:/sources/toon co -D "Mon May 11 16:29:26 BST 2009" TooN

Output:


1-2. Downloading libCVD

libCVD (Cambridge Video Dynamics): a C++ library for computer-vision-related image processing, from the same lab.

ref. CVD documentation

%% cvs -z3 -d:pserver:anonymous@cvs.savannah.nongnu.org:/sources/libcvd co -D "Mon May 11 16:29:26 BST 2009" libcvd

Output:



1-3. Downloading Gvars3

Gvars3 (configuration system library)

%% cvs -z3 -d:pserver:anonymous@cvs.savannah.nongnu.org:/sources/libcvd co -D "Mon May 11 16:29:26 BST 2009" gvars3

Output:


2. Installing the downloaded base libraries

2-1. Installing TooN

2-1-1. Running the configure script

A configure script prepares the source code so that it can be compiled and run.

Go into the generated TooN folder and run
%%% ./configure

Output:

2-1-2. Installation

(TooN is a collection of header files, so no compilation is needed.)

%%% sudo make install

Output:
mkdir -p //usr/local/include/TooN
cp *.h //usr/local/include/TooN
cp -r optimization //usr/local/include/TooN/
cp -r internal //usr/local/include/TooN/


2-2. Installing libCVD

2-2-1. Running the configure script

Go into the generated libCVD folder and run
%%% export CXXFLAGS=-D_REENTRANT
%%% ./configure --without-ffmpeg

Output:

2-2-2. Generating the documentation (can probably be skipped)

Trying again:
%%% make docs

make: *** No rule to make target `docs'.  Stop.
Still not working... Ah, it is probably because doxygen was installed through MacPorts (apparently the databases differ).


2-2-3. Compiling

%%% make

Output:


2-2-4. Installing

%%% sudo make install

Output:


2-3. Installing Gvars3

2-3-1. Running the configure script

Go into the Gvars3 folder and run
%%% ./configure --disable-widgets

Output:


2-3-2. Compiling

%%% make

Output:


2-3-3. Installing

%%% sudo make install

mkdir -p //usr/local/lib
cp libGVars3.a libGVars3_headless.a //usr/local/lib
mkdir -p //usr/local/lib
cp libGVars3-0.6.dylib //usr/local/lib
ln -fs  //usr/local/lib/libGVars3-0.6.dylib //usr/local/lib/libGVars3-0.dylib
ln -fs  //usr/local/lib/libGVars3-0.dylib //usr/local/lib/libGVars3.dylib
mkdir -p //usr/local/lib
cp libGVars3_headless-0.6.dylib //usr/local/lib
ln -fs  //usr/local/lib/libGVars3_headless-0.6.dylib //usr/local/lib/libGVars3_headless-0.dylib
ln -fs  //usr/local/lib/libGVars3_headless-0.dylib //usr/local/lib/libGVars3_headless.dylib
mkdir -p //usr/local/include
cp -r gvars3 //usr/local/include


2-4. Notes on compiling on OS X

ref. Compiling under UNIX
Porting UNIX/Linux Applications to Mac OS X: Compiling Your Code in Mac OS X



3. Compiling PTAM

3-1. Copy the build files for your platform into the PTAM source directory

In my case (OS X), I copied all (i.e. both) files in PTAM/Build/OS X, namely Makefile and VideoSource_OSX.cc, into the PTAM folder.

3-2. Setting up the video source

The Makefile has to be edited so that the video input file matching your camera gets compiled.
On the Mac there is only one source file (apparently written for the Logitech QuickCam Pro 5000), so it can be left as is.

3-3. Adding a video source

Other video sources are provided as classes in libCVD. If yours is not among them, you have to write a file named along the lines of VideoSource_XYZ.cc and add it to the build.

3-4. Compiling

Go into the PTAM folder and run
%% make

Output:
g++ -g -O3 main.cc -o main.o -c -I /MY_CUSTOM_INCLUDE_PATH/ -D_OSX -D_REENTRANT
g++ -g -O3 VideoSource_OSX.cc -o VideoSource_OSX.o -c -I /MY_CUSTOM_INCLUDE_PATH/ -D_OSX -D_REENTRANT
g++ -g -O3 GLWindow2.cc -o GLWindow2.o -c -I /MY_CUSTOM_INCLUDE_PATH/ -D_OSX -D_REENTRANT
In file included from OpenGL.h:20,
                 from GLWindow2.cc:1:
/usr/local/include/cvd/gl_helpers.h:38:19: error: GL/gl.h: No such file or directory
/usr/local/include/cvd/gl_helpers.h:39:20: error: GL/glu.h: No such file or directory
/usr/local/include/cvd/gl_helpers.h: In function 'void CVD::glPrintErrors()':
/usr/local/include/cvd/gl_helpers.h:569: error: 'gluGetString' was not declared in this scope
make: *** [GLWindow2.o] Error 1

This error message looks like the same kind of problem discussed at the following link:
http://en.allexperts.com/q/Unix-Linux-OS-1064/Compiling-OpenGL-unix-linux.htm


3-4-1. OpenGL on UNIX

PTAM이 OpenGL을 사용하고 있는데, OpenGL이 Mac에 기본으로 설치되어 있으므로 신경쓰지 않았던 부분이다. 물론 system의 public framework으로 들어가 있음을 확인할 수 있다. 그런데 UNIX 프로그램에서 접근할 수는 없는가? (인터넷에서 검색해 보아도 따로 설치할 수 있는 다운로드 링크나 방법을 찾을 수 없다.)

에러 메시지에 대한 정확한 진단 ->
philphys: 일단 OpenGL은 분명히 있을 건데 그 헤더파일과 라이브러리가 있는 곳을 지정해 주지 않아서 에러가 나는 것 같아. 보통 Makefile에 이게 지정되어 있어야 하는데 실행결과를 보니까 전혀 지정되어 있지 않네. 중간에 보면 -I /MY_CUSTOM_INCLUDE_PATH/ 라는 부분이 헤더 파일의 위치를 지정해 주는 부분이고 또 라이브러리는 뒤에 링크할 때 지정해 주게 되어 있는데 거기까지는 가지도 못 했네.
즉, "링커가 문제가 아니라, 컴파일러 옵션에 OpenGL의 헤더파일이 있는 디렉토리를 지정해 주어야 할 것 같다"고 한다.

문제의 Makefile을 들여다보고

Makefile을 다음과 같이 수정하고 (보라색 부분 추가)
COMPILEFLAGS = -I /MY_CUSTOM_INCLUDE_PATH/ -D_OSX -D_REENTRANT -I/usr/X11R6/include/

philphys: /usr/X11R6/include 밑에 GL 폴더가 있고 거기에 필요한 헤더파일들이 모두 들어 있다. 그래서 코드에선 "GL/gl.h" 하는 식으로 explicit하게 GL 폴더를 찾게 된다.

그러고 보면 아래와 같은 설명이 있었던 것이다.
Since the Linux code compiles directly against the nVidia driver's GL headers, use of a different GL driver may require some modifications to the code.

Compiling again,
Output:

Build complete!
The two executables, PTAM and CameraCalibrator, were generated.


3-5. About X11R6

X11R6 = X Window System, Version 11, Release 6

Xwindow
X.org



4. camera calibration

Running the CameraCalibrator executable to attempt camera calibration brings up a GUI window, but it receives no input from the connected webcam (Logitech QuickCam Pro 4000).

4-0. Symptoms

Opening the CameraCalibrator executable opens a new terminal window like the following:
Last login: Fri Aug  7 01:14:05 on ttys001
%% /Users/lym/PTAM/CameraCalibrator ; exit;
  Welcome to CameraCalibrator
  --------------------------------------
  Parallel tracking and mapping for Small AR workspaces
  Copyright (C) Isis Innovation Limited 2008

  Parsing calibrator_settings.cfg ....
! GUI_impl::Loadfile: Failed to load script file "calibrator_settings.cfg".
  VideoSource_OSX: Creating QTBuffer....
  IMPORTANT
  This will open a quicktime settings planel.
  You should use this settings dialog to turn the camera's
  sharpness to a minimum, or at least so small that no sharpening
  artefacts appear! In-camera sharpening will seriously degrade the
  performance of both the camera calibrator and the tracking system.

A GUI window named Video also opens; if I press OK without changing any settings, the terminal above prints the following messages and the program exits on its own.
  .. created QTBuffer of size [640 480]
2009-08-07 01:20:57.231 CameraCalibrator[40836:10b] ***_NSAutoreleaseNoPool(): Object 0xf70e2c0 of class NSThread autoreleasedwith no pool in place - just leaking
Stack: (0x96827f0f 0x96734442 0x9673a1b4 0xbc2db7 0xbc7e9a 0xbc69d30xbcacbd 0xbca130 0x964879c9 0x90f8dfb8 0x90e69618 0x90e699840x964879c9 0x90f9037c 0x90e7249c 0x90e69984 0x964879c9 0x90f8ec800x90e55e05 0x90e5acd5 0x90e5530f 0x964879c9 0x94179eb9 0x282b48 0xd9f40xd6a6 0x2f16b 0x2fea4 0x26b6)
! Code for converting from format "Raw RGB data"
  not implemented yet, check VideoSource_OSX.cc.

logout

[Process completed]

So this goes back to the problem of 3-3 -- setting up the video source.
That is, VideoSource_OSX.cc has to be modified, recompiled, and run again.

Other video source classes are available with libCVD. Finally, if a custom video source not supported by libCVD is required, the code for it will have to be put into some VideoSource_XYZ.cc file (the interface for this file is very simple.)

A lot of trial and error...



4-1. Modifying VideoSource_OSX.cc



The modified VideoSource file:
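
The gist of the change: the error above says the conversion from the "Raw RGB data" format delivered by this camera is not implemented in VideoSource_OSX.cc, so a conversion from the raw interleaved RGB buffer into the two images PTAM expects has to be added. A minimal sketch of that conversion is below; pFrame, nWidth and nHeight are hypothetical names for the raw buffer and its dimensions, not the actual variables in VideoSource_OSX.cc.

// Hypothetical helper: convert an interleaved 8-bit RGB buffer into the
// grayscale and RGB images used by PTAM (adapt names to VideoSource_OSX.cc).
#include <cvd/image.h>
#include <cvd/byte.h>
#include <cvd/rgb.h>

void convertRawRGB(const unsigned char *pFrame, int nWidth, int nHeight,
                   CVD::Image<CVD::byte> &imBW,
                   CVD::Image<CVD::Rgb<CVD::byte> > &imRGB)
{
  imBW.resize(CVD::ImageRef(nWidth, nHeight));
  imRGB.resize(CVD::ImageRef(nWidth, nHeight));
  for(int y = 0; y < nHeight; y++)
    for(int x = 0; x < nWidth; x++)
    {
      const unsigned char *p = pFrame + 3 * (y * nWidth + x);
      imRGB[y][x].red   = p[0];
      imRGB[y][x].green = p[1];
      imRGB[y][x].blue  = p[2];
      // crude luma approximation for the grayscale image used by the tracker
      imBW[y][x] = (CVD::byte)((30 * p[0] + 59 * p[1] + 11 * p[2]) / 100);
    }
}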

Terminal output:
Welcome to CameraCalibrator
  --------------------------------------
  Parallel tracking and mapping for Small AR workspaces
  Copyright (C) Isis Innovation Limited 2008

  Parsing calibrator_settings.cfg ....
  VideoSource_OSX: Creating QTBuffer....
  IMPORTANT
  This will open a quicktime settings planel.
  You should use this settings dialog to turn the camera's
  sharpness to a minimum, or at least so small that no sharpening
  artefacts appear! In-camera sharpening will seriously degrade the
  performance of both the camera calibrator and the tracking system.
>   .. created QTBuffer of size [640 480]
2009-08-13 04:02:50.464 CameraCalibrator[6251:10b] *** _NSAutoreleaseNoPool(): Object 0x9df180 of class NSThread autoreleased with no pool in place - just leaking
Stack: (0x96670f4f 0x9657d432 0x965831a4 0xbc2db7 0xbc7e9a 0xbc69d3 0xbcacbd 0xbca130 0x924b09c9 0x958e8fb8 0x957c4618 0x957c4984 0x924b09c9 0x958eb37c 0x957cd49c 0x957c4984 0x924b09c9 0x958e9c80 0x957b0e05 0x957b5cd5 0x957b030f 0x924b09c9 0x90bd4eb9 0x282b48 0xd414 0xcfd6 0x2f06b 0x2fda4)



4-2. Running CameraCalibrator


Camera calib is [ 1.51994 2.03006 0.499577 0.536311 -0.0005 ]
  Saving camera calib to camera.cfg...
  .. saved.



5. Running PTAM


  Welcome to PTAM
  ---------------
  Parallel tracking and mapping for Small AR workspaces
  Copyright (C) Isis Innovation Limited 2008

  Parsing settings.cfg ....
  VideoSource_OSX: Creating QTBuffer....
  IMPORTANT
  This will open a quicktime settings planel.
  You should use this settings dialog to turn the camera's
  sharpness to a minimum, or at least so small that no sharpening
  artefacts appear! In-camera sharpening will seriously degrade the
  performance of both the camera calibrator and the tracking system.
>   .. created QTBuffer of size [640 480]
2009-08-13 20:17:54.162 ptam[1374:10b] *** _NSAutoreleaseNoPool(): Object 0x8f5850 of class NSThread autoreleased with no pool in place - just leaking
Stack: (0x96670f4f 0x9657d432 0x965831a4 0xbb9db7 0xbbee9a 0xbbd9d3 0xbc1cbd 0xbc1130 0x924b09c9 0x958e8fb8 0x957c4618 0x957c4984 0x924b09c9 0x958eb37c 0x957cd49c 0x957c4984 0x924b09c9 0x958e9c80 0x957b0e05 0x957b5cd5 0x957b030f 0x924b09c9 0x90bd4eb9 0x282b48 0x6504 0x60a6 0x11af2 0x28da 0x2766)
  ARDriver: Creating FBO...  .. created FBO.
  MapMaker: made initial map with 135 points.
  MapMaker: made initial map with 227 points.


The software was developed with a Unibrain Fire-i colour camera, using a 2.1mm M12 (board-mount) wide-angle lens. It also runs well with a Logitech Quickcam Pro 5000 camera, modified to use the same 2.1mm M12 lens.

Connecting an iSight to the Mac Mini with a Netmate 1394B 9P Bilingual-to-6P cable works even better.



posted by maetel
2009. 8. 4. 23:07 Computer Vision
Resources on SLAM in general and its basics

Durrant-Whyte & Bailey "Simultaneous localization and mapping"
http://leeway.tistory.com/667


Søren Riisgaard and Morten Rufus Blas
SLAM for Dummies: A Tutorial Approach to Simultaneous Localization and Mapping
http://leeway.tistory.com/688


Joan Solà Ortega (de l’Institut National Polytechnique de Toulouse, 2007)
Towards visual localization, mapping and moving objects tracking by a mobile robot: A geometric and probabilistic approach
ch3@ http://leeway.tistory.com/628


SLAM summer school
2009 Australian Centre for Field Robotics, University of Sydney
http://www.acfr.usyd.edu.au/education/summerschool.shtml
2006 Department of Engineering Science and Robotics Research Group, Oxford
http://www.robots.ox.ac.uk/~SSS06/Website/index.html
2004 Laboratory for Analysis and Architecture of Systems  (LAAS-CNRS) located in Toulouse
http://www.laas.fr/SLAM/
2002 Centre for Autonomous Systems
Numerical Analysis and Computer Science
Royal Institute of Technology
, Stockholm
http://www.cas.kth.se/SLAM/


http://www.doc.ic.ac.uk/%7Eajd/Scene/Release/monoslamtutorial.pdf
A SceneLib tutorial developed by the Active Vision Lab / Visual Information Processing (VIP) Research Group at Oxford; it lays out the basic concepts of SLAM using a single (monocular) camera.

posted by maetel
2009. 7. 21. 16:16 Computer Vision
임현, 이영삼 (School of Electrical Engineering, Inha University)
이동로봇의 동시간 위치인식 및 지도작성(SLAM) [Simultaneous Localization and Mapping (SLAM) for Mobile Robots]
제어 로봇 시스템 학회지 (Journal of Institute of Control, Robotics and Systems), Vol. 15, No. 2 (June 2009)
from kyu


> definition
mapping: converting the environment into information that can be recognized
localization: estimating one's own position from that information

> issues
- uncertainty <= sensor
- data association: distilling two- or three-dimensional information out of high-dimensional sensor data and matching it consistently over time
- how to manage the observed feature-point data efficiently


> localization (위치인식)
: estimating the robot's own position from observations of landmarks whose positions are known in advance
: estimating the robot's position at every time step k from the initial state x0, the control inputs up to time k-1, the observation vectors, and the landmarks with known positions
- The uncertainty of the robot's position estimate stems from sensor error.

> mapping (지도작성)
: modeling the environment the robot is in by accumulating observations made in relative coordinates with respect to reference points
: estimating the set of landmarks from the positions, the observations, and the control inputs
- The inaccuracy of the map stems from sensor error.

> Simultaneous Localization and Mapping (SLAM, 동시간 위치인식 및 지도작성)
: estimating the robot's position within the environment it is placed in, while building the map of that environment at the same time
: the joint probability of the landmark positions and the robot state vector xk at time k, given the landmark observation vectors, the initial state, and all applied control inputs
- recursive formulation + Bayes' theorem
- observation model + motion model (state-space model of the robot's motion)
- The motion model means the state transition is a Markov process (the current state is described only by the previous state and the input vector, independently of the landmark set and the observations).
- prediction (time-update) + correction (measurement-update) -- written out below
- The uncertainty arises from the robot's odometry and sensor errors.
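
Written out (a standard summary added for reference, not quoted from the article; x_k is the robot state, m the landmark set, u_k the control input, z_k the observation at time k):

prediction (time-update):
P(x_k, m | Z_{0:k-1}, U_{0:k}, x_0) = \int P(x_k | x_{k-1}, u_k) \, P(x_{k-1}, m | Z_{0:k-1}, U_{0:k-1}, x_0) \, dx_{k-1}

correction (measurement-update):
P(x_k, m | Z_{0:k}, U_{0:k}, x_0) = \frac{P(z_k | x_k, m) \, P(x_k, m | Z_{0:k-1}, U_{0:k}, x_0)}{P(z_k | Z_{0:k-1}, U_{0:k})}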


conditional Bayes rule
http://en.wikipedia.org/wiki/Bayes%27_theorem
 P(A|B \cap C) = \frac{P(A \cap B \cap C)}{P(B \cap C)} = \frac{P(B|A \cap C) \, P(A|C) \, P(C)}{P(C) \, P(B|C)} = \frac{P(B|A \cap C) \, P(A|C)}{P(B|C)}\,.

Markov process

total probability theorem: "law of alternatives"
http://en.wikipedia.org/wiki/Total_probability_theorem
\Pr(A)=\sum_{n} \Pr(A\cap B_n)\,
\Pr(A)=\sum_{n} \Pr(A\mid B_n)\Pr(B_n).\,

> Extended Kalman filter (EKF, 확장 칼만 필터)
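
(The article defers the derivation to its references; for reference, the generic EKF recursion in standard form, with f the motion model, h the observation model, F_k and H_k their Jacobians, and Q_k, R_k the process and measurement noise covariances:)

prediction:
\hat{x}_{k|k-1} = f(\hat{x}_{k-1|k-1}, u_k)
P_{k|k-1} = F_k P_{k-1|k-1} F_k^T + Q_k

correction:
K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + R_k)^{-1}
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k (z_k - h(\hat{x}_{k|k-1}))
P_{k|k} = (I - K_k H_k) P_{k|k-1}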


http://en.wikipedia.org/wiki/Ground_truth

posted by maetel
2009. 7. 14. 21:23 Computer Vision
ISMAR 2008
7th IEEE/ACM International Symposium on Mixed and Augmented Reality, 2008


Proceedings
State of the Art Report

Trends in Augmented Reality Tracking, Interaction and Display
: A Review of Ten Years of ISMAR
Feng Zhou (Center for Human Factors and Ergonomics, Nanyang Technological University, Singapore)
Henry Been-Lirn Duh (Department of Electrical and Computer Engineering/Interactive and Digital Media Institute, National University of Singapore)
Mark Billinghurst (The HIT Lab NZ, University of Canterbury, New Zealand)


Tracking

1. Sensor-based tracking -> ubiquitous tracking and dynamic data fusion

2. Vision-based tracking: feature-based and model-based
1) feature-based tracking techniques:
- To find a correspondence between 2D image features and their 3D world frame coordinates.
- Then to find the camera pose by projecting the 3D coordinates of the features into the observed 2D image coordinates and minimizing the distance to their corresponding 2D features.

2) model-based tracking techniques:
- To explicitly use a model of the features of tracked objects such as a CAD model or 2D template of the object based on the distinguishable features.
- A visual servoing approach adapted from robotics to calculate camera pose from a range of model features (lines, circles, cylinders and spheres)
- incorporating knowledge about the scene by predicting hidden movement of the object and reducing the effects of outlier data

3. Hybrid tracking
- closed-loop-type tracking based on computer vision techonologies
- motion prediction
- SFM (structure from motion)
- SLAM (simultaneous localization and mapping)


Interaction and User Interfaces

1. Tangible
2. Collaborative
3. Hybrid


Display

1. See-through HMDs
1) OST = optical see-through
: the user sees the real world directly (through optical combiners), with virtual objects superimposed on it optically
2) VST = video see-through
: the real world is captured by cameras and shown on the display with graphical information composited into the video
2. Projection-based Displays
3. Handheld Displays


Limitations of AR

> tracking
1) complexity of the scene and the motion of target objects, including the degrees of freedom of individual objects and their representation
=> correspondence analysis: Kalman filters, particle filters.
2) how to find distinguishable objects for "markers" outdoors

> interaction
ergonomics, human factors, usability, cognition, HCI (human-computer interaction)

> AR displays
- HMDs - limited FOV, image distortions,
- projector-based displays - lack mobility, self-occlusion
- handheld displays - tracking with markers to limit the work range

Trends and Future Directions

1. Tracking
1) RBPF (Rao-Blackwellized particle filters) -> automatic recognition systems
2) SLAM, ubiquitous tracking, sensor network -> free from prior knowledge
3) pervasive middleware <- information fusion algorithms

2. Interaction and User Interfaces
"Historically, human knowledge, experience and emotion are expressed and communicated in words and pictures. Given the advances in interface and data capturing technology, knowledge, experience and emotion might now be presented in the form of AR content."

3. AR Displays





Studierstube Augmented Reality Project
: software framework for the development of Augmented Reality (AR) and Virtual Reality applications
Graz University of Technology (TU Graz)

Sharedspace project
The Human Interface Technology Laboratory (HITLab) at the University of Washington and ATR Media Integration & Communication in Kyoto, Japan join forces at SIGGRAPH 99

The Invisible Train - A Handheld Augmented Reality Game

AR Tennis
camera based tracking on mobile phones in face-to-face collaborative Augmented Reality

Emmie - Environment Management for Multi-User Information Environments

VITA: visual interaction tool for archaeology

HMD = head-mounted displays

OST = optical see-through

VST = video see-through

ELMO: an Enhanced optical see-through display using an LCD panel for Mutual Occlusion

FOV
http://en.wikipedia.org/wiki/Field_of_view_(image_processing)

HMPD = head-mounted projective displays

The Touring Machine

MARS - Mobile Augmented Reality Systems
    
Klimt - the Open Source 3D Graphics Library for Mobile Devices

AR Kanji - The Kanji Teaching application


references  
Ronald T. Azuma  http://www.cs.unc.edu/~azuma/
A Survey of Augmented Reality. Presence: Teleoperators and Virtual Environments 6, 4 (August 1997), 355 - 385. Earlier version appeared in Course Notes #9: Developing Advanced Virtual Reality Applications, ACM SIGGRAPH '95 (Los Angeles, CA, 6-11 August 1995), 20-1 to 20-38.

Ronald Azuma, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, Blair MacIntyre
Recent Advances in Augmented Reality. IEEE Computer Graphics and Applications 21, 6 (Nov/Dec 2001), 34-47.

Ivan E. Sutherland
The Ultimate Display, IFIP `65, pp. 506-508, 1965

Kato, H., Billinghurst, M., Poupyrev, I., Imamoto, K., Tachibana, K. (Hiroshima City Univ.)
Virtual object manipulation on a table-top AR environment

Sandor, C., Olwal, A., Bell, B., and Feiner, S. 2005.
Immersive Mixed-Reality Configuration of Hybrid User Interfaces.
In Proceedings of the 4th IEEE/ACM international Symposium on Mixed and Augmented Reality (October 05 - 08, 2005). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 110-113. DOI= http://dx.doi.org/10.1109/ISMAR.2005.37

An optical see-through display for mutual occlusion with a real-time stereovision system
Kiyoshi Kiyokawa, Yoshinori Kurata and Hiroyuki Ohno
Computers & Graphics Volume 25, Issue 5, October 2001, Pages 765-779

Bimber, O., Fröhlich, B., Schmalstieg, D., and Encarnação, L. M. 2005.
The virtual showcase. In ACM SIGGRAPH 2005 Courses (Los Angeles, California, July 31 - August 04, 2005). J. Fujii, Ed. SIGGRAPH '05. ACM, New York, NY, 3. DOI= http://doi.acm.org/10.1145/1198555.1198713

Bimber, O., Wetzstein, G., Emmerling, A., and Nitschke, C. 2005.
Enabling View-Dependent Stereoscopic Projection in Real Environments. In Proceedings of the 4th IEEE/ACM international Symposium on Mixed and Augmented Reality (October 05 - 08, 2005). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 14-23. DOI= http://dx.doi.org/10.1109/ISMAR.2005.27

Cotting, D., Naef, M., Gross, M., and Fuchs, H. 2004.
Embedding Imperceptible Patterns into Projected Images for Simultaneous Acquisition and Display. In Proceedings of the 3rd IEEE/ACM international Symposium on Mixed and Augmented Reality (November 02 - 05, 2004). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 100-109. DOI= http://dx.doi.org/10.1109/ISMAR.2004.30

Ehnes, J., Hirota, K., and Hirose, M. 2004.
Projected Augmentation - Augmented Reality using Rotatable Video Projectors. In Proceedings of the 3rd IEEE/ACM international Symposium on Mixed and Augmented Reality (November 02 - 05, 2004). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 26-35. DOI= http://dx.doi.org/10.1109/ISMAR.2004.47

Arango, M., Bahler, L., Bates, P., Cochinwala, M., Cohrs, D., Fish, R., Gopal, G., Griffeth, N., Herman, G. E., Hickey, T., Lee, K. C., Leland, W. E., Lowery, C., Mak, V., Patterson, J., Ruston, L., Segal, M., Sekar, R. C., Vecchi, M. P., Weinrib, A., and Wuu, S. 1993.
The Touring Machine system. Commun. ACM 36, 1 (Jan. 1993), 69-77. DOI= http://doi.acm.org/10.1145/151233.151239

Gupta, S. and Jaynes, C. 2006.
The universal media book: tracking and augmenting moving surfaces with projected information. In Proceedings of the 2006 Fifth IEEE and ACM international Symposium on Mixed and Augmented Reality (Ismar'06) - Volume 00 (October 22 - 25, 2006). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 177-180. DOI= http://dx.doi.org/10.1109/ISMAR.2006.297811


Klein, G. and Murray, D. 2007.
Parallel Tracking and Mapping for Small AR Workspaces. In Proceedings of the 2007 6th IEEE and ACM international Symposium on Mixed and Augmented Reality - Volume 00 (November 13 - 16, 2007). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 1-10. DOI= http://dx.doi.org/10.1109/ISMAR.2007.4538852

Neubert, J., Pretlove, J., and Drummond, T. 2007.
Semi-Autonomous Generation of Appearance-based Edge Models from Image Sequences. In Proceedings of the 2007 6th IEEE and ACM international Symposium on Mixed and Augmented Reality - Volume 00 (November 13 - 16, 2007). Symposium on Mixed and Augmented Reality. IEEE Computer Society, Washington, DC, 1-9. DOI= http://dx.doi.org/10.1109/ISMAR.2007.4538830

posted by maetel
2009. 6. 8. 15:17 Computer Vision
To do: Surface shape reconstruction
: Least squares surface fitting




- path integration
- least squares optimization
- Lucas-Kanade algorithm

http://mathworld.wolfram.com/LeastSquaresFitting.html


 /* We want to solve Mz = v in a least squares sense.  The
    solution is M^T M z = M^T v.  We denote M^T M as A and
    M^T v as b, so A z = b. */

  CMatrixSparse<double> A(M.mTm());
  assert(A.isSymmetric());
  CVector<double> r = A*z;   /* r is the "residual error" */
  CVector<double> b(v*M);

  // solve the equation A z = b
  solveQuadratic<double>(A, b, z, 300, CGEPSILON);

  // copy the depths back from the vector z into the image depths
  copyDepths(z, zind, depths);


 

template <class T>
double solveQuadratic(const CMatrixSparse<T> &A, const CVector<T> &b,
                      CVector<T> &x, int i_max, double epsilon)
{
  // my conjugate gradient solver for .5*x'*A*x - b'*x, based on the
  // tutorial by Jonathan Shewchuk  (or is it +b'*x?)

  printf("Performing conjugate gradient optimization\n");

  int numvars = x.Length();
  assert(b.Length() == numvars && A.rows() == numvars &&
         A.columns() == numvars);

  int i = 0;

  CVector<T> r = b - A*x;
  CVector<T> d = r;
  double delta_new = r.dot(r);
  double delta_0 = delta_new;

  int numrecompute = (int)floor(sqrt(float(numvars)));
  //int numrecompute = 1;
  printf("numrecompute = %d\n", numrecompute);

  printf("delta_new = %f\n", delta_new);
  while (i < i_max && delta_new > epsilon)   // epsilon*epsilon*delta_0
    {
      printf("Step %d, delta_new = %f      \r", i, delta_new);

      CVector<T> q = A*d;
      double alpha = delta_new / d.dot(q);
      x.daxpy(d, alpha);          //  x += d*alpha;
      if (i % numrecompute == 0)
        {
          //  printf(" ** recompute\n");
          r = b - A*x;
        }
      else
        r.daxpy(q, -alpha);       //  r = r - q*alpha;
      double delta_old = delta_new;
      delta_new = r.dot(r);
      double beta = delta_new / delta_old;
      d = r + d*beta;
      i++;
    }

  return delta_new;
  //  return delta_new <= epsilon;
  //  return !(delta_new > epsilon*epsilon*delta_0);
}










 

posted by maetel
2009. 3. 31. 21:10 Computer Vision

Real-time simultaneous localisation and mapping with a single camera

Davison, A.J.  
Dept. of Eng. Sci., Oxford Univ., UK;

This paper appears in: Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on
Publication Date: 13-16 Oct. 2003
On page(s): 1403-1410 vol.2
ISBN: 0-7695-1950-4
INSPEC Accession Number: 7971070
Digital Object Identifier: 10.1109/ICCV.2003.1238654
Current Version Published: 2008-04-03


 

posted by maetel
2009. 3. 27. 21:05 Computer Vision

Hugh F. Durrant-Whyte, Australian Centre for Field Robotics
http://en.wikipedia.org/wiki/Hugh_F._Durrant-Whyte

John J. Leonard, Center for Ocean Engineering, MIT

Sebastian Thrun, Stanford Artificial Intelligence Laboratory, Stanford University
http://en.wikipedia.org/wiki/Sebastian_Thrun

David Nistér, Center for Visualization and Virtual Environments, University of Kentucky

Ethan Eade, Machine Intelligence lab, Engineering Department, Cambridge University

Tom Drummond, Machine Intelligence Laboratory, Engineering Department, Cambridge University

Javier Civera, Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza

Andrew J. Davison, Reader in Robot Vision at the Department of Computing, Imperial College London

Jose Maria Martinez Montiel, Robotics and Real Time Group, Universidad de Zaragoza

Robert Castle, Active Vision Laboratory, Robotics Research Group, Oxford University

임현, Embedded Control Systems Lab, School of Electrical Engineering, Inha University

김정호, Robotics and Computer Vision Lab (권인소), KAIST

labs
 
Active Vision Goup, Robotics Research Group, Engineering Department, Oxford University

Computer Vision & Robotics Group, Machine Intelligence Laboratory, Department of Engineering, University of Cambridge

Image Information Processing Lab (영상정보처리연구실, 홍기상), POSTECH

Intelligent Control and Systems Lab (지능제어 및 시스템 연구실, 김상우), POSTECH


posted by maetel
2009. 2. 14. 17:27 Computer Vision
IEEE Transactions on Robotics, Volume 24, Number 5, October 2008
: Visual SLAM Special Issue

Guest Editorial: Special Issue on Visual SLAM


simultaneous localization mapping (SLAM)
in autonomous mobile robotics
using laser range-finder sensors
to build 2-D maps of planar environments

SLAM with standard cameras:
feature detection
data association
large-scale state estimation

SICK laser scanner


Kalman filter
Particle filter
submapping

http://en.wikipedia.org/wiki/Particle_filter
particle filter = sequential Monte Carlo methods (SMC)


http://en.wikipedia.org/wiki/Image_registration
the process of transforming the different sets of data into one coordinate system


http://en.wikipedia.org/wiki/Simultaneous_localization_and_mapping
the process of creating geometrically accurate maps of the environment
to build up a map within an unknown environment while at the same time keeping track of their current position

R.C. Smith and P. Cheeseman (1986)

Hugh F. Durrant-Whyte (early 1990s)

Sebastian Thrun

mobile robotics
autonomous vehicle

한국로봇산업협회 (Korea Association of Robot Industry)  http://www.korearobot.or.kr/




posted by maetel
2008. 10. 16. 20:42 @GSMC/박래홍: Computer Vision

2.4 Color images

2.4.1 Physics of color



2.4.2 Color perceived by humans


XYZ color space
http://en.wikipedia.org/wiki/Xyz_color_space


color matching functions
: numerical description of the chromatic response of the observer


color gamut:
subspace of colors perceived by humans

http://en.wikipedia.org/wiki/Gamut
Digitizing a photograph, converting a digitized image to a different color space, or outputting it to a given medium using a certain output device generally alters its gamut, in the sense that some of the colors in the original are lost in the process.


37p

2.4.3 Color spaces

http://en.wikipedia.org/wiki/Color_space

http://en.wikipedia.org/wiki/Rounding_error

(1) RGB color space
- CRT
- relative color standard
- additive color mixing
- sRGB, Adobe RGB, Adobe Wide Gamut RGB

http://en.wikipedia.org/wiki/RGB_color_space
The complete specification of an RGB color space also requires a white point chromaticity and a gamma correction curve.


(2) YIQ color space
- additive color mixing
- luminance (the perceived energy of a light source)

http://en.wikipedia.org/wiki/YIQ
The Y component represents the luma information, and is the only component used by black-and-white television receivers. I and Q represent the chrominance information.

(3) YUV color space

http://en.wikipedia.org/wiki/YUV
Y' stands for the luma component (the brightness) and U and V are the chrominance (color) components.


(4) CMY color space
- subtractive color mixing (in printing process)
- sets of inks, substrates, and press characteristics

http://en.wikipedia.org/wiki/CMYK_color_model
masking certain colors on the typically white background (that is, absorbing particular wavelengths of light)


(5) HSV color space
- Hue, Saturation, Value (also HSB: Hue, Saturation, Brightness; or HSI: Hue, Saturation, Intensity)
- => image enhancement algorithms

http://en.wikipedia.org/wiki/HSL_and_HSV
Because HSL and HSV are simple transformations of device-dependent RGB, the color defined by a (h, s, l) or (h, s, v) triplet depends on the particular color of red, green, and blue “primaries” used. Each unique RGB device therefore has unique HSL and HSV spaces to accompany it. An (h, s, l) or (h, s, v) triplet can however become definite when it is tied to a particular RGB color space, such as sRGB.
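
A small illustration of the RGB-to-HSV transformation mentioned above (a common formulation; inputs assumed normalized to [0,1], h in degrees, s and v in [0,1]):

#include <algorithm>

// Convert r,g,b in [0,1] to HSV: h in degrees [0,360), s and v in [0,1].
void rgb2hsv(double r, double g, double b, double &h, double &s, double &v)
{
  double maxc = std::max(r, std::max(g, b));
  double minc = std::min(r, std::min(g, b));
  double delta = maxc - minc;

  v = maxc;                                  // value = brightest channel
  s = (maxc > 0.0) ? delta / maxc : 0.0;     // saturation

  if (delta == 0.0)        h = 0.0;          // gray: hue undefined, set to 0
  else if (maxc == r)      h = 60.0 * (g - b) / delta;
  else if (maxc == g)      h = 60.0 * ((b - r) / delta + 2.0);
  else /* maxc == b */     h = 60.0 * ((r - g) / delta + 4.0);
  if (h < 0.0) h += 360.0;
}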


39p

2.4.4 Palette images


palette images; indexed images

lookup table; color table; color map; index register; palette

When the number of colors in the image exceeds the number of entries in the lookup table, a subset of colors has to be chosen and a loss of information occurs.

vector quantization (-> 14.4)
1) to check which colors appear in the image by creating histograms for all three color components
2) to quantize them to provide more shades for colors which occur in the image frequently
3) to find the nearest color in the lookup table to represent the color 

=> to analyze large multi-dimensional datasets

http://en.wikipedia.org/wiki/Vector_quantization

http://en.wikipedia.org/wiki/Cluster_analysis
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait - often proximity according to some defined distance measure.

http://en.wikipedia.org/wiki/K_means
The k-means algorithm is an algorithm to cluster n objects based on attributes into k partitions, k < n. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that they both attempt to find the centers of natural clusters in the data.
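
As an illustration of the vector-quantization step, a plain k-means sketch over RGB triplets (my own simplification; real palette generators are more careful about initialization and convergence):

#include <vector>
#include <cstdlib>

struct RGB { double r, g, b; };

static double dist2(const RGB &a, const RGB &b)
{
  return (a.r-b.r)*(a.r-b.r) + (a.g-b.g)*(a.g-b.g) + (a.b-b.b)*(a.b-b.b);
}

// k-means clustering of image pixels into k palette colors (assumes pixels is non-empty).
std::vector<RGB> makePalette(const std::vector<RGB> &pixels, int k, int iters = 20)
{
  std::vector<RGB> centers(k);
  for (int j = 0; j < k; j++)                       // crude init: random pixels
    centers[j] = pixels[std::rand() % pixels.size()];

  std::vector<int> label(pixels.size(), 0);
  for (int it = 0; it < iters; it++)
  {
    for (size_t i = 0; i < pixels.size(); i++)      // assignment step
    {
      int best = 0;
      for (int j = 1; j < k; j++)
        if (dist2(pixels[i], centers[j]) < dist2(pixels[i], centers[best]))
          best = j;
      label[i] = best;
    }
    std::vector<RGB> sum(k, RGB{0, 0, 0});          // update step
    std::vector<int> cnt(k, 0);
    for (size_t i = 0; i < pixels.size(); i++)
    {
      sum[label[i]].r += pixels[i].r;
      sum[label[i]].g += pixels[i].g;
      sum[label[i]].b += pixels[i].b;
      cnt[label[i]]++;
    }
    for (int j = 0; j < k; j++)
      if (cnt[j] > 0)
        centers[j] = RGB{ sum[j].r/cnt[j], sum[j].g/cnt[j], sum[j].b/cnt[j] };
  }
  return centers;                                   // the lookup table entries
}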


pseudocolor


Palette selection depends on the semantics of the image and is an interactive process.



2.4.5 Color constancy

The same surface color under different illumination
 
> color compensations
- to scale the sensitivity of each sensor type
=> automatic white balancing
- to assume that the brightest point in the image has the color of the illumination




 
2.5 Cameras: an overview


2.5.1 Photosensitive sensors

> photosensitive sensors in cameras
(1) photo-emission

photoelectric effect (; Hertz Effect)
http://en.wikipedia.org/wiki/Photoelectric_effect

photomultipliers
vacuum tube TV cameras

(2) photovoltaic

http://en.wikipedia.org/wiki/Solar_cell

photodiode
night vision camera
photoresistor
Schottky photodiode


> semiconductor photoresistive sensors
(1) CCDs (charge-coupled devices)
- arranged into a matrix-like grid of pixels (CCD chip)
- blooming effect
- shift registers
- SNR dropped

(2) CMOS (complementary metal oxide semiconductor)
- all of the pixel area devoted to light capture
- matrix-like sensors



2.5.2 A monochromatic camera

2.5.3 A color camera


2.6 Summary

posted by maetel
175p

> to divide an image into parts with a strong correlation with objects or areas of the real world contained in the image
- complete segmentation
: lower-level (context independent processing using no object-related model)
- partial segmentation
: dividing an image into separate regions that are homogeneous with respect to a chosen property (such as brightness, color, reflectivity, texture, etc)

http://en.wikipedia.org/wiki/Segmentation_(image_processing)

> segmentation methods
(1) global knowledge - a histogram of image features
(2) edge-based segmentations <- edge detection
(3) region-based segmentations <- region growing

eg. (2) + (3) => a region adjacency graph
http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=16938&objectType=FILE

cf. topological data structure


176p
6.1 Thresholding

Thresholding
: the transformation of an input image to an output (segmented) binary image

interactive threshold selection / threshold detection method

> gray-level
global thresholding
using a single threshold for the whole image
adaptive thresholding
using variable thresholds, dep. on local image characteristics
(<- non-uniform lighting, non-uniform input device parameters)
band thresholding
using limited gray-level subsets
(-> blood cell segmentations, border detector)
+ multiple thresholds
semi-thresholding
human-assisted analysis

> gradient, a local texture property, an image decomposition criterion

http://en.wikipedia.org/wiki/Thresholding_(image_processing)
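
A minimal sketch of global thresholding on a gray-level image stored as a flat array (the threshold T is assumed to be given, e.g. picked from the histogram):

#include <vector>

// Global thresholding: pixels >= T become object (1), the rest background (0).
std::vector<unsigned char> thresholdImage(const std::vector<unsigned char> &gray,
                                          unsigned char T)
{
  std::vector<unsigned char> out(gray.size());
  for (size_t i = 0; i < gray.size(); i++)
    out[i] = (gray[i] >= T) ? 1 : 0;
  return out;
}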

179p

6.1.1 Threshold detection methods

p-tile thresholding
<- prior information about area ratios

histogram shape analysis
: gray values between the two peaks probably result from border pixels between objects and background 

multi-thresholding

mode method
: to find the highest local maxima first and detect the threshold as a minimum between them

to build a histogram with a better peak-to-valley ratio
(1) to weight histogram contributions to suppress the influence of pixels with a high image gradient
(2) to use only high-gradient pixels to form the gray-level histogram which should be unimodal corresponding to the gray-level of borders

histogram transformation

histogram concavity analysis

entropic methods

relaxation methods

multi-thresholding methods

ref. A survey of thresholding techniques
Computer Vision, Graphics, and Image Processing, 1988
PK Sahoo, S Soltani, AKC Wong, YC Chen


180p

6.1.2 Optimal thresholding

optimal thresholding
: using a weighted sum of two or more probability densities with normal distribution

"estimating normal distribution parameters together with the uncertainty that the distribution may be considered normal"

ref.
Rosenfeld and Kak, 1982
Gonzalez and Wintz, 1987

minimization of variance of the histogram
sum of square errors
spatial entropy
average clustering

ref. An analysis of histogram-based thresholding algorithms
CVGIP: Graphical Models and Image Processing, 1993
CA Glasbey


The threshold is set to give minimum probability of segmentation error.
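
As one concrete instance of the variance-based criteria listed above, a sketch of Otsu-style threshold selection (maximizing between-class variance of the histogram); this is an illustration, not the Gaussian-mixture fitting method described in the text:

#include <vector>

// Pick the threshold that maximizes between-class variance of a 256-bin histogram.
int otsuThreshold(const std::vector<long> &hist)   // hist has 256 entries
{
  long total = 0;
  double sumAll = 0.0;
  for (int i = 0; i < 256; i++) { total += hist[i]; sumAll += i * (double)hist[i]; }

  double sumB = 0.0, bestVar = -1.0;
  long wB = 0;
  int bestT = 0;
  for (int t = 0; t < 256; t++)
  {
    wB += hist[t];                      // background weight
    if (wB == 0) continue;
    long wF = total - wB;               // foreground weight
    if (wF == 0) break;
    sumB += t * (double)hist[t];
    double mB = sumB / wB;              // background mean
    double mF = (sumAll - sumB) / wF;   // foreground mean
    double varBetween = (double)wB * (double)wF * (mB - mF) * (mB - mF);
    if (varBetween > bestVar) { bestVar = varBetween; bestT = t; }
  }
  return bestT;
}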

The method of a combination of optimal and adaptive thresholding:
(1) to determine optimal gray-level segmentation parameters in local sub-regions for which local histograms are constructed
(2) to model each local histogram as a sum of n Gaussian distribution
(3) to determine the optimal parameters of the Gaussian distributions by minimizing the fit function.


Levenberg-Marquardt minimization
http://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm


183p

6.1.3 Multi-spectral thresholding

to determine thresholds independently in each spectral band and combine them into a single segmented image

to analyze multi-dimensional histograms (instead of histograms for each spectral band)

pre-processing - adjusting region shape (e.g. boundary stretching)

Regions are formed from pixels with similar properties in all spectral bands, with similar n-dimensional description vectors.



184p
6.2 Edge-based segmentation

Edges mark image locations of discontinuities in gray-level, color, texture, etc.

to combine edges into edge chains that correspond better with borders in the image

partial segmentation
- to group local edges into an image where only edge chains with a correspondence to existing objects or image parts are present


185p

6.2.1 Edge image thresholding

quantization noise, small lighting irregularities => edge image

p-tile thresholding
using orthogonal basis functions (Flynn, 1972)

Sobel Mask (<- 5.3.2)
http://en.wikipedia.org/wiki/Sobel_operator

Canny edge detection (<- 5.3.5)
http://en.wikipedia.org/wiki/Canny_edge_detector

simple detectors => thickening - (directional information) -> non-maximal suppression (Canny edge detection <- 5.3.5)

Non-maximal suppression of directional edge data:
(1) quantize edge directions
(2) inspect the two adjacent pixels indicated by the direction of its edge
(3) if the edge magnitude of either of these two exceeds that of the pixel under inspection, delete it
Hysteresis to filter output of an edge detector:
- if a pixel with suitable edge magnitude borders another already marked as an edge, then mark it too


Canny reports choosing the ratio of higher to lower threshold to be in the range 2 to 3.
A computational approach to edge detection
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986
J Canny


188p

6.2.2 Edge relaxation

edge context evaluation -> continuous border construction:
Based on the strength of edges in a specified local neighborhood, the confidence of each edge is either increased or decreased.

ref. Extracting and labeling boundary segments in natural scenes
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1980
Prager, J. M.

ref. Hanson and Riseman, 1978

vertex type -> type of edge => possible border continuations:
the number of edges emanating from the vertex of an edge represents the type of the edge.

edge relaxation
: an iterative method, with edge confidences converging either to zero (edge termination) or one (the edge forms a border)

production system
http://en.wikipedia.org/wiki/Production_system

ref. Ballard and Brown's Computer Vision, 1982
3 Early Processing - 3.3 Finding Local Edges - 3.3.5 Edge Relaxation



parallel implementation
http://en.wikipedia.org/wiki/Parallel_adoption


191p

6.2.3 Border tracing

(1) inner boundary tracing
(2) outer boundary tracing

The outer region border is useful for deriving properties such as perimeter, compactness, etc.


inter-pixel boundary of adjacent regions -> extended borders:
The existence of a common border between regions makes it possible to incorporate into the boundary tracing a boundary description process.
-> an evaluated graph consisting of border segments and vertices

extended boundary (Fig. 6.18)
: obtained by shifting all the UPPER outer boundary points one pixel down and right,
shifting all the LEFT outer boundary points one pixel to the right,
shifting all the RIGHT outer boundary points one pixel down,
and the LOWER outer boundary point positions remain unchanged

Extended boundary has the same shape and size as the natural object boundary.


(3) extended boundary tracing
Detecting common boundary segments between adjacent regions and vertex points in boundary segment connections is based on a look-up table depending on the previous detected direction of boundary and on the status of window pixels which can be inside or outside a region.

chain code
double-linked lists

ref. A contour tracing algorithm that preserves common boundaries between regions
CVGIP: Image Understanding
Yuh-Tay Liow, 1991


(4) multi-dimensional gradients
Edge gradient magnitudes and directions are computed in pixels of probable border continuation.

(5) finite topological spaces and cell complexes
-> component labeling (-> 8.1), object filling, shape description


197p

6.2.4 Border detection as graph searching

graph
: a general  structure consisting of a set of nodes and arcs between the nodes.

costs
: weights of arcs

The border detection process ->
a search for the optimal path in the weighted graph + cost minimization

pixels - graph nodes weighted by a value of edge magnitude
edge directions - arcs matched with the local border direction

(1)
A-algorithm graph search:
1. put all successors of the starting node with pointers to an OPEN list
2. remove the node with lowest associated cost
3. expand the node and put its successors on the OPEN list with pointers
4. if the node is the ending point, stop
http://en.wikipedia.org/wiki/A*_search_algorithm
Nils J. Nilsson
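
A compact sketch of the OPEN-list loop described above, here on a regular pixel grid with a priority queue ordered by cost-so-far plus a Euclidean estimate of the remaining distance (the node and cost representations are my own simplification, and the heuristic is only illustrative):

#include <queue>
#include <vector>
#include <cmath>
#include <utility>

struct Node { int x, y; double g; };                       // g = cost so far
struct Cmp { bool operator()(const std::pair<double, Node> &a,
                             const std::pair<double, Node> &b) const
             { return a.first > b.first; } };              // min-heap on estimated total cost

// A-algorithm search on a W x H cost grid from (sx,sy) to (ex,ey).
// cost[y*W+x] plays the role of the inverted edge magnitude; returns the path cost.
double aSearch(const std::vector<double> &cost, int W, int H,
               int sx, int sy, int ex, int ey)
{
  std::vector<double> best(W * H, 1e30);
  std::priority_queue<std::pair<double, Node>,
                      std::vector<std::pair<double, Node> >, Cmp> open;
  open.push({0.0, {sx, sy, 0.0}});
  best[sy*W + sx] = 0.0;
  const int dx[8] = {1,-1,0,0,1,1,-1,-1}, dy[8] = {0,0,1,-1,1,-1,1,-1};
  while (!open.empty())
  {
    Node n = open.top().second; open.pop();                // node with lowest cost
    if (n.x == ex && n.y == ey) return n.g;                // reached the end point: stop
    if (n.g > best[n.y*W + n.x]) continue;                 // stale entry on the OPEN list
    for (int k = 0; k < 8; k++)                            // expand the node's successors
    {
      int nx = n.x + dx[k], ny = n.y + dy[k];
      if (nx < 0 || ny < 0 || nx >= W || ny >= H) continue;
      double g = n.g + cost[ny*W + nx];
      if (g < best[ny*W + nx])
      {
        best[ny*W + nx] = g;
        double h = std::sqrt(double((nx-ex)*(nx-ex) + (ny-ey)*(ny-ey)));  // estimate to goal
        open.push({g + h, {nx, ny, g}});
      }
    }
  }
  return -1.0;                                             // no path found
}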

http://en.wikipedia.org/wiki/Monotonic
a monotonic function (or monotone function) is a function which preserves the given order.

oriented weighted-graph expansion

OPEN list

The optimal path is defined by back-tracking.

(2)
pre-processing (straightening the image data) for thin, elongated objects:
The edge image is geometrically warped by re-sampling the image along profile lines perpendicular to the approximate position of the sought border.

(3)
gradient field transform (based on a graph-search)


The estimate of the cost of the path from the current node to the end node has a substantial influence on the search behavior.

breadth-first search

raw cost function; inverted edge image

optimal graph search vs. heuristic graph search


> the evaluation cost functions (for graph-search border detection)
1) strength of edges forming a border
"If a border consists of strong edges, the cost of that border is small."
2) border curvature
"Borders with a small curvature are preferered."
3) proximity to an approximate border location
"The distance from the approximate boundary has additive or multiplicative influence on the cost."
4) estimates of the distance to the goal (end point)

Gaussian cost transformation:
the mean of the Gaussian distribution - the desired edge strength
the standard deviation - the interval of acceptable edge strengths


a good path with a higher cost vs. worse paths with lower costs
=> expansion of 'bad' nodes representing shorter paths with lower total costs
(* But, a good estimate of the path cost from the current node to the goal is not usually available.)

> modifications
- pruning the solution tree
- least maximum cost
- branch and bound
- lower bound
- multi-resolution processing
- incorporation of higher-level knowledge


Graph searching techniques ensure global optimality of the detected contour.

> detection of approximately straight contours
1) geometrical transformation (polar-to-rectangular co-ordinate transformation)
-> to straighten the contour
2) dividing line (to divide the image with the non-convex parts of the contour)
-> to search in opposite directions to separate

> searching without knowledge of the start and end points
- based on magnitudes and directions of edges
- to merge edge into edge chains (partial borders)
- bi-directional heuristic search
- bottom-up control strategy (->ch.10)


ref. Edge and Line Feature Extraction Based on Covariance Models
IEEE Transactions on Pattern Analysis and Machine Intelligence
Ferdinand van der Heijden, 1995


207p

6.2.5 Border detection as dynamic programming

http://en.wikipedia.org/wiki/Dynamic_programming
: a method of solving problems exhibiting the properties of overlapping subproblems and optimal substructure (described below) that takes much less time than naive methods

Dynamic programming is an optimization method based on the principle of optimality.

(6.22) (8-connected border with n nodes, m-th graph layer)
C(x_k^{m+1}) = \min_i \left[ C(x_i^m) + g^m(i,k) \right]
the number of cost combination computations for each layer = 3n
the total number of cost combination computations = 3n(M-1) + n

- A-algorithm-based graph search does not require explicit definition of the graph.
- DP presents an efficient way of searching for optimal paths from multiple or unknown starting and ending points.
- Which approach is more efficient depends on evaluation functions and on the quality of heuristics for an A-algorithm.
- DP is faster and less memory demanding for word recognition
- DP is more computationally efficient than edge relaxation.
- DP is more flexible and less restrictive than the Hough transform.
- DP is powerful in the presence of noise and in textured images.

live wire; intelligent scissors
: an interactive real-time border detection method combines automated border detection with manual definition of the boundary start point and interactive positioning of the end point.
http://en.wikipedia.org/wiki/Livewire_Segmentation_Technique

In DP, the graph that is searched is always completely constructed at the beginning of the search process.

live lane
: tracing the border by moving a square window whose size is adaptively defined from the speed and acceleration of the manual tracing

+ automated determination of optimal border features from examples of the correct borders
+ specification of optimal parameters of cost transforms 


212p

6.2.6 Hough transforms

Hough transform:
objects with known shape and size -- (data processing) --> shape distortion, rotation, zoom -- (moving a mask) --> correlation (determining the pixel with the highest frequency of occurrence) + known information

- images with incomplete information about the searched objects
- additional structures and noise

http://en.wikipedia.org/wiki/Hough_transform
The purpose of the technique is to find imperfect instances of objects within a certain class of shapes by a voting procedure. This voting procedure is carried out in a parameter space, from which object candidates are obtained as local maxima in a so-called accumulator space that is explicitly constructed by the algorithm for computing the Hough transform.

http://en.wikipedia.org/wiki/Generalised_Hough_Transform
In the Generalized Hough Transform, the problem of finding the model's position is transformed into a problem of finding the transformation parameter that maps the model onto the image. As long as we know the value of the transformation parameter, then the position of the model in the image can be determined.

original Hough transform (if analytic equations of object borderlines are known - no prior knowledge of region position is necessary)
--> generalized Hough transform (even if an analytic expression of the border is not known)

Any straight line in the image is represented by a single point in the parameter space and any part of this straight line is transformed into the same point.

Hough transform
1) to determine all the possible line pixels in the image
2) to transform all lines that can go through these pixels into corresponding points in the parameter space
3) to detect the points in the parameter space

The possible directions of lines define a discretization of the parameter.

parameter space -> rectangular structure of cells -> accumulator array (whose elements are accumulator cells)

Lines existing in the image may be detected as high-values accumulator cells in the accumulator array, and the parameters of the detected line are specified by the accumulator array co-ordinates.
-> Line detection in the image is transformed to detection of local maxima in the accumulator space.
-> A noisy or approximately straight line will not be transformed into a point in the parameter space, but rather will result in a cluster of points, (and the cluster center of gravity can be considered the straight line representation.)

- missing parts, image noise, other non-line structures co-existing in the image, data imprecision
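
A sketch of the standard (theta, rho) line Hough transform over a binary edge image, following the three steps above (accumulator resolution is left to the caller; peak detection in the accumulator is not shown):

#include <vector>
#include <cmath>

// Accumulate votes in (theta, rho) space for every edge pixel (value != 0).
// Returns the accumulator; local maxima correspond to detected lines.
std::vector<int> houghLines(const std::vector<unsigned char> &edge, int W, int H,
                            int nTheta, int nRho)
{
  const double PI = 3.14159265358979323846;
  double rhoMax = std::sqrt(double(W*W + H*H));
  std::vector<int> acc(nTheta * nRho, 0);
  for (int y = 0; y < H; y++)
    for (int x = 0; x < W; x++)
    {
      if (!edge[y*W + x]) continue;                  // only edge pixels vote
      for (int t = 0; t < nTheta; t++)
      {
        double theta = PI * t / nTheta;              // [0, pi)
        double rho = x * std::cos(theta) + y * std::sin(theta);   // signed distance to origin
        int r = (int)((rho + rhoMax) / (2.0 * rhoMax) * (nRho - 1) + 0.5);
        acc[t * nRho + r]++;                         // one vote per (theta, rho) cell
      }
    }
  return acc;
}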


Generalized Hough transform
- arbitrary shapes
- partial or slightly deformed shapes, occluded objects
- measuring similarity between a model and a detected object
- robust to image noise
- search for several occurrences of a shape

Randomized Hough transform


221p

6.2.7 Border detection using border location information

(1) the location of significant edges positioned close to an assumed border
the edge directions of the significant edges match the assumed boundary direction
-> an approximate curve is computed to result in a new, more accurate border

(2) prior knowledge of end points (assuming low image noise and straight boundaries)
search for the strongest edge located on perpendiculars to the line connecting the end points of each partition (the perpendiculars are located at the center of the connecting straight line)

(3) contour detection - active contour models (snakes) (-> 7.2)
user-provided knowledge about approximate position and shape of the required contour

222p

6.2.8 Region construction from borders

methods to construct regions from partial borders

(1) superslice method
thresholding for which the detected boundaries best coincide with assumed boundary segments

(2) probabilities that pixels are located inside a region closed by the partial borders
A pixel is a potential region member if it is on a straight line connecting two opposite edge pixels.


223p
6.3 Region-based segmentation



to divide an image into zones of maximum homogeneity

225p

6.3.1 Region merging


http://en.wikipedia.org/wiki/State_space_search
The set of states form a graph where two states are connected if there is an operation that can be performed to transform the first state into the second.




6.3.2 Region splitting

6.3.3 Splitting and merging


6.3.4 Watershed segmentation

6.3.5 Region growing post-processing

6.4 Matching

6.4.1 Matching criteria

6.4.2 Control strategies of matching

6.5 Evaluation issues in segmentation

6.5.1 Supervised evaluation

6.5.2 Unsupervised evaluation

6.6 Summary

posted by maetel
Chapter 2. The image, its representations and properties


2.1 Image representations, a few concepts


mathematical models
signals
scalar (monochrome image) / vector (color image) function

The domain of a given function is the set of "input" values for which the function is defined.
The range of a function is the set of all "output" values produced by that function.

12p
Image Functions
f(x,y) or f(x,y,t)
- perspective: parallel (or orthographic) projection
- static/dynamic, monochrome/color 
- resolution: spatial/spectral/radiometric/time
- deterministic/stochastic

The 2D intensity image is the result of a perspective projection of the 3D scene.

13p
Parallel/Orthographic projection
http://en.wikipedia.org/wiki/Orthographic_projection
http://en.wikipedia.org/wiki/Orthographic_projection_(geometry)

http://en.wikipedia.org/wiki/Orthogonal_projection

http://en.wikipedia.org/wiki/Graphical_projection


A full 3D representation is
(1) independent of the viewpoint
(2) expressed in the co-ordinate system of the object (rather than of the viewer)

=> Any intensity image view of the objects may be synthesized by standard computer graphics techniques.

microstructure
http://en.wikipedia.org/wiki/Microstructure


14p

Quality of a digital image
1. spatial resolution
2. spectral resolution
3. radiometric resolution
4. time resolution


Images f(x,y)
: deterministic functions / realizations of stochastic processes

linear system theory
integral transforms
discrete mathematics
theory of stochastic processes

A 'well-behaved' image function f(x,y) is integrable, has an invertible Fourier transform, etc.


2.2 Image digitization

sampled - a matrix with M rows and N columns
quantization - K interval (an integer value)

Image quantization assigns to each continuous sample an integer value - the continuous range of the image function f(x,y) is split into K intervals.



2.2.1 Sampling

Shannon's theorem (->3.2.5)

http://en.wikipedia.org/wiki/Shannon%27s_theorem
In information theory, the noisy-channel coding theorem establishes that however contaminated with noise interference a communication channel may be, it is possible to communicate digital data (information) nearly error-free up to a given maximum rate through the channel. This surprising result, sometimes called the fundamental theorem of information theory, or just Shannon's theorem, was first presented by Claude Shannon in 1948. The Shannon limit or Shannon capacity of a communications channel is the theoretical maximum information transfer rate of the channel, for a particular noise level.




TV - 512 * 512
PAL - 768 * 576
NTSC - 640 * 480


15p
raster
: the grid on which a neighborhood relation between points is defined
http://en.wikipedia.org/wiki/Rasterisation


Dirac impulses
http://en.wikipedia.org/wiki/Dirac_delta_function

The pixel captured by a real digitization device has finite size, since the sampling function is not a collection of ideal Dirac impulses but a collection of limited impulses.
-> 3.2.5


2.2.2 Quantization

Quantization is the transition between continuous values of the image function (brightness) and its digital equivalent.

Displays normally provide a range of at least 100 brightness (intensity) levels.

average local brightness => gray-scale transformation techniques
(-> 5.1.2)



2.3 Digital image properties


17p
2.3.1 Metric and topological properties of digital images

Distance - identity, symmetry, triangular inequality

1) Euclidean distance, D_E
Pythagorean metric
http://en.wikipedia.org/wiki/Euclidean_distance
2) 'city block' distance (; L1 metric; Manhattan distance), D_4
rectilinear distance, L1 distance or L1 norm (see Lp space), city block distance, Manhattan distance, or Manhattan length
http://en.wikipedia.org/wiki/City_block_distance
3) 'chessboard' distance, D_8
Chebyshev distance (or Tchebychev distance), or L∞ metric
http://en.wikipedia.org/wiki/Chebyshev_distance
 4) quasi-Euclidean distance D_QE
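
Side by side, the first three metrics for two pixels (x1, y1) and (x2, y2):

#include <cmath>
#include <cstdlib>
#include <algorithm>

double distEuclidean(int x1, int y1, int x2, int y2)      // D_E
{ return std::sqrt(double((x1-x2)*(x1-x2) + (y1-y2)*(y1-y2))); }

int distCityBlock(int x1, int y1, int x2, int y2)         // D_4, L1 / Manhattan distance
{ return std::abs(x1-x2) + std::abs(y1-y2); }

int distChessboard(int x1, int y1, int x2, int y2)        // D_8, Chebyshev distance
{ return std::max(std::abs(x1-x2), std::abs(y1-y2)); }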


18p

region
: a connected set (in a set theory)
: a set of pixels in which there is a path between any pair of its pixels, all of whose pixels also belong to the set
: a set of pixels in which each pair of pixels is contiguous

object (image data interpretation: segmentation)

hole
: points which do not belong to the object and are surrounded by the object

background

The relation 'to be contiguous' decomposes an image into individual regions.


19p

contiguity paradox
paradoxes of crossing lines
connectivity problems

ref.
Digital Geometry: Geometric Methods for Digital Picture Analysis
Reinhard Klette, Azriel Rosenfeld (Morgan Kaufmann, 2004)

topology based on cellular complexes
http://en.wikipedia.org/wiki/Bernhard_Riemann
http://en.wikipedia.org/wiki/Differential_geometry

ref.
Algorithms in Digital Geometry Based on Cellular Topology

V. Kovalevsky
University of Applied Sciences Berlin
http://www.kovalevsky.de/; kovalev@tfh-berlin.de
Abstract. The paper presents some algorithms in digital geometry based on the
topology of cell complexes. The paper contains an axiomatic justification of the
necessity of using cell complexes in digital geometry. Algorithms for solving
the following problems are presented: tracing of curves and surfaces,
recognition of digital straight line segments (DSS), segmentation of digital
curves into longest DSS, recognition of digital plane segments, computing the
curvature of digital curves, filling of interiors of n-dimensional regions
(n=2,3,4), labeling of components (n=2,3), computing of skeletons (n=2, 3).


20p

distance transform; distance function; chamfering algorithm
woodcarving operation
http://en.wikipedia.org/wiki/Distance_transform

ref.
Distance Transform
David Coeurjolly, Laboratoire LIRIS, France, 2006

A Method for Obtaining Skeletons Using a Quasi-Euclidean Distance
U Montanari - Journal of the ACM (JACM), 1968

A linear time algorithm for computing exact Euclidean distance transforms of binary images in arbitrary dimensions
Maurer, C.R., Jr.; Rensheng Qi; Raghavan, V


> applications of the distance transformation
discrete geometry
path planning and obstacle avoidance in mobile robotics
finding the closest feature in the image
skeletonization (mathematical morphology methods)
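
A minimal two-pass chamfering sketch for the D_4 (city-block) metric on a binary image (1 = feature pixel, 0 = background): the forward pass sweeps top-left to bottom-right, the backward pass sweeps the other way.

#include <vector>
#include <algorithm>

// City-block distance transform: distance of every pixel to the nearest feature pixel.
std::vector<int> distanceTransformD4(const std::vector<unsigned char> &bin, int W, int H)
{
  const int INF = W + H + 1;                  // larger than any possible D_4 distance
  std::vector<int> d(W * H);
  for (int i = 0; i < W * H; i++) d[i] = bin[i] ? 0 : INF;

  for (int y = 0; y < H; y++)                 // forward pass (left and top neighbors)
    for (int x = 0; x < W; x++)
    {
      if (x > 0) d[y*W+x] = std::min(d[y*W+x], d[y*W+x-1] + 1);
      if (y > 0) d[y*W+x] = std::min(d[y*W+x], d[(y-1)*W+x] + 1);
    }
  for (int y = H - 1; y >= 0; y--)            // backward pass (right and bottom neighbors)
    for (int x = W - 1; x >= 0; x--)
    {
      if (x < W - 1) d[y*W+x] = std::min(d[y*W+x], d[y*W+x+1] + 1);
      if (y < H - 1) d[y*W+x] = std::min(d[y*W+x], d[(y+1)*W+x] + 1);
    }
  return d;
}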


21p

edge
: a local property of a pixel and its immediate neighborhood

The edge tells us how fast the image intensity varies in a small neighborhood of a pixel.

 

The gradient of the image function is used to compute edges.

 

The edge direction is perpendicular to the gradient direction which points in the direction of the fastest image function growth.


crack edge

22p

border (boundary)
: the set of pixels within the region that have one or more neighbors outside
: inner/outer

The border is a global concept related to a region, while edge expresses local properties of an image function.


23p

convex
: If any two points within a region are connected by a straight line segment, and the whole line lies within the region, then this region is convex.

convex hull
: the smallest convex region containing the input region

deficit of convexity - lakes & bays


topology
topological property = topological invariant
http://en.wikipedia.org/wiki/Topological_property
a property of a topological space which is invariant under homeomorphisms. That is, a property of spaces is a topological property if whenever a space X possesses that property every space homeomorphic to X possesses that property. Informally, a topological property is a property of the space that can be expressed using open sets.

homeomorphism = topological isomorphism
http://en.wikipedia.org/wiki/Homeomorphism
the mappings which preserve all the topological properties of a given space. Two spaces with a homeomorphism between them are called homeomorphic, and from a topological viewpoint they are the same.

rubber sheet transform:
Stretching does not change contiguity of the object parts and does not change the number of holes in regions.


Euler-Poincaré characteristic
http://en.wikipedia.org/wiki/Euler_characteristic
http://mathworld.wolfram.com/EulerCharacteristic.html


24p

2.3.2 Histograms

brightness histogram
: the frequency of each brightness value in the image

The histogram is usually the only global information about the image which is available.


> applications of histogram
finding optimal illumination conditions for capturing an image
gray-scale transformations
image segmentation to objects and background

A change of the object position on a constant background does not affect the histogram.


> local smoothing of the histogram
(1) local averaging of neighboring histogram elements + boundary adjustment
(2) Gaussian blurring: 1-d simplification of the 2-d Gaussian blur


25p

2.3.3 Entropy

information entropy
: the amount of uncertainty about an event associated with a given probability distribution

As the level of disorder rises, entropy increases and events are less predictable.

 

The uncertainty for a set of n equally likely outcomes is defined by

u = \log_b (n)

Since the probability of each event is 1/n, we can write

u = \log_b \left( \frac{1}{p(x_i)} \right) = - \log_b (p(x_i)), \quad \forall i = 1, \cdots, n

The average uncertainty \langle u \rangle, with \langle \cdot \rangle being the average operator, is obtained by

\langle u \rangle = \sum_{i=1}^{n} p(x_i) u_i = - \sum_{i=1}^{n} p(x_i) \log_b (p(x_i))
 

gray-level histogram -> probability density p(x_k) -> entropy
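
Putting those steps together -- build the gray-level histogram, normalize it to a probability density, and sum -p log2 p (a sketch; empty bins are skipped):

#include <vector>
#include <cmath>

// Entropy (in bits) of a gray-level image given as a flat array of 8-bit pixels.
double imageEntropy(const std::vector<unsigned char> &gray)
{
  std::vector<long> hist(256, 0);
  for (size_t i = 0; i < gray.size(); i++) hist[gray[i]]++;   // brightness histogram

  double H = 0.0;
  double N = (double)gray.size();
  for (int k = 0; k < 256; k++)
  {
    if (hist[k] == 0) continue;          // 0 * log 0 treated as 0
    double p = hist[k] / N;              // probability density p(x_k)
    H -= p * std::log(p) / std::log(2.0);
  }
  return H;
}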


http://en.wikipedia.org/wiki/Entropy_(information_theory)

Shannon's source coding theorem shows that, in the limit, the average length of the shortest possible representation to encode the messages in a given alphabet is their entropy divided by the logarithm of the number of symbols in the target alphabet.


C.E. Shannon, "A Mathematical Theory of Communication",
Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, July, October, 1948



2.3.4 Visual perception of the image
 
Contrast
http://en.wikipedia.org/wiki/Contrast_(vision)

Acuity (visual sharpness)
http://en.wikipedia.org/wiki/Visual_acuity
 
visual illusions
http://en.wikipedia.org/wiki/Optical_illusion
http://en.wikipedia.org/wiki/Ebbinghaus_illusion
 
 
perceptual grouping -> image segmentation
 
Gestalt theory
http://en.wikipedia.org/wiki/Gestalt_psychology
 
Patterns take precedence over elements and have properties that are not inherent in the elements themselves.
 
 
28p 
 

2.3.5 Image quality
Azriel Rosenfeld, Avinash C. Kak, Digital Picture Processing, Academic Press, 1982

parameter optimization
correlation between images
resolution of small or proximate objects in the image
measures of image similarity (retrieval from image databases)



2.3.6 Noise in image

white noise
http://en.wikipedia.org/wiki/White_noise

-> Gaussian noise

additive noise
http://en.wikipedia.org/wiki/Additive_white_Gaussian_noise

multiplicative noise

quantization noise
http://en.wikipedia.org/wiki/Quantization_noise

impulse noise

-> salt-and-pepper noise
http://en.wikipedia.org/wiki/Salt_and_pepper_noise


unknown about noise properties -> local pre-processing methods
known noise parameters -> image restoration techniques


SNR = signal-to-noise ratio
http://en.wikipedia.org/wiki/Signal-to-noise_ratio



2.4 Color images

2.4.1 Physics of color
2.4.2 Color perceived by humans
2.4.3 Color spaces
2.4.4 Palette images
2.4.5 Color constancy

2.5 Cameras: an overview

2.5.1 Photosensitive sensors
2.5.2 A monochromatic camera
2.5.3 A color camera

2.6 Summary
 

posted by maetel