Inverse Depth Parametrization for Monocular SLAM
Civera, J.; Davison, A.J.; Montiel, J.M.M.
This paper appears in: Robotics, IEEE Transactions on
Publication Date: Oct. 2008
Volume: 24, Issue: 5
On page(s): 932-945
ISSN: 1552-3098
INSPEC Accession Number: 10301459
Digital Object Identifier: 10.1109/TRO.2008.2003276
First Published: 2008-10-03
Current Version Published: 2008-10-31
Javier Civera, Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza
Andrew J. Davison, Reader in Robot Vision at the Department of Computing, Imperial College London
Jose Maria Martinez Montiel, Robotics and Real Time Group, Universidad de Zaragoza
monocular simultaneous localization and mapping (SLAM)
representation of uncertainty
the standard extended Kalman filter (EKF)
direct parametrization of the inverse depth of features
feature initialization
camera motion estimates
6-D state vector --> converted to the Euclidean XYZ form
linearity index => automatic detection and conversion to maintain maximum efficiency
I. Introduction
monocular camera
: projective sensor measuring the bearing of image features
monocular (adj): relating to or having a single eye; one-eyed
A stereo camera is a type of camera with two or more lenses. This allows the camera to simulate human binocular vision.
structure from motion = SFM
1) feature matching
2) global camera location & scene feature position estimates
sliding window processing
Sliding window processing in SFM: instead of optimizing over the whole sequence at once, camera poses and scene structure are estimated over a moving window of the most recent frames, with older frames dropped (or marginalized) to bound the computation. (Not to be confused with the sliding window protocol of data networking.)
In robotics and computer vision, visual odometry is the process of determining the position and orientation of a robot by analyzing the associated camera images.
Odometry is the use of data from the movement of actuators to estimate change in position over time. Odometry is used by some robots, whether they be legged or wheeled, to estimate (not determine) their position relative to a starting location.
visual SLAM
probabilistic filtering approach
initializing uncertain depth estimates for distant features
Gaussian distributions implicit in the EKF
a new feature parametrization that is able to smoothly cope with initialization of features at all depths - even up to "infinity" - within the standard EKF framework: direct parametrization of inverse depth relative to the camera position from which a feature was first observed
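A minimal Python/NumPy sketch (variable names mine) of the paper's 6-D coding y = (x, y, z, theta, phi, rho) and its conversion to a Euclidean XYZ point:

import numpy as np

def unit_ray(theta, phi):
    # Ray direction from azimuth theta and elevation phi, following the
    # paper's convention m = (cos(phi)sin(theta), -sin(phi), cos(phi)cos(theta))^T.
    return np.array([np.cos(phi) * np.sin(theta),
                     -np.sin(phi),
                     np.cos(phi) * np.cos(theta)])

def inverse_depth_to_xyz(y):
    # y = (x, y, z, theta, phi, rho): camera optical center at first
    # observation, ray angles, and inverse depth rho = 1/d along the ray.
    y = np.asarray(y, dtype=float)
    x0, (theta, phi, rho) = y[:3], y[3:]
    return x0 + unit_ray(theta, phi) / rho

Note how rho -> 0 pushes the point toward infinity along the ray while the 6-D coding itself stays finite.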
A. Delayed and Undelayed Initialization
main map; main probabilistic state; main state vector
test for inclusion
delayed initialization
> treating newly detected features separately from the main map to reduce depth uncertainty before insertion into the full filter (with a standard XYZ representation)
- Features that retain low parallax over many frames (those very far from the camera or close to the motion epipole) are usually rejected completely because they never pass the test for inclusion
> (in 2-D and simulation) Initialization is delayed until the measurement equation is approximately Gaussian and the point can be safely triangulated (see the triangulation sketch after this list).
> 3-D monocular vision with inertial sensing + auxiliary particle filter (in high frame rate sequence)
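The "safe triangulation" these delayed schemes wait for is essentially two-ray intersection once parallax is sufficient. A sketch of standard linear (DLT-style) two-view triangulation, assuming known 3x4 projection matrices (my own illustration, not a specific cited implementation):

import numpy as np

def triangulate_dlt(P1, P2, u1, u2):
    # P1, P2: 3x4 camera projection matrices; u1, u2: (u, v) pixel
    # coordinates of the same point in each image. Builds the usual
    # DLT system A X = 0 and solves it by SVD. The system becomes
    # ill-conditioned under low parallax, which is exactly why delayed
    # schemes postpone initialization.
    A = np.vstack([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize to a 3-D point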
undelayed initialization
> While features with highly uncertain depths provide little information on camera translation, they are extremely useful as bearing references for orientation estimation.
: a multiple hypothesis scheme, initializing features at various depths and pruning those not reobserved in subsequent images
> Gaussian sum filter approximated by a federated information sharing method to keep the computational overhead low
-> to spread the Gaussian depth hypotheses along the ray according to inverse depth
A Gaussian sum is a more efficient representation than particles (efficient enough that the separate Gaussians can all be put into the main state vector), but not as efficient as the single Gaussian representation that the inverse depth parametrization allows.
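A toy illustration (mine, not the cited filter's exact placement) of spreading hypotheses along the ray "according to inverse depth": place them uniformly in inverse depth, so they bunch near the camera and thin out toward infinity.

import numpy as np

# Hypothetical numbers for illustration only.
rho_min, rho_max = 0.01, 1.0   # inverse depths (1/m): 100 m down to 1 m
n_hyp = 8
rhos = np.linspace(rho_min, rho_max, n_hyp)  # uniform spacing in inverse depth
depths = 1.0 / rhos  # resulting depths: dense near the camera, sparse far away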
B. Points at Infinity
Point at infinity: a feature that exhibits no parallax during camera motion due to its extreme depth
-> not used for estimating camera translation but for estimating rotation
The homogeneous coordinate systems of visual projective geometry used normally in SFM allow explicit representation of points at infinity, and they have proven to play an important role during offline structure and motion estimation.
sequential SLAM system
Montiel and Davison: In the special case where all features are known to be infinite -- in very-large-scale outdoor scenes or when the camera rotates on a tripod -- SLAM in pure angular coordinates turns the camera into a real-time visual compass.
Our probabilistic SLAM algorithm must be able to represent the uncertainty in depth of seemingly infinite features. Observing no parallax for a feature after 10 units of camera translation does tell us something about its depth -- it gives a reliable lower bound, which depends on the amount of motion made by the camera (if the feature had been closer than this, we would have observed parallax).
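A rough small-angle version of that lower bound (my own back-of-the-envelope, not from the paper):

\alpha \approx \frac{b_\perp}{d} \quad\Rightarrow\quad \alpha < \alpha_{\min} \;\Rightarrow\; d \gtrsim \frac{b_\perp}{\alpha_{\min}}

where b_perp is the camera translation perpendicular to the viewing ray, d the feature depth, and alpha_min the smallest parallax angle resolvable in the image; the bound grows with the amount of motion made, as stated above.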
The explicit consideration of uncertainty in the locations of points has not previously been required by offline computer vision algorithms, but it is very important in the more difficult online case.
C. Inverse Depth Representation
standard tracking
An explicit parametrization of the inverse depth of a feature along a semi-infinite ray from the position from which it was first viewed allows a Gaussian distribution to cover uncertainty in depth spanning a range from nearby to infinity, and permits seamless crossing over to finite depth estimates for features that have appeared infinite for long periods of time.
linearity index + inverse depth parametrization
The projective nature of a camera means that the image measurement process is nearly linear in this inverse depth coordinate.
Inverse depth appears in the relation between image disparity and point depth in stereo vision; it is interpreted as the parallax with respect to the plane at infinity. (Hartley and Zisserman)
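The standard rectified-stereo relation makes the near-linearity concrete:

d = \frac{f\,b}{Z} = f\,b\,\rho, \qquad \rho = \frac{1}{Z}

(f focal length in pixels, b stereo baseline, Z depth): the measured disparity d is exactly linear in the inverse depth rho, which is the intuition behind the image measurement process being nearly linear in rho in the monocular case as well.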
Inverse depth is used to relate the motion field induced by scene points with the camera velocity in optical flow analysis.
modified polar coordinates
target motion analysis = TMA
EKF-based sequential depth estimation from camera-known motion
multibaseline stereo
matching robustness for scene symmetries
sequential EKF process using inverse depth
( ref. Stochastic Approximation and Rate-Distortion Analysis for Robust Structure and Motion Estimation )
undelayed initialization for 2-D monocular SLAM
( ref. A unified framework for nearby and distant landmarks in bearing-only SLAM )
FastSLAM-based system for monocular SLAM
( ref. Ethan Eade & Tom Drummond, Scalable Monocular SLAM )
special epipolar update step
FastSLAM
( ref. Civera, J.; Davison, A.J.; Montiel, J.M.M., "Inverse Depth to Depth Conversion for Monocular SLAM";
J. Montiel and A. J. Davison, "A visual compass based on SLAM" )
loop-closing
II. State Vector Definition
handheld camera motion
> constant angular and linear velocity model
quaternion
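A simplified Python sketch of the constant-velocity prediction step (my own reduction; the paper's model additionally injects zero-mean Gaussian linear/angular velocity impulses as process noise):

import numpy as np

def quat_mul(q1, q2):
    # Hamilton product of two quaternions stored as (w, x, y, z).
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_from_omega(omega, dt):
    # Quaternion for rotating by angular velocity omega over time dt.
    angle = np.linalg.norm(omega) * dt
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = omega / np.linalg.norm(omega)
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])

def predict_camera(r, q, v, w, dt):
    # One prediction step: position integrates linear velocity,
    # orientation composes with the rotation from angular velocity,
    # and both velocities stay constant (noise enters as impulses).
    r_new = r + v * dt
    q_new = quat_mul(q, quat_from_omega(w, dt))
    return r_new, q_new / np.linalg.norm(q_new), v, w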