2009. 10. 26. 21:35
Computer Vision
Avoiding moving outliers in visual SLAM by tracking moving objects
Wangsiripitak, S.
Murray, D.W.
Dept. of Eng. Sci., Univ. of Oxford, Oxford, UK;
Publication Date: 12-17 May 2009
On page(s): 375-380
ISSN: 1050-4729
ISBN: 978-1-4244-2788-8
INSPEC Accession Number: 10748966
Digital Object Identifier: 10.1109/ROBOT.2009.5152290
Current Version Published: 2009-07-06
http://www.robots.ox.ac.uk/~lav//Research/Projects/2009somkiat_slamobj/project.html
Abstract
parallel implementation of monoSLAM with a 3D object tracker
information to register objects to the map's frame
the recovered geometry
I. Introduction
approaches to handling movement in the environment
segmentation between static and moving features
outlying moving points
1) active search -> sparse maps
2) robust methods -> multifocal tensors
3-1) tracking known 3D objects in the scene
-2) determining whether they are moving
-3) using their convex hulls to mask out features
"Knowledge that they are occluded rather than unreliable avoids the need to invoke the somewhat cumbersome process of feature deletion, followed later perhaps by unnecessary reinitialization."
[15] H. Zhou and S. Sakane, “Localizing objects during robot SLAM in semi-dynamic environments,” in Proc of the 2008 IEEE/ASME Int Conf on Advanced Intelligent Mechatronics, 2008, pp. 595–601.
"[15] noted that movement is likely to be associated with objects in the scene, and classified them according to the likelihood that they would move."
the use of 3D objects for reasoning about motion segmentation and occlusion
occlusion masks
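A minimal sketch of such an occlusion mask, assuming the tracked object's model points have already been projected into the image as 2D points. It builds their convex hull and treats a feature as masked if it lies inside the hull or within a fixed margin of it, which stands in for the uniform dilation the paper mentions in Section III.B. The function names, the `dilation` parameter and the sample coordinates are illustrative assumptions, not the authors' implementation, and the hull is assumed to have at least three non-collinear vertices.

```python
import math

def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; vertices come back in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
        t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def is_masked(feature, hull, dilation):
    """True if feature is inside the hull or within `dilation` pixels of it."""
    edges = list(zip(hull, hull[1:] + hull[:1]))
    if all(cross(a, b, feature) >= 0 for a, b in edges):
        return True
    return min(point_segment_distance(feature, a, b) for a, b in edges) <= dilation

# Hypothetical projected object points (pixels) and a 2-pixel dilation.
hull = convex_hull([(0, 0), (10, 0), (10, 10), (0, 10), (5, 5)])
masked_inside = is_masked((5, 5), hull, 2.0)
masked_margin = is_masked((11, 5), hull, 2.0)
masked_far = is_masked((13, 5), hull, 2.0)
```

Testing distance to the hull rather than constructing a dilated polygon gives rounded corners, which is what a uniform dilation of a convex shape produces.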
II. Underlying Processes
A. Visual SLAM
Monocular visual SLAM - EKF
idempotent
http://en.wikipedia.org/wiki/Idempotence
Idempotence describes the property of operations in mathematics and computer science that means that multiple applications of the operation do not change the result.
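A small illustration of the definition above, using quaternion normalisation as the operation (chosen here only because quaternions come up next in these notes; the `normalize` helper is a hypothetical name). Applying it a second time leaves the result unchanged, up to floating-point rounding.

```python
import math

def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

once = normalize((1.0, 2.0, 3.0, 4.0))
twice = normalize(once)

# Idempotence: f(f(q)) equals f(q), here checked up to rounding error.
same = all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(once, twice))
```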
http://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation
http://en.wikipedia.org/wiki/Conversion_between_quaternions_and_Euler_angles
http://en.wikipedia.org/wiki/Quaternion
http://en.wikipedia.org/wiki/Euler_Angles
Berthold K.P. Horn, "Some Notes on Unit Quaternions and Rotation"
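For reference, a short sketch of the standard conversion from a unit quaternion to a rotation matrix, which is what the links above cover. The function names are my own, and the quaternion is assumed to be ordered (w, x, y, z) and already normalised.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle / 2.0) / n
    return (math.cos(angle / 2.0), ax * s, ay * s, az * s)

def quat_to_matrix(q):
    """3x3 rotation matrix (list of rows) from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q."""
    R = quat_to_matrix(q)
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

# A 90-degree rotation about z should take the x axis to the y axis.
q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
rotated = rotate(q, (1.0, 0.0, 0.0))
```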
"Standard monocular SLAM takes no account of occlusion."
B. Object pose tracking
Harris' RAPiD
[17] C. Harris and C. Stennett, “Rapid - a video rate object tracker,” in Proc 1st British Machine Vision Conference, Sep 1990, pp. 73–77
[20] C. Harris, “Tracking with rigid models,” in Active Vision, A. Blake and A. Yuille, Eds. MIT Press, 1992, pp. 59–73.
"(RAPiD makes the assumption that the pose change required between current and new estimates is sufficiently small, first, to allow a linearization of the solution and, second, to make trivial the problem of inter-image correspondence.) The correspondences used are between predicted point to measured image edge, allowing search in 1D rather than 2D within the image. This makes very sparing use of image data — typically only several hundred pixels per image are addressed."
aperture problem
http://en.wikipedia.org/wiki/Motion_perception
http://focus.hms.harvard.edu/2001/Mar9_2001/research_briefs.html
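The 1D search in the RAPiD quote above can be sketched as follows, assuming the search runs along the predicted edge normal (the only direction in which an edge constrains position, which is the aperture problem). The image is a plain list of rows of grey values, and the function name, the nearest-pixel sampling and the synthetic test image are illustrative assumptions rather than details from [17] or [20].

```python
def search_edge_along_normal(image, point, normal, max_dist):
    """Return the signed offset (pixels) along `normal` from `point` to the
    strongest intensity step within +/- max_dist, or None if none is found."""
    px, py = point
    nx, ny = normal
    height, width = len(image), len(image[0])

    def sample(t):
        # Nearest-pixel lookup at distance t along the normal.
        x = int(round(px + t * nx))
        y = int(round(py + t * ny))
        if 0 <= x < width and 0 <= y < height:
            return image[y][x]
        return None

    best_offset, best_strength = None, 0
    for t in range(-max_dist, max_dist):
        a, b = sample(t), sample(t + 1)
        if a is None or b is None:
            continue
        strength = abs(b - a)  # 1D gradient magnitude between neighbours
        if strength > best_strength:
            best_strength = strength
            best_offset = t + 0.5  # the step lies between the two samples
    return best_offset

# Synthetic 20x20 image with a vertical step edge between columns 11 and 12.
image = [[255 if x >= 12 else 0 for x in range(20)] for _ in range(20)]
offset = search_edge_along_normal(image, (10, 10), (1.0, 0.0), 5)
```

Each control point touches only about `2 * max_dist` pixels, which is consistent with the quote's figure of a few hundred pixels addressed per image.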
[21] R. L. Thompson, I. D. Reid, L. A. Munoz, and D. W. Murray, “Providing synthetic views for teleoperation using visual pose tracking in multiple cameras,” IEEE Transactions on Systems, Man and Cybernetics, Part A, vol. 31, no. 1, pp. 43–54, 2001.
- "Three difficulties using the Harris tracker":
(1) First it was found to be easily broken by occlusions and changing lighting. Robust methods to mitigate this problem have been investigated monocularly by Armstrong and Zisserman.
(2) Although this has a marked effect on tracking performance, the second problem found is that the accuracy of the pose recovered in a single camera was poor, with evident correlation between depth and rotation about axes parallel to the image plane. Maitland and Harris had already noted as much when recovering the pose of a pointing device destined for neurosurgical application. They reported much improved accuracy using two cameras; but the object was stationary, had an elaborate pattern drawn on it and was visible at all times to both cameras.
(3) The third difficulty, or rather uncertainty, was that the convergence properties and dynamic performances of the monocular and multicamera methods were largely unreported.
(3) => little solution
(2) => [21] "recovered pose using 3 iterations of the pose update cycle per image"
(1) => [21], [22] : search -> matching -> weighting
[22] M. Armstrong and A. Zisserman, “Robust object tracking,” in Proc 2nd Asian Conference on Computer Vision, 1995, vol. I. Springer, 1996, pp. 58–62.
RANSAC
[23] M. Fischler and R. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, no. 6, pp. 381–395, June 1981.
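A minimal RANSAC sketch in the sense of [23], fitting a 2D line to points contaminated by outliers. This is a generic illustration, not the pose-tracking variant used in the paper; the function names, threshold, iteration count, seed and data are all assumptions made for the example.

```python
import math
import random

def fit_line(p, q):
    """Line a*x + b*y + c = 0 through p and q, with (a, b) of unit length."""
    (x1, y1), (x2, y2) = p, q
    a, b = y1 - y2, x2 - x1
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # degenerate sample: identical points
    c = x1 * y2 - x2 * y1
    return (a / norm, b / norm, c / norm)

def ransac_line(points, threshold, iterations, seed=0):
    """Keep the two-point hypothesis with the largest consensus set."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)
        line = fit_line(p, q)
        if line is None:
            continue
        a, b, c = line
        # With (a, b) normalised, |a*x + b*y + c| is the point-to-line distance.
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= threshold]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = line, inliers
    return best_line, best_inliers

# 14 points on y = 2x + 1 plus 6 gross outliers (made-up data).
inlier_pts = [(float(x), 2.0 * x + 1.0) for x in range(14)]
outlier_pts = [(1.0, 30.0), (3.0, -20.0), (5.0, 40.0),
               (7.0, -15.0), (9.0, 50.0), (11.0, -30.0)]
line, inliers = ransac_line(inlier_pts + outlier_pts, threshold=0.5, iterations=200)
a, b, c = line
slope, intercept = -a / b, -c / b
```

Note that the inlier threshold has to be chosen in advance, which presumes some knowledge of the noise level.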
Least median of squares (LMedS), used because the underlying standard deviation is unknown
[24] P. J. Rousseeuw, “Least median of squares regression,” Journal of the American Statistical Association, vol. 79, no. 388, pp. 871–880, 1984.
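A sketch of least median of squares in the sense of [24], again on a 2D line for illustration. Instead of counting inliers against a preset threshold, each candidate is scored by the median of its squared residuals, so no noise scale is needed up front. Candidates are enumerated exhaustively over point pairs, which is only practical for small sets; random sampling would normally replace that. The function name and data are assumptions for the example, and the method presumes more than half the points are inliers.

```python
from itertools import combinations
from statistics import median

def lmeds_line(points):
    """Return (median squared residual, slope, intercept) of the best
    two-point line y = m*x + k under the least-median-of-squares criterion."""
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # vertical line: not representable as y = m*x + k
        m = (y2 - y1) / (x2 - x1)
        k = y1 - m * x1
        med = median((y - (m * x + k)) ** 2 for x, y in points)
        if best is None or med < best[0]:
            best = (med, m, k)
    return best

# 14 points on y = 2x + 1 plus 6 gross outliers (made-up data).
pts = [(float(x), 2.0 * x + 1.0) for x in range(14)]
pts += [(1.0, 30.0), (3.0, -20.0), (5.0, 40.0),
        (7.0, -15.0), (9.0, 50.0), (11.0, -30.0)]
med, slope, intercept = lmeds_line(pts)
```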
III. MonoSLAM with Tracked Objects
A. Information from SLAM to the object tracker
B. Information from the object tracker to SLAM
"The convex hull is uniformly dilated by an amount that corresponds to the projection of the typical change in pose."