WP2.1 Visual flight stabilization and control
We propose to investigate accurate navigation of the helicopter using vision-based flight stabilization and control in combination with an IMU. Image processing will be integrated directly into the control loop, which requires extremely efficient and robust computer vision algorithms. Since roll and pitch require high-bandwidth control, they will be handled by the inertial sensors, while visual flight stabilization will complement the IMU for controlling yaw, x-y position and altitude. Altitude can be controlled using either pressure sensors or range-finding sensors. The first solution is more challenging and provides less precise absolute and relative altitude information, while the second is easily manageable but only measures the vertical distance to the floor. In this project we will investigate the possibility of using the same vision sensor (in addition to the IMU) for altitude estimation. The altitude dynamics of a micro-helicopter are less complex, and control quality will be limited only by the sensors' bandwidth and the actuators' performance.
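As a toy illustration of how high-bandwidth inertial data and low-rate visual fixes can be combined, consider a complementary filter on the yaw angle. This is only a sketch under assumed gains and update rates, not the project's controller:

```python
# Minimal sketch (illustrative, not the project implementation): a
# complementary filter blending an integrated gyro rate (fast but
# drifting) with a vision-derived yaw fix (slow but drift-free).

def complementary_yaw(yaw_prev, gyro_rate, vision_yaw, dt, alpha=0.98):
    """High-pass the gyro integration, low-pass the vision estimate."""
    gyro_yaw = yaw_prev + gyro_rate * dt          # high-bandwidth prediction
    return alpha * gyro_yaw + (1.0 - alpha) * vision_yaw

# Example: the gyro carries a constant +0.1 rad/s bias while the vision
# fix holds the true yaw of 0. Pure integration would drift to 0.2 rad
# after 2 s; the vision term keeps the error bounded instead.
yaw = 0.0
for _ in range(200):
    yaw = complementary_yaw(yaw, 0.1, 0.0, dt=0.01)
```

The filter weight alpha trades gyro bandwidth against vision drift correction; its actual value would have to be tuned to the sensors used.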
Autonomous take-off, hovering, landing and flight stabilization will be addressed using visual tracking. From features tracked in the image, the pose of the helicopter can be computed, and this information will be used to perform stable take-offs, landings and hovering. The feasibility of this approach has already been demonstrated by Kemp et al., albeit with the help of an external computer system and a priori knowledge of the environment.
In addition to the real-time constraints on visual tracking, a further difficulty is caused by the light weight and instability of the micro-helicopter: external disturbances have a large effect, and the helicopter's motion is relatively difficult to predict. Furthermore, the quality of the images obtained from the miniature camera is very poor, which greatly increases the demands on the vision system. Finally, the large disturbances and poor image quality result in large pose uncertainty, so the estimates must be properly filtered before they can be used by the control system.
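The kind of filtering required can be illustrated with a scalar Kalman filter smoothing noisy vision-based position fixes before they reach the controller. All noise parameters below are illustrative assumptions:

```python
# Hedged sketch: a scalar Kalman filter on one pose component.
# q, r and the measurement sequence are invented for illustration.

def kalman_step(x, p, z, q=0.01, r=0.5):
    """One predict/update cycle for a scalar state.
    x, p: state estimate and its variance; z: noisy vision measurement;
    q: process noise (disturbances), r: measurement noise (poor images)."""
    p = p + q                      # predict: uncertainty grows over time
    k = p / (p + r)                # Kalman gain
    x = x + k * (z - x)            # correct with the vision measurement
    p = (1.0 - k) * p              # uncertainty shrinks after the update
    return x, p

x, p = 0.0, 1.0                    # poor initial guess, high uncertainty
measurements = [0.9, 1.2, 0.8, 1.1, 1.0, 0.95, 1.05]  # noisy fixes near 1.0
for z in measurements:
    x, p = kalman_step(x, p, z)
```

The filtered estimate converges toward the true position while its variance p shrinks, giving the controller both a smoothed pose and a measure of its reliability.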
WP2.2 Obstacle detection and avoidance
Obstacle detection and avoidance will be achieved using optical flow, which requires dense optical flow computation under real-time constraints. The limited resources of the embedded platform will make real-time optical flow computation challenging. A justification for using optical flow for visual navigation can be found in flying insects such as bees: it has been shown [Srinivasan1991, Lehrer1988] that bees use optical flow to navigate between obstacles and to build a 3D representation of their environment.
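The principle behind gradient-based optical flow can be sketched in one dimension: within a window, the flow u satisfies the brightness-constancy relation I_x * u = -I_t. The following is a minimal illustration, not the dense real-time implementation envisioned here:

```python
# Illustrative 1-D core of gradient-based (Lucas-Kanade style) optical
# flow: least-squares solution of I_x * u = -I_t over a window.
# Dense flow on images applies the same principle per pixel neighborhood.

def flow_1d(frame0, frame1, lo, hi):
    """Estimate the shift u (in samples) between two 1-D signals over
    the window [lo, hi) from spatial and temporal gradients."""
    num = den = 0.0
    for i in range(lo, hi):
        ix = (frame0[i + 1] - frame0[i - 1]) / 2.0   # spatial gradient
        it = frame1[i] - frame0[i]                   # temporal gradient
        num += -ix * it
        den += ix * ix
    return num / den if den else 0.0

# A linear ramp shifted right by one sample yields a flow of +1 exactly.
f0 = [0.1 * i for i in range(20)]
f1 = [0.1 * (i - 1) for i in range(20)]
u = flow_1d(f0, f1, 2, 18)
```

For obstacle avoidance, the magnitude of such flow vectors across the image indicates proximity: nearby obstacles produce large flow, which is exactly the cue bees are believed to exploit.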
We will investigate the use of optical flow for three main topics:
WP2.3 Vision based SLAM
Simultaneous localization and mapping (SLAM) is a necessity for autonomous robot operation. Given all the restrictions imposed by a micro-helicopter, we will focus on visual SLAM combined with an IMU. The approach we envision has to be scalable, robust and reliable, and it has to work in natural environments both indoors and outdoors. The method must be robust to dynamic changes of the environment, such as people moving or temporary structural changes. Dynamic objects should ideally be identified and tracked, so that their movements can be considered in path planning and navigation.
World representation with local sub-maps
Scalability is a major difficulty in SLAM. In general, the computational requirement increases as O(N²), where N is the number of features currently used to build the map. This O(N²) cost arises from the correlations that must be maintained among all features of the map, which rules out such approaches for real-time operation in real-world environments. A number of partitioned update methods (using local sub-maps) have been devised to reduce this computational effort. These methods confine sensor-rate updates to a small local region and update the global map only at a much lower frequency. These partitioned methods all produce optimal estimates.
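The quadratic growth is easy to quantify: a full EKF update rewrites every entry of the N x N covariance matrix, which is precisely the cost that partitioned methods bound by keeping the local N small. A back-of-the-envelope sketch:

```python
# Back-of-the-envelope sketch of why the full EKF-SLAM update is O(N^2):
# one sensor update touches the cross-correlations between all pairs of
# map features, i.e. every entry of the covariance matrix.
# (2-D point landmarks assumed purely for illustration.)

def ekf_update_cost(n_features, state_per_feature=2):
    """Covariance entries rewritten by one full EKF update."""
    n = n_features * state_per_feature
    return n * n

cost_100 = ekf_update_cost(100)   # map with 100 features
cost_200 = ekf_update_cost(200)   # doubling the map quadruples the cost
```

A partitioned update working on a sub-map of, say, 20 features pays only the cost of those 20 at sensor rate, deferring the full-map cost to the low-frequency global update.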
There are two basic types of partitioned update. The first operates in a local region of the global map and maintains globally referenced coordinates; this approach is taken by the compressed EKF [Guivant2001] and the postponement algorithm [Knight2001]. The second generates a short-term submap with its own local coordinate frame; this is the approach of the constrained local submap filter [Williams2001] and the local map sequencing algorithm [Tardos2002]. We will focus on the latter approach as it is simpler and, by performing high-frequency operations in a local coordinate frame, it avoids very large global covariance matrices and is therefore more numerically stable and less affected by linearization errors. We intend to use relative submaps [Bailey2006, Guivant2002, Leonard2003], i.e. submaps that do not share a common coordinate frame. The location of any given submap is recorded only by its neighboring submaps, which are connected in a graphical network [Chong1999, Estrada2005]. Inside a submap, relative mapping [Csorba1997, Martinelli2007, Nguyen2006] will be used: rather than estimating landmark locations, the SLAM state is composed of distances and angles between landmarks. As shown by the ASL group in [Martinelli2007] and [Nguyen2006], the relative mapping approach has better convergence than an absolute map approach.
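The relative-submap idea can be sketched as storing only neighbour-to-neighbour transforms and composing them along a path in the submap graph whenever a globally referenced pose is needed. Planar poses and all numeric values below are illustrative:

```python
# Hedged sketch of relative submaps: no global frame is stored; each
# edge of the submap graph holds the 2-D rigid transform between two
# neighboring submaps, and poses are composed on demand.
import math

def compose(a, b):
    """Compose two planar poses (x, y, theta): result = a then b."""
    ax, ay, ath = a
    bx, by, bth = b
    return (ax + bx * math.cos(ath) - by * math.sin(ath),
            ay + bx * math.sin(ath) + by * math.cos(ath),
            ath + bth)

# Submap graph as relative edges along a chain 0 -> 1 -> 2:
# forward 2 m then turn 90 degrees, then forward 1 m.
edges = {(0, 1): (2.0, 0.0, math.pi / 2), (1, 2): (1.0, 0.0, 0.0)}

# Pose of submap 2 expressed in the frame of submap 0:
pose = compose(edges[(0, 1)], edges[(1, 2)])
```

Because only local transforms are estimated at sensor rate, errors stay confined to small, well-conditioned covariance matrices, which is the numerical advantage cited above.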
We will focus on using visual features as landmarks for SLAM. Feature tracks are computed from optical flow and fused with information from the IMU. The IMU will provide us with odometry information and scale. We will also investigate methods to retrieve the scale without an IMU, e.g. from the control inputs.
In addition, we will investigate how to select the features best suited to updating the model of the environment. In particular, when using visual features, the number of features available in a single image might be too high for updating the whole state space. Our plan is to identify the most informative features, allowing the vehicle to reliably learn consistent maps of the environment. For example, we expect that a cluster of landmarks can be represented by a single one, significantly reducing the computational requirements.
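One simple, hypothetical selection criterion of this kind ranks candidate features by the variance reduction that a measurement of each would provide, keeping only the top-ranked ones. The criterion and all numbers below are assumptions for illustration; the project's actual selection scheme remains to be developed:

```python
# Hypothetical informativeness criterion (not the project's final
# method): prefer features whose update shrinks the estimate variance
# the most. Prior variances and measurement noise are invented.

def information_gain(prior_var, meas_var):
    """Variance reduction from fusing one measurement of a feature."""
    post_var = prior_var * meas_var / (prior_var + meas_var)
    return prior_var - post_var

candidates = {"f1": 4.0, "f2": 0.1, "f3": 2.5, "f4": 0.05}  # prior variances
meas_var = 0.5
ranked = sorted(candidates,
                key=lambda f: information_gain(candidates[f], meas_var),
                reverse=True)
top2 = ranked[:2]   # uncertain features benefit most from an update
```

Already well-localized features (small prior variance) contribute little and can be skipped, which caps the number of features entering each state update.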
Loop closing for optimized maps
Loop closing will be used to correct accumulated errors and to add missing connections between submaps in the global map. Loop closing will be performed using visual features: first, loops are detected by visual place recognition; second, all submaps are optimized to find the most accurate minimum-error configuration.
Place recognition will be performed with visual features and a bag-of-features approach. The main challenge for place recognition is to solve the correspondence problem for local features. Techniques for feature matching using scale-invariant or affine-invariant local detectors have reached a level of maturity and can be considered reliable enough for our application. The place recognition scheme compares all previously created submaps with the current one by computing a similarity score based on visual features. For each submap, a set of discriminative and descriptive visual features is extracted and stored. In detail, we propose to use SIFT features [Lowe2004], as they are scale- and rotation-invariant and robust even to affine transformations. For efficient feature representation and matching we follow the scheme proposed by [Nister2006]. SIFT features have been found to work very well, but the computational complexity of detection, extraction and quantization might pose problems for the limited resources of the micro-helicopter. We will therefore also investigate the use of different features in the manner of [Tuytelaars2007, Bay2006], where the extraction step is more efficient. We will also investigate how to combine visual features with 3D features, e.g. 3D lines. This is important for non-textured environments: where visual features provide only weak cues, 3D structure might be able to resolve the ambiguities. For geometric verification of a detected loop we will perform 3D model registration using VIP features as described in [Wu2008].
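The bag-of-features comparison can be sketched as follows, with descriptors already quantized to visual-word ids. The word ids, counts and the plain cosine similarity are illustrative; the actual scheme, following [Nister2006], uses a hierarchical vocabulary with tf-idf weighting:

```python
# Minimal bag-of-features sketch: each submap is reduced to a histogram
# of visual-word occurrences; submaps are compared by cosine similarity.
# Word ids and submap contents are invented for illustration.
from collections import Counter
import math

def bow_similarity(words_a, words_b):
    """Cosine similarity between two visual-word histograms."""
    ha, hb = Counter(words_a), Counter(words_b)
    dot = sum(ha[w] * hb[w] for w in ha)
    na = math.sqrt(sum(v * v for v in ha.values()))
    nb = math.sqrt(sum(v * v for v in hb.values()))
    return dot / (na * nb) if na and nb else 0.0

current = [3, 7, 7, 12, 19]           # quantized words of current submap
database = {"mapA": [3, 7, 12, 19, 19], "mapB": [1, 2, 4, 8, 16]}
scores = {m: bow_similarity(current, w) for m, w in database.items()}
best = max(scores, key=scores.get)    # loop-closure candidate: "mapA"
```

A high-scoring candidate is then passed to the geometric verification step; the histogram comparison itself never needs the raw descriptors, which is what makes it cheap enough to run against all stored submaps.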
The map update after the detection of a loop will be carried out as a large optimization problem in the spirit of bundle adjustment (see [Ni2007a]). Due to the high computational cost, the optimization will be computed offline. In the optimization we seek the maximum-likelihood solution minimizing the reprojection error, taking into account all robot poses, all measured features and all previously detected loops. This problem can easily grow beyond a computationally feasible size. We intend to tackle it by computing an approximate solution, splitting the problem into subproblems. Such a submap approach was already demonstrated in [Ni2007a, Ni2007b]. However, the success of this approach depends on how the submaps are created; we therefore intend to control submap creation so that it is optimal for the bundle adjustment.
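The structure of this optimization can be illustrated on a toy 1-D pose graph, where odometry constraints and one loop-closure constraint are relaxed iteratively. This is only a simplified analogue of the bundle adjustment, with invented values:

```python
# Toy 1-D analogue of the loop-closing optimization: poses linked by
# odometry constraints plus one loop-closure constraint, solved by
# Gauss-Seidel sweeps. The real system jointly optimizes 6-DoF poses
# and feature reprojections; all values here are illustrative.

def optimize(poses, constraints, iters=100):
    """Each sweep sets every pose (except the fixed anchor, pose 0) to
    the value minimizing the sum of squared residuals (p[j]-p[i]-z)^2
    of the constraints touching it, given the other current poses."""
    poses = list(poses)
    for _ in range(iters):
        for k in range(1, len(poses)):
            targets = [poses[i] + z for i, j, z in constraints if j == k]
            targets += [poses[j] - z for i, j, z in constraints if i == k]
            poses[k] = sum(targets) / len(targets)
    return poses

# Odometry claims three unit steps, but a loop closure says pose 3
# coincides with pose 0, so 3 units of drift must be redistributed:
init = [0.0, 1.0, 2.0, 3.0]
constraints = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 3, 0.0)]
opt = optimize(init, constraints)   # converges to [0, 0.25, 0.5, 0.75]
```

The minimum-error configuration spreads the accumulated drift over all constraints rather than correcting only the final pose, which is exactly what the submap-level optimization must achieve at scale.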
3D model as environment map
For path planning and visualization purposes, a map composed of local features is not ideal. Path planning approaches work with 3D occupancy grids or triangulated models of the environment, and a sparse feature-based representation gives no information about whether the space between features is occupied or free. We will therefore run a dense stereo algorithm on the input images, using the MAV pose from localization as additional information. The resulting depth maps will be used to create a 3D occupancy grid. In addition, we intend to create a visually appealing 3D model of the explored environment, which will be done only periodically or offline.
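Converting depth measurements into occupancy-grid updates can be sketched per ray: cells along the ray up to the measured depth are marked free, and the cell at the depth is marked occupied. Cell size, ranges and the grid encoding below are illustrative assumptions:

```python
# Sketch of occupancy-grid updates from one row of a depth map
# (one ray per image column). A real grid is 3-D and probabilistic;
# this 2-D dict version only illustrates the free/occupied logic.

def update_grid(grid, depths, max_range, cell=0.25):
    """grid: dict mapping (column, cell index along ray) -> label;
    depths: measured depth per column in meters (None = no return)."""
    for col, d in enumerate(depths):
        end = min(d if d is not None else max_range, max_range)
        n_free = int(end / cell)
        for k in range(n_free):
            grid[(col, k)] = "free"           # space before the hit
        if d is not None and d < max_range:
            grid[(col, n_free)] = "occupied"  # surface cell at the hit
    return grid

# Three columns: hits at 1.0 m and 0.5 m, one column with no return.
grid = update_grid({}, depths=[1.0, None, 0.5], max_range=2.0)
```

Columns without a depth return clear space out to the sensor range but mark nothing occupied, so unexplored regions beyond stay unknown, exactly the distinction a path planner needs.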
WP2.4 Cooperative localization and mapping
The problem we are considering is the simultaneous estimation of the configuration of several moving Micro Air Vehicles (MAVs) together with the location of salient features detected in the environment. The estimation has to be performed in real time from the MAVs' sensor data, which typically provide relative distance and bearing measurements among the vehicles and/or between the vehicles and the detected features. This estimation problem can be seen as a non-linear optimization problem in which two concurrent issues must be addressed: accuracy and computational complexity. We will investigate two possible approaches:
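As a minimal illustration of the underlying non-linear optimization, consider recovering one MAV's planar position from range measurements to two other vehicles by gradient descent on the squared range residuals. All poses and measurements below are invented, and the real problem jointly estimates all vehicles and features:

```python
# Illustrative sketch of the cooperative estimation problem reduced to
# its simplest case: one unknown 2-D position, two range measurements.
import math

def refine(p, anchors, ranges, iters=500, step=0.05):
    """Gradient descent on 0.5 * sum of squared range residuals."""
    x, y = p
    for _ in range(iters):
        gx = gy = 0.0
        for (ax, ay), r in zip(anchors, ranges):
            d = math.hypot(x - ax, y - ay)
            if d == 0.0:
                continue                    # gradient undefined at anchor
            e = d - r                       # range residual
            gx += e * (x - ax) / d          # d(0.5*e^2)/dx
            gy += e * (y - ay) / d          # d(0.5*e^2)/dy
        x -= step * gx
        y -= step * gy
    return x, y

anchors = [(0.0, 0.0), (4.0, 0.0)]             # two MAVs at known poses
ranges = [math.hypot(3, 2), math.hypot(1, 2)]  # distances to true pos (3, 2)
est = refine((2.0, 1.0), anchors, ranges)
```

Even this toy case shows both issues named above: accuracy depends on the measurement geometry (two ranges alone leave a mirror ambiguity, resolved here only by the initial guess), and the iteration count grows with the number of vehicles and features to be estimated jointly.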