Project Info‎ > ‎Work Packages‎ > ‎


WP2.1 Visual flight stabilization and control 
    We propose to investigate accurate navigation of the helicopter with vision based flight stabilization and control along with the use of an IMU. Image processing will be directly integrated into the control loop which requires extremely efficient and robust computer vision algorithms. As roll and pitch require high bandwidth control, this will be assessed by the use of inertial sensors, while the visual flight stabilization will contribute in addition to IMU for controlling yaw, x-y position and altitude. The altitude can either be controlled by pressure sensors or range finding sensors. The first solution is more challenging and provides less precise absolute and relative altitude information, while the second one is easily manageable, but it only gives information about the vertical distance from the floor. For this project we will investigate the possibility to use the same vision sensor (in addition to IMU) for altitude estimation. Micro-helicopter altitude dynamics is less complex and control quality will only be limited with sensor’s bandwidth and actuators performances.             
    Autonomous take-off, hovering, landing and flight stabilization will be faced using visual tracking. From tracked features in the image the pose of the helicopter can be computed, and this information will be used to perform stable take-offs, landings and hovering. The feasibility of this approach has already been demonstrated by Kemp et al. however with the help of an external computer system and a priori knowledge of the environment. 
    In addition to real-time constraints for visual tracking another difficulty is caused by the light weight and instability of the micro helicopter. It means, that external disturbances have a large effect and the helicopter’s motion is relatively difficult to predict. Further, the quality of the images obtained from the miniature camera is very poor and this greatly increases demands on the vision system. Finally, the large disturbances and poor image quality result in large pose uncertainty and the estimates must be correctly filtered before they can be used by a control system.

WP2.2 Obstacle detection and avoidance 
    Obstacle detection and avoidance will be achieved using visual optical flow. This requires a dense optical flow computation under real-time constraints. The embedded platform with its limited resources will be a challenge for real-time optical flow computation. A justification for using optical flow for visual navigation can be found in flying insects such as bees. It has been shown [Srinivasan1991, Lehrer1988] that bees use optical flow to navigate between obstacles and also to build a 3D representation of their environment.
    We will investigate the use of optical flow for three main topics: 
  • The first topic is the use of optical flow in the initial exploration and map building stage. Without any map, robot navigation is simplified to traverse areas of free space. Optical flow is used to detect occupied space and to navigate around it. Its output will therefore be directly linked to the navigation control. Similarly optical flow will also be used in mapped environments to deal with dynamic moving objects and for objects that are not mapped. The goal is to investigate navigation strategies for different optical flow inputs. 
  • The second topic is to investigate evasive maneuvers for incoming, moving objects which also can be detected by optical flow. 
  • The third topic is the re-use of the optical flow vectors for the subsequent structure from motion computations. Dense optical flow for obstacle detection is usually computed between subsequent frames only and not tracked for multiple frames. We therefore will select a subset of optical flow vectors (features) to be tracked over several frames. Difficulties arise in dynamics scenes where features might be tracked on moving objects. This would complicate structure from motion features. We therefore will investigate how only features residing on the static background can be selected. In addition, starting from a dense optical flow solution might also improve feature tracks in low textured environments.

WP2.3 Vision based SLAM 
    Simultaneous localization and map building is a necessity for autonomous robot operation. With all the restrictions imposed by a micro helicopter we will focus on visual SLAM combined with an IMU. The approach that we envision has to be scalable, robust and reliable and it has to work in natural environments both indoors and outdoors. The method has to be robust to dynamic changes of the environment, like people moving or temporary structure changes. Dynamic objects should ideally be identified and tracked, so that their movements can be considered in path planning and navigation. 

World representation with local sub-maps 
    Scalability is a major difficulty in SLAM. In general the computational requirement increases as O(N2), where N is the number of features currently used to build the map. This O(N2) arises from the number of correlations, which need to be computed among all features of the map. This means that these approaches cannot be used for real-time/world implementation. A number of partitioned update methods (using local sub-maps) have been devised to reduce this computational effort. These methods confine sensor-rate updates to a small local region and update the global map only at a much lower frequency. These partition methods all produce optimal estimates. 
    There are two basic types of partitioned update. The first operates in a local region of the global map and maintains globally referenced coordinates. This approach is taken by the compressed EKF [Guivant2001] and the postponement algorithm [Knight2001]. The second generates a short-term submap with its own local coordinate frame. This is the approach of the constrained local submap filter [Williams2001] and the local map sequencing algorithm [Tardos2002]. We will focus on this latter approach as it is simpler and, by performing high-frequency operations in a local coordinate frame, it avoids very large global covariance matrices and so is more numerically stable and less affected by linearization errors. We intend to use relative submaps [Bailey2006, Guivant2002, Leonard2003], i.e. they do not have a common coordinate frame. The location of any given submap is recorded only by its neighboring submaps, and these are connected in a graphical network [Chong1999, Estrada2005]. Inside a submap relative mapping [Csorba1997, Martinelli2007, Nguyen2006] will be used. This means that rather than estimate landmark locations, the SLAM state is composed of distances and angles between landmarks. As proved by the ASL group in [Martinelli2007], and [Nguyen2006], the relative mapping approach has better convergence than an absolute map approach. 
    We will focus on using visual features as landmarks for SLAM. Feature tracks are computed from optical flow and fused with information from the IMU. The IMU will provide us with odometry information and scale. We will also investigate methods to retrieve the scale without an IMU, e.g. from the control inputs. 
    In addition, we will investigate how to select features that are best suited to update the model of the environment. Especially, when using visual features, the number of features available in a single image might be too high for updating the whole state space. Our plan is to identify features that are most informative and allow the vehicle to reliably learn consistent maps of the environments. For example, we expect that clusters of landmarks can be represented by a single one. As a result, the computational requirements will be significantly reduced.

Loop closing for optimized maps 
    Loop closing will be used to account for accumulated errors and missing connections between submaps in the global map. Loop closing will be performed using visual features. In a first step loops are detected by visual place recognition. Second, all the submaps will be optimized to find the most accurate minimum error configuration. 
    Place recognition will be performed with visual features and a bag-of-features approach. The main challenge for place recognition is to solve the correspondence problem for local features. Techniques for feature matching using scale invariant or affine invariant local detectors have already achieved a level of maturity and can be considered reliable enough for our application. The place recognition scheme compares all the previously created submaps with the current one by computing a similarity score based on visual features. For each submap a set of discriminative and descriptive visual features is extracted and
stored. In detail, we propose to use SIFT-features [Lowe2004] as they are scale and rotational invariant and even very robust to affine transformations. For efficient feature representation and matching we follow the scheme proposed by [Nister2006]. SIFT features have been found to work very well but the computational complexity for detection, extraction and quantization might pose problems for the limited resources on the micro helicopter. We therefore will also investigate the use of different features in the manner of [Tuytelaars2007, Bay2006] where the extraction step is more efficient. We will also investigate how it is possible to combine visual features with 3D features, e.g. 3D lines. This is important for non-textured environments. In such cases, where visual features provide only weak cues, 3D structure might be able to resolve the ambiguities. For geometric verification of a detected loop we will perform 3D model registration using VIP features as described in [Wu20008]. 
    The map update after the detection of a loop will be carried out as a large optimization problem in the spirit of bundle adjustment (see [Ni2007a]). Due to high computational costs, the optimization will be computed offline. In the optimization we seek the maximum likelihood solution for the minimum reprojection error taking into account all robot poses, all measured features and all previously detected loops. This problem can easily grow beyond computational feasible size. We intend to tackle it by computing an approximation to the solution by splitting it up into sub problems. Such a submap approach was already demonstrated in [Ni2007a, Ni2007b]. However, the success of this approach relies on how the submaps are created. We therefore intend to control the submap creation so that it is optimal for the bundle adjustment.

3D model as environment map 
    For path planning and visualization purposes the map composed of local features is not ideal. In the case of path planning, approaches work with 3D occupancy grids or triangulated models of the environment. A sparse feature based representation will not give information about the space between features, if it is occupied or free. We therefore will run a dense stereo algorithm on the input images using the MAV pose from the localization as additional information. The depth maps will be used to create a 3D occupancy grid. In addition, we also intend to create a visually appealing 3D model of the explored environment, which will be done only periodically or offline.

WP2.4 Cooperative localization and mapping 
    The problem we are considering is the simultaneous estimation of the configuration of several moving Micro Air Vehicles (MAV) together with the location of some salient features detected in the environment. The estimation has to be performed in real time starting from the MAVs’ sensor data which typically provide relative distances and bearings measurements among the vehicles and/or among the vehicles and the detected features. This estimation problem can be seen as a non-linear optimization problem. Two concurrent issues must be addressed: the accuracy and the computational complexity. We will investigate two possible approaches: 
  • In a recent paper [Eustice2006], the single-robot SLAM estimation process has been formulated and solved using an Information filter which estimates all the vehicle poses (at every time step). This approach exhibits very good properties concerning the computational complexity. However, the use of a linearized estimator may lead to failure when the linearization errors become large (for instance during loop closure). Our first contribution will be to extend this approach and employ the Conjugate Gradient Descent (CGD) minimization method for computing the state estimates. In particular, the Information Filter is adopted to estimate a state containing elements which are geometrically dependent (as in a relative map approach [Martinelli2007]). The CGD will be adopted in a separate phase to exploit the information coming from these geometrical dependencies. 
  • Recently, “maximum a posteriori” estimators have been applied to the SLAM problem [Dellaert2006]. In these cases, the SLAM estimation process can be solved by using bundle adjustment techniques. Our plan is to extend these methodologies to the case of Cooperative Simultaneous Localization and Mapping (C-SLAM). In particular, our objective is to investigate distributed iterative solutions to the C-SLAM problem that minimize the estimation error under processing and communication constraints. The resulting solution will be computed in an incremental fashion by exchanging minimal information between the robots while employing a distributed version of the CGD minimization algorithm. 
  • In addition, we will look into the benefits of using direct visual observations of other collaborating MAV’s for C-SLAM, including the potential to refine range estimates.