CAD Model Based Tracking for Visually Guided Robotic Manipulation
6DOF Visual Feedback for Microassembly

We are using a CAD model based visual tracking approach for visually guided robotic manipulation at the micro and macro scales. In addition, we are working on methods for robust registration of CAD models in complex scenes with closely spaced model edges.
The main application area of the visual tracking system is wafer-level microassembly of hybrid MEMS. Using two cameras with microscope lenses, the system provides 6 degrees-of-freedom position and orientation feedback in real-time (30 Hz). There are several advantages to using CAD models as a standard form of input for a flexible automation system. In general, the CAD model of a MEMS component is readily available from its design phase. More importantly, CAD models provide 3D information on the components' geometry, so their appearance can be predicted for any viewpoint, including the effects of occlusion. This is essential for a visual tracking system that must handle translations and rotations about arbitrary axes. Finally, by using the objects' edges (contours) as the visual features to be observed, the CAD model based approach avoids the need for fiducial marks or other distinctive features on the object.
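
As an illustration, the sketch below shows one way the wireframe edges of a CAD model might be projected into the image for a given pose under scaled orthographic projection (which closely approximates microscope optics). The function name and the simplified wireframe representation are ours for illustration; the actual system additionally performs hidden-line removal to account for occlusions.

```python
import numpy as np

def project_model_edges(vertices, edges, R, t, scale):
    """Project CAD model edges into the image under scaled orthographic
    projection (a close approximation of microscope optics).

    vertices : (N, 3) model points in the object frame
    edges    : list of (i, j) vertex index pairs forming wireframe edges
    R, t     : object pose in the camera frame (3x3 rotation, 3-vector)
    scale    : pixels per unit length at the current magnification
    """
    # Transform model points into the camera frame.
    pts_cam = vertices @ R.T + t
    # Scaled orthographic projection: drop the depth axis, scale x and y.
    pts_img = scale * pts_cam[:, :2]
    # Return one 2D segment per model edge; a full tracker would also
    # remove hidden lines here using the model's face information.
    return [(pts_img[i], pts_img[j]) for i, j in edges]
```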

This picture shows the visual tracking setup with two microscopes. The reflective dome and the backlight provide diffuse illumination. A 3-axis manipulator carries the components on a transparent platform. The zoom microscope lenses provide 0.75-3.0X variable magnification with high depth of field (~500 µm at 3X).
Improved Visual Resolvability with the Model Based Approach
Unlike stereo vision methods, the model based multi-camera approach does not require the same features to be visible in different camera views. Therefore, the viewpoints of the cameras can be widely separated without concern for features being occluded or falling out of view in the other cameras' images. The figure below shows the visual resolvability ellipsoids for a single microscope and for two microscopes observing the same scene. In the single microscope case the ellipsoid degenerates into an ellipse perpendicular to the view axis of the microscope, showing that the object's motion along the view axis is not observable. This is because microscope optics closely approximate a scaled orthographic projection rather than a perspective projection. Depth-from-defocus methods are commonly used to overcome this limitation, but they require specialized hardware to run in real-time. The resolvability ellipsoid of the two-microscope case, however, shows that 6 DOF motion can be observed with this configuration.
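
A minimal sketch of how such ellipsoids can be computed is given below, under the simplifying assumption of a translational point-feature Jacobian for each orthographic camera (the full system uses the 6 DOF Jacobian of the model edges). The axis lengths of the resolvability ellipsoid are the singular values of the stacked image Jacobian.

```python
import numpy as np

def resolvability(camera_rotations, scale=1.0):
    """Axis lengths of the translational resolvability ellipsoid for a set
    of scaled orthographic cameras observing a point feature."""
    # Under scaled orthographic projection, a point's image displacement is
    # scale * (first two rows of R) times its 3D translation, so each
    # camera contributes a 2x3 block to the stacked image Jacobian.
    J = np.vstack([scale * R[:2, :] for R in camera_rotations])
    # The square roots of the eigenvalues of J^T J are the singular values
    # of J, i.e. the ellipsoid's axis lengths (in ascending order).
    return np.sqrt(np.maximum(np.linalg.eigvalsh(J.T @ J), 0.0))

# A single microscope looking down the z-axis: the smallest axis is zero,
# so motion along the view axis is unobservable.
print(resolvability([np.eye(3)]))         # [0. 1. 1.]

# A second microscope rotated 90 degrees about the y-axis makes all three
# axes nonzero, so full 3D translation becomes observable.
R2 = np.array([[0., 0., 1.],
               [0., 1., 0.],
               [-1., 0., 0.]])
print(resolvability([np.eye(3), R2]))     # [1. 1. 1.41]
```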

The effect of camera separation on the visual resolvability of the system can be observed in the animation below. One camera is rotated about the z-axis while the other is held fixed. Note that the ellipsoid rapidly degenerates into an ellipse perpendicular to the view axes as the cameras get closer. The ratio of the smallest and largest singular values of the image Jacobian matrix is also shown.
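
The same trend can be reproduced numerically with the simplified translational Jacobians described above. The sketch below (our illustration, assuming both view axes lie in the horizontal plane) sweeps the angular separation of the two cameras and prints the singular value ratio, which drops to zero as the view axes coincide:

```python
import numpy as np

def camera_jacobian(theta):
    """2x3 translational image Jacobian of an orthographic camera whose
    view axis lies in the horizontal plane at angle theta about z."""
    c, s = np.cos(theta), np.sin(theta)
    # The image rows span the plane perpendicular to the view axis (c, s, 0).
    return np.array([[-s, c, 0.],    # horizontal image axis
                     [0., 0., 1.]])  # vertical image axis

# Hold camera 1 fixed and rotate camera 2 from coincident to orthogonal.
J1 = camera_jacobian(0.0)
for deg in (0, 15, 30, 45, 60, 75, 90):
    J = np.vstack([J1, camera_jacobian(np.radians(deg))])
    sv = np.linalg.svd(J, compute_uv=False)   # descending order
    print(f"{deg:3d} deg  sigma_min/sigma_max = {sv[-1] / sv[0]:.3f}")
```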

Improving the Robustness of Tracking

A major source of feature detection errors in model based visual tracking is ambiguous object configurations with closely spaced edges. During robotic manipulation it is difficult to avoid scene configurations in which close, parallel edges appear around a tracked edge and cause registration errors. While the vision system uses CAD models to describe the geometry of individual objects, the same model information can also be used to detect such "troublesome" configurations and reduce their effects.
An image space potential method was developed for this purpose. A potential field is applied around the visible edges such that the field value decreases linearly with distance from an edge and is constant along its tangent. When this potential is applied to all visible edges in the scene, superimposing the overlapping fields, the resulting field map encodes the distance of every image point to potential "confusers". Using this information as a weighting factor during edge detection preserves correct registration. The method runs in real-time (30 Hz).
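
A minimal sketch of one way such a field could be built is shown below, assuming the visible edges are given as 2D line segments after projection (the function name and parameters are illustrative; the actual weighting scheme may differ):

```python
import numpy as np

def edge_potential_field(shape, segments, radius=10.0):
    """Superimposed image-space potential of a set of visible model edges.

    Each edge contributes a field that is maximal on the edge, falls off
    linearly with perpendicular distance, and is constant along the edge
    tangent; overlapping fields are summed, so high values flag regions
    where closely spaced edges could confuse the edge detector.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    field = np.zeros(shape)
    for (x0, y0), (x1, y1) in segments:
        dx, dy = x1 - x0, y1 - y0
        length = np.hypot(dx, dy)
        tx, ty = dx / length, dy / length            # unit tangent
        # Coordinates of every pixel relative to the segment.
        along = (xs - x0) * tx + (ys - y0) * ty      # tangential
        perp = -(xs - x0) * ty + (ys - y0) * tx      # perpendicular
        # Linear falloff with perpendicular distance, restricted to the
        # segment's extent so the field is constant along the tangent.
        inside = (along >= 0) & (along <= length)
        field += inside * np.clip(1.0 - np.abs(perp) / radius, 0.0, 1.0)
    return field
```

During the edge search for a particular model edge, that edge's own contribution can be subtracted from the field so that the remaining values measure only the nearby confusers, which can then be turned into a weighting factor for the detected edge responses.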