Nothing Special   »   [go: up one dir, main page]

Academia.eduAcademia.edu

Matched Filters for Bin Picking

1984, IEEE Transactions on Pattern Analysis and Machine Intelligence

Currently, a major difficulty for the widespread use of robots in assembly and material handling comes from the necessity of feeding accurately positioned workpieces to robots. ``Bin picking'' techniques help reduce this constraint. This paper presents the application of matched filters for enabling robots with vision to acquire workpieces randomly stored in bins. This approach complements heuristic methods already reported. The concept of matched filter is an old one. Here, however, it is redefined to take into account robot end-effector features, in terms of geometry and mechanics. In particular, the proposed filters match local workpiece structures where the robot end-effector is likely to grasp successfully and hold workpieces. The local nature of the holdsites is very important as computation costs are shown to vary with the fifth power of structure size. In addition, the proposed filters tend to have a narrow angular bandwidth. An example, which features a parallel-jaw hand is developed in detail, using both statistical and Fourier models. Both approaches concur in requiring a very small number of filters (typically four), even if a good orientation accuracy is expected (two degrees). Success rates of about 90 percent in three or fewer attempts have been experimentally obtained on a system which includes a small minicomputer, a 128 × 128 pixel solidstate camera, a prototype Cartesian robot, and a ``universal'' parallel-jaw hand.

686 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 6, NOVEMBER 1984 Matched Filters for Bin Picking JEAN-DANIEL DESSIMOZ, MEMBER, IEEE, JOHN R. BIRK, MEMBER, IEEE, ROBERT B. KELLEY, MEMBER, IEEE, HENRIQUE A. S. MARTINS, AND CHI LIN I Abstract-Currently, a major difficulty for the widespread use of robots in assembly and material handling comes from the necessity of feeding accurately positioned workpieces to robots. "Bin picking" techniques help reduce this constraint. This paper presents the application of matched filters for enabling robots with vision to acquire workpieces randomly stored in bins. This approach complements heuristic methods already reported. The concept of matched filter is an old one. Here, however, it is redefined to take into account robot end-effector features, in terms of geometry and mechanics. In particular, the proposed filters match local workpiece structures where the robot end-effector is likely to grasp successfully and hold workpieces. The local nature of the holdsites is very important as computation costs are shown to vary with the fifth power of structure size. In addition, the proposed filters tend to have a narrow angular bandwidth. An example, which features a parallel-jaw hand is developed in detail, using both statistical and Fourier models. Both approaches concur in requiring a very small number of filters (typically four), even if a good orientation accuracy is expected (two degrees). Success rates of about 90 percent in three or fewer attempts have been experimentally obtained on a system which includes a small minicomputer, a 128 X 128 pixel solid-state camera, a prototype Cartesian robot, and a "universal" parallel-jaw hand. Index Terms-Artificial intelligence, bin picking, robot control, robot vision, scene analysis. I. INTRODUCTION TODAY, fully automated assembly lines are working in many industrial systems. In most cases, however, the assembly line requires components to be initially palletized or stacked in some kind of magazine. Off-line, magazines are usually loaded by workers. This paper addresses the problem of feeding a buffer, an assembly line, a machine, or, generally speaking, a goal site from bins that store randomly oriented workpieces. The problem of feeding parts from bins is usually considered very difficult to solve because the assumption is made that a workpiece needs to be identified and located precisely in space before being grasped. This paper explores the inadequacies of Manuscript recieved July 25, 1983; revised March 26, 1984. J.-D. Dessimoz was with the Department of Electrical Engineering, University of Rhode Island, Kingston, RI. He is now with the Ecole d'Ingenieurs de l'Etat de Vaud, Yverdon-les-Bains, Switzerland. J. R. Birk was with the Department of Electrical Engineering, University of Rhode Island, Kingston, RI. He is now with Hewlett-Packard, Palo Alto, CA. R. B. Kelley is with the Department of Electrical Engineering, University of Rhode Island, Kingston, RI. H. A. S. Martins was with the Department of Electrical Engineering, University of Rhode Island, Kingston, RI. He is now with the Instituto Superior Tecnico, Lisbon, Portugal. that assumption, and, at the same time, discusses the bin picking problem in detail. Several methods to acquire randomly stored workpieces are examined, and their respective advantages and limitations described in Section II. The solution introduced here, the matched filter one, is then thoroughly delineated. In Section III, matched filters are defined in general terms. Section IV considers various properties of the bin picking problem which allow a reduction of the computational load associated with matched filters. Section V introduces the design of matched filters for workpiece acquisition, and an example is extensively developed in Section VI. Finally, some extensions of the presented methods are suggested in the Conclusion, Section VII. II. CURRENT METHODS FOR BIN PICKING Traditionally, workpieces have been fed to machines from bins by specialized equipment or human labor. There, however, is a large class of situations where a robot should bring costeffective solutions. Our study addresses the latter technique. In the simplest case, robots without vision can pick workpieces from bins. "Blindly," the end-effector mechanically scans a bin until an object is acquired. Magnetic [1] ,one-fingered [2], two-fingered [3], and vacuum cup [4] hands have been used for such a purpose. While simplicity is an obvious advantage of these techniques, they become increasingly inadequate as the probability of meeting potential holdsites along the (blind) end-effector path decreases. This problem is exacerbated by the realtively long time constants associated with mechanical motions. Proximity sensors may increase the active range of the hand, thereby improving the probability of detecting holdsites, but they do not remove the fundamental speed limit imposed by arm motions. By contrast, (remote) vision can drastically reduce the impact of that limit, replacing most mechanical motions by electronic scanning. The tremendous bandwidth of existing vision sensors (tube sensors may deliver information at a rate in excess of 107 bits/s) contributes to the possibility of recognizing and locating parts randomly piled in bins. Limitations, however, are still associated with vision data processing. While not intractable, the problem of recognizing and locating parts in bins requires a lot of computation, and existing computers do not provide a practical solution. Machine vision allows workpiece pose estimation with algorithms of various complexity and success, according to application [5] . Isolated workpieces that rest on a planar background can generally be represented by binary images. In most cases, 0162-8828/84/1 100-0686$01 .00 © 1984 IEEE 687 DESSIMOZ et al.: MATCHED FILTERS FOR BIN PICKING these workpieces have a small number of stable states. It is relatively easy to identify the silhouette of each state and to estimate accurately the corresponding pose in the background plane (two translations and one rotation). When the number of states is infinite (four or five continuous degrees of freedom), the problem becomes much more difficult to handle. Moreover, if parts overlap, scene analysis becomes less reliable (hidden components cannot guide analysis). Under restricted conditions, as when workpieces are relatively flat, it is possible to attack the three degrees of freedom problem with partial occlusion [6], [7]. In general, however, overlapping leads to uncertainties in a six-dimensional space (three translations and three rotations). This is a typical situation when parts are stored in bins. As of today, no viable solutions for pose estimation in such a context have been discovered. A system approach to the bin picking problem suggests the exploitation of the gripper-related knowledge in order to decrease the burden of scene analysis otherwise left to vision. In order to improve the performances of techniques mentioned above, the essential requirement is to replace the mechanical motions of the gripper searching for accessible parts by electronic scanning of the sensor. Our goal is then modest-to detect potential holdsites in images-and leaves part recognition and pose estimation for a latter stage, if required at all. In a first stage, sensor data indicate a potential holdsite; that is, a place in the bin where the robot hand is likely to grasp something (yet unknown). Then the workpiece can usually be grasped (this should be acknowledged by simple sensors in the hand), isolated, brought against any suitable background, and even constrained in degrees of freedom (e.g., dropped on a planar surface). Several ad hoc methods have proved effective for removing various classes of workpieces from bins with a robot (e.g., [8]). Their performances (processing time, accuracy, cost) are compatible with industrial constraints. They have set milestones on the wdy towards a general bin picking station. This paper will now explore a new solution, perhaps more versatile, and present an example in some detail. This solution relies heavily on matched filters. III. MATCHED FILTER DEFINITION Matched filters have been well known for many years in signal processing. However, template matching is more popular processing. Matched filtering differs basically from template matching (e.g., [9] ) in taking noise components into account (e.g., [10]). Basically, image analysis is inhibited in frequency bands where noise power is large. Several key ideas must be added to the basic matched filter concept in order to reach industrial applicability. First, only local properties which are noise-, workpiece-, and hand-depen- dent need to be matched. Second, a large number of filters corresponding to different workpiece pitch, yaw, and roll values can often be replaced by a much smaller set of orthogonal flters ("eigenfilters"). Third, provided that normalization is not required, the high frequency band which will eventually be rejected by matched filters can be cut out in a very early stage of the processing, allowing a low-rate resampling and thereby reducing the computational load. Actually, this can be achieved easily by adequately defocusing the sensor, and then sampling at coarser resolution. The classical definition of matched filters can be explained as follows. Consider a digital image f(i, j) and a pattern to be detected p(i, j) defined on a domain D. Conceptually, a simple solution to detecting the pattern would call for a filter h(i,j). When applied to f, h(i, j) would give peaks in the output image g(i, j) at locations where the pattern is present. The convolution product, typical of a filtering operation is the following: h(m,n)-f(i-m,j n). g(i,j)= Em m,n in D (1) Differences between a picture f and a pattern p can be locally modelled in various ways. The most common and theoretically appealing representation relies on Euclidian distance: [(f(i+m,j+n)- p(m,n)] 2 E= V (2) Therefore, E2 E = f2(i + m,j + n) m, n in D +p2(m,n)- 2f(i+m,j+n)-p(m,n). (3) The first term represents the energy of the input image in a domain D. It generally varies with the position of D in the image, but does not depend on the pattern. The second term represents the energy of the pattern and is constant. The third term is really interesting, because that is where a similarity between pattern and image may actually appear. It is defined as the cross correlation between p and f. Equations (1) and (3) lead to the well-known result that the filter should basically perform the correlation between the pattern to be detected and the analyzed image: h(i, j) p(- il -A = (4) A convolution in the space domain corresponds to a product in the Fourier domain. Changing the sign of the independent variable leads to the complex conjugate of the transform; therefore, we have the following equations: G(u, v) = H(u, v) * F(u, v) G(u, v) = P*(u, v) - F(u, v) (5) (6) where F, G, H, P, are the Fourier transform of g, f, h, p, respectively. P*(u, v) is the conjugate of P(u, v)and corresponds to p(-i, -). As introduced so far, the matched filter is rather simple. It is important, however, to see how it is defined when noise is present in the system. In this case, the following definition applies: v) H(uH(u,v)kW(u, v) =P*(u, v)(7 (7) where W is the Fourier transform of the noise and k is a constant (to be discussed later). Note in (7) that the frequency bands where the noise is large tend to be attenuated. On the other hand, if the noise is white, the matched filter is essentially the same as if there were no noise at all (6). In industrial situa- IEEE TRANSACTIONS ON PA1TERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 6, NOVEMBER 1984 688 tions where bin picking must be performed, low (spatial) frequency variation of the illumination and workpiece reflectance variations are the main noise sources. Template matching can be reduced to cross correlation in (2) if the energy of the image is relatively uniform everywhere. If not, the first term in (2) must be taken into account, and this leads, in particular, to the following normalized cross correlation: zE p(m,n)-f(i+m,j±n) g(i, j) _ (8) I/ 1/ m, f2(i+m,j+n) , m, n in D Operating on images, the normalized cross correlation detects Fig. 1. Dashed circles enclose two potential holdsites. In each case, patterns which have the same shape as the template, even though the segment of interest is much smaller and presents more symmetries they possibly have a different amplitude. The constant k in (7) than the workpiece as a whole. can serve the same purpose. Although it appears as a constant in (7), k usually varies with the space coordinates of the picture when normalization is performed. To make this explicit, we can write I k(i, j) = f2(i- m,J -n) m, n 9 (9) in D Normalized matching is sometimes necessary, although in practice, it is often ignored in order to save computation time. IV. COMPUTATION REDUCTION Applied naively in order to detect each workpiece in the picture of a bin, the matched filter is very time consuming. Consider a workpiece represented by M X M picture elements (pixels). For each value of pitch, yaw, and roll, a different filter is defined. The angular resolution may be considered to increase linearly with M (for example, the resolution for pitch is best along the window perimeter). This leads to about M3 filters, each requiring approximately M 2N2 operations (multiplication and addition) per picture of size N X N. The computation load can be expressed as follows: L = cM5N2 (10) (b) (a) Fig. 2. Workpiece pose need not be completely known for a good grasping. The gripper in (a) allows a random orientation in one plane, while the vacuum cap in (b) can accommodate rotation about two axes. where c is constant. The computation load can be reduced when workpieces are For rigid and semirigid workpieces, holdsites can be small in matched locally, which affects M and its exponent, and when respect to overall object size. Therefore, M decreases, and the the bandwidth of the filter output is considered, which affects pattern to detect leads to much less computation (Fig. 1). both M and N. Notice in (10) that the computation load increases with the fifth power of the pattern size. A. Matching Local Structures Additional benefits appear when the robot hand is taken into Consider our specific problem, bin picking. Generally, the account. First, the end-effector may be applied without repattern to be matched may be drastically reduced, if the goal quiring knowledge of all workpiece translations and rotations. of detecting the workpiece type and pose is postponed, and For example, assume that a parallel-jaw hand is used in order the more practical subgoal of picking "something" out of the to pick an object [Fig. 2(a)]. A rotation of the object in the bin is selected. Vision algorithms must then detect potential symmetry plane between the jaws usually does not affect graspholdsites, i.e., places where a robot end-effector is likely to ing. Similarly, if a compliant vacuum cup is used, small rotagrasp a workpiece. Potential holdsites appear in images as local tions about two axes can be ignored [Fig. 2(b)]. Another benefit derives from the property that, considered structures that reflect both workpiece and end-effector characteristics. Indeed, even for the same object, holdsites are locally, workpiece components are much more likely to present usually located differently when different types of end effectors symmetries than when viewed as a whole (see Fig. 1). For instance, it would probably be inacceptable to consider an entire are used (e.g., vacuum cup versus parallel-jaw hand). 689 DESSIMOZ et al.: MATCHED FILTERS FOR BIN PICKING connecting rod as cylindrical. However, the approximation is much more valid for the central shaft and any segment of it than for the whole object. The larger ring of a connecting rod has a similar circular symmetry about another axis. Both properties affect the dimensionality of the computational load, reducing it to a dependence in M4, M3, or even less for some workpieces. Notice also that images are essentially orthographic in our applications. Distance between camera and bin is large with respect to bin depth. Thus, distorsions due to perspective viewing can be neglected. B. Slow-Rate Resampling or "Eigenfilters" Intuitively, it seems that the matched filter technique requires a large number of filters in order to detect the actual orientation of a holdsite. However, both the traditional signal processing theory and vector space considerations point here to a reasonably small number of filters. The point spread functions of matched filters for bin picking tend to be defined on domains much larger than one pixel (e.g., [11]). When viewed in the Fourier domain, matched filters often appear as relatively narrow passbands. By examining this matched filter bandwidth, the minimum sampling rate for images can be estimated and the corresponding de-aliasing can be simply achieved by defocusing the camera lens. Essentially, this means that images can often be coarsely sampled before matched filtering. If a higher resolution were provided, the additional information would be rejected by the matched filter, leading to the same result at a higher computational cost. In the previous section, matched filtering has been defined for one and two dimensions. This is not adequate when signals are of dimension three or more. Translations in the image plane can be expressed in a two-dimensional space; the most common third dimension corresponds to the rotation in the image plane of local structures to detect. The same analysis as above should be conducted in a polar coordinate system in order to identify the minimum angular sampling rate. A practical way of assessing the angular bandwidth of a matched filter consists in monitoring its output when applied to a white (pseudo) noise image. This directly defines the minimum number of samples required, i.e., the minimum number of filters (each at a different orientation) that should be applied to images in order to detect local structures of a given type. The example of Section VI typically calls for four filters, if a signal to aliasing noise ratio better than 10 is chosen. The model used above is the classic Fourier, which basically expresses a signal as a sum of sinewaves. An alternative is possible where a space is generated by vectors corresponding to a set of filters which differ by a certain orientation angle. For example, a filter corresponding to a parallel-jaw gripper is copied 360 times with one degree of rotation in order that the resulting set of filters covers all the hand rotation possibilities. These filters, and thus their corresponding vectors, are usually not independent, and when the space is orthogonalized, the actual space dimension, and thereby the minimum number of (eigen) filters, may be found. In the example discussed in Section VI, four (eigen) filters prove sufficient for a two degree angular resolution, and two of them already yield a practically useful similarity measure for holdsite detection. In the former case, the structure orientation angle is related to the phase of the processed image, and in the latter case, the similarity is estimated by its magnitude. V. DESIGN OF MATCHED FILTERS FOR WORKPIECE ACQUISITION Workpieces and robot end-effectors must be known in geometric terms so that a potential holdsite can be detected. For instance, a parallel-jaw gripper would securely hold opposing parallel edges or opposed concave corners (") Q pattern), (e.g., [12] ). During part of the training phase, the geometric representation of potential holdsites must be transformed into light intensity maps similar to the one perceived by vision sensors. This mapping operation is often done in graphics (e.g. [13]), but is fully deterministic in that context. In particular, on every visible workpiece surface element, there are fixed amounts of incident light and effective shadowing. In our context, a slightly different formulation is necessary, essentially in order to take into account the stochastic nature of shadowing. Image sensors perceive the light intensity reflected by scenes. The light intensity is related to geometric properties by two laws: Lambert's Law 'r = I * Ir=IO I=I, o cos2 a (l-e W3dIC 0(I -e (11) (12) The first law is deterministic and states that the reflected light intensity Ir from a workpiece surface element varies with the incident light Ii on that element, with the angle a of the local surface normal in respect to the sensor axis, and with the surface reflectance constant c. This law is valid for diffuse reflection. When reflection is specular (mirror-like), the amount of reflected light decreases much faster than cos2 . The second law addresses a stochastic phenomenon and states that the average amount of light incident on a surface element Ii decreases as a function of the depth d of a surface element below the top layer of a bin of parts. We propose it as a result of empirical considerations. Assuming diffuse illumination above the bin, the amount of light falling on a surface element decreases with depth, because of the partial shadowing of higher workpiece layers. Indirect illumination is exponentially attenuated by losses in consecutive reflections. In the model of (12), the constant cW corresponds to a shadow and reflectance factor and is workpiece dependent. It can be physically interpreted as the depth ofthe deepest part below top layer, which can be observed in a typical bin of parts. (Ii on that deepest visible part amounts to less than 5 percent of Iio.) In principle, the signal to detect P is now identified. Properties of the noise W should be evaluated. W is defined on the same domain as P. Most noticeably, noise includes slow (spatial) variations in illumination both on top of the bin, where it is not perfectly uniform, and as a function of depth, when various layers are processed. Therefore, the very low (spatial) frequency component should be rejected. There is another noise component which is scene dependent. 690 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 6, NOVEMBER 1984 This noise component appears as potential holdsites (in the sense of a good similarity with P), which correspond to sites where the robot hand cannot acquire the object. (An attempt can be physically made, or training can be assisted by an operator.) Perhaps the most adequate method to extract it relies on an iterative technique, applied during a training phase. [See (14)-(1 7).] Mathematically the technique can be described as follows, for one matched filter: H(n) =(1 - k)H(n-1) +k 1 +6(o,o)+ W(n) (13) W(n-1) if no potential holdsite is detected W(n) = or if a potential holdsite is detected and proven successful (14) (1 - k) W(n-1) + k B(n) if a potential holdsite is detected and proven impractical W() =0;H(°) = 1 +6(o,) B = F[b("' i)] and a potential holdsite is detected if y(n -l) >E. Fig. 3. A parallel-jaw (PJ) gripper. (15) (16) (17) The subscript (n) or (n - 1) indicates the iteration number. 6(o, o) is a Dirac function, corresponding to the average illumination; y is the output of the filter when an image is processed. Typically, the image represents a bin full of workpieces, and E is a threshold above which the image is found locally very similar to the pattern to detect. In (16), b(e") is defined as the image in the domain D centered on a potential holdsite as defined by (17). F [x] denotes the Fourier transform of x. VI. PARALLEL-JAW FILTER: AN EXPERIMENT The matched filter presented in this section can be said to be applicable to those parts that can be picked by a parallel-jaw gripper (see Fig. 3). Experiments have been performed which illustrate the use of the matched filter technique in a practical context. A. Filter Definition The pattern to be matched, in order to allow a parallel-jaw gripper to pick up a part, can be roughly described as two dark regions surrounding a bright region [see (11) and (12)]. This pattern can be seen in Fig. 4. The bright region represents the part; the dark regions represent places where the gripper can be placed, i.e., where the part does not touch adjacent pieces. The dimensions of the window are both gripper and piece dependent; the size of the dark regions has to be large enough to "hold" the jaws of the gripper; the size of the bright region should match a typical structure of the part. The filter was computed as the differences in the average pixel intensity in the light region and in the dark region, each pixel with a weighting factor of one. The average weight of the whole filter is zero, thereby rejecting illumination variation. Presented like this, the filter exhibits dimensionality reduction by discarding two of the three angles that specify the pose Fig. 4. Parallel-jaw (PJ) filter. Correspondence between geometrical and reflectance properties is illustrated here. The "+" area corresponds to a bright zone where a workpiece is visible, while the "-" area corresponds to deep locations in the bin, where the gripper jaws should not encounter obstructions. of the part, specifically those that correspond to rotations within the jaws, and by using a much smaller pattern than the one that would be necessary to match a full piece. One problem remains, though, which is the rotation in the image plane, and which, if not taken into account, will make the filter unpractical for most purposes. To overcome this problem the first thing to be noticed is the fact that the jaws are assumed symmetric, thus the angular domain to be sampled is reduced to a range of 180 degrees. Second, the gripper could accommodate a certain margin of error for direction estimation within the plane, say 9°. This would require an 180 sampling interval, leading to a total of 10 filters, which is still quite large. 1) Rotational Sampling: Experiments have been performed where the parallel-jaw (PJ) filter described above is applied to a (pseudo) white-noise image. The bandwidth of the output was observed. The results are discussed here which support the considerations of Section IV-B. As the image consists of white noise, the output of the filter is stochastic. The experiment was repeated many times in order to estimate the average power specturm. The test image was divided into 36 nonoverlapping regions. In each region the PJ filter was applied 180 times, at a 10 rotation interval. The Fourier transform of the output was evaluated, leading to a power spectrum. Referring to Fig. 5, using an ideal white noise image as the input, the output will be exactly the transfer function of the matched filter. Hence, by analyzing the Fourier transform of the output, we can get a very good idea about the cutoff frequency of a particular filter. Then, according to the Shannon DESSIMOZ et aL: MATCHED FILTERS FOR BIN PICKING W494IT 691 TRAA S.FER FLMCWT1 tOEE UTM 9 I ILTE I Fig. 5. Estimation of matched filter transfer function through the of white noise. use (a) TABLE I COLLISION FRONT SUCCESS RATE. Two HUNDRED CASTINGS WERE REMOVED. BINS WERE LOADED TEN TIMES wiTH 20 CASTINGS. Filter Dimension (U, Lm, Lp) (9.4.9) (7.4.9) (5.4.9) (3.4.9) (9.3.7) (7.3.7) W190 (cutoff Freg.) Power ratio (%) 1 92.9 2 2 98.5 95.9 2 96.8 DC comporert (magnitude) 868.67 1304.50 1615.56 1864.83 1362.03 90.1 94.1 (3.3.7) 1 1 1 2 (9.2.5) (7.2.5) (5.2.5) (3.2.5) 2 2 2 2 97.5 98.4 98.1 1382.42 93.6 1685.67 1553.97 1510.25 (9.5.5) 1 2 2 95.0 98.0 751.42 1185.47 2 98.3 93.6 1 1 1 2 90.2 95.3 92.1 96.8 2034.17 2258.31 2 98.1 97.0 97.8 98.2 1538.92 1995.28 2025.13 2056.11 (5.3.7) (7.5.9) (5.5.9) (3.5.9) (9.4.7) (7.4.7) (5.4.7) (3.4.7) (9.3.5) (7.3.5) (5-3-5) (3.5.5) 1 1 2 91.9 96.7 1932.03 2281.69 2551.58 (b) 1484.50 1730.42 1299.50 1778.64 (C) theorem, the maximimum number of samples (i.e., filters at different orientation) needed can be determined. Practically, the procedures are the following: create a pseudowhite noise image. Apply the particular matched filter of interest to the image with 512 rotations in the range (0, ir); the rotations are made with regular coordinate transformation and bilinear approximation. The range (7w, 2wa) is omitted because of symmetry. Then take the fast Fourier transform (power spectrum) of the output. To avoid the bias due to nonideal whiteness of the image, we repeat the above steps at 36 independent (nonoverlapping area) points, and take their average as a good estimate of the ideal power spectrum. Then set 90 percent of the total power as the criterion for bandwidth (BW). We obtain the results shown in Table I. Some typical results in linear-log scale are shown in Fig. 6(a)-(d). They correspond to filters having dimensions (7,4,9), (7, 5, 9), (9, 3, 7), and (9, 4, 7), where the first value is the width W, the second is the length of the jaw region L., and the last, Lp, is the length of the center area in Fig. 4. As expected, those filters are low pass and very narrowly band-limited. Another interesting point is that, in general, the filters with square central region appear to have the most narrow passbands among the PJ filters. Notice that for certain PJ filter sites the bandwidth varies, 6. Power spectrum at the output of various (width, Lm, Lp) filters: but is limited for practical values to the range 1-2 cycles per Fig. (7, 4, 9) filter in (a), (7, 5, 9) filter in (b), (9, 3, 7) filter in (c), and 3600. Intuitively speaking, the filter output swings twice be- (9, 4, 7) filter in (d). Ail spectra have been estimated in 36 different nonoverlapping areas of a white noise image and averaged. tween low and high values as the filter makes a 360° rotation 692 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 6, NOVEMBER 1984 Fig. 7. Angular ranges corresponding to various widths. on an image region. The size parameters of the PJ filter for our application where connecting rods and yokes are acquired lead to a bandwidth of two cycles per revolution. The Nyquist theorem therefore calls for no more than four angular samples, i.e., four filters at regular angle interval. Once the filter is defined, according to the gripper and the workpiece being considered, its angular spectrum can be analyzed and the number of necessary filters is easily obtained. The approach described above is fully automatic while the ad hoc solution given in the sequel requires human operator ingenuity. 2) Using "Eigenfilters": The previous approach defines the minimum number of filters by considering aliasing and signal to noise ratio. It would be preferable to have it directly related to holdsite detection or orientation estimation error. The latter strategy is adopted here, as the number of basic filters is increased until experimental data match the desired performances. The ideal situation would be the achievement of an "eigenfilter" situation where the use of a very small number (e.g., two) of orthogonal (or at least independent) filters, would yield acceptable performances. Depending on the spanning angle of the filter, i.e., the angular sector that it occupies,more or less overlapping will be obtained between successive rotated filters (see Fig. 7). Even filters that are 450 out of phase would contain some common information about the pattern to be matched. The spanning angle corresponds here to an angular averaging, which in effect is a low-pass (angular) filtering. A test was designed where filters of varying widths were passed over an operator-selected holdsite on a bin of connecting rods, for half a rotation. Two examples are shown in Fig. 8(a) and (b). The response curve suggested that the larger the width, the lower the resolution of the filter, as demonstrated by a flatter peak response of smaller amplitude. The test also suggested that for a sufficiently wide filter, a cosine square dependency was being obtained with angle. (Applying the Nyquist theorem, we would then need only two filters, 900 apart, the output to other filters being obtained by interpolation.) With one such filter, another test was performed, where a synthetic image was created consisting of a perfect holdsite, corresponding to the filter in dimensions. The filter was applied twice, with a 90° interval. This is equivalent to applying two filters which are 90° out of phase. We shall refer to them as the vertical or the horizontal filter. Assuming a sine square, (a) (b) Fig. 8. PJ filter response versus angle for various widths. The filter is applied at the point addressed by a cursor. In (a), width = 9, length minus = 4, length plus = 11. In (b), width = 3, length minus = 4, length plus = 11. Refer to Fig. 4 for the definition of variables. cosine square variation for the filters' responses, H=M- sin2 a (18) V = M - cos2 a. The peak magnitude and angle were estimated as M=H+ V (19) a= +/- tan-' (HIV)I12. The sign ambiguity in the angle determination was eliminated with the use of another pair of filters 45° out of phase, i.e., at 45 and 1350. The errors on magnitude and angle determination can be seen in Fig. 9. Estimation can be considered good over the entire range. For the angle determination they are even lower if we combine the most accurate range of each filter pair. The resulting maximum error in orientation estimation is about 20 for the entire range. In the magnitude estimation there is also some gain, but not as large. The new error plot can be seen in Fig. 10. This result suggested the following procedure to apply the filter. First use only two filters, the horizontal and vertical for simplicity of computation; with the output images estimate the magnitude of the response; find the maxima in the response DESSIMOZ et al: MATCHED FILTERS FOR BIN PICKING 693 Image acquisition I Horizontal and vertical filtering Magnitude estimatnnard peak detectbon Angle estimation and Symmetry ambqtty elimination (45- and 135- degree filtering) Fig. 11. Main steps for potential holdsite detection. Fig. 9. Error plots for test 1 (2 filter estimation). each of the phases follows, with explanation of the several parameters involved in each. This algorithm has been programmed in a mixture of assembler and Fortran to run on a minicomputer. The procedures that form the core of the algorithm have then been translated in Pascal for documentation and transportability purposes. 1) Image Acquisition: A gray scale image of the bin was acquired and stored in memory as a two dimensional array G, (20) 0 < G(i,j) < MAX; 1 < i, i < N where MAX is the maximum intensity level and N the size of the image. For notation convenience assume that this is already the reduced resolution image desired. The two filtered images are obtained using a recursive computation. Only the vertical filtering computation is described as the procedure was similar for the horizontal filter. The filtering is only computed where the window fits completely within the image. Assume a window of width W, and length ofthe dark or minus regions, equal to Ld and of the bright or plus region, Lp. Assume also that both W and Lp are odd integers. A buffer B is created containing the sum of W pixels along the image columns. The recursive computation of B is BX(j) =B1._(j) G(i w,j) + G(i - Fig. 10. Error plots for test 2 (4 filter estimation). - - 1 + w,j) j = 1, * * ,N i=w+1, ,N+l-w w=(W- 1)/2 (21) and select a fixed number of holdsites; for those, and only those, apply the filter at 45 and 1350; use one pair to estimate the direction, the other to resolve symmetry ambiguity. The with starting conditions pair to be used for direction estimation is the one that has w more balanced response, i.e., the one for which the ratio of the (22) BW() = E Go(i,); j=1, ,N. i=1 larger response to the smaller is closer to unity. The rationale for this choice can be understood by analyzing the error plots For each occurrence of B, two buffers, M and P, are recurjust presented, as within the best estimating range of a filter sively computed, holding the sums of the pixels in the minus pair, both filters yield closer responses than outside it. and plus zones of the filter as B. Detection of Potential Holdsites Mi(L +j) =M,(L +j - 1) - B1(j) Matched filtering is only one component of the overall algo- B(j + Lp +Lm) rithm which allows a robot with vision to acquire workpieces from bins. + Bj(j +Lm) The breakdown of the different phases of the parallel-jaw + Bi(j + Lp + 2Lm) filter algorithm can be seen in Fig. 11. A brief description of (23) 694 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 6, NOVEMBER 1984 P,(L + j) =Pj(L + i - 1) - Bi(j +Lm) +Bi(j+Lp +Lm) j=l,--,N+I-2L i =w, * * ,N + I - w L =(Lp + 1)/2 +Lm - (24) with starting conditions Mi(L)= Pi(L) = Lm E Bi(k) +B1(k +Lp +Lm) (25) Lp E (26) k-l k=1 Bi(k + Lm). The vertical image V is obtained from the buffers P and M as V(i, j) =(I/W) [(Pi(j)/Lp) - (Mi(i)/(2 Lm))] (27) if>0 =0, otherwise. (27) The two filtered images are then summed to estimate the magnitude response. The S largest local maxima are chosen as the selected holdsites if their magnitudes are above a minimum threshold T. In case all peaks are below that threshold the bin is considered empty. The local maxima are obtained in the following way. An ordered list of peaks is created with length S. When a new magnitude is computed it is compared against the list of entries. If its magnitude is smaller than the smallest entry it is rejected. If the entry is found somewhere within the list the distance to all entries above it is computed. If it differs by less than a minimum distance D from one of those entries it is rejected, otherwise it is inserted in the list; all entries below it that are closer to it by less than D are then removed from the list. Two filtering windows identical to the ones used for vertical and horizontal filtering, rotated 450 from those ones, are passed over the image in a brute force way, i.e., as a double summation of pixel values, their coordinates being computed using standard plane rotation. Direction of the holdsite is then computed following the criterion explained above. Sometimes the algorithms detect false holdsites, i.e., regions in images which do not correspond to sites where workpieces can be grasped. An example of such a situation might be a spurious reflectance pattern in an empty bin. The false holdsites lead to failures because the robot gripper makes a motion but cannot acquire any workpiece. In order to avoid deadlocks, some bookkeeping must be made of past failures. Essentially, subsequent holdsite detection should be inhibited where failures have occurred. C. ExperimentalResults Fig. 12 (a)-(c) shows the result of applying the parallel-jaw filter algorithm to different pieces, the connecting rod and yoke castings, and titanium cylinders. Shown are the input gray scale image, the horizontal and vertical filters output images, and (a) (b) (c) Fig. 12. PJ filter applied to various workpieces: in (a), connecting rod castings; in (b), yoke castings; in (c), large titanium cylinders. DESSIMOZ et al.: 695 MATCHED FILTERS FOR BIN PICKING CAMERA CAMERA 2 1 TRAVERSE PLANE / BIN I CHUTE CHUTE 1 2 Fig. 13. Schematic of the dual bin execution cycle, with key labeled. the holdsites selected. Notice that the filter output has only half of the resolution of the input images, as the coarser sampling is made possible by the passband effect of the matched filters. Following potential holdsite detection, robots with vision have to physically acquire workpieces. For our experiments, the system architecture included a robot MARK IV, a Cartesian robot developed at URI, a pair of bins, two overhead cameras (GE TN 2000) looking down in two bins, and two receiving sites for the acquired parts. The supply bins had a flat bottom area the same size of the field of view. To provide collisionfree access to all the pieces, the sides of the bin were slanted. The alignment of the two cameras in relation to the robot's coordinate frame was done in a semiautomatic way, generating a camera model which is described in [13]. This model provided relatively free position of the cameras which were placed in an almost vertical orientation above the bins. The software ran on a minicomputer (Control Automation LSI-2). To speed up execution, acquisition was performed in parallel from two bins. Two tasks (see Fig. 13) were run simultaneously by the real-time executive. One of the tasks performed the image analysis on one of the bins while the other performed the acquisition on the other. The acquisition task had priority over the image analysis task, which was only allowed to take control of the CPU during arm motions. Cycle Times: If the image analysis task was run alone, an execution time for the parallel-jaw filter algorithm of about 5 s was achieved. Image complexity slightly affects the computation time. With the system architecture used, paralleling the image analysis with an acquisition attempt, the overall cycle took about poses TABLE II Removed on first attempt (pl) ...... ................. Removed on second attempt (p2) ...... ................ Removed on third attempt (p3) ...... ................. Not removed in three attempts (new picture taken) ... 57 24 6 13 % 7. % 7. 9 s/piece. This time was generally dominated by arm motions in the sense that, even if the image analysis was performed in no time at all, the reduction in cycle time would be negligible. It is difficult to assess the success rate of vision algorithms for bin picking. Ultimately, it is only by trying a detected holdsite with an actual end-effector that the success or failure can be observed. Thus, the algorithms require a complete system in order to be tested. System performances will be effected by many factors which are totally unrelated to vision, for example, the size of a vacuum cup. Even when the chance is taken to assess the success rate for a certain algorithm and system, the results are of limited value. What would be far more interesting, if possible, would be to assess the performances for a large class of objects or even, for all existing objects. The practical limitations for such an assessment are obvious and the best achievable rating would be restricted to a "standard workpiece set," yet to be defined. Some informal and formal statistics were collected observing several runs of the system. They are all based on allowing three holdsite selections per bin at most. The second selection would be considered only in case the first attempt was a failure, and the third in case the second was also a failure. In the case all sites resulted in failures, a new image was taken and analyzed. A test run was made for the "collision fronts" algorithm and the connecting rods castings. Results are show in Table II IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. PAMI-6, NO. 6, NOVEMBER 1984 696 held. In the future, geometric modeling should allow the simulation of the same operation, thereby leading to the automatic definition of holdsites. 3) Hierarchical approach where matched filters would not directly detect holdsites but rather extract holdsite components. Filter output could be interpreted at a higher level-or undergo additional iteration. (a) (b) 4) Acquisition of randomly piled workpieces requires comFig. 14. Nonlinear filters seem necessary, when very small structures plex systems. In particular, components nonrelated to vision make grasping difficult. In (a), small holes must be detected, if a vacuum cup hand is used, to avoid leaks. In (b), narrow fins might have an influence on performances. Therefore, additional rebreak if a parallel-jaw hand is not carefully positioned. search is especially needed in end-effector design, for aspects related to kinematics (e.g., how many fingers), geometry (e.g., and indicate a probability p, larger than 50 percent for coming what finger shape), and sensory capabilities (e.g., artificial out of the bin with one part at first trial and of almost 90 per- skin). cent not to come out empty. Considering that this piece was REFERENCES once considered the "difficult" problem for bin picking, the Ferloni, 1. Franchetti, P. Vincentini, and P. Fici, "Ordinatore: results are very good. Notice in Table II that experimental re- [1] A. A dedicated robot that orientates objects in a predetermined direction," in Proc. 10th Int. Symp. Industrial Robots, Milano, Italy, sults agree rather well with the statistical predictions that define 1980, pp. 655-658. the probability P2 of being successful in exactly two trials as [2] A. Romiti, "Picking from a bin through tactile sensing," in Proc. Pi - pl and similarly,p3 asp, - 2p +p I1th Int. Symp. IndustrialRobots, Tokyo, Japan, Oct. 7-9,1981, pp. 273-280. For the parallel-jaw filter algorithm no formal results were T. Sakata, "An experimental bin-picking robot system," in Proc. [3] was in obtained as at that point in time the Robot Laboratory 3rd Int. Conf. Assembly Automat., Stuttgart, Germany, May the process of changing from the MARK IV robot to an indus25-27, 1982, pp. 615-626. trial robot, a Unimation PUMA 600, from the original PJ gripper [4] R. Tella, J. R. Birk, and R. B. Kelley, "General purpose hands for bin picking robots," IEEE 7rans. Syst., Man, Cybern., vol. SMCto a new model, PJ II, to a more powerful computer, a TI 12, Nov.-Dec. 1982. 990/12, and also making an effort to write all programs in [51 J.-D. Dessimoz, J. Birk, R. Kelley, and A. Beckwith, "Between bins and goalsites: Workpiece pose estimation," in Proc. 3rd. Pascal to facilitate technology transfer. For the period of time Int. Conf Automat. Assembly, Stuttgart, Germany, May 25-27, where the old system was still running, the parallel-jaw filter 1982, pp. 603-614. algorithm seemed to outperform the former collison fronts [6] J.-D. Dessimoz, M. Kunt, G. H. Granlund, and J. M. Zurcher, "Recognition and handling of overlapping parts," in Proc. 9th algorithm for the two workpiece types of our experiments, conInt. Symp. Industrial Robots. Washington, DC, Mar. 1979, pp. necting rod and yoke casting. The numbers presented in Table 357-366. II can then be thought of as a lower limit on performance of [7] W. Perkins, "A model-based vision system for industrial parts," IEEE Trans. Comput., vol. C-27, pp. 126-143, Feb. 1978. the parallel-jaw filter algorithm. 0 0 , VII. CONCLUSION The paper has introduced a new technique for the acquisition of randomly piled workpieces by a robot with vision. It complements past algorithms especially for situations where scene representation by binary images is not adequate. Although it appears to allow the automatic acquisition of a significant class ofworkpieces,it is not universal. Topics related to the described technique and where research is presently needed include the following ones: 1) Definition of nonlinear "matched" filters for applications where a small structure would prevent the successful acquisition of workpieces. For example, even a relatively small hole in a workpiece would prevent a vacuum cup from acquiring it, or thin fins along a handle would prevent a parallel-jaw from acquiring an object (see Fig. 14). In both cases, however, a linear filter would probably not perceive these small contributions or they would be dominated by other sources, e.g., workpiece illumination variations. 2) Automatic definition of matched filters, from solid modeling representations of workpiece and robot end-effectors. Consider the selection of potential holdsites of a workpiece by a given robot end-effector during training phase for a new workpiece type. At present, an operator estimates and possibly manually tries locations where the workpiece can be securely [8] R. Kelley, J. Birk, J.-D. Dessimoz, H. Martins, and R. Tella, "Acquiring connecting rod castings using a robot with vision and sensors," in Proc. Int. Conf. Robot Vision Sensory Contr., Stratford-upon-Avon, U.K., April 1-3, 1981, pp. 169-178. [9] R. Duda and P. Hart, Pattern Classification and Scene Analysis. New York: Wiley, 1973. [10] W. K. Pratt, Digital Inage Processing. New York: Wiley, 1978. (11] J. Birk, J.-D. Dessimoz, and R. Kelley, "General methods to enable robots with vision to acquire, orient, and transport workpieces," presented at 9th NSF Grantees' Conf Product. Res. Technol., Ann Arbor, MI, Nov. 2-4, 1981. [12] S. D. Roth. "Ray casting for modeling solids." Comput. Graph. Image Processing, pp. 109-144, 1982. [13] H. Martins, J. Birk, and R. Kelley, "Camera models based on data from two calibration planes," Comput. Graph. Image Processing, vol. 17, pp. 173-180, Oct. 1981. [14] R. Kelley, H. Martins, J. Birk, and J.-D. Dessimoz "Three vision algorithms for acquiring workpieces from bins," Proc. IEEE, Special Issue on Robotics, pp. 803-820, July 1983. Jean-Daniel Dessimoz (S'79-M'80) was born in Conthey, Switzerland, on March 17, 1953. He received the B.A. degree in humanities from the College de Sion, Switzerland, in 1971, and the M.S. and Ph.D. degrees in electrical engineering from the Swiss Federal Institute of Technology, Lausanne, in 1976 and 1980, respectively. He was with the Department of Electrical Engineering, University of Rhode Island, Kingston, from 1980 to 1982 where he was an Assis- 697 DESSIMOZ et aL: MATCHED FILTERS FOR BIN PICKING tant Professor. Since 1982, he has been sharing his time between the Ecole d'Ingenieurs de l'Etat de Vaud, Yverdon-les-Bains, Switzerland, where he is currently an Associate Professor, and consulting. His research interests include machine vision, robotics, digital signal processing, knowledge engineering, and computer architecture. John R.Birk(S'67-M'70) wasbornin New York, NY, on June 10, 1944. He received the BE. degree in mechanical engineering from Cooper Union, New York, NY, in 1966, and the M.S. and Ph.D. degrees in biological engineering from the University of Connecticut, Storrs, in 1968 and 1970, respectively. He was with the Department of Electrical Engineering, University of Rhode Island, Kingston, from 1970 to 1982, where he was a Professor and was the first Director of the Robotics Research Center. Since 1982 he has been with Hewlett-Packard, Palo Alto, CA. His research interests include manipulators, computer control, scene analysis, and industrial automation. C'hiLin I, photograph and biography unavailable at the time of pub- lication. Robert B. Kelley (S'56-M'67) was born in Newark, NJ, on April 24, 1935. He received the B.S.E.E. degree from the New Jersey Institute of Technology, Newark, in 1956, the M.S.E.E. degree from the University of Southern California, Los Angeles, in 1958, and the Ph.D. degree from the University of California at Los Angeles in 1967. He has been at the University of Rhode Island, Kingston, with the Department of Electrical Engineering since 1966, where he is currently a Professor and Technical Director of the Robotics Research Center. His research interests include image analysis, pattem recognition, artificial intelligence, robotics, and advanced automation. Henrique A. S. Martins was Portugal, on June 22, 1953. born in Lisbon, He received the electrical engineering degree from the Instituto Superior Tecnico, Lisbon, Portugal, in 1976, and the M.S. and Ph.D. degrees in electrical engineering from the University of Rhode Island, Kingston, in 1980 and 1982, respectively. He was a Teaching Assistant from 1974 to 1976 and an Instructor from 1976 to 1977 at the Department of Electrical Engineering, Instituto Superior Technico, Lisbon, Portugal. In 1977 he took a leave when he was awarded a Fulbright-Hays travel grant to pursue the M.S. and Ph.D. degrees in the U.S. He was a Research Assistant at the Department of Electrical Engineering, University of Rhode Island, Kingston, from 1977 to 1982. He is currently at the Instituto Superior Tecnico, Lisbon, Portugal.