Abstract.
We are developing a computational CMOS imager with
an integrated general-purpose filter for early image processing. The
goal of this collaborative work with the Jet Propulsion
Laboratory is to produce a single chip serving as a camera
able to pre-process the image in real-time through a convolution filter
chosen by the user, allowing an efficient implementation of a
variety of computationally intensive applications such as
autonomous navigation, object avoidance or intercept, real-time target
tracking and recognition.
Motivation.
A system capable of tracking any target in real-time in an unknown environment
finds numerous applications: object avoidance or interception, autonomous
navigation (machine vision, nano-rovers, robots, docking...), recognition
(tracking of eyes, nose...), etc. Low-level processing of images often consists
of repetitive and computationally intensive tasks that are also based on
convolution operators. A hardware implementation is very
well suited for such real-time vision systems. A software-based approach
would be hard to miniaturize (it needs a camera and a computer with a frame
grabber) and would run too slowly (~1 frame/second) for most of these
applications, which require a real-time flow of data.
The
on-going collaboration with the Jet Propulsion Laboratory has brought
to this project expertise in Active Pixel Sensors. This CMOS technology
for building imager chips allows on-focal plane signal processing (as
opposed to their CCD counterparts that need to serially output the flow
of pixels to an external processing chip). The filtering can therefore
be implemented as a fast, low-power analog circuit.
Research and Achievements.
Convolution is achieved by matching a template to an image
using a computation unit, allowing generic filters to be used as a kernel. The
chip has an integrated imager array and a 9×9 pixel digital
memory to store the kernel. When recognizing or tracking a target, the kernel
represents the template, chosen off-chip through a separate
learning process. This part will therefore not be discussed
here. Filtering is performed through a column-parallel
architecture of computing units, so real-time computation
can be achieved.
The core of the filtering system relies on a 9×9 pixel cell whose task is
to perform the convolution on the imager output. The image provided by the array reaches the
convolution block one row at a time. Hence, the image can be processed by only 9
rows of convolution units operating in parallel. To reduce the number of cells,
a small level of serialization was introduced by sharing the same convolution unit
among 16 neighboring columns.
Each of these cells computes the matching between the analog image (I)
and the template (T):
eq. (1)
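
As a hedged illustration of the kind of sum of products the text describes (and not necessarily the exact form of eq. (1)), the matching between an image window and a template can be sketched in software as follows. The function names `match` and `convolve_valid` are hypothetical, the template is applied without flipping (correlation form), and a small template is used for brevity where the chip uses 9×9.

```python
def match(image, template, top, left):
    """Sum of products between the template and the image window whose
    top-left corner is at (top, left)."""
    total = 0
    for i, t_row in enumerate(template):
        for j, t in enumerate(t_row):
            total += image[top + i][left + j] * t
    return total


def convolve_valid(image, template):
    """Evaluate match() at every position where the template fits entirely
    inside the image."""
    n_rows = len(image) - len(template) + 1
    n_cols = len(image[0]) - len(template[0]) + 1
    return [[match(image, template, r, c) for c in range(n_cols)]
            for r in range(n_rows)]


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    template = [[1, 0],
                [0, 1]]
    print(convolve_valid(image, template))
```

On the chip the same quantity is formed in the analog domain; this software loop only shows which terms are added together.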
The chip can be divided into three main components that form the system. The imager
comprises the pixel array, the necessary control logic for both row and column
addressing and the readout circuitry with correlated double sampling implemented
to reduce the image artifacts due to the fixed pattern noise (FPN).
The convolution block comprises the other two: an array of mixed-signal multipliers
and an array of analog pipelined accumulators. Figure 1 below shows the block diagram
of the chip.

Figure 1. System block diagram
The imager.
A traditional voltage-mode pixel implementation was chosen to create the pixel array
of the chip. It allows good linearity and noise robustness that current-mode pixels
lack. However, for simplicity and compactness of implementation, the convolution
calculations downstream are (as described below) performed in the current domain.
A voltage-to-current converter was therefore designed and is combined with the
double-sampling circuit that offers a reduction of the fixed-pattern noise of
the imager.

Figure 2. Imager and pixel layout detail
The voltage level read from each column when a specific row is selected is converted
into a current: the column voltage modulates the gate of a PMOS transistor, which changes the
current flowing through a resistor, as shown in Figure 3. The generated current is then either mirrored
into a current memory cell (the light-exposed value is saved at the beginning
of the readout cycle) or subtracted from the previously saved value, so that the dark
value (which contains most of the FPN artifacts) is removed from the picture. Images
taken with and without the FPN reduction circuit show the importance of such
capability, as seen in Figure 4.

Figure 3. Pixel readout and FPN reduction simplified circuit

Figure 4. Effect of the FPN reduction on a 64×64 pixel image
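
The FPN reduction described above can be sketched numerically. This is a minimal model under stated assumptions, not the circuit itself: each pixel is assumed to add a fixed offset to every read, and the names `read_pixel` and `double_sample` are hypothetical.

```python
def read_pixel(signal, offset, exposed):
    """Hypothetical pixel model: a fixed per-pixel offset is added to every
    read; the light-dependent signal is present only in the exposed read."""
    return offset + (signal if exposed else 0.0)


def double_sample(signals, offsets):
    """Save the exposed value, then subtract the dark value read from the
    same pixel, so that the fixed offset cancels."""
    corrected = []
    for signal, offset in zip(signals, offsets):
        saved = read_pixel(signal, offset, exposed=True)   # held in the current memory
        dark = read_pixel(signal, offset, exposed=False)   # dark value of the same pixel
        corrected.append(saved - dark)
    return corrected


if __name__ == "__main__":
    # Uniform illumination seen through pixels with different fixed offsets.
    print(double_sample([10.0, 10.0, 10.0], [0.5, -1.25, 2.0]))
```

In this idealized model the offsets cancel exactly; a real circuit removes only the part of the pattern that is identical in both reads.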
The multipliers Σ(Ij·Kj).
The equation for convolution, eq. (1) above, shows that it is a two-dimensional
accumulation of products. In the row direction, the accumulators described below will
handle the "one row at a time" addressing of the imager. Because all the pixels of a given
row are available at the same time, the partial products in this
direction are added simply by connecting the current outputs of the multipliers together (Kirchhoff's
current law).
Since one of the operands is a digital, 8-bit signal, the multipliers are effectively
digital-to-analog converters (DACs) weighted by the second operand (an analog current).
The input current acts as the reference current for the DAC and is mirrored into
binary-scaled current mirrors, whose outputs are controlled through switches by the
bits of the template. Achieving 8-bit depth directly would require
scaled current mirrors from ×1 to ×128, which is not practical. To circumvent
this problem, the multiplier was divided into two halves corresponding to the 4 LSBs and
the 4 MSBs of the template, each scaling from ×1 to ×8. The output of the
MSB half is scaled up by a factor of 16 and added to that of the LSB half. The full multiplication
is thus performed using a much smaller area on chip.

Figure 5. One-pixel multiplier cell circuit
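
The arithmetic behind the split multiplier can be checked with a short behavioral sketch. It assumes ideal current mirrors and ideal switches, and the names `dac_half` and `multiply` are hypothetical; it models only the numbers, not the transistor-level circuit.

```python
def dac_half(i_ref, nibble):
    """4-bit multiplying DAC: binary-scaled mirrors (x1, x2, x4, x8) of the
    reference current, each switched by one bit of the nibble."""
    out = 0.0
    for bit in range(4):
        if (nibble >> bit) & 1:
            out += i_ref * (1 << bit)
    return out


def multiply(i_in, k):
    """Multiply the analog input current i_in by the 8-bit template value k."""
    if not 0 <= k <= 255:
        raise ValueError("k must fit in 8 bits")
    lsb_half = dac_half(i_in, k & 0x0F)   # 4 LSBs of the template
    msb_half = dac_half(i_in, k >> 4)     # 4 MSBs of the template
    return lsb_half + 16 * msb_half       # MSB half scaled up by 16


if __name__ == "__main__":
    # With ideal mirrors, the split structure reproduces i_in * k exactly.
    print(all(multiply(1.0, k) == k for k in range(256)))
```

Each half needs mirrors only up to ×8, plus one ×16 scaling stage, instead of a single chain of mirrors up to ×128.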
The accumulators Σ(Σ(Ij·Kj)).
The mixed-signal multipliers implemented are combinatorial elements that do not have
the capability to memorize the partial products computed when previous rows were
available from the imager. Such a memory is, however, necessary to reconstruct the
full convolution from the partial products of nine consecutive rows. As seen in Figure 6,
the accumulators for one column have nine inputs coming from the multipliers for the nine
template rows and the corresponding column neighborhood. When a new row from the imager
is read out, the partial products move two steps through the current memories in the
pipeline and a new set of partial products is generated. A new result, the convolution
from 9 rows ago, is then ready to be read out of the chip.

Figure 6. Pipeline accumulator - principle of operation.
Xi: input from multiplier for template row i;
CMi: Current memory.
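
The principle of the pipelined accumulation can be sketched in software. This is a simplified model under stated assumptions: each template row is given one logical pipeline stage (the text above describes two current-memory steps per row, which this sketch does not model), and the names `row_partial_products` and `pipeline_convolve` are hypothetical. A small template is used for brevity where the chip uses nine rows.

```python
def row_partial_products(image_row, template, col):
    """X_i for each template row i: products summed along the row direction
    for the window starting at column `col`."""
    width = len(template[0])
    window = image_row[col:col + width]
    return [sum(p * k for p, k in zip(window, t_row)) for t_row in template]


def pipeline_convolve(image, template, col=0):
    """Feed the image one row at a time and collect one finished result per
    row once the pipeline is full."""
    depth = len(template)
    stages = [0] * depth          # one accumulated value per pipeline stage
    results = []
    for row_index, image_row in enumerate(image):
        x = row_partial_products(image_row, template, col)
        # Shift the partial sums one stage down the pipeline and add the new X_i.
        stages = [x[0]] + [stages[i - 1] + x[i] for i in range(1, depth)]
        if row_index >= depth - 1:
            results.append(stages[-1])   # window that started depth-1 rows earlier
    return results


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    template = [[1, 0],
                [0, 1]]
    print(pipeline_convolve(image, template, col=0))
    print(pipeline_convolve(image, template, col=1))
```

Only the running partial sums are stored, never whole image rows, which is why the multipliers themselves need no memory.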
Tests and Results.
This chip was fabricated and is currently in testing. A twin, simplified chip was also
fabricated, containing the same elements for test and characterization of each block,
independently of each other. Preliminary results show good performance of the imager
(Figure 4(b)) and the multiplier. As an example of the linearity of the multiplier,
Figure 7 below shows a linear fit of the multiplication of a saturated kernel (K=255)
with a varying current in the single-pixel multiplying cell described above.

Figure 7. Multiplier linearity: fixed template; input current of varying
intensity.
The project started with a
first prototype of the convolution chip, which was built and tested. It consisted of
a 64×64 pixel imager array, a 49-byte digital memory to store the kernel,
and a single 7×7 pixel convolution cell tied to the center of the imager.
Test results demonstrated a convolution was indeed performed and allowed identification
of adjustments to be made for the following fabrication run.
The
second generation of the convolution chip has been tested and was presented at
the 2003 Workshop on CCDs and Advanced Image Sensors, May 2003 in Elmau, Germany.
Compared with the first generation, this run incorporated design modifications (which gave
better matching in the computation cells and in the pixels) as well as added features (such as the sum of
all pixels, necessary to perform normalization so that targets can be tracked accurately).
The current
circuit, described here, is being fully tested and characterized. A number of
important design changes were included, either to address issues that could be improved on
(for instance, we moved from a current-mode imager to a voltage-mode pixel coupled with a
voltage-to-current converter), or to demonstrate a new way to approach the problem (the
accumulators operate in the current domain and no longer in the charge domain). By
implementing a larger image array, we also demonstrated the scalability of the architecture
in the spatial domain to an arbitrarily sized imager. Preliminary test results are encouraging
and full characterization is expected in the near future.
References
C. Basset, et al., "CMOS Imager with Embedded Analog Early Image processor",
2003 IEEE Workshop on CCDs and Advanced Image Sensors, Elmau, Germany, May 2003.
R.H. Nixon, et al., "256x256 CMOS Active Pixel Sensor Camera On A Chip",
Proc. IEEE International Solid-State Circuits Conference, pp. 178-179, San Francisco,
CA, Feb. 1996.
L.G. McIlrath, et al., "Design and analysis of a 512×768
current-mediated active pixel array image sensor", IEEE
Trans. on Electron Devices, vol. 44, pp. 1706-1715, Oct. 1997.
C. Clark, et al., "Application of APS arrays to star and feature tracking
systems", Proc. SPIE, vol. 2810, pp. 116-120, 1996.
T. Komuro, et al., "A digital vision chip specialized for
high-speed target tracking", IEEE Trans. on Electron
Devices, vol. 50, pp. 191-199, Jan. 2003.
V. Gruev, et al., "Implementation of steerable spatiotemporal image filters on the focal plane",
IEEE Trans. on Circuits and Systems II, vol. 49, pp. 233-244, April 2002.
A. Graupner, et al., "CMOS image sensor with mixed-signal processor array",
IEEE Journal of Solid-State Circuits, vol. 38, pp. 948-957, June 2003.
A.A. Biyabani, L.R. Carley, T. Kanade, "An analog CMOS IC for template matching",
Proc. IEEE Int. Solid-State Circuits Conference, pp. 82-83, Feb. 1999.