Research Projects Index

An Image Processing Custom Computer Using FPGAs

Danny Crookes & Khaled Benkrid

See Paper on Hardware Skeletons presented at FCCM 2001


Image processing requires fast realisation of certain compute-intensive operations. To provide this processing power, we are investigating the use of FPGAs as a more cost-effective alternative to parallel computing.

Normally, for users to gain the speed advantages of FPGA hardware, they need to program the FPGAs at a low level and to have a detailed knowledge of the device being used [1, 2]. This imposes a lengthy learning curve on users, who can then end up spending most of their time designing and testing the hardware implementation of their algorithms instead of concentrating on the algorithms themselves. By making the hardware as invisible as possible, start-up and development times for system developers could be drastically reduced.


FPGAs have been used as the basis for implementing image processing algorithms in hardware. Our aim is to provide users with a very high level programming interface to an FPGA-based reconfigurable image co-processor. This will enable users to exploit the advantages of the hardware without requiring any specialist knowledge. The interface to the user is through software tools, including a high level language, a translator and a library. The envisaged execution hardware comprises a PC host processor (for control) and an image co-processor using FPGA hardware for computation.

An algorithm written in the high level language will be expressed in terms of predefined abstractions. The translator will then generate the necessary configurations for the host processor and the FPGA-based image co-processor respectively.


The high level abstractions being provided are based on concepts developed by previous research in parallel image processing in the Computer Science Department at Queen's University Belfast [5, 6]. As a result of this earlier work, an abstract machine for image processing was defined [7], with instructions at the complete image level (rather than at the individual pixel level).

From this, we have developed an abstract image co-processor suitable for FPGA implementation, with an instruction set which includes basic instructions to apply point-to-point operators (e.g. subtraction, thresholding, scaling) and neighbourhood operators (e.g. convolution with any user-defined template). These operators are applied at the complete image level. A wide range of built-in operators is available, based on those used in Image Algebra. The operators can also be applied recursively (in which the result of a neighbourhood operation is stored back in the original source image).

For instance, the standard Sobel edge extraction operation can be programmed using two image convolutions with the usual Sobel horizontal and vertical windows (Sh and Sv), an image-image addition, and a thresholding operation, as follows (where the operator |@| means convolution, then find the absolute value):

Sobel := ( (Image |@| Sh) + (Image |@| Sv) ) > Threshold;

One possible way to realise this abstract co-processor is on a specially-designed FPGA-based co-processor board, with its own memory and onboard control. However, we currently favour a more cost-effective approach, which is to utilise a standard high-speed bus (e.g. PCI), and to use the FPGAs as a genuine co-processor, with image data being held on the host processor.


A key feature of the abstract co-processor design is that it is dynamically extensible, in that new operators (and associated FPGA configurations) can be created either before or during program execution. Three instances where this could happen are:

(i) an optimisation will be to combine several successive image instructions of a similar kind into a single instruction which applies a new, compound operator. The compiler will generate the FPGA configuration for this new operator.

(ii) the user develops, by hand, the FPGA design for a new neighbourhood operator (e.g. DCT), which is then included in a library and becomes available for subsequent use as a built-in neighbourhood operator. This enables separately-developed, hand-optimised designs to be reused.

(iii) using a high level notation designed for the purpose, users will be able to specify and use their own neighbourhood functions (of the kind in (ii) above), in a problem-oriented rather than hardware-oriented way. The compiler will generate efficient FPGA configurations from this. This notation is still under development.

Thus the high level abstractions are application-oriented (expressed in terms of objects like images and templates), unlike Hardware Description Languages (HDLs) such as VHDL, which are hardware-oriented high level notations for low level components (such as buses, latches, adders, etc.).


The above high level language abstractions are available to the user in the form of C++ classes, which are the programmer's interface to the image co-processor. These may be used from the outset, or may be incorporated into existing applications which require hardware acceleration. Standard C++ language features can be applied fully to these new classes.

There are three classes of objects which the user can build and process:

image objects (includes details of image size, pixel size, etc.)

template objects (includes details of weights, etc.)

operation objects (describes a possibly compound operation to be applied)

Image and template objects are the variables used in the application program. There is a range of functions (instructions) for the creation, initialisation, input/output, etc. of such objects.

An operation object represents an image-level instruction. Initially there is a set of predefined such objects, corresponding to the set of built-in image operators. More generally, an operation object is a tree defining a compound image level instruction which produces a single result image. It corresponds to a new instruction (reflecting the extensible nature of the co-processor), and subsequently enables the compound operation to be carried out as a single instruction. For example, a single operation object could describe the Sobel operation outlined above. A user could build a new instruction for this operation by constructing the operation object:

Sobel = NewOp( gt( plus( absconv(Im, Sh), absconv(Im, Sv) ), Threshold ) );

It is when an operation object is built that the configuration files for the FPGA implementation are generated. Later, when the operation is to be performed, the user calls a member function of the operation class called perform(). The appropriate configuration files are loaded, the input image data is fed to the FPGA(s) and the output image data is read back.


Implementations of the built-in neighbourhood and point-to-point operations in hardware are being realised using Xilinx (previously Algotronix) CAL1024 chips [8]. This fine grain implementation is well suited to the bit level approach, and we are investigating the use of the SDNR (Signed Digit Number Representation) arithmetic system. This circuit development and verification process is currently PC based using a CHS2x4 development board containing 8 of the CAL1024 chips.

The high level language interface is available (as C++ classes), though just in simulation mode. This includes a simulator of a co-processor board. The system is currently being used to test the strengths, weaknesses and ease of use of the image co-processor model, as well as to run simulations of possible hardware implementations. The C++ classes are currently running on both SUN Sparc stations and PCs.


[1] Gray J and Kean T, 'Configurable hardware: A new paradigm for computation', Advanced Research in VLSI, Proc. Decennial Caltech Conference on VLSI, Pasadena, CA, 1989.

[2] Rose, J, 'Architecture of Field Programmable Gate Arrays', Proc. IEEE, Vol 81, No 7, July 1993.

[3] Shoup R G, 'Parameterized convolution filtering in an FPGA', More FPGAs, pp 274-280.

[4] Chan S C, 'A programmable image processing system using FPGAs', Int. Journal of Electronics, Vol 75, No 4, 1993.

[5] Crookes D, Morrow P J and McParland P J, 'IAL: a parallel image processing programming language', IEE Proceedings, Part I, Vol 137 No 3 (June 1990) pp 176-182.

[6] Brown T J and Crookes D, 'A high level language for image processing', Image and Vision Computing, Vol 12 No 2 (March 1994) pp 67-79.

[7] Steele, J A, 'An abstract machine approach to environments for image interpretation on transputers', PhD Thesis, The Queen's University of Belfast, 1994.

[8] Algotronix Ltd., 'The CAL1024 Datasheet', Nov. 1991.


EPIC - An Extensible Parallel Image Coprocessor

This project is supported by the EPSRC under its Portable Software Tools for Parallel Architectures (PSTPA) programme. The project is in collaboration with University of Ulster at Coleraine, and Transtech Parallel Systems.

The primary objective of the project is to build an environment for developing portable, parallel software systems, initially targeted at image processing applications. An efficient implementation is being produced for distributed memory multiprocessor machines, using Texas Instruments C40 processors supplied by Transtech. Our approach is to decouple the potentially data parallel (and architecture-dependent) aspects of a program from the remainder by defining an abstract Extensible Parallel Image Coprocessor (EPIC), and developing associated software tools. The extensible nature of the coprocessor is crucial to obtaining optimised execution of programs.

The long term aim of the research is to contribute to the quest for a general purpose abstract programming model which guarantees efficient implementation. A common approach is to begin with a general-purpose model and to research techniques which make the implementation increasingly efficient. Our approach is instead to begin with an application-specific model which guarantees efficiency and portability from the outset, and then to generalise its abstractions to move towards a general purpose model.


The main objectives of the research project are as follows:

(i) To define a set of programming abstractions appropriate to parallel image processing systems. This set of abstractions constitutes the core of the Extensible Parallel Image Coprocessor (EPIC) model.

(ii) To design and construct an (object-oriented) application development environment, in which users can develop portable (and efficient) image processing applications based on the EPIC model.

(iii) To develop a rule-based code generation system which will allow the set of programming abstractions to be extended (and automatically optimised); i.e. to provide the extensible nature of the EPIC model.

(iv) To develop a full and efficient implementation of the EPIC model on one specific architecture - a distributed memory C40-based multi-processor system (although the EPIC model itself will be portable across a range of parallel architectures).

(v) To provide a wide range of base level operations, and an application layer based on Image Algebra and Image Morphology.

(vi) To identify and recommend how the EPIC model could be applied to alternative application domains.

Features of the EPIC model

The programmer's model of the parallel capability of an architecture is defined as an abstract coprocessor. While this is a standard way of achieving portability, the traditional problems with achieving efficiency and optimisation are being overcome by enabling the system to generate, automatically, new compound instructions. This retains the well-defined model of a coprocessor, but provides the equivalent efficiency of an optimising compiler with knowledge of the architecture. Features of the EPIC model include:

- The coprocessor is object oriented. It processes objects defined by a range of classes (e.g. images of different kinds, vectors, templates, etc.) using an extensible set of operations on these objects. There is a base level layer of primitive operations, on top of which is an application-specific layer of operations.

- One component of the EPIC model is an instruction builder, to build new higher level functions from a specification which is in terms of already-defined functions. Rather than base the new function on calls to existing functions, this component uses a set of transformations which transform the instruction specification into a new function. The rule set defining the transformations is designed to produce a function optimised for the target architecture. This optimising tool is the means of achieving efficiency without sacrificing portability. It avoids the need, for instance, for temporary storage and additional loop overheads traditionally associated with compound functions.

- The EPIC software environment is C++ based, for conformance with what is in effect an industry standard. Users actually program in C++, but make use of EPIC tools and class objects. The initial target architecture is based around a network of Texas Instruments C40 processors.


TULIP - A Language for Image Processing

Danny Crookes & Jim Steele

We have developed a very high level image processing language, called Tulip, which enables very rapid development of low to medium level image processing applications [1]. Tulip is an extension of Image Algebra Language (IAL) [2], and provides several of the facilities of I-BOL [3] but in a more portable form. It is designed to be portable across a range of parallel architectures. It has been implemented on a Transputer network, and a subset runs on an AMT DAP.

Operations in Tulip are programmed at the complete image level. There is a standard set of image level point operations (using +, *, etc.). The power of Tulip comes from its range of built-in neighbourhood operators, and the template abstraction. User-defined neighbourhood operations can be written (in C).

As well as developing efficient implementations of Tulip, we use the language for teaching image processing, since students can program image enhancement or simple object recognition systems in a small number of lines. A PC implementation of Tulip is available, running under Windows.


[1] Steele, J A, 'An abstract machine approach to environments for image interpretation on transputers', PhD Thesis, The Queen's University of Belfast, 1994.

[2] Crookes D, Morrow P J and McParland P J, 'IAL: a parallel image processing programming language', IEE Proceedings, Part I, Vol 137 No 3 (June 1990) pp 176-182.

[3] Brown T J and Crookes D, 'A high level language for image processing', Image and Vision Computing, Vol 12 No 2 (March 1994) pp 67-79.
