The best tool for the job

Ask whether to use a CPU or an FPGA for image processing and the response tends to be ‘it depends’. Both have inherent benefits, as Brandon Treece explains.

Machine vision has long been used in industrial automation systems to improve production quality and throughput, but the biggest advancement in machine vision has been processing power. With processor performance doubling every two years and a continued focus on parallel processing technologies like multicore CPUs and FPGAs, vision system designers can now apply highly sophisticated algorithms to create more intelligent systems.

This increase in performance helps designers not only to acquire images faster but also to process them faster. Pre-processing algorithms such as thresholding and filtering, or processing algorithms such as pattern matching, can execute much more quickly, meaning designers are able to make decisions based on visual data faster than ever.

As more vision systems that include the latest generations of multicore CPUs and powerful FPGAs reach the market, vision system designers need to understand the benefits and trade-offs of using these processing elements.

Inline vs co-processing

When developing a vision system based on the heterogeneous architecture of a CPU and an FPGA, you need to consider two main use cases: inline and co-processing.

Above: In FPGA co-processing, images are acquired using the CPU and then sent to the FPGA via DMA so the FPGA can perform operations.

With FPGA co-processing, the FPGA and CPU work together to share the processing load. This architecture is most commonly used with GigE Vision and USB3 Vision cameras because their acquisition logic is best implemented using a CPU. You acquire the image using the CPU and then send it to the FPGA via direct memory access (DMA) so the FPGA can perform operations such as filtering or colour plane extraction. Then you can send the image back to the CPU for more advanced operations such as optical character recognition (OCR) or pattern matching. In some cases, you can implement all of the processing steps on the FPGA and send only the processing results back to the CPU.

In an inline FPGA processing architecture, you connect the camera interface directly to the pins of the FPGA so the pixels are passed directly to the FPGA as they are sent from the camera. This architecture is commonly used with Camera Link cameras because their acquisition logic is easily implemented using the digital circuitry on the FPGA. This architecture has two main benefits. First, you can use inline processing to move some of the work from the CPU to the FPGA by performing pre-processing functions on the FPGA. For example, the FPGA can implement logic to capture only the pixels from regions of interest, which reduces the amount of data the CPU must process and increases overall system throughput. The second benefit of this architecture is that it allows high-speed control operations to occur directly within the FPGA without using the CPU.
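To get a feel for how much a region of interest can cut the CPU's workload, a rough Python sketch follows. The frame size, the ROI size and the function names are hypothetical values chosen for illustration; on real hardware the cropping would be done in FPGA logic as the pixels stream in, not in software.

```python
def roi_data_fraction(image_w, image_h, roi_w, roi_h):
    """Fraction of the full frame's pixels that still reach the CPU."""
    if not (0 < roi_w <= image_w and 0 < roi_h <= image_h):
        raise ValueError("ROI must fit inside the image")
    return (roi_w * roi_h) / (image_w * image_h)


def crop_roi(image, x, y, w, h):
    """Software stand-in for the FPGA's ROI logic: keep only the ROI pixels.

    `image` is a list of rows; (x, y) is the top-left corner of the ROI.
    """
    return [row[x:x + w] for row in image[y:y + h]]


# Hypothetical example: a 320 x 256 ROI inside a 1280 x 1024 frame.
fraction = roi_data_fraction(1280, 1024, 320, 256)
print(f"CPU receives {fraction:.2%} of the original pixels")
```

With these assumed sizes the CPU handles one sixteenth of the original pixel count, which is where the throughput gain comes from.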

Vision algorithms

With a basic understanding of the different ways to architect heterogeneous vision systems, you can look at the best algorithms to run on the FPGA. First, you should understand how CPUs and FPGAs operate. Consider a theoretical algorithm that performs four different operations on an image and examine how each of these operations runs when implemented on a CPU and an FPGA.

CPUs perform operations in sequence, so the first operation must run on the entire image before the second one can start. In this example, assume that each step in the algorithm takes 6ms to run on the CPU; the total processing time is 24ms.

Because FPGAs are massively parallel in nature, each of the four operations in the same algorithm can operate on different pixels in the image at the same time. This means the amount of time to receive the first processed pixel is just 2ms and the amount of time to process the entire image is 4ms, so the total processing time is 6ms.

Even if you use an FPGA co-processing architecture and transfer the image to and from the CPU, the overall processing time is much shorter than using the CPU alone.
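The arithmetic behind this comparison can be sketched in a few lines of Python. The 6ms steps, the 24ms CPU total, and the 4ms and 6ms FPGA figures come from the example above. Treating the remaining 2ms of the FPGA total as pipeline latency, and the 1ms round-trip transfer cost for co-processing, are assumptions made purely for illustration.

```python
def cpu_sequential_ms(step_ms, n_steps):
    """Each step must finish on the whole image before the next one starts."""
    return step_ms * n_steps


def fpga_pipelined_ms(pipeline_latency_ms, stream_ms, transfer_ms=0.0):
    """All steps overlap: pay the pipeline latency once, then stream the image.

    transfer_ms models moving the image to and from the CPU when the
    FPGA is used as a co-processor (assumed value, not a measured one).
    """
    return pipeline_latency_ms + stream_ms + transfer_ms


cpu_total = cpu_sequential_ms(step_ms=6, n_steps=4)
fpga_inline = fpga_pipelined_ms(pipeline_latency_ms=2, stream_ms=4)
fpga_coprocessing = fpga_pipelined_ms(2, 4, transfer_ms=1.0)  # assumed 1ms

print(f"CPU only:           {cpu_total} ms")
print(f"FPGA inline:        {fpga_inline} ms")
print(f"FPGA co-processing: {fpga_coprocessing} ms")
```

The key point of the model is that the CPU total grows with the number of steps, whereas the pipelined total grows mainly with the pipeline latency, so adding a transfer overhead still leaves a wide margin.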

Above: In the inline FPGA processing architecture, the camera interface is connected directly to the pins of the FPGA so pixels are passed directly to the FPGA.

Now consider a real-world example. First, you apply a convolution filter to sharpen the image. You then run the image through a threshold to produce a binary image. This not only reduces the amount of data in the image by converting it from 8-bit monochrome to binary, but also prepares the image for binary morphology. The last step is to use morphology to apply the close function, which removes any holes in the binary particles.
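For readers who want to see the three steps concretely, here is a minimal pure-Python sketch. It is not the NI Vision implementation: the sharpening kernel, the threshold level of 128, the 3x3 structuring element and all function names are assumptions chosen to keep the example small.

```python
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]  # assumed kernel; symmetric, so no flip is needed


def convolve3x3(img, kernel):
    """3x3 filter with replicated edges, result clamped to 0..255."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = min(max(acc, 0), 255)
    return out


def threshold(img, level):
    """8-bit monochrome to binary: 1 where the pixel is >= level."""
    return [[1 if p >= level else 0 for p in row] for row in img]


def _morph3x3(binary, op):
    """op=max dilates, op=min erodes; pixels outside the image count as 0."""
    h, w = len(binary), len(binary[0])

    def at(y, x):
        return binary[y][x] if 0 <= y < h and 0 <= x < w else 0

    return [[op(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def close3x3(binary):
    """Binary close = dilate, then erode, on a zero-padded copy."""
    w = len(binary[0])
    padded = [[0] * (w + 2)] + [[0] + row + [0] for row in binary] + [[0] * (w + 2)]
    closed = _morph3x3(_morph3x3(padded, max), min)
    return [row[1:-1] for row in closed[1:-1]]  # drop the padding again


# 7x7 test image: a bright 5x5 particle with a one-pixel hole in the middle.
image = [[200 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)]
         for y in range(7)]
image[3][3] = 0

binary = threshold(convolve3x3(image, SHARPEN), 128)
closed = close3x3(binary)
```

Note that every output pixel of each step depends only on a small neighbourhood of input pixels, which is what allows an FPGA to start the next step before the previous one has finished the whole image.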

If you execute this algorithm only on the CPU, it has to complete the convolution step on the entire image before the threshold step can begin, and so on. This takes 166.7ms when using the NI Vision Development Module for LabVIEW and the cRIO-9068 CompactRIO Controller, which is based on a Xilinx Zynq-7020 All Programmable SoC. If you run the same algorithm on the FPGA, you can execute every step in parallel as each pixel completes the previous step, reducing the time to complete to just 8ms. In some applications, you may need to send the processed image back to the CPU for use in other parts of the application. Factoring in time for that, the entire process takes only 8.5ms.
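The speed-up implied by these figures is easy to check. The short Python calculation below uses only the three timings quoted above; the variable names are illustrative.

```python
cpu_ms = 166.7                 # CPU-only execution
fpga_ms = 8.0                  # FPGA execution, result stays on the FPGA
fpga_with_transfer_ms = 8.5    # FPGA execution plus sending the image to the CPU

transfer_ms = fpga_with_transfer_ms - fpga_ms
speedup_fpga_only = cpu_ms / fpga_ms
speedup_with_transfer = cpu_ms / fpga_with_transfer_ms

print(f"Transfer overhead:     {transfer_ms:.1f} ms")
print(f"Speed-up (FPGA only):  {speedup_fpga_only:.1f}x")      # 20.8x
print(f"Speed-up (with DMA):   {speedup_with_transfer:.1f}x")  # 19.6x
```

Even with the transfer back to the CPU included, the result is roughly 20 times faster than the CPU alone.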

Below: Since FPGAs are massively parallel in nature, they can offer significant performance improvements over CPUs.

Though the FPGA has benefits for vision processing over CPUs, those benefits come with trade-offs. For example, FPGA clock rates are on the order of 100MHz to 200MHz, significantly lower than those of a CPU. Therefore, if an application requires an image processing algorithm that must run iteratively and cannot take advantage of the parallelism of an FPGA, a CPU can process it faster.
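A crude model makes this trade-off concrete. The sketch below assumes, purely for illustration, that run time is the number of operations divided by the clock rate times the number of operations completed per cycle; the 1GHz CPU clock, the 100MHz FPGA clock and the function names are hypothetical, and real processors and FPGA designs are considerably more complicated.

```python
def run_time_ms(n_ops, clock_hz, ops_per_cycle=1):
    """Idealised run time: n_ops spread over ops_per_cycle parallel lanes."""
    return n_ops / (clock_hz * ops_per_cycle) * 1e3


def break_even_parallelism(cpu_hz, fpga_hz, cpu_ops_per_cycle=1):
    """Parallel operations per FPGA cycle needed to match the CPU."""
    return cpu_hz * cpu_ops_per_cycle / fpga_hz


CPU_HZ = 1e9      # assumed CPU clock
FPGA_HZ = 100e6   # lower end of the FPGA range quoted above
N_OPS = 1_000_000

# A strictly serial, iterative algorithm: no parallelism to exploit.
serial_cpu = run_time_ms(N_OPS, CPU_HZ)
serial_fpga = run_time_ms(N_OPS, FPGA_HZ)

# The same work when the FPGA can do 40 operations per cycle (assumed).
parallel_fpga = run_time_ms(N_OPS, FPGA_HZ, ops_per_cycle=40)

print(f"Serial on CPU:    {serial_cpu:.2f} ms")
print(f"Serial on FPGA:   {serial_fpga:.2f} ms")
print(f"Parallel on FPGA: {parallel_fpga:.2f} ms")
print(f"Break-even lanes: {break_even_parallelism(CPU_HZ, FPGA_HZ):.0f}")
```

Under these assumptions the FPGA only wins once the algorithm exposes more parallelism than the ratio of the two clock rates.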

In the previous example, each of the processing steps operates on individual pixels, so the algorithm can take advantage of the massive parallelism of the FPGA to process the images. However, if the algorithm uses processing steps such as pattern matching and OCR, which require the entire image to be analysed at once, the FPGA struggles to outperform a CPU.

Below: Running this vision algorithm using an FPGA co-processing architecture yields 20 times more performance than using just a CPU.

Overcoming complexity

The advantages of an FPGA for image processing depend on each use case, including the specific algorithms applied, latency or jitter requirements, I/O synchronisation, and power utilisation. Often, using an architecture featuring both an FPGA and a CPU presents the best of both worlds.

The problem with an FPGA-based vision system is overcoming the programming complexity of FPGAs. Vision algorithm development is, by its very nature, an iterative process. You need to determine not which approach works but which approach works best, and “best” is different from application to application. For some applications, speed is paramount. In others, it’s accuracy. You therefore need to try a few different approaches to find the best one for any specific application.

To maximise productivity, you need to get immediate feedback and benchmarking information on your algorithms regardless of the processing platform you are using. Seeing algorithm results in real time is a huge time-saver when you are using an iterative, exploratory approach, and the ability to make changes and see the results quickly is key. However, the traditional approach to FPGA development can slow down innovation because of the compilation time required after each design change to the algorithm. One way to overcome this is to use an algorithm development tool that helps you develop for both CPUs and FPGAs from the same environment without getting bogged down in FPGA compilation times.
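On the CPU side, this kind of quick feedback loop can be approximated with a very small benchmarking harness. The Python sketch below is a generic illustration and not a feature of any particular tool: the `benchmark` helper and the two threshold variants are hypothetical, and the timings it prints will vary from machine to machine.

```python
import time


def benchmark(func, data, repeats=5):
    """Run func(data) several times; return (best time in ms, last result)."""
    best_ms = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(data)
        elapsed_ms = (time.perf_counter() - start) * 1e3
        best_ms = min(best_ms, elapsed_ms)
    return best_ms, result


def threshold_loop(pixels, level=128):
    """Candidate 1: compare every pixel against the level."""
    out = []
    for p in pixels:
        out.append(1 if p >= level else 0)
    return out


def threshold_lookup(pixels, level=128):
    """Candidate 2: precompute a 256-entry table, then index into it."""
    table = [1 if v >= level else 0 for v in range(256)]
    return [table[p] for p in pixels]


pixels = [(x * 37) % 256 for x in range(100_000)]  # synthetic 8-bit data
candidates = {"loop": threshold_loop, "lookup table": threshold_lookup}

report = {name: benchmark(fn, pixels) for name, fn in candidates.items()}
for name, (best_ms, _) in report.items():
    print(f"{name}: {best_ms:.2f} ms")
```

Checking that every candidate produces the same output before comparing the timings keeps the comparison honest: a faster variant is only useful if it is also correct.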

The Vision Assistant from NI is an algorithm engineering tool that simplifies vision system design by helping you develop algorithms for deployment on either the CPU or FPGA. You also can use it to test the algorithm before compiling and running it on the target hardware while easily accessing throughput and resource utilisation information.

Above: Developing an algorithm in a configuration-based tool for FPGA targets with integrated benchmarking cuts down on the time spent waiting for code to compile and accelerates development.

When considering whether a CPU or an FPGA is best for image processing, the answer is, “It depends.” You need to understand the goals of your application and use the processing element that is best suited to that design.

However, regardless of your application, CPU- and FPGA-based architectures and their many inherent benefits are poised to take machine vision applications to the next level.

Author details: Brandon Treece is a product marketing manager at National Instruments.

This material is protected by MA Business copyright See Terms and Conditions. One-off usage is permitted but bulk copying is not. For multiple copies contact the sales team.
