Semi-Conscious: GPU – Now you’re just mashing it.

by Craig Sullender on May 5, 2011

CPU/GPU/FPGA/ASIC.

General-purpose processors that do everything slowly vs. specialized functions that do some things quickly.

If a special function has broad application, why not use each processor for what it does best?

What’s a CPU to do?

By using an efficient combination of PLDs and datapaths, you can create smart, flexible, low-cost peripherals that take the load off the CPU. However, if so much functionality can be offloaded to peripherals, what’s left for the CPU to do? In many cases, not much—in some cases after system initialization, the CPU can be turned off! However, a more realistic solution is to use the CPU to do what CPUs do best, such as:

  • Complex calculations.
  • String and text processing.
  • Database management.
  • Communications management.
  • System management.

Why your embedded controller may not need a CPU

Exactly!

Why can’t general-purpose processors do better? The cost lies in the complexity of making arbitrary, random-access-order operations faster and more efficient.

Pipelining & Instruction-Level Parallelism

Instructions are executed one after the other inside the processor, right? Well, that makes it easy to understand, but that’s not really what happens. In fact, that hasn’t happened since the middle of the 1980s. Instead, several instructions are all partially executing at the same time.

From the hardware point of view, each pipeline stage consists of some combinatorial logic and possibly access to a register set and/or some form of high speed cache memory. The pipeline stages are separated by latches. A common clock signal synchronizes the latches between each stage, so that all the latches capture the results produced by the pipeline stages at the same time. In effect, the clock “pumps” instructions down the pipeline.

Memory Bandwidth vs Latency

Since memory is transferred in blocks, and since cache misses are an urgent “show stopper” type of event with the potential to halt the processor in its tracks (or at least severely hamper its progress), the speed of those block transfers from memory is critical. The transfer rate of a memory system is called its bandwidth. But how is that different from latency?

A good analogy is a highway… Suppose you want to drive in to the city from 100 miles away. By doubling the number of lanes, the total number of cars that can travel per hour (the bandwidth) is doubled, but your own travel time (the latency) is not reduced. If all you want to do is increase cars-per-second, then adding more lanes (wider bus) is the answer, but if you want to reduce the time for a specific car to get from A to B then you need to do something else – usually either raise the speed limit (bus & DRAM speed), or reduce the distance, or perhaps build a regional mall so that people don’t need to go to the city as often (a cache).

Modern Microprocessors – A 90 Minute Guide!

Now you’re just mashing it.
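The pipelining excerpt describes latches that all capture on the same clock edge, "pumping" instructions down the pipeline. The sketch below clocks an ideal pipeline like that. It assumes no stalls, hazards, or cache misses, and its five stage names are a hypothetical classic layout rather than anything taken from the quoted article. On every clock it shifts each instruction one stage down and counts the cycles needed to finish them all.

```python
from typing import List, Optional, Tuple

# Hypothetical classic five-stage pipeline; the stage names are an
# illustrative assumption, not taken from the quoted article.
STAGES = ["fetch", "decode", "execute", "memory", "writeback"]


def simulate_pipeline(n_instructions: int, n_stages: int) -> Tuple[int, List[List[Optional[int]]]]:
    """Clock an ideal pipeline (no stalls, no hazards, no cache misses).

    Returns the number of clock cycles needed to finish every instruction
    and a per-cycle snapshot of which instruction occupies each stage.
    """
    latches: List[Optional[int]] = [None] * n_stages  # latches[i] = instruction in stage i
    next_instr = 0
    finished = 0
    cycle = 0
    timeline: List[List[Optional[int]]] = []
    while finished < n_instructions:
        cycle += 1
        # Clock edge: every latch captures the previous stage's result at the
        # same time, so every instruction in flight moves down one stage.
        incoming = next_instr if next_instr < n_instructions else None
        if incoming is not None:
            next_instr += 1
        latches = [incoming] + latches[:-1]
        timeline.append(list(latches))
        # The instruction sitting in the last stage completes this cycle.
        if latches[-1] is not None:
            finished += 1
    return cycle, timeline


if __name__ == "__main__":
    for n in (1, 10, 1000):
        cycles, _ = simulate_pipeline(n, len(STAGES))
        print(f"N={n}: {cycles} cycles pipelined vs {n * len(STAGES)} cycles one at a time")
```

Once the pipeline is full, one instruction completes on every clock. N instructions through S stages therefore take S + N - 1 cycles instead of S x N: 14 cycles instead of 50 for ten instructions. Real processors spend much of their extra hardware keeping this ideal from breaking down.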
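The highway analogy in the bandwidth excerpt can be put into numbers. The sketch below models one block transfer as a fixed access latency (the drive into the city) plus one bus beat per bus-width chunk (the cars going through the lanes). The 60 ns latency, 5 ns beat, and bus widths are illustrative assumptions, not measurements of any real DRAM or bus.

```python
# All figures below are illustrative assumptions, not measurements of any
# real DRAM or bus.
LATENCY_NS = 60  # hypothetical fixed access latency ("distance to the city")
BEAT_NS = 5      # hypothetical time per bus transfer (a 200 MHz bus clock)


def transfer_time_ns(block_bytes: int, bus_bytes: int,
                     beat_ns: int = BEAT_NS, latency_ns: int = LATENCY_NS) -> int:
    """Latency to the first byte plus one bus beat per bus-width chunk."""
    beats = (block_bytes + bus_bytes - 1) // bus_bytes  # integer ceiling division
    return latency_ns + beats * beat_ns


if __name__ == "__main__":
    for label, size in (("64-byte cache line", 64), ("1 MiB stream", 1 << 20)):
        narrow = transfer_time_ns(size, bus_bytes=8)   # 8 "lanes"
        wide = transfer_time_ns(size, bus_bytes=16)    # twice as many lanes
        print(f"{label}: {narrow} ns on an 8-byte bus, {wide} ns on a 16-byte bus")
```

Doubling the bus width cuts the 1 MiB stream almost in half (655420 ns to 327740 ns). The single cache line only drops from 100 ns to 80 ns, because the 60 ns latency is untouched. More lanes raise throughput, but one car's trip still costs the drive.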

What about GPUs?

The GPU lands in the middle, between dedicated pixel processing and general-purpose computing. So you can argue that a vision system is better built around a GPU than around an FPGA or a general-purpose processor. But all you have done is make the choice between FPGA/ASIC and a software processor more complex. The main problem goes unaddressed: computer vision system architecture is the controlling factor in vision product progress. In other words, the common vision system architecture is holding us back from creating a platform for applications. To vision innovators fighting the weight of overblown hardware, is the GPU tossed at them a float or an anchor?

For clarity, start at the other end: what application, what cost, what camera? For an embedded vision system in a consumer product, begin by separating pixel processing from the application.

If the volume is high and the cost needs to be low, the GPU is too much silicon: too expensive, too complex, too power-hungry, and it needs frame buffers.
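Here is a rough number for "needs frame buffers." The back-of-envelope sketch below compares whole-frame storage with the line buffers a streaming pixel pipeline typically keeps for a neighborhood operation. It assumes double-buffered frames on the GPU side and a 3x3 filter on the streaming side, and it picks two example image formats. All of these are illustrative assumptions, and the sketch ignores everything else either design needs.

```python
# Illustrative image formats; these resolutions and pixel sizes are
# assumptions chosen for the example, not requirements of any product.
FORMATS = {
    "VGA, 8-bit gray": (640, 480, 1),
    "1080p, 16-bit YUV 4:2:2": (1920, 1080, 2),
}


def frame_buffer_bytes(width: int, height: int, bytes_per_pixel: int, frames: int = 2) -> int:
    """Memory to hold whole frames; frames=2 assumes double buffering."""
    return width * height * bytes_per_pixel * frames


def line_buffer_bytes(width: int, bytes_per_pixel: int, kernel_rows: int = 3) -> int:
    """Memory for a streaming k x k filter: the k-1 previous rows.

    The few window registers for the current row are ignored here.
    """
    return width * bytes_per_pixel * (kernel_rows - 1)


if __name__ == "__main__":
    for name, (w, h, bpp) in FORMATS.items():
        print(f"{name}: {frame_buffer_bytes(w, h, bpp):,} bytes of frame buffers "
              f"vs {line_buffer_bytes(w, bpp):,} bytes of line buffers for a 3x3 filter")
```

Under these assumptions, VGA needs 614,400 bytes of frame buffers against 1,280 bytes of line buffers. For 1080p the figures are 8,294,400 and 7,680 bytes. That gap is one reason high-volume, low-cost designs favor processing pixels as they stream in from the camera.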

For a PC-hosted machine vision project, where volume is low and cost is secondary, use whatever has the best tools (parallel processors, FPGAs, GPUs)!


3 comments

Nathan Lively May 5, 2011 at 10:08 pm

Ha! Love the Always Sunny reference.
Gail the Snail: “I’m giving uncle Frank a handy under the table”
Frank: “Now you’re just mashing it in, you gotta stop”

Craig Sullender May 6, 2011 at 2:24 am

I don’t think anyone else will catch it.

David Teitelbaum January 10, 2012 at 8:21 pm

First time reader, very interesting article. Sunny reference was hilarious!
