Color Space for Embedded Vision

by Craig Sullender on June 13, 2011

Color Space

A color space is a model for describing scene appearance. The components of a color space let us specify the reproduction of an image pixel by pixel. The quality of appearance each component describes, and how well it describes that quality, can determine the complexity, cost, and performance of our embedded vision system.

Why change from the RGB color space?

The main reason to convert RGB to another color space is that RGB doesn’t represent natural scenes efficiently. For example, luminance (brightness) is spread across all three RGB values, yet luminance and color (chrominance) behave differently in natural scenes. Surfaces (low spatial frequency) tend to have consistent color, while edges and details such as texture (high spatial frequency) carry more of the luminance variation. To compress or transmit the appearance of a scene efficiently, luminance and color components are therefore described separately.
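As a minimal sketch of describing luminance and color separately, the Python below keeps the luma plane at full resolution and reduces a chroma plane by averaging each 2×2 block, the 4:2:0 pattern used by many JPEG and video pipelines. The function name, the list-of-rows plane layout, and the even plane dimensions are assumptions for illustration.

```python
def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (a list of rows of ints).

    Assumes even width and height. Adding 2 before the >> 2 rounds the
    divide-by-4 to the nearest integer.
    """
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            s = plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]
            row.append((s + 2) >> 2)
        out.append(row)
    return out


cb = [[10, 20, 30, 40],
      [30, 40, 50, 60]]
print(subsample_420(cb))  # [[25, 45]]
```

With Y kept at full resolution and both chroma planes subsampled this way, the image carries 1.5 samples per pixel instead of 3.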

The second reason to transform away from RGB is to make optimal use of the color space data channels. Because RGB carries luminance information in all three channels, a portion of the data is redundant, and some of the dynamic range, or bit width, of the data channels is wasted. The goal is a color space whose components do not overlap.

Some color spaces were created to better match human visual response, to better match image capture to viewing or printing, or to improve color precision or accuracy. Color is endless, and an essential ingredient in both beauty and our daily lives, so we probably haven’t heard the last of it.

Which color space is best for embedded vision?

In computer vision, a further reason to convert RGB to another color space is to better follow the change in an object’s appearance under changing conditions. If an object passes from light into shadow, its measured luminance changes but its color does not. If you are tracking an object, you want a way of specifying the qualities of its appearance separately. So again you want the components of appearance to be decorrelated, so that they act independently.

For example, in the HSV color space, a green object with hue = 120 (taken from the hue scale above) will still show hue = 120 when the object passes into a shadow or changes its orientation to illumination.
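The hue behavior described above can be checked with Python’s standard `colorsys` module, which returns HSV hue as a fraction of a full turn. This sketch assumes an idealized shadow that dims R, G, and B by the same factor; real shadows can also shift color, so treat this as the best case rather than a guarantee.

```python
import colorsys


def hue_degrees(r, g, b):
    """HSV hue in degrees for RGB components in [0, 1].

    colorsys returns hue in [0, 1); grays (r == g == b) have no defined
    hue and come back as 0.0.
    """
    h, _s, _v = colorsys.rgb_to_hsv(r, g, b)
    return h * 360.0


green = (0.0, 0.8, 0.0)
lit = (0.2, 0.6, 0.4)
# Idealized shadow model (an assumption): every channel dimmed by the same factor.
shadow = tuple(0.5 * c for c in lit)

print(round(hue_degrees(*green)))                             # 120
print(round(hue_degrees(*lit)), round(hue_degrees(*shadow)))  # 150 150
```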

Actual lighting can be more complex, with multiple sources of different dominant wavelengths illuminating the same scene.

Do the goals for color transforms used to correct, enhance, compress, and transmit images work in our favor for embedded vision systems? We typically need to recognize objects, frequently with limited processing resources.

Let’s survey the common transforms for vision and then look at what works for embedded systems. Follow the links for each transform’s details and equations. For fewer details, but a good summary with nice images, see this page. Here are the equations to go with it.

____________

Common spaces

RGB
“An RGB color space can be easily understood by thinking of it as “all possible colors” that can be made from three colourants for red, green and blue. RGB is a convenient color model for computer graphics because the human visual system works in a way that is similar — though not quite identical — to an RGB color space.”

Normalized RGB
Normalizing RGB removes intensity variations so that the R’G’B’ components specify color only, with no luminance. It is used for tracking colored objects because it is invariant to (unaltered by) changes in surface orientation relative to the light source.
R’ = R/(R+G+B)
G’ = G/(R+G+B)
B’ = B/(R+G+B)

The luminance component, if needed, is
Y = (R + G + B) / 3
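A minimal Python sketch of the normalized RGB equations above. The equations leave a black pixel (R + G + B = 0) undefined, so returning equal thirds there is an assumption of this sketch; an embedded implementation might instead skip pixels below a brightness threshold.

```python
def normalized_rgb(r, g, b):
    """Return ((R', G', B'), Y) with R' = R/(R+G+B), etc., and Y = (R+G+B)/3."""
    total = r + g + b
    if total == 0:
        # Black pixel: chromaticity is undefined; equal thirds is an arbitrary choice.
        return (1 / 3, 1 / 3, 1 / 3), 0.0
    return (r / total, g / total, b / total), total / 3


# The same surface at full and half illumination (uniform-scaling assumption).
bright = normalized_rgb(120, 60, 30)
dim = normalized_rgb(60, 30, 15)
print(bright[0] == dim[0])  # True: chromaticity unchanged
print(bright[1], dim[1])    # 70.0 35.0: only Y changes
```

Because R’ + G’ + B’ = 1, only two of the three normalized values need to be stored or compared.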

Normalized RGB is one of the better color spaces for skin detection.
A tutorial on skin detection, with example images, uses normalized RGB.

Hue, from HSV (below), is another color invariant, because it specifies color independently of the other components, which are expected to vary. The reverse is also true: in the following image, hue changes while saturation and brightness stay constant.

Image source: Wikipedia

HSL – HSV – HSI
“HSL and HSV are the two most common cylindrical-coordinate representations of points in an RGB color model, which rearrange the geometry of RGB in an attempt to be more perceptually relevant … HSL stands for hue, saturation, and lightness, and is often also called HLS. HSV stands for hue, saturation, and value, and is also often called HSB (B for brightness). A third model, common in computer vision applications, is HSI, for hue, saturation, and intensity.”
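The three models differ mainly in how they define the brightness-like third component. As a sketch using the usual textbook definitions (HSV value as the largest component, HSL lightness as the midpoint of the largest and smallest, HSI intensity as the mean), for RGB in [0, 1]:

```python
import colorsys


def value_lightness_intensity(r, g, b):
    """Brightness-like component of HSV, HSL, and HSI for RGB in [0, 1]."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx               # HSV value: largest component
    l = (mx + mn) / 2    # HSL lightness: midpoint of largest and smallest
    i = (r + g + b) / 3  # HSI intensity: mean of the three
    return v, l, i


print(value_lightness_intensity(1.0, 0.0, 0.0))  # (1.0, 0.5, 0.3333333333333333)
# Cross-check lightness against the standard library's HLS conversion.
print(colorsys.rgb_to_hls(1.0, 0.0, 0.0)[1])     # 0.5
```

A fully saturated red therefore has maximum value, mid lightness, and one-third intensity, so the same pixel can look "bright" in one model and "dim" in another.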

YCbCr
“Y′CbCr is used to separate out a luma signal (Y′) that can be stored with high resolution or transmitted at high bandwidth, and two chroma components (CB and CR) that can be bandwidth-reduced, subsampled, compressed, or otherwise treated separately for improved system efficiency.”
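For an embedded pipeline, the Y′CbCr conversion is often done in integer arithmetic. The sketch below approximates the full-range BT.601 coefficients used by JPEG/JFIF (Y′ = 0.299R + 0.587G + 0.114B, and so on) with weights scaled by 256 and rounded. The 8-bit weight precision and the truncating shifts are simplifying assumptions, so results can differ from a floating-point reference by about one code value.

```python
def rgb_to_ycbcr_fixed(r, g, b):
    """8-bit R'G'B' -> 8-bit Y'CbCr (full-range BT.601 approximation).

    Weights are the floating-point coefficients times 256, rounded
    (e.g. 0.299 * 256 ~ 77). Luma weights sum to 256 and each chroma row
    sums to 0, so any gray maps to Cb = Cr = 128. Adding 128 << 8 before
    the shift keeps the numerator non-negative, so every result is 0..255.
    """
    y = (77 * r + 150 * g + 29 * b) >> 8
    cb = (-43 * r - 85 * g + 128 * b + (128 << 8)) >> 8
    cr = (128 * r - 107 * g - 21 * b + (128 << 8)) >> 8
    return y, cb, cr


print(rgb_to_ycbcr_fixed(255, 255, 255))  # (255, 128, 128)
print(rgb_to_ycbcr_fixed(255, 0, 0))      # (76, 85, 255)
```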

YUV
“YUV is a color space typically used as part of a color image pipeline. It encodes a color image or video taking human perception into account, allowing reduced bandwidth for chrominance components, thereby typically enabling transmission errors or compression artifacts to be more efficiently masked by the human perception than using a “direct” RGB-representation. Other color spaces have similar properties, and the main reason to implement or investigate properties of Y’UV would be for interfacing with analog or digital television or photographic equipment that conforms to certain Y’UV standards.”

YIQ
“The Y component represents the luma information, and is the only component used by black-and-white television receivers. I and Q represent the chrominance information. In YUV, the U and V components can be thought of as X and Y coordinates within the color space. I and Q can be thought of as a second pair of axes on the same graph, rotated 33°; therefore IQ and UV represent different coordinate systems on the same plane. The YIQ system is intended to take advantage of human color-response characteristics. The eye is more sensitive to changes in the orange-blue (I) range than in the purple-green range (Q) — therefore less bandwidth is required for Q than for I.”
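The 33° relationship between the UV and IQ axes can be checked numerically. This sketch uses the commonly published analog Y′UV scale factors, U = 0.492(B′ − Y′) and V = 0.877(R′ − Y′), then rotates the chroma axes; for pure red it lands close to the usual NTSC YIQ coefficients (I ≈ 0.596, Q ≈ 0.211). Inputs are assumed to be gamma-corrected R′G′B′ in [0, 1].

```python
import math


def rgb_to_yuv(r, g, b):
    """Analog-style Y'UV from R'G'B' in [0, 1] (BT.601 luma weights)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def yuv_to_yiq(y, u, v):
    """Express the same chroma point on I/Q axes rotated 33 degrees from U/V."""
    a = math.radians(33.0)
    i = -u * math.sin(a) + v * math.cos(a)
    q = u * math.cos(a) + v * math.sin(a)
    return y, i, q


y, i, q = yuv_to_yiq(*rgb_to_yuv(1.0, 0.0, 0.0))
print(round(i, 3), round(q, 3))  # 0.596 0.211
```

Because the step from UV to IQ is a pure rotation, the chroma magnitude is unchanged; only the axes along which bandwidth is allocated differ.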

Luminance calculation for embedded vision
Note that luminance is calculated in various ways, derived mathematically for each transform discussed above. For computer vision applications the definition of Y can be flexible:
Y = (R + Gr + Gb + B)/4 (Bayer pattern raw pixel values)
Y = (Gr + Gb)/2 (Bayer pattern raw pixel values)
etc., depending on the capabilities of your embedded system and the needs of your vision application.
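As a sketch of the first two definitions, assuming an RGGB Bayer layout (red at even row, even column) stored as a list of rows with even dimensions, each 2×2 cell produces one half-resolution luminance sample. The shifts stand in for division by 4 and 2, which is cheap in hardware. The layout and the function name are assumptions for illustration.

```python
def bayer_luma(raw, greens_only=False):
    """Half-resolution luminance from an RGGB Bayer mosaic.

    Each 2x2 cell is   R  Gr
                       Gb B
    greens_only=False: Y = (R + Gr + Gb + B) / 4
    greens_only=True:  Y = (Gr + Gb) / 2
    The shifts round down for these non-negative values.
    """
    out = []
    for y in range(0, len(raw), 2):
        row = []
        for x in range(0, len(raw[0]), 2):
            r, gr = raw[y][x], raw[y][x + 1]
            gb, b = raw[y + 1][x], raw[y + 1][x + 1]
            if greens_only:
                row.append((gr + gb) >> 1)
            else:
                row.append((r + gr + gb + b) >> 2)
        out.append(row)
    return out


raw = [[100, 60, 100, 60],
       [64, 20, 64, 20]]
print(bayer_luma(raw))                    # [[61, 61]]
print(bayer_luma(raw, greens_only=True))  # [[62, 62]]
```

The four-sample average is the same for any of the four Bayer orderings, since it sums the whole cell; the green-only version depends on knowing where the greens sit.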

Lab
“Unlike the RGB and CMYK color models, Lab color is designed to approximate human vision. It aspires to perceptual uniformity, and its L component closely matches human perception of lightness. It can thus be used to make accurate color balance corrections by modifying output curves in the a and b components, or to adjust the lightness contrast using the L component.”

XYZ
“The human eye has photoreceptors (called cone cells) for medium- and high-brightness color vision, with sensitivity peaks in short (S, 420–440 nm), middle (M, 530–540 nm), and long (L, 560–580 nm) wavelengths (there is also the low-brightness monochromatic “night-vision” receptor, called rod cell, with peak sensitivity at 490-495 nm). Thus, in principle, three parameters describe a color sensation. The tristimulus values of a color are the amounts of three primary colors in a three-component additive color model needed to match that test color. The tristimulus values are most often given in the CIE 1931 color space, in which they are denoted X, Y, and Z. In the CIE XYZ color space, the tristimulus values are not the S, M, and L responses of the human eye, but rather a set of tristimulus values called X, Y, and Z, which are roughly red, green and blue, respectively (note that the X,Y,Z values are not physically observed red, green, blue colors. Rather, they may be thought of as ‘derived’ parameters from the red, green, blue colors).”
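Lab is defined relative to XYZ and a reference white, so a conversion from camera RGB typically passes through XYZ. The sketch below assumes the input is sRGB in [0, 1] with a D65 white point and uses the commonly published 4-decimal sRGB matrix; a real camera would need its own characterization matrix in place of the sRGB one.

```python
def srgb_to_linear(c):
    """Undo the sRGB transfer curve for one component in [0, 1]."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(r, g, b):
    """sRGB (D65) to CIE XYZ, with Y = 1.0 for white."""
    rl, gl, bl = srgb_to_linear(r), srgb_to_linear(g), srgb_to_linear(b)
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    return x, y, z


D65_WHITE = (0.95047, 1.0, 1.08883)  # reference white XYZ, Y normalized to 1


def _lab_f(t):
    """CIE Lab companding: cube root above a small threshold, linear below."""
    delta = 6 / 29
    if t > delta ** 3:
        return t ** (1 / 3)
    return t / (3 * delta ** 2) + 4 / 29


def xyz_to_lab(x, y, z, white=D65_WHITE):
    """CIE XYZ to L*a*b* relative to the given reference white."""
    fx = _lab_f(x / white[0])
    fy = _lab_f(y / white[1])
    fz = _lab_f(z / white[2])
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


L, a, b = xyz_to_lab(*srgb_to_xyz(1.0, 1.0, 1.0))
print(round(L, 1))                 # 100.0
print(abs(a) < 0.1, abs(b) < 0.1)  # True True
```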

____________

Uncommon spaces

HWB
From Alvy Ray Smith, the creator of the HSV color space and a founder of Pixar:

HWB – A More Intuitive Hue-Based Color Model

Alvy Ray Smith and Eric Ray Lyons, Journal of Graphics Tools, Vol. 1, No. 1, pp. 3–17, 1996

The two most common color selector models, other than RGB (Red-Green-Blue), are the hue-based HSV (Hue-Saturation-Value) and HSL (Hue-Saturation-Lightness) color models. It is shown that both of these models are flawed. A closely related model, HWB (Hue-Whiteness-Blackness), is introduced that avoids the flaws, is slightly faster to compute, and is very easy to teach to new users: Choose a hue. Lighten it with white. Darken it with black. We explain that lightening is not the opposite of darkening.

C code for the HWB transform.
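This is not the linked C code. As a minimal Python sketch, HWB can be derived from the standard `colorsys` HSV conversion using the relations W = (1 − S)·V and B = 1 − V, which reduce to whiteness = min(R, G, B) and blackness = 1 − max(R, G, B), with the hue shared with HSV.

```python
import colorsys


def rgb_to_hwb(r, g, b):
    """Hue (degrees), whiteness, blackness for RGB in [0, 1].

    Built on colorsys HSV: W = (1 - S) * V equals min(r, g, b),
    and B = 1 - V equals 1 - max(r, g, b).
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    whiteness = (1.0 - s) * v
    blackness = 1.0 - v
    return h * 360.0, whiteness, blackness


print(rgb_to_hwb(1.0, 0.0, 0.0))  # (0.0, 0.0, 0.0): pure red, no white, no black
print(rgb_to_hwb(0.5, 0.5, 0.5))  # (0.0, 0.5, 0.5): gray, hue undefined (reported as 0)
```

This matches the recipe in the abstract: choose a hue, lighten it with white, darken it with black. W + B never exceeds 1, and W + B = 1 exactly when the color is a gray.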

____________

Next week in Part 2 of Color Space for Embedded Vision, we’ll look at a more recent color space that is perfectly suited to embedded vision systems.



David Banas June 14, 2011 at 12:45 pm

Great primer! Looking forward to more.


Craig Sullender June 14, 2011 at 2:12 pm

Thanks David!

Vision people sometimes have strong opinions in this area.

I’m getting good suggestions from the LinkedIn groups, approaches I hadn’t thought to try.

Follow ChipSight on Twitter to get updates–I’ll have hardware (4Cam) before too long.


Martin Thompson June 14, 2011 at 8:29 pm

Good stuff Craig – this is an interesting paper about how the colour *does* change a bit in shadows (and how to then remove them from images):

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.15.4223&rep=rep1&type=pdf

