Color Space for Embedded Vision, Part 2

by Craig Sullender on June 23, 2011

Reader’s Input

I received good suggestions in response to Color Space for Embedded Vision, Part 1.

Max van Rooij: “I can’t help but notice that “embedded” appears to mean “simplistic” as I read your options for computing the gray value. This may give people the wrong impression, e.g. embedded = simplistic approximation, not suited for “real” work. If embedded vision is to succeed, these kind of false impressions should be carefully avoided.”

I agree! I really did rush into the luminance options in Part 1. I’ll take this opportunity to try to gain amnesty for when I make the same mistake again.

But as for embedded vision, it will succeed when it delivers useful vision functions at a cost appropriate to the application. Embedded vision needs to perform well enough to get its assigned job done using the processing resources available.

So when I say “simplify,” I mean “remove complexity that doesn’t help your application.” Many forms of complexity have a cost that affects us directly. On the other hand, simplifying an algorithm can have drawbacks. Altering an algorithm can reduce accuracy and robustness. As a basic test, why not run every algorithm down to its most reduced state? Somewhere in the middle you will find the best fit to your hardware and application.

From the feedback I received, it seems color spaces can be personal preferences. This is a very important point–your comfort with a certain method may be what determines how well you use it to solve different problems.

Luckily, when it comes to color spaces, research is sometimes on the side of reduced calculations. Moises Alencastre-Miranda recommended Ohta’s color space, derived for best segmentation results, “that is better for natural color scenes than RGB, HSI, …”:

In color image processing various kinds of color features can be calculated from the tristimuli R, G, and B. We attempt to derive a set of effective color features by systematic experiments of region segmentation. An Ohlander-type segmentation algorithm by recursive thresholding is employed as a tool for the experiment. At each step of segmenting a region, new color features are calculated for the pixels in that region by the Karhunen Loeve transformation of R, G, and B data. By analyzing more than 100 color features which are thus obtained during segmenting eight kinds of color pictures, we have found that a set of color features, (R + G + B)/3, R − B, and (2G − R − B)/2, are effective. These three features are significant in this order and in many cases a good segmentation can be achieved by using only the first two. The effectiveness of our color feature set is discussed by a comparative study with various other sets of color features which are commonly used in image analysis. The comparison is performed in terms of both the quality of segmentation results and the calculation involved in transforming data of R, G, and B to other forms.

Y. Ohta, Takeo Kanade, and T. Sakai, “Color Information for Region Segmentation,” Computer Graphics and Image Processing, Vol. 13, No. 3, July, 1980, pp. 222 – 241.

So “a good segmentation can be achieved” with just two components: 1) (R + G + B)/3 and 2) R − B.
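As a rough illustration, the three features from the abstract can be computed per pixel as below. This is only a sketch: the function name `ohta_features` is made up for this example, and the third feature is read as (2G − R − B)/2.

```python
def ohta_features(r, g, b):
    """Return the three color features from Ohta's abstract for one RGB pixel."""
    i1 = (r + g + b) / 3          # intensity
    i2 = r - b                    # red versus blue
    i3 = (2 * g - r - b) / 2      # green versus the sum of red and blue
    return i1, i2, i3


# Per the abstract, the first two features are often enough for segmentation.
i1, i2, _ = ohta_features(200, 100, 50)
print(i1, i2)
```

On hardware without a divider, the divisions can be dropped or replaced by shifts when only relative comparisons against a threshold are needed.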

The work on YCoCg below concentrates on the implementation of a similar (the same?) color space.

Mathieu Bouchard also caught my rough treatment of luminance: “I wouldn’t be surprised if even the quite accurate approximation (2R+5G+B)/8 is faster than (R+G+B)/3 just because /8 is very fast (it’s a triple right-shift).”
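The luminance options mentioned by readers can be put side by side in a few lines. This is a sketch with hypothetical function names. Whether the shift version is actually faster than the divide depends on the target processor and compiler.

```python
def luma_average(r, g, b):
    # Plain average: needs a divide by 3.
    return (r + g + b) // 3


def luma_shift(r, g, b):
    # (2R + 5G + B) / 8: the weights sum to 8, so the divide is a right shift by 3.
    return (2 * r + 5 * g + b) >> 3


def luma_green(r, g, b):
    # Cheapest option: use the green channel as a stand-in for gray.
    return g


pixel = (200, 100, 50)
print(luma_average(*pixel), luma_shift(*pixel), luma_green(*pixel))
```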

Kenny Lindberg takes it case by case, sometimes using a luminance approximation: “… the answer is ‘it depends’ on what you are doing. Most vision is performed in greyscale unless the information you are recognizing/measuring/etc… unless color provides a specific advantage. Greyscale can be approximated by the green channel of color cameras if that’s what you are using.”

Martin Thompson suggested this research on color constancy and shadows:

Illumination conditions cause problems for many computer vision algorithms. In particular, shadows in an image can cause segmentation, tracking, or recognition algorithms to fail. In this paper we propose a method to process a 3-band colour image to locate, and subsequently remove shadows. The result is a 3-band colour image which contains all the original salient information in the image, except that the shadows are gone.

Graham D. Finlayson, Steven D. Hordley, and Mark S. Drew, “Removing Shadows from Images,” ECCV 2002: European Conference on Computer Vision, 2002, pp. 823 – 836.

Here is a slide show on the topic using log-opponent chromaticities.

David Ing’s comments offer a clue to the similarities between color spaces: “If you collect a sample of images (with your camera) from a variety of environments, you will notice that (R,G,B) values are highly correlated with each other, driven primarily by the principal component of radiance (~ grayscale luminance). The principal components of your sample will yield a latent color-opponent space (tailored specifically for your camera). The latent space will have a (1) luminance component, (2) yellow-blue component, and (3) red-green component. Such color-spaces are often called “color-opponent spaces” because yellow opposes blue, red opposes green.” Ruderman DL, Cronin TW, Chiao C (1998) Statistics of cone responses to natural images: implications for visual coding. Journal of the Optical Society of America A 15:2036-2045.

Alexander Nedzved summarized: “HSV-HLS-HSI – for color understanding; Lab-LCH-O1O2O3 – for human perception; RGB-YUV-r*g*b*-XYZ-……. – for computer or other electronic devices (processing, archivation …..).” He also started a discussion on linearity which we will get to in a later installment of Color Space for Embedded Vision.

Ok back to work…

Simplify

Here is a transform calculated directly with adds and shifts. The YCoCg color space was derived by finding the transform of RGB that gives maximum decorrelation.

YCoCg

Lifting-based reversible color transformations for image compression
Henrique S. Malvar, Gary J. Sullivan, and Sridhar Srinivasan

This paper reviews a set of color spaces that allow reversible mapping between red-green-blue and luma-chroma representations in integer arithmetic. The YCoCg transform and its reversible form YCoCg-R can improve coding gain by over 0.5 dB with respect to the popular YCrCb transform, while achieving much lower computational complexity.

From a compression standpoint, the best color space is the one in which the components are uncorrelated. For a rich enough data set, we can compute the 3×3 inter-color correlation matrix, and from it compute the Karhunen-Loève transform (KLT, which is closely related to PCA), which provides maximum decorrelation.

…. From it we can make a couple of observations: first, the KLT preserves the luma-chroma separation, but the luma Y is computed as simply the average of the R, G, and B values; second, one of the color channels is proportional to R – B (which is proportional to Cr – Cb), and the other proportional to G – (R + B) / 2, that is, the difference between G and the average of R and B. The form of the inverse color mapping in (3) inspires the question: can the inverse transform be made computationally even simpler? One easy positive answer to that can be obtained by simply rounding all entries of the original inverse KLT transform to the nearest integer. That leads to the YCoCg color space definition.

YCoCg-R: A Color Space with RGB Reversibility and Low Dynamic Range
Henrique Malvar and Gary Sullivan

In the same way that the use of transforms improves PSNR in macroblock coding, a transformation applied to RGB can also improve overall PSNR. If we map the original RGB channels into one luma and two chroma channels, the chroma components will require fewer bits, so the overall compression will be better (more than 3dB improvement in PSNR for simple non-predicted intra coding) than encoding directly in the RGB space.

Color spaces such as YCrCb provide good decorrelation, but even better results can be obtained with the YCoCg (luminance + offset orange + offset green).

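The direct YCoCg color space transform is defined by:

Y = R/4 + G/2 + B/4
Co = R/2 – B/2
Cg = –R/4 + G/2 – B/4

And the inverse YCoCg color space transform is:

R = Y + Co – Cg
G = Y + Cg
B = Y – Co – Cg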

Thus, the encoder needs only additions and shifts to convert to YCoCg, and the decoder needs just 4 additions per pixel to convert back to RGB, using:

G = Y + Cg;
tmp = Y – Cg;
R = tmp + Co;
B = tmp – Co.
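The four-addition inverse can be checked against a forward transform in a few lines. This sketch assumes the forward weights Y = R/4 + G/2 + B/4, Co = R/2 – B/2, Cg = –R/4 + G/2 – B/4, which are consistent with the inverse quoted above. The function names are illustrative only. Floats are used so the quarter and half weights stay exact for 8-bit inputs.

```python
def rgb_to_ycocg(r, g, b):
    # Assumed forward weights; every term is a multiple of 1/4,
    # so 8-bit inputs are represented exactly in floating point.
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg


def ycocg_to_rgb(y, co, cg):
    # Inverse with four additions/subtractions, as in the quoted steps.
    g = y + cg
    tmp = y - cg
    r = tmp + co
    b = tmp - co
    return r, g, b


print(ycocg_to_rgb(*rgb_to_ycocg(200, 100, 50)))
```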

____________

Apply

Let’s go into the YCoCg calculations in more detail, with examples of variations that might be suited to different applications:

To maintain precision, use YCoCg 10:10:10 bit encoding:
Y = 2G + (R + B)
Co = 2(R – B)
Cg = 2G – (R + B)

R = (Y + Co – Cg)/4
G = (Y + Cg)/4
B = (Y – Co – Cg)/4
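A quick way to convince yourself that the 10:10:10 form loses nothing is to round-trip a grid of 8-bit values. The sketch below follows the equations above directly, and the function names are hypothetical. Note that Co and Cg are signed, so the 10 bits for each include a sign bit.

```python
def rgb_to_ycocg_10(r, g, b):
    y = 2 * g + (r + b)       # 0..1020
    co = 2 * (r - b)          # -510..510 (signed)
    cg = 2 * g - (r + b)      # -510..510 (signed)
    return y, co, cg


def ycocg_10_to_rgb(y, co, cg):
    # Each numerator is exactly 4x the original channel, so the
    # divide by 4 is an exact right shift by 2.
    r = (y + co - cg) >> 2
    g = (y + cg) >> 2
    b = (y - co - cg) >> 2
    return r, g, b


levels = range(0, 256, 17)
exact = all(
    ycocg_10_to_rgb(*rgb_to_ycocg_10(r, g, b)) == (r, g, b)
    for r in levels for g in levels for b in levels
)
print("round trip exact:", exact)
```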

Adjust for YCoCg 8:8:8 bit encoding:
Y = (2G + (R + B)) >> 2
Co = (R – B) >> 1
Cg = (2G – (R + B)) >> 2

R = Y + Co – Cg
G = Y + Cg
B = Y – Co – Cg
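The 8:8:8 version throws away the low bits, so the round trip is no longer exact. The sketch below (hypothetical function names) measures the worst-case channel error over a sample grid. Reconstructed values can land just outside 0..255, so a clamp is included for use before storing results back into 8 bits.

```python
def rgb_to_ycocg888(r, g, b):
    # Python's >> floors negative values (arithmetic shift). In C, check
    # that the compiler does the same for signed operands.
    y = (2 * g + (r + b)) >> 2      # 0..255
    co = (r - b) >> 1               # -128..127
    cg = (2 * g - (r + b)) >> 2     # -128..127
    return y, co, cg


def ycocg888_to_rgb(y, co, cg):
    r = y + co - cg
    g = y + cg
    b = y - co - cg
    return r, g, b


def clamp8(x):
    # Keep a reconstructed channel inside the 8-bit range.
    return max(0, min(255, x))


def max_roundtrip_error(step=5):
    # Largest absolute per-channel error over a grid of RGB values.
    worst = 0
    levels = range(0, 256, step)
    for r in levels:
        for g in levels:
            for b in levels:
                back = ycocg888_to_rgb(*rgb_to_ycocg888(r, g, b))
                err = max(abs(p - q) for p, q in zip((r, g, b), back))
                worst = max(worst, err)
    return worst


print("worst channel error:", max_roundtrip_error())
```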

Or a mixture for carrying more bits in Y and less in color–

RGB 8:8:8 bits to YCoCg 10:8:8 bits:
10 bits: Y = 2G + (R + B)
8 bits: Co = (R – B) >> 1
8 bits: Cg = (2G – (R + B)) >> 2

R = (Y + 4Co – 4Cg) >> 2
G = (Y + 4Cg) >> 2
B = (Y – 4Co – 4Cg) >> 2

RGB 8:8:8 bits to YCoCg 10:6:6 bits:
10 bits: Y = 2G + (R + B)
6 bits: Co = (R – B) >> 3
6 bits: Cg = (2G – (R + B)) >> 4

R = (Y + 16Co – 16Cg) >> 2
G = (Y + 16Cg) >> 2
B = (Y – 16Co – 16Cg) >> 2
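The 10:8:8 and 10:6:6 variants differ only in how far the chroma terms are shifted, so one parameterized sketch covers both. The names `encode`, `decode`, and `worst_error` are illustrative. The loop at the end reports how much round-trip accuracy each chroma bit budget gives up on a sample grid.

```python
def encode(r, g, b, co_shift, cg_shift):
    y = 2 * g + (r + b)                   # 10 bits, kept at full precision
    co = (r - b) >> co_shift              # fewer chroma bits as the shift grows
    cg = (2 * g - (r + b)) >> cg_shift
    return y, co, cg


def decode(y, co, cg, co_shift, cg_shift):
    co_term = co << (co_shift + 1)        # approximates 2(R - B)
    cg_term = cg << cg_shift              # approximates 2G - (R + B)
    r = (y + co_term - cg_term) >> 2
    g = (y + cg_term) >> 2
    b = (y - co_term - cg_term) >> 2
    return r, g, b


def worst_error(co_shift, cg_shift, step=5):
    # Largest absolute per-channel error over a grid of RGB values.
    worst = 0
    levels = range(0, 256, step)
    for r in levels:
        for g in levels:
            for b in levels:
                coded = encode(r, g, b, co_shift, cg_shift)
                back = decode(*coded, co_shift, cg_shift)
                err = max(abs(p - q) for p, q in zip((r, g, b), back))
                worst = max(worst, err)
    return worst


# Shift pairs taken from the two encodings above.
for name, shifts in (("10:8:8", (1, 2)), ("10:6:6", (3, 4))):
    print(name, "worst channel error:", worst_error(*shifts))
```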

Note that the 2G component works well with Bayer filter data where
2G = G(red row) + G(blue row)
Y = 2G + (R + B) = R + Gr + Gb + B
uses all four pixels in the Bayer pattern.
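For raw sensor data, the same sums can be taken straight from each 2×2 Bayer cell without demosaicing first. The sketch below assumes an RGGB layout (R and Gr on the even row, Gb and B on the odd row), which varies between sensors. It produces one Y, Co, Cg triple per cell, so the output is half resolution in each direction. Function names are hypothetical.

```python
def bayer_cell_to_ycocg(r, gr, gb, b):
    two_g = gr + gb               # stands in for 2G
    y = two_g + (r + b)           # R + Gr + Gb + B
    co = 2 * (r - b)
    cg = two_g - (r + b)
    return y, co, cg


def bayer_to_ycocg(raw):
    """Convert an RGGB mosaic (list of rows) to a half-resolution YCoCg grid."""
    out = []
    for row in range(0, len(raw) - 1, 2):
        out_row = []
        for col in range(0, len(raw[row]) - 1, 2):
            out_row.append(bayer_cell_to_ycocg(
                raw[row][col], raw[row][col + 1],
                raw[row + 1][col], raw[row + 1][col + 1]))
        out.append(out_row)
    return out


mosaic = [
    [200, 100, 10, 20],   # R  Gr R  Gr
    [102, 50, 30, 40],    # Gb B  Gb B
]
print(bayer_to_ycocg(mosaic))
```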

If you don’t need the transform to be reversible, then the bit-width of Y does not need to be linked to the bit-widths of the chrominance components. For example:
Y’ = 2G + (R + B)    (10 bit Y)
Co’ = (R – B)    (9 bit chrominance)
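As one example of using the non-reversible pair, here is a sketch of a two-threshold test on Y’ and Co’. The function name and both threshold values are invented for illustration and would need tuning for a real camera and scene.

```python
def is_bright_and_reddish(r, g, b, min_luma=200, min_co=64):
    # Thresholds are placeholder values, not recommendations.
    y = 2 * g + (r + b)       # 10-bit Y', 0..1020
    co = r - b                # 9-bit signed chrominance, -255..255
    return y >= min_luma and co >= min_co


print(is_bright_and_reddish(200, 50, 20))   # bright, red-dominant pixel
print(is_bright_and_reddish(20, 50, 200))   # bright, blue-dominant pixel
```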

What color thresholding techniques do you use?

What distance metrics do you use, e.g. for full-color segmentation?

James Coggins notes: “I have usually found HSV to be most effective. If the circularity of the Hue channel bothers you, use cosH, sinH, S, V. That linearizes H in a way that allows an ordinary distance metric like the L2 norm to work in an intuitively and mathematically satisfactory way.”
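The cosH, sinH, S, V idea can be sketched with Python's standard `colorsys` module, which returns hue as a fraction of a full turn. The helper names below are hypothetical. The example compares two reds whose hue values sit on opposite sides of the wrap-around point.

```python
import colorsys
import math


def hsv_features(r, g, b):
    """Map 8-bit RGB to (cos H, sin H, S, V) so hue no longer wraps."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    angle = 2.0 * math.pi * h     # colorsys hue is in 0..1
    return (math.cos(angle), math.sin(angle), s, v)


def color_distance(rgb_a, rgb_b):
    # Ordinary L2 norm in the (cos H, sin H, S, V) feature space.
    fa = hsv_features(*rgb_a)
    fb = hsv_features(*rgb_b)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(fa, fb)))


red_a = (255, 13, 0)      # hue just above 0
red_b = (255, 0, 13)      # hue just below 1
cyan = (0, 255, 255)
print(color_distance(red_a, red_b), color_distance(red_a, cyan))
```

The two reds come out close together even though their raw hue values differ by almost a full turn, while red and cyan remain far apart.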

____________

How well does YCoCg work in practice?

How do you test your color space?

In Part 3 of Color Space for Embedded Vision, we’ll look at other color space considerations and easy ways to compare color space performance.


6 comments

David Ing June 27, 2011 at 1:41 pm

Craig… Great article. There are certainly many points of view out there. For your application (on-the-fly segmentation), I agree that you should pick the colorspace that yields the simplest calculations.

Thus, in order to derive your color-space, you should study the problem that you’re trying to solve. For example, if doing edge detection, perhaps one color dimension is particularly sensitive to surface boundaries and insensitive to shadow boundaries or texture boundaries.

If you work with machine learning tools (i.e. classifiers) that yield equivalent performance for all affine transformations in a feature-space, and if you pick your features carefully, then the actual choice of colorspace will not matter much. The MoRPE classifier has this property of affine invariance. (http://www.inferentialpower.com/MoRPE.htm)

Craig Sullender June 29, 2011 at 3:14 am

Thanks David!

I want an MoRPE tshirt. Medium, black. “Putting things where they belong since 2011.”

Max van Rooij June 28, 2011 at 10:14 pm

Hi Craig,

Having been around for a while in the computer vision business, I find it amusing to see near-religious wars being waged over what color space is best. People tend to forget that humans “see color”, not on an absolute scale, but relative, for one purpose only: segmentation. Separating interesting parts from non-interesting ones as fast as possible. The same is true for 3D vision. 2D segmentation, especially in cluttered environments, becomes easier when extra clues such as color or depth are available. So it all boils down to a rather simple question: what makes it stand out from the rest? Careful selection of lighting makes it possible to judge color very precisely _without_ the use of a color camera at all (let that one sink in for a while). Often, very good results are achieved by employing opponent calculations between color channels (typically R,G,B). This turns a color image into a set of binary images which are easy to process further on with blob analysis, etc. So why do you use a color space? Because you expect that your users expect you to, or because you really need one for the problem at hand?

Regards,

Max

Craig Sullender June 29, 2011 at 3:17 am

Thanks for inspiring me on Part 2!

Tom Hanan June 29, 2011 at 4:06 am

Max you devil…

You mean you can actually use the color pixels to triple the resolution of the camera?

It actually works quite well!
Get lower resolution color from black and white or higher resolution black and white from color…

Tom Hanan June 29, 2011 at 4:02 am

I personally am a fan of nature and I have yet to see nature do anything in a complicated manner. The only time it is complicated is when I have not yet identified the true elegance nature has used to solve a problem.

Two of our most complex senses, vision and hearing, are especially this way. Subtle structures in the eye and ear dramatically simplify the processing required by the optic and acoustic region(s) of our brain.

Our biggest problem to date is trying to force our digital processors to treat everything in the optical and acoustic space with equal priority and resolution. That forces our digital systems to use a couple of orders of magnitude more processing to achieve the same results.

Craig’s simplicity seems appropriately focused on simplifying the problem BEFORE we hand it over to the expensive and power hungry DSP clusters.

-Tom Hanan
If you can’t keep it real,
Don’t do the deal!
