Author: Steinar H. Gunderson
Date:
To: Minimalistic plugin API for video effects
Subject: Re: [Frei0r] Add fast char-float-char conversion functions with gamma correction
On Sat, Nov 03, 2012 at 10:39:18AM +0100, Marko Cebokli wrote:
> I have added a bunch of table based functions for fast char/float/char
> conversion, which can also simultaneously do almost arbitrary gamma->linear and
> back conversions.


You didn't send the patch to the list, so I assume it's this one:

http://code.dyne.org/frei0r/commit/?id=e3bda360289bc89ba8a7006227b67ed46dd7fe09

The technique is creative, but this isn't correct:

if (a>0.9999) a=0.9999;
ft[i]=(uint8_t)(256.0*a);

This introduces round-off errors into your tables (cast-to-int truncates,
it does not round), and also a bias since you're multiplying by the wrong
value. You want

ft[i]=lrintf(255.0*a);

No need for the 0.9999 hack. Similarly, this is wrong:

a=((float)i+0.5)/256.0;

and should be replaced with

a=i/255.0;

Second, you seem to have forgotten the return type on float_2_uint8:

static inline float_2_uint8(const float *in, uint8_t *tab)

You probably want to add an “int” here. Also, RGB8_2_float() etc. should
either be declared static inline, or moved to a .c file, or you'll end up
linking them over and over again into your binary.
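
A minimal sketch of the header pattern, with an explicit return type and
static inline on every function defined in the header. The file name,
function names and bodies are hypothetical and are not taken from the patch:

```c
/* fastconv_example.h (hypothetical file name) */
#ifndef FASTCONV_EXAMPLE_H
#define FASTCONV_EXAMPLE_H

#include <stdint.h>

/* static inline: each translation unit that includes this header gets
 * its own private copy, so no external symbol is emitted. */
static inline float example_u8_to_float(uint8_t v, const float *table)
{
    return table[v];
}

/* Return type spelled out; returns the number of samples converted. */
static inline int example_convert_row(const uint8_t *in, float *out,
                                      int n, const float *table)
{
    for (int i = 0; i < n; i++)
        out[i] = example_u8_to_float(in[i], table);
    return n;
}

#endif /* FASTCONV_EXAMPLE_H */
```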

Third, you say this is “fast”, but have you actually measured it?
The backwards transform is a lookup into a 64 kB table, but your L1 data
cache is typically only 32 kB, so you'll be reading from L2 all the time.
(That's notwithstanding any penalties for going through memory for the
type-punning hacks.) There's a good reason why I chose a 14-bit table for
colgate instead of a 16-bit table :-)

Anyway, for gamma conversions, using a table is probably OK, but for linear
conversions, you'll be blown out of the sky by the cvtss2si (or even
cvtps2pi) instruction, aka lrintf() (on an -ffast-math system).

> This way, we can have linear color space processing, which was discussed here
> recently, not only at zero cost, but even with some speed gain, as I have
> found that these functions are faster than the usual
>
> out[i].r=f1*(float)*cin++;
> and
> *cout++=(uint8_t)(in[i].r*255.0);


This is a straw man, though, since the second of these is not the actual
fastest way to convert a float to an int (again, see lrintf()).

/* Steinar */
--
Homepage: http://www.sesse.net/