Brooks ([info]brooksmoses) wrote,
@ 2008-02-12 18:19:00
Previous Entry  Add to memories!  Tell a Friend!  Next Entry
Current mood: astonished

Sometimes modern technology is scary.
I was writing a subroutine to run on one of the coprocessors on the Cell processor, and it occurred to me that I could make things run a bit faster in a particular case by assuming that the floating point product of 0.0 and any bit pattern would always be 0.0 (at least for single-precision numbers).

Now, this isn't usually the case -- there are bit patterns that represent "not a valid number", and on a computer that does math according to the IEEE standard for floating-point arithmetic, the result of multiplying "not a valid number" and 0.0 is supposed to be "not a valid number". However, the single-precision floating-point arithmetic on the Cell's coprocessors are not designed to comply with all the niceties of the IEEE standard for this; instead, like the Cray processors of old, they're designed to go as fast as possible. So it was a reasonable conjecture that they would, in fact, simply return 0.0 for everything.

The question, then, was how to quickly get an answer that I'd trust. Google was one option, but trusting random stuff on the internet is unwise, especially when the risk is introducing a subtle and hard-to-track-down bug.

The naive answer, of course, is to test every possible input. This is the sort of thing that every computer-science freshman knows is absurd and impossible -- the number of possible inputs grows exponentially, and you can get numbers like "every possible position of every electron in the universe since the big bang" without hardly trying.

Even in this exceedingly simple case, there are 232 possible 32-bit patterns that could happen. That's a bit over four billion of them. Four billion grains of sand will overfill a 55-gallon drum. This is not really even a comprehensible number.

...

Except, wait. This is a processor with a clock speed of 3.2 billion cycles per second, and it takes probably a few dozen cycles to test each number. That's ... entirely plausible.

So I tried it.

It turned out to take it about two minutes. They are, indeed, all zero.



(Post a new comment)


[info]falsedrow
2008-02-13 03:53 am UTC (link)
Hooray for Gigahertz!

(Reply to this)


[info]echristo
2008-02-13 06:32 am UTC (link)
haha. Most excellent.

(Reply to this)


[info]ejalbert
2008-02-13 06:54 am UTC (link)
Wow, that's frightening.

(Reply to this)


[info]green_knight
2008-02-13 09:40 am UTC (link)
Wow. The problem is when your first computer had an 8mhz processor, you sometimes forget how powerful these things can be.

\me to my computer yesterday: "You have enough power to run an entire space mission, so will you *please* open that window in a reasonable time?"

(Reply to this)


[info]oldsma
2008-02-13 01:04 pm UTC (link)
I think my Blackberry has more computing power than the roomful of computer they mention in Apollo 13. Sometimes I wonder if me getting to appointments on time is more important than getting men to the moon and back, but then I come to my senses and resume texting about what I'm going to have for lunch.

MAO

(Reply to this)

Inconceivable!
[info]dragon3
2008-02-13 03:55 pm UTC (link)
Even in this exceedingly simple case, there are 232 possible 32-bit patterns that could happen. That's a bit over four billion of them. Four billion grains of sand will overfill a 55-gallon drum. This is not really even a comprehensible number.


It's smaller than the number of people on the earth and similar to the number of bytes on a DVD. Did you store all the results on your hard drive? ;-)

I had a similar epiphany long ago when I realized I could dramatically speed up repeated FFT calculations by storing all the required trig values in a lookup table.

Two minutes suggests somewhere around 100 cycles per test, so there must have been a lot of overhead in there.

(Reply to this)(Thread)

Re: Inconceivable!
[info]flippac
2008-02-14 03:46 am UTC (link)
It may well just have stalled on every test, so not necessarily overhead.

(Reply to this)(Parent)

Re: Inconceivable!
[info]brooksmoses
2008-02-14 04:30 am UTC (link)
Quite a bit of overhead, yeah. Which I expected -- I compiled with optimization off (to make sure it didn't optimize away the multiply-by-zero), which with GCC means really no optimization. And the Cell's coprocessors expect the compiler to take care of niceties like pipelining the instructions and stuff, so that was probably stalling the pipeline for a few cycles on every instruction.

Also, there was some overhead from the fact that my pseudocode was roughly this:
  for(i = 1; i != 0; i++)
    if (!(*(float*)&i == 0.0)) counter1++;
    if (!(0.0 * *(float*)&i == 0.0)) counter2++;
So, it was doing double the work.

From the quick glance I had at the assembly code, I think it was using branches for the tests, too, even though I'd written them as ternary expressions (which the Cell has a "select" instruction for), and those tend to produce significant stalls.

Actual timing results using "time": 33 seconds for -O2 (which turns out not to optimize away the multiply after all), and 4 minutes 58 seconds for -O0 -- I guess I underestimated a bit. Given how things have worked with the real code I've been working on, I expect I could probably get that down to about 5 seconds with some judicious loop unrolling and vectorization -- but not in 5 minutes of programming time!


Edited at 2008-02-14 05:02 am UTC

(Reply to this)(Parent)

Re: Inconceivable!
[info]brooksmoses
2008-02-14 05:32 am UTC (link)
So, unable to resist the curiousity, I tried hand-optimizing things a bit.

A half-hour later, I have a program that will check them in 4-element vectors in an average of 5.4 cycles per vector. The overall program run took 1.728 seconds.

(Reply to this)(Parent)(Thread)

Re: Inconceivable!
[info]dragon3
2008-02-14 07:13 pm UTC (link)
Is that after factoring out the overhead of the clock calls? I think you may have just redefined the word "geek" ;-)

I gave up counting cycles and hand optimizing assembly code back when 12 MHz was fast. Now I spend a lot of time saying "First make it work, then make it fast..."

(Reply to this)(Parent)


[info]cjsmith
2008-02-13 05:33 pm UTC (link)
Wow. That is way cool. And now you have some test code to keep lying around in case you ever need to port to a new coprocessor. :)

(Reply to this)

Re: Inconceivable!
[info]johnpalmer
2008-02-13 05:46 pm UTC (link)
You know, it's interesting... Google is facing something of the same problem. They have this incredibly huge network of incredible computing power, and they keep trying to get people who work for them to think *big*.

I'm going to have to store this little incident away in my own memory as well... it's good to remember that there's a lot of stuff you won't do repeatedly, or as a matter of course, but might be worth the computing power to do it once.

(Reply to this)(Thread)

Re: Inconceivable!
[info]brooksmoses
2008-02-14 04:33 am UTC (link)
Yup. A bit of careful optimization, and a cluster of a couple of hundred playstations, and I could check all the 64-bit doubles in a year.

Which is not a good answer to this particular question, but still entirely feasible if I needed to do it for something similar.

(Reply to this)(Parent)


[info]flippac
2008-02-14 03:47 am UTC (link)
My comp currently has so much RAM the OS doesn't have address space for swap. This amuses me in a similar manner.

(Reply to this)


Create an Account
Forgot your login?
Login w/ OpenID
English • Español • Deutsch • Русский…