What can I say... I'm a bit of a nerd, and I guess I missed doing this stuff.
/********************
* render_point.cpp *
********************
(1) Christopher Casey
(2) CS180, Spring 2010
(3) Assignment #1
(4) Due Wednesday, January 27
SUMMARY: An x86 FPU Assembly implementation of Mandelbrot Set
calculations.
OBJECTIVE: Use inline assembler to provide a speed increase
over the following reference code:
//jsh 1/08
double a = 0.0, b = 0.0, norm2 = 0.0;
for (n = 0; norm2 < 4.0 && n < N; ++n) {
double c = a*a - b*b + x;
b = 2.0*a*b + y;
a = c;
norm2 = a*a + b*b;
}
SOLUTION NOTES:
I initially used FPU assembler to convert the body of the
loop. This saved on a few calls to the `fld` instruction as
items were copied from system memory to the FPU stack. On my
second iteration of optimization, I wrote the loop condition
in asm as well. In addition to saving a few more `fld` calls,
this allowed me to re-use the `a*a` and `b*b` values
calculated for norm2.
I've also added code that tests to see if the point is in the
cardioid or the period-2 bulb. I found a write-up about these
optimizations and decided to try implementing them. They are
available via pre-processor directives so they can be turned
off, as they only increase performance when in the default
view. These optimizations were written in C for legibility
(and because they aren't the inner-most loop). The speed
increase realized in the default view is enormous -- the
scene is rendered in about 1/5 the time.
Optimization can still be made to the loop: like the reference
code, it iterates once more than necessary when reaching max
iterations. However, since we default to 5120 iterations this
would provide a hardly noticable increase. One could also
optimize the return value by providing a look-up table rather
than performing integer-to-double-to-integer conversions to
convert `n` to a COLORREF type. Doing this would require
explicit knowledge of the value of `N`.
KNOWN ISSUES:
My solution returns a slightly different set of values for the
points than the reference code. The discrepancy is due to two
different issues. First, I believe that a minor bug exists in
the reference code. The code tests for `norm2 < 4.0`, but
typical Mandelbrot Set implementations test for `norm2 <= 4.0`.
My code corrects this.
The second, and more broad issue is that because my floating
point operations occur in a different order than the ones
generated by the reference code (and I add a+a instead of
multiplying a*2), the floating point error produces slightly
different results.
The error is slight, and I don't feel either issue requires
further investigation at this time.
BENCHMARK:
Tested on an Intel i7 920 Windows 7 64-bit, mandelbrot.cpp as driver, max 5120 iterations
Default View (lower_left_x = -1.5f; lower_left_y = -1.0f; scale = 2.0f/length;):
Reference code [/Od]: 9.38 seconds
Reference code [/O2]: 6.77 seconds
(40% speed increase over /Od)
Reference code [/Ox]: 6.53 seconds
(40% speed increase over /Od)
__asm [/Od]: 4.60 seconds
(100% speed increase over reference + /Od,
or 45% increase over /O2)
__asm (+optimizations) [/Od]: 0.80 seconds
(1000% speed increase over reference + /Od)
Alternate View (lower_left_x = -0.25; lower_left_y = 0.775; scale = 0.25/length;):
Reference code [/Od]: 7.03 seconds
Reference code [/O2]: 5.71 seconds
(20% speed increase over /Od)
Reference code [/Ox]: 5.70 seconds
(20% speed increase over /Od)
__asm [/Od]: 2.40 seconds
(190% speed increase over reference + /Od,
or 130% increase over /O2)
__asm (+optimizations) [/Od]: 2.40 seconds
(190% speed increase over reference + /Od)
REFERENCES:
Assignment details:
https://faculty.digipen.edu/~jhanson/protected/cs180assignment1.pdf
https://faculty.digipen.edu/~jhanson/protected/render_point.h
https://faculty.digipen.edu/~jhanson/protected/mandelbrot.cpp
https://faculty.digipen.edu/~jhanson/protected/render_point0.cpp
Optimization notes:
http://www.nitorijournal.org/?p=517
(Cardioid and period-2 Bulb detection)
Additional notes: (researched, but not used in this project)
http://users.softlab.ece.ntua.gr/~ttsiod/mandelSSE.html
(CUDA and SSE-based mandelbrots)
http://www.ozone3d.net/tutorials/mandelbrot_set_p4.php
(General Purpose GPU Stream processing mandelbrots)
*/