Emulated 64-bit floats in OpenGL ES shader

In the previous post the resolution of a Mandelbrot set fractal was greatly increased by generating it in a vertex shader instead of a fragment shader, because full highp 32 bit floats where available instead of the mediump 16-bit half-floats that many mobile devices limit their fragment shaders to. The mantissa precision increased from 10 bits to 23 bits.

2014-9-5_mandelbrot8 2014-9-5_mandelbrot9

At a scale of 10^(-4) the difference is obvious. Earlier I tried emulating a higher precision in the fragment shader inspired by this blog post, but it just wouldn’t work with the half-floats. It worked beautifully in the Android Emulator when I ran it on my desktop, but my desktop always uses full 32-bit floats even for the mediump fragment shader floats, which we can’t expect in a mobile device fragment shader.

Now when we are doing the calculation in the vertex shader instead we have full 32-bit floats on all mobile devices, just like the Android Emulator, so let’s try this again.

Representing a double with two floats

We’ll split the double up into a high and low part and put them in two floats

vec2 df;
df.x = high;
df.y = low

Since we’re using complex numbers that already are split up in real and imaginary components we’ll need four floats to represent a double complex number.

vec4 c;
c.xy = vec2(hihg_real, low_real);
c.zw = vec2(high_imag, low_imag);

We’re now using the full 128 bit vectors that GLSL supports to represent our complex values. We can use the primitive type vec4.

We need methods to split, add and multiply these emulated doubles. See them in this blog post. They will need to be rewritten and optimized for complex numbers. Since GLSL doesn’t have any operator overloading the code ends up looking a bit messy.

attribute vec3 position;
uniform vec4 offset; //vec4(hihg_x, low_x, high_y, low_y)
uniform vec4 scale;
...
vec2 real = add(mul(scale.xy,position.x, offset.xy);
vec2 imag = add(mul(scale.zw,position.y, offset.zw);
vec4 c = vec4(real, imag);
vec4 z = vec4(0.);
...
for(i = 0; z.x*z.x + z.z*z.z < ESCAPE_RADIUS; i++) {
    z = add(mul(z,z), c);
}
...

We can’t interpolate the split attributes, but it never happens because there is one vertex per fragment. In fact I should try turning the interpolation off if possible. In the for-loop condition we only use the high parts of the real and imaginary values of z.

The add() and mul() methods are hiding a big number of multiplications. This code is much slower then the previous shaders, but it does greatly increase the precision.

2014-9-5_mandelbrot10 2014-9-5_mandelbrot11

Zooming in until the vertex method looks blocky at the scale of 10^(-7) and switching to the emulated doubles method shows the huge improvement. What we’re looking at is a small corner of the black dot that can be seen in the lower art of the images at the top of this post. Now we can zoom in all the way to a scale of 10^(-14) before we start seeing blocks. Whatever stopped the emulation from working properly in the fragment shader isn’t a problem in the vertex shader. This is working great!

Results

2014-9-5_mandelbrot13This is a rendering at the scale of 10^(-13). Occasionally there are some linear discontinuities. I don’t know where these are coming from. Maybe from an incorrect splitting of some doubles. That’s something to look into later.

cropped-lcdqtf6faykehdrbur5kd9lxhu3dxwha-etjcm4awbtlehnpvdzhxtlefz45tsu3eqh900-rw-e1412415155844.png

The shaders are working great in the app!
You can see the full source code on GitHub here. The shaders are in the res/raw folder.

 

High precision floats in OpenGL ES shaders

In the previous post I tried emulating higher precision by using two 10-bit precision floats, but couldn’t get it working.

2014-9-5_mandelbrot3 2014-9-5_mandelbrot4

As you can see in the lower part of the screenshots, the precision did increase but not by much. The upper part of the image remains blocky since the shader runs out of memory on my device and only updates part of the image. That’s because I revved up the number of iterations to 1024, but the final app should have that ability so this is another problem that needs to be solved. I also changed the color of the generated image to make the difference more apparent.

Generating a geometry that has one vertex per pixel

Let’s do the calculations in the vertex shader instead of the fragment shader. OpenGL ES guarantees high precision floats in vertex shaders but not in fragment shaders, so generating the fractal there should greatly increase the resolution. Vertex shaders are executed once per polygon vertex and not per screen pixel, so we need to create a geometry that contains exactly one vertex per pixel of the screen and fill the screen with it.

public class Geometry {
    private ShortBuffer indices;
    private FloatBuffer buffer;
    ...
}

private Geometry generatePixelGeometry(int w, int h) {
    ShortBuffer indices = ShortBuffer.allocate(w*h);
    FloatBuffer buffer = FloatBuffer.allocate(3*w*h);

    for(int i = 0; i < h; i++) {
        for(int j = 0; j < w; j++) {
            buffer.put(-1 + (2*j + 1)/(float)w);
            buffer.put(-1 + (2*i + 1)/(float)h);
            buffer.put(0);

            indices.put((short)(i*w+j));
        }
    }
    buffer.flip();
    indices.flip();

    return new Geometry(indices, buffer);
}

This will generate a geometry that contains one vertex in the middle of every pixel on a screen of resolution w x h. OpenGL ES 2 uses short indexing and doesn’t support int indexing for vertices. That’s why we use a ShortBuffer for the indices buffer. This also means that we have to split up our Geometry into smaller pieces. The max value for a short is 32767. The best option is to generate one single geometry that has fewer vertices than that and reuse it several times across the screen using multiple draw calls. Either way we have to make sure that the final scene has one vertex in every pixel of the screen.

The vertex shader

attribute vec3 position;

uniform mat4 modelViewMatrix;
uniform float MAX_ITER;
uniform vec2 scale;
uniform vec2 offset;

varying vec4 rgba;

vec4 color(float value, float radius, float max);

vec2 iter(vec2 z, vec2 c)
{
        // Complex number equivalent of z*z + c
	return vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
}

void main(void)
{
	vec4 posit = modelViewMatrix * vec4(position, 1.);
	
	vec2 c = 2.*(scale*posit.xy + offset);
	
	vec2 z = vec2(0.);
		
	int i;
	int max = int(MAX_ITER);
	float radius = 0.;
	for( i=0; i<max; i++ ) { 	radius = z.x*z.x + z.y*z.y; 		if( radius > 16.) break;
		z = iter(z,c);
	}
	
	float value = (i == max ? 0.0 : float(i));
	
	rgba = color(value, radius, MAX_ITER);
	
	gl_Position = posit;
}

The vertex shader has to take the modelViewMatrix into account since we are splitting the geometry into multiple pieces and placing it at different positions in a scene. This had to be done because a single big geometry would be too big for short.

The fragment shader

precision mediump float;

varying vec4 rgba;

void main()
{
	gl_FragColor = rgba;
}

The fragment shader is as simple as it can be. It just receives the rgba vector from the vertex shader and draws it to the frame buffer.

Results

2014-9-5_mandelbrot32014-9-5_mandelbrot5

That’s much better!

The fractal renders slightly slower than with the lower precision shader but still really quickly and does 1024 iterations without running out of memory . Let’s see how much we can zoom in now.

2014-9-5_mandelbrot6 2014-9-5_mandelbrot7

The pixels start filling out the screen at a scale of 0.0002, a zoom level of 5000x. Switching to the vertex mode instead shows an amazing difference. On my device highp floats in the vertex shader are normal 32-bit floats and have a precision of 23 bits, while the mediump 16-bit floats in the fragment shader only have a 10-bit precision. That means that the precision is more then doubled. Every pixel in fragment mode becomes in vertex mode bigger than the entire shader was in fragment mode. In fact pixels became apparent first on a scale of 2*10^(-6). See the results for your self in the google play: GPU Mandelbrot.

In the next post I’ll give double emulation one more try.