I have been looking into writing a shader in Pixel Bender ever since I saw this post at @saltgames. I originally found saltgames while searching for a tilemapping algorithm. I never looked much further, because at the time I was playing with the tilemapping algorithm, which works like a dream. A few months later, however, I stumbled upon this post at whiteflashwhitehit, which had a very nice explanation of the shader he wrote, along with some enhancements. I took this code and modified it to produce this demo. Coincidentally, I had just downloaded free textures from photosculpt.
To get this demo to work, I needed to modify the textures. The diffuse texture was fine; I only scaled it down so that it would fit on my laptop screen in the demo and to reduce the file size (admittedly, this was done near the end of development). The normal texture was perfect as well, and I scaled it down to match the diffuse texture. I did the same with the occlusion and specular maps.
To reduce the number of inputs to the shader, I decided to combine the occlusion and specular maps, as was done at whiteflashwhitehit. This was not easy, however: the occlusion map was a single channel, which I could put in the green channel of the new texture, but the specular map used three channels. I emailed photosculpt to find out more about this, but they have yet to reply. So instead of waiting for them, I converted the specular map into a greyscale image and put that channel in the red of the new texture. Thus I had created the detail texture, which contains both the specular and occlusion maps.
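The packing step can be sketched in a few lines of NumPy (illustrative only, not the demo's actual tooling; the luminance weights are one common choice for greyscale conversion, not necessarily what photosculpt uses):

```python
import numpy as np

# Hypothetical inputs: an RGB specular map and a single-channel
# occlusion map, both float arrays in [0, 1] with the same size.
spec_rgb = np.random.rand(4, 4, 3)
occlusion = np.random.rand(4, 4)

# Collapse the RGB specular map to greyscale with standard
# luminance weights (an assumption; any reasonable weighting works).
spec_grey = spec_rgb @ np.array([0.299, 0.587, 0.114])

# Pack: greyscale specular into red, occlusion into green.
# Blue is unused, so fill it with zeros.
detail = np.stack([spec_grey, occlusion, np.zeros_like(occlusion)], axis=-1)

print(detail.shape)  # (4, 4, 3)
```

The shader then needs only this one extra texture and reads `.r` for specular and `.g` for occlusion.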
I had trouble with the shader initially, so I broke it down into various steps and created the other shaders, which are included in the demo. This helped me loads in finding bugs in certain aspects of the algorithm and allowed me to compare how the effects differ in performance.
Over at derschmale I searched for Pixel Bender performance enhancements. The relevant ones were the following:
- Use 4 channels only if necessary. No transparency? Ditch it.
- Precalculate recurring constant calculations in Flash and pass them as parameters (such as width*height). Sure, it makes the “interface” of your Kernel potentially harder to read, but since Flash doesn’t support dependents (I hope it will some day), this should be a no-brainer if performance is really important.
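The second tip is about hoisting anything constant across pixels out of the per-pixel code, the way you would pass it in as a kernel parameter. A minimal sketch of the idea in Python (not Pixel Bender; the names are illustrative):

```python
# The per-pixel body should only use values that vary per pixel;
# everything else is computed once and passed in, like a parameter.

def shade_all(pixels, width, height):
    # Precalculated once per frame, not once per pixel
    # (the equivalent of passing width*height or 1/size as a parameter).
    inv_size = (1.0 / width, 1.0 / height)
    out = []
    for px in pixels:
        # Per-pixel work reuses the precomputed constant.
        u = px[0] * inv_size[0]
        v = px[1] * inv_size[1]
        out.append((u, v))
    return out

print(shade_all([(0, 0), (128, 64)], 256, 128))  # [(0.0, 0.0), (0.5, 0.5)]
```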
So, I changed all the image4 declarations to image3 and output4 to output3, effectively discarding the alpha channel. There was a performance increase, but at the cost of all the shaders breaking. None of the shaders look at the alpha channel, so I assumed I could just discard it. I could not, as each of the shaders gave really different results (some never worked, others turned the picture green, others only showed the light downwards). Very, very strange. I think it has to do with the sampleNearest function fitting its result into a float3 instead of a float4.
Highly relevant, though too much work for just a demo, was the next point: specular lighting hits the shader code the hardest. This has to do with the pow function that it uses. Remember that the shader executes on every pixel of the texture (yes, it gets parallelised, but still, every pixel!). I tried the second recommendation, but alas:
- the distance to the light source and brightness are variable per pixel
- so is the normal calculation, therefore we cannot really externalise these calculations
- the occlusion calculation is pretty fast, the fastest of the bunch, but cannot be externalised
- the specular calculation, which is expensive, can only have the 5.0 - specularIntensity precalculated
There might be some other very clever optimizations that can be done, but I am unaware of them and have not put serious thought into it. I am happy with the results.
When profiling the swf within FlashDevelop, the memory never exceeds 30MB. Strangely, when I am not profiling the swf, the memory jumps in much larger increments, up to about 120MB. I am not sure why this is the case; it could possibly be explained by the double slit experiment (joke!). Larger textures exaggerate this even more. Memory seems far more stable in Chrome than in the desktop debug player.
From my results:
- If I have CPU cycles to waste, then I would apply everything
- The benefit of the specular and occlusion is noticeable, but not considerably so
- Specular mapping could be removed first, then occlusion and even just the normal mapping will give a good result for fewer calculations
- If I were not looking for depth, I could apply no textures and have a very fast shader that simulates flat light
- The best case, most probably, would be using only the occlusion map: it would be the fastest depth effect of the three, but it would need to be modified slightly so as not to darken the texture too much
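One way that slight modification could look is remapping the occlusion value toward 1.0 by a strength factor, so full occlusion no longer drives pixels to black. A sketch, where the 0.5 strength is an arbitrary assumption:

```python
# Soften an occlusion-only pass so it does not darken the texture
# too much: blend the occlusion value toward 1.0 (no darkening).

def soften_occlusion(occ, strength=0.5):
    # occ in [0, 1]; strength 0 leaves the texture untouched,
    # strength 1 applies the full occlusion darkening.
    return 1.0 - strength * (1.0 - occ)

print(soften_occlusion(0.2))  # a dark 0.2 is lightened to 0.6
```

The softened value would then multiply the diffuse colour in place of the raw occlusion sample.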