I'm working on creating a 3D landscape. So far I've got a mesh created with vertices and faces, and it all looks decent. I've applied a single texture to it, but want to have multiple textures perhaps based on the height. I'm thinking I need a shader to do this.
I'm new to shaders, but so far I've followed this tutorial and have two textures blended together. However, I want to fade one texture out completely at certain heights (or positions) and have the other show completely.
I'm really not sure how to approach this using a shader. Could someone give some advice on how to start?
For argument's sake, suppose your source geometry runs from y=0 to y=1 and you want the second texture to be completely transparent at y=0 and completely opaque at y=1. Then you could add an additional varying, named secondTextureAlpha or something like that. Load it with your pre-transformation y values in the vertex shader, then combine the incoming values from the two source textures either so that the second is multiplied by secondTextureAlpha, if you want to stick with additive blending, or via the mix function to interpolate between them.
So e.g. the fragment shader might end up looking like:
varying vec3 lightDir, normal;
varying lowp float secondTextureAlpha;
uniform sampler2D tex, l3d;

void main()
{
    vec3 ct, cf;
    vec4 texel;
    float intensity, at, af;

    intensity = max(dot(lightDir, normalize(normal)), 0.0);
    cf = intensity * gl_FrontMaterial.diffuse.rgb +
         gl_FrontMaterial.ambient.rgb;
    af = gl_FrontMaterial.diffuse.a;

    texel = mix(texture2D(tex, gl_TexCoord[0].st),
                texture2D(l3d, gl_TexCoord[0].st), secondTextureAlpha);

    ct = texel.rgb;
    at = texel.a;
    gl_FragColor = vec4(ct * cf, at * af);
}
To achieve a more complicated mapping, just adjust how you load secondTextureAlpha in your vertex shader, or take it as an input attribute maybe.
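A matching vertex shader for the simple height-based case is just the usual directional-light setup plus one line to fill the new varying from the untransformed y coordinate. This is a minimal sketch in the same fixed-function GLSL style as the fragment shader above (the gl_LightSource[0] directional-light setup is an assumption, and the clamp is only there to keep the blend factor in [0, 1] if your terrain strays outside that range):
varying vec3 lightDir, normal;
varying lowp float secondTextureAlpha;

void main()
{
    // Standard per-vertex directional-light setup.
    lightDir = normalize(vec3(gl_LightSource[0].position));
    normal = gl_NormalMatrix * gl_Normal;

    // The pre-transformation (object-space) height drives the blend
    // between the two textures in the fragment shader.
    secondTextureAlpha = clamp(gl_Vertex.y, 0.0, 1.0);

    gl_TexCoord[0] = gl_MultiTexCoord0;
    gl_Position = ftransform();
}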
I'm currently trying to change my asset style from realistic to low poly / cartoonish.
For example, I have a toon surface shader
half4 LightingRamp(SurfaceOutput s, half3 lightDir, half atten) {
    half NdotL = dot(s.Normal, lightDir);
    half diff = NdotL * 0.5 + 0.5;
    half3 ramp = tex2D(_LightingTex, float2(diff, 0)).rgb;
    half4 c;
    c.rgb = _LightColor0.rgb * atten * ramp * _Color;
    c.a = s.Alpha;
    return c;
}
where _LightingTex is a 2D texture ramp. This works fine for lighting effects on the objects themselves.
As I have multiple objects with this shader in my scene, some of them are casting a shadow onto my wall.
As you can see, the shadow here is not a ramp but a continuous gradient, as it is (probably) produced by Unity's built-in ambient/shadow handling. My question is now: is there an option to create this color ramp effect on the global shadows as well? Something like this:
Can I do it material shader based, or is it a post processing effect?
Thanks
With surface shaders: No, you can't do it in the shader. Actually, I think the best way to get a unified cartoon effect is to use a color grading LUT as a post effect. The great thing about LUTs is that you can create one easily in Photoshop by first applying some cool effects to a regular image until it looks the way you want (such as "Posterize"), and then copying the effect stack to apply to a LUT texture, like this one. When you use this LUT in Unity, everything will look as it would with your Photoshop filters applied. One small caveat I've noticed, though, is that some standard LUT textures need to be flipped vertically to work with the Post Processing Stack. Here is a nice tutorial on how to create posterized LUTs.
If you want to get the toon-like shadows directly in the shader, it is not any harder than writing a regular forward-rendered vertex/fragment shader, though this by itself requires a bit of knowledge of how these work - I can recommend looking at the standard shader source code, this, or this (somewhat outdated) tutorial. You can find the details on how to add shadow support in my post here. The only thing you need to change is to add a similar color ramp to the shadow mask:
half shadow = SHADOW_ATTENUATION(IN);
shadow = tex2D(_ShadowRamp, float2(shadow, 0)).r;
For this, you can set the shadow ramp as a global shader variable from script, so you won't have to assign it for each material.
This probably is a dumb question, but I've been stuck on this for a while, so I'm going to ask it anyway.
I'm trying to implement a Hudson/Nashville filter in a pet project. I googled a little and checked out a few open-source projects, and found some Objective-C-based projects (a language I don't understand). They do have the filters implemented using GPUImage2, but I wasn't sure about their approach.
I have the overlay and other images that they have used and the GLSL files.
So my question is: how do I go about using these images and shader files to implement a custom filter?
Note: I tried using the LookupFilter approach as suggested, but the result wasn't so good. It would be super helpful if you can show me some code. Thanks
Update:
What I am trying to understand is this: given a custom shader like the one below, how am I supposed to pass the input images for the uniforms inputImageTexture2, inputImageTexture3 & inputImageTexture4? Do I pass each as a PictureInput to BasicOperation by subclassing it? If so, how? What am I missing? I haven't been able to walk through the code much because of the lack of proper documentation. I have read up on shaders and their different components now, but still haven't been able to figure out a way to work with custom filters in GPUImage2. Please help.
precision highp float;

varying highp vec2 textureCoordinate;

uniform sampler2D inputImageTexture;
uniform sampler2D inputImageTexture2; // blowout
uniform sampler2D inputImageTexture3; // overlay
uniform sampler2D inputImageTexture4; // map
uniform float strength;

void main()
{
    vec4 originColor = texture2D(inputImageTexture, textureCoordinate);
    vec4 texel = texture2D(inputImageTexture, textureCoordinate);
    vec3 bbTexel = texture2D(inputImageTexture2, textureCoordinate).rgb;

    texel.r = texture2D(inputImageTexture3, vec2(bbTexel.r, texel.r)).r;
    texel.g = texture2D(inputImageTexture3, vec2(bbTexel.g, texel.g)).g;
    texel.b = texture2D(inputImageTexture3, vec2(bbTexel.b, texel.b)).b;

    vec4 mapped;
    mapped.r = texture2D(inputImageTexture4, vec2(texel.r, .16666)).r;
    mapped.g = texture2D(inputImageTexture4, vec2(texel.g, .5)).g;
    mapped.b = texture2D(inputImageTexture4, vec2(texel.b, .83333)).b;
    mapped.a = 1.0;

    mapped.rgb = mix(originColor.rgb, mapped.rgb, strength);

    gl_FragColor = mapped;
}
The GPUImage convention is that the first input texture to a shader is the uniform called inputImageTexture, the second inputImageTexture2, and so on. In the original Objective-C version of GPUImage, you had to manually subclass a filter type that matched the number of input textures in your shader.
In the Swift GPUImage 2, I've made it so that you just need to use a BasicOperation class or subclass, and it automatically attaches textures to the number of inputs needed for your shader. You do this by initializing a BasicOperation and setting the number of inputs:
let myOperation = BasicOperation(fragmentShaderFile:myFragmentShader, numberOfInputs:4)
The above sets numberOfInputs to 4, matching your above shader. By leaving the vertexShaderFile argument as nil (the default), the BasicOperation will pick an appropriate simple vertex shader with four texture inputs.
All that you need to do then is set your inputs to that filter like you would any other, by adding your new BasicOperation as a target of an image source. The order in which you attach the inputs matters, because that will start with the first texture in your shader and progress on down.
In most cases, BasicOperation is flexible enough as-is, so you won't need to subclass. At most, you might need to provide a custom vertex shader, but that's not needed for the fragment shader code above.
As you can see in the image, on larger tiles (n > 1) the texture should repeat to match the current rect size. I don't know how I can achieve this!
FYI, I'm getting the tile texture ID from the alpha value of the vertex color.
Here is the shader I'm using.
[UPDATE]
Thanks for clarifying the UV coordinates; unfortunately, that doesn't answer my question. Take a look at the following picture...
Your shader is fine; it's actually the vertex UVs that are the problem:
So for all rectangles the UV coordinates are as follows: [0, 0] / [0, rect.height] / [rect.width, 0] / [rect.width, rect.height]. So the UVs are going beyond 1.
Your shader is designed to support the standard UV space, in which case you should replace rect.width and rect.height with 1.
By using UV coords greater than one, you're effectively asking for texels outside of the specified texture. When used with a texture atlas, that means you're asking for texels outside of the specified tile -- in this case, those happen to be white, and that's what you're seeing in the rendered output.
Tiling with an atlas texture
Updating because I missed an important detail: you want a tiling material.
Usually, UVs interpolate linearly:
For tiling, you essentially want more of a "sawtooth" output:
For a non-atlas texture, you can adjust material scale/wrap settings and call it done. For an atlas texture, it's possible but you'll end up with a shader and/or geometry that aren't quite standard.
The "most standard" solution would be if your larger quads are on a separate mesh from the smaller ones:
Add a float material param named uv_scale or some such
Add a Multiply node that scales incoming UVs by uv_scale
Pass output from that into a Frac node
Pass output from that into the UV Tile node
Pseudocode is roughly: uv = frac(uv * uv_scale)
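In shader-code terms, those nodes compute roughly the following. This is a minimal GLSL-style sketch with illustrative names (atlasTexture, uv_scale, tileOffset, tileSize and vUV are assumptions, not names from your material):
uniform sampler2D atlasTexture;   // the tile atlas (illustrative name)
uniform float uv_scale;           // how many repeats across the quad
uniform vec2 tileOffset;          // bottom-left corner of the tile within the atlas
uniform vec2 tileSize;            // size of the tile within the atlas

varying vec2 vUV;                 // per-quad UVs in [0, 1]

void main()
{
    // Sawtooth: repeat the 0..1 range uv_scale times across the quad.
    vec2 tiledUV = fract(vUV * uv_scale);

    // Remap the repeated UVs into the tile's sub-rectangle of the atlas,
    // which is what a "UV Tile" style node does.
    vec2 atlasUV = tileOffset + tiledUV * tileSize;

    gl_FragColor = texture2D(atlasTexture, atlasUV);
}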
If you need all of your quads to be in the same mesh, you end up needing non-standard geometry:
Change your UVs again (going back to rect.width and rect.height)
Add a Frac node before the UV Tile node
This is a simpler shader change, but has the downside that your geometry will no longer be cleanly supported in other shaders.
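The fragment side of this second approach, sketched the same way (same illustrative names as above), simply drops uv_scale and relies on the per-quad UVs already spanning [0, rect.width] x [0, rect.height]:
uniform sampler2D atlasTexture;   // illustrative names, as in the sketch above
uniform vec2 tileOffset;
uniform vec2 tileSize;

varying vec2 vUV;                 // runs from (0, 0) to (rect.width, rect.height)

void main()
{
    // The integer part of vUV encodes the repeat count, so fract() alone
    // produces the sawtooth; no uv_scale parameter is needed.
    vec2 atlasUV = tileOffset + fract(vUV) * tileSize;
    gl_FragColor = texture2D(atlasTexture, atlasUV);
}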
Thanks rutter!
I've implemented your solution in my shader and now it works perfectly!
So for everyone looking for this, here is the shader I'm using now.
Cheers, M
I'm making a space exploration game in Unity and I'm having two problems with semi-transparency.
Each planet is made up of two spheres: One is the combined surface and cloud layer, the other (with a slightly larger radius) depicts the horizon 'glow' by culling front faces and fading alpha toward the outer edge of the sphere. This is MOSTLY working fine, but with the following two problems:
1) In my custom surface shader, when I use the alpha keyword in the #pragma definition, alpha is factored into the rendered sphere, but the 'glow' sphere disappears at a distance of a few thousand units. If I DON'T include the alpha keyword, the sphere does not fade toward the edge, but it renders at distance.
2) Despite trying all RenderType, Queue, ZWrite and ZTest options, the surface sphere and 'glow' sphere are z-fighting; the game can't seem to decide which polygons are nearer - despite the fact that near faces on the glow sphere should be culled. I have even tried pushing the glow sphere away from the player camera and expanding its radius by the same proportion, but I'm STILL, inexplicably, getting z-fighting between the spheres!
Is there any setting that I'm missing that will enable the 'glow' sphere to always be drawn BEHIND the surface sphere (given that I have tried ALL combinations of ZWrite and ZTest as detailed above), and is there a way to have an alpha-enabled object NOT disappear at distance?
I cannot seem to figure this out, so any help will be well appreciated!
EDIT
Here's the shader code for my 'glow sphere'. Front faces are culled. I've even tried the Offset keyword to 'push' any drawn polygons further from camera. And I've tried all of the Tag, ZWrite and ZTest options I've been able to find. The shader gets passed a tint Color, an atmosphere density float and a sun direction vector...
Shader "Custom/planet glow" {
Properties {
_glowTint ("Glow Tint", Color) = (0.5,0.8,1,1)
_atmosphereMix ("Atmosphere Mix", float) = 0
_sunDirection ("Sun Direction", Vector) = (0, 0, 0, 0)
}
SubShader {
Tags { "RenderType" = "Opaque" "Queue" = "Geometry" }
Cull Front // I want only the far faces to render (behind the actual planet surface)
Offset 10000, 10000
ZWrite On // Off also tried
ZTest LEqual // I have tried various other options here, incombination with changing this setting in the planet surface shader
CGPROGRAM
#pragma surface surf Lambert alpha
#pragma target 4.0
struct Input {
float3 viewDir;
};
fixed4 _glowTint;
float _atmosphereMix;
float4 _sunDirection;
void surf (Input IN, inout SurfaceOutput o) {
_sunDirection = normalize(_sunDirection);
o.Albedo = _glowTint;
float cameraNormalDP = saturate(dot( normalize(IN.viewDir), -o.Normal ) * 4.5);
float sunNormalDP = saturate(dot( normalize(-_sunDirection), -o.Normal ) * 2);
o.Alpha = _atmosphereMix * sunNormalDP * (cameraNormalDP * cameraNormalDP * cameraNormalDP); // makes the edge fade 'faster'
o.Emission = _glowTint;
}
ENDCG
}
FallBack "Diffuse"
}
Have you thought about rendering large objects at a different scale with another camera to create a dynamic skybox? That would certainly resolve the z-fighting issues.
You can have, for example, two cameras - one that renders objects in the range 0.1-1000, and another that covers 1000 to 100000.
An additional optimization could be to render the distant environment to a cubemap skybox, and to refresh it less often than every frame (except maybe on special occasions, such as when you destroy a planet from afar).
There is also another optimization to consider - you could render a flat ring around the planet, rotated to face the camera, to avoid any overdraw on the actual planet surface. But that will require more complex lighting calculations.
Also, have you tried your transparent shader on a camera that does not have Skybox as its clear flags? Check this answer on how to use a custom skybox.
I seem to have stumbled upon a solution to this problem while trying to resolve world-space 'jittering'...
The game I'm developing uses big distances. I changed the way long-distance objects are positioned (the distances get contracted the further an object is from the camera) and that has resolved the z-fighting and the alpha vanishing.
A whole solar system fits into something like 300,000 kilometres: I multiply each object's relative Vector3 position and its scale by the square root of the distance divided by the distance (so an object at true distance d ends up rendered at distance sqrt(d)). Hopefully that info might be useful to someone.
If you are using alpha, then you need to change the Tags so that the object is rendered in the transparency pass. Also turn off ZWrite, and remove the Offset.
I tested your shader in an empty project with both Forward and Deferred rendering. It worked fine with these adjustments.
All opaque geometry is drawn first, then the alpha pass is done and renders all objects with transparency on TOP of all of the objects in the scene. It needs to do this otherwise it won't be able to blend the colors for the alpha.
Tags { "RenderType" = "Transparent" "Queue" = "Transparent" }
ZWrite Off
I have an open source iOS application that uses custom OpenGL ES 2.0 shaders to display 3-D representations of molecular structures. It does this by using procedurally generated sphere and cylinder impostors drawn over rectangles, instead of these same shapes built using lots of vertices. The downside to this approach is that the depth values for each fragment of these impostor objects need to be calculated in a fragment shader, to be used when objects overlap.
Unfortunately, OpenGL ES 2.0 does not let you write to gl_FragDepth, so I've needed to output these values to a custom depth texture. I do a pass over my scene using a framebuffer object (FBO), only rendering out a color that corresponds to a depth value, with the results being stored into a texture. This texture is then loaded into the second half of my rendering process, where the actual screen image is generated. If a fragment at that stage is at the depth level stored in the depth texture for that point on the screen, it is displayed. If not, it is tossed. More about the process, including diagrams, can be found in my post here.
The generation of this depth texture is a bottleneck in my rendering process and I'm looking for a way to make it faster. It seems slower than it should be, but I can't figure out why. In order to achieve the proper generation of this depth texture, GL_DEPTH_TEST is disabled, GL_BLEND is enabled with glBlendFunc(GL_ONE, GL_ONE), and glBlendEquation() is set to GL_MIN_EXT. I know that a scene output in this manner isn't the fastest on a tile-based deferred renderer like the PowerVR series in iOS devices, but I can't think of a better way to do this.
My depth fragment shader for spheres (the most common display element) looks to be at the heart of this bottleneck (Renderer Utilization in Instruments is pegged at 99%, indicating that I'm limited by fragment processing). It currently looks like the following:
precision mediump float;

varying mediump vec2 impostorSpaceCoordinate;
varying mediump float normalizedDepth;
varying mediump float adjustedSphereRadius;

const vec3 stepValues = vec3(2.0, 1.0, 0.0);
const float scaleDownFactor = 1.0 / 255.0;

void main()
{
    float distanceFromCenter = length(impostorSpaceCoordinate);

    if (distanceFromCenter > 1.0)
    {
        gl_FragColor = vec4(1.0);
    }
    else
    {
        float calculatedDepth = sqrt(1.0 - distanceFromCenter * distanceFromCenter);
        mediump float currentDepthValue = normalizedDepth - adjustedSphereRadius * calculatedDepth;

        // Inlined color encoding for the depth values
        float ceiledValue = ceil(currentDepthValue * 765.0);
        vec3 intDepthValue = (vec3(ceiledValue) * scaleDownFactor) - stepValues;

        gl_FragColor = vec4(intDepthValue, 1.0);
    }
}
On an iPad 1, this takes 35 - 68 ms to render a frame of a DNA spacefilling model using a passthrough shader for display (18 to 35 ms on iPhone 4). According to the PowerVR PVRUniSCo compiler (part of their SDK), this shader uses 11 GPU cycles at best, 16 cycles at worst. I'm aware that you're advised not to use branching in a shader, but in this case that led to better performance than otherwise.
When I simplify it to
precision mediump float;

varying mediump vec2 impostorSpaceCoordinate;
varying mediump float normalizedDepth;
varying mediump float adjustedSphereRadius;

void main()
{
    gl_FragColor = vec4(adjustedSphereRadius * normalizedDepth * (impostorSpaceCoordinate + 1.0) / 2.0, normalizedDepth, 1.0);
}
it takes 18 - 35 ms on iPad 1, but only 1.7 - 2.4 ms on iPhone 4. The estimated GPU cycle count for this shader is 8 cycles. The change in render time based on cycle count doesn't seem linear.
Finally, if I just output a constant color:
precision mediump float;

void main()
{
    gl_FragColor = vec4(0.5, 0.5, 0.5, 1.0);
}
the rendering time drops to 1.1 - 2.3 ms on iPad 1 (1.3 ms on iPhone 4).
The nonlinear scaling in rendering time and sudden change between iPad and iPhone 4 for the second shader makes me think that there's something I'm missing here. A full source project containing these three shader variants (look in the SphereDepth.fsh file and comment out the appropriate sections) and a test model can be downloaded from here, if you wish to try this out yourself.
If you've read this far, my question is: based on this profiling information, how can I improve the rendering performance of my custom depth shader on iOS devices?
Based on the recommendations by Tommy, Pivot, and rotoglup, I've implemented some optimizations which have led to a doubling of the rendering speed for both the depth texture generation and the overall rendering pipeline in the application.
First, I re-enabled the precalculated sphere depth and lighting texture that I'd used before with little effect, only now I use proper lowp precision values when handling the colors and other values from that texture. This combination, along with proper mipmapping for the texture, seems to yield a ~10% performance boost.
More importantly, I now do a pass before rendering both my depth texture and the final raytraced impostors where I lay down some opaque geometry to block pixels that would never be rendered. To do this, I enable depth testing and then draw out the squares that make up the objects in my scene, shrunken by sqrt(2) / 2, with a simple opaque shader. This will create inset squares covering area known to be opaque in a represented sphere.
I then disable depth writes using glDepthMask(GL_FALSE) and render the square sphere impostor at a location closer to the user by one radius. This allows the tile-based deferred rendering hardware in the iOS devices to efficiently strip out fragments that would never appear onscreen under any conditions, yet still give smooth intersections between the visible sphere impostors based on per-pixel depth values. This is depicted in my crude illustration below:
In this example, the opaque blocking squares for the top two impostors do not prevent any of the fragments from those visible objects from being rendered, yet they block a chunk of the fragments from the lowest impostor. The frontmost impostors can then use per-pixel tests to generate a smooth intersection, while many of the pixels from the rear impostor don't waste GPU cycles by being rendered.
I hadn't thought to disable depth writes, yet leave on depth testing when doing the last rendering stage. This is the key to preventing the impostors from simply stacking on one another, yet still using some of the hardware optimizations within the PowerVR GPUs.
In my benchmarks, rendering the test model I used above yields times of 18 - 35 ms per frame, as compared to the 35 - 68 ms I was getting previously, a near doubling in rendering speed. Applying this same opaque geometry pre-rendering to the raytracing pass yields a doubling in overall rendering performance.
Oddly, when I tried to refine this further by using inscribed and circumscribed octagons, which should cover ~17% fewer pixels when drawn and be more efficient at blocking fragments, performance was actually worse than when using simple squares for this. Tiler utilization was still less than 60% in the worst case, so maybe the larger geometry was resulting in more cache misses.
EDIT (5/31/2011):
Based on Pivot's suggestion, I created inscribed and circumscribed octagons to use instead of my rectangles, only I followed the recommendations here for optimizing triangles for rasterization. In previous testing, octagons yielded worse performance than squares, despite removing many unnecessary fragments and letting you block covered fragments more efficiently. By adjusting the triangle drawing as follows:
I was able to reduce overall rendering time by an average of 14% on top of the above-described optimizations by switching to octagons from squares. The depth texture is now generated in 19 ms, with occasional dips to 2 ms and spikes to 35 ms.
EDIT 2 (5/31/2011):
I've revisited Tommy's idea of using the step function, now that I have fewer fragments to discard due to the octagons. This, combined with a depth lookup texture for the sphere, now leads to a 2 ms average rendering time on the iPad 1 for the depth texture generation for my test model. I consider that to be about as good as I could hope for in this rendering case, and a giant improvement from where I started. For posterity, here is the depth shader I'm now using:
precision mediump float;

varying mediump vec2 impostorSpaceCoordinate;
varying mediump float normalizedDepth;
varying mediump float adjustedSphereRadius;
varying mediump vec2 depthLookupCoordinate;

uniform lowp sampler2D sphereDepthMap;

const lowp vec3 stepValues = vec3(2.0, 1.0, 0.0);

void main()
{
    lowp vec2 precalculatedDepthAndAlpha = texture2D(sphereDepthMap, depthLookupCoordinate).ra;

    float inCircleMultiplier = step(0.5, precalculatedDepthAndAlpha.g);
    float currentDepthValue = normalizedDepth + adjustedSphereRadius - adjustedSphereRadius * precalculatedDepthAndAlpha.r;

    // Inlined color encoding for the depth values
    currentDepthValue = currentDepthValue * 3.0;
    lowp vec3 intDepthValue = vec3(currentDepthValue) - stepValues;

    gl_FragColor = vec4(1.0 - inCircleMultiplier) + vec4(intDepthValue, inCircleMultiplier);
}
I've updated the testing sample here, if you wish to see this new approach in action as compared to what I was doing initially.
I'm still open to other suggestions, but this is a huge step forward for this application.
On the desktop, it was the case on many early programmable devices that while they could process 8 or 16 or whatever fragments simultaneously, they effectively had only one program counter for the lot of them (since that also implies only one fetch/decode unit and one of everything else, as long as they work in units of 8 or 16 pixels). Hence the initial prohibition on conditionals and, for a while after that, the situation where if the conditional evaluations for pixels that would be processed together returned different values, those pixels would be processed in smaller groups in some arrangement.
Although PowerVR aren't explicit, their application development recommendations have a section on flow control and make a lot of recommendations about dynamic branches usually being a good idea only where the result is reasonably predictable, which makes me think they're getting at the same sort of thing. I'd therefore suggest that the speed disparity may be because you've included a conditional.
As a first test, what happens if you try the following?
void main()
{
    float distanceFromCenter = length(impostorSpaceCoordinate);

    // the step function doesn't count as a conditional
    float inCircleMultiplier = step(distanceFromCenter, 1.0);

    float calculatedDepth = sqrt(1.0 - distanceFromCenter * distanceFromCenter * inCircleMultiplier);
    mediump float currentDepthValue = normalizedDepth - adjustedSphereRadius * calculatedDepth;

    // Inlined color encoding for the depth values
    float ceiledValue = ceil(currentDepthValue * 765.0) * inCircleMultiplier;
    vec3 intDepthValue = (vec3(ceiledValue) * scaleDownFactor) - (stepValues * inCircleMultiplier);

    // use the result of the step to combine results
    gl_FragColor = vec4(1.0 - inCircleMultiplier) + vec4(intDepthValue, inCircleMultiplier);
}
Many of these points have been covered by others who have posted answers, but the overarching theme here is that your rendering does a lot of work that will be thrown away:
1) The shader itself does some potentially redundant work. The length of a vector is likely to be calculated as sqrt(dot(vector, vector)). You don’t need the sqrt to reject fragments outside of the circle, and you’re squaring the length to calculate the depth anyway (see the sketch after these points). Additionally, have you looked at whether or not explicit quantization of the depth values is actually necessary, or can you get away with just using the hardware’s conversion from floating-point to integer for the framebuffer (potentially with an additional bias to make sure your quasi-depth tests come out right later)?
2) Many fragments are trivially outside the circle. Only π/4 of the area of the quads you’re drawing produce useful depth values. At this point, I imagine your app is heavily skewed towards fragment processing, so you may want to consider increasing the number of vertices you draw in exchange for a reduction in the area that you have to shade. Since you’re drawing spheres through an orthographic projection, any circumscribing regular polygon will do, although you may need a little extra size depending on zoom level to make sure you rasterize enough pixels.
3) Many fragments are trivially occluded by other fragments. As others have pointed out, you’re not using the hardware depth test, and therefore not taking full advantage of a TBDR’s ability to kill shading work early. If you’ve already implemented something for 2), all you need to do is draw an inscribed regular polygon at the maximum depth that you can generate (a plane through the middle of the sphere), and draw your real polygon at the minimum depth (the front of the sphere). Both Tommy’s and rotoglup’s posts already contain the state vector specifics.
Note that 2) and 3) apply to your raytracing shaders as well.
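To make point 1) concrete, here is a minimal sketch of what that rewrite could look like. It reuses the varying names from the shaders quoted above plus Tommy's branchless combine, and it leaves out the 765.0 quantization per the question raised in 1), so treat it as an illustration rather than a drop-in replacement:
precision mediump float;

varying mediump vec2 impostorSpaceCoordinate;
varying mediump float normalizedDepth;
varying mediump float adjustedSphereRadius;

void main()
{
    // Squared length is enough for the inside/outside test: no sqrt needed,
    // and dot(v, v) is what length(v) would compute under the hood anyway.
    float distanceSquared = dot(impostorSpaceCoordinate, impostorSpaceCoordinate);
    float inCircleMultiplier = step(distanceSquared, 1.0);

    // The depth term wants sqrt(1 - d^2), so the squared distance is reused
    // directly; max() guards against a negative argument outside the circle.
    float calculatedDepth = sqrt(max(1.0 - distanceSquared, 0.0));
    mediump float currentDepthValue = normalizedDepth - adjustedSphereRadius * calculatedDepth;

    // Branchless combine: white outside the circle (a no-op under GL_MIN
    // blending), raw depth inside. The original stepValues encoding would
    // slot in here unchanged if the explicit quantization turns out to be needed.
    gl_FragColor = vec4(1.0 - inCircleMultiplier) +
                   vec4(vec3(currentDepthValue * inCircleMultiplier), inCircleMultiplier);
}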
I'm no mobile platform expert at all, but I think that what bites you is that:
your depth shader is quite expensive
you experience massive overdraw in your depth pass, since you disable the GL_DEPTH test
Wouldn't an additional pass, drawn before the depth pass, be helpful?
This pass could do a GL_DEPTH prefill, for example by drawing each sphere represented as a quad facing the camera (or a cube, which may be easier to set up), contained within the associated sphere. This pass could be drawn without a color mask or fragment shader, just with GL_DEPTH_TEST and glDepthMask enabled. On desktop platforms, these kinds of passes get drawn faster than color + depth passes.
Then, in your depth computation pass, you could enable GL_DEPTH_TEST and disable glDepthMask; this way your shader would not be executed on pixels that are hidden by nearer geometry.
This solution would involve issuing another set of draw calls, so this may not be beneficial.