Picking Using the Depth Buffer

While under development, the Insight3D development team wrote a series of blog posts, which are now archived here to provide in-depth technical information about some of the underlying techniques and design decisions. These posts reflect the work-in-progress state of the product at the time.

The following blog post was originally published on the Insight3D blog on March 5, 2008 by Patrick Cozzi. At the time of publishing, Insight3D was referred to by its development code-name, "Point Break".

Picking makes 3D applications interactive. It allows users to select and interact with objects in the 3D scene. In applications built with Point Break, common uses of picking include double clicking on an object to zoom to it and right clicking on an object to bring up a context menu with operations that act on the object.

In this blog, I will describe the algorithm Point Break uses to implement picking. The depth buffer, in combination with multipass rendering, is used to identify picked objects. The algorithm works in many cases when others do not, is almost trivial to implement, and provides good performance for scenes with moderate object depth complexity given its use of culling and scissoring.

Goals

Before describing the algorithm, first consider the goals for picking in Point Break.

Picking should actually work. Seriously. Some picking algorithms do not consider per fragment operations such as the alpha test. If a model has a hole in it because the alpha test failed, picking on the hole should not return the model.
It should be trivial for developers implementing primitives to make their primitives pick-able. In addition, individual parts of a primitive should be pick-able. For example, it should be possible to return the exact mesh of a picked model primitive. Although, this is actually more of an architecture issue than an algorithm issue.
Picking should be fast enough to allow roll-over picking, that is, continuously executing a pick as the mouse moves across the screen. This is useful for reporting the current latitude/longitude/altitude or highlighting primitives as the mouse rolls over them.
Drill picking should be possible. That is, the first pick returns the top object, then successive picks at the same screen location return the next object beneath the last one picked.

Many of the well known picking algorithms do not meet these goals or do not meet them cleanly. OpenGL selection and feedback ignores per fragment operations, is a pain to use, and I believe is not hardware accelerated. Furthermore, it is not part of OpenGL 3.0 or OpenGL ES. A attractive alternative is to use the Color Buffer for picking. This can be difficult for primitives that use shaders or textures with alpha channels. The stencil buffer can also be used for picking but this prevents using it for rendering. Finally, casting a ray thru the picked pixel and finding the intersected objects is widely used. Besides requiring a copy of objects in CPU memory, this approach does not consider per fragment operations.

Algorithm

In designing an algorithm to meet our picking goals, the first observation to make is:

We are going to read the depth value of the picked pixel to compute the world space position of the pick.

Furthermore, we should observe:

For most picks, the depth complexity of the picked pixel is relatively low.

To be precise, we don't care so much about the exact depth complexity; we care about the number of objects behind the pixel, which is likely to be far less than the actual depth complexity. Sometimes, only a single model is behind a pixel, e.g. when a satellite is picked. Other times, a model and the earth are behind a pixel. Although a scene with the viewer on the ground looking toward the horizon may have a high object depth complexity.

Since we need to read the depth buffer at least once, we ask:

If we only render the objects behind the picked pixel, can we afford to read the depth buffer after rendering each object? If we clear the depth buffer before each object, an object is picked if it wrote to the depth buffer.

"No, of course not." the naysayers exclaim. glReadPixels forces a pipeline flush and moving data from GPU to CPU memory is slow on all but PCIe buses. How slow? Too slow when 3 reads are required per pick? 10 reads? 1,000?

I didn't know; so I implemented it to find out. The algorithm works as following.

Copy

RenderSceneForPick() 
{ 
   modify the view frustum to just cover the picked pixel; 
   scissor out everything except the picked pixel; 
   for each visible object 
   { 
      clear depth buffer; 
      render object; 
      read depth buffer at pick location; 
      if (depth value != clear value) 
      { 
         add object to list of picked objects; 
      } 
   } 
   sort list of picked objects near to far; 
}

Similar to picking with the color buffer, the scene is rendered into the back buffer, which is never swapped to the front. In addition, it assumes that all primitives write to the depth buffer, even translucent objects (which are rendered back to front without depth buffering for normal scenes). It's hard to come up with a case when an object could not write to the depth buffer. Perhaps, a post-process, per-pixel effect such as fog. Even, then it could write depth just when rendered for a pick. Who wants to pick fog anyway?

The first line of the algorithm needs some explanation.

Copy

modify the view frustum to just cover the picked pixel;

This is crucial for performance. A view frustum is created that still starts at the viewer position and intersects the normal near and far planes but the frustum only intersects the near plane thru the picked pixel. This is used for culling so only objects behind the picked pixel are rendered. A 2D view is shown below. The original frustum is in black and the modified frustum is in red.

When this frustum is used for culling in a scene when the entire earth would be rendered, only a fraction of the earth is rendered for picking as shown below.

This actually shows a bad case; the bounding spheres of three tiles intersect the frustum since the picked pixel is approximately in the center of the image. To reduce the fill rate on both the depth buffer clear and object rendering, the second line of the algorithm scissors everything out except the picked pixel. This means, although all the vertices are processed in the image above, fragments are only processed for the picked pixel.

Odds and Ends

There's still a few more details to consider.

First, objects that write the same depth as the clear value are not picked. In most sane cases, this means objects rendered onto the far plane can be missed. Thankfully, this is almost never an issue and can be fixed by also reading the color buffer.

Second, if the viewer is far away, its possible for a large amount of the scene to be behind a single pixel. For example, the entire earth and everything on it could be behind a single pixel. What does the user think they are picking on anyway? Regardless, view frustum culling is not effective so we turn to a simple solution. Objects that are farther from the viewer then a user defined threshold or do not meet a user defined pixel size are not rendered.

Finally, we've ignored the problem of rubberband picking, where the user drags a box in screen space and wants to pick every object in it. Since the frustum may be large, culling is now less effective. The good news is this doesn't have to execute at highly interactive rates like a single pick used for roll-over picking.

Summary and Future Work

In summary, the depth buffer, in combination with multipass rendering, is used to identify picked objects. This algorithm considers per fragment operations, is straightforward to implement, and provides good performance for most scenes. In a future entry, I may post some performance numbers.

This algorithm is still ripe with potential optimizations. For example, when drill picking doesn't matter and only the front object needs to be picked, can hardware occlusion queries help? Can PBOs improve the performance of reading the depth buffer? If the object depth complexity is high, can each object be written to a different pixel and read back all at once?

In my eyes, the ideal picking algorithm has the ease of use of this algorithm and the speed of traditional color buffer picking with a single read back.