Friday, 2 November 2012

Importance sampling: How to simulate diffraction and OLPF effects

So how exactly does one render a synthetic image that simulates both the effects of diffraction (circular aperture) and the 4-dot blur of a Lithium Niobate optical low-pass filter (OLPF)?

Before jumping into the combined effect, I will start with a method of rendering the effects of diffraction only.

Diffraction simulation using weights on a regular grid

The most intuitive method of rendering diffraction effects is direct convolution. This involves convolution in the spatial domain, i.e., for each pixel in the output image, sample the underlying scene (usually a black square on a white rectangle in my examples) at many points arranged on a fine, sub-pixel-spaced regular grid. Each of these samples is then weighted by an Airy disc function centred on the pixel, and added to that pixel's running total.

This works reasonably well because it is very simple to sample the underlying scene: you just have to determine whether a sample point is inside the black rectangle, or not. The appropriate weights for the Airy disc function are obtained by directly evaluating the appropriately scaled jinc function (see here for an overview).

This regular grid sampling strategy is described in some detail in a previous post. It works well enough for some functions, but it turns out to be a poor choice for the Airy disc function, for which the weights are close to zero almost everywhere outside of the central peak:
The Airy disc function (jinc squared)
Those peripheral rings have very low magnitude compared to the central peak, which means that samples from those regions will have a much smaller influence on our answer than samples from the centre.

Looking at the problem from another angle we see that this discrete sampling and convolution is really just a discrete approximation to the continuous convolution of the Airy function and the underlying scene, sampled at the pixel positions of our synthetic output image. This implies that other techniques, such as Monte Carlo integration, could potentially be applied to evaluate the integral.

Monte Carlo integration

It may sound exotic, but MC integration is a very straightforward concept. Imagine that you want to compute the surface area of a unit circle (radius=1). Fit a square tightly around this circle (say, width=2, centred on circle), and generate (x,y) coordinates randomly (uniform distribution) within this square. For each of these random coordinates that fall inside the circle, increment a counter A.
After a fair number of samples (N), the value A/N will approximate π/4, which is the ratio of the area of the circle to that of the square.
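A minimal sketch of this estimator in Python (the function and variable names are mine):

```python
import random

def estimate_pi(n_samples, seed=0):
    """Estimate pi by uniformly sampling the square [-1,1]x[-1,1]
    and counting the fraction of points that land inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # area(circle) / area(square) = pi / 4, so scale the ratio by 4
    return 4.0 * inside / n_samples

print(estimate_pi(100000))   # converges towards pi ~ 3.14159
```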

This may seem like a roundabout method of computing the area of a circle, but the method is very general. For example, instead of simply incrementing the counter A when a point is inside the circle, we could have evaluated some function f(x,y) at each point inside the circle. This means that we can approximate complicated double integrals over complex, non-convex regions, without breaking a sweat.

Note that this method is very similar to the direct convolution approach described above, and if we assume that our "random" coordinates just happened to fall on a regular grid, then we can see that these two approaches are really very similar. So why would you choose random coordinates over a regular grid?

I can think of two disadvantages to a regular grid: 1) you have to know how finely spaced your grid should be in advance, and 2) your regular grid may interfere with the shape of the region you are integrating over. If you have some way of measuring convergence (say, the variance of your estimate), then you can keep on generating random samples until convergence; with a regular grid, you must sample all the points, or your integral will be severely biased.

Sampling using random (x,y) coordinates is not optimal, though, since random numbers have a tendency of forming clumps, and leaving gaps (they must, or they would not be random!). What works better in practice is a quasi-random sequence of numbers, such as a Sobol sequence or the Halton sequence. These quasi-random sequences will fill the space more evenly, and you can still stop at any point during the integration if convergence has been achieved; they also tend to produce a lower variance in the integral than random sampling for the same number of samples.
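The Halton sequence is easy to generate with the radical-inverse construction; a quick sketch:

```python
def halton(index, base):
    """Radical-inverse (van der Corput) value of `index` in the given base.
    Pairing two coprime bases, e.g. 2 and 3, gives a 2D Halton sequence."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result

# 2D quasi-random points that fill the unit square evenly:
points = [(halton(i, 2), halton(i, 3)) for i in range(1, 6)]
```

The base-2 sequence starts 1/2, 1/4, 3/4, 1/8, ..., repeatedly subdividing the gaps left by earlier samples, which is exactly the space-filling behaviour described above.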

Importance sampling

While uniform random number sequences (and uniform quasi-random sequences) provide a convenient way of choosing new sampling positions for our Monte Carlo integrator, they can be wasteful if the function we are integrating has large regions over which the function value is small compared to the function's maximum, such as the Airy disc function above. What would happen if we could choose our sampling points in a biased manner, such that we choose sampling positions proportional to the function value at those positions?

This is the essence of importance sampling. If we can choose our sampling points according to a distribution that better fits the shape of the function we are integrating, then we can concentrate our samples in the areas that have a larger function value (weight), and are thus more important. This strategy reduces the variance of our MC integration procedure significantly.

Take the Airy disc function as an example: we could choose our sampling positions according to a Gaussian distribution. This will concentrate samples closer to the centre, but we have to be careful to keep the standard deviation of the Gaussian wide enough so that sufficient samples can be generated from the outer rings of the Airy disc function. There is one very important thing to remember when applying importance sampling: you must divide the function value at each sampling position by the probability of observing that sampling position.

Thus, if we generate our sampling points using a Gaussian distribution, then we have a Gaussian pdf p(x,y) = exp(-x^2/(2*sx^2) - y^2/(2*sy^2)), where sx and sy denote the respective standard deviations in x and y. This means we must add the value f(x,y)/p(x,y) to our accumulator, rather than f(x,y)/N, as we would normally do with a regular grid or with uniform sampling. This is what it looks like when we are rendering a grey rectangle on a white background:
The gridlines denote the boundaries of the discrete pixels of our synthetic image. The red dots represent the sampling points of one pixel (the one the arrow points to).
Notice how the density of the sampling points decreases as we move away from the centre of the pixel --- this density is exactly Gaussian.
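The divide-by-the-pdf rule is easy to demonstrate on a 1D toy problem: estimating the integral of exp(-x^2) over the real line (which equals sqrt(pi)) with samples drawn from a Gaussian proposal. A sketch, where the Gaussian plays the role of the sampling density, not of the Airy function:

```python
import math, random

def importance_sample_integral(n_samples, sigma=1.0, seed=0):
    """Estimate the integral of f(x) = exp(-x^2) over the real line by
    drawing x from a Gaussian proposal p(x) and averaging f(x)/p(x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gauss(0.0, sigma)
        f = math.exp(-x * x)
        p = math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        total += f / p        # each sample must be divided by its probability
    return total / n_samples

print(importance_sample_integral(200000))   # converges to sqrt(pi) ~ 1.7725
```

Drop the division by p and the estimate comes out badly wrong, which is the trap the paragraph above warns against.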

Since we are really computing the product of the weighting function and the underlying scene, we are accumulating I(x,y)*f(x,y)/p(x,y), where I(x,y) measures whether a point is inside or outside our target object (black rectangle).

What happens if the distribution of our sampling points matches the distribution of f(x,y) exactly? Then f(x,y) = p(x,y), and we are effectively weighting each point equally, with the sampling density alone achieving the desired weighting. This strategy is optimal, since it makes the most effective use of every single sample. The only way to improve on this is to stratify according to the scene content as well, but that makes things a bit complicated.

Importance sampling and the Airy disc

So how do you generate quasi-random (x,y) coordinates with a distribution that matches the Airy disc function? The same way you generate points from a Gaussian: by inverting the cumulative distribution. This technique is called the "inverse transform sampling method". For a Gaussian, you can use Moro's inversion, but I am not aware of any fitted polynomials for inverting the cumulative distribution of the Airy disc function. What now?

Well, I decided to use a look-up table to approximate the cumulative distribution of the Airy disc function. Since the function is radially symmetrical, this is just a 1-dimensional look-up table, which I have implemented as a piecewise-linear function. Thus, given a pair of uniform variates (x,y) in the range [0,1] x [0,1], you can obtain a sample following the Airy disc function density by choosing an angle θ = 2π * x, and a radial distance r by looking up the value y in the cumulative distribution of the unit Airy disc function. Scaling for wavelength, pixel pitch and f-number can be performed on r afterwards.

There is only one small trick, though: if you generate a polar 2D coordinate as [r cos(θ), r sin(θ)], where r has a uniform distribution, you will end up with more points close to the centre than on the outer rim. You want to partition the circular disc into parts of equal area as a function of radius, which means that your r must first be transformed to r' = √r. This is critical, or your true distribution of points will differ from your assumed distribution, and your weighting of samples will be biased.
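The effect of the √r transformation can be verified numerically. A sketch that samples the unit disc uniformly (for a truly uniform disc, the mean radius works out to 2/3):

```python
import math, random

def uniform_disc_point(u, v):
    """Map uniform variates (u, v) in [0,1]^2 to a point uniformly
    distributed on the unit disc; r = sqrt(u) gives equal-area annuli."""
    r = math.sqrt(u)          # omit the sqrt and points clump at the centre
    theta = 2.0 * math.pi * v
    return (r * math.cos(theta), r * math.sin(theta))

rng = random.Random(0)
pts = [uniform_disc_point(rng.random(), rng.random()) for _ in range(200000)]
mean_r = sum(math.hypot(x, y) for x, y in pts) / len(pts)
# mean_r converges to 2/3 for a uniform disc
```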

To apply this to the Airy disc function cumulative distribution table, we just go back to basics. The cumulative distribution as a function of the radius r can be approximated as a finite sum:
F(r_n) = F(r_(n-1)) + (2 jinc(r_n))^2 * (π r_n^2 - π r_(n-1)^2)
where r_n is simply our discrete sample along the radius (something like r_n = n/N). This looks like a simple Riemann sum, with the important change being that our "width" parameter is not a linear function of r_n, but in fact quadratic. This small change ensures that outer radii are assigned an area-proportionally larger weight, so that we can generate our sampling positions in polar coordinates without biasing them towards the centre.
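A sketch of building such a table, assuming the unit Airy disc is defined through jinc(r) = J1(πr)/(πr) so that the first zero lands near r ≈ 1.22. To keep the sketch dependency-free, J1 is evaluated from its integral representation, and the table is much coarser than one would use in practice:

```python
import math

def bessel_j1(x):
    """J1 via its integral form J1(x) = (1/pi) * Int_0^pi cos(t - x sin t) dt,
    evaluated with the trapezoidal rule (accurate enough for a LUT sketch)."""
    n = 400
    h = math.pi / n
    total = 0.5 * (math.cos(0.0) + math.cos(math.pi - x * math.sin(math.pi)))
    for k in range(1, n):
        t = k * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def jinc(r):
    """jinc(r) = J1(pi*r) / (pi*r), with jinc(0) = 1/2 by the limit."""
    if r == 0.0:
        return 0.5
    return bessel_j1(math.pi * r) / (math.pi * r)

def airy_cdf_table(r_max=10.0, n=1000):
    """Accumulate F(r_n) = F(r_(n-1)) + (2 jinc(r_n))^2 * pi*(r_n^2 - r_(n-1)^2),
    i.e. with area-proportional ring weights, then normalise F(r_max) to 1."""
    radii = [r_max * k / n for k in range(n + 1)]
    cdf = [0.0]
    for k in range(1, n + 1):
        ring = math.pi * (radii[k] ** 2 - radii[k - 1] ** 2)
        cdf.append(cdf[-1] + (2.0 * jinc(radii[k])) ** 2 * ring)
    return radii, [c / cdf[-1] for c in cdf]

radii, cdf = airy_cdf_table()
```

The resulting table is monotone nondecreasing, exactly what inverse transform sampling requires.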

Summary of Airy disc function sampling

To briefly recap, here is the recipe for simulating the effects of diffraction:
  1. Generate a cumulative distribution of the unit Airy disc function and store it in a look-up table.
  2. Generate N (x,y) coordinate pairs in the range [0,1] x [0,1] using a quasi-random sequence such as the Halton sequence.
  3. Transform these coordinates to an Airy disc distribution by
    θ = 2π * x
    r = F^(-1)(y), i.e., the radius obtained by looking up y in the cumulative distribution table
    (x',y') = [r cos(θ), r sin(θ)]
  4.  For each pixel, add the pixel centre coordinates to each sampling point (x',y') to obtain (x",y").
  5. Evaluate the scene at (x",y"), thus accumulating I(x",y") * f(x",y")/p(x",y").
  6. Repeat steps 4-5 for all pixels in the target image.
You may wonder about the scaling term f(x",y")/p(x",y"), which seems superfluous given that we expect this value to be equal to 1.0. Well, since I have a discrete approximation of the density (the LUT), I decided to use the actual probability from the LUT as p(x",y"), and the Airy disc function as f(x",y"). This way, if there are any residual errors in the approximation, the weighting should correct for them.
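The recipe can be sketched end-to-end. For brevity, this sketch substitutes a Gaussian radial profile for the Airy disc table (the LUT inversion and accumulation work identically), renders a single pixel, and uses an ideal edge through the origin as the scene; all names are mine:

```python
import math

def halton(i, b):
    """Radical-inverse value of index i in base b (quasi-random in [0,1))."""
    r, f = 0.0, 1.0
    while i > 0:
        f /= b
        r += f * (i % b)
        i //= b
    return r

# Step 1: tabulate the radial CDF of the PSF. A Gaussian radial profile
# (density ~ exp(-r^2/2) per unit area) stands in for the Airy disc here.
N_TAB, R_MAX = 2000, 6.0
radii = [R_MAX * k / N_TAB for k in range(N_TAB + 1)]
cdf = [0.0]
for k in range(1, N_TAB + 1):
    ring = math.pi * (radii[k] ** 2 - radii[k - 1] ** 2)
    cdf.append(cdf[-1] + math.exp(-radii[k] ** 2 / 2.0) * ring)
cdf = [c / cdf[-1] for c in cdf]

def invert_cdf(y):
    """Piecewise-linear inversion of the tabulated CDF (the radius of step 3)."""
    lo, hi = 0, N_TAB
    while hi - lo > 1:                      # binary search for the segment
        mid = (lo + hi) // 2
        if cdf[mid] < y:
            lo = mid
        else:
            hi = mid
    span = cdf[hi] - cdf[lo]
    t = 0.0 if span == 0.0 else (y - cdf[lo]) / span
    return radii[lo] + t * (radii[hi] - radii[lo])

def render_pixel(cx, cy, n_samples=20000):
    """Steps 2-6 for one pixel: accumulate I * f/p over the samples.
    Scene: I = 1 to the right of the vertical edge x = 0, else 0."""
    acc = w_sum = 0.0
    for i in range(1, n_samples + 1):
        theta = 2.0 * math.pi * halton(i, 2)             # step 3 (angle)
        r = invert_cdf(halton(i, 3))                     # step 3 (radius)
        x, y = cx + r * math.cos(theta), cy + r * math.sin(theta)  # step 4
        w = 1.0   # f/p == 1 when the samples follow the PSF density exactly
        acc += (1.0 if x >= 0.0 else 0.0) * w            # step 5
        w_sum += w
    return acc / w_sum

# a pixel centred on the edge comes out near 0.5; one far inside, near 1.0
```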

This algorithm can be illustrated as follows:
Notice the typical Airy pattern rings that are formed by the higher-density regions of our sampling point distribution.

One tiny detail has been omitted so far: the diameter of the region that we will sample. The magnitude of the Airy pattern drops off fairly rapidly, and it is tempting to only build our look-up table for the range 0 <= r <= 5. This yields very good sampling of the centre of the Airy pattern, and thus the higher frequencies in the MTF curve of our synthetic image. Unfortunately, such severe truncation distorts the lower frequencies of the MTF curve noticeably. I have obtained reasonable results with 0 <= r <= 45, storing roughly 20000 points in my look-up table.

Convolution of Airy PSF and OLPF

Unfortunately, we are not done yet. For a sensor without an OLPF, we must still convolve the Airy PSF with the pixel PSF (typically a box function) to obtain the desired value for a given pixel. There are two ways of doing this: 1) convolve a sampled Airy PSF with a sampled box PSF to produce a sampled combined PSF, or 2) sample the scene using importance-sampled points, but perform the box function convolution with the scene at each sampling point.

The first method is simple and straightforward, but suffers from all the usual disadvantages of regular grid sampling. It requires a very fine grid to produce good results; somewhere around 1050625 samples per pixel in my experience. The second method is really quite efficient if we have a good method of performing the box function convolution efficiently.

As it turns out, the convolution of a box function centred at a specific coordinate with our target object is just the area of intersection between the polygon defining the box function, and the rectangle defining our target object (provided, of course, that our target is a rectangle). I relied on the Sutherland-Hodgman polygon clipping routine to clip the box function polygon with the target rectangle's polygon, which is quite efficient. Here is an illustration of such a box function intersecting our polygon, with the box function (in blue) just happening to align with the pixel grid:
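A compact sketch of Sutherland-Hodgman clipping plus the shoelace area formula (not MTF Mapper's actual implementation):

```python
def clip_polygon(subject, clip):
    """Sutherland-Hodgman: clip `subject` against the convex polygon `clip`.
    Both are lists of (x, y) vertices in counter-clockwise order."""
    def inside(p, a, b):   # p is left of (or on) the directed edge a->b
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0.0

    def intersect(p, q, a, b):   # intersection of segment p-q with line a-b
        dpx, dpy = q[0] - p[0], q[1] - p[1]
        dcx, dcy = b[0] - a[0], b[1] - a[1]
        denom = dpx * dcy - dpy * dcx
        t = ((a[0] - p[0]) * dcy - (a[1] - p[1]) * dcx) / denom
        return (p[0] + t * dpx, p[1] + t * dpy)

    output = subject
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        input_list, output = output, []
        for j in range(len(input_list)):
            p, q = input_list[j], input_list[(j + 1) % len(input_list)]
            if inside(q, a, b):
                if not inside(p, a, b):
                    output.append(intersect(p, q, a, b))
                output.append(q)
            elif inside(p, a, b):
                output.append(intersect(p, q, a, b))
    return output

def polygon_area(poly):
    """Shoelace formula (unsigned area)."""
    return 0.5 * abs(sum(poly[i][0] * poly[(i + 1) % len(poly)][1]
                         - poly[(i + 1) % len(poly)][0] * poly[i][1]
                         for i in range(len(poly))))

# box function polygon clipped against the target rectangle:
box = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
target = [(0.5, 0.5), (2.0, 0.5), (2.0, 2.0), (0.5, 2.0)]
overlap = polygon_area(clip_polygon(box, target))   # 0.5 x 0.5 = 0.25
```

The general polygon form pays off later: a non-square photosite aperture only changes the `box` vertex list.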

The importance sampling algorithm from the previous section remains largely unchanged: in step 5, the evaluation of I(x",y") now simply denotes the result of the box function convolution, i.e., the area of overlap between a box function (width 1 pixel) centred at (x",y"), and the target rectangle.

Finally, to render a 4-dot OLPF blur, such as that effected by a Lithium Niobate AA filter, you simply take the average of four samples at the coordinates (x" ± 0.375, y" ± 0.375), assuming of course a split distance of 0.375 pixels (or total spread of 0.75 pixels). Each sample thus requires four polygon intersection calculations, like this:
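Since both the box function and the target rectangle are axis-aligned in this illustration, the four-sample average can be sketched with a simple rectangle-intersection helper instead of general polygon clipping (a simplification made here for brevity):

```python
def rect_overlap(ax0, ay0, ax1, ay1, bx0, by0, bx1, by1):
    """Area of intersection of two axis-aligned rectangles."""
    w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    h = max(0.0, min(ay1, by1) - max(ay0, by0))
    return w * h

def olpf_sample(cx, cy, target, split=0.375):
    """Average the unit box-aperture response over the four OLPF dot
    positions (cx +/- split, cy +/- split); `target` is (x0, y0, x1, y1)."""
    total = 0.0
    for dx in (-split, split):
        for dy in (-split, split):
            x, y = cx + dx, cy + dy
            total += rect_overlap(x - 0.5, y - 0.5, x + 0.5, y + 0.5, *target)
    return total / 4.0

# a pixel well inside a large target rectangle integrates to 1.0
value = olpf_sample(0.0, 0.0, (-10.0, -10.0, 10.0, 10.0))
```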

This approach is conceptually simple, and fairly flexible. The main disadvantage is that rendering times will increase by roughly a factor of four. Fortunately, the larger support of the 4-dot OLPF PSF means that the synthetic image rendered with it will be smoother, which means we can reduce the number of samples required to obtain a reasonable result.

One more advantage: since this rendering approach implements the photosite aperture as a polygon intersection, it is trivial to model different aperture designs. For example, the default choice of a "gapless" photosite aperture is not entirely realistic, since practical sensors typically do not have 100% fill factors. As pointed out by one of the MTF Mapper blog readers, modern "gapless" microlens designs still suffer from attenuation in the corners, resulting in a near-circular photosite aperture.


We have time for a quick demonstration of the various PSF types, using a photosite pitch of 4.73 micron, i.e., like the Nikon D7000, assuming green light at a wavelength of 0.55 micron. Here are some synthetic images:
a) Gaussian PSF with sd=0.57 pixels, mtf50=0.33
b) f/8 circular aperture + square pixel aperture, mtf50=0.337

c) f/8 circular aperture diffraction + 4-dot OLPF + square pixel aperture, mtf50=0.26
Note that the MTF50 values of examples (a) and (b) above are almost the same, and unsurprisingly, the images also look very much the same. Sample (c) looks just a tad softer --- exactly what we would expect image (b) to look like after adding an OLPF.

It seems like quite a lot of effort to simulate images with PSFs that correspond to diffraction effects, only to end up with images that look like those generated with Gaussian PSFs.


That is probably enough for one day. In a future post I will provide more information on rendering time and accuracy.

All the algorithms discussed here have been implemented in the mtf_generate_rectangle tool included in the MTF Mapper package from version 0.4.12 onwards. See the documentation on the "-p" option, which now includes "gaussian", "airy", "airy-box" and "airy-4dot-olpf" PSF types.

Tuesday, 23 October 2012

Ultimate macro photography camera?

The problem

If you have ever played around with macro photography at a magnification of 1x or more, you will have encountered the curse of shallow Depth of Field (DOF). It is often desirable in portrait photography to isolate the subject by having only the subject in focus, with the background nicely out of focus, i.e., you want relatively shallow DOF.

Unfortunately, there is such a thing as too little DOF, where it becomes difficult to keep the entire subject in focus, or at least the parts you would like to keep in focus. This problem pops up in macro photography all the time. Consider this example shot at 2.8x magnification (and then cropped a bit):
An ant at 2.8x magnification, lens aperture f/4
Depending on your preference, you may decide that the DOF in this shot is just fine. Personally, I would have liked more of the hair running down the centre line of the ant's head, especially between the mandibles and the antennae, to be in focus.

Normally, you can just stop down the lens to increase DOF. Unfortunately, there is the small matter of diffraction that gets in your way. To fully appreciate the problem, first consider the way in which magnification affects the effective aperture (also called the "working aperture"). The aperture of the lens has to be scaled according to magnification, so that
Ne = N * (1 + m)
where Ne denotes the effective aperture, N denotes the aperture as set on the lens, and m is the magnification ratio. At 1:1 magnification, which is usually considered the start of the "macro" range, the value of m is equal to 1.0. This means that the image projected onto the sensor is physically the same size as the object being photographed. (Please note that this equation is an approximation that assumes that the pupil ratio is equal to 1.0; many real lenses have pupil ratios that differ significantly from 1.0, and a different form of the equation is required to include the pupil ratio in such cases, but the overall behaviour of the equation is unchanged. For simplicity, I assume a pupil ratio of 1.0 in this article.)
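In code, the effective aperture rule is a one-liner (a sketch; the pupil ratio is assumed to be 1.0, as in the text):

```python
def effective_aperture(n_lens, magnification):
    """Effective (working) aperture: N_e = N * (1 + m), pupil ratio 1.0."""
    return n_lens * (1.0 + magnification)

print(effective_aperture(4, 1.0))    # f/8 at 1:1 magnification
print(effective_aperture(4, 2.8))    # f/15.2, as in the ant example
print(effective_aperture(22, 2.8))   # f/83.6
```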

For non-macro photography, the value of m is usually small, around 0.1 or less, which implies that the effective aperture Ne is approximately equal to N, the aperture selected on the lens. Under these conditions, you can just stop down the lens to increase DOF, at least to around f/11 or f/16 on modern sensors with roughly 4.8 micron photosite pitch values. Going beyond f/11 on these sensors will increase DOF, but diffraction softening will start to become visible.

In the macro domain, though, it is an entirely different kettle of fish. The shot of the ant above serves as an example: at 2.8x magnification, with the lens set to f/4, we obtain an effective aperture of f/15.2. If we stop down the lens just a little bit, say to f/5, we are already at f/19. The minimum aperture of the lens I used is f/22, which gives us a mind-bogglingly small effective aperture of f/83.6. Keep in mind that we were running into visible diffraction softening even at a lens aperture of f/4.

How bad is this diffraction softening? Well, assuming an aberration-free lens that is in its diffraction-limited range, the f/4 lens on a D7000 body would give us a maximum resolution of 39.6 line pairs per mm. A 12x8 inch print would thus render at 3 lp/mm (divide the 39.6 by the print magnification factor of ~13), which is barely acceptable at approximately 152 DPI.

Bumping up the f-number to f/22 on the lens gives us only 8.7 lp/mm of resolution on the sensor, or 0.675 lp/mm (34 DPI) in print. To reach roughly 152 DPI we can print at a maximum size of 2.6x1.75 inch; anything larger will look visibly softer than the f/4 example in the preceding paragraph.
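These resolution figures can be reproduced with a simple separable MTF model: diffraction (circular aperture) times the square pixel aperture (a sinc) times the 4-dot OLPF (a cosine). A sketch, assuming the D7000's 4.73 micron pitch and a 0.375-pixel OLPF split; the exact model behind the quoted numbers is my assumption:

```python
import math

PITCH_MM = 0.00473     # D7000 photosite pitch (4.73 micron)
WAVELEN_MM = 0.00055   # green light
OLPF_SPLIT = 0.375     # OLPF dot offset, in pixels (assumed)

def mtf(freq, n_eff):
    """System MTF at `freq` (lp/mm): diffraction x pixel aperture x OLPF."""
    fc = 1.0 / (WAVELEN_MM * n_eff)                 # diffraction cut-off
    s = min(freq / fc, 1.0)
    diffraction = (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))
    x = math.pi * PITCH_MM * freq
    pixel = 1.0 if x == 0.0 else math.sin(x) / x    # square pixel (sinc)
    olpf = abs(math.cos(2.0 * math.pi * freq * OLPF_SPLIT * PITCH_MM))
    return diffraction * pixel * olpf

def mtf50(n_eff):
    """Bisect for the frequency where the MTF drops to 0.5; every factor
    is decreasing over this interval, so the crossing is unique."""
    lo, hi = 0.0, 1.0 / (WAVELEN_MM * n_eff)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mtf(mid, n_eff) > 0.5 else (lo, mid)
    return lo

print(mtf50(15.2))   # ~39.6 lp/mm: f/4 at 2.8x magnification
print(mtf50(83.6))   # ~8.7 lp/mm: f/22 at 2.8x magnification
print(mtf50(15.2) / (304.8 / 23.6) * 2 * 25.4)   # ~3 lp/mm in a 12-inch print, ~150 DPI
```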

So how much DOF do we have at 2.8x magnification with the lens set to f/4? Only about 76 micron, or less than 1/10th of a millimetre. Stopping down the lens to f/22 increases DOF to 418 micron, or 0.418 mm. I would consider a DOF of 0.3 mm to be workable for many insects, depending on their pose relative to the focus plane.

To summarise: At magnifications of greater than 1x, we can increase DOF by stopping down the lens, but the effective aperture quickly becomes so small that diffraction softening destroys much of the detail we were hoping to capture.

Can we work around this problem?

Defining "depth of field"

Compact cameras usually have significantly more DOF than SLR cameras. The explanation of this is actually quite straightforward, if rarely heard. But first we must define what we mean by depth of field. (You can safely skip ahead to the next section if you are confident that you know what DOF is.)

Shallow DOF is not the same thing as a blurred-out background; shallow DOF just means that the region of the image which will be acceptably sharp is shallow. Nothing is said about the character of the part of the image which is not in focus. The common misconception that long focal lengths produce shallow DOF is based on the confusion of these concepts: lenses with longer focal lengths produce more blurry backgrounds, but their DOF is actually very similar to lenses with shorter focal lengths after you have moved back to compensate for the smaller field of view.

DOF is only meaningful in the context of specific viewing conditions. First up, DOF only pertains to the region of the image which is acceptably sharp under the conditions in which the final image is viewed. This is usually taken to be a given print size viewed at a specified distance. Working back from what constitutes an acceptably sharp print, we arrive at the definition of the circle of confusion (CoC). In other words, we take a single, sharp point in the print, and "project" this back onto the sensor. This projected point forms a little disc on the sensor, with a diameter equal to the CoC.

Reasoning backwards, we can see that at the sensor, any image feature smaller than the disc formed by the CoC will be compressed to a single point in the final print. Any feature larger than the CoC will be printed across more than one dot in print, and will thus be visible to the viewer. This ties the CoC to our definition of the region of acceptable sharpness: point light sources that are in front of (or behind) the exact plane of focus (in the scene) will project onto the sensor as small discs. If these discs are smaller than the CoC, then they will appear in focus. If the point source is moved away from the exact plane of focus, it will reach the distance where the disc that it forms on the sensor first matches, and then exceeds, the CoC, at which point it will start to appear blurry.

The DOF is thus the range of distances between which image features are rendered as points in the print, i.e., appear acceptably sharp. At this point it should be clear that the size of the print relative to the size of the sensor has a direct impact on the perceived DOF in print, since a small-sensor image will have to be magnified more to reach the desired print size, compared to the image formed on a large sensor (with the same field of view). This implies that the CoC of a small sensor will be proportionally smaller, i.e., the CoC for a 35 mm sensor size ("full-frame") is usually taken as 0.03 mm, while a compact camera with a 5.9 mm sensor width will have a CoC of 0.0049 mm.
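The CoC scaling amounts to a one-line helper (a sketch; 0.03 mm is the full-frame convention quoted above):

```python
FULL_FRAME_WIDTH_MM = 36.0
FULL_FRAME_COC_MM = 0.03     # the conventional full-frame CoC

def circle_of_confusion(sensor_width_mm):
    """Scale the conventional full-frame CoC by the ratio of sensor widths."""
    return FULL_FRAME_COC_MM * sensor_width_mm / FULL_FRAME_WIDTH_MM

print(circle_of_confusion(5.9))    # ~0.0049 mm for the 5.9 mm compact sensor
print(circle_of_confusion(23.6))   # ~0.0197 mm for a DX sensor
```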

Why small sensors have more DOF

So why does a small-sensor compact camera appear to have a large depth of field? The physical size of the aperture is the key concept. An f/4 lens with a focal length of 105 mm will have a physical aperture diameter (actually, entrance pupil) of 26.25 mm. A 42 mm lens at f/4 will only have an aperture diameter of 10.5 mm.

If the physical diameter of the aperture is small, the DOF will be large; just think of how a pinhole camera works. The catch is of course that a pinhole camera will suffer from diffraction softening, but you can hide this softening if your film/sensor is large enough that you do not have to magnify the image in print (i.e., contact prints).

A small sensor requires a lens with a shorter focal length to achieve the same field of view as another lens fitted to a large sensor. For example, both the 105 mm lens and the 42 mm lens will produce the same field of view if we attach them to an APS-C sized sensor and a 5.9 mm wide sensor, respectively. The 5.9 mm sensor, though, will have a larger DOF because it has a smaller physical aperture.

Note that you can substitute a change in position for the change in focal length. Say you use a 50 mm lens on a full-frame camera at f/2.8. To achieve the same subject size on an APS-C (crop) camera, you would have to move further backwards to match the field of view of the full-frame camera. You can keep the lens at f/2.8, but the full-frame image will have a shallower depth of field, even though we did not change the physical aperture size. The trick is to realize that DOF is also a function of subject distance (magnification), so that the APS-C camera will have increased depth of field because it is further from the subject.

It is instructive to play with VWDOF (available here) to observe these effects first-hand.

A small-sensor dedicated macro camera at 1:1

Can we solve the problem of insufficient macro DOF by using a small sensor?
Will diffraction prevent us from obtaining sufficient detail with such small photosites?

The idea is simple: What would happen if you increased the linear pixel density of your DX camera by a factor of four (each pixel is replaced by a 4x4 group of pixels)? We would end up with extremely small photosites, but still within the limits of what is currently possible. Then, instead of increasing the magnification of our lens, which would decrease our effective aperture a lot, we just crop out the centre 25% of our higher-resolution image. This effectively gives us an additional 4x magnification. If we are going to crop out the centre 25% of every image, then we might just as well use a smaller sensor. So, we keep the same number of pixels, but we use a sensor that is only 1/4 the size of our initial sensor (DX, in this case). Once we have a smaller sensor, we can replace our lens with a shorter focal length lens, which will be smaller, lighter, hopefully cheaper, but also easier to build to achieve exceptional sharpness. To illustrate this idea, a practical example now follows.

Suppose we have a subject that is 28.3 mm wide (roughly the width you can achieve with a 105 mm lens at 1x magnification on a Nikon DX sensor). What we want to achieve is to capture the exact same subject, but with different sensor sizes. To achieve this, we will have to vary both the focal length and the lens magnification factor.

To illustrate this concept, I will define three (hypothetical) cameras:
  1. DX: 4980 pixels over 23.6 mm sensor width,
    focal length = 105 mm,
    required magnification = 1x,
    lens aperture = f/4,
    effective aperture = f/8
  2. FX: 4980 pixels over 36 mm sensor width,
    focal length = 127 mm,
    required magnification = 1.525x,
    lens aperture = f/4,
    effective aperture = f/10
  3. 1/4DX: 4980 pixels over 5.9 mm sensor width,
    focal length = 42 mm,
    required magnification = 0.25x,
    lens aperture = f/4,
    effective aperture = f/5

With these specifications, all the cameras will capture the subject at a distance of 210 mm, with the same field-of-view (FOV) of roughly 7.71 degrees, and the same subject size relative to the image frame. The 1/4DX camera is just what the name implies: scale down the sensor size by a factor of four in each dimension, but keep the same number of pixels. You can calculate the photosite pitch by dividing the sensor width by the number of pixels; the 1/4DX sensor would have a pitch of 1.18 micron, which is close to what is currently used in state-of-the-art cellphone sensors.
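The pitch and field-of-view arithmetic can be sketched directly (function names are mine):

```python
import math

def photosite_pitch_um(sensor_width_mm, pixels_across):
    """Pitch in micron: sensor width divided by the number of pixels."""
    return sensor_width_mm / pixels_across * 1000.0

def fov_degrees(subject_width_mm, distance_mm):
    """Horizontal field of view that frames the subject at this distance."""
    return 2.0 * math.degrees(math.atan(subject_width_mm / (2.0 * distance_mm)))

print(photosite_pitch_um(5.9, 4980))   # ~1.18 micron for the 1/4DX sensor
print(fov_degrees(28.3, 210))          # ~7.71 degrees
```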

We define DOF the same way for all the cameras, i.e., being able to produce the same relative circle of confusion when looking at a final print size of 12 inches wide. The actual circle of confusion for the 1/4DX camera will have to be much smaller (1/4 of the DX CoC) to compensate for the fact that we have to magnify the image 4x larger than the DX image to arrive at a 12 inch print.

Computing the actual DOF using VWDOF:
  1. DX:      0.314 mm
  2. FX:      0.261 mm
  3. 1/4DX: 0.784 mm
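VWDOF presumably implements the full lens equations, but the classic macro approximation DOF ≈ 2 * N_e * c / m^2 (with the CoC c scaled from the 0.03 mm full-frame convention by sensor width) reproduces these values closely; a sketch:

```python
def dof_mm(n_lens, magnification, sensor_width_mm):
    """Macro depth-of-field approximation: DOF = 2 * N_e * c / m^2, with
    N_e = N * (1 + m) and c scaled from the 0.03 mm full-frame CoC."""
    coc = 0.03 * sensor_width_mm / 36.0
    n_eff = n_lens * (1.0 + magnification)
    return 2.0 * n_eff * coc / magnification ** 2

print(dof_mm(4, 1.0, 23.6))     # DX:    ~0.314 mm
print(dof_mm(4, 1.525, 36.0))   # FX:    ~0.26 mm
print(dof_mm(4, 0.25, 5.9))     # 1/4DX: ~0.79 mm
```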

This is looking promising. The 1/4DX sensor gives us roughly 2.5 times more DOF (compared to DX) in the final print. The FX sensor gives us less DOF, which is more or less what we expected.

Using the MTF Mapper tool mtf_generate_rectangle, we can model the MTF50 values after taking into account the softening effect of diffraction (MTF50 values are a quantitative metric that correlates well with perceived sharpness). This allows us to calculate how sharp the lens will have to be for the 1/4DX camera to work, as well as what our final resolution will be once we have scaled the images up to 12-inch prints.

The actual resolution of the final print, which arguably is the thing we want to maximise, turns out as follows:
  1. DX:       4.31 line pairs / mm
  2. FX:       4.6 lp/mm
  3. 1/4DX:  2.5 lp/mm

What happened? Diffraction destroyed a lot of our resolution on the 1/4DX sensor. In short, we decreased our photosite pitch by a factor four, which means that diffraction softening will now already be visible at f/4 (probably by f/2.8, even). We expected our smaller pixels to be able to capture smaller details, but diffraction blurred out the details to the point where very little detail remained at the pixel level.
We can try to remove the AA filter on the 1/4DX sensor (which we arguably no longer need at f/4 with a 1.18 micron photosite pitch) to win back a little resolution, which will give us 2.7 lp/mm in print. Not a huge gain, but we might as well use it.

One tiny little detail is also quite important: the 1/4DX camera would require a really, really sharp lens: around 129 lp/mm. I think this is still possible, but we'll need a lens designer's opinion.

So going for a smaller sensor with 4 times higher (linear) pixel density does indeed give you more DOF in a single shot, provided that you keep the lens aperture the same. The price for this is that the sharpest part of the printed picture will be noticeably less sharp than the prints produced on the DX camera. DOF increased by a factor of 2.5, but overall resolution decreased by a factor of 1.724.

We could take our DX camera and stop it down to roughly f/9.1 on the lens, producing an effective aperture of f/18.2. This would produce comparable resolution to the 1/4DX camera (around 2.7 lp/mm in a 12-inch print), and the DOF would be 0.713 mm, which is ever so slightly less than what the 1/4DX camera would produce.

Detours at 1:1

Ok, so if we lose resolution but gain DOF by stopping down the aperture, what would happen if we opened up the aperture on the 1/4DX camera a bit?

Well, at a lens aperture of f/1.6, the effective aperture would be f/2. This would produce a 12-inch print at a resolution of 5.5 lp/mm, which is higher than any of the other options above. But the DOF is exactly 0.314 mm, so we are no better off in that respect, except that we have increased our final print resolution slightly.

A slightly less extreme aperture of f/2.25 would give us an effective aperture of f/2.8, which will match the print resolution of the DX camera, and give us 0.441 mm of DOF. That is a decent 40% increase in DOF over the DX camera, and still gives us the same print resolution.

Proceeding along this path, we can choose a lens aperture of f/3.2, resulting in an effective aperture of f/4, yielding a print resolution of 3.29 lp/mm. This is about 30% less resolution, but we have a DOF of 0.627 mm. The DX camera will reach the same print resolution (3.29 lp/mm) at an aperture of f/6.8 (effective aperture f/13.5), which will yield a DOF of 0.533 mm, so the 1/4DX camera is looking less attractive at this point.

What happens at 2.8x magnification?

We can repeat the process at 2.8x magnification, which gives us the following cameras:
  1. DX: 4980 pixels over 23.6 mm sensor width,
    focal length = 105 mm,
    required magnification = 2.8x,
    lens aperture = f/4,
    effective aperture = f/15.2
  2. 1/4DX: 4980 pixels over 5.9 mm sensor width,
    focal length = 58.5 mm,
    required magnification = 0.7x,
    lens aperture = f/2.6,
    effective aperture = f/4.4
Note that the aperture for 1/4DX was chosen so that the print resolution of the 1/4DX camera matches that of the DX camera. With these settings, the DX camera has a DOF of 0.076 mm, and the 1/4DX camera has a DOF of 0.0884 mm, so no real improvement in DOF if we keep the print resolution the same.
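For readers who want to check these numbers: the effective apertures quoted throughout follow the usual bellows-factor rule, effective f-number = lens f-number × (1 + magnification). A quick sketch (the helper function is my own illustration, not something from the post):

```python
def effective_aperture(lens_aperture, magnification):
    """Bellows factor: the effective f-number grows with magnification."""
    return lens_aperture * (1.0 + magnification)

# DX camera at 2.8x magnification, lens aperture f/4
print(effective_aperture(4.0, 2.8))   # ~15.2, i.e., f/15.2
# 1/4DX camera at 0.7x magnification, lens aperture f/2.6
print(effective_aperture(2.6, 0.7))   # ~4.42, i.e., f/4.4
```

The same rule reproduces the 1:1 figures earlier in the post, e.g., f/9.1 × (1 + 1) = f/18.2.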


There does not appear to be a free lunch here. It is possible to increase the DOF of a 1/4DX camera relative to that of the DX camera, but it requires an extremely sharp lens. The lens will fortunately be quite small, so it may be feasible to construct such a lens.

The price we pay for the increased DOF is that we have much smaller photosites, which will have a negative impact on image quality in the form of noise. Specifically, photon shot noise is related to the full-well capacity of a photosite, which in turn is linked to the photosite size. So while the 1/4DX camera may be able to offer slightly more DOF at just the right settings (e.g., lens aperture set to f/2.25), the trade-offs in terms of noise are unlikely to make this approach attractive.

It would be interesting to explore the parameter space more methodically. For example, what if we try a 1/2DX camera instead? I am betting against it, but I probably should run the numbers and see ...

Monday, 18 June 2012

Combining box filters, AA filters and diffraction: Do I need an AA filter?

I have been building up to this post for some time now, so it should not surprise you too much. What happens when we string together the various components in the image formation chain?

Specifically, what happens when we combine the square pixel aperture, the sensor OLPF (based on a 4-dot beam splitter) and the Airy function (representing diffraction)? First off, this is what the MTF curves of our contestants look like:
The solid black curve represents a combined sensor OLPF (4-dot beam splitter type) + pixel aperture + lens MTF (diffraction only) model. This was recently shown to be a good fit for the D40 and D7000 sensors. The dashed blue curve represents the MTF of the square pixel aperture (plus diffraction), i.e., a box filter as wide as the pixel. The dashed red curve illustrates what a Gaussian MTF (plus diffraction) would look like, fitted to have an MTF50 value that is comparable to the OLPF model. Lastly, the solid vertical grey line illustrates the remaining contrast at a frequency of 0.65 cycles per pixel, which is well above the Nyquist limit at 0.5 cycles per pixel (dashed vertical grey line).

Note how both the Gaussian and the OLPF model have low contrast values at 0.65 cycles per pixel (0.04 and 0.02, respectively), while the square pixel aperture + lens MTF, representing a sensor without an AA filter, still has a contrast value of 0.27. It is generally accepted that patterns at a contrast below 0.1 are not really visible in photos. That illustrates how the OLPF successfully attenuates the frequencies above Nyquist, but how does this look in a photo?

Ok, but how would it affect my photos visually?

I will now present some synthetic images to illustrate how much (or little) anti-aliasing we obtain at various apertures, both with and without an AA filter. The images will look like this:

The left panel is a stack of four sub-images (rows) separated by white horizontal bars. Each sub-image is simply a pattern of black-and-white bars, with both black and white bars being exactly 5 pixels wide (in this example). The four stacked sub-images differ only in phase, i.e., in each of the four rows the black-and-white pattern of bars is offset by a horizontal distance between 0 and 1 pixels in length.

The right panel is a 2x magnification of the left panel. Note that the third row in the stack is nice and crisp, containing almost pure black and pure white. The other rows have some grey values at the transition between the black and white bars, because the image has been rendered without any anti-aliasing.
These images are rendered by sampling each pixel at 2362369 sub-pixel positions, weighting each sampled point with the relevant point spread function.

The aliasing phenomenon known as frequency folding was illustrated in a previous post. When a scene contains patterns at a frequency exceeding the Nyquist limit (the highest frequency representable in the final image), the patterns alias, i.e., the frequencies above Nyquist appear as patterns below the Nyquist limit, and are in fact indistinguishable from real image content at that frequency. Here is a relevant example, illustrating how a frequency of 0.65 cycles per pixel (cycle length of 1.538 pixels) aliases onto a frequency of 0.35 cycles per pixel (cycle length of 2.857 pixels) if no AA filter is present:
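The folding arithmetic itself is simple enough to sketch in a few lines of Python (the helper below is my own illustration; frequencies are in cycles per pixel, so Nyquist sits at 0.5):

```python
def alias_frequency(f):
    """Fold a frequency (cycles/pixel) back into the representable band [0, 0.5]."""
    f = f % 1.0                 # folding is periodic in the sampling rate
    return f if f <= 0.5 else 1.0 - f

f = 0.65                        # above Nyquist
fa = alias_frequency(f)         # ~0.35: the frequency it masquerades as
print(1.0 / f, 1.0 / fa)        # cycle lengths: ~1.538 and ~2.857 pixels
```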
This set was generated at a simulated aperture of f/1.4, which does not attenuate the high frequencies much. Observe how the two images in the "No OLPF" column look virtually the same, except for a slight contrast difference; it is not possible to tell from the image whether the original scene contained a pattern at 1.538 pixels per cycle, or 2.857 pixels per cycle.

The "4-dot OLPF" column shows a clear difference between these two cases. If you look closely you will see some faint stripes in the 2x magnified version at 1.538 pixels per cycle, i.e., the OLPF did not completely suppress the pattern, but attenuated it strongly.

If we repeat the experiment at f/4, we obtain this image:
At f/4, we do not really see anything different compared to the f/1.4 images, except an overall decrease in contrast in all the panels.

Ok, rinse & repeat at f/8:
Now we can see the contrast in the "No OLPF" column, at 1.538 pixels per cycle, dropping noticeably. Diffraction is acting as a natural AA filter, effectively attenuating the frequencies above Nyquist.

Finally, at f/11 we see some strong attenuation in the sensor without the AA filter too:
You can still see some clear stripes (top left panel) in the 2x magnified view, but in the original size sub-panel the stripes are almost imperceptible.


So there you have it. A sensor without an AA filter can only really attain a significant increase in resolution at large apertures, where diffraction is not attenuating the contrast at higher frequencies too strongly. Think f/5.6 or larger apertures.

Unfortunately, this is exactly the aperture range in which aliasing is clearly visible, as shown above. In other words, if you have something like a D800E, you can avoid aliasing by stopping down to f/8 or smaller, but at those apertures your resolution will be closer to that of the D800. At apertures of f/5.6 and larger, you may experience aliasing, but you are also likely to have better sharpness than the D800.

Not an easy choice to make.

Personally, I would take the sensor with the AA filter.

Nikon D40 and D7000 AA filter MTF revisited

In an earlier post, I showed some early MTF measurements taken with both the D40 and the D7000 at the centre of a Sigma 17-50 mm f/2.8 lens.
In this post, I revisit those measurements, presenting some new results for the D7000, and a model of the sensor AA filter (or OLPF).

Optical Low Pass Filters (OLPFs) in DSLRs

I have noticed previously that the measured MTF of the D40 was fairly close to a Gaussian, but that there were some small systematic discrepancies that were not accounted for. But how do you build an optical filter that has a Gaussian transfer function?

It turns out that the OLPFs in DSLRs are not really Gaussian, but something much simpler: beam splitters. A slice of Lithium Niobate crystal, cut at a specific angle, presents a different index of refraction depending on the polarization of the incident light. Let us assume that a beam of horizontally polarized light passes through straight, for the sake of the argument. Vertically polarized light, on the other hand, refracts (bends) as it enters the crystal, effectively taking a longer path through the crystal. As the vertically polarized light leaves the crystal, it refracts again to form a beam parallel to the horizontally polarized beam, but displaced sideways by a distance dependent on the thickness of the crystal.

Using a single layer of Lithium Niobate crystal, you can split a single beam into two parallel beams separated by a distance d, which is typically chosen to match the pixel pitch of the sensor. Since this happens for all beams, the image leaving the OLPF is the sum (average) of the incoming image and a shifted version of itself, translated by exactly one pixel pitch.

If you stack two of these filters, with the second one rotated through 90 degrees, you effectively split a beam into four, forming a square with sides equal to the pixel pitch (but often slightly less than the pitch, to improve resolution). A circular polariser is usually inserted between the two Lithium Niobate layers to "reset" the polarisation of the light before it enters the second Niobate layer.

Combining pixel aperture MTF with the OLPF MTF

So how does this beam-splitter effect the desired blurring? We can compute the combined PSF by convolving the beam-splitter impulse response with the pixel aperture impulse response (a box function).

The beam splitter is represented as four impulses, i.e., infinitely thin points. Basic Fourier transform theory tells us that the Fourier transform of a symmetric pair of impulses is a cosine function, so the MTF of the 4-dot beam splitter works out to a product of two cosine functions, one along each axis.

The "default" 4-way beam splitter impulse response filter can be denoted as the sum of four diagonally-placed impulses, i.e.,
f(x,y) = δ(x-d, y-d) + δ(x+d, y+d) + δ(x-d, y+d) + δ(x+d, y-d)
where δ(x,y) represents a 2D Dirac delta function, which is non-zero only if both x and y are zero, and d represents the OLPF split distance. More sophisticated OLPF designs are possible (e.g., 8-way splitters), but the 4-way designs appear to be popular. In my notation here, the distance between two beams would be 2d; this is to accommodate my usual notation of a pixel being defined over the area [-0.5, -0.5] to [0.5, 0.5].

The degree of blur is controlled by the d parameter, with d = 0 yielding no blurring, and d = 0.5 giving us a two-pixel-wide blur. Since d can be varied by controlling the thickness of the Lithium Niobate layers, a manufacturer can fine-tune the strength of the OLPF for a given sensor.

Convolving a square pixel aperture with four impulse functions is straightforward: just sum four copies of the box filter PSF, each shifted by d in the required direction. For d = 0.375, we obtain the following PSF:

in two dimensions, or
in three dimensions. Not exactly a smooth PSF, but then neither is the square pixel aperture.

Simulating the combined effects of diffraction, OLPF and pixel aperture

We can derive the combined PSF directly by convolving the diffraction, OLPF and pixel aperture PSFs. Note that this combined PSF is parameterized by wavelength, lens aperture, pixel pitch, and OLPF split distance. For a wavelength of 0.55 micron, an aperture of f/4, a pixel pitch of 4.73 micron and an OLPF split distance of 0.375 pixels (=2.696 micron), we obtain the following PSF:
in two dimensions, or
in three dimensions. Notice how diffraction has smoothed out the combined OLPF and pixel aperture PSF (from previous section).

Using this PSF we can generate synthetic images of a step edge. The MTF of this synthetic edge image can then be compared to a real image captured with a given sensor to see how well this model holds up.

Enter the Death Star

You can certainly measure lens or sensor sharpness (MTF) using any old chart that you printed on office paper, but if you are after resolution records, you will have to take a more rigorous approach. One of the simplest ways of obtaining a reasonably good slanted edge target is to blacken a razor blade with candle soot. This gives you a very straight edge, and very good contrast, since the soot does not reflect much light.

That leaves only two other questions: what do you use as a background against which you will capture the razor blade, and how do you illuminate your target?

Previously I have used an SB600 speedlight to illuminate my razor blade, which was mounted on a high-grade white paper backing. This produced reasonably good results, but it did not maximize contrast because the flash was lighting the scene from the front. There is also a possibility that the D7000 cycles its mirror when using a flash in live view mode, which could lead to mirror slap. So the flash had to go.

In its place I used a home made integrating sphere, which I call the "Death Star" (sorry George, please do not sue me). Here is what it looks like with the razor blade mounted over the integrating sphere's port:
Yes, I know the highlight is blown out ....
Why use an integrating sphere? Well, the integrating sphere produces perfectly uniform diffuse lighting, which is ideal for creating a uniform white backdrop for the razor blade.

In addition, my home made integrating sphere produces enough light to require a shutter speed of 1/200 s at f/4, ISO100 to prevent blowing out the highlights. This is perfect for minimizing the influence of vibration (although higher shutter speeds would be even better).

Using this set-up, I then capture a number of shots focused in live view, using a remote shutter release. It appears that with this target, the AF is really accurate, since I could not really do much better with manual focus bracketing. One day I will get a focusing rail, which will make focus bracketing much simpler.

Lastly, all MTF measurements on the razor edge were performed using MTF Mapper. The "--bayer green" option was used to measure MTF using only the green photosites, thus avoiding any problems with Bayer demosaicing. The raw files were converted using dcraw's "-D" option.

D40 MTF and OLPF model

Here is the MTF plot of a D40 razor image captured at f/4 (manually focus bracketed), Bayer green channel only:

D40 green channel MTF, Sigma 17-50 mm f/2.8 at f/4
The D40's PSF was modelled by convolving the square pixel aperture (7.8 micron wide), the 4-point beam splitter (d=0.375 pixels), and the Airy function (f/4). A synthetic image of a step edge with this PSF was generated, and measured using MTF Mapper.

Purely judging the match between the model and the measured MTF by eye, one would have to conclude that the model captures the interesting parts of the MTF rather well. The measured MTF is slightly below the model, which is most likely caused by a smidgen of defocus.

The fact that the model fits so well could also be taken to imply that the Sigma 17-50 mm f/2.8 lens is relatively aberration-free at f/4, i.e., it is diffraction limited in the centre.

MTF50 resolution came in at 0.334 cycles per pixel, or 42.84 line pairs per mm, or 671 line pairs per picture height.

D7000 MTF and OLPF model

Here is the MTF plot of a D7000 razor image captured at f/4 (AF in live view), Bayer green channel only:

D7000 green channel MTF, Sigma 17-50 mm f/2.8 at f/4
The D7000's PSF was modelled by convolving the square pixel aperture, the 4-point beam splitter (d=0.375 pixels), and the Airy function (f/4). A synthetic image of a step edge with this PSF was generated, and measured using MTF Mapper.

The measured MTF does not fit the model MTF quite as well as it did in the D40's case. Given that the physical linear resolution is 60% higher, it is correspondingly harder to obtain optimal focus. The shape of the measured MTF relative to the model MTF is consistent with defocus blur.

The actual resolution figures are impressive: MTF50 is 0.304 cycles per pixel, or about 64 lp/mm, or equivalently, 992 line pairs per picture height.

If the model is indeed accurate, it would mean that the D7000 can theoretically obtain resolution figures close to 68 lp/mm at f/4 in the green channel, provided that the lens is purely diffraction limited, and perfectly focused.


Perhaps this is not such a surprising result, but it appears that Nikon is using the same relative strength of AA filter in both the D40 and the D7000; this can be deduced from the fact that both the D40 and the D7000 OLPF models fitted best with an OLPF split distance of 0.375 pixels.

The somewhat unexpected result, for me at least, was that the MTF shape is so sensitive to perfect focus. Specifically, it seems that the first zero of the MTF curve, at around 0.6875 cycles per pixel, is not readily visible unless focus is perfect. The zero is quite clear in the D40 curve, but not quite so visible in the D7000 curve. You are extremely unlikely to achieve this kind of focus in real world photography, though.


1. Russ Palum, Optical Antialiasing filters, in Single-Sensor Imaging: Methods and Applications for Digital Cameras, Edited by Rastislav Lukac, CRC Press 2008

Wednesday, 6 June 2012

D800E versus diffraction

In a previous post (here), I have illustrated how diffraction through a circular aperture can be modelled either in the spatial domain as a point spread function (PSF), or in the frequency domain as a modulation transfer function (MTF). I will now put these models to use to investigate the influence of diffraction on the resolution that can be achieved with the D800E at various apertures.

Simulating the effect of diffraction

I will not go into the maths behind the diffraction MTF; this was discussed in another post (here). For now, it is sufficient to understand that we can combine the diffraction MTF with the sensor's MTF through multiplication in the frequency domain.

Assume for the moment that the D800E effectively does not have an AA filter (in practice, this might not be entirely true, i.e., the D800E may just have a very weak AA filter compared to other cameras). This allows us to model the pixel's MTF curve as a sinc(x), as was shown in a previous post. Next, we assume that the lens is diffraction limited, i.e., the other lens aberrations are negligible, and thus the lens MTF is just the diffraction MTF.
For a D800(E) pixel pitch of 4.88 micron, and an aperture of f/8, we obtain the following combined MTF curve:
D800E combined MTF at f/8
The dashed grey curve represents the sensor's MTF, and the black curve represents the diffraction MTF. The blue curve is the product of these two curves, and represents the combined diffraction-and-sensor MTF.
At f/8, our peak MTF50 value will be 0.344 c/p, or 70.4 lp/mm. Note that this is still higher than what I measured on a D7000 at f/5, which peaked at about 0.29 c/p (61 lp/mm), but the D7000 has an AA filter.
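The MTF50 figures quoted in this post can be reproduced by multiplying the pixel aperture MTF (a sinc) with the diffraction MTF (the chat function derived in an earlier post) and finding the 50% crossing. A sketch under the same assumptions (λ = 0.55 micron, pixel pitch 4.88 micron, diffraction-limited lens; the helper names are mine):

```python
import math

def chat(s):
    """Diffraction MTF of a circular aperture; s is normalised frequency,
    and the MTF is zero beyond the diffraction cutoff at s = 1."""
    if s >= 1.0:
        return 0.0
    return (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))

def combined_mtf(f, N, wavelength=0.55, pitch=4.88):
    """Pixel aperture MTF (sinc) times diffraction MTF; f in cycles/pixel."""
    pixel = 1.0 if f == 0 else abs(math.sin(math.pi * f) / (math.pi * f))
    return pixel * chat((wavelength / pitch) * N * f)

def mtf50(N):
    lo, hi = 1e-6, 0.999   # combined MTF decreases monotonically on this range
    for _ in range(60):    # plain bisection for the 50% contrast crossing
        mid = 0.5 * (lo + hi)
        if combined_mtf(mid, N) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

f50 = mtf50(8.0)
print(f50, f50 * 1000.0 / 4.88)  # close to the 0.344 c/p (70.4 lp/mm) quoted above
```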

Moving to even smaller apertures will cost us resolution, thus at f/11 the curve looks like this:
D800E combined MTF at f/11
At f/11, MTF50 peaks at only 0.278 c/p, or 57 lp/mm. This is still extremely crisp, although you might barely be able to see the difference compared to f/8 under ideal conditions. Pushing through to f/16:
D800E combined MTF at f/16
Note how close the combined MTF curve and the diffraction MTF curve have now become; this indicates that diffraction is starting to dominate the MTF curve, and thus also resolution. At f/16, MTF50 has dropped to 0.207, or about 42.3 lp/mm, which is not bad, but quite far from the 70 lp/mm we achieved at f/8.

What about going in the other direction? Here is what happens at f/5.6:
D800E combined MTF at f/5.6
MTF50 now reaches 0.412 c/p, or 84.4 lp/mm. At f/4 (not shown as a plot) we get 0.465 c/p (95.3 lp/mm), and so on. Below f/4 we will start seeing the residual aberrations of the lens take over, which will reduce effective resolution. I have no model for those yet, so I will stop here for now.

Ok, so I will go one step further. Here is the MTF plot at f/1.4, but keep in mind that for a real lens, other lens aberrations will alter the lens MTF so that it is no longer diffraction limited. But this is what it would have looked like if those aberrations were absent:
D800E combined MTF at f/1.4
Off the charts! MTF50 will sit at 0.557 c/p, or 114.1 lp/mm. The pixel MTF and the combined MTF are now very similar, which is to be expected, since diffraction effects are now almost negligible. Now if only they could build this lens ...

In closing

These results seem to support the suggestions floating around on the web that the D800E will start to visibly lose sharpness after f/8, compared to what it achieves at f/5.6. But this does not mean that f/11 is not sharp, since 57 lp/mm is not something to be sneezed at! Even more importantly, there is no "magical f-stop" after which diffraction causes the resolution to drop; diffraction will lower resolution at all f-stop values. The balance between diffraction blur and blur caused by other lens aberrations tends to cause lens resolution to peak at a certain aperture (around f/4 to f/5.6 for many lenses), but even at f/1.4 you will lose resolution to diffraction, just not a lot.

There are also some claims that the D700 was usable at f/16, but now suddenly the D800E will not be usable at f/16 any more. This is not true. If we compare a hypothetical D700E with our hypothetical D800E above, we see that the D800E attains an MTF50 value of 42.3 lp/mm at f/16, and the hypothetical D700E would reach only 37.2 lp/mm.

The real D700 has an AA filter. If we approximate the strength of this filter as a Gaussian with a standard deviation of 0.6246, then the D700 would only reach an MTF50 of 25.6 lp/mm at f/16. A similar approximation of the AA filter for the D800 would produce an MTF50 of 34.4 lp/mm at f/16. So the D800 (or D800E) will always capture more detail than the D700 at all apertures. The D800E is perfectly usable at f/16, and more so than the D700. 

[Incidentally, the diffraction + Gaussian AA filter approximation used here appears to be quite accurate. Roger Cicala's Imatest results on the D800 and D700 with the Zeiss 25 mm f/2 (see here) agree with my figures. From Roger's charts, we see the D800 at f/5.6 achieves 1200 lp/ph, or about 50.06 lp/mm, compared to my figure of 50.7 lp/mm. The D700 at f/5.6 attains roughly 750 lp/ph (31.38 lp/mm) in Roger's test, and my model predicts 31.9 lp/mm.]

The catch, though, is that the D700's MTF50 at f/16 is 0.216 c/p (25.6 lp/mm), whereas the D800's MTF50 at f/16 is 0.168 c/p (34.4 lp/mm). The apparent per-pixel sharpness of the D700 will exceed that of the D800 at 100% magnification on-screen. If you view them at the same size, though, the D800 will be somewhat sharper.

An introduction to diffraction

Incorporating diffraction

A while ago I posted an article on the MTF of a sensor without an AA filter in the absence of diffraction (here if you've missed it). By not including the effects of diffraction, we could see how the lack of an AA filter produced significant aliasing, which shows up visually as false detail.

Now it is time to add the effects of diffraction back in. I do not have the required background in physics to explain this from first principles, but we can probably blame diffraction on the wave properties of light. If we take a beam of light and pass it through a narrow vertical slit, and then project it onto a surface, we do not see quite what we expected.

Instead of seeing this:

we actually see this:

Intensity pattern of diffraction through narrow slit

Of course, to illustrate this principle I had to compress the intensity of the diffraction image quite a bit, since the secondary stripes are much dimmer than the main one. Making the slit narrower causes the intensity pattern to spread out more.
The intensity pattern that we observe is in fact sinc²(x), and looks like this:
Intensity profile of narrow slit, sinc²(x)
Recall that we first observed the sinc(x) function, which is defined as
sinc(x) = sin(x) / x,
when we computed the Fourier transform of the box function. Note that a cross-section of the vertical slit (along constant y values) is in fact a box function. Why is the intensity function sinc²(x) rather than sinc(x)? Again, I lack the physics background to explain this properly, but it is squared because we are dealing with incoherent light, which is a safe assumption for typical photographic light sources. Coherent light sources, such as lasers, will have a sinc(x) diffraction pattern upon passing through a narrow slit.
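Using the sin(x)/x convention above, the slit intensity profile is easy to evaluate numerically; note the first dark fringe at x = π (a small sketch of my own):

```python
import math

def sinc(x):
    """sinc as defined in the post: sin(x)/x, with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(x) / x

def slit_intensity(x):
    """Diffraction intensity profile of a narrow slit (incoherent light)."""
    return sinc(x) ** 2

print(slit_intensity(0.0))       # 1.0: the bright central stripe
print(slit_intensity(math.pi))   # ~0.0: the first dark fringe at x = pi
```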
If we have a square hole (aperture) rather than a vertical slit, we will observe this intensity pattern:
Intensity pattern of square aperture

which is simply sinc²(x)*sinc²(y).

If we want to introduce a circular aperture, like the one in a real lens, then we can no longer treat x and y separately. A circular aperture produces an intensity pattern that looks like this:
Intensity pattern of circular aperture (Airy pattern)
This pattern is also known as the Airy pattern, with the bright central circular area (before first black ring) known as the Airy disc. Again, a smaller aperture causes the size of this disc to increase, and the entire pattern to spread out.

Although x and y can no longer be treated independently, we can transform them to polar coordinates to obtain r = √(x² + y²), which yields the intensity function jinc²(r) where
jinc(r) = 2*BesselJ(r)/r
and BesselJ is a Bessel function of the first kind, of order 1 (see Wikipedia). Since this function is radially symmetrical (as can be seen in the intensity image above) we can plot it simply as a function of r:
Intensity profile (radial) of circular aperture, jinc²(r)

The result is that the circular aperture behaves somewhat like a square aperture, except that instead of observing a sinc²(x) function, we instead observe a jinc²(r). Note that the side-lobes of the jinc²(r) function are much smaller than those of the sinc²(x) --- you can probably guess what effect this will have on MTF.

Diffraction MTF

So what does the MTF curve of a diffraction pattern look like? For the first case, namely the MTF of the diffraction pattern of a vertical slit, we can perform the analysis in one dimension, to obtain the following curve:
MTF of sinc²(x) PSF

If you have guessed that it looks an awful lot like a straight line, then you guessed correctly. Recall that the intensity function of the vertical slit is sinc²(x). The Fourier transform of sinc(x) is a box function (see here for a recap), and vice versa, so the MTF of sinc(x) is just a box function in the frequency domain. Since multiplication in the time (or spatial) domain is equivalent to convolution in the frequency domain, we can see that the Fourier transform of sinc²(x) must be the convolution of a box function with itself in the frequency domain. Lastly, recall that the convolution of a box filter with itself (once only) yields a triangle function, and we are only plotting the positive frequencies in the MTF plot, i.e., you can mirror the MTF around the y axis to see the triangle more clearly, like this:
MTF of sinc²(x) PSF, showing negative and positive frequencies
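The box-convolved-with-box observation is easy to verify with a discrete box (a throwaway sketch; the convolve helper is my own, stdlib only):

```python
def convolve(a, b):
    """Plain discrete convolution, full output."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

box = [1.0] * 5
tri = convolve(box, box)
print(tri)  # [1, 2, 3, 4, 5, 4, 3, 2, 1] -> a triangle function
```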

This result extends to the MTF of the diffraction pattern of a square aperture, since x and y can simply be treated independently. As you have probably guessed, the 2D MTF of the square aperture looks like this:

MTF (left) and PSF (right) of square aperture
Note that this MTF shape can also be obtained through the convolution of two 2D box functions.

You may be tempted to treat the circular aperture in the same way, but trying to compute the MTF in one dimension will not give you the correct result. If you can visualise the MTF of the square aperture as the convolution of a box function with itself, then you can visualise the MTF of the circular aperture as the convolution of two "pillbox" functions that look like this:

A pillbox function
The convolution of two of these pillbox functions will give you the MTF on the left:

MTF (left) and PSF (right) of circular aperture
which is called the Chinese hat function, or chat(x). We can take a cross-section through the centre of this 2D function to obtain this plot:
chat(f) = MTF of jinc²(r)
Note that this function is very nearly a triangle function --- except that it curves a bit. This should not be totally surprising, since you can imagine that you could fit a square box function inscribed in the cylinder formed by the pillbox function.

If we focus only on the positive half of the frequency spectrum, the function can be expressed as
chat(s) = 2/π * (acos(s) - s*√(1 - s²))
which is the form that you will usually see in optics literature. Now we know what the MTF curve looks like, so we can obtain the corresponding point spread function by computing the inverse Fourier transform of the MTF curve.
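For reference, this positive-frequency form is straightforward to evaluate directly (a small sketch of my own; I treat the MTF as zero beyond the diffraction cutoff at s = 1):

```python
import math

def chat(s):
    """Chinese hat function: the diffraction MTF of a circular aperture,
    for normalised spatial frequency s >= 0."""
    if s >= 1.0:
        return 0.0
    return (2.0 / math.pi) * (math.acos(s) - s * math.sqrt(1.0 - s * s))

print(chat(0.0))   # ~1.0: full contrast at zero frequency
print(chat(0.5))   # ~0.391: a little below the 0.5 a pure triangle would give
print(chat(1.0))   # 0.0: the diffraction cutoff
```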

If you perform this inversion in two dimensions, and then take a cross-section through the result, you will obtain the known intensity function, jinc²(r).
Right. Now sample the jinc²(r) function in one dimension, and apply the Fast Fourier Transform (FFT) to obtain the 1D MTF:
Trying to obtain chat(f) by computing FFT of jinc²(r)
Whoops! That does not work as expected. You have to perform the calculations in 2D for this to work. It only took me a (tortuous) weekend to discover this useful fact. If only someone had warned me ...

Other useful facts regarding diffraction

Just as with a Gaussian, the width of the MTF function is inversely proportional to the width of the PSF, i.e., smaller apertures produce larger diffraction patterns.

Diffraction is wavelength-dependent. If jinc²(r) is your PSF, then r should be expressed as
r = π * d / (λ * N)
where d is your radial distance from the optical axis, as measured in the focal plane, λ is the wavelength, and N is the f-number. If you are modelling the diffraction PSF in terms of pixels, then you can express d in pixels, but you must adjust λ accordingly, such that
rp = π * dp / ( (λ/Δ) * N)
where Δ denotes your pixel pitch, expressed in the same units as λ. For example, the Nikon D800 has a pixel pitch of 4.88 micron, and green light has a wavelength of 550 nm, or 0.55 micron. This means that the diffraction PSF expressed in D800 pixel-units would be
rp ≈ π * dp / ( 0.112705 * N).

If we want to obtain the equivalent function in the frequency domain, i.e., the diffraction MTF, we use the scale
sp = (λ/Δ) * N * fp
where fp is the frequency scale in cycles per pixel, and sp is the normalized frequency we plug into the equation
chat(sp) = 2/π * (acos(sp) - sp*√(1 - sp²)).
For the D800 pixel size, we would use
sp ≈ 0.112705 * N * fp
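Putting the numbers in: for the D800's 4.88 micron pitch and green light, λ/Δ works out to the 0.112705 factor above. A quick check (the s_p helper name is my own):

```python
wavelength = 0.55   # micron, green light
pitch = 4.88        # micron, D800 pixel pitch

scale = wavelength / pitch
print(round(scale, 6))    # 0.112705, the factor used in the text

def s_p(N, f_p):
    """Normalised frequency to feed into chat(), for f-number N and
    frequency f_p in cycles per pixel."""
    return scale * N * f_p

# the diffraction cutoff (s_p = 1) in cycles per pixel at f/8:
print(round(1.0 / (scale * 8.0), 3))   # 1.109 c/p, just above 1 cycle/pixel
```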

In a future post, some examples of MTF curves, incorporating diffraction, will be investigated using the D800E as example.

Monday, 28 May 2012

Steve, Otto and the green-screen

This post is not about MTF Mapper or lens sharpness. This story is about Steve.

And about chroma keying, commonly called "green screen" or "blue screen" effects. This technique is mostly needed by Hollywood. You cannot have an actor hold a spirited argument with another actor in a car while driving in actual traffic. Nor can you quickly pop out to outer space to get a few shots of the starship Enterprise.

The typical solution is to capture the footage of your foreground objects, typically the actors or your model starship, against a uniformly coloured green background. Then you assign a transparency value (called an alpha channel) to each pixel in your image so that where the green background is showing through, you have full transparency. Hopefully the pixels that comprise the actor will have zero transparency, but you have to allow for partial transparency in the actor's hair, for example.

The problem can thus be stated as follows:
The green screen problem: Given a shot of the foreground object (actor or whatnot) against a uniform green (or blue) background, compute the correct transparency value for each pixel in the image.

Of course, you have to ensure that your foreground object does not have any parts that are close to the background colour, or these parts will become transparent.

James Blinn and A.R. Smith, in a paper titled "Blue screen matting", have shown that the green screen problem is in fact an underdetermined problem, meaning that you have more variables to determine than you have measurements. In the simplest case, you know the background colour, in RGBA format: (Rb, Gb, Bb, 1.0). Note that the background is fully opaque, so its alpha is exactly 1.0. The foreground object's colour is (Rf, Gf, Bf, Af), where it is assumed that the RGBA values have already been pre-multiplied by the alpha (opacity) value. This means that the colour of a pixel in our captured image is simply

(Ri, Gi, Bi, 1.0) = (Rf, Gf, Bf, Af) + (1-Af)*(Rb, Gb, Bb, 1.0)

We can break this down to give

Ri = Rf + (1-Af)*Rb

and similarly for the other two colours. This gives us three equations with four unknown values (Rf, Gf, Bf, Af), which cannot be solved uniquely.
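The compositing model above is easy to sanity-check numerically. Here is a minimal numpy sketch (the colour values are made up purely for illustration):

```python
import numpy as np

# A green backing, fully opaque: (Rb, Gb, Bb, 1.0)
background = np.array([0.1, 0.8, 0.1, 1.0])
# A half-transparent foreground, already pre-multiplied by Af = 0.5
foreground = np.array([0.3, 0.2, 0.1, 0.5])

# Composite: each component is foreground + (1 - Af) * background
alpha_f = foreground[3]
composite = foreground + (1.0 - alpha_f) * background

print(composite)  # note that the alpha component works out to exactly 1.0
```

Note that the alpha component of the composite is Af + (1-Af)*1.0 = 1.0, i.e., the captured image is fully opaque, which is exactly why we cannot read Af back out of a single image.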

So, despite the fact that the maths shows you that green-screen techniques cannot work, Hollywood insists on using it! Of course, if you are willing to add some assumptions (constraints) to the above equations, you can find some unique solutions. One of these constraints is that the foreground colour is distinct from the background colour, e.g., no green actors. Even under such limiting assumptions we often find that the results are not really all that great; just think of the flying-on-the-broomstick scene from Harry Potter and the Order of the Phoenix movie.

Blinn and Smith have shown that unique solutions can be obtained by capturing the foreground object against two different backgrounds. You also have to capture an image of each background without any foreground objects. Although this is somewhat of a bother, it allows you to obtain much better results in practice. Remember that the foreground object opacity, Af, is the same in the two images against different backgrounds, but that the apparent foreground object colour (Rf, Gf, Bf) will not necessarily be the same if the object is not fully opaque. Combining the equations from both images thus gives us six equations with six unknowns, which fortunately will have a unique solution, provided that the two backgrounds really are different at each pixel.

The solution suggested by Blinn and Smith goes as follows:
Let D = (Ri,1 - Ri,2, Gi,1 - Gi,2, Bi,1 - Bi,2)
where  Ri,1 denotes the red value of our pixel of interest in image 1, and Ri,2 denotes the red value of the same pixel in image 2. In other words, D is the difference between the two input images containing the foreground objects against the two different backgrounds.
Similarly, let E = (Rb,1 - Rb,2, Gb,1 - Gb,2, Bb,1 - Bb,2)
denote the difference between the two background-only images, without the foreground objects, i.e., the pair of background-only shots you have to capture as part of this method.
The alpha value for a pixel is thus given by
Af = 1 - (D · E) / (E · E)
where  (D · E) denotes the dot product between the two vectors.
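This alpha computation vectorises nicely. Here is a sketch in Python with numpy (my original implementation was in C++ with OpenCV; the function and argument names here are my own):

```python
import numpy as np

def alpha_from_two_backings(img1, img2, bg1, bg2, eps=1e-8):
    """Blinn & Smith's two-background alpha estimate, per pixel.
    All inputs are float arrays of shape (H, W, 3), values in [0, 1]."""
    D = img1 - img2                    # difference between the two composites
    E = bg1 - bg2                      # difference between the two backings
    num = np.sum(D * E, axis=2)        # per-pixel dot product D . E
    den = np.sum(E * E, axis=2)        # per-pixel E . E
    alpha = 1.0 - num / np.maximum(den, eps)
    return np.clip(alpha, 0.0, 1.0)
```

Where the two backings happen to be identical at a pixel, E · E is zero and the alpha is undefined; the eps guard merely avoids a division by zero, which is another way of seeing why the two backgrounds must genuinely differ everywhere.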

And that is all there is to this method; you can reconstruct the foreground object's red component as
Rf = (Ri,1 - (1.0 - Af)*Rb,1)
and similarly for the other two colours.
Now you can recombine the foreground objects with a new background as
Rn = Rf + (1.0 - Af)*Rk
where Rk denotes the red component of the new background image. Just apply the same pattern for blue and green components, and you are done. Remember, Rf is pre-multiplied with Af, which explains why the re-blending with a new background appears in this form.
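Continuing the numpy sketch from above, foreground recovery and re-blending against a new background take only two lines (again, the names are my own):

```python
import numpy as np

def recomposite(img1, bg1, alpha, new_bg):
    """Recover the pre-multiplied foreground from shot 1, then blend it
    over a new background. img1/bg1/new_bg: (H, W, 3); alpha: (H, W)."""
    a = alpha[..., np.newaxis]          # broadcast alpha over the RGB channels
    fg = img1 - (1.0 - a) * bg1         # pre-multiplied foreground (Rf, Gf, Bf)
    return fg + (1.0 - a) * new_bg      # Rn = Rf + (1 - Af) * Rk
```

Because fg is pre-multiplied, no further multiplication by alpha is needed in the final blend, exactly as in the equation above.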

How well does this work? I called in the help of Steve and Otto to demonstrate. Here is the first shot against background 1 (click for a larger version; applies to all images in this post):
Input image 1, shot against background 1
This image contains several interesting things. Note that there is quite a bit of green in the foreground objects (duck's head, for example). The champagne flute is also a bit tricky, because we can see most of the background right through it. Otto's fur would have presented endless trouble if you tried to remove the background manually in Gimp, for example. Yes, the dog is called Otto (Steve is the lime).

And the second shot:
Input image 2, shot against background 2
Lastly, a shot of each of the backgrounds, after physically removing the foreground objects from the scene.
background 1

background 2

I pushed these four images through a quick-and-dirty implementation of Blinn and Smith's method that I cobbled together in C++ using OpenCV. Here is the resulting alpha mask produced by the program without any manual intervention:
Alpha mask produced using Blinn and Smith's method
Note how solid Steve's alpha mask is: the fact that he's green, and that one of the backgrounds happened to be green, made no difference whatsoever. Otto's interior is also completely opaque, but on the edges we see some fine details like these:

100% crop of the alpha mask near Otto's head

Notice how the fine fibres are partially transparent (top right), and how the interior gaps in the fur are also partially transparent (bottom left-ish).

The champagne flute has also been extracted quite nicely.

Now for the final test: to recombine the foreground objects with a new background. Here is the background image:
PovRay's benchmark scene should do nicely as a new background
And here is the blended result:
Recombining the extracted foreground with a new background
The new composition is essentially flawless. There are no tell-tale fringes or other signs that are commonly seen with green-screen techniques.

Here is a close-up of Otto's head after blending:
100% crop of Otto's head after blending

Of course, if you concentrate a bit, you will see something is amiss. Although the champagne flute has been composited perfectly, it is quite clear that Steve (the lime) is correctly refracted through the right-hand side of the glass, but that the PovRay logo in the background suffered no distortion on the left side of the glass. This is a shortcoming of all such recomposition techniques, so I guess the only solution is to avoid strongly refractive foreground objects.


Obviously it is somewhat inconvenient to have to shoot the subject against two different backgrounds. It will be downright impossible to use this technique with a toddler, for example. An adult might be able to hold still enough, provided you use something like a television to display your background like I did above.

This technique is fine for inanimate objects, though.

Oh yes, of course you have to keep the camera very, very still during the whole capture process. I used a sturdy tripod + mirror lock-up + IR shutter release. In theory, you could use image-to-image registration to correct for camera movement, but I have not tried it yet for this problem.

I recently saw another paper (published in the proceedings of SIGGRAPH'06) by McGuire and Matusik in which they extend this technique to live video. The secret is to use a polarised background material, which presents different images through different polarisations of the light. A special camera uses a prism to send the incoming light to two different sensors according to its polarisation. Although this means the technique falls into the category of "requires special tools", it is still pretty cool. And it produces near-perfect results, just like the method above. So why didn't they use this for the dreaded broomstick scene in Harry Potter and the Order of the Phoenix?


No animals were harmed during the production of this article. Steve, however, did not make it. That was some good lemonade, though!