Sunday, 27 May 2012

Pixels, AA filters, Box filters and MTF

What happens to a sensor's MTF curve when you remove the Anti-Aliasing (AA) filter (also called the Optical Low-Pass Filter, or OLPF)? I have chosen a rather long path to reach the answer, so this article is a bit longer than ideal, but I hope that you will agree that it is worth the effort to slog through it :)

To simplify the discussion, I am going to limit myself to grayscale sensors, e.g., the Leica M-Monochrom, or a lab camera such as a Prosilica. With no Bayer filter, there is little reason to add an AA filter to a sensor ... or is there?

First up we have to consider how such a grayscale sensor works. The sensor consists of a grid of photosites. The distance between the centres of two neighbouring photosites (left-right or top-bottom neighbours) is called the pixel pitch. In the ideal case, the pixel can be thought of as a little square with side lengths equal to the pixel pitch. In the real world, there has to be some space between adjacent pixels, and some space for circuitry to facilitate the read-out of a photosite. The fraction of the pixel that is actually collecting photons is called the fill factor. In the old days, this meant that only about 50% or so of your total sensor area was actually collecting photons, but with advances such as back-side illumination and microlens arrays we are doing much better today at about 90% fill factor.

Real world notwithstanding, it is convenient to think of a photosite as a little square, and a sensor as a grid of tightly packed photosites with practically no gaps between them.

This abstraction yields a very specific point spread function shaped like this:
The intensity of a grayscale sensor pixel is thus the sum of all the light collected over the surface of the photosite.

Enter the box filter

A 1D box filter is a filter that looks like this:

Clearly, the point spread function of an ideal square photosite is just the 2D version of a box filter.

The box filter pops up many times in signal processing contexts. It also crops up implicitly in unexpected places, but more on that later.

At this point I have to assume that you are familiar with convolution (check out the Wikipedia entry for a refresher). The convolution of a box filter f with itself (f * f) yields a triangular function like the black curve in this plot:
The red curve represents repeated convolution of a box filter with itself three times (((f * f) * f) * f), and the green curve 5 times. And yes, you guessed correctly, it does look a little like a Gaussian distribution. This should not surprise you any more.

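I shall state without proof that repeated convolution of a box filter with itself does indeed converge on a Gaussian shape (this is the central limit theorem at work). The result keeps getting wider, though: a box of width 1 pixel has a variance of 1/12, so n such boxes convolved together have a standard deviation of √(n/12) pixels. The green curve above, which involves six boxes in total, has a standard deviation of √(6/12) = 1/√2 ≈ 0.7071.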
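The convergence is easy to check numerically. The following is only a minimal sketch, assuming NumPy is available; the sample spacing DX and the helper names repeated_box and std_dev are choices made for this illustration, not anything from the original plots. It convolves unit-width boxes together, measures the standard deviation of the result, and compares the six-box curve with a Gaussian of the same width.

```python
import numpy as np

DX = 0.01  # sample spacing in pixels (an arbitrary choice for this sketch)

def repeated_box(n_boxes, width=1.0, dx=DX):
    """Convolve n_boxes unit-area boxes of the given width together."""
    n = int(round(width / dx))
    box = np.full(n, 1.0 / width)           # unit-area box sampled every dx
    out = box.copy()
    for _ in range(n_boxes - 1):
        out = np.convolve(out, box) * dx    # Riemann-sum version of the convolution integral
    x = (np.arange(out.size) - (out.size - 1) / 2.0) * dx  # centred x-axis
    return x, out

def std_dev(x, y):
    """Standard deviation of the curve y(x), treated as a density."""
    w = y / y.sum()
    mean = np.sum(w * x)
    return np.sqrt(np.sum(w * (x - mean) ** 2))

for n_boxes in (2, 4, 6):
    x, y = repeated_box(n_boxes)
    print(n_boxes, "boxes: measured std", std_dev(x, y),
          "vs sqrt(n/12) =", np.sqrt(n_boxes / 12.0))

# Compare the six-box curve against a Gaussian with the same standard deviation.
x6, y6 = repeated_box(6)
sigma = np.sqrt(6 / 12.0)
gauss = np.exp(-x6 ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
print("largest difference from the Gaussian:", np.max(np.abs(y6 - gauss)))
```

The two-box case is the triangle; by six boxes the curve differs from the matching Gaussian by only a small fraction of its peak height.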

The unexpected upshot (well, I certainly was not expecting it the first time) of this is that you can approximate a Gaussian blur by repeatedly blurring an image with a box filter. How do you blur an image with a box filter?

Well, convolution of an image with a 2D box filter that is 3x3 pixels in size is equivalent to replacing each pixel's value with the unweighted average of the group of 3x3 pixels centred at the pixel you are replacing. The blur grows with every pass: n passes of a box that is 3 pixels wide approximate a Gaussian blur with a standard deviation of about 3·√(n/12) pixels, so six passes give roughly 3/√2 ≈ 2.12 (the discrete 3-pixel kernel comes out slightly narrower, at exactly 2 pixels).
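As a rough illustration, here is a small sketch of the 3x3 averaging step, again assuming NumPy; the function name box_blur_3x3, the edge-replicating border and the 41x41 test image are choices made for this example. It blurs a single bright pixel six times and measures how wide the resulting blob is along one axis. For the discrete 3-pixel kernel each pass adds a variance of 2/3 per axis, so the measured standard deviation after n passes should be √(2n/3).

```python
import numpy as np

def box_blur_3x3(img):
    """Replace each pixel with the unweighted mean of its 3x3 neighbourhood."""
    padded = np.pad(img, 1, mode="edge")    # replicate border pixels (a choice for this sketch)
    h, w = img.shape
    acc = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

# Start with a single bright pixel in the middle of a dark image.
img = np.zeros((41, 41))
img[20, 20] = 1.0

passes = 6
for _ in range(passes):
    img = box_blur_3x3(img)

# Collapse the blob onto the horizontal axis and measure its standard deviation.
profile = img.sum(axis=0)
x = np.arange(img.shape[1]) - 20
sigma = np.sqrt(np.sum(profile * x ** 2) / np.sum(profile))
print("measured std after", passes, "passes:", sigma)
print("sqrt(2 * passes / 3) =", np.sqrt(2 * passes / 3.0))
```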

This observation, namely that repeated convolution with a box filter converges on a Gaussian, will be recycled in a later post, so do not page it out of your main memory just yet ...

Area-weighted Anti-Aliasing in computer graphics rendering

Rendering an image of a black square on a white background is relatively straightforward. You scan through each pixel, and test whether the current pixel is inside the square (pixel set to black) or outside (pixel set to white). This produces perfect images as long as the square is aligned with pixel boundaries, which is rather limiting.

If you rotate your black square relative to the pixel grid, a third possibility arises: a pixel might fall partially inside the square, and partially outside. The most intuitive solution to the problem is to simply set the intensity of the pixel proportional to the degree to which it is inside the square. In computer graphics terms, you clip the square to the boundaries of your pixel, and measure the area of the clipped polygon. Here is a picture:

This allows us to draw the black square with much smoother-appearing edges, since we are effectively using our radiometric resolution (different shades of gray) to compensate for our inadequate spatial resolution (large pixels) --- a practice commonly referred to as Anti-Aliasing in computer graphics.

We can also view this from a sampling perspective. First, we generate a number of (x,y) coordinates within each pixel, preferably well-spread throughout the current pixel. By counting how many of these coordinates are inside the square too, we have estimated the fraction of the current pixel that is covered by the square, which, in the limit, is identical to the fragment area that we obtained by clipping the square to the pixel boundaries above.

Lastly, we can consider the case where we sample at coordinates outside of the current pixel too, but that we apply a weight to each sample. If the weight is zero everywhere outside the current pixel, and exactly 1.0 inside the current pixel, then we can say that our weights represent the point spread function of the current pixel, which happens to be a 2D box function by construction.

This shows us that there are three equivalent methods of obtaining the correct intensity of a pixel that is only partially covered by the square:
  1. We can compute the area of intersection between the current pixel's boundaries and the square, or
  2. We can perform unweighted sampling at points spread uniformly throughout the current pixel, or
  3. We can perform weighted sampling at arbitrary points, and weight each point relative to the box filter centred at the current pixel (obviously we will improve our efficiency by only sampling close to the current pixel).
By now you can see the link between Area-weighted anti-aliased rendering and an image sensor with square pixels. If our sensor is observing a knife edge target (or just a black rectangle on a white background), then the intensity of each photosite will be proportional to the area covered by the black square, i.e., the sensor is implicitly applying a box filter while sampling the real-world scene.
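The equivalence of the three methods can be sketched in a few lines of Python. This is only an illustration under some assumptions: NumPy is used, the "square" is simplified to a single straight edge (everything on one side of the line normal·p = d is black), and all function names and sample counts are invented for this example. Method 1 clips the unit pixel against the edge and measures the area, method 2 counts uniformly spread samples inside the pixel, and method 3 samples a larger region and weights each sample with the box function of the pixel.

```python
import numpy as np

PIXEL = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # unit pixel, corners in order

def clip_to_halfplane(poly, normal, d):
    """Sutherland-Hodgman clip: keep the part of poly where normal . p <= d."""
    out = []
    for i in range(len(poly)):
        p = np.asarray(poly[i], dtype=float)
        q = np.asarray(poly[(i + 1) % len(poly)], dtype=float)
        sp, sq = np.dot(normal, p) - d, np.dot(normal, q) - d
        if sp <= 0:
            out.append(p)                    # p is on the covered side
        if (sp < 0 < sq) or (sq < 0 < sp):
            t = sp / (sp - sq)               # the edge crosses the boundary here
            out.append(p + t * (q - p))
    return out

def polygon_area(poly):
    """Shoelace formula."""
    if len(poly) < 3:
        return 0.0
    pts = np.array(poly)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def coverage_by_clipping(normal, d):
    return polygon_area(clip_to_halfplane(PIXEL, normal, d))

def coverage_by_sampling(normal, d, k=200):
    c = (np.arange(k) + 0.5) / k             # k x k sample points inside the pixel
    xs, ys = np.meshgrid(c, c)
    return np.mean(normal[0] * xs + normal[1] * ys <= d)

def coverage_by_weighted_sampling(normal, d, k=600):
    c = (np.arange(k) + 0.5) * (3.0 / k) - 1.0   # samples over [-1, 2] x [-1, 2]
    xs, ys = np.meshgrid(c, c)
    # Box-function weight: 1 inside the pixel, 0 outside.
    weight = ((xs >= 0) & (xs <= 1) & (ys >= 0) & (ys <= 1)).astype(float)
    covered = normal[0] * xs + normal[1] * ys <= d
    return np.sum(weight * covered) / np.sum(weight)

theta = np.radians(30.0)
normal = (np.cos(theta), np.sin(theta))
for method in (coverage_by_clipping, coverage_by_sampling, coverage_by_weighted_sampling):
    print(method.__name__, method(normal, 0.6))
```

The sampled estimates approach the clipped area as the number of samples grows, which is the "in the limit" part of the argument above.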

MTF of a box filter

The point spread function of a non-Bayer sensor without an AA filter is simply a box filter of width equal to the pixel pitch. The point spread function of a synthetic image generated with an area-weighted sampling algorithm is also a box of width equal to the pixel size, which means we can use the one to study the other. So what does the MTF of this box function look like?

Again, I shall state without proof that the Fourier transform of the box function is the sinc function, sinc(f) = sin(pi*f)/(pi*f). Thus, if our point spread function is a box function that is one pixel wide, then our MTF will simply be |sin(pi*f)/(pi*f)|, with f measured in cycles per pixel, which looks like this:
A couple of things are important about this MTF curve (black curve):

  • Recall that in this case, a frequency of 0.5 cycles per pixel implies that our image contains a pattern of alternating black-and-white stripes that are exactly one pixel wide; together, one black stripe and one adjacent white stripe make one cycle. This is the highest spatial frequency that can be represented correctly in our image --- if you try to make the stripes less than one pixel wide, then clearly you will not be able to preserve details exactly any more.
  • Note that the contrast drops to zero at twice the Nyquist frequency (1 cycle per pixel). When our stripes are exactly half a pixel wide, then the black and white pattern cancels exactly, leaving the entire image at 50% grey, hence zero contrast.
  • Also note that the box filter MTF curve is nonzero between 0.5 cycles per pixel and 1 cycle per pixel, which is highly undesirable. As mentioned above, if the width of each stripe in a black-and-white pair falls below 1 pixel, then we can no longer represent this accurately in our image. Any contrast that survives at these frequencies shows up as aliasing, which is when a high frequency component masquerades as a lower frequency component (more on this below).
  • At each integer k cycles per pixel, we can fit k pairs of black-and-white stripes inside one pixel, again cancelling each other and producing a 50% grey image. This is exactly where the MTF curve drops to zero each time.
  • The dashed grey curve is the MTF of a Gaussian PSF with a standard deviation of 0.568 pixels (MTF50=0.33), included for comparison.
The MTF curve of the box filter describes exactly the MTF of a synthetic image rendered using Area-weighted AA (as described above), and it will be a good model of the MTF of a grayscale image sensor in the absence of the lens MTF (i.e., no diffraction).
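The box filter MTF is simple enough to evaluate directly. The sketch below assumes NumPy, whose np.sinc is the normalised sinc, sin(pi*x)/(pi*x); the helper names box_mtf and box_mtf_numeric are made up for this example. The second function integrates the Fourier transform of a one-pixel-wide box numerically, as a sanity check on the closed form.

```python
import numpy as np

def box_mtf(f):
    """MTF of a box PSF that is one pixel wide; f is in cycles per pixel."""
    return np.abs(np.sinc(f))       # np.sinc(f) = sin(pi*f) / (pi*f)

def box_mtf_numeric(f, n=20000):
    """Magnitude of the Fourier transform of the box, by midpoint integration."""
    x = (np.arange(n) + 0.5) / n - 0.5          # sample points across [-0.5, 0.5]
    return abs(np.mean(np.exp(-2j * np.pi * f * x)))

for f in (0.1, 0.5, 0.9, 1.0, 2.0):
    print(f"f = {f:3.1f} cycles/pixel: closed form {box_mtf(f):.4f}, "
          f"numeric {box_mtf_numeric(f):.4f}")
```

Evaluating it at the Nyquist frequency (0.5 cycles per pixel) shows how much contrast a box-shaped photosite still passes at the very edge of what the sampling grid can represent.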

Aliasing in practice

I have alluded to the fact that bad things happen when your scene contains detail that occurs at a higher frequency than what the sensor can capture, i.e., when you have details smaller than the pixels of your sensor. The type of aliasing I would like to highlight here is folding, which is when the high frequency information wraps around and re-appears as lower frequency information. To understand what it looks like, consider the following image:

This image does not exhibit any aliasing; it is merely to establish what we are looking for here. Firstly, the left panel is a stack of four sub-images (rows) separated by white horizontal bars. Each sub-image is simply a pattern of black-and-white bars, with both black and white bars being exactly 5 pixels wide. The four stacked sub-images differ only in phase, i.e., in each of the four rows the black-and-white pattern of bars is offset by a horizontal distance between 0 and 1 pixels in length.

The right panel is a 2x magnification of the left panel. Note that the third row in the stack is nice and crisp, containing only pure black and pure white. The other rows have some grey values at the transition between the black and white bars, because the image has been rendered with box-filtered anti-aliasing.

Here is the same image again, repeated to simplify comparisons further down:
Box filtered, bars are 5 pixels wide

Note that the edges of the bars are exhibiting the classical box-filtered anti-aliasing patterns; depending on the exact position of the bars relative to the pixels (i.e., the phase differences between the four rows), we see different shades of grey at the transitions. Contrast that with the same pattern, but rendered using a Gaussian filter (the one depicted in the dashed grey curve in the MTF plot above), rather than a box filter, to reduce aliasing:
Gaussian filtered (standard dev. = 0.568 pixels), bars are 5 pixels wide

Here we can see that the edges of the bars are noticeably more blurry, but they do appear much smoother than the version with box filtering above. Also note that the four rows now look much more alike, which is an improvement on the box filtered version.

Now for the promised frequency folding. If we define a cycle as one black bar followed by one white bar, then the 5-pixel wide bars give us a cycle, or period, of 10 pixels. Frequency is simply 1 over period, or 1/10 cycles per pixel. We know that 0.5 cycles per pixel corresponds to the Nyquist limit, i.e., the highest frequency that can be represented, which corresponds to a pattern of a 1-pixel wide black bar followed by a 1-pixel wide white bar. Since 1/10 cycles per pixel is well below this limit, we know that we can accurately represent the bars at this frequency. Frequency folding dictates that a pattern with a frequency of 1 - (1/10) = 0.9 cycles per pixel will be aliased with a frequency of 1/10 cycles per pixel. We know that 0.9 cycles per pixel corresponds to a cycle of 1.1111' pixels in length, which implies that the bars will have a width of 0.5555' pixels, which we also know cannot be represented accurately, since the bars are smaller than the pixels. If we render the same bar pattern with box filtering, but choosing a bar pattern in which the bars are 0.5555' pixels wide, we get this image:

Box filtered, bars are 0.5555' pixels wide

Not what you were expecting? One would have expected that the bars disappear into a uniform grey patch, since we know that 50% of each row should be black, and 50% should be white. We also know the bars are only supposed to be 0.5555' pixels wide, so why are we seeing bars that are 5 pixels wide? Well, this is a textbook case of aliasing --- the frequency of 0.9 cycles per pixel is aliased, or folded, around 0.5, which gives us 0.1 (0.5 - (0.9 - 0.5) = 0.1). We have seen above that 0.1 cycles per pixel gives us 10 pixels per cycle, or bars that are 5 pixels wide.
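The folding arithmetic can be written as a tiny helper. This is a sketch in plain Python; the function names alias_frequency and bar_width are invented for this example, and a sampling rate of one sample per pixel is assumed throughout.

```python
def alias_frequency(f):
    """Frequency (cycles/pixel) at which a pattern of frequency f appears
    after sampling once per pixel; the result always lies in [0, 0.5]."""
    f = abs(f) % 1.0            # sampling cannot tell f apart from f + k
    return min(f, 1.0 - f)      # fold around the Nyquist frequency of 0.5

def bar_width(f):
    """Width in pixels of a single bar: half of one cycle."""
    return 0.5 / f

for f in (0.1, 0.4, 0.5, 0.6, 0.9, 1.3):
    fa = alias_frequency(f)
    print(f"{f} cycles/pixel shows up at {fa:.3f} cycles/pixel "
          f"(bars {bar_width(f):.4f} px wide look {bar_width(fa):.4f} px wide)")
```

Frequencies at or below 0.5 cycles per pixel map onto themselves, which is just another way of saying that they can be represented without aliasing.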

That explains the width of the bars. But why are they grey, and not black-and-white? The MTF plot of the box filter above provides the answer: at 0.9 cycles per pixel, contrast has dropped to roughly 10%, which manifests as both white and black moving closer to the intensity midpoint of 50% grey (remember, the images shown here are at a gamma value of 2.2). If we see an image such as the one above (box filtered, bar width=0.5555' pixels), how do we know whether we are looking at a pattern with bar width 5 at 10% contrast, or a pattern with bar width 0.5555' pixels at 100% contrast? There is no way of telling, which is why aliasing is a destructive process. By the time you have captured the image with your sensor (without AA filter), you have already lost the ability to tell these two cases apart.
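The experiment is straightforward to reproduce in one dimension. The sketch below assumes NumPy and works in linear intensity (0 = black, 1 = white, no gamma applied); the helper names white_length, render_bars and michelson_contrast are made up for this example and are not the code that produced the images above. Each pixel value is the exact fraction of the pixel covered by white, i.e., box-filtered rendering.

```python
import numpy as np

def white_length(x, bar_width):
    """Total length of white inside [0, x]; bars alternate black, white, black, ..."""
    period = 2.0 * bar_width
    k = np.floor(x / period)                 # number of complete cycles before x
    rem = x - k * period                     # position within the current cycle
    return k * bar_width + np.clip(rem - bar_width, 0.0, None)

def render_bars(n_pixels, bar_width, phase=0.0):
    """Box-filtered rendering: fraction of each pixel that is covered by white."""
    edges = np.arange(n_pixels + 1) + phase  # pixel boundaries
    return np.diff(white_length(edges, bar_width))

def michelson_contrast(pixels):
    return (pixels.max() - pixels.min()) / (pixels.max() + pixels.min())

wide = render_bars(100, 5.0)                 # 0.1 cycles per pixel
narrow = render_bars(100, 5.0 / 9.0)         # 0.9 cycles per pixel

# Find the strongest frequency actually present in the sampled narrow bars.
spectrum = np.abs(np.fft.rfft(narrow - narrow.mean()))
dominant = np.fft.rfftfreq(narrow.size)[np.argmax(spectrum)]

print("first 12 pixels, narrow bars:", np.round(narrow[:12], 4))
print("dominant frequency (cycles/pixel):", dominant)
print("contrast, 5-pixel bars:", michelson_contrast(wide))
print("contrast, 0.5555'-pixel bars:", michelson_contrast(narrow))
```

With the bars aligned to the pixel grid like this, the narrow pattern comes out as runs of five slightly darker and five slightly lighter pixels at roughly a tenth of the original contrast, which is the 5-pixel grey bar pattern described above.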

Before concluding, we have to take a quick look at the same pattern, but filtered with a Gaussian point spread function:

Gaussian filtered (standard dev. = 0.568 pixels), bars are 0.5555' pixels wide
If you look closely, you can see that the grey patches are still composed of grey bands of slightly varying shades, but it is immediately clear that the contrast between the bands has dropped significantly. A Gaussian point spread function with a standard deviation of approximately 0.568 has an MTF curve described by the function exp(-6.365x²), which means that at 0.9 cycles per pixel, contrast has dropped to only 0.577%. Incidentally, this Gaussian point spread function appears to be quite close to the AA filter in a Nikon D40 or a Nikon D7000.
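These figures follow from the standard result that a Gaussian PSF with standard deviation sigma (in pixels) has the MTF exp(-2·pi²·sigma²·f²). A short check in plain Python; the function names gaussian_mtf and gaussian_mtf50 are made up for this sketch.

```python
import math

def gaussian_mtf(f, sigma):
    """MTF of a Gaussian PSF; sigma in pixels, f in cycles per pixel."""
    return math.exp(-2.0 * math.pi ** 2 * sigma ** 2 * f ** 2)

def gaussian_mtf50(sigma):
    """Frequency at which the Gaussian MTF has dropped to 0.5."""
    return math.sqrt(math.log(2.0) / (2.0 * math.pi ** 2 * sigma ** 2))

sigma = 0.568
print("coefficient 2*pi^2*sigma^2:", 2.0 * math.pi ** 2 * sigma ** 2)
print("MTF50 (cycles/pixel):", gaussian_mtf50(sigma))
print("contrast at 0.5 cycles/pixel:", gaussian_mtf(0.5, sigma))
print("contrast at 0.9 cycles/pixel:", gaussian_mtf(0.9, sigma))
```

The small differences from the numbers quoted in the text come from rounding the standard deviation to 0.568.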

Personally, I prefer the fade-to-grey approach. While the box-filtered image (with bars of width 0.5555' pixels) appears to have detail at 0.1 cycles per pixel (bars of width 5 pixels), this detail is false, and was never present in the scene we captured with our sensor.


So which is better: a sensor with an AA filter, or one without? This depends a great deal on what you plan to do with that sensor. A sensor without an AA filter will be more susceptible to aliasing than one with a filter, but this depends critically on the rest of the parameters of the entire optical system.

For example, if your pixel pitch is rather small, say around 1.5 microns, like the sensors found in camera phones, then diffraction will act as a natural AA filter. For a larger pixel pitch of around 5 microns or larger, diffraction will only start acting as an AA filter at small relative apertures (say, beyond f/8, depending on physical sensor size). For large pixels at large apertures, the difference between an AA filter sensor, and one without, boils down to exactly the differences illustrated above. (Update: new post on taking diffraction into account).

Bayer sensors are another matter entirely, but suffice it to say that you must give up some resolution in order to reconstruct colours accurately with this type of sensor.

For photography, I would much rather have a slightly softer image, which can be sharpened afterwards, than a sharper image filled with artifacts that cannot be removed automatically. Remember, blurring the image after capture will not remove the aliasing; the blur must be applied before the sensor samples the incoming scene.

Lastly, always remember that aliasing is only present if your scene contains frequencies above the Nyquist limit of your sensor. Not all images captured without an AA filter will contain aliasing.


  1. Very insightful and well explained, as usual! I am just wondering about the 'perfect' box filter assumption for the sensor pixels. Are there no electronic (ex. readout, cross-talk) effects that induce a non-boxfilter PSF?

  2. Thanks!
    I would imagine that cross-talk could introduce some low-pass filtering effects. Unfortunately, I do not have such a sensor to test --- all my cameras that can provide raw data have OLPFs.

  3. Time to get a Nikon D800E. Your work requires it ;-)
    Thanks for posting!

    1. Maybe someone from Nikon is reading this. Let us hope they agree :)

  4. Hey there,
    I'm reading your article to understand the MTF of one Pixel and how this affects the overall MTF of an imaging system.
    But for me, it is not clear, how your box shaped filter merges into the sinc-function. How did you scale your x-axis?
    For my understanding, you have to squeeze the pixel width/pixel pitch into the fourier-transform/sinc-function to get the first location of MTF = 0.
    Also, did you plot the one sided power amplitude spectrum (magnitude)?

    I would appreciate if you could answer this for me! :) Either here or via eMail (

    Best Wishes,

  5. Yes, of course you plotted the one sided power spectrum

  6. Hi Thorsten,

    I can clarify the scale of the axes a bit. If we stick to pixel units, then things are relatively simple: our box filter is exactly 1 unit (pixel) wide. We can model this as the unit rectangular function rect(x), i.e., the box starts at -0.5 pixels, and ends at 0.5 pixels.

    This box function is the Point Spread Function (PSF) of our system (which consists of only the sensor). The MTF of our system is the magnitude of the Fourier Transform of the PSF.

    As the Wikipedia page tells us, the Fourier transform of rect(x) is sinc(f), where sinc(f) is defined as sin(pi*f)/(pi*f). Since the sinc(f) function is real (not complex), we can compute the magnitude of the Fourier transform (=MTF) of our PSF simply as the absolute value of the result, i.e., our MTF is |sinc(f)|, or |sin(pi*f)/(pi*f)|.

    What scale do we use for "f"? Well, since our box function PSF was defined in units of pixels, we naturally obtain units of 1/(pixel) for "f", or as I prefer, "cycles per pixel". Strictly speaking I should say "cycles per pixel pitch".

    Anyhow, note the "pi*" factor in the definition of the sinc(f) function! This is what causes the first zero of |sinc(f)| to fall at f=1.0.

    If we want to convert to physical units, we simply have to scale both the x-axis in the PSF domain, and the f-axis in the MTF domain. For example, if our photosite pitch is 5 micron, then our box function will span from -2.5 micron to 5 micron (just multiply x-axis values by 5 micron). In the frequency (MTF) domain, we multiply the f-axis values by 200 to obtain a line-pairs-per-mm frequency scale. The factor 200 is simply 1000/5, i.e, 1/(5 micron) but expressed in mm.

    And finally, yes, the MTF plots are all one-sided power (magnitude) spectrum plots. This appears to be a convention when talking about MTF.


    1. Typo: second last paragraph "function will span from -2.5 micron to 5 micron" should be "function will span from -2.5 micron to 2.5 micron"

  7. Hi Frans,

    thank you very much - especially for your very fast answer.
    In the meantime, I gave it a thought once again and did find an explanation for my questions. I'm glad, my thought goes with your statement. :)
    But with the pi-factor - I didn't know. I will look especially into this, to scale my axis. (I'm writing on my masters thesis)

    So, thank you once again & have good time,