Friday, 10 February 2017

Automatic chart orientation estimation: validation experiment

In my previous post I mentioned that it is rather important to ensure that your MTF Mapper test chart is parallel to your sensor (or that the chart is perpendicular to the camera's optical axis, which is almost the same thing) to ensure that you do not confuse chart misalignment with a tilted lens element. I have added the functionality to automatically estimate the orientation of the MTF Mapper test chart relative to the camera using circular fiducials embedded in the test chart. Here is an early sample of the output, which nicely demonstrates what I am talking about:
Figure 1: Sample output of chart orientation estimation
Figure 1 shows an example of the MTF Mapper "lensprofile" chart type, with the new embedded circular fiducials (they are a bit like 2D circular bar codes). Notice that the actual photo of the chart is rendered in black-and-white; everything that appears in colour was drawn in by MTF Mapper.
There is an orange plus-shaped coordinate origin marker (in the centre of the chart), as well as a reticle (the red circle with the four triangles) to indicate where the camera is aimed at. Lastly, we have the three orientation indicators in red, green and blue, showing us the three Tait-Bryan angles: Roll, Pitch and Yaw.

But how do I know that the angles reported by MTF Mapper are accurate?

The set-up

I do not have access to any actual optics lab hardware, but I do have some machinist tools. Fortunately, being able to ensure that things are flat, parallel or perpendicular is a fairly important part of machining, so this might just work. First I have to ensure that I have a sturdy device for mounting my camera; in Figure 2 you can see the hefty steel block that serves as the base of my camera mount.
Figure 2: Overview of my set-up
I machined the steel block on a lathe to produce a "true" block, meaning that the two large faces of the large shiny steel block are parallel, and that those two large faces are also perpendicular to the rear face on which the steel block is standing in the photo. The large black block in Figure 2 is a granite surface plate; this one is flat to something ridiculous like 3.5 micron maximum deviation over its entire surface. The instrument with the clock face is a dial test indicator; this one has a resolution of 2 micron per division. It is used to accurately measure small relative displacements through the pivoting action of the lever you can see in contact with the lens mount flange of the camera body. 

Using this dial test indicator, surface plate and surface gauge, I first checked that the two large faces of the steel block were parallel: they were parallel to within about 4 micron. Next, I stood up the block on its rear face (bottom face in Figure 2), and measured the perpendicularity. The description of that method is a bit outside of the the scope of this post, but the answer is what matters: near the top of the steel block the deviation from perpendicularity was also about 4 micron. The result of all this fussing with parallelism and perpendicularity is that I know (because I measured it) that my camera mounting block can be flipped through 90 degrees by either placing it on the large face with the camera pointing horizontally, or stood up with the camera pointing to the ceiling.

That was the easiest part of the job. Now I had to align my camera mount so that the actual mounting flange was parallel to the granite surface plate. 
Figure 3: Still busy tweaking the mounting flange parallel to the surface plate
The idea is that you keep on adjusting the camera (bumping it with the tripod screw partially tightened, or adding shims) until the dial test indicator reads almost zero at four points, as illustrated between Figures 2 and 3. Eventually I got it parallel to the surface plate to within 10 micron, and called it good.

This means that when I flip the steel block into its horizontal position (see Figure 4) the lens mount flange is perpendicular to the surface plate with a reasonably high degree of accuracy. Eventually, I will arrange my test chart in a similar fashion, but bear with me while I go through the process.
Figure 4: Using a precision level to ensure my two reference surfaces are parallel
In Figure 4 you can see more of my set-up. The camera is close to its final position, and you can see a precision level placed on the granite surface plate just in front of the camera itself. That spirit level measures down to a one-division movement of the bubble for each 20 micron height change at a distance of one metre, or 0.0011459 decimal degrees if you prefer. I leveled the granite surface plate in both directions. Next, I placed a rotary table about 1 metre from the camera --- you can see it to the left in Figure 4. The rotary table is fairly heavy (always a good thing), quite flat, and will later be used to rotate the test chart. The rotary table was shimmed until it too was level in both directions.

The logic is as follows: I cannot directly measure if the rotary table's surface is parallel with the granite surface plate, but I can ensure that both of them are level, which is going to ensure that their surfaces are parallel to within the tolerances that I am working to here. This means that I know that my camera lens mount is perpendicular to the rotary table's surface. All I now have to do is place my test chart so that it is perpendicular to the rotary table's surface, and I can be certain that my test chart is parallel to my camera's mounting flange. I aligned and shimmed my test chart until it was perpendicular to the rotary table top, using a precision square, resulting in the set-up shown in Figure 5.
Figure 5: overview of the final set-up. Note the obvious change in colour temperature relative to Figure 4. Yes, it took that long to get two surfaces shimmed level.

One tiny little detail (or make that two)

Astute readers may have picked up on two important details:
  1. I am assuming that my camera's lens mounting flange is parallel to the sensor. In theory, I could stick the dial test indicator into the camera and drag the stylus over the sensor itself to check, but I do actually use my camera to take photographs occasionally, so no sense in ruining it just yet. Not even in the name of science.
  2. The entire process above only ensures that I have two planes (the test chart, and the camera's sensor) standing perpendicularly on a common plane. From the camera's point of view, this means there is no up/down tilt, but there may be any amount of left/right tilt between the sensor and the chart. This is not the end of the world, since my initial test will only involve the measurement of pitch (as illustrated in Figure 1).

The first measurements

Note: Results updated on 13/02/2017 to reflect improvements in MTF Mapper code. New results are a bit more robust, i.e., lower standard deviations.

From the set-up above, I know that my expected pitch angle should be zero. Or at least small. MTF Mapper appears to agree: the first measurement yielded a pitch angle of -0.163148 degrees, which is promising. Of course, if your software gives you the expected answer on the first try, you may not be quite done yet. More testing!

I decided to shim the base of the plywood board that the test chart was mounted on. The board is 20 mm thick, so the 180 micron shim (0.18 mm) that I happened to have handy should give me a tilt of about 0.52 degrees. I also had a 350 micron (0.35 mm) shim nearby, which yields a 1 degree tilt. That gives me three test cases (~zero degrees, ~zero degrees plus 0.52 degree relative tilt, and ~zero degrees plus 1 degree relative tilt). I captured 10 shots at each setting, which produced the following results:
  1. Expected = 0 degrees. Measurements ranged from -0.163 degrees to -0.153 degrees, for a mean measurement of  -0.1597 degrees and a standard deviation of  0.00286 degrees.
  2. Expected = 0.52 degrees. Measurements ranged from 0.377 to  0.394 degrees, for a mean measurement of 0.3910 degrees with a standard deviation of  0.00509 degrees. Given that our zero measurement started at -0.16 degrees, relative angle between the two test cases comes down to  0.5507 degrees (compared to the expected 0.52 degrees).
  3. Expected = 1.00 degrees. Measurements ranged from 0.814 to 0.828, for a mean measurement of 0.8210 degrees with a standard deviation of  0.00423 degrees. The tilt relative to the starting point is 0.9806 degrees (compared to the expected 1.00 degrees).
I am calling that good enough for government work. It seems that there may have been a small residual error in my set-up, leading to the initial "zero" measurement coming in at -0.16 degrees instead, or perhaps there is another source of bias that I have not considered.

Compound angles

Having established that the pitch angle measurement appears to be fairly close to the expected absolute angle, I set out to test the relative accuracy of yaw angle measurements. Since my set-up above does not establish an absolute zero for the yaw angle, I cheated a bit: I used MTF Mapper to bring the yaw angle close to zero by nudging the chart a bit, so I started from an estimated yaw angle of 0.67 degrees. At this setting, I zeroed my rotary table, which as you can see from Figure 5 above, will rotate the test chart approximately around the vertical (y) axis to produce a desired (relative) yaw angle. At this point I got a bit lazy, and only captured 5 shots per setting, but I did rotate the chart to produce the sequence of relative yaw rotations in 0.5 degree increments. The mean values measured over each set of 5 shots were 0.673, 1.189, 1.685, 2.211, 2.717, and 3.157. If we subtract the initial 0.67 degrees (which represents our zero for relative measurements), the we get 0.000, 0.5165, 1.012, 1.538, 2.044,  and 2.484, which seems pretty close to the expected multiples of 0.5.

In the final position, I introduced the 0.18 mm shim to produce a pitch angle of 0.5 degrees. Over 5 shots a mean yaw angle of 3.132 degrees was measured (or 2.459 if we subtract out zero-angle of 0.67). I should have captured a few more shots, since at such small sample sizes it is hard to tell if the added yaw angle has changed the pitch angle, or not. It is entirely possible that I moved the chart while inserting the shim. That is what you get with a shoddy experimental procedure, I guess. Next time I will have to machine a more positive mechanism for adjusting the chart position.


Note that MTF Mapper could only extract the chart orientation correctly if I provided the focal length of the lens explicitly. My previous post demonstrated why it appears to be impossible to estimate the focal length automatically when the test chart is so close to being parallel with the sensor. This is unfortunate, because it means that there is no way that MTF Mapper can estimate the chart orientation completely automatically --- some user-provided input is required.

The good news is that it seems that MTF Mapper can actually estimate the chart orientation with sufficient accuracy to aid the alignment of the test chart. Both repeatability (worst-case spread) and relative error appears to be better than 0.05 degrees, or about three minutes of arc, which compares favourably with the claimed accuracy of Hasselblad's linear mirror unit. Keep in mind that I tested under reasonably good conditions (ISO 100, 1/200 s shutter speed, f/2.8), so my accuracy figures do not represent the worst-case scenario. Lastly, because of the limitations of my set-up, my absolute error was around 0.16 degrees, or 10 minutes of arc; it is possible that actual accuracy was better than this.

How does this angular accuracy relate to the DOF of the set-up? To put some numbers up: I used a 50 mm lens on an APS-C size sensor at a focus distance of about 1 metre. If we take the above results, and simplify it to say that MTF Mapper can probably get us to within 0.1 degrees under these conditions, then we can calculate the depth error at the extreme edges of the test chart. I used an A3 chart, so our chart width is 420 mm. If the chart has a yaw angle of 0.1 degrees (and we are shooting for 0 degrees), then the right edge of our chart will be 0.37 mm further away than expected, or our total depth error from the left edge of the chart to the right edge will be twice that, about 0.73 mm. If I run the numbers through vwdof.exe, the "critical" DOF criterion (CoC of 0.01 mm) yields a DOF of 8.95 mm. So our total depth error will be around 8% of our DOF. Will that be enough to cause us to think our lens is tilted when we look at a full-field MTF map? 

Only one way to find out. More testing!

Wednesday, 8 February 2017

Limitations of using single-shot planar targets to perform automatic camera calibration

When you are trying to measure the performance of your system across the entire field, it is rather important to ensure that your test chart is parallel to your sensor. If you are not careful, then a slight tilt in your test chart could look very much like a tilted lens element if you are looking at the MTF values, i.e., two opposite corners of your MTF image would appear to be soft: is your lens titled along the diagonal, or is the chart tilted along the same diagonal?

My solution to this problem is to directly estimate the camera pose from the MTF test chart. I have embedded fiducial markers in the latest MTF Mapper test charts which will allow me to measure the angle between your sensor and your test chart. This post details a particular difficulty I encountered while implementing the camera pose estimation method as part of MTF Mapper.

The classical approach

Classical planar calibration target methods like Tsai [Tsai1987] or Zhang [Zhang2000] prescribe that you capture several images of your planar calibration target, while ensuring that there is sufficient translation and rotation between the individually captured images. From each of the images you can extract a set of correspondences, e.g., the location of a prominent image feature (corner of a square, for example) and the corresponding real-world coordinates of that feature.

This sounds tricky, until you realize that you are allowed to express the real-world coordinates in a special coordinate system attached to your planar calibration target. This implies that you can put all the reference features at z=0 in your world coordinate system (their other two coordinates are known through measurement with a ruler, for example), meaning that even if you moved the calibration object (rather than the camera) to capture your multiple calibration images, the model assumes that the calibration object was fixed and the camera moved around it.

A set of four such correspondences are sufficient to estimate a 3x3 homography matrix up to a scale factor, since four correspondences yields 8 equations to solve for the 8 free parameters of the matrix. A homography is a linear transformation that can map one plane onto another, such as mapping our planar calibration target onto the image sensor. For each of our captured calibration images we can solve these equations to obtain a different homography matrix. The key insight is that this homography matrix can be decomposed to separate the intrinsic camera parameters from the extrinsic camera parameters. We can use a top-down approach to understand how the homography matrix is composed.

To keep things a bit simpler, we can assume that the principal point of the system is fixed at the centre of the captured image. We can thus normalize our image coordinates so that the principal point maps to (0,0) in normalized image coordinates, and while we are at it we can divide the result by the width of the image so that x coordinates run from -0.5 to 0.5 in normalized image coordinates. This centering and rescaling generaly improves the numerical stability of the camera parameter estimation process. This gives us the intrinsic camera matrix K, such that
where f denotes the focal length of the camera. Note that I am forcing square pixels without skew. This appears to be a reasonable starting point for interchangeable lens cameras. We can combine the intrinsic camera parameters and the extrinsic camera parameters into a single 3x4 matrix P, such that
where the 3x3 matrix R represents a rotation matrix, and the vector t represents a translation vector. The extrinsic camera parameters R and t is often referred to as the camera pose, and represents the transformation required to transform from world coordinates (i.e., our calibration target local coordinates) to homogeneous camera coordinates. If we have multiple calibration images, then we obtain a different R and t for each image, but the intrinsic camera matrix K must be common to all views of the chart.

The process of estimating K and the set of  Ri and ti over all the images i is called bundle adjustment [Triggs1999]. Typically we will use all the available point correspondences (hopefully more than four) from each view to minimized the backprojection error, i.e., we take our known chart-local world coordinates from each correspondence, transform it with the appropriate P matrix, divide by the third (z) coordinate to convert homogeneous coordinates to normalized image coordinates, and calculate the Euclidean distance between this back-projected image point and the measured image coordinates (e.g., output of a corner-finding algorithm) of the corresponding point in the captured image. The usual recommendation is to use a Levenberg-Marquardt algorithm to solve this non-linear optimization problem to minimize the sum of the squared backprojection errors.

Strictly speaking, we usually include a radial distortion coefficient or two in the camera model to arrive at a more realistic camera model than the pinhole model presented here, but I am going to ignore radial distortion here to simplify the discussion.

Single-view calibration using a planar target

From the definition of the camera matrix P above we can see that even if we only have a single view of the planar calibration target, we can still estimate both our intrinsic and extrinsic camera parameters using the usual bundle adjustment algorithms. Zhang observed that when a planar calibration target is employed, we can estimate a 3x3 homography matrix H such that 
where the vectors  r1  and  r2  define the first two basis vectors of the world coordinate frame in camera coordinates, and t is a translation vector. Since we require r1  and  r2  to be orthonormal, the third basis vector of the world coordinate frame is just the cross product of r1  and  r2. This little detail explains how the 8 free parameters of the homograph H are able to represent all the required degrees of freedom we expect in our full camera matrix P.

In the previous section we restricted our intrinsic camera parameters to a single unknown f, since both  Px  and  Py  are already know because we assume the principal point coincides with the image centre. With a little bit of algebraic manipulation we can see that Zhang's orthonormality constraints allows us to estimate the focal length f directly from the homography matrix H (see Appendix A below).

So this leaves me with a burning question: if we can estimate all the required camera parameters using only a single view of a planar calibration target, why do all the classical methods require multiple views (with different camera poses)?

Limitations of single-view calibration using planar targets

To answer that question, we simply have to find an example of where the single-view case would fail to estimate the camera parameters correctly. The simplest case would be to assume that our rotation matrix R is the 3x3 identity matrix (camera axis is perpendicular to planar calibration target), and that our translation vector is of the form [0 0 d] where d represents the distance of the calibration target from the camera's centre of projection. This scenario reduces our camera matrix P to
 A given point [x y 0] in world coordinates is thus transformed to [fx fy d] in homogeneous camera coordinates. We can divide out the homogeneous coordinate to obtain our desired normalized image coordinates as [fx/d fy/d].
And there we see the problem: the normalized image coordinates depend only on the ratio f/d, which implies that we do not have sufficient constraints to estimate both f and d from this single view. The intuitive interpretation is simple to understand: you can always increase d, i.e., move further away from the calibration target while adjusting the focal length f (zooming in) to keep f/d constant without affecting the image captured by the camera.
This happens because there is no variation in the depth of the calibration target correspondence points expressed in camera coordinates, thus the depth-dependent properties of a perspective projection are entirely absent.

We can try to apply the formula in Appendix A to estimate the focal length directly from the homography corresponding to the matrix P above, but we quickly run into a divide-by-zero problem. This should give us a hint. If we choose to ignore the hint, we can apply a bundle adjustment algorithm to estimate both the intrinsic and extrinsic camera parameters from correspondences generated using the matrix P. All that this will achieve is that we will find an arbitrary pair of  f and d values that satisfy the constant ratio f/d imposed by P.

The middle road

What happens if we have a slightly less pathological scenario? Let us assume that there is a small tilt between the calibration target plane and the sensor. For simplicity, we can just choose a rotation around the y axis so that
We know that for a small angle θ, sin(θ) ≈ 0, so our matrix P will be very similar to the sensor-parallel-to-chart case above. The corresponding homography H should be
We can apply the formula in Appendix A to H, which simplifies to f2 = f2, which is a relief. The question is: how accurately can we estimate the homography H using actual correspondences extracted from the captured images?

I know from simulations using MTF Mapper that the position of my circular fiducials can readily be estimated to an accuracy of 0.1 pixels under fairly heavy simulated noise. The objective now is to measure the impact of this uncertainty on the accuracy of the homography estimated using OpenCV's findHomography function. I start out with a camera matrix P like the one above with only a rotation around the y axis. A set of 25 points are generated on my virtual calibration target, serving as the world coordinates (with the same real-world dimensions as the actual A3 chart used by MTF Mapper). These are transformed using P to obtain the `perfect' simulated corresponding image coordinates representing the position of the fiducials. I perturb these perfect coordinates by adding Gaussian noise with a standard deviation of about 0.000020210 units, which corresponds to an error of 0.1 pixels, but expressed in normalized image coordinates (divided by 4948, the width of a D7000 raw image). Now I can systematically measure the uncertainty in the focal length estimated with the formula of Appendix A as a function of the angle between the chart and the sensor, θ. I ran 100000 iterations at a selection of angles, and calculated the difference between the 75th and 50th percentile of the estimated focal length as a measure of spread.
Figure 1
In Figure 1 we see that the spread of the focal length estimates increases dramatically once the angle θ drops below about 2 degrees. For the purpose of using the estimated camera pose to measure if you have aligned your chart parallel to your camera sensor, this is really terrible news: essentially, we cannot estimate the focal length of the camera reliably if the chart is close to being correctly aligned.
Figure 2
Figure 2 shows that the focal length estimate is relatively unbiased for angles above about 1 degree, but once the angle becomes small enough, we overestimate the focal length dramatically.

This experiment demonstrated that small errors in the estimated position of features (e.g., corners or centre of circular targets) leads to dramatic errors in focal length estimation. Intuitively, this makes sense, since the relative magnitude of perspective effects decreases the closer we approach a parallel alignment between the sensor and the calibration target. Since perspective effects depend on the distance from the chart, and the estimated distance from the chart is effectively controlled by the estimated focal length (assume the same framing), this seems reasonable.

I have tried using bundle adjustment, rather than homography estimation as an intermediate step, but clearly the problem lies with the unfavourable viewing geometry and the resulting subtlety of the perspective effects, not with the algorithm used to estimate the focal length. At least, as far as I can tell.

Hobson's choice

If we take the focal length of the camera as a given parameter, then the ambiguity is resolved, and we can obtain a valid, unique estimate of the calibration target distance d. This is not entirely surprising, since our assumed constrained intrinsic camera parameters depend only of the focal length f, i.e., K is known, thus the pose of the camera can be estimated for any given view, even the degenerate case where the calibration target is parallel to the sensor.

In other words, I see no way other than requiring the user to specify the focal length as an input to MTF Mapper. I will try to extract this information from the EXIF data when the MTF Mapper GUI is used, but it seems that not all cameras report this information. Fortunately, it seems that a user-provided focal length need not be 100% accurate in order to obtain a reasonable estimate of the chart orientation relative to the camera. 


  • [Zhang2000], Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11), pp. 1330-1334, 2000.
  • [Tsai1987], R. Tsai, A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses, IEEE Journal on Robotics and Automation, 3(4), pp. 323-344, 1987.
  • [Triggs1999], B. Triggs, P. McLauchlan, R. Hartley, A. Fitzgibbon, Bundle Adjustment — A Modern Synthesis, ICCV '99: Proceedings of the International Workshop on Vision Algorithms, Springer-Verlag, pp. 298-372, 1999.

Appendix A

If we have a homography H between our normalized image coordinate plane and our planar calibration target, such that
where h33 is an arbitrary scale factor, then the focal length of the camera can be estimated assuming square pixels, zero skew and a principal point of (0,0) in normalized image coordinates, using the formula
Note that this is only one possibility, derived from the constraint that r1 is a unit vector.

Friday, 25 November 2016

MTF Mapper finally gets a logo!

It is a sad day for command line enthusiasts, but MTF Mapper has finally conformed by adopting a logo for its GUI version.

I guess in the world of graphical user interfaces, a logo is to an application what a flag is to a nation (cue the Eddie Izzard reference).

There is of course a new version of MTF Mapper (0.5.11 or later) available over on SourceForge. Lots of fixes and cleanup to the GUI; please let me know what you think of the new(ish) interface.

Monday, 13 June 2016

Running MTF Mapper under Wine

MTF Mapper 0.5.2 was compiled using MSVC Express 2013, which Microsoft calls "vc12". The Windows binaries have been linked statically against the runtime, but this does not appear to be sufficient to run MTF Mapper under wine without further tweaks.

For me, running "winetricks vcrun2013" in the console seemed to do the trick. I would say that this is a necessary step to get MTF Mapper to work under wine.

In case you are wondering, without the winetricks step I get the following error:
wine: Call from 0x7b83c506 to unimplemented function msvcr120.dll.?_Trace_ppl_function@Concurrency@@YAXABU_GUID@@EW4ConcRT_EventType@1@@Z, aborting

Let me know if there are any other issues related to wine, and I'll see what I can do.

Wednesday, 13 April 2016

MTF Mapper vs Imatest vs Quick MTF

I recently noticed that Quick MTF now has an automated region-of-interest (ROI) detection function. This allows me (in theory) to perform the same type of automated testing that I applied to MTF Mapper and Imatest. Now would be a good time to read the Imatest comparison post to familiarise yourself with my testing procedure.

Anyhow, the automatic ROI functionality in Quick MTF is almost able to work with the simulated Imatest charts I produced with mtf_generate_rectangle. I had to manually adjust about half of the ROIs to ensure that Quick MTF was using as much of each edge as possible, i.e., similar ROIs to what Imatest and MTF Mapper used. Since the edge locations remain the same across all the test images, I used the "open with the same ROI" option to keep the experiment as fair as possible.

I also discovered that QuickMTF's "trial" limit of 40 tests can be bypassed with relatively little fuss (Oleg, if you are reading this, I promise not to share the secret).

Lastly, note that I performed these tests using the "ISO 12233" mode of Quick MTF. The default settings produces much smoother plots, but these are severely biased, i.e., they report MTF50 values that are much too low. To illustrate: the default settings produce a 95th percentile relative error of 13% when measured using images with an expected MTF50 of 0.25 c/p; switching to ISO 12233 mode reduces the error to only 5%. As expected, the standard deviation of MTF50 error is lower in the default mode, but I maintain that bias and variance should both be managed well.

The results

Figure 1: Quick MTF MTF50 relative error boxplot
Figure 1 illustrates the relative MTF50 error boxplot, calculated as 100*(measured_mtf50 - expected_mtf50)/expected_mtf50. Firstly, Quick MTF should be commended for its unbiased performance between expected MTF50 values of 0.1 and 0.4 cycles/pixel; the median error is exactly zero. Unfortunately, a strong bias appears after 0.4 c/p, which is consistent with some (light) smoothing of the ESF. The boxes, and especially the whiskers, are a bit wide, which is more readily seen in Figure 2.

Figure 2: Standard deviation of relative MTF50 error
Things go a bit pear shaped when we look at the standard deviation of the relative MTF50 error. If we consider the "usable" range of 0.08 to 0.5 c/p, then Quick MTF contains the standard deviation below 3.5%, which is not bad, but Imatest and MTF Mapper perform a bit better here. A more useful (and my preferred) measure is the 95th percentile of relative MTF50 error magnitude, as illustrated in Figure 3.
Figure 3: 95th percentile of relative MTF50 error magnitude

The values in Figure 3 have a natural interpretation: the magnitude of the error will remain below the indicated value in about 95% of the edges measured with each tool. This measure combines the effects of bias (Figure 1) and variance (Figure 2) in one convenient value. Consider again the "usable" range of 0.08 to 0.5 c/p: Quick MTF only manages to keep the error below about 9% across the range. It does quite a bit better in the centre of the range, almost matching Imatest at 0.2 c/p.


The Imatest results were not based on the latest version; I do not have an Imatest license, and my trial has expired, so it will take a fair bit of effort to refresh the Imatest results. The Quick MTF 2.09 results are current, though.
Based on these versions, it would appear that MTF Mapper still produces competitive results. And you cannot beat MTF Mapper's price.

Tuesday, 3 November 2015


There is no doubt that FFTW is one of the fastest FFT implementations available. It can be a pain to include in a Microsoft Visual Studio project, though. Maybe I am "using it wrong"...

One solution to this problem is to include my own FFT implementation in MTF Mapper, thereby avoiding the FFTW dependency entirely. Although it is generally frowned upon to use a homebrew FFT implementation in lieu of an existing, proven library, I decided it was time to ditch FFTW.

One of the main advantages of using a homebrew FFT implementation is that it avoids the GPL license of FFTW. Not that I have any fundamental objection to the GPL, but the main sources of MTF Mapper are available under a BSD license, which is a less strict license than the GPL. In particular, the BSD license makes allowance for commercial use of the code. Before anyone asks, no, MTF Mapper is not going closed source or anything like that. All things being equal, the BSD license is just less restrictive, and avoiding FFTW brings MTF Mapper closer to being a pure BSD (or compatible) license project.

FFT Implementation

After playing around with a few alternative options, including considering the my first c++ FFT implementation way back from first year at university, I settled on Sorenson's radix-2 real-valued FFT (Sorenson, H.B, et al, Real-Valued Fast Fourier Transform Algorithms, IEEE Transactions on Accoustics, Speech, and Signal Processing, 35(6), 1987). This algorithm appears to be a decent balance between complexity and theoretical efficiency, but I had to work fairly hard at the code to produce a reasonably efficient implementation.

I tried to implement it in fairly straightforward c++, but taking care to use pointer walks in stead of array indexing, and using look up tables for both the bit-reversal process and the sine/cosine functions. These changes produced an algorithm that was at least as fast as my similarly optimized complex FFT implementation augmented with a two-for-the-price-of-one step for real-valued inputs.

One thing I did notice is that the FFT in its "natural" form does not lend itself to an efficient streaming implementation. For example, the first pass of the radix-2 algorithm looks like this:
for (; xp <= xp_sentinel; xp += 2) {  
    double xt = *xp;
    *(xp)   = xt + *(xp+1);
    *(xp+1) = xt - *(xp+1);
Note that the value of x[i] (here *xp) is overwritten in the 3rd line of the code, while the original value of x[i] (copied into xt) is still required in the 4th line of the code. This write-after-read dependency causes problems for out-of-order execution. Maybe the compiler is smart enough to unroll the loop and intersperse the reads and writes to achieve maximal utilization of all the processing units on the CPU, but the stride of the loop and the packing of the values is not ideal for SSE2/AVX instructions either. I suppose that this can be addressed with better code, but before I spend time on that I first have to determine how significant raw performance of the FFT is in the context of MTF Mapper.

Real world performance in MTF Mapper

So how much time does MTF Mapper spend calculating FFTs? Well, one FFT for every edge. A high-density grid-style test chart has roughly 1452 edges. According to a "callgrind" trace produced using valgrind, MTF Mapper v0.4.21 spends 0.09% of its instruction count inside FFTW's real-valued FFT algorithm.

Using the homebrew FFT of MTF Mapper 0.4.23 the total number of instruction fetches increase by about 1.34%, but this does not imply a 1.34% increase in runtime. The callgrind trace indicates that 0.31% of v0.4.23's instructions are spent in the new FFT routine.

In relative terms, this implies that the new routine is roughly 3.5 times slower, but this does not account for the additional overheads incurred by FFTW's memory allocation routines (the FFTW routine is not in-place, hence requires a new buffer to be allocated before every FFT to keep the process thread-safe).

Measuring the actual wall-clock time gives us a result of 22.27 ± 0.14 seconds for 20 runs of MTF Mapper v0.4.21 on my test image, versus 21.631 ± 0.16 seconds for 20 runs of v0.4.23 (each experiment repeated 4 times for computing standard deviations). These timings were obtained on a Sandy-bridge laptop with 8/4 threads. The somewhat surprising reversal of the standings (the homebrew FFT now outperforms the FFTW implementation) just goes to show that the interaction between hyperthreading, caching, and SSE/AVX unit contention can produce some surprising results.

Bottom line: the homebrew FFT is fast enough (at least on the two hardware/compiler combinations I tested).

Are we done yet?

Well, surely you want to know how fast the homebrew FFT is in relation to FFTW in a fair fight, right?

I set up a simple test using FFTW version 3.3.4 built on gentoo using gcc-4.9.3, running on a Sandy-bridge laptop cpu (i7-2720QM) running at a base clock of 2.2 GHz. This was a single-threaded test, so we should see a maximum clock speed of 3.3GHz, if we are lucky.

For a 1024-sample real-valued FFT, 2 million iterations took 14.683 seconds using the homebrew code, and only 5.798 seconds using FFTW. That is a ratio of ~2.53.

For a 512-sample (same as what MTF Mapper uses) real-valued FFT, 2 million iterations took 6.635 seconds using the homebrew code, and only 2.743 seconds using FFTW. That is a ratio of ~2.42.

According to general impressions gathered from the Internet, you are doing a good-enough job if you are less than 4x slower than FFTW. I ran metaFFT's benchmarks, which gave a ratio of 2.4x and 2.1x relative to FFTW for size 1024 and 512, respectively (these were probably complex transforms, so not a straight comparison).

The MTF Mapper homebrew FFT at least appears to be in the right ballpark, at least fast enough not to cause embarrassment....

Sunday, 5 July 2015

A critical look

Most of the posts on this blog are tutorial / educational in style. I have come across a paper published by an Imatest employee that requires some commentary of a more critical nature. With some experience in the academic peer review process, I hope I can maintain the appropriate degree of objectivity in my commentary.

At any rate, if you have no interest in this kind of commentary / post, please feel free to skip it.

The paper

The paper in question is : Jackson K. M. Roland, " A study of slanted-edge MTF stability and repeatability ", Proc. SPIE 9396, Image Quality and System Performance XII, 93960L (January 8, 2015);  doi:10.1117/12.2077755;

A copy can be obtained directly from Imatest here.

Interesting point of view

One of the contributions of the paper is a discussion of the impact of edge orientation on MTF measurements. The paper appears to approach the problem from a direction that is more closely aligned with the ISO12233:2000 standard, rather than Kohm's method ("Modulation transfer function measurement method and results for the Orbview-3 high resolution imaging satellite", Proceedings of ISPRS, 2004).

By that I mean that Kohm's approach (and MTF Mapper's approach) is to compute an estimate of the edge normal, followed by projection of the pixel centre coordinates (paired with their intensity values) onto this normal. This produces a dense set of samples across the edge in a very intuitive way; the main drawback of this approach being the potential increase in the processing cost because it lends itself better to a floating point implementation.

The ISO12233:2000 approach rather attempts to project the edge "down" (assuming a vertical edge) onto the bottom-most row of pixels in the region of interest (ROI). Using the slope of the edge (estimated earlier), each pixel's intensity (sample) can be shifted left or right by the appropriate phase offset before being projected onto the bottom row. If the bottom row is modelled as bins with 0.25-pixel spacing, this process allows us to construct our 4x-oversampled, binned ESF estimate with the minimum amount of computational effort (although that might depend on whether a particular platform has strong floating-point capabilities).

The method proposed in the Imatest paper is definitely of the ISO12233:2000 variety. How can we tell? Well, the Imatest paper proposes that the ESF must be corrected by appropriate scaling of the x values using a scaling factor of cos(theta), where theta is the edge orientation angle. What this accomplishes is to "squash" the range of x values (i.e. pixel column) to be spaced at an interval that is consistent with the pixel's distance as measured along the normal to the edge. For a 5 degree angle, this correction factor is only 0.9962, meaning that distances will be squashed by a very small amount indeed. So little, in fact, that the ISO12233:2000 standard ignores this correction factor, because a pixel at a horizontal distance of 16 pixels will be mapped to a normal distance of 15.94. Keeping in mind that the ESF bins are 0.25 pixels wide, this error must have seemed small.

I recognize that the Imatest paper proposes a valid solution to this "stretching" of the ESF that would occur in its absence, and that this stretching would become quite large at larger angles (about a 1.5 pixel shift at 25 degrees for our pixel at a horizontal distance of 16 pixels).

My critique of this approach is that it would typically involve the use of floating point calculations, the potential avoidance of which appears to have been one of the main advantages of the ISO12233:2000 method. If you are going to use floating point values, then Kohm's method is more intuitive.

Major technical issues

  1. The Point Spread Functions (PSFs) used to perform the "real world" and simulated experiments were rather different, particularly in one very important aspect. The Canon 6D camera has a PSF that is anisotropic, which follows directly from its square (or even L-shaped) photosites. The composite PSF for the 6D would be an Airy pattern (diffraction) convolved with a square photosite aperture (physical sensor) convolved with a 4-dot beam splitter (the OLPF). Of course I do not have inside information on the exact photosite aperture (maybe chipworks has an image) nor the OLPF (although a 4-dot Lithium Niobate splitter seems reasonable). The point remains that this type of PSF will yield noticeably higher MTF50 values when the slanted edge approaches 45 degrees. Between the 5 and 15 degree orientations employed in the Imatest paper, we would expect a difference of about 1%. This is below the error margin of Imatest, but with a large enough set of observations this systematic effect should be visible.

    In contrast, the Gaussian PSF employed to produce the simulated images is  (or at least is supposed to be) isotropic, and should show no edge-orientation dependent bias. Bottom line: the "real world" images had an  anisotropic PSF, and the simulated images had an isotropic PSF. This means that the one cannot be used in the place of the other to evaluate the effects of edge orientation on measured MTF. Well, at least not without separating the PSF anisotropy from the residual orientation-depended artifacts of the slanted edge method.
  2.  On page 7 the Imatest paper states that "The sampling of the small Gaussian is such that the normally rotationally-invariant Gaussian function has directional factors as you approach 45 degree increments." This is further "illustrated" in Figure 13.

    At this point I take issue with the reviewers who allowed the Imatest paper to be published in this state. If you suddenly find that your Gaussian PSF becomes anisotropic, you have to take a hard look at your implementation. The only reason that the Gaussian (with a small standard deviation) is starting to develop "directional factors" is because you are undersampling the Gaussian beyond repair.

    The usual solution to this problem is to increase the resolution of your synthetic image. By generating your synthetic image at, say, 10x the scale, all your Gaussian PSFs will be reasonably wide in terms of samples in the oversampled image. For MTF measurement using the slanted edge method, you do not even have to downsize your oversampled image before applying the slanted edge method. All you have to do is to change the scale of your resolution axis in your MTF plot. That way you do not even have to worry about the MTF of the downsampling kernel.

    There are several methods that produce even higher quality simulated images. At this point I will plug my own work: see this post or this paper. These approaches rely on importance sampling (for diffraction PSFs) or direct numerical integration of the Gaussian in two dimensions; both these approaches avoid any issues with downsampling and do not sample on a regular grid. These methods are implemented in mtf_generate_rectangle.exe, which is part of the MTF Mapper package.

Minor technical issues

  1. On page 1 the Imatest paper states that the ISO 12233:2014 standard lowered the edge contrast "because with high contrast the measurement becomes unstable". This statement is quite vague, and appears to contradict the results presented in Figure 8, which shows no degradation of performance at high contrast, even in the presence of noise.

    I would offer some alternative explanations: the ISO12233 standard is often applied to images compressed with DCT-based quantization methods, such as JPEG. A high-contrast edge typically shows up with a large-magnitude DCT coefficient at higher frequencies; exactly the frequencies that are more strongly quantized, hence the well-kown appearance of "mosquito noise" in JPEG images. A lower contrast edge will reduce the relative energy at higher frequencies, thus the stronger quantization of high frequencies will have a proportionately smaller effect. I am quite temtpted to go and test this theory right away.

    Another explanation, one that is covered in some depth on Imatest's own website, is of course the potential intensity clipping that may result from incorrect exposure. Keeping the edge contrast in a more manageable range reduces the chance of clipping. Another more subtle reason is that a lower contrast chart allows more headroom for sharpening without clipping. By this I mean that sharpening (of the unsharp masking type) usually results in some "ringing" which manifests as overshoot (on the bright side of the edge) and undershoot (on the dark side of the edge). If chart contrast was so high that the overshoot of overzealous sharpening would be clipped, then it would be harder to measure (and observe) the extent of oversharpening.
  2. The noise model is employed a little basic. Strictly speaking the standard deviation of the additive Gaussian white noise should be signal dependent; this is a more accurate model of photon shot noise, and is trivial to implement. I have not done a systematic study of the effects of noise simulation models on the slanted edge method, but in 2015 one really should simulate photon shot noise as the dominant component of additive noise.
  3. Page 6 of the Imatest paper states that "There is a problem with this 5 degree angle that has not yet been addressed in any standard or paper." All I can say to this is that Kohm's paper has presented an alternative solution to this problem that really should be recognized in the Imatest paper.


Other than the unforgivable error in the generation of the simulated images, a fair effort, but more time spent on the literature, especially papers like Kohm's, would have changed the tone of the paper considerably, which in turn would have made it more credible.