Tyler's Blog: 2011

Tuesday, May 31, 2011

Conjugate Gradient Part 1

Since my project is winding down fast, I thought I'd just expand more on the math behind the scenes.

The conjugate gradient method is an iterative algorithm that can solve equations of the form Ax = b, or be stopped half-way through to get a good estimate for x. It only works if A is positive-definite and square. It is only practical if A is sparse so multiplications by A are easy to compute.

Lets start with a quick definition:

Def: u and v are conjugate
w.r.t. A if u^TAv = 0. Basically, u and Av are orthogonal.

The problem is solving for Ax = b. x can be
broken down into a linear combination of basis of Rⁿ called {d_k}
or ‘directions’ that are all conjugate w.r.t to A. Whenever we talk about linear combinations, I like to think of each basis vector as a direction. Bear with me for a second while I glaze over why it is cool if they’re conjugate. Now take the equation:

x = ∑α_id_i (by linear combination def)

b = Ax = ∑α_iAd_i (by problem definition)

This isn’t terribly interesting, it is just rewriting Ax as A times a completely arbitrary decomposition of x into {d_k}. But imagine we know all the directions in {d_k} and we just wanted to solve for the weight α. Normally, this wouldn’t tell us anything: I can break x in R³ into α₁[1; 0; 0], α₂[0; 1; 0], α₃[0; 0; 1], and I still don’t have a good way to find α’s
and x. But if my bases are conjugate, I can use the following equation:

b = Ax = ∑_i α_iAd_i	(see above)
d_kb = ∑_i d_kα_iAd_i	(multiply both sides by d_k. Note that I know {d_i} so only unkowns are α_i’s.
d_kb = ∑_i d_kα_iAd_i = d_kα_kAd_k	(by the fact that d_iAd_k = 0 if i != k, by conjugate bases)
α_k = d_kb/d_kAd_k

So now we have an easy-to-solve equation for all the weights that make Ax = b true. The direct algorithm can be broken into two steps: find conjugate directions w.r.t A, then efficiently
solve for the α’s, and then add them all together to get x. In practice, all these steps can be intermixed in an iterative algorithm that updates x after every iteration.

Conjugate Gradient Algorithm:

for k=1:n

Find direction d_k

Find α_k

x = x + α_k d_k

But how do we find d_k? One can imagine that there are many different ways to compute a set of conjugate directions, but some may combine to form x in fewer iterations than others. This is where different methods appear such as MINRES, GMRES, and the one I’m concerned with: Conjugate Gradient. In fact, the manner of selecting d_k is where the conjugate gradient gets its name! Imagine we have an energy function: E(x) = ½ x^TAx - x^Tb + c, then the gradient (derivative) E’(x) = Ax – b, and we have a minimum for E(x) at 0 = b - Ax or b = Ax (basic calculus). If you were writing an iterative algorithm that finds the minimum of E(x) without solving directly for x, then you might just greedily follow the gradient until you hit the minimum, which is just: b – Ax where x is your current guess for x. Take note that b – Ax is the gradient and the residue in this special case, and the terms get intermixed frequently. If we just followed this greedy algorithm, we’d be doing a steepest descent algorithm. Unfortunately, these bases aren’t orthogonal or conjugate, and this
leads to inefficiency.

We’ll borrow the idea behind steepest descent to pick our first direction, that is, just say x = 0, plug it into b – Ax, and get b as our first derivative. For the rest of the directions, form conjugate bases to the previous bases with the following rules:

d₀ = b

d_k = r_k-1 - ∑_i<k((d_iAr_{k 1})/(d^T_iAd_i)) d_i

Basically, d_k is the previous derivative orthogonalized (or rather, conjugatilized because we multiply by A) with all d_i for i<k. This should feel a lot like Gram-Schmidt orthonormalization. Here’s an updated outline of the algorithm:

Conjugate Gradient Algorithm

for k=1:n

Find direction d_k

for i=1:k

‘conjugatilize’ d_k
with d_i

Find α_k

x = x + α_k d_k

So as a recap, now we have a good heuristic way of picking directions (d_k) such that we can exploit their conjugate property and solve for their weights (α_k).

The algorithm listed above captures all the intuition of conjugate gradient, but doesn’t look anything like what is actually implemented. The reason is that the inner for-loop can be optimized away along with the requirement to store all previous bases. I might expand on
this idea in a later post.

This is the more common form:

Conjugate Gradient Algorithm

for k=1:n

r_k+1 = r_k - α_k Ad_k

β_k = (r_k+1^Tr_k+1)/(r_k^Tr_k)

d_k+1 = r_k+1 + β_kd_k

α_k+1 = (r_k^Tr_k)/(d_k^TAd_k)

x_k+1 = x_k + α_kd_k

Gradient Descent

Given an energy function of the form E(x) = ½x^TAx - x^Tb + c, our task is to minimize E(x) without directly solving for x because it would be too expensive.

Sunday, May 22, 2011

Robust Estimation of Multiple Motions

I made a break in my effort to understand how Ce Liu’s algorithm works before I start rewriting portions of it. The appendix of his thesis lists a paper that I read named “The Robust Estimation of Multiple Motions: Parametric and Piecewise-Smooth Flow Fields” by Black and Anandan, and now I understand a lot more code. Here’s my summary:

In optical flow, we can use brightness constancy to say the following:

Error(u,v) = ∑
(I(x,y,t) – I(x+udt, y+ vdt, t+dt))²

That is, the error E is a function of the displacement vectors u and v, and we can solve this equation to find them. We don’t square the term because it is important to our model; it is largely for convenience sake. With the square, we can take the derivative of the formula and get a linear equation on the right side and solve for when the left side, Error’(u,v), equals 0. Since the equation is convex (one local minimum), we know we’ve minimized the error. This is all pretty straightforward stuff I’ve worked on before.

However, there’s a catch. This method only works if data fits the model 100%. For instance, if brightness doesn’t hold because of depth discontinuity or multiple motions in the same window, then we’ll have an outlying point whose affect will be squared. The problem is similar to means and medians: the mean is sensitive to being skewed by an extreme outlier while the median is not. It would be nice if we had a median-like function that is as easy to solve for u and v with.

ρ: error normalization
function, Ψ: influence function (proportional to derivative of ρ)

Obviously, care should be taken
with choosing an error normalization function after considering issues of
convexity and rates of convergence.

In Ce Liu’s (the code I’m trying to
understand), he’s using

E(u, v)
=ψ(|I(p+w) − I(p)|²) + αφ(|∇u|² + |∇v|²)dp

ψ(x) = φ(x) =√x² + ε², (ε=0.001)

Interestingly, both of these are very non-convex, but Liu points out in his thesis that a dense Gaussian pyramid can overcome bad local minimum.

These extra functions have the effect of weighing the continuous gradient constraint (|∇u|² + |∇v|² = 0) against the constant brightness constraint (I(p+w) − I(p) = 0). Once all the math is worked out, it looks like each constraint is weighted inversely proportional to the amount of perceived error it has. This makes a lot of sense: suppress one constraint if that part of the model isn’t holding up.

Thursday, May 19, 2011

Gauss Seidel

Gauss-Seidel

Building on my post ‘Iterative
Approaches to Solving Large Systems of Equations’, I’m going to break the matrix A in Ax=b into an upper triangular and lower triangular form. So [2 -1; -1 2] will be broken into D=[2 0; -1 2] and R = [0 -1; 0 0]. This technique is a little sneakier than the Jacobi method, in that you never need to take the inverse of D. In other words, we’re not decomposing the matrix this way because D is easily invertible. Instead, the equation Dx_k+1 + Rx_k = b is easily solved by doing something that looks like a lot like back-substitution. For instance, in the example above, we could write out: [2 0; -1 2]x_k+1 + [0 -1; 0 0]x_k = b or [2 0; -1 2]x_k+1 = [0 1; 0 0]x_k + b. This equation has the nice property that the first entry of x_k+1, lets call it x_k+1(1) to keep our bastardized matlab syntax, can be solve instantly just like in back substitution. 2*x_k+1(1) = x_k(1) + b in our case. Then that result can be used to solve the next equation, and so on and so forth. I’ve written some code to do this:

% Finds a solution for Ax = b.

A = [12 3 -5; 1 5 3; 3 7 13];

b = [1; 28; 76];

syms x1 x2 x3;

initial_x = [1 0 1];

curr_x = initial_x;

next_x = zeros(1,3);

errors = zeros(1,3);

% Iteratively update your estimate of entries of x.

% Note that curr_x and next_x could be the same array, I
separate them

% just so I can display the error.

for k=1:6,

next_x(1) = solve(A(1,:)*[x1; curr_x(2); curr_x(3)] - b(1));

next_x(2) = solve(A(2,:)*[next_x(1); x2; curr_x(3)] - b(2));

next_x(3) = solve(A(3,:)*[next_x(1); next_x(2); x3] - b(3));

errors = abs((next_x - curr_x) ./ next_x)*100;

disp(['Absolute Errors ' num2str(errors)]);

curr_x = next_x;

end

curr_x

The important thing is how I solve the first equation with curr_x, but then I solve the second equation with next_x. I don’t have to wait to solve for all next_x before I can start using
its improved estimates. Although I don’t understand the proof for it, apparently this method converges twice as fast as Jacobi, and by combining curr_x and next_x, it is much more space efficient.

As an aside, I wrote this matlab code with the symbolic toolbox and it’s ‘solve’ function rather than just reorganize the linear equations to solve for the one unknown. This is MUCH slower for what should just be a series of adds and a divide, but it makes the code match the equation. I have a lot trouble understanding code when algebra and optimizations distort the idea to the point where it doesn’t match the paper/book at all. Apparently Mathematica is smarter about this stuff than Matlab is, so I’ll add that to my growing list of reasons to switch.

As another aside, I think this is kind of interesting that you can iteratively solve for the residual by solving b – Ax_k efficiently, and seeing how far you are from the solution. I think it’s kind of mind bending that you can be cranking away at a problem and know the entire time how far away you are from the real answer. I guess the residual isn’t as informative as the error x-x_k, but it’s just a matrix multiply away!

Tuesday, May 17, 2011

Current Direction

Last week I experimented with a half-dozen algorithms for doing dense correspondence matching and finally settled on this:

http://people.csail.mit.edu/celiu/OpticalFlow/

Mr. Liu uses an algorithm that combines minimizing local energy functions from Lucas-Kanade and a global energy function from Horn-Schnuck to get dense flow that is resilient to noise. I tried a similar algorithm earlier on, but I suppose they were just of inferior quality because this algorithm is doing extremely well.

Left Image

Right Image Warped

My one problem moving forward is still speed. It will take a few minutes to fully register a set of images, which is OK, but a little annoying with very large databases. If I can constrain the objective function in Liu's algorithm with calibration data, I can get a major performance boost. Liu is using some pretty heavy mathematics that I'll have to spend some time learning if I'm going to be able to make changes confidently.

Iterative Approaches to Solving Large System of Equations

Here’s the problem: you have a system of linear equations, Ax = b, that is too large to solve with Gaussian elimination or whatever real linear algebra toolboxes use. Instead, it would be nice to have an algorithm that you can run for a certain amount of time, stop, and get a result that is a good approximation. There’s a class of algorithms called “Iterative Methods” which do just that, and I want to explain the idea behind them for my own sake.

The key to iterative methods is splitting A into two parts: D and R (or S and T, or P and P-A, nobody has the same names for these matrices). Now we have the formula: Dx + Rx = b where D + R = A. As a small leap, separate and prefix the x’s in the equation as Dx_a + Rx_b = b. We can start guessing for values of x_b and solve for x_a, and maybe try to tweak x_b until both our x’s are equivalent, but that seems like an ass-backwards way of solving linear equations. But hell, lets give it a shot make our lives easier by solving for x_a:

Dx_a + Rx_b = b

x_a = D^-1 (b – Rx_b)
(by matrix arithmetic)

Now that we have our formula all
spelled out, we notice that there’s a really computationally inefficient inversion, so whatever our separation of D and R is, we want D to be easily invertible (i.e. a diagonal matrix or triangular).

The key to this algorithm being practical is that there are some matrices A, D, and R which have the property that x_a is a better ‘guess’ than x_b, and if you iteratively chain operations together, you can get x_a’s that approach the real x:

for k=1:n, x_k+1 = D^-1(b – Rx_k), end

What we’re seeing is a convergence in our guesses. The property you’re looking for your matrix selections is that the greatest absolute eigenvalue (or spectral radius) of D^-1R is less than 1.

Proof: First I’m going to solve for an error function, e, such that e(k) will tell us how far away x_k (x at iteration k) is from the real x by returning x-x_k. Then I’ll show that the function converges if D^-1R is less than 1.

Original Formula: Dx + Rx = b

Iterative Formula: Dx_k+1 + Rx_k= b

(Original formula) – (Iterative formula) = D(x– x_k+1) + R(x - x_k) = 0

so (x – x_k+1) = D^-1R(x – x_k) or e_k+1 = D^-1Re_k. (by algebra + substitution of terms)

e(k) = e_k = (D^-1R)^ke₀ (by recursively chaining function calls together)

(D^-1R) = QΛQ^-1 where Q is a matrix of eigenvectors and Λ is the diagonal matrix of eigenvalues (by eigendecomposition)

(D^-1R)^k = (QΛQ^-1)^k= QΛ^kQ^-1 (by cancellation of Q^-1Q)

In this form, it is pretty clear that we want the entries of Λ to be less than zero so the matrices entries will go to zero as k increases.

So to recap, if the spectral radius of (D^-1R) is less than 1, we have an efficient way of converging on a correct solution for Ax = b. But how do we select D and R?

Here is where the technique diverges:

Jacobi’s Method: D is the diagonal of A

Gauss-Seidel Method: D is the triangular part of A

Successive Overrelaxation: Combination of Jacobi and Gauss-Seidel.

I’ve included the Matlab code I wrote for Jacobi’s Method:

A = [2 -1; -1 2];

b = [2; 2]; % Solution to this linear equation is [2 2]

S = diag(diag(A)); % Precondition

T = S - A;

S_inv = (1./S); %easy to calculate inverse, that's the point.

S_inv(isinf(S_inv)) = 0;

M = S_inv*T;

curr_x = zeros(2,1); next_x = zeros(2,1);

for k=1:10,

next_x = M*curr_x + M*b;

curr_x = next_x;

end

I was expecting 2,2 and after 10 iterations I got 1.998,1.998. Good stuff! As an aside, you can look at the matrix (D^-1R) and tell from its entries (if they’re less than 1) whether or not the matrix will converge. Here's a graph with the residuals (by looking at the (D^-1R)'s entries and seeing that they're all 1/2, you can predict the 50% decline).

I also wrote some code for Gauss-Seidel, but I think you’ve got the idea.

Tuesday, May 10, 2011

Trying different transforms

So I tried different transformations based on the correspondences I got last week. The first I tried was an elastic transformation method:

Left Image/ Right Image/ Absolute difference

And I followed it up with a thin plate spline method:

Left Image/ Right Image/ Absolute difference

Pretty good, but not good enough for scientific purposes.

Tuesday, May 3, 2011

Correspondence Matching Results

It may be hard to see, but these are the correspondences found by the Okutomi algorithm. They're not 100% accurate, but its pretty close, and it can be improved by expanding the template-matching window size.

Unfortunately, the algorithm is pretty slow. I can find these 100 correspondences in between 5-40 seconds (depending on locality of the points), so finding a dense mapping (~800*800 points) is infeasible. I've tried all optimizations short of compiler options (which never give the light-and-day differences you'd hope for). I'm considering shrinking the image.

The biggest challenges ahead are the transformation and the performance. These problems are pretty tightly related, so I'll start with the transformation. A homography does terribly (obviously...), but I picked up some literature on the subject at the library today: http://www.imgfsr.com/ifsr_book.html, and downloaded some sample code: http://biocomp.cnb.csic.es/~iarganda/bUnwarpJ/ (two camera view and in java). I'm currently wrapping my head around non-rigid transformations so I can weigh my options and decide where to go next.

Sunday, April 24, 2011

Epilines are Co-linear

Just as the dataset up-ended some assumptions I made about automating single-camera calibration and made me have to rewrite some code, the same is true of stereo calibration.

Epipolar Lines and Corresponding Points

664.018066 1742.919556 Slope: 0.793310

524.132874 1583.951416 Slope: 0.779278

408.092316 1452.080811 Slope: 0.767639

290.211090 1318.118286 Slope: 0.755814

1.257200 989.745667 Slope: 0.726830

315.574677 1346.942017 Slope: 0.758358

144.123520 1152.101563 Slope: 0.741160

-164.351868 801.544434 Slope: 0.710218

-646.410828 253.723633 Slope: 0.661864

avg err = 2.38012 (in pixels)

The good news is that the epipolar lines are pretty close to colinear, which will make later calculations much easier.

Saturday, April 23, 2011

Lucas Kanade isn't going so well.

This week I got a huge database of LUMIS images. Unfortunately, they don't work very well with Lucas Kanade. I think this has to do with the brightness consistency constraints (even after histogram matching). I think its related to finding large number of local minimum when the brightness constancy constraint is changed even a bit.

Image Subtraction (ghostly parts and blurring are bad)

*The box is a subset of the image being used for matching.

However, affine transformations do capture the transformation pretty well, even when we aren't taking images of a flat calibration grid:

Image Subtraction after histogram equalization and an affine transformation derived from hand chosen points

The sum of square differences increased 3 fold, but they're still acceptable.

Thursday, April 21, 2011

Histogram matching

Histogram matching seems like a pretty good segmentation technique. The first row below is the left image, the right image, and the differences after an affine transformation. The second row is the same, but with histogram equalization.

The average pixel difference dropped from 38 to 4, and I got it down to < 2 by cropping out the sections of the image that aren't visible in the other view.

It should be noted that there will still be some residual error from the affine transformation, so < 2 is pretty remarkable. I just hope this stands up to fluorescence!

Pure Translation isn't an Accurate Motion Model

I just figured I'd give it a try. This problem would have been very easy if was. In the following images, shadowy white parts are bad

Affine (the best so far) image subtraction

Pure Translation image subtraction

Sum of square differences increased 50%

Wednesday, April 20, 2011

Segmentation Hell: I'm in It

Recap from the weekend: I was reading papers related with Lucas-Kanade and collecting different implementations. I wanted to learn how some more advanced implementations work for my own sake.

Today I got down to comparing different implementations. In the name of K.I.S.S (Keep It Simple, Stupid), I started with OpenCV's pyramid-based Lucas Kanade, but it was totally failing because of my violations to constant brightness. My attempts at segmentation were not very good:

Basic LOG at different thresholds:

Added a Histogram equalization stage.

Friday, April 15, 2011

Affine or Projective?

Today I hand clicked some 'base truth' correspondences so I can use them as a metric for image registration later on and play around with transformations. I wanted to make sure the affine transformation was a good estimate.

Subtracting Upper Left Camera from Upper Right Camera

Affine Transformation + Image Subtraction: SoSD = 658.7664

Projective Transformation + Image Subtraction: SoSD = 911.9480

*SoSD = Sum of Square Differences

These images are a lot better than what I'm expecting, but still, it looks like an affine 'motion model' will do fine.

EDIT: Below are the raw images.

Thursday, April 14, 2011

First LUMIS2 Data: Calibration + Registration

Reflections on First LUMIS2 Image Set

Paul sent me calibration images today (pictured above) that were a little different from what I expected. Above are the images taken from the upper right camera. As a training set, it has some pretty radical changes in position (which is a good thing), but I did not count on some features like, I don't know, the big rock on the calibration pattern. Last week I wrote some OpenCV calibration code that is really difficult to modify to account for the fact that half the pattern is occluded, my world coordinate system's origin can't be in the extreme upper left of the pattern (because its missing in a lot of views), and I have 90 degree changes in angle. Bouguet's toolbox should handle it fine though. It was kind of silly for me to assume a perfect training set would be given to me when I have so much trouble collecting images from a compound camera system ABOVE water.

Recap: Last week I was writing calibration code in OpenCV C++ rather than just using Bouguet Matlab because I thought OpenCV's object-oriented code would be easier to modify to 'augment the distortion model.' It looks like I'm back to Matlab, though. At least for the time being. Coding the OpenCV calibration code was good practice and I might go back to it later.

Bouguet Calibration
Next up, I tested how well distortion models in the Bouguet toolbox accounted for radial distortion.

'Typical' consumer camera radial distortion

LUMIS2's radial distortion components

Yikes! But even with this extreme distortion I got pretty good reprojection error (measured in pixels):

Upper Left Camera
Before recomp corners + 2nd calib: [ 0.65965 0.88664 ]
After recomp corners + 2nd calib: [ 0.61634 0.83943 ]

Upper Right Camera
Before recomp corners + 2nd calib: [ 0.55489 0.85606 ]
After recomp corners + 2nd calib: [ 0.54128 0.70904 ]

Lower Left Camera

Before recomp corners + 2nd calib: [ 0.55489 0.85606 ]

After recomp corners + 2nd calib: [ 0.54128 0.70904 ]

As a frame of reference, I can usually get the FUJI 2-view camera to calibrate to around .2 pixel error, so these are pretty good results. Its small enough that we can work with it, but large enough that there's room for improvement and more investigation.

Undistortion in OpenCV

A practical piece of software for the LUMIS2's GUI will be undistorting the images given these estimated parameters. I spent a good part of today getting Matlab to output OpenCV matrix files, and having OpenCV undistort based on them. (the documentation was awful)

Distorted Image

After Undistortion

Image Registration

There's a large class of image registration algorithms that assume that your two views have a small baseline. In CSE 252a we implemented one to do video stabilization. The gyst is that given two frames, you use Lucas Kanade to get the optical flow, you use the optical flow to define an affine transformation from one pose to the other, and you transform one image onto the other. The advantage of an algorithm like this is that you don't need to know any scene geometry, and its extremely well documented. I thought this would be out of the question given our wide baseline with respect to the image depth, but the disparity didn't seem too extreme.

Left Camera Image Subtracted from Right

I remember from cse252a that a 'jump' between frames like this happen often and can be corrected, especially if you refine your algorithm with course-to-fine techniques.

These algorithms are so common I knew there had to be some good starter code somewhere, and I hit pay-dirt here: http://www.codeproject.com/KB/recipes/ImgAlign.aspx

I fired up the algorithm and here are my results:

Cropped Portion of Right Cameras View

Left Camera View with Above Image Outlined in White

The algorithm was slow as dirt though, so there's definitely plenty of room for refinement. Initializing the affine transformation matrix to something like the extrinsic parameters from calibration should speed things up. Also, Serge turned me onto a paper that uses a similar algorithm that's catered specifically for multi-spectral images. Additionally, this algorithm only works well when the cropped portion of one image is ENTIRELY contained in the other, and that's a weak point of the implementation.

Finally, stitching the images and presenting them is a whole science on its own.