Project 4A

Introduction

By finding homographies between objects in images, we can find a map which allows us to form mosaics, imagining what it would be like to have a greater field of view. Since humans can see wider than most cameras, capturing panoramic images through mosaics can visualize human vision and beyond!

Shoot the Pictures

Shot by me. I went to UC Riverside for a conference and took pictures of the empty study room across the hallway from the main meeting room.

Everything except these house images were taken by me. Found online: (This was taken from this dataset)

Recover the Homographies

By finding the homography between two images, we find the matrix H such that Hp = p', or equivalently:

\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} wx' \\ wy' \\ w \end{bmatrix}

We can solve for the values of the matrix by creating a system of equations with 8 degrees of freedom as such:

\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx' \\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix}

Because an exact solution to the system does not always exist, we find the best approximate solution through least squares. In addition, least squares lets us solve an over-determined system (more than four correspondences), which can lead to a more robust estimate of the homography if the correspondences are accurate; otherwise, the extra points introduce noise into the linear problem.
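As a sketch, the stacked system above can be solved directly with np.linalg.lstsq (the function name and argument layout here are my own; any number of correspondences ≥ 4 works):

```python
import numpy as np

def compute_homography(pts_src, pts_dst):
    """Estimate H (3x3, with h_33 = 1) mapping pts_src -> pts_dst via least squares.

    pts_src, pts_dst: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        # Two rows per correspondence, matching the system above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly four points the system is square and solved exactly; with more, lstsq returns the minimizer of the residual.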

Warp the Images

Image Rectification

By finding the homography from a warped shape to its direct orthographic view, we can estimate the correspondence points of the correct view. Because the frontal view must be inferred, this step has a lot of variance, so with some trial and error, a visually pleasing result can be made.

With these sets of correspondences we can solve for the homography and apply it to the entire image. (Pictures shot by myself: I travelled to NYC in September, captured a picture of the mural at MoMA in SF, and took a picture of my iPad with its numerous stickers.)

Blend the Images into a Mosaic


After warping by the homography to the target image, the warped polygon must be shifted into the positive domain (or recentered) through a translational transformation. This shift can be computed by taking the minimum of each dimension of the corner coordinates and translating appropriately. We then have two ways of aligning the images after one has been warped:
1. Use the shift vector computed during warping; however, this can be slightly off if image sizing and padding are not accounted for correctly.
2. A more stable approach, which I used instead: take the mean of each set of correspondences to create two interpolated points, and shift by the signed distance between the mean points (found through subtraction).

Function WarpImageWithPadding(image, homographyMatrix):
    Create coordinate grid for original image
    Transform image corners using homography matrix
    Calculate required padding to fit warped image
    Create new canvas of appropriate size
    Find polygon of warped image area
    Map source coordinates to destination using inverse homography
    For each color channel:
        Interpolate pixel values from source to destination
    Return warped image, shift vector, and transformed corners
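The pseudocode above can be sketched in NumPy roughly as follows (names are illustrative; nearest-neighbor sampling stands in for the per-channel interpolation, which keeps the sketch dependency-free):

```python
import numpy as np

def warp_image(image, H):
    """Inverse-warp `image` by homography H, padding so the result fits.

    Returns the warped image, the (dx, dy) shift applied, and the
    transformed corner coordinates.
    """
    h, w = image.shape[:2]
    # Corners in homogeneous (x, y, 1) form.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], float).T
    warped = H @ corners
    warped /= warped[2]
    x_min, y_min = warped[0].min(), warped[1].min()
    x_max, y_max = warped[0].max(), warped[1].max()
    shift = np.array([-min(x_min, 0), -min(y_min, 0)])  # move into positive domain
    out_w = int(np.ceil(x_max + shift[0])) + 1
    out_h = int(np.ceil(y_max + shift[1])) + 1
    out = np.zeros((out_h, out_w) + image.shape[2:], image.dtype)

    # Map every destination pixel back to the source with the inverse homography.
    ys, xs = np.mgrid[0:out_h, 0:out_w]
    dst = np.stack([xs.ravel() - shift[0], ys.ravel() - shift[1], np.ones(xs.size)])
    src = np.linalg.inv(H) @ dst
    src /= src[2]
    sx, sy = np.rint(src[0]).astype(int), np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out[ys.ravel()[valid], xs.ravel()[valid]] = image[sy[valid], sx[valid]]
    return out, shift, warped[:2].T + shift
```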

For blending, I opted for a combination of distance-based and Gaussian-based blending, which blends two images by creating smooth transitions at their edges using distance-based weights and Gaussian blurring. The result is an image which fades smoothly from one source to the other.

Function BlendImages(image1, image2, mask1, mask2):
    Convert masks to float type
    Calculate distance from edges for both masks
    Compute normalized weights based on distances 
    Smooth weights using Gaussian blur
    Expand weights to match image channels
    Multiply each image by corresponding weights and add together
    Clip result to valid pixel range
    Return blended image
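A minimal sketch of this blending in NumPy/SciPy (function and parameter names are my own; `sigma` controls how wide the transition band is; images are assumed to be floats in [0, 1]):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt, gaussian_filter

def blend_images(im1, im2, mask1, mask2, sigma=5.0):
    """Distance-weighted blend of two aligned images on a shared canvas.

    mask1/mask2 are boolean arrays marking where each image has valid pixels.
    Weights grow with distance from each mask's edge, are smoothed with a
    Gaussian, then normalized so they sum to 1 wherever either image exists.
    """
    d1 = distance_transform_edt(mask1)
    d2 = distance_transform_edt(mask2)
    w1 = gaussian_filter(d1, sigma)
    w2 = gaussian_filter(d2, sigma)
    total = w1 + w2
    total[total == 0] = 1.0              # avoid division by zero outside both masks
    w1, w2 = w1 / total, w2 / total
    if im1.ndim == 3:                    # expand weights to match color channels
        w1, w2 = w1[..., None], w2[..., None]
    return np.clip(im1 * w1 + im2 * w2, 0.0, 1.0)
```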

Mosaic 1

Mosaic 2

Mosaic 3

Mosaic 6

By creating the two halves first, we could then define new correspondences between them and combine the two mosaics into a larger mosaic!

Mosaic 5 (Example of Bad Correspondences)

In these images of the house, we see that the position of the camera's perspective point was changed between shots, so we get artifacts when creating the reconstruction. In the first set of images, I made sure to keep the perspective point constant; as a result, those images were smoother and had lines which lined up correctly. In this example, even though the poster is lined up correctly, the stairs and roof are totally off. This is indicative of a bad transformation caused by inconsistent imaging parameters. As these images of the house were not captured by me, I have to work with what I have. To improve the map, I can trade off accuracy on the poster for overall image improvement. This can be done with an overdetermined system and least squares, which I employed in previous parts! In Mosaic 6, we see this improvement.

Mosaic 6

By carefully selecting points and adding blending, we can create a better-looking mosaic. This involved finding parts of the image which captured the homography more clearly. Parts of the image which could have been affected by wind or by objects in the way were not as reliable, so using more points led to a more robust measurement. In a way, this is described by the law of large numbers: by adding more points, we get closer to the true mean homography!

Project 4B

Step 1: Harris Corners

The paper
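As a sketch of this step, the Harris response can be computed from a smoothed structure tensor using the standard det(M) - k·trace(M)² formulation (k = 0.05 here; function and parameter names are my own):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(im, sigma=1.0, k=0.05):
    """Harris corner response for a grayscale image.

    Large positive values indicate corners; strongly negative values
    indicate edges; values near zero indicate flat regions.
    """
    # Image gradients via Gaussian derivative filters.
    Ix = gaussian_filter(im, sigma, order=(0, 1))
    Iy = gaussian_filter(im, sigma, order=(1, 0))
    # Structure tensor entries, smoothed over a neighborhood.
    Sxx = gaussian_filter(Ix * Ix, sigma)
    Syy = gaussian_filter(Iy * Iy, sigma)
    Sxy = gaussian_filter(Ix * Iy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```

Local maxima of this response above a threshold give the initial Harris points.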

Step 2: ANMS

We can also set a fixed r value and use the adaptive algorithm to generate points of interest which avoid small, tightly packed corners such as those in trees!

With lower radius settings, ANMS preserves a lot of the tightly packed Harris points. However, increasing the radius allows us to "prune" more of those small and misleading corners which could mislead the nearest-neighbors algorithm in the next steps.
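A sketch of the adaptive radius computation (the classic ANMS formulation with a robustness factor; names and defaults here are my own):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression over Harris corners.

    For each point, find the distance to the nearest point that is
    sufficiently stronger (by factor 1/c_robust); keep the n_keep points
    with the largest suppression radii, spreading corners across the image.
    """
    coords = np.asarray(coords, float)
    strengths = np.asarray(strengths, float)
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        stronger = strengths > strengths[i] / c_robust
        if stronger.any():
            d = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = d.min()   # nearest significantly-stronger neighbor
    keep = np.argsort(-radii)[:n_keep]
    return coords[keep], radii[keep]
```

A weak corner sitting right next to a stronger one gets a tiny radius and is pruned first, which is exactly why tightly packed corners (like those in trees) drop out as the kept set shrinks.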

Samples Used at r = 32:

In the two images below, we see that some of the feature points match across the images. In particular, these corresponding points are the corners of windows and the door frame. In addition, some points on the stairs and railing match. However, the most promising points are the door frame and windows, which I used in part 4A when we had to manually assign correspondence points for the homographies.

Step 3: Feature Descriptor Extraction

At each of the new points of interest, we can create a patch which describes the feature in that area of the image. Most of the time these are corners, but sometimes these are intersections. By standardizing the feature matrices to be 8x8 from a 40x40 region, we can use correlation algorithms to match these features across images.
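A sketch of this extraction for a grayscale image (blur, subsample the 40x40 window down to 8x8 with spacing 5, then bias/gain normalize; for the color variant, the same can be done per channel and stacked; names are my own):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptor(im, y, x):
    """Extract an 8x8 descriptor from the 40x40 window centered at (y, x).

    The window is blurred (anti-aliasing) and subsampled with spacing 5,
    then bias/gain-normalized so matching is robust to lighting changes.
    Assumes the 40x40 window lies fully inside the image.
    """
    patch = im[y - 20:y + 20, x - 20:x + 20]
    patch = gaussian_filter(patch, sigma=2.0)   # blur before subsampling
    desc = patch[2::5, 2::5]                    # 40x40 -> 8x8
    desc = desc - desc.mean()                   # bias normalization
    std = desc.std()
    return desc / std if std > 0 else desc      # gain normalization
```

Bias/gain normalization means the descriptor is unchanged under any affine brightness shift of the input, which is what makes cross-image comparison meaningful.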

I tried both color and black & white features. I found RGB-colored features to work much better than black and white features because the grayscale image reduces information, making the features more similar to each other, which causes issues when we want to match them based on their nearest neighbors.


Feature Regions (40x40) with Red Square (8x8) Feature Matrix

In these images we can see the regions where the 40x40 squares were sampled from. In the set below we can also see the upscaled versions of those feature vectors!


Step 4: Matching Features

By using methods based on the 1- and 2-nearest-neighbor algorithms, we can match features across two images.
While some of the points match up correctly, the matchings are not perfect. The paper describes an approach which uses Lowe's ratio test to prune a significant number of the false-positive matchings.

ALGORITHM: Feature Matching with Lowe's Ratio Test

MATCHING PROCESS:
1. Build KD-Tree:
   Create KD-Tree from fd_X for efficient nearest neighbor search
   Set leaf_size=5 for performance optimization
   Use Euclidean distance metric (Custom Implementation for faster comparisons!)

2. Find Two Nearest Neighbors:
   For each descriptor in fd_2_X:
       Find 2 nearest neighbors in fd_X
       Store distances in nn2_dists
       Store indices in nn2
       
3. Find Single Nearest Neighbor (for additional statistics):
   For each descriptor in fd_2_X:
       Find 1 nearest neighbor in fd_X
       Store distances in nn1_dists
       Store indices in nn1

4. Calculate Threshold Values:
   thresholds = first_nearest_distance / second_nearest_distance
   thresh = mean(thresholds)
   outlier_dist = mean(second_nearest_distances)

5. Apply Lowe's Ratio Test:
   For each descriptor i in fd_2:
       Get distances d1, d2 to two nearest neighbors
       IF d1 < thresh * d2 AND d1 < outlier_dist:
           Add match: [index_in_fd_2, index_in_fd]
OUTPUT:
- Array of matches, where each individual row contains [index in fd_2, corresponding matching index in fd]
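A simplified sketch of steps 1, 2, and 5 using SciPy's cKDTree (I use a fixed ratio here instead of the data-driven mean thresholds above; function and parameter names are my own):

```python
import numpy as np
from scipy.spatial import cKDTree

def match_features(desc1, desc2, ratio=0.7):
    """Match descriptors with Lowe's ratio test.

    desc1, desc2: (N, D) arrays of flattened feature descriptors.
    Returns rows of [index_in_desc2, index_in_desc1].
    """
    tree = cKDTree(desc1)
    # Two nearest neighbors in desc1 for every descriptor in desc2.
    dists, idx = tree.query(desc2, k=2)
    matches = []
    for i, ((d1, d2), (j, _)) in enumerate(zip(dists, idx)):
        if d1 < ratio * d2:   # best match must be much closer than runner-up
            matches.append([i, j])
    return np.array(matches)
```

The ratio test rejects ambiguous matches: if the second-best neighbor is almost as close as the best one, the descriptor is not distinctive enough to trust.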

We see that only some of the points match up correctly. However, this will be fixed with RANSAC!


Step 5: Robust Matching: RANSAC

In order to find the optimal homography, we could try all possible 4-point samples. However, this number grows quickly when dealing with scenes with many corners, because we would effectively have to test \binom{n}{4} different combinations: for each one, generate a homography, then count the inliers that fall within our selected ε error threshold. For 40 points, this is just 91,390 samples, which is not that bad with modern CPUs. However, 500 points would lead to 2,573,031,125 samples to assess! That would cost a lot of time and energy. So, we rely on the high likelihood that we have several correct correspondence points and run a set number of random samples, hoping to draw a sample which contains valid and correct correspondences. We will know that we found the optimal homography because it will have the highest number of inliers!

RANSAC loop:
For i = 1 to num_samples:
  a. Randomly sample points:
      - Select random_subset of size 4 from correspondences
      - Get matching points from both images (c1_sample, c2_sample)
  
  b. Compute homography:
      H = compute_homography(c1_sample, c2_sample)
  
  c. Test homography on all points:
      # Transform all points using H
      warped_points = H × corr_1_ext
      normalize_homogeneous_coordinates(warped_points)
      
      # Calculate errors
      distances = euclidean_distance(warped_points, corr_2)
      
      # Count inliers (points with error < eps)
      inliers = points where distance < eps
      inlier_count = number of inliers
      inlier_error = sum of inlier distances
  
  d. Update best result:
      If (inlier_count > max_inliers) OR 
         (inlier_count == max_inliers AND inlier_error < max_inlier_error * 0.5):
          max_inliers = inlier_count
          max_inlier_error = inlier_error
          H_optimal = H
          Save corresponding points

My implementation of RANSAC takes into account images where inlier counts alone could be misleading, and uses an added improvement metric to prevent us from settling on a suboptimal homography which may be encountered early on.
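The loop above can be sketched as follows (a simplified version that keeps only the inlier-count criterion, without the error-based tie-breaking rule; helper names are illustrative):

```python
import numpy as np

def ransac_homography(pts1, pts2, num_samples=1000, eps=2.0, seed=0):
    """RANSAC: sample 4 matches, fit H, count inliers, keep the best H.

    pts1, pts2: (N, 2) matched points. Returns the H with the most inliers
    and a boolean inlier mask.
    """
    def fit_h(p, q):   # 4-point least-squares homography (as in part 4A)
        A, b = [], []
        for (x, y), (xp, yp) in zip(p, q):
            A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
            A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
            b.extend([xp, yp])
        h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
        return np.append(h, 1.0).reshape(3, 3)

    pts1, pts2 = np.asarray(pts1, float), np.asarray(pts2, float)
    rng = np.random.default_rng(seed)
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))]).T   # homogeneous, 3 x N
    best_H, best_inliers = None, np.zeros(len(pts1), bool)
    for _ in range(num_samples):
        pick = rng.choice(len(pts1), 4, replace=False)
        H = fit_h(pts1[pick], pts2[pick])
        proj = H @ pts1_h                      # test H on all correspondences
        proj = proj[:2] / proj[2]
        dist = np.linalg.norm(proj.T - pts2, axis=1)
        inliers = dist < eps
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers
```

The final homography is then typically refit by least squares over all inliers of the winning sample, not just the 4 sampled points.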

Optimization Criteria:

Mosaic 1

Manually Stitched with 6 Correspondence Points

Auto-stitched with ANMS + RANSAC


Mosaic 2

Manually Stitched with 6 Correspondence Points

Mosaic 3

Manually Stitched with 6 Correspondence Points

Auto-stitched with ANMS + RANSAC

Auto-stitched with ANMS + RANSAC


Mosaic 4: Total Mosaic for House Data Set

Manually Stitched with 6 Correspondence Points

Auto-stitched with ANMS + RANSAC


Sunset Dataset

These images were taken of the wonderful sunset from the Panoramic Apartments in Berkeley, CA. With the sun setting earlier, these fall nights offer vibrant and captivating data.

Mosaic 5

Manually Stitched with 8 Correspondence Points

Mosaic 6

Auto-stitched with ANMS + RANSAC

Manually Stitched with 8 Correspondence Points

Auto-stitched with ANMS + RANSAC

Mosaic 7: Total Mosaic of Sunset Images 1, 2, and 3

Manual Alignment with Correspondences

Auto-stitched with ANMS + RANSAC


What Have You Learned?

The coolest thing I learned is how powerful linear algebra can really be. Sometimes we think of resorting to ML/deep learning to solve a lot of issues in computer vision, but it is often best to start with a simple approach. I was surprised to see how well nearest neighbors performed at matching; combined with RANSAC, this gave great results. I also doubted that RANSAC would work as well as it does, but even running batches which are a fraction of the size of the total number of combinations, I got amazing results. I also learned a lot from the research paper! I look forward to reading it again and understanding the parts which we did not implement for this project.