1. Detection of feature points

    1. Correspondence matching: a set of distinctive feature points is detected in each image.
    2. SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and ORB (Oriented FAST and Rotated BRIEF) are possible image-matching techniques.
    3. In the paper "Image Matching Using SIFT, SURF, BRIEF and ORB: Performance Comparison for Distorted Images", different kinds of transformations and deformations, such as scaling, rotation, noise, fisheye distortion, and shearing, are evaluated. ORB is the fastest algorithm, while SIFT performs best in most scenarios.
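
    As a concrete illustration, here is a minimal feature-detection sketch with OpenCV; the file name is a hypothetical frame from the image sequence, and SIFT_create requires OpenCV >= 4.4:

      import cv2

      # hypothetical input frame from the image sequence
      img = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)

      # ORB: fast, binary descriptors; a good default when speed matters
      orb = cv2.ORB_create(nfeatures=5000)
      kp_orb, des_orb = orb.detectAndCompute(img, None)

      # SIFT: slower, but typically the most robust across distortions
      sift = cv2.SIFT_create()
      kp_sift, des_sift = sift.detectAndCompute(img, None)

      print(f"ORB: {len(kp_orb)} keypoints, SIFT: {len(kp_sift)} keypoints")
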
  2. Matching feature points

    1. All the feature points are then matched across the images to complete the correspondence matching. Detecting feature points first reduces the number of points that must be matched for correspondence, which makes matching computationally less expensive.
    2. Nearest-neighbor search is used to find the matched points; BruteForceL2, Approximate Nearest Neighbor L2 (ANNL2), and Cascade Hashing L2 are commonly used algorithms.
    3. Here are the example steps of ANNL2 (a matching sketch follows this list):
      1. An image pair, having some overlapping portion of the scene/object between them, is selected from the image sequence.
      2. All the feature points from one image are inserted into the leaves of a k-d tree. The feature points belonging to the other image of the pair are used as queries against the first image. The complexity of building the k-d tree is O(n log n).
      3. Then, the k-d tree is used for an efficient search of the K nearest neighbors of a feature point X. The threshold distance R is selected according to the image resolution. The time complexity of this search is ~O(log n).
      4. This search across the k-d tree leaves results in correspondence matching of the feature points in the image pair.
      5. The matched points are then checked for accuracy using RANSAC-based estimation of the fundamental matrix. Matched points that do not satisfy the fundamental matrix equation are rejected as mismatches.
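
      A minimal sketch of this matching-and-verification step with OpenCV; kp1/des1 and kp2/des2 are assumed to be the SIFT keypoints and descriptors of the image pair from the detection step, and FLANN's randomized k-d trees stand in for the ANNL2 index:

        import cv2
        import numpy as np

        # FLANN with randomized k-d trees gives approximate nearest neighbors
        flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 4}, {"checks": 64})
        knn = flann.knnMatch(des1, des2, k=2)

        # Lowe's ratio test: keep a match only if it is clearly better
        # than the second-best candidate
        good = [p[0] for p in knn
                if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

        pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
        pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

        # RANSAC fundamental-matrix estimation rejects matches that violate
        # the epipolar constraint x2^T F x1 = 0
        F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 1.0, 0.99)
        inliers = [m for m, keep in zip(good, mask.ravel()) if keep]
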
  3. Sparse point cloud building

    1. All sets of matched points are triangulated to retrieve the 3D locations of the points on the object. Bundle adjustment, a non-linear least-squares optimization technique, is used to add and refine triangulated 3D points in an iterative approach by estimating the camera poses and the relative motion between image frames. All the matched points are iteratively added to the triangulation, and the error propagated by triangulation is minimized as more and more matched points are used to triangulate a 3D point.
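
    A minimal triangulation sketch with OpenCV; P1 and P2 are assumed 3x4 projection matrices K[R|t] from pose estimation, and pts1/pts2 are the matched inlier pixels as 2xN arrays (the transposes of the arrays from the matching sketch above). Bundle adjustment then jointly refines the poses and these 3D points by minimizing the reprojection error computed at the end:

      import cv2
      import numpy as np

      # triangulate matched pixels into homogeneous 3D points (4xN)
      pts4d = cv2.triangulatePoints(P1, P2, pts1, pts2)
      pts3d = (pts4d[:3] / pts4d[3]).T            # Nx3 Euclidean points

      # reprojection error in view 1: the quantity bundle adjustment minimizes
      proj = P1 @ np.vstack([pts3d.T, np.ones(len(pts3d))])
      err = np.linalg.norm(proj[:2] / proj[2] - pts1, axis=0).mean()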

    2. IncrementalSfM: This algorithm is introduced in the paper "Adaptive Structure from Motion with a contrario model estimation". The incremental pipeline is a growing reconstruction process. It starts from an initial two-view reconstruction (the seed) that is iteratively extended by adding new views and 3D points, using pose estimation and triangulation. Due to the incremental nature of the process, successive steps of non-linear refinement, such as Bundle Adjustment (BA) with Levenberg-Marquardt steps, are performed to minimize the accumulated error (drift).

      compute correspondence tracks t
      compute connectivity graph G (1 node per view, 1 edge when enough matches)
      pick an edge e in G with sufficient baseline
      * robustly estimate essential matrix from images of e
      triangulate validated tracks, which provides an initial reconstruction
      contract edge e
      while G contains an edge do
        pick edge e in G that maximizes union(track(e),3D points)
        * robustly estimate pose (external orientation/resection)
        triangulate new tracks
        contract edge e
        perform bundle adjustment
      end while
      

      [Figure: Point Cloud]

    3. GlobalSfM: This algorithm is introduced in the paper "Global Fusion of Relative Motions for Robust, Accurate and Scalable Structure from Motion." Multi-view structure from motion (SfM) estimates the position and orientation of pictures in a common 3D coordinate frame. When views are treated incrementally, this external calibration can be subject to drift, contrary to global methods that distribute residual errors evenly. The paper proposes a new global calibration approach based on the fusion of relative motions between image pairs.

      compute relative pairwise rotations
      detect and remove false relative pairwise rotations
        - using composition error of triplet of relative rotations
      compute the global rotation
        - using dense least squares and approximated rotations
      compute relative translations
        - using triplets of views for stability and collinear motion support
      compute the global translation
        - integration of the relative translation directions using an l-∞ method.
      final structure and motion
        - link tracks validated per triplet and compute the global structure by triangulation
        - refine the estimated parameters in a 2-step Bundle Adjustment
          - refine structure and translations
          - refine structure and camera parameters (rotations, translations)
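
      As a rough sketch of the "compute the global rotation" step, here is a simplified least-squares rotation averaging routine (in the spirit of Martinec and Pajdla's linear method, not the paper's exact solver). rel_rots is an assumed dict {(i, j): R_ij} of verified pairwise relative rotations with R_j ≈ R_ij · R_i:

        import numpy as np

        def average_rotations(n_views, rel_rots):
            # stack the linear constraints R_j - R_ij R_i = 0 into A X = 0,
            # where X holds the unknown global rotations, stacked vertically
            A = np.zeros((3 * len(rel_rots), 3 * n_views))
            for row, ((i, j), R_ij) in enumerate(rel_rots.items()):
                A[3*row:3*row+3, 3*i:3*i+3] = -R_ij
                A[3*row:3*row+3, 3*j:3*j+3] = np.eye(3)
            # the 3 right-singular vectors with the smallest singular values
            # span the stacked solution [R_0; R_1; ...]
            _, _, Vt = np.linalg.svd(A)
            X = Vt[-3:].T                      # (3n x 3) stacked approximations
            rots = []
            for i in range(n_views):
                # project each 3x3 block onto SO(3) via SVD
                U, _, Wt = np.linalg.svd(X[3*i:3*i+3])
                R = U @ Wt
                if np.linalg.det(R) < 0:       # keep a proper rotation
                    R = U @ np.diag([1.0, 1.0, -1.0]) @ Wt
                rots.append(R)
            # fix the gauge so that view 0 has the identity rotation
            return [R @ rots[0].T for R in rots]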
      
  4. Dense correspondence matching

    1. A sparse point cloud generated using SfM contains only the 3D coordinates of the identified feature points. The resulting point cloud is of low density and is inappropriate for reconstruction, as it does not contain the finer details of the object being reconstructed. While the sparse point cloud can be interpolated to build a dense point cloud, several finer details of the object are missed in the process, and the object geometry cannot be retained exactly by interpolation methods.

    2. The goal is a dense 3D point cloud from a large image sequence that accurately describes the geometry of the scene or object being reconstructed.

    3. An exhaustive search over all the pixels in the images is computationally expensive and thus infeasible.

    4. Here are the example steps of dense stereo matching (a plane-sweep sketch follows this list):

      1. A few image frames with minimal camera motion between them are selected from the image sequence. This addresses image redundancy: image frames with large camera motion between them share a smaller overlapping object region, so adding two frames with large camera motion does not yield many corresponding points.
      2. Dense stereo matching is run on these camera frames to obtain a dense 3D point cloud of the overlapping object region. A real-time plane-sweeping algorithm is used on each of the image frames during stereo matching. The basic idea behind this process is epipolar geometry: two corresponding points in a stereo setup obey the epipolar constraint, so the corresponding point in the other image frame must lie along the epipolar line. This constraint reduces the correspondence search area from the 2D plane of the entire image to a one-dimensional epipolar line in the stereo image pair counterpart.
      3. The real-time plane-sweeping method is implemented by sweeping a plane through 3D space, across the image frames, following the camera positions. Light rays from all the pixels of the images are projected through the respective imaging planes.
      4. These rays, when back-projected towards the object/scene being reconstructed, intersect each other at 3D points. These points come from all the points across the images, so the intersection of these rays results in a dense collection of points in 3D space, which represents the object/scene being imaged. The obtained point cloud preserves the object geometry and forms a dense 3D point cloud.
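
      A minimal fronto-parallel plane-sweep sketch with NumPy/OpenCV, assuming grayscale images and a calibrated pair with known intrinsics K and relative pose (R, t); the homography uses the plane convention n·X = d, and all parameter values are illustrative:

        import cv2
        import numpy as np

        def plane_sweep_depth(ref_img, src_img, K, R, t, depths, patch=5):
            ref = ref_img.astype(np.float32)
            src = src_img.astype(np.float32)
            h, w = ref.shape
            n = np.array([[0.0, 0.0, 1.0]])      # fronto-parallel plane normal
            K_inv = np.linalg.inv(K)
            best_cost = np.full((h, w), np.inf, np.float32)
            best_depth = np.zeros((h, w), np.float32)
            for d in depths:
                # homography induced by the plane n.X = d (reference -> source)
                H = K @ (R - t.reshape(3, 1) @ n / d) @ K_inv
                warped = cv2.warpPerspective(
                    src, H, (w, h), flags=cv2.INTER_LINEAR | cv2.WARP_INVERSE_MAP)
                # aggregate absolute differences over a small patch
                cost = cv2.boxFilter(np.abs(ref - warped), -1, (patch, patch))
                # keep, per pixel, the depth with the lowest matching cost
                better = cost < best_cost
                best_cost[better] = cost[better]
                best_depth[better] = d
            return best_depth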

      [Figure: Depthmap]

  5. Generation of 3D watertight surface

    1. After obtaining the dense 3D point cloud, a 3D watertight surface needs to be constructed over the point cloud.

    2. For construction of a surface over a 3D dense point cloud, each point needs 3D coordinates (X, Y, Z) and a surface normal (nx, ny, nz).

      : The surface normal of each point in a point cloud may be calculated from the (X, Y, Z) coordinates of its neighborhood using principal component analysis (PCA); see the sketch after this list.

    3. Additionally, the dense point cloud generated from the image sequence has color information (RGB) attached to each of the points.
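
    As mentioned above, the normals can be obtained with PCA; here is a minimal Open3D sketch (the input file name and the search radius are assumptions to adapt to your data):

      import open3d as o3d

      pcd = o3d.io.read_point_cloud("dense_cloud.ply")  # hypothetical dense cloud

      # PCA on each local neighborhood: the normal is the eigenvector of the
      # covariance matrix with the smallest eigenvalue
      pcd.estimate_normals(
          search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

      # orient the normals consistently so surface reconstruction can use them
      pcd.orient_normals_consistent_tangent_plane(100)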

  6. Mesh generation: There are several algorithms for mesh generation. Here, three methods will be introduced.

    1. Alpha shapes: The alpha shape is a generalization of the convex hull. We eventually end up with a (not necessarily convex) object bounded by caps, arcs, and points.
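
      A minimal Open3D sketch, assuming pcd is the dense point cloud (with normals) from the previous step; the alpha value is an illustrative guess that must be tuned to the cloud's scale:

        import open3d as o3d

        # smaller alpha keeps concavities and fine detail; larger alpha
        # approaches the convex hull (and fills holes)
        alpha = 0.05  # assumed value; tune to the cloud's scale
        mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(pcd, alpha)
        mesh.compute_vertex_normals()
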
    2. Ball pivoting
      1. Intuitively, think of a 3D ball with a given radius that we drop onto the point cloud. If it hits any 3 points (and does not fall through them), it creates a triangle. Then the algorithm pivots from the edges of the existing triangles, and every time the ball hits 3 points it does not fall through, another triangle is created.

      2. The radius is obtained empirically based on the size and scale of the input point cloud. In my experience, the snippet below is a good starting point.

        import numpy as np
        import open3d as o3d

        # pcd must already carry normals (see step 5)
        distances = pcd.compute_nearest_neighbor_distance()
        avg_dist = np.mean(distances)
        radius = 3 * avg_dist
        mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
            pcd, o3d.utility.DoubleVector([radius, radius * 2]))
        
      3. If the points are too far apart at some location, the ball may miss the appropriate point on the surface and instead hit another point on the object. In this case, we check whether the normal of the new triangle facet is consistently oriented with the vertex normals of its points. If it is not, the triangle is rejected and a hole is created.

    3. Poisson surface reconstruction
      1. It solves a regularized optimization problem to obtain a smooth surface.
      2. Poisson surface reconstruction can be preferable to the methods mentioned above, as those produce non-smooth results: the points of the point cloud are used directly, without any modification, as the vertices of the resulting triangle mesh.
      3. The tree depth is the main parameter: the higher it is, the more detailed the mesh. With noisy data, however, outlier vertices are kept in the generated mesh because the algorithm does not detect them as outliers.
      4. Here are the example steps of Poisson surface reconstruction (an Open3D sketch follows these steps):
        1. A 3D indicator function x is defined in 3D space, so that its value is 1 at points inside the surface (the surface to be created) and 0 at points outside it. This indicator function needs to be approximated in order to reconstruct a watertight surface.

        2. The gradient of the indicator function, ∇x, is a vector field. Since x is piecewise constant (1 inside the surface and 0 outside), ∇x is non-zero only near the surface, where the value of x changes.

        3. At the points where ∇x is non-zero, its value equals the surface normal vector at that point. Thus the oriented normal vectors of the point samples can be taken as samples of ∇x (the gradient of the indicator function).

        4. So the problem is reduced to finding the indicator function x whose gradient best fits the vector field V defined by the input point cloud, i.e. minimizing ‖∇x − V‖.

        5. To convert this into a standard Poisson problem, the divergence operator is applied to both sides. Since the divergence of a gradient is the Laplacian, the problem is transformed into finding the function x whose Laplacian equals the divergence of V: Δx = ∇·V, where V represents the surface normals at each point of the point cloud.
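
      A minimal Open3D sketch of the whole procedure, assuming pcd carries the normals from step 5; depth=9 and the 5% density cut-off are illustrative starting values:

        import numpy as np
        import open3d as o3d

        # higher depth -> finer octree -> more detailed (and noisier) mesh
        mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
            pcd, depth=9)

        # low-density vertices are poorly supported by the input points;
        # dropping them trims the outlier blobs mentioned above
        d = np.asarray(densities)
        mesh.remove_vertices_by_mask(d < np.quantile(d, 0.05))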

  7. Texturing: wrapping a 2D image around a 3D object and defining how light affects it.

    1. Unwrapping: To start the 3D texturing process, you need to unwrap the model first, which basically means unfolding the 3D mesh into a flat 2D layout.
    2. Texture painting and shading: Correct display of an object's overall look and its interaction with light is a key step towards its believability and appeal.
    3. Texture mapping: Texture mapping is a method for defining high-frequency detail, surface texture, or color information on a computer-generated graphic or 3D model.


References:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4929111/pdf/40064_2016_Article_2425.pdf

https://towardsdatascience.com/5-step-guide-to-generate-3d-meshes-from-point-clouds-with-python-36bad397d8ba

https://arxiv.org/pdf/1710.02726.pdf

https://openmvg.readthedocs.io/en/latest/

https://dreamfarmstudios.com/blog/getting-to-know-3d-texturing-in-animation-production/