The Second Fundamental Form
Let \((\tilde{M},\tilde{g})\) be a Riemannian manifold and \(M\subset \tilde{M}\) a submanifold. By restricting \(\tilde{g}\) to \(M\), the submanifold inherits a natural Riemannian metric. At each point \(p\) we have the orthogonal decomposition with respect to \(\tilde{g}_p\), \[ T_p\tilde{M}=T_pM\oplus T_p^\perp M \] where \(T_p^\perp M\), called the normal space at \(p\), denotes the orthogonal complement of \(T_pM\) in \(T_p\tilde{M}\). Let \(T^\perp M=\bigcup_{p\in M}T_p^\perp M\). One can verify that \(T^\perp M\) is a vector bundle over \(M\), called the normal bundle. The idea behind studying Riemannian submanifolds is straightforward: we differentiate vector fields along \(M\) and look at their tangential and normal components respectively.
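This splitting can be made explicit. Writing \(\tilde{\nabla}\) for the Levi-Civita connection of \(\tilde{g}\), decomposing \(\tilde{\nabla}_X Y\) into tangential and normal parts gives the classical Gauss formula (stated here as a sketch, without proof):

```latex
% Gauss formula: for vector fields X, Y tangent to M,
%   the tangential part recovers the Levi-Civita connection of the
%   induced metric, and the normal part defines the second fundamental form.
\[
  \tilde{\nabla}_X Y
  \;=\; \underbrace{(\tilde{\nabla}_X Y)^{\top}}_{=\,\nabla_X Y}
  \;+\; \underbrace{(\tilde{\nabla}_X Y)^{\perp}}_{=\,\mathrm{II}(X,Y)},
\]
% II is symmetric and C^\infty(M)-bilinear, taking values in T^\perp M.
```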
Random walks arise in many fields of mathematics and physics, and recently in network analysis, graph learning, clustering, etc. They also have a natural relation with spectral graph theory. The survey by L. Lovász presents many beautiful results concerning random walks and graph spectra. We aim to write down some computational details omitted in the original succinct paper.
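To illustrate the link with graph spectra, a minimal sketch (the graph below is made up for this example): the simple random walk on a graph has transition matrix \(P=D^{-1}A\), a row-stochastic matrix whose eigenvalues govern the walk's mixing behavior.

```python
import numpy as np

# Adjacency matrix of a small undirected graph (a 4-cycle), chosen for illustration.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

# Transition matrix of the simple random walk: P = D^{-1} A.
D_inv = np.diag(1.0 / A.sum(axis=1))
P = D_inv @ A

# Each row of P sums to 1 (P is row-stochastic).
print(P.sum(axis=1))                       # → [1. 1. 1. 1.]

# Eigenvalues of P lie in [-1, 1]; the 4-cycle is bipartite, so -1 appears.
print(np.sort(np.linalg.eigvals(P).real))  # → [-1.  0.  0.  1.]
```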
Informally, the problem of graph partitioning, or graph clustering, can be stated as follows: we want to partition a graph into several parts so that there are few edges between groups but many edges within groups. If there is a measure on the edges, i.e. the edges are weighted, this can also be stated as: we want a partition such that nodes within the same group are more similar. Graph partitioning appears in many areas such as statistical learning, network analysis, and graph learning. Beyond data science, it is also related to graph theory and computer vision. Recall the classical minimum cut problem in graph theory, which can be solved in polynomial time by the Stoer-Wagner algorithm. Graph partitioning can be regarded as a strengthened min-cut problem: we want not only a minimal cut but also an 'equitable' partition.
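To make the objective concrete, a minimal sketch (the graph and partitions are invented for this example) computing the cut value \(\mathrm{cut}(S,\bar S)=\sum_{i\in S,\,j\notin S} w_{ij}\) of a two-way partition from a weighted adjacency matrix:

```python
import numpy as np

def cut_value(W, S):
    """Total weight of edges crossing the partition (S, complement of S)."""
    S = np.asarray(S, dtype=bool)
    return W[np.ix_(S, ~S)].sum()

# Weighted adjacency matrix of a toy 4-node graph: two tight pairs {0,1} and
# {2,3}, joined by one light edge.
W = np.array([[0.0, 3.0, 0.0, 0.0],
              [3.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 3.0],
              [0.0, 0.0, 3.0, 0.0]])

# The 'good' partition cuts only the light edge...
print(cut_value(W, [True, True, False, False]))   # → 0.5
# ...while a split through a tight pair cuts much more.
print(cut_value(W, [True, False, False, False]))  # → 3.0
```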
Spectral clustering is one of the most popular approaches to graph partitioning. It has a long history dating back to the 1970s, and it was popularized in data science as a machine learning method. Results obtained by spectral clustering often outperform those of traditional methods. Moreover, it is very simple to implement, and the mathematics is nothing but standard linear algebra. The comprehensive tutorial by U. von Luxburg presents many different ways to understand spectral clustering, including random walks, perturbation theory, and so on. Here we cover the basics from the simplest viewpoint: linear algebra.
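A minimal sketch of the linear-algebra core (the graph is made up for this example; a full treatment would run k-means on several eigenvectors): form the unnormalized Laplacian \(L=D-W\) and split the vertices by the sign of its second eigenvector, the Fiedler vector.

```python
import numpy as np

# Toy weighted graph: two tight pairs {0,1} and {2,3} joined by a light edge.
W = np.array([[0.0, 3.0, 0.0, 0.0],
              [3.0, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 3.0],
              [0.0, 0.0, 3.0, 0.0]])

# Unnormalized graph Laplacian L = D - W.
L = np.diag(W.sum(axis=1)) - W

# Eigenpairs of the symmetric matrix L, sorted by eigenvalue.
eigvals, eigvecs = np.linalg.eigh(L)

# The second eigenvector (Fiedler vector) encodes the bipartition:
# its sign pattern separates the two clusters.
fiedler = eigvecs[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)   # groups {0,1} and {2,3}, e.g. [0 0 1 1] or [1 1 0 0]
```

The sign of an eigenvector is arbitrary, so only the grouping is meaningful, not which label each group receives.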
Bochner-Weitzenböck Formula on Riemannian Manifolds
Let \((M,g)\) be an \(n\)-dimensional Riemannian manifold, and \(\nabla\) be the Levi-Civita connection. Recall that for a smooth function (or 0-form) \(f\in C^\infty(M)=\mathcal A^0(M)\), the Laplacian is defined as \[ \Delta_0(f)=\text{tr}\nabla^2(f)=\text{div}(\text{grad}(f)) \] The definition \(\Delta_0=\text{tr}\nabla^2\) generalizes to arbitrary \(p\)-forms and is called the connection Laplacian. Suppose in addition that \(M\) is oriented. Let \(d:\mathcal A^p(M)\to\mathcal A^{p+1}(M)\) be the exterior differential. We can define the adjoint operator \(d^*\) of \(d\) using the Hodge star operator: \(d^*:\mathcal A^{p+1}(M)\to\mathcal A^p(M)\) is the operator satisfying \(\langle d\alpha,\beta\rangle=\langle\alpha,d^*\beta\rangle=\int_M d\alpha\wedge *\beta\) for \(\alpha\in\mathcal A^p(M)\) and \(\beta\in\mathcal A^{p+1}(M)\). The Hodge Laplacian is defined by \[ \Delta=dd^*+d^*d \] A direct computation shows that \(\Delta=-\Delta_0\) on \(C^\infty(M)=\mathcal A^0(M)\). In higher degrees, however, the relation between the two Laplacians is much more complicated.
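The discrepancy between the two Laplacians is measured by curvature. On 1-forms, for instance, the classical Bochner-Weitzenböck formula reads (stated here without proof, with the sign conventions above):

```latex
% Bochner-Weitzenböck formula on 1-forms: the Hodge Laplacian differs
% from the connection Laplacian by a Ricci curvature term.
\[
  \Delta \alpha \;=\; -\Delta_0 \alpha + \mathrm{Ric}(\alpha),
  \qquad \alpha \in \mathcal{A}^1(M),
\]
% where Ric acts on 1-forms via the metric identification with vector fields.
% For general p-forms the curvature term is the more involved Weitzenböck
% remainder, built from the full curvature tensor.
```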
Convex Functions
Recall that a function \(f:I\to \mathbb R\) is called convex if \[ f(tx+(1-t)y)\le tf(x)+(1-t)f(y) \] for any \(t\in[0,1]\) and \(x,y\in I\), where \(I\) is an interval in \(\mathbb R\). In some contexts, \(f\) is called \(p\)-convex if \(f^p\) is convex in the above sense. \(f\) is called quasi-convex if \[ f(t)\le\max\{f(a),f(b)\} \] for any \(a,b\in I\) and \(t\in [a,b]\). Note that if \(f\) is convex then \(f\) is necessarily quasi-convex. Quasi-convex functions are also called peakless in Busemann's articles.
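The converse fails. A quick numerical sketch with the standard counterexample \(f(x)=\sqrt{x}\) on \([0,4]\): being monotone it is quasi-convex there, yet the convexity inequality fails at a midpoint.

```python
import math

def f(x):
    # sqrt is monotone on [0, 4], hence quasi-convex there, but concave.
    return math.sqrt(x)

x, y, t = 0.0, 4.0, 0.5
mid = t * x + (1 - t) * y

# Quasi-convexity holds: f(mid) <= max(f(x), f(y)).
print(f(mid) <= max(f(x), f(y)))             # → True

# Convexity fails: f(mid) = sqrt(2) > 1 = t f(x) + (1 - t) f(y).
print(f(mid) <= t * f(x) + (1 - t) * f(y))   # → False
```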
If \(f\) is differentiable, then we have a simple rule to determine its convexity: \(f\) is convex if and only if \(f'\) is non-decreasing. If \(f\) is twice differentiable, then \(f\) is convex if and only if \(f''\) is non-negative. From these rules we can see that convexity is a local property for differentiable functions. That is, say that \(f\) is locally convex if for any \(x\) there is a neighborhood on which \(f\) is convex. If \(f\) is differentiable and locally convex, then by the above discussion local convexity implies global convexity.
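The second-derivative criterion suggests a simple numerical check (a sketch only: finite second differences on a grid approximate \(f''\), so this gives evidence, not a proof):

```python
import numpy as np

def looks_convex(f, a, b, n=1000):
    """Heuristic convexity check: nonnegative second differences on a grid."""
    x = np.linspace(a, b, n)
    y = f(x)
    # Discrete analogue of f'' >= 0: y[i-1] - 2 y[i] + y[i+1] >= 0.
    second_diff = y[:-2] - 2 * y[1:-1] + y[2:]
    return bool(np.all(second_diff >= -1e-12))

print(looks_convex(np.exp, -2.0, 2.0))   # → True  (exp'' = exp > 0)
print(looks_convex(np.sin, 0.0, 3.0))    # → False (sin is concave on (0, pi))
```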
Let \(\mathbb X\) be a triangulable, compact metric space, and \(f,g:\mathbb X\to\mathbb R\) be tame functions (that is, \(f\) and \(g\) have finitely many homological critical values). The sublevel sets of \(f\) (resp. \(g\)) form a filtration of spaces. Using field coefficients, this gives a finite sequence of homology groups (vector spaces) in each dimension. A homology class is born if it is not in the image of a preceding homology group, and is said to die if it merges with other classes when entering succeeding homology groups. With the notions of birth and death, in each dimension one can associate to \(f\) (resp. \(g\)) a multiset in the plane called the persistence diagram, denoted by \(D_k(f)\) (resp. \(D_k(g)\)) for \(k=0,1,\cdots\). The bottleneck distance between \(D_k(f)\) and \(D_k(g)\) is defined as \[ d_B(D_k(f),D_k(g)):=\inf_{\gamma \text{ bijection}}\sup_{x\in D_k(f)}||x-\gamma(x)||_{\infty} \] It was proved that persistence diagrams are robust to small perturbations of the functions. More specifically, David Cohen-Steiner, Herbert Edelsbrunner, and John Harer (2007) proved that for each \(k\) one has \[ d_B(D_k(f),D_k(g))\le||f-g||_\infty \] In analogy with the Wasserstein distance, they noted that the bottleneck distance is a special case of a more general form. Let \(D(f)\) be the union of all \(D_k(f)\). Define the degree-\(p\) Wasserstein distance between \(D(f)\) and \(D(g)\) by \[ W_p(D(f),D(g))=\left[\inf_{\gamma \text{ bijection}}\sum_{x\in D(f)}||x-\gamma(x)||_\infty^p\right]^{\frac{1}{p}} \] To prove a stability result for the Wasserstein distance, additional conditions are required on the space \(\mathbb X\) and the functions \(f\) and \(g\). We shall explain the details of the work of David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko (2010).
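For intuition, a brute-force sketch of the bottleneck distance on two tiny invented diagrams (simplified: real persistence diagrams also allow matching points to the diagonal, which this toy version omits):

```python
import itertools
import numpy as np

def bottleneck_simplified(D1, D2):
    """Brute-force bottleneck distance between two equal-size diagrams.

    Simplified sketch: only bijections between off-diagonal points are
    searched; the full definition also permits matches to the diagonal.
    """
    D1, D2 = np.asarray(D1, float), np.asarray(D2, float)
    best = np.inf
    for perm in itertools.permutations(range(len(D2))):
        # sup over matched pairs of the l-infinity distance
        cost = max(np.abs(D1[i] - D2[j]).max() for i, j in enumerate(perm))
        best = min(best, cost)
    return best

# Two invented diagrams: points are (birth, death) pairs.
D1 = [(1.0, 4.0), (2.0, 6.0)]
D2 = [(1.1, 4.2), (2.0, 5.5)]
print(bottleneck_simplified(D1, D2))   # → 0.5  (from |6.0 - 5.5|)
```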