Everything is Connected

Yueqi Cao's Blog

How I Discovered Exotic Spheres
Translated from John Milnor's short article Classification of \((n-1)\)-connected \(2n\)-dimensional manifolds and the discovery of exotic spheres. During the 1950s, when I was at Princeton, I was very interested in understanding the basic problems of the topology of high-dimensional manifolds. I paid particular attention to \((n-1)\)-connected \(2n\)-dimensional manifolds, since they looked like the simplest examples in which one could hope to make progress. (Of course, manifolds with the same homotopy type as a sphere are even simpler; however, the generalized Poincaré problem posed for understanding them seemed far too hard, and I had no idea where to start.) For a closed \(2n\)-dimensional manifold \(M^{2n}\)...
The Space of Persistence Diagrams

Notation. In the following, \(\mathbb R^2\) is equipped with the \(\infty\)-norm. Let \(\Delta=\{(x,x)|x\in\mathbb R\}\) denote the diagonal, let \(\Delta_+=\{(x,y)|y>x\}\) denote the set of upper off-diagonal points, and let \(\overline{\Delta}_+=\Delta\cup\Delta_+\) be the closure of \(\Delta_+\). Set \(\overline{\mathbb N}=\{1,2,\cdots\}\cup\{\infty\}\).


Definition. A persistence diagram \(\mathcal T\) is a pair \((S_{\mathcal T},i_{\mathcal T})\) where \(S_{\mathcal T}\) is a subset of \(\mathbb R^2\) and \(i_{\mathcal T}:S_{\mathcal T}\to \overline{\mathbb N}\) is a function such that

  1. \(\Delta\subset S_{\mathcal T}\subset \overline{\Delta}_+\);
  2. \(i_{\mathcal T}(\Delta)=\{\infty\}\).

\(S_{\mathcal T}\) is called the support of \(\mathcal T\) while \(i_{\mathcal T}\) is called the multiplicity function of \(\mathcal T\).

Two persistence diagrams are said to be equal if they have the same support and multiplicity function. The graph of \(i_{\mathcal T}\), defined by \(G(i_{\mathcal T})=\{^k{\bf x}:=({\bf x},k)\in S_{\mathcal T}\times \overline{\mathbb N}|k\le i_{\mathcal T}({\bf x})\}\), is called the bundle space over \(\mathcal T\). The projection \(\pi_{\mathcal T}:G(i_{\mathcal T})\to S_{\mathcal T}\) to the first factor defines a map from the bundle space to the support with fiber \(\pi_{\mathcal T}^{-1}({\bf x})=\{1,2,\cdots,i_{\mathcal T}({\bf x})\}\) for each point \({\bf x}\in S_{\mathcal T}\). It is an easy consequence that two persistence diagrams are equal if and only if they have the same bundle spaces.
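Here is a minimal Python sketch of this definition (the class name and the restriction to finitely many explicitly stored off-diagonal points are my own simplifications): the diagonal is kept implicit with multiplicity \(\infty\), and two diagrams compare equal exactly when their stored supports and multiplicities, hence their bundle spaces, agree.

```python
from math import inf

class PersistenceDiagram:
    """Sketch of a persistence diagram (S_T, i_T): the diagonal is implicit
    (multiplicity inf) and only off-diagonal points are stored explicitly,
    each with a multiplicity in {1, 2, ...} or inf."""

    def __init__(self, points):
        # points: dict mapping (birth, death) with death > birth to a multiplicity
        assert all(d > b for (b, d) in points), "off-diagonal points need death > birth"
        self.points = dict(points)

    def multiplicity(self, b, d):
        """The multiplicity function i_T, extended by inf on the diagonal
        and (for convenience) by 0 outside the support."""
        if b == d:
            return inf
        return self.points.get((b, d), 0)

    def bundle_space(self):
        """The off-diagonal part of the bundle space G(i_T): pairs ((b, d), k)
        with 1 <= k <= i_T(b, d); infinite multiplicities are kept symbolic."""
        for (b, d), m in self.points.items():
            if m == inf:
                yield ((b, d), inf)
            else:
                for k in range(1, m + 1):
                    yield ((b, d), k)

    def __eq__(self, other):
        # same support and multiplicity function, i.e. same bundle space
        return self.points == other.points


# Example: the point (0, 1) with multiplicity 2 and the point (2, 5) with multiplicity 1
D = PersistenceDiagram({(0.0, 1.0): 2, (2.0, 5.0): 1})
print(D.multiplicity(0.0, 1.0), D.multiplicity(3.0, 3.0))  # 2 inf
```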

Some Applications of Wasserstein Distances

OMT I

The original setting for optimal mass transportation (OMT) consists of three objects.

  1. Two probability spaces \((X,\mu)\) and \((Y,\nu)\), i.e. \(\mu(X)=\nu(Y)=1\);
  2. A measurable map \(T:X\to Y\) such that \(T_*\mu=\nu\), i.e. for any Borel set \(B\subset Y\) it holds that \(\nu(B)=\mu(T^{-1}(B))\);
  3. A cost function \(c:X\times Y\to \mathbb R_+\cup\{\infty\}\) such that for any \(T\) as in 2. the map \(c_T(x)=c(x,T(x))\) is measurable. The integral \(\int_X c(x,T(x))d\mu\) is called the total cost (see the sketch below).
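Here is a small discrete sketch of these three objects (the specific points, weights, map, and quadratic cost are my own choices, not part of the general setting): \(X\) and \(Y\) are finite, \(T\) is encoded as an index array, and both the pushforward condition and the total cost are checked directly.

```python
import numpy as np

# X and Y are finite subsets of R^1 with probability weights mu and nu.
X = np.array([[0.0], [1.0], [2.0]])
Y = np.array([[0.5], [2.5]])
mu = np.array([0.25, 0.25, 0.5])   # mu(X) = 1
nu = np.array([0.5, 0.5])          # nu(Y) = 1

# A transport map T: X -> Y, encoded as an index into Y for each point of X.
T = np.array([0, 0, 1])

# Pushforward condition T_* mu = nu: nu(B) = mu(T^{-1}(B)) for every B,
# which for finite spaces reduces to checking the singletons {y_j}.
pushforward = np.array([mu[T == j].sum() for j in range(len(Y))])
assert np.allclose(pushforward, nu)

# Cost c(x, y) = |x - y|^2 and the total cost of T.
cost = lambda x, y: np.sum((x - y) ** 2)
total_cost = sum(mu[i] * cost(X[i], Y[T[i]]) for i in range(len(X)))
print(total_cost)  # 0.25
```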
Stability Theorems in Topological Data Analysis (3)

Perhaps the most important case in TDA is point cloud data, i.e. finite metric spaces. The collection of all compact metric spaces forms a metric space under the Gromov-Hausdorff metric. If we construct Rips filtrations for point clouds, we obtain a map between metric spaces by sending each point cloud to its persistence diagram. It is natural to ask what properties this map has. By the work of F. Memoli and others, it is in fact a Lipschitz map. Therefore, the bottleneck distance between two persistence diagrams is bounded by the Gromov-Hausdorff distance between the two point clouds. This stability shows that one can use TDA to classify different point clouds, which is a major objective in shape analysis.

Let \((Z,d_Z)\) be a (compact) metric space and \(X,Y\) be subsets of \(Z\). The Hausdorff distance between \(X\) and \(Y\) is defined as \[ d_H^Z(X,Y)=\max\{\max_{x\in X}\min_{y\in Y}d_Z(x,y),\max_{y\in Y}\min_{x\in X}d_Z(x,y)\}. \] The intuition is as follows: for each \(x\in X\), we can compute the distance from \(x\) to \(Y\), which is \(\min_{y\in Y}d_Z(x,y)\). Then we take the largest such value as the distance from \(X\) to \(Y\). This is the so-called one-sided Hausdorff distance. Symmetrically, we compute the distance from \(Y\) to \(X\). The maximum of the two is the so-called (two-sided) Hausdorff distance.
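For finite point sets this definition can be evaluated directly; here is a small NumPy sketch (the Euclidean ambient distance and the function names are my own choices).

```python
import numpy as np

def one_sided_hausdorff(X, Y):
    """max over x in X of the distance from x to Y (Euclidean ambient metric)."""
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)  # pairwise distances
    return D.min(axis=1).max()

def hausdorff(X, Y):
    """Two-sided Hausdorff distance between finite point sets X and Y."""
    return max(one_sided_hausdorff(X, Y), one_sided_hausdorff(Y, X))

X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = np.array([[0.0, 0.1], [3.0, 0.0]])
print(hausdorff(X, Y))  # 2.0: the point (3, 0) of Y is far from X
```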

Stability Theorems in Topological Data Analysis (2)

One observes that when proving the \(L^{\infty}\) stability theorem, the functions \(f\) and \(g\) provide nothing but an interleaved inclusion of sublevel sets \(f^{-1}((-\infty,a])\subseteq g^{-1}((-\infty,a+\epsilon])\subseteq f^{-1}((-\infty,a+2\epsilon])\). This further gives the following commutative diagram.
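A sketch of the interleaving triangle this refers to (the notation is my own: \(F_a=H_k(f^{-1}((-\infty,a]))\) and \(G_a=H_k(g^{-1}((-\infty,a]))\); every arrow is induced by an inclusion, so the triangle commutes): \[ \begin{array}{ccccc} F_a & & \longrightarrow & & F_{a+2\epsilon}\\ & \searrow & & \nearrow & \\ & & G_{a+\epsilon} & & \end{array} \]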

Curvature of Warped Products

Curvature of Coupled Planes

Consider the product of two planes \(\mathbb R^2\times\mathbb R^2\) with the following Riemannian metric \[ ds^2=dx^2+dy^2+e^{2f(x,y)}(du^2+dv^2), \] where \(f(x,y)\) is a smooth function. We compute the sectional/Ricci/scalar curvature of the 'coupled' planes using orthonormal frames. For the basics of moving frames, we refer to Loring W. Tu.
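As a cross-check of the frame computation, here is a coordinate-based sympy sketch (my own, not the orthonormal-frame method used in the post) that computes the Christoffel symbols, Ricci tensor, and scalar curvature of this metric for a symbolic \(f(x,y)\).

```python
import sympy as sp

# Coordinates and the warping function of ds^2 = dx^2 + dy^2 + e^{2f}(du^2 + dv^2).
x, y, u, v = sp.symbols('x y u v')
coords = [x, y, u, v]
f = sp.Function('f')(x, y)

g = sp.diag(1, 1, sp.exp(2 * f), sp.exp(2 * f))
ginv = g.inv()
n = len(coords)

# Christoffel symbols Gamma[k][i][j] = Gamma^k_{ij}.
Gamma = [[[sp.simplify(sum(ginv[k, l] * (sp.diff(g[l, i], coords[j])
                                         + sp.diff(g[l, j], coords[i])
                                         - sp.diff(g[i, j], coords[l])) / 2
                           for l in range(n)))
           for j in range(n)] for i in range(n)] for k in range(n)]

# Riemann tensor R^l_{ikj} and Ricci tensor R_{ij} = R^k_{ikj}.
def riemann(l, i, k, j):
    return (sp.diff(Gamma[l][i][j], coords[k]) - sp.diff(Gamma[l][i][k], coords[j])
            + sum(Gamma[l][k][m] * Gamma[m][i][j]
                  - Gamma[l][j][m] * Gamma[m][i][k] for m in range(n)))

Ricci = sp.Matrix(n, n, lambda i, j: sp.simplify(sum(riemann(k, i, k, j) for k in range(n))))
scalar = sp.simplify(sum(ginv[i, j] * Ricci[i, j] for i in range(n) for j in range(n)))
print(scalar)  # scalar curvature in terms of f and its derivatives
```

One can substitute a concrete \(f\) afterwards to obtain explicit curvature values.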

Remarks on 'Submanifold density estimation'

There seems to be a gap in 'Submanifold density estimation'.

In the deduction of variance, one needs to estimate

\[ \frac{1}{h_m^n}\int_M f(q)\frac{1}{h_m^n}K^2(\frac{u_p(q)}{h_m})dV(q) \]

Applying Theorem 3.1 to the function \(K^2\), one obtains \[ \frac{1}{h^n}\int_M K^2(\frac{u_p(q)}{h})\xi(q)dV(q)\to\xi(p)\int_{\mathbb{R}^n}K^2(\|z\|)d^nz,\quad h\to 0. \] It seems that the factor \(h_m^{2n}\) will cause trouble in the estimate. Nevertheless, they claimed that the integral converges to \(f(p)\int K^2(\|z\|)d^nz\). However, in the expression of the variance

\[ Var[\frac{1}{h_m^n}K(\frac{u_p(q)}{h_m})]=E[\frac{1}{h_m^{2n}}K^2(\frac{u_p(q)}{h_m})]-(E[\frac{1}{h_m^n}K(\frac{u_p(q)}{h_m})])^2 \]

the first term will converge to a constant times \(f(p)\), while the latter will converge to \(f(p)^2\). Thus the variance will not converge to 0.
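To see this scaling issue numerically, here is a quick 1-dimensional Euclidean toy check (my own choice of kernel and density, not taken from the paper): the mean of a single kernel term \(\frac{1}{h^n}K(u_p/h)\) stabilizes near \(f(p)\), while its variance blows up like \(h^{-n}\).

```python
import numpy as np

rng = np.random.default_rng(0)

# 1-d toy case (n = 1): density f = standard normal, evaluation point p = 0,
# kernel K supported on [0, 1], normalized so that the integral of K(|z|) over R is 1.
K = lambda t: np.where(t <= 1, 0.75 * (1 - t**2), 0.0)
p = 0.0
X = rng.standard_normal(2_000_000)

for h in [0.5, 0.1, 0.02]:
    W = K(np.abs(X - p) / h) / h  # the random variable (1/h^n) K(u_p/h)
    print(f"h = {h:>4}:  E[W] ~ {W.mean():.3f}   Var[W] ~ {W.var():.2f}   h*Var[W] ~ {h * W.var():.3f}")
```

The column \(h\cdot Var[W]\) stays bounded, so \(Var[W]\) itself grows like \(1/h\) rather than converging to \(0\).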

The idea of the proof is straightforward. Since \(K\) is supported in \([0,1]\), one notes that when the bandwidth \(h\) is small, the integrand vanishes outside the normal coordinate neighborhood. Therefore, everything reduces to the Euclidean case.

Stability Theorems in Topological Data Analysis (1)

Personal Comments

In this series I attempt to collect results concerning stability in topological data analysis. Research in this direction has been increasing dramatically in recent years, and without doubt there will be more in the future. However, the best results will not appear until one really understands all of this progress.


The first stability theorem in TDA was proved by David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. It asserts that the bottleneck distance between the persistence diagrams of two tame functions is bounded by the \(\infty\)-norm of their difference. They soon generalized this result to the \(L_p\) case for tame Lipschitz functions (see here). This theorem is fundamental in TDA, and the idea is also direct -- counting points in persistence diagrams. We sketch the procedure in the following.

MSE of Locally Linear Regression

Backgrounds

Locally linear regression (also called locally weighted least squares) is an important method in nonparametric regression, and the idea behind it is intuitive. Suppose one has a general model \[ y=g(x)+\epsilon(x), \] where \(g\) is any smooth function and \(\epsilon(x)\) is a random variable with zero mean and finite variance. Let \(X_1,X_2,\cdots,X_n\) be i.i.d. samples with bounded density \(f\), and let \(Y_1,Y_2,\cdots, Y_n\) be the corresponding responses (assume the \(X_i\)'s are scalars). By Taylor expansion, we have \[ Y_i-g(x)\approx \beta(X_i-x)+\epsilon(x). \]
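As a quick numerical illustration (the Gaussian kernel weights, the test function \(g(x)=\sin 2x\), and all names below are my own choices), the local fit at a point \(x_0\) solves the weighted least-squares problem \(\min_{a,\beta}\sum_i w_i(x_0)(Y_i-a-\beta(X_i-x_0))^2\) and returns the intercept \(a\) as the estimate of \(g(x_0)\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from y = g(x) + eps with g(x) = sin(2x).
n = 400
X = rng.uniform(-2, 2, n)
Y = np.sin(2 * X) + 0.3 * rng.standard_normal(n)

def local_linear(x0, X, Y, h):
    """Weighted least squares of Y on [1, X - x0] with kernel weights;
    the intercept estimates g(x0), the slope estimates g'(x0)."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)          # Gaussian kernel weights
    D = np.column_stack([np.ones_like(X), X - x0])  # local design matrix
    W = np.diag(w)
    a, beta = np.linalg.solve(D.T @ W @ D, D.T @ W @ Y)
    return a

print(local_linear(0.5, X, Y, h=0.2), np.sin(2 * 0.5))  # estimate vs. true g(0.5)
```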

Concentration in Gauss Space

This article aims to solve an exercise in High-Dimensional Probability (Page 112, Exercise 5.2.3).

Gaussian Isoperimetric Inequality

Equip \(\mathbb R^n\) with the Gaussian measure \(\gamma_n\): \[ \gamma_n(E)=\frac{1}{(2\pi)^{n/2}}\int_E\exp\{-\frac{\|x\|^2}{2}\}{\rm d}x \] where \(E\subseteq \mathbb R^n\) is a Borel set. Then \((\mathbb R^n,\gamma_n)\) is called the Gauss space. The Gaussian isoperimetric inequality states that, among all measurable sets with a given volume, half-spaces have the smallest perimeter. Moreover, define the Minkowski sum \[ A+B=\{a+b\in\mathbb R^n: a\in A,b\in B\}. \] We have the following statement.
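As a quick sanity check of the half-space extremality (the dimension \(n=3\), the value \(\epsilon=0.5\), and the choice of a centered ball as the competitor are my own), one can compare in closed form the Gaussian measure of the \(\epsilon\)-enlargement \(A+\epsilon B_2^n\) of a half-space of measure \(1/2\) with that of a ball of the same measure; the half-space has the smaller enlargement.

```python
import numpy as np
from scipy.stats import norm, chi2

n, eps = 3, 0.5

# Half-space A = {x_1 <= 0}: gamma_n(A) = 1/2 and the eps-enlargement is {x_1 <= eps}.
half_space_enlarged = norm.cdf(eps)

# Centered ball with gamma_n(ball) = 1/2, i.e. P(chi2_n <= r^2) = 1/2;
# its eps-enlargement (Minkowski sum with the eps-ball) is the ball of radius r + eps.
r = np.sqrt(chi2.ppf(0.5, df=n))
ball_enlarged = chi2.cdf((r + eps) ** 2, df=n)

# Isoperimetry: among sets of measure 1/2, the half-space has the smallest enlargement.
print(half_space_enlarged, ball_enlarged)  # ~0.69 vs ~0.75
```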
