Topology and Geometry from Data
XIMENA FERNANDEZ
University of Oxford
Astrophysics Seminar
Nagoya University - 22 August 2024
Topology vs Geometry
Geometry = fine details + quantitative answers
Topology = fundamental properties + qualitative answers
Topological inference
Let $X$ be a topological space and let $\mathbb{X}_n = \{x_1,\dots,x_n\}$ be a finite sample of $X$.
Q: How to infer topological properties of $X$ from $\mathbb{X}_n$?
Point cloud $\mathbb{X}_n \subset \mathbb{R}^D$
For $\epsilon>0$, the $\epsilon$-thickening of $\mathbb{X}_n$:
$$U_\epsilon = \bigcup_{x\in \mathbb{X}_n}B_{\epsilon}(x)$$
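As a small illustration of the definition above, a point lies in the thickening exactly when it is within $\epsilon$ of some sample point. The sketch below assumes open Euclidean balls; the function name `in_thickening` and the toy sample are hypothetical and only serve the example.

```python
import math

def in_thickening(y, sample, eps):
    """Return True if y lies in U_eps, the union of open balls B_eps(x), x in sample."""
    return any(math.dist(y, x) < eps for x in sample)

# Toy sample (an assumption for illustration): four points on the unit circle.
sample = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

print(in_thickening((0.9, 0.1), sample, 0.2))  # near (1, 0), so inside U_0.2
print(in_thickening((0.0, 0.0), sample, 0.5))  # the centre is at distance 1 from every sample point
```

Increasing `eps` until the centre is covered is the simplest picture of how the topology of the thickening changes with scale.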
Evolving thickenings
Filtration of simplicial complexes
Persistence diagram
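To make the pipeline from point cloud to persistence diagram concrete in the simplest case, here is a minimal sketch of degree-0 persistence (connected components) using a union-find structure. It assumes the Vietoris-Rips convention, in which an edge appears at scale equal to its length (in terms of the thickening radius $\epsilon$, two balls meet at half that length). The function name `h0_persistence` and the toy cloud are hypothetical; higher-degree homology needs a dedicated library.

```python
import math
from itertools import combinations

def h0_persistence(points):
    """Degree-0 persistence pairs (birth, death) of the Vietoris-Rips filtration.

    Every component is born at scale 0 and dies at the length of the edge
    that merges it into another component. One component never dies.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the root of i, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process edges by increasing length, as the filtration does.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # this edge merges two components
            parent[ri] = rj
            deaths.append(length)
    return [(0.0, d) for d in deaths] + [(0.0, math.inf)]

# Toy cloud (an assumption for illustration): two well-separated clusters.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
diagram = h0_persistence(cloud)
print(diagram)
```

The two clusters show up as one short-lived group of points near the diagonal, one long-lived finite pair, and one infinite pair.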
Applications
Gardner et al. 'Toroidal topology of population activity in grid cells'. Nature. (2022)
Reise, Fernandez, Dominguez, Harrington, Beguerisse-Diaz. 'Topological fingerprints for audio identification'. SIAM Journal of Data Science (2024)
Fernandez X., Mateos D. 'Topological biomarkers for real-time detection of epileptic seizures'. Preprint (2024)
Persistent homology
Metric space: $(\mathbb X_n, d_E)\sim (\mathcal M, d_E)$
Metric space: $(\mathbb X_n, d_{kNN})\sim (\mathcal M, d_\mathcal{M})$
(Bernstein, De Silva, Langford & Tenenbaum, 2000)
Density-based geometry
Let $\mathbb{X}_n = \{x_1,\dots,x_n\}\subseteq \mathbb{R}^D$ be a finite sample.
Assume that:
$\mathbb{X}_n$ is a sample of a compact manifold $\mathcal M$ of dimension $d$.
The points are sampled according to a density $f\colon \mathcal M\to \mathbb R$.
Fermat principle
The path taken by a ray between two given points is the path that can be traversed in the least time.
That is, it is an extremum of the functional
$$\gamma\mapsto \int_{0}^1\eta(\gamma_t)\,\|\dot{\gamma}_t\|\, dt$$
where $\eta$ is the refractive index.
Density-based geometry
(Hwang, Damelin & Hero, 2016)
Let $\mathcal M \subseteq \mathbb{R}^D$ be a manifold and let $f\colon\mathcal{M}\to \mathbb{R}_{>0}$ be a smooth density.
For $q>0$, the deformed Riemannian distance* in $\mathcal{M}$ is
$$d_{f,q}(x,y) = \inf_{\gamma} \int_{I}\frac{1}{f(\gamma_t)^{q}}\,\|\dot{\gamma}_t\|\, dt$$
over all $\gamma\colon I\to \mathcal{M}$ with $\gamma(0) = x$ and $\gamma(1)=y$.
* Here, if $g$ is the inherited Riemannian metric tensor, then $d_{f,q}$ is the Riemannian distance induced by $g_q= f^{-2q} g$.
Fermat distance
(Mckenzie & Damelin, 2019) (Groisman, Jonckheere & Sapienza, 2022)
Let $\mathbb{X}_n = \{x_1,\dots,x_n\}\subseteq \mathbb{R}^D$ be a finite sample.
For $p>1$, the Fermat distance between $x,y\in \mathbb{R}^D$ is defined by
$$d_{\mathbb{X}_n, p}(x,y) = \inf_{\gamma} \sum_{i=0}^{r}|x_{i+1}-x_i|^{p}$$
over all paths $\gamma=(x_0, \dots, x_{r+1})$ of finite length with $x_0=x$, $x_{r+1} = y$ and $\{x_1, x_2, \dots, x_{r}\}\subseteq \mathbb{X}_n$.
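The definition above is a shortest-path problem on the complete graph over the sample, with edge cost $|x_i - x_j|^p$. The following is a minimal sketch using Dijkstra's algorithm, under the simplifying assumption that both endpoints are sample points; the function name `fermat_distances_from` is hypothetical and this is not the API of the `fermat` library.

```python
import heapq
import math

def fermat_distances_from(source, sample, p):
    """Sample Fermat distances from sample[source] to every sample point.

    Dijkstra on the complete graph over the sample with edge cost |x_i - x_j|^p.
    """
    n = len(sample)
    dist = [math.inf] * n
    dist[source] = 0.0
    done = [False] * n
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if done[i]:
            continue
        done[i] = True
        for j in range(n):
            if not done[j]:
                cand = d + math.dist(sample[i], sample[j]) ** p
                if cand < dist[j]:
                    dist[j] = cand
                    heapq.heappush(heap, (cand, j))
    return dist

# Three collinear points (an assumption for illustration).
sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
# With p = 2 the direct hop 0 -> 2 costs 2**2 = 4, while two unit hops cost 1 + 1 = 2.
print(fermat_distances_from(0, sample, p=2))  # [0.0, 1.0, 2.0]
```

For $p>1$ long hops are penalised, so optimal paths prefer many short hops through densely sampled regions. This is what makes the distance density-sensitive.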
Convergence results
$d_{\mathbb X_n, p}$, suitably rescaled, is an estimator of $d_{f,q}$ when $q=(p-1)/d$.
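The relation $q=(p-1)/d$ can be motivated by a back-of-the-envelope scaling argument. The lines below are only a heuristic under the assumption that sample points near $x$ are spaced at roughly $(n f(x))^{-1/d}$; they are not the proof given in the cited papers, and all constants are absorbed into a single factor written here as $\mu$.

```latex
% Heuristic only: constants are absorbed into \mu = \mu(p,d).
\begin{align*}
\text{typical spacing near } x &\approx \big(n f(x)\big)^{-1/d}, \\
\text{hops needed to cover Euclidean length } \ell &\approx \ell\,\big(n f(x)\big)^{1/d}, \\
\text{total cost of those hops} &\approx \ell\,\big(n f(x)\big)^{1/d}\,\big(n f(x)\big)^{-p/d}
  = \ell\,\big(n f(x)\big)^{-(p-1)/d}.
\end{align*}
% Summing these contributions along a curve \gamma from x to y:
\[
d_{\mathbb{X}_n,p}(x,y) \;\approx\; \mu\, n^{-q} \inf_{\gamma}\int_I \frac{1}{f(\gamma_t)^{q}}\,\|\dot{\gamma}_t\|\,dt
 \;=\; \mu\, n^{-q}\, d_{f,q}(x,y), \qquad q = \frac{p-1}{d}.
\]
```

Read this way, multiplying the sample Fermat distance by a factor of order $n^{q}$ is what one would expect to need in order to obtain a nontrivial limit.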
(F., Borghini, Mindlin & Groisman, 2023)
Let $\mathcal{M}$ be a closed smooth $d$-dimensional manifold embedded in $\mathbb{R}^D$.
$$\big(\mathbb{X}_n, C(n,p,d)\, d_{\mathbb{X}_n,p}\big)\xrightarrow[n\to \infty]{GH}\big(\mathcal{M}, d_{f,q}\big) \quad \text{for } q = (p-1)/d$$
Recall that
$$d_{H}\big((X, d),(Y,d)\big) = \max \Big\{\sup_{x\in X}d(x,Y),\ \sup_{y\in Y}d(X,y)\Big\}, \quad\text{for }X,Y\subseteq (Z,d)$$
$$d_{GH}\big((X, d_X),(Y,d_Y)\big)= \inf_{\substack{Z \text{ metric space}\\ f\colon X\to Z,\ g\colon Y\to Z \text{ isometric embeddings}}}d_H\big(f(X), g(Y)\big)$$
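For finite subsets of a common Euclidean space, the Hausdorff distance recalled above can be evaluated directly from its definition. The sketch below is a brute-force illustration with a hypothetical function name `hausdorff`. It does not compute the Gromov-Hausdorff distance, which additionally requires an infimum over all isometric embeddings.

```python
import math

def hausdorff(X, Y):
    """Hausdorff distance between two finite subsets of Euclidean space."""
    def directed(A, B):
        # sup over a in A of the distance from a to the set B
        return max(min(math.dist(a, b) for b in B) for a in A)
    return max(directed(X, Y), directed(Y, X))

# Toy sets (an assumption for illustration): Y is X plus one extra far point.
X = [(0.0, 0.0), (1.0, 0.0)]
Y = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
print(hausdorff(X, Y))  # 2.0: the point (1, 2) is at distance 2 from X
```

The example shows why both directed terms are needed: every point of `X` lies in `Y`, yet the sets are at distance 2.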
Theorem (F., Borghini, Mindlin, Groisman, 2023)
Let $\mathbb{X}_n$ be a sample of a closed manifold $\mathcal M$ of dimension $d$, drawn according to a density $f\colon \mathcal M\to \mathbb R$.
Given $p>1$ and $q=(p-1)/d$, there exists a constant $\mu = \mu(p,d)$ such that for every $\lambda \in \big((p-1)/(pd),\ 1/d\big)$ and $\varepsilon>0$ there exists $\theta>0$ satisfying
$$\mathbb{P}\left( d_{GH}\left(\big(\mathcal{M}, d_{f,q}\big), \big(\mathbb{X}_n, \tfrac{n^{q}}{\mu}\, d_{\mathbb{X}_n, p}\big)\right) > \varepsilon \right) \leq \exp\left(-\theta\, n^{(1 - \lambda d) /(d+2p)}\right)$$
for $n$ large enough.
Fermat-distance
Computational implementation
Complexity: $O(n^3)$,
reducible to $O(n^2 k \log n)$ using the $k$-NN graph (for $k = O(\log n)$, the geodesics belong to the $k$-NN graph with high probability).
Python library: fermat
Computational experiments: ximenafernandez/intrinsicPH
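The complexity reduction mentioned above comes from running shortest paths on a sparse $k$-NN graph instead of the complete graph. The following is a minimal pure-Python sketch of that idea. It is not the implementation or the API of the `fermat` library; the function names are hypothetical, and the neighbour search is brute force for brevity (a spatial index would be used in practice).

```python
import heapq
import math

def knn_graph(sample, k, p):
    """Symmetrised k-nearest-neighbour graph with edge cost |x_i - x_j|^p."""
    n = len(sample)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        by_dist = sorted(
            (math.dist(sample[i], sample[j]), j) for j in range(n) if j != i
        )
        for d, j in by_dist[:k]:
            adj[i][j] = d ** p
            adj[j][i] = d ** p
    return adj

def shortest_paths(adj, source):
    """Dijkstra on an adjacency-dict graph; returns distances from source."""
    dist = [math.inf] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j, w in adj[i].items():
            if d + w < dist[j]:
                dist[j] = d + w
                heapq.heappush(heap, (d + w, j))
    return dist

# Toy sample (an assumption for illustration): six unit-spaced points on a line.
sample = [(float(t), 0.0) for t in range(6)]
adj = knn_graph(sample, k=2, p=3)
print(shortest_paths(adj, 0))  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

In this toy case the optimal paths only use unit hops, all of which are present in the 2-NN graph, so restricting to the sparse graph loses nothing.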
Convergence of persistence diagrams
$$\big(\mathbb{X}_n, C(n,p,d)\, d_{\mathbb{X}_n,p}\big)\xrightarrow[n\to \infty]{GH}\big(\mathcal{M}, d_{f,q}\big) \quad \text{for } q = (p-1)/d$$
+
Stability: $$d_B\Big( \mathrm{dgm}\big(\mathrm{Filt}(X, d_X)\big), \mathrm{dgm}\big(\mathrm{Filt}(Y, d_Y)\big)\Big)\leq 2\, d_{GH}\big((X,d_X),(Y,d_Y)\big)$$
$\Downarrow$
$$\mathrm{dgm}\big(\mathrm{Filt}(\mathbb{X}_n, C(n,p,d)\, d_{\mathbb{X}_n,p})\big)\xrightarrow[n\to \infty]{B}\mathrm{dgm}\big(\mathrm{Filt}(\mathcal{M}, d_{f,q})\big) \quad \text{for } q = (p-1)/d$$
Fermat-based persistence diagrams
Intrinsic reconstruction
Robustness to outliers
Application to time series analysis
Anomaly detection
Electrocardiogram
Source data: PhysioNet Database https://physionet.org/about/database/
$t\mapsto \mathcal{D}_t$
Approximate Derivative
Change-point detection
Birdsongs
Source data: Private experiments. Laboratory of Dynamical Systems, University of Buenos Aires.
Future work
References
Source: X. Fernandez, E. Borghini, G. Mindlin, P. Groisman. Intrinsic persistent homology via density-based metric learning. Journal of Machine Learning Research 24(75):1−42 (2023).
Github Repository: ximenafernandez/intrinsicPH
Tutorial: Intrinsic persistent homology. AATRN Youtube Channel (2021)
Python Library: fermat
THANKS!