Density-based Riemannian metrics &
Persistent Homology

XIMENA FERNANDEZ

Durham University

Dagstuhl Seminar



Fernandez, Borghini, Mindlin and Groisman. 'Intrinsic persistent homology via density-based metric learning.' JMLR (2023)

Motivation

Homology inference

Metric space: $(\mathbb X_n, d_E)\sim (\mathcal M, d_E)$




$\bullet ~ ~\mathrm{Rips}_\epsilon(\mathcal{M}, d_E)\simeq \mathcal{M}$ for $\epsilon < 2 \sqrt{\frac{D+1}{2D}}\mathrm{rch}(\mathcal{M})~~$ (Kim, Shin, Chazal, Rinaldo & Wasserman, 2020)

Homology inference

Metric space: $(\mathbb X_n, d_{kNN})\sim (\mathcal M, d_\mathcal{M})~~~$
(Bernstein, De Silva, Langford & Tenenbaum, 2000)




$\bullet ~ ~\mathrm{Rips}_\epsilon(\mathcal{M}, d_\mathcal{M})\simeq \mathcal{M}$ for $\epsilon < \mathrm{conv}(\mathcal{M}, d_{\mathcal{M}})~~$ (Hausmann, 1995; Latschev, 2001)

Density-based metric learning

Fermat distance

(McKenzie & Damelin, 2019) (Groisman, Jonckheere & Sapienza, 2022)

Let $\mathbb{X}_n = \{x_1,\dots,x_n\}\subseteq \mathbb{R}^D$ be a finite sample.

For $p> 1$, the Fermat distance between $x,y\in \mathbb{R}^D$ is defined by \[ d_{\mathbb{X}_n, p}(x,y) = \inf_{\gamma} \sum_{i=0}^{r}|x_{i+1}-x_i|^{p} \] over all paths $\gamma=(x_0, \dots, x_{r+1})$ of finite length with $x_0=x$, $x_{r+1} = y$ and $\{x_1, x_2, \dots, x_{r}\}\subseteq \mathbb{X}_n$.
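The definition above can be read as a shortest-path problem on the complete graph over the sample, with edge cost $|x_i-x_j|^p$. The following is a minimal sketch under that reading, not the implementation used in the paper: the function name `fermat_distance_matrix` is hypothetical, both endpoints are assumed to be sample points, and Floyd-Warshall is used only because it is short.

```python
import math


def fermat_distance_matrix(points, p):
    """All-pairs sample Fermat distance between sample points.

    Runs Floyd-Warshall on the complete graph whose edge costs are
    Euclidean distances raised to the power p.
    """
    n = len(points)
    # Cost of a single hop between two sample points.
    dist = [[math.dist(a, b) ** p for b in points] for a in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Allow paths that pass through sample point k.
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


# Three collinear points: for p = 2 the direct hop 0 -> 2 costs 4,
# while passing through the middle point costs 1 + 1 = 2.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
d = fermat_distance_matrix(points, p=2)
print(d[0][2])  # 2.0
```

For $p>1$ short hops are cheaper than one long hop, so optimal paths prefer regions where sample points are close together, that is, regions of high density.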

Density-based geometry

(Hwang, Damelin & Hero, 2016)

Let $\mathcal M \subseteq \mathbb{R}^D$ be a manifold and let $f\colon\mathcal{M}\to \mathbb{R}_{>0}$ be a smooth density.

For $q>0$, the deformed Riemannian distance* in $\mathcal{M}$ is \[d_{f,q}(x,y) = \inf_{\gamma} \int_{I}\frac{1}{f(\gamma_t)^{q}}\|\dot{\gamma}_t\|\, dt \] over all curves $\gamma\colon I=[0,1]\to \mathcal{M}$ with $\gamma_0 = x$ and $\gamma_1=y$.


* Here, if $g$ is the inherited Riemannian tensor, then $d_{f,q}$ is the Riemannian distance induced by $g_q= f^{-2q} g$.
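As a sanity check on the definition (a worked special case, not a statement taken from the slides), consider a constant density $f\equiv c$. The weight factors out of the integral, so the deformed distance is a rescaling of the intrinsic distance $d_{\mathcal M}$:

```latex
f \equiv c
\;\Longrightarrow\;
d_{f,q}(x,y)
  = \inf_{\gamma} \int_{I} c^{-q}\,\|\dot{\gamma}_t\|\, dt
  = c^{-q}\, d_{\mathcal{M}}(x,y).
```

For non-constant $f$ the weight $f^{-q}$ is small where the density is high, so minimizing curves are pulled towards high-density regions.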

Convergence results

(F., Borghini, Mindlin & Groisman, 2023)

Let $\mathcal{M}$ be a closed smooth $d$-dimensional manifold embedded in $\mathbb{R}^D$. Let $\mathbb{X}_n\subseteq \mathcal M$ be a sample of $n$ points drawn according to a density $f\colon \mathcal M\to \mathbb R_{>0}$.

\[\big(\mathbb{X}_n, C(n,p,d) d_{\mathbb{X}_n,p}\big)\xrightarrow[n\to \infty]{GH}\big(\mathcal{M}, d_{f,q}\big) ~~~ \text{ for } q = (p-1)/d\]


Theorem (F., Borghini, Mindlin, Groisman, 2023)

Given $p>1$ and $q=(p-1)/d$, there exists a constant $\mu = \mu(p,d)$ such that for every $\lambda \in \big((p-1)/(pd), 1/d\big)$ and $\varepsilon>0$ there exists $\theta>0$ such that \[ \mathbb{P}\left( d_{GH}\left(\big(\mathcal{M}, d_{f,q}\big), \big(\mathbb{X}_n, {\scriptstyle \frac{n^{q}}{\mu}} d_{\mathbb{X}_n, p}\big)\right) > \varepsilon \right) \leq \exp{\left(-\theta n^{(1 - \lambda d) /(d+2p)}\right)} \] for $n$ large enough.
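The exponent $q=(p-1)/d$ and the rescaling by $n^{q}$ can be motivated by a back-of-the-envelope count. This is a heuristic only, not the argument of the paper: it assumes that near a point $x$ neighbouring sample points are spaced at roughly $(n f(x))^{-1/d}$, and it ignores all constants (which is what $\mu$ absorbs).

```latex
% A short path segment of Euclidean length \ell near x:
\#\text{hops} \approx \ell\,(n f(x))^{1/d},
\qquad
\text{cost per hop} \approx (n f(x))^{-p/d},
\\[4pt]
\text{total cost} \approx \ell\,(n f(x))^{-(p-1)/d}
  = n^{-q}\, f(x)^{-q}\, \ell,
\qquad q = \frac{p-1}{d}.
```

Multiplying by $n^{q}$ leaves $f(x)^{-q}\,\ell$, which matches the integrand $f(\gamma_t)^{-q}\|\dot\gamma_t\|\,dt$ in the definition of $d_{f,q}$.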

Convergence results

(F., Borghini, Mindlin & Groisman, 2023)

Let $\mathcal{M}$ be a closed smooth $d$-dimensional manifold embedded in $\mathbb{R}^D$. Let $\mathbb{X}_n\subseteq \mathcal M$ be a sample of $n$ points drawn according to a density $f\colon \mathcal M\to \mathbb R_{>0}$.

\[\mathrm{dgm}(\mathrm{Filt}(\mathbb{X}_n, {C(n,p,d)} d_{\mathbb{X}_n,p}))\xrightarrow[n\to \infty]{B}\mathrm{dgm}(\mathrm{Filt}(\mathcal{M}, d_{f,q})) ~~~ \text{ for } q = (p-1)/d\]


Theorem (F., Borghini, Mindlin, Groisman, 2023)

Given $p>1$ and $q=(p-1)/d$, there exists a constant $\mu = \mu(p,d)$ such that for every $\lambda \in \big((p-1)/(pd), 1/d\big)$ and $\varepsilon>0$ there exists $\theta>0$ such that \[ \mathbb{P}\Big( d_B\big(\mathrm{dgm}(\mathrm{Filt}(\mathcal{M}, d_{f,q})),\mathrm{dgm}(\mathrm{Filt}(\mathbb{X}_n, {\scriptstyle \frac{n^{q}}{\mu}} d_{\mathbb{X}_n,p}))\big)>\varepsilon\Big) \leq \exp{\big(-\theta n^{(1 - \lambda d)/(d+2p)}\big)}\] for $n$ large enough.
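On the sample side, the diagram in the statement above is computed from a finite distance matrix. As an illustration in degree 0 only, the sketch below computes the persistence pairs of a Rips filtration with a union-find pass over sorted edges (Kruskal's algorithm). It assumes the convention that an edge enters the filtration at parameter equal to its length; the function name `h0_persistence` is hypothetical, and higher degrees would need a dedicated persistent homology library.

```python
import math


def h0_persistence(dist):
    """Degree-0 persistence pairs of the Rips filtration of a finite metric space.

    Every component is born at 0. Finite deaths are the edge lengths of a
    minimum spanning tree; one component never dies.
    """
    n = len(dist)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at parameter w: one of them dies.
            parent[ri] = rj
            deaths.append(w)
    return [(0.0, w) for w in deaths] + [(0.0, math.inf)]


# Toy metric space: three points on a line at 0, 1 and 3.
xs = [0.0, 1.0, 3.0]
dist = [[abs(a - b) for b in xs] for a in xs]
print(h0_persistence(dist))  # [(0.0, 1.0), (0.0, 2.0), (0.0, inf)]
```

In the setting of the theorem, `dist` would be the rescaled sample Fermat distance matrix rather than the Euclidean one used in this toy example.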

Fermat-based persistence diagrams

Intrinsic reconstruction

$\bullet ~ ~\mathrm{Rips}_\epsilon(\mathcal{M}, d_{f,q})\simeq \mathcal{M}$ for $\epsilon < \mathrm{conv}(\mathcal{M}, d_{f,q})$

Fermat-based persistence diagrams

Robustness to outliers

Proposition (F., Borghini, Mindlin, Groisman, 2023)

Let $\mathbb{X}_n$ be a sample of $\mathcal{M}$ and let $Y\subseteq \mathbb{R}^D\smallsetminus \mathcal{M}$ be a finite set of outliers.
Let $\delta = \displaystyle \min\Big\{\min_{y\in Y} d_E(y, Y\smallsetminus \{y\}), ~d_E(\mathbb X_n, Y)\Big\}$.
Then, for all $k>0$ and $p>1$, \[ \mathrm{dgm}_k(\mathrm{Rips}_{<\delta^p}(\mathbb{X}_n \cup Y, d_{\mathbb{X}_n\cup Y, p})) = \mathrm{dgm}_k(\mathrm{Rips}_{<\delta^p}(\mathbb{X}_n, d_{\mathbb{X}_n, p})) \] where $\mathrm{Rips}_{<\delta^p}$ stands for the Rips filtration up to parameter $\delta^{p}$ and $\mathrm{dgm}_k$ for the persistence diagram in degree $k$.
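The mechanism behind the proposition can be checked numerically on a toy example: any path that visits an outlier contains a hop of Euclidean length at least $\delta$, so it costs at least $\delta^p$. The sketch below is an illustration under assumed data (eight points on a circle plus two hand-picked outliers), not an experiment from the paper; all names are hypothetical.

```python
import math


def fermat_matrix(points, p):
    """All-pairs sample Fermat distance via Floyd-Warshall."""
    n = len(points)
    dist = [[math.dist(a, b) ** p for b in points] for a in points]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
    return dist


p = 2
sample = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8)) for i in range(8)]
outliers = [(0.0, 0.0), (3.0, 3.0)]

# delta: smallest Euclidean distance from an outlier to any other point.
delta = min(
    [math.dist(y, z) for y in outliers for z in outliers if y != z]
    + [math.dist(x, y) for x in sample for y in outliers]
)
threshold = delta ** p

d_clean = fermat_matrix(sample, p)
d_noisy = fermat_matrix(sample + outliers, p)
n, m = len(sample), len(sample) + len(outliers)

# Below the threshold, distances between sample points are unchanged.
agree_below = all(
    math.isclose(d_clean[i][j], d_noisy[i][j])
    for i in range(n) for j in range(n)
    if min(d_clean[i][j], d_noisy[i][j]) < threshold
)
# Outliers stay isolated vertices for every parameter below the threshold.
outliers_isolated = all(
    d_noisy[i][j] >= threshold - 1e-12
    for i in range(n, m) for j in range(m) if i != j
)
# Above the threshold, outliers can create shortcuts (here: through the origin).
changed_above = any(
    d_noisy[i][j] < d_clean[i][j] - 1e-9 for i in range(n) for j in range(n)
)
print(agree_below, outliers_isolated, changed_above)
```

Isolated vertices only contribute to degree 0, which is consistent with the statement being restricted to $k>0$.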

Fermat-based persistence diagrams

Computational implementation
  • Complexity:

    $O(n^3)$

    reducible to $O(n^2 k\log n)$ using the $k$-NN graph (for $k = O(\log n)$ the geodesics belong to the $k$-NN graph with high probability).

  • Python library:

    fermat

  • Tool in Giotto-TDA:

    In progress

  • Computational experiments:

    ximenafernandez/intrinsicPH
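The $k$-NN reduction mentioned in the complexity item can be sketched as follows: restrict hops to the $k$-nearest-neighbour graph and run Dijkstra from every source. This is a minimal standard-library sketch, not the API of the `fermat` library; the function names are hypothetical, and the brute-force neighbour search would be replaced by a spatial index in practice.

```python
import heapq
import math


def knn_graph(points, k, p):
    """Symmetrised k-NN graph with edge cost |x - y|^p (brute-force search)."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        others = (j for j in range(n) if j != i)
        nearest = sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            w = math.dist(points[i], points[j]) ** p
            adj[i][j] = w
            adj[j][i] = w
    return adj


def dijkstra(adj, source):
    """Single-source shortest path costs on a weighted adjacency list."""
    dist = [math.inf] * len(adj)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def fermat_knn(points, k, p):
    """Approximate sample Fermat distances using only k-NN hops."""
    adj = knn_graph(points, k, p)
    return [dijkstra(adj, s) for s in range(len(points))]


# Six equally spaced points on a line: the optimal path uses unit hops.
points = [(float(i), 0.0) for i in range(6)]
d = fermat_knn(points, k=2, p=2)
print(d[0][5])  # 5.0
```

If the $k$-NN graph is disconnected, some entries are `math.inf`; increasing `k` is the simplest remedy.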

Applications to
time series analysis

Parameter selection in Takens' embeddings

Embedding dimension $D$

Parameter selection in Takens' embeddings

Time delay $T$
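For reference, the two parameters enter the delay embedding as follows: each embedded point collects $D$ values of the series spaced $T$ steps apart. The sketch below is a plain illustration of that construction; the function name `delay_embedding` is hypothetical and no parameter selection is performed.

```python
def delay_embedding(series, D, T):
    """Return the points (x_t, x_{t+T}, ..., x_{t+(D-1)T}) for every valid t."""
    if D < 1 or T < 1:
        raise ValueError("D and T must be positive integers")
    n_points = len(series) - (D - 1) * T
    return [
        tuple(series[t + i * T] for i in range(D))
        for t in range(max(n_points, 0))
    ]


series = [0, 1, 2, 3, 4, 5]
print(delay_embedding(series, D=3, T=2))  # [(0, 2, 4), (1, 3, 5)]
```

The resulting point cloud in $\mathbb{R}^D$ is the input on which distances and persistence diagrams are then computed.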

Topology of embeddings

Electrocardiogram

Source data: PhysioNet Database https://physionet.org/about/database/

* We use Fermat distance with $p=2$.

Topology of embeddings

Birdsongs

Source data: Private experiments. Laboratory of Dynamical Systems, University of Buenos Aires.


Questions


  • Results in the noisy case
    $\widetilde{\mathbb{X}}_n = \{\widetilde {x}_1, \widetilde {x}_2, \dots, \widetilde {x}_n\} $ such that $\widetilde{x}_i = x_i + \xi_i $ with $x_i \in \mathcal M$ and $\xi_i\in \mathbb R^D$ a noise term. \[\text{Study }~~~ \mathbb{P}\Big( d_B\big(\mathrm{dgm}(\mathrm{Filt}(\mathcal{M}, d_{f,q})),\mathrm{dgm}(\mathrm{Filt}(\widetilde{\mathbb{X}}_n, {\scriptstyle \frac{n^{q}}{\mu}} d_{\mathbb{X}_n,p}))\big)>\varepsilon\Big)\]

  • The parameter $p$
    • Study the theoretical behaviour of the distances for different values of $p$ (McKenzie & Damelin, 2019).
    • 'Best choice' of $p$.

References


  • Source: X. Fernandez, E. Borghini, G. Mindlin, P. Groisman, Intrinsic persistent homology via density-based metric learning. Journal of Machine Learning Research 24 (2023) 1-42.

  • Github Repository: ximenafernandez/intrinsicPH

  • Tutorial: Intrinsic persistent homology. AATRN Youtube Channel

Thanks!