XIMENA FERNANDEZ
TOPOLOGICAL DATA ANALYSIS
Let $X$ be a topological space and let $\mathbb{X}_n = \{x_1,...,x_n\}$ be a finite sample of $X$.
Inference problem: How to estimate topological properties of $X$ from $\mathbb{X}_n$?
$\text{Evolving thickenings}$
$\text{Sequence of combinatorial spaces}$
$\epsilon = 0, \quad \epsilon = 0.7, \quad \epsilon = 1, \quad \epsilon = 2$
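To make the idea of a growing family of combinatorial spaces concrete, here is a minimal sketch that tracks only the number of connected components as the scale increases. It assumes a Rips-style convention (two points are joined when their Euclidean distance is at most $\epsilon$), which the slide does not fix; the sample points and the function name `betti0` are hypothetical.

```python
import math

def betti0(points, eps):
    """Number of connected components of the graph joining points at distance <= eps."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

# Hypothetical sample: the four corners of a unit square plus one far point.
sample = [(0, 0), (1, 0), (1, 1), (0, 1), (3, 0)]
counts = {eps: betti0(sample, eps) for eps in (0, 0.7, 1, 2)}
print(counts)  # {0: 5, 0.7: 5, 1: 2, 2: 1}
```

Persistent homology records how such counts (and their higher-dimensional analogues) appear and disappear along the whole range of scales, rather than at one chosen $\epsilon$.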
$\text{Persistent Homology}$
Input dataset $X$.
Source: B. Rieck
Problem: Persistent homology is not robust in general to noise and outliers, and it might be very sensitive to the embedding in the ambient space.
Let $\mathbb{X}_n = \{x_1,...,x_n\}\subseteq \mathbb{R}^D$ be a set of data points.
Assume that:
Goal: Infer '$H_\bullet(\mathcal M, f)$'
X. Fernandez, E. Borghini, G. Mindlin, P. Groisman. 'Intrinsic persistent-homology via density-based metric learning'. Journal of Machine Learning Research. 24 (2023) 1-42.
Fermat's principle: the path taken by a ray of light between two given points is the path that can be traversed in the least time.
That is, it is an extremum of the functional \[ \gamma\mapsto \int_{0}^1\eta(\gamma_t)||\dot{\gamma}_t|| dt, \] where $\eta$ is the refractive index.
UNDERLYING SPACE
$\mathcal M \subseteq \mathbb{R}^D$ manifold, $f:\mathcal{M}\to \mathbb{R}_{>0}$ density.
For $q>0$, deformed Riemannian distance in $\mathcal{M}$ \[ d_{f,q}(x,y) = \inf_{\gamma:x\sim y} \int_{\gamma}\frac{1}{f(\gamma_t)^{q}}||\dot \gamma_t||dt. \]
DATA
$\mathbb{X}_n = \{x_1,...,x_n\}\subseteq \mathbb{R}^D$ sample.
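For $p> 1$, Fermat distance in $\mathbb{X}_n$ \[ d_{\mathbb{X}_n, p}(x,y) = \inf_{\gamma:x\sim y} \sum_{i=0}^{r}|x_{i+1}-x_i|^{p}, \] where the infimum is taken over all finite paths $\gamma = (x_0, \dots, x_{r+1})$ of points of $\mathbb{X}_n$ with $x_0 = x$ and $x_{r+1} = y$.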
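Since the sample Fermat distance is a shortest-path distance on the complete graph over $\mathbb{X}_n$ with edge weights $|x_i - x_j|^p$, it can be computed with Dijkstra's algorithm. The following is a minimal sketch under that reading, not the authors' implementation; the function name `fermat_distance` and the toy sample are hypothetical.

```python
import heapq
import math

def fermat_distance(points, p, source, target):
    """Cheapest path from points[source] to points[target] through the sample,
    where a hop between x and y costs |x - y|**p (Dijkstra on the complete graph)."""
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, i = heapq.heappop(heap)
        if cost > best[i]:
            continue  # stale heap entry
        if i == target:
            return cost
        for j in range(n):
            if j != i:
                new_cost = cost + math.dist(points[i], points[j]) ** p
                if new_cost < best[j]:
                    best[j] = new_cost
                    heapq.heappush(heap, (new_cost, j))
    return best[target]

# Three collinear points: with p = 2 two short hops (1 + 1) beat one long hop (4).
line = [(0.0,), (1.0,), (2.0,)]
d2 = fermat_distance(line, 2, 0, 2)
print(d2)  # 2.0
```

For $p>1$ long hops are penalized, so optimal paths prefer to travel through regions where sample points are close together, that is, regions of high density.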
Theorem (F., Borghini, Mindlin, Groisman)
\[\big(\mathbb{X}_n, C(n,p,d) d_{\mathbb{X}_n,p}\big)\xrightarrow[n\to \infty]{GH}\big(\mathcal{M}, d_{f,q}\big) ~~~ \text{ for } q = (p-1)/d\]
Let $\mathcal{M}$ be a closed smooth $d$-dimensional Riemannian manifold embedded in $\mathbb{R}^D$. Let $\mathbb X_n\subseteq \mathcal{M}$ be a set of $n$ independent sample points with common smooth density $f:\mathcal{M}\to \mathbb{R}_{>0}$.
Given $p>1$ and $q=(p-1)/d$, there exists a constant $\mu = \mu(p,d)$ such that for every $\lambda \in \big((p-1)/pd, 1/d\big)$ and $\varepsilon>0$ there exists $\theta>0$ satisfying \[ \mathbb{P}\left( d_{GH}\left(\big(\mathcal{M}, d_{f,q}\big), \big(\mathbb{X}_n, {\scriptstyle \frac{n^{q}}{\mu}} d_{\mathbb{X}_n, p}\big)\right) > \varepsilon \right) \leq \exp{\left(-\theta n^{(1 - \lambda d) /(d+2p)}\right)} \] for $n$ large enough.
Theorem (F., Borghini, Mindlin, Groisman)
\[\mathrm{dgm}\Big(\big(\mathbb{X}_n, C(n,p,d) d_{\mathbb{X}_n,p}\big)\Big)\xrightarrow[n\to \infty]{B}\mathrm{dgm}\Big(\big(\mathcal{M}, d_{f,q}\big)\Big) ~~~ \text{ for } q = (p-1)/d\]
Proposition (F., Borghini, Mindlin, Groisman, 2023)
Let $\mathbb{X}_n$ be a sample of $\mathcal{M}$ and let $Y\subseteq \mathbb{R}^D\smallsetminus \mathcal{M}$ be a finite set of outliers. Let $\delta = \displaystyle \min\Big\{\min_{y\in Y} d_E(y, Y\smallsetminus \{y\}), ~d_E(\mathbb X_n, Y)\Big\}$.
Then, for all $k>0$ and $p>1$,
\[
\mathrm{dgm}_k(\mathrm{Rips}_{<\delta^p}(\mathbb{X}_n \cup Y, d_{\mathbb{X}_n\cup Y, p})) = \mathrm{dgm}_k(\mathrm{Rips}_{<\delta^p}(\mathbb{X}_n, d_{\mathbb{X}_n, p}))
\]
where $\mathrm{Rips}_{<\delta^p}$ stands for the Rips filtration up to parameter $\delta^{p}$ and $\mathrm{dgm}_k$ for the persistence diagram of degree $k$.
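The threshold $\delta$ in the proposition can be computed directly from the definition. The sketch below does only that; the function name `outlier_threshold` and the toy points are hypothetical. One way to read the statement: every hop that touches an outlier has Euclidean length at least $\delta$, hence Fermat cost at least $\delta^p$, so below $\delta^p$ the outliers remain isolated vertices and do not affect diagrams of degree $k>0$.

```python
import math

def outlier_threshold(sample, outliers):
    """delta = min( min_y d_E(y, Y minus {y}), d_E(X_n, Y) ), with Euclidean distances."""
    between_outliers = min(
        (math.dist(y, z) for i, y in enumerate(outliers) for z in outliers[i + 1:]),
        default=math.inf,  # a single outlier has no other outlier to compare with
    )
    sample_to_outliers = min(
        (math.dist(x, y) for x in sample for y in outliers),
        default=math.inf,
    )
    return min(between_outliers, sample_to_outliers)

# Hypothetical data: two sample points and two outliers in the plane.
sample = [(0.0, 0.0), (1.0, 0.0)]
outliers = [(5.0, 0.0), (0.0, 4.0)]
delta = outlier_threshold(sample, outliers)
print(delta)  # 4.0
```

With $p = 2$ in this toy example, the two filtrations in the proposition would be compared up to parameter $\delta^p = 16$.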
Anomaly detection in ECG
Delay embedding:
Given $T$ the time delay and $D$ the embedding dimension,
\[\{\big(\varphi(t), \varphi(t+T), \varphi(t+2 T) \dots, \varphi(t+(D-1)T)\big): t\in \mathbb R\}\subseteq \mathbb{R}^D\]
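For a signal sampled at discrete times, the delay embedding above amounts to sliding a window of $D$ entries spaced $T$ samples apart. The sketch below assumes the delay is given in samples; the function name `delay_embedding` and the sine test signal are hypothetical.

```python
import math

def delay_embedding(signal, delay, dim):
    """Return the vectors (s[t], s[t + delay], ..., s[t + (dim - 1) * delay])."""
    last_start = len(signal) - (dim - 1) * delay
    return [
        tuple(signal[t + k * delay] for k in range(dim))
        for t in range(last_start)
    ]

# A sine wave of period 40 samples, embedded with a quarter-period delay:
# each point is (sin, cos) of the same angle, so the cloud lies on a circle.
samples = [math.sin(2 * math.pi * t / 40) for t in range(200)]
cloud = delay_embedding(samples, delay=10, dim=2)
print(len(cloud))  # 190
```

A periodic signal traces a closed loop in the embedding space, which is the kind of feature a degree-1 persistence diagram is designed to pick up.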
\[t\mapsto \mathrm{dgm}_t\]
Approximate derivative: $\frac{d_B(\mathrm{dgm}_t, \mathrm{dgm}_{t'})}{|t'-t|}$
Pattern detection in birdsongs
Source data: Private experiments. Laboratory of Dynamical Systems, University of Buenos Aires.
\[t\mapsto \mathrm{dgm}_t\]
Approximate derivative
Epileptic seizure detection
X. Fernandez, D. Mateos. Topological biomarkers for real-time epileptic seizures. Preprint arXiv:2211.02523 (2024)
Problem: Given a physiological recording of the brain activity, algorithmically detect in real time epileptic seizures.
EEG (CHB-MIT Database)
Research collaboration with Spotify
W. Reise, X. Fernandez, M. Dominguez, H.A. Harrington, M. Beguerisse-Diaz. Topological fingerprints for audio identification. SIAM Journal on Mathematics of Data Science Vol. 6, Iss. 3 (2024)
Problem: Given two audio tracks, identify whether they correspond to the same audio content.
Given $s$ an audio track, we associate topological fingerprints via local Betti curves.
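As a reminder of what a Betti curve is, the sketch below counts, for each threshold, the intervals of a barcode that are alive at that threshold. It shows only this generic construction; how the local windows of the audio track are chosen and how their barcodes are computed is not specified in the slide, and the barcode used here is hypothetical.

```python
def betti_curve(intervals, thresholds):
    """For each threshold r, count the intervals [birth, death) that contain r."""
    return [
        sum(1 for birth, death in intervals if birth <= r < death)
        for r in thresholds
    ]

# Hypothetical barcode with three intervals.
barcode = [(0.0, 1.0), (0.2, 0.5), (0.4, 2.0)]
curve = betti_curve(barcode, [0.0, 0.3, 0.45, 0.6, 1.5, 2.5])
print(curve)  # [1, 2, 3, 2, 1, 0]
```

Because a Betti curve is a fixed-length vector once the thresholds are fixed, curves from different tracks can be compared entry by entry.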
$t_0,\ t_1,\ t_2,\ t_3,\ t_4,\ \dots \qquad t'_0,\ t'_1,\ t'_2,\ t'_3,\ t'_4,\ \dots$
Correlation: 'Smells Like Teen Spirit' (30 sec vs. 30 sec): 0.9896
Kuramoto models and synchronization
Kuramoto model
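\[ \frac{d\theta_i}{dt} = \omega_i + \sum_{j=1}^{N} \omega_{ij}\sin(\theta_j - \theta_i), \quad i = 1, 2, \dots, N \]
where $\theta_i$ is the phase of oscillator $i$, $\omega_i$ is its natural frequency, and $\omega_{ij}$ is the coupling strength between oscillators $i$ and $j$.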
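The Kuramoto equation can be explored numerically with a simple explicit Euler scheme. The sketch below is an illustration under that choice of integrator, with hypothetical names (`kuramoto_step`, `order_parameter`) and a toy all-to-all network of three identical oscillators; the matrix `coupling` plays the role of $\omega_{ij}$.

```python
import cmath
import math

def kuramoto_step(theta, omega, coupling, dt):
    """One explicit Euler step of
    d(theta_i)/dt = omega_i + sum_j coupling[i][j] * sin(theta_j - theta_i)."""
    n = len(theta)
    return [
        theta[i] + dt * (
            omega[i]
            + sum(coupling[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
        )
        for i in range(n)
    ]

def order_parameter(theta):
    """Modulus of the mean of exp(i * theta); 1 means full phase synchronization."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))

# Toy setup: three identical oscillators, all-to-all coupling of strength 1.
theta = [0.0, 1.0, 2.0]
omega = [0.0, 0.0, 0.0]
coupling = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

r_start = order_parameter(theta)
for _ in range(2000):
    theta = kuramoto_step(theta, omega, coupling, dt=0.01)
r_end = order_parameter(theta)
print(round(r_start, 3), round(r_end, 3))
```

In this toy case the phases start within a half circle and the order parameter grows toward 1. Replacing `coupling` with the adjacency matrix of another graph is the natural way to probe how network structure affects synchronization.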
Conjecture: The Kuramoto model on random geometric graphs over spaces with non-trivial homology does not synchronize.
Dependence on initial conditions
Case study: Driven double gyre.
New topological invariants of data
Problem: Homology may fall short in capturing topological aspects of data.
Related to: X. Fernandez. Morse theory for group presentations. Transactions of the AMS. (2024)
Examples: Lorenz attractor, Chua attractor
THANKS!