9th European Congress of Mathematics Sevilla 2024
					
					APPLIED AND COMBINATORIAL TOPOLOGY
XIMENA FERNANDEZ
					
					University of Oxford
Joint work with 
W. REISE (Paris-Saclay), M. DOMINGUEZ (Spotify), M. BEGUERISSE-DIAZ (Spotify) & H. HARRINGTON (Oxford).
Source: Spotify
Source: Music Obfuscator by Ben Grosser (2015)
Given two audio recordings, identify whether they correspond to the same audio content.
Case study:
					  Shazam (2003)
				Let $\mathcal S$ denote the mel-spectrogram of an audio track $s:[0,T]\to \mathbb{R}$.
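For readers unfamiliar with the mel scale underlying $\mathcal S$, here is a minimal sketch of one common convention for converting between hertz and mels (the HTK-style formula $m = 2595\log_{10}(1 + f/700)$). The talk does not say which mel variant or how many bands were used, so the function names and the band count below are purely illustrative assumptions.

```python
from math import log10


def hz_to_mel(freq_hz):
    """Convert a frequency in Hz to mels (HTK-style convention, an assumption here)."""
    return 2595.0 * log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Band edges in Hz that are equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / n_bands
    return [mel_to_hz(m_min + k * step) for k in range(n_bands + 1)]


# Hypothetical setting: 8 bands between 0 Hz and 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 8)
```

The resulting edges are denser at low frequencies and sparser at high ones, which is the perceptual warping a mel-spectrogram applies to the frequency axis.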
 
 
					 
 
					 
 
					 
 
					 
 
					 
[Figure: time axis marked at $t_0, t_1, t_2, t_3, t_4, t_5, \dots$]
[Figure: time axes marked at $t_0, t_1, t_2, t_3, t_4, \dots$ and $t'_0, t'_1, t'_2, t'_3, t'_4, \dots$]
For every homological dimension $d=0,1$, the $d$-Betti distance matrix $M_d$ between $s$ and $s'$ is defined as \[ (M_d)_{i,j} = \Vert \beta_{i,d} - \beta'_{j,d} \Vert_{L^1}. \]
We summarize the distance between every pair of windows $W_i$ and $W_j'$ as \[ C_{i,j} = \lambda (M_0)_{i,j} + (1-\lambda) (M_1)_{i,j} \] for a parameter $0\leq \lambda\leq 1$.
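As a rough illustration of the two formulas above, the following sketch assumes that each Betti curve $\beta_{i,d}$ has already been sampled on a common uniform grid and stored as a list of numbers, so that the $L^1$ norm can be approximated by a Riemann sum. The function names, the grid step `dx`, and the toy curves are hypothetical and not taken from the talk.

```python
def l1_distance(curve_a, curve_b, dx=1.0):
    """Riemann-sum approximation of the L^1 distance between two sampled curves."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be sampled on the same grid")
    return dx * sum(abs(a - b) for a, b in zip(curve_a, curve_b))


def betti_distance_matrix(curves, curves_prime, dx=1.0):
    """(M_d)_{i,j}: L^1 distance between window i of s and window j of s'."""
    return [[l1_distance(b, bp, dx) for bp in curves_prime] for b in curves]


def cost_matrix(m0, m1, lam=0.5):
    """C = lam * M_0 + (1 - lam) * M_1, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [[lam * a + (1.0 - lam) * b for a, b in zip(row0, row1)]
            for row0, row1 in zip(m0, m1)]


# Toy Betti curves (made-up values): two windows per track, four grid points each.
beta0 = [[3, 2, 1, 1], [4, 2, 2, 1]]        # dimension 0, track s
beta0_prime = [[3, 2, 1, 1], [1, 1, 1, 1]]  # dimension 0, track s'
beta1 = [[0, 1, 1, 0], [0, 2, 1, 0]]        # dimension 1, track s
beta1_prime = [[0, 1, 1, 0], [0, 0, 0, 0]]  # dimension 1, track s'

M0 = betti_distance_matrix(beta0, beta0_prime)
M1 = betti_distance_matrix(beta1, beta1_prime)
C = cost_matrix(M0, M1, lam=0.5)
```

In this toy example the first windows of the two tracks have identical Betti curves in both dimensions, so the corresponding entry of $C$ is zero for any choice of $\lambda$.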
  
  
  
For $m\geq 1$, compute $\bar t'_{j_i} = \mathrm{median} \{t'_{j_{i-m}},\dots, t'_{j_{i-1}}, t'_{j_i}, t'_{j_{i+1}}, \dots, t'_{j_{i+m}}\}$, the centered moving median at $t'_{j_i}$. Consider $\bar P=\{( t_{i}, \bar t_{j_i}'): i =1,\dots,k\}$.
For $m\geq 1$, compute $\bar t'_{j_i} = \mathrm{median} \{t'_{j_{i-m}}, t'_{j_{i-m+1}}, \dots, t'_{j_{i-1}}, t'_{j_i}\}$, the trailing moving median at $t'_{j_i}$. Consider $\bar P=\{( t_{i}, \bar t_{j_i}'): i =1,\dots,k\}$.
We assess the increasing functional dependency of the points in $\bar P$ via the Pearson correlation \[ \rho_{\bar P} = \mathrm{Pearson}\{(t_i), (\bar{t}'_{j_i})\}. \]
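The smoothing and correlation step can be sketched as follows, using the trailing moving median (each value together with its $m$ predecessors) and a hand-written Pearson coefficient. This is only a sketch under assumptions: the slides do not specify how the first $m$ points are handled, so the window is simply truncated at the start, and the function names and toy matching are hypothetical.

```python
from math import sqrt
from statistics import median


def trailing_moving_median(values, m):
    """Median of each value and its m predecessors (window truncated at the start)."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return [median(values[max(0, i - m): i + 1]) for i in range(len(values))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy matching (made-up values): times t_i in s and matched times t'_{j_i} in s'.
t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
t_matched = [10.0, 11.0, 30.0, 13.0, 14.0, 15.0]  # one spurious match at index 2

t_smoothed = trailing_moving_median(t_matched, m=2)
rho_raw = pearson(t, t_matched)
rho_smoothed = pearson(t, t_smoothed)
```

In this toy example a single spurious match drags the raw correlation close to zero, while the correlation computed on the median-smoothed sequence stays close to 1, which is the motivation for smoothing before computing $\rho_{\bar P}$.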
 
					
					
					
 
					
					
					
Music Obfuscator by Ben Grosser
| Song | Shazam (60 sec) | 
|---|---|
| Smells Like Teen Spirit | No | 
| Get Lucky | No | 
| Giant Steps | No | 
| Stairway to Heaven | Yes | 
| Headlines | Yes | 
| Blue in Green | No | 
| You’re Gonna Leave | No | 
| Blue Ocean Floor | No | 
Music Obfuscator by Ben Grosser
| Song | Shazam (60 sec) | Correlation (60-30 sec) | 
|---|---|---|
| Smells Like Teen Spirit | No | 0.83208 | 
| Get Lucky | No | 0.99906 | 
| Giant Steps | No | 0.83904 | 
| Stairway to Heaven | Yes | 0.88533 | 
| Headlines | Yes | 0.91173 | 
| Blue in Green | No | 0.89276 | 
| You’re Gonna Leave | No | 0.71766 | 
| Blue Ocean Floor | Yes | 0.51332 | 
Spotify Database + PySOX Transformer
Spotify Web API: dataset of 30-second preview snippets of approximately 135,000 songs.

| Obfuscation type | Degree | 
|---|---|
| Low Pass Filter | 200, 400, 800, 1600, 2000 | 
| High Pass Filter | 50, 100, 200, 400, 800, 1200 | 
| White Noise | 0.05, 0.10, 1.20, 0.40 | 
| Pink Noise | 0.05, 0.10, 1.20, 0.40 | 
| Reverberation | 25, 50, 75, 100 | 
| Tempo | 0.50, 0.80, 1.1, 1.2, 1.50, 2.00 | 
| Pitch | -8, -4, -2, -1, 1, 2, 4, 8 | 
(Accuracy)
 
					 
					Given an audio track $s$ and a database $\mathcal D$, identify an element $s'\in \mathcal D$ with the same audio content as $s$.
Article: W. Reise, X. Fernandez, M. Dominguez, H.A. Harrington, M. Beguerisse-Diaz. 
Topological fingerprints for audio identification
 SIAM Journal on Mathematics of Data Science (2024, to appear).
					
Slides: https://ximenafernandez.github.io/talks/
