Keble Complexity Cluster Workshop
XIMENA FERNANDEZ
MATHEMATICAL INSTITUTE - Applied Topology Research Group
Joint work with
W. REISE (Paris-Saclay), M. DOMINGUEZ (Spotify), M. BEGUERISSE-DIAZ (Spotify) & H. HARRINGTON (Oxford).
Source: Spotify
Source: Music Obfuscator by Ben Grosser (2015)
Given two audio recordings, identify whether they correspond to the same audio content.
Case study: Shazam (2003)
Let $\mathcal S$ denote the mel-spectrogram of an audio track $s:[0,T]\to \mathbb{R}$.
Figure: time marks $t_0, t_1, t_2, \dots$ along the track $s$, and $t'_0, t'_1, t'_2, \dots$ along the track $s'$.
For every homological dimension $d=0,1$, the $d$-Betti distance matrix $M_d$ between $s$ and $s'$ is defined as \[ (M_d)_{i,j} = \Vert \beta_{i,d} - \beta'_{j,d} \Vert_{L^1}. \]
We summarize the distance between every pair of windows $W_i$ and $W_j'$ as \[ C_{i,j} = \lambda (M_0)_{i,j} + (1-\lambda) (M_1)_{i,j} \] for a parameter $0\leq \lambda\leq 1$.
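As a rough illustration of the two definitions above, the following sketch assumes that each Betti curve $\beta_{i,d}$ has already been sampled on a common uniform grid and stored as a list of numbers, so that the $L^1$ norm can be approximated by a Riemann sum. The function names and the `step` parameter are hypothetical and are not taken from the authors' code.

```python
def l1_distance(f, g, step=1.0):
    """Riemann-sum approximation of the L^1 distance between two sampled curves."""
    if len(f) != len(g):
        raise ValueError("curves must be sampled on the same grid")
    return step * sum(abs(a - b) for a, b in zip(f, g))


def betti_distance_matrix(curves, curves_prime, step=1.0):
    """(M_d)_{i,j}: L^1 distance between window i of s and window j of s'."""
    return [[l1_distance(b, bp, step) for bp in curves_prime] for b in curves]


def combined_cost(M0, M1, lam):
    """C = lam * M_0 + (1 - lam) * M_1, with 0 <= lam <= 1."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row0, row1)]
        for row0, row1 in zip(M0, M1)
    ]


# Toy data: two windows per track, Betti curves sampled at three grid points.
betti0 = [[1, 1, 0], [2, 1, 0]]        # dimension 0, track s
betti0_prime = [[1, 1, 0], [0, 0, 0]]  # dimension 0, track s'
betti1 = [[0, 1, 0], [0, 0, 0]]        # dimension 1, track s
betti1_prime = [[0, 1, 0], [2, 2, 2]]  # dimension 1, track s'

M0 = betti_distance_matrix(betti0, betti0_prime)
M1 = betti_distance_matrix(betti1, betti1_prime)
C = combined_cost(M0, M1, lam=0.5)
```

Setting `lam=1` keeps only the dimension-0 information and `lam=0` only the dimension-1 information.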
For $m\geq 1$, compute $\bar t'_{j_i} = \mathrm{median} \{t'_{j_{i-m}},\dots, t'_{j_{i-1}}, t'_{j_i}, t'_{j_{i+1}}, \dots, t'_{j_{i+m}}\}$, the moving median at $t'_{j_i}$. Consider $\bar P=\{( t_{i}, \bar t_{j_i}'): i =1,\dots,k\}$.
For $m\geq 1$, compute $\bar t'_{j_i} = \mathrm{median} \{t'_{j_{i-m}}, t'_{j_{i-m+1}}, \dots, t'_{j_{i-1}}, t'_{j_i}\}$, the moving median at $t'_{j_i}$. Consider $\bar P=\{( t_{i}, \bar t_{j_i}'): i =1,\dots,k\}$.
We assess the increasing functional dependency of the points in $\bar P$ as \[ \rho_{\bar P} = \mathrm{Pearson}\{(t_i), (\bar{t}'_{j_i})\}. \]
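The smoothing and scoring steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses the trailing-window form of the moving median, it assumes the window is simply truncated at the start of the sequence (the slides do not say how the first $m$ points are handled), and the function names are hypothetical.

```python
from math import sqrt
from statistics import median


def trailing_moving_median(values, m):
    """Median of each value together with up to m preceding values.

    Assumption: near the start, where fewer than m predecessors exist,
    the window is truncated rather than padded.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    return [median(values[max(0, i - m): i + 1]) for i in range(len(values))]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy example: matched times that increase with t_i, except for one outlier.
t = [0, 1, 2, 3, 4, 5]
matched = [0, 1, 9, 3, 4, 5]  # plays the role of t'_{j_i}
smoothed = trailing_moving_median(matched, m=2)

rho_raw = pearson(t, matched)
rho_smoothed = pearson(t, smoothed)
```

In this toy example the single mismatched point pulls the raw correlation down, while the median-smoothed sequence stays close to an increasing line, so `rho_smoothed` is larger than `rho_raw`.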
Music Obfuscator by Ben Grosser
Song | Shazam (60 sec) |
---|---|
Smells Like Teen Spirit | No |
Get Lucky | No |
Giant Steps | No |
Stairway to Heaven | Yes |
Headlines | Yes |
Blue in Green | No |
You’re Gonna Leave | No |
Blue Ocean Floor | No |
Music Obfuscator by Ben Grosser
Song | Shazam (60 sec) | Correlation (60-30 sec) |
---|---|---|
Smells Like Teen Spirit | No | 0.83208 |
Get Lucky | No | 0.99906 |
Giant Steps | No | 0.83904 |
Stairway to Heaven | Yes | 0.88533 |
Headlines | Yes | 0.91173 |
Blue in Green | No | 0.89276 |
You’re Gonna Leave | No | 0.71766 |
Blue Ocean Floor | Yes | 0.51332 |
Spotify Database + PySOX Transformer
Obfuscation type | Degree |
---|---|
Low Pass Filter | 200, 400, 800, 1600, 2000 |
High Pass Filter | 50, 100, 200, 400, 800, 1200 |
White Noise | 0.05, 0.10, 1.20, 0.40 |
Pink Noise | 0.05, 0.10, 1.20, 0.40 |
Reverberation | 25, 50, 75, 100 |
Tempo | 0.50, 0.80, 1.10, 1.20, 1.50, 2.00 |
Pitch | -8, -4, -2, -1, 1, 2, 4, 8 |
Given an audio track $s$ and a database $\mathcal D$, identify an element $s'\in \mathcal D$ with the same audio content as $s$.
W. Reise, X. Fernandez, M. Dominguez, H.A. Harrington, M. Beguerisse-Diaz. Topological fingerprints for audio identification. SIAM Journal on Mathematics of Data Science (2024, to appear).