SPIRES 2024
Unmet Needs and Future Challenges for TDA
XIMENA FERNANDEZ
University of Oxford
Joint work with
W. REISE (EPFL/Oxford/Spotify), M. DOMINGUEZ (Spotify), M. BEGUERISSE-DIAZ (Spotify) & H. HARRINGTON (Oxford).
Source: Spotify
Source: Music Obfuscator by Ben Grosser (2015)
Given two audio recordings, identify whether they correspond to the same audio content.
Case study:
Shazam (2003)
Let S denote the mel-spectrogram of an audio track s:[0,T]→R.
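A minimal sketch of how a mel-spectrogram could be computed with NumPy alone. It assumes the common formula m = 2595 log10(1 + f/700) for the mel scale, a Hann-windowed power spectrogram, and triangular filters. The function names and the parameters `n_fft`, `hop`, and `n_mels` are illustrative choices, not values taken from the talk.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale convention (an assumption of this sketch)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose centres are equally spaced on the mel scale
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    edges = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for k in range(n_mels):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mel_spectrogram(signal, sr, n_fft=1024, hop=256, n_mels=64):
    # Returns an array of shape (n_mels, n_frames)
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (signal.size - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power.T
```

For a pure tone, the energy of the resulting array concentrates in the mel band whose centre is closest to the tone's frequency.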
Window start times: $t_0, t_1, t_2, \dots$ for $s$ and $t'_0, t'_1, t'_2, \dots$ for $s'$.
For every homological dimension $d = 0, 1$, the $d$-Betti distance matrix $M_d$ between $s$ and $s'$ is defined as $(M_d)_{i,j} = \lVert \beta_{i,d} - \beta'_{j,d} \rVert_{L^1}$.
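As an illustration, the matrix of pairwise L1 distances can be sketched in NumPy. This assumes each Betti curve has already been sampled on a common, evenly spaced grid, so that the L1 norm is approximated by a Riemann sum. The function name and the `step` parameter are hypothetical.

```python
import numpy as np

def betti_distance_matrix(betti, betti_prime, step=1.0):
    """Pairwise L1 distances between two families of sampled Betti curves.

    betti:       array of shape (n, r), one sampled curve per window of s
    betti_prime: array of shape (n_prime, r), one curve per window of s'
    step:        grid spacing used to approximate the integral (assumed uniform)
    """
    betti = np.asarray(betti, dtype=float)
    betti_prime = np.asarray(betti_prime, dtype=float)
    # Broadcast to shape (n, n_prime, r), then sum over the sampling grid
    diff = np.abs(betti[:, None, :] - betti_prime[None, :, :])
    return step * diff.sum(axis=2)
```

Calling the function once per homological dimension gives one matrix for d = 0 and one for d = 1.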
We summarize the distance between every pair of windows $W_i$ and $W'_j$ as $C_{i,j} = \lambda (M_0)_{i,j} + (1-\lambda)(M_1)_{i,j}$ for a parameter $0 \le \lambda \le 1$.
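The convex combination above is a one-liner in NumPy. The sketch below uses a hypothetical function name and a default `lam` chosen only for illustration.

```python
import numpy as np

def combined_cost(M0, M1, lam=0.5):
    """Weighted combination of the dimension-0 and dimension-1 distance matrices."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    M0 = np.asarray(M0, dtype=float)
    M1 = np.asarray(M1, dtype=float)
    return lam * M0 + (1.0 - lam) * M1
```

Setting `lam=1` uses only the dimension-0 distances, and `lam=0` uses only the dimension-1 distances.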
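For $m \ge 1$, compute $\bar{t}'_{j_i} = \operatorname{median}\{t'_{j_{i-m}}, \dots, t'_{j_{i-1}}, t'_{j_i}, t'_{j_{i+1}}, \dots, t'_{j_{i+m}}\}$, the moving median at $t'_{j_i}$. Consider $\bar{P} = \{(t_i, \bar{t}'_{j_i}) : i = 1, \dots, k\}$.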
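For $m \ge 1$, compute $\bar{t}'_{j_i} = \operatorname{median}\{t'_{j_{i-m}}, t'_{j_{i-m+1}}, \dots, t'_{j_{i-1}}, t'_{j_i}\}$, the moving median at $t'_{j_i}$. Consider $\bar{P} = \{(t_i, \bar{t}'_{j_i}) : i = 1, \dots, k\}$.
We assess the increasing functional dependency of the points in $\bar{P}$ as $\rho_{\bar{P}} = \operatorname{Pearson}\{(t_i), (\bar{t}'_{j_i})\}$.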
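A minimal sketch of the smoothing and correlation step, under stated assumptions: the matched times are given as a plain sequence ordered by i, windows are truncated at the boundaries of the sequence, and the `centered` flag switches between a trailing window of m previous points and a symmetric window of m points on each side. All names are hypothetical.

```python
import numpy as np

def moving_median(values, m, centered=False):
    """Moving median with a trailing (default) or centered window.

    Windows are truncated at the ends of the sequence (an assumption of
    this sketch; the talk does not specify boundary handling).
    """
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for i in range(values.size):
        lo = max(0, i - m)
        hi = min(values.size, i + m + 1) if centered else i + 1
        out[i] = np.median(values[lo:hi])
    return out

def smoothed_correlation(t, t_matched, m, centered=False):
    """Pearson correlation between t_i and the moving median of the matched times."""
    t = np.asarray(t, dtype=float)
    t_bar = moving_median(t_matched, m, centered)
    return float(np.corrcoef(t, t_bar)[0, 1])
```

The median suppresses isolated mismatches: a single outlying matched time barely moves the smoothed sequence, so the correlation stays close to 1 when the remaining matches increase with t.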
Music Obfuscator by Ben Grosser
Song | Shazam (60 sec) |
---|---|
Smells Like Teen Spirit | No |
Get Lucky | No |
Giant Steps | No |
Stairway to Heaven | Yes |
Headlines | Yes |
Blue in Green | No |
You’re Gonna Leave | No |
Blue Ocean Floor | No |
Music Obfuscator by Ben Grosser
Song | Shazam (60 sec) | Correlation (60-30 sec) |
---|---|---|
Smells Like Teen Spirit | No | 0.83208 |
Get Lucky | No | 0.99906 |
Giant Steps | No | 0.83904 |
Stairway to Heaven | Yes | 0.88533 |
Headlines | Yes | 0.91173 |
Blue in Green | No | 0.89276 |
You’re Gonna Leave | No | 0.71766 |
Blue Ocean Floor | Yes | 0.51332 |
Spotify Database + PySOX Transformer
Obfuscation type | Degree |
---|---|
Low Pass Filter | 200, 400, 800, 1600, 2000 |
High Pass Filter | 50, 100, 200, 400, 800, 1200 |
White Noise | 0.05, 0.10, 0.20, 0.40 |
Pink Noise | 0.05, 0.10, 0.20, 0.40 |
Reverberation | 25, 50, 75, 100 |
Tempo | 0.50, 0.80, 1.10, 1.20, 1.50, 2.00 |
Pitch | -8, -4, -2, -1, 1, 2, 4, 8 |
(Accuracy)
Article: W. Reise, X. Fernandez, M. Dominguez, H.A. Harrington, M. Beguerisse-Diaz.
Topological fingerprints for audio identification
SIAM Journal on Mathematics of Data Science (2024, to appear).
Slides: https://ximenafernandez.github.io/talks/