Saving 'Saving KY'

Finding a C64 SID track from a clip in a YouTube Video.

In the video [Vinesauce] Joel - Kerbal Space Program/Saving KY a song starts to play within the first 5 or so seconds. The song had gone unidentified for over a decade and evident in the comments was the audience's desire to discover this song. It can hardly be compared to The Most Mysterious Song on the Internet but it certainly made for a good project while stuck inside during a particularly agressive Norwegian winter.

Back in 2011 I had taken a crack at trying to find the song. My techniques were not well researched, were very inefficient and did not yield a result, I wasn't about to take this failure lying down (... it just took me 13 years).

I established what I knew to be true (with reasonable confidence) about the track just from listening.

Not much of a start, I had an ill defined needle and no haystack. Luckily and for great preservation purposes an archive of SID music is kept and it's entirety is available at the High Voltage SID Collection website (HVSC). It turns out that over 50,000 tracks can be compressed to 75MB. There was my haystack.

From my previous attempt I knew that an effective method to efficient song recognition was to generate a fingerprint for each track and compare the fingerprint of a sample. I had little to no insight into how fingerprints could be generated or compared, and thus my research led to Chromaprint | AcoustID. A brief introduction to how fingerprinting music using Chromaprint works can be found at How does Chromaprint work? - Lukáš Lalinský. I then dug into Pairwise Boosted Audio Fingerprint which while was a good read ended up being totally superfluous because shortly after I discovered that the standard install of ffmpeg comes installed with an output codec to render the fingerprint from an audio stream.

The next goal was to convert the SID files into a format which ffmpeg could make a fingerprint of, the documentation for the fingerprint codec mention that the stream is first converted into a single channel 16-bit PCM stream so I used a headless VLC to do this. I wrote a small rust application which recursed through every directory in the HVSC and output at least the first 60 seconds of each SID file to the afforementioned format.

vlc -I dummy $SID_PATH --stop-time=60 --no-dbus --sout \
#transcode{acodec=s16l,channels=1}:std{access=file,mux=wav,dst=$WAV_PATH} vlc://quit

I believe I ran into a bug in something relating to UTF characters and dbus, since every time one of them showed up in a song title or artist; VLC would crash. I simply disabled dbus support.

After the rendering I then loaded the file in ffmpeg and generated a fingerprint of the WAV file.

ffmpeg -i $WAV_PATH -f chromaprint -fp_format raw $FIP_PATH

A pipeline to stream the fingerprint creation was created as follows

Fingerprint pipeline

Of course, I performed the same treatment to my 'needle' audio. And what I was left with was a fingerprint for each item in the haystack. A fingerprint rendered in 'raw' format consists of contiguous array of 32 bit fields for each 1/5 seconds of audio. The entire fingerprint collection was small enough to load entirely in RAM and compare in the worst way possible. My approach was not optimal to say the least. I had about 9 seconds of 'needle' audio fingerprints totalling 180 bytes. I simply performed a convolution between the needle and haystack fingerprints whilst recording a minimum for each track. I ordered and filtered the convolutions by their minimum and an arbitrary threshold. The track with the lowest minimum was 'Zack Theme - Steven Diemer (A-Man)' which, when listened to revealed that I had found it!

Ecstatic, and wishing for sleep, I posted a comment on the YouTube video and shortly after wrote this post.

I'd like to talk more about writing an optimised fingerprint searching algorithm, but that must wait for another time.