Google’s next generation music recognition

A new version of Sound Search.


In 2017, Google launched Now Playing, which uses deep neural networks to bring low-power, always-on music recognition to mobile devices. The goal was to create a small, efficient music recognizer that needs only a very small fingerprint for each track in the database, so that recognition can run entirely on-device without an internet connection.

Now Playing proved useful not only as an on-device music recognizer: it also greatly exceeded the accuracy and efficiency of Google’s then-current server-side system, Sound Search, which was built before the widespread use of deep neural networks.

Now, Google has introduced a new version of Sound Search that is powered by some of the same technology used by Now Playing. You can use it through the Google Search app or the Google Assistant on any Android phone.

Simply begin a voice search, and if music is playing nearby, a “What’s this song?” suggestion will pop up for you to tap. Otherwise, you can simply ask, “Hey Google, what’s this song?” With this latest version of Sound Search, you’ll get faster, more accurate results than ever before.

Now Playing matches songs against an on-device database. The biggest challenge in going from Now Playing, with tens of thousands of songs, to Sound Search, with tens of millions, is that there are a thousand times as many songs that could produce a false positive. Compensating for this without any other changes would have required raising the recognition threshold, which would mean needing more audio to get a confirmed match.
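
To make the scaling argument concrete, here is a toy calculation in Python. It is only a sketch and not Google's method: it assumes every wrong song independently clears the threshold with the same small probability, and the catalogue sizes (50,000 and 50,000,000) and the 1% target are hypothetical stand-ins for "tens of thousands" and "tens of millions".

```python
import math


def any_false_match_probability(per_song_rate: float, num_songs: int) -> float:
    """Chance that at least one wrong song clears the threshold.

    Assumes (for illustration only) that songs are independent and share
    the same per-song false-match rate: 1 - (1 - p)^N.
    """
    return -math.expm1(num_songs * math.log1p(-per_song_rate))


def required_per_song_rate(target: float, num_songs: int) -> float:
    """Per-song false-match rate that keeps the overall rate at `target`."""
    # Inverse of the formula above: p = 1 - (1 - target)^(1/N)
    return -math.expm1(math.log1p(-target) / num_songs)


on_device_songs = 50_000       # hypothetical on-device catalogue size
server_songs = 50_000_000      # hypothetical server catalogue size
target = 0.01                  # hypothetical acceptable false-positive rate

p_small = required_per_song_rate(target, on_device_songs)
p_large = required_per_song_rate(target, server_songs)

# The per-song rate must be about a thousand times stricter on the server.
print(f"strictness ratio: {p_small / p_large:.1f}")

# Reusing the on-device threshold unchanged on the large catalogue makes a
# false match almost certain under this toy model.
print(f"overall rate with old threshold: "
      f"{any_false_match_probability(p_small, server_songs):.5f}")
```

Under these assumptions, a stricter per-song rate corresponds to a higher match threshold, which is why more audio would be needed before a match could be confirmed.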


Because Sound Search is a server-side system, it isn’t limited by processing and storage constraints in the same way Now Playing is. Google therefore made two major changes to how fingerprinting is done, both of which increased accuracy at the expense of server resources.

James Lyon of Google AI, Zürich, noted: “We also decided to weight our index based on song popularity – in effect, for popular songs, we lower the matching threshold, and we raise it for obscure songs. Overall, this means that we can keep adding more (obscure) songs almost indefinitely to our database without slowing our recognition speed too much.”
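
The popularity weighting Lyon describes might look something like the Python sketch below. Everything in it is an assumption made for illustration: the function names, the use of a play count as the popularity signal, the logarithmic rule, and all numeric constants are hypothetical, and Google has not published its actual weighting.

```python
import math


def match_threshold(play_count: int, base: float = 0.80, scale: float = 0.02,
                    floor: float = 0.60, ceiling: float = 0.95) -> float:
    """Hypothetical rule: the more popular a song, the lower its threshold.

    Songs with about 10**4 plays get `base`; each further factor of ten in
    plays lowers the threshold by `scale`, clamped to [floor, ceiling].
    """
    adjusted = base - scale * (math.log10(play_count + 1) - 4)
    return min(ceiling, max(floor, adjusted))


def best_match(candidates):
    """Return the top-scoring title that clears its own threshold, or None.

    `candidates` is a list of (title, similarity_score, play_count) tuples.
    """
    passing = [(score, title) for title, score, plays in candidates
               if score >= match_threshold(plays)]
    return max(passing)[1] if passing else None


# The same similarity score is accepted for a hit and rejected for a rarity.
candidates = [("Hit", 0.75, 10**8), ("Rarity", 0.75, 100)]
print(best_match(candidates))
```

In this sketch an obscure song needs stronger evidence before it is reported, so adding many obscure songs does little to increase the chance of a false match.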