How many servers does Shazam have

Shazam overview

An audio fingerprint is a digital summary that can be used to identify an audio sample or to quickly find similar items in an audio database. For example, when you hum a song to someone, you are creating a fingerprint: you extract from the music the elements you think are important (and if you are a good singer, the other person will recognize the song).

Before going any further, here is a simplified view of Shazam's architecture. It is only an assumption, because Shazam does not disclose all the details of the procedure. See the 2003 paper "An Industrial-Strength Audio Search Algorithm" by Shazam co-founder Avery Li-Chun Wang.

On the server side:

  • Shazam has pre-calculated fingerprints based on a very large database of songs.
  • All of these fingerprints are placed in an audio fingerprint database, which is updated whenever a new song is added to the song database.
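
The server-side bookkeeping can be sketched as an inverted index. This is a minimal illustration, not Shazam's actual storage layer: the class name `FingerprintIndex`, the `(hash, offset)` pair layout, and the sample hash values are all assumptions made for the example.

```python
from collections import defaultdict


class FingerprintIndex:
    """Toy inverted index: fingerprint hash -> list of (song_id, offset)."""

    def __init__(self):
        self._index = defaultdict(list)
        self.song_ids = set()

    def add_song(self, song_id, fingerprints):
        """Register the (hash, offset) pairs of one song.

        Calling this for every new song keeps the fingerprint database
        in step with the song database.
        """
        for hash_value, offset in fingerprints:
            self._index[hash_value].append((song_id, offset))
        self.song_ids.add(song_id)

    def lookup(self, hash_value):
        """Return every (song_id, offset) stored under this hash."""
        return list(self._index.get(hash_value, []))


# Hypothetical fingerprints, for illustration only.
index = FingerprintIndex()
index.add_song("song-a", [(0x1A2B, 0), (0x3C4D, 1), (0x5E6F, 2)])
index.add_song("song-b", [(0x3C4D, 0), (0x7788, 1)])
```

Keying the index by hash rather than by song means a lookup touches only the songs that share a hash with the query, instead of scanning the whole catalogue.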

On the client side:

  • When a user opens the Shazam app, it first records the music currently playing with the phone's microphone.
  • The phone applies the same fingerprint algorithm as Shazam to the recording.
  • The phone sends the fingerprint to Shazam.
  • Shazam checks whether this fingerprint matches a fingerprint in its database.
  • If no match is found, then the user is informed that the song was not found.
  • If a match is found, Shazam looks up the metadata associated with the fingerprint (song name, iTunes URL, Amazon URL, ...) and sends it to the user.
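
The request and response flow above can be sketched in a few lines. This is a deliberately simplified model under stated assumptions: fingerprints are treated as plain sets of hashes, the function name `identify`, the `min_overlap` threshold, and the sample data are hypothetical, and the real matching step is more elaborate than a set intersection.

```python
def identify(sample_hashes, fingerprint_db, metadata_db, min_overlap=3):
    """Return the metadata of the best-matching song, or None if no match.

    sample_hashes  -- set of hashes computed on the phone from the recording
    fingerprint_db -- dict: song_id -> set of precomputed hashes
    metadata_db    -- dict: song_id -> metadata dict
    min_overlap    -- assumed threshold: shared hashes needed to accept a match
    """
    best_song, best_score = None, 0
    for song_id, song_hashes in fingerprint_db.items():
        score = len(sample_hashes & song_hashes)
        if score > best_score:
            best_song, best_score = song_id, score
    if best_song is None or best_score < min_overlap:
        return None  # the app tells the user the song was not found
    return metadata_db[best_song]


# Hypothetical data, for illustration only.
fingerprint_db = {
    "song-a": {1, 2, 3, 4, 5},
    "song-b": {4, 5, 6, 7, 8},
}
metadata_db = {
    "song-a": {"title": "Song A"},
    "song-b": {"title": "Song B"},
}

found = identify({2, 3, 4, 99}, fingerprint_db, metadata_db)
missing = identify({42, 43}, fingerprint_db, metadata_db)
```

Here `found` holds the metadata of "song-a" (three shared hashes), while `missing` is `None` because no stored song shares a hash with the query.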

The key points of Shazam are:

  • It must tolerate noise and distortion coming from:
    • poor recording quality, since music recorded in a bar or outdoors is usually noisy,
    • the artifacts introduced by window functions,
    • the phone's cheap microphone, which adds noise and distortion,
    • many other physical effects.
  • Fingerprints must be time-invariant: the fingerprint of a complete song must be able to match a recording of only one second of that song.
  • Fingerprint comparison must be fast: nobody wants to wait minutes or hours for an answer from Shazam.
  • It must produce few "false positives": nobody wants to get the wrong song as a result.
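
One way to meet the time-invariance and false-positive requirements together is to let matching hashes vote on a time shift, in the spirit of the approach described in Wang's paper. The sketch below is an illustration under assumptions, not Shazam's production code: the function name `best_alignment`, the `(hash, offset)` layout, and the sample index are hypothetical.

```python
from collections import Counter


def best_alignment(sample, index):
    """Vote for (song_id, time shift) pairs and return the winner.

    sample -- list of (hash, offset_in_recording)
    index  -- dict: hash -> list of (song_id, offset_in_song)
    Returns ((song_id, shift), votes), or (None, 0) when nothing matches.
    """
    votes = Counter()
    for hash_value, sample_offset in sample:
        for song_id, song_offset in index.get(hash_value, []):
            # Hashes from the true song agree on one shift, wherever in
            # the song the recording started.
            votes[(song_id, song_offset - sample_offset)] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]


# Hypothetical index: the recording starts 40 time units into "song-a".
index = {
    10: [("song-a", 40)],
    11: [("song-a", 41), ("song-b", 7)],
    12: [("song-a", 42)],
    13: [("song-b", 90)],
}
sample = [(10, 0), (11, 1), (12, 2), (13, 3)]

match, votes = best_alignment(sample, index)
```

In this example "song-a" collects three votes at shift 40, while the two accidental hits on "song-b" land on different shifts and get one vote each. Requiring a minimum number of votes on a single shift before accepting a match is one way to keep false positives low.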