ShazamKit is an Apple framework announced at WWDC 2021 that brings audio-matching capabilities to your app. You can make any prerecorded audio recognizable by building your own custom catalogue using audio from podcasts, videos, and more, or match music against the millions of songs in Shazam's vast catalogue.
Today, we are going to build a simple music-matching recognizer. The idea is to build a component that is independent of the UI framework being used (SwiftUI or UIKit).
We will create a Swift class, creatively named ShazamRecognizer, that will have a few simple tasks to perform:
- Create the properties that are going to help us in building our class
- Request permission to record audio using the AVFoundation framework
- Start recording and send the recording to ShazamKit for recognition
- Handle the response from ShazamKit (success when a match is found, or an error when no match is found)
- Display our result in a UI (e.g. SwiftUI or UIKit)
Create the properties that are going to help us in building our class
import AVFoundation
import Combine
import ShazamKit

final class ShazamRecognizer: NSObject, ObservableObject {
    // 1. Audio Engine
    private let audioEngine = AVAudioEngine()
    // 2. Shazam Session
    private let shazamSession = SHSession()
    // 3. UI state purpose
    @Published private(set) var isRecording = false
    // 4. Success Case
    @Published private(set) var matchedTrack: ShazamTrack?
    // 5. Failure Case
    @Published var error: ErrorAlert? = nil
}
In the above declarations:
- We create the audioEngine, which is used to start and stop the recording.
- We create the shazamSession, which is used to perform the matching process.
- We use isRecording to track whether or not there is an ongoing recording operation. This value can be used, for example, to show a different UI for each state.
- We create a variable of a custom type (ShazamTrack) to store our result in case of success (when a match is found).
- In case of failure, we store the error in the error variable of type ErrorAlert, which can be used to display an alert in the UI (both custom types are sketched below).
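Neither ShazamTrack nor ErrorAlert comes from ShazamKit; they are our own lightweight models. Their exact shape is up to you; here is a minimal sketch, assuming we only keep a few properties from the matched item (the property names here are an assumption, not part of the original code):

import Foundation
import ShazamKit

// Our own model, built from the SHMatchedMediaItem that ShazamKit returns.
struct ShazamTrack: Identifiable {
    let id = UUID()
    let title: String
    let artist: String
    let artworkURL: URL?
    let appleMusicURL: URL?

    init(_ item: SHMatchedMediaItem) {
        self.title = item.title ?? "Unknown Title"
        self.artist = item.artist ?? "Unknown Artist"
        self.artworkURL = item.artworkURL
        self.appleMusicURL = item.appleMusicURL
    }
}

// A small Identifiable wrapper so an error can drive an alert in the UI.
struct ErrorAlert: Identifiable {
    let id = UUID()
    let message: String

    init(_ message: String) {
        self.message = message
    }
}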
Request Permission to record audio using the AVFoundation framework
In our class, we proceed by adding the listenToMusic() function.
func listenToMusic() {
    // 1. Get the shared audio session
    let audioSession = AVAudioSession.sharedInstance()
    // 2. Ask the user for permission to record audio
    audioSession.requestRecordPermission { granted in
        if granted {
            // 3. Start recording
            self.recordAudio()
        } else {
            // 4. Publish the error on the main queue so the UI can react
            DispatchQueue.main.async {
                self.error = ErrorAlert("Please Allow Microphone Access !!!")
            }
        }
    }
}
In our listenToMusic() function:
- We use an audioSession to communicate to the operating system the general nature of our app's audio, without detailing the specific behaviour or required interactions with the audio hardware (see the configuration sketch after the note below).
- Using the audioSession, we request the user's permission to record audio. At this point, we have to add a new property (NSMicrophoneUsageDescription) in our Info.plist, with a message that tells the user why the app is requesting access to the device's microphone; otherwise our app will crash at runtime.
- In case the user gives permission, we start the recording operation in our recordAudio() function, which we are going to build in the next section.
- In case the user denies permission, we simply store the error in our error variable.
AVAudioSession: An audio session acts as an intermediary between your app and the operating system — and, in turn, the underlying audio hardware.
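Note that listenToMusic() only asks for permission; depending on your app, you may also want to configure the shared session's category before recording. This step is not part of the original code, but a minimal sketch (the configureAudioSession() helper name is ours) could look like this:

import AVFoundation

// Optionally configure the shared audio session before recording.
// The .record category tells the system we only need audio input.
func configureAudioSession() {
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(.record, mode: .default)
        try audioSession.setActive(true, options: .notifyOthersOnDeactivation)
    } catch {
        print("Failed to configure the audio session: \(error)")
    }
}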
Start Recording and Send the recording to ShazamKit for recognition
Here is what our recordAudio() function looks like. Hmmm, quite a function!
private func recordAudio() {
    // 1. If the `audioEngine` is running, stop it and return
    if audioEngine.isRunning {
        self.stopAudioRecording()
        return
    }
    // 2. Get the input node to listen to
    let inputNode = audioEngine.inputNode
    // 3. Create the format to use for our inputNode
    /// We are using .zero as the bus for this example
    let format = inputNode.outputFormat(forBus: .zero)
    // 4. Remove the tap if one is already installed on the node
    inputNode.removeTap(onBus: .zero)
    // 5. Install an audio tap on the bus using our inputNode
    // Record, monitor, and observe the output of the node.
    // This will listen to music continuously
    inputNode.installTap(onBus: .zero,
                         bufferSize: 1024,
                         format: format) { buffer, time in
        // 6. Start the Shazam matching operation:
        // converts the audio in the buffer to a signature and
        // searches the reference signatures in the session catalog.
        self.shazamSession.matchStreamingBuffer(buffer, at: time)
    }
    // 7. Prepare the audio engine to start
    audioEngine.prepare()
    do {
        // 8. Start the audio engine
        try audioEngine.start()
        DispatchQueue.main.async {
            // 9. Set the recording state to true
            self.isRecording = true
        }
    } catch {
        // 10. Publish any error that may occur
        DispatchQueue.main.async {
            self.error = ErrorAlert(error.localizedDescription)
        }
    }
}
After going through the above function, the question to ask is: how do we handle the response that the shazamSession will return?
No worries, that's the topic of our next exciting section.
In case you are wondering what the stopAudioRecording() function mentioned above looks like, here you go:
private func stopAudioRecording() {
    // Stop the audio engine and update the UI state
    audioEngine.stop()
    isRecording = false
}
Handle the response from ShazamKit
First, we need to tell the shazamSession where to delegate its result!
As you can see below, we use our ShazamRecognizer class as the delegate for the session so that we are informed when there is a successful result or a failure.
override init() {
    super.init()
    // Sets the delegate to be the ShazamRecognizer class
    shazamSession.delegate = self
}
By doing the above, we are obliged to conform to the SHSessionDelegate protocol and implement its delegate methods. So we extend our class and add the following:
extension ShazamRecognizer: SHSessionDelegate {
    func session(_ session: SHSession, didFind match: SHMatch) {
        DispatchQueue.main.async {
            // 1. Grab the first matched media item and convert it to our model
            if let firstItem = match.mediaItems.first {
                self.matchedTrack = ShazamTrack(firstItem)
                // 2. Stop the audio recording
                self.stopAudioRecording()
            }
        }
    }

    func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
        DispatchQueue.main.async {
            // 3. Publish the error
            self.error = ErrorAlert(error?.localizedDescription ?? "No Match found!")
            // 4. Stop the audio recording
            self.stopAudioRecording()
        }
    }
}
Our first delegate method is func session(_ session: SHSession, didFind match: SHMatch). It is called when a match is found. Here, we:
- Get the first item in the match's mediaItems (an array of the media items in the catalog that match the query signature, ordered by the quality of the match); we then convert the firstItem, of type SHMatchedMediaItem, into our own custom model ShazamTrack.
- Stop the audio recording by calling our stopAudioRecording(), which stops our audioEngine.
Our second delegate method is func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?). It is called when no match is found. Here, we:
- Send the error message to our error variable.
- Stop the audio recording by calling our stopAudioRecording(), which stops our audioEngine.
At this point, we are pretty much done with our audio recognition system! 👏🏻💪🏼 We are ready to use it in our application, no matter the UI framework.
For this example, I've used SwiftUI for a quick prototype (a minimal sketch follows), but you can use UIKit as well without any particular effort.
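Here is a minimal SwiftUI sketch of how the recognizer could be wired up. The view, its layout, and the ShazamTrack properties it reads are assumptions for illustration, not the article's actual demo UI:

import SwiftUI

struct ContentView: View {
    // Own the recognizer so its @Published properties drive the UI
    @StateObject private var recognizer = ShazamRecognizer()

    var body: some View {
        VStack(spacing: 16) {
            if let track = recognizer.matchedTrack {
                Text(track.title)
                    .font(.headline)
                Text(track.artist)
                    .font(.subheadline)
            }

            Button(recognizer.isRecording ? "Listening…" : "Listen to Music") {
                recognizer.listenToMusic()
            }
        }
        .padding()
        // Present an alert whenever the recognizer publishes an error
        .alert(item: $recognizer.error) { error in
            Alert(title: Text(error.message))
        }
    }
}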
You can find the full demo project
Conclusion
The ShazamKit framework has a lot to offer, but in this article we have only scratched the surface. I hope you have learned something today :)