The Imitation Game, a movie portraying Alan Turing’s life (who would have celebrated his 100thbirthday on Mathematica‘s 23rd birthday—read our blog post), was released this week, which we’ve been looking forward to. Turing machines were one of the focal points of the movie, and we launched a prize in 2007 to determine whether the 2,3 Turing machine was universal.
So of course, Cumberbatch’s promotional video where he impersonates other beloved actors reached us as well, which got me wondering, could Mathematica‘s machine learning capabilities recognize his voice, or could he fool a computer too?
I personally can’t stop myself from chuckling uncontrollably while watching his impressions, however, I wanted to look beyond the entertainment factor.
So I started wondering: Is he actually good at doing these impressions? Or are we all just charmed by his persona?
Is my psyche just being fooled by the meta-language, perhaps? If we take the data of pure voices, does he actually cut the mustard in matching these?
In order to determine the answer, 10 years ago we would have needed to stroll the streets and play audio snippets to 300 people from the James Bond movies, The Shining, Batman, and Cumberbatch’s impression snippets—then survey whether those people were fooled.
But no need, if you have your Mathematica handy!
With Mathematica‘s machine learning capabilities, it’s possible to classify sample voice snippets easily, which means we can determine whether Benedict’s impressions would be able to fool a computer. So I set myself the challenge of building a decent enough database of voice samples, plus I took snippets from each of Benedict’s impression attempts, and I let Mathematica do its magic.
We built a path to each person’s snippet database, which Mathematica exported for analysis:
We imported all of the real voices:
My audio database needed to include snippets of Benedict’s own voice, snippets of the impersonated actors’ own voices, and the impressions from Cumberbatch. The sources for the training were the following: Alan Rickman, Christopher Walken, Jack Nicholson, John Malkovich,Michael Caine, Owen Wilson, Sean Connery, Tom Hiddleston, and Benedict Cumberbatch. I used a total of 560 snippets, but of course, the more data used, the more reliable the results. The snippets needed to be as “clean” as possible (no laughter, music, chatter, etc. in the background).
These all needed to be exactly the same length (3.00 seconds), and we made sure all snippets were the same length by using this function in the Wolfram Language:
Some weren’t single-channel audio files, so we needed to exclude this factor as an additional feature to optimize our results during the export stage:
Thanks go to Martin Hadley and Jon McLoone for the code.
Drum-roll… time for the verdict!
I have to break everyone’s heart now, and I’m not sure I want to be the one to do it… so I will “blame” Mathematica, because machine learning could indeed mostly tell the difference between the actors’ real voices and the impressions (bar two).
As the results below reveal, Mathematica provides 97–100% confidence on the impressions tested:
For most impressions, there is a very small reported probability of any classification other than Benedict Cumberbatch or Alan Rickman.
It might be worth noting that Rickman, Connery, and Wilson all have a slow rhythm to their speech, with many pauses (especially noticeable in the snippets I used), which could have confused the algorithm.
Now it’s time to be grown up about this, and not hold it against Benedict. He is still a beloved charmer, after all.
My admiration for him lives on, and I look forward to seeing him in The Imitation Game!