It is well known that Ludwig van Beethoven was heavily influenced by Wolfgang Amadeus Mozart. But imagine if you could pin down exactly when this happened, or which of Beethoven's scores were most heavily influenced. The key to answering these questions may lie in Peachnote, a search tool for classical music. It works as a 'name-that-tune' service for sheet music, similar to Shazam, and is a potential gold mine for musicologists, researchers who study music much as linguists study human language.
Vladimir Viro, a computer scientist at Ludwig-Maximilians University in Munich, Germany, is the founder and lead developer of Peachnote. He published a research paper on the system at the 12th International Society for Music Information Retrieval (ISMIR) conference in 2011. The 13th ISMIR conference is being held in Porto, Portugal, this week.
A little bit of history
When he began developing Peachnote three years ago, Viro went looking for the largest online music library and came across the International Music Score Library Project (IMSLP), a Wikipedia of classical music that offers thousands of publicly accessible score scans and recordings.
“I asked myself what was missing: a content-based search engine,” says Viro. “You can search for music scores on Google by metadata or composer, but you can't actually search music by its sounds.”
The idea was also partly the result of a bet between Viro and the founder of IMSLP, Edward Guo. “When I told Edward of my plans to make a search engine for their service, he didn't believe it would work and was a bit sceptical because of his experience with optical music recognition technology.”
Optical music recognition (OMR) is, in essence, character recognition for music notation: software that converts scanned sheet music or printed scores into editable and playable formats. Viro's goal was to apply this technology on a scale never attempted before, a scale he considered ideally suited to it.
Today, Peachnote is growing rapidly: its servers store millions of pages of sheet music and hundreds of thousands of scores, and it handles searches from millions of users. The service also lets users search IMSLP, the US Library of Congress, and other classical music archives free of charge.
For example, one user entered the first five notes of a waltz into Peachnote, but nothing came up. “So I tried doing what the tune does exactly, which is to repeat those same five notes. In a matter of seconds, Peachnote had found it and taken me directly to the violin part,” writes Jonathan Still on his blog.
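The article does not say how Peachnote matches melodies. The sketch below assumes one common approach: comparing sequences of pitch intervals rather than absolute pitches, so a tune is found in whatever key it is entered. The catalog, its titles, and all function names are hypothetical. It illustrates one general reason a longer query can help: more notes make the pattern more distinctive.

```python
from typing import Dict, List


def intervals(pitches: List[int]) -> List[int]:
    """Successive intervals in semitones between MIDI pitches.

    Using intervals instead of absolute pitches makes the match
    independent of the key the user plays in."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def contains(seq: List[int], pattern: List[int]) -> bool:
    """True if `pattern` occurs as a contiguous run inside `seq`."""
    n = len(pattern)
    return any(seq[i:i + n] == pattern for i in range(len(seq) - n + 1))


def search(catalog: Dict[str, List[int]], query: List[int]) -> List[str]:
    """Titles whose melody contains the query's interval pattern."""
    q = intervals(query)
    return [title for title, melody in catalog.items()
            if contains(intervals(melody), q)]


# Hypothetical toy catalog: melodies as MIDI pitch numbers (60 = middle C).
CATALOG = {
    "waltz": [67, 69, 67, 64, 62, 67, 69, 67, 64, 62, 60],  # motif played twice
    "march": [67, 69, 67, 64, 62, 60, 62, 64],             # motif once
    "song": [60, 62, 60, 57, 55, 53],                      # motif transposed down
}

motif = [67, 69, 67, 64, 62]
print(search(CATALOG, motif))      # five notes: all three pieces match
print(search(CATALOG, motif * 2))  # motif repeated: only the waltz matches
```

A production search engine would use an index rather than scanning every melody, and would tolerate recognition errors; this sketch only shows the matching idea.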
But Peachnote can do more than locate the classical melody that has been gnawing away at you. The service also offers a music Ngram Viewer, analogous to Google's Ngram Viewer, which charts how often words and phrases occur across more than 5.2 million digitized books. For example, Viro read in the music history book The Rest Is Noise that the whole-tone scale (a run of notes each a whole step apart, such as C, D, E, F♯, G♯, A♯) occurred only sporadically before the 19th century and took flight at the beginning of the 20th. Using Peachnote's music Ngram Viewer, Viro found that this was indeed the case.
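A music n-gram viewer can be sketched the same way as its text counterpart: slide a window over each score's interval sequence and chart how often a pattern appears in each period, relative to all n-grams from that period. Peachnote's actual n-gram representation is not described here; this sketch assumes melodies given as MIDI pitch numbers, and the dated corpus is hypothetical. An ascending whole-tone scale corresponds to five consecutive intervals of two semitones.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Whole-tone scale: six notes, each two semitones above the previous one,
# i.e. five consecutive intervals of +2.
WHOLE_TONE = (2, 2, 2, 2, 2)


def interval_ngrams(pitches: List[int], n: int) -> List[Tuple[int, ...]]:
    """All length-n windows over the melody's successive intervals."""
    steps = [b - a for a, b in zip(pitches, pitches[1:])]
    return [tuple(steps[i:i + n]) for i in range(len(steps) - n + 1)]


def relative_frequency(corpus: Iterable[Tuple[int, List[int]]],
                       pattern: Tuple[int, ...],
                       bucket: int = 10) -> Dict[int, float]:
    """Share of all n-grams in each period (default: decade) equal to `pattern`."""
    hits: Counter = Counter()
    totals: Counter = Counter()
    for year, pitches in corpus:
        period = year // bucket * bucket
        grams = interval_ngrams(pitches, len(pattern))
        totals[period] += len(grams)
        hits[period] += sum(1 for g in grams if g == pattern)
    return {p: hits[p] / totals[p] for p in sorted(totals) if totals[p]}


# Hypothetical dated corpus: (year of composition, melody as MIDI pitches).
corpus = [
    (1790, [60, 62, 64, 65, 67, 69, 71, 72]),  # major scale, no whole-tone run
    (1905, [60, 62, 64, 66, 68, 70, 72]),      # whole-tone scale
    (1908, [60, 62, 64, 65, 67, 69]),
]
print(relative_frequency(corpus, WHOLE_TONE))
```

Normalizing by the total number of n-grams per period matters: later decades simply have more digitized scores, so raw counts alone would exaggerate any trend.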
Distributing the load
Peachnote relies on a distributed framework both to build its index and to answer millions of search queries. First, metadata describing the scores in a library is sent to Peachnote. Then a PDF of each score is downloaded, split into individual pages, and fed to the optical music recognition software. The recognized music is then indexed using Hadoop, an open-source implementation of Google's MapReduce model that companies such as Amazon and Facebook use to build web indexes, track user clicks, and make recommendations to customers.
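The indexing stage maps naturally onto MapReduce, the programming model Hadoop implements: a map step emits key/value pairs for each page, the framework groups them by key, and a reduce step combines each group. The plain-Python sketch below mimics those three phases on a single machine and does not use Hadoop's API. The score IDs and page contents stand in for hypothetical OMR output, and indexing interval 3-grams is an assumption about what gets indexed.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]  # (score id, page number)
Key = Tuple[int, ...]      # interval n-gram


def map_page(score_id: str, page: int, pitches: List[int],
             n: int = 3) -> Iterator[Tuple[Key, Posting]]:
    """Map step: emit one (interval n-gram, posting) pair per window on a page."""
    steps = [b - a for a, b in zip(pitches, pitches[1:])]
    for i in range(len(steps) - n + 1):
        yield tuple(steps[i:i + n]), (score_id, page)


def reduce_postings(key: Key, postings: List[Posting]) -> Tuple[Key, List[Posting]]:
    """Reduce step: deduplicate and sort the postings for one n-gram."""
    return key, sorted(set(postings))


def build_index(pages: Iterable[Tuple[str, int, List[int]]]) -> Dict[Key, List[Posting]]:
    grouped: Dict[Key, List[Posting]] = defaultdict(list)
    for score_id, page, pitches in pages:                     # map
        for key, posting in map_page(score_id, page, pitches):
            grouped[key].append(posting)                      # shuffle: group by key
    return dict(reduce_postings(k, v) for k, v in grouped.items())  # reduce


# Hypothetical OMR output: (score id, page, recognized melody as MIDI pitches).
pages = [
    ("imslp-001", 1, [60, 62, 64, 65, 67]),
    ("imslp-001", 2, [67, 69, 71, 72]),
    ("imslp-002", 1, [55, 57, 59, 60]),
]
index = build_index(pages)
print(index[(2, 2, 1)])
```

On a real cluster, each page's map step can run on a different machine, which is what makes the approach scale to millions of pages.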
Peachnote's index is then compressed and served by a distributed database called HBase. Each search query takes only tens of milliseconds to complete. Currently, the total Peachnote index contains a few billion entries and is hundreds of gigabytes in size.
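HBase stores a table's rows sorted by row key, so if each key starts with the indexed pattern, all matches for that pattern sit in one contiguous range and a lookup becomes a short scan. The sketch below imitates that with a sorted Python list and binary search. The row-key layout (n-gram, then score ID, then page) and all names are a hypothetical design, not Peachnote's documented schema.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def gram_prefix(intervals: Tuple[int, ...]) -> str:
    """Encode an interval n-gram as the leading part of a row key, e.g. '+2,+2,+1/'."""
    return ",".join(f"{i:+d}" for i in intervals) + "/"


def row_key(intervals: Tuple[int, ...], score_id: str, page: int) -> str:
    """Hypothetical row-key layout: n-gram, then score id, then zero-padded page."""
    return f"{gram_prefix(intervals)}{score_id}/{page:05d}"


class SortedTable:
    """In-memory stand-in for an HBase table: row keys kept in sorted order."""

    def __init__(self, keys: Iterable[str]) -> None:
        self.keys: List[str] = sorted(keys)

    def prefix_scan(self, prefix: str) -> List[str]:
        """Binary-search to the first key >= prefix, then read while keys match."""
        out: List[str] = []
        i = bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            out.append(self.keys[i])
            i += 1
        return out


table = SortedTable([
    row_key((2, 2, 1), "imslp-002", 1),
    row_key((2, 1, 2), "imslp-001", 1),
    row_key((2, 2, 1), "imslp-001", 2),
    row_key((2, 2, 1), "imslp-001", 1),
    row_key((2, 2, 10), "imslp-003", 4),
])
print(table.prefix_scan(gram_prefix((2, 2, 1))))
```

The trailing `/` in the prefix matters: without it, a scan for `+2,+2,+1` would also return rows for `+2,+2,+10`.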
These processes run on a mix of cluster, grid, and cloud infrastructure. The Hadoop and HBase frameworks run on a computing cluster at a hosting facility in Munich, which is a cost-effective solution.
A cloud service, Amazon's Simple Queue Service (SQS), coordinates the various stages of data indexing. The optical music recognition software runs on virtual machines (software-emulated computers, each with its own operating system), which get their jobs from Amazon SQS. Recently, Viro has started running these virtual machines on the European Grid Infrastructure.
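The queue-based coordination can be sketched with Python's standard `queue` and `threading` modules: page-level jobs go into a queue, and several workers pull from it independently, so adding workers adds throughput. This is a local stand-in for SQS, not the SQS API; `fake_omr` and the other names are hypothetical placeholders.

```python
import queue
import threading
from typing import List, Optional, Tuple

Job = Tuple[str, int]  # (score id, page number)


def fake_omr(score_id: str, page: int) -> str:
    """Hypothetical placeholder for optical music recognition of one page."""
    return f"{score_id}:p{page}:recognized"


def worker(jobs: "queue.Queue[Optional[Job]]", results: List[str],
           lock: threading.Lock) -> None:
    """Pull jobs until a None sentinel arrives, like a VM polling a queue."""
    while True:
        job = jobs.get()
        if job is None:
            break
        out = fake_omr(*job)
        with lock:
            results.append(out)


def run(pages: List[Job], n_workers: int = 3) -> List[str]:
    jobs: "queue.Queue[Optional[Job]]" = queue.Queue()
    for page in pages:
        jobs.put(page)
    for _ in range(n_workers):
        jobs.put(None)  # one stop signal per worker, queued after all real jobs
    results: List[str] = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(jobs, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)  # completion order varies between runs


print(run([("imslp-001", 1), ("imslp-001", 2), ("imslp-002", 1)]))
```

Real SQS adds details this sketch omits, such as visibility timeouts and at-least-once delivery on standard queues, which mean a job can occasionally be processed twice; workers should therefore be safe to rerun on the same page.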
Citizens could add knowledge to classical music
Viro is excited about the potential for musicologists to use Peachnote to study the evolution of music, for example to trace which pieces influenced others. This year, Viro and his colleagues also developed a new music-score wiki that lets scores be embedded on any website and annotated without being downloaded.
Viro says, “You can put more knowledge into the score and start conversations about the music, expanding the scientific field by connecting interested members of the public, musicians, and musicologists. I want people to make compelling music presentations and share them.”
But according to Viro, there is so much data and not enough time. “Soon we'll publish a request for volunteers or citizen scientists on the IMSLP site. I want people to dive in and find something interesting out of it.”
There are many interesting questions that can be revisited and perhaps answered. Not only could you empirically discover how Beethoven was influenced by Mozart, you could also see which influences shaped Mozart's own work, which may or may not match what history books say. “You can do all kinds of data mining. You can say what significant pieces influenced which composers, and who the trend setters were,” Viro explains.
According to Michael Cuthbert, an associate professor of music and Homer A. Burnell Career Development Professor at MIT in the US, Peachnote is already making an impact.
"Peachnote, and Viro's work in general, is already having a transformative effect on a musicologist's ability to find patterns in huge numbers of scores and changes over time that would be otherwise lost," says Cuthbert. "A musicologist doesn't need Peachnote to know that the enigmatic Tristan Chord appears in Wagner's Tristan and Isolde, but he or she can use Viro's work to see how often it appears in scores by minor composers over the next sixty years after the opera's premiere, or who came closest to using the chord, and how, before the masterwork. The largest impact is yet to come as musicologists get the (small) training in computational tools necessary to use Peachnote to its fullest."
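Cuthbert's Tristan chord example can be sketched as a search for a pitch-class set under transposition, counted per year. The corpus format (a year plus a list of chords as MIDI note numbers) and all names are hypothetical. One limitation is worth stating: the Tristan chord is enharmonically a half-diminished seventh chord, so a pure pitch-class check also counts every ordinary half-diminished seventh; telling Wagner's sonority apart would need spelling, voicing, or harmonic context.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

# The Tristan chord as Wagner first voices it: F, B, D-sharp, G-sharp.
# Pitch classes with C = 0: F = 5, B = 11, D-sharp = 3, G-sharp = 8.
TRISTAN: FrozenSet[int] = frozenset({5, 11, 3, 8})


def transpositions(pcs: FrozenSet[int]) -> Set[FrozenSet[int]]:
    """All 12 transpositions of a pitch-class set."""
    return {frozenset((p + t) % 12 for p in pcs) for t in range(12)}


TRISTAN_CLASS = transpositions(TRISTAN)


def is_tristan_type(midi_notes: List[int]) -> bool:
    """True if the chord's pitch classes are a transposition of the Tristan chord."""
    return frozenset(n % 12 for n in midi_notes) in TRISTAN_CLASS


def count_by_year(corpus: Iterable[Tuple[int, List[List[int]]]]) -> Dict[int, int]:
    """Number of Tristan-type chords per year in a dated corpus."""
    counts: Counter = Counter()
    for year, chords in corpus:
        counts[year] += sum(1 for chord in chords if is_tristan_type(chord))
    return dict(counts)


# Hypothetical corpus: (year, chords as lists of MIDI note numbers).
corpus = [
    (1865, [[53, 59, 63, 68], [60, 64, 67]]),  # Tristan voicing, then C major
    (1880, [[62, 65, 68, 72]]),                # D half-diminished seventh
    (1880, [[60, 64, 67, 70]]),                # C dominant seventh: not a transposition
]
print(count_by_year(corpus))
```

Finding who "came closest" to the chord, as Cuthbert puts it, would need a similarity measure between chords rather than this exact-match test.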
Cuthbert's lab develops computational tools for analyzing, manipulating, and correcting a few hundred musical scores at a time, something Peachnote isn't equipped to do. The project, called music21, is a toolkit for computer-aided musicology. Its tools can, for example, detect errors in automatic transcriptions or summarize musical features such as the shift from modal to tonal composition. Next spring, Viro will go to MIT to collaborate with Cuthbert's group on bigger questions that previous tools have been unable to address.
"We're seeing differences between coastal and inland Chinese folk tunes that were missed by the overwhelming differences between northern and southern Chinese music. By putting the small-scale power of our project with the huge scope and speed of Peachnote, we can write about historical changes in the theory of music that were way beyond the reach of traditional methods. The next several years will be an extremely exciting time in computer music studies," says Cuthbert.
With terabytes of data for Peachnote to explore, who knows what new insights into the world of classical music could already be awaiting discovery.