Deep Learning for Sound Recognition at University of Alberta


ACE has teamed up with Dr. Michael Frishkopf at the University of Alberta on the Deep Learning for Sound Recognition (DLSR) project. ACE contributes access to the Cantometrics dataset (over 5,000 songs from around the world, together with associated metadata, Cantometrics codings, and derived song clusters), which the project uses in three distinct ways, with potential contributions to the Cantometrics project, to ethnomusicology more broadly, and to the biology of auditory processing.

DLSR explores the application of artificial intelligence to these questions and problems. By applying machine learning, primarily artificial neural networks trained on large datasets, the project develops sound recognition algorithms that enable automatic labelling of digital audio repositories, supporting interdisciplinary research on sound and potentially contributing to a better understanding of the auditory system and its role in socialization. While deep neural networks ("deep learning") are notoriously difficult to interpret, the technology generates useful tools, and we can analyze their performance as "black boxes". We can also develop shallower networks that are more interpretable and may lead to a better understanding of the auditory system.

Computational recognition of sound, including its types, sources, attributes, and components (what may be called "machine audition", by analogy to the better-developed field of "machine vision"), is therefore crucial for a wide array of fields, including ethnomusicology, music studies, sound studies, linguistics (especially phonetics), media studies, library and information science, and bioacoustics, enabling indexing, searching, retrieval, and regression of audio information. While expert human listeners can recognize certain complex sound environments with ease, the process is slow: they listen in real time, and they must be trained to hear sonic events contrapuntally. Sound recognition algorithms are thus of great potential value as research tools.
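To make the pipeline concrete, the sketch below shows the typical shape of such a sound recognition system: a clip is converted to a time-frequency representation (a spectrogram), pooled into a fixed-length feature vector, and passed through a deliberately shallow network of the more interpretable kind mentioned above. This is a minimal illustration in NumPy, not the project's actual model; the frame sizes, synthetic audio, number of classes, and untrained random weights are all assumptions for demonstration.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time FFT with a Hann window
    (illustrative; real systems would use mel scaling, log compression, etc.)."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))  # shape: (n_frames, n_bins)

def classify(features, weights_hidden, weights_out):
    """One ReLU hidden layer plus softmax: a shallow, inspectable classifier."""
    hidden = np.maximum(0.0, features @ weights_hidden)
    logits = hidden @ weights_out
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # class probabilities summing to 1

# Synthetic one-second "clip": a 440 Hz tone at a 16 kHz sample rate.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
clip = np.sin(2 * np.pi * 440 * t)

spec = spectrogram(clip)          # time-frequency representation
features = spec.mean(axis=0)      # pool over time -> fixed-length vector

n_classes = 4                     # e.g. four hypothetical song clusters
w_hidden = rng.standard_normal((features.size, 16)) * 0.01   # untrained weights
w_out = rng.standard_normal((16, n_classes)) * 0.01
probs = classify(features, w_hidden, w_out)
print(probs)
```

In a real system the weights would be learned from labelled clips (e.g. Cantometrics codings) rather than drawn at random, and the pooled spectrogram would be replaced by richer features; the point here is only the overall structure of the audio-to-label pipeline.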

Michael Frishkopf is Professor of Music, Director of the Canadian Centre for Ethnomusicology, folkwaysAlive! Research Fellow, Adjunct Professor of Religious Studies, and Adjunct Professor of Medicine and Dentistry at the University of Alberta. His research focuses on the music and sounds of Islam, the Arab world, and West Africa. Research interests also include Music for Global Human Development, music and global health, social network theory, digital music repositories, music information retrieval, and music in cyberworlds. Other collaborators include Ichiro Fujinaga, Associate Professor in Music Technology, Schulich School of Music, McGill University; George Tzanetakis, Associate Professor, Department of Computer Science, University of Victoria (developer of Marsyas); Michael Cohen, Professor of Computer Science, University of Aizu, Aizu-Wakamatsu, Japan; Diane Thram, Professor Emerita, Music Department, Rhodes University, South Africa; Philippe Collard, André Lapointe, Frédéric Osterrath, & Gilles Boulianne, Centre de recherche informatique de Montréal (CRIM).