Unlocking the Secrets of Jiang Kui’s Music
Imagine a world where the melodies of a 13th-century Chinese composer, lost to time and the complexities of ancient notation, suddenly become vibrant and accessible. This isn’t science fiction; it’s the remarkable achievement of researchers at Graz University of Technology and Know Center Research GmbH, led by Tristan Repolusk and Eduardo Veas. Their groundbreaking work uses artificial intelligence to decipher centuries-old musical notations, achieving accuracy that surpasses even human experts.
The Challenge of Ancient Notations
Jiang Kui, a renowned poet and music theorist of the Southern Song dynasty, left behind a treasure trove of music in the form of the Baishidaoren Gequ (白石道人歌曲), a collection dating back to 1202. However, the notations it uses, suzipu (俗字谱) and lülüpu (律吕谱), present formidable obstacles to modern-day scholars. These notations are not only visually complex, featuring a diverse range of symbols, but also highly imbalanced, with some symbols appearing far more frequently than others. Furthermore, the surviving copies are imperfect, reflecting centuries of copying errors and reinterpretation. Traditional methods of transcribing this music are slow, painstaking, and prone to inaccuracies.
A New Era of AI-Powered Transcription
Repolusk and Veas’s solution lies in the application of Optical Music Recognition (OMR), a field of artificial intelligence dedicated to computationally extracting symbolic musical notation from images. However, applying standard OMR techniques to Jiang Kui’s music was impractical: the existing datasets were too small, and the variations in handwriting too great. Their approach combined three key elements.
Firstly, they used lightweight convolutional neural networks (CNNs), a type of AI architecture that excels at image processing, but tailored them to address the challenges of limited data. Unlike larger, more data-hungry models, these efficient networks could learn effectively from the relatively small dataset of annotated Baishidaoren Gequ images.
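To get a feel for what "lightweight" means here, the following back-of-the-envelope sketch counts the trainable parameters of a small CNN. The layer widths, the 3x3 kernels, the 4x4 final feature map, and the 20 symbol classes are all hypothetical values chosen for illustration; they are not taken from the researchers' actual architecture.

```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one square convolution layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features


def count_cnn_params(channels, feature_size, num_classes, kernel=3):
    """Total parameters of a plain conv stack followed by one classifier layer.

    channels     -- channel widths, starting with the input (1 = grayscale)
    feature_size -- side length of the final feature map (assumed square)
    num_classes  -- number of notation symbols to distinguish
    """
    total = sum(conv_params(a, b, kernel) for a, b in zip(channels, channels[1:]))
    total += linear_params(channels[-1] * feature_size * feature_size, num_classes)
    return total


# Hypothetical small network: three conv layers, 20 symbol classes.
small_net = count_cnn_params([1, 16, 32, 64], feature_size=4, num_classes=20)
print(f"parameters in the illustrative small CNN: {small_net:,}")
```

A network of this size has tens of thousands of parameters, while large general-purpose image models run into the millions. Fewer parameters generally means less opportunity to memorize a small training set.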
Secondly, they employed data augmentation techniques, which artificially expand the dataset by subtly modifying the existing images, thereby enabling the AI to learn more robustly. This included adjustments like random rotation, resizing, and other transformations to mimic the diversity found across different historical copies of the music.
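The idea behind augmentation can be sketched in a few lines. The function below applies a small random rotation and resize to a grayscale image stored as a list of rows. It is a minimal illustration, not the researchers' pipeline: the function name, the parameter ranges, and the nearest-neighbour sampling are all assumptions made for this example.

```python
import math
import random


def augment(image, max_angle_deg=5.0, scale_range=(0.9, 1.1),
            background=0.0, rng=random):
    """Return a randomly rotated and rescaled copy of a 2D grayscale image.

    The output has the same size as the input. Pixels that would come
    from outside the original image are filled with `background`.
    """
    height, width = len(image), len(image[0])
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    scale = rng.uniform(*scale_range)
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    out = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Inverse mapping: find the source pixel for each output pixel.
            dx, dy = (x - cx) / scale, (y - cy) / scale
            src_x = round(cos_a * dx + sin_a * dy + cx)
            src_y = round(-sin_a * dx + cos_a * dy + cy)
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = image[src_y][src_x]
    return out


# Usage: one annotated symbol image yields several slightly different
# training examples that all keep the same label.
symbol = [[0.0] * 8 for _ in range(8)]
symbol[3][4] = 1.0
variants = [augment(symbol, rng=random.Random(seed)) for seed in range(4)]
```

Keeping the rotation and scale ranges small matters: the goal is to imitate the variation between historical copies, not to distort a symbol until it resembles a different one.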
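Thirdly, they incorporated temperature scaling, a calibration technique that adjusts the model’s confidence scores so that they better reflect how often its predictions are actually correct. It does not change which symbol the model picks, but it makes the reported confidence a more reliable guide to when a prediction can be trusted. The result? A well-calibrated model with remarkably low error rates.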
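Temperature scaling itself is simple: the model's raw scores (logits) are divided by a single number T before being turned into probabilities, and T is chosen to minimize the negative log-likelihood on held-out data. The sketch below picks T with a plain grid search, which is an assumption made to keep the example dependency-free; T is more commonly fitted with a gradient-based optimizer, and the source does not say which the researchers used. The toy logits and labels are invented for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(logit_rows, labels, temperature):
    """Average NLL of the true labels under temperature-scaled softmax."""
    losses = (-math.log(softmax(row, temperature)[label])
              for row, label in zip(logit_rows, labels))
    return sum(losses) / len(labels)


def fit_temperature(logit_rows, labels, candidates=None):
    """Pick the temperature with the lowest validation NLL (grid search)."""
    if candidates is None:
        candidates = [0.5 + 0.05 * i for i in range(91)]  # 0.5 to 5.0
    return min(candidates,
               key=lambda t: negative_log_likelihood(logit_rows, labels, t))


# Toy validation set: the model is very confident on all four symbols
# but gets one of them wrong, so it is overconfident.
val_logits = [[8.0, 0.0, 0.0], [8.0, 0.0, 0.0], [8.0, 0.0, 0.0], [0.0, 8.0, 0.0]]
val_labels = [0, 0, 1, 1]
best_t = fit_temperature(val_logits, val_labels)
print(f"fitted temperature: {best_t:.2f}")
```

For an overconfident model the fitted T comes out above 1, which flattens the probabilities. Because dividing every logit by the same positive number preserves their order, the predicted symbol never changes; only the stated confidence does.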
Outperforming the Experts
The researchers evaluated their AI’s performance not only against previous attempts to transcribe this music digitally but also against human transcribers. In a head-to-head comparison, the AI was both faster and more accurate than the human experts: its character error rate (roughly, the share of symbols transcribed incorrectly) was well below the human average and slightly better than that of the top-performing human transcriber. For suzipu, the AI achieved a character error rate of 7.1%, against 7.6% for the best human and a considerably higher 15.9% for the human average. The gap was even wider for lülüpu, where the AI’s character error rate was only 0.9%, well below that of the human transcribers.
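For readers curious how such a figure is computed, here is a minimal sketch of a character error rate. It assumes the usual definition, the edit distance between the predicted and the reference symbol sequence divided by the length of the reference; the article's source does not spell out the exact formula the researchers used. The symbol names in the usage example are illustrative labels only.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_symbol in enumerate(reference, start=1):
        current = [i]
        for j, hyp_symbol in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                              # deletion
                current[j - 1] + 1,                           # insertion
                previous[j - 1] + (ref_symbol != hyp_symbol)  # substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference sequence."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Usage: one wrong symbol out of four gives an error rate of 25%.
truth = ["huangzhong", "taicu", "guxian", "linzhong"]
guess = ["huangzhong", "taicu", "ruibin", "linzhong"]
print(f"CER: {character_error_rate(truth, guess):.1%}")
```

The same function works for human and machine transcriptions alike, which is what makes a direct comparison between the two possible.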
Beyond the Notes
This research represents more than just a technological marvel. It’s a powerful demonstration of how AI can unlock access to cultural artifacts and contribute to the preservation of historical knowledge. The implications extend far beyond a single collection of music. By building and making publicly available a comprehensive dataset of the Baishidaoren Gequ—including the previously un-digitized jianzipu notation—Repolusk and Veas have created a valuable resource for future research and scholarship. The study has created new possibilities for researchers and enthusiasts, opening a window into a rich and previously inaccessible musical heritage.
The Future of Music History
This project offers a glimpse of the transformative potential of AI in the humanities. The researchers emphasize the importance of open science and accessibility, making their code and dataset publicly available to foster collaboration and further research in this crucial area. It’s a testament to the power of technology not just to advance our understanding of the past, but to enhance the appreciation of cultural diversity and the enduring power of music across time.