SEMA - Statistic Electroacoustic Music Analyzer


Analyzing complex and often abstract forms of sonic expression presents unique challenges: it requires moving beyond traditional score-based methods to focus on the inherent qualities of the sound itself. To address this, computational tools have become invaluable, offering objective insights into the sonic landscape of electroacoustic works. But how can these tools be applied in real-life situations, and how can the data collected during analysis be interpreted with ease?


Milan Milojković




A Historical Glance at Computational Music Analysis

From the early pioneers of computer music onward, it was recognized that algorithms have the potential both to synthesize and to analyze sound, as reflected, for instance, in the insightful writings of Otto Laske (Laske, 1981; 1991). Emerging from Laske's circle of music engineering, techniques such as spectral analysis were used to break complex sounds down into their constituent frequencies, and these became fundamental to computational musicology. The development of digital signal processing (DSP) in the latter half of the 20th century further democratized these tools, making it possible to extract a wide array of features from audio without expensive lab equipment. This led to Python libraries such as LibROSA (https://librosa.org/doc/latest/index.html), which provides a robust framework for audio analysis and is also useful where electroacoustic music is approached as an acousmatic work. The contemporary integration of large language models (LLMs) such as Google's Gemini into the analysis of data obtained with tools like LibROSA allows not only for data extraction but also for sophisticated, context-aware interpretation of musical phenomena.

SEMA – An attempt to automate tedious data collection and interpretation

The main goal I had in mind when writing the script for SEMA was to overcome the problem of spending days with a single audio file just to extract raw data for analysis. As a professor at the Academy, I needed this both for my own research and for classes with students, most of whom had no prior experience with such analysis. Since I had to process a lot of music and sound in a short time, I set out to design an assisting tool that performs automated, in-depth quantitative analysis of electroacoustic music with a satisfying level of accuracy and usefulness. In general, SEMA leverages established audio-processing techniques and augments them with the interpretive power of artificial intelligence. Several recent papers on the subject (see the references section) inspired me to use similar methods and to integrate them into one automated process using Python and open-source tools instead of specialized software such as Max. I also used Gemini to help with writing and commenting the code, since the script was intended for study as well as use.

Core Functionality:

Processing begins by loading a .wav audio file and resampling it to a standardized sample rate (22050 Hz). This ensures consistency and optimizes performance for subsequent feature extraction. Next comes the feature-extraction stage, where key sonic characteristics are quantified over time. The script extracts:

RMS Energy: Root Mean Square energy, a measure of the overall loudness or intensity of the sound.

Spectral Centroid: Represents the center of mass of the spectrum, offering an indicator of perceived brightness or sharpness of the sound. Higher values suggest a brighter timbre.

Spectral Roll-off: The frequency below which a specified percentage (e.g., 85%) of the total spectral energy lies. It is another indicator of the spectral shape and brightness, especially useful for distinguishing between pitched and noisy sounds.

Novelty Curve (Onset Strength Function): This feature highlights moments of significant sonic change or "newness," often correlating with the perception of onsets or attacks in the music. Peaks in the novelty curve suggest potential event boundaries.

Zero-Crossing Rate (ZCR): The rate at which the audio signal changes its sign (crosses zero). High ZCR values often indicate noisy, percussive, or chaotic sounds, while low ZCR values are characteristic of more tonal or sustained sounds.

Mel-Frequency Cepstral Coefficients (MFCCs): These are widely used in audio analysis to represent the timbral or textural qualities of a sound. The script extracts 13 MFCCs, providing a compact and perceptually relevant description of the sound's color.

Example plots from the analysis of Karel Goeyvaerts’ Composition No 5

Composition No 5 - Karel Goeyvaerts: The Serial Works No. 1 - 7 | Champ d'Action · Celso Antunes | ℗ Nettwerk Productions Ltd., Released on: 1989-01-01, Music Publisher: CEBEDEM

Results and usage

All extracted features, along with a corresponding time vector, are organized into a Pandas DataFrame and saved as a CSV file; this structured format facilitates further analysis and visualization. I found it useful to generate plots of RMS Energy, Novelty Curve, Spectral Centroid, and Zero-Crossing Rate over time, which provide a quick, intuitive overview of the piece's dynamic, temporal, and timbral evolution. I also included a text report generated with the help of Google’s Gemini LLM, which is tasked with producing narrative descriptions of the overall trends and characteristics of each extracted feature across the entire piece.


Read the full report: SEMA/Example output: Karel Goeyvaerts Composition No 5.wav_Analysis_Report_Gemini

By analyzing peaks in the novelty curve and significant changes in RMS energy, the tool suggests potential formal divisions or structural shifts within the composition. The user can also point to specific time segments, and SEMA will generate a detailed interpretation of each, comparing its characteristics to the general trends of the piece. This allows for the focused examination of distinct musical "moments" or particular sound segments, bringing human experience into the navigation of the piece as a whole.
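The boundary-suggestion idea can be illustrated with a simple peak picker over the novelty curve. This is an illustrative sketch, not SEMA's actual rule: the script also weighs RMS-energy changes, and the mean-plus-k-standard-deviations threshold here is an assumption of mine.

```python
import numpy as np
from scipy.signal import find_peaks

def suggest_boundaries(novelty, sr=22050, hop_length=512, k=2.0):
    """Return times (in seconds) where novelty peaks stand well above average."""
    novelty = np.asarray(novelty, dtype=float)
    # Keep only peaks that exceed the mean by k standard deviations
    threshold = novelty.mean() + k * novelty.std()
    peaks, _ = find_peaks(novelty, height=threshold)
    # Convert frame indices to seconds
    return peaks * hop_length / sr
```

Each returned time is only a candidate boundary; the point of the tool is that a listener then inspects those moments and decides whether they mark a real structural shift.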

When and where can SEMA assist?

SEMA offers a versatile set of capabilities, compiled with real-world applications for musicians, musicologists, and researchers in mind. Its main goal is to provide quantifiable data to support qualitative observations of electroacoustic pieces, especially those lacking traditional scores. It thus enables a systematic comparison of sonic characteristics across different pieces, composers, or stylistic periods. It can assist in identifying and hypothesizing formal divisions and structural gestures based on objective sonic changes, for instance, in how timbres transform and interact throughout a composition. Composers can also analyze their own works to gain objective feedback on dynamic shaping, timbral consistency, and formal clarity; analyzing existing pieces (their own or others') can reveal underlying sonic patterns or transformations that inspire new compositional strategies or inform decisions about processing and arrangement.

SEMA is also a practical tool for students learning about audio features and their musicological significance. It allows them to apply computational methods directly to music analysis, to observe both the results and the code behind them, and to modify that code to better fit their needs. The extracted features and AI-generated descriptions can serve as rich metadata for archiving electroacoustic works, making them more searchable and understandable, and the tool is equally suited to analyzing large collections of electroacoustic music in search of stylistic trends or common sonic characteristics. By combining audio feature extraction with context-aware interpretation, SEMA empowers users to delve into the intricate sonic worlds of electroacoustic music, aiming to bridge the gap between quantitative data and nuanced musical understanding.

Explore: SEMA - Statistic Electroacoustic Music Analyzer


Couprie, Pierre. "Analytical Approaches to Electroacoustic Music Improvisation." Organised Sound 27, no. 2 (2022): 117-130. doi:10.1017/S1355771821000571.

Dean, Roger T. "Modelling Perception of Structure and Affect in Music: Spectral Centroid and Wishart’s Red Bird." Empirical Musicological Review 6, no. 2 (2011). [https://pdfs.semanticscholar.org/071d/02885984e94e047f07e29a5fd4ce73ade…].

Jensenius, Alexander Refsum, ed. Sonic Design - Explorations Between Art and Science. Springer, 2024. [https://link.springer.com/content/pdf/10.1007/978-3-031-57892-2.pdf].

Laske, Otto. "Toward a Definition of Computer Music." Paper presented at the International Computer Music Conference, Denton, TX, 1981. [http://hdl.handle.net/2027/spo.bbp2372.1981.005].

Laske, Otto. "Understanding Music with AI." Paper presented at the International Computer Music Conference, Montreal, QC, 1991. [http://hdl.handle.net/2027/spo.bbp2372.1991.045].

Martins, Fellipe Miranda. "Analysis informed by audio features associated with statistical methods: a case study of Imago (2002) by Trevor Wishart." Paper presented at the 5th International Meeting of Music Theory and Analysis, 2019. [https://repositorio.ufmg.br/bitstream/1843/51299/2/Analysis%20informed%…].



Milan Milojković

Milan Milojković (1986) is an interdisciplinary researcher (PhD in Digital musical technology, Belgrade 2018) with over a decade of teaching and academic research experience in digital technology, music analysis, and human-computer interaction. Developing and analyzing sound-related computer tools, he implements results and achievements in his work at the Academy of Arts in Novi Sad (Serbia), and as a founding member of Laptop Ensemble Novi Sad. He regularly publishes articles about electroacoustic music and performs with LENS, applying theoretical findings in live concert musical practices.


