MusPyExpress
1 University of California, San Diego
2 University of Michigan, Ann Arbor
Current work in modeling symbolic music primarily relies on representations extracted from MIDI-like data. While such formats allow for modeling symbolic music as sequences of notes, they omit the large space of symbolic annotations common in Western sheet music, broadly known as expression text, such as tempo or dynamics, which specify time- and velocity-dependent controls on the musical composition and performance. To address this gap, we present MusPyExpress, an extension to the popular symbolic music processing library MusPy[1] that enables the extraction of expression text along with symbolic music for downstream modeling. Using this extension, we parse the PDMX dataset[2,3] to illustrate the wealth of expression text available in MusicXML datasets. Additionally, we introduce multiple generative tasks, including joint expression-note generation, expression-conditioned music generation, and expression tagging, that take advantage of this additional notational information.
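As a rough illustration of the intended workflow, the sketch below loads a MusicXML file and inspects its notes and time-stamped text annotations. `muspy.read_musicxml`, `Track.notes`, and `Music.annotations` exist in base MusPy; whether the expressive branch surfaces expression text exactly through these fields is an assumption here, not a statement of the actual MusPyExpress interface.

```python
# Minimal sketch of extracting expression text alongside notes with MusPyExpress.
# The expression-text fields assumed below (time-stamped entries in
# `music.annotations`) may differ from the expressive branch's actual interface.
import muspy

music = muspy.read_musicxml("example.mxl")  # parse a (compressed) MusicXML file

# Notes are available per track, as in standard MusPy.
for track in music.tracks:
    print(track.name, len(track.notes))

# Expression text (e.g., tempo or dynamic markings) is assumed to be surfaced
# as time-stamped annotations for downstream sequence modeling.
for annotation in music.annotations:
    print(annotation.time, annotation.annotation)
```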
We present three generative tasks enabled by the additional expression text extracted by MusPyExpress (the latter two of which we refer to as conditional tasks): joint note-expression text generation, expression-conditioned note generation, and expression tagging. We describe the details of our experiments, based on the model proposed in Multitrack Music Transformer (MMT)[4], at length in Appendix C of our paper, but we provide an overview here.
We conduct all of our experiments using models trained under two different timing schemes: metrical and real. Under the metrical scheme, an event's onset time is represented with two values, beat and position, where beat is the number of beats since the start of the piece and position is the event's position within the current beat; the metrical timing scheme is therefore tempo-agnostic. Meanwhile, the real timing scheme represents an event's onset time as a single value, time, which is the number of seconds since the start of the piece; many expression text types are temporal in nature and would not be captured at the note level under a metrical scheme.
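For concreteness, here is a minimal sketch of how a single onset could be encoded under the two schemes. The tick-based note times and per-beat resolution follow standard MusPy conventions, but the helper names and the constant-tempo assumption are illustrative simplifications, not the library's actual encoding.

```python
# Sketch: encode one onset time under the metrical and real timing schemes.
# Assumes a single constant tempo; the actual encoding may handle tempo
# changes and quantization differently.

def metrical_time(onset_ticks: int, resolution: int) -> tuple[int, int]:
    """Return (beat, position): beats since the start and position within the beat."""
    return onset_ticks // resolution, onset_ticks % resolution

def real_time(onset_ticks: int, resolution: int, qpm: float) -> float:
    """Return the onset in seconds, assuming one constant tempo in quarter notes per minute."""
    return onset_ticks / resolution * 60.0 / qpm

# e.g., resolution = 24 ticks per beat, onset at tick 100, tempo 120 QPM
print(metrical_time(100, 24))      # (4, 4): beat 4, position 4 within the beat
print(real_time(100, 24, 120.0))   # ~2.08 seconds
```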
As we seek to model the sequence of notes and expression text together with an autoregressive model, how we interleave the notes and expression text into a single sequence x = interleave(n, e) can make a large difference in performance. Under each conditional experiment, “events” are conditioned on “controls.” For example, expression text controls dictate the note events generated by an expression-conditioned model. Prefix conditioning sets the control sequence as a prefix to the main event sequence. While simple, this forces the model to attend to long-term dependencies in order to model the relationship between events and controls that are close in time. Anticipation[5] remedies this by placing controls close in the sequence to the events they affect. Specifically, under anticipatory conditioning, a control with onset time t is placed after the first event i with onset time tᵢ such that tᵢ ≥ t - δ for some offset δ, which effectively interleaves the controls within the event sequence rather than keeping them separate (see [5] for an in-depth description). For all experiments, we set δ = 8, which is interpreted under the metrical and real timing schemes as beats and seconds, respectively.
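A minimal sketch of anticipatory interleaving as described above: a control with onset time t is inserted immediately after the first event whose onset time is at least t - δ. The (onset, payload) tuple layout and function name are illustrative, not the library's actual interface, and both lists are assumed to be sorted by onset time.

```python
# Sketch of anticipatory interleaving: each control with onset time t is placed
# right after the first event whose onset time satisfies t_event >= t - delta.

def anticipatory_interleave(events, controls, delta=8.0):
    """Interleave controls into the event sequence with anticipation offset delta."""
    sequence = []
    c = 0  # index of the next control to place
    for onset, payload in events:
        sequence.append(("event", onset, payload))
        # Place every pending control whose anticipated time (t - delta) has been reached.
        while c < len(controls) and onset >= controls[c][0] - delta:
            sequence.append(("control", *controls[c]))
            c += 1
    # Any remaining controls fall after the final event.
    sequence.extend(("control", *ctrl) for ctrl in controls[c:])
    return sequence

events = [(0, "note C4"), (4, "note E4"), (12, "note G4")]
controls = [(10, "crescendo"), (16, "fermata")]
print(anticipatory_interleave(events, controls, delta=8))
# the crescendo (t = 10) lands after the event at onset 4, the first with onset >= 10 - 8
```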
We use the model from MMT[4] as a baseline for our experiments. This model ignores expression text and generates only notes.
We model both sequences together, pθ(n, e), and thus produce a model that can generate both notes and expression text. This task mirrors how composers actually write music, adding new notes and expression text simultaneously.
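Written out against the interleaved sequence defined above, this is a restatement of the standard autoregressive setup rather than an additional modeling assumption:

```latex
p_\theta(n, e) \;=\; p_\theta(x) \;=\; \prod_{t=1}^{|x|} p_\theta\!\left(x_t \mid x_{<t}\right),
\qquad x = \mathrm{interleave}(n, e).
```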
We condition the standard symbolic music generation task on expression text annotations, pθ(n | e), learning a model to produce notes given a sequence of expression text tokens. This task could prove useful in film music generation and sound design for games and audiobooks, where generated music must often match the climax, tension, and rhythm of the accompanying media.
Given an existing note sequence n, we learn a model pθ(e | n) that tags the sequence with expression text e (i.e., adding expression text to a planned musical score). This task serves as an intermediate step in the MIDI-to-performance rendering of a musical score: given only a plain MIDI file, a trained model could suggest appropriate expression markings such as legato, crescendo, or dolce, enabling automatic annotation of the extensive collections of unexpressive MIDI data available in large-scale datasets.
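For concreteness, a minimal sketch of how the tagging task can be laid out under prefix conditioning: note tokens (the controls) form the prefix, and the model continues with expression-text tokens (the events). The token strings and the choice to compute loss only on the expression suffix are illustrative assumptions, not the actual MMT vocabulary or necessarily the paper's exact setup.

```python
# Sketch: assembling a prefix-conditioned sequence for the tagging task p_theta(e | n).
# Note tokens act as the control prefix; expression-text tokens are the events
# the model learns to generate. Token strings are purely illustrative.

notes = ["note_beat0_C4", "note_beat0_E4", "note_beat1_G4"]      # control tokens n
expression = ["dynamic_beat0_p", "expression_beat1_crescendo"]   # event tokens e

training_sequence = notes + expression  # teacher-forced target (loss on the expression suffix)
inference_prompt = list(notes)          # at test time, sample e ~ p_theta(e | n) after this prefix
print(training_sequence)
print(inference_prompt)
```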
BibTeX
The official MusPyExpress implementation can be found on the expressive branch of the MusPy repository.
We also make all supporting code (i.e., experiments, tables, and figures) available for reference and reproducibility.
Lastly, we provide extensive documentation for the MusPyExpress extension.
[1] Hao-Wen Dong, Ke Chen, Julian McAuley, and Taylor Berg-Kirkpatrick. MusPy: A toolkit for symbolic music generation. In Proceedings of the 21st International Society for Music Information Retrieval Conference (ISMIR), 2020.
[2] Phillip Long, Zachary Novack, Taylor Berg-Kirkpatrick, and Julian McAuley. PDMX: A large-scale public domain MusicXML dataset for symbolic music processing. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025.
[3] Weihan Xu, Julian McAuley, Taylor Berg-Kirkpatrick, Shlomo Dubnov, and Hao-Wen Dong. Generating symbolic music from natural language prompts using an LLM-enhanced dataset. arXiv preprint arXiv:2410.02084, 2024.
[4] Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, and Taylor Berg-Kirkpatrick. Multitrack music transformer. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023.
[5] John Thickstun, David Hall, Chris Donahue, and Percy Liang. Anticipatory music transformer. arXiv preprint arXiv:2306.08620, 2023.