Publications

See my Google Scholar profile for more information.


PDMX: A Large Scale Public Domain MusicXML Dataset for Symbolic Music Processing

Phillip Long, Zachary Novack, Taylor Berg-Kirkpatrick, Julian McAuley
NeurIPS 2024 Workshop on Creativity & Generative AI, https://arxiv.org/abs/2409.10831

Assembled the largest known copyright-free MusicXML dataset, PDMX, consisting of over 250,000 sheet music pieces for modeling symbolic music. Demonstrated the utility of PDMX's deduplication and rating data as a means for data distillation by training decoder-only transformers to generate symbolic music from four differently filtered subsets of PDMX. PDMX also features an abundance of fine-grained performance directives, which could be harnessed in future work as expression-text controls or natural language captions.

The utility of a closed breeding colony of Peromyscus leucopus for dissecting complex traits

Phillip Long, Vanessa J Cook, Arundhati Majumder, Alan G Barbour, Anthony D Long
Genetics, Volume 221, Issue 1, May 2022, iyac026, https://doi.org/10.1093/genetics/iyac026

Used R, Python, and Bash scripts on UC Irvine's HPC3 Cluster to analyze terabytes of genetic data collected from Peromyscus leucopus, the primary reservoir for Lyme disease. Tested a new computational framework for imputing genetic data in closed breeding colonies. Created figures and tables summarizing datasets for the paper.