Remote Dataset Classes

Here are the classes for remote datasets.

class muspy.RemoteFolderDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)

Base class for remote datasets storing files in a folder.

root

Root directory of the dataset.

Type:

str or Path

Parameters:
  • download_and_extract (bool, default: False) – Whether to download and extract the dataset.

  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • convert (bool, default: False) – Whether to convert the dataset to MusPy JSON/YAML files. If False, check whether converted data exist. If so, disable on-the-fly mode; if not, enable on-the-fly mode and issue a warning.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • use_converted (bool, optional) – Whether to disable on-the-fly mode and use the converted data. Defaults to True if converted data exist, otherwise False.

  • verbose (bool, default: True) – Whether to be verbose.
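The interplay between convert and use_converted can be summarized as a small decision function. The sketch below models only the rules stated in the parameter list above; resolve_mode and its "converted"/"on-the-fly" return values are hypothetical names, not part of MusPy.

```python
import warnings
from typing import Optional


def resolve_mode(convert: bool, use_converted: Optional[bool],
                 converted_exists: bool) -> str:
    """Hypothetical model of how the loading mode is chosen.

    Returns "converted" when items would be read from the converted
    JSON/YAML files, or "on-the-fly" when each raw file would be converted
    on access.
    """
    if convert:
        # Converting up front means converted data exist afterwards.
        converted_exists = True
    if use_converted is None:
        # Documented default: use converted data if and only if they exist.
        use_converted = converted_exists
    if use_converted:
        return "converted"
    if not converted_exists:
        # convert=False and nothing converted yet: fall back with a warning.
        warnings.warn("Converted data not found; enabling on-the-fly mode.")
    return "on-the-fly"
```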

See also

muspy.FolderDataset

Class for datasets storing files in a folder.

muspy.RemoteDataset

Base class for remote MusPy datasets.

read(filename)

Read a file into a Music object.

classmethod citation()

Print the citation information.

convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Convert and save the Music objects.

The converted files will be named by their index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.

Parameters:
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

Return type:

Object itself.
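The naming scheme described for convert() maps each position in filenames to an index-named file under root/_converted. The pathlib sketch below simply mirrors the documented {i}.json pattern; converted_path is a hypothetical helper, not a MusPy function, and the .yaml suffix for kind='yaml' is an assumption.

```python
from pathlib import Path


def converted_path(root, index: int, kind: str = "json") -> Path:
    """Hypothetical: where filenames[index] ends up after convert()."""
    if kind not in ("json", "yaml"):
        raise ValueError("kind must be 'json' or 'yaml'")
    # Mirrors the documented pattern root/_converted/{i}.json; the ".yaml"
    # suffix for kind="yaml" is an assumption.
    return Path(root) / "_converted" / f"{index}.{kind}"


# Hypothetical raw filenames, relative to the dataset root.
filenames = [Path("a/song1.mid"), Path("b/song2.mid")]
mapping = {name: converted_path("data", i) for i, name in enumerate(filenames)}
```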

property converted_dir

Path to the root directory of the converted dataset.

converted_exists()

Return True if the saved dataset exists, otherwise False.

download(overwrite=False, verbose=True)

Download the dataset source(s).

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

download_and_extract(overwrite=False, cleanup=False, verbose=True)

Download source datasets and extract the downloaded archives.

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

exists()

Return True if the dataset exists, otherwise False.

extract(cleanup=False, verbose=True)

Extract the downloaded archive(s).

Parameters:
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.
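Because download(), extract() and download_and_extract() return the object itself, calls can be chained. The standard-library toy class below mimics the documented overwrite and cleanup options and the source_exists()/exists() checks against an in-memory zip archive instead of a network source. ToyRemoteDataset, its archive_bytes argument and the zip layout are assumptions for illustration, not MusPy's implementation.

```python
import io
import tempfile
import zipfile
from pathlib import Path


class ToyRemoteDataset:
    """Hypothetical stand-in for a remote dataset; no network access."""

    def __init__(self, root, archive_bytes: bytes):
        self.root = Path(root)
        self.archive_path = self.root / "source.zip"
        self._archive_bytes = archive_bytes  # plays the role of the remote file
        self.download_count = 0

    def source_exists(self) -> bool:
        return self.archive_path.is_file()

    def exists(self) -> bool:
        return (self.root / "data").is_dir()

    def download(self, overwrite=False, verbose=True):
        # `verbose` is accepted for signature parity but unused here.
        # Skip the "download" if the archive is already present.
        if overwrite or not self.source_exists():
            self.root.mkdir(parents=True, exist_ok=True)
            self.archive_path.write_bytes(self._archive_bytes)
            self.download_count += 1
        return self  # returning self is what makes chaining possible

    def extract(self, cleanup=False, verbose=True):
        with zipfile.ZipFile(self.archive_path) as archive:
            archive.extractall(self.root)
        if cleanup:
            self.archive_path.unlink()  # remove the source archive
        return self

    def download_and_extract(self, overwrite=False, cleanup=False, verbose=True):
        return self.download(overwrite, verbose).extract(cleanup, verbose)


def make_archive() -> bytes:
    """Build an in-memory zip archive holding data/song.txt."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/song.txt", "placeholder")
    return buffer.getvalue()


# Usage: chain the calls in a throwaway directory.
tmp = Path(tempfile.mkdtemp())
ds = ToyRemoteDataset(tmp, make_archive()).download_and_extract(cleanup=True)
```

With a concrete MusPy dataset, the analogous chain would use the documented methods, e.g. dataset.download().extract(cleanup=True).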

get_converted_filenames()

Return a list of converted filenames.

get_raw_filenames()

Return a list of raw filenames.

classmethod info()

Return the dataset information.

load(filename)

Load a file into a Music object.

on_the_fly()

Enable on-the-fly mode and convert the data on the fly.

Return type:

Object itself.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters:
  • root (str or Path) – Root directory to save the data.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().
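The ignore_exceptions and n_jobs options describe a common batch-conversion pattern: convert every item, optionally skip the ones that fail, and optionally run the work in parallel. The sketch below is a hypothetical model of that pattern, not MusPy's code; convert_all and parse are illustrative names, and it uses threads from concurrent.futures for portability, whereas the documentation speaks of multiprocessing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def convert_all(items: Sequence[Any], convert_one: Callable[[Any], Any],
                n_jobs: int = 1, ignore_exceptions: bool = True) -> List[int]:
    """Hypothetical: apply convert_one to every item, return successful indices."""

    def attempt(index: int) -> Optional[int]:
        try:
            convert_one(items[index])
            return index
        except Exception:
            if not ignore_exceptions:
                raise  # propagate the failure to the caller
            return None  # skip, e.g. a corrupted source file

    if n_jobs == 1:
        # n_jobs=1: plain sequential loop, no worker pool.
        results = [attempt(i) for i in range(len(items))]
    else:
        # Threads keep the sketch portable; MusPy documents multiprocessing.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(attempt, range(len(items))))
    return [i for i in results if i is not None]


def parse(text: str) -> int:
    """Stand-in converter that fails on malformed input."""
    return int(text)
```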

source_exists()

Return True if all the sources exist, otherwise False.

split(filename=None, splits=None, random_state=None)

Split the dataset according to the given ratios.

Parameters:
  • filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.
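The filename, splits and random_state rules above can be modeled in a few lines of standard-library Python. This sketch covers only the documented semantics: make_split, the train/test/validation keys, the JSON file layout and the use of random.Random (instead of numpy.random.RandomState) are assumptions, array_like seeds and the splits=None case are not modeled, and MusPy's actual split sizes or ordering may differ.

```python
import json
import random
from pathlib import Path


def make_split(n_items, filename=None, splits=None, random_state=None):
    """Hypothetical model of split(): return {split_name: sorted indices}."""
    # Reuse a previously saved split when the file exists.
    if filename is not None and Path(filename).is_file():
        with open(filename) as f:
            return json.load(f)

    if splits is None:
        raise NotImplementedError("the whole-dataset case is not modeled here")
    if isinstance(splits, float):
        splits = [splits, 1.0 - splits]  # a float gives train and test
    if len(splits) not in (2, 3):
        raise ValueError("splits must be a float or a list of 2 or 3 floats")

    # An int (or None) seeds a new generator; a generator is used as-is.
    if isinstance(random_state, random.Random):
        rng = random_state
    else:
        rng = random.Random(random_state)
    indices = list(range(n_items))
    rng.shuffle(indices)

    names = ["train", "test", "validation"][: len(splits)]
    result, start = {}, 0
    for position, (name, ratio) in enumerate(zip(names, splits)):
        # The last split takes the remainder so every index appears once.
        if position == len(splits) - 1:
            end = n_items
        else:
            end = min(n_items, start + round(n_items * ratio))
        result[name] = sorted(indices[start:end])
        start = end

    # Save the new split so later calls can read it back.
    if filename is not None:
        with open(filename, "w") as f:
            json.dump(result, f)
    return result
```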

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns:

Converted PyTorch dataset(s).

Return type:

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset
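One natural way to apply factory is lazily, converting one Music object at a time when an item is indexed. The sketch below shows that pattern with a plain map-style class implementing __len__ and __getitem__, the protocol torch.utils.data.Dataset subclasses follow, and returns either one dataset or a dict keyed by split name, matching the documented return type. It imports neither PyTorch nor MusPy; FactoryDataset, to_datasets, count_notes and the string stand-ins for Music objects are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Sequence, Union


class FactoryDataset:
    """Hypothetical map-style dataset that applies `factory` on access."""

    def __init__(self, items: Sequence[Any], factory: Callable[[Any], Any],
                 indices: Optional[Sequence[int]] = None):
        self.items = items
        self.factory = factory
        # Optionally restrict to a subset of positions, e.g. one split.
        if indices is None:
            indices = range(len(items))
        self.indices = list(indices)

    def __len__(self) -> int:
        return len(self.indices)

    def __getitem__(self, i: int) -> Any:
        # The conversion runs here, only when an item is requested.
        return self.factory(self.items[self.indices[i]])


def to_datasets(items, factory, split: Optional[Dict[str, Sequence[int]]] = None
                ) -> Union[FactoryDataset, Dict[str, FactoryDataset]]:
    """Return one dataset, or a dict of datasets keyed by split name."""
    if split is None:
        return FactoryDataset(items, factory)
    return {name: FactoryDataset(items, factory, idx) for name, idx in split.items()}


def count_notes(song: str) -> int:
    """Stand-in factory: count space-separated note names."""
    return len(song.split())


songs = ["C4 E4 G4", "D4 F4", "E4"]  # stand-ins for Music objects
whole = to_datasets(songs, count_notes)
parts = to_datasets(songs, count_notes, {"train": [0, 2], "test": [1]})
```

With MusPy and PyTorch installed, a call using the documented parameters might look like dataset.to_pytorch_dataset(representation="pitch", splits=[0.8, 0.2], random_state=0).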

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns:

Converted TensorFlow dataset(s).

Return type:

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

use_converted()

Disable on-the-fly mode and use converted data.

Return type:

Object itself.

class muspy.RemoteMusicDataset(root, download_and_extract=False, overwrite=False, cleanup=False, kind=None, verbose=True)

Base class for remote datasets of MusPy JSON/YAML files.

Parameters:
  • root (str or Path) – Root directory of the dataset.

  • download_and_extract (bool, default: False) – Whether to download and extract the dataset.

  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • kind ({'json', 'yaml'}, optional) – File format to include in the dataset. Defaults to including both JSON and YAML files.

  • verbose (bool, default: True) – Whether to be verbose.

root

Root directory of the dataset.

Type:

Path

filenames

Path to the files, relative to root.

Type:

list of Path
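The kind parameter and the filenames attribute together amount to selecting files by extension and storing them relative to root. The sketch below assumes the files use the .json and .yaml suffixes and are collected recursively under root; select_music_files is hypothetical, and the exact matching rules MusPy applies are not specified here.

```python
import tempfile
from pathlib import Path


def select_music_files(root, kind=None):
    """Hypothetical: MusPy JSON/YAML files under root, as paths relative to root."""
    if kind is None:
        suffixes = {".json", ".yaml"}  # default: include both formats
    elif kind in ("json", "yaml"):
        suffixes = {"." + kind}
    else:
        raise ValueError("kind must be 'json', 'yaml' or None")
    root = Path(root)
    return sorted(
        path.relative_to(root)
        for path in root.rglob("*")
        if path.is_file() and path.suffix in suffixes
    )


# Usage: a throwaway folder with two formats and one unrelated file.
tmp = Path(tempfile.mkdtemp())
for name in ("a/1.json", "a/2.yaml", "b/3.json", "notes.txt"):
    (tmp / name).parent.mkdir(parents=True, exist_ok=True)
    (tmp / name).write_text("{}")
```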

See also

muspy.MusicDataset

Class for datasets of MusPy JSON/YAML files.

muspy.RemoteDataset

Base class for remote MusPy datasets.

classmethod citation()

Print the citation information.

download(overwrite=False, verbose=True)

Download the dataset source(s).

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

download_and_extract(overwrite=False, cleanup=False, verbose=True)

Download source datasets and extract the downloaded archives.

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

exists()

Return True if the dataset exists, otherwise False.

extract(cleanup=False, verbose=True)

Extract the downloaded archive(s).

Parameters:
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

classmethod info()

Return the dataset information.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters:
  • root (str or Path) – Root directory to save the data.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

source_exists()

Return True if all the sources exist, otherwise False.

split(filename=None, splits=None, random_state=None)

Split the dataset according to the given ratios.

Parameters:
  • filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns:

Converted PyTorch dataset(s).

Return type:

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns:

Converted TensorFlow dataset(s).

Return type:

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

class muspy.RemoteABCFolderDataset(root, download_and_extract=False, overwrite=False, cleanup=False, convert=False, kind='json', n_jobs=1, ignore_exceptions=True, use_converted=None, verbose=True)

Base class for remote datasets storing ABC files in a folder.

See also

muspy.ABCFolderDataset

Class for datasets storing ABC files in a folder.

muspy.RemoteDataset

Base class for remote MusPy datasets.

classmethod citation()

Print the citation information.

convert(kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Convert and save the Music objects.

The converted files will be named by their index and saved to root/_converted. The original filenames can be found in the filenames attribute. For example, the file at filenames[i] will be converted and saved to {i}.json.

Parameters:
  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

Return type:

Object itself.

property converted_dir

Path to the root directory of the converted dataset.

converted_exists()

Return True if the saved dataset exists, otherwise False.

download(overwrite=False, verbose=True)

Download the dataset source(s).

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

download_and_extract(overwrite=False, cleanup=False, verbose=True)

Download source datasets and extract the downloaded archives.

Parameters:
  • overwrite (bool, default: False) – Whether to overwrite existing file(s).

  • cleanup (bool, default: False) – Whether to remove the source archive(s).

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

exists()

Return True if the dataset exists, otherwise False.

extract(cleanup=False, verbose=True)

Extract the downloaded archive(s).

Parameters:
  • cleanup (bool, default: False) – Whether to remove the source archive after extraction.

  • verbose (bool, default: True) – Whether to be verbose.

Return type:

Object itself.

get_converted_filenames()

Return a list of converted filenames.

get_raw_filenames()

Return a list of raw filenames.

classmethod info()

Return the dataset information.

load(filename)

Load a file into a Music object.

on_the_fly()

Enable on-the-fly mode and convert the data on the fly.

Return type:

Object itself.

read(filename)

Read a file into a Music object.

save(root, kind='json', n_jobs=1, ignore_exceptions=True, verbose=True, **kwargs)

Save all the music objects to a directory.

Parameters:
  • root (str or Path) – Root directory to save the data.

  • kind ({'json', 'yaml'}, default: 'json') – File format to save the data.

  • n_jobs (int, default: 1) – Maximum number of concurrently running jobs. If equal to 1, disable multiprocessing.

  • ignore_exceptions (bool, default: True) – Whether to ignore errors and skip failed conversions. This can be helpful if some source files are known to be corrupted.

  • verbose (bool, default: True) – Whether to be verbose.

  • **kwargs – Keyword arguments to pass to muspy.save().

source_exists()

Return True if all the sources exist, otherwise False.

split(filename=None, splits=None, random_state=None)

Split the dataset according to the given ratios.

Parameters:
  • filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

to_pytorch_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a PyTorch dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns:

Converted PyTorch dataset(s).

Return type:

torch.utils.data.Dataset or Dict of torch.utils.data.Dataset

to_tensorflow_dataset(factory=None, representation=None, split_filename=None, splits=None, random_state=None, **kwargs)

Return the dataset as a TensorFlow dataset.

Parameters:
  • factory (Callable, optional) – Function to be applied to the Music objects. The input is a Music object, and the output is an array or a tensor.

  • representation (str, optional) – Target representation. See muspy.to_representation() for available representations.

  • split_filename (str or Path, optional) – Path to a split file. If given and the file exists, read the split from it; otherwise, save the newly created split to it.

  • splits (float or list of float, optional) – Ratios for train-test-validation splits. If None, return the full dataset as a whole. If float, return train and test splits. If list of two floats, return train and test splits. If list of three floats, return train, test and validation splits.

  • random_state (int, array_like or RandomState, optional) – Random state used to create the splits. If int or array_like, the value is passed to numpy.random.RandomState, and the created RandomState object is used to create the splits. If RandomState, it will be used to create the splits.

Returns:

Converted TensorFlow dataset(s).

Return type:

tensorflow.data.Dataset or Dict of tensorflow.data.Dataset

use_converted()

Disable on-the-fly mode and use converted data.

Return type:

Object itself.