The mwtab API Reference

Routines for working with mwTab format files used by the Metabolomics Workbench.

This package includes the following modules:

mwtab
This module provides the MWTabFile class, which is a Python dictionary representation of a Metabolomics Workbench mwTab file. Data can be accessed directly from the MWTabFile instance using bracket accessors.
cli
This module provides the command-line interface for the mwtab package.
tokenizer
This module provides the tokenizer() generator that generates tuples of key-value pairs from mwTab files.
fileio
This module provides the read_files() generator to open files from different sources (single file/multiple files on a local machine, directory/archive of files, URL address of a file).
converter
This module provides the Converter class that is responsible for the conversion of mwTab formatted files into their JSON representation and vice versa.
mwschema
This module provides JSON schema definitions for the mwTab formatted files, i.e. specifies required and optional keys as well as data types.
validator
This module provides routines to validate mwTab formatted files based on schema definitions as well as checks for file self-consistency.
mwrest
This module provides the GenericMWURL class, which is a Python dictionary representation of a Metabolomics Workbench REST URL. The class is used to validate query parameters and to generate a URL path which can be used to request data from Metabolomics Workbench through their REST API.
mwextract
This module provides functions and classes for extracting and saving data and metadata stored in mwTab formatted files in the form of MWTabFile.

mwtab.mwtab

This module provides the MWTabFile class that stores the data from a single mwTab formatted file in the form of an OrderedDict. Data can be accessed directly from the MWTabFile instance using bracket accessors.

The data is divided into a series of “sections”, each of which contains a number of “key-value”-like pairs. The file also contains a specially formatted SUBJECT_SAMPLE_FACTORS block and blocks of data between *_START and *_END.
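
As a rough illustration of this layout, the sketch below builds a nested collections.OrderedDict by hand and reads values with bracket accessors. The section and key names come from the schema listed under mwtab.validator; the values are invented, and the exact nesting that MWTabFile produces may differ in detail.

```python
from collections import OrderedDict
import json

# Hand-built stand-in for the parsed content of a single mwTab file.
# All values are invented for illustration only.
mwfile = OrderedDict()
mwfile["PROJECT"] = OrderedDict([
    ("PROJECT_TITLE", "Example project"),
    ("INSTITUTE", "Example institute"),
])
mwfile["SUBJECT_SAMPLE_FACTORS"] = [
    OrderedDict([
        ("Subject ID", "-"),
        ("Sample ID", "S1"),
        ("Factors", {"Treatment": "Control"}),
    ]),
]

# Bracket accessors walk from section to key to value.
project_title = mwfile["PROJECT"]["PROJECT_TITLE"]
first_sample = mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Sample ID"]

# The structure is plain dicts and lists, so it serializes to JSON directly.
as_json = json.dumps(mwfile, indent=4)
```

Keeping key-value sections as mappings and the sample block as a list of mappings is what makes a JSON round trip straightforward.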

class mwtab.mwtab.MWTabFile(source, *args, **kwds)[source]

MWTabFile class that stores data from a single mwTab formatted file in the form of collections.OrderedDict.

read(filehandle)[source]

Read data into a MWTabFile instance.

Parameters: filehandle (io.TextIOWrapper, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) – file-like object.
Returns: None
Return type: None

write(filehandle, file_format)[source]

Write MWTabFile data into file.

Parameters:
  • filehandle (io.TextIOWrapper) – file-like object.
  • file_format (str) – Format to use to write data: mwtab or json.
Returns: None
Return type: None

writestr(file_format)[source]

Write MWTabFile data into string.

Parameters: file_format (str) – Format to use to write data: mwtab or json.
Returns: String representing the MWTabFile instance.
Return type: str

print_file(f=sys.stdout, file_format='mwtab')[source]

Print MWTabFile into a file or stdout.

Parameters:
  • f (io.StringIO) – writable file-like stream.
  • file_format (str) – Format to use: mwtab or json.
Returns: None
Return type: None

print_subject_sample_factors(section_key, f=sys.stdout, file_format='mwtab')[source]

Print mwtab SUBJECT_SAMPLE_FACTORS section into a file or stdout.

Parameters:
  • section_key (str) – Section name.
  • f (io.StringIO) – writable file-like stream.
  • file_format (str) – Format to use: mwtab or json.
Returns: None
Return type: None

print_block(section_key, f=sys.stdout, file_format='mwtab')[source]

Print mwtab section into a file or stdout.

Parameters:
  • section_key (str) – Section name.
  • f (io.StringIO) – writable file-like stream.
  • file_format (str) – Format to use: mwtab or json.
Returns: None
Return type: None

The mwtab command-line interface

Usage:
    mwtab -h | --help
    mwtab --version
    mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--validate] [--mw-rest=<url>] [--verbose]
    mwtab validate <from-path> [--mw-rest=<url>] [--verbose]
    mwtab download url <url> [--to-path=<path>] [--verbose]
    mwtab download study all [--to-path=<path>] [--input-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
    mwtab download study <input-value> [--to-path=<path>] [--input-item=<item>] [--output-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
    mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [--output-format=<format>] [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab extract metadata <from-path> <to-path> <key> ... [--to-format=<format>] [--no-header]
    mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]

Options:
    -h, --help                Show this screen.
    --version                 Show version.
    --verbose                 Print which files are being processed.
    --validate                Validate the mwTab file.
    --from-format=<format>    Input file format, available formats: mwtab, json [default: mwtab].
    --to-format=<format>      Output file format [default: json].
                              Available formats for convert: mwtab, json.
                              Available formats for extract: json, csv.
    --mw-rest=<url>           URL to MW REST interface [default: https://www.metabolomicsworkbench.org/rest/].
    --context=<context>       Type of resource to access from MW REST interface, available contexts: study, compound, refmet, gene, protein, moverz, exactmass [default: study].
    --input-item=<item>       Item to search Metabolomics Workbench with.
    --output-item=<item>      Item to be retrieved from Metabolomics Workbench.
    --output-format=<format>  Format for item to be retrieved in, available formats: mwtab, json.
    --no-header               Omit the header at the top of csv formatted files.

For extraction, <to-path> can be given as “-”, in which case the output is written to stdout.

mwtab.cli.cli(cmdargs)[source]

Implements the command line interface.

Parameters: cmdargs (dict) – Dictionary of command-line arguments.
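
The shape of cmdargs is not spelled out here. The sketch below assumes the docopt convention of one flat dictionary keyed by the literal tokens of the usage text (commands, `<positional>` names and `--option` names). The describe_command helper is hypothetical and only illustrates how such a dictionary could be inspected; it is not part of mwtab.

```python
def describe_command(cmdargs):
    """Summarize which subcommand a docopt-style argument dict selects."""
    if cmdargs.get("convert"):
        return "convert {} ({}) to {} ({})".format(
            cmdargs["<from-path>"], cmdargs["--from-format"],
            cmdargs["<to-path>"], cmdargs["--to-format"])
    if cmdargs.get("validate"):
        return "validate {}".format(cmdargs["<from-path>"])
    if cmdargs.get("extract"):
        kind = "metadata" if cmdargs.get("metadata") else "metabolites"
        # A to-path of "-" stands for stdout, as noted above.
        target = "stdout" if cmdargs["<to-path>"] == "-" else cmdargs["<to-path>"]
        return "extract {} from {} to {}".format(kind, cmdargs["<from-path>"], target)
    return "show usage"


# Assumed parse result for: mwtab convert in.txt out.json
example_args = {
    "convert": True, "validate": False, "extract": False,
    "<from-path>": "in.txt", "<to-path>": "out.json",
    "--from-format": "mwtab", "--to-format": "json",
}
summary = describe_command(example_args)
```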

mwtab.tokenizer

This module provides the tokenizer() lexical analyzer for mwTab format syntax. It is implemented as a Python generator-based state machine that yields tokens one at a time when next() is invoked on a tokenizer() instance.

Each token is a tuple of “key-value”-like pairs, tuple of SUBJECT_SAMPLE_FACTORS or tuple of data deposited between *_START and *_END blocks.
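
To make the generator-based state machine idea concrete, here is a minimal sketch for a deliberately simplified, mwTab-like syntax: `#SECTION` header lines, `key<TAB>value` lines, and rows between `NAME_START` and `NAME_END`. The Token type, the function name and the simplified syntax are all assumptions made for illustration; this is not the grammar that mwtab.tokenizer.tokenizer() actually implements.

```python
from collections import namedtuple

# Hypothetical token type; the real tokenizer's tuples may carry other fields.
Token = namedtuple("Token", ["kind", "key", "value"])


def sketch_tokenizer(text):
    """Yield tokens from a simplified, mwTab-like text one at a time."""
    block_name = None  # name of the *_START block we are inside, if any
    block_rows = []

    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue

        if block_name is not None:
            # State: inside a *_START ... *_END block.
            if line == block_name + "_END":
                yield Token("block", block_name, tuple(block_rows))
                block_name, block_rows = None, []
            else:
                block_rows.append(tuple(line.split("\t")))
        elif line.endswith("_START"):
            # Transition: enter the block state.
            block_name = line[: -len("_START")]
        elif line.startswith("#"):
            yield Token("section", line[1:], None)
        else:
            # State: ordinary "key<TAB>value" line.
            key, _, value = line.partition("\t")
            yield Token("pair", key.strip(), value.strip())


sample = "#PROJECT\nPROJECT_TITLE\tExample\nDATA_START\na\t1\nb\t2\nDATA_END\n"
tokens = sketch_tokenizer(sample)
first = next(tokens)  # tokens are produced lazily, one per next() call
rest = list(tokens)
```

Because the function is a generator, no line is examined until the caller asks for the next token, which keeps memory use flat for large data blocks apart from the rows buffered inside one block.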

mwtab.tokenizer.tokenizer(text)[source]

A lexical analyzer for the mwtab formatted files.

Parameters: text (str) – mwTab formatted text.
Returns: Tuples of data.
Return type: collections.namedtuple

mwtab.fileio

This module provides routines for reading mwTab formatted files from different kinds of sources:

  • Single mwTab formatted file on a local machine.
  • Directory containing multiple mwTab formatted files.
  • Compressed zip/tar archive of mwTab formatted files.
  • URL address of mwTab formatted file.
  • ANALYSIS_ID of mwTab formatted file.
mwtab.fileio.read_files(*sources, **kwds)[source]

Construct a generator that yields file instances.

Parameters: sources – One or more strings, each representing a path to one or more files.
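
The source kinds listed above can be told apart from the string alone in most cases. The sketch below shows one way to do that with the standard library. The classify_source helper, the list of archive suffixes, and in particular the assumed ANALYSIS_ID shape ("AN" followed by six digits) are illustrative assumptions, not the rules read_files() applies.

```python
import os
import re
from urllib.parse import urlparse

ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tar.bz2")
# Assumption: analysis identifiers look like "AN" followed by six digits.
ANALYSIS_ID_PATTERN = re.compile(r"^AN\d{6}$")


def classify_source(source):
    """Return a label describing what kind of source a string refers to."""
    if urlparse(source).scheme in ("http", "https", "ftp"):
        return "url"
    if ANALYSIS_ID_PATTERN.match(source):
        return "analysis_id"
    if os.path.isdir(source):
        return "directory"
    if source.lower().endswith(ARCHIVE_SUFFIXES):
        return "archive"
    return "file"


kinds = [classify_source(s) for s in ("https://example.org/a.txt", "AN000001", "studies.tar.gz")]
```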

mwtab.converter

This module provides functionality for converting between the Metabolomics Workbench mwTab formatted file and its equivalent JSONized representation.

The following conversions are possible:

Local files:
  • One-to-one file conversions:
    • textfile - to - textfile
    • textfile - to - textfile.gz
    • textfile - to - textfile.bz2
    • textfile.gz - to - textfile
    • textfile.gz - to - textfile.gz
    • textfile.gz - to - textfile.bz2
    • textfile.bz2 - to - textfile
    • textfile.bz2 - to - textfile.gz
    • textfile.bz2 - to - textfile.bz2
    • textfile / textfile.gz / textfile.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
  • Many-to-many files conversions:
    • Directories:
      • directory - to - directory
      • directory - to - directory.zip
      • directory - to - directory.tar
      • directory - to - directory.tar.bz2
      • directory - to - directory.tar.gz
      • directory - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
    • Zipfiles:
      • zipfile.zip - to - directory
      • zipfile.zip - to - zipfile.zip
      • zipfile.zip - to - tarfile.tar
      • zipfile.zip - to - tarfile.tar.gz
      • zipfile.zip - to - tarfile.tar.bz2
      • zipfile.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
    • Tarfiles:
      • tarfile.tar - to - directory
      • tarfile.tar - to - zipfile.zip
      • tarfile.tar - to - tarfile.tar
      • tarfile.tar - to - tarfile.tar.gz
      • tarfile.tar - to - tarfile.tar.bz2
      • tarfile.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      • tarfile.tar.gz - to - directory
      • tarfile.tar.gz - to - zipfile.zip
      • tarfile.tar.gz - to - tarfile.tar
      • tarfile.tar.gz - to - tarfile.tar.gz
      • tarfile.tar.gz - to - tarfile.tar.bz2
      • tarfile.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      • tarfile.tar.bz2 - to - directory
      • tarfile.tar.bz2 - to - zipfile.zip
      • tarfile.tar.bz2 - to - tarfile.tar
      • tarfile.tar.bz2 - to - tarfile.tar.gz
      • tarfile.tar.bz2 - to - tarfile.tar.bz2
      • tarfile.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
URL files:
  • One-to-one file conversions:
    • analysis_id - to - textfile
    • analysis_id - to - textfile.gz
    • analysis_id - to - textfile.bz2
    • analysis_id - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
    • textfileurl - to - textfile
    • textfileurl - to - textfile.gz
    • textfileurl - to - textfile.bz2
    • textfileurl.gz - to - textfile
    • textfileurl.gz - to - textfile.gz
    • textfileurl.gz - to - textfile.bz2
    • textfileurl.bz2 - to - textfile
    • textfileurl.bz2 - to - textfile.gz
    • textfileurl.bz2 - to - textfile.bz2
    • textfileurl / textfileurl.gz / textfileurl.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
  • Many-to-many files conversions:
    • Zipfiles:
      • zipfileurl.zip - to - directory
      • zipfileurl.zip - to - zipfile.zip
      • zipfileurl.zip - to - tarfile.tar
      • zipfileurl.zip - to - tarfile.tar.gz
      • zipfileurl.zip - to - tarfile.tar.bz2
      • zipfileurl.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
    • Tarfiles:
      • tarfileurl.tar - to - directory
      • tarfileurl.tar - to - zipfile.zip
      • tarfileurl.tar - to - tarfile.tar
      • tarfileurl.tar - to - tarfile.tar.gz
      • tarfileurl.tar - to - tarfile.tar.bz2
      • tarfileurl.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      • tarfileurl.tar.gz - to - directory
      • tarfileurl.tar.gz - to - zipfile.zip
      • tarfileurl.tar.gz - to - tarfile.tar
      • tarfileurl.tar.gz - to - tarfile.tar.gz
      • tarfileurl.tar.gz - to - tarfile.tar.bz2
      • tarfileurl.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      • tarfileurl.tar.bz2 - to - directory
      • tarfileurl.tar.bz2 - to - zipfile.zip
      • tarfileurl.tar.bz2 - to - tarfile.tar
      • tarfileurl.tar.bz2 - to - tarfile.tar.gz
      • tarfileurl.tar.bz2 - to - tarfile.tar.bz2
      • tarfileurl.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
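
The table above reduces to one rule: a single-file source can only go to a single-file target, and a multi-file source can only go to a multi-file target; mixing the two raises TypeError. The sketch below encodes that rule with hypothetical helpers (cardinality, check_conversion). Directories cannot be recognized from a name alone, so they are passed in as explicit flags; how Converter itself detects them is not shown in this reference.

```python
# Archive suffixes hold many files; everything else (plain, .gz, .bz2) holds one.
MANY_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tar.bz2")


def cardinality(path, is_dir=False):
    """Classify a path as holding 'one' file or 'many' files."""
    if is_dir or path.lower().endswith(MANY_SUFFIXES):
        return "many"
    return "one"


def check_conversion(from_path, to_path, from_is_dir=False, to_is_dir=False):
    """Return the conversion kind, or raise TypeError for a mismatched pair."""
    source = cardinality(from_path, from_is_dir)
    target = cardinality(to_path, to_is_dir)
    if source == "one" and target == "many":
        raise TypeError("One-to-many conversion")
    if source == "many" and target == "one":
        raise TypeError("Many-to-one conversion")
    return source + "-to-" + target


allowed = check_conversion("study.txt", "study.json.gz")
try:
    check_conversion("studies.tar.gz", "studies.gz")
    rejected = None
except TypeError as error:
    rejected = str(error)
```

Checking the archive suffixes before anything else matters here: "studies.tar.gz" ends in ".gz" but is a multi-file archive, whereas "studies.gz" is a single compressed file.
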
class mwtab.converter.Translator(from_path, to_path, from_format=None, to_format=None, validate=False)[source]

Translator abstract class.

class mwtab.converter.MWTabFileToMWTabFile(from_path, to_path, from_format=None, to_format=None, validate=False)[source]

Translator concrete class that can convert between mwTab and JSON formats.

class mwtab.converter.Converter(from_path, to_path, from_format='mwtab', to_format='json', validate=False)[source]

Converter class to convert mwTab files from mwTab to JSON or from JSON to mwTab format.

convert()[source]

Convert file(s) from mwTab format to JSON format or from JSON format to mwTab format.

Returns: None
Return type: None

mwtab.validator

This module contains routines to validate consistency of the mwTab formatted files, e.g. make sure that Samples and Factors identifiers are consistent across the file, make sure that all required key-value pairs are present.
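
The two kinds of check named above can be sketched without any schema library. The find_problems helper below is hypothetical: it reports missing required keys per section, then compares sample identifiers between SUBJECT_SAMPLE_FACTORS and the data block. It assumes that every key other than "Metabolite" in a data row is a sample identifier, which is an assumption of this sketch and not a statement about validate_file().

```python
def find_problems(mwfile, required_keys):
    """Collect human-readable problems instead of stopping at the first one."""
    problems = []

    # Required key-value pairs, checked section by section.
    for section, keys in required_keys.items():
        present = mwfile.get(section, {})
        for key in keys:
            if key not in present:
                problems.append("{}: missing required key {}".format(section, key))

    # Sample identifiers must agree between the factors block and the data block.
    factor_samples = {row["Sample ID"] for row in mwfile.get("SUBJECT_SAMPLE_FACTORS", [])}
    data_samples = set()
    for row in mwfile.get("MS_METABOLITE_DATA", {}).get("Data", []):
        data_samples.update(key for key in row if key != "Metabolite")
    for sample in sorted(data_samples - factor_samples):
        problems.append("Data: sample {} has no SUBJECT_SAMPLE_FACTORS entry".format(sample))

    return problems


example = {
    "STUDY": {"STUDY_TITLE": "Example"},
    "SUBJECT_SAMPLE_FACTORS": [{"Subject ID": "-", "Sample ID": "S1", "Factors": {}}],
    "MS_METABOLITE_DATA": {
        "Units": "peak area",
        "Data": [{"Metabolite": "glucose", "S1": "10", "S2": "12"}],
    },
}
problems = find_problems(example, {"STUDY": ["STUDY_TITLE", "STUDY_SUMMARY"]})
```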

mwtab.validator.validate_file(mwtabfile, section_schema_mapping={'ANALYSIS': Schema({'ANALYSIS_TYPE': <class 'str'>, Optional('LABORATORY_NAME'): <class 'str'>, Optional('OPERATOR_NAME'): <class 'str'>, Optional('DETECTOR_TYPE'): <class 'str'>, Optional('SOFTWARE_VERSION'): <class 'str'>, Optional('ACQUISITION_DATE'): <class 'str'>, Optional('ANALYSIS_PROTOCOL_FILE'): <class 'str'>, Optional('ACQUISITION_PARAMETERS_FILE'): <class 'str'>, Optional('PROCESSING_PARAMETERS_FILE'): <class 'str'>, Optional('DATA_FORMAT'): <class 'str'>, Optional('ACQUISITION_ID'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('ANALYSIS_COMMENTS'): <class 'str'>, Optional('ANALYSIS_DISPLAY'): <class 'str'>, Optional('INSTRUMENT_NAME'): <class 'str'>, Optional('INSTRUMENT_PARAMETERS_FILE'): <class 'str'>, Optional('NUM_FACTORS'): <class 'str'>, Optional('NUM_METABOLITES'): <class 'str'>, Optional('PROCESSED_FILE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('RAW_FILE'): <class 'str'>}), 'CHROMATOGRAPHY': Schema({Optional('CHROMATOGRAPHY_SUMMARY'): <class 'str'>, 'CHROMATOGRAPHY_TYPE': <class 'str'>, 'INSTRUMENT_NAME': <class 'str'>, 'COLUMN_NAME': <class 'str'>, Optional('FLOW_GRADIENT'): <class 'str'>, Optional('FLOW_RATE'): <class 'str'>, Optional('COLUMN_TEMPERATURE'): <class 'str'>, Optional('METHODS_FILENAME'): <class 'str'>, Optional('SOLVENT_A'): <class 'str'>, Optional('SOLVENT_B'): <class 'str'>, Optional('METHODS_ID'): <class 'str'>, Optional('COLUMN_PRESSURE'): <class 'str'>, Optional('INJECTION_TEMPERATURE'): <class 'str'>, Optional('INTERNAL_STANDARD'): <class 'str'>, Optional('INTERNAL_STANDARD_MT'): <class 'str'>, Optional('RETENTION_INDEX'): <class 'str'>, Optional('RETENTION_TIME'): <class 'str'>, Optional('SAMPLE_INJECTION'): <class 'str'>, Optional('SAMPLING_CONE'): <class 'str'>, Optional('ANALYTICAL_TIME'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('MIGRATION_TIME'): <class 'str'>, 
Optional('OVEN_TEMPERATURE'): <class 'str'>, Optional('PRECONDITIONING'): <class 'str'>, Optional('RUNNING_BUFFER'): <class 'str'>, Optional('RUNNING_VOLTAGE'): <class 'str'>, Optional('SHEATH_LIQUID'): <class 'str'>, Optional('TIME_PROGRAM'): <class 'str'>, Optional('TRANSFERLINE_TEMPERATURE'): <class 'str'>, Optional('WASHING_BUFFER'): <class 'str'>, Optional('WEAK_WASH_SOLVENT_NAME'): <class 'str'>, Optional('WEAK_WASH_VOLUME'): <class 'str'>, Optional('STRONG_WASH_SOLVENT_NAME'): <class 'str'>, Optional('STRONG_WASH_VOLUME'): <class 'str'>, Optional('TARGET_SAMPLE_TEMPERATURE'): <class 'str'>, Optional('SAMPLE_LOOP_SIZE'): <class 'str'>, Optional('SAMPLE_SYRINGE_SIZE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('CHROMATOGRAPHY_COMMENTS'): <class 'str'>}), 'COLLECTION': Schema({'COLLECTION_SUMMARY': <class 'str'>, Optional('COLLECTION_PROTOCOL_ID'): <class 'str'>, Optional('COLLECTION_PROTOCOL_FILENAME'): <class 'str'>, Optional('COLLECTION_PROTOCOL_COMMENTS'): <class 'str'>, Optional('SAMPLE_TYPE'): <class 'str'>, Optional('COLLECTION_METHOD'): <class 'str'>, Optional('COLLECTION_LOCATION'): <class 'str'>, Optional('COLLECTION_FREQUENCY'): <class 'str'>, Optional('COLLECTION_DURATION'): <class 'str'>, Optional('COLLECTION_TIME'): <class 'str'>, Optional('VOLUMEORAMOUNT_COLLECTED'): <class 'str'>, Optional('STORAGE_CONDITIONS'): <class 'str'>, Optional('COLLECTION_VIALS'): <class 'str'>, Optional('STORAGE_VIALS'): <class 'str'>, Optional('COLLECTION_TUBE_TEMP'): <class 'str'>, Optional('ADDITIVES'): <class 'str'>, Optional('BLOOD_SERUM_OR_PLASMA'): <class 'str'>, Optional('TISSUE_CELL_IDENTIFICATION'): <class 'str'>, Optional('TISSUE_CELL_QUANTITY_TAKEN'): <class 'str'>}), 'METABOLOMICS WORKBENCH': Schema({'VERSION': <class 'str'>, 'CREATED_ON': <class 'str'>, Optional('STUDY_ID'): <class 'str'>, Optional('ANALYSIS_ID'): <class 'str'>, Optional('PROJECT_ID'): <class 'str'>, Optional('HEADER'): <class 'str'>, 
Optional('DATATRACK_ID'): <class 'str'>}), 'MS': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'MS_TYPE': <class 'str'>, 'ION_MODE': <class 'str'>, 'MS_COMMENTS': <class 'str'>, Optional('CAPILLARY_TEMPERATURE'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('COLLISION_ENERGY'): <class 'str'>, Optional('COLLISION_GAS'): <class 'str'>, Optional('DRY_GAS_FLOW'): <class 'str'>, Optional('DRY_GAS_TEMP'): <class 'str'>, Optional('FRAGMENT_VOLTAGE'): <class 'str'>, Optional('FRAGMENTATION_METHOD'): <class 'str'>, Optional('GAS_PRESSURE'): <class 'str'>, Optional('HELIUM_FLOW'): <class 'str'>, Optional('ION_SOURCE_TEMPERATURE'): <class 'str'>, Optional('ION_SPRAY_VOLTAGE'): <class 'str'>, Optional('IONIZATION'): <class 'str'>, Optional('IONIZATION_ENERGY'): <class 'str'>, Optional('IONIZATION_POTENTIAL'): <class 'str'>, Optional('MASS_ACCURACY'): <class 'str'>, Optional('PRECURSOR_TYPE'): <class 'str'>, Optional('REAGENT_GAS'): <class 'str'>, Optional('SOURCE_TEMPERATURE'): <class 'str'>, Optional('SPRAY_VOLTAGE'): <class 'str'>, Optional('ACTIVATION_PARAMETER'): <class 'str'>, Optional('ACTIVATION_TIME'): <class 'str'>, Optional('ATOM_GUN_CURRENT'): <class 'str'>, Optional('AUTOMATIC_GAIN_CONTROL'): <class 'str'>, Optional('BOMBARDMENT'): <class 'str'>, Optional('CDL_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('CDL_TEMPERATURE'): <class 'str'>, Optional('DATAFORMAT'): <class 'str'>, Optional('DESOLVATION_GAS_FLOW'): <class 'str'>, Optional('DESOLVATION_TEMPERATURE'): <class 'str'>, Optional('INTERFACE_VOLTAGE'): <class 'str'>, Optional('IT_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('LASER'): <class 'str'>, Optional('MATRIX'): <class 'str'>, Optional('NEBULIZER'): <class 'str'>, Optional('OCTPOLE_VOLTAGE'): <class 'str'>, Optional('PROBE_TIP'): <class 'str'>, Optional('RESOLUTION_SETTING'): <class 'str'>, Optional('SAMPLE_DRIPPING'): <class 'str'>, Optional('SCAN_RANGE_MOVERZ'): <class 'str'>, 
Optional('SCANNING'): <class 'str'>, Optional('SCANNING_CYCLE'): <class 'str'>, Optional('SCANNING_RANGE'): <class 'str'>, Optional('SKIMMER_VOLTAGE'): <class 'str'>, Optional('TUBE_LENS_VOLTAGE'): <class 'str'>, Optional('MS_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'MS_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'NM': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'NMR_EXPERIMENT_TYPE': <class 'str'>, Optional('NMR_COMMENTS'): <class 'str'>, Optional('FIELD_FREQUENCY_LOCK'): <class 'str'>, Optional('STANDARD_CONCENTRATION'): <class 'str'>, 'SPECTROMETER_FREQUENCY': <class 'str'>, Optional('NMR_PROBE'): <class 'str'>, Optional('NMR_SOLVENT'): <class 'str'>, Optional('NMR_TUBE_SIZE'): <class 'str'>, Optional('SHIMMING_METHOD'): <class 'str'>, Optional('PULSE_SEQUENCE'): <class 'str'>, Optional('WATER_SUPPRESSION'): <class 'str'>, Optional('PULSE_WIDTH'): <class 'str'>, Optional('POWER_LEVEL'): <class 'str'>, Optional('RECEIVER_GAIN'): <class 'str'>, Optional('OFFSET_FREQUENCY'): <class 'str'>, Optional('PRESATURATION_POWER_LEVEL'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_CPD'): <class 'str'>, Optional('TEMPERATURE'): <class 'str'>, Optional('NUMBER_OF_SCANS'): <class 'str'>, Optional('DUMMY_SCANS'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('RELAXATION_DELAY'): <class 'str'>, Optional('SPECTRAL_WIDTH'): <class 'str'>, Optional('NUM_DATA_POINTS_ACQUIRED'): <class 'str'>, Optional('REAL_DATA_POINTS'): <class 'str'>, Optional('LINE_BROADENING'): <class 'str'>, Optional('ZERO_FILLING'): <class 'str'>, Optional('APODIZATION'): <class 
'str'>, Optional('BASELINE_CORRECTION_METHOD'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_STD'): <class 'str'>, Optional('BINNED_INCREMENT'): <class 'str'>, Optional('BINNED_DATA_NORMALIZATION_METHOD'): <class 'str'>, Optional('BINNED_DATA_PROTOCOL_FILE'): <class 'str'>, Optional('BINNED_DATA_CHEMICAL_SHIFT_RANGE'): <class 'str'>, Optional('BINNED_DATA_EXCLUDED_RANGE'): <class 'str'>}), 'NMR_BINNED_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}])}), 'NMR_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'PROJECT': Schema({'PROJECT_TITLE': <class 'str'>, Optional('PROJECT_TYPE'): <class 'str'>, 'PROJECT_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('FUNDING_SOURCE'): <class 'str'>, Optional('PROJECT_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('CONTRIBUTORS'): <class 'str'>, Optional('DOI'): <class 'str'>}), 'SAMPLEPREP': Schema({'SAMPLEPREP_SUMMARY': <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_ID'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_FILENAME'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_COMMENTS'): <class 'str'>, Optional('PROCESSING_METHOD'): <class 'str'>, Optional('PROCESSING_STORAGE_CONDITIONS'): <class 'str'>, Optional('EXTRACTION_METHOD'): <class 'str'>, Optional('EXTRACT_CONCENTRATION_DILUTION'): <class 'str'>, 
Optional('EXTRACT_ENRICHMENT'): <class 'str'>, Optional('EXTRACT_CLEANUP'): <class 'str'>, Optional('EXTRACT_STORAGE'): <class 'str'>, Optional('SAMPLE_RESUSPENSION'): <class 'str'>, Optional('SAMPLE_DERIVATIZATION'): <class 'str'>, Optional('SAMPLE_SPIKING'): <class 'str'>, Optional('ORGAN'): <class 'str'>, Optional('ORGAN_SPECIFICATION'): <class 'str'>, Optional('CELL_TYPE'): <class 'str'>, Optional('SUBCELLULAR_LOCATION'): <class 'str'>}), 'STUDY': Schema({'STUDY_TITLE': <class 'str'>, Optional('STUDY_TYPE'): <class 'str'>, 'STUDY_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('NUM_GROUPS'): <class 'str'>, Optional('TOTAL_SUBJECTS'): <class 'str'>, Optional('NUM_MALES'): <class 'str'>, Optional('NUM_FEMALES'): <class 'str'>, Optional('STUDY_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('SUBMIT_DATE'): <class 'str'>}), 'SUBJECT': Schema({'SUBJECT_TYPE': <class 'str'>, 'SUBJECT_SPECIES': <class 'str'>, Optional('TAXONOMY_ID'): <class 'str'>, Optional('GENOTYPE_STRAIN'): <class 'str'>, Optional('AGE_OR_AGE_RANGE'): <class 'str'>, Optional('WEIGHT_OR_WEIGHT_RANGE'): <class 'str'>, Optional('HEIGHT_OR_HEIGHT_RANGE'): <class 'str'>, Optional('GENDER'): <class 'str'>, Optional('HUMAN_RACE'): <class 'str'>, Optional('HUMAN_ETHNICITY'): <class 'str'>, Optional('HUMAN_TRIAL_TYPE'): <class 'str'>, Optional('HUMAN_LIFESTYLE_FACTORS'): <class 'str'>, Optional('HUMAN_MEDICATIONS'): <class 'str'>, Optional('HUMAN_PRESCRIPTION_OTC'): <class 'str'>, Optional('HUMAN_SMOKING_STATUS'): <class 'str'>, Optional('HUMAN_ALCOHOL_DRUG_USE'): <class 'str'>, Optional('HUMAN_NUTRITION'): <class 'str'>, Optional('HUMAN_INCLUSION_CRITERIA'): <class 'str'>, Optional('HUMAN_EXCLUSION_CRITERIA'): <class 'str'>, 
Optional('ANIMAL_ANIMAL_SUPPLIER'): <class 'str'>, Optional('ANIMAL_HOUSING'): <class 'str'>, Optional('ANIMAL_LIGHT_CYCLE'): <class 'str'>, Optional('ANIMAL_FEED'): <class 'str'>, Optional('ANIMAL_WATER'): <class 'str'>, Optional('ANIMAL_INCLUSION_CRITERIA'): <class 'str'>, Optional('CELL_BIOSOURCE_OR_SUPPLIER'): <class 'str'>, Optional('CELL_STRAIN_DETAILS'): <class 'str'>, Optional('SUBJECT_COMMENTS'): <class 'str'>, Optional('CELL_PRIMARY_IMMORTALIZED'): <class 'str'>, Optional('CELL_PASSAGE_NUMBER'): <class 'str'>, Optional('CELL_COUNTS'): <class 'str'>, Optional('SPECIES_GROUP'): <class 'str'>}), 'SUBJECT_SAMPLE_FACTORS': Schema([{'Subject ID': <class 'str'>, 'Sample ID': <class 'str'>, 'Factors': <class 'dict'>, Optional('Additional sample data'): {Optional('RAW_FILE_NAME'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}}]), 'TREATMENT': Schema({'TREATMENT_SUMMARY': <class 'str'>, Optional('TREATMENT_PROTOCOL_ID'): <class 'str'>, Optional('TREATMENT_PROTOCOL_FILENAME'): <class 'str'>, Optional('TREATMENT_PROTOCOL_COMMENTS'): <class 'str'>, Optional('TREATMENT'): <class 'str'>, Optional('TREATMENT_COMPOUND'): <class 'str'>, Optional('TREATMENT_ROUTE'): <class 'str'>, Optional('TREATMENT_DOSE'): <class 'str'>, Optional('TREATMENT_DOSEVOLUME'): <class 'str'>, Optional('TREATMENT_DOSEDURATION'): <class 'str'>, Optional('TREATMENT_VEHICLE'): <class 'str'>, Optional('ANIMAL_VET_TREATMENTS'): <class 'str'>, Optional('ANIMAL_ANESTHESIA'): <class 'str'>, Optional('ANIMAL_ACCLIMATION_DURATION'): <class 'str'>, Optional('ANIMAL_FASTING'): <class 'str'>, Optional('ANIMAL_ENDP_EUTHANASIA'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_COLL_LIST'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_PROC_METHOD'): <class 'str'>, Optional('ANIMAL_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('HUMAN_FASTING'): <class 'str'>, Optional('HUMAN_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('CELL_STORAGE'): <class 'str'>, Optional('CELL_GROWTH_CONTAINER'): <class 'str'>, 
Optional('CELL_GROWTH_CONFIG'): <class 'str'>, Optional('CELL_GROWTH_RATE'): <class 'str'>, Optional('CELL_INOC_PROC'): <class 'str'>, Optional('CELL_MEDIA'): <class 'str'>, Optional('CELL_ENVIR_COND'): <class 'str'>, Optional('CELL_HARVESTING'): <class 'str'>, Optional('PLANT_GROWTH_SUPPORT'): <class 'str'>, Optional('PLANT_GROWTH_LOCATION'): <class 'str'>, Optional('PLANT_PLOT_DESIGN'): <class 'str'>, Optional('PLANT_LIGHT_PERIOD'): <class 'str'>, Optional('PLANT_HUMIDITY'): <class 'str'>, Optional('PLANT_TEMP'): <class 'str'>, Optional('PLANT_WATERING_REGIME'): <class 'str'>, Optional('PLANT_NUTRITIONAL_REGIME'): <class 'str'>, Optional('PLANT_ESTAB_DATE'): <class 'str'>, Optional('PLANT_HARVEST_DATE'): <class 'str'>, Optional('PLANT_GROWTH_STAGE'): <class 'str'>, Optional('PLANT_METAB_QUENCH_METHOD'): <class 'str'>, Optional('PLANT_HARVEST_METHOD'): <class 'str'>, Optional('PLANT_STORAGE'): <class 'str'>, Optional('CELL_PCT_CONFLUENCE'): <class 'str'>, Optional('CELL_MEDIA_LASTCHANGED'): <class 'str'>})}, verbose=False, metabolites=True)[source]

Validate mwTab formatted file.

Parameters:
  • mwtabfile (MWTabFile or collections.OrderedDict) – Instance of MWTabFile.
  • section_schema_mapping (dict) – Dictionary that provides mapping between section name and schema definition.
  • verbose (bool) – whether to be verbose or not.
  • metabolites (bool) – whether to validate metabolites section.
Returns: Validated file.
Return type: collections.OrderedDict

mwtab.mwrest

This module provides routines for accessing the Metabolomics Workbench REST API.

See https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf for details.

mwtab.mwrest.analysis_ids(base_url='https://www.metabolomicsworkbench.org/rest/')[source]

Method for retrieving a list of analysis ids for every current analysis in Metabolomics Workbench.

Parameters: base_url (str) – Base URL of the Metabolomics Workbench REST API.
Returns: List of every available Metabolomics Workbench analysis identifier.
Return type: list

mwtab.mwrest.study_ids(base_url='https://www.metabolomicsworkbench.org/rest/')[source]

Method for retrieving a list of study ids for every current study in Metabolomics Workbench.

Parameters: base_url (str) – Base URL of the Metabolomics Workbench REST API.
Returns: List of every available Metabolomics Workbench study identifier.
Return type: list

mwtab.mwrest.generate_mwtab_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', output_format='txt')[source]

Method for generating URLs to be used to retrieve mwTab files for analyses and studies through the REST API of the Metabolomics Workbench database.

Parameters:
  • input_items (list) – List of Metabolomics Workbench input values for mwTab files.
  • base_url (str) – Base url to Metabolomics Workbench REST API.
  • output_format (str) – Output format for the mwTab files to be retrieved in.
Returns: Metabolomics Workbench REST URL string(s).
Return type: str

mwtab.mwrest.generate_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', **kwds)[source]

Method for creating a generator which yields validated Metabolomics Workbench REST urls.

Parameters:
  • input_items (list) – List of Metabolomics Workbench input values for mwTab files.
  • base_url (str) – Base url to Metabolomics Workbench REST API.
  • kwds (dict) – Keyword arguments of Metabolomics Workbench URL Path items.
Returns: Metabolomics Workbench REST URL string(s).
Return type: str

class mwtab.mwrest.GenericMWURL(rest_params, base_url='https://www.metabolomicsworkbench.org/rest/')[source]

GenericMWURL class that stores and validates parameters specifying a Metabolomics Workbench REST URL.

Metabolomics REST API requests are performed using URL requests in the form of

https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification

where:
    if context = "study" | "compound" | "refmet" | "gene" | "protein"
        input_specification = input_item/input_value
        output_specification = output_item/[output_format]
    elif context = "moverz"
        input_specification = input_item/input_value1/input_value2/input_value3
            input_item = "LIPIDS" | "MB" | "REFMET"
            input_value1 = m/z_value
            input_value2 = ion_type_value
            input_value3 = m/z_tolerance_value
        output_specification = output_format
            output_format = "txt"
    elif context = "exactmass"
        input_specification = input_item/input_value1/input_value2
            input_item = "LIPIDS" | "MB" | "REFMET"
            input_value1 = LIPID_abbreviation
            input_value2 = ion_type_value
        output_specification = None
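
The first two branches of this grammar translate directly into string assembly. The build_rest_url helper below is a hypothetical sketch of that assembly, not the GenericMWURL implementation; it does not validate item names and leaves the exactmass branch out. The example values (analysis_id, AN000001, mwtab, M+H) are placeholders chosen for illustration.

```python
BASE_URL = "https://www.metabolomicsworkbench.org/rest/"
STANDARD_CONTEXTS = {"study", "compound", "refmet", "gene", "protein"}


def build_rest_url(context, input_item, input_values, output_item=None,
                   output_format=None, base_url=BASE_URL):
    """Join context/input_specification/output_specification into one URL."""
    if context in STANDARD_CONTEXTS:
        if len(input_values) != 1 or output_item is None:
            raise ValueError("expected one input value and an output item")
        parts = [context, input_item, input_values[0], output_item]
        if output_format:  # output_format is optional in this branch
            parts.append(output_format)
    elif context == "moverz":
        if len(input_values) != 3:
            raise ValueError("expected m/z value, ion type and m/z tolerance")
        # The grammar fixes the output format of this context to "txt".
        parts = [context, input_item, *input_values, "txt"]
    else:
        raise ValueError("unsupported context: {}".format(context))
    return base_url + "/".join(str(part) for part in parts)


study_url = build_rest_url("study", "analysis_id", ["AN000001"], "mwtab", "txt")
moverz_url = build_rest_url("moverz", "MB", ["255.2", "M+H", "0.2"])
```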

class mwtab.mwrest.MWRESTFile(source)[source]

MWRESTFile class that stores data from a single file download through Metabolomics Workbench’s REST API.

Mirrors MWTabFile.

read(filehandle)[source]

Read data into a MWRESTFile instance.

Parameters: filehandle (io.TextIOWrapper, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) – file-like object.
Returns: None
Return type: None

write(filehandle)[source]

Write MWRESTFile data into file.

Parameters: filehandle (io.TextIOWrapper) – file-like object.
Returns: None
Return type: None

mwtab.mwextract

This module provides a number of functions and classes for extracting and saving data and metadata stored in mwTab formatted files in the form of MWTabFile.

class mwtab.mwextract.ItemMatcher(full_key, value_comparison)[source]

ItemMatcher class that can be called to match items from mwTab formatted files in the form of MWTabFile.

class mwtab.mwextract.ReGeXMatcher(full_key, value_comparison)[source]

ReGeXMatcher class that can be called to match items from mwTab formatted files in the form of MWTabFile using regular expressions.

mwtab.mwextract.generate_matchers(items)[source]

Construct a generator that yields matcher objects (ItemMatcher or ReGeXMatcher).

Parameters:items (iterable) – Iterable object containing key value pairs to match.
Returns:Yields a Matcher object for each given item.
Return type:ItemMatcher or ReGeXMatcher
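
As a rough illustration of the callable-matcher idea, the sketch below defines stand-in classes rather than using the package's own. Everything in it is an assumption made for the example: the class and function names are hypothetical, full_key is assumed to be a "SECTION:KEY" string, the matched object is a plain nested dict instead of an MWTabFile, and a compiled regular expression is assumed to be what selects the regex matcher.

```python
import re


class SketchItemMatcher:
    """Callable that checks one field of a nested mapping for an exact value."""

    def __init__(self, full_key, value_comparison):
        # Assumption: full_key joins a section name and a field name with ":".
        self.section, self.key = full_key.split(":", 1)
        self.value_comparison = value_comparison

    def _lookup(self, mwtab_like):
        # mwtab_like is any mapping shaped like {section: {key: value}}.
        return mwtab_like.get(self.section, {}).get(self.key)

    def __call__(self, mwtab_like):
        return self._lookup(mwtab_like) == self.value_comparison


class SketchRegexMatcher(SketchItemMatcher):
    """Same lookup, but the value is tested against a regular expression."""

    def __call__(self, mwtab_like):
        value = self._lookup(mwtab_like)
        if value is None:
            return False
        return re.search(self.value_comparison, str(value)) is not None


def sketch_generate_matchers(items):
    """Yield one matcher per (full_key, value) pair."""
    for full_key, value in items:
        if isinstance(value, re.Pattern):
            yield SketchRegexMatcher(full_key, value)
        else:
            yield SketchItemMatcher(full_key, value)


record = {"SUBJECT": {"SUBJECT_TYPE": "Human"}}
matchers = list(sketch_generate_matchers([
    ("SUBJECT:SUBJECT_TYPE", "Human"),
    ("SUBJECT:SUBJECT_TYPE", re.compile("^Hu")),
]))
all_match = all(matcher(record) for matcher in matchers)
```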
mwtab.mwextract.extract_metabolites(sources, matchers)[source]

Extract metabolite data from mwTab formatted files in the form of MWTabFile.

Parameters:
  • sources (generator) – Generator of mwtab file objects (MWTabFile).
  • matchers (generator) – Generator of matcher objects (ItemMatcher or ReGeXMatcher).
Returns:

Extracted metabolites dictionary.

Return type:

dict

mwtab.mwextract.extract_metadata(mwtabfile, keys)[source]

Extract metadata from mwTab formatted files in the form of MWTabFile.

Parameters:
  • mwtabfile (MWTabFile) – mwTab file object for metadata to be extracted from.
  • keys (list) – List of metadata field keys for metadata values to be extracted.
Returns:

Extracted metadata dictionary.

Return type:

dict
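
A minimal sketch of this kind of extraction, assuming the file can be treated as a nested mapping of sections to key-value pairs. The function name is hypothetical, a plain dict stands in for MWTabFile, and collecting the values for each key into a set is an assumption made for the example.

```python
def sketch_extract_metadata(mwtab_like, keys):
    """Collect the values found for each requested key across all sections."""
    extracted = {}
    for section in mwtab_like.values():
        if not isinstance(section, dict):
            continue  # skip anything that is not a key-value section
        for key in keys:
            if key in section:
                # Assumption: values are gathered in a set, one set per key.
                extracted.setdefault(key, set()).add(section[key])
    return extracted


record = {
    "SUBJECT": {"SUBJECT_TYPE": "Human", "SUBJECT_SPECIES": "Homo sapiens"},
    "COLLECTION": {"COLLECTION_SUMMARY": "Plasma samples"},
}
metadata = sketch_extract_metadata(record, ["SUBJECT_TYPE", "MISSING_KEY"])
```

Keys that do not occur in any section are simply absent from the result in this sketch.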

mwtab.mwextract.write_metadata_csv(to_path, extracted_values, no_header=False)[source]

Write extracted metadata dict into csv file.

Example:

"metadata","value1","value2"
"SUBJECT_TYPE","Human","Plant"

Parameters:
  • to_path (str) – Path to output file.
  • extracted_values (dict) – Metadata dictionary to be saved.
  • no_header (bool) – If true header is not included, otherwise header is included.
Returns:

None

Return type:

None
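
The CSV layout in the example above can be reproduced with the standard csv module. This is a sketch under stated assumptions, not the package's code: the function name is hypothetical, the input is assumed to map each metadata key to a collection of string values, and values are sorted only to make the output order predictable.

```python
import csv


def sketch_write_metadata_csv(to_path, extracted_values, no_header=False):
    """Write one fully quoted row per metadata key."""
    with open(to_path, "w", newline="") as outfile:
        writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)
        if not no_header:
            # Header is as wide as the key with the most values.
            widest = max((len(v) for v in extracted_values.values()), default=0)
            header = ["metadata"] + ["value{}".format(i + 1) for i in range(widest)]
            writer.writerow(header)
        for key, values in extracted_values.items():
            writer.writerow([key] + sorted(values))
```

Calling it with {"SUBJECT_TYPE": {"Human", "Plant"}} writes a header row followed by a single SUBJECT_TYPE row.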

mwtab.mwextract.write_metabolites_csv(to_path, extracted_values, no_header=False)[source]

Write extracted metabolites data dict into csv file.

Example:

"metabolite_name","num-studies","num_analyses","num_samples"
"1,2,4-benzenetriol","1","1","24"
"1-monostearin","1","1","24"
…

Parameters:
  • to_path (str) – Path to output file.
  • extracted_values (dict) – Metabolites data dictionary to be saved.
  • no_header (bool) – If true header is not included, otherwise header is included.
Returns:

None

Return type:

None
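
The per-metabolite counts in the example rows could be derived roughly as follows. The sketch assumes the metabolites dictionary nests as metabolite, then study, then analysis, then a list of sample names, which is the shape shown in the write_json metabolites example; the function name is hypothetical, and counting distinct sample names is an assumption about what "num_samples" means.

```python
def sketch_metabolite_rows(extracted_values):
    """Return [name, study count, analysis count, sample count] per metabolite."""
    rows = []
    for metabolite, studies in extracted_values.items():
        # Flatten the per-study analyses into one list of sample lists.
        sample_lists = [
            samples
            for analyses in studies.values()
            for samples in analyses.values()
        ]
        distinct_samples = set()
        for samples in sample_lists:
            distinct_samples.update(samples)
        rows.append([metabolite, len(studies), len(sample_lists), len(distinct_samples)])
    return rows


extracted = {
    "1,2,4-benzenetriol": {
        "ST000001": {"AN000001": ["LabF_115816", "LabF_115817"]},
    },
}
rows = sketch_metabolite_rows(extracted)
```

Each row can then be written with csv.writer in the same way as any other list of values.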

class mwtab.mwextract.SetEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

SetEncoder class for encoding Python sets (set) as JSON-serializable lists (list).

default(obj)[source]

Method for encoding Python objects. If the object passed is a set, it is converted to a JSON-serializable list; otherwise the base implementation is called.

Parameters:obj (object) – Python object to be json encoded.
Returns:JSON serializable object.
Return type:dict, list, tuple, str, int, float, bool, or None
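
The pattern described here, overriding default() on json.JSONEncoder, can be sketched in a few lines. The class below is a stand-in written for illustration, not the package's SetEncoder; sorting the set is an assumption added so the output order is stable.

```python
import json


class SketchSetEncoder(json.JSONEncoder):
    """Encode sets as lists; defer everything else to the base class."""

    def default(self, obj):
        if isinstance(obj, set):
            # Sorted only to make the output deterministic in this sketch.
            return sorted(obj)
        # The base implementation raises TypeError for unsupported objects.
        return super().default(obj)


encoded = json.dumps({"SUBJECT_TYPE": {"Plant", "Human"}}, cls=SketchSetEncoder)
```

json.dumps only calls default() for objects it cannot serialize on its own, so strings, numbers, lists and dicts never reach this method.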
mwtab.mwextract.write_json(to_path, extracted_dict)[source]

Write extracted data or metadata dict into json file.

Metabolites example:

{
    "1,2,4-benzenetriol": {
        "ST000001": {
            "AN000001": [
                "LabF_115816",
                …
            ]
        }
    }
}

Metadata example:

{
    "SUBJECT_TYPE": [
        "Plant",
        "Human"
    ]
}

Parameters:
  • to_path (str) – Path to output file.
  • extracted_dict (dict) – Metabolites data or metadata dictionary to be saved.
Returns:

None

Return type:

None

mwtab.mwschema

This module provides schema definitions for different sections of the mwTab Metabolomics Workbench format.

mwtab.mwschema.metabolomics_workbench_schema
mwtab.mwschema.project_schema
mwtab.mwschema.study_schema
mwtab.mwschema.analysis_schema
mwtab.mwschema.subject_schema
mwtab.mwschema.subject_sample_factors_schema
mwtab.mwschema.collection_schema
mwtab.mwschema.treatment_schema
mwtab.mwschema.sampleprep_schema
mwtab.mwschema.chromatography_schema
mwtab.mwschema.ms_schema
mwtab.mwschema.nmr_schema
mwtab.mwschema.ms_metabolite_data_schema
mwtab.mwschema.nmr_binned_data_schema

Each of the schema objects above carries the same description: “Entry point of the library, use this class to instantiate validation schema for the data that will be validated.”