The mwtab API Reference¶

Routines for working with mwTab format files used by the Metabolomics Workbench.

This package includes the following modules:

mwtab: This module provides the MWTabFile class which is a python dictionary representation of a Metabolomics Workbench mwtab file. Data can be accessed directly from the MWTabFile instance using bracket accessors.
cli: This module provides command-line interface for the mwtab package.
tokenizer: This module provides the tokenizer() generator that generates tuples of key-value pairs from mwtab files.
fileio: This module provides the read_files() generator to open files from different sources (single file/multiple files on a local machine, directory/archive of files, URL address of a file).
converter: This module provides the Converter class that is responsible for the conversion of mwTab formated files into their JSON representation and vice versa.
mwschema: This module provides JSON schema definitions for the mwTab formatted files, i.e. specifies required and optional keys as well as data types.
validator: This module provides routines to validate mwTab formatted files based on schema definitions as well as checks for file self-consistency.
mwrest: This module provides the GenericMWURL class which is a python dictionary representation of a Metabolomics Workbench REST URL. The class is used to validate query parameters and to generate a URL path which can be used to request data from Metabolomics Workbench through their REST API.

mwtab.mwtab¶

This module provides the MWTabFile class that stores the data from a single mwTab formatted file in the form of an OrderedDict. Data can be accessed directly from the MWTabFile instance using bracket accessors.

The data is divided into a series of “sections” which each contain a number of “key-value”-like pairs. Also, the file contains a specially formatted SUBJECT_SAMPLE_FACTOR block and blocks of data between *_START and *_END.

class mwtab.mwtab.MWTabFile(source, *args, **kwds)[source]¶

MWTabFile class that stores data from a single mwTab formatted file in the form of collections.OrderedDict.

read(filehandle)[source]¶

Read data into a MWTabFile instance.

Parameters:	filehandle (`io.TextIOWrapper`, `gzip.GzipFile`, `bz2.BZ2File`, `zipfile.ZipFile`) – file-like object.
Returns:	None
Return type:	`None`

write(filehandle, file_format)[source]¶

Write MWTabFile data into file.

Parameters:	filehandle (`io.TextIOWrapper`) – file-like object. file_format (str) – Format to use to write data: mwtab or json.
Returns:	None
Return type:	`None`

writestr(file_format)[source]¶

Write MWTabFile data into string.

Parameters:	file_format (str) – Format to use to write data: mwtab or json.
Returns:	String representing the `MWTabFile` instance.
Return type:	`str`

print_file(f=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, file_format='mwtab')[source]¶

Print MWTabFile into a file or stdout.

Parameters:	f (`io.StringIO`) – writable file-like stream. file_format (str) – Format to use: mwtab or json.
Returns:	None
Return type:	`None`

print_subject_sample_factors(section_key, f=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, file_format='mwtab')[source]¶

Print mwtab SUBJECT_SAMPLE_FACTORS section into a file or stdout.

Parameters:	section_key (str) – Section name. f (`io.StringIO`) – writable file-like stream. file_format (str) – Format to use: mwtab or json.
Returns:	None
Return type:	`None`

print_block(section_key, f=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, file_format='mwtab')[source]¶

Print mwtab section into a file or stdout.

Parameters:	section_key (str) – Section name. f (`io.StringIO`) – writable file-like stream. file_format (str) – Format to use: mwtab or json.
Returns:	None
Return type:	`None`

The mwtab command-line interface¶

Usage:

mwtab -h | –help mwtab –version mwtab convert (<from-path> <to-path>) [–from-format=<format>] [–to-format=<format>] [–validate] [–mw-rest=<url>] [–verbose] mwtab validate <from-path> [–mw-rest=<url>] [–verbose] mwtab download url <url> [–to-path=<path>] [–verbose] mwtab download study all [–to-path=<path>] [–input-item=<item>] [–output-format=<format>] [–mw-rest=<url>] [–validate] [–verbose] mwtab download study <input-value> [–to-path=<path>] [–input-item=<item>] [–output-item=<item>] [–output-format=<format>] [–mw-rest=<url>] [–validate] [–verbose] mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [–output-format=<format>] [–to-path=<path>] [–mw-rest=<url>] [–verbose] mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [–to-path=<path>] [–mw-rest=<url>] [–verbose] mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [–to-path=<path>] [–mw-rest=<url>] [–verbose] mwtab extract metadata <from-path> <to-path> <key> … [–to-format=<format>] [–no-header] mwtab extract metabolites <from-path> <to-path> (<key> <value>) … [–to-format=<format>] [–no-header]

Options:

`-h, --help`	Show this screen.
`--version`	Show version.
`--verbose`	Print what files are processing.
`--validate`	Validate the mwTab file.
`--from-format=<format>`
	Input file format, available formats: mwtab, json [default: mwtab].
`--to-format=<format>`
	Output file format [default: json]. Available formats for convert: mwtab, json. Available formats for extract: json, csv.
`--mw-rest=<url>`
	URL to MW REST interface [default: https://www.metabolomicsworkbench.org/rest/].
`--context=<context>`
	Type of resource to access from MW REST interface, available contexts: study, compound, refmet, gene, protein, moverz, exactmass [default: study].
`--input-item=<item>`
	Item to search Metabolomics Workbench with.
`--output-item=<item>`
	Item to be retrieved from Metabolomics Workbench.
`--output-format=<format>`
	Format for item to be retrieved in, available formats: mwtab, json.
`--no-header`	Include header at the top of csv formatted files.

For extraction <to-path> can take a “-” which will use stdout.

mwtab.cli.cli(cmdargs)[source]¶

Implements the command line interface.

param dict cmdargs: dictionary of command line arguments.

mwtab.tokenizer¶

This module provides the tokenizer() lexical analyzer for mwTab format syntax. It is implemented as Python generator-based state machine which generates (yields) tokens one at a time when next() is invoked on tokenizer() instance.

Each token is a tuple of “key-value”-like pairs, tuple of SUBJECT_SAMPLE_FACTORS or tuple of data deposited between *_START and *_END blocks.

mwtab.tokenizer.tokenizer(text)[source]¶

A lexical analyzer for the mwtab formatted files.

Parameters:	text (py:class:str) – mwTab formatted text.
Returns:	Tuples of data.
Return type:	py:class:~collections.namedtuple

mwtab.fileio¶

This module provides routines for reading mwTab formatted files from difference kinds of sources:

Single mwTab formatted file on a local machine.

Directory containing multiple mwTab formatted files.

Compressed zip/tar archive of mwTab formatted files.

URL address of mwTab formatted file.

ANALYSIS_ID of mwTab formatted file.

mwtab.fileio.read_files(*sources, **kwds)[source]¶

Construct a generator that yields file instances.

Parameters:	sources – One or more strings representing path to file(s).

mwtab.converter¶

This module provides functionality for converting between the Metabolomics Workbench mwTab formatted file and its equivalent JSONized representation.

The following conversions are possible:

Local files:

One-to-one file conversions:
- textfile - to - textfile
- textfile - to - textfile.gz
- textfile - to - textfile.bz2
- textfile.gz - to - textfile
- textfile.gz - to - textfile.gz
- textfile.gz - to - textfile.bz2
- textfile.bz2 - to - textfile
- textfile.bz2 - to - textfile.gz
- textfile.bz2 - to - textfile.bz2
- textfile / textfile.gz / textfile.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
Many-to-many files conversions:
- Directories:
  
  directory - to - directory
  
  directory - to - directory.zip
  
  directory - to - directory.tar
  
  directory - to - directory.tar.bz2
  
  directory - to - directory.tar.gz
  
  directory - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- Zipfiles:
  
  zipfile.zip - to - directory
  
  zipfile.zip - to - zipfile.zip
  
  zipfile.zip - to - tarfile.tar
  
  zipfile.zip - to - tarfile.tar.gz
  
  zipfile.zip - to - tarfile.tar.bz2
  
  zipfile.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- Tarfiles:
  
  tarfile.tar - to - directory
  
  tarfile.tar - to - zipfile.zip
  
  tarfile.tar - to - tarfile.tar
  
  tarfile.tar - to - tarfile.tar.gz
  
  tarfile.tar - to - tarfile.tar.bz2
  
  tarfile.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
  
  tarfile.tar.gz - to - directory
  
  tarfile.tar.gz - to - zipfile.zip
  
  tarfile.tar.gz - to - tarfile.tar
  
  tarfile.tar.gz - to - tarfile.tar.gz
  
  tarfile.tar.gz - to - tarfile.tar.bz2
  
  tarfile.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
  
  tarfile.tar.bz2 - to - directory
  
  tarfile.tar.bz2 - to - zipfile.zip
  
  tarfile.tar.bz2 - to - tarfile.tar
  
  tarfile.tar.bz2 - to - tarfile.tar.gz
  
  tarfile.tar.bz2 - to - tarfile.tar.bz2
  
  tarfile.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)

URL files:

One-to-one file conversions:
- analysis_id - to - textfile
- analysis_id - to - textfile.gz
- analysis_id - to - textfile.bz2
- analysis_id - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
- textfileurl - to - textfile
- textfileurl - to - textfile.gz
- textfileurl - to - textfile.bz2
- textfileurl.gz - to - textfile
- textfileurl.gz - to - textfile.gz
- textfileurl.gz - to - textfile.bz2
- textfileurl.bz2 - to - textfile
- textfileurl.bz2 - to - textfile.gz
- textfileurl.bz2 - to - textfile.bz2
- textfileurl / textfileurl.gz / textfileurl.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
Many-to-many files conversions:
- Zipfiles:
  
  zipfileurl.zip - to - directory
  
  zipfileurl.zip - to - zipfile.zip
  
  zipfileurl.zip - to - tarfile.tar
  
  zipfileurl.zip - to - tarfile.tar.gz
  
  zipfileurl.zip - to - tarfile.tar.bz2
  
  zipfileurl.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- Tarfiles:
  
  tarfileurl.tar - to - directory
  
  tarfileurl.tar - to - zipfile.zip
  
  tarfileurl.tar - to - tarfile.tar
  
  tarfileurl.tar - to - tarfile.tar.gz
  
  tarfileurl.tar - to - tarfile.tar.bz2
  
  tarfileurl.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
  
  tarfileurl.tar.gz - to - directory
  
  tarfileurl.tar.gz - to - zipfile.zip
  
  tarfileurl.tar.gz - to - tarfile.tar
  
  tarfileurl.tar.gz - to - tarfile.tar.gz
  
  tarfileurl.tar.gz - to - tarfile.tar.bz2
  
  tarfileurl.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
  
  tarfileurl.tar.bz2 - to - directory
  
  tarfileurl.tar.bz2 - to - zipfile.zip
  
  tarfileurl.tar.bz2 - to - tarfile.tar
  
  tarfileurl.tar.bz2 - to - tarfile.tar.gz
  
  tarfileurl.tar.bz2 - to - tarfile.tar.bz2
  
  tarfileurl.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)

class mwtab.converter.Translator(from_path, to_path, from_format=None, to_format=None, validate=False)[source]¶: Translator abstract class.

class mwtab.converter.MWTabFileToMWTabFile(from_path, to_path, from_format=None, to_format=None, validate=False)[source]¶: Translator concrete class that can convert between mwTab and JSON formats.

class mwtab.converter.Converter(from_path, to_path, from_format='mwtab', to_format='json', validate=False)[source]¶

Converter class to convert mwTab files from mwTab to JSON or from JSON to mwTab format.

convert()[source]¶: Convert file(s) from mwTab format to JSON format or from JSON format to mwTab format. :return: None :rtype: None

mwtab.validator¶

This module contains routines to validate consistency of the mwTab formatted files, e.g. make sure that Samples and Factors identifiers are consistent across the file, make sure that all required key-value pairs are present.

mwtab.validator.validate_file(mwtabfile, section_schema_mapping={'ANALYSIS': Schema({'ANALYSIS_TYPE': <class 'str'>, Optional('LABORATORY_NAME'): <class 'str'>, Optional('OPERATOR_NAME'): <class 'str'>, Optional('DETECTOR_TYPE'): <class 'str'>, Optional('SOFTWARE_VERSION'): <class 'str'>, Optional('ACQUISITION_DATE'): <class 'str'>, Optional('ANALYSIS_PROTOCOL_FILE'): <class 'str'>, Optional('ACQUISITION_PARAMETERS_FILE'): <class 'str'>, Optional('PROCESSING_PARAMETERS_FILE'): <class 'str'>, Optional('DATA_FORMAT'): <class 'str'>, Optional('ACQUISITION_ID'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('ANALYSIS_COMMENTS'): <class 'str'>, Optional('ANALYSIS_DISPLAY'): <class 'str'>, Optional('INSTRUMENT_NAME'): <class 'str'>, Optional('INSTRUMENT_PARAMETERS_FILE'): <class 'str'>, Optional('NUM_FACTORS'): <class 'str'>, Optional('NUM_METABOLITES'): <class 'str'>, Optional('PROCESSED_FILE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('RAW_FILE'): <class 'str'>}), 'CHROMATOGRAPHY': Schema({Optional('CHROMATOGRAPHY_SUMMARY'): <class 'str'>, 'CHROMATOGRAPHY_TYPE': <class 'str'>, 'INSTRUMENT_NAME': <class 'str'>, 'COLUMN_NAME': <class 'str'>, Optional('FLOW_GRADIENT'): <class 'str'>, Optional('FLOW_RATE'): <class 'str'>, Optional('COLUMN_TEMPERATURE'): <class 'str'>, Optional('METHODS_FILENAME'): <class 'str'>, Optional('SOLVENT_A'): <class 'str'>, Optional('SOLVENT_B'): <class 'str'>, Optional('METHODS_ID'): <class 'str'>, Optional('COLUMN_PRESSURE'): <class 'str'>, Optional('INJECTION_TEMPERATURE'): <class 'str'>, Optional('INTERNAL_STANDARD'): <class 'str'>, Optional('INTERNAL_STANDARD_MT'): <class 'str'>, Optional('RETENTION_INDEX'): <class 'str'>, Optional('RETENTION_TIME'): <class 'str'>, Optional('SAMPLE_INJECTION'): <class 'str'>, Optional('SAMPLING_CONE'): <class 'str'>, Optional('ANALYTICAL_TIME'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('MIGRATION_TIME'): <class 'str'>, Optional('OVEN_TEMPERATURE'): <class 'str'>, Optional('PRECONDITIONING'): <class 'str'>, Optional('RUNNING_BUFFER'): <class 'str'>, Optional('RUNNING_VOLTAGE'): <class 'str'>, Optional('SHEATH_LIQUID'): <class 'str'>, Optional('TIME_PROGRAM'): <class 'str'>, Optional('TRANSFERLINE_TEMPERATURE'): <class 'str'>, Optional('WASHING_BUFFER'): <class 'str'>, Optional('WEAK_WASH_SOLVENT_NAME'): <class 'str'>, Optional('WEAK_WASH_VOLUME'): <class 'str'>, Optional('STRONG_WASH_SOLVENT_NAME'): <class 'str'>, Optional('STRONG_WASH_VOLUME'): <class 'str'>, Optional('TARGET_SAMPLE_TEMPERATURE'): <class 'str'>, Optional('SAMPLE_LOOP_SIZE'): <class 'str'>, Optional('SAMPLE_SYRINGE_SIZE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('CHROMATOGRAPHY_COMMENTS'): <class 'str'>}), 'COLLECTION': Schema({'COLLECTION_SUMMARY': <class 'str'>, Optional('COLLECTION_PROTOCOL_ID'): <class 'str'>, Optional('COLLECTION_PROTOCOL_FILENAME'): <class 'str'>, Optional('COLLECTION_PROTOCOL_COMMENTS'): <class 'str'>, Optional('SAMPLE_TYPE'): <class 'str'>, Optional('COLLECTION_METHOD'): <class 'str'>, Optional('COLLECTION_LOCATION'): <class 'str'>, Optional('COLLECTION_FREQUENCY'): <class 'str'>, Optional('COLLECTION_DURATION'): <class 'str'>, Optional('COLLECTION_TIME'): <class 'str'>, Optional('VOLUMEORAMOUNT_COLLECTED'): <class 'str'>, Optional('STORAGE_CONDITIONS'): <class 'str'>, Optional('COLLECTION_VIALS'): <class 'str'>, Optional('STORAGE_VIALS'): <class 'str'>, Optional('COLLECTION_TUBE_TEMP'): <class 'str'>, Optional('ADDITIVES'): <class 'str'>, Optional('BLOOD_SERUM_OR_PLASMA'): <class 'str'>, Optional('TISSUE_CELL_IDENTIFICATION'): <class 'str'>, Optional('TISSUE_CELL_QUANTITY_TAKEN'): <class 'str'>}), 'METABOLOMICS WORKBENCH': Schema({'VERSION': <class 'str'>, 'CREATED_ON': <class 'str'>, Optional('STUDY_ID'): <class 'str'>, Optional('ANALYSIS_ID'): <class 'str'>, Optional('PROJECT_ID'): <class 'str'>, Optional('HEADER'): <class 'str'>, Optional('DATATRACK_ID'): <class 'str'>}), 'MS': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'MS_TYPE': <class 'str'>, 'ION_MODE': <class 'str'>, Optional('MS_COMMENTS'): <class 'str'>, Optional('CAPILLARY_TEMPERATURE'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('COLLISION_ENERGY'): <class 'str'>, Optional('COLLISION_GAS'): <class 'str'>, Optional('DRY_GAS_FLOW'): <class 'str'>, Optional('DRY_GAS_TEMP'): <class 'str'>, Optional('FRAGMENT_VOLTAGE'): <class 'str'>, Optional('FRAGMENTATION_METHOD'): <class 'str'>, Optional('GAS_PRESSURE'): <class 'str'>, Optional('HELIUM_FLOW'): <class 'str'>, Optional('ION_SOURCE_TEMPERATURE'): <class 'str'>, Optional('ION_SPRAY_VOLTAGE'): <class 'str'>, Optional('IONIZATION'): <class 'str'>, Optional('IONIZATION_ENERGY'): <class 'str'>, Optional('IONIZATION_POTENTIAL'): <class 'str'>, Optional('MASS_ACCURACY'): <class 'str'>, Optional('PRECURSOR_TYPE'): <class 'str'>, Optional('REAGENT_GAS'): <class 'str'>, Optional('SOURCE_TEMPERATURE'): <class 'str'>, Optional('SPRAY_VOLTAGE'): <class 'str'>, Optional('ACTIVATION_PARAMETER'): <class 'str'>, Optional('ACTIVATION_TIME'): <class 'str'>, Optional('ATOM_GUN_CURRENT'): <class 'str'>, Optional('AUTOMATIC_GAIN_CONTROL'): <class 'str'>, Optional('BOMBARDMENT'): <class 'str'>, Optional('CDL_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('CDL_TEMPERATURE'): <class 'str'>, Optional('DATAFORMAT'): <class 'str'>, Optional('DESOLVATION_GAS_FLOW'): <class 'str'>, Optional('DESOLVATION_TEMPERATURE'): <class 'str'>, Optional('INTERFACE_VOLTAGE'): <class 'str'>, Optional('IT_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('LASER'): <class 'str'>, Optional('MATRIX'): <class 'str'>, Optional('NEBULIZER'): <class 'str'>, Optional('OCTPOLE_VOLTAGE'): <class 'str'>, Optional('PROBE_TIP'): <class 'str'>, Optional('RESOLUTION_SETTING'): <class 'str'>, Optional('SAMPLE_DRIPPING'): <class 'str'>, Optional('SCAN_RANGE_MOVERZ'): <class 'str'>, Optional('SCANNING'): <class 'str'>, Optional('SCANNING_CYCLE'): <class 'str'>, Optional('SCANNING_RANGE'): <class 'str'>, Optional('SKIMMER_VOLTAGE'): <class 'str'>, Optional('TUBE_LENS_VOLTAGE'): <class 'str'>, Optional('MS_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'MS_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'NM': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'NMR_EXPERIMENT_TYPE': <class 'str'>, Optional('NMR_COMMENTS'): <class 'str'>, Optional('FIELD_FREQUENCY_LOCK'): <class 'str'>, Optional('STANDARD_CONCENTRATION'): <class 'str'>, 'SPECTROMETER_FREQUENCY': <class 'str'>, Optional('NMR_PROBE'): <class 'str'>, Optional('NMR_SOLVENT'): <class 'str'>, Optional('NMR_TUBE_SIZE'): <class 'str'>, Optional('SHIMMING_METHOD'): <class 'str'>, Optional('PULSE_SEQUENCE'): <class 'str'>, Optional('WATER_SUPPRESSION'): <class 'str'>, Optional('PULSE_WIDTH'): <class 'str'>, Optional('POWER_LEVEL'): <class 'str'>, Optional('RECEIVER_GAIN'): <class 'str'>, Optional('OFFSET_FREQUENCY'): <class 'str'>, Optional('PRESATURATION_POWER_LEVEL'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_CPD'): <class 'str'>, Optional('TEMPERATURE'): <class 'str'>, Optional('NUMBER_OF_SCANS'): <class 'str'>, Optional('DUMMY_SCANS'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('RELAXATION_DELAY'): <class 'str'>, Optional('SPECTRAL_WIDTH'): <class 'str'>, Optional('NUM_DATA_POINTS_ACQUIRED'): <class 'str'>, Optional('REAL_DATA_POINTS'): <class 'str'>, Optional('LINE_BROADENING'): <class 'str'>, Optional('ZERO_FILLING'): <class 'str'>, Optional('APODIZATION'): <class 'str'>, Optional('BASELINE_CORRECTION_METHOD'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_STD'): <class 'str'>, Optional('BINNED_INCREMENT'): <class 'str'>, Optional('BINNED_DATA_NORMALIZATION_METHOD'): <class 'str'>, Optional('BINNED_DATA_PROTOCOL_FILE'): <class 'str'>, Optional('BINNED_DATA_CHEMICAL_SHIFT_RANGE'): <class 'str'>, Optional('BINNED_DATA_EXCLUDED_RANGE'): <class 'str'>, Optional('NMR_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'NMR_BINNED_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}])}), 'NMR_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'PROJECT': Schema({'PROJECT_TITLE': <class 'str'>, Optional('PROJECT_TYPE'): <class 'str'>, 'PROJECT_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('FUNDING_SOURCE'): <class 'str'>, Optional('PROJECT_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('CONTRIBUTORS'): <class 'str'>, Optional('DOI'): <class 'str'>}), 'SAMPLEPREP': Schema({'SAMPLEPREP_SUMMARY': <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_ID'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_FILENAME'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_COMMENTS'): <class 'str'>, Optional('PROCESSING_METHOD'): <class 'str'>, Optional('PROCESSING_STORAGE_CONDITIONS'): <class 'str'>, Optional('EXTRACTION_METHOD'): <class 'str'>, Optional('EXTRACT_CONCENTRATION_DILUTION'): <class 'str'>, Optional('EXTRACT_ENRICHMENT'): <class 'str'>, Optional('EXTRACT_CLEANUP'): <class 'str'>, Optional('EXTRACT_STORAGE'): <class 'str'>, Optional('SAMPLE_RESUSPENSION'): <class 'str'>, Optional('SAMPLE_DERIVATIZATION'): <class 'str'>, Optional('SAMPLE_SPIKING'): <class 'str'>, Optional('ORGAN'): <class 'str'>, Optional('ORGAN_SPECIFICATION'): <class 'str'>, Optional('CELL_TYPE'): <class 'str'>, Optional('SUBCELLULAR_LOCATION'): <class 'str'>}), 'STUDY': Schema({'STUDY_TITLE': <class 'str'>, Optional('STUDY_TYPE'): <class 'str'>, 'STUDY_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('NUM_GROUPS'): <class 'str'>, Optional('TOTAL_SUBJECTS'): <class 'str'>, Optional('NUM_MALES'): <class 'str'>, Optional('NUM_FEMALES'): <class 'str'>, Optional('STUDY_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('SUBMIT_DATE'): <class 'str'>}), 'SUBJECT': Schema({'SUBJECT_TYPE': <class 'str'>, 'SUBJECT_SPECIES': <class 'str'>, Optional('TAXONOMY_ID'): <class 'str'>, Optional('GENOTYPE_STRAIN'): <class 'str'>, Optional('AGE_OR_AGE_RANGE'): <class 'str'>, Optional('WEIGHT_OR_WEIGHT_RANGE'): <class 'str'>, Optional('HEIGHT_OR_HEIGHT_RANGE'): <class 'str'>, Optional('GENDER'): <class 'str'>, Optional('HUMAN_RACE'): <class 'str'>, Optional('HUMAN_ETHNICITY'): <class 'str'>, Optional('HUMAN_TRIAL_TYPE'): <class 'str'>, Optional('HUMAN_LIFESTYLE_FACTORS'): <class 'str'>, Optional('HUMAN_MEDICATIONS'): <class 'str'>, Optional('HUMAN_PRESCRIPTION_OTC'): <class 'str'>, Optional('HUMAN_SMOKING_STATUS'): <class 'str'>, Optional('HUMAN_ALCOHOL_DRUG_USE'): <class 'str'>, Optional('HUMAN_NUTRITION'): <class 'str'>, Optional('HUMAN_INCLUSION_CRITERIA'): <class 'str'>, Optional('HUMAN_EXCLUSION_CRITERIA'): <class 'str'>, Optional('ANIMAL_ANIMAL_SUPPLIER'): <class 'str'>, Optional('ANIMAL_HOUSING'): <class 'str'>, Optional('ANIMAL_LIGHT_CYCLE'): <class 'str'>, Optional('ANIMAL_FEED'): <class 'str'>, Optional('ANIMAL_WATER'): <class 'str'>, Optional('ANIMAL_INCLUSION_CRITERIA'): <class 'str'>, Optional('CELL_BIOSOURCE_OR_SUPPLIER'): <class 'str'>, Optional('CELL_STRAIN_DETAILS'): <class 'str'>, Optional('SUBJECT_COMMENTS'): <class 'str'>, Optional('CELL_PRIMARY_IMMORTALIZED'): <class 'str'>, Optional('CELL_PASSAGE_NUMBER'): <class 'str'>, Optional('CELL_COUNTS'): <class 'str'>, Optional('SPECIES_GROUP'): <class 'str'>}), 'SUBJECT_SAMPLE_FACTORS': Schema([{'Subject ID': <class 'str'>, 'Sample ID': <class 'str'>, 'Factors': <class 'dict'>, Optional('Additional sample data'): {Optional('RAW_FILE_NAME'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}}]), 'TREATMENT': Schema({'TREATMENT_SUMMARY': <class 'str'>, Optional('TREATMENT_PROTOCOL_ID'): <class 'str'>, Optional('TREATMENT_PROTOCOL_FILENAME'): <class 'str'>, Optional('TREATMENT_PROTOCOL_COMMENTS'): <class 'str'>, Optional('TREATMENT'): <class 'str'>, Optional('TREATMENT_COMPOUND'): <class 'str'>, Optional('TREATMENT_ROUTE'): <class 'str'>, Optional('TREATMENT_DOSE'): <class 'str'>, Optional('TREATMENT_DOSEVOLUME'): <class 'str'>, Optional('TREATMENT_DOSEDURATION'): <class 'str'>, Optional('TREATMENT_VEHICLE'): <class 'str'>, Optional('ANIMAL_VET_TREATMENTS'): <class 'str'>, Optional('ANIMAL_ANESTHESIA'): <class 'str'>, Optional('ANIMAL_ACCLIMATION_DURATION'): <class 'str'>, Optional('ANIMAL_FASTING'): <class 'str'>, Optional('ANIMAL_ENDP_EUTHANASIA'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_COLL_LIST'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_PROC_METHOD'): <class 'str'>, Optional('ANIMAL_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('HUMAN_FASTING'): <class 'str'>, Optional('HUMAN_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('CELL_STORAGE'): <class 'str'>, Optional('CELL_GROWTH_CONTAINER'): <class 'str'>, Optional('CELL_GROWTH_CONFIG'): <class 'str'>, Optional('CELL_GROWTH_RATE'): <class 'str'>, Optional('CELL_INOC_PROC'): <class 'str'>, Optional('CELL_MEDIA'): <class 'str'>, Optional('CELL_ENVIR_COND'): <class 'str'>, Optional('CELL_HARVESTING'): <class 'str'>, Optional('PLANT_GROWTH_SUPPORT'): <class 'str'>, Optional('PLANT_GROWTH_LOCATION'): <class 'str'>, Optional('PLANT_PLOT_DESIGN'): <class 'str'>, Optional('PLANT_LIGHT_PERIOD'): <class 'str'>, Optional('PLANT_HUMIDITY'): <class 'str'>, Optional('PLANT_TEMP'): <class 'str'>, Optional('PLANT_WATERING_REGIME'): <class 'str'>, Optional('PLANT_NUTRITIONAL_REGIME'): <class 'str'>, Optional('PLANT_ESTAB_DATE'): <class 'str'>, Optional('PLANT_HARVEST_DATE'): <class 'str'>, Optional('PLANT_GROWTH_STAGE'): <class 'str'>, Optional('PLANT_METAB_QUENCH_METHOD'): <class 'str'>, Optional('PLANT_HARVEST_METHOD'): <class 'str'>, Optional('PLANT_STORAGE'): <class 'str'>, Optional('CELL_PCT_CONFLUENCE'): <class 'str'>, Optional('CELL_MEDIA_LASTCHANGED'): <class 'str'>})}, verbose=False, metabolites=True)[source]¶

Validate mwTab formatted file.

Parameters:	mwtabfile (`MWTabFile` or `collections.OrderedDict`) – Instance of `MWTabFile`. section_schema_mapping (dict) – Dictionary that provides mapping between section name and schema definition. verbose (bool) – whether to be verbose or not. metabolites (bool) – whether to validate metabolites section.
Returns:	Validated file.
Return type:	`collections.OrderedDict`

mwtab.mwrest¶

This module provides routines for accessing the Metabolomics Workbench REST API.

See https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf for details.

mwtab.mwrest.analysis_ids(base_url='https://www.metabolomicsworkbench.org/rest/')[source]¶

Method for retrieving a list of analysis ids for every current analysis in Metabolomics Workbench.

Parameters:	base_url (str) – Base url to Metabolomics Workbench REST API.
Returns:	List of every available Metabolomics Workbench analysis identifier.
Return type:	`list`

mwtab.mwrest.study_ids(base_url='https://www.metabolomicsworkbench.org/rest/')[source]¶

Method for retrieving a list of study ids for every current study in Metabolomics Workbench.

Parameters:	base_url (str) – Base url to Metabolomics Workbench REST API.
Returns:	List of every available Metabolomics Workbench study identifier.
Return type:	`list`

mwtab.mwrest.generate_mwtab_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', output_format='txt')[source]¶

Method for generating URLS to be used to retrieve mwtab files for analyses and studies through the REST API of the Metabolomics Workbench database.

Parameters:	input_items (list) – List of Metabolomics Workbench input values for mwTab files. base_url (str) – Base url to Metabolomics Workbench REST API. output_format (str) – Output format for the mwTab files to be retrieved in.
Returns:	Metabolomics Workbench REST URL string(s).
Return type:	`str`

mwtab.mwrest.generate_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', **kwds)[source]¶

Method for creating a generator which yields validated Metabolomics Workbench REST urls.

Parameters:	input_items (list) – List of Metabolomics Workbench input values for mwTab files. base_url (str) – Base url to Metabolomics Workbench REST API. kwds (dict) – Keyword arguments of Metabolomics Workbench URL Path items.
Returns:	Metabolomics Workbench REST URL string(s).
Return type:	`str`

class mwtab.mwrest.GenericMWURL(rest_params, base_url='https://www.metabolomicsworkbench.org/rest/')[source]¶

GenericMWURL class that stores and validates parameters specifying a Metabolomics Workbench REST URL.

Metabolomics REST API requests are performed using URL requests in the form of

https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification

where:

if context = “study” | “compound” | “refmet” | “gene” | “protein”

input_specification = input_item/input_value output_specification = output_item/[output_format]

elif context = “moverz”

input_specification = input_item/input_value1/input_value2/input_value3: input_item = “LIPIDS” | “MB” | “REFMET” input_value1 = m/z_value input_value2 = ion_type_value input_value3 = m/z_tolerance_value
output_specification = output_format: output_format = “txt”

elif context = “exactmass”

input_specification = input_item/input_value1/input_value2: input_item = “LIPIDS” | “MB” | “REFMET” input_value1 = LIPID_abbreviation input_value2 = ion_type_value

output_specification = None

class mwtab.mwrest.MWRESTFile(source)[source]¶

MWRESTFile class that stores data from a single file download through Metabolomics Workbench’s REST API.

Mirrors MWTabFile.

read(filehandle)[source]¶

Read data into a MWRESTFile instance.

Parameters:	filehandle (`io.TextIOWrapper`, `gzip.GzipFile`, `bz2.BZ2File`, `zipfile.ZipFile`) – file-like object.
Returns:	None
Return type:	`None`

write(filehandle)[source]¶

Write MWRESTFile data into file.

Parameters:	filehandle (`io.TextIOWrapper`) – file-like object.
Returns:	None
Return type:	`None`

mwtab.mwextract¶

This module provides a number of functions and classes for extracting and saving data and metadata stored in mwTab formatted files in the form of MWTabFile.

class mwtab.mwextract.ItemMatcher(full_key, value_comparison)[source]¶: ItemMatcher class that can be called to match items from mwTab formatted files in the form of MWTabFile.

class mwtab.mwextract.ReGeXMatcher(full_key, value_comparison)[source]¶: ReGeXMatcher class that can be called to match items from mwTab formatted files in the form of MWTabFile using regular expressions.

mwtab.mwextract.generate_matchers(items)[source]¶

Construct a generator that yields Matchers ItemMatcher or ReGeXMatcher.

Parameters:	items (iterable) – Iterable object containing key value pairs to match.
Returns:	Yields a Matcher object for each given item.
Return type:	`ItemMatcher` or `ReGeXMatcher`

mwtab.mwextract.extract_metabolites(sources, matchers)[source]¶

Extract metabolite data from mwTab formatted files in the form of MWTabFile.

Parameters:	sources (generator) – Generator of mwtab file objects (`MWTabFile`). matchers (generator) – Generator of matcher objects (`ItemMatcher` or

ReGeXMatcher). :return: Extracted metabolites dictionary. :rtype: dict

mwtab.mwextract.extract_metadata(mwtabfile, keys)[source]¶

Extract metadata data from mwTab formatted files in the form of MWTabFile.

Parameters:	mwtabfile (`MWTabFile`) – mwTab file object for metadata to be extracted from. keys (list) – List of metadata field keys for metadata values to be extracted.
Returns:	Extracted metadata dictionary.
Return type:	`dict`

mwtab.mwextract.write_metadata_csv(to_path, extracted_values, no_header=False)[source]¶

Write extracted metadata dict into csv file.

Example: “metadata”,”value1”,”value2” “SUBJECT_TYPE”,”Human”,”Plant”

Parameters:	to_path (str) – Path to output file. extracted_values (dict) – Metadata dictionary to be saved. no_header (bool) – If true header is not included, otherwise header is included.
Returns:	None
Return type:	`None`

mwtab.mwextract.write_metabolites_csv(to_path, extracted_values, no_header=False)[source]¶

Write extracted metabolites data dict into csv file.

Example: “metabolite_name”,”num-studies”,”num_analyses”,”num_samples” “1,2,4-benzenetriol”,”1”,”1”,”24” “1-monostearin”,”1”,”1”,”24” …

Parameters:	to_path (str) – Path to output file. extracted_values (dict) – Metabolites data dictionary to be saved. no_header (bool) – If true header is not included, otherwise header is included.
Returns:	None
Return type:	`None`

class mwtab.mwextract.SetEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶

SetEncoder class for encoding Python sets set into json serializable objects list.

default(obj)[source]¶

Method for encoding Python objects. If object passed is a set, converts the set to JSON serializable lists or calls base implementation.

Parameters:	obj (object) – Python object to be json encoded.
Returns:	JSON serializable object.
Return type:	`dict`, `list`, `tuple`, `str`, `int`, `float`, `bool`, or `None`

mwtab.mwextract.write_json(to_path, extracted_dict)[source]¶

Write extracted data or metadata dict into json file.

Metabolites example: {

“1,2,4-benzenetriol”: {

“ST000001”: {

“AN000001”: [

“LabF_115816”, …

]

}

}

}

Metadata example: {

“SUBJECT_TYPE”: [

“Plant”, “Human”

]

}

Parameters:	to_path (str) – Path to output file. extracted_dict (dict) – Metabolites data or metadata dictionary to be saved.
Returns:	None
Return type:	`None`

mwtab.mwschema¶

This module provides schema definitions for different sections of the mwTab Metabolomics Workbench format.

mwtab.mwschema.metabolomics_workbench_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.project_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.study_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.analysis_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.subject_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.subject_sample_factors_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.collection_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.treatment_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.sampleprep_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.chromatography_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.ms_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.nmr_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.ms_metabolite_data_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.

mwtab.mwschema.nmr_binned_data_schema¶: Entry point of the library, use this class to instantiate validation schema for the data that will be validated.