The mwtab API Reference
Routines for working with mwTab format files used by the
Metabolomics Workbench.
This package includes the following modules:
- mwtab: This module provides the MWTabFile class, which is a Python dictionary representation of a Metabolomics Workbench mwTab file. Data can be accessed directly from the MWTabFile instance using bracket accessors.
- cli: This module provides the command-line interface for the mwtab package.
- tokenizer: This module provides the tokenizer() generator that generates tuples of key-value pairs from mwTab files.
- fileio: This module provides the read_files() generator to open files from different sources (single file or multiple files on a local machine, directory or archive of files, URL address of a file).
- converter: This module provides the Converter class that is responsible for the conversion of mwTab formatted files into their JSON representation and vice versa.
- mwschema: This module provides JSON schema definitions for the mwTab formatted files, i.e. specifies required and optional keys as well as data types.
- validator: This module provides routines to validate mwTab formatted files based on schema definitions as well as checks for file self-consistency.
- mwrest: This module provides the GenericMWURL class, which is a Python dictionary representation of a Metabolomics Workbench REST URL. The class is used to validate query parameters and to generate a URL path which can be used to request data from Metabolomics Workbench through their REST API.
mwtab.mwtab
This module provides the MWTabFile class
that stores the data from a single mwTab formatted file in the
form of an OrderedDict. Data can be accessed
directly from the MWTabFile instance using
bracket accessors.
The data is divided into a series of “sections”, each of which contains a
number of “key-value”-like pairs. The file also contains a specially
formatted SUBJECT_SAMPLE_FACTORS block and blocks of data between
*_START and *_END.
class mwtab.mwtab.MWTabFile(source, *args, **kwds)
    MWTabFile class that stores data from a single mwTab formatted file in the form of collections.OrderedDict.

    read(filehandle)
        Read data into a MWTabFile instance.
        Parameters: filehandle (io.TextIOWrapper, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) – file-like object.
        Returns: None
        Return type: None

    write(filehandle, file_format)
        Write MWTabFile data into file.
        Parameters:
            - filehandle (io.TextIOWrapper) – file-like object.
            - file_format (str) – Format to use to write data: mwtab or json.
        Returns: None
        Return type: None

    writestr(file_format)
        Write MWTabFile data into string.
        Parameters: file_format (str) – Format to use to write data: mwtab or json.
        Returns: String representing the MWTabFile instance.
        Return type: str

    print_file(f=sys.stdout, file_format='mwtab')
        Print MWTabFile into a file or stdout.
        Parameters:
            - f (io.StringIO) – writable file-like stream.
            - file_format (str) – Format to use: mwtab or json.
        Returns: None
        Return type: None

    print_subject_sample_factors(section_key, f=sys.stdout, file_format='mwtab')
        Print mwtab SUBJECT_SAMPLE_FACTORS section into a file or stdout.
        Parameters:
            - section_key (str) – Section name.
            - f (io.StringIO) – writable file-like stream.
            - file_format (str) – Format to use: mwtab or json.
        Returns: None
        Return type: None

    print_block(section_key, f=sys.stdout, file_format='mwtab')
        Print mwtab section into a file or stdout.
        Parameters:
            - section_key (str) – Section name.
            - f (io.StringIO) – writable file-like stream.
            - file_format (str) – Format to use: mwtab or json.
        Returns: None
        Return type: None
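The section and key-value layout described for MWTabFile can be pictured with a plain collections.OrderedDict. The snippet below is only a sketch of that layout and does not use the mwtab package: the variable mwfile and the helper iter_key_values are hypothetical, and the values are made up. The section and key names are taken from the schema listing in the validator section of this page.

```python
from collections import OrderedDict

# Hypothetical stand-in for a parsed mwTab file. A real MWTabFile is filled by
# reading a file; this dictionary only mimics the nested layout.
mwfile = OrderedDict()
mwfile["PROJECT"] = OrderedDict(
    [("PROJECT_TITLE", "Example project"), ("INSTITUTE", "Example University")]
)
mwfile["SUBJECT"] = OrderedDict([("SUBJECT_TYPE", "Plant")])
# The SUBJECT_SAMPLE_FACTORS block is a list of records rather than key-value pairs.
mwfile["SUBJECT_SAMPLE_FACTORS"] = [
    {"Subject ID": "-", "Sample ID": "S1", "Factors": {"Treatment": "Control"}},
]


def iter_key_values(data):
    """Yield (section, key, value) for every section that holds key-value pairs."""
    for section, content in data.items():
        if isinstance(content, dict):
            for key, value in content.items():
                yield section, key, value


# Bracket accessors: first the section name, then the key inside that section.
subject_type = mwfile["SUBJECT"]["SUBJECT_TYPE"]
flat = list(iter_key_values(mwfile))
```

Because the container is an ordered dictionary, iterating over it returns sections in the order in which they were inserted.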
The mwtab command-line interface

Usage:
    mwtab -h | --help
    mwtab --version
    mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--validate] [--mw-rest=<url>] [--verbose]
    mwtab validate <from-path> [--mw-rest=<url>] [--verbose]
    mwtab download url <url> [--to-path=<path>] [--verbose]
    mwtab download study all [--to-path=<path>] [--input-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
    mwtab download study <input-value> [--to-path=<path>] [--input-item=<item>] [--output-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
    mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [--output-format=<format>] [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab extract metadata <from-path> <to-path> <key> ... [--to-format=<format>] [--no-header]
    mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]

Options:
    -h, --help                  Show this screen.
    --version                   Show version.
    --verbose                   Print which files are being processed.
    --validate                  Validate the mwTab file.
    --from-format=<format>      Input file format, available formats: mwtab, json [default: mwtab].
    --to-format=<format>        Output file format [default: json].
                                Available formats for convert: mwtab, json.
                                Available formats for extract: json, csv.
    --mw-rest=<url>             URL to MW REST interface [default: https://www.metabolomicsworkbench.org/rest/].
    --context=<context>         Type of resource to access from MW REST interface, available contexts: study, compound, refmet, gene, protein, moverz, exactmass [default: study].
    --input-item=<item>         Item to search Metabolomics Workbench with.
    --output-item=<item>        Item to be retrieved from Metabolomics Workbench.
    --output-format=<format>    Format for item to be retrieved in, available formats: mwtab, json.
    --no-header                 Omit the header at the top of csv formatted files.

For extraction, <to-path> can take a "-", which will use stdout.

mwtab.cli.cli(cmdargs)
    Implements the command line interface.
    Parameters: cmdargs (dict) – dictionary of command line arguments.
mwtab.tokenizer
This module provides the tokenizer() lexical analyzer for
mwTab format syntax. It is implemented as a Python generator-based state
machine which generates (yields) tokens one at a time when next()
is invoked on a tokenizer() instance.
Each token is a tuple of “key-value”-like pairs, a tuple of
SUBJECT_SAMPLE_FACTORS, or a tuple of data deposited between
*_START and *_END blocks.

mwtab.tokenizer.tokenizer(text)
    A lexical analyzer for mwTab formatted files.
    Parameters: text (str) – mwTab formatted text.
    Returns: Tuples of data.
    Return type: collections.namedtuple
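The generator-based state machine described above can be illustrated with a much smaller, hypothetical tokenizer. The sketch below is not the mwtab implementation and assumes a simplified syntax: section headers start with "#", key-value lines are tab-separated, and rows between a line ending in _START and a line ending in _END are data. The names sketch_tokenizer and KeyValue and the "#SECTION", "#BLOCK_START", "#DATA" and "#BLOCK_END" markers are invented for this example.

```python
from collections import namedtuple

# Hypothetical token type; the token layout of the real tokenizer may differ.
KeyValue = namedtuple("KeyValue", ["key", "value"])


def sketch_tokenizer(text):
    """Yield one KeyValue token per non-empty line of simplified mwTab-like text."""
    in_block = False  # state: are we between *_START and *_END?
    for line in text.splitlines():
        line = line.rstrip()
        if not line:
            continue
        if line.endswith("_START"):
            in_block = True
            yield KeyValue("#BLOCK_START", line)
        elif line.endswith("_END"):
            in_block = False
            yield KeyValue("#BLOCK_END", line)
        elif in_block:
            # Inside a data block every line is a tab-separated row.
            yield KeyValue("#DATA", tuple(line.split("\t")))
        elif line.startswith("#"):
            yield KeyValue("#SECTION", line[1:])
        else:
            key, _, value = line.partition("\t")
            yield KeyValue(key.strip(), value.strip())


sample = "#PROJECT\nPROJECT_TITLE\tExample\nDATA_START\nS1\t1.5\nDATA_END\n"
tokens = sketch_tokenizer(sample)
first = next(tokens)  # tokens are produced lazily, one per next() call
rest = list(tokens)
```

Because the function is a generator, no line is examined until the caller asks for the next token, which is the behaviour the module description refers to.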
mwtab.fileio
This module provides routines for reading mwTab formatted files
from different kinds of sources:
- Single mwTab formatted file on a local machine.
- Directory containing multiple mwTab formatted files.
- Compressed zip/tar archive of mwTab formatted files.
- URL address of mwTab formatted file.
- ANALYSIS_ID of mwTab formatted file.

mwtab.fileio.read_files(*sources, **kwds)
    Construct a generator that yields file instances.
    Parameters: sources – One or more strings representing path to file(s).
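One way to picture how a reader could tell the listed source kinds apart is the sketch below. It is not the logic of read_files(); classify_source and iter_sources are hypothetical helpers built only on the standard library, and anything that is neither a URL nor an existing path is simply labelled "other" (an ANALYSIS_ID would fall into that group here).

```python
import os
import tarfile
import tempfile
import zipfile
from urllib.parse import urlparse


def classify_source(source):
    """Return a label describing what kind of source a string points at."""
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    if os.path.isdir(source):
        return "directory"
    if os.path.isfile(source):
        if zipfile.is_zipfile(source):
            return "zip archive"
        if tarfile.is_tarfile(source):
            return "tar archive"
        return "file"
    return "other"


def iter_sources(*sources):
    """Generator that yields (source, label) pairs one at a time."""
    for source in sources:
        yield source, classify_source(source)


# Small demonstration on temporary files so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "example.txt")
    with open(text_path, "w") as handle:
        handle.write("#PROJECT\n")
    zip_path = os.path.join(tmp, "example.zip")
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.write(text_path, "example.txt")
    labels = [
        label
        for _, label in iter_sources(
            text_path, tmp, zip_path, "https://example.org/file.txt", "AN000001"
        )
    ]
```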
mwtab.converter
This module provides functionality for converting between the
Metabolomics Workbench mwTab formatted file and its equivalent
JSONized representation.
The following conversions are possible:
- Local files:
- One-to-one file conversions:
- textfile - to - textfile
- textfile - to - textfile.gz
- textfile - to - textfile.bz2
- textfile.gz - to - textfile
- textfile.gz - to - textfile.gz
- textfile.gz - to - textfile.bz2
- textfile.bz2 - to - textfile
- textfile.bz2 - to - textfile.gz
- textfile.bz2 - to - textfile.bz2
- textfile / textfile.gz / textfile.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
- Many-to-many files conversions:
- Directories:
- directory - to - directory
- directory - to - directory.zip
- directory - to - directory.tar
- directory - to - directory.tar.bz2
- directory - to - directory.tar.gz
- directory - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- Zipfiles:
- zipfile.zip - to - directory
- zipfile.zip - to - zipfile.zip
- zipfile.zip - to - tarfile.tar
- zipfile.zip - to - tarfile.tar.gz
- zipfile.zip - to - tarfile.tar.bz2
- zipfile.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- Tarfiles:
- tarfile.tar - to - directory
- tarfile.tar - to - zipfile.zip
- tarfile.tar - to - tarfile.tar
- tarfile.tar - to - tarfile.tar.gz
- tarfile.tar - to - tarfile.tar.bz2
- tarfile.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- tarfile.tar.gz - to - directory
- tarfile.tar.gz - to - zipfile.zip
- tarfile.tar.gz - to - tarfile.tar
- tarfile.tar.gz - to - tarfile.tar.gz
- tarfile.tar.gz - to - tarfile.tar.bz2
- tarfile.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- tarfile.tar.bz2 - to - directory
- tarfile.tar.bz2 - to - zipfile.zip
- tarfile.tar.bz2 - to - tarfile.tar
- tarfile.tar.bz2 - to - tarfile.tar.gz
- tarfile.tar.bz2 - to - tarfile.tar.bz2
- tarfile.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- URL files:
- One-to-one file conversions:
- analysis_id - to - textfile
- analysis_id - to - textfile.gz
- analysis_id - to - textfile.bz2
- analysis_id - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
- textfileurl - to - textfile
- textfileurl - to - textfile.gz
- textfileurl - to - textfile.bz2
- textfileurl.gz - to - textfile
- textfileurl.gz - to - textfile.gz
- textfileurl.gz - to - textfile.bz2
- textfileurl.bz2 - to - textfile
- textfileurl.bz2 - to - textfile.gz
- textfileurl.bz2 - to - textfile.bz2
- textfileurl / textfileurl.gz / textfileurl.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
- Many-to-many files conversions:
- Zipfiles:
- zipfileurl.zip - to - directory
- zipfileurl.zip - to - zipfile.zip
- zipfileurl.zip - to - tarfile.tar
- zipfileurl.zip - to - tarfile.tar.gz
- zipfileurl.zip - to - tarfile.tar.bz2
- zipfileurl.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- Tarfiles:
- tarfileurl.tar - to - directory
- tarfileurl.tar - to - zipfile.zip
- tarfileurl.tar - to - tarfile.tar
- tarfileurl.tar - to - tarfile.tar.gz
- tarfileurl.tar - to - tarfile.tar.bz2
- tarfileurl.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- tarfileurl.tar.gz - to - directory
- tarfileurl.tar.gz - to - zipfile.zip
- tarfileurl.tar.gz - to - tarfile.tar
- tarfileurl.tar.gz - to - tarfile.tar.gz
- tarfileurl.tar.gz - to - tarfile.tar.bz2
- tarfileurl.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- tarfileurl.tar.bz2 - to - directory
- tarfileurl.tar.bz2 - to - zipfile.zip
- tarfileurl.tar.bz2 - to - tarfile.tar
- tarfileurl.tar.bz2 - to - tarfile.tar.gz
- tarfileurl.tar.bz2 - to - tarfile.tar.bz2
- tarfileurl.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
class mwtab.converter.Translator(from_path, to_path, from_format=None, to_format=None, validate=False)
    Translator abstract class.

class mwtab.converter.MWTabFileToMWTabFile(from_path, to_path, from_format=None, to_format=None, validate=False)
    Translator concrete class that can convert between mwTab and JSON formats.
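The conversion list for this module marks two combinations as errors: a single file written to an archive (one-to-many) and a collection written to a single compressed file (many-to-one). A minimal sketch of such a check, assuming the decision is made from file suffixes alone, is shown below. The function check_conversion and its from_is_many flag are hypothetical and are not part of the converter module.

```python
# Suffixes taken from the conversion list; how the real converter detects
# them is not shown on this page, so suffix matching is an assumption.
SINGLE_FILE_SUFFIXES = (".gz", ".bz2")
ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tar.bz2")


def is_archive(path):
    """True for targets that can hold many files."""
    return path.endswith(ARCHIVE_SUFFIXES)


def is_compressed_single(path):
    """True for targets that can hold exactly one compressed file."""
    return path.endswith(SINGLE_FILE_SUFFIXES) and not is_archive(path)


def check_conversion(from_path, to_path, from_is_many):
    """Return the conversion kind or raise TypeError for unsupported pairs.

    from_is_many is a hypothetical flag: True when the source is a directory,
    zip file or tar file, False when it is a single (possibly compressed) file.
    """
    if from_is_many and is_compressed_single(to_path):
        raise TypeError("Many-to-one conversion")
    if not from_is_many and is_archive(to_path):
        raise TypeError("One-to-many conversion")
    return "many-to-many" if from_is_many else "one-to-one"


outcomes = [
    check_conversion("textfile", "textfile.gz", False),
    check_conversion("tarfile.tar.gz", "zipfile.zip", True),
]
```

The suffix test treats ".tar.gz" and ".tar.bz2" as archives before it considers ".gz" and ".bz2", which keeps a tar archive from being mistaken for a single compressed file.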
mwtab.validator
This module contains routines to validate consistency of the mwTab
formatted files, e.g. make sure that Samples and Factors
identifiers are consistent across the file, make sure that all
required key-value pairs are present.
-
mwtab.validator.validate_file(mwtabfile, section_schema_mapping={'ANALYSIS': Schema({'ANALYSIS_TYPE': <class 'str'>, Optional('LABORATORY_NAME'): <class 'str'>, Optional('OPERATOR_NAME'): <class 'str'>, Optional('DETECTOR_TYPE'): <class 'str'>, Optional('SOFTWARE_VERSION'): <class 'str'>, Optional('ACQUISITION_DATE'): <class 'str'>, Optional('ANALYSIS_PROTOCOL_FILE'): <class 'str'>, Optional('ACQUISITION_PARAMETERS_FILE'): <class 'str'>, Optional('PROCESSING_PARAMETERS_FILE'): <class 'str'>, Optional('DATA_FORMAT'): <class 'str'>, Optional('ACQUISITION_ID'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('ANALYSIS_COMMENTS'): <class 'str'>, Optional('ANALYSIS_DISPLAY'): <class 'str'>, Optional('INSTRUMENT_NAME'): <class 'str'>, Optional('INSTRUMENT_PARAMETERS_FILE'): <class 'str'>, Optional('NUM_FACTORS'): <class 'str'>, Optional('NUM_METABOLITES'): <class 'str'>, Optional('PROCESSED_FILE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('RAW_FILE'): <class 'str'>}), 'CHROMATOGRAPHY': Schema({Optional('CHROMATOGRAPHY_SUMMARY'): <class 'str'>, 'CHROMATOGRAPHY_TYPE': <class 'str'>, 'INSTRUMENT_NAME': <class 'str'>, 'COLUMN_NAME': <class 'str'>, Optional('FLOW_GRADIENT'): <class 'str'>, Optional('FLOW_RATE'): <class 'str'>, Optional('COLUMN_TEMPERATURE'): <class 'str'>, Optional('METHODS_FILENAME'): <class 'str'>, Optional('SOLVENT_A'): <class 'str'>, Optional('SOLVENT_B'): <class 'str'>, Optional('METHODS_ID'): <class 'str'>, Optional('COLUMN_PRESSURE'): <class 'str'>, Optional('INJECTION_TEMPERATURE'): <class 'str'>, Optional('INTERNAL_STANDARD'): <class 'str'>, Optional('INTERNAL_STANDARD_MT'): <class 'str'>, Optional('RETENTION_INDEX'): <class 'str'>, Optional('RETENTION_TIME'): <class 'str'>, Optional('SAMPLE_INJECTION'): <class 'str'>, Optional('SAMPLING_CONE'): <class 'str'>, Optional('ANALYTICAL_TIME'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('MIGRATION_TIME'): <class 'str'>, 
Optional('OVEN_TEMPERATURE'): <class 'str'>, Optional('PRECONDITIONING'): <class 'str'>, Optional('RUNNING_BUFFER'): <class 'str'>, Optional('RUNNING_VOLTAGE'): <class 'str'>, Optional('SHEATH_LIQUID'): <class 'str'>, Optional('TIME_PROGRAM'): <class 'str'>, Optional('TRANSFERLINE_TEMPERATURE'): <class 'str'>, Optional('WASHING_BUFFER'): <class 'str'>, Optional('WEAK_WASH_SOLVENT_NAME'): <class 'str'>, Optional('WEAK_WASH_VOLUME'): <class 'str'>, Optional('STRONG_WASH_SOLVENT_NAME'): <class 'str'>, Optional('STRONG_WASH_VOLUME'): <class 'str'>, Optional('TARGET_SAMPLE_TEMPERATURE'): <class 'str'>, Optional('SAMPLE_LOOP_SIZE'): <class 'str'>, Optional('SAMPLE_SYRINGE_SIZE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('CHROMATOGRAPHY_COMMENTS'): <class 'str'>}), 'COLLECTION': Schema({'COLLECTION_SUMMARY': <class 'str'>, Optional('COLLECTION_PROTOCOL_ID'): <class 'str'>, Optional('COLLECTION_PROTOCOL_FILENAME'): <class 'str'>, Optional('COLLECTION_PROTOCOL_COMMENTS'): <class 'str'>, Optional('SAMPLE_TYPE'): <class 'str'>, Optional('COLLECTION_METHOD'): <class 'str'>, Optional('COLLECTION_LOCATION'): <class 'str'>, Optional('COLLECTION_FREQUENCY'): <class 'str'>, Optional('COLLECTION_DURATION'): <class 'str'>, Optional('COLLECTION_TIME'): <class 'str'>, Optional('VOLUMEORAMOUNT_COLLECTED'): <class 'str'>, Optional('STORAGE_CONDITIONS'): <class 'str'>, Optional('COLLECTION_VIALS'): <class 'str'>, Optional('STORAGE_VIALS'): <class 'str'>, Optional('COLLECTION_TUBE_TEMP'): <class 'str'>, Optional('ADDITIVES'): <class 'str'>, Optional('BLOOD_SERUM_OR_PLASMA'): <class 'str'>, Optional('TISSUE_CELL_IDENTIFICATION'): <class 'str'>, Optional('TISSUE_CELL_QUANTITY_TAKEN'): <class 'str'>}), 'METABOLOMICS WORKBENCH': Schema({'VERSION': <class 'str'>, 'CREATED_ON': <class 'str'>, Optional('STUDY_ID'): <class 'str'>, Optional('ANALYSIS_ID'): <class 'str'>, Optional('PROJECT_ID'): <class 'str'>, Optional('HEADER'): <class 'str'>, 
Optional('DATATRACK_ID'): <class 'str'>}), 'MS': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'MS_TYPE': <class 'str'>, 'ION_MODE': <class 'str'>, Optional('MS_COMMENTS'): <class 'str'>, Optional('CAPILLARY_TEMPERATURE'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('COLLISION_ENERGY'): <class 'str'>, Optional('COLLISION_GAS'): <class 'str'>, Optional('DRY_GAS_FLOW'): <class 'str'>, Optional('DRY_GAS_TEMP'): <class 'str'>, Optional('FRAGMENT_VOLTAGE'): <class 'str'>, Optional('FRAGMENTATION_METHOD'): <class 'str'>, Optional('GAS_PRESSURE'): <class 'str'>, Optional('HELIUM_FLOW'): <class 'str'>, Optional('ION_SOURCE_TEMPERATURE'): <class 'str'>, Optional('ION_SPRAY_VOLTAGE'): <class 'str'>, Optional('IONIZATION'): <class 'str'>, Optional('IONIZATION_ENERGY'): <class 'str'>, Optional('IONIZATION_POTENTIAL'): <class 'str'>, Optional('MASS_ACCURACY'): <class 'str'>, Optional('PRECURSOR_TYPE'): <class 'str'>, Optional('REAGENT_GAS'): <class 'str'>, Optional('SOURCE_TEMPERATURE'): <class 'str'>, Optional('SPRAY_VOLTAGE'): <class 'str'>, Optional('ACTIVATION_PARAMETER'): <class 'str'>, Optional('ACTIVATION_TIME'): <class 'str'>, Optional('ATOM_GUN_CURRENT'): <class 'str'>, Optional('AUTOMATIC_GAIN_CONTROL'): <class 'str'>, Optional('BOMBARDMENT'): <class 'str'>, Optional('CDL_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('CDL_TEMPERATURE'): <class 'str'>, Optional('DATAFORMAT'): <class 'str'>, Optional('DESOLVATION_GAS_FLOW'): <class 'str'>, Optional('DESOLVATION_TEMPERATURE'): <class 'str'>, Optional('INTERFACE_VOLTAGE'): <class 'str'>, Optional('IT_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('LASER'): <class 'str'>, Optional('MATRIX'): <class 'str'>, Optional('NEBULIZER'): <class 'str'>, Optional('OCTPOLE_VOLTAGE'): <class 'str'>, Optional('PROBE_TIP'): <class 'str'>, Optional('RESOLUTION_SETTING'): <class 'str'>, Optional('SAMPLE_DRIPPING'): <class 'str'>, Optional('SCAN_RANGE_MOVERZ'): <class 
'str'>, Optional('SCANNING'): <class 'str'>, Optional('SCANNING_CYCLE'): <class 'str'>, Optional('SCANNING_RANGE'): <class 'str'>, Optional('SKIMMER_VOLTAGE'): <class 'str'>, Optional('TUBE_LENS_VOLTAGE'): <class 'str'>, Optional('MS_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'MS_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'NM': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'NMR_EXPERIMENT_TYPE': <class 'str'>, Optional('NMR_COMMENTS'): <class 'str'>, Optional('FIELD_FREQUENCY_LOCK'): <class 'str'>, Optional('STANDARD_CONCENTRATION'): <class 'str'>, 'SPECTROMETER_FREQUENCY': <class 'str'>, Optional('NMR_PROBE'): <class 'str'>, Optional('NMR_SOLVENT'): <class 'str'>, Optional('NMR_TUBE_SIZE'): <class 'str'>, Optional('SHIMMING_METHOD'): <class 'str'>, Optional('PULSE_SEQUENCE'): <class 'str'>, Optional('WATER_SUPPRESSION'): <class 'str'>, Optional('PULSE_WIDTH'): <class 'str'>, Optional('POWER_LEVEL'): <class 'str'>, Optional('RECEIVER_GAIN'): <class 'str'>, Optional('OFFSET_FREQUENCY'): <class 'str'>, Optional('PRESATURATION_POWER_LEVEL'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_CPD'): <class 'str'>, Optional('TEMPERATURE'): <class 'str'>, Optional('NUMBER_OF_SCANS'): <class 'str'>, Optional('DUMMY_SCANS'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('RELAXATION_DELAY'): <class 'str'>, Optional('SPECTRAL_WIDTH'): <class 'str'>, Optional('NUM_DATA_POINTS_ACQUIRED'): <class 'str'>, Optional('REAL_DATA_POINTS'): <class 'str'>, Optional('LINE_BROADENING'): <class 'str'>, Optional('ZERO_FILLING'): <class 'str'>, Optional('APODIZATION'): 
<class 'str'>, Optional('BASELINE_CORRECTION_METHOD'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_STD'): <class 'str'>, Optional('BINNED_INCREMENT'): <class 'str'>, Optional('BINNED_DATA_NORMALIZATION_METHOD'): <class 'str'>, Optional('BINNED_DATA_PROTOCOL_FILE'): <class 'str'>, Optional('BINNED_DATA_CHEMICAL_SHIFT_RANGE'): <class 'str'>, Optional('BINNED_DATA_EXCLUDED_RANGE'): <class 'str'>, Optional('NMR_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'NMR_BINNED_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}])}), 'NMR_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'PROJECT': Schema({'PROJECT_TITLE': <class 'str'>, Optional('PROJECT_TYPE'): <class 'str'>, 'PROJECT_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('FUNDING_SOURCE'): <class 'str'>, Optional('PROJECT_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('CONTRIBUTORS'): <class 'str'>, Optional('DOI'): <class 'str'>}), 'SAMPLEPREP': Schema({'SAMPLEPREP_SUMMARY': <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_ID'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_FILENAME'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_COMMENTS'): <class 'str'>, Optional('PROCESSING_METHOD'): <class 'str'>, Optional('PROCESSING_STORAGE_CONDITIONS'): <class 'str'>, Optional('EXTRACTION_METHOD'): <class 'str'>, 
Optional('EXTRACT_CONCENTRATION_DILUTION'): <class 'str'>, Optional('EXTRACT_ENRICHMENT'): <class 'str'>, Optional('EXTRACT_CLEANUP'): <class 'str'>, Optional('EXTRACT_STORAGE'): <class 'str'>, Optional('SAMPLE_RESUSPENSION'): <class 'str'>, Optional('SAMPLE_DERIVATIZATION'): <class 'str'>, Optional('SAMPLE_SPIKING'): <class 'str'>, Optional('ORGAN'): <class 'str'>, Optional('ORGAN_SPECIFICATION'): <class 'str'>, Optional('CELL_TYPE'): <class 'str'>, Optional('SUBCELLULAR_LOCATION'): <class 'str'>}), 'STUDY': Schema({'STUDY_TITLE': <class 'str'>, Optional('STUDY_TYPE'): <class 'str'>, 'STUDY_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('NUM_GROUPS'): <class 'str'>, Optional('TOTAL_SUBJECTS'): <class 'str'>, Optional('NUM_MALES'): <class 'str'>, Optional('NUM_FEMALES'): <class 'str'>, Optional('STUDY_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('SUBMIT_DATE'): <class 'str'>}), 'SUBJECT': Schema({'SUBJECT_TYPE': <class 'str'>, 'SUBJECT_SPECIES': <class 'str'>, Optional('TAXONOMY_ID'): <class 'str'>, Optional('GENOTYPE_STRAIN'): <class 'str'>, Optional('AGE_OR_AGE_RANGE'): <class 'str'>, Optional('WEIGHT_OR_WEIGHT_RANGE'): <class 'str'>, Optional('HEIGHT_OR_HEIGHT_RANGE'): <class 'str'>, Optional('GENDER'): <class 'str'>, Optional('HUMAN_RACE'): <class 'str'>, Optional('HUMAN_ETHNICITY'): <class 'str'>, Optional('HUMAN_TRIAL_TYPE'): <class 'str'>, Optional('HUMAN_LIFESTYLE_FACTORS'): <class 'str'>, Optional('HUMAN_MEDICATIONS'): <class 'str'>, Optional('HUMAN_PRESCRIPTION_OTC'): <class 'str'>, Optional('HUMAN_SMOKING_STATUS'): <class 'str'>, Optional('HUMAN_ALCOHOL_DRUG_USE'): <class 'str'>, Optional('HUMAN_NUTRITION'): <class 'str'>, Optional('HUMAN_INCLUSION_CRITERIA'): <class 'str'>, 
Optional('HUMAN_EXCLUSION_CRITERIA'): <class 'str'>, Optional('ANIMAL_ANIMAL_SUPPLIER'): <class 'str'>, Optional('ANIMAL_HOUSING'): <class 'str'>, Optional('ANIMAL_LIGHT_CYCLE'): <class 'str'>, Optional('ANIMAL_FEED'): <class 'str'>, Optional('ANIMAL_WATER'): <class 'str'>, Optional('ANIMAL_INCLUSION_CRITERIA'): <class 'str'>, Optional('CELL_BIOSOURCE_OR_SUPPLIER'): <class 'str'>, Optional('CELL_STRAIN_DETAILS'): <class 'str'>, Optional('SUBJECT_COMMENTS'): <class 'str'>, Optional('CELL_PRIMARY_IMMORTALIZED'): <class 'str'>, Optional('CELL_PASSAGE_NUMBER'): <class 'str'>, Optional('CELL_COUNTS'): <class 'str'>, Optional('SPECIES_GROUP'): <class 'str'>}), 'SUBJECT_SAMPLE_FACTORS': Schema([{'Subject ID': <class 'str'>, 'Sample ID': <class 'str'>, 'Factors': <class 'dict'>, Optional('Additional sample data'): {Optional('RAW_FILE_NAME'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}}]), 'TREATMENT': Schema({'TREATMENT_SUMMARY': <class 'str'>, Optional('TREATMENT_PROTOCOL_ID'): <class 'str'>, Optional('TREATMENT_PROTOCOL_FILENAME'): <class 'str'>, Optional('TREATMENT_PROTOCOL_COMMENTS'): <class 'str'>, Optional('TREATMENT'): <class 'str'>, Optional('TREATMENT_COMPOUND'): <class 'str'>, Optional('TREATMENT_ROUTE'): <class 'str'>, Optional('TREATMENT_DOSE'): <class 'str'>, Optional('TREATMENT_DOSEVOLUME'): <class 'str'>, Optional('TREATMENT_DOSEDURATION'): <class 'str'>, Optional('TREATMENT_VEHICLE'): <class 'str'>, Optional('ANIMAL_VET_TREATMENTS'): <class 'str'>, Optional('ANIMAL_ANESTHESIA'): <class 'str'>, Optional('ANIMAL_ACCLIMATION_DURATION'): <class 'str'>, Optional('ANIMAL_FASTING'): <class 'str'>, Optional('ANIMAL_ENDP_EUTHANASIA'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_COLL_LIST'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_PROC_METHOD'): <class 'str'>, Optional('ANIMAL_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('HUMAN_FASTING'): <class 'str'>, Optional('HUMAN_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('CELL_STORAGE'): <class 'str'>, 
Optional('CELL_GROWTH_CONTAINER'): <class 'str'>, Optional('CELL_GROWTH_CONFIG'): <class 'str'>, Optional('CELL_GROWTH_RATE'): <class 'str'>, Optional('CELL_INOC_PROC'): <class 'str'>, Optional('CELL_MEDIA'): <class 'str'>, Optional('CELL_ENVIR_COND'): <class 'str'>, Optional('CELL_HARVESTING'): <class 'str'>, Optional('PLANT_GROWTH_SUPPORT'): <class 'str'>, Optional('PLANT_GROWTH_LOCATION'): <class 'str'>, Optional('PLANT_PLOT_DESIGN'): <class 'str'>, Optional('PLANT_LIGHT_PERIOD'): <class 'str'>, Optional('PLANT_HUMIDITY'): <class 'str'>, Optional('PLANT_TEMP'): <class 'str'>, Optional('PLANT_WATERING_REGIME'): <class 'str'>, Optional('PLANT_NUTRITIONAL_REGIME'): <class 'str'>, Optional('PLANT_ESTAB_DATE'): <class 'str'>, Optional('PLANT_HARVEST_DATE'): <class 'str'>, Optional('PLANT_GROWTH_STAGE'): <class 'str'>, Optional('PLANT_METAB_QUENCH_METHOD'): <class 'str'>, Optional('PLANT_HARVEST_METHOD'): <class 'str'>, Optional('PLANT_STORAGE'): <class 'str'>, Optional('CELL_PCT_CONFLUENCE'): <class 'str'>, Optional('CELL_MEDIA_LASTCHANGED'): <class 'str'>})}, verbose=False, metabolites=True)[source]¶ Validate
mwTab formatted file.
Returns: Validated file.
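The two kinds of checks named in the module description, required keys and consistent sample identifiers, can be sketched without the package as follows. The helpers check_required_keys and check_sample_ids are hypothetical and are not the routines used by validate_file(); the sketch assumes the data layout suggested by the schema in the signature above, where each data row is a dictionary with a "Metabolite" entry plus one entry per sample.

```python
def check_required_keys(section, required):
    """Return the required keys that are missing from a section dictionary."""
    return sorted(set(required) - set(section))


def check_sample_ids(subject_sample_factors, data_rows):
    """Compare sample IDs declared in SUBJECT_SAMPLE_FACTORS with those used in data rows."""
    declared = {entry["Sample ID"] for entry in subject_sample_factors}
    used = set()
    for row in data_rows:
        # Every key except "Metabolite" is assumed to be a sample identifier.
        used.update(key for key in row if key != "Metabolite")
    return {
        "undeclared": sorted(used - declared),
        "unused": sorted(declared - used),
    }


# Illustrative data only.
factors = [
    {"Subject ID": "-", "Sample ID": "S1", "Factors": {"Treatment": "Control"}},
    {"Subject ID": "-", "Sample ID": "S2", "Factors": {"Treatment": "Drug"}},
]
rows = [{"Metabolite": "1-monostearin", "S1": "10.5", "S3": "11.2"}]

missing = check_required_keys({"SUBJECT_TYPE": "Plant"}, ["SUBJECT_TYPE", "SUBJECT_SPECIES"])
report = check_sample_ids(factors, rows)
```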
mwtab.mwrest
This module provides routines for accessing the Metabolomics Workbench REST API.
See https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf for details.

mwtab.mwrest.analysis_ids(base_url='https://www.metabolomicsworkbench.org/rest/')
    Method for retrieving a list of analysis ids for every current analysis in Metabolomics Workbench.
    Parameters: base_url (str) – Base url to Metabolomics Workbench REST API.
    Returns: List of every available Metabolomics Workbench analysis identifier.
    Return type: list

mwtab.mwrest.study_ids(base_url='https://www.metabolomicsworkbench.org/rest/')
    Method for retrieving a list of study ids for every current study in Metabolomics Workbench.
    Parameters: base_url (str) – Base url to Metabolomics Workbench REST API.
    Returns: List of every available Metabolomics Workbench study identifier.
    Return type: list

mwtab.mwrest.generate_mwtab_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', output_format='txt')
    Method for generating URLs to be used to retrieve mwtab files for analyses and studies through the REST API of the Metabolomics Workbench database.
    Returns: Metabolomics Workbench REST URL string(s).

mwtab.mwrest.generate_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', **kwds)
    Method for creating a generator which yields validated Metabolomics Workbench REST urls.
    Returns: Metabolomics Workbench REST URL string(s).
class mwtab.mwrest.GenericMWURL(rest_params, base_url='https://www.metabolomicsworkbench.org/rest/')
    GenericMWURL class that stores and validates parameters specifying a Metabolomics Workbench REST URL.

    Metabolomics REST API requests are performed using URL requests in the form of
        https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification

    where:
        if context = "study" | "compound" | "refmet" | "gene" | "protein"
            input_specification = input_item/input_value
            output_specification = output_item/[output_format]
        elif context = "moverz"
            input_specification = input_item/input_value1/input_value2/input_value3
                input_item = "LIPIDS" | "MB" | "REFMET"
                input_value1 = m/z_value
                input_value2 = ion_type_value
                input_value3 = m/z_tolerance_value
            output_specification = output_format
                output_format = "txt"
        elif context = "exactmass"
            input_specification = input_item/input_value1/input_value2
                input_item = "LIPIDS" | "MB" | "REFMET"
                input_value1 = LIPID_abbreviation
                input_value2 = ion_type_value
            output_specification = None
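The URL grammar above can be turned into a small path builder. The sketch below is not GenericMWURL; build_rest_url is a hypothetical helper that only joins and counts path segments for the first two context groups, and the exactmass context is left out for brevity. The item names and values in the usage lines ("study_id", "ST000001", "summary", "M+H") are illustrative.

```python
BASE_URL = "https://www.metabolomicsworkbench.org/rest/"
ITEM_CONTEXTS = {"study", "compound", "refmet", "gene", "protein"}
MOVERZ_ITEMS = {"LIPIDS", "MB", "REFMET"}


def build_rest_url(context, *parts, base_url=BASE_URL):
    """Join context and path parts into a REST URL after a basic shape check."""
    if context in ITEM_CONTEXTS:
        # input_item/input_value/output_item/[output_format]
        if len(parts) not in (3, 4):
            raise ValueError("expected input_item, input_value, output_item[, output_format]")
    elif context == "moverz":
        # input_item/m/z_value/ion_type_value/m/z_tolerance_value/output_format
        if len(parts) != 5:
            raise ValueError("expected input_item, three input values and output_format")
        if parts[0] not in MOVERZ_ITEMS:
            raise ValueError("input_item must be one of LIPIDS, MB, REFMET")
        if parts[4] != "txt":
            raise ValueError("output_format must be txt")
    else:
        raise ValueError("context not covered by this sketch: {}".format(context))
    return base_url + "/".join([context] + [str(part) for part in parts])


study_url = build_rest_url("study", "study_id", "ST000001", "summary")
moverz_url = build_rest_url("moverz", "LIPIDS", 513.45, "M+H", 0.2, "txt")
```

Checking only the number of segments keeps the sketch short; a fuller validator would also check each item against the values allowed for its context.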
class mwtab.mwrest.MWRESTFile(source)
    MWRESTFile class that stores data from a single file download through Metabolomics Workbench’s REST API.
    Mirrors MWTabFile.

    read(filehandle)
        Read data into a MWRESTFile instance.
        Parameters: filehandle (io.TextIOWrapper, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) – file-like object.
        Returns: None
        Return type: None

    write(filehandle)
        Write MWRESTFile data into file.
        Parameters: filehandle (io.TextIOWrapper) – file-like object.
        Returns: None
        Return type: None
mwtab.mwextract
This module provides a number of functions and classes for extracting and saving data and metadata
stored in mwTab formatted files in the form of MWTabFile.

class mwtab.mwextract.ItemMatcher(full_key, value_comparison)
    ItemMatcher class that can be called to match items from mwTab formatted files in the form of MWTabFile.

class mwtab.mwextract.ReGeXMatcher(full_key, value_comparison)
    ReGeXMatcher class that can be called to match items from mwTab formatted files in the form of MWTabFile using regular expressions.

mwtab.mwextract.generate_matchers(items)
    Construct a generator that yields Matchers (ItemMatcher or ReGeXMatcher).
    Parameters: items (iterable) – Iterable object containing key value pairs to match.
    Returns: Yields a Matcher object for each given item.
    Return type: ItemMatcher or ReGeXMatcher
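A callable matcher of the kind described above might look like the following sketch. It is not the mwextract code: the classes SketchItemMatcher and SketchRegexMatcher and the function sketch_generate_matchers are hypothetical, and the sketch assumes that full_key has the form "SECTION:KEY" and that a compiled pattern signals a regular-expression match. Neither assumption is confirmed by this page.

```python
import re


class SketchItemMatcher:
    """Callable that tests one key of one section for equality."""

    def __init__(self, full_key, value_comparison):
        # Assumed key format "SECTION:KEY"; the real format may differ.
        self.section, self.key = full_key.split(":", 1)
        self.value_comparison = value_comparison

    def __call__(self, data):
        return data.get(self.section, {}).get(self.key) == self.value_comparison


class SketchRegexMatcher(SketchItemMatcher):
    """Callable that tests one key of one section against a regular expression."""

    def __call__(self, data):
        value = data.get(self.section, {}).get(self.key)
        return value is not None and re.search(self.value_comparison, value) is not None


def sketch_generate_matchers(items):
    """Yield a matcher for each (full_key, value) pair."""
    for full_key, value in items:
        if isinstance(value, re.Pattern):
            yield SketchRegexMatcher(full_key, value)
        else:
            yield SketchItemMatcher(full_key, value)


data = {"SUBJECT": {"SUBJECT_TYPE": "Human"}}
matchers = list(
    sketch_generate_matchers(
        [("SUBJECT:SUBJECT_TYPE", "Human"), ("SUBJECT:SUBJECT_TYPE", re.compile("^Pl"))]
    )
)
results = [matcher(data) for matcher in matchers]
```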
mwtab.mwextract.extract_metabolites(sources, matchers)
    Extract metabolite data from mwTab formatted files in the form of MWTabFile.
    Parameters:
        - sources (generator) – Generator of mwtab file objects (MWTabFile).
        - matchers (generator) – Generator of matcher objects (ItemMatcher or ReGeXMatcher).
    Returns: Extracted metabolites dictionary.
    Return type: dict

mwtab.mwextract.extract_metadata(mwtabfile, keys)
    Extract metadata from mwTab formatted files in the form of MWTabFile.
    Returns: Extracted metadata dictionary.
mwtab.mwextract.write_metadata_csv(to_path, extracted_values, no_header=False)
    Write extracted metadata dict into csv file.
    Example:
        "metadata","value1","value2"
        "SUBJECT_TYPE","Human","Plant"
    Returns: None

mwtab.mwextract.write_metabolites_csv(to_path, extracted_values, no_header=False)
    Write extracted metabolites data dict into csv file.
    Example:
        "metabolite_name","num-studies","num_analyses","num_samples"
        "1,2,4-benzenetriol","1","1","24"
        "1-monostearin","1","1","24"
        ...
    Returns: None
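The metadata csv layout shown in the example can be reproduced with the standard csv module. The function below is a sketch under assumptions and not the mwextract implementation: sketch_write_metadata_csv is a hypothetical name, it writes to an open file-like object instead of a path, and it sorts the values of each key so that the output is deterministic.

```python
import csv
import io


def sketch_write_metadata_csv(handle, extracted_values, no_header=False):
    """Write {key: set_of_values} as fully quoted csv rows to an open handle."""
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    if not no_header:
        # One "valueN" column for each position in the longest row.
        longest = max((len(values) for values in extracted_values.values()), default=0)
        writer.writerow(["metadata"] + ["value{}".format(i + 1) for i in range(longest)])
    for key, values in extracted_values.items():
        writer.writerow([key] + sorted(values))


buffer = io.StringIO()
sketch_write_metadata_csv(buffer, {"SUBJECT_TYPE": {"Human", "Plant"}})
csv_text = buffer.getvalue()

bare = io.StringIO()
sketch_write_metadata_csv(bare, {"SUBJECT_TYPE": {"Human", "Plant"}}, no_header=True)
bare_text = bare.getvalue()
```

Passing an io.StringIO object makes the output easy to inspect; a real file opened with newline="" could be passed in the same way.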
class mwtab.mwextract.SetEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
    SetEncoder class for encoding Python sets (set) into json serializable objects (list).

    default(obj)
        Method for encoding Python objects. If the object passed is a set, converts the set to a JSON serializable list; otherwise calls the base implementation.
        Parameters: obj (object) – Python object to be json encoded.
        Returns: JSON serializable object.
        Return type: dict, list, tuple, str, int, float, bool, or None
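The behaviour described for default() follows the usual pattern for extending json.JSONEncoder. The class below is a sketch of that pattern under the name SketchSetEncoder, which is hypothetical; sorting the set is an addition made here to keep the output stable and is not stated in the description above.

```python
import json


class SketchSetEncoder(json.JSONEncoder):
    """JSON encoder that serializes Python sets as lists."""

    def default(self, obj):
        if isinstance(obj, set):
            # Sorting is an assumption of this sketch; it requires comparable items.
            return sorted(obj)
        # Anything else is handed to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps({"SUBJECT_TYPE": {"Plant", "Human"}}, cls=SketchSetEncoder)
```

The encoder is selected through the cls argument of json.dumps() or json.dump(), so the calling code does not need to convert sets by hand.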
mwtab.mwextract.write_json(to_path, extracted_dict)
    Write extracted data or metadata dict into json file.
    Metabolites example:
        {
            "1,2,4-benzenetriol": {
                "ST000001": {
                    "AN000001": [
                        "LabF_115816",
                        ...
                    ]
                }
            }
        }
    Metadata example:
        {
            "SUBJECT_TYPE": [
                "Plant",
                "Human"
            ]
        }
    Returns: None
mwtab.mwschema
This module provides schema definitions for different sections of the
mwTab Metabolomics Workbench format.
Each schema listed below carries the same description: "Entry point of the library, use this class to instantiate validation schema for the data that will be validated."
- mwtab.mwschema.metabolomics_workbench_schema
- mwtab.mwschema.project_schema
- mwtab.mwschema.study_schema
- mwtab.mwschema.analysis_schema
- mwtab.mwschema.subject_schema
- mwtab.mwschema.subject_sample_factors_schema
- mwtab.mwschema.collection_schema
- mwtab.mwschema.treatment_schema
- mwtab.mwschema.sampleprep_schema
- mwtab.mwschema.chromatography_schema
- mwtab.mwschema.ms_schema
- mwtab.mwschema.nmr_schema
- mwtab.mwschema.ms_metabolite_data_schema
- mwtab.mwschema.nmr_binned_data_schema