Welcome to mwtab’s documentation!
mwtab

The mwtab package is a Python library that facilitates reading and writing files in mwTab format used by the Metabolomics Workbench for archival of Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) experimental data.
The mwtab package provides facilities to convert mwTab formatted files into their equivalent JSONized representation and vice versa. JSON stands for JavaScript Object Notation, an open-standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs.
The mwtab package can be used in several ways:
- As a library for accessing and manipulating data stored in mwTab format files.
- As a command-line tool to convert between mwTab format and its equivalent JSON representation.
Citation
When using the mwtab package in published work, please cite the following papers:
- Powell, Christian D., and Hunter NB Moseley. “The mwtab Python Library for RESTful Access and Enhanced Quality Control, Deposition, and Curation of the Metabolomics Workbench Data Repository.” Metabolites 11.3 (2021): 163. doi: 10.3390/metabo11030163.
- Smelter, Andrey and Hunter NB Moseley. “A Python library for FAIRer access and deposition to the Metabolomics Workbench Data Repository.” Metabolomics 2018, 14(5): 64. doi: 10.1007/s11306-018-1356-6.
Links
- mwtab @ GitHub
- mwtab @ PyPI
- Documentation @ ReadTheDocs
Installation
The mwtab package runs under Python 3.5+. Use pip to install. Starting with Python 3.4, pip is included by default.
Install on Linux, Mac OS X
python3 -m pip install mwtab
Install on Windows
py -3 -m pip install mwtab
Upgrade on Linux, Mac OS X
python3 -m pip install mwtab --upgrade
Upgrade on Windows
py -3 -m pip install mwtab --upgrade
Quickstart
>>> import mwtab
>>>
>>> # Here we use ANALYSIS_IDs to fetch the files from their URLs
>>> for mwfile in mwtab.read_files("1", "2"):
...     print("STUDY_ID:", mwfile.study_id)
...     print("ANALYSIS_ID:", mwfile.analysis_id)
...     print("SOURCE:", mwfile.source)
...     print("Blocks:", list(mwfile.keys()))
...

Note
Read the User Guide and the mwtab Tutorial on ReadTheDocs to learn more and to see code examples on using the mwtab package as a library and as a command-line tool.
Documentation index:
User Guide
Description
The mwtab package is a Python library that facilitates reading and writing files in mwTab format used by the Metabolomics Workbench for archival of Mass Spectrometry (MS) and Nuclear Magnetic Resonance (NMR) experimental data.
The mwtab package provides facilities to convert mwTab formatted files into their equivalent JSONized (JavaScript Object Notation, an open-standard format that uses human-readable text to transmit data objects consisting of attribute-value pairs) representation and vice versa.
The mwtab package can be used in several ways:
- As a library for accessing and manipulating data stored in mwTab format files.
- As a command-line tool to convert between mwTab format and its equivalent JSON representation.
Installation
The mwtab package runs under Python 2.7 and Python 3.4+. Starting with Python 3.4, pip is included by default. To install system-wide with pip, run the following:
Install on Linux, Mac OS X
python3 -m pip install mwtab
Install on Windows
py -3 -m pip install mwtab
Install inside virtualenv
For an isolated install, you can run the same inside a virtualenv.
$ virtualenv -p /usr/bin/python3 venv # create virtual environment, use python3 interpreter
$ source venv/bin/activate # activate virtual environment
$ python3 -m pip install mwtab # install mwtab as usual
$ deactivate # if you are done working in the virtual environment
Get the source code
Code is available on GitHub: https://github.com/MoseleyBioinformaticsLab/mwtab
You can either clone the public repository:
$ git clone https://github.com/MoseleyBioinformaticsLab/mwtab.git
Or, download the tarball and/or zipball:
$ curl -OL https://github.com/MoseleyBioinformaticsLab/mwtab/tarball/master
$ curl -OL https://github.com/MoseleyBioinformaticsLab/mwtab/zipball/master
Once you have a copy of the source, you can embed it in your own Python package, or install it into your system site-packages easily:
$ python3 setup.py install
Dependencies
The mwtab package depends on several Python libraries. The pip command will install all dependencies automatically, but if you wish to install them manually, run the following commands:
Basic usage
The mwtab package can be used in several ways:
- As a library for accessing and manipulating data stored in mwTab formatted files.
- As a command-line tool:
  - Convert from mwTab file format into its equivalent JSON file format and vice versa.
  - Validate data stored in mwTab files based on a schema definition.
Note
Read The mwtab Tutorial to learn more and see code examples on using the mwtab package as a library and as a command-line tool.
The mwtab Tutorial
The mwtab package provides classes and other facilities for downloading, parsing, accessing, and manipulating data stored in either the mwTab or JSON representation of mwTab files.
Also, the mwtab package provides a simple command-line interface to convert between mwTab and JSON representations, download entries from Metabolomics Workbench, access the MW REST interface, validate the consistency of the mwTab files, or extract metadata and metabolites from these files.
Brief mwTab Format Overview
Note
For the full official specification, see the mwTab file specification:
http://www.metabolomicsworkbench.org/data/tutorials.php
The mwTab formatted files consist of multiple blocks. Each new block starts with #.
- Some of the blocks contain only “key-value”-like pairs.
#METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001
VERSION 1
CREATED_ON 2016-09-17
#PROJECT
PR:PROJECT_TITLE FatB Gene Project
PR:PROJECT_TYPE Genotype treatment
PR:PROJECT_SUMMARY Experiment to test the consequence of a mutation at the FatB gene (At1g08510)
PR:PROJECT_SUMMARY the wound-response of Arabidopsis
Note
*_SUMMARY “key-value”-like pairs typically span multiple lines.
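Here is a minimal sketch of how a key-value block like the one above could be read into an ordered dictionary. It is not the mwtab parser, and parse_key_value_block is a hypothetical name. The sketch assumes that each line holds a prefixed key (e.g. PR:PROJECT_TITLE) and its value separated by a tab, as in the mwTab specification (the tabs appear as spaces in the listing above). It also assumes that repeated *_SUMMARY lines are joined with single spaces, which matches the joined STUDY_SUMMARY value in the JSON output later in this tutorial.

```python
from collections import OrderedDict


def parse_key_value_block(lines):
    """Hypothetical helper: parse "PREFIX:KEY<tab>VALUE" lines into an OrderedDict.

    Repeated keys (e.g. multi-line *_SUMMARY fields) are joined with a space.
    """
    block = OrderedDict()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")  # key and value are tab-separated (assumed)
        key = key.split(":", 1)[-1]  # drop the "PR:"-style block prefix
        value = value.strip()
        if key in block:
            block[key] = block[key] + " " + value  # continuation of a multi-line field
        else:
            block[key] = value
    return block


project_lines = [
    "PR:PROJECT_TITLE\tFatB Gene Project",
    "PR:PROJECT_TYPE\tGenotype treatment",
    "PR:PROJECT_SUMMARY\tExperiment to test the consequence of a mutation at the FatB gene (At1g08510)",
    "PR:PROJECT_SUMMARY\tthe wound-response of Arabidopsis",
]
print(parse_key_value_block(project_lines)["PROJECT_SUMMARY"])
```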
- The #SUBJECT_SAMPLE_FACTORS block is specially formatted, i.e., it contains a header specification and tab-separated values.
#SUBJECT_SAMPLE_FACTORS: SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
SUBJECT_SAMPLE_FACTORS - LabF_115873 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115878 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115883 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115888 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115893 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS - LabF_115898 Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
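Each SUBJECT_SAMPLE_FACTORS line follows the header shown above: subject, sample, and a set of NAME:VALUE factor pairs separated by |. The sketch below splits one such line into the same fields that appear later in this tutorial ('Subject ID', 'Sample ID', 'Factors'). It assumes tab-separated columns. parse_ssf_line is a hypothetical helper, not part of mwtab. It keeps any additional sample data as raw text rather than parsing it.

```python
from collections import OrderedDict


def parse_ssf_line(line):
    """Hypothetical helper: split one SUBJECT_SAMPLE_FACTORS line into its fields."""
    fields = line.rstrip("\n").split("\t")  # columns assumed to be tab-separated
    # fields[0] is the "SUBJECT_SAMPLE_FACTORS" label itself
    subject_id, sample_id, factors_text = fields[1], fields[2], fields[3]
    factors = OrderedDict()
    for pair in factors_text.split("|"):  # NAME:VALUE pairs separated by |
        name, _, value = pair.strip().partition(":")  # split on the first colon only
        factors[name.strip()] = value.strip()
    entry = OrderedDict([("Subject ID", subject_id),
                         ("Sample ID", sample_id),
                         ("Factors", factors)])
    if len(fields) > 4 and fields[4].strip():
        # this sketch keeps additional sample data as raw, unparsed text
        entry["Additional sample data"] = fields[4].strip()
    return entry


line = ("SUBJECT_SAMPLE_FACTORS\t-\tLabF_115873\t"
        "Arabidopsis Genotype:Wassilewskija (Ws) | "
        "Plant Wounding Treatment:Control - Non-Wounded")
print(parse_ssf_line(line))
```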
- The #MS_METABOLITE_DATA (results) block contains Samples identifiers, Factors identifiers, as well as tab-separated data between *_START and *_END.
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS Peak height
MS_METABOLITE_DATA_START
Samples LabF_115904 LabF_115909 LabF_115914 LabF_115919 LabF_115924 LabF_115929 LabF_115842 LabF_115847 LabF_115852 LabF_115857 LabF_115862 LabF_115867 LabF_115873 LabF_115878 LabF_115883 LabF_115888 LabF_115893 LabF_115898 LabF_115811 LabF_115816 LabF_115821 LabF_115826 LabF_115831 LabF_115836
Factors Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded
1_2_4-benzenetriol 1874.0000 3566.0000 1945.0000 1456.0000 2004.0000 1995.0000 4040.0000 2432.0000 2189.0000 1931.0000 1307.0000 2880.0000 2218.0000 1754.0000 1369.0000 1201.0000 3324.0000 1355.0000 2257.0000 1718.0000 1740.0000 3472.0000 2054.0000 1367.0000
1-monostearin 987.0000 450.0000 1910.0000 549.0000 1032.0000 902.0000 393.0000 705.0000 100.0000 481.0000 265.0000 120.0000 1185.0000 867.0000 676.0000 569.0000 579.0000 387.0000 1035.0000 789.0000 875.0000 224.0000 641.0000 693.0000
...
MS_METABOLITE_DATA_END
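The following sketch reads the rows between MS_METABOLITE_DATA_START and MS_METABOLITE_DATA_END into a list of ordered dictionaries. Each dictionary has a 'Metabolite' key followed by one key per sample ID, similar in shape to the Data entries shown later in this tutorial. It assumes tab-separated cells and uses a two-sample excerpt of the block above. parse_metabolite_data is a hypothetical helper, not part of mwtab. Values are kept as strings.

```python
from collections import OrderedDict


def parse_metabolite_data(lines):
    """Hypothetical helper: read rows between MS_METABOLITE_DATA_START and _END.

    The "Samples" row supplies the column names; the "Factors" row is skipped.
    """
    start = lines.index("MS_METABOLITE_DATA_START") + 1
    end = lines.index("MS_METABOLITE_DATA_END")
    sample_ids, rows = [], []
    for line in lines[start:end]:
        cells = line.split("\t")  # cells assumed to be tab-separated
        if cells[0] == "Samples":
            sample_ids = cells[1:]  # column headers
        elif cells[0] == "Factors":
            continue  # per-sample factor strings, not needed here
        else:
            row = OrderedDict([("Metabolite", cells[0])])
            row.update(zip(sample_ids, cells[1:]))  # sample ID -> value (as text)
            rows.append(row)
    return rows


factors = "Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded"
block_lines = [
    "MS_METABOLITE_DATA:UNITS\tPeak height",
    "MS_METABOLITE_DATA_START",
    "Samples\tLabF_115904\tLabF_115909",
    "Factors\t" + factors + "\t" + factors,
    "1_2_4-benzenetriol\t1874.0000\t3566.0000",
    "1-monostearin\t987.0000\t450.0000",
    "MS_METABOLITE_DATA_END",
]
for row in parse_metabolite_data(block_lines):
    print(dict(row))
```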
- The #METABOLITES metadata block contains a header specifying fields and tab-separated data between *_START and *_END.
#METABOLITES
METABOLITES_START
metabolite_name moverz_quant ri ri_type pubchem_id inchi_key kegg_id other_id other_id_type
1,2,4-benzenetriol 239 522741 Fiehn 10787 C02814 205673 BinBase
1-monostearin 399 959625 Fiehn 107036 D01947 202835 BinBase
2-hydroxyvaleric acid 131 310750 Fiehn 98009 218773 BinBase
3-phosphoglycerate 299 611619 Fiehn 724 C00597 217821 BinBase
...
METABOLITES_END
- The #NMR_BINNED_DATA data block contains a header specifying fields and tab-separated data between *_START and *_END.
#NMR_BINNED_DATA
NMR_BINNED_DATA_START
Bin range(ppm) CDC029 CDC030 CDC032 CPL101 CPL102 CPL103 CPL201 CPL202 CPL203 CDS039 CDS052 CDS054
0.50...0.56 0.00058149 1.6592 0.039301 0 0 0 0.034018 0.0028746 0.0021478 0.013387 0 0
0.56...0.58 0 0.74267 0 0.007206 0 0 0 0 0 0 0 0.0069721
0.58...0.60 0.051165 0.8258 0.089149 0.060972 0.026307 0.045697 0.069541 0 0 0.14516 0.057489 0.042255
...
NMR_BINNED_DATA_END
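The Bin range(ppm) column encodes each bin as low...high (for example, 0.50...0.56). If numeric bin boundaries are needed, a small sketch like the one below could split them. The helper names parse_bin_range and bin_center are hypothetical and not part of mwtab.

```python
def parse_bin_range(text):
    """Hypothetical helper: turn a "low...high" bin label into a pair of floats (ppm)."""
    low, high = text.split("...")
    return float(low), float(high)


def bin_center(text):
    """Hypothetical helper: midpoint of a bin, e.g. for plotting binned spectra."""
    low, high = parse_bin_range(text)
    return (low + high) / 2


for label in ["0.50...0.56", "0.56...0.58", "0.58...0.60"]:
    print(label, parse_bin_range(label), bin_center(label))
```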
- Order of metadata and data blocks (MS)
#METABOLOMICS WORKBENCH
VERSION 1
CREATED_ON 2016-09-17
...
#PROJECT
...
#STUDY
...
#SUBJECT
...
#SUBJECT_SAMPLE_FACTORS: SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
...
#COLLECTION
...
#TREATMENT
...
#SAMPLEPREP
...
#CHROMATOGRAPHY
...
#ANALYSIS
...
#MS
...
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS peak area
MS_METABOLITE_DATA_START
...
MS_METABOLITE_DATA_END
#METABOLITES
METABOLITES_START
...
METABOLITES_END
#END
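The listing above gives the expected order of blocks in an MS file. The sketch below checks a list of block names, such as the one returned by list(mwfile.keys()) later in this tutorial, against that order. It is a hypothetical check rather than mwtab's validator. It tolerates missing blocks but rejects unknown, repeated, or out-of-order ones, and it covers only the MS layout (NMR files use different blocks).

```python
# Expected order of blocks in an MS mwTab file, taken from the listing above.
# The closing #END line marks the end of the file and is not treated as a block here.
EXPECTED_MS_ORDER = [
    "METABOLOMICS WORKBENCH", "PROJECT", "STUDY", "SUBJECT",
    "SUBJECT_SAMPLE_FACTORS", "COLLECTION", "TREATMENT", "SAMPLEPREP",
    "CHROMATOGRAPHY", "ANALYSIS", "MS", "MS_METABOLITE_DATA", "METABOLITES",
]


def in_expected_order(found, expected=EXPECTED_MS_ORDER):
    """Hypothetical check: True if all names are known and strictly follow `expected`."""
    positions = []
    for name in found:
        if name not in expected:
            return False  # unknown block name
        positions.append(expected.index(name))
    # strictly increasing positions: no repeats and nothing out of order
    return all(a < b for a, b in zip(positions, positions[1:]))


print(in_expected_order(["METABOLOMICS WORKBENCH", "PROJECT", "STUDY"]))
print(in_expected_order(["STUDY", "PROJECT"]))
```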
Using mwtab as a Library
Importing mwtab Package
If the mwtab package is installed on the system, it can be imported:
[1]:
import mwtab
Constructing MWTabFile Generator
The fileio module provides the read_files() generator function that yields MWTabFile instances. Constructing a MWTabFile generator is easy: specify the path to a local mwTab file, a directory of files, or an archive of files, or give an ANALYSIS_ID or a REST URL:
[2]:
import mwtab
mwfile_gen = mwtab.read_files("ST000017_AN000035.txt") # single mwTab file
mwfiles_gen = mwtab.read_files("ST000017_AN000035.txt", "ST000040_AN000060.json") # several mwTab files
mwdir_gen = mwtab.read_files("mwfiles_dir_mwtab") # directory of mwTab files
mwzip_gen = mwtab.read_files("mwfiles_mwtab.zip") # archive of mwTab files
mwanalysis_gen = mwtab.read_files("35", "60") # ANALYSIS_ID of mwTab files
# REST callable url of mwTab file
mwurl_gen = mwtab.read_files("https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt")
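read_files() also accepts bare ANALYSIS_IDs such as "35". The SOURCE values printed in the next section show that these correspond to REST URLs like https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt. The sketch below shows one way such an ID could be expanded. It is not mwtab's code. analysis_url is a hypothetical helper, and the six-digit zero padding is assumed from IDs like AN000035 and AN000060.

```python
REST_BASE = "https://www.metabolomicsworkbench.org/rest/"


def analysis_url(analysis_id):
    """Hypothetical helper: expand "35" or "AN000035" to the mwTab REST URL."""
    text = str(analysis_id).strip().upper()
    if text.startswith("AN"):
        text = text[2:]  # keep only the numeric part
    # six-digit zero padding is an assumption based on IDs like AN000035
    return "{}study/analysis_id/AN{:06d}/mwtab/txt".format(REST_BASE, int(text))


print(analysis_url("35"))
print(analysis_url("AN000060"))
```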
Processing MWTabFile Generator
The MWTabFile generator can be processed in several ways:
- Feed it to a for-loop and process one file at a time:
[3]:
for mwfile in mwtab.read_files("35", "60"):
    print("STUDY_ID:", mwfile.study_id)  # print STUDY_ID
    print("ANALYSIS_ID", mwfile.analysis_id)  # print ANALYSIS_ID
    print("SOURCE", mwfile.source)  # print source
    for block_name in mwfile:  # print names of blocks
        print("\t", block_name)
STUDY_ID: ST000017
ANALYSIS_ID AN000035
SOURCE https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt
METABOLOMICS WORKBENCH
PROJECT
STUDY
SUBJECT
SUBJECT_SAMPLE_FACTORS
COLLECTION
TREATMENT
SAMPLEPREP
CHROMATOGRAPHY
ANALYSIS
MS
MS_METABOLITE_DATA
STUDY_ID: ST000040
ANALYSIS_ID AN000060
SOURCE https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000060/mwtab/txt
METABOLOMICS WORKBENCH
PROJECT
STUDY
SUBJECT
SUBJECT_SAMPLE_FACTORS
COLLECTION
TREATMENT
SAMPLEPREP
CHROMATOGRAPHY
ANALYSIS
MS
MS_METABOLITE_DATA
Note
Once the generator is consumed, it becomes empty and needs to be created again.
- Get individual MWTabFile instances one at a time with next():
[4]:
mwfiles_generator = mwtab.read_files("35", "60")
mwfile1 = next(mwfiles_generator)
mwfile2 = next(mwfiles_generator)
Note
Once the generator is consumed, StopIteration will be raised.
- Get all MWTabFile instances into a list:
[5]:
mwfiles_generator = mwtab.read_files("35", "60")
mwfiles_list = list(mwfiles_generator)
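The generator behaviour described in the notes above is standard Python. The self-contained sketch below demonstrates it without network access. It uses a hypothetical stand-in, numbered_files, in place of mwtab.read_files(). next(gen, None) can be used to get a default value instead of a StopIteration exception.

```python
def numbered_files(*ids):
    """Hypothetical stand-in for mwtab.read_files(): lazily yields one item per source."""
    for source_id in ids:
        yield "file-" + source_id


gen = numbered_files("35", "60")
first = next(gen)
second = next(gen)

exhausted = False
try:
    next(gen)  # nothing left: the generator is exhausted
except StopIteration:
    exhausted = True

leftover = next(gen, None)  # a default avoids the exception

# a fresh generator is needed to iterate again
all_files = list(numbered_files("35", "60"))
print(first, second, exhausted, leftover, all_files)
```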
Accessing Data From a Single MWTabFile
Since a MWTabFile is a Python collections.OrderedDict, data can be accessed and manipulated as with any regular Python dict object using bracket accessors.
- Accessing top-level “keys” in MWTabFile:
[7]:
mwfile = next(mwtab.read_files("ST000017_AN000035.txt"))
# list MWTabFile-level keys, i.e. block names
list(mwfile.keys())
[7]:
['METABOLOMICS WORKBENCH',
'PROJECT',
'STUDY',
'SUBJECT',
'SUBJECT_SAMPLE_FACTORS',
'COLLECTION',
'TREATMENT',
'SAMPLEPREP',
'CHROMATOGRAPHY',
'ANALYSIS',
'MS',
'MS_METABOLITE_DATA']
- Accessing individual blocks in MWTabFile:
[8]:
# access "PROJECT" block
mwfile["PROJECT"]
[8]:
OrderedDict([('PROJECT_TITLE', 'Rat Stamina Studies'),
('PROJECT_TYPE', 'Feeding'),
('PROJECT_SUMMARY', 'Stamina in rats'),
('INSTITUTE', 'University of Michigan'),
('DEPARTMENT', 'Internal Medicine'),
('LABORATORY', 'Burant Lab'),
('LAST_NAME', 'Beecher'),
('FIRST_NAME', 'Chris'),
('ADDRESS', '-'),
('EMAIL', 'chrisbee@med.umich.edu'),
('PHONE', '734-232-0815'),
('FUNDING_SOURCE', 'NIH: R01 DK077200')])
- Accessing individual “key-value” pairs within blocks:
[9]:
# access "INSTITUTE" field within "PROJECT" block
mwfile["PROJECT"]["INSTITUTE"]
[9]:
'University of Michigan'
- Accessing data in the #SUBJECT_SAMPLE_FACTORS block:
[10]:
# access "SUBJECT_SAMPLE_FACTORS" block and print first three
mwfile["SUBJECT_SAMPLE_FACTORS"][:3]
[10]:
[OrderedDict([('Subject ID', '-'),
('Sample ID', 'S00009477'),
('Factors',
{'Feeeding': 'Ad lib', 'Running Capacity': 'High'})]),
OrderedDict([('Subject ID', '-'),
('Sample ID', 'S00009478'),
('Factors',
{'Feeeding': 'Ad lib', 'Running Capacity': 'High'})]),
OrderedDict([('Subject ID', '-'),
('Sample ID', 'S00009479'),
('Factors',
{'Feeeding': 'Ad lib', 'Running Capacity': 'High'})])]
[11]:
# access individual factors (by index)
mwfile["SUBJECT_SAMPLE_FACTORS"][0]
[11]:
OrderedDict([('Subject ID', '-'),
('Sample ID', 'S00009477'),
('Factors', {'Feeeding': 'Ad lib', 'Running Capacity': 'High'})])
[12]:
# access individual fields within factors
mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Sample ID"]
[12]:
'S00009477'
- Accessing data in the #MS_METABOLITE_DATA block:
[13]:
# access data block keys
list(mwfile["MS_METABOLITE_DATA"].keys())
[13]:
['Units', 'Data', 'Metabolites']
[14]:
# access units field
mwfile["MS_METABOLITE_DATA"]["Units"]
[14]:
'peak area'
[15]:
# access the keys of the first data row (metabolite column and sample IDs)
mwfile["MS_METABOLITE_DATA"]["Data"][0].keys()
[15]:
odict_keys(['Metabolite', 'S00009477', 'S00009478', 'S00009479', 'S00009480', 'S00009481', 'S00009500', 'S00009501', 'S00009502', 'S00009503', 'S00009470', 'S00009471', 'S00009472', 'S00009473', 'S00009474', 'S00009475', 'S00009494', 'S00009495', 'S00009496', 'S00009497', 'S00009498', 'S00009499', 'S00009488', 'S00009489', 'S00009490', 'S00009491', 'S00009492', 'S00009493', 'S00009509', 'S00009510', 'S00009511', 'S00009512', 'S00009513', 'S00009514', 'S00009482', 'S00009483', 'S00009484', 'S00009486', 'S00009504', 'S00009505', 'S00009506', 'S00009507', 'S00009508'])
[16]:
# access metabolite data and print first three
mwfile["MS_METABOLITE_DATA"]["Metabolites"][:3]
[16]:
[OrderedDict([('Metabolite', '11BETA,21-DIHYDROXY-5BETA-PREGNANE-3,20-DIONE'),
('moverz_quant', ''),
('ri', ''),
('ri_type', ''),
('pubchem_id', '44263339'),
('inchi_key', ''),
('kegg_id', 'C05475'),
('other_id', '775216_UNIQUE'),
('other_id_type', 'UM_Target_ID')]),
OrderedDict([('Metabolite', '11-BETA-HYDROXYANDROST-4-ENE-3,17-DIONE'),
('moverz_quant', ''),
('ri', ''),
('ri_type', ''),
('pubchem_id', '94141'),
('inchi_key', ''),
('kegg_id', 'C05284'),
('other_id', '771312_PRIMARY'),
('other_id_type', 'UM_Target_ID')]),
OrderedDict([('Metabolite', '13(S)-HPODE'),
('moverz_quant', ''),
('ri', ''),
('ri_type', ''),
('pubchem_id', '1426'),
('inchi_key', ''),
('kegg_id', 'C04717'),
('other_id', '775541_UNIQUE'),
('other_id_type', 'UM_Target_ID')])]
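Each entry in mwfile["MS_METABOLITE_DATA"]["Data"] maps 'Metabolite' and the sample IDs to values. A single sample's measurements can therefore be pulled out with a loop. In the sketch below, sample_column is a hypothetical helper, and data_rows is a small mock-up with invented placeholder values standing in for the real Data list. The sketch assumes the values are stored as strings, like the other fields shown above, and treats empty strings as missing.

```python
from collections import OrderedDict

# Mock rows shaped like mwfile["MS_METABOLITE_DATA"]["Data"]; names and values are placeholders.
data_rows = [
    OrderedDict([("Metabolite", "metabolite_a"), ("S1", "10.5"), ("S2", "")]),
    OrderedDict([("Metabolite", "metabolite_b"), ("S1", "3"), ("S2", "7.25")]),
]


def sample_column(rows, sample_id):
    """Hypothetical helper: map metabolite name -> float value for one sample.

    Empty strings (missing measurements) become None.
    """
    column = OrderedDict()
    for row in rows:
        raw = row.get(sample_id, "")
        column[row["Metabolite"]] = float(raw) if raw.strip() else None
    return column


print(sample_column(data_rows, "S2"))
```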
Manipulating Data From a Single MWTabFile
In order to change values within MWTabFile, descend into the appropriate level using square bracket accessors and set a new value.
- Change regular “key-value” pairs:
[17]:
# access phone number information
mwfile["PROJECT"]["PHONE"]
[17]:
'734-232-0815'
[18]:
# change phone number information
mwfile["PROJECT"]["PHONE"] = "1-530-754-8258"
[19]:
# check that it has been modified
mwfile["PROJECT"]["PHONE"]
[19]:
'1-530-754-8258'
- Change #SUBJECT_SAMPLE_FACTORS values:
[20]:
# access the first subject sample factor by index
mwfile["SUBJECT_SAMPLE_FACTORS"][0]
[20]:
OrderedDict([('Subject ID', '-'),
('Sample ID', 'S00009477'),
('Factors', {'Feeeding': 'Ad lib', 'Running Capacity': 'High'})])
[21]:
# provide additional details to the first subject sample factor
mwfile["SUBJECT_SAMPLE_FACTORS"][0]["Additional sample data"] = {"Additional detail key": "Additional detail value"}
[22]:
# check that it has been modified
mwfile["SUBJECT_SAMPLE_FACTORS"][0]
[22]:
OrderedDict([('Subject ID', '-'),
('Sample ID', 'S00009477'),
('Factors', {'Feeeding': 'Ad lib', 'Running Capacity': 'High'}),
('Additional sample data',
{'Additional detail key': 'Additional detail value'})])
Printing a MWTabFile and its Components
MWTabFile objects provide the print_file() method, which can be used to output the file in either mwTab or JSON format. The method takes a file_format keyword argument, which specifies the output format, and an optional f argument, which takes the file-like object to print to (as in the StringIO examples below).
The MWTabFile can be printed to output in mwTab format in its entirety using:
- mwfile.print_file(file_format="mwtab")
- Print the first 20 lines in mwTab format.
[23]:
from io import StringIO
mwtab_file_str = StringIO()
mwfile.print_file(file_format="mwtab", f=mwtab_file_str)
# print out first 20 lines
print("\n".join(mwtab_file_str.getvalue().split("\n")[:20]))
#METABOLOMICS WORKBENCH STUDY_ID:ST000017 ANALYSIS_ID:AN000035 PROJECT_ID:PR000016
VERSION 1
CREATED_ON 2016-09-17
#PROJECT
PR:PROJECT_TITLE Rat Stamina Studies
PR:PROJECT_TYPE Feeding
PR:PROJECT_SUMMARY Stamina in rats
PR:INSTITUTE University of Michigan
PR:DEPARTMENT Internal Medicine
PR:LABORATORY Burant Lab
PR:LAST_NAME Beecher
PR:FIRST_NAME Chris
PR:ADDRESS -
PR:EMAIL chrisbee@med.umich.edu
PR:PHONE 1-530-754-8258
PR:FUNDING_SOURCE NIH: R01 DK077200
#STUDY
ST:STUDY_TITLE Rat HCR/LCR Stamina Study
ST:STUDY_TYPE LC-MS analysis
ST:STUDY_SUMMARY To determine the basis of running capacity and health differences in outbread
The MWTabFile can be printed to output in JSON format in its entirety using:
- mwfile.print_file(file_format="json")
- Print the first 20 lines in JSON format.
[24]:
from io import StringIO
mwtab_file_str = StringIO()
mwfile.print_file(file_format="json", f=mwtab_file_str)
# print out first 20 lines
print("\n".join(mwtab_file_str.getvalue().split("\n")[:20]))
{
"METABOLOMICS WORKBENCH": {
"STUDY_ID": "ST000017",
"ANALYSIS_ID": "AN000035",
"PROJECT_ID": "PR000016",
"VERSION": "1",
"CREATED_ON": "2016-09-17"
},
"PROJECT": {
"PROJECT_TITLE": "Rat Stamina Studies",
"PROJECT_TYPE": "Feeding",
"PROJECT_SUMMARY": "Stamina in rats",
"INSTITUTE": "University of Michigan",
"DEPARTMENT": "Internal Medicine",
"LABORATORY": "Burant Lab",
"LAST_NAME": "Beecher",
"FIRST_NAME": "Chris",
"ADDRESS": "-",
"EMAIL": "chrisbee@med.umich.edu",
"PHONE": "1-530-754-8258",
- Print a single block in mwTab format.
[25]:
mwfile.print_block("STUDY", file_format="mwtab")
ST:STUDY_TITLE Rat HCR/LCR Stamina Study
ST:STUDY_TYPE LC-MS analysis
ST:STUDY_SUMMARY To determine the basis of running capacity and health differences in outbread
ST:STUDY_SUMMARY N/NIH rats selected for high capacity (HCR) and low capacity (LCR) running (a for
ST:STUDY_SUMMARY VO2max) (see:Science. 2005 Jan 21;307(5708):418-20). Plasma collected at 12 of
ST:STUDY_SUMMARY age in generation 28 rats after ad lib feeding or 40% caloric restriction at week
ST:STUDY_SUMMARY 8 of age. All animals fasted 4 hours prior to collection between 5-8
ST:INSTITUTE University of Michigan
ST:DEPARTMENT Internal Medicine
ST:LABORATORY Burant Lab (MMOC)
ST:LAST_NAME Qi
ST:FIRST_NAME Nathan
ST:ADDRESS -
ST:EMAIL nathanqi@med.umich.edu
ST:PHONE 734-232-0815
ST:NUM_GROUPS 2
ST:TOTAL_SUBJECTS 42
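print_block() writes long values such as STUDY_SUMMARY as several repeated ST:STUDY_SUMMARY lines, while the JSON form below keeps them as a single string. The sketch below shows how a key-value block could be rendered in that style. It is not mwtab's writer. format_key_value_block is a hypothetical helper, the tab separator follows the mwTab specification, and the 80-character wrap width is an assumption (mwtab's own line splitting may differ).

```python
import textwrap
from collections import OrderedDict


def format_key_value_block(block, prefix, width=80):
    """Hypothetical helper: render a key-value block as mwTab-style lines.

    Long values are split at word boundaries across repeated "PREFIX:KEY" lines;
    the default width is an assumption, not mwtab's setting.
    """
    lines = []
    for key, value in block.items():
        chunks = textwrap.wrap(value, width=width) or [value]  # keep empty values
        for chunk in chunks:
            lines.append("{}:{}\t{}".format(prefix, key, chunk))
    return lines


study = OrderedDict([
    ("STUDY_TITLE", "Rat HCR/LCR Stamina Study"),
    ("STUDY_TYPE", "LC-MS analysis"),
])
print("\n".join(format_key_value_block(study, "ST")))
```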
- Print a single block in JSON format.
[26]:
mwfile.print_block("STUDY", file_format="json")
{
"STUDY_TITLE": "Rat HCR/LCR Stamina Study",
"STUDY_TYPE": "LC-MS analysis",
"STUDY_SUMMARY": "To determine the basis of running capacity and health differences in outbread N/NIH rats selected for high capacity (HCR) and low capacity (LCR) running (a for VO2max) (see:Science. 2005 Jan 21;307(5708):418-20). Plasma collected at 12 of age in generation 28 rats after ad lib feeding or 40% caloric restriction at week 8 of age. All animals fasted 4 hours prior to collection between 5-8",
"INSTITUTE": "University of Michigan",
"DEPARTMENT": "Internal Medicine",
"LABORATORY": "Burant Lab (MMOC)",
"LAST_NAME": "Qi",
"FIRST_NAME": "Nathan",
"ADDRESS": "-",
"EMAIL": "nathanqi@med.umich.edu",
"PHONE": "734-232-0815",
"NUM_GROUPS": "2",
"TOTAL_SUBJECTS": "42"
}
Writing data from a MWTabFile object into a file
Data from a MWTabFile can be written into a file in the original mwTab format or in the equivalent JSON format using write():
- Writing into a mwTab formatted file:
[27]:
with open("out/ST000017_AN000035_modified.txt", "w") as outfile:
mwfile.write(outfile, file_format="mwtab")
- Writing into a JSON file:
[28]:
with open("out/ST000017_AN000035_modified.json", "w") as outfile:
mwfile.write(outfile, file_format="json")
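The examples above write into an out/ directory, and open() fails if that directory does not exist yet. The small sketch below creates the parent directory first using only the standard library. open_for_writing is a hypothetical helper, and the commented usage assumes the mwfile object created earlier in this tutorial.

```python
import os


def open_for_writing(path):
    """Hypothetical helper: create the parent directory (e.g. "out/") if needed, then open for writing."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if the directory already exists
    return open(path, "w")


# usage with the example above (mwfile as created earlier in this tutorial):
# with open_for_writing("out/ST000017_AN000035_modified.json") as outfile:
#     mwfile.write(outfile, file_format="json")
```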
Extracting Metadata and Metabolites from mwTab Files
The mwtab.mwextract module can be used to extract metadata and metabolites from mwTab files. The module contains two main functions: 1) extract_metadata(), which can be used to parse metadata values from a mwTab file, and 2) extract_metabolites(), which can be used to gather a list of metabolites, and the samples containing those metabolites, from multiple mwTab files that contain a given metadata key-value pair.
Extracting Metadata Values
- Extracting metadata values from a given mwTab file:
[29]:
from mwtab.mwextract import extract_metadata
extract_metadata(mwfile, ["STUDY_TYPE", "SUBJECT_TYPE"])
[29]:
{'STUDY_TYPE': {'LC-MS analysis'}, 'SUBJECT_TYPE': {'Animal'}}
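The result above maps each requested key to a set of values. The rough sketch below illustrates the idea on plain dictionaries. It is not mwtab's implementation: collect_metadata is a hypothetical helper, and it skips non-dictionary blocks such as the SUBJECT_SAMPLE_FACTORS list. Using sets means that a value repeated in several blocks is reported once.

```python
def collect_metadata(blocks, keys):
    """Hypothetical helper: gather the values of `keys` from every key-value block.

    `blocks` maps block names to dicts, like an MWTabFile; non-dict blocks
    (e.g. the SUBJECT_SAMPLE_FACTORS list) are skipped in this sketch.
    """
    found = {}
    for block in blocks.values():
        if not isinstance(block, dict):  # OrderedDict is a dict subclass
            continue
        for key in keys:
            if key in block:
                found.setdefault(key, set()).add(block[key])
    return found


example = {
    "STUDY": {"STUDY_TYPE": "LC-MS analysis"},
    "SUBJECT": {"SUBJECT_TYPE": "Animal"},
    "SUBJECT_SAMPLE_FACTORS": [],
}
print(collect_metadata(example, ["STUDY_TYPE", "SUBJECT_TYPE"]))
```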
Extracting Metabolite Values
- Extracting metabolite information from multiple mwTab files and outputting the first three metabolites:
[30]:
from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
mwtab_gen = read_files(
"ST000017_AN000035.txt",
"ST000040_AN000060.txt"
)
matchers = generate_matchers([
("ST:STUDY_TYPE",
"LC-MS analysis")
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]
[30]:
['11BETA_21-DIHYDROXY-5BETA-PREGNANE-3_20-DIONE',
'11-BETA-HYDROXYANDROST-4-ENE-3_17-DIONE',
'13(S)-HPODE']
- Extracting metabolite information from multiple mwTab files using regular expressions and outputting the first three metabolites:
[31]:
from mwtab.mwextract import extract_metabolites, generate_matchers
from mwtab import read_files
from re import compile
mwtab_gen = read_files(
"ST000017_AN000035.txt",
"ST000040_AN000060.txt"
)
matchers = generate_matchers([
("ST:STUDY_TYPE",
compile("(LC-MS)"))
])
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]
[31]:
['11BETA_21-DIHYDROXY-5BETA-PREGNANE-3_20-DIONE',
'11-BETA-HYDROXYANDROST-4-ENE-3_17-DIONE',
'13(S)-HPODE']
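The two extraction examples differ only in how the metadata value is matched: a plain string in one, a compiled regular expression in the other. The minimal sketch below illustrates that distinction. It assumes exact equality for plain strings and re.search() semantics for compiled patterns. mwtab's own matchers may behave differently, and value_matches is a hypothetical name.

```python
import re


def value_matches(value, pattern):
    """Hypothetical helper: compare a metadata value to a plain string or a compiled regex."""
    if hasattr(pattern, "search"):  # compiled regular expression
        return pattern.search(value) is not None
    return value == pattern  # plain string: exact match (assumed)


print(value_matches("LC-MS analysis", "LC-MS analysis"))
print(value_matches("LC-MS analysis", re.compile("(LC-MS)")))
```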
Converting mwTab Files
mwTab files can be converted between the mwTab file format and their JSON representation using the mwtab.converter module.
One-to-one file conversions
- Converting from the mwTab file format into its equivalent JSON file format:
[32]:
from mwtab.converter import Converter
# Using a valid ANALYSIS_ID to access the file from its URL: from_path="35"
converter = Converter(from_path="35", to_path="out/ST000017_AN000035.json",
from_format="mwtab", to_format="json")
converter.convert()
- Converting from the JSON file format back to the mwTab file format:
[33]:
from mwtab.converter import Converter
converter = Converter(from_path="out/ST000017_AN000035.json", to_path="out/ST000017_AN000035.txt",
from_format="json", to_format="mwtab")
converter.convert()
Many-to-many file conversions
- Converting from the directory of mwTab formatted files into their equivalent JSON formatted files:
[34]:
from mwtab.converter import Converter
converter = Converter(from_path="mwfiles_dir_mwtab",
to_path="out/mwfiles_dir_json",
from_format="mwtab",
to_format="json")
converter.convert()
- Converting from the directory of JSON formatted files into their equivalent mwTab formatted files:
[35]:
from mwtab.converter import Converter
converter = Converter(from_path="out/mwfiles_dir_json",
to_path="out/mwfiles_dir_mwtab",
from_format="json",
to_format="mwtab")
converter.convert()
Note
One-to-one and many-to-many file conversions are available. See mwtab.converter for the full list of available conversions.
Command-Line Interface
- The mwtab Command-Line Interface provides the following functionality:
  - Convert from the mwTab file format into its equivalent JSON file format and vice versa.
  - Download files through Metabolomics Workbench’s REST API.
  - Validate the mwTab formatted file.
  - Extract metadata and metabolite information from downloaded files.
[36]:
! mwtab --help
The mwtab command-line interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Usage:
mwtab -h | --help
mwtab --version
mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--validate] [--mw-rest=<url>] [--verbose]
mwtab validate <from-path> [--mw-rest=<url>] [--verbose]
mwtab download url <url> [--to-path=<path>] [--verbose]
mwtab download study all [--to-path=<path>] [--input-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
mwtab download study <input-value> [--to-path=<path>] [--input-item=<item>] [--output-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [--output-format=<format>] [--to-path=<path>] [--mw-rest=<url>] [--verbose]
mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
mwtab extract metadata <from-path> <to-path> <key> ... [--to-format=<format>] [--no-header]
mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]
Options:
-h, --help Show this screen.
--version Show version.
--verbose Print what files are processing.
--validate Validate the mwTab file.
--from-format=<format> Input file format, available formats: mwtab, json [default: mwtab].
--to-format=<format> Output file format [default: json].
Available formats for convert:
mwtab, json.
Available formats for extract:
json, csv.
--mw-rest=<url> URL to MW REST interface
[default: https://www.metabolomicsworkbench.org/rest/].
--context=<context> Type of resource to access from MW REST interface, available contexts: study,
compound, refmet, gene, protein, moverz, exactmass [default: study].
--input-item=<item> Item to search Metabolomics Workbench with.
--output-item=<item> Item to be retrieved from Metabolomics Workbench.
--output-format=<format> Format for item to be retrieved in, available formats: mwtab, json.
--no-header Include header at the top of csv formatted files.
For extraction <to-path> can take a "-" which will use stdout.
Converting mwTab files in bulk
CLI one-to-one file conversions
- Convert from a local file in mwTab format to a local file in JSON format:
[37]:
! mwtab convert ST000017_AN000035.txt out/ST000017_AN000035.json \
--from-format=mwtab --to-format=json
- Convert from a local file in JSON format to a local file in mwTab format:
[38]:
! mwtab convert ST000017_AN000035.json out/ST000017_AN000035.txt \
--from-format=json --to-format=mwtab
- Convert from a compressed local file in mwTab format to a compressed local file in JSON format:
[39]:
! mwtab convert ST000017_AN000035.txt.gz out/ST000017_AN000035.json.gz \
--from-format=mwtab --to-format=json
- Convert from a compressed local file in JSON format to a compressed local file in mwTab format:
[40]:
! mwtab convert ST000017_AN000035.json.gz out/ST000017_AN000035.txt.gz \
--from-format=json --to-format=mwtab
- Convert from an uncompressed URL file in mwTab format to a compressed local file in JSON format:
[41]:
! mwtab convert 35 out/ST000017_AN000035.json.bz2 \
--from-format=mwtab --to-format=json
Note
See mwtab.converter for the full list of available conversions.
CLI many-to-many file conversions
- Convert from a directory of files in mwTab format to a directory of files in JSON format:
[42]:
! mwtab convert mwfiles_dir_mwtab out/mwfiles_dir_json \
--from-format=mwtab --to-format=json
- Convert from a directory of files in JSON format to a directory of files in mwTab format:
[43]:
! mwtab convert mwfiles_dir_json out/mwfiles_dir_mwtab \
--from-format=json --to-format=mwtab
- Convert from a directory of files in mwTab format to a zip archive of files in JSON format:
[44]:
! mwtab convert mwfiles_dir_mwtab out/mwfiles_json.zip \
--from-format=mwtab --to-format=json
- Convert from a compressed tar archive of files in JSON format to a directory of files in mwTab format:
[45]:
! mwtab convert mwfiles_json.tar.gz out/mwfiles_dir_mwtab \
--from-format=json --to-format=mwtab
- Convert from a zip archive of files in mwTab format to a compressed tar archive of files in JSON format:
[46]:
! mwtab convert mwfiles_mwtab.zip out/mwfiles_json.tar.bz2 \
--from-format=mwtab --to-format=json
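When conversions are driven from a Python script rather than notebook shell escapes, the same commands can be issued with subprocess. This sketch uses only the arguments documented in the help text above. convert_command and convert_all are hypothetical names, and convert_all requires the mwtab command-line tool to be on PATH.

```python
import shutil
import subprocess


def convert_command(from_path, to_path, from_format="mwtab", to_format="json"):
    """Hypothetical helper: build an `mwtab convert` argument list from documented options."""
    return ["mwtab", "convert", from_path, to_path,
            "--from-format=" + from_format, "--to-format=" + to_format]


def convert_all(pairs, from_format="mwtab", to_format="json"):
    """Hypothetical helper: run one conversion per (from_path, to_path) pair."""
    if shutil.which("mwtab") is None:
        raise RuntimeError("the mwtab command-line tool is not on PATH")
    for from_path, to_path in pairs:
        # check=True raises CalledProcessError if a conversion fails
        subprocess.run(convert_command(from_path, to_path, from_format, to_format), check=True)


print(convert_command("mwfiles_mwtab.zip", "out/mwfiles_json.tar.bz2"))
```

For example, convert_all([("mwfiles_dir_mwtab", "out/mwfiles_dir_json")]) would run the first many-to-many conversion shown above.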
Note
See mwtab.converter for the full list of available conversions.
Download files through Metabolomics Workbench’s REST API
The mwtab package provides the mwtab.mwrest module, which contains a number of functions and classes for working with Metabolomics Workbench’s REST API.
Note
For the full official REST API specification, see the MW REST API (v1.0, 5/7/2019) document:
https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf
Download by URL
- To download a file from a given URL, simply call the download url command with the desired URL and provide an output path:
[47]:
! mwtab download url "https://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000035/mwtab/txt" --to-path=out/ST000017_AN000035.txt
- To download the mwTab file for a single analysis, call download study and specify the analysis ID:
! mwtab download study AN000035 --to-path=out/ST000017_AN000035.txt
- To download the mwTab file for an entire study, call download study and specify the study ID:
! mwtab download study ST000017 --to-path=out/ST000017_AN000035.txt
Note
Downloaded files can be validated by adding the --validate option to the command line.
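The two download study examples differ only in whether an analysis ID (AN000035) or a study ID (ST000017) is passed. One plausible way to map such an identifier to a REST input item (analysis_id and study_id both appear elsewhere in this document) is by its prefix. This is an assumption for illustration: infer_input_item is hypothetical, and the CLI is not documented to work this way.

```python
def infer_input_item(identifier):
    """Guess the REST input item for a Metabolomics Workbench identifier.

    Hypothetical helper: assumes analysis IDs start with "AN" (e.g. AN000035)
    and study IDs with "ST" (e.g. ST000017).
    """
    prefixes = {"AN": "analysis_id", "ST": "study_id"}
    try:
        return prefixes[identifier[:2].upper()]
    except KeyError:
        raise ValueError("Unrecognized identifier: {!r}".format(identifier)) from None


for identifier in ("AN000035", "ST000017"):
    print(identifier, "->", infer_input_item(identifier))
```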
Download study, compound, refmet, gene, and protein files¶
- To download study, compound, refmet, gene, and protein context files, call the download command and specify the context, input item, input value, and output item (and, optionally, the output format).
- Download a study:
! mwtab download study analysis_id AN000035 mwtab --output-format=txt --to-path=out/ST000017_AN000035.txt
- Download compound:
! mwtab download compound regno 11 name --to-path=out/tmp.txt
- Download refmet:
! mwtab download refmet name Cholesterol all --to-path=out/tmp.txt
- Download gene:
! mwtab download gene gene_symbol acaca all --to-path=out/tmp.txt
- Download protein:
! mwtab download protein uniprot_id Q13085 all --to-path=out/tmp.txt
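Each download command above presumably corresponds to a REST request whose path is built from the context, input item, input value, output item, and optional output format, in the same shape as the URL passed to download url earlier. Inside the package this job belongs to GenericMWURL. The stdlib sketch below is not its implementation; build_rest_url and its context check are hypothetical:

```python
from urllib.parse import quote

BASE_URL = "https://www.metabolomicsworkbench.org/rest/"
# Contexts listed for --context in the CLI reference; used here only for a sanity check.
CONTEXTS = ("study", "compound", "refmet", "gene", "protein", "moverz", "exactmass")


def build_rest_url(context, *path_items, base_url=BASE_URL):
    """Build a Metabolomics Workbench REST URL (hypothetical helper)."""
    if context not in CONTEXTS:
        raise ValueError("Unknown context: {!r}".format(context))
    # Percent-encode each item so a "/" inside a value cannot add a path segment.
    segments = [quote(str(item), safe="") for item in path_items]
    return base_url.rstrip("/") + "/" + "/".join([context] + segments)


# Mirrors "mwtab download study analysis_id AN000035 mwtab --output-format=txt".
print(build_rest_url("study", "analysis_id", "AN000035", "mwtab", "txt"))
```

The printed URL is identical to the one used in the download url example.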
Download all mwTab formatted files¶
The mwtab package provides a number of command-line functions for downloading mwTab formatted files through the Metabolomics Workbench's REST API.
- To download all available analysis files, call the download study all command:
! mwtab download study all
- It is also possible to download all study files by calling the download study all command and providing an input item (an output path can optionally be given with --to-path):
! mwtab download study all --input-item=study_id
Download moverz and exactmass¶
- To download moverz files, call the download moverz command and specify the input item (LIPIDS, MB, or REFMET), the m/z value, the ion type, and the m/z tolerance:
! mwtab download moverz MB 635.52 M+H 0.5 --to-path=out/tmp.txt
- To download exactmass files, call the download exactmass command and specify the LIPID abbreviation and the ion type:
! mwtab download exactmass "PC(34:1)" M+H --to-path=out/tmp.txt
Note
It is not necessary to specify an output format for exactmass files.
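For the moverz example, a symmetric tolerance would mean searching for m/z values between 635.52 - 0.5 and 635.52 + 0.5. Whether the Workbench applies the tolerance exactly this way is an assumption here (the MW REST API specification is authoritative). The hypothetical helpers below only show the arithmetic:

```python
def mz_window(mz, tolerance):
    """Return (low, high) bounds for a symmetric m/z tolerance (hypothetical helper)."""
    return mz - tolerance, mz + tolerance


def within_tolerance(observed, target, tolerance):
    """True if observed falls inside the symmetric window around target."""
    low, high = mz_window(target, tolerance)
    return low <= observed <= high


# Values from "mwtab download moverz MB 635.52 M+H 0.5".
low, high = mz_window(635.52, 0.5)
print(round(low, 2), round(high, 2))  # 635.02 636.02
```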
Extracting metabolite data and metadata from mwTab files¶
The mwtab package provides the extract_metabolites() and extract_metadata() functions for parsing mwTab formatted files. extract_metabolites() takes a source (a list of mwTab files) and a list of metadata key-value pairs, and searches for the mwTab files that contain the given metadata pairs. extract_metadata() takes a source (a list of mwTab files) and a list of metadata keys, and searches the mwTab files for the values of those keys.
- To extract metabolites from the mwTab files in a directory, call the extract metabolites command and provide a list of metadata key-value pairs along with an output path and output format:
! mwtab extract metabolites mwfiles_dir_mwtab out/output_file.csv SU:SUBJECT_TYPE Plant --to-format=csv
Note
Regular expressions can be used to match the metadata value (e.g. … SU:SUBJECT_TYPE "r'(Plant)'").
- To extract metadata from the mwTab files in a directory, call the extract metadata command and provide a list of metadata keys along with an output path and output format:
! mwtab extract metadata mwfiles_dir_json out/output_file.json SUBJECT_TYPE --to-format=json
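The extract metadata command reports which values the requested keys take across the input files. A stdlib sketch of that aggregation over plain nested dicts follows. collect_metadata and the dict layout are assumptions for illustration, and unprefixed keys such as SUBJECT_TYPE are looked up in every section:

```python
from collections import defaultdict


def collect_metadata(mwfiles, keys):
    """Map each requested key to the sorted distinct values found in any section.

    Hypothetical helper. mwfiles is an iterable of plain dicts such as
    {"SUBJECT": {"SUBJECT_TYPE": "Plant"}}.
    """
    found = defaultdict(set)
    for mwfile in mwfiles:
        for section in mwfile.values():
            if not isinstance(section, dict):
                continue  # Skip list-valued blocks such as SUBJECT_SAMPLE_FACTORS.
            for key in keys:
                if key in section:
                    found[key].add(section[key])
    return {key: sorted(values) for key, values in found.items()}


files = [  # Placeholder contents.
    {"SUBJECT": {"SUBJECT_TYPE": "Plant"}},
    {"SUBJECT": {"SUBJECT_TYPE": "Human"}, "SUBJECT_SAMPLE_FACTORS": []},
]
print(collect_metadata(files, ["SUBJECT_TYPE"]))  # {'SUBJECT_TYPE': ['Human', 'Plant']}
```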
Validating mwTab files¶
The mwtab package provides the validate_file() function, which can validate files based on a JSON schema definition. The mwtab.mwschema module contains schema definitions for every block of an mwTab formatted file, i.e. it lists the types of attributes (e.g. str) and specifies which keys are optional and which are required.
- To validate file(s), call the validate command and provide the file source (a path, URL, or analysis ID, as in this example):
! mwtab validate 35
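At its core, schema validation checks that required keys are present and that values have the expected type. The sketch below checks one section against a hand-written list of required keys. The PROJECT keys are copied from the default section_schema_mapping shown in the mwtab.validator reference below, while check_section is a hypothetical helper that is far simpler than validate_file():

```python
# Required PROJECT keys, copied from the default section_schema_mapping.
PROJECT_REQUIRED = (
    "PROJECT_TITLE", "PROJECT_SUMMARY", "INSTITUTE", "LAST_NAME",
    "FIRST_NAME", "ADDRESS", "EMAIL", "PHONE",
)


def check_section(section, required):
    """Return problems found in one section: missing required keys, non-str values.

    Hypothetical helper for illustration; not part of the mwtab package.
    """
    problems = ["missing required key: " + key for key in required if key not in section]
    problems += [
        "value of {} is not a str".format(key)
        for key, value in section.items()
        if not isinstance(value, str)
    ]
    return problems


project = {key: "placeholder" for key in PROJECT_REQUIRED if key != "EMAIL"}
print(check_section(project, PROJECT_REQUIRED))  # ['missing required key: EMAIL']
```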
Using the mwtab Python Package to Find Analyses Involving a Specific Disease or Condition¶
The Metabolomics Workbench data repository stores mass spectrometry and nuclear magnetic resonance experimental data and metadata in mwTab formatted files. The Metabolomics Workbench also provides a number of tools for searching and analyzing mwTab files. The mwtab Python package can perform similar functions through both a programmatic API and a command-line interface, with more search flexibility.
- To search the repository of mwTab files for analyses associated with a specific disease, the Metabolomics Workbench provides a web-based interface.
The mwtab Python package can be used in a number of ways to similar effect. The package provides the extract_metabolites() function to extract and organize metabolites from multiple mwTab files through both Python scripts and a command-line interface. It offers more search flexibility, since it can take either a search string or a regular expression.
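To make the difference between a search string and a regular expression concrete, the sketch below matches "PREFIX:ITEM" keys such as SU:SUBJECT_TYPE or ST:STUDY_SUMMARY against a plain nested dict. The prefix table, the dict layout, and the rule that plain strings must match exactly while patterns use re.search are all assumptions. item_matches is hypothetical and is not mwtab's ItemMatcher or ReGeXMatcher:

```python
import re

# Assumed mapping from key prefixes to section names (partial, for illustration).
SECTION_PREFIXES = {
    "PR": "PROJECT", "ST": "STUDY", "SU": "SUBJECT", "CO": "COLLECTION",
    "TR": "TREATMENT", "SP": "SAMPLEPREP", "CH": "CHROMATOGRAPHY",
    "AN": "ANALYSIS", "MS": "MS",
}


def item_matches(mwfile, key, value):
    """Check a "PREFIX:ITEM" key against a plain string or a compiled regex.

    Hypothetical helper. mwfile is a plain dict of sections such as
    {"SUBJECT": {"SUBJECT_TYPE": "Plant"}}. Plain strings must match exactly;
    compiled patterns only need to match somewhere (re.search).
    """
    prefix, _, item = key.partition(":")
    section = mwfile.get(SECTION_PREFIXES.get(prefix, prefix), {})
    actual = section.get(item)
    if actual is None:
        return False
    if hasattr(value, "search"):  # A compiled regular expression.
        return value.search(actual) is not None
    return actual == value


study = {"STUDY": {"STUDY_SUMMARY": "Metabolomics of Type 2 Diabetes"}}  # Placeholder.
print(item_matches(study, "ST:STUDY_SUMMARY", "diabetes"))                  # False
print(item_matches(study, "ST:STUDY_SUMMARY", re.compile("(?i)diabetes")))  # True
```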
Using mwtab package API to extract study IDs, analysis IDs, and metabolites¶
The extract_metabolites() function takes two parameters: 1) an iterable of MWTabFile instances and 2) an iterable of ItemMatcher or ReGeXMatcher instances. The iterable of MWTabFile instances can be created by passing mwTab file sources (filenames, analysis IDs, etc.) to the read_files() function. The iterable of matcher instances can be created using the generate_matchers() function.
- An example of using the mwtab package API to extract data from analyses associated with diabetes and output the first three metabolites:
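import re

from mwtab import read_files
from mwtab.mwextract import extract_metabolites, generate_matchers

# Lazily read every mwTab file in the local "diabetes/" directory.
mwtab_gen = read_files("diabetes/")

# Keep only analyses whose ST:STUDY_SUMMARY matches the regular expression.
matchers = generate_matchers([
    ("ST:STUDY_SUMMARY", re.compile("(diabetes)")),
])

# The result is keyed by metabolite name; show the first three.
list(extract_metabolites(mwtab_gen, matchers).keys())[:3]
# ['1_5-anhydroglucitol', '1-monopalmitin', '1-monostearin']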
Using mwtab CLI to extract study IDs, analysis IDs, and metabolites¶
The mwtab command-line interface includes an mwtab extract metabolites command, which takes a directory of mwTab files, an output path to save the extracted data to, and a series of mwTab section item keys and values to be matched (either string values or regular expressions). Additionally, an output format can be specified.
mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]
- An example of using the mwtab CLI to extract data from analyses associated with diabetes:
! mwtab extract metabolites diabetes/ out/output_file.json ST:STUDY_SUMMARY "r'(?i)(diabetes)'" --to-format=json
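On the command line, a regular expression is written as a quoted r'...' string, as in "r'(?i)(diabetes)'" above (the shell strips the outer double quotes). The inline (?i) flag makes the match case-insensitive, so "Diabetes" also matches. A sketch of how such a value could be recognized and compiled follows; parse_match_value and its parsing rule are assumptions, not mwtab's actual parser:

```python
import re


def parse_match_value(text):
    """Compile text as a regex if it is written as r'...' or r"...".

    Hypothetical helper; any other value is returned unchanged as a plain string.
    """
    if len(text) >= 3 and text[0] == "r" and text[1] in "'\"" and text[-1] == text[1]:
        return re.compile(text[2:-1])
    return text


pattern = parse_match_value("r'(?i)(diabetes)'")
print(pattern.pattern)                                # (?i)(diabetes)
print(pattern.search("Type 2 Diabetes") is not None)  # True
print(parse_match_value("Plant"))                     # Plant
```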
The mwtab API Reference¶
Routines for working with mwTab format files used by the Metabolomics Workbench.
This package includes the following modules:
- mwtab: This module provides the MWTabFile class, which is a Python dictionary representation of a Metabolomics Workbench mwTab file. Data can be accessed directly from the MWTabFile instance using bracket accessors.
- cli: This module provides the command-line interface for the mwtab package.
- tokenizer: This module provides the tokenizer() generator that generates tuples of key-value pairs from mwTab files.
- fileio: This module provides the read_files() generator to open files from different sources (single file/multiple files on a local machine, directory/archive of files, URL address of a file).
- converter: This module provides the Converter class that is responsible for the conversion of mwTab formatted files into their JSON representation and vice versa.
- mwschema: This module provides JSON schema definitions for mwTab formatted files, i.e. it specifies required and optional keys as well as data types.
- validator: This module provides routines to validate mwTab formatted files based on schema definitions, as well as checks for file self-consistency.
- mwrest: This module provides the GenericMWURL class, which is a Python dictionary representation of a Metabolomics Workbench REST URL. The class is used to validate query parameters and to generate a URL path that can be used to request data from the Metabolomics Workbench through its REST API.
mwtab.mwtab¶
This module provides the MWTabFile class that stores the data from a single mwTab formatted file in the form of an OrderedDict. Data can be accessed directly from the MWTabFile instance using bracket accessors.
The data is divided into a series of “sections”, each of which contains a number of “key-value”-like pairs. The file also contains a specially formatted SUBJECT_SAMPLE_FACTORS block and blocks of data between *_START and *_END.
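Since an MWTabFile is an OrderedDict of sections, reading a value is ordinary nested bracket access. The stand-in below is a plain OrderedDict with a hand-picked subset of sections and placeholder values, not the real class or real file contents. With the package installed, an instance returned by read_files() would take its place:

```python
import json
from collections import OrderedDict

# Stand-in shaped like an MWTabFile; section subset and values are placeholders.
mwfile = OrderedDict([
    ("METABOLOMICS WORKBENCH", OrderedDict([
        ("STUDY_ID", "ST000017"),
        ("ANALYSIS_ID", "AN000035"),
    ])),
    ("PROJECT", OrderedDict([("PROJECT_TITLE", "Example project")])),
])

# Sections and items are reached with ordinary bracket accessors.
print(mwfile["METABOLOMICS WORKBENCH"]["ANALYSIS_ID"])  # AN000035

# Insertion order is kept, so sections come out in file order.
print(list(mwfile))  # ['METABOLOMICS WORKBENCH', 'PROJECT']
print(json.dumps(mwfile, indent=4))
```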
- class mwtab.mwtab.MWTabFile(source, *args, **kwds)
  MWTabFile class that stores data from a single mwTab formatted file in the form of collections.OrderedDict.
  - read(filehandle)
    Read data into a MWTabFile instance.
    Parameters: filehandle (io.TextIOWrapper, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) – file-like object.
    Returns: None
    Return type: None
  - write(filehandle, file_format)
    Write MWTabFile data into a file.
    Parameters:
      - filehandle (io.TextIOWrapper) – file-like object.
      - file_format (str) – Format to use to write data: mwtab or json.
    Returns: None
    Return type: None
  - writestr(file_format)
    Write MWTabFile data into a string.
    Parameters: file_format (str) – Format to use to write data: mwtab or json.
    Returns: String representing the MWTabFile instance.
    Return type: str
  - print_file(f=sys.stdout, file_format='mwtab')
    Print MWTabFile into a file or stdout.
    Parameters:
      - f (io.StringIO) – writable file-like stream.
      - file_format (str) – Format to use: mwtab or json.
    Returns: None
    Return type: None
  - print_subject_sample_factors(section_key, f=sys.stdout, file_format='mwtab')
    Print mwtab SUBJECT_SAMPLE_FACTORS section into a file or stdout.
    Parameters:
      - section_key (str) – Section name.
      - f (io.StringIO) – writable file-like stream.
      - file_format (str) – Format to use: mwtab or json.
    Returns: None
    Return type: None
  - print_block(section_key, f=sys.stdout, file_format='mwtab')
    Print mwtab section into a file or stdout.
    Parameters:
      - section_key (str) – Section name.
      - f (io.StringIO) – writable file-like stream.
      - file_format (str) – Format to use: mwtab or json.
    Returns: None
    Return type: None
The mwtab command-line interface¶
Usage:
    mwtab -h | --help
    mwtab --version
    mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--validate] [--mw-rest=<url>] [--verbose]
    mwtab validate <from-path> [--mw-rest=<url>] [--verbose]
    mwtab download url <url> [--to-path=<path>] [--verbose]
    mwtab download study all [--to-path=<path>] [--input-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
    mwtab download study <input-value> [--to-path=<path>] [--input-item=<item>] [--output-item=<item>] [--output-format=<format>] [--mw-rest=<url>] [--validate] [--verbose]
    mwtab download (study | compound | refmet | gene | protein) <input-item> <input-value> <output-item> [--output-format=<format>] [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download moverz <input-item> <m/z-value> <ion-type-value> <m/z-tolerance-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab download exactmass <LIPID-abbreviation> <ion-type-value> [--to-path=<path>] [--mw-rest=<url>] [--verbose]
    mwtab extract metadata <from-path> <to-path> <key> ... [--to-format=<format>] [--no-header]
    mwtab extract metabolites <from-path> <to-path> (<key> <value>) ... [--to-format=<format>] [--no-header]

Options:
    -h, --help                  Show this screen.
    --version                   Show version.
    --verbose                   Print which files are being processed.
    --validate                  Validate the mwTab file.
    --from-format=<format>      Input file format, available formats: mwtab, json [default: mwtab].
    --to-format=<format>        Output file format [default: json].
                                Available formats for convert: mwtab, json.
                                Available formats for extract: json, csv.
    --mw-rest=<url>             URL to MW REST interface [default: https://www.metabolomicsworkbench.org/rest/].
    --context=<context>         Type of resource to access from MW REST interface, available contexts: study, compound, refmet, gene, protein, moverz, exactmass [default: study].
    --input-item=<item>         Item to search Metabolomics Workbench with.
    --output-item=<item>        Item to be retrieved from Metabolomics Workbench.
    --output-format=<format>    Format for item to be retrieved in, available formats: mwtab, json.
    --no-header                 Omit the header at the top of csv formatted files.

For extraction, <to-path> can take a "-", which will use stdout.
- mwtab.cli.cli(cmdargs)
  Implements the command-line interface.
  Parameters: cmdargs (dict) – dictionary of command-line arguments.
mwtab.tokenizer¶
This module provides the tokenizer() lexical analyzer for mwTab format syntax. It is implemented as a Python generator-based state machine that generates (yields) tokens one at a time when next() is invoked on a tokenizer() instance.
Each token is a tuple of “key-value”-like pairs, a tuple of SUBJECT_SAMPLE_FACTORS, or a tuple of data deposited between *_START and *_END blocks.
- mwtab.tokenizer.tokenizer(text)
  A lexical analyzer for mwTab formatted files.
  Parameters: text (str) – mwTab formatted text.
  Returns: Tuples of data.
  Return type: collections.namedtuple
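To illustrate the generator-based state machine idea, here is a much-simplified tokenizer over a made-up subset of the syntax: "#SECTION" header lines, tab-separated "KEY<TAB>VALUE" item lines, and tab-separated rows between NAME_START and NAME_END. The real mwTab grammar and token layout differ; Token and simple_tokenizer are hypothetical names:

```python
from collections import namedtuple

# Hypothetical token type; kind is "section", "item", or "block".
Token = namedtuple("Token", ["kind", "key", "value"])


def simple_tokenizer(text):
    """Yield tokens one at a time from a simplified mwTab-like text."""
    block_name, block_rows = None, []
    for line in text.splitlines():
        if not line.strip():
            continue  # Blank lines carry no tokens.
        if block_name is not None:
            # State: inside a NAME_START ... NAME_END block.
            if line == block_name + "_END":
                yield Token("block", block_name, tuple(block_rows))
                block_name, block_rows = None, []
            else:
                block_rows.append(tuple(line.split("\t")))
        elif line.endswith("_START"):
            block_name = line[: -len("_START")]  # Enter the block state.
        elif line.startswith("#"):
            yield Token("section", line[1:].strip(), None)
        else:
            key, _, value = line.partition("\t")
            yield Token("item", key.strip(), value.strip())
    if block_name is not None:
        raise ValueError("Unterminated block: " + block_name)


sample = "#PROJECT\nPR:PROJECT_TITLE\tExample\nDATA_START\nSamples\tS1\nm1\t1.0\nDATA_END\n"
for token in simple_tokenizer(sample):
    print(token)
```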
mwtab.fileio¶
This module provides routines for reading mwTab formatted files from different kinds of sources:
- Single mwTab formatted file on a local machine.
- Directory containing multiple mwTab formatted files.
- Compressed zip/tar archive of mwTab formatted files.
- URL address of mwTab formatted file.
- ANALYSIS_ID of mwTab formatted file.
- mwtab.fileio.read_files(*sources, **kwds)
  Construct a generator that yields file instances.
  Parameters: sources – One or more strings representing path to file(s).
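The sketch below shows one way to tell the source kinds listed above apart. classify_source and the order of its checks are hypothetical, and treating a bare number or an AN-prefixed string as an analysis ID is an assumption based on read_files("1", "2") in the Quickstart and mwtab convert 35 earlier:

```python
import os
import re
from urllib.parse import urlparse

ARCHIVE_SUFFIXES = (".zip", ".tar", ".tar.gz", ".tar.bz2")


def classify_source(source):
    """Return a rough label for a read_files()-style source string.

    Hypothetical helper for illustration; not part of the mwtab package.
    """
    if urlparse(source).scheme in ("http", "https"):
        return "url"
    # Assumption: "35" or "AN000035" denote Metabolomics Workbench analysis IDs.
    if re.fullmatch(r"(AN)?\d+", source):
        return "analysis_id"
    if os.path.isdir(source):
        return "directory"
    if source.endswith(ARCHIVE_SUFFIXES):
        return "archive"
    return "file"


for src in ("35", "AN000035", "mwfiles_json.tar.gz", "ST000017_AN000035.txt"):
    print(src, "->", classify_source(src))
```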
mwtab.converter¶
This module provides functionality for converting between the Metabolomics Workbench mwTab formatted file and its equivalent JSONized representation.
The following conversions are possible:
- Local files:
  - One-to-one file conversions:
    - textfile - to - textfile
    - textfile - to - textfile.gz
    - textfile - to - textfile.bz2
    - textfile.gz - to - textfile
    - textfile.gz - to - textfile.gz
    - textfile.gz - to - textfile.bz2
    - textfile.bz2 - to - textfile
    - textfile.bz2 - to - textfile.gz
    - textfile.bz2 - to - textfile.bz2
    - textfile / textfile.gz / textfile.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
  - Many-to-many file conversions:
    - Directories:
      - directory - to - directory
      - directory - to - directory.zip
      - directory - to - directory.tar
      - directory - to - directory.tar.bz2
      - directory - to - directory.tar.gz
      - directory - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
    - Zipfiles:
      - zipfile.zip - to - directory
      - zipfile.zip - to - zipfile.zip
      - zipfile.zip - to - tarfile.tar
      - zipfile.zip - to - tarfile.tar.gz
      - zipfile.zip - to - tarfile.tar.bz2
      - zipfile.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
    - Tarfiles:
      - tarfile.tar - to - directory
      - tarfile.tar - to - zipfile.zip
      - tarfile.tar - to - tarfile.tar
      - tarfile.tar - to - tarfile.tar.gz
      - tarfile.tar - to - tarfile.tar.bz2
      - tarfile.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      - tarfile.tar.gz - to - directory
      - tarfile.tar.gz - to - zipfile.zip
      - tarfile.tar.gz - to - tarfile.tar
      - tarfile.tar.gz - to - tarfile.tar.gz
      - tarfile.tar.gz - to - tarfile.tar.bz2
      - tarfile.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      - tarfile.tar.bz2 - to - directory
      - tarfile.tar.bz2 - to - zipfile.zip
      - tarfile.tar.bz2 - to - tarfile.tar
      - tarfile.tar.bz2 - to - tarfile.tar.gz
      - tarfile.tar.bz2 - to - tarfile.tar.bz2
      - tarfile.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- URL files:
  - One-to-one file conversions:
    - analysis_id - to - textfile
    - analysis_id - to - textfile.gz
    - analysis_id - to - textfile.bz2
    - analysis_id - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
    - textfileurl - to - textfile
    - textfileurl - to - textfile.gz
    - textfileurl - to - textfile.bz2
    - textfileurl.gz - to - textfile
    - textfileurl.gz - to - textfile.gz
    - textfileurl.gz - to - textfile.bz2
    - textfileurl.bz2 - to - textfile
    - textfileurl.bz2 - to - textfile.gz
    - textfileurl.bz2 - to - textfile.bz2
    - textfileurl / textfileurl.gz / textfileurl.bz2 - to - textfile.zip / textfile.tar / textfile.tar.gz / textfile.tar.bz2 (TypeError: One-to-many conversion)
  - Many-to-many file conversions:
    - Zipfiles:
      - zipfileurl.zip - to - directory
      - zipfileurl.zip - to - zipfile.zip
      - zipfileurl.zip - to - tarfile.tar
      - zipfileurl.zip - to - tarfile.tar.gz
      - zipfileurl.zip - to - tarfile.tar.bz2
      - zipfileurl.zip - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
    - Tarfiles:
      - tarfileurl.tar - to - directory
      - tarfileurl.tar - to - zipfile.zip
      - tarfileurl.tar - to - tarfile.tar
      - tarfileurl.tar - to - tarfile.tar.gz
      - tarfileurl.tar - to - tarfile.tar.bz2
      - tarfileurl.tar - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      - tarfileurl.tar.gz - to - directory
      - tarfileurl.tar.gz - to - zipfile.zip
      - tarfileurl.tar.gz - to - tarfile.tar
      - tarfileurl.tar.gz - to - tarfile.tar.gz
      - tarfileurl.tar.gz - to - tarfile.tar.bz2
      - tarfileurl.tar.gz - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
      - tarfileurl.tar.bz2 - to - directory
      - tarfileurl.tar.bz2 - to - zipfile.zip
      - tarfileurl.tar.bz2 - to - tarfile.tar
      - tarfileurl.tar.bz2 - to - tarfile.tar.gz
      - tarfileurl.tar.bz2 - to - tarfile.tar.bz2
      - tarfileurl.tar.bz2 - to - directory.gz / directory.bz2 (TypeError: Many-to-one conversion)
- class mwtab.converter.Translator(from_path, to_path, from_format=None, to_format=None, validate=False)
  Translator abstract class.
- class mwtab.converter.MWTabFileToMWTabFile(from_path, to_path, from_format=None, to_format=None, validate=False)
  Translator concrete class that can convert between mwTab and JSON formats.
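The TypeError entries in the list above follow a simple rule: a single file cannot be written to an archive (one-to-many), and a directory or archive cannot be written to a single .gz or .bz2 file (many-to-one). A suffix-based sketch of that rule follows. check_conversion, is_collection, and the suffix tables are hypothetical, and the Translator classes may implement the check differently:

```python
import os

ARCHIVES = (".zip", ".tar", ".tar.gz", ".tar.bz2")
SINGLE_FILE_COMPRESSION = (".gz", ".bz2")


def is_collection(path):
    """Treat existing directories and archive suffixes as "many files"."""
    return os.path.isdir(path) or path.endswith(ARCHIVES)


def check_conversion(from_path, to_path):
    """Classify a conversion, raising TypeError for the unsupported directions.

    Hypothetical helper mirroring the rules listed above; not mwtab code.
    """
    many_in = is_collection(from_path)
    if not many_in and to_path.endswith(ARCHIVES):
        raise TypeError("One-to-many conversion")
    # .tar.gz and .tar.bz2 also end in .gz / .bz2, so archive targets are excluded here.
    if many_in and to_path.endswith(SINGLE_FILE_COMPRESSION) and not to_path.endswith(ARCHIVES):
        raise TypeError("Many-to-one conversion")
    return "many-to-many" if many_in else "one-to-one"


print(check_conversion("ST000017_AN000035.json.gz", "out/ST000017_AN000035.txt.gz"))
print(check_conversion("mwfiles_mwtab.zip", "out/mwfiles_json.tar.bz2"))
```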
mwtab.validator¶
This module contains routines to validate the consistency of mwTab formatted files, e.g. to make sure that Samples and Factors identifiers are consistent across the file and that all required key-value pairs are present.
-
mwtab.validator.
validate_file
(mwtabfile, section_schema_mapping={'ANALYSIS': Schema({'ANALYSIS_TYPE': <class 'str'>, Optional('LABORATORY_NAME'): <class 'str'>, Optional('OPERATOR_NAME'): <class 'str'>, Optional('DETECTOR_TYPE'): <class 'str'>, Optional('SOFTWARE_VERSION'): <class 'str'>, Optional('ACQUISITION_DATE'): <class 'str'>, Optional('ANALYSIS_PROTOCOL_FILE'): <class 'str'>, Optional('ACQUISITION_PARAMETERS_FILE'): <class 'str'>, Optional('PROCESSING_PARAMETERS_FILE'): <class 'str'>, Optional('DATA_FORMAT'): <class 'str'>, Optional('ACQUISITION_ID'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('ANALYSIS_COMMENTS'): <class 'str'>, Optional('ANALYSIS_DISPLAY'): <class 'str'>, Optional('INSTRUMENT_NAME'): <class 'str'>, Optional('INSTRUMENT_PARAMETERS_FILE'): <class 'str'>, Optional('NUM_FACTORS'): <class 'str'>, Optional('NUM_METABOLITES'): <class 'str'>, Optional('PROCESSED_FILE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('RAW_FILE'): <class 'str'>}), 'CHROMATOGRAPHY': Schema({Optional('CHROMATOGRAPHY_SUMMARY'): <class 'str'>, 'CHROMATOGRAPHY_TYPE': <class 'str'>, 'INSTRUMENT_NAME': <class 'str'>, 'COLUMN_NAME': <class 'str'>, Optional('FLOW_GRADIENT'): <class 'str'>, Optional('FLOW_RATE'): <class 'str'>, Optional('COLUMN_TEMPERATURE'): <class 'str'>, Optional('METHODS_FILENAME'): <class 'str'>, Optional('SOLVENT_A'): <class 'str'>, Optional('SOLVENT_B'): <class 'str'>, Optional('METHODS_ID'): <class 'str'>, Optional('COLUMN_PRESSURE'): <class 'str'>, Optional('INJECTION_TEMPERATURE'): <class 'str'>, Optional('INTERNAL_STANDARD'): <class 'str'>, Optional('INTERNAL_STANDARD_MT'): <class 'str'>, Optional('RETENTION_INDEX'): <class 'str'>, Optional('RETENTION_TIME'): <class 'str'>, Optional('SAMPLE_INJECTION'): <class 'str'>, Optional('SAMPLING_CONE'): <class 'str'>, Optional('ANALYTICAL_TIME'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('MIGRATION_TIME'): <class 'str'>, Optional('OVEN_TEMPERATURE'): <class 
'str'>, Optional('PRECONDITIONING'): <class 'str'>, Optional('RUNNING_BUFFER'): <class 'str'>, Optional('RUNNING_VOLTAGE'): <class 'str'>, Optional('SHEATH_LIQUID'): <class 'str'>, Optional('TIME_PROGRAM'): <class 'str'>, Optional('TRANSFERLINE_TEMPERATURE'): <class 'str'>, Optional('WASHING_BUFFER'): <class 'str'>, Optional('WEAK_WASH_SOLVENT_NAME'): <class 'str'>, Optional('WEAK_WASH_VOLUME'): <class 'str'>, Optional('STRONG_WASH_SOLVENT_NAME'): <class 'str'>, Optional('STRONG_WASH_VOLUME'): <class 'str'>, Optional('TARGET_SAMPLE_TEMPERATURE'): <class 'str'>, Optional('SAMPLE_LOOP_SIZE'): <class 'str'>, Optional('SAMPLE_SYRINGE_SIZE'): <class 'str'>, Optional('RANDOMIZATION_ORDER'): <class 'str'>, Optional('CHROMATOGRAPHY_COMMENTS'): <class 'str'>}), 'COLLECTION': Schema({'COLLECTION_SUMMARY': <class 'str'>, Optional('COLLECTION_PROTOCOL_ID'): <class 'str'>, Optional('COLLECTION_PROTOCOL_FILENAME'): <class 'str'>, Optional('COLLECTION_PROTOCOL_COMMENTS'): <class 'str'>, Optional('SAMPLE_TYPE'): <class 'str'>, Optional('COLLECTION_METHOD'): <class 'str'>, Optional('COLLECTION_LOCATION'): <class 'str'>, Optional('COLLECTION_FREQUENCY'): <class 'str'>, Optional('COLLECTION_DURATION'): <class 'str'>, Optional('COLLECTION_TIME'): <class 'str'>, Optional('VOLUMEORAMOUNT_COLLECTED'): <class 'str'>, Optional('STORAGE_CONDITIONS'): <class 'str'>, Optional('COLLECTION_VIALS'): <class 'str'>, Optional('STORAGE_VIALS'): <class 'str'>, Optional('COLLECTION_TUBE_TEMP'): <class 'str'>, Optional('ADDITIVES'): <class 'str'>, Optional('BLOOD_SERUM_OR_PLASMA'): <class 'str'>, Optional('TISSUE_CELL_IDENTIFICATION'): <class 'str'>, Optional('TISSUE_CELL_QUANTITY_TAKEN'): <class 'str'>}), 'METABOLOMICS WORKBENCH': Schema({'VERSION': <class 'str'>, 'CREATED_ON': <class 'str'>, Optional('STUDY_ID'): <class 'str'>, Optional('ANALYSIS_ID'): <class 'str'>, Optional('PROJECT_ID'): <class 'str'>, Optional('HEADER'): <class 'str'>, Optional('DATATRACK_ID'): <class 'str'>}), 'MS': 
Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'MS_TYPE': <class 'str'>, 'ION_MODE': <class 'str'>, Optional('MS_COMMENTS'): <class 'str'>, Optional('CAPILLARY_TEMPERATURE'): <class 'str'>, Optional('CAPILLARY_VOLTAGE'): <class 'str'>, Optional('COLLISION_ENERGY'): <class 'str'>, Optional('COLLISION_GAS'): <class 'str'>, Optional('DRY_GAS_FLOW'): <class 'str'>, Optional('DRY_GAS_TEMP'): <class 'str'>, Optional('FRAGMENT_VOLTAGE'): <class 'str'>, Optional('FRAGMENTATION_METHOD'): <class 'str'>, Optional('GAS_PRESSURE'): <class 'str'>, Optional('HELIUM_FLOW'): <class 'str'>, Optional('ION_SOURCE_TEMPERATURE'): <class 'str'>, Optional('ION_SPRAY_VOLTAGE'): <class 'str'>, Optional('IONIZATION'): <class 'str'>, Optional('IONIZATION_ENERGY'): <class 'str'>, Optional('IONIZATION_POTENTIAL'): <class 'str'>, Optional('MASS_ACCURACY'): <class 'str'>, Optional('PRECURSOR_TYPE'): <class 'str'>, Optional('REAGENT_GAS'): <class 'str'>, Optional('SOURCE_TEMPERATURE'): <class 'str'>, Optional('SPRAY_VOLTAGE'): <class 'str'>, Optional('ACTIVATION_PARAMETER'): <class 'str'>, Optional('ACTIVATION_TIME'): <class 'str'>, Optional('ATOM_GUN_CURRENT'): <class 'str'>, Optional('AUTOMATIC_GAIN_CONTROL'): <class 'str'>, Optional('BOMBARDMENT'): <class 'str'>, Optional('CDL_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('CDL_TEMPERATURE'): <class 'str'>, Optional('DATAFORMAT'): <class 'str'>, Optional('DESOLVATION_GAS_FLOW'): <class 'str'>, Optional('DESOLVATION_TEMPERATURE'): <class 'str'>, Optional('INTERFACE_VOLTAGE'): <class 'str'>, Optional('IT_SIDE_OCTOPOLES_BIAS_VOLTAGE'): <class 'str'>, Optional('LASER'): <class 'str'>, Optional('MATRIX'): <class 'str'>, Optional('NEBULIZER'): <class 'str'>, Optional('OCTPOLE_VOLTAGE'): <class 'str'>, Optional('PROBE_TIP'): <class 'str'>, Optional('RESOLUTION_SETTING'): <class 'str'>, Optional('SAMPLE_DRIPPING'): <class 'str'>, Optional('SCAN_RANGE_MOVERZ'): <class 'str'>, Optional('SCANNING'): <class 'str'>, 
Optional('SCANNING_CYCLE'): <class 'str'>, Optional('SCANNING_RANGE'): <class 'str'>, Optional('SKIMMER_VOLTAGE'): <class 'str'>, Optional('TUBE_LENS_VOLTAGE'): <class 'str'>, Optional('MS_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'MS_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'NM': Schema({'INSTRUMENT_NAME': <class 'str'>, 'INSTRUMENT_TYPE': <class 'str'>, 'NMR_EXPERIMENT_TYPE': <class 'str'>, Optional('NMR_COMMENTS'): <class 'str'>, Optional('FIELD_FREQUENCY_LOCK'): <class 'str'>, Optional('STANDARD_CONCENTRATION'): <class 'str'>, 'SPECTROMETER_FREQUENCY': <class 'str'>, Optional('NMR_PROBE'): <class 'str'>, Optional('NMR_SOLVENT'): <class 'str'>, Optional('NMR_TUBE_SIZE'): <class 'str'>, Optional('SHIMMING_METHOD'): <class 'str'>, Optional('PULSE_SEQUENCE'): <class 'str'>, Optional('WATER_SUPPRESSION'): <class 'str'>, Optional('PULSE_WIDTH'): <class 'str'>, Optional('POWER_LEVEL'): <class 'str'>, Optional('RECEIVER_GAIN'): <class 'str'>, Optional('OFFSET_FREQUENCY'): <class 'str'>, Optional('PRESATURATION_POWER_LEVEL'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_CPD'): <class 'str'>, Optional('TEMPERATURE'): <class 'str'>, Optional('NUMBER_OF_SCANS'): <class 'str'>, Optional('DUMMY_SCANS'): <class 'str'>, Optional('ACQUISITION_TIME'): <class 'str'>, Optional('RELAXATION_DELAY'): <class 'str'>, Optional('SPECTRAL_WIDTH'): <class 'str'>, Optional('NUM_DATA_POINTS_ACQUIRED'): <class 'str'>, Optional('REAL_DATA_POINTS'): <class 'str'>, Optional('LINE_BROADENING'): <class 'str'>, Optional('ZERO_FILLING'): <class 'str'>, Optional('APODIZATION'): <class 'str'>, 
Optional('BASELINE_CORRECTION_METHOD'): <class 'str'>, Optional('CHEMICAL_SHIFT_REF_STD'): <class 'str'>, Optional('BINNED_INCREMENT'): <class 'str'>, Optional('BINNED_DATA_NORMALIZATION_METHOD'): <class 'str'>, Optional('BINNED_DATA_PROTOCOL_FILE'): <class 'str'>, Optional('BINNED_DATA_CHEMICAL_SHIFT_RANGE'): <class 'str'>, Optional('BINNED_DATA_EXCLUDED_RANGE'): <class 'str'>, Optional('NMR_RESULTS_FILE'): Or(<class 'str'>, <class 'dict'>)}), 'NMR_BINNED_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}])}), 'NMR_METABOLITE_DATA': Schema({'Units': <class 'str'>, 'Data': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), 'Metabolites': Schema([{Or('Metabolite', 'Bin range(ppm)'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}]), Optional('Extended'): Schema([{'Metabolite': <class 'str'>, Optional(<class 'str'>): <class 'str'>, 'sample_id': <class 'str'>}])}), 'PROJECT': Schema({'PROJECT_TITLE': <class 'str'>, Optional('PROJECT_TYPE'): <class 'str'>, 'PROJECT_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('FUNDING_SOURCE'): <class 'str'>, Optional('PROJECT_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('CONTRIBUTORS'): <class 'str'>, Optional('DOI'): <class 'str'>}), 'SAMPLEPREP': Schema({'SAMPLEPREP_SUMMARY': <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_ID'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_FILENAME'): <class 'str'>, Optional('SAMPLEPREP_PROTOCOL_COMMENTS'): <class 'str'>, Optional('PROCESSING_METHOD'): <class 'str'>, Optional('PROCESSING_STORAGE_CONDITIONS'): <class 'str'>, Optional('EXTRACTION_METHOD'): <class 'str'>, 
Optional('EXTRACT_CONCENTRATION_DILUTION'): <class 'str'>, Optional('EXTRACT_ENRICHMENT'): <class 'str'>, Optional('EXTRACT_CLEANUP'): <class 'str'>, Optional('EXTRACT_STORAGE'): <class 'str'>, Optional('SAMPLE_RESUSPENSION'): <class 'str'>, Optional('SAMPLE_DERIVATIZATION'): <class 'str'>, Optional('SAMPLE_SPIKING'): <class 'str'>, Optional('ORGAN'): <class 'str'>, Optional('ORGAN_SPECIFICATION'): <class 'str'>, Optional('CELL_TYPE'): <class 'str'>, Optional('SUBCELLULAR_LOCATION'): <class 'str'>}), 'STUDY': Schema({'STUDY_TITLE': <class 'str'>, Optional('STUDY_TYPE'): <class 'str'>, 'STUDY_SUMMARY': <class 'str'>, 'INSTITUTE': <class 'str'>, Optional('DEPARTMENT'): <class 'str'>, Optional('LABORATORY'): <class 'str'>, 'LAST_NAME': <class 'str'>, 'FIRST_NAME': <class 'str'>, 'ADDRESS': <class 'str'>, 'EMAIL': <class 'str'>, 'PHONE': <class 'str'>, Optional('NUM_GROUPS'): <class 'str'>, Optional('TOTAL_SUBJECTS'): <class 'str'>, Optional('NUM_MALES'): <class 'str'>, Optional('NUM_FEMALES'): <class 'str'>, Optional('STUDY_COMMENTS'): <class 'str'>, Optional('PUBLICATIONS'): <class 'str'>, Optional('SUBMIT_DATE'): <class 'str'>}), 'SUBJECT': Schema({'SUBJECT_TYPE': <class 'str'>, 'SUBJECT_SPECIES': <class 'str'>, Optional('TAXONOMY_ID'): <class 'str'>, Optional('GENOTYPE_STRAIN'): <class 'str'>, Optional('AGE_OR_AGE_RANGE'): <class 'str'>, Optional('WEIGHT_OR_WEIGHT_RANGE'): <class 'str'>, Optional('HEIGHT_OR_HEIGHT_RANGE'): <class 'str'>, Optional('GENDER'): <class 'str'>, Optional('HUMAN_RACE'): <class 'str'>, Optional('HUMAN_ETHNICITY'): <class 'str'>, Optional('HUMAN_TRIAL_TYPE'): <class 'str'>, Optional('HUMAN_LIFESTYLE_FACTORS'): <class 'str'>, Optional('HUMAN_MEDICATIONS'): <class 'str'>, Optional('HUMAN_PRESCRIPTION_OTC'): <class 'str'>, Optional('HUMAN_SMOKING_STATUS'): <class 'str'>, Optional('HUMAN_ALCOHOL_DRUG_USE'): <class 'str'>, Optional('HUMAN_NUTRITION'): <class 'str'>, Optional('HUMAN_INCLUSION_CRITERIA'): <class 'str'>, 
Optional('HUMAN_EXCLUSION_CRITERIA'): <class 'str'>, Optional('ANIMAL_ANIMAL_SUPPLIER'): <class 'str'>, Optional('ANIMAL_HOUSING'): <class 'str'>, Optional('ANIMAL_LIGHT_CYCLE'): <class 'str'>, Optional('ANIMAL_FEED'): <class 'str'>, Optional('ANIMAL_WATER'): <class 'str'>, Optional('ANIMAL_INCLUSION_CRITERIA'): <class 'str'>, Optional('CELL_BIOSOURCE_OR_SUPPLIER'): <class 'str'>, Optional('CELL_STRAIN_DETAILS'): <class 'str'>, Optional('SUBJECT_COMMENTS'): <class 'str'>, Optional('CELL_PRIMARY_IMMORTALIZED'): <class 'str'>, Optional('CELL_PASSAGE_NUMBER'): <class 'str'>, Optional('CELL_COUNTS'): <class 'str'>, Optional('SPECIES_GROUP'): <class 'str'>}), 'SUBJECT_SAMPLE_FACTORS': Schema([{'Subject ID': <class 'str'>, 'Sample ID': <class 'str'>, 'Factors': <class 'dict'>, Optional('Additional sample data'): {Optional('RAW_FILE_NAME'): <class 'str'>, Optional(<class 'str'>): <class 'str'>}}]), 'TREATMENT': Schema({'TREATMENT_SUMMARY': <class 'str'>, Optional('TREATMENT_PROTOCOL_ID'): <class 'str'>, Optional('TREATMENT_PROTOCOL_FILENAME'): <class 'str'>, Optional('TREATMENT_PROTOCOL_COMMENTS'): <class 'str'>, Optional('TREATMENT'): <class 'str'>, Optional('TREATMENT_COMPOUND'): <class 'str'>, Optional('TREATMENT_ROUTE'): <class 'str'>, Optional('TREATMENT_DOSE'): <class 'str'>, Optional('TREATMENT_DOSEVOLUME'): <class 'str'>, Optional('TREATMENT_DOSEDURATION'): <class 'str'>, Optional('TREATMENT_VEHICLE'): <class 'str'>, Optional('ANIMAL_VET_TREATMENTS'): <class 'str'>, Optional('ANIMAL_ANESTHESIA'): <class 'str'>, Optional('ANIMAL_ACCLIMATION_DURATION'): <class 'str'>, Optional('ANIMAL_FASTING'): <class 'str'>, Optional('ANIMAL_ENDP_EUTHANASIA'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_COLL_LIST'): <class 'str'>, Optional('ANIMAL_ENDP_TISSUE_PROC_METHOD'): <class 'str'>, Optional('ANIMAL_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('HUMAN_FASTING'): <class 'str'>, Optional('HUMAN_ENDP_CLINICAL_SIGNS'): <class 'str'>, Optional('CELL_STORAGE'): <class 'str'>, 
Optional('CELL_GROWTH_CONTAINER'): <class 'str'>, Optional('CELL_GROWTH_CONFIG'): <class 'str'>, Optional('CELL_GROWTH_RATE'): <class 'str'>, Optional('CELL_INOC_PROC'): <class 'str'>, Optional('CELL_MEDIA'): <class 'str'>, Optional('CELL_ENVIR_COND'): <class 'str'>, Optional('CELL_HARVESTING'): <class 'str'>, Optional('PLANT_GROWTH_SUPPORT'): <class 'str'>, Optional('PLANT_GROWTH_LOCATION'): <class 'str'>, Optional('PLANT_PLOT_DESIGN'): <class 'str'>, Optional('PLANT_LIGHT_PERIOD'): <class 'str'>, Optional('PLANT_HUMIDITY'): <class 'str'>, Optional('PLANT_TEMP'): <class 'str'>, Optional('PLANT_WATERING_REGIME'): <class 'str'>, Optional('PLANT_NUTRITIONAL_REGIME'): <class 'str'>, Optional('PLANT_ESTAB_DATE'): <class 'str'>, Optional('PLANT_HARVEST_DATE'): <class 'str'>, Optional('PLANT_GROWTH_STAGE'): <class 'str'>, Optional('PLANT_METAB_QUENCH_METHOD'): <class 'str'>, Optional('PLANT_HARVEST_METHOD'): <class 'str'>, Optional('PLANT_STORAGE'): <class 'str'>, Optional('CELL_PCT_CONFLUENCE'): <class 'str'>, Optional('CELL_MEDIA_LASTCHANGED'): <class 'str'>})}, verbose=False, metabolites=True)
Validate an mwTab formatted file.
Returns: Validated file.
mwtab.mwrest
This module provides routines for accessing the Metabolomics Workbench REST API.
See https://www.metabolomicsworkbench.org/tools/MWRestAPIv1.0.pdf for details.
- mwtab.mwrest.analysis_ids(base_url='https://www.metabolomicsworkbench.org/rest/')
Retrieve a list of the analysis IDs of every current analysis in the Metabolomics Workbench.
Parameters: base_url (str) – Base URL of the Metabolomics Workbench REST API.
Returns: List of every available Metabolomics Workbench analysis identifier.
Return type: list
- mwtab.mwrest.study_ids(base_url='https://www.metabolomicsworkbench.org/rest/')
Retrieve a list of the study IDs of every current study in the Metabolomics Workbench.
Parameters: base_url (str) – Base URL of the Metabolomics Workbench REST API.
Returns: List of every available Metabolomics Workbench study identifier.
Return type: list
- mwtab.mwrest.generate_mwtab_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', output_format='txt')
Generate the URLs used to retrieve mwTab files for analyses and studies through the Metabolomics Workbench REST API.
Returns: Metabolomics Workbench REST URL string(s).
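As a rough illustration of the kind of URL generate_mwtab_urls produces, the standalone sketch below maps AN-prefixed analysis identifiers and ST-prefixed study identifiers onto a study/.../mwtab/... REST path. The function name sketch_mwtab_urls, the prefix rules, and the exact path layout are assumptions made for this example, not mwtab's implementation; only the base URL and the 'txt' default come from the signature above.

```python
BASE_URL = "https://www.metabolomicsworkbench.org/rest/"


def sketch_mwtab_urls(input_items, base_url=BASE_URL, output_format="txt"):
    """Hypothetical stand-in for generate_mwtab_urls: yield one URL per identifier."""
    for item in input_items:
        # Assumed rule: the identifier prefix selects the REST input_item.
        if item.startswith("AN"):
            input_item = "analysis_id"
        elif item.startswith("ST"):
            input_item = "study_id"
        else:
            raise ValueError(f"unrecognized identifier: {item!r}")
        # Assumed layout: context/input_item/input_value/output_item/output_format
        yield f"{base_url}study/{input_item}/{item}/mwtab/{output_format}"


if __name__ == "__main__":
    for url in sketch_mwtab_urls(["AN000001", "ST000002"]):
        print(url)
```

Using a generator keeps the sketch lazy, so a long list of identifiers does not have to be turned into URLs all at once.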
- mwtab.mwrest.generate_urls(input_items, base_url='https://www.metabolomicsworkbench.org/rest/', **kwds)
Create a generator that yields validated Metabolomics Workbench REST URLs.
Returns: Metabolomics Workbench REST URL string(s).
- class mwtab.mwrest.GenericMWURL(rest_params, base_url='https://www.metabolomicsworkbench.org/rest/')
Stores and validates the parameters that specify a Metabolomics Workbench REST URL.
Metabolomics Workbench REST API requests are URL requests of the form
https://www.metabolomicsworkbench.org/rest/context/input_specification/output_specification
where:
- if context = “study” | “compound” | “refmet” | “gene” | “protein”:
  - input_specification = input_item/input_value
  - output_specification = output_item/[output_format]
- elif context = “moverz”:
  - input_specification = input_item/input_value1/input_value2/input_value3
  - input_item = “LIPIDS” | “MB” | “REFMET”
  - input_value1 = m/z_value
  - input_value2 = ion_type_value
  - input_value3 = m/z_tolerance_value
  - output_specification = output_format
  - output_format = “txt”
- elif context = “exactmass”:
  - input_specification = input_item/input_value1/input_value2
  - input_item = “LIPIDS” | “MB” | “REFMET”
  - input_value1 = LIPID_abbreviation
  - input_value2 = ion_type_value
  - output_specification = None
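The URL grammar above can be made concrete with a small standalone builder. This is a hypothetical helper (build_mw_url), not the GenericMWURL API: it only checks segment counts and the listed input_item values for the moverz and exactmass contexts, and the example values (635.52, M+H, 0.5, PC(34:1)) are illustrative rather than taken from the documentation.

```python
BASE_URL = "https://www.metabolomicsworkbench.org/rest/"

# Contexts and input items as listed in the GenericMWURL description.
STANDARD_CONTEXTS = {"study", "compound", "refmet", "gene", "protein"}
MASS_INPUT_ITEMS = {"LIPIDS", "MB", "REFMET"}


def build_mw_url(context, *segments, base_url=BASE_URL):
    """Hypothetical helper: assemble context/input_specification/output_specification."""
    if context in STANDARD_CONTEXTS:
        # input_item/input_value/output_item[/output_format]
        if len(segments) not in (3, 4):
            raise ValueError("expected input_item, input_value, output_item[, output_format]")
    elif context == "moverz":
        # input_item/m/z value/ion type value/m/z tolerance value, then output_format "txt"
        if len(segments) != 4 or segments[0] not in MASS_INPUT_ITEMS:
            raise ValueError("expected LIPIDS|MB|REFMET, m/z, ion type, m/z tolerance")
        segments = segments + ("txt",)
    elif context == "exactmass":
        # input_item/LIPID_abbreviation/ion_type_value; no output_specification
        if len(segments) != 3 or segments[0] not in MASS_INPUT_ITEMS:
            raise ValueError("expected LIPIDS|MB|REFMET, lipid abbreviation, ion type")
    else:
        raise ValueError(f"unsupported context: {context!r}")
    # Numbers such as 635.52 are converted with str() before joining.
    return base_url + "/".join([context, *(str(s) for s in segments)])


if __name__ == "__main__":
    print(build_mw_url("study", "analysis_id", "AN000001", "mwtab", "txt"))
    print(build_mw_url("moverz", "MB", 635.52, "M+H", 0.5))
```

Rejecting malformed parameter sets before any request is made mirrors the "stores and validates" role described for GenericMWURL, though the real class may check considerably more.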
- class mwtab.mwrest.MWRESTFile(source)
Stores the data from a single file downloaded through the Metabolomics Workbench REST API. Mirrors MWTabFile.
  - read(filehandle)
    Read data into a MWRESTFile instance.
    Parameters: filehandle (io.TextIOWrapper, gzip.GzipFile, bz2.BZ2File, zipfile.ZipFile) – file-like object.
    Returns: None
    Return type: None
  - write(filehandle)
    Write MWRESTFile data into a file.
    Parameters: filehandle (io.TextIOWrapper) – file-like object.
    Returns: None
    Return type: None
mwtab.mwextract
This module provides functions and classes for extracting and saving data and metadata stored in mwTab formatted files that have been loaded as MWTabFile objects.
- class mwtab.mwextract.ItemMatcher(full_key, value_comparison)
Callable class that matches items from mwTab formatted files in the form of MWTabFile.
- class mwtab.mwextract.ReGeXMatcher(full_key, value_comparison)
Callable class that matches items from mwTab formatted files in the form of MWTabFile using regular expressions.
- mwtab.mwextract.generate_matchers(items)
Construct a generator that yields matchers (ItemMatcher or ReGeXMatcher).
Parameters: items (iterable) – Iterable object containing key-value pairs to match.
Returns: Yields a matcher object for each given item.
Return type: ItemMatcher or ReGeXMatcher
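To illustrate the matcher idea, here is a self-contained sketch that works on a plain nested dict standing in for an MWTabFile. The class names, the "SECTION:KEY" form of full_key, and the rule that a compiled regular expression selects the regex matcher are all assumptions of this sketch; mwtab's own classes may parse keys and choose matchers differently.

```python
import re


class ItemMatcherSketch:
    """Hypothetical exact-value matcher over a nested {section: {key: value}} dict."""

    def __init__(self, full_key, value_comparison):
        # Assumed key form: "SECTION:KEY", e.g. "SUBJECT:SUBJECT_TYPE".
        self.section, self.key = full_key.split(":", 1)
        self.value_comparison = value_comparison

    def _lookup(self, mwtab_like):
        # Missing sections or keys yield None instead of raising.
        return mwtab_like.get(self.section, {}).get(self.key)

    def __call__(self, mwtab_like):
        # True when the stored value equals the requested value.
        return self._lookup(mwtab_like) == self.value_comparison


class ReGeXMatcherSketch(ItemMatcherSketch):
    """Hypothetical regex matcher: value_comparison is a compiled pattern."""

    def __call__(self, mwtab_like):
        value = self._lookup(mwtab_like)
        return value is not None and self.value_comparison.search(value) is not None


def generate_matchers_sketch(items):
    """Yield one matcher per (full_key, value) pair."""
    for full_key, value in items:
        if isinstance(value, re.Pattern):
            yield ReGeXMatcherSketch(full_key, value)
        else:
            yield ItemMatcherSketch(full_key, value)


if __name__ == "__main__":
    record = {"SUBJECT": {"SUBJECT_TYPE": "Human", "SUBJECT_SPECIES": "Homo sapiens"}}
    pairs = [("SUBJECT:SUBJECT_TYPE", "Human"),
             ("SUBJECT:SUBJECT_SPECIES", re.compile(r"(?i)^homo"))]
    print(all(m(record) for m in generate_matchers_sketch(pairs)))  # True
```

Making the matchers callable lets them be passed around as plain predicates, for example to filter a stream of files with all(m(f) for m in matchers).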
- mwtab.mwextract.extract_metabolites(sources, matchers)
Extract metabolite data from mwTab formatted files in the form of MWTabFile.
Parameters:
  - sources (generator) – Generator of mwtab file objects (MWTabFile).
  - matchers (generator) – Generator of matcher objects (ItemMatcher or ReGeXMatcher).
Returns: Extracted metabolites dictionary.
Return type: dict
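The nested layout of the returned dictionary (metabolite, then study ID, then analysis ID, then sample IDs) matches the write_json metabolites example for this module. A minimal sketch of building that layout, assuming the observations have already been flattened into (metabolite, study_id, analysis_id, sample_id) tuples. That input format, the helper name nest_metabolites, and the sample IDs S1/S2 are hypothetical. Collecting sample IDs in sets is also an assumption, consistent with the SetEncoder class in this module.

```python
from collections import defaultdict


def nest_metabolites(observations):
    """Hypothetical helper: metabolite -> study -> analysis -> set of sample IDs."""
    nested = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for metabolite, study_id, analysis_id, sample_id in observations:
        nested[metabolite][study_id][analysis_id].add(sample_id)
    # Convert the defaultdicts back to plain dicts so missing keys raise KeyError again.
    return {
        metabolite: {study_id: dict(analyses) for study_id, analyses in studies.items()}
        for metabolite, studies in nested.items()
    }


if __name__ == "__main__":
    rows = [
        ("1,2,4-benzenetriol", "ST000001", "AN000001", "S1"),
        ("1,2,4-benzenetriol", "ST000001", "AN000001", "S2"),
        ("1-monostearin", "ST000001", "AN000001", "S1"),
    ]
    print(nest_metabolites(rows))
```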
- mwtab.mwextract.extract_metadata(mwtabfile, keys)
Extract metadata from mwTab formatted files in the form of MWTabFile.
Returns: Extracted metadata dictionary.
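The following sketch illustrates metadata extraction under the assumption that every section is searched for the requested keys and that the values found are collected into a set per key. The function names and the plain nested dict standing in for an MWTabFile are hypothetical. The per-key value lists in the write_json metadata example for this module suggest the shape of the merged, multi-file result.

```python
def extract_metadata_sketch(mwtab_like, keys):
    """Hypothetical: collect values for `keys` from any section of a nested dict."""
    extracted = {}
    for section in mwtab_like.values():
        if not isinstance(section, dict):
            continue  # skip non-dict sections, e.g. list-valued data blocks
        for key in keys:
            if key in section:
                extracted.setdefault(key, set()).add(section[key])
    return extracted


def merge_metadata(per_file_results):
    """Union the per-file sets so each key maps to every distinct value seen."""
    merged = {}
    for result in per_file_results:
        for key, values in result.items():
            merged.setdefault(key, set()).update(values)
    return merged


if __name__ == "__main__":
    files = [
        {"SUBJECT": {"SUBJECT_TYPE": "Plant"}},
        {"SUBJECT": {"SUBJECT_TYPE": "Human"}},
    ]
    print(merge_metadata(extract_metadata_sketch(f, ["SUBJECT_TYPE"]) for f in files))
```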
- mwtab.mwextract.write_metadata_csv(to_path, extracted_values, no_header=False)
Write an extracted metadata dict into a CSV file.
Example:
"metadata","value1","value2"
"SUBJECT_TYPE","Human","Plant"
Returns: None
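The example above is fully quoted CSV with one row per metadata key. Below is a minimal sketch using the standard csv module. It assumes the header is sized to the longest value list and that values are written in sorted order, neither of which is stated above. The function name is hypothetical, and it writes to any text file object, so io.StringIO stands in for to_path.

```python
import csv
import io


def write_metadata_csv_sketch(fileobj, extracted_values, no_header=False):
    """Hypothetical: write {key: set_of_values} as "metadata","value1",... rows."""
    rows = [[key, *sorted(values)] for key, values in extracted_values.items()]
    # Size the header to the longest value list (0 columns if there is no data).
    width = max((len(row) - 1 for row in rows), default=0)
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    if not no_header:
        writer.writerow(["metadata", *(f"value{i}" for i in range(1, width + 1))])
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_metadata_csv_sketch(buffer, {"SUBJECT_TYPE": {"Human", "Plant"}})
    print(buffer.getvalue(), end="")
```

When writing to a real file rather than a StringIO, open it with newline="" as the csv module documentation recommends, so the writer's line terminators are not translated a second time.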
- mwtab.mwextract.write_metabolites_csv(to_path, extracted_values, no_header=False)
Write an extracted metabolites data dict into a CSV file.
Example:
"metabolite_name","num-studies","num_analyses","num_samples"
"1,2,4-benzenetriol","1","1","24"
"1-monostearin","1","1","24"
…
Returns: None
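The count columns in this example can be derived from the nested metabolites dictionary. The sketch below assumes the metabolite, study, analysis, sample-ID layout shown in the write_json example for this module. It also assumes that num_samples is the number of sample IDs summed over all analyses. The column names are copied verbatim from the example above, while the function names and the sample data are hypothetical.

```python
import csv
import io


def metabolite_rows(extracted_values):
    """Hypothetical: one [name, num-studies, num_analyses, num_samples] row per metabolite."""
    for name, studies in sorted(extracted_values.items()):
        num_studies = len(studies)
        num_analyses = sum(len(analyses) for analyses in studies.values())
        num_samples = sum(
            len(samples) for analyses in studies.values() for samples in analyses.values()
        )
        yield [name, str(num_studies), str(num_analyses), str(num_samples)]


def write_metabolites_csv_sketch(fileobj, extracted_values, no_header=False):
    """Hypothetical: write the rows above as fully quoted CSV."""
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    if not no_header:
        writer.writerow(["metabolite_name", "num-studies", "num_analyses", "num_samples"])
    writer.writerows(metabolite_rows(extracted_values))


if __name__ == "__main__":
    data = {"1-monostearin": {"ST000001": {"AN000001": {"S1", "S2"}}}}
    buffer = io.StringIO()
    write_metabolites_csv_sketch(buffer, data)
    print(buffer.getvalue(), end="")
```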
- class mwtab.mwextract.SetEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Encodes Python sets (set) as JSON-serializable lists (list).
  - default(obj)
    Encode a Python object. If obj is a set, convert it to a JSON-serializable list; otherwise, call the base implementation.
    Parameters: obj (object) – Python object to be JSON encoded.
    Returns: JSON-serializable object.
    Return type: dict, list, tuple, str, int, float, bool, or None
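The set-to-list behaviour described for SetEncoder.default follows the standard json.JSONEncoder extension pattern: override default() and delegate unknown types to the base class. Below is a standalone sketch of that pattern. The class name is hypothetical. Sorting the list is an extra choice made here for reproducible output, not something the description above states.

```python
import json


class SetEncoderSketch(json.JSONEncoder):
    """Encode Python sets as JSON arrays; defer everything else to the base class."""

    def default(self, obj):
        if isinstance(obj, set):
            # Sorted for deterministic output; the elements must be mutually comparable.
            return sorted(obj)
        # The base implementation raises TypeError for unsupported types.
        return super().default(obj)


if __name__ == "__main__":
    metadata = {"SUBJECT_TYPE": {"Plant", "Human"}}
    print(json.dumps(metadata, cls=SetEncoderSketch))  # {"SUBJECT_TYPE": ["Human", "Plant"]}
```

The same encoder works with json.dump(data, fh, cls=SetEncoderSketch, indent=4) when writing to an open file, and it applies at any nesting depth because default() is called for every object the encoder cannot handle natively.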
- mwtab.mwextract.write_json(to_path, extracted_dict)
Write an extracted data or metadata dict into a JSON file.
Metabolites example:
{
    "1,2,4-benzenetriol": {
        "ST000001": {
            "AN000001": [
                "LabF_115816", …
            ]
        }
    }
}
Metadata example:
{
    "SUBJECT_TYPE": [
        "Plant", "Human"
    ]
}
Returns: None
mwtab.mwschema
This module provides schema definitions for the different sections of the mwTab Metabolomics Workbench format.
Each of the following module-level objects is the validation schema for the mwTab section named in its identifier.
- mwtab.mwschema.metabolomics_workbench_schema – schema for the METABOLOMICS WORKBENCH header section.
- mwtab.mwschema.project_schema – schema for the PROJECT section.
- mwtab.mwschema.study_schema – schema for the STUDY section.
- mwtab.mwschema.analysis_schema – schema for the ANALYSIS section.
- mwtab.mwschema.subject_schema – schema for the SUBJECT section.
- mwtab.mwschema.subject_sample_factors_schema – schema for the SUBJECT_SAMPLE_FACTORS section.
- mwtab.mwschema.collection_schema – schema for the COLLECTION section.
- mwtab.mwschema.treatment_schema – schema for the TREATMENT section.
- mwtab.mwschema.sampleprep_schema – schema for the SAMPLEPREP section.
- mwtab.mwschema.chromatography_schema – schema for the CHROMATOGRAPHY section.
- mwtab.mwschema.ms_schema – schema for the MS section.
- mwtab.mwschema.nmr_schema – schema for the NMR section.
- mwtab.mwschema.ms_metabolite_data_schema – schema for the MS_METABOLITE_DATA section.
- mwtab.mwschema.nmr_binned_data_schema – schema for the NMR_BINNED_DATA section.
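The following standard-library sketch shows what a section schema enforces by checking a section dictionary against required and optional string keys. It mirrors the Schema({...}) / Optional(...) structure visible in the validate_file signature above. It is a simplified stand-in, not the schema machinery mwtab actually uses. The key subset comes from the STUDY entry of that signature. The function name, the error messages, and the decision to report unexpected keys are assumptions of this sketch.

```python
# Subset of the STUDY section keys from the validate_file signature above.
STUDY_REQUIRED = {"STUDY_TITLE", "STUDY_SUMMARY", "INSTITUTE", "LAST_NAME",
                  "FIRST_NAME", "ADDRESS", "EMAIL", "PHONE"}
STUDY_OPTIONAL = {"STUDY_TYPE", "DEPARTMENT", "LABORATORY", "SUBMIT_DATE"}


def section_errors(section, required, optional):
    """Hypothetical check: return a sorted list of problems (empty means valid)."""
    present = set(section)
    errors = [f"missing key: {key}" for key in required - present]
    errors += [f"unexpected key: {key}" for key in present - required - optional]
    # Every value in these sections is declared as <class 'str'>.
    errors += [f"non-str value for {key}" for key, value in section.items()
               if not isinstance(value, str)]
    return sorted(errors)


if __name__ == "__main__":
    print(section_errors({"STUDY_TITLE": "Example"}, STUDY_REQUIRED, STUDY_OPTIONAL))
```

Collecting every problem instead of stopping at the first one makes the output more useful for curating a file, at the cost of differing from validators that raise on the first error.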
License
The Clear BSD License
Copyright (c) 2020, Christian D. Powell, Andrey Smelter, Hunter N.B. Moseley
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted (subject to the limitations in the disclaimer below) provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
NO EXPRESS OR IMPLIED LICENSES TO ANY PARTY’S PATENT RIGHTS ARE GRANTED BY THIS LICENSE. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.