The mwtab Tutorial

The mwtab package provides classes and other facilities for parsing, accessing, and manipulating data stored in the mwTab format and in its JSON representation.

The mwtab package also provides a simple command-line interface for converting between mwTab and its JSON representation and for validating the consistency of the files.

Brief mwTab Format Overview

Note

For the full official specification, see the mwTab file specification: http://www.metabolomicsworkbench.org/data/tutorials.php

mwTab formatted files consist of multiple blocks. Each new block starts with #.

  • Some of the blocks contain only “key-value”-like pairs.
#METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001
VERSION              1
CREATED_ON           2016-09-17
#PROJECT
PR:PROJECT_TITLE                     FatB Gene Project
PR:PROJECT_TYPE                      Genotype treatment
PR:PROJECT_SUMMARY                   Experiment to test the consequence of a mutation at the FatB gene (At1g08510)
PR:PROJECT_SUMMARY                   the wound-response of Arabidopsis

Note

*_SUMMARY “key-value”-like pairs typically span multiple lines.
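
As a rough illustration of how a multi-line *_SUMMARY field could be reassembled, here is a minimal sketch in plain Python. It does not use the mwtab parser. The helper name merge_key_value_lines, the stripping of the "PR:" style prefix, and the rule of joining repeated keys with a single space are all assumptions made for this example.

```python
from collections import OrderedDict


def merge_key_value_lines(lines):
    """Collect 'KEY   value' lines; values of repeated keys are joined with a space."""
    pairs = OrderedDict()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and block headers
        parts = line.split(None, 1)          # split on the first run of whitespace
        key = parts[0].split(":", 1)[-1]     # drop a "PR:"-style prefix if present
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in pairs:
            pairs[key] = pairs[key] + " " + value  # continuation of an earlier line
        else:
            pairs[key] = value
    return pairs


project_lines = [
    "#PROJECT",
    "PR:PROJECT_TITLE                     FatB Gene Project",
    "PR:PROJECT_TYPE                      Genotype treatment",
    "PR:PROJECT_SUMMARY                   Experiment to test the consequence of a mutation at the FatB gene (At1g08510)",
    "PR:PROJECT_SUMMARY                   the wound-response of Arabidopsis",
]
project = merge_key_value_lines(project_lines)
print(project["PROJECT_SUMMARY"])
```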

  • #SUBJECT_SAMPLE_FACTORS block is specially formatted, i.e. it contains a header specification followed by tab-separated values.
#SUBJECT_SAMPLE_FACTORS:             SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
SUBJECT_SAMPLE_FACTORS               -       LabF_115873     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115878     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115883     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115888     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115893     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
SUBJECT_SAMPLE_FACTORS               -       LabF_115898     Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded
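
The layout described by the header (tab-separated fields, with factors written as NAME:VALUE pairs separated by |) can be sketched in plain Python as follows. This is only an illustration, not the mwtab parser: the function name parse_subject_sample_factors and the keys of the returned dict are hypothetical, and the example line is rebuilt with explicit tab characters.

```python
from collections import OrderedDict


def parse_subject_sample_factors(line):
    """Split one SUBJECT_SAMPLE_FACTORS line into its tab-separated fields."""
    fields = line.rstrip("\n").split("\t")
    # fields[0] is the literal "SUBJECT_SAMPLE_FACTORS" line prefix
    subject, sample, factors = fields[1:4]
    additional = fields[4] if len(fields) > 4 else ""
    factor_pairs = OrderedDict()
    for pair in factors.split("|"):
        name, _, value = pair.partition(":")  # split on the first colon only
        factor_pairs[name.strip()] = value.strip()
    return {
        "subject": subject,
        "sample": sample,
        "factors": factor_pairs,
        "additional": additional,
    }


line = "\t".join([
    "SUBJECT_SAMPLE_FACTORS",
    "-",
    "LabF_115873",
    "Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded",
])
entry = parse_subject_sample_factors(line)
print(entry["sample"], dict(entry["factors"]))
```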
  • #MS_METABOLITE_DATA (results) block contains Samples identifiers, Factors identifiers as well as tab-separated data between *_START and *_END.
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS     Peak height
MS_METABOLITE_DATA_START
Samples      LabF_115904     LabF_115909     LabF_115914     LabF_115919     LabF_115924     LabF_115929     LabF_115842     LabF_115847     LabF_115852     LabF_115857     LabF_115862     LabF_115867     LabF_115873     LabF_115878     LabF_115883     LabF_115888     LabF_115893     LabF_115898     LabF_115811     LabF_115816     LabF_115821     LabF_115826     LabF_115831     LabF_115836
Factors      Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded     Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Wounded   Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Control - Non-Wounded        Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded      Arabidopsis Genotype:Wassilewskija (Ws) | Plant Wounding Treatment:Wounded
1_2_4-benzenetriol   1874.0000       3566.0000       1945.0000       1456.0000       2004.0000       1995.0000       4040.0000       2432.0000       2189.0000       1931.0000       1307.0000       2880.0000       2218.0000       1754.0000       1369.0000       1201.0000       3324.0000       1355.0000       2257.0000       1718.0000       1740.0000       3472.0000       2054.0000       1367.0000
1-monostearin        987.0000        450.0000        1910.0000       549.0000        1032.0000       902.0000        393.0000        705.0000        100.0000        481.0000        265.0000        120.0000        1185.0000       867.0000        676.0000        569.0000        579.0000        387.0000        1035.0000       789.0000        875.0000        224.0000        641.0000        693.0000
...
MS_METABOLITE_DATA_END
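
To make the *_START / *_END layout more concrete, the following sketch reads the rows between the two markers in plain Python. It is not the mwtab parser. The function name parse_data_block is hypothetical, the input is truncated to the first three samples of the example above, and the "Samples" / "Factors" / "DATA" keys simply mirror the structure shown later in this tutorial.

```python
def parse_data_block(lines, prefix="MS_METABOLITE_DATA"):
    """Return samples, factors and per-metabolite rows found between *_START and *_END."""
    start = lines.index(prefix + "_START")
    end = lines.index(prefix + "_END")
    rows = [line.split("\t") for line in lines[start + 1:end]]
    samples = rows[0][1:]   # first row: "Samples" followed by sample identifiers
    factors = rows[1][1:]   # second row: "Factors" followed by one entry per sample
    data = []
    for row in rows[2:]:
        record = {"metabolite_name": row[0]}
        record.update(zip(samples, row[1:]))  # values are kept as strings
        data.append(record)
    return {"Samples": samples, "Factors": factors, "DATA": data}


factor = "Arabidopsis Genotype:fatb-ko KD; At1g08510 | Plant Wounding Treatment:Control - Non-Wounded"
block_lines = [
    "#MS_METABOLITE_DATA",
    "MS_METABOLITE_DATA:UNITS\tPeak height",
    "MS_METABOLITE_DATA_START",
    "\t".join(["Samples", "LabF_115904", "LabF_115909", "LabF_115914"]),
    "\t".join(["Factors", factor, factor, factor]),
    "\t".join(["1_2_4-benzenetriol", "1874.0000", "3566.0000", "1945.0000"]),
    "\t".join(["1-monostearin", "987.0000", "450.0000", "1910.0000"]),
    "MS_METABOLITE_DATA_END",
]
parsed = parse_data_block(block_lines)
print(parsed["Samples"])
print(parsed["DATA"][1]["LabF_115909"])
```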
  • #METABOLITES metadata block contains a header specifying fields and tab-separated data between *_START and *_END.
#METABOLITES
METABOLITES_START
metabolite_name      moverz_quant    ri      ri_type pubchem_id      inchi_key       kegg_id other_id        other_id_type
1,2,4-benzenetriol   239     522741  Fiehn   10787           C02814  205673  BinBase
1-monostearin        399     959625  Fiehn   107036          D01947  202835  BinBase
2-hydroxyvaleric acid        131     310750  Fiehn   98009                   218773  BinBase
3-phosphoglycerate   299     611619  Fiehn   724             C00597  217821  BinBase
...
METABOLITES_END
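
Because the #METABOLITES rows are tab-separated under a header row, the standard csv module can read them once the lines between METABOLITES_START and METABOLITES_END are isolated. The sketch below is an illustration only, not how mwtab reads the block. The position of the empty fields (inchi_key, and kegg_id in the second row) is inferred from the column gaps in the example above and should be treated as an assumption.

```python
import csv
import io

header = ["metabolite_name", "moverz_quant", "ri", "ri_type", "pubchem_id",
          "inchi_key", "kegg_id", "other_id", "other_id_type"]
metabolites_text = "\n".join([
    "\t".join(header),
    # empty strings mark fields that are blank in the example (assumed positions)
    "\t".join(["1,2,4-benzenetriol", "239", "522741", "Fiehn", "10787", "", "C02814", "205673", "BinBase"]),
    "\t".join(["2-hydroxyvaleric acid", "131", "310750", "Fiehn", "98009", "", "", "218773", "BinBase"]),
])

# delimiter="\t" keeps the commas inside "1,2,4-benzenetriol" intact
reader = csv.DictReader(io.StringIO(metabolites_text), delimiter="\t")
metabolites = list(reader)

for row in metabolites:
    print(row["metabolite_name"], row["kegg_id"] or "(no KEGG id)")
```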
  • #NMR_BINNED_DATA (results) block contains a header specifying fields and tab-separated data between *_START and *_END.
#NMR_BINNED_DATA
NMR_BINNED_DATA_START
Bin range(ppm)       CDC029  CDC030  CDC032  CPL101  CPL102  CPL103  CPL201  CPL202  CPL203  CDS039  CDS052  CDS054
0.50...0.56  0.00058149      1.6592  0.039301        0       0       0       0.034018        0.0028746       0.0021478       0.013387        0       0
0.56...0.58  0       0.74267 0       0.007206        0       0       0       0       0       0       0       0.0069721
0.58...0.60  0.051165        0.8258  0.089149        0.060972        0.026307        0.045697        0.069541        0       0       0.14516 0.057489        0.042255
...
NMR_BINNED_DATA_END
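
The binned rows can be turned into numbers with a few lines of plain Python. This sketch assumes that a label such as 0.50...0.56 gives the lower and upper bound of a bin in ppm, which is an interpretation of the example rather than something stated in the specification text here. The helper name parse_bin_range and the output keys are hypothetical, and only the first three sample columns are used.

```python
def parse_bin_range(text):
    """Split a label like '0.50...0.56' into (low, high) floats."""
    low, high = text.split("...")
    return float(low), float(high)


header = ["Bin range(ppm)", "CDC029", "CDC030", "CDC032"]
rows = [
    ["0.50...0.56", "0.00058149", "1.6592", "0.039301"],
    ["0.56...0.58", "0", "0.74267", "0"],
]

binned = []
for row in rows:
    low, high = parse_bin_range(row[0])
    # map each sample column to its numeric value for this bin
    intensities = dict(zip(header[1:], (float(value) for value in row[1:])))
    binned.append({"low": low, "high": high, "intensities": intensities})

print(binned[0]["low"], binned[0]["high"], binned[0]["intensities"]["CDC030"])
```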
  • Order of metadata and data blocks (MS)
#METABOLOMICS WORKBENCH
VERSION              1
CREATED_ON           2016-09-17
...
#PROJECT
...
#STUDY
...
#SUBJECT
...
#SUBJECT_SAMPLE_FACTORS:             SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
...
#COLLECTION
...
#TREATMENT
...
#SAMPLEPREP
...
#CHROMATOGRAPHY
...
#ANALYSIS
...
#MS
...
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS     peak area
MS_METABOLITE_DATA_START
...
MS_METABOLITE_DATA_END
#METABOLITES
METABOLITES_START
...
METABOLITES_END
#END
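
The block order above can be checked mechanically. The following sketch, in plain Python and independent of the mwtab validator, collects the "#" header lines of a file and tests whether they appear in the listed order. MS_BLOCK_ORDER, block_names and in_expected_order are names invented for this example, and the check deliberately tolerates missing blocks because this overview does not say which blocks are mandatory.

```python
MS_BLOCK_ORDER = [
    "METABOLOMICS WORKBENCH", "PROJECT", "STUDY", "SUBJECT",
    "SUBJECT_SAMPLE_FACTORS", "COLLECTION", "TREATMENT", "SAMPLEPREP",
    "CHROMATOGRAPHY", "ANALYSIS", "MS", "MS_METABOLITE_DATA",
    "METABOLITES", "END",
]


def block_names(lines):
    """Return the block name of every line that starts with '#'."""
    names = []
    for line in lines:
        if not line.startswith("#"):
            continue
        header = line[1:]
        matches = [name for name in MS_BLOCK_ORDER if header.startswith(name)]
        if matches:
            # the longest match tells "SUBJECT" apart from "SUBJECT_SAMPLE_FACTORS"
            names.append(max(matches, key=len))
    return names


def in_expected_order(lines):
    """True if the blocks that are present follow MS_BLOCK_ORDER."""
    indices = [MS_BLOCK_ORDER.index(name) for name in block_names(lines)]
    return indices == sorted(indices)


ms_lines = [
    "#METABOLOMICS WORKBENCH STUDY_ID:ST000001 ANALYSIS_ID:AN000001",
    "VERSION              1",
    "#PROJECT",
    "#STUDY",
    "#SUBJECT",
    "#SUBJECT_SAMPLE_FACTORS:             SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data",
    "#COLLECTION",
    "#MS",
    "#MS_METABOLITE_DATA",
    "#METABOLITES",
    "#END",
]
print(block_names(ms_lines))
print(in_expected_order(ms_lines))
```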

Using mwtab as a Library

Importing mwtab Package

If the mwtab package is installed on the system, it can be imported:

In [1]:
import mwtab

Constructing MWTabFile Generator

The fileio module provides the read_files() generator function that yields MWTabFile instances. Constructing an MWTabFile generator is easy: specify the path to a local mwTab file, a directory of files, or an archive of files, or pass one or more ANALYSIS_IDs:

In [2]:
import mwtab

mwfile_gen = mwtab.read_files("ST000001_AN000001.txt")  # single mwTab file
mwfiles_gen = mwtab.read_files("ST000001_AN000001.txt", "ST000002_AN000002.txt")  # several mwTab files
mwdir_gen = mwtab.read_files("mwfiles_dir")  # directory of mwTab files
mwzip_gen = mwtab.read_files("mwfiles.zip")  # archive of mwTab files
mwurl_gen = mwtab.read_files("1", "2")       # ANALYSIS_IDs of mwTab files

Processing MWTabFile Generator

The MWTabFile generator can be processed in several ways:

  • Feed it to a for-loop and process one file at a time:
In [3]:
for mwfile in mwtab.read_files("1", "2"):
    print("STUDY_ID:", mwfile.study_id)       # print STUDY_ID
    print("ANALYSIS_ID", mwfile.analysis_id)  # print ANALYSIS_ID
    print("SOURCE", mwfile.source)            # print source
    for block_name in mwfile:                 # print names of blocks
        print("\t", block_name)
STUDY_ID: ST000001
ANALYSIS_ID AN000001
SOURCE http://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000001/mwtab/txt
         METABOLOMICS WORKBENCH
         PROJECT
         STUDY
         SUBJECT
         SUBJECT_SAMPLE_FACTORS
         COLLECTION
         TREATMENT
         SAMPLEPREP
         CHROMATOGRAPHY
         ANALYSIS
         MS
         MS_METABOLITE_DATA
         METABOLITES
STUDY_ID: ST000002
ANALYSIS_ID AN000002
SOURCE http://www.metabolomicsworkbench.org/rest/study/analysis_id/AN000002/mwtab/txt
         METABOLOMICS WORKBENCH
         PROJECT
         STUDY
         SUBJECT
         SUBJECT_SAMPLE_FACTORS
         COLLECTION
         TREATMENT
         SAMPLEPREP
         CHROMATOGRAPHY
         ANALYSIS
         MS
         MS_METABOLITE_DATA
         METABOLITES

Note

Once the generator is consumed, it becomes empty and needs to be created again.
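
This is ordinary Python generator behaviour and can be seen without any mwTab files. In the sketch below, fake_read_files is a hypothetical stand-in that only mimics the generator interface; it is not part of mwtab.

```python
def fake_read_files(*sources):
    """Stand-in generator: yields one placeholder object per source."""
    for source in sources:
        yield {"source": source}


gen = fake_read_files("1", "2")
first_pass = [item["source"] for item in gen]    # consumes the generator
second_pass = [item["source"] for item in gen]   # nothing left to yield

print(first_pass)    # ['1', '2']
print(second_pass)   # []

# To iterate again, create a new generator
gen = fake_read_files("1", "2")
third_pass = [item["source"] for item in gen]
```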

  • Since the MWTabFile generator behaves like an iterator, we can call the next() built-in function:
In [4]:
mwfiles_generator = mwtab.read_files("1", "2")

mwfile1 = next(mwfiles_generator)
mwfile2 = next(mwfiles_generator)

Note

Once the generator is consumed, calling next() on it again raises StopIteration.
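
One way to avoid the exception is to pass a default value as the second argument of the built-in next(). The sketch below uses a hypothetical stand-in generator, fake_read_files, rather than mwtab.read_files, so it runs without any files.

```python
def fake_read_files(*sources):
    """Stand-in generator: yields one placeholder object per source."""
    for source in sources:
        yield {"source": source}


gen = fake_read_files("1", "2")
first = next(gen)
second = next(gen)

# With a default, next() returns it instead of raising StopIteration
third = next(gen, None)

# Without a default, the exception has to be handled explicitly
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```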

  • Convert the MWTabFile generator into a list of MWTabFile objects:
In [5]:
mwfiles_generator = mwtab.read_files("1", "2")
mwfiles_list = list(mwfiles_generator)

Warning

Do not convert the MWTabFile generator into a list if the generator can yield a large number of files, e.g. several thousand, otherwise it can consume all available memory.
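
When only a few files or an aggregate value are needed, the generator can be consumed lazily instead. The sketch below shows two common patterns with itertools.islice and a running count. fake_read_files and the identifiers it produces are placeholders invented for this example; with real data the same patterns would apply to the generator returned by mwtab.read_files().

```python
from itertools import islice


def fake_read_files(count):
    """Stand-in generator that yields `count` placeholder objects one at a time."""
    for number in range(1, count + 1):
        yield {"analysis_id": "AN%06d" % number}


# Take only the first three items; the rest are never created
first_three = [item["analysis_id"] for item in islice(fake_read_files(5000), 3)]

# Aggregate over all items while holding only one in memory at a time
total = sum(1 for _ in fake_read_files(5000))

print(first_three)   # ['AN000001', 'AN000002', 'AN000003']
print(total)         # 5000
```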

Accessing Data From a Single MWTabFile

Since an MWTabFile is a Python collections.OrderedDict, its data can be accessed and manipulated as with any regular Python dict object, using bracket accessors.

  • Accessing top-level “keys” in MWTabFile:
In [7]:
mwfile = next(mwtab.read_files("ST000002_AN000002.txt"))

# list MWTabFile-level keys, i.e. block names
list(mwfile.keys())
Out[7]:
['METABOLOMICS WORKBENCH',
 'PROJECT',
 'STUDY',
 'SUBJECT',
 'SUBJECT_SAMPLE_FACTORS',
 'COLLECTION',
 'TREATMENT',
 'SAMPLEPREP',
 'CHROMATOGRAPHY',
 'ANALYSIS',
 'MS',
 'MS_METABOLITE_DATA',
 'METABOLITES']
In [8]:
# access "PROJECT" block
mwfile["PROJECT"]
Out[8]:
OrderedDict([('PROJECT_TITLE',
              'Intestinal Samples II pre/post transplantation'),
             ('PROJECT_TYPE', 'Human intestinal samples'),
             ('PROJECT_SUMMARY',
              'Intestinal Samples II pre/post transplantation'),
             ('INSTITUTE', 'University of California, Davis'),
             ('DEPARTMENT', 'Davis Genome Center'),
             ('LABORATORY', 'Fiehn'),
             ('LAST_NAME', 'Fiehn'),
             ('FIRST_NAME', 'Oliver'),
             ('ADDRESS',
              '451 E. Health Sci. Drive, Davis, California 95616, USA'),
             ('EMAIL', 'ofiehn@ucdavis.edu'),
             ('PHONE', '-')])
  • Accessing individual “key-value” pairs within blocks:
In [9]:
# access "INSTITUTE" field within "PROJECT" block
mwfile["PROJECT"]["INSTITUTE"]
Out[9]:
'University of California, Davis'
  • Accessing data in #SUBJECT_SAMPLE_FACTORS block:
In [10]:
# access "SUBJECT_SAMPLE_FACTORS" block
mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"]
Out[10]:
[OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684508'),
              ('factors', 'Transplantation:After transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684512'),
              ('factors', 'Transplantation:After transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684516'),
              ('factors', 'Transplantation:After transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684520'),
              ('factors', 'Transplantation:After transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684524'),
              ('factors', 'Transplantation:After transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684528'),
              ('factors', 'Transplantation:After transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684483'),
              ('factors', 'Transplantation:Before transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684487'),
              ('factors', 'Transplantation:Before transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684491'),
              ('factors', 'Transplantation:Before transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684495'),
              ('factors', 'Transplantation:Before transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684499'),
              ('factors', 'Transplantation:Before transplantation'),
              ('additional_sample_data', '')]),
 OrderedDict([('subject_type', '-'),
              ('local_sample_id', 'LabF_684503'),
              ('factors', 'Transplantation:Before transplantation'),
              ('additional_sample_data', '')])]
In [11]:
# access individual factors (by index)
mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"][0]
Out[11]:
OrderedDict([('subject_type', '-'),
             ('local_sample_id', 'LabF_684508'),
             ('factors', 'Transplantation:After transplantation'),
             ('additional_sample_data', '')])
In [12]:
# access individual fields within factors
mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"][0]["local_sample_id"]
Out[12]:
'LabF_684508'
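
Since the block is a list of dict-like entries, regular Python tools apply to it. As a small illustration, the sketch below groups sample identifiers by their factors string. To stay runnable without the file, it rebuilds four of the entries shown above by hand; with a parsed file, the list would come from mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"] instead.

```python
from collections import OrderedDict, defaultdict

# Hand-copied subset of the entries shown in Out[10]
subject_sample_factors = [
    OrderedDict([("subject_type", "-"), ("local_sample_id", "LabF_684508"),
                 ("factors", "Transplantation:After transplantation"),
                 ("additional_sample_data", "")]),
    OrderedDict([("subject_type", "-"), ("local_sample_id", "LabF_684512"),
                 ("factors", "Transplantation:After transplantation"),
                 ("additional_sample_data", "")]),
    OrderedDict([("subject_type", "-"), ("local_sample_id", "LabF_684483"),
                 ("factors", "Transplantation:Before transplantation"),
                 ("additional_sample_data", "")]),
    OrderedDict([("subject_type", "-"), ("local_sample_id", "LabF_684487"),
                 ("factors", "Transplantation:Before transplantation"),
                 ("additional_sample_data", "")]),
]

# Group sample identifiers by their factors string
samples_by_factor = defaultdict(list)
for entry in subject_sample_factors:
    samples_by_factor[entry["factors"]].append(entry["local_sample_id"])

for factor, samples in samples_by_factor.items():
    print(factor, samples)
```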
  • Accessing data in #MS_METABOLITE_DATA block:
In [13]:
# access entire block
mwfile["MS_METABOLITE_DATA"]
Out[13]:
OrderedDict([('MS_METABOLITE_DATA:UNITS', 'Peak height'),
             ('MS_METABOLITE_DATA_START',
              OrderedDict([('Samples',
                            ['LabF_684508',
                             'LabF_684512',
                             'LabF_684516',
                             'LabF_684520',
                             'LabF_684524',
                             'LabF_684528',
                             'LabF_684483',
                             'LabF_684487',
                             'LabF_684491',
                             'LabF_684495',
                             'LabF_684499',
                             'LabF_684503']),
                           ('Factors',
                            ['Transplantation:After transplantation',
                             'Transplantation:After transplantation',
                             'Transplantation:After transplantation',
                             'Transplantation:After transplantation',
                             'Transplantation:After transplantation',
                             'Transplantation:After transplantation',
                             'Transplantation:Before transplantation',
                             'Transplantation:Before transplantation',
                             'Transplantation:Before transplantation',
                             'Transplantation:Before transplantation',
                             'Transplantation:Before transplantation',
                             'Transplantation:Before transplantation']),
                           ('DATA',
                            [OrderedDict([('metabolite_name', '1-monoolein'),
                                          ('LabF_684508', '6047.0000'),
                                          ('LabF_684512', '2902.0000'),
                                          ('LabF_684516', '1452.0000'),
                                          ('LabF_684520', '3428.0000'),
                                          ('LabF_684524', '2985.0000'),
                                          ('LabF_684528', '16334.0000'),
                                          ('LabF_684483', '244142.0000'),
                                          ('LabF_684487', '6968.0000'),
                                          ('LabF_684491', '1928.0000'),
                                          ('LabF_684495', '19228.0000'),
                                          ('LabF_684499', '3029.0000'),
                                          ('LabF_684503', '23277.0000')]),
                             OrderedDict([('metabolite_name', '1-monostearin'),
                                          ('LabF_684508', '9771.0000'),
                                          ('LabF_684512', '6521.0000'),
                                          ('LabF_684516', '1302.0000'),
                                          ('LabF_684520', '2781.0000'),
                                          ('LabF_684524', '5789.0000'),
                                          ('LabF_684528', '4338.0000'),
                                          ('LabF_684483', '16848.0000'),
                                          ('LabF_684487', '10206.0000'),
                                          ('LabF_684491', '9398.0000'),
                                          ('LabF_684495', '1013.0000'),
                                          ('LabF_684499', '4190.0000'),
                                          ('LabF_684503', '11114.0000')]),
                             OrderedDict([('metabolite_name',
                                           '2-hydroxybutanoic acid'),
                                          ('LabF_684508', '13238.0000'),
                                          ('LabF_684512', '29774.0000'),
                                          ('LabF_684516', '4134.0000'),
                                          ('LabF_684520', '4419.0000'),
                                          ('LabF_684524', '13334.0000'),
                                          ('LabF_684528', '2115.0000'),
                                          ('LabF_684483', '11587.0000'),
                                          ('LabF_684487', '65635.0000'),
                                          ('LabF_684491', '32433.0000'),
                                          ('LabF_684495', '1823.0000'),
                                          ('LabF_684499', '4429.0000'),
                                          ('LabF_684503', '30427.0000')]),
                             OrderedDict([('metabolite_name',
                                           '2-hydroxyglutaric acid'),
                                          ('LabF_684508', '7160.0000'),
                                          ('LabF_684512', '11501.0000'),
                                          ('LabF_684516', '3202.0000'),
                                          ('LabF_684520', '17238.0000'),
                                          ('LabF_684524', '20376.0000'),
                                          ('LabF_684528', '1109.0000'),
                                          ('LabF_684483', '8276.0000'),
                                          ('LabF_684487', '12402.0000'),
                                          ('LabF_684491', '20964.0000'),
                                          ('LabF_684495', '25913.0000'),
                                          ('LabF_684499', '2709.0000'),
                                          ('LabF_684503', '70972.0000')]),
                             OrderedDict([('metabolite_name',
                                           '2-ketoisocaproic acid'),
                                          ('LabF_684508', '812.0000'),
                                          ('LabF_684512', '2011.0000'),
                                          ('LabF_684516', '738.0000'),
                                          ('LabF_684520', '2550.0000'),
                                          ('LabF_684524', '871.0000'),
                                          ('LabF_684528', '628.0000'),
                                          ('LabF_684483', '2096.0000'),
                                          ('LabF_684487', '3472.0000'),
                                          ('LabF_684491', '10669.0000'),
                                          ('LabF_684495', '432.0000'),
                                          ('LabF_684499', '1055.0000'),
                                          ('LabF_684503', '1005.0000')]),
                             OrderedDict([('metabolite_name',
                                           '2-monopalmitin'),
                                          ('LabF_684508', '1511.0000'),
                                          ('LabF_684512', '622.0000'),
                                          ('LabF_684516', '883.0000'),
                                          ('LabF_684520', '796.0000'),
                                          ('LabF_684524', '623.0000'),
                                          ('LabF_684528', '5716.0000'),
                                          ('LabF_684483', '3405.0000'),
                                          ('LabF_684487', '3196.0000'),
                                          ('LabF_684491', '1457.0000'),
                                          ('LabF_684495', '1416.0000'),
                                          ('LabF_684499', '1275.0000'),
                                          ('LabF_684503',
                                           '14445.0000')])])]))])
In [14]:
# access units field
mwfile["MS_METABOLITE_DATA"]["MS_METABOLITE_DATA:UNITS"]
Out[14]:
'Peak height'
In [15]:
# access samples field
mwfile["MS_METABOLITE_DATA"]["MS_METABOLITE_DATA_START"]["Samples"]
Out[15]:
['LabF_684508',
 'LabF_684512',
 'LabF_684516',
 'LabF_684520',
 'LabF_684524',
 'LabF_684528',
 'LabF_684483',
 'LabF_684487',
 'LabF_684491',
 'LabF_684495',
 'LabF_684499',
 'LabF_684503']
In [16]:
# access factors field
mwfile["MS_METABOLITE_DATA"]["MS_METABOLITE_DATA_START"]["Factors"]
Out[16]:
['Transplantation:After transplantation',
 'Transplantation:After transplantation',
 'Transplantation:After transplantation',
 'Transplantation:After transplantation',
 'Transplantation:After transplantation',
 'Transplantation:After transplantation',
 'Transplantation:Before transplantation',
 'Transplantation:Before transplantation',
 'Transplantation:Before transplantation',
 'Transplantation:Before transplantation',
 'Transplantation:Before transplantation',
 'Transplantation:Before transplantation']
In [17]:
# access metabolite data
mwfile["MS_METABOLITE_DATA"]["MS_METABOLITE_DATA_START"]["DATA"]
Out[17]:
[OrderedDict([('metabolite_name', '1-monoolein'),
              ('LabF_684508', '6047.0000'),
              ('LabF_684512', '2902.0000'),
              ('LabF_684516', '1452.0000'),
              ('LabF_684520', '3428.0000'),
              ('LabF_684524', '2985.0000'),
              ('LabF_684528', '16334.0000'),
              ('LabF_684483', '244142.0000'),
              ('LabF_684487', '6968.0000'),
              ('LabF_684491', '1928.0000'),
              ('LabF_684495', '19228.0000'),
              ('LabF_684499', '3029.0000'),
              ('LabF_684503', '23277.0000')]),
 OrderedDict([('metabolite_name', '1-monostearin'),
              ('LabF_684508', '9771.0000'),
              ('LabF_684512', '6521.0000'),
              ('LabF_684516', '1302.0000'),
              ('LabF_684520', '2781.0000'),
              ('LabF_684524', '5789.0000'),
              ('LabF_684528', '4338.0000'),
              ('LabF_684483', '16848.0000'),
              ('LabF_684487', '10206.0000'),
              ('LabF_684491', '9398.0000'),
              ('LabF_684495', '1013.0000'),
              ('LabF_684499', '4190.0000'),
              ('LabF_684503', '11114.0000')]),
 OrderedDict([('metabolite_name', '2-hydroxybutanoic acid'),
              ('LabF_684508', '13238.0000'),
              ('LabF_684512', '29774.0000'),
              ('LabF_684516', '4134.0000'),
              ('LabF_684520', '4419.0000'),
              ('LabF_684524', '13334.0000'),
              ('LabF_684528', '2115.0000'),
              ('LabF_684483', '11587.0000'),
              ('LabF_684487', '65635.0000'),
              ('LabF_684491', '32433.0000'),
              ('LabF_684495', '1823.0000'),
              ('LabF_684499', '4429.0000'),
              ('LabF_684503', '30427.0000')]),
 OrderedDict([('metabolite_name', '2-hydroxyglutaric acid'),
              ('LabF_684508', '7160.0000'),
              ('LabF_684512', '11501.0000'),
              ('LabF_684516', '3202.0000'),
              ('LabF_684520', '17238.0000'),
              ('LabF_684524', '20376.0000'),
              ('LabF_684528', '1109.0000'),
              ('LabF_684483', '8276.0000'),
              ('LabF_684487', '12402.0000'),
              ('LabF_684491', '20964.0000'),
              ('LabF_684495', '25913.0000'),
              ('LabF_684499', '2709.0000'),
              ('LabF_684503', '70972.0000')]),
 OrderedDict([('metabolite_name', '2-ketoisocaproic acid'),
              ('LabF_684508', '812.0000'),
              ('LabF_684512', '2011.0000'),
              ('LabF_684516', '738.0000'),
              ('LabF_684520', '2550.0000'),
              ('LabF_684524', '871.0000'),
              ('LabF_684528', '628.0000'),
              ('LabF_684483', '2096.0000'),
              ('LabF_684487', '3472.0000'),
              ('LabF_684491', '10669.0000'),
              ('LabF_684495', '432.0000'),
              ('LabF_684499', '1055.0000'),
              ('LabF_684503', '1005.0000')]),
 OrderedDict([('metabolite_name', '2-monopalmitin'),
              ('LabF_684508', '1511.0000'),
              ('LabF_684512', '622.0000'),
              ('LabF_684516', '883.0000'),
              ('LabF_684520', '796.0000'),
              ('LabF_684524', '623.0000'),
              ('LabF_684528', '5716.0000'),
              ('LabF_684483', '3405.0000'),
              ('LabF_684487', '3196.0000'),
              ('LabF_684491', '1457.0000'),
              ('LabF_684495', '1416.0000'),
              ('LabF_684499', '1275.0000'),
              ('LabF_684503', '14445.0000')])]
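
Note that the measurements in the output above are strings, so they need to be converted before doing arithmetic. The sketch below computes a mean per metabolite in plain Python. It uses a hand-copied subset of the records (two metabolites, first three samples) so that it runs without the file; with a parsed file, the list would come from mwfile["MS_METABOLITE_DATA"]["MS_METABOLITE_DATA_START"]["DATA"].

```python
from collections import OrderedDict

# Hand-copied subset of Out[17]: two metabolites, first three samples only
data = [
    OrderedDict([("metabolite_name", "1-monoolein"),
                 ("LabF_684508", "6047.0000"),
                 ("LabF_684512", "2902.0000"),
                 ("LabF_684516", "1452.0000")]),
    OrderedDict([("metabolite_name", "1-monostearin"),
                 ("LabF_684508", "9771.0000"),
                 ("LabF_684512", "6521.0000"),
                 ("LabF_684516", "1302.0000")]),
]

means = {}
for record in data:
    # every key except "metabolite_name" is a sample identifier
    values = [float(value) for key, value in record.items() if key != "metabolite_name"]
    means[record["metabolite_name"]] = sum(values) / len(values)

for name, mean in means.items():
    print(name, round(mean, 2))
```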

Manipulating Data From a Single MWTabFile

To change values within an MWTabFile, descend to the appropriate level using square bracket accessors and assign a new value.

  • Change regular “key-value” pairs:
In [18]:
# access phone number information
mwfile["PROJECT"]["PHONE"]
Out[18]:
'-'
In [19]:
# change phone number information
mwfile["PROJECT"]["PHONE"] = "1-530-754-8258"
In [20]:
# check that it has been modified
mwfile["PROJECT"]["PHONE"]
Out[20]:
'1-530-754-8258'
  • Change #SUBJECT_SAMPLE_FACTORS values:
In [21]:
# access the first subject sample factor by index
mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"][0]
Out[21]:
OrderedDict([('subject_type', '-'),
             ('local_sample_id', 'LabF_684508'),
             ('factors', 'Transplantation:After transplantation'),
             ('additional_sample_data', '')])
In [22]:
# provide additional details to the first subject sample factor
mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"][0]["additional_sample_data"] = "Additional details"
In [23]:
# check that it has been modified
mwfile["SUBJECT_SAMPLE_FACTORS"]["SUBJECT_SAMPLE_FACTORS"][0]
Out[23]:
OrderedDict([('subject_type', '-'),
             ('local_sample_id', 'LabF_684508'),
             ('factors', 'Transplantation:After transplantation'),
             ('additional_sample_data', 'Additional details')])

Printing a MWTabFile and its Components

  • Print entire file in mwTab format.
In [24]:
mwfile.print_file(file_format="mwtab")
#METABOLOMICS WORKBENCH STUDY_ID:ST000002 ANALYSIS_ID:AN000002
VERSION                 1
CREATED_ON              2016-09-17
#PROJECT
PR:PROJECT_TITLE                        Intestinal Samples II pre/post transplantation
PR:PROJECT_TYPE                         Human intestinal samples
PR:PROJECT_SUMMARY                      Intestinal Samples II pre/post transplantation
PR:INSTITUTE                            University of California, Davis
PR:DEPARTMENT                           Davis Genome Center
PR:LABORATORY                           Fiehn
PR:LAST_NAME                            Fiehn
PR:FIRST_NAME                           Oliver
PR:ADDRESS                              451 E. Health Sci. Drive, Davis, California 95616, USA
PR:EMAIL                                ofiehn@ucdavis.edu
PR:PHONE                                1-530-754-8258
#STUDY
ST:STUDY_TITLE                          Intestinal Samples II pre/post transplantation
ST:STUDY_TYPE                           MS analysis
ST:STUDY_SUMMARY                        Intestinal Samples II pre/post transplantation
ST:INSTITUTE                            University of California, Davis
ST:DEPARTMENT                           Davis Genome Center
ST:LABORATORY                           Fiehn
ST:LAST_NAME                            Hartman
ST:FIRST_NAME                           Amber
ST:ADDRESS                              451 E. Health Sci. Drive, Davis, California 95616, USA
ST:EMAIL                                -
ST:PHONE                                -
ST:NUM_GROUPS                           2
ST:TOTAL_SUBJECTS                       12
#SUBJECT
SU:SUBJECT_TYPE                         Human
SU:SUBJECT_SPECIES                      Homo sapiens
SU:TAXONOMY_ID                          9606
SU:SPECIES_GROUP                        Human
#SUBJECT_SAMPLE_FACTORS:                SUBJECT(optional)[tab]SAMPLE[tab]FACTORS(NAME:VALUE pairs separated by |)[tab]Additional sample data
SUBJECT_SAMPLE_FACTORS                  -       LabF_684508     Transplantation:After transplantation   Additional details
SUBJECT_SAMPLE_FACTORS                  -       LabF_684512     Transplantation:After transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684516     Transplantation:After transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684520     Transplantation:After transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684524     Transplantation:After transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684528     Transplantation:After transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684483     Transplantation:Before transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684487     Transplantation:Before transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684491     Transplantation:Before transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684495     Transplantation:Before transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684499     Transplantation:Before transplantation
SUBJECT_SAMPLE_FACTORS                  -       LabF_684503     Transplantation:Before transplantation
#COLLECTION
CO:COLLECTION_SUMMARY                   -
CO:SAMPLE_TYPE                          Tissue
#TREATMENT
TR:TREATMENT_SUMMARY                    -
TR:TREATMENT_PROTOCOL_COMMENTS          Before transplantation | After transplanation
#SAMPLEPREP
SP:SAMPLEPREP_SUMMARY                   -
SP:EXTRACTION_METHOD                    Extraction: Proteomics 2004, 4, 78-83; Splitratio: splitless 25 purge
#CHROMATOGRAPHY
CH:CHROMATOGRAPHY_TYPE                  -
CH:INSTRUMENT_NAME                      Agilent 6890N Gas Chromatograph
CH:COLUMN_NAME                          -
#ANALYSIS
AN:ANALYSIS_TYPE                        MS
#MS
MS:INSTRUMENT_NAME                      Leco Pegasus III GC-TOF
MS:INSTRUMENT_TYPE                      GC-TOF
MS:MS_TYPE                              EI
MS:ION_MODE                             POSITIVE
#MS_METABOLITE_DATA
MS_METABOLITE_DATA:UNITS        Peak height
MS_METABOLITE_DATA_START
Samples LabF_684508     LabF_684512     LabF_684516     LabF_684520     LabF_684524     LabF_684528     LabF_684483     LabF_684487     LabF_684491     LabF_684495     LabF_684499     LabF_684503
Factors Transplantation:After transplantation   Transplantation:After transplantation   Transplantation:After transplantation   Transplantation:After transplantation   Transplantation:After transplantation   Transplantation:After transplantation   Transplantation:Before transplantation  Transplantation:Before transplantation  Transplantation:Before transplantation  Transplantation:Before transplantation  Transplantation:Before transplantation  Transplantation:Before transplantation
1-monoolein     6047.0000       2902.0000       1452.0000       3428.0000       2985.0000       16334.0000      244142.0000     6968.0000       1928.0000       19228.0000      3029.0000       23277.0000
1-monostearin   9771.0000       6521.0000       1302.0000       2781.0000       5789.0000       4338.0000       16848.0000      10206.0000      9398.0000       1013.0000       4190.0000       11114.0000
2-hydroxybutanoic acid  13238.0000      29774.0000      4134.0000       4419.0000       13334.0000      2115.0000       11587.0000      65635.0000      32433.0000      1823.0000       4429.0000       30427.0000
2-hydroxyglutaric acid  7160.0000       11501.0000      3202.0000       17238.0000      20376.0000      1109.0000       8276.0000       12402.0000      20964.0000      25913.0000      2709.0000       70972.0000
2-ketoisocaproic acid   812.0000        2011.0000       738.0000        2550.0000       871.0000        628.0000        2096.0000       3472.0000       10669.0000      432.0000        1055.0000       1005.0000
2-monopalmitin  1511.0000       622.0000        883.0000        796.0000        623.0000        5716.0000       3405.0000       3196.0000       1457.0000       1416.0000       1275.0000       14445.0000
MS_METABOLITE_DATA_END
#METABOLITES
METABOLITES_START
metabolite_name moverz_quant    ri      ri_type pubchem_id      inchi_key       kegg_id other_id        other_id_type
1-monoolein     129     952993  Fiehn   5283468                 213963  BinBase
1-monostearin   399     959625  Fiehn   107036          D01947  202835  BinBase
2-hydroxybutanoic acid  131     258175  Fiehn   11266           C05984  199800  BinBase
2-hydroxyglutaric acid  129     506359  Fiehn   43              C02630  214409  BinBase
2-ketoisocaproic acid   200     310629  Fiehn   70              C00233  213388  BinBase
2-monopalmitin  218     889972  Fiehn   123409                  233239  BinBase
METABOLITES_END
#END
  • Print entire file in JSON format.
In [25]:
mwfile.print_file(file_format="json")
{
    "METABOLOMICS WORKBENCH": {
        "HEADER": "#METABOLOMICS WORKBENCH STUDY_ID:ST000002 ANALYSIS_ID:AN000002",
        "STUDY_ID": "ST000002",
        "ANALYSIS_ID": "AN000002",
        "VERSION": "1",
        "CREATED_ON": "2016-09-17"
    },
    "PROJECT": {
        "PROJECT_TITLE": "Intestinal Samples II pre/post transplantation",
        "PROJECT_TYPE": "Human intestinal samples",
        "PROJECT_SUMMARY": "Intestinal Samples II pre/post transplantation",
        "INSTITUTE": "University of California, Davis",
        "DEPARTMENT": "Davis Genome Center",
        "LABORATORY": "Fiehn",
        "LAST_NAME": "Fiehn",
        "FIRST_NAME": "Oliver",
        "ADDRESS": "451 E. Health Sci. Drive, Davis, California 95616, USA",
        "EMAIL": "ofiehn@ucdavis.edu",
        "PHONE": "1-530-754-8258"
    },
    "STUDY": {
        "STUDY_TITLE": "Intestinal Samples II pre/post transplantation",
        "STUDY_TYPE": "MS analysis",
        "STUDY_SUMMARY": "Intestinal Samples II pre/post transplantation",
        "INSTITUTE": "University of California, Davis",
        "DEPARTMENT": "Davis Genome Center",
        "LABORATORY": "Fiehn",
        "LAST_NAME": "Hartman",
        "FIRST_NAME": "Amber",
        "ADDRESS": "451 E. Health Sci. Drive, Davis, California 95616, USA",
        "EMAIL": "-",
        "PHONE": "-",
        "NUM_GROUPS": "2",
        "TOTAL_SUBJECTS": "12"
    },
    "SUBJECT": {
        "SUBJECT_TYPE": "Human",
        "SUBJECT_SPECIES": "Homo sapiens",
        "TAXONOMY_ID": "9606",
        "SPECIES_GROUP": "Human"
    },
    "SUBJECT_SAMPLE_FACTORS": {
        "SUBJECT_SAMPLE_FACTORS": [
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684508",
                "factors": "Transplantation:After transplantation",
                "additional_sample_data": "Additional details"
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684512",
                "factors": "Transplantation:After transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684516",
                "factors": "Transplantation:After transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684520",
                "factors": "Transplantation:After transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684524",
                "factors": "Transplantation:After transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684528",
                "factors": "Transplantation:After transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684483",
                "factors": "Transplantation:Before transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684487",
                "factors": "Transplantation:Before transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684491",
                "factors": "Transplantation:Before transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684495",
                "factors": "Transplantation:Before transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684499",
                "factors": "Transplantation:Before transplantation",
                "additional_sample_data": ""
            },
            {
                "subject_type": "-",
                "local_sample_id": "LabF_684503",
                "factors": "Transplantation:Before transplantation",
                "additional_sample_data": ""
            }
        ]
    },
    "COLLECTION": {
        "COLLECTION_SUMMARY": "-",
        "SAMPLE_TYPE": "Tissue"
    },
    "TREATMENT": {
        "TREATMENT_SUMMARY": "-",
        "TREATMENT_PROTOCOL_COMMENTS": "Before transplantation | After transplanation"
    },
    "SAMPLEPREP": {
        "SAMPLEPREP_SUMMARY": "-",
        "EXTRACTION_METHOD": "Extraction: Proteomics 2004, 4, 78-83; Splitratio: splitless 25 purge"
    },
    "CHROMATOGRAPHY": {
        "CHROMATOGRAPHY_TYPE": "-",
        "INSTRUMENT_NAME": "Agilent 6890N Gas Chromatograph",
        "COLUMN_NAME": "-"
    },
    "ANALYSIS": {
        "ANALYSIS_TYPE": "MS"
    },
    "MS": {
        "INSTRUMENT_NAME": "Leco Pegasus III GC-TOF",
        "INSTRUMENT_TYPE": "GC-TOF",
        "MS_TYPE": "EI",
        "ION_MODE": "POSITIVE"
    },
    "MS_METABOLITE_DATA": {
        "MS_METABOLITE_DATA:UNITS": "Peak height",
        "MS_METABOLITE_DATA_START": {
            "Samples": [
                "LabF_684508",
                "LabF_684512",
                "LabF_684516",
                "LabF_684520",
                "LabF_684524",
                "LabF_684528",
                "LabF_684483",
                "LabF_684487",
                "LabF_684491",
                "LabF_684495",
                "LabF_684499",
                "LabF_684503"
            ],
            "Factors": [
                "Transplantation:After transplantation",
                "Transplantation:After transplantation",
                "Transplantation:After transplantation",
                "Transplantation:After transplantation",
                "Transplantation:After transplantation",
                "Transplantation:After transplantation",
                "Transplantation:Before transplantation",
                "Transplantation:Before transplantation",
                "Transplantation:Before transplantation",
                "Transplantation:Before transplantation",
                "Transplantation:Before transplantation",
                "Transplantation:Before transplantation"
            ],
            "DATA": [
                {
                    "metabolite_name": "1-monoolein",
                    "LabF_684508": "6047.0000",
                    "LabF_684512": "2902.0000",
                    "LabF_684516": "1452.0000",
                    "LabF_684520": "3428.0000",
                    "LabF_684524": "2985.0000",
                    "LabF_684528": "16334.0000",
                    "LabF_684483": "244142.0000",
                    "LabF_684487": "6968.0000",
                    "LabF_684491": "1928.0000",
                    "LabF_684495": "19228.0000",
                    "LabF_684499": "3029.0000",
                    "LabF_684503": "23277.0000"
                },
                {
                    "metabolite_name": "1-monostearin",
                    "LabF_684508": "9771.0000",
                    "LabF_684512": "6521.0000",
                    "LabF_684516": "1302.0000",
                    "LabF_684520": "2781.0000",
                    "LabF_684524": "5789.0000",
                    "LabF_684528": "4338.0000",
                    "LabF_684483": "16848.0000",
                    "LabF_684487": "10206.0000",
                    "LabF_684491": "9398.0000",
                    "LabF_684495": "1013.0000",
                    "LabF_684499": "4190.0000",
                    "LabF_684503": "11114.0000"
                },
                {
                    "metabolite_name": "2-hydroxybutanoic acid",
                    "LabF_684508": "13238.0000",
                    "LabF_684512": "29774.0000",
                    "LabF_684516": "4134.0000",
                    "LabF_684520": "4419.0000",
                    "LabF_684524": "13334.0000",
                    "LabF_684528": "2115.0000",
                    "LabF_684483": "11587.0000",
                    "LabF_684487": "65635.0000",
                    "LabF_684491": "32433.0000",
                    "LabF_684495": "1823.0000",
                    "LabF_684499": "4429.0000",
                    "LabF_684503": "30427.0000"
                },
                {
                    "metabolite_name": "2-hydroxyglutaric acid",
                    "LabF_684508": "7160.0000",
                    "LabF_684512": "11501.0000",
                    "LabF_684516": "3202.0000",
                    "LabF_684520": "17238.0000",
                    "LabF_684524": "20376.0000",
                    "LabF_684528": "1109.0000",
                    "LabF_684483": "8276.0000",
                    "LabF_684487": "12402.0000",
                    "LabF_684491": "20964.0000",
                    "LabF_684495": "25913.0000",
                    "LabF_684499": "2709.0000",
                    "LabF_684503": "70972.0000"
                },
                {
                    "metabolite_name": "2-ketoisocaproic acid",
                    "LabF_684508": "812.0000",
                    "LabF_684512": "2011.0000",
                    "LabF_684516": "738.0000",
                    "LabF_684520": "2550.0000",
                    "LabF_684524": "871.0000",
                    "LabF_684528": "628.0000",
                    "LabF_684483": "2096.0000",
                    "LabF_684487": "3472.0000",
                    "LabF_684491": "10669.0000",
                    "LabF_684495": "432.0000",
                    "LabF_684499": "1055.0000",
                    "LabF_684503": "1005.0000"
                },
                {
                    "metabolite_name": "2-monopalmitin",
                    "LabF_684508": "1511.0000",
                    "LabF_684512": "622.0000",
                    "LabF_684516": "883.0000",
                    "LabF_684520": "796.0000",
                    "LabF_684524": "623.0000",
                    "LabF_684528": "5716.0000",
                    "LabF_684483": "3405.0000",
                    "LabF_684487": "3196.0000",
                    "LabF_684491": "1457.0000",
                    "LabF_684495": "1416.0000",
                    "LabF_684499": "1275.0000",
                    "LabF_684503": "14445.0000"
                }
            ]
        }
    },
    "METABOLITES": {
        "METABOLITES_START": {
            "Fields": [
                "metabolite_name",
                "moverz_quant",
                "ri",
                "ri_type",
                "pubchem_id",
                "inchi_key",
                "kegg_id",
                "other_id",
                "other_id_type"
            ],
            "DATA": [
                {
                    "metabolite_name": "1-monoolein",
                    "moverz_quant": "129",
                    "ri": "952993",
                    "ri_type": "Fiehn",
                    "pubchem_id": "5283468",
                    "inchi_key": "",
                    "kegg_id": "",
                    "other_id": "213963",
                    "other_id_type": "BinBase"
                },
                {
                    "metabolite_name": "1-monostearin",
                    "moverz_quant": "399",
                    "ri": "959625",
                    "ri_type": "Fiehn",
                    "pubchem_id": "107036",
                    "inchi_key": "",
                    "kegg_id": "D01947",
                    "other_id": "202835",
                    "other_id_type": "BinBase"
                },
                {
                    "metabolite_name": "2-hydroxybutanoic acid",
                    "moverz_quant": "131",
                    "ri": "258175",
                    "ri_type": "Fiehn",
                    "pubchem_id": "11266",
                    "inchi_key": "",
                    "kegg_id": "C05984",
                    "other_id": "199800",
                    "other_id_type": "BinBase"
                },
                {
                    "metabolite_name": "2-hydroxyglutaric acid",
                    "moverz_quant": "129",
                    "ri": "506359",
                    "ri_type": "Fiehn",
                    "pubchem_id": "43",
                    "inchi_key": "",
                    "kegg_id": "C02630",
                    "other_id": "214409",
                    "other_id_type": "BinBase"
                },
                {
                    "metabolite_name": "2-ketoisocaproic acid",
                    "moverz_quant": "200",
                    "ri": "310629",
                    "ri_type": "Fiehn",
                    "pubchem_id": "70",
                    "inchi_key": "",
                    "kegg_id": "C00233",
                    "other_id": "213388",
                    "other_id_type": "BinBase"
                },
                {
                    "metabolite_name": "2-monopalmitin",
                    "moverz_quant": "218",
                    "ri": "889972",
                    "ri_type": "Fiehn",
                    "pubchem_id": "123409",
                    "inchi_key": "",
                    "kegg_id": "",
                    "other_id": "233239",
                    "other_id_type": "BinBase"
                }
            ]
        }
    }
}
  • Print single block in mwTab format.
In [26]:
mwfile.print_block("STUDY", file_format="mwtab")
ST:STUDY_TITLE                          Intestinal Samples II pre/post transplantation
ST:STUDY_TYPE                           MS analysis
ST:STUDY_SUMMARY                        Intestinal Samples II pre/post transplantation
ST:INSTITUTE                            University of California, Davis
ST:DEPARTMENT                           Davis Genome Center
ST:LABORATORY                           Fiehn
ST:LAST_NAME                            Hartman
ST:FIRST_NAME                           Amber
ST:ADDRESS                              451 E. Health Sci. Drive, Davis, California 95616, USA
ST:EMAIL                                -
ST:PHONE                                -
ST:NUM_GROUPS                           2
ST:TOTAL_SUBJECTS                       12
  • Print single block in JSON format.
In [27]:
mwfile.print_block("STUDY", file_format="json")
{
    "STUDY_TITLE": "Intestinal Samples II pre/post transplantation",
    "STUDY_TYPE": "MS analysis",
    "STUDY_SUMMARY": "Intestinal Samples II pre/post transplantation",
    "INSTITUTE": "University of California, Davis",
    "DEPARTMENT": "Davis Genome Center",
    "LABORATORY": "Fiehn",
    "LAST_NAME": "Hartman",
    "FIRST_NAME": "Amber",
    "ADDRESS": "451 E. Health Sci. Drive, Davis, California 95616, USA",
    "EMAIL": "-",
    "PHONE": "-",
    "NUM_GROUPS": "2",
    "TOTAL_SUBJECTS": "12"
}
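
The JSON representation printed above is made of plain dictionaries and lists, so it can be analyzed with the standard library alone. The following is a minimal sketch, not part of the mwtab package: it assumes the "MS_METABOLITE_DATA_START" layout shown above ("Samples", "Factors", "DATA"), and the helper names parse_factors and group_means are hypothetical. It groups samples by a factor value and averages each metabolite per group.

```python
from collections import defaultdict
from statistics import mean

# Excerpt of the "MS_METABOLITE_DATA_START" structure printed above
# (two samples per group, one metabolite, kept short for illustration).
data_start = {
    "Samples": ["LabF_684508", "LabF_684512", "LabF_684483", "LabF_684487"],
    "Factors": [
        "Transplantation:After transplantation",
        "Transplantation:After transplantation",
        "Transplantation:Before transplantation",
        "Transplantation:Before transplantation",
    ],
    "DATA": [
        {
            "metabolite_name": "1-monostearin",
            "LabF_684508": "9771.0000",
            "LabF_684512": "6521.0000",
            "LabF_684483": "16848.0000",
            "LabF_684487": "10206.0000",
        },
    ],
}


def parse_factors(factors):
    """Split 'NAME:VALUE | NAME:VALUE' into a {name: value} dictionary."""
    pairs = (item.split(":", 1) for item in factors.split("|"))
    return {name.strip(): value.strip() for name, value in pairs}


def group_means(data_start, factor_name):
    """Average every metabolite over the samples sharing a factor value."""
    groups = defaultdict(list)
    for sample, factors in zip(data_start["Samples"], data_start["Factors"]):
        groups[parse_factors(factors)[factor_name]].append(sample)

    result = {}
    for row in data_start["DATA"]:
        # Values are stored as strings in the JSON representation.
        result[row["metabolite_name"]] = {
            level: mean(float(row[sample]) for sample in samples)
            for level, samples in groups.items()
        }
    return result


means = group_means(data_start, "Transplantation")
print(means["1-monostearin"])
```

With a parsed file, the same functions could be applied to mwfile["MS_METABOLITE_DATA"]["MS_METABOLITE_DATA_START"], assuming the object exposes the structure exactly as printed.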

Writing data from a MWTabFile object into a file

Data from a MWTabFile object can be written to a file in the original mwTab format or in the equivalent JSON format using write():

  • Writing into a mwTab formatted file:
In [28]:
with open("out/ST000001_AN000001_modified.txt", "w") as outfile:
    mwfile.write(outfile, file_format="mwtab")
  • Writing into a JSON file:
In [29]:
with open("out/ST000001_AN000001_modified.json", "w") as outfile:
    mwfile.write(outfile, file_format="json")
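
A written JSON file can be read back with the standard json module to spot-check its contents. The sketch below is only an illustration under the assumption that the file on disk has the same layout as the JSON printed earlier; load_block is a hypothetical helper, and a small excerpt is written to a temporary directory so that the example runs without mwtab.

```python
import json
import os
import tempfile

# Small excerpt of the JSON representation shown earlier in this tutorial.
excerpt = {
    "STUDY": {
        "STUDY_TITLE": "Intestinal Samples II pre/post transplantation",
        "NUM_GROUPS": "2",
        "TOTAL_SUBJECTS": "12",
    }
}


def load_block(path, block_name):
    """Load a JSON file and return a single top-level block from it."""
    with open(path, "r") as infile:
        return json.load(infile)[block_name]


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "excerpt.json")
    # Stand-in for a file produced by mwfile.write(outfile, file_format="json").
    with open(path, "w") as outfile:
        json.dump(excerpt, outfile, indent=4)
    study = load_block(path, "STUDY")

print(study["STUDY_TITLE"])
```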

Converting mwTab Files

mwTab files can be converted between the mwTab file format and their JSON representation using the mwtab.converter module.
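
To give a feel for how the two representations relate, here is a simplified sketch that turns the lines of a "key-value" block into a dictionary resembling the JSON shown earlier. It is not the converter's implementation: block_to_dict is a hypothetical helper, it handles only simple blocks, and joining repeated keys (such as multi-line *_SUMMARY entries) with a single space is an assumption of this sketch.

```python
import json


def block_to_dict(lines):
    """Convert 'PREFIX:KEY    value' lines of one block into a dictionary."""
    block = {}
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip empty lines and block headers such as "#STUDY"
        parts = line.split(None, 1)
        key = parts[0].split(":", 1)[-1]  # drop the "ST:" style prefix
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in block:
            # Assumption: repeated keys continue the previous value.
            block[key] = block[key] + " " + value
        else:
            block[key] = value
    return block


study_lines = [
    "#STUDY",
    "ST:LAST_NAME                            Hartman",
    "ST:FIRST_NAME                           Amber",
    "ST:EMAIL                                -",
    "ST:NUM_GROUPS                           2",
]

block = block_to_dict(study_lines)
print(json.dumps(block, indent=4))
```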

One-to-one file conversions

  • Converting from the mwTab file format into its equivalent JSON file format:
In [30]:
from mwtab.converter import Converter

# Using valid ANALYSIS_ID to access file from URL: from_path="1"
converter = Converter(from_path="1", to_path="out/ST000001_AN000001.json",
                      from_format="mwtab", to_format="json")
converter.convert()
  • Converting from JSON file format back to mwTab file format:
In [31]:
from mwtab.converter import Converter

converter = Converter(from_path="out/ST000001_AN000001.json", to_path="out/ST000001_AN000001.txt",
                      from_format="json", to_format="mwtab")
converter.convert()

Many-to-many file conversions

  • Converting from a directory of mwTab formatted files into their equivalent JSON formatted files:
In [32]:
from mwtab.converter import Converter

converter = Converter(from_path="mwfiles_dir_mwtab",
                      to_path="out/mwfiles_dir_json",
                      from_format="mwtab",
                      to_format="json")
converter.convert()
  • Converting from a directory of JSON formatted files into their equivalent mwTab formatted files:
In [33]:
from mwtab.converter import Converter

converter = Converter(from_path="out/mwfiles_dir_json",
                      to_path="out/mwfiles_dir_mwtab",
                      from_format="json",
                      to_format="mwtab")
converter.convert()

Note

Both many-to-many and one-to-one file conversions are available. See mwtab.converter for the full list of available conversions.

Command-Line Interface

The mwtab Command-Line Interface provides the following functionality:
  • Convert from the mwTab file format into its equivalent JSON file format and vice versa.
  • Validate the mwTab formatted file.
In [34]:
! python3 -m mwtab --help
The mwtab command-line interface
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Usage:
    mwtab -h | --help
    mwtab --version
    mwtab convert (<from-path> <to-path>) [--from-format=<format>] [--to-format=<format>] [--validate] [--mw-rest=<url>] [--verbose]
    mwtab validate <from-path> [--mw-rest=<url>] [--verbose]

Options:
    -h, --help                      Show this screen.
    --version                       Show version.
    --verbose                       Print what files are processing.
    --validate                      Validate the mwTab file.
    --from-format=<format>          Input file format, available formats: mwtab, json [default: mwtab].
    --to-format=<format>            Output file format, available formats: mwtab, json [default: json].
    --mw-rest=<url>                 URL to MW REST interface [default: http://www.metabolomicsworkbench.org/rest/study/analysis_id/{}/mwtab/txt].
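
When the command-line interface is driven from a script, it can help to assemble the argument list in one place. The sketch below follows the "mwtab convert" usage line printed above; build_convert_command is a hypothetical helper, not something the package provides, and it only builds the argument list without running anything.

```python
import sys

ALLOWED_FORMATS = ("mwtab", "json")  # formats listed in the help text above


def build_convert_command(from_path, to_path, from_format="mwtab",
                          to_format="json", validate=False, verbose=False):
    """Build the argument list for 'python -m mwtab convert ...'."""
    for file_format in (from_format, to_format):
        if file_format not in ALLOWED_FORMATS:
            raise ValueError("unknown format: {}".format(file_format))

    command = [
        sys.executable, "-m", "mwtab", "convert", from_path, to_path,
        "--from-format={}".format(from_format),
        "--to-format={}".format(to_format),
    ]
    if validate:
        command.append("--validate")
    if verbose:
        command.append("--verbose")
    return command


command = build_convert_command("ST000001_AN000001.txt",
                                "out/ST000001_AN000001.json")
print(" ".join(command[1:]))
```

The resulting list could then be passed to subprocess.run(command, check=True) in an environment where mwtab is installed.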

Converting mwTab files in bulk

CLI one-to-one file conversions

  • Convert from a local file in mwTab format to a local file in JSON format:
In [35]:
! python3 -m mwtab convert ST000001_AN000001.txt out/ST000001_AN000001.json \
          --from-format=mwtab --to-format=json
  • Convert from a local file in JSON format to a local file in mwTab format:
In [36]:
! python3 -m mwtab convert ST000001_AN000001.json out/ST000001_AN000001.txt \
          --from-format=json --to-format=mwtab
  • Convert from a compressed local file in mwTab format to a compressed local file in JSON format:
In [37]:
! python3 -m mwtab convert ST000001_AN000001.txt.gz out/ST000001_AN000001.json.gz \
          --from-format=mwtab --to-format=json
  • Convert from a compressed local file in JSON format to a compressed local file in mwTab format:
In [38]:
! python3 -m mwtab convert ST000001_AN000001.json.gz out/ST000001_AN000001.txt.gz \
          --from-format=json --to-format=mwtab
  • Convert from an uncompressed file at a URL in mwTab format to a compressed local file in JSON format:
In [39]:
! python3 -m mwtab convert 1 out/ST000001_AN000001.json.bz2 \
          --from-format=mwtab --to-format=json

Note

See mwtab.converter for the full list of available conversions.

CLI many-to-many file conversions

  • Convert from a directory of files in mwTab format to a directory of files in JSON format:
In [40]:
! python3 -m mwtab convert mwfiles_dir_mwtab out/mwfiles_dir_json \
          --from-format=mwtab --to-format=json
  • Convert from a directory of files in JSON format to a directory of files in mwTab format:
In [41]:
! python3 -m mwtab convert mwfiles_dir_json out/mwfiles_dir_mwtab \
          --from-format=json --to-format=mwtab
  • Convert from a directory of files in mwTab format to a zip archive of files in JSON format:
In [42]:
! python3 -m mwtab convert mwfiles_dir_mwtab out/mwfiles_json.zip \
          --from-format=mwtab --to-format=json
  • Convert from a compressed tar archive of files in JSON format to a directory of files in mwTab format:
In [43]:
! python3 -m mwtab convert mwfiles_json.tar.gz out/mwfiles_dir_mwtab \
          --from-format=json --to-format=mwtab
  • Convert from a zip archive of files in mwTab format to a compressed tar archive of files in JSON format:
In [44]:
! python3 -m mwtab convert mwfiles_mwtab.zip out/mwfiles_json.tar.bz2 \
          --from-format=mwtab --to-format=json

Note

See mwtab.converter for the full list of available conversions.
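
The conversion examples above use several kinds of paths: plain files, directories, compressed files, and zip or tar archives. The sketch below labels a path by its suffix, which can be handy when planning a batch of conversions. How the package itself recognizes these cases is not described here, so describe_path is a hypothetical helper based only on the file names used in the examples.

```python
# Longer suffixes come first so ".tar.gz" is not mistaken for ".gz".
SUFFIX_KINDS = (
    (".tar.gz", "gzip-compressed tar archive"),
    (".tar.bz2", "bzip2-compressed tar archive"),
    (".zip", "zip archive"),
    (".gz", "gzip-compressed file"),
    (".bz2", "bzip2-compressed file"),
)


def describe_path(path):
    """Return a human-readable label for a path based on its suffix."""
    lowered = path.lower()
    for suffix, kind in SUFFIX_KINDS:
        if lowered.endswith(suffix):
            return kind
    return "plain file or directory"


for example in ("mwfiles_json.tar.gz", "out/mwfiles_json.zip",
                "out/ST000001_AN000001.json.bz2", "mwfiles_dir_mwtab"):
    print(example, "->", describe_path(example))
```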

Validating mwTab files

The mwtab package provides the validate_file() function that can validate files based on a JSON schema definition. The mwtab.mwschema module contains schema definitions for every block of a mwTab formatted file, i.e. it lists the types of attributes (e.g. str) and specifies which keys are optional and which are required.
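
The idea of a per-block schema with typed, required, and optional keys can be illustrated with a small hand-written checker. This sketch does not use mwtab.mwschema or validate_file(); the STUDY_SCHEMA mapping, the choice of which keys are required, and the check_block helper are all hypothetical and serve only to show the principle.

```python
# Hypothetical mini-schema: key -> (expected type, is the key required?)
STUDY_SCHEMA = {
    "STUDY_TITLE": (str, True),
    "STUDY_SUMMARY": (str, True),
    "NUM_GROUPS": (str, False),
}


def check_block(block, schema):
    """Return a list of human-readable problems found in a block."""
    errors = []
    for key, (expected_type, required) in schema.items():
        if key not in block:
            if required:
                errors.append("missing required key: " + key)
        elif not isinstance(block[key], expected_type):
            errors.append("wrong type for key: " + key)
    for key in block:
        if key not in schema:
            errors.append("unexpected key: " + key)
    return errors


good_block = {
    "STUDY_TITLE": "Intestinal Samples II pre/post transplantation",
    "STUDY_SUMMARY": "Intestinal Samples II pre/post transplantation",
}
bad_block = {"STUDY_TITLE": "Intestinal Samples II pre/post transplantation",
             "NUM_GROUPS": 2}

print(check_block(good_block, STUDY_SCHEMA))
print(check_block(bad_block, STUDY_SCHEMA))
```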

  • To validate file(s), call the validate command and provide the path to the file(s):
In [45]:
! python3 -m mwtab validate 1