3.1. Main

This is the main module of pcpfm. All functions intended to be called by an end user are located here, although API access to the underlying modules is also possible.

Each function in the Main object maps to a single command on the command line.

pcpfm.main.CLI()[source]

This function is called when ‘pcpfm’ is called in the terminal.

Simply a wrapper around main()

class pcpfm.main.Main[source]

Bases: object

This is simply a wrapper around all the CLI functions. By putting them in this object, we can do clever things with getattr()
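
The getattr() remark above can be illustrated with a minimal sketch of name-based dispatch. The toy Main class, its two methods, and the dispatch helper below are stand-ins written for illustration; they are assumptions and not the actual pcpfm code.

```python
class Main:
    """Toy stand-in for a class whose static methods are CLI commands."""

    @staticmethod
    def summarize(params):
        return f"summarize {params['input']}"

    @staticmethod
    def finish(params):
        return f"finish {params['input']}"


def dispatch(subcommand, params):
    """Look up the static method named after the subcommand and call it."""
    # Refuse private or dunder names so only public commands are reachable.
    if subcommand.startswith("_"):
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    handler = getattr(Main, subcommand, None)
    if not callable(handler):
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return handler(params)


result = dispatch("summarize", {"input": "my_experiment"})
```

With this pattern, adding a new command only requires adding a new static method; no separate lookup table has to be maintained.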

static QAQC(params)[source]

This will compute various QAQC metrics on the indicated feature table. By default, “all” QAQC metrics are computed; these are detailed in the feature table object.

This requires passing -i with the experiment’s path. The feature table on which to perform the procedures must also be given, using either --table_moniker or -tm.

TODO: this will be deprecated in the future and performed lazily, either during report generation or QA/QC filtering.

The options --color_by, --text_by, and --marker_by control how the figures produced by this method are generated. Each option takes a JSON-formatted list of sequence file fields from which to generate the corresponding cosmetic item. All three are optional.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static asari(params)[source]

Run asari on the experiment’s acquisitions. They must first have been converted to, or provided in, .mzML format.

By default, the command assumes a mass tolerance of 5 ppm and infers the ionization mode of the experiment automatically. Extra arguments for asari can be provided using --extra_asari on the command line.

This requires passing -i with the experiment’s path.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static assemble(params)[source]

This is the first command in any pcpfm analysis. Starting with a sequence file, specified by ‘-s’, an output directory by ‘-o’ and a project name specified by ‘-j’, this will create the experiment directory and initialize the experiment.json.

Additional arguments include the ability to filter sequence file entries using the ‘--filter’ option with a JSON dictionary.

Additionally, the --name_field and --path_field options allow the user to specify which fields should be used for the name and file path of the acquisitions. Entries can also be excluded from an analysis by passing --skip_list with a .txt file containing the sample names to ignore.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static assemble_study(params)[source]

static batch_correct(params)[source]

Use pyCombat to correct for batch effects using the specified batch identifier.

This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.
This requires passing -nm with the new feature table’s moniker.
This requires passing --by_batch with the field specifying the batch on which to correct.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static blank_masking(params)[source]

Print the list of empirical compounds and feature tables registered with the experiment object.

This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static build_empCpds(params)[source]

For a given feature table, generate empirical compounds from its features. This uses a user-defined set of isotopes and adducts.

These can be overridden, along with other parameters, using the following options:

--khipu_isotopes specifies the isotopes to use
--khipu_adducts specifies which adducts to use
--khipu_extended_adducts specifies which extended adducts to use
--khipu_adducts_neg specifies which adducts to use if mode is neg
--khipu_adducts_pos specifies which adducts to use if mode is pos
--add_singletons specifies whether to include single features in the empCpds, i.e., just one peak
--khipu_rt_tolerance specifies the rtime range within which to build khipus
--ppm specifies the mass tolerance with which to build khipus
--khipu_charges specifies which charges to consider (absolute Z)

For details on these parameters, please see Khipu’s documentation

This requires passing -i with the experiment’s path.
This requires passing -tm with the moniker of the feature table
This requires passing -em with the desired empcpd moniker

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static convert(params)[source]

This will convert all .raw files to .mzML using a specified command. To provide the command, either modify the config file or pass it using --conversion_command. When passing the command this way, use whatever command performs the conversion, substituting $RAW_PATH where the .raw file path would go and $OUT_PATH where the output path would go.

This requires passing -i with the experiment’s path.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.
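
The $RAW_PATH and $OUT_PATH placeholders described for convert can be filled in as sketched below. This is an illustration under stated assumptions, not the pcpfm implementation: the helper name is hypothetical, ‘my_converter’ is a placeholder rather than a real tool, and the shell quoting is a design choice of this sketch.

```python
import shlex
from string import Template


def build_conversion_command(template, raw_path, out_path):
    """Fill $RAW_PATH and $OUT_PATH in a command template.

    Each path is shell-quoted so that spaces in file names survive.
    safe_substitute leaves any other $NAME in the template untouched.
    """
    return Template(template).safe_substitute(
        RAW_PATH=shlex.quote(raw_path),
        OUT_PATH=shlex.quote(out_path),
    )


# 'my_converter' and its flags are placeholders for illustration only.
command = build_conversion_command(
    "my_converter --in $RAW_PATH --out $OUT_PATH",
    "data/sample 01.raw",
    "converted/sample 01.mzML",
)
```

Quoting the paths before substitution keeps file names containing spaces as a single argument when the command is later split or run by a shell.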

static delete(params)[source]

Delete a specified feature table or empCpd list by moniker.

This requires passing -i with the experiment’s path.
This requires passing -tm with the table’s moniker to delete
or
This requires passing -em with the empcpd’s moniker to delete

Note: you cannot delete the feature tables generated by asari using this method.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static download_extras(params)[source]

This method will download the MoNA LC-MS/MS library, and the HMDBv5 and LMSD in a JMS-compliant format. Currently this downloads from the author’s Google Drive, which is not ideal and will be fixed in the future.

By using this method you agree to the terms and conditions laid out in the licenses for each of those repositories.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static drop_missing_features(params)[source]

Drop features below a given percentile of inclusion.

--feature_retention_percentile defines this cutoff
--by_batch designates the field to group into batches; if provided, the percentile is calculated per batch first
--feature_drop_logic can be “or” or “and” and specifies how to handle the various batches. For example, if “or”, a feature will be dropped if it is below the cutoff in any batch.
This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.
This requires passing -nm with the new feature table’s moniker.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.
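
The interaction between the retention cutoff and --feature_drop_logic in drop_missing_features can be sketched as follows. This is a minimal illustration, not the pcpfm implementation: the helper names are hypothetical, and treating zero or None as missing and expressing the cutoff as a fraction between 0 and 1 are assumptions of this sketch.

```python
def detection_rate(values):
    """Fraction of samples in which the feature has a non-missing intensity."""
    detected = sum(1 for v in values if v is not None and v > 0)
    return detected / len(values)


def should_drop(feature_by_batch, cutoff, logic="or"):
    """Decide whether one feature is dropped, given its intensities per batch.

    feature_by_batch maps a batch label to the feature's intensities in
    that batch; cutoff is the minimum acceptable detection rate.
    """
    below = [detection_rate(vals) < cutoff for vals in feature_by_batch.values()]
    if logic == "or":
        return any(below)   # dropped if below the cutoff in any batch
    if logic == "and":
        return all(below)   # dropped only if below the cutoff in every batch
    raise ValueError("logic must be 'or' or 'and'")


# One feature measured in two batches of four samples each.
feature = {"batch1": [10.0, 0, 0, 0], "batch2": [5.0, 6.0, 7.0, 0]}
dropped_with_or = should_drop(feature, cutoff=0.5, logic="or")
dropped_with_and = should_drop(feature, cutoff=0.5, logic="and")
```

Here the feature is detected in 25% of batch1 and 75% of batch2, so “or” drops it while “and” keeps it.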

static drop_outliers(params)[source]

This method drops samples from a feature table using the filter in the autodrop JSON.

By default, this is a |Z| > 2.5 filter on the number of features, where the Z-score is calculated using the median.

This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.
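
One plausible reading of the default drop_outliers filter is sketched below. The source states only that the Z-score is calculated using the median, so centering on the median and scaling by the sample standard deviation is an assumption of this sketch, as are the function name and the example counts.

```python
from statistics import median, stdev


def flag_outliers(feature_counts, threshold=2.5):
    """Return sample names whose feature count has |Z| above the threshold.

    Assumption: Z = (count - median) / sample standard deviation.
    """
    counts = list(feature_counts.values())
    center = median(counts)
    spread = stdev(counts)
    if spread == 0:
        return []  # all samples identical, nothing to flag
    return [
        name
        for name, count in feature_counts.items()
        if abs(count - center) / spread > threshold
    ]


# Hypothetical number of detected features per sample.
counts = {
    "s01": 5000, "s02": 5100, "s03": 4900, "s04": 5050, "s05": 4950,
    "s06": 5020, "s07": 4980, "s08": 5010, "s09": 4990, "s10": 1000,
}
outliers = flag_outliers(counts)
```

In this example only s10, which has far fewer features than the rest, exceeds the threshold.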

static drop_samples(params)[source]

This method drops samples from a feature table. The command can be used in several modes:

--drop_name will drop a sample with a given name
--filter will drop samples using a JSON-formatted filter
--qaqc_filter will drop samples using a JSON-formatted filter on QAQC results
--drop_field + --drop_value will drop all samples with a given value for a given field in the sequence file.

Optionally, each mode can be augmented by passing --drop_others, which reverses the logic of the drop.

This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static finish(params)[source]

This is a no-op command that marks the end of an analysis in the command history.

This requires passing -i with the experiment’s path.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static generate_output(params)[source]

This command generates the three-table output for downstream analysis: a feature table, an annotation table, and the sample metadata. All results are stored in the results subdirectory under the specified moniker.

This requires passing -i with the experiment’s path.
This requires passing -tm for the table moniker to include
This requires passing -em for the empcpd moniker to include
This requires passing -nm for the new moniker under which to save the generated results.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static impute(params)[source]

Replace remaining missing values with a value to aid statistics.

--interpolation_ratio specifies the factor by which to multiply the value generated by the interpolate_method before replacement
--interpolate_method is currently limited to min
--by_batch specifies the field by which to group samples; interpolation is then performed within each group (probably a bad idea)
This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.
This requires passing -nm with the new feature table’s moniker.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.
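
The min-based replacement described for impute can be sketched as follows. This is an illustration only: the function name is hypothetical, and taking the minimum per feature and treating zero or None as missing are assumptions, since the source does not say over which values the minimum is computed.

```python
def impute_min(values, ratio=0.5):
    """Replace missing intensities with ratio * (smallest observed value).

    Assumptions: values holds one feature's intensities across samples,
    and None or non-positive entries count as missing.
    """
    observed = [v for v in values if v is not None and v > 0]
    if not observed:
        return list(values)  # nothing observed, so nothing to base a fill on
    fill = min(observed) * ratio
    return [fill if (v is None or v <= 0) else v for v in values]


imputed = impute_min([100.0, None, 40.0, 0], ratio=0.5)
```

The smallest observed value here is 40.0, so both missing entries are replaced with 20.0.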

static l1a_annotate(params)[source]

This will generate level 1 annotations on an empCpd list using one or more CSV files with compound names, retention times and m/z values.

--targets: a list of CSV file paths with m/z values, retention times and compound names, using the column names “mz”, “rtime”, “CompoundName”
--annot_mz_tolerance: the ppm cutoff for the precursor ion search
--annot_rt_tolerance: the rtime cutoff, in sec, for the precursor ion search

This requires passing -em with the empCpd’s moniker to annotate
This requires passing -nm with the new moniker for the table or empcpd list.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static l1b_annotate(params)[source]

This will generate level 1 annotations on an empCpd list using one or more CSV files with compound names, retention times and m/z values.

--targets: a list of CSV file paths with m/z values, retention times and compound names, using the column names “mz”, “R”, “CompoundName”
--annot_mz_tolerance: the ppm cutoff for the precursor ion search, default = 5 ppm
--annot_rt_tolerance: the rtime cutoff, in sec, for the precursor ion search, default = 30 sec

This requires passing -i with the experiment’s path.
This requires passing -em with the empCpd’s moniker to annotate
This requires passing -nm with the new moniker for the empcpd list.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.
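
The ppm and retention time cutoffs used by the annotation commands can be read as the test sketched below. The function names and example values are illustrative assumptions; only the 5 ppm and 30 sec defaults come from the text above.

```python
def ppm_error(observed_mz, target_mz):
    """Mass error of an observed m/z relative to a target, in parts per million."""
    return abs(observed_mz - target_mz) / target_mz * 1e6


def is_match(feature_mz, feature_rt, target_mz, target_rt,
             mz_tolerance_ppm=5.0, rt_tolerance_sec=30.0):
    """True if the feature lies within both the m/z and the rtime tolerance."""
    return (
        ppm_error(feature_mz, target_mz) <= mz_tolerance_ppm
        and abs(feature_rt - target_rt) <= rt_tolerance_sec
    )


# Hypothetical feature and target values.
close_in_both = is_match(180.0634, 62.0, 180.0639, 75.0)
too_late = is_match(180.0634, 62.0, 180.0639, 120.0)
mass_too_far = is_match(180.0634, 62.0, 180.0660, 75.0)
```

Because ppm is relative to the target mass, the same absolute m/z difference is a larger ppm error for lighter ions than for heavier ones.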

static l2_annotate(params)[source]

This will generate MS2 annotations on a provided feature table or empCpd list. Requires that MS2 spectra first be mapped.

--msp_files: Designate the path to the MSP files to use for annotation.
--annot_mz_tolerance: PPM cutoff for the precursor ion search, default = 5 ppm.
--annot_rt_tolerance: Time cutoff, in seconds, for the precursor ion search, default = 30 sec.
--ms2_similarity_metric: Name of any matchms method for comparing MS2 spectra, default = CosineHungarian.
--ms2_min_peak: Minimum number of matching peaks required for an MS2 match, default = 3.

This requires passing -i with the experiment’s path.
This requires passing -tm with the table’s moniker to annotate or
This requires passing -em with the empCpd’s moniker to annotate
This requires passing -nm with the new moniker for the table or empcpd list.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static l4_annotate(params)[source]

This will generate MS1 annotations on a provided feature table or empcpd list.

--log_transform_mode: can be log10 or log2
--targets: will specify what compounds to annotate, must be a JMS-compliant JSON file
--annot_mz_tolerance: this is the ppm cutoff for the search
--annot_rt_tolerance: this is the rtime cutoff, in sec, for the search

This requires passing -i with the experiment’s path.
This requires passing -tm with the table’s moniker to annotate or
This requires passing -em with the empCpd’s moniker to annotate
This requires passing -nm with the new moniker for the table or empcpd list.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static log_transform(params)[source]

Log transform a given table; log2 is used by default.

--log_transform_mode can be log10 or log2

This requires passing -i with the experiment’s path.
This requires passing -tm with the table’s moniker to transform
This requires passing -nm with the new feature table’s moniker

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static map_ms2(params)[source]

This maps MS2 spectra to the empCompounds based on rt and mz similarity.

Once mapped, they can be annotated using MS2 similarity via l2_annotate and l1a_annotate.

--annot_mz_tolerance: the ppm cutoff for the precursor ion search, default is 5 ppm
--annot_rt_tolerance: the rtime cutoff, in sec, for the precursor ion search, default is 30 sec

This will scan for all MS2 spectra in the experiment. Additional MS2 spectra, from AcquireX for example, can be added by specifying the path to them using --ms2_dir.

This requires passing -i with the experiment’s path.
This requires passing -em with the empCpd’s moniker to annotate
This requires passing -nm with the new moniker for the empcpd list.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static normalize(params)[source]

Normalize a feature table based on the TIC of the features present in over a certain percentile of samples.

--TIC_normalization_percentile defines this cutoff
--by_batch designates the field to group into batches; if provided, normalization will be done within batches first
--normalize_value can be ‘mean’ or ‘median’; this is the value to which the TICs will be normalized
This requires passing -i with the experiment’s path.
This requires passing -tm with the feature table’s moniker.
This requires passing -nm with the new feature table’s moniker

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.
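
The TIC-based scaling described for normalize can be sketched as follows. This is an illustration under stated assumptions rather than the pcpfm implementation: the function name and table layout are hypothetical, zero is treated as missing, the percentile is expressed as a fraction, and batch handling is omitted.

```python
from statistics import mean, median


def tic_normalize(table, percentile=0.9, normalize_value="median"):
    """Scale each sample so its TIC over common features matches a target.

    table maps a sample name to its list of feature intensities, with all
    lists in the same feature order and 0 meaning "not detected".
    """
    samples = list(table)
    n_features = len(table[samples[0]])
    # Features detected in at least `percentile` of the samples.
    common = [
        i for i in range(n_features)
        if sum(1 for s in samples if table[s][i] > 0) / len(samples) >= percentile
    ]
    tics = {s: sum(table[s][i] for i in common) for s in samples}
    summarize = {"mean": mean, "median": median}[normalize_value]
    target = summarize(list(tics.values()))
    # Samples with a zero TIC are returned unchanged to avoid dividing by zero.
    return {
        s: [v * target / tics[s] for v in table[s]] if tics[s] else list(table[s])
        for s in samples
    }


raw = {"a": [100, 200, 0], "b": [200, 400, 50], "c": [300, 600, 0]}
normalized = tic_normalize(raw, percentile=0.9, normalize_value="median")
```

In this example the third feature is detected in only one of three samples, so it does not contribute to the TIC, but it is still scaled along with the rest of its sample.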

static preprocess(params)[source]

Using the mappings in the preprocessing config, this will alter a provided sequence file and add the extra fields.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static process_params()[source]

This function parses the command line arguments and returns the parameters in a dictionary. Default parameters are specified in the example_parameters.py file, and some are read dynamically from .json files as specified in that file. Note that any parameter given as a .json file is assumed to be a path to a JSON file and is read as such. This allows complex data structures to be specified for some parameters.

Returns:: parameters dictionary
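
The handling of .json parameter values described for process_params can be sketched as below. The helper name, the check on the file extension, and the example keys are assumptions made for illustration; the actual parsing logic may differ.

```python
import json
import os
import tempfile


def resolve_json_params(params):
    """Replace string values ending in .json with the parsed file contents."""
    resolved = {}
    for key, value in params.items():
        if isinstance(value, str) and value.endswith(".json"):
            with open(value, encoding="utf-8") as handle:
                resolved[key] = json.load(handle)
        else:
            resolved[key] = value
    return resolved


# Demonstration with a temporary file; the keys are hypothetical.
with tempfile.TemporaryDirectory() as workdir:
    json_path = os.path.join(workdir, "example.json")
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump({"example_key": [1, 2, 3]}, handle)
    resolved = resolve_json_params({"ppm": 5, "example_option": json_path})
```

Loading the file at parse time means downstream commands receive ordinary dictionaries and lists rather than paths.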

static report(params)[source]

This will generate a PDF report using a JSON template.

--report_config will override the default template
This requires passing -i with the experiment’s path.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

static summarize(params)[source]

Print the list of empirical compounds and feature tables registered with the experiment object.

This requires passing -i with the experiment’s path.

Parameters:: params – This is the master configuration file generated by parsing the command line arguments plus the defaults.

pcpfm.main.main()[source]: This is the main function for the pipeline