3.4. Experiment

Experiment.py contains the pipeline’s implementation of the Experiment object, which in turn inherits from the Experiment object in the metdatamodel.

The Experiment is largely a computational aid and orchestrates an analysis. This includes keeping track of intermediate results and acquisitions.

class pcpfm.Experiment.Experiment(experiment_name, experiment_directory, acquisitions=None, qcqa_results=None, feature_tables=None, empCpds=None, log_transformed_feature_tables=None, ionization_mode=None, cosmetics=None, used_cosmetics=None, MS2_methods=None, MS1_only_methods=None, command_history=None, study=None, sequence=None, final_empCpds=None, species=None, tissue=None, provenance=None, converted_subdirectory=None, raw_subdirectory=None, acquisition_data=None, annotation_subdirectory=None, filtered_feature_tables_subdirectory=None, ms2_directory=None, qaqc_figs=None, asari_subdirectory=None, output_subdirectory=None)[source]

Bases: Experiment

The experiment object represents a set of acquisitions.

The constructor was kept deliberately vague during testing; all fields are now defined explicitly.

add_acquisition(acquisition, mode='link')[source]

This method adds an acquisition to the list of acquisitions in the experiment, ensures there are no duplicates, and then links or copies the acquisition (currently only as a .raw file) to the experiment directory.

Args:

acquisition (object): an Acquisition object

mode (str): how to move acquisitions into the experiment, default “link”, can be “copy”

method_field (str): this is the field to check for the method name, used to short-circuit MS2 determination

asari(asari_cmd, force=False)[source]

This command will run asari on the mzML acquisitions in an experiment. The command to be run is defined by asari_cmd.

Args:

asari_cmd (str or list): can be a list or a space-delimited string; must contain the fields $IONIZATION_MODE, $CONVERTED_SUBDIR, and $ASARI_SUBDIR, which will be populated by this function.

$CONVERTED_SUBDIR is where the input mzML is located

$ASARI_SUBDIR is where the results will be output

$IONIZATION_MODE is the ionization mode of the experiment

force (bool): if true, rerun asari if previously run
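The placeholder substitution described above can be illustrated with Python’s string.Template. This is a minimal sketch and not the pipeline’s implementation: the helper name fill_asari_cmd, the example command line, and the example directory values are assumptions made for illustration.

```python
from string import Template


def fill_asari_cmd(asari_cmd, ionization_mode, converted_subdir, asari_subdir):
    """Hypothetical helper: populate the three placeholders in asari_cmd."""
    # Accept either a space-delimited string or a list of tokens.
    tokens = asari_cmd.split(" ") if isinstance(asari_cmd, str) else list(asari_cmd)
    values = {
        "IONIZATION_MODE": ionization_mode,
        "CONVERTED_SUBDIR": converted_subdir,
        "ASARI_SUBDIR": asari_subdir,
    }
    # substitute() raises KeyError if a token holds an unknown placeholder.
    return [Template(token).substitute(values) for token in tokens]


# Illustrative template only; the flags to use depend on the installed asari.
example_cmd = "asari process -m $IONIZATION_MODE -i $CONVERTED_SUBDIR -o $ASARI_SUBDIR"
filled = fill_asari_cmd(example_cmd, "pos", "converted_acquisitions/", "asari/")
print(filled)
```

Returning a list of tokens rather than a joined string keeps the result suitable for passing to a process launcher without shell quoting.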

batches(batch_field)[source]

This will group samples into ‘batches’, based on the user provided ‘batch_field’.

Parameters:

batch_field – field by which to batch samples

Returns:

dictionary of batches to lists of samples
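The shape of the returned dictionary can be sketched as follows. This assumes, purely for illustration, that samples are plain dictionaries with a “name” key; the pipeline itself works with acquisition objects, and group_into_batches is a hypothetical name.

```python
from collections import defaultdict


def group_into_batches(samples, batch_field):
    """Hypothetical sketch: map each batch_field value to its sample names."""
    batches = defaultdict(list)
    for sample in samples:
        batches[sample[batch_field]].append(sample["name"])
    return dict(batches)


# Made-up samples; "batch" plays the role of the user-provided batch_field.
samples = [
    {"name": "s1", "batch": "A"},
    {"name": "s2", "batch": "B"},
    {"name": "s3", "batch": "A"},
]
batches = group_into_batches(samples, "batch")
print(batches)
```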

static construct_experiment_from_CSV(experiment_directory, csv_filepath, sample_filter=None, name_field='File Name', path_field='Filepath', sample_skip_list_fp=None)[source]

For a given sequence file, create the experiment object and add all acquisitions.

Parameters:
  • experiment_directory (str) – path to store experiment and intermediates

  • csv_filepath (str) – filepath to sequence CSV

  • ionization (str, optional) – default None, can be ‘pos’ or ‘neg’. The ionization mode of the experiment. If None, it will be determined automatically

  • sample_filter (dict, optional) – a filter dictionary, only matching entries are included

  • name_field (str, optional) – the column from which to extract the acquisition name

  • path_field (str, optional) – the column from which to extract the acquisition filepath

  • sample_skip_list_fp (str, optional) – path to a txt file with sample names to exclude

Returns:

experiment object

convert_raw_to_mzML(conversion_command, num_cores=4)[source]

Convert all raw files to mzML.

Args:

conversion_command (str or list): This specifies the command to call to perform the conversion. Can be a list or a space-delimited string. Must contain $RAW_PATH and $OUT_PATH where the input and output file names will go.
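How one command per raw file might be derived from such a template is sketched below. The helper build_conversion_commands, the program name my_converter, and the choice to name each output after the input file stem are assumptions for illustration, not the pipeline’s behavior.

```python
from pathlib import PurePosixPath


def build_conversion_commands(conversion_command, raw_paths, out_dir):
    """Hypothetical sketch: fill $RAW_PATH and $OUT_PATH once per raw file."""
    # Accept either a space-delimited string or a list of tokens.
    if isinstance(conversion_command, str):
        tokens = conversion_command.split(" ")
    else:
        tokens = list(conversion_command)
    commands = []
    for raw_path in raw_paths:
        # Assumption: the output keeps the input file stem with an .mzML suffix.
        out_name = PurePosixPath(raw_path).stem + ".mzML"
        out_path = str(PurePosixPath(out_dir) / out_name)
        mapping = {"$RAW_PATH": raw_path, "$OUT_PATH": out_path}
        commands.append([mapping.get(token, token) for token in tokens])
    return commands


# "my_converter" is a placeholder program name, not a real tool.
commands = build_conversion_commands(
    "my_converter $RAW_PATH $OUT_PATH",
    ["raw_acquisitions/a.raw", "raw_acquisitions/b.raw"],
    "converted_acquisitions",
)
for command in commands:
    print(command)
```

Because each command is independent of the others, a list like this could be spread across several worker processes, which is presumably what num_cores controls.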

static create_experiment(experiment_name, experiment_directory, sequence=None)[source]

This is the main constructor for an experiment object.

This requires a name for the experiment, the directory in which to write intermediates, and optionally the ionization mode.

Parameters:
  • experiment_name (str) – a moniker for the experiment

  • experiment_directory (str) – the directory in which to write intermediates

  • ionization_mode (str) – the ionization mode of the acquisitions can be ‘pos’, ‘neg’, or None for auto-detection.

Returns:

experiment object

create_sample_annotation_table()[source]

Create the sample annotation table which maps samples to their metadata.

Returns:

the table as a dataframe

Return type:

dataframe

delete_empCpds(moniker)[source]

This method will safely delete an empcpd and unregister it with the experiment.

Parameters:

moniker (str) – the empcpd moniker to delete

delete_feature_table(moniker)[source]

This method will safely delete a feature table and unregister it with the experiment.

Parameters:

moniker (str) – the table moniker to delete

filter_samples(sample_filter, return_field=None)[source]

Find the set of acquisitions that pass the provided filter and return either the acquisition objects or the specified field of each passing sample.

Args:

sample_filter (dict): a filter dictionary

return_field (str, optional): if provided and valid, the field specified is returned per matching acquisition.

Returns:

list of matching acquisitions or the return_field value of the acquisitions
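The two return modes can be sketched as follows. The filter dictionary format is not specified in this section, so this sketch assumes a simple mapping of field name to a list of allowed values; that format, the plain-dictionary samples, and the name sketch_filter_samples are all assumptions and may differ from what the pipeline accepts.

```python
def sketch_filter_samples(samples, sample_filter, return_field=None):
    """Hypothetical sketch of filtering with an optional return field."""
    # Assumed filter format: {field: [allowed values]}; every field must match.
    matches = [
        sample
        for sample in samples
        if all(sample.get(field) in allowed for field, allowed in sample_filter.items())
    ]
    if return_field is not None:
        # Return only the requested field for samples that actually have it.
        return [sample[return_field] for sample in matches if return_field in sample]
    return matches


# Made-up samples and field names, for illustration only.
samples = [
    {"name": "qc_1", "type": "QC"},
    {"name": "plasma_1", "type": "study"},
    {"name": "plasma_2", "type": "study"},
]
study_names = sketch_filter_samples(samples, {"type": ["study"]}, return_field="name")
qc_samples = sketch_filter_samples(samples, {"type": ["QC"]})
print(study_names, qc_samples)
```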

generate_cosmetic_map(field=None, provided_cos_type='color', seed=None)[source]

This generates the mapping of acquisition properties to colors, markers, text, etc. used in figure generation. This allows for consistency across runs as the mapping is stored in the object.

Args:

field (str, optional): field for which to generate the mapping

provided_cos_type (str, optional): ‘color’, ‘marker’, or ‘text’

seed (int, optional): used for shuffling; by setting this value, the same exact mapping can be generated each time.

Returns:

the mapping of fields to cosmetic types.

Return type:

dict
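The role of the seed can be illustrated with a seeded shuffle of a palette. This is a sketch under assumptions: the palette, the function name sketch_cosmetic_map, and the wrap-around when there are more values than colors are illustrative choices, not the pipeline’s implementation.

```python
import random


def sketch_cosmetic_map(values, palette, seed=None):
    """Hypothetical sketch: assign each distinct value a cosmetic from palette."""
    unique_values = sorted(set(values))
    shuffled = list(palette)
    # A dedicated Random instance leaves the global random state untouched.
    random.Random(seed).shuffle(shuffled)
    # Wrap around if there are more values than palette entries (assumption).
    return {
        value: shuffled[index % len(shuffled)]
        for index, value in enumerate(unique_values)
    }


# Made-up sample types and colors.
sample_types = ["QC", "study", "blank", "study"]
palette = ["red", "green", "blue", "orange"]
first = sketch_cosmetic_map(sample_types, palette, seed=7)
second = sketch_cosmetic_map(sample_types, palette, seed=7)
print(first == second)
```

With a fixed seed the two calls yield the same mapping, which is the property that keeps figures consistent across runs.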

generate_output(empCpd_moniker, table_moniker)[source]

This generates and stores the feature table, sample annotation table, and feature annotation table in the output directory. It also copies the JSON for the desired empcpd and experiment to the directory.

Parameters:
  • empCpd_moniker (str) – moniker of empcpd to use

  • table_moniker (str) – moniker of table to use

property ionization_mode

This returns the user-specified or determined ionization mode of the experiment’s acquisitions. Lazily evaluated.

Returns:

the ionization mode ‘pos’ or ‘neg’
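Lazy evaluation of a property like this one is commonly implemented by caching the value on first access. The sketch below shows that general pattern only; the class LazyExperiment, the _detect_ionization_mode stand-in, and its fixed return value are hypothetical and say nothing about how the pipeline actually determines the mode.

```python
class LazyExperiment:
    """Hypothetical stand-in showing a lazily evaluated, cached property."""

    def __init__(self, ionization_mode=None):
        self._ionization_mode = ionization_mode
        self.detect_calls = 0  # counts how often detection actually runs

    def _detect_ionization_mode(self):
        # Stand-in for real detection, which would inspect the acquisitions.
        self.detect_calls += 1
        return "pos"

    @property
    def ionization_mode(self):
        # Detect only if the user supplied nothing and no value is cached yet.
        if self._ionization_mode is None:
            self._ionization_mode = self._detect_ionization_mode()
        return self._ionization_mode


auto = LazyExperiment()
auto.ionization_mode
auto.ionization_mode
user = LazyExperiment(ionization_mode="neg")
print(auto.detect_calls, user.ionization_mode, user.detect_calls)
```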

static load(experiment_json_filepath)[source]

Reconstitute the experiment object from a saved JSON file representing the object

Parameters:

experiment_json_filepath (str) – path to the JSON file

Returns:

an experiment object

property ms2_acquisitions

This returns all acquisitions in the experiment that have MS2. Lazily evaluated.

Returns:

list of acquisitions with MS2

order_samples()[source]

This updates the ordered_samples param, part of metdatamodel implementation.

retrieve_empCpds(moniker, as_object=False)[source]

For a given moniker return either the empcpd object or its path.

Parameters:
  • moniker (str) – the empcpd to retrieve

  • as_object (bool, optional) – if true, return the object else its path. Defaults to False.

Returns:

the empcpd object or its path

Return type:

str or object

retrieve_feature_table(moniker, as_object=False)[source]

For a given moniker return either the feature table object or its path.

Parameters:
  • moniker (str) – the table to retrieve

  • as_object (bool, optional) – if true, return the object else its path. Defaults to False.

Returns:

the feature table or its path

Return type:

str or object

property sample_names

This returns the names of all acquisitions in the experiment. Lazily evaluated.

Returns:

names of all samples in the experiment

save()[source]

This saves the experiment object as a JSON object inside the experiment directory.
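A save and load round trip through JSON can be sketched with a small stand-in class. Everything here is an assumption for illustration: the class MiniExperiment, the file name experiment.json, the reduced set of fields, and the example table path are not taken from the pipeline.

```python
import json
import os
import tempfile


class MiniExperiment:
    """Hypothetical stand-in for an experiment; not the pcpfm class."""

    def __init__(self, experiment_name, experiment_directory, feature_tables=None):
        self.experiment_name = experiment_name
        self.experiment_directory = experiment_directory
        self.feature_tables = feature_tables or {}

    def save(self):
        # "experiment.json" is an assumed file name for this sketch.
        path = os.path.join(self.experiment_directory, "experiment.json")
        with open(path, "w") as handle:
            json.dump(vars(self), handle, indent=4)
        return path

    @staticmethod
    def load(experiment_json_filepath):
        # The saved keys mirror the constructor arguments, so they unpack directly.
        with open(experiment_json_filepath) as handle:
            return MiniExperiment(**json.load(handle))


with tempfile.TemporaryDirectory() as directory:
    # The table path is made up; only the round trip matters here.
    original = MiniExperiment("demo", directory, {"full": "asari/full_table.tsv"})
    restored = MiniExperiment.load(original.save())
    print(restored.experiment_name, restored.feature_tables)
```

Storing the JSON inside the experiment directory keeps the object’s state next to the intermediates it refers to.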

subdirectories = {'acquisition_data': 'acquisitions/', 'annotation_subdirectory': 'annotations/', 'asari_subdirectory': 'asari/', 'converted_subdirectory': 'converted_acquisitions/', 'filtered_feature_tables_subdirectory': 'filtered_feature_tables/', 'ms2_directory': 'ms2_acquisitions/', 'output_subdirectory': 'output/', 'qaqc_figs': 'QAQC_figs/', 'raw_subdirectory': 'raw_acquisitions/'}

summarize()[source]

Print the list of empCpds and feature tables in the experiment to the console.