ProLIF (Protein-Ligand Interaction Fingerprints) Analysis

run_prolif

StreaMD bundles ProLIF-based tools that extract protein-ligand interaction fingerprints (PLIFs) from MD trajectories. The input trajectory and topology files are those produced in the Running Molecular Dynamics step.

This functionality is based on the ProLIF package.

Usage

run_prolif -h

usage: run_prolif [-h] [-i DIRNAME [DIRNAME ...]] [--xtc FILENAME] [--tpr FILENAME] [-l STRING] [-s INTEGER] [--protein_selection STRING] [-a STRING] [-d WDIR] [-v]
                  [--hostfile FILENAME] [-c INTEGER] [--n_jobs INTEGER] [--width FILENAME] [--height FILENAME] [--occupancy float] [--not_save_pics] [-o string]

Get protein-ligand interactions from MD trajectories using ProLIF module.

options:
  -h, --help            show this help message and exit
  -i DIRNAME [DIRNAME ...], --wdir_to_run DIRNAME [DIRNAME ...]
                         single or multiple directories for simulations.
                         Should consist of: md_out.tpr and md_fit.xtc files (default: None)
  --xtc FILENAME        input trajectory file (XTC). Will be ignored if --wdir_to_run is used (default: None)
  --tpr FILENAME        input topology file (TPR). Will be ignored if --wdir_to_run is used (default: None)
  -l STRING, --ligand STRING
                        residue name of a ligand in the input trajectory. (default: UNL)
  -s INTEGER, --step INTEGER
                         step to take every n-th frame (default: 1)
  --protein_selection STRING
                        The protein selection atoms. Example: "protein" or "protein and byres around 20.0 resname UNL" (default: protein)
  -a STRING, --append_protein_selection STRING
                        the string which will be concatenated to the protein selection atoms. Example: "resname ZN or resname MG". (default: None)
  -d WDIR, --wdir WDIR  Working directory for program output. If not set the current directory will be used. (default: None)
  -v, --verbose         print progress. (default: False)
  --hostfile FILENAME   text file with addresses of nodes of dask SSH cluster. Typically, it is passed as the $PBS_NODEFILE variable from inside a PBS script. The first line in this file will be the address of the scheduler running on the standard port 8786. If omitted, calculations will run on a single machine as usual. (default: None)
  -c INTEGER, --ncpu INTEGER
                        number of CPU per server. Use all available cpus by default.
  --n_jobs INTEGER      Number of processes to run per each trajectory. Provided CPUs (--ncpu arg) will be distributed between number of trajectories and number of processes per each trajectory (--n_jobs arg). (default: 1)
  --width FILENAME      width of the output pictures (default: 15)
  --height FILENAME     height of the output pictures (default: 10)
  --occupancy float     occupancy of the unique contacts to show. Applied for plifs_occupancyX.html (for each complex) and prolif_output_occupancyX.png (all systems aggregated plot) (default: 0.6)
  --not_save_pics       not create html and png files (by frames) for each unique trajectory. Only overall prolif png file will be created. (default: False)
  -o string, --out_suffix string
                         Unique suffix for output files. By default, start-time_unique-id. The unique suffix is used to separate outputs from different runs.
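The --hostfile description can be illustrated with a minimal sketch of how such a file might be interpreted. It assumes a PBS-style node file with one hostname per line, where a node may be listed once per allocated core; the function `parse_hostfile` is hypothetical and is not part of StreaMD's API.

```python
def parse_hostfile(text: str, port: int = 8786) -> tuple[str, list[str]]:
    """Hypothetical helper: return the scheduler address and the unique hosts.

    Assumes one hostname per line (as in $PBS_NODEFILE), possibly repeated
    once per allocated core. The first host is treated as the scheduler,
    matching the --hostfile help text.
    """
    hosts = [line.strip() for line in text.splitlines() if line.strip()]
    if not hosts:
        raise ValueError("hostfile is empty")
    # Deduplicate while preserving order, since PBS repeats a node per core.
    unique_hosts = list(dict.fromkeys(hosts))
    scheduler = f"{unique_hosts[0]}:{port}"
    return scheduler, unique_hosts


if __name__ == "__main__":
    example = "node01\nnode01\nnode02\nnode02\n"
    print(parse_hostfile(example))  # ('node01:8786', ['node01', 'node02'])
```

Deduplication matters only for node files that repeat hostnames; a file listing each node once is returned unchanged.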

Examples

Protein-ligand system

run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_1 md_files/md_run/protein_H_HIS_ligand_2 -c 128 -v -s 5

Protein-ligand-cofactor system

Residue names for ligands/cofactors can be found in md_files/md_run/protein-ligand/all_ligand_resid.txt.

# Include a cofactor in the protein selection
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --append_protein_selection "resname MG or resname GTP"

# Treat a cofactor as the ligand of interest
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --ligand GTP
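The help text describes --append_protein_selection as a single string concatenated to the protein selection (for example "resname ZN or resname MG"). The sketch below shows one plausible way to build such a string from residue names and combine it with the base selection. It assumes the two parts are joined with `or`, so cofactor atoms are added to the protein atoms rather than intersected with them; both function names are hypothetical, and StreaMD's exact concatenation may differ.

```python
from typing import Optional, Sequence


def resname_selection(resnames: Sequence[str]) -> str:
    """Join residue names into an MDAnalysis-style selection string."""
    return " or ".join(f"resname {name}" for name in resnames)


def build_protein_selection(protein_selection: str = "protein",
                            append: Optional[str] = None) -> str:
    """Hypothetical: combine the base selection with an appended one.

    Parentheses keep each part's internal logic intact when joined by "or".
    """
    if not append:
        return protein_selection
    return f"({protein_selection}) or ({append})"


if __name__ == "__main__":
    extra = resname_selection(["MG", "GTP"])
    print(extra)  # resname MG or resname GTP
    print(build_protein_selection("protein", extra))
    # (protein) or (resname MG or resname GTP)
```

Quoting the whole selection on the command line keeps the shell from splitting it into several arguments.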

Effective Parallel Calculations

To control parallelism:

  1. --ncpu: maximum number of cores available (defaults to all CPUs).

  2. --n_jobs: number of processes per trajectory. StreaMD distributes the --ncpu cores between the trajectories analyzed concurrently and the --n_jobs processes assigned to each of them. By default, --n_jobs is capped at 12 to avoid a ProLIF performance bottleneck; override it explicitly if needed.
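The trade-off between --ncpu and --n_jobs can be sketched with simple arithmetic. This is a hypothetical model, not StreaMD's scheduling code: it assumes each concurrently analyzed trajectory receives n_jobs processes and that the total never exceeds ncpu. The function name `plan_parallelism` is illustrative only.

```python
import os
from typing import Optional


def plan_parallelism(n_trajectories: int, ncpu: Optional[int] = None,
                     n_jobs: int = 1) -> tuple[int, int]:
    """Hypothetical split of CPUs between trajectories and per-trajectory jobs.

    Returns (trajectories run concurrently, processes per trajectory).
    """
    if ncpu is None:
        ncpu = os.cpu_count() or 1  # mirrors "use all available CPUs"
    n_jobs = max(1, min(n_jobs, ncpu))  # one trajectory cannot use more than ncpu
    concurrent = max(1, min(n_trajectories, ncpu // n_jobs))
    return concurrent, n_jobs


if __name__ == "__main__":
    # Two trajectories on 128 cores: n_jobs=1 leaves most cores idle.
    print(plan_parallelism(2, ncpu=128, n_jobs=1))   # (2, 1)
    print(plan_parallelism(2, ncpu=128, n_jobs=12))  # (2, 12)
    # Many trajectories: fewer run at once when each gets 12 processes.
    print(plan_parallelism(100, ncpu=128, n_jobs=12))  # (10, 12)
```

Under this model, raising --n_jobs helps mainly when there are fewer trajectories than cores.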

Outputs

  1. Per trajectory: plifs.csv, plifs.png, plifs_map.png, plifs.html (HTML/PNG creation can be disabled with --not_save_pics)

  2. Aggregated: prolif_output_<unique-suffix>.{csv,png} summarizing all analyzed simulations (unique suffix separates runs)
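The aggregated table combines results from all analyzed simulations. The exact column layout of prolif_output_<unique-suffix>.csv is not described here, so the sketch below assumes a hypothetical contacts-by-system layout: one row per contact, one column per simulation, and 0.0 where a contact was not observed. The function name and contact labels are illustrative only.

```python
import csv
import io


def aggregate_occupancy(per_system: dict[str, dict[str, float]]) -> str:
    """Hypothetical: merge per-system contact occupancies into one CSV table."""
    # Union of all contacts seen in any system, sorted for stable output.
    contacts = sorted({c for occ in per_system.values() for c in occ})
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["contact", *per_system])
    for contact in contacts:
        # Missing contacts get 0.0 so every row has one value per system.
        writer.writerow([contact, *(per_system[s].get(contact, 0.0) for s in per_system)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(aggregate_occupancy({
        "protein_H_HIS_ligand_1": {"LEU83.A_Hydrophobic": 0.8},
        "protein_H_HIS_ligand_2": {"ASP86.A_HBDonor": 0.65},
    }))
```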

See Outputs and Files for where these files are written alongside other analysis results.

Supplementary Scripts

run_prolif calls these scripts automatically, but they can also be run on their own:

prolif_drawmap

prolif_drawmap -h
usage: prolif_drawmap [-h] -i FILENAME [FILENAME ...] [-o FILENAME] [--width FILENAME] [--height FILENAME] [--base_size FILENAME]

Draw a ProLIF plot to analyze the binding modes of multiple ligands

options:
  -h, --help            show this help message and exit
  -i FILENAME [FILENAME ...], --input FILENAME [FILENAME ...]
                        input file with prolif output for the set of molecules. Supported formats: *.csv
                        Ex: prolif_output.csv
  --occupancy float     minimum occupancy of the unique contacts to show
  --width int           width of the output picture
  --height int          height of the output picture
  --base_size int       base size of the output picture
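Occupancy here is read as the fraction of analyzed frames in which a contact is present. A minimal sketch of computing it and applying a minimum-occupancy threshold, assuming each frame is given as a set of contact labels (the `<residue>_<interaction>` label style and both function names are hypothetical, and whether the cut-off is inclusive is an assumption):

```python
from collections import Counter


def contact_occupancy(frames: list[set[str]]) -> dict[str, float]:
    """Return the fraction of frames in which each contact is present.

    Each frame is a set of contact labels, so a contact is counted at most
    once per frame.
    """
    if not frames:
        return {}
    counts = Counter(contact for frame in frames for contact in frame)
    return {contact: n / len(frames) for contact, n in counts.items()}


def filter_by_occupancy(occupancy: dict[str, float],
                        threshold: float = 0.6) -> dict[str, float]:
    """Keep contacts whose occupancy reaches the threshold (assumed >=)."""
    return {c: occ for c, occ in occupancy.items() if occ >= threshold}


if __name__ == "__main__":
    frames = [
        {"LEU83.A_Hydrophobic", "ASP86.A_HBDonor"},
        {"LEU83.A_Hydrophobic"},
        {"LEU83.A_Hydrophobic", "ASP86.A_HBDonor"},
        {"LYS33.A_Cationic"},
        {"LEU83.A_Hydrophobic"},
    ]
    occupancy = contact_occupancy(frames)
    print(filter_by_occupancy(occupancy, threshold=0.6))  # {'LEU83.A_Hydrophobic': 0.8}
```

With the run_prolif default of 0.6, only contacts present in at least 60% of frames would be kept under this reading.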

prolif_draw_by_frame

prolif_draw_by_frame -h
usage: prolif_draw_by_frame [-h] -i [FILENAME ...] [-o FILENAME] [--filt_only_H] [--width FILENAME] [--height FILENAME] [--base_size FILENAME]

options:
  -h, --help            show this help message and exit
  -i [FILENAME ...], --input [FILENAME ...]
                        input file with prolif output for the unique molecule. Supported formats: *.csv
                        Ex: plifs.csv
  --occupancy float     minimum occupancy of the unique contacts to show. Show all contacts by default.
  --filt_only_H         filter residues where only hydrophobic contacts occur
  --width int           width of the output picture
  --height int          height of the output picture
  --base_size int       base size of the output picture
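One possible reading of --filt_only_H is that it removes residues whose only interaction type is hydrophobic, leaving residues with more specific contacts. The sketch below implements that interpretation on a hypothetical residue-to-interaction-types mapping; it is an assumption, not the script's actual code. "Hydrophobic" matches ProLIF's interaction name, while the function name and residue labels are illustrative.

```python
def drop_hydrophobic_only(contacts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Hypothetical reading of --filt_only_H.

    Remove residues whose set of interaction types is exactly {"Hydrophobic"};
    residues with any other interaction type are kept.
    """
    return {res: kinds for res, kinds in contacts.items() if kinds != {"Hydrophobic"}}


if __name__ == "__main__":
    contacts = {
        "LEU83.A": {"Hydrophobic"},
        "ASP86.A": {"HBDonor", "Hydrophobic"},
        "LYS33.A": {"Cationic"},
    }
    print(sorted(drop_hydrophobic_only(contacts)))  # ['ASP86.A', 'LYS33.A']
```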

Use -h with any command to view the full option set.