ProLIF (Protein-Ligand Interaction Fingerprints) Analysis
run_prolif
StreaMD bundles ProLIF-based tools to extract protein-ligand interaction fingerprints (PLIFs) from MD trajectories. The trajectory and topology files are those produced as described in Running Molecular Dynamics.
This functionality is based on the ProLIF library.
Usage
run_prolif -h
usage: run_prolif [-h] [-i DIRNAME [DIRNAME ...]] [--xtc FILENAME] [--tpr FILENAME] [-l STRING] [-s INTEGER] [--protein_selection STRING] [-a STRING] [-d WDIR] [-v]
[--hostfile FILENAME] [-c INTEGER] [--n_jobs INTEGER] [--width FILENAME] [--height FILENAME] [--occupancy float] [--not_save_pics] [-o string]
Get protein-ligand interactions from MD trajectories using ProLIF module.
options:
-h, --help show this help message and exit
-i DIRNAME [DIRNAME ...], --wdir_to_run DIRNAME [DIRNAME ...]
single or multiple directories with simulations.
Each directory should contain md_out.tpr and md_fit.xtc files (default: None)
--xtc FILENAME input trajectory file (XTC). Will be ignored if --wdir_to_run is used (default: None)
--tpr FILENAME input topology file (TPR). Will be ignored if --wdir_to_run is used (default: None)
-l STRING, --ligand STRING
residue name of a ligand in the input trajectory. (default: UNL)
-s INTEGER, --step INTEGER
step for taking every n-th frame of the trajectory (default: 1)
--protein_selection STRING
The protein selection atoms. Example: "protein" or "protein and byres around 20.0 resname UNL" (default: protein)
-a STRING, --append_protein_selection STRING
selection string appended to the protein selection. Example: "resname ZN or resname MG". (default: None)
-d WDIR, --wdir WDIR Working directory for program output. If not set the current directory will be used. (default: None)
-v, --verbose print progress. (default: False)
--hostfile FILENAME text file with addresses of nodes of a dask SSH cluster. Typically, it is passed as the $PBS_NODEFILE variable from inside a PBS script. The first line in this file will be the address of the scheduler running on the standard port 8786. If omitted, calculations will run on a single machine as usual. (default: None)
-c INTEGER, --ncpu INTEGER
number of CPUs per server. All available CPUs are used by default.
--n_jobs INTEGER Number of processes to run per trajectory. The provided CPUs (--ncpu arg) will be distributed between the number of trajectories processed in parallel and the number of processes per trajectory (--n_jobs arg). (default: 1)
--width FILENAME width of the output pictures (default: 15)
--height FILENAME height of the output pictures (default: 10)
--occupancy float minimum occupancy of the unique contacts to show. Applied to plifs_occupancyX.html (for each complex) and prolif_output_occupancyX.png (plot aggregated over all systems) (default: 0.6)
--not_save_pics do not create HTML and PNG files (by frames) for each unique trajectory. Only the overall ProLIF PNG file will be created. (default: False)
-o string, --out_suffix string
Unique suffix for output files. By default, start-time_unique-id. The unique suffix is used to separate outputs from different runs.
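The --hostfile help above describes a simple format: one node address per line, with the first line taken as the Dask scheduler on port 8786. The Python sketch below illustrates that reading of a PBS node file. It is not StreaMD's own code, and the helper names `parse_hosts` and `scheduler_address` are hypothetical. Removing duplicate lines is also an assumption, motivated by $PBS_NODEFILE typically listing a host once per allocated core.

```python
import os
from pathlib import Path
from typing import List

DASK_SCHEDULER_PORT = 8786  # standard port named in the --hostfile help


def parse_hosts(text: str) -> List[str]:
    """Return unique, non-empty host names in order of first appearance (hypothetical helper)."""
    hosts: List[str] = []
    for line in text.splitlines():
        host = line.strip()
        if host and host not in hosts:  # PBS node files often repeat a host per core
            hosts.append(host)
    return hosts


def scheduler_address(hosts: List[str]) -> str:
    """Treat the first listed host as the scheduler, as the help text describes."""
    if not hosts:
        raise ValueError("host list is empty")
    return f"{hosts[0]}:{DASK_SCHEDULER_PORT}"


if __name__ == "__main__":
    # Inside a PBS job, inspect what --hostfile $PBS_NODEFILE would point to.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        hosts = parse_hosts(Path(nodefile).read_text())
        print("scheduler:", scheduler_address(hosts))
        print("nodes:", hosts)
```

Checking the node list this way before submitting a long job can help confirm which machine will host the scheduler.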
Examples
Protein-ligand system
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_1 md_files/md_run/protein_H_HIS_ligand_2 -c 128 -v -s 5
Protein-ligand-cofactor system
Residue names for ligands/cofactors can be found in md_files/md_run/protein-ligand/all_ligand_resid.txt.
# Include cofactors in the protein selection
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --append_protein_selection "resname MG or resname GTP"
# Treat a cofactor as the ligand of interest
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --ligand GTP
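Conceptually, --append_protein_selection widens the atoms treated as the protein side of the fingerprint, while --ligand names the residue on the other side. The Python sketch below shows one way such selection strings could be composed. It assumes the two expressions are joined with the MDAnalysis `or` operator and parenthesized. How StreaMD actually concatenates them is not documented here, and both helper names are hypothetical.

```python
from typing import Optional


def build_protein_selection(protein_selection: str = "protein",
                            append: Optional[str] = None) -> str:
    """Combine --protein_selection with --append_protein_selection (hypothetical helper)."""
    if not append:
        return protein_selection
    # Parentheses keep each expression intact when the two are OR-ed together.
    return f"({protein_selection}) or ({append})"


def build_ligand_selection(resname: str = "UNL") -> str:
    """Selection for the residue given by --ligand (hypothetical helper)."""
    return f"resname {resname}"


# Cofactors MG and GTP added to the default protein selection:
print(build_protein_selection("protein", "resname MG or resname GTP"))
print(build_ligand_selection("GTP"))
```

If MDAnalysis is installed, a string like this can be checked with `Universe.select_atoms(...)` on the run's topology before launching the analysis.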
Effective Parallel Calculations
To control parallelism:
--ncpu: maximum number of cores available (defaults to all CPUs).
--n_jobs: number of processes per trajectory.
StreaMD distributes the --ncpu cores between trajectories processed in parallel and the --n_jobs processes within each trajectory. By default, --n_jobs is capped at 12 to avoid a ProLIF bottleneck; override it explicitly if needed.
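The exact split is internal to StreaMD. The Python sketch below models one plausible reading of it under these assumptions:

- each trajectory uses `n_jobs` processes;
- as many trajectories as fit in `ncpu` cores run side by side;
- the default cap of 12 on --n_jobs is not modeled.

The function name `plan_parallelism` is hypothetical.

```python
import os
from typing import Optional, Tuple


def plan_parallelism(n_trajectories: int, n_jobs: int = 1,
                     ncpu: Optional[int] = None) -> Tuple[int, int]:
    """Return (trajectories analyzed concurrently, processes per trajectory).

    Hypothetical model of how --ncpu could be shared; not StreaMD's scheduler.
    """
    if n_trajectories < 1 or n_jobs < 1:
        raise ValueError("n_trajectories and n_jobs must be positive")
    ncpu = ncpu or os.cpu_count() or 1   # default: all available CPUs
    jobs = min(n_jobs, ncpu)             # never more processes than cores
    concurrent = max(1, ncpu // jobs)    # trajectories that fit side by side
    return min(concurrent, n_trajectories), jobs


print(plan_parallelism(n_trajectories=2, n_jobs=1, ncpu=128))    # (2, 1)
print(plan_parallelism(n_trajectories=20, n_jobs=12, ncpu=128))  # (10, 12)
```

Under this model, the protein-ligand example (two trajectories, -c 128, default --n_jobs 1) would run both trajectories at once with one process each. Raising --n_jobs would then be what puts the remaining cores to work.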
Outputs
Per trajectory:
plifs.csv, plifs.png, plifs_map.png, plifs.html (HTML/PNG creation can be disabled with --not_save_pics)
Aggregated:
prolif_output_<unique-suffix>.{csv,png} summarizing all analyzed simulations (the unique suffix separates runs)
See Outputs and Files for where these files are written alongside other analysis results.
Supplementary Scripts
run_prolif calls these scripts automatically, but they can also be run directly:
prolif_drawmap
prolif_drawmap -h
usage: prolif_drawmap [-h] -i FILENAME [FILENAME ...] [-o FILENAME] [--width FILENAME] [--height FILENAME] [--base_size FILENAME]
Draw a ProLIF plot to analyze the binding modes of multiple ligands
options:
-h, --help show this help message and exit
-i FILENAME [FILENAME ...], --input FILENAME [FILENAME ...]
input file with prolif output for the set of molecules. Supported formats: *.csv
Ex: prolif_output.csv
--occupancy float
minimum occupancy of the unique contacts to show
--width int width of the output picture
--height int height of the output picture
--base_size int base size of the output picture
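The --occupancy threshold (here and in run_prolif, default 0.6) is understood as the fraction of analyzed frames in which a contact is present. The stdlib-only Python sketch below illustrates that filter on a per-frame list of contacts. Several details are assumptions, not the documented layout of the StreaMD CSV files:

- the (residue, interaction) keys;
- the made-up residues;
- the inclusive `>=` comparison.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical contact key: (protein residue, ProLIF interaction type).
Contact = Tuple[str, str]


def contact_occupancy(frames: List[Iterable[Contact]]) -> Dict[Contact, float]:
    """Fraction of frames in which each contact is observed at least once."""
    if not frames:
        return {}
    counts: Counter = Counter()
    for contacts in frames:
        counts.update(set(contacts))  # a contact counts once per frame
    return {contact: n / len(frames) for contact, n in counts.items()}


def filter_by_occupancy(occupancy: Dict[Contact, float],
                        threshold: float = 0.6) -> Dict[Contact, float]:
    """Keep contacts whose occupancy reaches the threshold (inclusive, assumed)."""
    return {c: v for c, v in occupancy.items() if v >= threshold}


# Five toy frames with made-up residues.
frames = [
    [("ASP86", "HBAcceptor"), ("LEU83", "Hydrophobic")],
    [("LEU83", "Hydrophobic")],
    [("LEU83", "Hydrophobic"), ("LYS33", "Cationic")],
    [("ASP86", "HBAcceptor"), ("LEU83", "Hydrophobic")],
    [("LEU83", "Hydrophobic")],
]
occ = contact_occupancy(frames)
print(filter_by_occupancy(occ))  # only LEU83 Hydrophobic (1.0) passes 0.6
```

Lowering the threshold to 0.4 in this toy case would also keep the ASP86 hydrogen bond, which is present in two of five frames.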
prolif_draw_by_frame
prolif_draw_by_frame -h
usage: prolif_draw_by_frame [-h] -i [FILENAME ...] [-o FILENAME] [--filt_only_H] [--width FILENAME] [--height FILENAME] [--base_size FILENAME]
options:
-h, --help show this help message and exit
-i [FILENAME ...], --input [FILENAME ...]
input file with prolif output for the unique molecule. Supported formats: *.csv
Ex: plifs.csv
--occupancy float
minimum occupancy of the unique contacts to show. Show all contacts by default.
--filt_only_H filter residues where only hydrophobic contacts occur
--width int width of the output picture
--height int height of the output picture
--base_size int base size of the output picture
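The help text for --filt_only_H does not say whether such residues are removed or kept. The Python sketch below assumes they are dropped, so that residues with only hydrophobic contacts do not crowd the per-frame plot. The (residue, interaction) layout is an assumption about the CSV contents, and the function name is hypothetical. The label `Hydrophobic` matches ProLIF's interaction class name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

HYDROPHOBIC = "Hydrophobic"  # ProLIF's name for the hydrophobic interaction class


def drop_hydrophobic_only(contacts: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Remove every contact of residues whose interactions are all hydrophobic."""
    contacts = list(contacts)
    types_by_residue: Dict[str, Set[str]] = defaultdict(set)
    for residue, interaction in contacts:
        types_by_residue[residue].add(interaction)
    keep = {res for res, types in types_by_residue.items() if types != {HYDROPHOBIC}}
    # Preserve the original order of the surviving contacts.
    return [(res, inter) for res, inter in contacts if res in keep]


print(drop_hydrophobic_only([
    ("LEU83", "Hydrophobic"),
    ("PHE80", "Hydrophobic"),
    ("PHE80", "PiStacking"),
    ("ASP86", "HBAcceptor"),
]))
```

In this toy input, PHE80 keeps its hydrophobic contact because it also forms a pi-stacking interaction, while LEU83 disappears entirely.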
Use -h with any command to view the full option set.