# ProLIF (Protein-Ligand Interaction Fingerprints) Analysis `run_prolif`

StreaMD bundles ProLIF-based tools to extract protein-ligand interaction fingerprints (PLIFs) from trajectories. Trajectories and topology files are generated by {doc}`running_md`. This functionality is based on the [ProLIF tool](https://github.com/chemosim-lab/ProLIF).

## Usage

`run_prolif -h`

```
usage: run_prolif [-h] [-i DIRNAME [DIRNAME ...]] [--xtc FILENAME] [--tpr FILENAME] [-l STRING] [-s INTEGER]
                  [--protein_selection STRING] [-a STRING] [-d WDIR] [-v] [--hostfile FILENAME] [-c INTEGER]
                  [--n_jobs INTEGER] [--width FILENAME] [--height FILENAME] [--occupancy float] [--not_save_pics]
                  [-o string]

Get protein-ligand interactions from MD trajectories using ProLIF module.

options:
  -h, --help            show this help message and exit
  -i DIRNAME [DIRNAME ...], --wdir_to_run DIRNAME [DIRNAME ...]
                        single or multiple directories for simulations. Should consist of: md_out.tpr and
                        md_fit.xtc files (default: None)
  --xtc FILENAME        input trajectory file (XTC). Will be ignored if --wdir_to_run is used (default: None)
  --tpr FILENAME        input topology file (TPR). Will be ignored if --wdir_to_run is used (default: None)
  -l STRING, --ligand STRING
                        residue name of a ligand in the input trajectory. (default: UNL)
  -s INTEGER, --step INTEGER
                        step to take every n-th frame. ps (default: 1)
  --protein_selection STRING
                        The protein selection atoms. Example: "protein" or "protein and byres around 20.0
                        resname UNL" (default: protein)
  -a STRING, --append_protein_selection STRING
                        the string which will be concatenated to the protein selection atoms. Example:
                        "resname ZN or resname MG". (default: None)
  -d WDIR, --wdir WDIR  Working directory for program output. If not set the current directory will be used.
                        (default: None)
  -v, --verbose         print progress. (default: False)
  --hostfile FILENAME   text file with addresses of nodes of dask SSH cluster. The most typical, it can be
                        passed as $PBS_NODEFILE variable from inside a PBS script. The first line in this file
                        will be the address of the scheduler running on the standard port 8786. If omitted,
                        calculations will run on a single machine as usual. (default: None)
  -c INTEGER, --ncpu INTEGER
                        number of CPU per server. Use all available cpus by default.
  --n_jobs INTEGER      Number of processes to run per each trajectory. Provided CPUs (--ncpu arg) will be
                        distributed between number of trajectories and number of processes per each trajectory
                        (--n_jobs arg). (default: 1)
  --width FILENAME      width of the output pictures (default: 15)
  --height FILENAME     height of the output pictures (default: 10)
  --occupancy float     occupancy of the unique contacts to show. Applied for plifs_occupancyX.html (for each
                        complex) and prolif_output_occupancyX.png (all systems aggregated plot) (default: 0.6)
  --not_save_pics       not create html and png files (by frames) for each unique trajectory. Only overall
                        prolif png file will be created. (default: False)
  -o string, --out_suffix string
                        Unique suffix for output files. By default, start-time_unique-id.Unique suffix is used
                        to separate outputs from different runs.
```

## Examples

### Protein-ligand system

```bash
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_1 md_files/md_run/protein_H_HIS_ligand_2 -c 128 -v -s 5
```

### Protein-ligand-cofactor system

Residue names for ligands/cofactors can be found in `md_files/md_run/protein-ligand/all_ligand_resid.txt`.

```bash
# Include cofactors in the protein selection.
# The value is a single quoted selection string that is concatenated to the protein selection.
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --append_protein_selection "resname MG or resname GTP"

# Treat a cofactor as the ligand of interest
run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --ligand GTP
```

## Effective Parallel Calculations

To control parallelism:

1) `--ncpu`: maximum number of cores available (defaults to all CPUs).
2) `--n_jobs`: number of processes per trajectory.

StreaMD distributes the `--ncpu` cores between the number of trajectories analyzed at once and the `--n_jobs` processes per trajectory. By default, `--n_jobs` is capped at 12 to avoid the ProLIF bottleneck; override it explicitly if needed.

## Outputs

1) Per trajectory: `plifs.csv`, `plifs.png`, `plifs_map.png`, `plifs.html` (HTML/PNG creation can be disabled with `--not_save_pics`)
2) Aggregated: `prolif_output_.{csv,png}` summarizing all analyzed simulations (the unique suffix separates runs)

See {doc}`outputs` for where these files are written alongside other analysis results.

## Supplementary Scripts

`run_prolif` applies these automatically, but they can also be used directly:

**prolif_drawmap**

```
prolif_drawmap -h
usage: prolif_drawmap [-h] -i FILENAME [FILENAME ...] [-o FILENAME] [--width FILENAME] [--height FILENAME]
                      [--base_size FILENAME]

Draw prolif plot for analysis binding mode of multiple ligands

options:
  -h, --help            show this help message and exit
  -i FILENAME [FILENAME ...], --input FILENAME [FILENAME ...]
                        input file with prolif output for the set of molecules. Supported formats: *.csv
                        Ex: prolif_output.csv
  --occupancy float     minimum occupancy of the unique contacts to show
  --width int           width of the output picture
  --height int          height of the output picture
  --base_size int       base size of the output picture
```

**prolif_draw_by_frame**

```
prolif_draw_by_frame -h
usage: prolif_draw_by_frame [-h] -i [FILENAME ...] [-o FILENAME] [--filt_only_H] [--width FILENAME]
                            [--height FILENAME] [--base_size FILENAME]

options:
  -h, --help            show this help message and exit
  -i [FILENAME ...], --input [FILENAME ...]
                        input file with prolif output for the unique molecule. Supported formats: *.csv
                        Ex: plifs.csv
  --occupancy float     minimum occupancy of the unique contacts to show. Show all contacts by default.
  --filt_only_H         filt residues where only hydrophobic contacts occur
  --width int           width of the output picture
  --height int          height of the output picture
  --base_size int       base size of the output picture
```

Use `-h` with any command to view the full option set.
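
As a rough aid for choosing the `--ncpu` and `--n_jobs` values discussed under Effective Parallel Calculations, the sketch below estimates how many trajectories could be analyzed at the same time. It assumes that this number is simply `--ncpu` divided by `--n_jobs` (integer division, at least one). That assumption only illustrates the trade-off and is not taken from StreaMD's scheduling code; the helper name `concurrent_trajectories` is hypothetical.

```bash
#!/usr/bin/env bash

# Hypothetical helper: estimate how many trajectories run at once when each
# trajectory uses n_jobs processes out of ncpu cores.
# Assumption: plain integer division; StreaMD's real scheduling may differ.
concurrent_trajectories() {
    local ncpu=$1 n_jobs=$2
    local n=$(( ncpu / n_jobs ))
    # Always allow at least one trajectory to run
    if (( n < 1 )); then
        n=1
    fi
    echo "$n"
}

ncpu=128
n_jobs=8
echo "Estimated trajectories in parallel: $(concurrent_trajectories "$ncpu" "$n_jobs")"

# Print (do not execute) a matching command line built only from documented flags
echo "run_prolif --wdir_to_run md_files/md_run/protein_H_HIS_ligand_* --ncpu $ncpu --n_jobs $n_jobs -v"
```

Under this assumption, raising `--n_jobs` speeds up each trajectory but reduces how many trajectories are processed simultaneously, so a small `--n_jobs` is usually the better fit when many short trajectories are analyzed in one call.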