Direct IFP Mode
PyPLIF-HIPPOS was originally intended for analyzing protein-ligand interactions from the output of docking tools, but sometimes we would like to analyze protein-ligand interactions produced by other computational chemistry software. For example, we may want to analyze the interactions along an MD trajectory, or the output of a docking tool that PyPLIF-HIPPOS does not support directly. Direct IFP mode is the feature that allows this kind of analysis.
Note that, at the moment, Direct IFP mode only accepts the mol2, pdb, and pdbqt file formats.
Analyzing Protein-Ligand Interaction from one protein and multiple ligand poses
In this example we show a few ways to analyze the interactions between several ligands and a single protein. This approach is suitable for analyzing the results of a docking tool that is not supported by PyPLIF-HIPPOS.
For the protein input, we simply use the protein keyword followed by the protein file name. For the ligands, there are basically two ways to supply the file names. The first is to list the ligand file names directly; the second is to name one or more text files, each of which contains a list of ligand file names.
First, let's take a look at how to do this by listing the ligand file names directly.
This can be done by setting three options in the configuration file: direct_ifp, protein, and ligand_files. The example files for this can be found in the examples-input/09-direct_ifp/example_0/pdb directory.
direct_ifp true
protein 3chr_protein.pdb
ligand_files 3chp_4BO.pdb 3chq_4BQ.pdb 3chr_4BS.pdb 3chs_4BU.pdb
similarity_coef tanimoto mcconnaughey
full_ref 0000000000000000000001000000000010000000000000000000000010100000000000000000000000001000000000000000000000010000000000000000000000000
residue_name GLN134 GLN136 ALA137 TYR267 GLY269 MET270 GLU271 TRP311 PHE314 GLU318 VAL367 LEU369 PRO374 ASP375 ALA377 TYR378 SER379 PRO382 TYR383
residue_number 134 136 137 267 269 270 271 311 314 318 367 369 374 375 377 378 379 382 383
full_outfile example_0_outfile.csv
sim_outfile example_0_similarity.csv
The other way to read multiple ligand files is to use a wildcard pattern, like so:
direct_ifp true
protein protein.pdb
ligand_files 3ch*.pdb
similarity_coef tanimoto mcconnaughey
full_ref 0000000000000000000001000000000010000000000000000000000010100000000000000000000000001000000000000000000000010000000000000000000000000
residue_name GLN134 GLN136 ALA137 TYR267 GLY269 MET270 GLU271 TRP311 PHE314 GLU318 VAL367 LEU369 PRO374 ASP375 ALA377 TYR378 SER379 PRO382 TYR383
residue_number 134 136 137 267 269 270 271 311 314 318 367 369 374 375 377 378 379 382 383
full_outfile example_0_outfile.csv
sim_outfile example_0_similarity.csv
The above example can be found in the examples-input/09-direct_ifp/example_1/pdb directory. Its result is the same as that of example_0: two output files are generated, one for the interaction bitstrings (example_0_outfile.csv) and one for the similarity scores (example_0_similarity.csv). Here is the example output for example_0_outfile.csv:
3chp_4BO.pdb 0000000000000000000001000000000000000000000000000100000010100000000000000000010000001000000000000000000001010000000000000000000001000
3chq_4BQ.pdb 0000000000000000000001000100000000000000000000001000000000100000000001000000010000001000000000000000000001010000000000000000000001000
3chr_4BS.pdb 0000000000000000000001000000000000010000000000101100000010100000000001000000010000001000000000000000000000010000000000000000000000000
3chs_4BU.pdb 0000000000010000000001000000000000000000000000101000000010100000000001000000010000001000000000000010000001010000000000000000000001000
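To read a bitstring like the ones above residue by residue, one option is to cut it into equal-width groups that follow the order of residue_name. The sketch below does this in Python. It is not part of PyPLIF-HIPPOS: the helper names are hypothetical, and the equal-width, residue-ordered layout is an assumption (seven bits per residue would be consistent with the 19 residues configured here). The demo uses a short made-up bitstring rather than real output.

```python
def split_by_residue(bitstring, residue_names):
    """Pair each residue name with its slice of the bitstring.

    Assumption: every residue contributes the same number of bits and the
    groups appear in the same order as the residue names.
    """
    n_res = len(residue_names)
    if n_res == 0 or len(bitstring) % n_res != 0:
        raise ValueError("bitstring length is not a multiple of the residue count")
    width = len(bitstring) // n_res
    return [
        (name, bitstring[i * width:(i + 1) * width])
        for i, name in enumerate(residue_names)
    ]


def interacting_residues(bitstring, residue_names):
    """Return the residues whose group contains at least one on-bit."""
    return [
        name
        for name, bits in split_by_residue(bitstring, residue_names)
        if "1" in bits
    ]


# Toy example: three residues, seven bits each (made-up data).
toy_names = ["GLN134", "GLN136", "ALA137"]
toy_bits = "0000000" + "1000000" + "0000010"
print(interacting_residues(toy_bits, toy_names))  # ['GLN136', 'ALA137']
```

For real output, the same functions would take the residue_name list from the configuration file and the bitstring column of one line of the output file.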
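And here is the example output for example_0_similarity.csv, with one column per coefficient in the order given in similarity_coef (tanimoto, then mcconnaughey):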
3chp_4BO.pdb 0.500 0.389
3chq_4BQ.pdb 0.333 0.067
3chr_4BS.pdb 0.417 0.288
3chs_4BU.pdb 0.357 0.218
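The two scores can be understood from the textbook definitions of the coefficients. With a on-bits in the reference, b on-bits in the pose, and c on-bits in common, Tanimoto is c / (a + b - c) and McConnaughey is (c(a + b) - ab) / (ab). The sketch below implements these definitions in plain Python. It is a standalone illustration, not code taken from PyPLIF-HIPPOS; the function names and the handling of all-zero bitstrings are assumptions. Applied to the listings above, these definitions appear to reproduce the reported values (for instance 0.500 and 0.389 for 3chp_4BO.pdb).

```python
def count_bits(ref, query):
    """Return (a, b, c): on-bits in ref, on-bits in query, on-bits in both."""
    if len(ref) != len(query):
        raise ValueError("bitstrings must have the same length")
    a = ref.count("1")
    b = query.count("1")
    c = sum(1 for r, q in zip(ref, query) if r == "1" and q == "1")
    return a, b, c


def tanimoto(ref, query):
    """Tanimoto coefficient: c / (a + b - c)."""
    a, b, c = count_bits(ref, query)
    denom = a + b - c
    # Returning 0.0 when both bitstrings are all zeros is an arbitrary choice.
    return c / denom if denom else 0.0


def mcconnaughey(ref, query):
    """McConnaughey coefficient: (c * (a + b) - a * b) / (a * b)."""
    a, b, c = count_bits(ref, query)
    if a == 0 or b == 0:
        # Undefined in this case; 0.0 is an arbitrary choice for the sketch.
        return 0.0
    return (c * (a + b) - a * b) / (a * b)


# Toy example (made-up bitstrings): a = 2, b = 2, c = 1.
print(round(tanimoto("1100", "1010"), 3))      # 0.333
print(round(mcconnaughey("1100", "1010"), 3))  # 0.0
```

Note that McConnaughey ranges from -1 to 1, whereas Tanimoto ranges from 0 to 1, so the two columns are not directly comparable.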
The second way is to provide a file containing a list of ligand file names. To do this, we can use the ligand_list or multiple_ligand_list option instead of ligand_files. ligand_list accepts a single text file containing the ligand file names, like so:
direct_ifp true
protein protein.pdb
ligand_list ligand_input.txt
similarity_coef tanimoto mcconnaughey
full_ref 0000000000000000000001000000000010000000000000000000000010100000000000000000000000001000000000000000000000010000000000000000000000000
residue_name GLN134 GLN136 ALA137 TYR267 GLY269 MET270 GLU271 TRP311 PHE314 GLU318 VAL367 LEU369 PRO374 ASP375 ALA377 TYR378 SER379 PRO382 TYR383
residue_number 134 136 137 267 269 270 271 311 314 318 367 369 374 375 377 378 379 382 383
full_outfile example_0_outfile.csv
sim_outfile example_0_similarity.csv
where the content of ligand_input.txt is:
3chp_4BO.pdb
3chq_4BQ.pdb
3chr_4BS.pdb
3chs_4BU.pdb
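When there are many poses, a list file like this one can be generated rather than typed by hand. The following is a minimal sketch in Python, assuming the format shown above (one file name per line) and assuming that the listed names are resolved relative to the directory where the analysis is run. The helper write_ligand_list is hypothetical and not part of PyPLIF-HIPPOS.

```python
from pathlib import Path


def write_ligand_list(directory, pattern, list_file):
    """Write the names of files in `directory` matching `pattern`, one per line.

    Returns the sorted list of names that were written.
    """
    names = sorted(
        path.name for path in Path(directory).glob(pattern) if path.is_file()
    )
    Path(list_file).write_text("".join(name + "\n" for name in names))
    return names
```

For example, write_ligand_list(".", "3ch*_4B*.pdb", "ligand_input.txt") would collect the four ligand files used in this section, provided they sit in the current directory. Sorting keeps the order of the output rows reproducible between runs.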
multiple_ligand_list, on the other hand, accepts multiple text files, each containing ligand file names, like so:
direct_ifp true
protein protein.pdb
multiple_ligand_list ligand_input1.txt ligand_input2.txt
similarity_coef tanimoto mcconnaughey
full_ref 0000000000000000000001000000000010000000000000000000000010100000000000000000000000001000000000000000000000010000000000000000000000000
residue_name GLN134 GLN136 ALA137 TYR267 GLY269 MET270 GLU271 TRP311 PHE314 GLU318 VAL367 LEU369 PRO374 ASP375 ALA377 TYR378 SER379 PRO382 TYR383
residue_number 134 136 137 267 269 270 271 311 314 318 367 369 374 375 377 378 379 382 383
full_outfile example_0_outfile.csv
sim_outfile example_0_similarity.csv
The example files for ligand_list and multiple_ligand_list can be found in examples-input/09-direct_ifp/example_2/pdb and examples-input/09-direct_ifp/example_3/pdb. The output of these options is similar to that of the previous examples.
Analyzing Protein-Ligand Interaction from multiple protein-ligand complexes
In this example we will use three different protein-ligand complexes, which could represent a simple MD trajectory. This method is therefore suitable for MD trajectory analysis.
To analyze several protein-ligand pairs, we can use the direct_ifp and complex_list options.
Notice that in this example we do not use the protein option, since the proteins are already listed in the complex_list file.
First, let's take a look at complex_list.txt, which contains the protein-ligand pairs that will be analyzed:
3cho_protein.pdb 3cho_4BG.pdb
3chr_protein.pdb 3chr_4BS.pdb
3chs_protein.pdb 3chs_4BU.pdb
These example files can be found in examples-input/09-direct_ifp/example_4/pdb. Next, we can use the following configuration file to analyze the above protein-ligand pairs:
direct_ifp true
complex_list complex_list.txt
similarity_coef tanimoto mcconnaughey
full_ref 0000000000000000000001000000000010000000000000000000000010100000000000000000000000001000000000000000000000010000000000000000000000000
residue_name GLN134 GLN136 ALA137 TYR267 GLY269 MET270 GLU271 TRP311 PHE314 GLU318 VAL367 LEU369 PRO374 ASP375 ALA377 TYR378 SER379 PRO382 TYR383
residue_number 134 136 137 267 269 270 271 311 314 318 367 369 374 375 377 378 379 382 383
full_outfile example_0_outfile.csv
sim_outfile example_0_similarity.csv
Running the above example will give us the following output for example_0_outfile.csv:
3cho_protein.pdb_3cho_4BG.pdb
0000000000000000000001000000000010000000000000000000000010100000000000000000000000001000000000000000000000010000000000000000000000000
3chr_protein.pdb_3chr_4BS.pdb
0000000000000000000001000000000000010000000000101100000010100000000001000000010000001000000000000000000000010000000000000000000000000
3chs_protein.pdb_3chs_4BU.pdb
0000000000010000000001000000000000010000000000101100000010100000000001000000010000001000000000000000000001010000000000000000000001000
and the following output for example_0_similarity.csv:
3cho_protein.pdb_3cho_4BG.pdb
1.000 1.000
3chr_protein.pdb_3chr_4BS.pdb
0.417 0.288
3chs_protein.pdb_3chs_4BU.pdb
0.333 0.190
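For an MD trajectory with many frames, the complex_list file can also be generated by a script. The sketch below is one way to do so in Python, assuming the format shown earlier in this section: one line per complex, with the protein file name and the ligand file name separated by a space. The helper write_complex_list is hypothetical, and splitting a trajectory into per-frame protein and ligand files is assumed to have been done beforehand with another tool.

```python
from pathlib import Path


def write_complex_list(pairs, list_file):
    """Write one 'protein ligand' line per pair to `list_file`.

    `pairs` is an iterable of (protein_file, ligand_file) names. Names that
    contain whitespace are rejected because they would be ambiguous in a
    space-separated line (an assumption about how the file is parsed).
    """
    lines = []
    for protein, ligand in pairs:
        for name in (protein, ligand):
            if not name or any(ch.isspace() for ch in name):
                raise ValueError(f"unsupported file name: {name!r}")
        lines.append(f"{protein} {ligand}\n")
    Path(list_file).write_text("".join(lines))
    return len(lines)
```

For example, passing the pairs ("3cho_protein.pdb", "3cho_4BG.pdb"), ("3chr_protein.pdb", "3chr_4BS.pdb"), and ("3chs_protein.pdb", "3chs_4BU.pdb") would recreate the complex_list.txt used in this section.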