Generating Reference Bitstring using HIPPOS-genref

Generating Reference Bitstring with Backbone (Default Setting)

Reference bitstring is an essential requirement for similarity coefficient (eg. Tanimoto or McConnaughey coefficient) calculation, which is the common method for comparing the interaction fingerprinting of a test compound and a reference (native ligand) interactions on certain protein.

To generate Reference bitstring, first of all, you need to open the command prompt and enter the examples\01-na_reference folder. This example uses the Neuraminidase enzyme for two reasons, first, it is one of the enzymes used in DUD-E (Directory of Useful Decoy Enhanced) therefore you could use it to measure the effect of interaction fingerprinting on the enrichment factor. Second, it can demonstrate all of the seven interaction types in interaction fingerprinting.

As you can see there are three folders and two txt configuration files. Each folder represents a crystal structure of Neuraminidase, and contain the original PDB file and the split component (protein, ligand, and water) generated with SPORES. PDB file alone can not be used as the reference, it has to be in mol2 or pdbqt to ensure that the atom typing, charge assignment, and protonation is identical to the docking environment (whether for PLANTS or VINA). So the PDB files here only act as the source if you want to use PDBQT files as the reference instead. 1b9s, 1b9t, and 1b9v are the PDB ID of the same Neuraminidase, where each of them bound to different ligand (FDI, RAI, and RA2 respectively). Therefore the protein name and ligand name should use protein.mol2 and the corresponding ligand file name as you can see in genref-config.txt.

# first residue is 77
residue_name    ARG116 GLU117 LEU132 LYS148 ASP149 ARG150 ARG154 TRP177 SER178 ILE221 ARG223 THR224 GLU226 ALA245 HIS273 GLU275 GLU276 ARG292 ASP294 GLY347 ARG374 TRP408 TYR409
residue_number  40 41 56 72 73 74 78 101 102 145 147 148 150 169 197 199 200 216 218 271 298 332 333

proteins        1b9s/protein.mol2 1b9t/protein.mol2 1b9v/protein.mol2
ligands         1b9s/ligand_FDI468_0.mol2 1b9t/ligand_RAI468_0.mol2 1b9v/ligand_RA2468_0.mol2

outfile         ref-results.txt

The first line is merely the commented line, everything started with # sign will be ignored by hippos and hippos-genref. If you open the mol2 protein file with a text editor you will see that the first residue is Glutamate with residue number 77. However, when the file is read by Openbabel it will count as residue number 1.

First residue of Neuraminidase

This is where things started to get tricky, because we have to supply both residue_name and residue_number properly, or else it will not work as to how we want it to be. residue_name can be acquired easily by converting your protein into mol2 format, while the corresponding residue_number must be retrieved from the column before residue name (eg. residue name ARG116 and GLU117 correspond to residue number 40 and 41 respectively):

Residue ARG116 and GLU117

The residue_name and residue_number in genref-config above are retrieved by visualizing any residue within 5 angstroms from the native ligand using VMD (you can use any other molecule visualization tool), regardless of how important the residue in enzyme inhibition.

The next lines are proteins and ligands, notice that there are 3 protein molecules and 3 ligand molecules which means that there are 3 protein-ligand pairs as references. Where the first protein will be matched with first ligand and so on. However in most cases, one protein-ligand pair is enough, this example uses 3 protein-ligand pairs as a demonstration of multiple references.

The last line is the output file name, it is optional so when not defined the output file will be genref-results.txt.

After we understand the input file and the configuration file, hippos-genref could be run with the following command:

hippos-genref genref-config.txt

After hippos-genref finished file ref-results.txt will be generated.

ref-results.txt from hippos-genref

Inside ref-results.txt we can see that there are 3 results from 3 protein-ligand pairs. Each result consisted of protein-ligand pair name and interaction bitstring where each residue represented by 7 bit of interactions from both the side-chain and the backbone.

Generating Reference Bitstring without Backbone

Sometimes we would like to omit the interaction between ligand and the backbone protein. In that case, we should change the output_mode to full_nobb by adding output_mode full_nobb to our hippos-genref config file as appear in genref-config-nobb.txt

# first residue is 77
residue_name  ARG116 GLU117 LEU132 LYS148 ASP149 ARG150 ARG154 TRP177 SER178 ILE221 ARG223 THR224 GLU226 ALA245 HIS273 GLU275 GLU276 ARG292 ASP294 GLY347 ARG374 TRP408 TYR409
residue_number  40 41 56 72 73 74 78 101 102 145 147 148 150 169 197 199 200 216 218 271 298 332 333

proteins    1b9s/protein.mol2 1b9t/protein.mol2 1b9v/protein.mol2
ligands     1b9s/ligand_FDI468_0.mol2 1b9t/ligand_RAI468_0.mol2 1b9v/ligand_RA2468_0.mol2

output_mode full_nobb

outfile     ref-results-nobb.txt

Now run hippos-genref again with the following command:

hippos-genref genref-config-nobb.txt

After hippos-genref finished file ref-results-nobb.txt will be generated.

ref-results-nobb.txt from hippos-genref

Just like in the default setting, it will generate 3 results. And although they appear the same as before, this time the bitstrings are generated without taking backbone atoms into account.

Generating Simplified Reference Bitstring

It is also possible to calculate simplified interaction between ligand and the backbone protein. In that case, we should change the output_mode to simplified by adding output_mode simplified to our hippos-genref config file as appear in genref-config-simplified.txt

# first residue is 77
residue_name  ARG116 GLU117 LEU132 LYS148 ASP149 ARG150 ARG154 TRP177 SER178 ILE221 ARG223 THR224 GLU226 ALA245 HIS273 GLU275 GLU276 ARG292 ASP294 GLY347 ARG374 TRP408 TYR409
residue_number  40 41 56 72 73 74 78 101 102 145 147 148 150 169 197 199 200 216 218 271 298 332 333

proteins    1b9s/protein.mol2 1b9t/protein.mol2 1b9v/protein.mol2
ligands     1b9s/ligand_FDI468_0.mol2 1b9t/ligand_RAI468_0.mol2 1b9v/ligand_RA2468_0.mol2

output_mode simplified

outfile     ref-results-simplified.txt

Now run hippos-genref again with the following command:

hippos-genref genref-config-simplified.txt

After hippos-genref finished file ref-results-simplified.txt will be generated.

ref-results-simplified.txt from hippos-genref

Just like in the default setting, it will generate 3 results. And although they appear the same as before, this time the bitstrings are simplified.