Host-Pathogen Interactions Alignment (HPIA) algorithm
A part of the supplementary material for:
"Mining host-pathogen protein interactions to characterize Burkholderia mallei infectivity mechanisms"

Vesna Memišević,¹ Nela Zavaljevski,¹ Seesandra V. Rajagopala,² Keehwan Kwon,² Rembert Pieper,² David DeShazer,³ Jaques Reifman,¹* and Anders Wallqvist¹

¹Department of Defense Biotechnology High Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Fort Detrick, MD
²J. Craig Venter Institute, Rockville, MD
³Bacteriology Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, MD
*Correspondence should be addressed to Jaques Reifman, phone: +1 301 619 7915; fax: +1 301 619 1983


Data sets:

How to run:

  • Download and unpack the HPIA program. The current version of the program is compiled for 64-bit Unix machines. For a different version, please contact the authors.
  • Ensure that the file "HPIA_unix64" has execute permission.
    If it does not, use the command "chmod u+x HPIA_unix64" to change the file permission.
  • Ensure that the folder containing the HPIA executable ("./HPIA_unix64") also contains:
    • a folder "RF" (where the resulting files will be saved) and
    • a file "UNIPROT_gdv.txt" (which contains graphlet degree vectors obtained from the human protein interaction network).
  • Run "./HPIA_unix64" without any parameters to display usage directions.
  • Usage directions:
    "./HPIA_unix64 <network1> <network2> <annotationFile> <outputFile> <-RS>(optional) <-AS>(optional)"
           where:
    • <network1> and <network2> represent host-pathogen networks in the form of tab-separated edge lists; the first column contains pathogen proteins and the second column contains host proteins;
    • <annotationFile> contains information about the location of protein annotation files, protein similarity files, and files with seed proteins. For more information, see "Annotation file" specifications below;
    • <outputFile> is the base file name assigned to all resulting (output) files. All output files are saved in the "RF" folder. For more information, see "Output files" specifications below;
    • <-RS> is an optional parameter that specifies whether to use relaxed seed guidance. If omitted, all seeds are aligned to each other; if used, only seeds with high similarity are aligned to each other;
    • <-AS> is an optional parameter that allows the algorithm to search the data for additional seeds. We recommend using the <-RS> parameter together with this parameter.
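The checks in the steps above can be scripted before each run. The following bash sketch is not part of the HPIA distribution, and the function names hpia_preflight and check_edge_list are hypothetical. It only reports problems: a missing or non-executable "HPIA_unix64", a missing "RF" folder or "UNIPROT_gdv.txt" file, and network lines that are not two tab-separated columns.

```bash
#!/usr/bin/env bash
# Pre-flight checks before running HPIA (a sketch; not part of the HPIA distribution).

# hpia_preflight [DIR]: checks that DIR (default: current folder) holds an executable
# HPIA_unix64, the "RF" output folder, and "UNIPROT_gdv.txt"; prints each problem
# found and returns nonzero if anything is missing.
hpia_preflight() {
  local dir="${1:-.}" status=0
  if [ ! -x "$dir/HPIA_unix64" ]; then
    echo "not found or not executable: $dir/HPIA_unix64 (try: chmod u+x HPIA_unix64)"
    status=1
  fi
  if [ ! -d "$dir/RF" ]; then
    echo "missing output folder: $dir/RF"
    status=1
  fi
  if [ ! -f "$dir/UNIPROT_gdv.txt" ]; then
    echo "missing file: $dir/UNIPROT_gdv.txt"
    status=1
  fi
  return "$status"
}

# check_edge_list FILE: reports every non-empty line that does not have exactly two
# tab-separated columns (pathogen protein first, host protein second).
check_edge_list() {
  awk -F'\t' 'NF > 0 && NF != 2 {
                printf "%s:%d: expected 2 tab-separated columns, found %d\n", FILENAME, FNR, NF
                bad = 1
              }
              END { exit bad }' "$1"
}

# Usage (from the folder that holds HPIA_unix64):
#   hpia_preflight && check_edge_list bmo.txt && check_edge_list yps.txt &&
#     ./HPIA_unix64 bmo.txt yps.txt annotationDefault.txt bmypDefault
```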

Annotation file:

HPIA uses the annotation file to specify the source of information about pathogen and host proteins that will be used to calculate pairwise pathogen-pathogen and host-host protein similarities. This file also allows users to specify a list of seeds, i.e., pairs of proteins that should be aligned to each other. The following are the annotation file fields:
  • GO annotation for pathogen (PGO) and host (HGO) proteins:
               #GO annotation - pathogen(s)
               PGO = "pathogen1.go; pathogen2.go"
               #GO annotation - host(s)
               HGO = "host.go"

          File format: "proteinNameOrID    proteinNameOrID    GOterm1; GOterm2; GOterm3"
          For example: "Q96QU6    1A1L1_HUMAN    GO:0009058; GO:0003824; GO:0030170"
  • Sequence similarity (BLAST E-value) between two pathogen proteins (PSQS) and between two host proteins (HSQS):
               #Sequence similarity - pathogen proteins
               PSQS = "pathogen1-pathogen2.txt"
               #Sequence similarity - host proteins
               HSQS = "host1-host2.txt"

          File format: "protein1    protein2    E-value"
          For example: "Q8CWH4    Q62AP6    2.00E-04"
  • Graphlet degree vectors obtained from the human interactome (for human host proteins only, UniProt IDs only):
               #Topological similarity: (0) do not use; (1) use the embedded host topological data
               HTPO = "0"
  • Graphlet degree vectors for each host-pathogen interaction network:
              #Graphlet degree vectors for host-pathogen networks
              TSIM = "gdv_network1.gdv; gdv_network2.gdv"

          File format: "protein    value1    value2   ... value73"
          For example (a single line in the file, shown wrapped):
          "Q0WIY5    3    32    3    0    124    52    211    1    6    0    0    0    0    0    0    133    170
           149    378    1422    292    20    910    0    0    0    0    0    0    0    0    0    0    0    0    40
           39    118    6    0    0    0    0    0    0    0    0    0    0    6    0    0    0    0    0    0    0    0
           0    0    0    0    0    0    0    0    0    0    0    0    0    0    0"
  • A user-specified list of seeds (proteins that should be aligned to each other):
               #List of seeds
               SEEDS = "alignThese.txt"

          File format: "protein1    protein2"
          For example: "BopE     SopE"

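The input formats above can be checked before running HPIA. The bash sketch below uses hypothetical helper functions (check_seeds, check_similarity, check_gdv) and assumes columns may be separated by any whitespace, since the specification shows the column order but not the exact separator. The GO annotation file is left out because its last column itself contains spaces between GO terms.

```bash
#!/usr/bin/env bash
# Format checks for annotation input files (a sketch; not part of HPIA).
# Columns are split on any whitespace (tabs or spaces); this is an assumption.

# check_seeds FILE: each non-empty line must hold exactly two protein names.
check_seeds() {
  awk 'NF > 0 && NF != 2 { printf "%s:%d: expected 2 fields, found %d\n", FILENAME, FNR, NF; bad = 1 }
       END { exit bad }' "$1"
}

# check_similarity FILE: "protein1 protein2 E-value", with a numeric E-value such as 2.00E-04.
check_similarity() {
  awk 'NF == 0 { next }
       NF != 3 { printf "%s:%d: expected 3 fields, found %d\n", FILENAME, FNR, NF; bad = 1; next }
       $3 !~ /^[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$/ {
         printf "%s:%d: not an E-value: %s\n", FILENAME, FNR, $3; bad = 1
       }
       END { exit bad }' "$1"
}

# check_gdv FILE: a protein ID followed by 73 non-negative integer counts (74 fields).
check_gdv() {
  awk 'NF == 0 { next }
       NF != 74 { printf "%s:%d: expected 1 + 73 fields, found %d\n", FILENAME, FNR, NF; bad = 1; next }
       { for (i = 2; i <= NF; i++)
           if ($i !~ /^[0-9]+$/) { printf "%s:%d: field %d is not a count: %s\n", FILENAME, FNR, i, $i; bad = 1 } }
       END { exit bad }' "$1"
}

# Usage:
#   check_seeds alignThese.txt && check_similarity pathogen1-pathogen2.txt && check_gdv gdv_network1.gdv
```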
Lines that start with # represent comments and will be ignored by the program. We recommend that users comment out the fields they are not using.

All user-specified file names should be within quotation marks. If multiple files are provided for a single field (for example, see PGO or TSIM), the file names should be within the same quotation marks, separated by semicolons.
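The two syntax rules above (lines starting with # are comments; a field's files are listed inside one pair of quotation marks and separated by semicolons) can be illustrated with a small bash/awk sketch. annotation_files is a hypothetical helper, not HPIA's own parser, and it assumes one field per line in the form KEY = "...".

```bash
#!/usr/bin/env bash
# annotation_files ANNOTATION_FILE KEY: prints the file names listed for KEY
# (for example PGO, HSQS, or TSIM), one per line. Comment lines are skipped, the
# quoted value is split on semicolons, and surrounding spaces are trimmed.
annotation_files() {
  awk -v key="$2" '
    /^[ \t]*#/ { next }                    # comment line: ignored
    {
      eq = index($0, "=")
      if (eq == 0) next                    # not a KEY = "..." line
      name = substr($0, 1, eq - 1)
      gsub(/[ \t]/, "", name)              # "PGO " -> "PGO"
      if (name != key) next
      value = substr($0, eq + 1)
      sub(/^[^"]*"/, "", value)            # drop everything up to the opening quote
      sub(/".*$/, "", value)               # drop the closing quote and anything after it
      n = split(value, parts, ";")
      for (i = 1; i <= n; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
        if (parts[i] != "") print parts[i]
      }
    }' "$1"
}

# Usage: list the GO annotation files given for the pathogen(s)
#   annotation_files annotationBMYP.txt PGO
```

A field that is commented out (for example, #SEEDS = "alignThese.txt") produces no output, which matches the recommendation to comment out unused fields.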

For more details about the annotation file and specific annotation formats, see the examples below.
 

Output files:

HPIA creates six different output files:
  • "outputName"_AL_stats.txt:
    • This file contains alignment statistics for local (first four rows) and global (last four rows) alignments. Specifically, it contains the number and percentage of all aligned proteins, aligned pathogen proteins, aligned host proteins, and aligned interactions. The file also contains the total number of proteins (pathogen, host, and both) and the total number of interactions in both sets.
  • "outputName"_Local_proteins.txt:
      • This file contains a list of proteins aligned to each other in the "local alignment" phase of the HPIA algorithm. Host proteins are marked with a flag "H" and pathogen proteins are marked with a flag "P."
  • "outputName"_Local_edges1.txt:
    • This file contains a list of interactions in the first network that are aligned to interactions in the second network during the "local alignment" phase of the HPIA algorithm.
  • "outputName"_Local_edges2.txt:
    • This file contains a list of interactions in the second network that are aligned to interactions in the first network during the "local alignment" phase of the HPIA algorithm.
  • "outputName"_Global_extendedProteins.txt:
    • This file contains a list of all aligned proteins. Host proteins are marked with a flag "H" and pathogen proteins are marked with a flag "P."
  • "outputName"_Global_extendedEdges.txt:
    • This file contains a list of all aligned interactions (in both networks). Note: The list of aligned interactions is the same as the lists in "outputName"_Local_edges1.txt and "outputName"_Local_edges2.txt due to the nature of the global alignment phase of the HPIA algorithm.
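Because all six file names are derived from <outputFile>, a finished run can be checked with a short loop. This bash sketch (hpia_outputs is a hypothetical name) assumes the names are formed by appending the suffixes listed above to the base name, for example bmypDefault_AL_stats.txt.

```bash
#!/usr/bin/env bash
# hpia_outputs BASENAME [DIR]: lists the six output files expected for one run
# (DIR defaults to RF) and returns nonzero if any of them is missing.
hpia_outputs() {
  local base="$1" dir="${2:-RF}" suffix status=0
  for suffix in AL_stats Local_proteins Local_edges1 Local_edges2 \
                Global_extendedProteins Global_extendedEdges; do
    if [ -f "$dir/${base}_${suffix}.txt" ]; then
      echo "found:   $dir/${base}_${suffix}.txt"
    else
      echo "missing: $dir/${base}_${suffix}.txt"
      status=1
    fi
  done
  return "$status"
}

# Usage, after ./HPIA_unix64 bmo.txt yps.txt annotationDefault.txt bmypDefault:
#   hpia_outputs bmypDefault
```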

Example 1:

Align human-B. mallei and human-Y. pestis interaction networks
  • To align human-B. mallei and human-Y. pestis interaction networks using the default similarity, run
     
     ./HPIA_unix64 bmo.txt yps.txt annotationDefault.txt bmypDefault 
  • To align human-B. mallei and human-Y. pestis interaction networks using sequence similarity, run
     
     ./HPIA_unix64 bmo.txt yps.txt annotationSequenceBMYP.txt bmypSequence
  • To align human-B. mallei and human-Y. pestis interaction networks using various annotations (see "annotationBMYP.txt" for details) and seed nodes, run
     ./HPIA_unix64 bmo.txt yps.txt annotationBMYP.txt bmypResults -RS -AS
  • Go to "Gallery" to see the resulting alignments.
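The three runs above differ only in the annotation file, the output name, and the optional flags. The following bash sketch runs them in sequence. hpia_three_runs is a hypothetical helper, its file-naming pattern is taken from Examples 1 and 2, and it assumes that HPIA_unix64 returns a nonzero exit status when a run fails. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# hpia_three_runs NET1 NET2 PAIR PREFIX: runs the default, sequence-similarity, and
# annotation-plus-seeds alignments for one pair of networks, using the names
# annotationSequence<PAIR>.txt and annotation<PAIR>.txt seen in Examples 1 and 2.
hpia_three_runs() {
  local net1="$1" net2="$2" pair="$3" prefix="$4" run=""
  if [ "${DRY_RUN:-0}" = 1 ]; then run="echo"; fi   # dry run: print, do not execute
  $run ./HPIA_unix64 "$net1" "$net2" annotationDefault.txt "${prefix}Default" || return
  $run ./HPIA_unix64 "$net1" "$net2" "annotationSequence${pair}.txt" "${prefix}Sequence" || return
  $run ./HPIA_unix64 "$net1" "$net2" "annotation${pair}.txt" "${prefix}Results" -RS -AS
}

# Example 1:  hpia_three_runs bmo.txt yps.txt BMYP bmyp
# Example 2:  hpia_three_runs bmo.txt sal.txt BMSE bmse
```

Example 3 does not fit this pattern, because its second run uses topological similarity (annotationTopologyFTYP.txt) instead of sequence similarity.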

Example 2:

Align human-B. mallei and human-S. enterica interaction networks

  • To align human-B. mallei and human-S. enterica interaction networks using the default similarity, run
     
     ./HPIA_unix64 bmo.txt sal.txt annotationDefault.txt bmseDefault
  • To align human-B. mallei and human-S. enterica interaction networks using sequence similarity, run
     
     ./HPIA_unix64 bmo.txt sal.txt annotationSequenceBMSE.txt bmseSequence
  • To align human-B. mallei and human-S. enterica interaction networks using various annotations (see "annotationBMSE.txt" for details) and seed nodes, run
     ./HPIA_unix64 bmo.txt sal.txt annotationBMSE.txt bmseResults -RS -AS
  • Go to "Gallery" to see the resulting alignments.

Example 3:

Align human-F. tularensis and human-Y. pestis interaction networks
  • Download and unpack the HPIA program. Download the "Additional examples" file and unpack it into the folder with the HPIA program (ensure that the "SpeciesData" folder and all ".txt" files are in the same folder as the "HPIA_unix64" program).
  • Data sources:
    • Human-F. tularensis and human-Y. pestis interactions obtained from PMID: 20711500 
  • To align human-F. tularensis and human-Y. pestis interaction networks using the default similarity, run
     
     ./HPIA_unix64 fratt.txt yerpe.txt annotationDefault.txt ftypDefault
  • To align human-F. tularensis and human-Y. pestis interaction networks using topological similarity, run
     
     ./HPIA_unix64 fratt.txt yerpe.txt annotationTopologyFTYP.txt ftypTopology
  • To align human-F. tularensis and human-Y. pestis interaction networks using various annotations (see "annotationFTYP.txt" for details) and seed nodes, run
     ./HPIA_unix64 fratt.txt yerpe.txt annotationFTYP.txt ftypResults -RS -AS
  • Go to "Gallery" to see the resulting alignments.

Gallery:

Host-pathogen protein-protein interaction networks and the resulting HPIA alignments. Figure panels:
  • Human-Y. pestis interactions; human-B. mallei interactions; human-S. enterica interactions
  • Aligned human-B. mallei and human-Y. pestis interactions; aligned human-B. mallei and human-S. enterica interactions
  • Human-F. tularensis interactions; human-Y. pestis interactions; aligned human-F. tularensis and human-Y. pestis interactions



Cite as:

Memisevic, V., N. Zavaljevski, S. V. Rajagopala, K. Kwon, R. Pieper, D. DeShazer, J. Reifman, and A. Wallqvist. Mining host-pathogen protein interactions to characterize Burkholderia mallei infectivity mechanisms. PLOS Computational Biology. 2015 March 4; 11(3):e1004088.

Last update: 03/19/2015