Reconstituting protein interaction networks using parameter-dependent
domain-domain interactions

(Supplementary executable files)

Vesna Memišević, Anders Wallqvist, and Jaques Reifman

Department of Defense Biotechnology High Performance Computing Software Applications Institute,
Telemedicine and Advanced Technology Research Center, US Army Medical Research and Materiel Command,
Fort Detrick, MD 21702

Corresponding author. Tel.: +1 301 619 7915; Fax: +1 301 619 1983.


Protein-domain annotation merging procedure

Files:

How to run:

  • Download and unpack dMerger using the link above. The current version of the program is compiled for 64-bit Unix (Unix64) machines. For a different version, please contact the authors.
  • Run "./dMerger_unix64" without any parameters to display the usage directions.
  • Usage directions:
    "./dMerger_unix64 DB_list.txt seq_list.txt yes_list.txt no_list.txt outputFilename"
  • Where:
    • “DB_list.txt” represents a list of files that contain individual domain annotation databases. E.g.,

      d:/DBall/PFAMA.txt
      d:/DBall/SF.txt
      d:/DBall/SMART.txt
    • Each domain annotation file should be tab-delimited, in the following format, and should not contain space characters:

      protein   domain_ID   domain_name domain_start  domain_end    domain_description
    • “seq_list.txt” represents a list of FASTA protein sequence files, one for each organism for which the annotation merging procedure is performed. E.g.,

      d:/SeqAll/yeast.txt
    • Protein names in the FASTA file have to match protein names provided in the domain annotation files.
    • “yes_list.txt” represents a user-specified list of domain pairs that represent the same domain. E.g.,

      COX3    CYTOCHROME_C_OXIDASE_SUBUNIT_III-LIKE
      PK      PYRUVATE_KINASE
    • “no_list.txt” represents a user-specified list of domain pairs that should never be considered the same domain. E.g.,

      KINASE      NOT_KINASE
    • “outputFilename” represents the name that will be assigned to the following output files (all files are written to the “RF” folder):
        • “outputFilename_DOM_profile.txt” – contains the mapping between the new domains and the original domains.
        • “outputFilename_ND_domains.txt” – contains a description of the merged protein-domain annotation in the same format as the annotation from the original databases.
        • “outputFilename_ND_list.txt” – contains the new protein domain annotation in the form of a protein – merged_domain list.
        • “outputFilename_Not_DOM.txt” – contains the list of (merged) domains that cover similar sequence positions but do not have similar domain names/labels.

      Format: domain1  domain2  number_of_overlaps   shared_amino_acids(%)

        • “outputFilename_RR_domains.txt” – contains the list of merged tandem domains (from original databases)
        • “outputFilename_SEQ.txt” – contains a sequence profile for each protein for each merged domain annotation

      When multiple species (sequence files) are provided, the program assigns each protein name a number that corresponds to its species, following the order in which the sequence files are listed in seq_list.txt.

      In that case, the merged domains represent domains merged across the species. E.g.,

      1_YHR104W_D15_48_69
      2_P53_HUMAN_D12_45_112
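
Because the input requirements above (six tab-delimited columns, no space characters) are easy to violate when exporting annotations from different databases, it may help to check each annotation file before running dMerger. The following is a minimal Python sketch of such a check and is not part of dMerger. The function names are hypothetical, and the tests on domain_start and domain_end (integers, start not greater than end) are assumptions that the directions above do not state. The sample rows are made up for illustration.

```python
import io

# Column order taken from the format line above:
# protein, domain_ID, domain_name, domain_start, domain_end, domain_description
EXPECTED_FIELDS = 6


def check_annotation_line(line):
    """Return a list of problems found in one annotation line (empty if none)."""
    problems = []
    text = line.rstrip("\r\n")
    if " " in text:
        problems.append("contains a space character")
    fields = text.split("\t")
    if len(fields) != EXPECTED_FIELDS:
        problems.append(
            f"expected {EXPECTED_FIELDS} tab-separated fields, found {len(fields)}"
        )
        return problems
    start, end = fields[3], fields[4]
    # Assumption: positions are non-negative integers with start <= end.
    if not (start.isdigit() and end.isdigit()):
        problems.append("domain_start and domain_end should be integers")
    elif int(start) > int(end):
        problems.append("domain_start is greater than domain_end")
    return problems


def check_annotation_file(handle):
    """Yield (line_number, problems) for every problematic non-empty line."""
    for number, line in enumerate(handle, start=1):
        if not line.strip():
            continue
        problems = check_annotation_line(line)
        if problems:
            yield number, problems


if __name__ == "__main__":
    # Made-up rows: the first is well formed, the second has a space and only five columns.
    sample = io.StringIO(
        "PROT1\tD001\tKINASE\t10\t120\tmade_up_example\n"
        "PROT1\tD002\tNOT KINASE\t130\t200\n"
    )
    for number, problems in check_annotation_file(sample):
        print(number, "; ".join(problems))
```

To check a real file, pass an open file handle (for example, one of the files named in DB_list.txt) to check_annotation_file in place of the in-memory sample.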

Example:



  • Download and unpack the domain annotation merging program and the set of examples using the link above, then run as:

     "./dMerger_unix64 dbList.txt seqList.txt yesList.txt noList.txt dmExample"

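
When the example is run from a script, the same command can be assembled and launched from Python. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and it assumes that the list files name one path per line, as the examples in the directions suggest. It first confirms that the files named in the two list files exist, then passes the five arguments in the order given by the usage line.

```python
import os
import subprocess


def build_command(executable, db_list, seq_list, yes_list, no_list, output_name):
    """Assemble the five positional arguments in the order given by the usage line."""
    return [executable, db_list, seq_list, yes_list, no_list, output_name]


def missing_inputs(list_file):
    """Return the paths named in a list file (one per line) that do not exist."""
    with open(list_file) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    return [path for path in paths if not os.path.isfile(path)]


if __name__ == "__main__":
    command = build_command(
        "./dMerger_unix64",
        "dbList.txt", "seqList.txt", "yesList.txt", "noList.txt",
        "dmExample",
    )
    # Collect anything that is missing before launching the program.
    missing = [path for path in command[:5] if not os.path.isfile(path)]
    for list_file in ("dbList.txt", "seqList.txt"):
        if os.path.isfile(list_file):
            missing.extend(missing_inputs(list_file))
    if missing:
        print("Not running dMerger; missing files:", ", ".join(missing))
    else:
        subprocess.run(command, check=True)
```

Run it from the directory that contains the unpacked program and example files; if anything is missing, the script reports it instead of starting dMerger.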
Cite as:

  • Memišević V., Wallqvist A., and Reifman J.: Reconstituting protein interaction networks using parameter-dependent domain-domain interactions. BMC Bioinformatics 2013 14:154.

Last update: 05/21/2013