NEBULON: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons

Sarath Chandra Janga1, Julio Collado-Vides1 & Gabriel Moreno-Hagelsieb2

1 Program of Computational Genomics, CIFN-UNAM, Apdo Postal 565-A, Cuernavaca, Morelos, 62100 Mexico. 2 Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada, N2L 3C5.


Abstract

Here we introduce Nebulon, a system to build networks of predicted functional relationships of gene products based on their organization into operons in any available genome. The system is based on a previously developed method to predict operons by the distances between adjacent genes in the same strand, and on the high recombination rate of operon associations across genomes that reveal functional relationships among gene products. Our system can use different kinds of thresholds to accept a functional relationship, from those related to the prediction of operons, to finding the association in at least a given number of non-redundant genomes. We also work by shells, meaning that we decide on the number of linking iterations to allow for the complementation of related gene sets. The method shows high reliability benchmarked against knowledge databases of functional interactions. We also exemplify with several known and characterized functional gene sets.


Accessing Nebulon

Nebulon can be accessed in one of two ways:

a) The GUI can be accessed by following the link below. For each link in Nebulon, the GUI gives information such as the source of the evidence, the number of evidences and the functions of the genes in a given nebulome. For further information and help with the interface, please read the documentation of Nebulon-GUI.

http://tikal.cifn.unam.mx/nebulon

b) The command-line toolkit of Nebulon provides a means to get the links for a given gene in a given genome under different thresholds. It is downloadable from here. We recommend the command-line version if you are interested in obtaining or analyzing interactions for a large number of genes, because the GUI can be slow for large outputs. Please read the documentation that comes with the tarball for help with the parameters and the program itself. The interactions are written to a simple tab-separated file, which can therefore be viewed in Cytoscape or yEd.
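
Because the command-line output is tab-separated, it can be loaded with a few lines of code before being passed to other tools. The sketch below is only an illustration: the column layout is an assumption (first two columns taken as the linked genes, anything else kept as extra fields), the gene names are placeholders, and the documentation in the tarball remains the reference for the real format.

```python
import csv
import io


def read_links(handle):
    """Yield (gene_a, gene_b, extra_columns) from a tab-separated links file.

    Assumption: the first two columns name the linked genes. The real
    column layout is described in the toolkit's own documentation.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        yield row[0], row[1], row[2:]


# Made-up sample standing in for a real output file.
sample = "geneA\tgeneB\t0.88\n\ngeneA\tgeneC\t0.51\n"
links = list(read_links(io.StringIO(sample)))
for gene_a, gene_b, extra in links:
    print(gene_a, gene_b, extra)
```

For a real file, replace the in-memory sample with an opened file handle, for example open(path, newline="").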


Sections
1) Description of pathway identifiers used in Figure 4.

2) Distribution of internal vs. external links for each non-redundant genome in Nebulon, and the three-letter codes for the complete genomes used in the command-line tool.

3) Table 1 showing the newly found links in the recovery of the argR regulon.

4) Interactions file in gml format for Figure 5.

5) Interaction files in gml format for Figures 6(a) and 6(b).

6) Interaction file in gml format for Figure 7 and Table 2 showing the genes having at least two links with genes related to Nitrogen Fixation in Sinorhizobium meliloti.

7) Table showing the number of functional links that are unique to Nebulon (found only by Nebulon) and those that are also identified by STRING in different genomes (only available for those genomes that can be mapped from STRING to the Entrez Genomes database).


1. The pathway identifiers used in Figure 2 stand for:

PATHWAY ID DESCRIPTION
MAP00010 Glycolysis / Gluconeogenesis
MAP00020 Citrate cycle (TCA cycle)
MAP00030 Pentose phosphate pathway
MAP00040 Pentose and glucuronate interconversions
MAP00061 Fatty acid biosynthesis (path 1)
MAP00071 Fatty acid metabolism
MAP00130 Ubiquinone biosynthesis
MAP00190 Oxidative phosphorylation
MAP00193 ATP synthesis
MAP00195 Photosynthesis
MAP00220 Urea cycle and metabolism of amino groups
MAP00230 Purine metabolism
MAP00240 Pyrimidine metabolism
MAP00260 Glycine, serine and threonine metabolism
MAP00290 Valine, leucine and isoleucine degradation
MAP00300 Lysine biosynthesis
MAP00340 Histidine metabolism
MAP00400 Phenylalanine, tyrosine and tryptophan biosynthesis
MAP00500 Starch and sucrose metabolism
MAP00520 Nucleotide sugars metabolism
MAP00550 Peptidoglycan biosynthesis
MAP00630 Glyoxylate and dicarboxylate metabolism
MAP00632 Benzoate degradation via CoA ligation
MAP00640 Propanoate metabolism
MAP00650 Butanoate metabolism
MAP00720 Reductive carboxylate cycle (CO2 fixation)
MAP00770 Pantothenate and CoA biosynthesis
MAP00790 Folate metabolism
MAP00860 Porphyrin and chlorophyll metabolism
MAP00910 Nitrogen metabolism
MAP02040 Flagellar assembly
MAP03070 Type III secretion system

 


2. A graph showing the distribution of internal vs. external links in all the NR genomes can be downloaded from here in PostScript format or in PDF format (note: the figure has the genomes sorted into Proteobacteria, Firmicutes and Archaea). A description of the three-letter codes used can be found here, and the table containing the link proportions in each genome for internal and external links, for the raw interaction data and at the 0.4 log-likelihood threshold, can be found here. The columns in this table represent the following:

column 1 : Genome

column 2 : Number of internal links identified

column 3 : Number of external links identified

column 4 : Number of links due to fusions

column 5 : Total number of links identified. (Note: a link can be identified by more than one means, i.e. internal, external or fusion, so this need not be the sum of the columns above.)

column 6 : Fraction of links which are internal.

column 7 : Fraction of links which are external.
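
As a small worked example of how columns 6 and 7 relate to the counts, the sketch below assumes that each fraction is the corresponding count divided by the total in column 5; that reading is an assumption, not something the table states. The numbers are invented. Because one link can be supported by more than one kind of evidence, the two fractions may add up to more than 1.

```python
def link_fractions(internal, external, total):
    """Return (fraction_internal, fraction_external) for one genome.

    Assumption: fractions are counts divided by the total number of
    distinct links (column 5), so overlapping evidence is counted in both.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return internal / total, external / total


# Invented counts: 60 internal and 70 external links among 100 distinct links.
frac_int, frac_ext = link_fractions(internal=60, external=70, total=100)
print(frac_int, frac_ext)
```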


3. Table 1 showing the details of the newly found links in the argR regulon.

Gene name or identifier | Number of evidences & genomes in which the evidence is found | No. of intervening genes & log-likelihoods | Function of protein
recN * | 5 – B. halodurans, B. subtilis, O. iheyensis, S. aureus Mu50, T. tengcongensis | 0 (0.4291), 0 (0.4291), 0 (0.5067), 0 (0.8840), 0 (0.8840) | Protein used in recombination and DNA repair
astC * | 3 – C. efficiens YS-314, M. avium paratuberculosis, S. coelicolor | 1 (0.8840), 1 (1.1343), 0 (0.7944) | Amino acid biosynthesis, Arginine acetylornithine delta-aminotransferase
mutS * | 2 – S. agalactiae 2603, S. pneumoniae R6 | 0 (0.1721), 0 (0.8840) | DNA replication, repair. Methyl-directed mismatch repair
yfcH * | 2 – H. ducreyi 35000HP, P. multocida | 0 (1.1343), 0 (0.5067) | Putative enzyme
dfp * | 1 – T. thermophilus HB27 | 0 (0.7944) | DNA replication, repair. Flavoprotein affecting synthesis of DNA and pantothenate metabolism
gmk | 1 – T. thermophilus HB27 | 2 (0.7944) | Purine ribonucleotide biosynthesis, guanylate kinase
ychE * | 1 – T. thermophilus HB27 | 0 (0.7944) | Putative transport
dxs | 1 – T. tengcongensis | 2 (0.4291) | Central intermediary metabolism, 1-deoxyxylulose-5-phosphate synthase
yfjB * | 1 – T. tengcongensis | 0 (0.8840) | Hypothetical protein
folD | 1 – O. iheyensis | 4 (0.5067) | Biosynthesis of cofactors, Folic acid 5,10-methylene-tetrahydrofolate dehydrogenase
ispA | 1 – O. iheyensis | 1 (0.5067) | Biosynthesis of cofactors, geranyltransferase
nusB | 1 – O. iheyensis | 5 (0.5067) | RNA synthesis, Transcription termination, L factor
xseA | 1 – O. iheyensis | 3 (0.5067) | Degradation of DNA

* Cases where we expect the genes to be functionally linked, because the log-likelihood scores are high and the orthologs are conserved with no intervening genes in the genome of evidence. The genes gmk, ychE and yfjB have been predicted to be regulated by ArgR (Robison et al., 1998). Note also that in all these cases the genes are either putative, hypothetical or poorly annotated, which leaves open the possibility that these associations are real. Of these 13 links, we expect only those marked ‡ (3 in number) to be false positives, because of the high number of intervening genes. Such links could serve as a guide for future refinements of Nebulon. Complete genome names can be found on the web page.
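
The asterisk criterion can be illustrated with the values of Table 1. The sketch below applies one simple rule, namely that a gene is flagged when at least one genome of evidence shows zero intervening genes. This rule is an inference that happens to reproduce the asterisks in the table; it is not necessarily the procedure the authors followed, and no log-likelihood cutoff is applied because none is stated here.

```python
# (intervening genes, log-likelihood) per genome of evidence, from Table 1.
table1 = {
    "recN": [(0, 0.4291), (0, 0.4291), (0, 0.5067), (0, 0.8840), (0, 0.8840)],
    "astC": [(1, 0.8840), (1, 1.1343), (0, 0.7944)],
    "mutS": [(0, 0.1721), (0, 0.8840)],
    "yfcH": [(0, 1.1343), (0, 0.5067)],
    "dfp": [(0, 0.7944)],
    "gmk": [(2, 0.7944)],
    "ychE": [(0, 0.7944)],
    "dxs": [(2, 0.4291)],
    "yfjB": [(0, 0.8840)],
    "folD": [(4, 0.5067)],
    "ispA": [(1, 0.5067)],
    "nusB": [(5, 0.5067)],
    "xseA": [(3, 0.5067)],
}


def has_adjacent_evidence(evidence):
    """True if any genome shows the ortholog with no intervening genes."""
    return any(gap == 0 for gap, _score in evidence)


# Assumed rule, not the authors' stated method: flag genes with adjacent evidence.
starred = sorted(g for g, ev in table1.items() if has_adjacent_evidence(ev))
print(starred)
```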


4. Interactions in gml format for Figure 5 can be downloaded from here. This file can be viewed in Cytoscape or yEd. The same interactions with detailed information can also be obtained from the GUI of Nebulon with the query gene as argR and the query genome as Escherichia coli K12.


5. Interactions in gml format for Figures 6(a) and 6(b) can be downloaded from here. These files can be viewed in Cytoscape or yEd. The interactions with detailed information for Figure 6(a) can also be obtained from the GUI or the command-line toolkit of Nebulon with the query gene as tufA and the query genome as Escherichia coli K12; those for Figure 6(b) can be obtained with the query gene as flgA, the query genome as Escherichia coli K12 and the number of generations as 2. (Please note that the graphical interface does not currently support the number-of-generations threshold, so we recommend the command-line tool for queries of more than one generation.)
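
To make the number-of-generations threshold concrete, the sketch below treats one generation as one linking iteration (shell) outward from the query gene, in line with the description of shells in the abstract. This is a generic breadth-first expansion over a toy graph with placeholder gene names, offered under that assumption; it is not the toolkit's actual implementation.

```python
from collections import deque


def expand(graph, query, generations):
    """Return {gene: shell} for genes within `generations` linking steps.

    `graph` maps each gene to the genes it is linked to (hypothetical
    structure chosen for this illustration).
    """
    shell = {query: 0}
    queue = deque([query])
    while queue:
        gene = queue.popleft()
        if shell[gene] == generations:
            continue  # do not expand beyond the requested shell
        for neighbour in graph.get(gene, ()):
            if neighbour not in shell:
                shell[neighbour] = shell[gene] + 1
                queue.append(neighbour)
    return shell


# Toy network: q links to a and b, a links to c, c links to d.
toy = {"q": ["a", "b"], "a": ["q", "c"], "b": ["q"], "c": ["a", "d"], "d": ["c"]}
result = expand(toy, "q", 2)
print(result)
```

With two generations, gene d is left out because it is three links away from the query.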


6. Interactions in gml format for Figure 7 can be downloaded from here. This file can be viewed in Cytoscape or yEd.

Table 2. Genes having at least two links with genes related to nitrogen fixation in Sinorhizobium meliloti.

Gene name or identifier | Number of links to core | Function of protein
cysH | 5 | Probable Thioredoxin dependent padops reductase 3'-phosphoadenylylsulfate sulfotransferase cysteine biosynthesis protein
cysG | 4 | Probable siroheme synthase protein
cysQ | 4 | Putative transmembrane protein
SMc02124 | 4 | Putative nitrite reductase protein
cobA | 3 | Probable uroporphyrin-III C-methyltransferase protein
fixG | 3 | Iron sulfur membrane protein
fixI1 | 3 | Copper transport ATPase
cysD | 2 | Putative sulfate adenylate transferase subunit 2 cysteine biosynthesis protein
dcp | 2 | Probable peptidyl-dipeptidase A protein
etf | 2 | Probable electron transfer flavoprotein-ubiquinone oxidoreductase
fixI2 | 2 | E1-E2 type cation ATPase
fixN1 | 2 | Heme b / copper cytochrome c oxidase subunit
fixO2 | 2 | Cytochrome c oxidase
fixP1 | 2 | Di-heme cytochrome c
glcF | 2 | Probable glycolate oxidase iron-sulfur subunit protein
ispB | 2 | Putative octaprenyl-diphosphate synthase protein
ivdH | 2 | Putative isovaleryl-CoA dehydrogenase protein
pfs | 2 | Putative MTA/SAH nucleosidase P46 includes: 5'-methylthioadenosine nucleosidase and S-adenosylhomocysteine nucleosidase protein
rpsJ | 2 | Probable 30S ribosomal protein S10
SMa1207 | 2 | FixK-like regulatory protein
SMa2359 | 2 | Conserved hypothetical protein
SMb20753 | 2 | Putative acyl-CoA dehydrogenase protein
SMb21225 | 2 | Putative inositol monophosphatase, possibly involved in PAPS metabolism protein
SMb21232 | 2 | Putative nucleotide sugar epimerase dehydratase protein
SMc00977 | 2 | Putative acyl-CoA dehydrogenase protein
SMc01153 | 2 | Probable enoyl-CoA hydratase protein
SMc02123 | 2 | Conserved hypothetical protein
thiF | 2 | Putative Thiamine biosynthesis transmembrane protein
typA | 2 | Probable GTP-binding protein
ubiE | 2 | Probable ubiquinone/menaquinone biosynthesis methyltransferase protein
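
The selection behind Table 2 (genes with at least two links to a core set) can be sketched as a simple count. The code below is an illustration under assumptions: links are treated as undirected gene pairs, duplicates are counted once, and the core set and links are placeholders rather than the actual nitrogen-fixation data.

```python
from collections import Counter


def links_to_core(links, core, min_links=2):
    """Count, for each non-core gene, its distinct links to core genes."""
    counts = Counter()
    seen = set()
    for a, b in links:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # ignore self-links and repeated pairs
        seen.add(pair)
        if a in core and b not in core:
            counts[b] += 1
        elif b in core and a not in core:
            counts[a] += 1
    return {gene: n for gene, n in counts.items() if n >= min_links}


# Placeholder data: c1..c3 stand for core genes, x and y for candidates.
core = {"c1", "c2", "c3"}
links = [("c1", "x"), ("x", "c2"), ("c3", "y"), ("c1", "c2"), ("x", "c1")]
result = links_to_core(links, core)
print(result)
```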


7. Click here to see the table containing the details of the functional links identified by Nebulon per genome (those that are unique to Nebulon and those that are also found by STRING).

Note that in each case Nebulon recovers a significant number of links that are not found by STRING, even though STRING uses a number of genomic context tools, high-throughput experimental data, coexpression data (normally from microarrays) and text mining.
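
A comparison of this kind reduces to set operations once links are treated as unordered gene pairs. The sketch below shows the idea with placeholder identifiers; it assumes both link sets already use the same gene identifiers, which for real data requires the mapping step mentioned in the section list.

```python
def normalise(links):
    """Represent each undirected link as a sorted tuple so (a, b) == (b, a)."""
    return {tuple(sorted(pair)) for pair in links}


def compare(nebulon_links, string_links):
    """Split Nebulon links into those unique to it and those shared."""
    neb = normalise(nebulon_links)
    other = normalise(string_links)
    return {"unique": neb - other, "shared": neb & other}


# Placeholder links, not real predictions.
result = compare([("b", "a"), ("a", "c")], [("a", "b")])
print(result)
```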

 


For Questions/Comments, please mail: sarath AT cifn.unam.mx