NEBULON: a system for the inference of functional relationships of gene products from the rearrangement of predicted operons
Sarath Chandra Janga1, Julio Collado-Vides1 & Gabriel Moreno-Hagelsieb2
1 Program of Computational Genomics, CIFN-UNAM, Apdo Postal 565-A, Cuernavaca, Morelos, 62100 Mexico. 2 Department of Biology, Wilfrid Laurier University, 75 University Avenue West, Waterloo, ON, Canada, N2L 3C5.
Abstract
Here we introduce Nebulon, a system to build networks of predicted functional relationships of gene products based on their organization into operons in any available genome. The system is based on a previously developed method to predict operons from the distances between adjacent genes on the same strand, and on the high recombination rate of operon associations across genomes, which reveals functional relationships among gene products. Our system can use different kinds of thresholds to accept a functional relationship, from those related to the prediction of operons to requiring that the association be found in at least a given number of non-redundant genomes. We also work by shells, meaning that we decide on the number of linking iterations to allow for the complementation of related gene sets. Benchmarked against knowledge databases of functional interactions, the method shows high reliability. We also illustrate it with several known and characterized functional gene sets.
Accessing Nebulon
Nebulon can be accessed in one of two ways:
a) The GUI can be accessed by following the link below. For each link in Nebulon, the GUI gives information such as the source of the evidence, the number of evidences, and the functions of the genes in a given nebulome. For further information and help with the interface, please read the documentation of Nebulon-GUI.
http://tikal.cifn.unam.mx/nebulon
b) The command-line toolkit of Nebulon provides a means to get the links for a given gene in a given genome under different thresholds. It is downloadable from here. We recommend the command-line version if you are interested in obtaining or analyzing interactions for a large number of genes, because the GUI can be slow for large outputs. Please read the documentation that comes with the tarball for help with the parameters and the program itself. The interactions are written to a simple tab-separated file, which can be viewed in Cytoscape or yEd.
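As a rough illustration of working with the tab-separated output, the sketch below loads an interactions file into an adjacency map. The column layout (the two linked genes in the first two columns, anything else ignored), the function name load_links and the sample rows are assumptions made for illustration only; the toolkit documentation describes the actual columns.

```python
import csv
import io
from collections import defaultdict


def load_links(handle):
    """Read tab-separated gene pairs into an undirected adjacency map.

    Assumption: the first two columns of each row hold the linked genes;
    any further columns are ignored.
    """
    neighbors = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, short, or comment lines
        gene_a, gene_b = row[0].strip(), row[1].strip()
        neighbors[gene_a].add(gene_b)
        neighbors[gene_b].add(gene_a)
    return neighbors


# Made-up rows standing in for a real output file.
sample = io.StringIO("argR\trecN\t5\nargR\tastC\t3\n\nrecN\tmutS\t2\n")
links = load_links(sample)
print(sorted(links["argR"]))
```

With a real file, the same function could be called as load_links(open(path, newline="")), where path is wherever the toolkit output was saved.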
Sections
1) Description of pathway identifiers used in Figure 4.
2) Graph showing the distribution of internal vs external links in all the NR genomes.
3) Table 1 showing the newly found links in the recovery of the argR regulon.
4) Interactions file in gml format for Figure 5.
5) Interaction files in gml format for Figures 6(a) and 6(b).
1. The pathway identifiers used in Figure 2 stand for:
| PATHWAY ID | DESCRIPTION |
| MAP00010 | Glycolysis / Gluconeogenesis |
| MAP00020 | Citrate cycle (TCA cycle) |
| MAP00030 | Pentose phosphate pathway |
| MAP00040 | Pentose and glucuronate interconversions |
| MAP00061 | Fatty acid biosynthesis (path 1) |
| MAP00071 | Fatty acid metabolism |
| MAP00130 | Ubiquinone biosynthesis |
| MAP00190 | Oxidative phosphorylation |
| MAP00193 | ATP synthesis |
| MAP00195 | Photosynthesis |
| MAP00220 | Urea cycle and metabolism of amino groups |
| MAP00230 | Purine metabolism |
| MAP00240 | Pyrimidine metabolism |
| MAP00260 | Glycine, serine and threonine metabolism |
| MAP00290 | Valine, leucine and isoleucine degradation |
| MAP00300 | Lysine biosynthesis |
| MAP00340 | Histidine metabolism |
| MAP00400 | Phenylalanine, tyrosine and tryptophan biosynthesis |
| MAP00500 | Starch and sucrose metabolism |
| MAP00520 | Nucleotide sugars metabolism |
| MAP00550 | Peptidoglycan biosynthesis |
| MAP00630 | Glyoxylate and dicarboxylate metabolism |
| MAP00632 | Benzoate degradation via CoA ligation |
| MAP00640 | Propanoate metabolism |
| MAP00650 | Butanoate metabolism |
| MAP00720 | Reductive carboxylate cycle (CO2 fixation) |
| MAP00770 | Pantothenate and CoA biosynthesis |
| MAP00790 | Folate metabolism |
| MAP00860 | Porphyrin and chlorophyll metabolism |
| MAP00910 | Nitrogen metabolism |
| MAP02040 | Flagellar assembly |
| MAP03070 | Type III secretion system |
2. A graph showing the distribution of internal vs external links in all the NR genomes can be downloaded from here in PostScript format or in PDF format (note: the figure has the genomes sorted into Proteobacteria, Firmicutes and Archaea). A description of the three-letter codes used can be found here, and the table containing the link proportions in each genome for internal and external links, for the raw interaction data and at the 0.4 log-likelihood threshold, can be found here. The columns in this table represent the following:
column 1 : Genome
column 2 : Number of internal links identified
column 3 : Number of external links identified
column 4 : Number of links due to fusions
column 5 : Total number of links identified. (Note: a link can be identified by more than one means, i.e., internal, external or fusion, so this need not be the sum of the columns above.)
column 6 : Fraction of links which are internal.
column 7 : Fraction of links which are external.
3. Table 1 showing the details of the newly found links in the argR regulon.
| Gene name or identifier | Number of evidences & genomes in which the evidence is found | No. of intervening genes & log-likelihoods | Function of protein |
| recN * | 5 – B. halodurans, B. subtilis, O. iheyensis, S. aureus Mu50, T. tengcongensis | 0 (0.4291), 0 (0.4291), 0 (0.5067), 0 (0.8840), 0 (0.8840) | Protein used in recombination and DNA repair |
| astC * | 3 – C. efficiens YS-314, M. avium paratuberculosis, S. coelicolor | 1 (0.8840), 1 (1.1343), 0 (0.7944) | Amino acid biosynthesis, Arginine acetylornithine delta-aminotransferase |
| mutS * | 2 – S. agalactiae 2603, S. pneumoniae R6 | 0 (0.1721), 0 (0.8840) | DNA-replication, repair. Methyl-directed mismatch repair |
| yfcH * | 2 – H. ducreyi 35000HP, P. multocida | 0 (1.1343), 0 (0.5067) | Putative enzyme |
| dfp * | 1 – T. thermophilus HB27 | 0 (0.7944) | DNA-replication, repair. Flavoprotein affecting synthesis of DNA and pantothenate metabolism |
| gmk | 1 – T. thermophilus HB27 | 2 (0.7944) | Purine ribonucleotide biosynthesis, guanylate kinase |
| ychE * | 1 – T. thermophilus HB27 | 0 (0.7944) | Putative transport |
| dxs | 1 – T. tengcongensis | 2 (0.4291) | Central intermediary metabolism, 1-deoxyxylulose-5-phosphate synthase |
| yfjB * | 1 – T. tengcongensis | 0 (0.8840) | Hypothetical protein |
| folD ‡ | 1 – O. iheyensis | 4 (0.5067) | Biosynthesis of cofactors, Folic acid 5,10-methylene-tetrahydrofolate dehydrogenase |
| ispA | 1 – O. iheyensis | 1 (0.5067) | Biosynthesis of cofactors, geranyltransferase |
| nusB ‡ | 1 – O. iheyensis | 5 (0.5067) | RNA synthesis, Transcription termination, L factor |
| xseA ‡ | 1 – O. iheyensis | 3 (0.5067) | Degradation of DNA |
* Cases where we expect the genes to be functionally linked, because the log-likelihood scores are high and the orthologs are conserved with no intervening genes in the genome of evidence. The genes gmk, ychE and yfjB have been predicted to be regulated by ArgR (Robison et al., 1998). Note also that in all these cases the genes are putative, hypothetical or poorly annotated, which leaves open the possibility that these associations are real. Of these 13 links, we expect only the three marked ‡ to be false positives, because of the high number of intervening genes. Such links could serve as a guide for future refinements of Nebulon. Complete genome names can be found on the web page.
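The reasoning in this footnote can be sketched as a simple filter over the (intervening genes, log-likelihood) pairs of Table 1. The cutoff of two intervening genes is an assumption chosen here because it separates the ‡ rows from the rest of this table; it is not a documented Nebulon parameter, and the names MAX_INTERVENING and is_suspect are hypothetical.

```python
# (intervening genes, log-likelihood) per genome of evidence,
# copied from a subset of the rows of Table 1.
evidence = {
    "recN": [(0, 0.4291), (0, 0.4291), (0, 0.5067), (0, 0.8840), (0, 0.8840)],
    "astC": [(1, 0.8840), (1, 1.1343), (0, 0.7944)],
    "gmk": [(2, 0.7944)],
    "ispA": [(1, 0.5067)],
    "folD": [(4, 0.5067)],
    "nusB": [(5, 0.5067)],
    "xseA": [(3, 0.5067)],
}

MAX_INTERVENING = 2  # assumed cutoff, not a Nebulon parameter


def is_suspect(records, max_intervening=MAX_INTERVENING):
    """True when every piece of evidence has too many intervening genes."""
    return all(gap > max_intervening for gap, _ in records)


suspects = sorted(gene for gene, recs in evidence.items() if is_suspect(recs))
print(suspects)  # ['folD', 'nusB', 'xseA']
```

Requiring every piece of evidence to exceed the cutoff keeps a link such as astC, which has one genome with no intervening genes, out of the suspect list.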
4. Interactions in gml format for Figure 5 can be downloaded from here. This file can be viewed in Cytoscape or yEd. The same interactions with detailed information can also be obtained from the GUI of Nebulon with argR as the query gene and Escherichia coli K12 as the query genome.
5. Interactions in gml format for Figures 6(a) and 6(b) can be downloaded from here. These files can be viewed in Cytoscape or yEd. The interactions with detailed information for Figure 6(a) can also be obtained from the GUI or the command-line toolkit of Nebulon with tufA as the query gene and Escherichia coli K12 as the query genome; those for Figure 6(b) can be obtained with flgA as the query gene, Escherichia coli K12 as the query genome, and the number of generations set to 2. (Please note that the graphical interface does not currently support the number-of-generations threshold, so we recommend the command-line tool for queries of more than one generation.)
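One way to picture the number-of-generations threshold is as a breadth-first expansion from the query gene, where each generation adds the genes linked to those found in the previous one. The sketch below assumes that reading of "generations" (the linking iterations mentioned in the abstract); the function expand and the toy network are hypothetical, and the links in it are invented rather than taken from Nebulon.

```python
def expand(neighbors, query, generations=1):
    """Collect genes reachable from `query` within `generations` linking steps."""
    seen = {query}
    frontier = {query}
    for _ in range(generations):
        # Genes linked to the current frontier that have not been seen yet.
        frontier = {n for g in frontier for n in neighbors.get(g, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen - {query}


# Invented toy network; gene names are used only as labels.
toy = {
    "flgA": {"flgB", "flgC"},
    "flgB": {"flgA", "flgD"},
    "flgC": {"flgA"},
    "flgD": {"flgB", "flgE"},
    "flgE": {"flgD"},
}

one_generation = expand(toy, "flgA", generations=1)
two_generations = expand(toy, "flgA", generations=2)
print(sorted(one_generation), sorted(two_generations))
```

In this toy case the second generation adds flgD, which is linked to flgB but not directly to the query gene.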
6. Interactions in gml format for Figure 7 can be downloaded from here. These files can be viewed in Cytoscape or yEd.
Table 2. Genes having at least two links with genes related to nitrogen fixation in Sinorhizobium meliloti.
| Gene name or identifier | Number of links to core | Function of protein |
| cysH | 5 | Probable Thioredoxin dependent padops reductase 3'-phosphoadenylylsulfate sulfotransferase cysteine biosynthesis protein |
| cysG | 4 | Probable siroheme synthase protein |
| cysQ | 4 | Putative transmembrane protein |
| SMc02124 | 4 | Putative nitrite reductase protein |
| cobA | 3 | Probable uroporphyrin-III C-methyltransferase protein |
| fixG | 3 | Iron sulfur membrane protein |
| fixI1 | 3 | Copper transport ATPase |
| cysD | 2 | Putative sulfate adenylate transferase subunit 2 cysteine biosynthesis protein |
| dcp | 2 | Probable peptidyl-dipeptidase A protein |
| etf | 2 | Probable electron transfer flavoprotein-ubiquinone oxidoreductase |
| fixI2 | 2 | E1-E2 type cation ATPase |
| fixN1 | 2 | Heme b / copper cytochrome c oxidase subunit |
| fixO2 | 2 | Cytochrome c oxidase |
| fixP1 | 2 | Di-heme cytochrome c |
| glcF | 2 | Probable glycolate oxidase iron-sulfur subunit protein |
| ispB | 2 | Putative octaprenyl-diphosphate synthase protein |
| ivdH | 2 | Putative isovaleryl-CoA dehydrogenase protein |
| pfs | 2 | Putative MTA/SAH nucleosidase P46 includes: 5'-methylthioadenosine nucleosidase and S-adenosylhomocysteine nucleosidase protein |
| rpsJ | 2 | Probable 30S ribosomal protein S10 |
| SMa1207 | 2 | FixK-like regulatory protein |
| SMa2359 | 2 | Conserved hypothetical protein |
| SMb20753 | 2 | Putative acyl-CoA dehydrogenase protein |
| SMb21225 | 2 | Putative inositol monophosphatase, possibly involved in PAPS metabolism protein |
| SMb21232 | 2 | Putative nucleotide sugar epimerase dehydratase protein |
| SMc00977 | 2 | Putative acyl-CoA dehydrogenase protein |
| SMc01153 | 2 | Probable enoyl-CoA hydratase protein |
| SMc02123 | 2 | Conserved hypothetical protein |
| thiF | 2 | Putative Thiamine biosynthesis transmembrane protein |
| typA | 2 | Probable GTP-binding protein |
| ubiE | 2 | Probable ubiquinone/menaquinone biosynthesis methyltransferase protein |
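A selection like the one in Table 2 can be sketched as counting, for each gene outside a core set, how many distinct links it has to core genes, and keeping those with at least two. The core set, the links and the function name genes_linked_to_core below are all hypothetical placeholders; the actual core gene set used for Table 2 is not listed on this page.

```python
from collections import Counter


def genes_linked_to_core(links, core, min_links=2):
    """Count distinct links from non-core genes to core genes."""
    counts = Counter()
    # Treat links as unordered so (a, b) and (b, a) count once.
    for pair in {frozenset(p) for p in links}:
        if len(pair) != 2:
            continue  # ignore self-links
        a, b = tuple(pair)
        if a in core and b not in core:
            counts[b] += 1
        elif b in core and a not in core:
            counts[a] += 1
    return {gene: n for gene, n in counts.items() if n >= min_links}


# Hypothetical core and invented links, for illustration only.
core = {"nifH", "nifD", "nifK"}
links = [
    ("cysH", "nifH"),
    ("cysH", "nifD"),
    ("nifD", "cysH"),  # duplicate of the previous link, reversed
    ("rpsJ", "nifK"),
    ("nifH", "nifD"),  # core-to-core link, not counted
]

selected = genes_linked_to_core(links, core)
print(selected)  # {'cysH': 2}
```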
7. Click here to see the table containing the details of the functional links identified by Nebulon per genome (those which are unique to Nebulon and those which are also found by STRING).
Note that in each case Nebulon recovers a significant number of links that are not found by STRING, even though STRING uses a number of genomic context tools, high-throughput experimental data, coexpression data (normally from microarrays) and text mining.
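The split into links unique to Nebulon and links shared with STRING amounts to a set comparison over unordered gene pairs. The sketch below assumes both link lists have already been mapped to the same gene identifiers; the pairs shown are invented examples, not results from either resource, and normalize is a hypothetical helper.

```python
def normalize(pairs):
    """Represent each link as an unordered pair so (a, b) equals (b, a)."""
    return {frozenset(p) for p in pairs}


# Invented example links, for illustration only.
nebulon_links = normalize([("argR", "recN"), ("argR", "astC"), ("tufA", "fusA")])
string_links = normalize([("astC", "argR"), ("tufA", "rpsL")])

unique_to_nebulon = nebulon_links - string_links
shared = nebulon_links & string_links

print(len(unique_to_nebulon), len(shared))  # 2 1
```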
For Questions/Comments, please mail: sarath AT cifn.unam.mx