The solution for organizing and analysing your microbes.

Gene-trait matching

Gene-trait matching is an effective approach to identify genes that could be responsible for an observed phenotype.

The principle is simple: If one group of genomes has a specific trait that another does not, search for orthologous genes (or functional annotations) that occur in one group and are absent in the other.
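The principle above can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenGenomeBrowser's implementation: the function name `group_specific_orthologs`, the strain names and the ortholog identifiers (`OG1` and so on) are all made up for the example, and the strict "present in all vs. absent in all" rule is a simplifying assumption.

```python
def group_specific_orthologs(genome_to_orthologs, group_a, group_b):
    """Return (only_a, only_b): orthologs found in every genome of one
    group and in no genome of the other. Both groups must be non-empty."""

    def present_in_all(group):
        return set.intersection(*(genome_to_orthologs[g] for g in group))

    def present_in_any(group):
        return set.union(*(genome_to_orthologs[g] for g in group))

    only_a = present_in_all(group_a) - present_in_any(group_b)
    only_b = present_in_all(group_b) - present_in_any(group_a)
    return only_a, only_b


# Hypothetical toy data: genome name -> set of ortholog identifiers
genomes = {
    "strain1": {"OG1", "OG2", "OG3"},
    "strain2": {"OG1", "OG2"},
    "strain3": {"OG1", "OG4"},
    "strain4": {"OG1", "OG4", "OG5"},
}

only_a, only_b = group_specific_orthologs(
    genomes, ["strain1", "strain2"], ["strain3", "strain4"]
)
print(only_a, only_b)
```

In real data, perfect presence/absence patterns are rare, which is why a statistical test per ortholog is more useful than this strict set difference.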


Open the gene-trait matching page and enter two non-intersecting groups of genomes, for example: Lactobacillaceae vs. Propionibacteriaceae and Streptococcaceae.

[Image: gene-trait matching demo]

The resulting table can be downloaded in CSV format through the settings sidebar.

Example use case

The following example is based on a real experiment with 39 strains from the same microbial genus. Of these, 23 strains can metabolise a specific compound (green); the others (red) cannot.

[Image: gene-trait matching good example]

To find the responsible gene(s), open the gene-trait matching view, define the two groups of genomes, and click on ‘Submit’.

In this case, gene-trait matching found a strong correlation of the trait with a small number of orthologs. A closer look indicated that these orthologous genes were always located close to each other on the genome. A follow-up RNA-Seq experiment also showed a link between the phenotype and this gene cluster.

This experiment was ideal for gene-trait matching because the strains were relatively closely related and the trait was distributed amongst different phylogenetic clusters. If this is not the case, gene-trait matching is less likely to work. For example, had the phenotype been strongly correlated with the phylogenetic clusters, as in the image below, too many genes would probably have shown up as significantly different between the two groups.

[Image: gene-trait matching bad example]


By default, OpenGenomeBrowser runs a Fisher’s exact test for each orthologous gene and applies the Benjamini-Hochberg multiple testing correction (alpha = 10 %).
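To make the statistics concrete, here is a minimal, standard-library-only sketch of a two-sided Fisher's exact test followed by a Benjamini-Hochberg adjustment. It is not OpenGenomeBrowser's actual code. The group sizes (23 and 16) follow the example use case, but the ortholog names and their presence counts are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table
    [[a, b], [c, d]] = [[group 1 with gene, group 1 without],
                        [group 2 with gene, group 2 without]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):  # hypergeometric probability of x gene carriers in group 1
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: ortholog -> (carriers among 23 trait-positive strains,
#                                   carriers among 16 trait-negative strains)
counts = {"OG_cluster": (23, 0), "OG_core": (23, 16), "OG_mixed": (12, 8)}
names = list(counts)
p_values = [
    fisher_exact_two_sided(pos, 23 - pos, neg, 16 - neg)
    for pos, neg in counts.values()
]
q_values = benjamini_hochberg(p_values)

alpha = 0.10  # same threshold as the default mentioned above
significant = [name for name, q in zip(names, q_values) if q <= alpha]
print(significant)
```

Only the ortholog that tracks the trait perfectly passes the threshold here; a core gene present in every strain carries no information about the trait and gets a p-value of 1.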

Caution: Fisher’s exact test assumes that all isolates have a random and independently distributed probability of exhibiting each state. Because of population structure, this assumption is not guaranteed to hold; see the example above. For more information, read the paper about the Scoary tool (Brynildsrud et al., Genome Biol, 2016).

(In the future, I will probably implement something like Scoary’s empirical p-value as an additional output column.)

Advanced usage

In the settings sidebar (wheel at the top right), it is possible to change…