Research & Development Cannabinoid analysis, Classification

The Theory of Evolution

Comparative genomic analyses shed light on the origin of cannabinoid oxidocyclase genes, revealing a potential new classification system

Phoebe Harkin | 06/22/2021 | Interview

For all that is known about cannabis, its origin and evolution remain remarkably unclear. In a bid to shed some light on the ancient crop, researchers at the Biosystematics Group, Wageningen University, The Netherlands, performed comparative genomic analyses of Cannabis, related genera within the cannabis family, and selected outgroup species.

The results showed that cannabinoid oxidocyclase genes originated in the Cannabis lineage from within a larger gene expansion in the Cannabaceae family. Localization and divergence of oxidocyclase genes in the genome revealed two main syntenic blocks, each comprising tandemly repeated genes.

By comparing these blocks with those of closely related species, the researchers put forward an evolutionary model of gene duplication and divergence across Cannabis cultivars. They also propose a comprehensive classification of three main clades and seven subclades that can simplify gene referencing and identification.

Beyond that, the researchers hypothesized that cannabinoid phenotype is primarily determined by the presence or absence of single-copy genes, rather than by variation in the number of gene copies, as some earlier studies have suggested (1).

To find out more, we spoke to the lead author of the study, Robin van Velzen.

Why did you decide to study cannabis origin and evolution?

I am an evolutionary biologist and am interested in how plants evolve to produce new chemicals. Of course, Cannabis is an especially interesting plant that makes unique chemicals (cannabinoids) that are of particular interest due to their medicinal uses. The plant enzymes producing the three main cannabinoids – tetrahydrocannabinol (THC), cannabidiol (CBD), and cannabichromene (CBC) – are known. Yet much about their origin and evolution was still unclear.

How many select outgroup species did you analyze?

We performed two successive analyses. The first was to see whether the genes encoding these enzymes originated within Cannabis or already existed in other closely related plants such as hop. To address this question, we analysed seven available plant genome sequences: one from Cannabis, three from closely related species within the cannabis family – hop (Humulus lupulus), Parasponia, and Trema – and three more from other plant families, including the model plant Arabidopsis thaliana. Our results revealed that the three known cannabinoid biosynthesis genes originated specifically in Cannabis.

The second analysis focused on the evolution of this Cannabis-specific group of genes across different cultivars, again based on available genome sequences. We looked not only at genes that encode full-length functional enzymes, but also at genes that encode “broken” enzymes due to mutations (pseudogenes). That revealed three main groups (clades A, B, and C) subdivided into seven subgroups (subclades). The genes encoding THCA and CBCA synthase are very closely related members of clade A, while the gene encoding CBDA synthase is a member of clade B. Some genes in clade C and subclade A3 encode full-length enzymes, but we do not yet know their function in the Cannabis plant.

How might a comprehensive classification of three main clades and seven subclades aid gene referencing and identification?

In previous studies, genes were commonly referred to as “looking like THC synthase” (THCS-like) or “looking like CBD synthase” (CBDS-like). The problem is that if two studies identify THCS-like genes, it is difficult to know whether they are referring to the same gene or to two different genes.

Moreover, we found that sometimes the exact same gene would be considered THCS-like in one study and CBDS-like in another. Consequently, the precise identity of genes described often remains unclear, making it difficult (for us – but probably also other researchers) to accurately interpret results.

Based on our new classification, every gene in this Cannabis-specific group can now be unequivocally referred to as a member of one of our seven subclades. We therefore hope that this classification will serve as a useful reference for the cannabis science community.

Does your theory that cannabinoid phenotype is primarily determined by the presence/absence of single-copy genes challenge conventional thinking?

It is hard to specify “conventional thinking” in relation to these genes. Originally, based on careful examination of genetic crosses, THC and CBD synthase genes were generally considered two different alleles of the same gene. After the advent of genome sequencing, however, it became clear that they are in fact separate genes. More recently, some studies suggested that variation in the number of gene copies has an effect on the cannabinoid levels of the plant. Our study, by contrast, showed that THC and CBD synthase genes are strictly single-copy and that the variation occurs primarily in pseudogenes, which cannot have a direct effect on cannabinoid production. We therefore hypothesize that levels of THC and CBD are the result of presence or absence, sequence variation, and expression of these two genes.

What’s next?

We will continue this research to elucidate the origin and evolution of genes encoding other enzymes in the cannabinoid pathway. We will also examine how the DNA code of these genes determines enzyme function. This information could help breeders develop new Cannabis cultivars.


1. R van Velzen and M Schranz, Genome Biol Evol, [Online ahead of print] (2021). PMID: 34100927
