Somatic Mutations in meristem tissue in plants

In angiosperms, in which layer of the meristem must a new mitotic mutation occur to have a chance of being found in a pollen grain or an ovule?

I also welcome some insights about non-angiosperm plants.

You can see the Wikipedia article on meristem.

The apical meristem differentiates into a floral meristem, which gives rise to flowers. Within it, the cells specifically expressing APETALA3 (AP3), PISTILLATA (PI), AGAMOUS (AG) and SEPALLATA (SEP) give rise to the stamens [ref].

The CRC gene is essential for female development, and plants lacking it do not produce pistils. Cells expressing this gene therefore give rise to the female parts [ref].

Nonetheless, a mutation arising at any stage of floral meristem development can be transmitted to pollen or ovules if the mutated cells stochastically give rise to that lineage.

Somatic mutation

A somatic mutation is a change in the DNA sequence of a somatic cell of a multicellular organism with dedicated reproductive cells; that is, any mutation that occurs in a cell other than a gamete, germ cell, or gametocyte. Unlike germline mutations, which can be passed on to the descendants of an organism, somatic mutations are not usually transmitted to descendants. This distinction is blurred in plants, which lack a dedicated germline, and in those animals that can reproduce asexually through mechanisms such as budding, as in members of the cnidarian genus Hydra.

While somatic mutations are not passed down to an organism's offspring, somatic mutations will be present in all descendants of a cell within the same organism. Many cancers are the result of accumulated somatic mutations.

From the somatic cell to the germ cell

An international scientific consortium including the Freiburg plant biologist Prof. Dr. Thomas Laux has discovered a regulatory pathway that turns plants' ordinary somatic cells into germ cells for sexual reproduction. The researchers recently published their findings in the scientific journal Science.

In contrast to humans and animals, plants do not set aside a specialized cell lineage (germline) for the future production of gametes during early embryogenesis. Instead, the germ cells of plants are established de novo from somatic cells in the floral reproductive organs, the stamens and carpels. To this end, the selected cells switch their cell division mode from mitosis, cell proliferation maintaining the chromosome number, to meiosis, the division that reduces the number of chromosomes and where genetic recombination occurs. Plants have therefore evolved strategies to enable somatic cells to switch to germline fate and to do so in the right place and at the right time.

Laux and colleagues have identified multiple genes in the model organism Arabidopsis thaliana that give the start signal for switching from mitosis to meiosis. The starting point for the findings presented in Science was a set of mutants that create multiple germ cells instead of a single one in each ovule. Key to the newly discovered pathway is the limitation of activity of the transcription factor WUSCHEL, which Laux's team had identified several years ago as an important regulator of pluripotent stem cells that are able to develop into every cell type in the organism. The involvement of WUSCHEL in creating germ cells is a discovery that provides molecular evidence for the longstanding hypothesis derived from paleobotanical studies that the reproductive ovules and the shoot meristem have evolved from the same precursor organ in ancient plants. The newly discovered regulatory mechanism shows how plants are able to limit switching to the germ cell program so that only a single germ cell emerges, while the surrounding cells take on other tasks.

Somatic embryogenesis in Arabidopsis thaliana is facilitated by mutations in genes repressing meristematic cell divisions

Embryogenesis in plants can commence from cells other than the fertilized egg cell. Embryogenesis initiated from somatic cells in vitro is an attractive system for studying early embryonic stages when they are accessible to experimental manipulation. Somatic embryogenesis in Arabidopsis offers the additional advantage that many zygotic embryo mutants can be studied under in vitro conditions. Two systems are available. The first employs immature zygotic embryos as starting material, yielding continuously growing embryogenic cultures in liquid medium. This is possible in at least 11 ecotypes. A second, more efficient and reproducible system, employing the primordia timing mutant (pt, allelic to hpt, cop2, and amp1), was established. A significant advantage of the pt mutant is that intact seeds, germinated in liquid medium containing 2,4-dichlorophenoxyacetic acid (2,4-D), give rise to stable embryonic cell cultures, circumventing tedious hand dissection of immature zygotic embryos. pt zygotic embryos are first distinguishable from wild type at early heart stage by a broader embryonic shoot apical meristem (SAM). In culture, embryogenic clusters originate from the enlarged SAMs. pt somatic embryos had all characteristic embryo pattern elements seen in zygotic embryos, but with higher and more variable numbers of cells. Embryogenic cell cultures were also established from seedlings of other mutants with enlarged SAMs, such as clavata (clv). pt clv double mutants showed additive effects on SAM size and an even higher frequency of seedlings producing embryogenic cell lines. pt clv double mutant plants had very short fasciated inflorescence stems and additive effects on the number of rosette leaves. This suggests that the PT and CLV genes act in independent pathways that control SAM size. An increased population of noncommitted SAM cells may be responsible for facilitated establishment of somatic embryogenesis in Arabidopsis.

Low number of fixed somatic mutations in a long-lived oak tree

Because plants do not possess a defined germline, deleterious somatic mutations can be passed to gametes, and a large number of cell divisions separating zygote from gamete formation may lead to many mutations in long-lived plants. We sequenced the genome of two terminal branches of a 234-year-old oak tree and found several fixed somatic single-nucleotide variants whose sequential appearance in the tree could be traced along nested sectors of younger branches. Our data suggest that stem cells of shoot meristems in trees are robustly protected from the accumulation of mutations.

To identify fixed somatic variants (those present in an entire sector of the Napoleon Oak) and to reconstruct their origin and distribution among branches, we collected 26 leaf samples from different locations on the tree. We first sequenced the genome from leaves sampled on terminal ramets of one lower and one upper branch of the tree. We then used a combination of short-read Illumina and single-molecule real-time (SMRT, Pacific Biosciences) sequencing to generate a de novo assembly of the oak genome. After removing contigs <1,000 bp, we established a draft sequence of approximately 720 megabases (Mb) at a coverage of approximately 70×, with 85,557 scaffolds and an N50 (the minimum scaffold length needed to cover 50% of the genome) of 17,014. Our sequence is thus in broad agreement with the published estimated genome size of 740 Mbp (ref. 5). The oak genome is predicted to encode 49,444 protein-coding loci (Supplementary Table 1).
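The N50 statistic quoted for the assembly can be illustrated with a short sketch. This is not the authors' pipeline; the function name and the toy scaffold lengths are hypothetical and chosen only to show the definition (the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the assembly).

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy assembly, not data from the oak genome.
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

With the toy lengths, the total is 300, so the running sum first reaches 150 at the second scaffold and the function returns 80.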

Examples of Somatic Mutations

Somatic Mutations in Dogs

Normal parent with puppy showing somatic mutation for coat color

A common example of somatic mutations in dogs is the variation in the color of the fur coat. When dogs of the same breed with different coat colors are bred, the pups usually exhibit the coat color of one of the parents. But in some cases, where somatic mutations arise, the resulting pup exhibits a coat with the colors of both parents' coats.

For example, when a yellow Labrador is mated with a brown Labrador, each of the pups is expected to be either yellow or brown. However, in the event of somatic mutations, the mutant pup would show a coat with both yellow and brown colored patches. Somatic mutations also give rise to other characteristics such as white colored paws (resembling socks) and different patterns of pigmentation or discoloration across the animal’s body.

Somatic Mutations in Horses

Horse showing discolored patches due to somatic mutations

Unusual and unexplained colorings on a horse’s coat that are not in accordance with the markings of its breed are usually caused by somatic mutation. The mutations cause the coat to exhibit white, black, and gray markings on the animal’s body. They may be in the shape of patches or spots. The different markings that arise due to mutations are given various names according to the pattern observed. A rabicano is characterized by a smattering of white hair along the body of the horse and a whitish skunk tail. The pangare condition exhibits a sort of shading effect across the body of the animal, and a bloody shoulder shows the presence of wine colored marks or spots on the shoulder. In most cases, the mutation is exhibited in the form of random colored patches or streaks on the body.

Somatic Hypermutation in Antibody Production

Immunoglobulin genes, which code for antibodies, are made up of variable (V), diversity (D), joining (J), and constant (C) regions. The diversity region binds to the specific variable region and to a joining region, forming a VDJ section. This section, with the help of the J region, binds to the C regions, thereby producing a specific antibody. The D, J, and C regions remain more or less the same for all the antibodies. The unique character of each antibody is imparted to it via the changes in the variable region (V). The 4 regions combine variably, hence giving rise to a variety of antibodies in the system.

In the presence of antigens, microbes, or foreign molecules, the cells initiate a cellular mechanism through which the immune system is altered so as to subdue and eliminate the foreign particles. This occurs via a process called somatic hypermutation. It is initiated when a B lymphocyte cell recognizes the presence of an antigen, and stimulates its own proliferation. During the course of this rapid division, the variable regions of the immunoglobulin gene are stimulated to possess a higher rate of mutation than normal, giving rise to hypermutated variable regions. This in turn introduces even higher diversity in the type of antibodies being produced in response to an antigen. These mutations are induced only in response to immunogenic stimulation, and are not hereditary.

Theory of Carcinogenesis

The most common disease caused by somatic mutations is cancer. Cancer arises due to two types of functional mutations that occur in particular genes.

Gain of Function Mutation – These mutations usually occur on proto-oncogenes. These are genes with normal cellular functions that gain cancerous properties when mutated. On mutation, this gene would turn into an oncogene.

Examples of such a gene would include genes like ras, myc, raf, etc. They are regulators of proliferation and transcription, and the mutation causes them to remain in a “switched on” state, leading to uncontrolled proliferation and transcription, and eventually leading to a cancerous mass of cells.

Loss of Function Mutation – These mutations occur on tumor suppressor genes (anti-oncogenes) such as p53 or rb, which keep the cellular processes in check and regulate the cell cycle by repression. When such a gene is mutated, it loses its function, and control over cellular proliferation is lost, thereby allowing the cell to progress to a cancerous state.

The occurrence of mutations in these genes follows Knudson’s two-hit hypothesis. The hypothesis claims that both the alleles of a tumor suppressor gene must be mutated, for the cancer to progress.

The 1st hit – It refers to the mutation and loss of function of one of the alleles of the anti-oncogene. In hereditary cases, individuals are born with the first hit, but in cases of sporadic carcinogenesis, the individuals acquire this hit. Its occurrence gives rise to a heterozygosity of the alleles, where, despite the loss of one allele, the other allele compensates and functions normally.

The 2nd hit – It refers to the loss of function mutation acquired by the remaining normal allele. This causes a loss of the heterozygosity, and since no functional allele is now present to control the cell proliferation, the cells proliferate uncontrollably, and the individual develops cancer.
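The difference between the hereditary and sporadic cases can be made concrete with a small calculation. The sketch below is a deliberate simplification and not part of Knudson's original analysis: it assumes that each allele is hit independently with the same probability per cell, and the value of that probability is a hypothetical placeholder.

```python
def p_both_alleles_lost(per_allele_hit_probability, inherited_first_hit):
    """Probability that a single cell ends up with no functional allele.

    Assumes hits on the two alleles are independent and equally likely
    (an illustrative assumption only).
    """
    u = per_allele_hit_probability
    # Hereditary case: the first hit is already present in every cell,
    # so only one further hit is needed. Sporadic case: two hits are needed.
    return u if inherited_first_hit else u * u


u = 1e-6  # hypothetical per-allele, per-cell probability
hereditary = p_both_alleles_lost(u, inherited_first_hit=True)
sporadic = p_both_alleles_lost(u, inherited_first_hit=False)
print(f"hereditary: {hereditary:.1e}, sporadic: {sporadic:.1e}")
```

Under these assumptions a cell that inherits the first hit is far more likely to lose both alleles than a cell that must acquire both hits, which is the intuition behind the hypothesis.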

Somatic mutations occur not only in animals, but also in plants. But since the reproductive organ is developed from and borne on the somatic parts of the plant, the somatic mutations often translate into germ-line mutations. These mutations are important in such cases, as they give rise to new and diverse cultivars of the plant.

In case of animals, the study of somatic mutations is important in order to understand the basis for various phenotypes, and also the basis for non-hereditary disease. It would also aid in resolving diseases with the development of targeted gene therapy and personalized medicines.


In plants, gametogenesis occurs late in development, and somatic mutations can therefore be transmitted to the next generation. Longer periods of growth are believed to result in an increase in the number of cell divisions before gametogenesis, with a concomitant increase in mutations arising due to replication errors. However, there is little experimental evidence addressing how many cell divisions occur before gametogenesis. Here, we measured loss of telomeric DNA and accumulation of replication errors in Arabidopsis with short and long life spans to determine the number of replications in lineages leading to gametes. Surprisingly, the number of cell divisions within the gamete lineage is nearly independent of both life span and vegetative growth. One consequence of the relatively stable number of replications per generation is that older plants may not pass along more somatically acquired mutations to their offspring. We confirmed this hypothesis by genomic sequencing of progeny from young and old plants. This independence can be achieved by hierarchical arrangement of cell divisions in plant meristems where vegetative growth is primarily accomplished by expansion of cells in rapidly dividing meristematic zones, which are only rarely refreshed by occasional divisions of more quiescent cells. We support this model by 5-ethynyl-2′-deoxyuridine retention experiments in shoot and root apical meristems. These results suggest that stem-cell organization has independently evolved in plants and animals to minimize mutations by limiting DNA replication.

In contrast to most animals, plants lack a developmentally defined germline. Instead, gametes are derived late in plant development following variable periods of vegetative growth (1). An important consequence of this developmental strategy is that somatic mutations acquired during vegetative growth can be transmitted to the next generation (2). Numerous studies have been conducted in attempts to understand whether and how somatic mutations contribute to fitness and evolution in plants (3–8). DNA replication during cell division is hypothesized to be a leading cause of genetic mutation (9–11), and mutation rates are highly correlated with genome duplications in many taxa (12–16). Thus, a critical impediment to the studies examining the role of somatic mutation in plant genome evolution is the lack of knowledge on the number of cell divisions separating a zygote from its gametes, a characteristic termed “cell depth” (17), and how that number changes with vegetative growth. To our knowledge, estimates of gametic cell depth in plants are limited to calculations based on mitotic index and growth rates (5, 18) or total cell numbers and DNA content (19), which provide no information on correlations between cell depth and development.

In contrast to the paucity of knowledge on cell depth, cell lineage analyses have been conducted in multiple plant species. The primary origin of all above-ground tissues of a plant is a dome-like structure named the shoot apical meristem (SAM). These cell-fate analyses have demonstrated that stem cells within the SAM do not have predetermined fates but give rise to organs in a probabilistic way based on their location within this stem-cell niche (20–23). In Arabidopsis, it is estimated that two to four genetically effective cells (GEC) in the dry seed are the progenitors of late rosettes as well as flowers (20, 21, 24). In late-flowering mutants that undergo prolonged vegetative growth, the extra leaves produced are derived from these two to four cells, and not from expanded growth of cells normally responsible for earlier leaves (25). Similar results have been reported in maize mutants that undergo additional vegetative growth (26), suggesting that this is a conserved feature of plant growth. It is generally accepted that over longer periods of growth, divisions of the genetically effective cells will increase the cell depth in the SAM before flowering and gametogenesis, resulting in an increase in the number of somatically acquired mutations transmitted to offspring (6).

Here we report quantitative analysis of germline DNA replications in Arabidopsis and test whether the number of replications increases with prolonged vegetative growth. We used two independent methodologies, based on intrinsic properties of DNA replication. First, we measured loss of telomeric DNA due to the end-replication problem in telomerase mutants, which are incapable of maintaining telomeres. Second, we measured accumulation of mutations due to polymerase misincorporation in mutants deficient for mismatch repair. This analysis showed that the number of DNA replications increased only slightly under long-lived conditions, demonstrating that the cell depth of gametes is not linearly proportional to the vegetative growth period.
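The logic of the two approaches can be sketched as simple ratios. The functions below are an illustration only, not the authors' analysis: they assume a constant loss of telomeric DNA per replication and a constant misincorporation rate per site per replication, and every numerical value is a hypothetical placeholder.

```python
def replications_from_telomere_loss(parent_bp, progeny_bp, loss_per_replication_bp):
    """Estimate replications per generation from telomere shortening.

    Assumes a fixed number of base pairs is lost at each replication
    in a telomerase-deficient background.
    """
    if loss_per_replication_bp <= 0:
        raise ValueError("loss per replication must be positive")
    return (parent_bp - progeny_bp) / loss_per_replication_bp


def replications_from_mutations(new_mutations, error_rate_per_site, genome_sites):
    """Estimate replications from mutations accumulated in one generation.

    Assumes every replication adds error_rate_per_site * genome_sites
    mutations on average in a mismatch-repair-deficient background.
    """
    expected_per_replication = error_rate_per_site * genome_sites
    if expected_per_replication <= 0:
        raise ValueError("expected mutations per replication must be positive")
    return new_mutations / expected_per_replication


# Hypothetical values for illustration only.
print(replications_from_telomere_loss(3000, 2500, 10))
print(replications_from_mutations(10, 1e-9, 1e8))
```

Comparing such estimates between plants grown under short-lived and long-lived conditions is what allows the question of whether cell depth scales with vegetative growth to be tested.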


The fate of a cell in the shoot meristem is dependent on its position
The shoot meristem of dicotyledons, which gives rise to the stalk and leaves, is comprised of three layers:
L1 (outermost layer, 1 cell thick)
L2 (lies beneath L1, 1 cell thick)
L3 (innermost layer)
L1 & L2 comprise the tunica and divide by anticlinal divisions (perpendicular to layer).
L3 cells divide in any plane and make up the corpus.
Cell fate has been determined by generating chimeric tissues.
Chimeras are composed of cells of different genotypes and are made by treatment with radiation or chemicals (colchicine).
Periclinal chimeras have one of the three layers marked differently.
In angiosperms, L1 becomes the epidermis while L2 & L3 produce cortex and vascular tissue.
Occasionally L1 or L2 cells divide periclinally, invade a new layer and adopt the fate of the new layer (regulative).
Mericlinal chimeras, the result of irradiation or mobilization of a transposon, are plants that have an entire sector marked by a clone.
These have been used to produce a probabilistic fate map in maize and Arabidopsis.

Meristem development is dependent on signals from the plant
In maize, the apical meristem gives rise to a number of nodes (16–22) and the tassel.
Isolated meristems do not retain memory of how many nodes have been produced and will generate a complete set of nodes.
The number of nodes is determined by interaction of the meristem with the plant.
Pea seedling meristems, when bisected, will regulate into 2 complete meristems.
Removal of part of a meristem will result in regeneration of a complete meristem.
Removal of a complete meristem causes an incipient meristem (at the base of the leaf) to develop.
Thus growing meristems inhibit the growth of nearby ones.
Leaf positioning (phyllotaxy) involves lateral inhibition and often produces a helical pattern of leaves on a stalk.

Root tissues are produced from root apical meristems by a highly stereotypical pattern of cell divisions
Root meristems resemble shoot meristems but have two important differences:
1) the Root Cap covers the root meristem (protection) &
2) no segmental arrangement as seen with the node-internode-node module.
The root is set up early in the late heart-stage embryo by a set of initial cells.
Each column of root cells originates with a specific cell in the meristem via a specific pattern of cell division.
Nevertheless, this process is under regulatory control, as laser ablation of developing root cells still results in normal tissue.
At the centre of the root meristem is a quiescent centre of cells that do not divide.
There is no obvious segmental arrangement of the root as is seen with node-internode-leaf module of the shoot.


This study was supported by the National Science Foundation (IOS-1546867) to RJS and JS and the National Institutes of Health (R01-GM134682) to RJS and DWH. FJ and RJS acknowledge support from the Technical University of Munich-Institute for Advanced Study funded by the German Excellence Initiative and the European Seventh Framework Programme under grant agreement no. 291763. FJ is also supported by the SFB/Sonderforschungsbereich 924 of the Deutsche Forschungsgemeinschaft (DFG). RJS is a Pew Scholar in the Biomedical Sciences, supported by The Pew Charitable Trusts. BTH was supported by the National Institute of General Medical Sciences of the National Institutes of Health (T32GM007103). The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sequencing in this project was partially supported by a JGI community sequencing project grant CSP1678 to RS and GAT.

A phylogenomic approach reveals a low somatic mutation rate in a long-lived plant

Somatic mutations can have important effects on the life history, ecology, and evolution of plants, but the rate at which they accumulate is poorly understood and difficult to measure directly. Here, we develop a method to measure somatic mutations in individual plants and use it to estimate the somatic mutation rate in a large, long-lived, phenotypically mosaic Eucalyptus melliodora tree. Despite being 100 times larger than Arabidopsis, this tree has a per-generation mutation rate only ten times greater, which suggests that this species may have evolved mechanisms to reduce the mutation rate per unit of growth. This adds to a growing body of evidence that illuminates the correlated evolutionary shifts in mutation rate and life history in plants.

1. Background

Trees grow from multiple meristems which contain stem cells that divide to produce the somatic and reproductive tissues. A mutation occurring in a meristematic cell will be passed on to all resulting tissues, potentially causing an entire branch including leaves, stems, flowers, seeds, and pollen to have a genotype different from the rest of the plant [1,2]. These different genotypes may lead to phenotypic changes, potentially with important consequences for plant ecology and evolution [3–8]. For example, somatic mutations could explain how long-lived plants adapt to changing ecological conditions [9], and are thought to influence long-term variation in the rates of evolution and speciation among plant lineages [10]. Somatic mutations can degrade genetic stocks used in agriculture and forestry [11,12], confer herbicide resistance to weed species, [13] and have been linked to declining plant fitness in polluted areas [14]. However, despite the importance of somatic mutations and recent progress in understanding them [1,2,15–18], there remain significant analytical challenges in inferring somatic mutation rates from sequencing data in plants.

We present a solution to the challenges of measuring the somatic mutation rate that leverages the phylogeny-like structure of the plant itself to estimate the genome-wide somatic mutation rate of the individual. Our strategy has three key features. First, we sequence the full genome of three biological replicates of eight branch tips. Using three biological replicates per branch tip significantly reduces the false-positive rate, because many types of error (e.g. sequencing error or mutations induced during DNA extraction or library preparation) are very unlikely to appear at the same position in all three replicates, making it easy to distinguish these errors from biological signal. Second, our strategy includes an inbuilt positive control, because we can ask whether the phylogenetic tree we reconstruct from the set of putative somatic mutations across the eight branch tips reflects the known physical structure of the tree (i.e. whether phylogeny correctly reconstructs ontogeny, as is expected for plant development in most cases, but see below). Third, the approach allows us to estimate the false-negative rate and the false-discovery rate of our inferences directly from the replicate samples (see below).
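The benefit of requiring agreement across three replicates can be illustrated with a back-of-the-envelope calculation. This sketch assumes that errors arise independently in each replicate and produce the same erroneous call, which is a simplification; the per-replicate error probability and the number of sites are hypothetical values, not figures from the study.

```python
def p_error_in_all_replicates(per_replicate_error, n_replicates=3):
    """Probability that the same site is erroneous in every replicate,
    assuming independent errors (an illustrative assumption)."""
    return per_replicate_error ** n_replicates


def expected_false_positives(n_sites, per_replicate_error, n_replicates=3):
    """Expected number of sites wrongly called in all replicates."""
    return n_sites * p_error_in_all_replicates(per_replicate_error, n_replicates)


# Hypothetical values for illustration only.
n_sites = 5e8
error = 1e-3
print(expected_false_positives(n_sites, error, n_replicates=1))
print(expected_false_positives(n_sites, error, n_replicates=3))
```

Under these assumptions the expected number of false positives falls by a factor of one million when moving from one to three replicates. Errors that are correlated across replicates, such as systematic alignment artefacts, would not be removed this way, which is one reason additional filters are useful.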

We applied this approach to a long-lived yellow box (Eucalyptus melliodora) tree, notable for its phenotypic mosaicism: a single large branch in this individual is resistant to defoliation by Christmas beetles (Anoplognathus spp.; Coleoptera: Scarabaeidae) due to stable differences in leaf chemistry and gene expression [19,20]. We find that the rate of somatic mutation per generation is relatively high, but the rate per metre of growth is surprisingly low in comparison to other species. We suggest potential proximate and ultimate reasons for the wide variation in somatic mutation rates across plants.

2. Material and methods

(a) Field sampling

We used a known mosaic E. melliodora (yellow box). This tree is found near Yeoval, NSW, Australia (−32.75°, 148.65°). We collected the ends of eight branches in the canopy (figure 1). Branches were collected using an elevated platform mounted on a truck and were placed into labelled and sealed polyethylene bags which were immediately buried in dry ice in the field. Within 24 h of collection, the samples were transferred to −80°C until DNA extraction. Simultaneously, we used a thin rope to trace each branch from the tip to the main stem. These rope lengths were measured to determine the lengths of the physical branches of the tree.

Figure 1. The Eucalyptus melliodora individual sequenced in this study. The eight branch tips sampled are shown by numbered green circles with internal nodes of the tree shown as letters in blue circles. Circles with dashed outlines are from the far side of the tree. Pink lines trace the physical branches that connect the sampled tips. The herbivore-resistant branch comprises samples 1–3.

(b) DNA extraction, library preparation, and sequencing

The branches were maintained below −80°C on dry ice and in liquid nitrogen while sub-sampled in the laboratory. From each branch, we selected a branch tip which had at least three consecutive leaves still attached to the stem. From this branch tip, we independently sub-sampled roughly 100 mg of leaf from the ‘tip-side' of the mid-vein on three consecutive leaves using a single hole punch into a labelled microcentrifuge tube containing two 3.5 mm tungsten carbide beads. The sealed tube was submerged in liquid nitrogen before the leaf material was ground in a Qiagen TissueLyser (Qiagen, Venlo, Netherlands) at 30 Hz in 30 s intervals before being submerged in liquid nitrogen again. This was repeated until the leaf tissue was a consistent powder, up to a total of 3.5 min grinding time.

DNA was extracted from this leaf powder using the Qiagen DNeasy Plant Mini Kit (Qiagen, Venlo, The Netherlands), following the manufacturer's instructions. DNA was eluted in 100 µl of elution buffer. DNA quality was assessed by gel electrophoresis (1% agarose in 1 × TAE containing ethidium bromide), and quantity was determined by Qubit Fluorometry (Invitrogen, California, USA) following manufacturer's instructions.

We used a Bioruptor (Diagenode, Seraing (Ougrée), Belgium) to fragment 1 μg of DNA to an average size of 300 bp (35 s on ‘High', 30 s off for 35 cycles at 4°C). The fragmented DNA was purified using 1.6 × SeraMag Magnetic Beads (GE LifeSciences, Illinois, USA) following the manufacturer's instructions. We used Illumina TruSeq DNA Sample Preparation kit (Illumina Inc., California, USA) following the manufacturer's instructions to generate paired-end libraries for sequencing. These libraries were sequenced on an Illumina HiSeq 2500 (Illumina Inc., California, USA) at the Biomolecular Resource Facility at the Australian National University, Canberra.

(c) Creation of pseudo-reference genome

Since there is no available reference genome for E. melliodora, we created a pseudo-reference genome by iterative mapping and consensus calling. To do this, we first mapped all of our reads to version 2.1 of the E. grandis reference genome [21] using NGM [22] and then updated the E. grandis reference genome using bcftools consensus [23]. We iteratively repeated this procedure until we saw only marginal improvement in the number of unmapped reads and reads that mapped with a mapping quality of zero. The alignment originally contained 67 M unmapped reads and 311 M reads that mapped with zero mapping quality, out of a total of 1792 M reads. After the first iteration, the alignment contained 61 M unmapped reads and 349 M reads that mapped with zero mapping quality. After the last iteration, the alignment contained 59 M unmapped reads and 311 M reads that mapped with zero mapping quality. The consensus of this alignment served as the reference for all further downstream analyses.
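The stopping rule of this iterative procedure can be sketched as a loop. The code below does not call NGM or bcftools; the mapping and consensus step is replaced by a stand-in callable, and the function names, the 1% threshold, and the toy read counts are all hypothetical.

```python
def iterate_until_marginal(initial_reference, remap, min_relative_gain=0.01,
                           max_iterations=10):
    """Repeat map-and-consensus rounds until the improvement is marginal.

    remap(reference) must return (updated_reference, n_problem_reads),
    where n_problem_reads counts unmapped plus zero-mapping-quality reads.
    """
    reference = initial_reference
    previous = None
    history = []
    for _ in range(max_iterations):
        reference, problem_reads = remap(reference)
        history.append(problem_reads)
        if previous is not None:
            gain = (previous - problem_reads) / previous if previous else 0.0
            if gain < min_relative_gain:
                break  # improvement has become marginal
        previous = problem_reads
    return reference, history


def make_toy_remap(counts):
    """Stand-in for the real mapping step: replays hypothetical counts."""
    remaining = iter(counts)

    def remap(reference):
        # The "reference" is just an iteration counter in this toy example.
        return reference + 1, next(remaining)

    return remap


final_reference, history = iterate_until_marginal(
    0, make_toy_remap([400, 380, 372, 371]))
print(final_reference, history)
```

In the toy run the loop stops after the fourth round, because the drop from 372 to 371 problem reads is below the 1% threshold.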

(d) Variant calling for positive control

To call variants for the positive control, we mapped each replicate of each branch tip (24 samples in total) to the final pseudo-reference genome using NGM and called genotypes using GATK 4 according to the GATK best practices workflow [24]. This resulted in a full genome alignment of all 24 samples (three replicates of eight branches) and produced an initial set of 9 679 544 potential variable sites, a number which includes all heterozygous sites in the genome.

We then filtered variants to minimize the false-positive rate by retaining only those sites in which: (i) genotype calls were identical within all three replicates of each branch tip (see also electronic supplementary material, §1); (ii) at least one branch tip had a different genotype than the other branch tips; (iii) the site is biallelic, since multiple somatic mutations are likely to be extremely rare; (iv) the total depth across all samples is less than or equal to 500 (i.e. roughly twice the expected depth of 240×), since excessive depth is a signal of alignment issues; (v) the ExcessHet annotation was less than or equal to 40, since excessive heterozygosity at a site is a sign of genotyping errors, particularly in a site that is actually uniformly heterozygous throughout the tree but at which genotyping errors have caused a mutation to be called; and (vi) the site is not in a repetitive region determined by a lift-over of the E. grandis RepeatMasker annotation, as variation in repeat regions is often due to alignment error. This filtering produced a set of 99 high-confidence sites containing putative somatic mutations. The number of mutations that remained after the application of each filter is described in §5 of the electronic supplementary material.
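The six filters can be expressed as a single predicate over a per-site record. This is a minimal sketch, not the authors' code: the `Site` class and its field names are hypothetical and do not correspond to the GATK or VCF APIs; only the thresholds (depth of 500, ExcessHet of 40) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical per-site summary; not a real VCF record type."""
    genotypes: dict      # branch tip -> list of replicate calls, e.g. "0/1"
    n_alleles: int       # number of distinct alleles observed at the site
    total_depth: int     # read depth summed across all samples
    excess_het: float    # ExcessHet annotation value
    in_repeat: bool      # True if the site lies in an annotated repeat


def passes_filters(site, max_depth=500, max_excess_het=40.0):
    # (i) all replicates of each branch tip must agree
    if any(len(set(calls)) != 1 for calls in site.genotypes.values()):
        return False
    # (ii) at least one branch tip must differ from the others
    if len({calls[0] for calls in site.genotypes.values()}) < 2:
        return False
    # (iii) biallelic sites only
    if site.n_alleles != 2:
        return False
    # (iv) reject excessive depth
    if site.total_depth > max_depth:
        return False
    # (v) reject excessive heterozygosity
    if site.excess_het > max_excess_het:
        return False
    # (vi) reject sites in repetitive regions
    return not site.in_repeat


example = Site(
    genotypes={"M1": ["0/1"] * 3, "M2": ["0/0"] * 3},
    n_alleles=2, total_depth=240, excess_het=3.0, in_repeat=False)
print(passes_filters(example))
```

Applying the predicate to every candidate site and keeping those that return True mirrors the retention logic described above.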

(e) Positive control

Using the set of 99 high-confidence putative somatic mutations, we used the Phangorn package in R [25] to calculate the parsimony score of all 10 395 possible phylogenetic trees of eight taxa. This estimates the number of somatic mutations that would be required to explain each of the 10 395 phylogenetic trees, using the Fitch algorithm implemented in the Phangorn package. Of these trees, three had the maximum-parsimony score of 78. One of these three trees matched the topology of the physical tree (figure 2).
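To make the parsimony step concrete, the sketch below counts tree topologies and scores one tree with the Fitch algorithm. It is an illustrative Python re-implementation, not the Phangorn code used in the study; the nested-tuple tree representation and the function names are assumptions made for this example. The count of 10 395 is what the double factorial (2n − 5)!! gives for unrooted binary trees of n = 8 taxa.

```python
def count_unrooted_trees(n_taxa):
    """Number of unrooted binary topologies, (2n - 5)!!, for n_taxa >= 3."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count


def fitch(tree, states):
    """Fitch pass over one site.

    tree   -- a tip name (str) or a pair (left, right) of subtrees
    states -- dict mapping tip name -> observed base at this site
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children can agree, so no change is needed at this node.
        return shared, left_cost + right_cost
    # Children disagree, so one change is counted.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum Fitch costs over an alignment given as a list of per-site dicts."""
    return sum(fitch(tree, site)[1] for site in alignment)
```

Scoring every candidate topology with `parsimony_score` and keeping the minimum is the exhaustive search described above; for eight taxa that search is small enough to enumerate completely.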

Figure 2. Phylogenetic trees reconstructed from somatic mutations resemble the physical structure of the tree more closely than expected by chance. (a) The PD between the physical tree (figure 1) and all 10 395 possible phylogenetic trees of eight taxa is shown as a histogram. A tree with the same topology as the physical tree will have a PD of 0. The solid red line represents the boundary of the smallest 5% of the distribution of PDs, such that a tree with a PD lower than this line is more similar to the physical tree than expected by chance. All of the maximum-parsimony trees (dashed red lines) and the one maximum-likelihood tree (solid blue line) are more similar to the physical tree than expected by chance. (b) A side-by-side comparison of the physical tree (left, branch lengths in metres) and the maximum-likelihood tree (right, branch lengths in substitutions per site) inferred with the JC model. Letters on the nodes of the physical tree (left) correspond to the same letters of internal nodes in figure 1. Numbers on the maximum-likelihood tree (right) are bootstrap percentages. There is a single difference between the two trees: the inferred tree groups samples M8 and M5 together with low bootstrap support (44%), which is a grouping that does not occur in the physical tree.

Next, we calculated the path difference (PD) between all 10 395 trees and the physical tree topology. The PD measures differences between two phylogenetic tree topologies [26] by comparing the differences between the path lengths of all pairs of taxa. Here we use the variant of the PD that treats all branch lengths as equal, because we are interested only in topological differences between trees, not branch length differences. Comparing all 10 395 trees to the physical tree topology provides a null distribution of PDs between all trees and the physical tree topology, which we can use to ask whether each of the three maximum-parsimony trees is more similar to the physical tree topology than would be expected by chance. To do this, we simply ask whether the PD of each of the three observed maximum-parsimony trees falls within the lower 5% of the distribution of PDs from all 10 395 trees. This was the case for all three maximum-parsimony trees (p < 0.001 in all cases; figure 2), suggesting that our data contain biological signal which renders the phylogenetic trees reconstructed from somatic mutations more similar to the physical tree than would be expected by chance.
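The PD comparison can be sketched as follows. This is a simplified illustration, assuming trees are written as rooted nested tuples and that path length means the number of edges between two tips in that rooted form; it is not the implementation used in the study, and the function names are hypothetical. The distance is taken as the square root of the summed squared differences in path length over all tip pairs.

```python
from itertools import combinations
from math import sqrt


def root_paths(tree, prefix=()):
    """Map each tip to the sequence of child indices leading to it from the root."""
    if isinstance(tree, str):
        return {tree: prefix}
    paths = {}
    for index, child in enumerate(tree):
        paths.update(root_paths(child, prefix + (index,)))
    return paths


def path_lengths(tree):
    """Number of edges separating every pair of tips (all edges weighted equally)."""
    paths = root_paths(tree)
    lengths = {}
    for a, b in combinations(sorted(paths), 2):
        shared = 0
        for x, y in zip(paths[a], paths[b]):
            if x != y:
                break
            shared += 1
        lengths[(a, b)] = len(paths[a]) + len(paths[b]) - 2 * shared
    return lengths


def path_difference(tree1, tree2):
    """Topological path difference between two trees on the same tips."""
    d1, d2 = path_lengths(tree1), path_lengths(tree2)
    return sqrt(sum((d1[pair] - d2[pair]) ** 2 for pair in d1))


def lower_tail_fraction(observed, null_values):
    """Fraction of the null distribution that is at most the observed value."""
    return sum(value <= observed for value in null_values) / len(null_values)
```

With the PD of every possible tree against the physical tree as `null_values`, a candidate tree is closer than expected by chance at the 5% level when `lower_tail_fraction` returns a value of 0.05 or less.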

(f) Variant calling for estimating the rate and spectrum of somatic mutations

Using the physical tree topology to define the relationship between samples, we called somatic mutations using DeNovoGear's dng-call method [27]. Model parameters were estimated from 3-fold degenerate sites in our NGM alignment, via VCFs generated by bcftools mpileup and bcftools call with --pval-threshold = 0. We estimated maximum-likelihood parameters using the Nelder–Mead numerical optimization algorithm implemented in the R package dfoptim. We then called genotypes using the GATK best practices workflow as above, but with the standard-min-confidence-threshold-for-calling argument set to 0, causing the output VCF to contain every potentially variable site in the alignment. Thus, we used GATK to generate high-quality pileups from our alignments. These pileups were then analysed by dng-call to identify (i) heterozygous sites and (ii) de novo somatic mutations. Since successful haplotype construction in a region indicates a high-quality alignment, we used Whatshap 0.16 [28] to generate haplotype blocks from the heterozygous sites.

Next, we filtered our de novo variant set to remove potential false positives. We removed variants that: (i) were on a haplotype block with a size less than 500 nucleotides (among other things, this filter will remove many putative variants that fall in long repeat regions); (ii) were within 1000 nucleotides of another de novo variant (indicative of alignment issues such as might occur in repeats and other regions); (iii) had a log likelihood of the data (LLD) score less than −5 (indicative of poor model fit); and (iv) had a de novo mutation probability (DNP) score less than 0.99999 (retaining only the highest confidence variants). This produced a final variant set of 90 variants.

(g) Estimation of the false-negative rate

To estimate the number of mutations that we were likely to have filtered out in our variant calling pipeline, we used the method of Ness et al. [29], adapted to the current phylogenetic framework. Specifically, we randomly selected 14 000 sites from the first 11 scaffolds of the pseudo-reference genome and randomly assigned 1000 of these sites to each of the 14 branches on the tree. For each of these sites, we induced in silico mutations into the raw reads with a three-step procedure. We first estimated the observed genotype at the root using DeNovoGear call at each site. We then chose a mutant genotype by mutating one of the alleles to a randomly chosen different base using a transition/transversion ratio of 2, reflecting the observed transition/transversion ratio of eucalypts. We edited the raw reads as follows: for each mutation, we defined the samples to be mutated as all of those samples that descend from the branch on which the in silico mutation occurred. For example, an in silico mutation occurring on branch B → C in figure 1 would affect all three replicates of samples 1, 2 and 3. We then edited the reads that align to the site in question to reflect the new mutation, depending on whether the reference genotype was homozygous or heterozygous. For homozygous sites, we selected the number of reads to mutate by generating a binomially distributed random number with a probability of 0.5 and a number of observations equal to the number of reads with the reference genotype. We then randomly selected that number of reads carrying the reference allele, changed them to the mutant allele, and edited the raw reads accordingly. For a heterozygous site, we edited the reads to replace all occurrences of the reference allele with the mutant allele. The result of this procedure is the generation of a new set of raw fastq files, which now contain information on 1000 in silico mutations for every branch in the physical tree.
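The read-editing rule for a single site can be sketched as below. This is an illustration of the sampling logic only, under the assumption that reads at a site are summarized as a list of allele calls; the function names are hypothetical, and the sketch does not touch fastq files or reproduce the authors' tooling.

```python
import random

TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}


def choose_mutant(ref_base, rng, ts_tv_ratio=2.0):
    """Pick a different base, with transitions ts_tv_ratio times as likely
    as the two transversions combined."""
    transition = TRANSITIONS[ref_base]
    transversions = [b for b in "ACGT" if b not in (ref_base, transition)]
    weights = [ts_tv_ratio, 0.5, 0.5]
    return rng.choices([transition] + transversions, weights=weights)[0]


def reads_to_mutate(read_alleles, ref_allele, homozygous, rng):
    """Return the indices of reads whose allele should be rewritten.

    Homozygous site: draw the number of reads from Binomial(n, 0.5), where n
    is the number of reads carrying the reference allele, then pick that many
    of them at random. Heterozygous site: rewrite every read carrying the
    allele being mutated.
    """
    carriers = [i for i, allele in enumerate(read_alleles) if allele == ref_allele]
    if not homozygous:
        return carriers
    n_mutated = sum(rng.random() < 0.5 for _ in carriers)
    return sorted(rng.sample(carriers, n_mutated))
```

Passing an explicit `random.Random(seed)` as `rng` keeps the simulated mutations reproducible across reruns of the pipeline.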

To determine the false-negative rate of the variant calling pipeline, we re-ran the entire pipeline using the edited reads and recorded how many of the 14 000 in silico mutations were recovered by the pipeline. This number was 4193, suggesting that our false-negative rate is 70.05%. In other words, we expect that our empirical analysis recovered roughly three in 10 true mutations, because our power is limited in part by attempts to filter out false positives, which also removes a number of true positives.

(h) Estimation of the false-discovery rate

To determine the false-discovery rate of the variant calling pipeline, we simulated random trees of our samples (where each of the eight branches is represented by three tips that denote the three replicates of that branch) by shuffling the tip labels until the tree had a maximal Robinson–Foulds distance from the original tree. This 24-taxon tree shares no splits with the original 24-taxon tree, so any phylogenetic information should be removed. We simulated 100 such trees and called variants using the pipeline above, but assuming that these trees were the physical tree, and ignoring any sites we had previously called as variable. Thus, any variants called by the pipeline must be false positives. We recovered 11 false positive calls over 100 simulations (i.e. 0.11 false-positive mutations per simulation), indicating our false-discovery rate is approximately 0.12%. We calculated the false-discovery rate only once, after the details of the pipeline were finalized, to avoid overfitting our pipeline to artefactually reduce the false-discovery rate.
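The two error rates reported in sections (g) and (h) follow from simple ratios of the counts given in the text. The short sketch below reproduces that arithmetic; the function names are illustrative, and the figure of 90 observed calls is the final variant set reported in section (f).

```python
def false_negative_rate(n_recovered, n_simulated):
    """Fraction of in silico mutations that the pipeline failed to recover."""
    return 1.0 - n_recovered / n_simulated


def false_discovery_rate(n_false_calls, n_permutations, n_observed_calls):
    """Expected false calls per run, and that number as a share of observed calls."""
    per_run = n_false_calls / n_permutations
    return per_run, per_run / n_observed_calls


fnr = false_negative_rate(4193, 14000)               # about 0.7005
per_run, fdr = false_discovery_rate(11, 100, 90)     # 0.11 and about 0.0012
print(f"false-negative rate: {fnr:.2%}")
print(f"false positives per run: {per_run:.2f}, false-discovery rate: {fdr:.2%}")
```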

3. Results and discussion

(a) Field sampling and sequencing

We selected eight branch tips that maximized the intervening physical branch length on the tree (figure 1), reasoning that this would increase our power by maximizing the number of sampled cell divisions and thus somatic mutations. We performed independent DNA extractions from three leaves from each branch tip, prepared three independent libraries for Illumina sequencing and sequenced each library to 10× coverage (assuming a roughly 500 Mbp genome size, as is commonly observed in Eucalyptus species [30]) using 100 bp paired-end sequencing on an Illumina HiSeq 2500. Quality control of the sequence data verified that each sample was sequenced to approximately 10× coverage and that each branch tip was therefore sequenced to approximately 30× coverage.

(b) Positive control analysis

We first performed a positive control to confirm that the phylogeny of a set of high-confidence somatic variants matches the physical structure of the tree. This approach relies on being able to infer the ontogeny of the tree with sufficient accuracy that a valid comparison can be made between the ontogeny of the tree and a phylogeny generated from that tree's somatic variants. Documenting a plant's ontogeny with sufficient accuracy may not be possible for all plant species or individuals. Nevertheless, the physical structure of the tree we studied was clear (figure 1), and although Eucalyptus trees are known to frequently lose branches, branch loss and regrowth should not affect the correlation between ontogeny and phylogeny provided that sufficient mutations accumulate during cell replication. To perform the phylogenetic positive control, we created a pseudo-reference genome using our data to update the genome of E. grandis (see methods). We then called variants using GATK [31] in all three replicates of all eight branch tips and used a set of strict filters (see methods and supplementary information) designed to avoid false-positive mutations in order to arrive at an alignment of 99 high-confidence somatic variants. To find the phylogenetic trees that best explain this alignment, we calculated the alignment's parsimony score on all 10 395 possible phylogenetic trees of eight samples. Parsimony is an appropriate method here because we do not expect more than one mutation to occur at any single site on any single branch of the E. melliodora tree. We then asked whether the three phylogenetic trees with the most parsimonious scores were more similar to the physical structure of the tree than would be expected by chance. To do this, we calculated the PD between the structure of the physical tree and each of the three most parsimonious trees. 
We then compared these differences to the null distribution of PDs generated by comparing the structure of the physical tree to all 10 395 possible trees of eight samples (figure 2a). All three maximum-parsimony trees were significantly more similar to the physical tree than would be expected by chance (p < 0.001 in all cases; figure 2a, dashed red lines). Furthermore, one of the most parsimonious trees is identical to the structure of the physical tree, and a maximum-likelihood tree calculated from the same data shows just one topological difference compared to the structure of the physical tree, in which sample 8 is incorrectly placed as sister to sample 5, but with low bootstrap support of 44% (figure 2a, blue line; figure 2b). As would be expected if plants accumulate somatic mutations as they grow, there is a significant correlation between the branch lengths of the physical tree measured in metres and the branch lengths of the maximum-parsimony tree of the same topology measured in number of somatic mutations (linear model forced through the origin: R² = 0.82, p < 0.001; see also electronic supplementary material, §4). Notably, while various factors such as the difficulty of correctly inferring plant ontogeny may limit the utility of a phylogenetic positive control such as we present here (i.e. may produce false-negative results in which the structure of the tree appears, erroneously, to differ from the phylogeny of the sequenced genomes), it is unlikely that these factors would erroneously cause a close match between the physical structure of the tree and a phylogeny generated from the genomes of eight branches of that tree (i.e. a false positive). We therefore conclude that these analyses demonstrate that the phylogeny recovered from the genomic data matches the physical structure of the tree and confirm that there is a strong biological signal in our data.

(c) Estimation of the somatic mutation rate

We next developed a full maximum-likelihood framework that extends the existing models in DeNovoGear [27] to detect somatic mutations in a phylogenetic context and used this framework to estimate the full rate and spectrum of somatic mutations in the individual E. melliodora (see Material and methods). This method improves on the approach we used in our positive control, above, because it increases our power to detect true somatic mutations and avoid false positives by assuming that the phylogenetic structure of the samples follows the physical structure of the tree, an assumption that is validated by the analyses above. It also makes better use of the replicate sampling design than the method we use for our positive control, above, by directly modelling the expected variation in sequencing data across our three biological replicates under the expectation that all three replicates were sequenced from a single underlying genotype (see methods and electronic supplementary material). Using this framework, we identified 90 high-confidence somatic variants.

Of the 90 high-confidence variants we identified, 20 were in genes. Of these, six were in coding regions, with five non-synonymous mutations and one synonymous mutation. The small sample size of synonymous and non-synonymous mutations means that we cannot provide a meaningful estimate of the ratio of non-synonymous to synonymous somatic mutations, although such an estimate would help to understand the extent to which somatic mutations may be under selection. We detected seven mutations on the branch that separates the herbivore-resistant samples from the other samples (branch B → C, figure 1). Although we lack the functional evidence to determine whether any of these mutations are directly involved in the resistance phenotype, two of the mutations occur near genes that are plausible candidates for further investigation. One mutation occurs near Eucgr.C00081, which is a zinc-binding CCHC-type protein belonging to a small protein family known to bind RNA or ssDNA in Arabidopsis thaliana and thus potentially involved in gene expression regulation. Another mutation occurs near Eucgr.I01302, an acid phosphatase that may have as a substrate phosphoenol pyruvate, and therefore may be involved in pathways associated with the production of various secondary metabolites, including those identified in a recent GWAS study in a closely related eucalypt [32].

We used the replicate sampling design of our analysis to estimate the false-negative rate and the false-discovery rate of our approach. It is necessary to estimate both the number of false-negative mutations and the number of false-positive mutations in order to estimate a somatic mutation rate. The former allows one to correct for the number of somatic mutations which a pipeline has failed to detect, while the latter allows one to correct for the number of somatic mutations which a pipeline has erroneously inferred. We estimated the false-negative rate by creating 14 000 in silico somatic mutations in the raw reads [33], comprising 1000 in silico mutations for each of the 14 branches of the physical tree, and measuring the recovery rate of these in silico mutations using our maximum-likelihood approach. We were able to recover 4193 of the in silico mutations, suggesting that our recovery rate is 29.95%, and thus our false-negative rate is 70.05%. This false-negative rate was similar across all of the 14 branches in the tree (see electronic supplementary material, §2). Our ability to recover mutations differs substantially between repeat regions and non-repeat regions: we recover 40% of the simulated mutations in non-repeat regions, but just 13% of the simulated mutations in repeat regions (which make up roughly 40% of the genome). This difference is explained primarily by the stringent filters we use, that lead us to screen out many putative somatic mutations in repeat regions. We then estimated the number of false-positive mutations in our data, and hence the false-discovery rate (the percentage of the observed mutations that are false positives) by repeating our detection pipeline after permuting the labels of samples and replicates to remove all phylogenetic information in the data, and only considering sites that we had not previously identified as variable (see methods). 
By removing phylogenetic information and previously identified variable sites, we can be sure that any mutations detected by this pipeline are false positives. Across 100 such permutations, we detected 11 false-positive mutations in total, suggesting that our pipeline generates 0.11 false-positive variant calls per experiment, and that the false-discovery rate for our analysis is 0.12%.

Based on these analyses, we can estimate the mutation rate per metre of physical growth and per year. We estimate that the true number of somatic mutations in our samples is 300 (calculated as (90 high-confidence mutations minus 0.11 false-positive mutations)/the recovery rate of 0.2995). Since we sampled a total of 90.1 m of physical branch length, this equates to 3.3 somatic mutations per diploid genome per metre of branch length, or 2.75 × 10⁻⁹ somatic mutations per base per metre of physical branch length. Although the exact age of this individual is unknown and difficult to estimate (it lives in a temperate climate and does not produce growth rings), it is almost certainly between 50 and 200 years old. Given that the physical branch length connecting each sampled branch tip to the ground varies between 8.4 m and 20.3 m, we estimate that the mutation rate per base per year for a single apical meristem lies in the range 1.16 × 10⁻¹⁰ to 1.12 × 10⁻⁹ (i.e. 8.4 × 2.75 × 10⁻⁹/200 to 20.3 × 2.75 × 10⁻⁹/50). It is important to note that it remains unclear whether mutations in growing plants accumulate linearly with the amount of physical growth. Indeed, evidence is accumulating that in at least some (and perhaps most) species, mutations may accumulate primarily at branching events rather than during elongation of individual branches [34,35]. If this is the case, then the correlation we observe between the physical branch length and the number of inferred somatic mutations (see above, and electronic supplementary material, §4) may be due to a correlation between the physical length of a branch and the number of branching events that occurred along that branch during the plant's development. 
It is not possible to directly estimate the number of branching events along each branch in the individual tree we used in this study, because we expect that the tree will have regularly lost branches throughout its life, leaving no accurate record of the number of branching events.
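The rate estimates in this paragraph can be retraced step by step from the quantities quoted in the text. In the sketch below, the per-base rate of 2.75 × 10⁻⁹ is taken as given rather than re-derived, because the number of bases used as the denominator is not stated here; the variable `implied_diploid_bases` simply back-calculates what that denominator would have to be (roughly 1.2 × 10⁹) and should be read as an inference, not a reported value.

```python
# Quantities quoted in the text.
observed_calls = 90
false_positives_per_run = 0.11
recovery_rate = 0.2995
sampled_branch_length_m = 90.1
rate_per_base_per_m = 2.75e-9   # taken from the text, not re-derived

# Correct the observed count for false positives and missed mutations.
true_mutations = (observed_calls - false_positives_per_run) / recovery_rate
mutations_per_m = true_mutations / sampled_branch_length_m

# Back-calculated denominator implied by the two reported per-metre rates.
implied_diploid_bases = mutations_per_m / rate_per_base_per_m


def rate_per_base_per_year(path_length_m, age_years):
    """Yearly rate for one meristem, assuming mutations scale with path length."""
    return path_length_m * rate_per_base_per_m / age_years


lowest = rate_per_base_per_year(8.4, 200)    # shortest path, oldest plausible age
highest = rate_per_base_per_year(20.3, 50)   # longest path, youngest plausible age
print(f"{true_mutations:.0f} mutations, {mutations_per_m:.1f} per metre")
print(f"per base per year: {lowest:.2e} to {highest:.2e}")
```

The bounds pair the shortest path with the oldest age and the longest path with the youngest age, so they bracket the yearly rate under the linear-accumulation assumption that the paragraph itself flags as uncertain.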

(d) What drives differences in somatic mutation rates among species?

With some additional assumptions, it is also possible to estimate the mutation rate per generation and to compare this to estimates from other plants. The average height of an adult E. melliodora individual is between 15 m and 30 m [36], so if we assume that all somatic mutations are potentially heritable (about which there is limited evidence [1] and ongoing discussion [37]), we can estimate the per-generation mutation rate. To do this, we assume that a typical seed will be produced from a branch that has experienced 15–30 m of linear growth from the seed [36], and that mutations will have accumulated along that branch at 2.75 × 10⁻⁹ somatic mutations per base per metre of physical branch length, estimated above. We therefore estimate that the heritable somatic mutation rate per generation is between 4.13 × 10⁻⁸ and 8.25 × 10⁻⁸ mutations per base. For comparison, the roughly 20 cm tall Arabidopsis thaliana has a per-generation mutation rate of 7.1 × 10⁻⁹ mutations per base [38]. To the extent that such a comparison is accurate, which will be somewhat limited because the former estimate considers only somatic mutations and the latter considers all heritable mutations including those caused during meiosis, we can then compare these estimates. Comparing the estimates suggests that despite being roughly 100 times taller than Arabidopsis thaliana, the per-generation mutation rate of E. melliodora is just approximately 10 times higher, which is achieved by a roughly fifteen-fold reduction in the mutation rate per physical metre of plant growth.
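The per-generation comparison is again a matter of scaling the per-metre rate by plant height. The sketch below uses only the figures quoted in the paragraph; treating Arabidopsis as exactly 0.20 m tall is an assumption made for the arithmetic, so the resulting fold changes are approximate and agree with the rounded figures in the text only to within that approximation.

```python
rate_per_base_per_m = 2.75e-9          # E. melliodora, from the text
arabidopsis_per_generation = 7.1e-9    # from the text
arabidopsis_height_m = 0.20            # "roughly 20 cm", taken as exact here


def per_generation_rate(height_m):
    """Heritable rate per base if all growth-related mutations are inherited."""
    return height_m * rate_per_base_per_m


low_gen = per_generation_rate(15)      # 4.125e-08
high_gen = per_generation_rate(30)     # 8.25e-08

# Fold differences relative to Arabidopsis.
height_ratio = (15 / arabidopsis_height_m, 30 / arabidopsis_height_m)
per_generation_ratio = (low_gen / arabidopsis_per_generation,
                        high_gen / arabidopsis_per_generation)
arabidopsis_per_m = arabidopsis_per_generation / arabidopsis_height_m
per_metre_reduction = arabidopsis_per_m / rate_per_base_per_m
```

Under these inputs the height ratio spans 75 to 150, the per-generation ratio spans roughly 6 to 12, and the per-metre reduction is roughly 13-fold.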

Our work adds to a growing body of evidence that low somatic mutation rates per unit of growth are a general feature of many large plant species [1,2,15,16,18]. For example, a recent study of the Sitka spruce estimated a per-generation somatic mutation rate of 2.7 × 10⁻⁸, with confidence intervals that overlap ours [15]. While this per-generation rate is very similar to the one we estimate here, the rate of somatic mutation per metre of growth is around an order of magnitude lower in the Sitka spruce than our estimate for E. melliodora (2.75 × 10⁻⁹ somatic mutations per base pair per metre of growth for E. melliodora estimated here, versus 3.5 × 10⁻¹⁰ somatic mutations per base pair per metre of growth for Sitka spruce, estimated by dividing the per-generation mutation rate of 2.7 × 10⁻⁸ mutations per base by the average height of studied individuals of 76 m [15], an appropriate calculation because the somatic mutation rate was estimated from paired samples taken from the base and the top of a collection of individual trees). Lower somatic mutation rates per unit of growth in larger plants may be the result of selection for reduced somatic mutation rates in response to the accumulation of increased genetic load in larger individuals [1,2,10,15,39–41]. This pattern could also explain why larger plants tend to have lower average rates of molecular evolution than their smaller relatives [10,42].
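The Sitka spruce comparison uses the same conversion in reverse, dividing a per-generation rate by tree height to obtain a per-metre rate. A brief sketch of that calculation, using only the numbers quoted above:

```python
sitka_per_generation = 2.7e-8      # mutations per base per generation [15]
sitka_height_m = 76                # average height of the studied trees [15]
melliodora_per_m = 2.75e-9         # estimated in this study

sitka_per_m = sitka_per_generation / sitka_height_m
fold_difference = melliodora_per_m / sitka_per_m
print(f"Sitka spruce: {sitka_per_m:.2e} per base per metre")
print(f"E. melliodora rate is {fold_difference:.1f} times higher per metre")
```

The ratio comes out a little under eight, which is what the text rounds to "around an order of magnitude".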

Several possible mechanisms might account for a reduction in accumulation of mutations per unit of growth in larger plants. Selection may favour reduction in the mutation rate per cell division through enhanced DNA repair to reduce the lifetime mutation risk. Alternatively, it may be that the reduction in the mutation rate is due to slower cell division. For example, plant meristems contain a slowly dividing population of cells in the central zone of the apical meristem, and these cells are known to divide more slowly in trees than in smaller plants [43]. Indeed, the rate of cell division in the central zone is so low that one estimate put the total number of cell divisions per generation in large trees as low as one hundred [43]. Regardless of the underlying mechanism, the surprisingly low rates of somatic mutation in large plants reported here and elsewhere suggest an emerging picture in which there is a strong link between the somatic mutation rates and life history across the plant kingdom. Longevity and size are two aspects of plant life history likely to be of central importance to the evolution of somatic mutation rates. Larger plants may suffer from a higher accumulation of somatic mutations because of the necessity for additional cell divisions. Plants that live longer may suffer from a higher accumulation of somatic mutations because of the accumulation of DNA damage over time and/or increased cell turnover in long-lived tissues. The relative importance of these two factors may differ among clades, species, and individual tissues and is likely to also depend on the balance between DNA damage and repair between cell divisions [44], the accuracy of DNA replication, cell size, and the rate of cell division. We hope that the approach we describe here will help in further understanding how these and other factors contribute to the accumulation or avoidance of somatic mutations in plants.
