Some fundamental cellular activities are directed by RNA-based machines. The RNAs in these machines adopt distinct three-dimensional structures in order to facilitate their function, much in the way that proteins fold into structures. For example, ribosomes, the large machines that produce our proteins, are mostly made of complex RNA structures. Other structural RNAs involved in fundamental cellular functions include the transfer RNAs, which recruit the building blocks that make proteins, and the spliceosomal RNAs, which are part of the machine that splices introns out of messenger RNAs. There are even RNAs found in large RNA-protein complexes whose function we know little about, such as the Vault RNAs.

A signature of RNA structure that is clearly identifiable computationally is the co-variation of the nucleotide bases that come into contact (“are base-paired”) in the structure. Because pairs involving A:U and G:C (and to some extent G:U) are the most stable, two positions that are structurally paired tend to only allow mutations through evolution that preserve the RNA base pairing rules. Analyses of this kind were fundamental in computationally elucidating the base-paired structure of ribosomal RNAs to great precision before the three-dimensional structure was determined by crystallography. Thus, we expect that biologically relevant RNA structures would have the telltale signature of sequence covariation.
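The covariation signature described above can be illustrated with a small sketch. It assumes a toy four-sequence alignment and uses mutual information as the covariation measure; both are illustrative assumptions, not taken from the work discussed here, and the helper names `column` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, k):
    """Return the residues found at position k of every aligned sequence."""
    return [seq[k] for seq in alignment]


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    High values mean that knowing the base at one position tells you
    a lot about the base at the other, as expected for a base pair.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    i_counts = Counter(col_i)
    j_counts = Counter(col_j)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment (an assumption for illustration): positions 0 and 5
# always form a Watson-Crick pair (G:C, C:G, A:U, U:A), while the
# positions in between are fully conserved.
toy_alignment = [
    "GAAAAC",
    "CAAAAG",
    "AAAAAU",
    "UAAAAA",
]

paired = mutual_information(column(toy_alignment, 0), column(toy_alignment, 5))
unpaired = mutual_information(column(toy_alignment, 0), column(toy_alignment, 1))
print(f"paired columns:   {paired:.2f} bits")
print(f"unpaired columns: {unpaired:.2f} bits")
```

In this toy case the paired columns reach the maximum of 2 bits for a four-letter alphabet, while a conserved column carries no covariation at all. Real alignments are noisier, which is why a raw score alone says little about significance.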

Over the last decade or so, now widely used RNA sequencing techniques have given us the ability to take a census of all RNAs (whether structural or not) present inside cells. These RNA-seq experiments have told us a consistent tale: there are many more RNAs lurking inside a cell than one might have anticipated based on our best knowledge of the number of genes. Many of these unanticipated RNAs were called long noncoding RNAs (lncRNAs), to distinguish them from the numerous short regulatory RNAs in cells that are also being studied intensively, such as microRNAs. Why are there thousands of lncRNAs? How many of them have a function, as opposed to being biochemical noise in the system? How many of them are structural, as opposed to functioning as linear RNAs, or being protein-coding mRNAs that we’ve missed?

It is not an easy task to elucidate what even one lncRNA does. Perhaps the best studied lncRNA is Xist, which is responsible for the inactivation of one of the two X chromosomes in mammalian female cells. Xist has been known for more than 20 years, but its mechanism of action is still largely unknown. We do not even know whether Xist functions through a conserved RNA structure or through its conserved linear RNA sequence. If we could show that Xist does have a conserved RNA structure, knowledge of that structure would help us design more focused experiments to probe its molecular mechanism.

A lot of effort has been put into discerning possible base-paired structures in several lncRNAs. But even random sequences can be folded into plausible-looking structures, and RNA folding predictions are largely unable to distinguish structural RNAs from nonstructural RNAs, even with the most recent methods that use a combination of computational and experimental techniques. A key question is whether a proposed lncRNA structure shows the telltale sequence covariation evidence that the structure is important enough to be conserved through evolution. The computational tools used by lncRNA researchers have been incapable of telling when a covariation signal is strong enough to be evidence for an RNA structure, as opposed to background noise caused by fluctuations in sequence evolution.

In this work, Rivas and colleagues present a computational tool that tests for an evolutionarily conserved RNA structure by rigorously quantifying the statistical strength of observed pairwise covariations, while remaining fast and easy for biologists to use on any RNA alignment they want to study. The new method accounts for one of the most confounding sources of false covariation signals: the common origin of biological sequences itself produces covariations that can be mistaken for covariations due to RNA structure. The method, R-scape (RNA Structural Covariation Above Phylogenetic Expectation), pinpoints the covariations that can truly be considered statistical evidence for a conserved RNA structure.
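The general idea of asking whether a covariation score rises above a background expectation can be sketched with a simple permutation test. This is a deliberately naive stand-in and not R-scape's procedure: shuffling one column ignores the shared phylogeny of the sequences, which is exactly the confound the method described above is designed to handle. The function names, the mutual information score, and the toy columns are all assumptions made for illustration.

```python
import random
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_i)
    pair_counts = Counter(zip(col_i, col_j))
    i_counts = Counter(col_i)
    j_counts = Counter(col_j)
    return sum(
        (c / n) * log2((c / n) / ((i_counts[a] / n) * (j_counts[b] / n)))
        for (a, b), c in pair_counts.items()
    )


def permutation_p_value(col_i, col_j, n_shuffles=1000, seed=0):
    """Empirical p-value for the covariation score of two columns.

    The null distribution is built by shuffling col_j, which keeps the
    base composition of each column but breaks any pairing between
    them. NOTE: this naive null ignores phylogeny (an assumption of
    this sketch), so it will overstate significance on real data.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    at_least_as_high = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            at_least_as_high += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_high + 1) / (n_shuffles + 1)


# Toy columns (hypothetical): eight sequences in which the two
# positions always form a Watson-Crick pair.
pos_a = list("GGCCAAUU")
pos_b = list("CCGGUUAA")
conserved = list("AAAAAAAA")

print("covarying pair p-value:", permutation_p_value(pos_a, pos_b))
print("conserved column p-value:", permutation_p_value(pos_a, conserved))
```

A perfectly covarying pair is rarely matched by shuffled columns and gets a small p-value, while a fully conserved column cannot covary and gets a p-value of 1. A phylogeny-aware null, as described in the article, would replace the shuffling step.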

Interestingly, for structures that had previously been proposed for three of the best studied lncRNAs (Xist, HOTAIR, and steroid receptor activator RNA), reassessment with R-scape shows that none of the covariation evidence used to support them is statistically significant above background. These results reopen the debate about whether these three lncRNAs have a function that depends on a conserved structure, and they will prompt reexamination of the evidence for conserved structure in other lncRNAs. R-scape contributes an important new computational tool for studying lncRNAs.

Read more in Nature Methods, PDF

Elena Rivas (l) and Sean Eddy
