During development, multipotent cells make a series of cell fate decisions, eventually leading to the various distinct cell types in the body. With the advent of high-throughput measurement techniques, it is now possible to measure the expression levels of all genes in individual cells. The goal is to extract from these data the intermediate cell states, the lineage decisions, and the key transcription factors that control the sequence of decisions.
Analyzing high-dimensional data and extracting informative patterns is extremely challenging. Perhaps for this reason, the tasks of identifying cell types, inferring lineage relationships, and extracting the molecular circuits have been treated in the literature as independent challenges. Surprisingly, we discovered that solving all three inference problems simultaneously was much more effective. In particular, we could not only identify cell states and lineage transitions but also extract the key master regulators that control lineage decisions.
We developed a statistical framework to simultaneously infer cell fate lineage trees and identify the corresponding key cell fate transition genes. After validating our algorithm on data from immune and intestinal development, we applied it to single-cell data from human in vitro cortical development generated by collaborators at the Allen Institute for Brain Science. Our analysis revealed an early split between forebrain and mid- or hindbrain progenitors and identified key transcription factors likely to be important in driving this decision. These results were validated in a companion manuscript published in Cell Stem Cell.
As molecular biology continues to generate more high-dimensional data, it is becoming crucial to find computational techniques to bridge the gap between complex data and mechanistic insight. Building this bridge is a fundamental challenge of computational biology and our study is a step in this direction.