Author response: The Human Cell Atlas

Abstract

The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.

https://doi.org/10.7554/eLife.27041.001

Introduction

The cell is the fundamental unit of living organisms. Hooke reported the discovery of cells in plants in 1665 (Hooke, 1665) and named them for their resemblance to the cells inhabited by monks, but it took nearly two centuries for biologists to appreciate their central role in biology.
Between 1838 and 1855, Schleiden, Schwann, Remak, Virchow and others crystalized an elegant Cell Theory (Harris, 2000), stating that all organisms are composed of one or more cells; that cells are the basic unit of structure and function in life; and that all cells are derived from pre-existing cells (Mazzarello, 1999; Figure 1).

Figure 1. A hierarchical view of human anatomy. A graphical depiction of the anatomical hierarchy from organs (such as the gut), to tissues (such as the epithelium in the crypt in the small intestine), to their constituent cells (such as epithelial, immune, stromal and neural cells). https://doi.org/10.7554/eLife.27041.002

To study human biology, we must know our cells. Human physiology emerges from normal cellular functions and intercellular interactions. Human disease entails the disruption of these processes and may involve aberrant cell types and states, as seen in cancer. Genotypes give rise to organismal phenotypes through the intermediate of cells, because cells are the basic functional units, each regulating its own program of gene expression. Therefore, genetic variants that contribute to disease typically manifest their action through their impact on particular cell types: for example, genetic variants in the IL23R locus increase risk of autoimmune diseases by altering the function of dendritic cells and T-cells (Duerr et al., 2006), and DMD mutations cause muscular dystrophy through specific effects in skeletal muscle cells (Murray et al., 1982). For more than 150 years, biologists have sought to characterize and classify cells into distinct types based on increasingly detailed descriptions of their properties, including their shape, their location and relationship to other cells within tissues, their biological function, and, more recently, their molecular components. At every step, efforts to catalog cells have been driven by advances in technology.
Improvements in light microscopy were obviously critical. So too was the invention of synthetic dyes by chemists (Nagel, 1981), which biologists rapidly found stained cellular components in different ways (Stahnisch, 2015). In pioneering work beginning in 1887, Santiago Ramón y Cajal applied a remarkable staining process discovered by Camillo Golgi to show that the brain is composed of distinct neuronal cells, rather than a continuous syncytium, with stunningly diverse architectures found in specific anatomical regions (Ramón y Cajal, 1995); the pair shared the 1906 Nobel Prize in Physiology or Medicine for their work. Starting in the 1930s, electron microscopy provided up to 5000-fold higher resolution, making it possible to discover and distinguish cells based on finer structural features. Immunohistochemistry, pioneered in the 1940s (Arthur, 2016) and accelerated by the advent of monoclonal antibodies (Köhler and Milstein, 1975) and Fluorescence-Activated Cell Sorting (FACS; Dittrich and Göhde, 1971; Fulwyler, 1965) in the 1970s, made it possible to detect the presence and levels of specific proteins. This revealed that morphologically indistinguishable cells can vary dramatically at the molecular level and led to exceptionally fine classification systems, for example, of hematopoietic cells, based on cell-surface markers. In the 1980s, Fluorescence in situ Hybridization (FISH; Langer-Safer et al., 1982) enhanced the ability to characterize cells by detecting specific DNA loci and RNA transcripts. Along the way, studies showed that distinct molecular phenotypes typically signify distinct functionalities. Through these remarkable efforts, biologists have achieved an impressive understanding of specific systems, such as the hematopoietic and immune systems (Chao et al., 2008; Jojic et al., 2013; Kim and Lanier, 2013) or the neurons in the retina (Sanes and Masland, 2015). Despite this progress, our knowledge of cell types remains incomplete. 
Moreover, current classifications are based on different criteria, such as morphology, molecules and function, which have not always been related to each other. In addition, molecular classification of cells has largely been ad hoc – based on markers discovered by accident or chosen for convenience – rather than systematic and comprehensive. Even less is known about cell states and their relationships during development: the full lineage tree of cells from the single-cell zygote to the adult is only known for the nematode C. elegans, which is transparent and has just ~1000 cells. At a conceptual level, one challenge is that we lack a rigorous definition of what we mean by the intuitive terms 'cell type' and 'cell state'. Cell type often implies a notion of persistence (e.g., being a hepatic stellate cell or a cerebellar Purkinje cell), while cell state often refers to more transient properties (e.g., being in the G1 phase of the cell cycle or experiencing nutrient deprivation). But, the boundaries between these concepts can be blurred, because cells change over time in ways that are far from fully understood. Ultimately, data-driven approaches will likely refine our concepts. The desirability of having much deeper knowledge about cells has been well recognized for a long time (Brenner, 2010; Eberwine et al., 1992; Shapiro, 2010; Van Gelder et al., 1990). However, only in the past few years has it begun to seem feasible to undertake the kind of systematic, high-resolution characterization of human cells necessary to create a systematic cell atlas. The key has been the recent ability to apply genomic profiling approaches to single cells. By 'genomic approaches' we mean methods for large-scale profiling of the genome and its products, including DNA sequence, chromatin architecture, RNA transcripts, proteins, and metabolites (Lander, 1996). It has long been appreciated that such methods provide rich and comprehensive descriptions of biological processes. 
Historically, however, they could only be applied to bulk tissue samples composed of an ensemble of many cells, providing average genomic measures for a sample, but masking their differences across cells. The result is as unsatisfying as trying to understand New York, London or Mumbai based on the average properties of their inhabitants. The first single-cell genomic characterization method to become feasible at large scale is transcriptome analysis by single-cell RNA-Seq (Box 1; Hashimshony et al., 2012; Jaitin et al., 2014; Picelli et al., 2013; Ramsköld et al., 2012; Shalek et al., 2013). Initial efforts first used microarrays and then RNA-seq to profile RNA from small numbers of single cells, which were obtained either by manual picking from in situ fixed tissue, using flow-sorting or, later on, with microfluidic devices, adapted from devices developed initially for qPCR-based approaches (Crino et al., 1996; Dalerba et al., 2011; Marcus et al., 2006; Miyashiro et al., 1994; Zhong et al., 2008). Now, massively parallel assays can process tens and hundreds of thousands of single cells simultaneously to measure their transcriptional profiles at rapidly decreasing costs (Klein et al., 2015; Macosko et al., 2015; Shekhar et al., 2016) with increasing accuracy and sensitivity (Svensson et al., 2017; Ziegenhain et al., 2017). In some cases, it is even possible to register these sorted cells to their spatial positions in images (Vickovic et al., 2016). Single-cell RNA sequencing (scRNA-seq) is rapidly becoming widely disseminated.

Box 1: Key experimental methods for single-cell genomics

Over the past several years, powerful approaches have emerged that make it possible to measure molecular profiles and signatures at single-cell resolution. The field remains very active, with new methods being rapidly developed and existing ones improved. Single-cell RNA-Seq (scRNA-seq) refers to a class of methods for profiling the transcriptome of individual cells.
Some may take a census of mRNA species by focusing on 3'- or 5'-ends (Islam et al., 2014; Macosko et al., 2015), while others assess mRNA structure and splicing by collecting near-full-length sequence (Hashimshony et al., 2012; Ramsköld et al., 2012). Strategies for single-cell isolation span manual cell picking, initially used in microarray studies (Eberwine et al., 1992; Van Gelder et al., 1990), FACS-based sorting into multi-well plates (Ramsköld et al., 2012; Shalek et al., 2013), microfluidic devices (Shalek et al., 2014; Treutlein et al., 2014), and, most recently, droplet-based (Klein et al., 2015; Macosko et al., 2015) and microwell-based (Fan et al., 2015; Yuan and Sims, 2016) approaches. The droplet and microwell approaches, which are currently coupled to 3'-end counting, have the largest throughput, allowing rapid processing of tens of thousands of cells simultaneously in a single sample. scRNA-seq is typically applied to freshly dissociated tissue, but emerging protocols use fixed cells (Nichterwitz et al., 2016; Thomsen et al., 2016) or nuclei isolated from frozen or lightly fixed tissue (Habib et al., 2016b; Lake et al., 2016). Applications to fixed or frozen samples would simplify the process flow for scRNA-seq, as well as open the possibility of using archival material. Power analyses provide a framework for comparing the sensitivity and accuracy of these approaches (Svensson et al., 2017; Ziegenhain et al., 2017). Finally, there has been progress in scRNA-Seq with RNA isolated from live cells in their natural microenvironment using transcriptome in vivo analysis (Lovatt et al., 2014).

Mass cytometry (CyTOF) and related methods allow multiplexed measurement of proteins based on antibodies barcoded with heavy metals (Bendall et al., 2014; Levine et al., 2015).
In contrast to comprehensive profiles, these methods involve pre-defined signatures and require an appropriate antibody for each target, but they can process many millions of cells for a very low cost per cell. They are applied to fixed cells. Recently, the approach has been extended to the measurement of RNA signatures through multiplex hybridization of nucleic-acid probes tagged with heavy metals (Frei et al., 2016).

Single-cell genome and epigenome sequencing characterizes the cellular genome. Genomic methods aim either to characterize the whole genome or capture specific pre-defined regions (Gao et al., 2016). Epigenomic methods may capture regions based on distinctive histone modifications (single-cell ChIP-Seq; Rotem et al., 2015a), accessibility (single-cell ATAC-Seq; Buenrostro et al., 2015; Cusanovich et al., 2015), or likewise characterize DNA methylation patterns (single-cell DNAme-Seq; Farlik et al., 2015; Guo et al., 2013; Mooijman et al., 2016; Smallwood et al., 2014) or 3D organization (single-cell Hi-C; Nagano et al., 2013; Ramani et al., 2017). Combinatorial barcoding strategies have been used to capture measures of accessibility and 3D organization in tens of thousands of single cells (Cusanovich et al., 2015; Ramani et al., 2017). Single-cell epigenomic methods are usually applied to nuclei, and can thus use frozen or certain fixed samples. Some methods, such as single-cell DNA sequencing, are currently applied to relatively few cells, due to the size of the genome and the sequencing depth required. Other methods, such as single-cell analysis of chromatin organization by either single-cell ATAC-Seq (Buenrostro et al., 2015; Cusanovich et al., 2015) or single-cell ChIP-Seq (Rotem et al., 2015a), currently yield rather sparse data, which presents analytic challenges and benefits from large numbers of profiled cells.
Computational analyses have begun to address these issues by pooling signal across cells and across genomic regions or loci (Buenrostro et al., 2015; Rotem et al., 2015a) and by imputation (Angermueller et al., 2016).

Single-cell multi-omics techniques aim to collect two or more types of data (transcriptomic, genomic, epigenomic, and proteomic) from the same single cell. Recent studies have simultaneously profiled the transcriptome together with either the genome (Angermueller et al., 2016; Dey et al., 2015; Macaulay et al., 2015), the epigenome (Angermueller et al., 2016), or protein signatures (Albayrak et al., 2016; Darmanis et al., 2016; Frei et al., 2016; Genshaft et al., 2016). Efforts to combine three or more approaches are underway (Cheow et al., 2016). Multi-omic methods could help fill in causal chains from genetic variation to regulatory mechanisms and phenotypic outcome in health and in disease, especially cancer.

Multiplex in situ analysis and other spatial techniques aim to detect a limited number of nucleic acids and/or proteins in situ in tissue samples – by hybridization (for RNA), antibody staining (for proteins), sequencing (for nucleic acids), or other tagging strategies. These in situ results can then be used to map massive amounts of single-cell genomic information from dissociated cells onto the tissue samples, providing important clues about spatial relationships and cell-cell communication. Some strategies for RNA detection, such as MERFISH (Chen et al., 2015b; Moffitt et al., 2016b) or Seq-FISH (Shah et al., 2016), combine multiplex hybridization with microscopy-based quantification to assess distributions at both the cellular and subcellular level; other early studies have performed in situ transcription (Tecott et al., 1988), followed by direct manual harvesting of cDNA from individual cells (Crino et al., 1996; Tecott et al., 1988).
Some approaches for protein detection, such as Imaging Mass Cytometry (Giesen et al., 2014) and Multiplexed Ion Beam Imaging (Angelo et al., 2014), involve staining a tissue specimen with antibodies, each labeled with a barcode of heavy metals, and rastering across the sample to measure the proteins in each 'pixel'. This technique permits the reconstruction of remarkably rich images. Finally, more recent studies have performed RNA-seq in situ in cells and in preserved tissue sections (Ke et al., 2013; Lee et al., 2014). Many in situ methods can benefit from tissue clearing and/or expansion to improve detection and spatial resolution (Chen et al., 2015a; Chen et al., 2016a; Moffitt et al., 2016a; Yang et al., 2014). The complexity and accuracy of these methods continue to improve with advances in sample handling, chemistry and imaging. Various methods are also used, for example, to measure transcriptomes in situ with barcoded arrays (Ståhl et al., 2016).

Cell lineage determination. Because mammals are not transparent and have many billions of cells, it is not currently possible to directly observe the fate of cells by microscopy. Various alternative approaches have been developed (Kretzschmar and Watt, 2012). In mice, cells can be genetically marked with different colors (Barker et al., 2007) or DNA barcodes (Lu et al., 2011; Naik et al., 2013; Perié and Duffy, 2016), and their offspring traced during development. Recent work has used iterative CRISPR-based genome editing to generate random genetic scars in the fetal genome and use them to reconstruct lineages in the adult animal (McKenna et al., 2016).
In humans, where such methods cannot be applied, human cell lineages can be monitored experimentally in vitro, or by transplantation of human cells to immunosuppressed mice (Morton and Houghton, 2007; O'Brien et al., 2007; Richmond and Su, 2008), or can be inferred from in vivo samples by measuring the DNA differences between individual sampled cells, arising from random mutations during cell division, and using the genetic distances to construct cellular phylogenies, or lineages et al., 2014; et al., 2013). this of are many methods at of and high-throughput are being developed to in situ gene expression in tissues at single-cell and even resolution (Chen et al., 2015b; et al., 2013; Lee et al., 2014; et al., 2014; et al., 2016; et al., the of of proteins at cellular or resolution (Angelo et al., 2014; Chen et al., 2015a; et al., 2014; et al., 2011; et al., 2014; Yang et al., of chromatin state (Buenrostro et al., 2015; Cusanovich et al., 2015; Farlik et al., 2015; Guo et al., 2013; et al., 2013; Mooijman et al., 2016; Rotem et al., 2015a; Rotem et al., 2015b; Smallwood et al., and DNA mutations to allow reconstruction of cell lineages et al., 2014; et al., 2016; et al., 2013; et al., et al., 2013). Various are also single-cell methods to simultaneously measure several types of molecular profiles in the same cell (Albayrak et al., 2016; et al., 2016; et al., 2014; Darmanis et al., 2016; Dey et al., 2015; Frei et al., 2016; Genshaft et al., 2016; Macaulay et al., 2015). a there is a growing sense in the scientific community that the time is for a to complete the Human Cell Atlas that pioneering 150 years Various have in a number of over the past two years, in an international in London in In addition, several efforts are underway or in – for example, related to brain cells and immune cells. 
by such efforts, including the have sought information from the scientific community about the notion of cell or tissue The of this is to the scientific community in this the is driven by that have or are to in the the is an the of a cell and its potential for and an can to new understanding of and and inter-cellular and our ability to the impact of on cells. It will also yield molecular with applications in both research and a Human Cell Atlas Project would be a shared international effort diverse scientific are in the Human Cell Atlas the first of this which will on a was on What is the Human Cell Atlas, and what could we learn from it? At its most basic level, the Human Cell Atlas must a comprehensive reference catalog of all human cells based on their properties and transient as well as their and an is more than just a it is a map that aims to show the relationships its By it can fundamental processes – to the of through the of To be an must also be an certain while The – a at the between and – this challenge in in about an with of (Box and Over the map of the more and more and – ad – the map the size of the and Box in In that the of such that the map of a single the of a and the map of the the of a In and the a of the size was that of the and which for with The were not of the of as their that that map was and not some was that they it up to the of and In the of the there are of that inhabited by and in all the there is other of the of from of Moreover, an must provide a of on which one can and concepts at many levels and even can be at level of and information into a key is a Human Cell Atlas key provide and show A natural would be to describe each human cell by a of molecular markers. 
For example, one describe each cell by the expression level of each of the human that each cell would be as a in the of markers could be to the expression levels of the levels of the of each the chromatin state of every and and the levels of each protein or each of each The and type of information to collect will based on a of and the biological provided by each et al., 2016; et al., 2013; et al., 2015). For specific it will be to for we will largely to the of gene which can be at The Atlas have or to and anatomical information (e.g., a morphology, or tissue information (e.g., the of the individual or time an and disease information is for results based on molecular profiles with rich knowledge about cell biology, and to capture and this information In some the Human Cell Atlas Project fundamental unit is a is to the Human Project fundamental unit is a are efforts to create for that the two key that human and and provide a for biological research and with the Human we will also for important where cell states can be and genetic and other approaches can be used to function and the Human Cell Atlas in important ways from the Human the of cell that it will require a distinct experimental and will involve making molecular and cellular the to will also be a a we could of an Human Cell Atlas that all markers in every cell in a every spatial position (by three for the every cell at every of a (by for time the cells by a and the of such cell from every human to differences in and it is not possible to construct such an However, it is increasingly feasible to sample from the of to understand the key and relationships all human cells. to the of the scientific community about a Human Cell we the central scientific What could we to learn from such an A Human Cell Atlas would have a impact on and medicine by our understanding of intracellular and intercellular to a new level of resolution. 
It would also provide signatures and for basic research detection, and genetic of every cell and applications and response to In the we and describe some early that these concepts will based on emerging It is that a Human Cell Atlas Project will require and will the of new It will also the of new and approaches that may have applications far – to biological in in the led to the by and of key methods, including the analysis of and experimental design 2015). Taxonomy: cell types The most fundamental level of analysis is the of cell In an where cells are as in a cells be in some appropriate not to differences in physiological states (e.g., the in molecular systems and 2010; et al., 2014; Kim et al., 2015; Shalek et al., 2013), and measurement et al., 2015; et al., 2014; Kim et al., 2015; Shalek et al., 2013; Shalek et al., 2014; et al., 2016). a cell be as a or a and 2010; et al., either in the or in a onto a that features. this notion is it is to give a definition of a 'cell are often as based on and molecular differences (Sanes and Masland, 2015). higher are finer ones may be less and may not a either because distinct types or because some are and not it remains based on molecular and physiological properties with each other. New methods will be both to discover types and to classify cells and, to refine the concepts and 2015; et al., 2013; et al., 2015; and 2017; et al., 2016). for data provide an framework et al., 2015; et al.,

How to cite this publication

Aviv Regev, Sarah A. Teichmann, Eric S. Lander, Ido Amit, Christophe Benoist, Ewan Birney, Bernd Bodenmiller, Peter Campbell, Piero Carninci, Menna R. Clatworthy, Hans Clevers, Bart Deplancke, Ian Dunham, James Eberwine, Roland Eils, Wolfgang Enard, Andrew Farmer, Lars Fugger, Berthold Göttgens, Nir Hacohen, Muzlifah Haniffa, Martin Hemberg, Seung K. Kim, Paul Klenerman, Arnold R. Kriegstein, Ed S. Lein, Sten Linnarsson, Emma Lundberg, Joakim Lundeberg, Partha P. Majumder, John C. Marioni, Miriam Mérad, Musa M. Mhlanga, Martijn C. Nawijn, Mihai G. Netea, Garry P. Nolan, Dana Pe’er, Anthony Phillipakis, Chris P. Ponting, Stephen R. Quake, Wolf Reik, Orit Rozenblatt–Rosen, Joshua R. Sanes, Rahul Satija, Ton N. Schumacher, Alex K. Shalek, Ehud Shapiro, Padmanee Sharma, Jay W. Shin, Oliver Stegle, Michael Stratton, Michael J. T. Stubbington, Fabian J. Theis, Mathias Uhlén, Alexander van Oudenaarden, Allon Wagner, Fiona M. Watt, Jonathan S. Weissman, B Wold, Ramnik J. Xavier, Nir Yosef (2017). Author response: The Human Cell Atlas. DOI: https://doi.org/10.7554/elife.27041.011.

Publication Details

Type: Preprint
Year: 2017
DOI: https://doi.org/10.7554/elife.27041.011
