Article Figures and data Abstract Introduction Results Discussion Materials and methods References Decision letter Author response Article and author information Metrics Abstract Metagenomics and single-cell genomics have enabled genome discovery from unknown branches of life. However, extracting novel genomes from complex mixtures of metagenomic data can still be challenging and represents an ill-posed problem which is generally approached with ad hoc methods. Here we present a microfluidic-based mini-metagenomic method which offers a statistically rigorous approach to extract novel microbial genomes while preserving single-cell resolution. We used this approach to analyze two hot spring samples from Yellowstone National Park and extracted 29 new genomes, including three deeply branching lineages. The single-cell resolution enabled accurate quantification of genome function and abundance, down to 1% in relative abundance. Our analyses of genome level SNP distributions also revealed low to moderate environmental selection. The scale, resolution, and statistical power of microfluidic-based mini-metagenomics make it a powerful tool to dissect the genomic structure of microbial communities while effectively preserving the fundamental unit of biology, the single cell. https://doi.org/10.7554/eLife.26580.001 Introduction Advances in sequencing technologies have enabled the development of shotgun metagenomics and single-cell approaches to investigate environmental microbial communities. These studies revealed many previously uncharacterized genomes (Brown et al., 2015; Eloe-Fadrosh et al., 2016; Kashtan et al., 2014; Rinke et al., 2013), increasing the total number of sequenced microbial genomes to more than 50,000 (Joint Genome Institute's Integrated Microbial Genomes database, accessed December 1, 2016). However, the majority of environmental microbial diversity remains uncharacterized due to limitations in current techniques. Conventional shotgun metagenomic sequencing offers the ability to assemble genomes from a single heterogeneous sample, but is effective only if the complexity of the sample is not too great (Howe et al., 2014). Furthermore, it is often difficult to separate contigs belonging to closely related organisms because techniques designed to resolve these differences, such as tetranucleotide analysis, depend on ad hoc assumptions about nucleotide usage (Dick et al., 2009). It is possible to perform rigorous assemblies from independent single-cell genome amplifications but at the expense of lowering throughput (Rinke et al., 2014; Blainey, 2013). In addition, when performed in plates, single-cell sequencing approaches are typically expensive and laborious, although microfluidic approaches have helped to alleviate that limitation (Blainey et al., 2011). We report here a new mini-metagenomics approach which combines the advantages of shotgun and single-cell metagenomic analyses. This approach uses microfluidic parallelization to separate an environmental sample into many small sub-samples containing 5–10 cells, significantly reducing complexity of each sub-sample and allowing high quality assembly while enabling higher throughput than typical single-cell methods. Although each sub-sample contains a limited mixture of several genomes, single-cell resolution is regained through correlations of genome co-occurrence across sub-samples, which in turn enables a rigorous statistical interpretation of confidence and genome association. We validated this approach using a synthetic mixture of defined microbial species and then applied it to analyze two hot spring samples from Yellowstone National Park. Among 29 genomes larger than 0.5 Mbps, most belong to known bacterial and archaeal phyla but represent novel lineages at lower taxonomic levels; three genomes represent deeply branching novel phylogenies. Functional analysis revealed different metabolic pathways that cells may use to achieve the same biochemical process such as nitrogen and sulfur reduction. Using information associated with genome occurrence across sub-samples, we further assessed abundance and genome variations at the single-cell level. Our analyses demonstrate the power of the mini-metagenomic approach in de-convolving genomes from complex samples and assessing diversity in a mixed microbial population. Results Evaluating microfluidic-based mini-metagenomics using a mock community reveals improved whole genome amplification Microfluidic-based mini-metagenomics begins with microfluidic partitioning of each environmental sample randomly into 96 sub-samples with 5–10 cells per sub-sample (Figure 1A). 96 lysis and MDA (Multiple Displacement Amplification) reactions are performed in independent chambers of an automated and commercially available Fluidigm C1 Integrated Fluidic Circuit (IFC) (Figure 1B, Figure 1—figure supplement 1), which significantly reduces the time and effort required to perform these reactions. The C1 IFC has been used previously for mammalian single-cell RNA-seq and genome analysis experiments (Wu et al., 2014; Pollen et al., 2014; Treutlein et al., 2014; Gawad et al., 2014). The hardware, including the microfluidic circuit, was not altered for the mini-metagenomic experiments, but we did adapt a new reagent kit and designed scripts and protocols for amplifying genomic DNA from microbial cells. After whole-genome amplification, DNA from each sub-sample is harvested into a 96 well plate, and all subsequent wet lab steps are performed on the bench top. Sequencing libraries are prepared with distinct barcodes labeling DNA derived from each sub-sample and sequenced on the Illumina Nextseq platform (Figure 1C). Because of the small volumes used for microfluidic reactions, precious microbial samples and/or low cell concentrations that may not yield enough DNA for shotgun metagenomics can still be analyzed using this process. Another advantage of using a microfluidic platform is the enclosed reaction environment that limits potential contamination often observed with low input MDA reactions in well plates, a problem that, with other approaches, requires sophisticated protocols for contamination prevention and removal (Woyke et al., 2011). In addition, smaller MDA reaction volumes (~300 nL) are associated with lower gain and less amplification bias, which effectively improves coverage uniformity of amplified genomes (de Bourcy et al., 2014) (Materials and methods). Figure 1 with 3 supplements see all Download asset Open asset Microfluidic-based mini-metagenomics pipeline. (A) An environmental microbial sample is loaded onto a Fluidigm C1 IFC at the appropriate concentration so that cells are randomly dispersed into 96 microfluidic chambers at 5–10 cells per chamber. (B) Lysis and MDA are performed on the microfluidic device to generate 1–100 ng genomic DNA per sub-sample. (C) Nextera libraries are prepared from the amplified DNA off-chip and sequenced using 2 × 150 bp runs on the Illumina NextSeq platform. (D) Sequencing reads from each sub-sample are first assembled independently, then (E) sub-sample contigs are combined to form longer mini-metagenomic contigs. Contigs longer than 10 kbp are processed in the following steps. (F) Reads from each sub-sample are aligned to mini-metagenomic contigs > 10 kbp. (G) An occurrence map is generated, demonstrating the presence pattern of each contig in all sub-samples based on coverage. (H) Finally, contigs are binned into genome clusters based on a pairwise p value generated from co-occurrence information. Steps enclosed in the gray rectangle (A, B) are performed on the Fluidigm C1 IFC. Step C is carried out in 96 well plates. Steps D to H are performed in silico. https://doi.org/10.7554/eLife.26580.002 Sequence data is processed through a custom bioinformatics pipeline that takes advantage of information encapsulated in distinct yet related sub-samples (Yu, 2017). Reads from each sub-sample are trimmed and assembled into sub-sample contigs (Figure 1D), creating genome subassemblies. Since cells representing the same genome may appear in multiple sub-samples, overlapping genome subassemblies can be identified. Therefore, combined assembly of sub-sample reads and contigs results in longer mini-metagenomic contigs from which more meaningful biological information can be inferred (Figure 1E). With this mini-metagenomic approach, cells from the same phylogenetic groups are randomly partitioned into different sub-samples, providing a physically defined approach to bin metagenomic contigs. Aligning reads from each sub-sample to mini-metagenomic contigs enables us to determine the co-occurrence pattern of each contig (Figure 1F,G). Based on presence patterns across sub-samples, a p value for each pair of contigs is computed based on Fisher's exact test (Materials and methods), where a small p is interpreted as an indicator that two contigs belong to cells of the same genome. Finally, co-occurrence based p values are used as a pairwise distance metric for sequence-independent contig clustering (Figure 1H; Materials and methods). We tested performance of the mini-metagenomic amplification on the Fluidigm C1 IFC using a mixture of five bacterial species with known genomes (E. vietnamensis, S. oneidensis, E. coli, M. ruber, and P. putida) (Table 1; Materials and methods). We first used a dilution of the control sample at ~10 cells per sub-sample. Then, we further diluted the control sample so that each sub-sample contained 0.5 cells on average, effectively performing microfluidic single-cell MDA with the same downstream steps (Materials and methods). Performing MDA from multiple cells in a 300 nL microfluidic chamber improved genome coverage at similar sequencing depths because smaller MDA reaction volumes tend to reduce amplification gain and bias (de Bourcy et al., 2014; Zong et al., 2012). Table 1 Species used to construct mock bacterial communities. https://doi.org/10.7554/eLife.26580.006 Species nameGC content (%)Culture sourceGrowth mediumAssession numbersEchinicola vietnamensis KMM622144.7DSM 17526Marine brothNC_019904.1Shewanella oneidensis MR145.9JGILBNC_004347.2Escherichia coli MG165550.8ATCC 700926LBNC_000913.3Pseudomonas putida F161.9ATCC 700007Nutrient mediumNC_009512.1Meiothermus ruber63.4DSM 1279Termus ruber mediumNC_013946.1 Thirty mini-metagenomic and 36 single-cell limiting dilution sub-samples yielded 169 and 194 million paired end reads, respectively. Initial trimming removed 14 ± 1.1% (s.d.) reads; 80 ± 1.5% (s.d.) of the remaining reads mapped uniquely to reference genomes (Figure 1—figure supplement 2A). Unmapped reads made up only 3% of all reads (Figure 1—figure supplement 2B). Improperly mapped reads (3 ± 0.9% s.d.) were mostly short, low quality sequences, and chimeras represented <1% of the reads. Mini-metagenomic MDA reactions produced higher median coverage than single-cell MDA reactions at all sequencing depths (Figure 1—figure supplement 3A). The proportion of assembled genome as a function of aligned genome, however, did not differ significantly between mini-metagenomic and single-cell methods (Figure 1—figure supplement 3B). Therefore, data from mock bacterial communities demonstrate that with the mini-metagenomic method, less sequencing cost is required to recover similar genome coverage as compared to single-cell experiments, with the improvement mostly associated with the amplification rather than assembly steps. Microfluidic-based mini-metagenomics enables contig binning based on co-occurrence patterns Next, we performed microfluidic mini-metagenomic sequencing on two hot spring samples from Yellowstone National Park (Figure 2; Materials and methods). Sample #1 was collected from Bijah Spring and sample #2 was collected from Mound Spring (Table 2). 121 and 133 million paired end reads were obtained from the two samples respectively. We removed sub-samples with less than 800,000 paired end reads, yielding 49 and 93 sub-samples respectively. After quality filtering, >90% reads from each sub-sample were incorporated into contigs during independent assemblies (Figure 2—figure supplement 1). Re-assembling combined sub-sample reads increased contig length (Figure 2—figure supplement 2A). We obtained 643 and 1474 contigs longer than 10 kbp from the two samples, respectively, and used these contigs for subsequent analyses (Figure 2—figure supplement 2B; Table 3). To compare this performance to shotgun metagenomic assemblies, we obtained 32.5 million and 51.4 million shotgun metagenomic reads from Bijah Spring and Mound Spring samples and down sampled combined mini-metagenomic reads to match shotgun sequencing depths. After assembly (Materials and methods), we observed that for the Bijah Spring sample, where 49 sub-samples were available, the shotgun metagenomic assembly produced more contigs over 10 kbp (Figure 2—figure supplement 3A). However, for the Mound Spring sample, where 93 sub-samples were used, the mini-metagenomic assembly produced more contigs over 10 kbp. The largest contig from the mini-metagenomic assembly was also longer for the Mound Spring sample (Figure 2—figure supplement 3B). Therefore, it is likely that increasing the number of mini-metagenomic sub-samples improves assembly performance compared to shotgun metagenomics. It is also possible that the observed assembly improvement is due to higher complexity of the Mound Spring bacterial community, validating our hypothesis that the mini-metagenomic method benefits the analysis of complex microbial populations. Figure 2 with 6 supplements see all Download asset Open asset Genome bins extracted from microfluidic-based mini-metagenomic sequencing of Yellowstone National Park samples. Two samples from Bijah Spring in Mammoth Norris Corridor (A, C) and Mound Spring in Lower Geyser Spring Basin (B, D) were collected from Yellowstone National Park and analyzed using the microfluidic-based mini-metagenomic pipeline. (A, B) Heat maps of contig coverage across sub-samples are clustered hierarchically to reveal contigs that appear in similar sets of sub-samples (left). Colors represent logarithm of coverage in terms of number of base pairs in base 2. Pairwise p values generated using Fisher's exact test based on co-occurrence pattern of contig pairs reveal contig clusters (right). Shading here represents logarithm of p value in base 10 after correcting for multiple comparisons. (C, D) tSNE dimensionality reduction generated from pairwise p values. Each point represents a 10 kbp or longer contig. Colors represent assignment of each contig to a particular phylum based on annotation of genes on the contig. Black X's represent contigs unable to be assigned to any phylum because too many genes have unknown annotation. Genome bins larger than 0.5 Mbp are numbered and those with substantial numbers of single-copy marker genes for incorporation into Figure 4 are labeled in bold. Dotted circles outline genomes predominantly containing contigs unassigned at the phylum level. https://doi.org/10.7554/eLife.26580.007 Table 2 Information of hot spring samples from Yellowstone National Park. https://doi.org/10.7554/eLife.26580.014 Sample #1Sample #2NPS study numberYELL_05788YELL_05788Sample areaMammoth Norris CorridorLower Geyser BasinLocation nameBijah SpringMound SpringSample typeSedimentSedimentCollection timeSeptember 11, 2009 5:00pmSeptember 14, 2009 2:25pmLocation44.761133 N, 110.730900 W44.564833 N, 110.859933 WLocation typeHot springHot springTemperature (°C)6555pH7.09.0 Table 3 Yellowstone sample sequencing and assembly statistics. https://doi.org/10.7554/eLife.26580.015 Sample #1Sample #2IMG genome ID33000060683300006065Number of molecules sequenced120.5 M133.1 MNumber of contigs (>10 kbp)6431474Number of subsamples analyzed4993 Another advantage of mini-metagenomics lies in the information from sub-samples. In order to bin contigs into genomes, reads from each sub-sample were aligned back to mini-metagenomic contigs and coverage was tabulated (Figure 2A,B, Figure 2—figure supplement 2C,D). Many contig sets share similar coverage patterns across sub-samples, suggesting that they originate from the same genome. Because MDA results in variable coverage profiles, we generated a binary occurrence map by applying a coverage threshold (Materials and methods). Pairwise p values were computed with Fisher's exact test (Figure 2A,B, Figure 2—figure supplement 4). Finally, dimensionality reduction using pairwise p values as a distance metric generated clusters of contigs belonging to the same genomes (Maaten and Hinton, 2008) (Figure 2—figure supplement 5). To verify the validity of presence-based contig clusters, we annotated all predicted open reading frames (ORFs) (Huntemann et al., 2016) (Materials and methods). The relationship between contig length and the number of genes found is linear (Figure 2—figure supplement 6), consistent with small non-coding regions in bacterial genomes. Annotations were found for ~50% of the genes, and lineage assignment was performed using JGI's IMG/ER pipeline for metagenome annotation based on gene annotations from the same contig (Huntemann et al., 2016). If protein sequences of most genes on a contig were distantly related to known sequences, the contig was designated as unknown (Figure 2C,D) (Materials and methods). Approximately 70% of all contigs were assigned at the phylum level. Contigs with known assignment and from the same cluster always belonged to the same phylum, demonstrating that the presence-based method can correctly bin metagenomic contigs into genomes (Figure 2C,D). Some unassigned contigs are scattered among genome bins with known phylum level assignments and likely represent novel genes. Genome bins containing predominantly unassigned contigs are indicated by dotted circles (Figure 2C,D) and likely represent deeply branching lineages. We selected 29 partial genome bins from both hot spring samples with genome sizes over 0.5 Mbp for downstream analyses (Figure 3A). Assessment of genome bins demonstrated various levels of completeness and <5% marker gene duplication (Figure 3B,C). Genome completeness was not necessarily correlated with genome size because completeness was assessed through single-copy marker genes using CheckM (Parks et al., 2015). Using lineage assignment based on gene annotations, we identified eight genome bins with known phylum level assignments from the Bijah Spring sample. Three genomes (Bijah #3, Bijah #5, and Bijah #14) had unknown assignments (Figure 2C). In the Mound Spring sample, more reads and sub-samples resulted in 14 genomes with known phylum level assignments and three unassigned genomes (Mound #3, Mound #4, and Mound #7) were extracted (Figure 2D). A singleton contig 3.3 Mbp in length (#40) was also included as a separate genome. In both environmental samples, the ability to separate distinct clusters of Bacteroidetes contigs represents an advantage of our sequence-independent binning approach, where the presence of bacterial species across microfluidic chambers is determined only by Poisson distribution. Hence, the likelihood that closely related bacterial cells were isolated into the same chambers was small. Figure 3 Download asset Open asset Assembled size and completeness of Yellowstone hot spring genomes. Genomes of Bijah Spring and Mound Spring samples are sorted by assembled genome size (A). Names represent phylum level assignment based on annotated genes (Figure 2), concatenated marker gene phylogenetic tree (Figure 4), or individual marker gene trees (Figure 4—figure supplement 1, Figure 4—figure supplement 2). (B) Genome completeness is assessed through single-copy marker genes; those incorporated into Figure 4 have phyla names colored in blue (for short branching lineages) or red (for deeply branching lineages). (C) Degree of marker gene duplication in assembled genomes assessed using CheckM (Parks et al., 2015). https://doi.org/10.7554/eLife.26580.016 Since extracted genomes of known phyla often represented novel lineages at lower taxonomic levels, we attempted to identify their phylogenetic placements. Seventeen genomes (shown in bold in Figure 2C,D), including four unassigned genomes, were complete enough for phylogenetic tree construction based on 56 single copy marker genes (Eloe-Fadrosh et al., 2016) (Figure 4; Materials and methods). All marker gene based phylogenetic assignments were consistent with annotation based assignments. We identified 13 genomes with short to medium branch length from known lineages (red), and four deeply branching lineages that may represent potentially novel phyla (red and starred). In addition to bacterial lineages, we also extracted partial archaeal genomes belonging to Euryarchaeota (Mound #16) and Bathyarchaeota (Mound #14) (Figure 4). The remaining two genomes (Bijah #14 and Mound #3) with unassigned phylogeny that were also not complete enough for incorporation into the final phylogenetic tree were examined independently. Phylogenetic trees built from blast results of individual ribosomal protein sequences suggest that both genomes belonged to the phylum Ignavibacteriae (Figure 4—figure supplement 1, Figure 4—figure supplement 2). Figure 4 with 2 supplements see all Download asset Open asset Phylogenetic distribution of selected Yellowstone hot spring genomes (red branches) across a representative set of bacterial and archaeal lineages. Query genomes which potentially represent novel phyla are marked with a star, and those falling into known phyla are highlighted in yellow. Bootstrap support values are displayed at the nodes as filled circles in the following categories: no support (black; <50), weak support (grey; 50–70), moderate support (white; 70–90), while absence of circles indicates strong support (>90 bootstrap support). For details on taxon sampling and tree inference, see Materials and methods. https://doi.org/10.7554/eLife.26580.017 Functional analyses reveal dominant energy metabolism in Yellowstone hot spring samples To understand more about the metabolism of these organisms, we performed BLAST of all ORFs contained in each genome against the KEGG database and mapped results onto KEGG modules (Materials and methods). Figure 5 illustrates the proportion of identified KEGG module genes from each genome belonging to pathways associated with nitrogen, methane, and sulfur metabolism. At the community level, the Mound Spring population displayed higher potential for methanogenesis than the Bijah Spring population, and methanogenesis can be carried out by members of the Euryarchaeota (Mound # 16) lineages involving mcr and hdr complex (Ferry, 2011). Formaldehyde assimilation, on the other hand, could be carried out in both communities. In nitrogen metabolism, we identified five genomes including Ignavibacteriae (Mound #40), Bacteroidetes (Mound #2), Nitrospirae (Mound #19), Thermodesulfobacteria (Mound #21), and Ignavibacteriae (Bijah #2) carrying nar, nrf, and nir genes, possibly participating in the conversion of nitrate to ammonia (Sodergren and DeMoss, 1988; Einsle et al., 1999; Cantera and Stein, 2007). Interestingly, the archaeal genome belonging to Bathyarchaeota (Mound #14) carries nifD and nifH, capable of converting nitrogen directly to ammonia (Fani et al., 2000) (Figure 5B). Although we did not identify denitrification genes in the Mound Spring community, a genome extracted from Bijah Spring belonging to Spirochaetes carried nirS, norBC, and nosZ genes, capable of reducing nitrite to nitrogen (Figure 5A). Figure 5 Download asset Open asset Functional analysis of Yellowstone hot spring genomes. Abundant genes involved in energy metabolism of (A) Bijah Spring and (B) Mound Spring genomes. Each row represents description of a pathway based on KEGG energy metabolism modules. Each column represents one genome bin. Shading of each square represents the ratio of genes in each KEGG module that are also present in a particular genome bin. Modules are labeled as nitrogen metabolism, methane metabolism, or sulfur metabolism. https://doi.org/10.7554/eLife.26580.020 Bacterial clades associated with thermophilic environments in the Yellowstone National Park are known for their sulfur reduction (converting sulfate to sulfide) activities (Henry et al., 1994; Inskeep et al., 2013). Two well characterized pathways are assimilatory and dissimilatory processes, where sulfur is either incorporated into cellular materials or serves only as the terminal electron acceptor (PeckPeck, 1961). We identified genes associated with the assimilatory sulfate reduction pathway (cysNC, sat, cysC, cysH, and cysI) from three Mound Spring genomes including Ignavibacteriae (Mound #40), Bacteroidetes (Mound #2), Fervidibacteria (Mound #8) and one Bijah Spring genome – Spirochaetes (Bijah #7). In the Mound Spring population, we also identified key genes (dsrAB) responsible for dissimilatory sulfate reduction from Nitrospirae (Mound #19) and Termodesulfobacteria (Mound #21) genomes (Figure 5). These insights based on genes with known annotations demonstrate the ability of mini-metagenomics to reveal different metabolic pathways associated with individual genomes within and across environmental samples. Microfluidic-based mini-metagenomics facilitates assessment of genome abundance and population diversity with single-cell resolution Unlike typical shotgun metagenomic methods that use coverage depth as a proxy for genome abundance, mini-metagenomics uses occurrence of genomes across sub-samples to estimate abundance through counting cells. Thresholding coverage depth into an occurrence profile can be seen as digitizing an otherwise noisy analog signal, potentially reducing the effect of amplification bias, copy number variation, and genome size across genomes. Because cells were well-mixed before loading into the Fluidigm C1 IFC and that the microfluidic structures inside each chamber were much larger than the size of a microbial cell, cells were distributed randomly into microfluidic chambers, with occurrence profiles satisfying a Poisson distribution. We quantified the number of detected cells of all genomes in both samples (Figure 6A,C,D; Materials and methods). Assembled genomes from the Bijah sample covered one order of magnitude in abundance. Genomes #10 (Deferribacteres) and #8 (Chlorobi) were most abundant, with 49 and 29 cells represented. On the lower end, Bijah #5 (S2R-29) was only represented in four sub-samples, most likely representing only four cells (Figure 6C). In the Mound Spring sample, more sub-samples were sequenced, allowing the quantification of a larger abundance range across two orders of magnitude (Figure 6D). Mound #40 (Ignavibacteria) was the most abundant lineage, appearing in all but seven sub-samples. The abundance of this genome was likely the reason for the successful assembly of its entire genome (Figure 6B). Mound #7 (Chlorobi) and deeply branching Mound #15 were also abundant, allowing the capture of 36 and 65 cells respectively. For rare organisms, we detected Mound #11 (Bacteroidetes) and Mound #14 (Bathyarchaeota), which were present at <1%. Comparing with relative abundance computed from shotgun coverage, we observed general agreement in relative abundance profiles, although a couple of genomes in Bijah Springs displayed large differences (Figure 6—figure supplement 1A). Relative abundances were higher for less abundant genomes when the mini-metagenomic method was use, demonstrating its sensitivity to extracting rare phylogenies (Figure 6—figure supplement 1A). In total, examining genomes larger than 0.5 Mbp, we captured 192 and 509 cells from Bijah and Mound Springs, respectively, corresponding to four to six cells per sub-sample. Even though this number was smaller than our intended ten cells per chamber, the difference was likely due to lysis inefficiency or smaller genomes excluded from our analysis. Figure 6 with 2 supplements see all Download asset Open asset Abundance and population variation of Yellowstone hot spring genomes. (A) Abundance is derived from the occurrence pattern of contig clusters, where Poisson distribution is used to infer the number of cells processed. (B) SNPs are tabulated and normalized by the total size of sequenced genome. Most SNPs are in coding regions of the genome, of which the majority are synonymous. (C, D) Map of genome occurrence patterns across all sub-samples for (C) Bijah and (D) Mound Springs samples. White demonstrates the presence of at least one cell of a particular genome in a sub-sample. The total number of cells can be inferred using Poisson statistics. https://doi.org/10.7554/eLife.26580.021 In addition to quantifying abundance through counting single cells, genetic variation among individual cells with the same genome can be assessed as well. An examination of species level lineage assignment to contigs revealed that genomes had consistent species level assignments when such assignments were present (Figure 6—figure supplement 1B). Therefore, even though genomes were generated from a collection of cells across sub-samples, these cells represented genetic lineages at the species levels. We quantified observed single nucleotide polymorphisms (SNPs) in each assembled genome norma