A case study on bioinformatics in microbial research
In an effort to understand the basic conditions for life on distant planets, NASA is investigating microbial mat communities here on Earth, since microbes are believed to have been the first form of life to evolve, about 3 billion years ago. Microbial mat communities are layered colonies of microbes that depend heavily on each other for the exchange of chemicals and energy. One way to study microbial mat communities is to perform molecular analyses of the DNA of the various microbes.
Software from CLC bio is used in this work to help scientists focus on their research by making common bioinformatics tools ready at hand and easy to use. The integrated framework of the CLC Workbenches is designed to support work processes such as primer design, assembly and BLAST searches.
In this case study, we will explain how software from CLC bio is used by scientists at the Exobiology Branch of the NASA Ames Research Center.
Why microbial mats are interesting for exobiology research
Microbial mat communities represent, in gross morphology, some of the earliest known microbial communities on Earth (see figure 1).

Figure 1: Cross sectional view of a microbial mat from Guerrero Negro, Mexico. (NASA Microbes Image Gallery. http://microbes.arc.nasa.gov/gallery/guerrero.html).
As a common form of microbial community existing on Earth during the Precambrian, and dominant in the Proterozoic era, these systems are hypothesized to have played a significant role in the development of modern oceanic and atmospheric conditions. In particular, the development of oxygenic photosynthesis by cyanobacteria (blue-green algae), which are dominant in modern mat systems, is thought to have played a large role in shaping the modern environment. Fossilized lipid biomarkers from ancient mat communities, together with their stable carbon isotopic signatures, have helped establish a connection between modern mat ecosystems and their ancient counterparts. Studies of modern mat communities can help scientists understand ancient microbial ecosystems and the impact these and similar communities have had on the development of the modern Earth. Microbial mat communities from Guerrero Negro, Mexico, live in a very harsh environment with extremely high concentrations of salt. They are used as model modern mat systems and are a central focus of several Ames Exobiology investigations.
Hypersaline microbial mats developing in evaporation ponds of the Guerrero Negro Saltern grow along a salinity gradient from 6% to 16%. These systems depend on the primary production of phototrophic organisms, primarily cyanobacteria, but also including diatoms and anoxygenic phototrophic bacteria. Prior studies have shown that salinity can affect population structure directly (e.g. through the salinity tolerances of individual organisms) and indirectly (e.g. through secondary effects of modified oxygen diffusion rates), and as a result modify communal carbon flow.
Research focus
Researchers at the Exobiology Branch of the NASA Ames Research Center are currently employing various molecular tools to characterize shifts in the community structure of the total microbial population as a result of alterations in salinity levels. In addition to domain-level analyses (Bacteria, Archaea and Eukaryotes), they intend to target photosynthetic microorganisms, including oxygenic phototrophs (i.e. diatoms and cyanobacteria) and anoxygenic phototrophs (i.e. green and purple photosynthetic bacteria). In most hypersaline mats, cyanobacteria are the major mat-building organisms and the largest contributors to the primary production of the system (see figure 2).

Figure 2: Light microscope view of the cyanobacterium Microcoleus chthonoplastes. (NASA Microbes Image Gallery. http://microbes.arc.nasa.gov/gallery/lightms.html).
Most primary productivity occurs via oxygenic photosynthesis by cyanobacteria or diatoms, and carbon is fixed using the Calvin cycle. Anoxygenic photosynthesis can be conducted by cyanobacteria operating photosystem I, and by anoxygenic photosynthetic bacteria (i.e. green and purple sulfur bacteria, and green non-sulfur bacteria). The goal is to determine the impact of the salinity-driven shifts in community structure on stable carbon isotope partitioning, as a means of understanding how salinity affects carbon cycling. The researchers hope to achieve this by combining nucleic acid-based molecular techniques for population analysis with lipid biomarker and carbon isotope measurements. Together, these are expected to provide quantitative information about microbial population structure and about the anabolic pathways used by the primary producers in the mat.
CLC software used for research in microbial mat communities
In the molecular analyses, the researchers target both functional genes and the 16S or 18S rRNA gene. Both functional and phylogenetic analyses are performed. An illustration of a work flow including the use of CLC software is shown in figure 3.

Figure 3: Illustration of typical work flow.
The work flow alternates between tasks performed in the lab and bioinformatics tasks where the CLC Workbench is used. First, DNA and RNA are extracted from samples taken from the evaporation ponds of the Guerrero Negro Saltern. Next, the CLC Workbench is used to design primers for the subsequent PCR amplification. Since the products of the PCR reactions are similar in size (bp), they cannot be separated by length alone, so denaturing gradient gel electrophoresis (DGGE) is performed to visualize differences in the sequence of the products.
The PCR products are sequenced, and the resulting trace files are imported into the CLC Workbench where they are assembled (described in detail below). Some of the genes are long, e.g. the dsrAB gene, which is approximately 1900 bp, and therefore multiple sequencing reactions have to be performed before the entire gene is covered.
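To make the need for multiple sequencing reactions concrete, the following is a minimal arithmetic sketch of how many overlapping reads are needed to tile a gene on one strand. The usable read length (700 bp) and the overlap between neighbouring reads (100 bp) are illustrative assumptions, not values from this case study; only the approximate dsrAB length of 1900 bp comes from the text.

```python
import math


def reads_needed(target_len: int, read_len: int, overlap: int) -> int:
    """Minimum number of reads that tile `target_len` bases when
    neighbouring reads share `overlap` bases (single strand only)."""
    if read_len <= overlap:
        raise ValueError("read_len must be larger than overlap")
    if target_len <= read_len:
        return 1
    # Every read after the first one adds (read_len - overlap) new bases.
    step = read_len - overlap
    return 1 + math.ceil((target_len - read_len) / step)


# Assumed values: 700 bp usable read length, 100 bp overlap.
print(reads_needed(1900, 700, 100))  # prints 3
```

Under these assumptions, three reads per strand would cover the gene; sequencing both strands would double the number of reactions.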
Within the CLC Workbench, more bioinformatics analyses are performed: creation of phylogenetic trees, BLAST searches for similar sequences, and annotation of the new sequences.
In the end, by applying these bioinformatics tools to the samples from the Guerrero Negro Saltern, the researchers have improved their knowledge about the microbes living in these communities. One more step has been taken in the effort to understand what life was like on early Earth, which in turn will indicate what to look for when trying to find life on other planets.
Zooming in on Assembly
The raw sequencing data (the reads) are imported into the CLC Workbench. Each read is assigned quality scores, which are used to trim low-quality trace data so that the reads align properly. The quality scores are also shown graphically below the sequence (see the light green color in figure 4).

Figure 4: Assembling to the reference. Gene annotations are visual aids when determining differences from the reference sequence (the blue annotation is the 16S gene).
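The idea of trimming low-quality trace data before assembly can be illustrated with a short sketch. This is a simple end-trimming rule under stated assumptions (Phred-like integer scores and a hypothetical threshold of 20); it is not the algorithm used by the CLC Workbench, and the function name and example read are made up for illustration.

```python
from typing import List, Tuple


def trim_low_quality_ends(
    seq: str, quals: List[int], min_q: int = 20
) -> Tuple[str, List[int]]:
    """Remove bases from both ends of a read until a base with
    quality >= min_q is reached. Interior bases are left untouched."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read with poor quality at both ends.
read = "NNACGTACGTNN"
scores = [5, 8, 30, 35, 40, 40, 38, 36, 33, 25, 10, 4]
trimmed, kept_scores = trim_low_quality_ends(read, scores)
print(trimmed)  # prints ACGTACGT
```

Trimming only the ends keeps the sketch short; production tools typically use a windowed or cumulative score so that a single good base inside a poor region does not stop the trimming early.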
The reads include both forward and reverse reads. The orientation is detected automatically during the assembly and is reflected in the color of the reads.
The sequences used for primer design are used as the reference when assembling the reads. The gene annotations on the sequences are used during the assembly process to determine where the genes start and end. Inconsistencies are highlighted and browsed through manually in order to see whether they are caused by sequencing errors or by real differences in the sequence.
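One way to think about automatic orientation detection against a reference is to compare how well a read and its reverse complement match the reference. The sketch below counts shared k-mers for both orientations; it is an illustrative heuristic with hypothetical names, a made-up reference, and an assumed k of 8, not a description of how the CLC Workbench does it.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def shared_kmers(query: str, reference: str, k: int) -> int:
    """Count k-mers of `query` that also occur anywhere in `reference`."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return sum(
        1 for i in range(len(query) - k + 1) if query[i:i + k] in ref_kmers
    )


def detect_orientation(read: str, reference: str, k: int = 8) -> str:
    """Classify a read as 'forward', 'reverse' or 'unknown' by comparing
    k-mer matches of the read and of its reverse complement."""
    fwd = shared_kmers(read, reference, k)
    rev = shared_kmers(reverse_complement(read), reference, k)
    if fwd == rev:
        return "unknown"
    return "forward" if fwd > rev else "reverse"


# Made-up reference; the reads are cut from it for demonstration.
reference = "ATGGCGTACGTTAGCCTAGGATCCGATTACGCAT"
forward_read = reference[5:25]
reverse_read = reverse_complement(reference[10:30])
print(detect_orientation(forward_read, reference))  # prints forward
print(detect_orientation(reverse_read, reference))  # prints reverse
```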
A table showing the inconsistencies between the reads is used both for getting an overview of the quality of the sequencing and for browsing the inconsistencies in a convenient way (see figure 5).

Figure 5: The conflicts are shown in a table.
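A conflict table of this kind can be thought of as a list of alignment columns in which the overlapping reads disagree. The following sketch builds such a list from reads that are already placed in contig coordinates. The input format (a `.` for positions a read does not cover), the read names and the sequences are assumptions made for this example only.

```python
from collections import Counter
from typing import Dict, List, Tuple


def find_conflicts(aligned_reads: Dict[str, str]) -> List[Tuple[int, Dict[str, int]]]:
    """Return (1-based position, base counts) for every column in which
    the reads covering that column show more than one distinct symbol."""
    if not aligned_reads:
        return []
    lengths = {len(s) for s in aligned_reads.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned reads must have the same length")
    conflicts = []
    for pos in range(lengths.pop()):
        # Ignore reads that do not cover this column ('.').
        counts = Counter(
            s[pos] for s in aligned_reads.values() if s[pos] != "."
        )
        if len(counts) > 1:
            conflicts.append((pos + 1, dict(counts)))
    return conflicts


# Hypothetical reads placed in contig coordinates.
reads = {
    "read1_fwd": "ACGTTAGC....",
    "read2_rev": "..GTTCGCAT..",
    "read3_fwd": "....TAGCATGG",
}
for position, counts in find_conflicts(reads):
    print(position, counts)  # prints 6 {'A': 2, 'C': 1}
```

Whether such a disagreement is a sequencing error or a real difference still has to be judged by looking at the traces, as described above.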
Because some of the genes require multiple sequencing reactions, there are many reads to be assembled, but the graphical overview of traces and annotations in the CLC Workbench makes it easy to work even with large quantities of data.
The result of the assembly is a contig sequence which is used in the following steps for alignments, phylogeny and BLAST searches.
Zooming in on BLAST search
A major part of the work when analyzing the assembled sequences is performing BLAST searches against the NCBI database. The BLAST search is conducted from within the CLC Workbench, eliminating the need for copying and pasting into a browser.
The search covers the entire database but is restricted to bacterial sequences, so that the results are relevant for comparison. The BLAST search is used for functional and phylogenetic comparison of the sequenced organisms with known sequences from the database.
The result of a BLAST search is shown in figure 6.

Figure 6: The graphical display of the result of a BLAST search.
This graphical view is supplemented by a table with more detailed information on the search hits (description, E-value, etc.) as shown in figure 7.

Figure 7: A table with detailed information about the result of a BLAST search.
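Working with a hit table like this usually means filtering by E-value and ranking the remaining hits. The sketch below shows that step on an in-memory list. The `Hit` fields, the E-value cut-off of 1e-10 and the placeholder hits are all assumptions for illustration; they are not results from the Guerrero Negro data or an export format of the CLC Workbench.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    description: str
    e_value: float
    identity_pct: float


def best_hits(hits: List[Hit], max_e_value: float = 1e-10, top_n: int = 5) -> List[Hit]:
    """Keep hits with E-value <= max_e_value and return the top_n,
    ordered by ascending E-value, then by descending identity."""
    kept = [h for h in hits if h.e_value <= max_e_value]
    kept.sort(key=lambda h: (h.e_value, -h.identity_pct))
    return kept[:top_n]


# Placeholder values, not real search results.
hits = [
    Hit("placeholder hit A", 3e-50, 97.1),
    Hit("placeholder hit B", 2e-4, 81.0),
    Hit("placeholder hit C", 1e-80, 99.2),
]
for hit in best_hits(hits):
    print(hit.description, hit.e_value)
```

With these placeholder values, hit C is listed first, hit A second, and hit B is dropped because its E-value exceeds the cut-off.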
The researcher examines the similarities between the sequences, and some of the sequences are downloaded and aligned with the data from Guerrero Negro. For the conserved regions, annotations are transferred to the sequenced data, providing valuable information for the next step in the analysis, which is the functional and phylogenetic characterization of the organism.