Dunn, C.W., Zapata, F., Munro, C., Siebert, S., and Hejnol, A. 2018. Pairwise comparisons across species are problematic when analyzing functional genomic data. Proceedings of the National Academy of Sciences 115(3):E409-E417.
Comparative functional genomics studies have typically used pairwise species comparisons, which do not account for the evolutionary histories of the taxa of interest. The group discussed this in the context of Fig.1A, which indicates that pairwise comparisons involve many non-independent comparisons. In contrast, Fig.1B indicates the phylogenetic independent contrasts method used in this paper, which involves independent comparisons that are fewer in number. We noted that phylogenetically explicit approaches require additional information (e.g. time-calibrated phylogenies), possibly making pairwise comparisons easier to implement. Dunn et al. (2018) is the first study to examine how results may differ when using these different approaches.
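The difference in the number of comparisons can be made concrete with a small count. The sketch below is illustrative only: the ten species names are hypothetical, and it assumes a fully bifurcating tree, for which independent contrasts yield one contrast per internal node (n - 1 for n tips).

```python
from itertools import combinations

# Hypothetical tip labels; any ten species would give the same counts.
species = [f"sp{i}" for i in range(1, 11)]

# Every unordered pair of species is one pairwise comparison.
unordered_pairs = list(combinations(species, 2))
n_pairwise = len(unordered_pairs)

# Counting each pair in both directions (A vs B and B vs A) doubles the total.
n_reciprocal = 2 * n_pairwise

# Assumption: fully bifurcating tree, so one contrast per internal node.
n_contrasts = len(species) - 1

print(n_pairwise, n_reciprocal, n_contrasts)
```

With ten species this gives 45 pairwise comparisons but only 9 independent contrasts, which is why many of the pairwise comparisons must share evolutionary history.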
Dunn et al. (2018) specifically reanalyzed the results of two other studies, Kryuchkova-Mostacci and Robinson-Rechavi (2016; hereafter KMRR) and Levin et al. (2016). KMRR evaluated the ortholog conjecture, which is the idea that orthologs, which arise through speciation events, remain more functionally similar to one another than paralogs, which arise through gene duplication events; we noted that we might expect greater divergence in functionality among paralogs due to processes such as neo- and sub-functionalization. KMRR assessed the ortholog conjecture using pairwise comparisons of tau, which is a metric of expression tissue specificity that condenses expression datasets into one value for each gene. KMRR found support for the ortholog conjecture. Dunn et al. (2018) reanalyzed this dataset by using tau in phylogenetic independent contrasts; based on the equation in Fig.1B, we concluded that the authors must have substituted x1 and x2 in the numerator with node-specific values of tau. Dunn et al. (2018) did not find support for the ortholog conjecture. In the context of Fig.2(A,C,E), the group discussed how the KMRR pairwise comparison method results in nearly indistinguishable findings when using data simulated in ways that should support or refute the ortholog conjecture. In contrast, the phylogenetic contrasts approach correctly refutes or supports the ortholog conjecture when using simulated data, leading the group to conclude that phylogenetic methods are superior.
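To make the substitution of tau into a contrast concrete, here is a minimal sketch. It assumes the commonly used definition of tau (the mean of one minus each tissue's expression relative to the peak tissue) and the standard Felsenstein form of a standardized contrast (the difference divided by the square root of the summed branch lengths). The expression values and branch lengths are invented, and the exact form of the equation in Fig.1B may differ.

```python
import math

def tau(expression):
    """Tissue-specificity index: 0 = uniform expression, 1 = one tissue only."""
    peak = max(expression)
    if peak <= 0:
        raise ValueError("need at least one tissue with positive expression")
    n = len(expression)
    return sum(1 - x / peak for x in expression) / (n - 1)

def standardized_contrast(x1, x2, v1, v2):
    """Difference between two sister nodes, scaled by branch lengths v1 and v2."""
    return (x1 - x2) / math.sqrt(v1 + v2)

# Hypothetical expression across four tissues for two sister genes.
tau_a = tau([10.0, 10.0, 10.0, 10.0])  # broadly expressed
tau_b = tau([0.0, 0.0, 8.0, 0.0])      # expressed in a single tissue

# Hypothetical branch lengths of 2.0 on each side of the node.
contrast = standardized_contrast(tau_b, tau_a, 2.0, 2.0)
print(tau_a, tau_b, contrast)
```

Because the contrast is scaled by branch length, a given difference in tau counts for less when the two lineages have had more time to diverge.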
We further discussed the Levin et al. (2016) paper, which found that gene expression similarity in pairwise comparisons of distantly related species was lower at the mid-stage of development than at the early or late stages, providing support for an inverse hourglass model of developmental gene expression. We discussed the major problems with this study that Dunn et al. (2018) identified, with the first issue being that pairwise comparisons including the ctenophore skewed similarity scores during the mid-stage of development to low values. Removing these comparisons led to rejection of the inverse hourglass model. Also, the inappropriate inclusion of reciprocal pairwise comparisons doubled the number of data points. Levin et al. (2016) also used an inappropriate statistical test, one that assesses whether distributions are greater or smaller than one another while also evaluating whether the shapes of the distributions differ. We noted that the results of their analyses may reflect differences in the shapes of the distributions, even though the authors were not interested in this.
The group concluded by discussing how phylogenetically explicit methods might impact the results of the studies we have read over the course of the semester. We determined that most of the previous articles concerned intraspecific variation in gene expression, so phylogenetic methods may be less important. However, the issues presented in Dunn et al. (2018) likely still apply below the species level when looking at multiple populations with different evolutionary relationships.
4 November 2019
Liu Y, Beyer A, Aebersold R. (2016) On the dependency of cellular protein levels on the mRNA abundance. Cell 165:535-550
Large-scale high-throughput technologies have allowed researchers to analyze genomes, transcriptomes and proteomes to further explore the relationship between transcripts and proteins. In this paper the researchers reviewed the complex relationship between mRNA abundance and protein levels. Previous research provides evidence that mRNA transcripts alone might not be enough to quantify protein abundance due to many processes, such as variations in translation rates, protein synthesis delay and protein transport. In order to review the relationship between protein and mRNA levels, they emphasized changes in expression in three different scenarios: steady state, long-term state changes, and short-term adaptations.
The researchers found that under steady-state conditions variations in protein levels are largely determined by mRNA levels. Studies using different statistical models have shown that 56-84% of the variation in protein levels was explained by variation in mRNA abundance. In the case of steady-state conditions, mRNA abundance can be used to make inferences about protein expression without having to analyze the proteome. However, in a transition state the correlation between mRNA levels and protein levels is weakened until a new steady state is reached. For short-term adaptations, mRNA levels do not reflect protein abundance due to the need for translation on demand, which means that mRNAs are constitutively expressed to ensure proteins are rapidly available to respond to a stimulus. Conversely, stable protein complexes are constantly maintained despite fluctuations in mRNA levels in order to activate and deactivate specific complexes, thereby conserving cellular resources and energy. The authors also review how cellular resources and energy constraints affect protein-mRNA correlations. In a cell there are a limited number of ribosomes for which mRNA transcripts compete, leading to weak correlations between mRNA and protein levels.
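As a reminder of what "variation explained" means under the simplest possible model, the sketch below computes the squared Pearson correlation between mRNA and protein abundance. The five log-scale values are invented for illustration, and the studies summarized in the review used more elaborate statistical models than a single linear fit.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log-scale abundances for five genes at steady state.
log_mrna = [1.0, 2.0, 3.0, 4.0, 5.0]
log_protein = [2.1, 2.9, 4.2, 4.8, 6.0]

# Under a simple linear model, r squared is the fraction of protein
# variance accounted for by mRNA abundance.
r_squared = pearson_r(log_mrna, log_protein) ** 2
print(round(r_squared, 3))
```

An r squared in the 0.56 to 0.84 range would correspond to the 56-84% figure quoted above, if that figure were derived from a model of this kind.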
The authors conclude that generally mRNA concentrations do correlate with protein levels during a steady state. This does not hold true when there are deviations from the steady state, which is why the authors stress that it is important for researchers to clearly define the context of the study. This is because the correlations between mRNA concentrations and protein levels begin to break down on smaller and shorter time scales as well as during dynamic phases due to post-translational processes. The authors state that there are still many important questions that need to be answered about mRNA and protein correlations and that our current knowledge also needs to be validated and tested.
28 Oct 2019
Ho, Wei-Chin, and Jianzhi Zhang. “Evolutionary adaptations to new environments generally reverse plastic phenotypic changes.” Nature communications 9.1 (2018): 350.
Metabolic flux refers to the rate of turnover of molecules through a metabolic pathway, measured in biomass production. Metabolic flux is required as an organism responds to a new environment, making it a more direct measure of plastic response, as reduction in flux can be considered detrimental to fitness. To address the question of whether or not plastic response is a stepping-stone to genetic adaptation, the authors analyzed metabolic flux as a direct measure of fitness in E. coli adapting to 50 novel environments, comparing their findings to several studies that examined plastic response with differential expression. Our group discussed several advantages of using metabolic flux rather than differential expression to address questions regarding fitness consequences of plastic response. For one, expression differences may or may not correspond to genes directly related to fitness. Gene expression analyses also often can’t directly mechanistically link expression differences to phenotype, as the authors were able to do with metabolic flux.
Using metabolic flux analyses, the authors sought to determine whether plastic response is largely reinforcing (causing phenotypic change in the same direction as genetic adaptation) or reversing (phenotypic change in the opposite direction of genetic adaptation). In the six differential expression studies the authors examined, a large majority of plastic responses examined were classified as reversing. This was similar to their own results, showing that the vast majority of plastic changes in flux were also reversions. In addition, the authors found that phenotypic values in adapted environments are most commonly restored, rather than under-restored or over-restored, in response to a new environment. The flux analysis and comparison with transcriptome data revealed that a large majority of genetic responses reverse the initial phenotypic change caused by plastic response. With the assumption made by the authors that reinforcement supports the idea that plastic response is a stepping-stone to genetic change, the results suggest that plastic response is not a necessary precursor to genetic adaptation. However, clarification of the intended meaning of the term “stepping-stone” is needed, as our group discussed the possibility of plastic response generating selection against the phenotypic change it causes, thus facilitating genetic adaptation in the reversing direction.
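The reinforcing versus reversing distinction reduces to comparing the signs of two changes. The sketch below is a simplified illustration: the three stage names (original, plastic, adapted) and the example values are assumptions, and any magnitude thresholds the authors applied before classifying a change are omitted.

```python
def classify(original, plastic, adapted):
    """Label a trait by comparing the plastic and genetic changes.

    original: trait value in the ancestral environment
    plastic:  value right after the environmental shift (before adaptation)
    adapted:  value after adaptation to the new environment
    """
    plastic_change = plastic - original
    genetic_change = adapted - plastic
    if plastic_change == 0 or genetic_change == 0:
        return "no change"
    # Same sign: adaptation continues in the direction plasticity started.
    # Opposite sign: adaptation undoes part or all of the plastic shift.
    return "reinforcing" if plastic_change * genetic_change > 0 else "reversing"

# Hypothetical flux values for two reactions.
label_a = classify(original=1.0, plastic=2.0, adapted=3.0)
label_b = classify(original=1.0, plastic=2.0, adapted=1.2)
print(label_a, label_b)
```

In the second example the adapted value ends up near the original one, which is the kind of outcome the notes above describe as restoration.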
21 Oct 2019
Ghalambor CK, Hoke KL, Ruell EW, Fischer EK, Reznick DN, Hughes KA. (2015). Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature letters 525:372-375.
The authors tested the relationship between the direction of plasticity (adaptive vs. non-adaptive) and the direction of evolution in natural populations of Trinidadian guppies. The authors sampled individuals from a high predation (HP) site and a low predation (LP) site, and from two relatively benign sites where individuals from HP were introduced and allowed to reproduce for a few generations (Intro1 and Intro2). Individuals from each subpopulation were brought back into the lab where half the individuals were reared with a predator cue and the other half reared without the predator cue.
Using RNA-seq to measure expression, the authors found that transcripts in the Intro1 and Intro2 populations showed the same pattern of divergence in gene expression as the LP group. Using a principal component analysis, the Intro1 and Intro2 groups clustered together, but further away from the LP group along the PC1 axis, indicative of genetic divergence between the populations. Moreover, the Intro1 and Intro2 groups were clustered further away from the HP groups along the PC2 axis, which was based on their exposure to and their adaptive response to predator cues in the lab. The authors state that this clustering along the PC2 axis is representative of rapid adaptive evolution.
To further support this finding, the authors permuted the data by randomly reassigning the fish column IDs to create a null distribution. Permuting the original data provides a more accurate null for the data itself and accounts for any underlying correlation. Doing so, there was a highly negative correlation in the authors’ permuted data, thereby supporting the strong correlation between treatment group and gene expression pattern.
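The general logic of a permutation null can be sketched as follows. This is a generic label-shuffling test on a difference in group means, not the authors' actual analysis: the expression values, the test statistic, the number of permutations, and the seed are all assumptions chosen for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling breaks any real link between sample label and value.
        rng.shuffle(pooled)
        shuffled = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical expression values for one transcript in two groups of fish.
with_cue = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
without_cue = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p_value = permutation_p_value(with_cue, without_cue)
print(p_value)
```

Because the null distribution is built from the observed values themselves, it inherits whatever distributional quirks and correlations the data contain.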
The overall importance of the paper is to contribute to the ongoing debate whether plasticity can truly enhance populations through the evolutionary process. The authors conclude that there is a vast amount of plasticity that is non-adaptive and inhibits populations from producing an evolutionary response to new environments.
14 Oct 2019
Kerry McGowan and Hannah Hapner
Shaw JR, Hampton TH, King BL, Whitehead A, Galvez F, Gross RH, Keith N, Notch E, Jung D, Glaholt SP, Chen CY, Colbourne JK, Stanton BA. (2014). Natural selection canalizes expression variation of environmentally induced plasticity-enabling genes. Mol. Biol. Evol. 31(11):3002-3015.
Phenotypic plasticity refers to one genotype having the ability to switch phenotypes under different environmental conditions. In this paper, the researchers studied response to salinity in the fish Fundulus heteroclitus, in which individuals were exposed to both fresh and saltwater conditions. The researchers used arsenic to determine which genes were differentially expressed (DE) in gill tissues and potentially involved in this phenotypic switch. Our group questioned the mechanism behind arsenic and gills, as arsenic has been shown to block phenotypic changes involved in acclimation to saltwater. The authors didn’t specifically mention the mechanism, so all we could do was speculate.
The researchers used microarrays and gene set enrichment analysis (GSEA) to gather and analyze the data. There were six treatment groups, differing in salinity exposure and presence/absence of arsenic. There were a total of 24 fish, four per treatment. All were males due to the small sample size, as sexual differences in DE could not be accounted for. With that being said, the authors assumed males were representative of the entire population, which may be debatable.
The researchers measured DE at two time points (1 hr. and 24 hrs.), comparing expression back to a freshwater control. They identified “main effects” genes, which were genes that varied in the presence of saltwater at 1 hr. and 24 hrs. They also identified “interaction” genes, which were genes that showed an interaction between saltwater and arsenic at these same time points. The researchers hypothesized that these interaction genes are likely directly related to a phenotypically plastic response to seawater, whereas main effects genes may show DE due to a myriad of factors not directly associated with acclimation to saltwater.
The researchers also hypothesized that selection on genes controlling plastic response reduces inter-individual variation in how these fish respond to salinity. To show this, they measured the coefficient of variation (COV) for interaction and main effect gene sets, and found the interaction COV to be lower than that of the main COV. The researchers also looked at a wild-caught dataset and found that freshwater fish have a less variable response to salinity than mesohaline or coastal populations. Findings from this study also suggest that genes involved in plasticity would have fewer regulators from the interaction gene set compared to the main effects gene set. They found that, on average, interaction genes had fewer upstream relationships per gene than main effects genes.
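For reference, the coefficient of variation is the standard deviation divided by the mean, which allows variability to be compared across genes with different average expression. The sketch below uses invented expression values for one gene from each set; it is not the authors' data or their exact procedure.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

# Hypothetical expression of one gene across four fish in each gene set.
interaction_gene = [98.0, 101.0, 100.0, 101.0]
main_effect_gene = [60.0, 140.0, 90.0, 110.0]

cv_interaction = coefficient_of_variation(interaction_gene)
cv_main = coefficient_of_variation(main_effect_gene)
print(cv_interaction < cv_main)
```

Both hypothetical genes have a mean of 100, so the difference in COV here comes entirely from the spread among individuals.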
7 Oct 2019
Capraro, A., O’Meally, D., Waters, S. A., Patel, H. R., Georges, A., & Waters, P. D. (2019). Waking the sleeping dragon: gene expression profiling reveals adaptive strategies of the hibernating reptile Pogona vitticeps. BMC genomics, 20(1), 460.
Hibernation is characterized by the reduction in metabolic rate over an extended period of adverse conditions, with arousals from hibernation occurring every few days in some hibernators. Even with extreme physiological changes occurring every year in hibernators, there are no adverse effects upon arousal from hibernation. While this phenomenon has been studied in multiple organisms, reptiles have not been explored to date. To address this, the researchers sampled three tissue types (brain, heart, and skeletal muscle) from three different time points (late hibernation, 2 days post-arousal, 2 months post-arousal). Both the proteome and transcriptome from these tissues were analyzed, with mass spectrometry and mRNA-seq, respectively.
The researchers found that the post-arousal time points had little difference either in transcript count or proteins expressed and therefore compared late hibernation to the combination of the two. Comparing between hibernation and post-arousal, they found 2,482 genes were differentially expressed across all 3 tissues. Furthermore, skeletal muscle had the highest number of genes differentially expressed, followed by heart and then brain. A Gene Ontology analysis revealed that increased expression of genes within stress response pathways was common in all tissues during hibernation. Additionally, there was an enrichment for genes which regulate transcription, either through chromatin remodeling or miRNA machinery.
The proteomic analysis revealed far fewer differentially expressed proteins when compared to the transcriptomic analysis. At most, 58 proteins were different between hibernation and post-arousal in the brain. The fewest, 36 proteins, were found to be different between the two time points in the heart tissue. Largely, the transcriptomic changes disagreed with the observed proteomic changes. Finally, the researchers show that reptiles seem to have a different pattern in their hibernation gene expression when compared to species in other hibernation studies. This highlights the need to continue broadening the species used in hibernation studies.
30 Sept 2019
Garieri, M., Delaneau, O., Santoni, F., Fish, R. J., Mull, D., Carninci, P., … & Fort, A. (2017). The effect of genetic variation on promoter usage and enhancer activity. Nature communications, 8(1), 1358.
Using expression quantitative trait loci (eQTLs), the researchers sought to determine the effect of genetic variation on promoter usage and enhancer activity. eQTLs are loci that explain a portion of the variation in mRNA transcript levels. Often, the eQTLs map to a non-coding part of the genome, indicating that variation in these expression phenotypes may be due to regulatory variants. eQTLs are found to be enriched in both promoter and enhancer regions in many cases. puQTLs (promoter usage QTLs) and eaQTLs (enhancer associated) were mapped in TADs (topologically associating domains), which are cis-windows that regulatory elements act on. This does little to account for trans regulation of genes, which was not part of the scope of this study.
The researchers used CAGE (Cap Analysis of Gene Expression) profiling in order to identify the presence of promoter and enhancer variants affecting gene expression. Natural genetic variation in cultured human EBV-transformed lymphoblastoid cells was used given the extensive annotation of the human genome. A large part of the discussion was given to the classification of the type of puQTL-associated genes. The five groups were broadly divided into single promoter and multi-promoter, with further classification based upon relative effect size. We discussed which group would have the greatest effect on phenotype, and thus possibly cause overrepresentation, but no clear consensus was reached besides that groups 3 and 4 might be easier to identify.
23 Sep 2019
Josephs, E., Lee, Y., Stinchcombe, J., Wright, S. (2015). Association mapping reveals the role of purifying selection in the maintenance of genomic variation in gene expression. Proc. Nat. Acad. Sci. USA 112, 15390–15395.
The researchers sought to provide more insight into the poorly understood picture of what evolutionary forces contribute to the maintenance of genetic variation in quantitative traits of a population. Using the established hypothesis of purifying selection being one of these forces, the study examines this potential through RNA-seq analytics using loci linked to total expression variation (eQTLs) and allele specific expression variation (aseQTLs) within a single population of the plant Capsella grandiflora. The purifying selection hypothesis predicts that this force results in an excess of low-frequency variants and a negative correlation between low frequency alleles and selection coefficients, maintaining genetic variation with the maintenance of these variants. The researchers highlighted support for purifying selection as an important force in maintaining variation of traits, based on their results showing a negative correlation between phenotypic effect size and allele frequency at loci associated with eQTLs and aseQTLs. This study also found that alleles at these loci were rarer than expected, giving more weight to the utility of purifying selection in maintaining the genetic variation at loci related to quantitative phenotypes.
We discussed the basic concepts of eQTLs, aseQTLs, and requirements for mapping these metrics of variation to phenotypic data. Association mapping identifies variants that contribute to phenotypes, and phenotypes must be continuous (e.g. growth, size, color, etc.). There has to be variation in phenotypes, along with the assumption that phenotypes are heritable and not expressed solely due to environmental factors. The phenotype for this paper is the number of transcripts measured through RNA-seq. We zoomed out from technical aspects of the paper to review more fundamental ideas, which was more beneficial to the class majority. We covered the structure of genes and gene component functions. eQTLs affect expression of one or multiple genes, while aseQTLs are allele specific. It was explained that eQTLs and aseQTLs were more likely at SNPs located at the transcription site and CNSs at 5’ UTRs, because these areas are likely affecting expression. SNPs located in exons are used only as markers to distinguish transcripts; these SNPs cannot be used to determine expression levels. We also clarified meanings of “cis” vs. “trans” as acting on the same gene region or location (cis) and an area somewhere else in the genome (trans).
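The core idea of association mapping with expression as the phenotype can be sketched as a regression of transcript abundance on genotype. The example below is a bare-bones illustration with invented data: genotypes are coded as the number of copies of one allele (0, 1 or 2), the expression values are hypothetical, and real analyses also correct for population structure, covariates, and multiple testing.

```python
def least_squares_slope(dosages, expression):
    """Slope of expression regressed on allele dosage (the effect size)."""
    n = len(dosages)
    mean_d = sum(dosages) / n
    mean_e = sum(expression) / n
    cov = sum((d - mean_d) * (e - mean_e) for d, e in zip(dosages, expression))
    var_d = sum((d - mean_d) ** 2 for d in dosages)
    return cov / var_d

# Hypothetical data for six individuals at one SNP and one gene.
allele_dosage = [0, 0, 1, 1, 2, 2]
transcript_level = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]

effect_size = least_squares_slope(allele_dosage, transcript_level)
print(effect_size)
```

A non-zero slope means each extra copy of the allele is associated with a change in expression, which is the per-locus effect size that the paper relates to allele frequency.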
16 Sept 2019
Ryan P. Thompson
Anne-Marie Dion-Côté, Sébastien Renaut, Eric Normandeau, Louis Bernatchez, RNA-seq Reveals Transcriptomic Shock Involving Transposable Elements Reactivation in Hybrids of Young Lake Whitefish Species, Molecular Biology and Evolution, Volume 31, Issue 5, May 2014, Pages 1188–1199, https://doi.org/10.1093/molbev/msu069
The article focused on allopatric speciation, which has occurred in whitefish populations. Three different groups were the focus of the study – a dwarf limnetic species, the normal benthic species, and viable hybrids generated in a lab setting. The discussion began by focusing on the type of speciation that had occurred and that the focus of the research was with regards to secondary contact after an allopatric speciation event. The group concluded that Whitefish were chosen so that researchers could better understand reactivation of transposons and speciation divergence. It was brought up that the initial divergence which occurred around 12,000 years ago was more significant than divergence that may have taken place since. The majority of upregulation occurred in non-coding regions. DNMT1 was downregulated in the malformed backcrosses which led to less regulation of transposons and therefore a higher rate of errors.
Critiques of the study focused on the viability of hybridization occurring in a natural setting. The two populations could only produce a hybrid if they were generated in a lab because they live in different habitats (different layers of the same lake). Therefore, large scale hybridization is not likely in their natural habitat. A comment on the paper was that 1 female was bred with 5 males for each group which eliminated potential maternal effects. Also, worth mentioning, the authors stated that a percentage of the eggs never properly developed but failed to study the signature of the embryos and whether or not the signature could be potentially correlated to the parental transcriptome. A critique of the paper was that different techniques were used to compare data from different developmental stages. This didn’t allow for strong comparisons to be made between different developmental stages.
Several members of the group brought up that the dwarf populations were more clustered in figure 3 whereas the malformed populations were slightly more spread out. Therefore, there is more variance in expression of genes in the malformed populations compared to other populations. F1 stress led to reactivation of transposable elements, which led to problems in the backcrossed generation. Finally, the group concluded that full genome resequencing could have been used instead of RNA-Seq to determine where the transposable elements reinserted themselves into the genome. In addition to this, the transcriptome of the parent and offspring could have been compared to one another.
9 Sept 2019
McCarthy, D. J., Chen, Y., & Smyth, G. K. (2012). Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic acids research, 40(10), 4288–4297.
This article is about the improvement of statistical frameworks for RNA-seq expression studies using the Bioconductor package edgeR. We reviewed steps leading to the statistical analysis of gene expression data: applying treatments to groups with individuals whose genes vary biologically, and processing RNA samples that can vary from technical sources such as sequencing and library preparation quality.
We summarized the topic of gene-wise dispersion as applying a weight to a gene’s calculation based on how much information it can pull, while giving outliers less weight and pulling points toward the mean. We discussed fitting Generalized Linear Models (GLMs) because the count data is not normally distributed, and this article used a negative binomial distribution to account for overdispersion. This article was the first to parse out the Biological Coefficient of Variation (BCV) from the Technical Coefficient of Variation (TCV).
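The split between technical and biological variation follows from the negative binomial variance, which is the mean plus the dispersion times the squared mean; dividing by the squared mean gives a total squared CV equal to one over the mean (the technical part) plus the dispersion (the squared BCV). The sketch below evaluates this relationship for a hypothetical dispersion of 0.16, that is, a BCV of 0.4; the counts are invented.

```python
import math

def total_cv(mean_count, dispersion):
    """Total coefficient of variation of a negative binomial count."""
    variance = mean_count + dispersion * mean_count ** 2
    return math.sqrt(variance) / mean_count

dispersion = 0.16  # hypothetical; BCV = sqrt(0.16) = 0.4
bcv = math.sqrt(dispersion)

cv_low_count = total_cv(100, dispersion)
cv_high_count = total_cv(10000, dispersion)
cv_poisson_only = total_cv(100, 0.0)  # no biological variation at all
print(round(cv_low_count, 4), round(cv_high_count, 4), cv_poisson_only)
```

As the mean count grows, the technical term shrinks toward zero and the total CV approaches the BCV, which is one way to see why biological variation dominates for well-covered genes.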
Next, we discussed why we expect the BCV to be the dominant source of variation. BCV is important because, to know which genes are differentially expressed under certain treatments, it is necessary to account for the baseline variation among biological replicates in addition to the variation from technical sources. The False Discovery Rate (FDR) is the expected proportion of false positives among the genes called significant. Multiple-testing corrections, whether FDR-controlling procedures or the more stringent Bonferroni correction (which controls the family-wise error rate rather than the FDR), are important because a high number of genes will be compared, which results in a high number of false positives.
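How much stringency matters can be seen by adjusting the same p-values two ways. The sketch below compares the Bonferroni correction with the Benjamini-Hochberg procedure, a widely used FDR-controlling method that is named here as an illustration and was not part of our discussion; the five p-values are invented.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five genes.
raw = [0.001, 0.008, 0.02, 0.03, 0.6]
n_bonferroni = sum(p < 0.05 for p in bonferroni(raw))
n_bh = sum(q < 0.05 for q in benjamini_hochberg(raw))
print(n_bonferroni, n_bh)
```

With these values Bonferroni retains two genes at the 0.05 level while Benjamini-Hochberg retains four, illustrating the trade-off between guarding against any false positive and tolerating a controlled fraction of them.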
This was one of the early papers about edgeR, and we discussed some of the computational gains made in this article, such as a line-search modification in the GLM algorithm, use of an Adjusted Profile Likelihood (APL), and assessing all genes in parallel rather than one gene at a time.
2 Sep 2019
Chris Kozakiewicz, WSU
Veilleux, H.D., R. Taewoo, J.M. Donelson, T. Ravasi, and P.L. Munday (2018). Molecular response to extreme summer temperatures differs between two genetically differentiated populations of a coral reef fish. Frontiers in Marine Science 5: 349. doi: 10.3389/fmars.2018.00349
This week’s discussion concerned a study of differential gene expression in a coral reef fish sampled from two populations along a latitudinal gradient in response to elevated temperatures. The authors found greater abundance of differentially expressed genes (DEGs) among temperature treatments in the population with the highest latitude. They interpreted this higher DEG abundance as a signature of higher thermal plasticity due to the higher degree of seasonal temperature variation experienced in this environment. The authors grouped DEGs by cellular function and drew conclusions regarding whether particular functions were being up or downregulated in response to temperature treatments.
Our group began with a discussion of phenotypic plasticity, whereby a single genotype produces more than one phenotype in response to environmental variation. The reaction norm, depicted as a curve of phenotypes distributed around an optimum, describes the range of phenotypic expression for a single genotype and thus the degree of phenotypic plasticity that a given genotype confers. The authors predicted that populations at lower latitudes are more sensitive to extreme temperatures because they have narrower reaction norms and live closer to their thermal maximum. This occurs because lower latitudes tend to experience less diurnal and seasonal temperature variation, producing a narrower range of thermal conditions to which populations must adapt. We discussed exceptions to this rule, including high elevation areas, which experience greater temperature variation, and arctic regions, which experience reduced temperature variation. For aquatic organisms, shallower environments such as streams, intertidal zones and coral reefs experience greater temperature variation than deep pelagic and benthic environments.
Our discussion then turned to the authors’ experimental design and how it might have biased levels of gene expression among the populations. For example, although all animals were reared in the lab from juveniles, animals from one population were wild caught, while those originating from the other population were hatched in aquaria. We also noted imbalanced sample sizes and sex ratios, with a strong male bias in the low-latitude population, and questioned why the authors did not down-sample to rectify this. We explored how filtering steps for identifying DEGs might influence our ability to detect them, and the trade-off between sensitivity and confidence in DEG detection. We also questioned the authors’ approach of using abundances of DEGs to determine whether specific cellular functions were being up or downregulated, suggesting that it may be important to account for variation among DEGs in the magnitude of their effect on the molecular pathway. For example, differential expression of a gene associated with a rate-limiting step would likely have a greater overall effect on the cellular function with which it is associated. We discussed the ability of GO-term enrichment to identify groups of genes that are enriched for certain processes as an alternative means of identifying generalized functional responses. Finally, we speculated about how up or downregulation in the identified cellular functions might affect these populations’ ability to survive prolonged warmer temperatures, and whether the authors’ findings indicate that the study populations were locally adapted.
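GO-term enrichment is often assessed with a one-sided hypergeometric test, which asks how likely it is to see at least the observed number of annotated genes among the DEGs by chance. The sketch below is a minimal version of that calculation under stated assumptions: the gene counts are invented, and real tools additionally handle the GO hierarchy and correct for testing many terms.

```python
from math import comb

def enrichment_p_value(n_genes, n_annotated, n_degs, n_degs_annotated):
    """P(at least n_degs_annotated annotated genes among the DEGs by chance)."""
    total = comb(n_genes, n_degs)
    upper = min(n_annotated, n_degs)
    tail = sum(
        comb(n_annotated, k) * comb(n_genes - n_annotated, n_degs - k)
        for k in range(n_degs_annotated, upper + 1)
    )
    return tail / total

# Hypothetical: 10 genes tested, 3 carry the GO term, 4 are DEGs,
# and all 3 annotated genes are among the DEGs.
p_enriched = enrichment_p_value(10, 3, 4, 3)
print(round(p_enriched, 4))
```

This test still treats every annotated DEG equally, so it does not address the point raised above about genes at rate-limiting steps having a larger effect on a pathway.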