Centenarian hotspots in Denmark        
Via bioRxiv: Centenarian Hotspots in Denmark. The abstract: Background: The study of regions with high prevalence of centenarians is motivated by a desire to find determinants of healthy ageing. While existing research has focused on selected candidate geographical regions, we...
          Response to To Increase Trust, Change the Social Design Behind Aggregated Biodiversity Data        

Nico Franz and Beckett W. Sterner recently published a preprint entitled "To Increase Trust, Change the Social Design Behind Aggregated Biodiversity Data" on bioRxiv (http://dx.doi.org/10.1101/157214). Below is the abstract:

Growing concerns about the quality of aggregated biodiversity data are lowering trust in large-scale data networks. Aggregators frequently respond to quality concerns by recommending that biologists work with original data providers to correct errors "at the source". We show that this strategy falls systematically short of a full diagnosis of the underlying causes of distrust. In particular, trust in an aggregator is not just a feature of the data signal quality provided by the aggregator, but also a consequence of the social design of the aggregation process and the resulting power balance between data contributors and aggregators. The latter have created an accountability gap by downplaying the authorship and significance of the taxonomic hierarchies (frequently called "backbones") they generate, and which are in effect novel classification theories that operate at the core of data-structuring process. The Darwin Core standard for sharing occurrence records plays an underappreciated role in maintaining the accountability gap, because this standard lacks the syntactic structure needed to preserve the taxonomic coherence of data packages submitted for aggregation, leading to inferences that no individual source would support. Since high-quality data packages can mirror competing and conflicting classifications, i.e., unsettled systematic research, this plurality must be accommodated in the design of biodiversity data integration. Looking forward, a key directive is to develop new technical pathways and social incentives for experts to contribute directly to the validation of taxonomically coherent data packages as part of a greater, trustworthy aggregation process.

Below I respond to some specific points that annoyed me about this article; at the end I try to sketch out a more constructive response. Let me stress that although I am the current Chair of the GBIF Science Committee, the views expressed here are entirely my own.

Trust and social relations

Trust is a complex and context-sensitive concept... First, trust is a dependence relation between a person or organization and another person or organization. The first agent depends on the second one to do something important for it. An individual molecular phylogeneticist, for example, may rely on GenBank (Clark et al. 2016) to maintain an up-to-date collection of DNA sequences, because developing such a resource on her own would be cost prohibitive and redundant. Second, a relation of dependence is elevated to being one of trust when the first agent cannot control or validate the second agent's actions. This might be because the first agent lacks the knowledge or skills to perform the relevant task, or because it would be too costly to check.

Trust is indeed complex. I found this part of the article to be fascinating, but incomplete. The social network GBIF operates in is much larger than simply taxonomic experts and GBIF: there are relationships with data providers, other initiatives, a broad user community, the government agencies that approve its continued funding, and so on. Some of the decisions GBIF makes need to be seen in this broader context.

For example, the article challenges GBIF for responding to errors in the data by saying that these should be "corrected at source". This is a political statement, given that data providers are anxious not to cede complete control of their data to aggregators. Hence the model: GBIF users see errors, those errors get passed back to the source (the mechanisms for this are mostly non-existent), the source fixes them, then the aggregator re-harvests. This model makes assumptions about whether sources are willing or able to fix these errors that I think are not really true. But the point is that this is less about not taking responsibility, and more about avoiding treading on toes by taking too much responsibility. Personally I think GBIF should take responsibility for fixing a lot of these errors, because it is GBIF whose reputation suffers (as demonstrated by Franz and Sterner's article).


A third step is to refrain from defending backbones as the only pragmatic option for aggregators (Franz 2016). The default argument points to the vast scale of global aggregation while suggesting that only backbones can operate at that scale now. The argument appears valid on the surface, i.e., the scale is immense and resources are limited. Yet using scale as an obstacle it is only effective if experts were immediately (and unreasonably) demanding a fully functional, all data-encompassing alternative. If on the other hand experts are looking for token actions towards changing the social model, then an aggregator's pursuit of smaller-scale solutions is more important than succeeding with the 'moonshot'.

Scalability is everything. GBIF is heading towards a billion occurrence records and several million taxa (particularly as more and more taxa from DNA barcoding are added). I'm not saying that tractability trounces trust, but it is a major consideration. Anybody advocating a change has got to think about how these changes will work at scale.

I'm conscious that this argument could easily be used to swat away any suggestion ("nice idea, but won't scale") and hence be a reason to avoid change. I myself often wish GBIF would do things differently, and run into this problem. One way around it is to make use of the fact that GBIF has some really good APIs, so if you want GBIF to do something different you can build a proof of concept to show what could be done. If that is sufficiently compelling, then the case for trying to scale it up is going to be much easier to make.

Multiple classifications

As a social model, the notion of backbones (Bisby 2000) was misguided from the beginning. They disenfranchise systematists who are by necessity consensus-breakers, and distort the coherence of biodiversity data packages that reflect regionally endorsed taxonomic views. Henceforth, backbone-based designs should be regarded as an impediment to trustworthy aggregation, to be replaced as quickly and comprehensively as possible. We realize that just saying this will not make backbones disappear. However, accepting this conclusion counts as a step towards regaining accountability.

This strikes me as hyperbole. "They disenfranchise systematists who are by necessity consensus-breakers". Really? Having backbones in no way prevents people doing systematic research, challenging existing classifications, or developing new ones (which, if they are any good, will become the new consensus).

We suggest that aggregators must either author these classification theories in the same ways that experts author systematic monographs, or stop generating and imposing them onto incoming data sources. The former strategy is likely more viable in the short term, but the latter is the best long-term model for accrediting individual expert contributions. Instead of creating hierarchies they would rather not 'own' anyway, aggregators would merely provide services and incentives for ingesting, citing, and aligning expert-sourced taxonomies (Franz et al. 2016a).

Backbones are authored in the sense that they are the product of people and code. GBIF's is pretty transparent (code and some data on GitHub, complete with a list of problems). Playing Devil's advocate, maybe the problem here is the notion of authorship. If you read a paper with hundreds of authors, why does that give you any greater sense of accountability? Is each author going to accept responsibility for (or be able to talk cogently about) every aspect of that paper? If aggregators such as GBIF and GenBank didn't provide a single, simple way to taxonomically browse the data, I'd expect it would be the first thing users would complain about. There are multiple communities GBIF must support, including users who care not at all about the details of classification and phylogeny.

Having said that, obviously these backbone classifications are often problematic and typically lag behind current phylogenetic research. And I accept that they can impose a certain view on how you can query data. GenBank for a long time did not recognise the Ecdysozoa (nematodes plus arthropods) despite the evidence for that group being almost entirely molecular. Some of my research has been inspired by the problem of customising a backbone classification to better reflect more modern views (doi:10.1186/1471-2105-6-208).

If handling multiple classifications is an obstacle to people using or contributing data to GBIF, then that is clearly something that deserves attention. I'm a little sceptical, in that I think this is similar to the issue of being able to look at multiple versions of a document or GenBank sequence. Everyone says it's important to have; I suspect very few people ever use that functionality. But a way forward might be to construct a meaningful example (in other words a live demo, not a diagram with a few plant varieties).

Ways forward

We view this diagnosis as a call to action for both the systematics and the aggregator communities to reengage with each other. For instance, the leadership constellation and informatics research agenda of entities such as GBIF or Biodiversity Information Standards (TDWG 2017) should strongly coincide with the mission to promote early-stage systematist careers. That this is not the case now is unfortunate for aggregators, who are thereby losing credibility. It is also a failure of the systematics community to advocate effectively for its role in the biodiversity informatics domain. Shifting the power balance back to experts is therefore a shared interest.

Having vented, let me step back a little and try to extract what I think the key issue is here. Issues such as error correction, backbones, and multiple classifications are important, but I guess the real issue here is the relationship between experts such as taxonomists and systematists, and large-scale aggregators (note that GBIF serves a community that is bigger than just these researchers). Franz and Sterner write:

...aggregators also systematically compromise established conventions of sharing and recognizing taxonomic work. Taxonomic experts play a critical role in licensing the formation of high-quality biodiversity data packages. Systems of accountability that undermine or downplay this role are bound to lower both expert participation and trust in the aggregation process.

I think this is perhaps the key point. Currently aggregation tends to aggregate data and not provenance. Pretty much every taxonomic name has at one point or another been published by somebody. For various reasons (including the crappy way most nomenclature databases cite the scientific literature), by the time these names are assembled into a classification by GBIF they have virtually no connection to the primary literature. This also means that the record of who contributed the research that led to a name being minted (and the research itself) is lost. Arguably GBIF is missing an opportunity to make taxonomic and phylogenetic research more visible and discoverable (I'd argue this is a better approach than quixotic efforts to get all biologists to always cite the primary taxonomic literature).

Franz and Sterner's article is a well-argued and sophisticated assessment of a relationship that isn't working the way it could. But to talk in terms of "power balance" strikes me as miscasting the debate. Would it not be better to try to think about aligning goals (assuming that is possible)? What do experts want to achieve? What do they need to achieve those goals? Is it things such as access to specimens, data, literature, sequences? Visibility for their research? Demonstrable impact? Credit? What are the impediments? What, if anything, can GBIF and other aggregators do to help? In what way can facilitating the work of experts help GBIF?

In my own "early-stage systematist career" I had a conversation with Mark Hafner about the Louisiana State University Museum providing tissue samples for molecular sequencing, essentially a "project in a box". Although Mark was complaining about the lack of credit for this (a familiar theme), the thing which struck me was how wonderful it would be to have such a service: here's everything you need to do your work, go do some science. What if GBIF could do the same? Are you interested in this taxonomic group? Well, here's the complete sum of what we know so far. Specimens, literature, DNA sequences, taxonomic names, the works. Wouldn't that be useful?

Franz and Sterner call for "both the systematics and the aggregator communities to reengage with each other". I would echo this. I think that the sometimes dysfunctional relationship between experts and aggregators is partly due to the failure to build a community of researchers around GBIF and its activities. The focus of GBIF's relationship with the scientific community has been to have a committee of advisers, which is a rather traditional and limited approach ("you're a scientist, tell us what scientists want"). It might be better served if it provided a forum for researchers to interact with GBIF, data providers, and each other.

I started this blog (iPhylo) years ago to vent my frustrations about TreeBASE. At the time I was fond of a quote from a philosopher of science I was reading, to the effect that we only criticise those things that we care about. I take Franz and Sterner's article to indicate that they care about GBIF quite a bit ;). I'm looking forward to more critical discussion about how we can reconcile the needs of experts and aggregators as we seek to make global biodiversity data both open and useful.

          Copyright and the Use of Images as Biodiversity Data        
Willi Egloff, Donat Agosti, Puneet Kishor, David Patterson, and Jeremy A. Miller have published an interesting preprint entitled “Copyright and the Use of Images as Biodiversity Data” (DOI:10.1101/087015), in which they argue that taxonomic images aren't copyrightable. I'm not convinced, and have commented on the bioRxiv site. Frustratingly, bioRxiv puts comments into a moderation queue (in my opinion the stupidest thing to do if you want to enable conversation), so I've posted my comment here.

It seems to me that there are two deeply problematic aspects to this claim. The first is the premise that taxonomic illustration is not creative. This seems, at best, arguable. I've illustrated new species, and it sure felt like I was doing creative work. Arguably every creative work adheres to the conventions of a discipline; how does this by itself make copyright irrelevant?

Secondly, I'm unconvinced that a legal opinion that hasn't been tested in a court is worth much. We can assert whatever interpretation of copyright we want, I doubt that would stop legal action by a person or organisation that felt it could benefit from such action. The real question will be whether treating taxonomic images as outside of copyright would be considered a sufficient threat to someone's business model for them to take action.

I completely support the idea that the images (and all taxonomically relevant data) should be completely free and open, but simply asserting that they should be doesn't make it so.

          bioRxiv is here        

For years many in the biological sciences community have been jealous of the existence of arXiv. This preprint server allows researchers to distribute their work widely to all comers. On occasions when there have been debates about mimicking arXiv for biology there has been skepticism about the nature of the outcomes (my own rejoinder […]

The post bioRxiv is here appeared first on Gene Expression.

          Recasting the cancer stem cell hypothesis: unification using a continuum model of microenvironmental forces        
New Results

Jacob G. Scott, Andrew Dhawan, Anita Hjelmeland, Justin Lathia, Masahiro Hitomi, Alexander G. Fletcher, Philip K. Maini, Alexander R. A. Anderson


Since the first evidence for cancer stem cells in leukemia, experimentalists have sought to identify tumorigenic subpopulations in solid tumors. In parallel, scientists have argued over the implications of the existence of this subpopulation. On one side, the cancer stem cell hypothesis posits that a small subset of cells within a tumor are responsible for tumorigenesis and are capable of recapitulating the entire tumor on their own. Under this hypothesis, a tumor may be conceptualized as a series of coupled compartments, representing populations of progressively differentiated cell types, starting from stem cells. The allure of this model is that it elegantly explains our therapeutic failures: we have been targeting the wrong cells. Alternatively, the stochastic model states that all cells in a tumor can have stem-like properties, and have an equally small capability of forming a tumor. As tumors are, by nature, heterogeneous, there is ample evidence to support both hypotheses. We propose a mechanistic mathematical description that integrates these two theories, settling the dissonance between the schools of thought and providing a road map for integrating disparate experimental results into a single theoretical framework. We present experimental results from clonogenic assays that demonstrate the importance of defining this novel formulation, and the clarity that is provided when interpreting these results through the lens of this formulation.

          The effects of mutational process and selection on driver mutations across cancer types        
New Results

Daniel Temko, Ian Tomlinson, Simone Severini, Benjamin Schuster-Boeckler, Trevor Graham


Epidemiological evidence has long associated environmental mutagens with increased cancer risk. However, links between specific mutation-causing processes and the acquisition of individual driver mutations have remained obscure. Here we have used public cancer sequencing data to infer the independent effects of mutation and selection on driver mutation complement. First, we detect associations between a range of mutational processes, including those linked to smoking, ageing, APOBEC and DNA mismatch repair (MMR) and the presence of key driver mutations across cancer types. Second, we quantify differential selection between well-known alternative driver mutations, including differences in selection between distinct mutant residues in the same gene. These results show that while mutational processes play a large role in determining which driver mutations are present in a cancer, the role of selection frequently dominates.


          Extinction Times In Tumor Public Goods Games        

Philip Gerlee, Philipp M. Altrock


Cancer evolution and progression are shaped by Darwinian selection and cell-to-cell interactions. Evolutionary game theory incorporates both of these principles, and has recently been used as a framework to describe tumor cell population dynamics. A cornerstone of evolutionary dynamics is the replicator equation, which describes changes in the relative abundance of different cell types, and is able to predict evolutionary equilibria. Typically, the replicator equation focuses on differences in relative fitness. We here show that this framework might not be sufficient under all circumstances, as it neglects important aspects of population growth. Standard replicator dynamics might miss critical differences in the time it takes to reach an equilibrium, as this time also depends on cellular birth and death rates in growing but bounded populations. As the system reaches a stable manifold, the time to reach equilibrium depends on cellular death and birth rates. These rates shape evolutionary timescales, in particular in competitive co-evolutionary dynamics of growth factor producers and free-riders. Replicator dynamics might be an appropriate framework only when birth and death rates are of comparable magnitude. Otherwise, population growth effects cannot be neglected when predicting the time to reach an equilibrium, and cellular events have to be accounted for explicitly.
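For readers who have not met it, here is a minimal sketch of the standard two-strategy replicator equation the abstract refers to, dx/dt = x(1 - x)(f_A - f_B), where x is the frequency of type A (say, growth-factor producers) and each type's fitness is linear in x. The function names and payoff values are hypothetical illustrations, not taken from the preprint, and the integration is plain forward Euler. Note that the sketch tracks only relative frequency: there is no population size and no birth or death rate anywhere in it, which is exactly the limitation the abstract argues can matter.

```python
def replicator_rhs(x, payoff):
    """Right-hand side of the two-strategy replicator equation.

    x is the frequency of type A. payoff = ((a_AA, a_AB), (a_BA, a_BB)) gives the
    payoff to the row type when it meets the column type (hypothetical values).
    """
    (a_aa, a_ab), (a_ba, a_bb) = payoff
    f_a = a_aa * x + a_ab * (1 - x)  # expected fitness of type A
    f_b = a_ba * x + a_bb * (1 - x)  # expected fitness of type B
    return x * (1 - x) * (f_a - f_b)


def integrate_replicator(x0, payoff, dt=0.01, steps=5000):
    """Forward-Euler integration; returns the frequency of A after steps * dt time units."""
    x = x0
    for _ in range(steps):
        x += dt * replicator_rhs(x, payoff)
    return x


if __name__ == "__main__":
    # Hypothetical producer/free-rider game in which free-riders (B) always do better.
    dilemma = ((3.0, 0.0), (5.0, 1.0))
    # Hypothetical game with a stable mixed equilibrium at x = 0.5.
    coexistence = ((0.0, 3.0), (1.0, 2.0))
    print(integrate_replicator(0.5, dilemma))
    print(integrate_replicator(0.1, coexistence))
```

In the first game producers are driven towards extinction; in the second the two types settle at a mixed equilibrium. Both outcomes are fixed by payoffs alone, whereas the preprint's point is that how long it takes to get there also depends on birth and death rates.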


          Mechanistic Modeling Quantifies The Influence Of Tumor Growth Kinetics On The Response To Anti-Angiogenic Treatment        

Thomas D. Gaddy, Stacey D. Finley


Tumors exploit angiogenesis, the formation of new blood vessels from pre-existing vasculature, in order to obtain nutrients required for continued growth and proliferation. Targeting factors that regulate angiogenesis, including the potent promoter vascular endothelial growth factor (VEGF), is therefore an attractive strategy for inhibiting tumor growth. Systems biology modeling enables us to identify tumor-specific properties that influence the response to those anti-angiogenic strategies. Here, we build on our previous systems biology model of VEGF transport and kinetics in tumor-bearing mice to include a tumor compartment whose volume depends on the “angiogenic signal” produced when VEGF binds to its receptors on tumor endothelial cells. We trained and validated the model using in vivo measurements of xenograft tumor volume to produce a model that accurately predicts the tumor's response to anti-angiogenic treatment. We applied the model to investigate how tumor growth kinetics influence the response to anti-angiogenic treatment targeting VEGF. Based on multivariate regression analysis, we found that certain intrinsic kinetic parameters that characterize the growth of tumors could successfully predict response to anti-VEGF treatment. This model is a useful tool for predicting which tumors will respond to anti-VEGF treatment, complementing pre-clinical in vivo studies.

          Stochastic model of contact inhibition and the proliferation of melanoma in situ        

Mauro Cesar C Morais, Izabella Stuhl, Alan U Sabino, Willian W Lautenschlager, Alexandre S Queiroga, Tharcisio C Tortelli, Roger Chammas, Yuri Suhov, Alexandre F Ramos


Contact inhibition is a central feature orchestrating cell proliferation in culture experiments with its loss being associated with malignant transformation and tumorigenesis. We performed a co-culture experiment with human metastatic melanoma cell line (SK-MEL-147) and immortalized keratinocyte cells (HaCaT). After 8 days a spatial pattern was detected, characterized by the formation of clusters of melanoma cells surrounded by keratinocytes constraining their proliferation. In addition, we observed that the proportion of melanoma cells within the total population has increased. To explain our results we propose a spatial stochastic model (following a philosophy of the Widom-Rowlinson model from Statistical Physics and Molecular Chemistry) where we consider cell proliferation, death, migration, and cell-to-cell interaction through contact inhibition. Our numerical simulations demonstrate that loss of contact inhibition is a sufficient mechanism, appropriate for an explanation of the increase in the proportion of tumor cells and generation of spatial patterns established in conducted experiments.

          A cautionary tale on using tumour growth rate to predict survival        

Hitesh Mistry, Fernando Ortega


A recurrent question within oncology drug development is predicting phase III outcome for a new treatment using early clinical data. One approach to tackle this problem has been to derive metrics from mathematical models that describe tumour size dynamics, termed re-growth rate and time to tumour re-growth. They have been shown to be strong predictors of overall survival in numerous studies, but there is debate about how these metrics are derived and whether they are more predictive than empirical end-points. This work explores the issues raised in using model-derived metrics as predictors for survival analyses. Re-growth rate and time to tumour re-growth were calculated for three large clinical studies by forward and reverse alignment. The latter involves re-aligning patients to their time of progression. Hence it accounts for the time taken to estimate re-growth rate and time to tumour re-growth but also assesses if these predictors correlate to survival from the time of progression. We found that neither re-growth rate nor time to tumour re-growth correlated to survival using reverse alignment. This suggests that the dynamics of tumours up until disease progression has no relationship to survival post progression. For prediction of a phase III trial we found the metrics performed no better than empirical end-points. These results highlight that care must be taken when relating dynamics of tumour imaging to survival and that bench-marking new approaches to existing ones is essential.


          Allele Frequency Spectrum in a Cancer Cell Population        

Hisashi Ohtsuki, Hideki Innan


A cancer grows from a single cell, thereby constituting a large cell population. In this work, we are interested in how mutations accumulate in a cancer cell population. We provide a theoretical framework of the stochastic process in a cancer cell population and obtain near-exact expressions of the allele frequency spectrum, or AFS (only continuous approximation is involved), from both forward and backward treatments under a simple setting: all cells undergo cell division and die at constant rates, b and d, respectively, such that the entire population grows exponentially. This setting means that once a parental cancer cell is established, in the following growth phase, all mutations are assumed to have no effect on b or d (i.e., neutral or passengers). Our theoretical results show that the difference from organismal population genetics is mainly in the coalescent time scale, and the mutation rate is defined per cell division, not per time unit (e.g., generation). Except for these two factors, the basic logic is very similar between organismal and cancer population genetics, indicating that a number of well established theories of organismal population genetics could be translated to cancer population genetics with simple modifications.
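To make the setting above concrete, here is a minimal forward-simulation sketch under simplifying assumptions of my own: one founding cell, neutral mutations that never change b or d, and at each division every daughter independently picks up at most one new mutation with probability mu (so, as the abstract stresses, mutation is counted per division rather than per unit time). The function name and default parameter values are hypothetical, and this is not the authors' analytical framework; it only produces an empirical AFS that such expressions could be checked against. Because the AFS at a fixed population size does not depend on elapsed time, the sketch simulates only the sequence of events, not their timing.

```python
import random
from collections import Counter


def simulate_afs(b=1.0, d=0.3, mu=0.5, target_size=200, seed=None):
    """Neutral birth-death process run until the population reaches target_size.

    Returns afs, where afs[k] is the number of mutations carried by exactly k cells,
    or None if the lineage dies out before reaching target_size.
    """
    rng = random.Random(seed)
    cells = [frozenset()]  # each cell is represented by the set of mutation ids it carries
    next_id = 0
    while 0 < len(cells) < target_size:
        i = rng.randrange(len(cells))  # all cells share the same per-cell rates b and d
        if rng.random() < b / (b + d):  # next event is a division
            parent = cells[i]
            daughters = []
            for _ in range(2):
                muts = parent
                if rng.random() < mu:  # a new mutation arises in this daughter
                    muts = parent | {next_id}
                    next_id += 1
                daughters.append(muts)
            cells[i] = daughters[0]
            cells.append(daughters[1])
        else:  # next event is a death: overwrite the chosen cell with the last one, then drop the last
            cells[i] = cells[-1]
            cells.pop()
    if not cells:
        return None
    counts = Counter(m for cell in cells for m in cell)  # number of cells carrying each mutation
    afs = [0] * (len(cells) + 1)
    for k in counts.values():
        afs[k] += 1
    return afs


if __name__ == "__main__":
    # Retry with new seeds until the lineage survives to the target size.
    for seed in range(100):
        afs = simulate_afs(seed=seed)
        if afs is not None:
            print(afs[1:11])  # numbers of mutations carried by 1..10 cells
            break
```

Averaging many such runs gives an empirical spectrum; changing d while holding b fixed is one simple way to see how death rate alters the shape.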



          Heterogeneity in the tumour size dynamics differentiates Vemurafenib, Dabrafenib and Trametinib in metastatic melanoma        

Hitesh Mistry, David Orrell, Raluca Eftimie


Molecular heterogeneity in tumours leads to variability in drug response both between patients and across lesions within a patient. These sources of variability could be explored through analysis of routinely collected clinical trial imaging data. We applied a mathematical model of tumour growth to analyse both within and between patient variability in tumour size dynamics to clinical data from three drugs, Vemurafenib, Dabrafenib and Trametinib, used in the treatment of metastatic melanoma. The analysis revealed: 1) existence of homogeneity in drug response and resistance development within a patient; 2) tumour shrinkage rate does not relate to rate of resistance development; 3) Vemurafenib and Dabrafenib, two BRAF inhibitors, have different variability in tumour shrinkage rates. Overall these results show how analysis of the dynamics of individual lesions can shed light on the within and between patient differences in tumour shrinkage and resistance rates, which could be used to gain a macroscopic understanding of tumour heterogeneity.

Keywords: heterogeneity, vemurafenib, dabrafenib, trametinib, melanoma, metastasis 



          Optimal structure of heterogeneous stem cell niche: The importance of cell migration in delaying tumorigenesis        

Leili Shahriyari, Ali Mahdipour Shirayeh


Studying the stem cell niche architecture is a crucial step for investigating the process of oncogenesis and obtaining an effective stem cell therapy for various cancers. Recently, it has been observed that there are two groups of stem cells in the stem cell niche collaborating with each other to maintain tissue homeostasis. One group comprises the border stem cells, which are responsible for controlling the number of non-stem cells as well as stem cells. The other group, central stem cells, regulates the stem cell niche. In the present study, we develop a bi-compartmental stochastic model for the stem cell niche to study the spread of mutants within the niche. The analytic calculations and numeric simulations, which are in perfect agreement, reveal that in order to delay the spread of mutants in the stem cell niche, a small but non-zero number of stem cell proliferations must occur in the central stem cell compartment. Moreover, the migration of border stem cells to the central stem cell compartment delays the spread of mutants. Furthermore, the fixation probability of mutants in the stem cell niche is independent of types of stem cell division as long as all stem cells do not divide fully asymmetrically. Additionally, the progeny of central stem cells have a much higher chance than the progeny of border stem cells to take over the entire niche.

          Collateral sensitivity networks reveal evolutionary instability and novel treatment strategies in ALK mutated non-small cell lung cancer.        

Andrew Dhawan, Daniel Nichol, Fumi Kinose, Mohamed E. Abazeed, Andriy Marusyk, Eric B. Haura, Jacob G. Scott


Drug resistance remains an elusive problem in cancer therapy, particularly with novel targeted therapy approaches. Much work is currently focused upon the development of an increasing arsenal of targeted therapies, towards oncogenic driver genes such as ALK-EML4, to overcome the inevitable resistance that develops as therapies are continued over time. The current clinical paradigm after failure of first line ALK TKI is to administer another drug in the same class. As to which drug however, the answer is uncertain, as clinical evidence is lacking. To address this shortcoming, we evolved resistance in an ALK rearranged non-small cell lung cancer line (H3122) to a panel of 4 ALK tyrosine kinase inhibitors used in clinic, and performed a collateral sensitivity analysis to each of the other drugs. We found that all of the ALK inhibitor resistant cell lines displayed a significant cross-resistance to all other ALK inhibitors. To test for the stability of the resistance phenotypes, we evaluated the ALK-inhibitor sensitivities after drug holidays of varying length (1, 3, 7, 14, and 21 days). We found the resistance patterns to be stochastic and dynamic, with few conserved patterns. This unpredictability led us to an expanded search for treatment options for resistant cells. In this expansion, we tested a panel of 6 more anti-cancer agents for collateral sensitivity among the resistant cells, uncovering a multitude of possibilities for further treatment, including cross-sensitivity to several standard cytotoxic therapies as well as the HSP-90 inhibitors. Taken together, these results imply that resistance to targeted therapy in non-small cell lung cancer is truly a moving target; but also one where there are many opportunities to re-establish sensitivities where there was once resistance.

          Reconstructing phylogenies of metastatic cancers        

Johannes G Reiter, Alvin P Makohon-Moore, Jeffrey M Gerold, Ivana Bozic, Krishnendu Chatterjee, Christine A Iacobuzio-Donahue, Bert Vogelstein, Martin A Nowak


Reconstructing the evolutionary history of metastases is critical for understanding their basic biological principles and has profound clinical implications. Genome-wide sequencing data has enabled modern phylogenomic methods to accurately dissect subclones and their phylogenies from noisy and impure bulk tumor samples at unprecedented depth. However, existing methods are not designed to infer metastatic seeding patterns. We have developed a tool, called Treeomics, that utilizes Bayesian inference and Integer Linear Programming to reconstruct the phylogeny of metastases. Treeomics allowed us to infer comprehensive seeding patterns for pancreatic, ovarian, and prostate cancers. Moreover, Treeomics correctly disambiguated true seeding patterns from sequencing artifacts; 7% of variants were misclassified by conventional statistical methods. These artifacts can skew phylogenies by creating illusory tumor heterogeneity among distinct samples. Last, we performed in silico benchmarking on simulated tumor phylogenies across a wide range of sample purities (30-90%) and sequencing depths (50-800x) to demonstrate the high accuracy of Treeomics compared to existing methods.
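
Why low purity and shallow sequencing create artifacts can be seen with a simple binomial read-sampling model. The Python sketch below gives the probability that a clonal, heterozygous variant in a diploid region is supported by at least a minimum number of reads, across the purity and depth ranges quoted in the abstract. It assumes no sequencing errors and an arbitrary threshold of 3 reads; it is an illustration only, not the Bayesian model used in Treeomics.

```python
from math import comb


def detection_probability(purity, depth, min_reads=3):
    """P(at least `min_reads` variant-supporting reads).

    Assumes a clonal heterozygous variant in a diploid region (expected
    VAF = purity / 2), binomial read sampling and no sequencing errors.
    """
    vaf = purity / 2.0
    # Probability of observing fewer than `min_reads` variant reads.
    p_missed = sum(comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
                   for k in range(min_reads))
    return 1.0 - p_missed


if __name__ == "__main__":
    for purity in (0.3, 0.6, 0.9):
        row = [detection_probability(purity, depth) for depth in (50, 100, 800)]
        print(purity, ["%.4f" % p for p in row])
```

A variant missed in one sample but present in another looks like heterogeneity between samples, which is the kind of illusory signal the abstract warns about.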

          Evolutionary dynamics of CRISPR gene drives        

Charleston Noble, Jason Olejarz, Kevin Esvelt, George Church, Martin Nowak


The alteration of wild populations has been discussed as a solution to a number of humanity's most pressing ecological and public health concerns. Enabled by the recent revolution in genome editing, CRISPR gene drives, selfish genetic elements which can spread through populations even if they confer no advantage to their host organism, are rapidly emerging as the most promising approach. But before real-world applications are considered, it is imperative to develop a clear understanding of the outcomes of drive release in nature. Toward this aim, we mathematically study the evolutionary dynamics of CRISPR gene drives. We demonstrate that the emergence of drive-resistant alleles presents a major challenge to previously reported constructs, and we show that an alternative design which selects against resistant alleles greatly improves evolutionary stability. We discuss all results in the context of CRISPR technology and provide insights which inform the engineering of practical gene drive systems.
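
The basic problem with drive-resistant alleles can be illustrated with a toy deterministic recursion. In the Python sketch below, a drive allele D cuts the wild-type allele W in the germline of DW heterozygotes with probability c; a fraction h of cuts are repaired by homing (W becomes D) and the rest produce a cut-resistant allele R. Random mating, discrete generations and a multiplicative fitness cost s per drive copy are assumptions of this sketch, not the model of Noble et al., and the parameter values are arbitrary.

```python
from itertools import product

ALLELES = ("D", "W", "R")


def gamete_output(a, b, c, h):
    """Allele distribution of gametes produced by genotype (a, b)."""
    out = {x: 0.0 for x in ALLELES}
    if {a, b} == {"D", "W"}:
        out["D"] += 0.5                  # the drive copy itself
        out["D"] += 0.5 * c * h          # W cut and repaired by homing
        out["R"] += 0.5 * c * (1 - h)    # W cut, error-prone repair -> resistant
        out["W"] += 0.5 * (1 - c)        # W not cut
    else:
        out[a] += 0.5                    # Mendelian segregation otherwise
        out[b] += 0.5
    return out


def next_generation(freqs, c, h, s):
    """Random union of gametes, viability selection, then meiosis/drive."""
    new = {x: 0.0 for x in ALLELES}
    mean_fitness = 0.0
    for a, b in product(ALLELES, repeat=2):  # ordered pairs give HW frequencies
        weight = freqs[a] * freqs[b] * (1 - s) ** ((a == "D") + (b == "D"))
        mean_fitness += weight
        for allele, share in gamete_output(a, b, c, h).items():
            new[allele] += weight * share
    return {x: new[x] / mean_fitness for x in ALLELES}


def simulate(generations, c, h, s, d0=0.1):
    """Allele frequencies per generation, starting from drive frequency d0."""
    freqs = {"D": d0, "W": 1.0 - d0, "R": 0.0}
    history = [freqs]
    for _ in range(generations):
        freqs = next_generation(freqs, c, h, s)
        history.append(freqs)
    return history


if __name__ == "__main__":
    for h in (1.0, 0.9):
        final = simulate(100, c=1.0, h=h, s=0.1)[-1]
        print(h, {a: round(x, 3) for a, x in final.items()})
```

With h = 1 and s = 0 the drive simply sweeps; setting h below 1 lets you follow how R builds up once homing is imperfect.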



          tHapMix: simulating tumour samples through haplotype mixtures        

Sergii Ivakhno, Camilla Colombo, Stephen Tanner, Philip Tedder, Stefano Berri, Anthony J. Cox


Motivation: Large-scale rearrangements and copy number changes combined with different modes of clonal evolution create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and create well-calibrated benchmarks. Results: We developed a new simulation framework, tHapMix, that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to simulation of hundreds of somatic genomes, while re-use of real read data preserves noise and biases present in sequencing platforms. We further demonstrate the utility of tHapMix by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing of somatic copy number variant calling tools. Availability and implementation: tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/tHapMix .
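
A quick sanity check for any simulated tumour sample is the variant allele fraction (VAF) it should produce. For a clonal somatic variant, the expected VAF follows from the purity, the tumour copy number and the number of mutated copies; the Python sketch below uses the standard mixing formula, assuming that contaminating normal cells are diploid. It is not code from tHapMix, and subclonal variants (the polyclonality the tool also simulates) would need an extra cellular-fraction factor.

```python
def expected_vaf(purity, tumour_cn, mutant_copies, normal_cn=2):
    """Expected VAF of a clonal somatic variant in a tumour/normal mixture.

    purity: fraction of cells in the sample that are tumour cells
    tumour_cn: total copy number of the locus in tumour cells
    mutant_copies: tumour copies carrying the variant
    normal_cn: copy number in contaminating normal cells (assumed diploid)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    if not 0 <= mutant_copies <= tumour_cn:
        raise ValueError("mutant_copies must lie in [0, tumour_cn]")
    tumour_alleles = purity * tumour_cn
    normal_alleles = (1.0 - purity) * normal_cn
    return purity * mutant_copies / (tumour_alleles + normal_alleles)


if __name__ == "__main__":
    print(expected_vaf(0.6, 2, 1))  # heterozygous, copy-neutral
    print(expected_vaf(0.6, 4, 2))  # same variant after a genome doubling
```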



          Non-linear tumor-immune interactions arising from spatial metabolic heterogeneity        

Mark Robertson-Tessi, Robert J Gillies, Robert A Gatenby, Alexander RA Anderson


A hybrid multiscale mathematical model of tumor growth is used to investigate how tumoral and microenvironmental heterogeneity affect the response of the immune system. The model includes vascular dynamics and evolution of metabolic tumor phenotypes. Cytotoxic T cells are simulated, and their effect on tumor growth is shown to be dependent on the structure of the microenvironment and the distribution of tumor phenotypes. Importantly, no single immune strategy is best at all stages of tumor growth.


          Toxicity Management in CAR T cell therapy for B-ALL: Mathematical modelling as a new avenue for improvement        


          Population heterogeneity in mutation rate increases mean fitness and the frequency of higher order mutants        

          Dissecting resistance mechanisms in melanoma combination therapy        


We present a compartment model that explains melanoma cell response and resistance to mono- and combination therapies. Model parameters were estimated by using an optimization algorithm to identify parameters that minimized the difference between predicted cell populations and experimentally measured cell numbers. The model was then validated with in vitro experimental data. Our simulations show that although a specific timing of the combination therapy is effective in controlling tumor cell populations over an extended period of time, the treatment eventually fails. We subsequently predict a more optimal combination therapy that incorporates an additional drug at the right moment.
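
The fitting step described above (choosing parameters that minimize the difference between predicted and measured cell numbers) can be sketched with a deliberately simple two-compartment model: drug-sensitive cells with net rate g - k under treatment, and resistant cells growing at rate g_r from an initial fraction f. The Python sketch below recovers k and f from synthetic data by grid search on log counts. The model, the rates and the grid are hypothetical assumptions for illustration, not the compartment model or optimizer of the paper.

```python
import math
from itertools import product


def predicted_total(t, n0, f, g, k, g_r):
    """Total cell number at time t (hypothetical two-compartment model).

    Sensitive cells grow at net rate g - k under drug; a resistant fraction f
    of the initial n0 cells grows at rate g_r.
    """
    sensitive = n0 * (1.0 - f) * math.exp((g - k) * t)
    resistant = n0 * f * math.exp(g_r * t)
    return sensitive + resistant


def fit_grid(times, counts, n0, g, g_r, k_grid, f_grid):
    """Return (sse, k, f) minimising the squared error of log cell counts."""
    best = None
    for k, f in product(k_grid, f_grid):
        sse = sum((math.log(predicted_total(t, n0, f, g, k, g_r)) - math.log(y)) ** 2
                  for t, y in zip(times, counts))
        if best is None or sse < best[0]:
            best = (sse, k, f)
    return best


if __name__ == "__main__":
    # Synthetic "measurements" generated from known parameters (k=0.8, f=0.01).
    times = list(range(11))
    counts = [predicted_total(t, 1e4, 0.01, 0.5, 0.8, 0.3) for t in times]
    print(fit_grid(times, counts, 1e4, 0.5, 0.3,
                   k_grid=[0.2, 0.4, 0.6, 0.8, 1.0], f_grid=[0.001, 0.01, 0.1]))
```

Even in this toy, the total cell count falls at first and then rebounds as the resistant compartment takes over, echoing the "eventually fails" result. A real fit would use a continuous optimizer instead of a grid.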

          Worth a read: A simple proposal for the publication of journal citation distributions        
This paper on bioRxiv is definitely worth checking out. The abstract is below: Although the Journal Impact Factor (JIF) is widely acknowledged to be a poor indicator of the quality of individual papers, it is used routinely to evaluate research and researchers. Here, we present a simple method for generating the citation distributions that underlie JIFs. …
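
The JIF for a given year is the number of citations to items a journal published in the two preceding years, divided by the number of citable items published in those years, so it is a mean. Because citation counts are heavily skewed, that mean says little about a typical paper. The Python sketch below makes the point with invented citation counts (hypothetical, not data from the paper).

```python
from statistics import mean, median

# Hypothetical citation counts for the citable items one journal published in
# the two-year JIF window; invented numbers, NOT data from the paper.
citations = [0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 40, 150]

jif_like = mean(citations)      # the JIF is essentially this mean
typical = median(citations)     # what the middle paper actually receives
share_above = sum(c > jif_like for c in citations) / len(citations)

print(f"JIF-like mean: {jif_like:.1f}")
print(f"median: {typical}")
print(f"items cited more often than the mean: {share_above:.0%}")
```

Here the mean is 15.8 while the median is 3, and only 2 of the 15 items (13%) reach the mean; publishing the full distribution exposes exactly this.
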
          Should we try to infer trees on tree-unlikely matrices?        

Spermatophyte morphological matrices that combine extinct and extant taxa notoriously have low branch support, as traditionally established using non-parametric bootstrapping under parsimony as the optimality criterion. Coiro, Chomicki & Doyle (2017) recently published a preprint showing that this can be overcome to some degree by switching to Bayesian-inferred posterior probabilities. They also highlight the use of support consensus networks for investigating potential conflict in the data. This is a good start for a scientific community that so far has put more of its trust in either (i) direct visual comparison of fossils with extant taxa or (ii) collections of most parsimonious trees inferred from matrices with a high proportion of probably homoplasious characters and low compatibility. But do those matrices really require or support a tree? Here, I try to answer this question.


Coiro et al. mainly rely on a recent matrix by Rothwell & Stockey (2016), which marks the current endpoint of a long history of compiling and re-scoring morphology-based matrices (Coiro et al.’s fig. 1b). All of these matrices provide, to various degrees, ambiguous signal. This is not overly surprising, as these matrices include a relatively high number of fossil taxa with many data gaps (due to preservation and scoring problems), and combine taxa that perished a hundred or more million years ago with highly derived, possibly distantly related modern counterparts.

Rothwell & Stockey state (p. 929): "As is characteristic for the results from the analysis of matrices with low character state/taxon ratios, results of the bootstrap analysis (1000 replicates) yielded a much less fully resolved tree (not figured)." Coiro et al.’s consensus trees and network based on 10,000 parsimony bootstrap replicates nicely depict this issue, and may explain why Rothwell & Stockey decided against showing those results. For an earlier version of their matrix (Rothwell, Crepet & Stockey 2009), they did not provide any support values, citing a 2006 paper in which the authors state (Rothwell & Nixon 2006, p. 739): “… support values, whether low or high for particular groups, would only mislead the reader into believing we are presenting a proposed phylogeny for the groups in question. Differences among most-parsimonious trees are sufficient to illuminate the points we wish to make here, and support values only provide what we consider to be a false sense of accuracy in these assessments”.

Do the data support a tree?

The problem is not just low support. In fact, the tree shown by Rothwell & Stockey, with its “pectinate arrangement”, conflicts in parts with the best-supported topology, a problem that also applied to its 2009 predecessor. This general “pectinate” arrangement of a large, poorly supported or unsupported grade is not uncommon for strict consensus trees based on morphological matrices that include fossils and extant taxa (see, for example, the more proximal parts of the Tree of Life, such as birds and their dinosaur ancestors).

The support patterns indicate that some of the characters are compatible with the tree, but many others are not. Of the 34 internodes (branches) in the shown tree (their fig. 28 shows a strict consensus tree based on a collection of equally parsimonious trees), 12 have lower bootstrap support under parsimony than their competing alternatives (Fig. 1). Support may be generally low for any alternative, but the ones in the tree can be among the worst.
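
Bootstrap support for a branch is simply the fraction of pseudoreplicate trees that contain the corresponding split (taxon bipartition), which is also what the edge lengths in Fig. 1 are proportional to. The Python sketch below counts split frequencies for a few hypothetical five-taxon trees, each written as a list of one side of each non-trivial split; the taxa and trees are invented for illustration and are not from the Rothwell & Stockey matrix.

```python
from collections import Counter


def normalize(split, taxa, ref):
    """Represent a bipartition by the side that does not contain `ref`."""
    side = frozenset(split)
    return side if ref not in side else frozenset(taxa) - side


def split_support(trees, taxa):
    """Return {split: fraction of trees containing it}.

    Each tree is an iterable of splits, each split given as one side of the
    bipartition; both sides describe the same split.
    """
    ref = min(taxa)
    counts = Counter()
    for tree in trees:
        # A set per tree so that a split listed twice is counted once.
        counts.update({normalize(s, taxa, ref) for s in tree})
    return {split: n / len(trees) for split, n in counts.items()}


# Three hypothetical bootstrap trees on taxa A-E (invented for illustration).
TAXA = "ABCDE"
TREES = [
    [{"A", "B"}, {"D", "E"}],  # ((A,B),C,(D,E))
    [{"A", "B"}, {"C", "D"}],  # ((A,B),E,(C,D))
    [{"A", "C"}, {"D", "E"}],  # ((A,C),B,(D,E))
]

if __name__ == "__main__":
    for split, support in sorted(split_support(TREES, TAXA).items(),
                                 key=lambda item: -item[1]):
        print("".join(sorted(split)), "|", "".join(sorted(set(TAXA) - split)), support)
```

In this toy sample, AB|CDE (2/3) and its competitor AC|BDE (1/3) are the kind of alternative pair counted in Fig. 1.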

The main problem is that the matrix simply does not provide enough tree-like signal to infer a tree. Delta Values (Holland et al. 2002) can be used as a quick estimate of the treelikeness of signal in a matrix. In the case of large all-spermatophyte matrices (Hilton & Bateman 2006; Friis et al. 2007; Rothwell, Crepet & Stockey 2009; Crepet & Stevenson 2010), the matrix Delta Values (mDV) are ≥ 0.3. For comparison, molecular matrices resulting in more or less resolved trees have mDV of ≤ 0.15. The individual Delta Values (iDV), which can be an indicator of how well a taxon behaves during tree inference, go down to 0.25 for extant angiosperms – very distinct from all other taxa in the all-spermatophyte matrices with low proportions of missing data/gaps – and reach values of 0.35 for fossil taxa with long-debated affinities.

The newest 2016 matrix is no exception, with an mDV of 0.322 (the highest of all mentioned matrices), and iDVs range between 0.26 (monocots and other extant angiosperms) and 0.39 for Doylea mongolica (a fossil with very few scored characters). In the original tree, Doylea (represented by two taxa) is part of the large grade and indicated as the sister to Gnetidae (or Gnetales) + angiosperms (molecular trees associate the Gnetidae with conifers and Ginkgo). According to the bootstrap analysis, Doylea is closest to the extant Pinales, the modern conifers. Coiro et al. found the same using Bayesian inference. Their posterior probability (PP) of a Doylea-Podocarpus-Pinus clade is 0.54, and Rothwell & Stockey’s Doylea-Ginkgo-angiosperm clade conflicts with a series of splits with PPs up to 0.95.
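
Delta values are computed from a distance matrix, quartet by quartet. For taxa x, y, u, v, the three ways of pairing them give sums of distances m1 ≥ m2 ≥ m3; a perfectly tree-like quartet has m1 = m2, and its delta is (m1 - m2)/(m1 - m3). The mDV is the mean over all quartets, and the iDV of a taxon the mean over the quartets that contain it. The Python sketch below follows that description of Holland et al. (2002); scoring star-like quartets (m1 = m3) as 0 is an assumption of the sketch, and the toy distances are hypothetical.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric distance lookup from {(a, b): distance}."""
    d = {}
    for (a, b), dist in pairs.items():
        d.setdefault(a, {})[b] = dist
        d.setdefault(b, {})[a] = dist
    return d


def quartet_delta(d, x, y, u, v):
    """Delta score of one quartet: (m1 - m2) / (m1 - m3) with m1 >= m2 >= m3."""
    m1, m2, m3 = sorted((d[x][y] + d[u][v],
                         d[x][u] + d[y][v],
                         d[x][v] + d[y][u]), reverse=True)
    if m1 == m3:  # star-like quartet; scoring it 0 is an assumption here
        return 0.0
    return (m1 - m2) / (m1 - m3)


def delta_values(d):
    """Return (mDV, {taxon: iDV}) as means over quartets (needs >= 4 taxa)."""
    taxa = sorted(d)
    per_taxon = {t: [] for t in taxa}
    scores = []
    for quartet in combinations(taxa, 4):
        score = quartet_delta(d, *quartet)
        scores.append(score)
        for t in quartet:
            per_taxon[t].append(score)
    mdv = sum(scores) / len(scores)
    idv = {t: sum(s) / len(s) for t, s in per_taxon.items()}
    return mdv, idv


if __name__ == "__main__":
    # Hypothetical distances with conflicting signal.
    d = symmetric({("A", "B"): 2, ("C", "D"): 2, ("A", "C"): 3,
                   ("B", "D"): 3, ("A", "D"): 2.5, ("B", "C"): 2.5})
    print(delta_values(d))
```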

Figure 1. Parsimony bootstrap network based on 10,000 pseudoreplicate trees
inferred from the matrix of Rothwell & Stockey.
Edges not found in the authors’ tree in red, edges also found in the tree in green.
Extant taxa in blue bold font. The edge length is proportional to the frequency of the
corresponding split (taxon bipartition, branch in a possible tree) in the pseudoreplicate
tree sample. The network includes all edges of the authors’ tree except for
Doylea + Gnetidae + Petriellales + angiosperms vs. all other gymnosperms and
extinct seed plant groups. Such a split also has no bootstrap support (BS < 10)
under the least-squares and maximum likelihood optimality criteria.

Do the data require a tree?

As David pointed out in an earlier post, neighbour-nets are not really “phylogenetic networks” in the evolutionary sense. Being unrooted and 2-dimensional, they don’t depict a phylogeny, which has to be a sort of (rooted) tree, a one-dimensional graph with time as the only axis (this includes reticulation networks, where nodes can be the crossing point of two internodes rather than their divergence point). The neighbour-net algorithm is a two-dimensional extension of the neighbour-joining algorithm, which infers a phylogenetic tree under a distance criterion such as minimum evolution or least-squares (Felsenstein 2004). Essentially, the neighbour-net is a ‘meta-phylogenetic’ graph inferring and depicting the best and second-best alternative for each relationship. Thus, neighbour-nets can help to establish whether the signal from a matrix, treelike or, as in the case here, not, supports potential phylogenetic relationships, and explore the alternatives much more comprehensively than would be possible with a strict-consensus or other tree (Fig. 2).

Figure 2. Neighbour-net based on a mean distance matrix inferred
from the matrix of Rothwell & Stockey.
The distance to the "progymnosperms", a potential ancestral group of the
seed plants, can be taken as a measurement for the derivedness of each
major group. The primitive seed ferns are placed between progymnosperms
and the gymnosperms, connected by partly compatible edge bundles; the
putatively derived "higher seed ferns" are isolated between the progymnosperms
and the long-edged angiosperms. Shared edge bundles and 'neighbourness'
reflect potential phylogenetic relationships and possible ambiguities quite well,
as in the case of Gnetidae. Colouring as in Figure 1; some taxon names
are abbreviated.

In addition, neighbour-nets usually are better backgrounds to map patterns of conflicting or partly conflicting support seen in a bootstrap, jackknife or Bayesian-inferred tree sample. In Fig. 3, I have mapped the bootstrap support for alternative taxon bipartitions (branches in a tree) on the background of the neighbour-net in Fig. 2.

Obvious and less-obvious relationships are simultaneously revealed, and their competing support patterns depicted. Based on the graph, we can see (from the edge lengths of the neighbour-net) that there is relatively weak primary signal but substantial bootstrap support for the Petriellales (a recently described taxon new to the matrix) as sister to the angiosperms. Several taxa, or groups of closely related taxa, are characterised by long terminal edges/edge bundles, rooting in the boxy central part of the graph. Any alternative relationship of these taxa/taxon groups receives similarly low support, but there are notable differences in the actual values.

There is little signal to place most of the fossil “seed ferns” (extinct seed plants) in relation to the modern groups, and a very ambiguous signal regarding the relationship of the Gnetidae (or Gnetales) with the two main groups of extant seed plants, the conifers (Pinidae; see C. Earle’s gymnosperm database) and angiosperms (for a list and trees, see P. Stevens’ Angiosperm Phylogeny Website).

The Gnetidae is a strongly distinct (also genetically) group of three surviving genera, and a persistent source of headaches for plant phylogeneticists. Early molecular trees placed it as sister to the Pinaceae (‘Gnepine’ hypothesis), a long-branch attraction artefact; the currently favoured hypothesis (‘Gnetifer’) places the Gnetidae as sister to all conifers (Pinatidae) in an all-gymnosperm clade (including Ginkgo and possibly the cycads).

As favoured by the branch support analyses, and contrasting with the preferred 2016 tree, the two Doyleas are placed closest to the conifers, nested within a commonly found group including the modern and ancient conifers and their long-extinct relatives (Cordaitales), and possibly Ginkgo (Ginkgoidae). In the original parsimony strict consensus tree, they are placed in the distal part as sister to a clade of Gnetidae and Petriellales + angiosperms (possibly due to long-branch attraction). The grade including the ‘primitive seed ferns’ (Elkinsia through Callistophyton), seen also in Rothwell and Stockey’s 2016 tree, may be poorly supported under maximum parsimony (the criterion used to generate the tree), but receives quite high support under probabilistic approaches such as maximum likelihood bootstrapping and, to some degree, Bayesian inference (Fig. 3; Coiro, Chomicki & Doyle 2017).

Figure 3. Neighbour-net from above used to map alternative support patterns.
Numbers refer to non-parametric bootstrap (BS) support for alternative phylogenetic
splits under three optimality criteria: maximum likelihood (ML) as implemented in
RAxML (using MK+G model), maximum parsimony (MP), and least-squares
(via neighbour-joining, NJ; using PAUP*); and Bayesian posterior probabilities
(using MrBayes 3.2; see Denk & Grimm 2009, for analysis set-up). The circular
arrangement of the taxa allows tracking most edges in the authors’ tree and their,
sometimes better supported, alternatives. The edge lengths provide direct
information about the distinctness of the included taxa from each other; the structure
of the graph informs about how tree-like the signal is regarding possible
phylogenetic relationships or their alternatives. Colouring as in Figure 1;
some taxon names are abbreviated.

Numerous morphological matrices provide non-treelike signals. A tree can be inferred, but its topology may be only one of many possible trees. In a total-evidence framework, this may not be such a big problem, because the molecular partitions will predefine a tree, and fossils will simply be placed in that tree based on their character suites. Without such data, any tree may be biased and a poor reflection of the differentiation patterns.

By not forcing the data into a series of dichotomies, neighbour-nets provide a quick, simple alternative. Unambiguous, well-supported branches in a tree will usually result in tree-like portions of the neighbour-net. Boxy portions in the neighbour-net pinpoint ambiguous or even problematic signals from the matrix. Based on the graph, one can extract the alternatives worth testing or exploring. Support for the alternatives can be established using traditional branch support measures. Since any morphological matrix will combine characters that are in line with the phylogeny with those that are at odds with it (convergences, character misinterpretations), the focus cannot be on inferring a tree, but on establishing the alternative scenarios and the support for them in the data matrix.
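
The link between boxes and ambiguous signal rests on split compatibility: two splits A|A' and B|B' can coexist in one tree only if at least one of the intersections A∩B, A∩B', A'∩B, A'∩B' is empty, and pairs of incompatible splits are what a splits graph such as the neighbour-net draws as boxes. The short Python sketch below checks this for hypothetical splits; it is only the compatibility test behind reading the graph, not the neighbour-net algorithm itself.

```python
from itertools import combinations


def compatible(split1, split2, taxa):
    """True if two bipartitions can occur in the same tree.

    Splits A|A' and B|B' are compatible iff at least one of the four
    intersections A&B, A&B', A'&B, A'&B' is empty.
    """
    everything = frozenset(taxa)
    a, b = frozenset(split1), frozenset(split2)
    return any(not (x & y)
               for x in (a, everything - a)
               for y in (b, everything - b))


def incompatible_pairs(splits, taxa):
    """Pairs of conflicting splits, i.e. those a splits graph draws as boxes."""
    return [(s, t) for s, t in combinations(splits, 2)
            if not compatible(s, t, taxa)]


if __name__ == "__main__":
    # Hypothetical splits on five taxa.
    splits = [{"A", "B"}, {"A", "C"}, {"D", "E"}]
    print(incompatible_pairs(splits, "ABCDE"))
```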


Coiro M, Chomicki G, Doyle JA. 2017. Experimental signal dissection and method sensitivity analyses reaffirm the potential of fossils and morphology in the resolution of seed plant phylogeny. bioRxiv DOI:10.1101/134262

Crepet WL, Stevenson DM. 2010. The Bennettitales (Cycadeoidales): a preliminary perspective of this arguably enigmatic group. In: Gee CT, ed. Plants in Mesozoic Time: Morphological Innovations, Phylogeny, Ecosystems. Bloomington: Indiana University Press, pp. 215-244.

Denk T, Grimm GW. 2009. The biogeographic history of beech trees. Review of Palaeobotany and Palynology 158: 83-100.

Felsenstein J. 2004. Inferring Phylogenies. Sunderland, MA, U.S.A.: Sinauer Associates Inc.

Friis EM, Crane PR, Pedersen KR, Bengtson S, Donoghue PCJ, Grimm GW, Stampanoni M. 2007. Phase-contrast X-ray microtomography links Cretaceous seeds with Gnetales and Bennettitales. Nature 450: 549-552 [all important information needed for this post is in the supplement to the paper; a figure showing the actual full analysis results can be found at figshare]

Hilton J, Bateman RM. 2006. Pteridosperms are the backbone of seed-plant phylogeny. Journal of the Torrey Botanical Society 133: 119-168.

Holland BR, Huber KT, Dress A, Moulton V. 2002. Delta Plots: A tool for analyzing phylogenetic distance data. Molecular Biology and Evolution 19: 2051-2059.

Rothwell GW, Crepet WL, Stockey RA. 2009. Is the anthophyte hypothesis alive and well? New evidence from the reproductive structures of Bennettitales. American Journal of Botany 96: 296–322.

Rothwell GW, Nixon K. 2006. How does the inclusion of fossil data change our conclusions about the phylogenetic history of the euphyllophytes? International Journal of Plant Sciences 167: 737–749.

Rothwell GW, Stockey RA. 2016. Phylogenetic diversification of Early Cretaceous seed plants: The compound seed cone of Doylea tetrahedrasperma. American Journal of Botany 103: 923–937.

Schliep K, Potts AJ, Morrison DA, Grimm GW. 2017. Intertwining phylogenetic trees and networks. Methods in Ecology and Evolution DOI:10.1111/2041-210X.12760.

          Cichlids, species and trees        
Lake Malawi, in south-eastern Africa, is famous for its large diversity of cichlid fishes. Indeed, it sometimes seems to have more biologists studying these fish than there are actual fish in the lake, even though there are allegedly hundreds of cichlid fish species in that lake. In this sense, it is somewhat similar to Lake Baikal, in southern Siberia, home to the sole species of freshwater seals.

The cichlid biologists are interested in describing the extensive fish diversity, pondering its origin, and thus its contribution to the study of speciation. After all, we are talking about what is usually claimed to be "the most extensive recent vertebrate adaptive radiation". So, we are talking here as much about population genetics as we are about ichthyology.

Inevitably, the genome biologists have been spotted in the vicinity of the lake; and we now have a preliminary report from them:
Milan Malinsky, Hannes Svardal, Alexandra M. Tyers, Eric A. Miska, Martin J. Genner, George F. Turner, Richard Durbin (2017) Whole genome sequences of Malawi cichlids reveal multiple radiations interconnected by gene flow. BioRxiv 143859.
These authors summarize the situation like this:
We characterize [the] genomic diversity by sequencing 134 individuals covering 73 species across all major lineages. Average sequence divergence between species pairs is only 0.1-0.25%. These divergence values overlap diversity within species, with 82% of heterozygosity shared between species. Phylogenetic analyses suggest that diversification initially proceeded by serial branching from a generalist Astatotilapia-like ancestor. However, no single species tree adequately represents all species relationships, with evidence for substantial gene flow at multiple times.
The last sentence seems to be somewhat disingenuous. How could a single tree be expected to describe this scale of biodiversity? Any rapid radiation of diversity is unlikely to be completely tree-like. The increase in diversity can be modeled as a tree, sure, but it is very unlikely that there will be instant separation of the taxa, and so the tree model will be ignoring a large part of the evolutionary action. There will, for example, be ongoing introgression between the diverging taxa, as well as hybridization due to incomplete breeding barriers. These avenues for gene flow can best be modeled as a network, not a tree.

The issue here is that the authors write the paper solely from the perspective of an expected phylogenetic tree, and then feel compelled to explain why they do not produce such a tree. Indeed, the authors present their paper as a study of "violations of the species tree concept".

For data analysis, they proceed as follows:
To obtain a first estimate of between-species relationships we divided the genome into 2543 non-overlapping windows, each comprising 8000 SNPs (average size: 274kb), and constructed a Maximum Likelihood (ML) phylogeny separately for each window, obtaining trees with 2542 different topologies.
So, only two sequence blocks produced the same tree, presumably by random chance. An example "tree" for 12 OTUs is shown in the diagram. It superimposes a possible mitochondrial tree on a summary of the "genome tree".

Example phylogeny from Malinsky (2012)

The authors continue:
The fact that we are using over 25 million variable sites suggests these differences are not due to sampling noise, but reflect conflicting biological signals in the data. For example, gene flow after the initial separation of species can distort the overall phylogeny and lead to intermediate placement of admixed taxa in the tree topology.
Note that gene flow is seen to "distort" the phylogeny rather than being an integral part of it. In this case, "phylogeny" apparently refers solely to the diversification part of evolutionary history, rather than to the whole history.

The ultimate questions from this paper are: "what is a species concept?", and "what is a species tree?". The authors write a lot about species and trees, and yet their data provide very clear evidence that both "species" and "tree" are very restrictive concepts for studying the cichlids of Lake Malawi.

Coincidentally, another recent paper tackles the same problems:
Britta S. Meyer, Michael Matschiner, Walter Salzburger (2017) Disentangling incomplete lineage sorting and introgression to refine species-tree estimates for Lake Tanganyika cichlid fishes. Systematic Biology 66: 531-550.
The authors describe their work, on the same fish group but in a lake further north-west, as follows:
Because of the rapid lineage formation in these groups, and occasional gene flow between the participating species, it is often difficult to reconstruct the phylogenetic history of species that underwent an adaptive radiation. In this study, we present a novel approach for species-tree estimation in rapidly diversifying lineages, where introgression is known to occur, and apply it to a multimarker data set containing up to 16 specimens per species for a set of 45 species of East African cichlid fishes (522 individuals in total), with a main focus on the cichlid species flock of Lake Tanganyika. We first identified, using age distributions of most recent common ancestors in individual gene trees, those lineages in our data set that show strong signatures of past introgression ... We then applied the multispecies coalescent model to estimate the species tree of Lake Tanganyika cichlids, but excluded the lineages involved in these introgression events, as the multispecies coalescent model does not incorporate introgression. This resulted in a robust species tree.
Once again, phylogeny = species tree.

          Bayesian inference of phylogenetic networks        

Over the years, a number of methods have been explored for constructing evolutionary networks, starting with parsimony criteria for optimization, and moving on to likelihood-based inference. However, the development of Bayesian methods has been somewhat delayed by the computational complexities involved.

Network from Radice (2012)

The earliest work on this topic seems to be the thesis of:
Rosalba Radice (2011) A Bayesian Approach to Phylogenetic Networks. PhD thesis, University of Bath, UK.
Apparently, the only part of this work to be published has been:
Rosalba Radice (2012) A Bayesian approach to modelling reticulation events with application to the ribosomal protein gene rps11 of flowering plants. Australian & New Zealand Journal of Statistics 54: 401-426.
The method described requires the prior specification of the species tree (phylogeny), and the position and number of the reticulation events. The algorithm was implemented in the R language.

More recently, methods have been developed that infer phylogenies by using (i) incomplete lineage sorting (ILS) to model gene-tree incongruence arising from vertical inheritance, and (ii) introgression / hybridization to model gene-tree incongruence attributable to horizontal gene flow. ILS has been addressed using the multispecies coalescent.
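
How the multispecies coalescent links ILS to gene-tree incongruence is easiest to see for three species. If the internal branch of the species tree is t coalescent units long (generations divided by 2N for diploids), the gene tree matches the species tree with probability 1 - (2/3)e^(-t), and each of the two discordant topologies has probability (1/3)e^(-t). The Python sketch below just evaluates that textbook result; it is not part of any of the methods cited here.

```python
import math


def p_gene_tree_matches_species_tree(t):
    """Probability that a three-taxon gene tree matches the species tree.

    t is the internal branch length in coalescent units; each of the two
    discordant topologies has probability exp(-t) / 3.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return 1.0 - (2.0 / 3.0) * math.exp(-t)


if __name__ == "__main__":
    for t in (0.0, 0.1, 0.5, 1.0, 3.0):
        print(t, round(p_gene_tree_matches_species_tree(t), 3))
```

Short internal branches, typical of rapid radiations, push concordance towards 1/3; introgression adds a second source of discordance on top of this.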

The first of these publications was:
Dingqiao Wen, Yun Yu, Luay Nakhleh (2016) Bayesian inference of reticulate phylogenies under the multispecies network coalescent. PLoS Genetics 12(5): e1006006. [Correction: 2017 PLoS Genetics 13(2): e1006598]
The method requires the set of gene trees as input, along with the number of reticulations. The algorithm was implemented in the PhyloNet package.

In the past few months, two manuscripts have appeared that try to co-estimate the gene trees and the species network, using the original sequence data (assumed to be without recombination) as input:
Dingqiao Wen, Luay Nakhleh (2017) Co-estimating reticulate phylogenies and gene trees from multi-locus sequence data. bioRxiv 095539. [v.2; v.1: 2016]
Chi Zhang, Huw A Ogilvie, Alexei J Drummond, Tanja Stadler (2017) Bayesian inference of species networks from multilocus sequence data. bioRxiv 124982.
The algorithm for the first method has been implemented in the PhyloNet package, while the second has been implemented in the Beast2 package.

Finally, another manuscript describes a method utilizing data based on single nucleotide polymorphisms (SNPs) and/or amplified fragment length polymorphisms (AFLPs), which thus sidesteps the assumption of no recombination:
Jiafan Zhu, Dingqiao Wen, Yun Yu, Heidi Meudt, Luay Nakhleh (2017) Bayesian inference of phylogenetic networks from bi-allelic genetic markers. bioRxiv 143545.
This method has also been implemented in PhyloNet.

Due to the computational complexity of likelihood inference, all of these methods are currently severely restricted in the number of OTUs that can be analyzed, irrespective of whether these involve multiple samples from the same species or not. In this sense, parsimony-based inference or approximate likelihood methods are still useful for constructing evolutionary networks of any size. However, progress is clearly being made to alleviate the computational restrictions.
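
To put the OTU limit in perspective, even the space of trees explodes: there are (2n - 3)!! rooted binary topologies for n labelled taxa, and allowing reticulations only enlarges the space, so this count is a lower bound for networks. The Python sketch below evaluates it.

```python
def rooted_binary_topologies(n):
    """Number of rooted binary tree topologies on n labelled taxa: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least two taxa")
    count = 1
    for k in range(3, 2 * n - 2, 2):  # product 1 * 3 * 5 * ... * (2n - 3)
        count *= k
    return count


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, rooted_binary_topologies(n))
```

For 10 taxa this is already 34,459,425 trees, before a single reticulation is considered.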

          A test case for phylogenetic methods and stemmatics: the Divine Comedy        

In a previous post I gave an outline of stemmatics, and briefly touched on the adoption and advantages of phylogenetic methods for textual criticism (On stemmatics and phylogenetic methods). Here I present the results of an empirical investigation I have been conducting, in which such methods are used to study some philological dilemmas of a cornerstone work in textual criticism, Dante Alighieri's Divine Comedy. I am reproducing parts of the text and the results of a paper still under review; the NEXUS file for this research is available on GitHub.

Before describing the analysis, I discuss the work and its tradition, as well as some of the open questions concerning its textual criticism. This should not only allow the main audience of this blog to understand (and perhaps question) my work, but also familiarize you with the kind of research conducted in stemmatics. After all, the first step is the recensio, a deep review of all information that can be gathered about a work.

The Divine Comedy

The Divine Comedy is an Italian medieval poem, and one of the most successful and influential medieval works. It is written in a rigid structure that, compared to other works, gave it a certain resistance to copy errors, as most changes would be immediately evident. The poem is composed of three canticas (Inferno, Purgatory, and Paradise); the first of its 100 cantos were written in 1306-07, and the work was completed not long before the death of the author in 1321. Written mostly during Dante's exile from his home city, Florence (Tuscany), it was, like many works of the time, published as the author wrote it, and not only upon completion. In fact, it is even possible, though not proven, that the author changed some cantos and published revisions, thus himself being the source of unresolvable differences.

No original manuscript has survived, but scholars have traced the development of the tradition from the surviving copies and from historical research. The poem is one of the most copied works of the Middle Ages, with more than 600 known complete copies, besides 200 partial and fragmentary witnesses. For comparison, there are around 80 copies of Chaucer's Canterbury Tales, which is itself a successful work by medieval standards.

Commercial enterprises soon developed to meet the market demand created by its success. In terms of geographical diffusion, quantitative data suggest that, before the Black Death ravaged Florence in 1348, scribal activity was more intense in Tuscany than in Northern Italy, where the author had died. Among the hypotheses about its textual evolution, the results of my investigation support the widespread hypothesis that Dante published his work with Florentine orthography in Northern Italy. That is, the first copies adopted Northern orthographic standards, which would then revert to Tuscan conventions, with occasional misinterpretations, when the work found its way back to Florence. These essentials of the transmission must be considered when preparing a critical edition, as the less numerous Northern manuscripts, despite their adapted orthography, can in general be assumed to be closer to the archetype (if there ever was one to speak of) than the Florentine ones.

The tradition is characterized by intentional contamination, as the work soon became a focus of politics and grammatical prescriptivism. Errors and contamination have already been demonstrated in the earliest securely dated manuscript, the Landiano of 1336 (cf. Shaw, 2011), and can already be identified in the first commentaries, dating from the 1320s (such as the one by Jacopo Alighieri, the author's son).

Critical studies

Here are some details about previous studies. I have included considerable stemmatic information, but I also include a biological analogy to help non-experts make sense of it.

The first critical editions date from the 19th century, but a stemmatic approach was only advanced at the end of that century, by Michele Barbi. Facing the problem of applying Lachmann's method to a long text with a massive tradition, in 1891 Barbi proposed his list of around 400 loci (samples of the text), inviting scholars to contribute the readings in the manuscripts they had access to. His project, which intended to establish a complete genealogy without the need for a full collatio, had disappointing results, with only a handful of responses. Mario Casella would later (1921) conduct the first formal stemmatic study of the poem, grouping some older manuscripts into two families, α and β, with unequal numbers of witnesses but equal value for the emendatio. His two families are not rooted at a higher level, but he observed that they share errors supporting the hypothesis of a common ancestor, likely copied by a Northern scribe.

Casella's stemma, reproduced from Shaw (2011).

Forty years later, Giorgio Petrocchi proposed to overcome the problem of the large tradition by employing only witnesses dating from before the editorial activity of Giovanni Boccaccio, as his alterations and influence were considered to be too pervasive. Petrocchi defended a cut-off date of 1355 as necessary for a stemmatic approach that would otherwise have been impossible, given the level of contamination of later copies. The restriction in the number of witnesses was offset by his expansion of the collatio to the entire text; he criticized Barbi's loci as subjective selections for which there was no proof of sufficiency.

Making use of analogies with biology, we may say that Barbi proposed to establish a tree from a reduced number of "proteins" for all possible "taxa". Casella considered this to be impracticable and, selecting a few representative "fossils", built a tree from a large number of phenotypic characteristics. Finally, Petrocchi produced a network while considering the entire "genome" for all "fossils" dated from before an event that, while well-supported in theory (we could compare its effects to a profound climate change), was nonetheless arbitrary.

Petrocchi's stemma, reproduced from Shaw (2011).

Questions about Petrocchi's methodology and assumptions were soon raised, particularly regarding the proclaimed influence of Boccaccio, since there was no quantitative proof either that his editions were as influential as asserted or that all later witnesses were superfluous for stemmatics. Later research focused on questioning his stemma, for example: the absence of consensus about the relationship between the Ash and Ham manuscripts, the supposedly weak demonstration of the polytomy of Mad, Rb, and Urb (the "Northern manuscripts"), and the dating of Gv (likely copied fifty to a hundred years later than Petrocchi assumed). Evidence was also presented that Co, a key manuscript in his stemma, could not be an ancestor of Lau (its copyist was still active in the 15th century), and that Ga contained disjunctive errors not found in its supposed descendants. Stretching the biological analogy once more, the dating of his "fossils" was in some cases plainly wrong.

Federico Sanguineti presented an alternative stemma in 2001, arguing that a rigorous application of stemmatics would expose errors in Petrocchi's work. To that end, he decided to resurrect Barbi's loci and trace the first complete genealogy, without arbitrary, a priori decisions about the usefulness of the textual witnesses. Sanguineti argued that, after this proper recensio, a small number of manuscripts (which he eventually set at seven) would be sufficient for the emendatio. His stemma, described as "optimistic in its elegance and minimalism" (Shaw 2011), resulted in a critical edition that relied heavily on a single manuscript, Urb, the only witness of his β family (as Rb was displaced from the proximity it had in Petrocchi's stemma, and Mad was excluded from the analysis). Keeping with the biological analogy, he proposed building a tree from an extremely reduced number of "proteins", but for all "taxa". In the end, however, the reduced number of "proteins" was considered for only seven "taxa", selected mostly for their age.

Sanguineti's stemma, reproduced from Shaw (2011).

Sanguineti's edition was attacked by critics, who questioned the limited number of manuscripts used in the emendatio, the position of Rb, the high value attributed to LauSC, and the unparalleled importance of Urb, all of which resulted in an unexpected Northern coloring of the language of a Florentine writer. Regarding his methodology, reviewers pointed out that stemmatic principles had not been followed strictly, as the elimination was not restricted to descripti but extended to branches that were considered too contaminated.

The digital edition of Prue Shaw (2011) was developed as a project for phylogenetic testing of Sanguineti's assumptions. Her edition includes complete manuscript transcriptions, and the transcriptions include all of the layers of revision of each manuscript (original readings and corrections by later hands), and are complemented by high-quality reproductions of the manuscripts. After testing the validity of Sanguineti's method and stemma, Shaw concluded that his claims do not "stand up to close scrutiny", and that the entire edition is compromised, because Rb "is shown unequivocally to be a collaterale of Urb, and not a member of α as [Sanguineti] maintains".

Applying phylogenetic methods

With the goal of following and, to a large extent, replicating Shaw (2011), I analyzed signals of phylogenetic proximity to validate stemmatic hypotheses, produced both a computer-generated and a computer-assisted phylogeny (equivalent to a stemma), and evaluated the performance of such phylogenies with methods of ancestral state reconstruction.

I wanted to investigate the proximity of witnesses and the statistical support for the published stemmas. After experimenting with rooted graphs, I decided to use NeighborNets, in which splits indicate observed divergences and edge lengths are proportional to the observed differences. These unrooted split networks were preferable because they facilitated visual investigation and also provided results for the subsequent steps. These steps involved exploring the topology and evaluating potential contamination, which guided the elimination of taxa whose data would be redundant for establishing prior hypotheses on genealogical relationships. Analyses were conducted using all manuscript layers and critical editions, both with and without bootstrapping, thus obtaining results supported in terms of inferred trees as well as of character data.

NeighborNet of the manuscripts and revisions from my data, generated with SplitsTree (Huson & Bryant 2006).
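NeighborNet works from a matrix of pairwise distances between taxa. As a minimal sketch, assuming each witness is encoded as an aligned string with one symbol per collated locus (and `?` for a locus it does not attest), the uncorrected p-distance between two witnesses is the share of commonly attested loci at which their readings differ. The sigla below are borrowed from the text, but the readings are invented toy data, not the NEXUS matrix mentioned above, and the NeighborNet construction itself (done here in SplitsTree) is not reproduced.

```python
from itertools import combinations

MISSING = "?"  # assumed symbol for a locus not attested in a witness


def p_distance(a: str, b: str) -> float:
    """Share of loci attested in both witnesses at which their readings differ."""
    if len(a) != len(b):
        raise ValueError("witnesses must be aligned to the same loci")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == MISSING or y == MISSING:
            continue  # a gap in either witness says nothing about their relationship
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(witnesses: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of witnesses, keyed by sorted sigla."""
    return {
        (u, v): p_distance(witnesses[u], witnesses[v])
        for u, v in combinations(sorted(witnesses), 2)
    }


# Hypothetical toy readings: one symbol per collated locus.
toy = {
    "Rb": "AABAB?",
    "Urb": "AABABA",
    "Ash": "BBAAAA",
    "Ham": "BBAABA",
}

for (u, v), d in distance_matrix(toy).items():
    print(f"{u:>3} {v:>3} {d:.3f}")
```

Skipping missing loci pair by pair, rather than discarding them for every witness, keeps more data for well-preserved pairs, at the cost of comparing distances computed over different numbers of loci.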

The analysis confirmed most of the conclusions of Shaw (2011): there is no doubt about the proximity and distinctiveness of Ash and Ham, with Sanguineti's hypothesis (in which they are collaterals) better supported than Petrocchi's (in which the first is an ancestor of the second). The proximity of Mart and Triv was confirmed, but the position of the ancestors postulated by Petrocchi and Sanguineti should be questioned in light of the signals they share with LauSC, perhaps because of contamination. The most important finding, in line with Shaw and in contrast with the fundamental assumption of Sanguineti, is the clear demonstration of the relationship between Rb and Urb.

The relationship analyses allowed the generation of trees for further evaluation. Although a full Bayesian tree inference was the original goal, I discarded that option because, without a careful and demanding selection of priors, it would yield flawed results. Instead, I built trees using both stochastic inference and user design (i.e., manually). This postponed more complex topology analyses for future research, but generated the structures needed by the subsequent investigation steps; both trees are included in the data file.

The second tree (shown below), which allows polytomies and which I constructed manually, tries to combine the findings of Petrocchi and Sanguineti by resolving their differences with the support of the relationship analyses. Using Petrocchi's edition as a gold standard, and considering only single-hypothesis reconstructions, parsimonious ancestral state reconstruction agrees with 9,016 characters (79.9%). When multiple hypotheses are considered instead, the reconstructions agree with 10,226 characters (90.7%). Cases of disagreement were analyzed manually and, as expected, most resulted from readings supported by the tradition but rejected by Petrocchi on exegetical grounds.

My proposed tree for the manuscripts selected by Sanguineti, generated with PhyD3 (Kreft et al., 2017).

This tree suggests that, in general, Petrocchi's network is better supported than Sanguineti's tree, as phylogenetic principles would lead us to expect: the former was built considering statistical properties and using all of the available data, while the latter relied on many intuitions and hypotheses that were never really tested. In particular, it supports the findings of Shaw and, as such, allows us to identify Petrocchi's critical edition as the best one. More importantly, however, it is further evidence of the usefulness of phylogenetic methods, when appropriately used, in stemmatics.
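The agreement figures above come from parsimonious ancestral state reconstruction. A minimal sketch of one standard way to do this is Fitch's algorithm, shown below for binary trees only (the author's tree allows polytomies, which need Hartigan's generalization); it is not necessarily the software or settings used in the study. The bottom-up pass gives each internal node the intersection of its children's state sets when that is non-empty, and otherwise their union, counting one change; the set left at the root holds the most parsimonious archetype readings. Reading "single hypothesis" as a root set containing exactly the reference reading, and "multiple hypotheses" as the reference appearing anywhere in the set, is an interpretation; the tree, readings, and reference are hypothetical.

```python
from typing import Union

# A tree is a leaf siglum or a (left, right) pair of subtrees (binary, for simplicity).
Tree = Union[str, tuple]


def fitch_root(tree: Tree, leaf_state: dict[str, str]) -> tuple[set[str], int]:
    """Bottom-up Fitch pass for one character: (most parsimonious root states, minimum changes)."""
    if isinstance(tree, str):
        return {leaf_state[tree]}, 0
    left, right = tree
    ls, lc = fitch_root(left, leaf_state)
    rs, rc = fitch_root(right, leaf_state)
    common = ls & rs
    if common:
        return common, lc + rc
    return ls | rs, lc + rc + 1  # disjoint child sets cost one extra change


def reconstruct_archetype(tree: Tree, characters: dict[str, list[str]]) -> list[set[str]]:
    """Candidate archetype readings for every character (column) of the matrix."""
    n_chars = len(next(iter(characters.values())))
    return [
        fitch_root(tree, {siglum: readings[i] for siglum, readings in characters.items()})[0]
        for i in range(n_chars)
    ]


def agreement(root_sets: list[set[str]], reference: list[str]) -> tuple[float, float]:
    """Share of characters matching a reference edition: (single hypothesis, multiple hypotheses)."""
    single = sum(1 for s, ref in zip(root_sets, reference) if s == {ref})
    multiple = sum(1 for s, ref in zip(root_sets, reference) if ref in s)
    return single / len(reference), multiple / len(reference)


# Hypothetical tree and readings, for illustration only.
tree = ((("Ash", "Ham"), "Mart"), ("Rb", "Urb"))
characters = {
    "Ash": ["a", "b", "a", "c"],
    "Ham": ["a", "b", "b", "c"],
    "Mart": ["a", "a", "b", "d"],
    "Rb": ["a", "a", "a", "d"],
    "Urb": ["b", "a", "a", "d"],
}
reference = ["a", "a", "a", "c"]  # stand-in for a critical edition's readings

roots = reconstruct_archetype(tree, characters)
single, multiple = agreement(roots, reference)
print(f"single-hypothesis agreement: {single:.1%}")  # 50.0%
print(f"multiple-hypothesis agreement: {multiple:.1%}")  # 75.0%
```

Because Fitch's root set already contains every most parsimonious root state, the gap between the two percentages measures how often the tree leaves the archetype reading genuinely undecided.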


Alagherii, Dantis (2001) Comedìa. Edited by Federico Sanguineti. Firenze: Edizioni del Galluzzo.

Alighieri, Dante (1994) La Commedia Secondo L’antica Vulgata: Introduzione. Edited by Giorgio Petrocchi. Opere di Dante Alighieri v. 1. Firenze: Le Lettere.

Huson, Daniel H.; Bryant, David (2006) Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution 23: 254–267.

Inglese, Giorgio (2007) Inferno, Revisione del testo e commento. Roma: Carocci.

Kreft, Lukasz; Botzki, Alexander; Coppens, Frederik; Vandepoele, Klaas; Van Bel, Michiel (2017) PhyD3: a Phylogenetic Tree Viewer with Extended PhyloXML Support for Functional Genomics Data Visualization. BioRxiv. Doi: 10.1101/107276.

Leonardi, Anna M.C. (1991) Introduzione. In: La Divina Commedia, by Dante Alighieri. Milano: Arnoldo Mondadori Editore.

Shaw, Prue (2011) Commedia: a Digital Edition. Birmingham: Scholarly Digital Editions.

Trovato, Paolo (2016) Metodologia editoriale per la Commedia di Dante Alighieri. Ferrara. https://www.youtube.com/watch?v=BfKUOAR9PXA. Date of access: March 19, 2017.

          “Ingredients in Victoria’s Secret Bombshell and Ivanka Trump eaux de parfums that repel mosquitoes” [research study]
Humanity’s war against mosquitoes still rages, after the failure of another new strategy. This study gives details of the failure: “Ingredients in Victoria’s Secret Bombshell and Ivanka Trump eaux de parfums that repel mosquitoes,” Fangfang Zeng, Pingxi Xu, Kaiming Tan, Paulo Zarbin, Walter Leal, BioRxiv, August 3, 2017, doi.org/10.1101/172304. The authors, at the University of California Davis and Universidade Federal do Paraná, Brazil, report: “We […]
          Science to Participate in bioRxiv’s Manuscript Transfer Service        

Authors will have the opportunity to submit their manuscripts directly for consideration to Science.

          Authorea and BioRxiv Partner to Bring Preprints into 21st Century        

Authorea, the collaborative document editor for researchers, announced a partnership and direct submission agreement with bioRxiv, the leading preprint server for biological research.

          Comment on Protocols.io Tools for PLOS Authors: Reproducibility and Recognition by Lenny Teytelman        
Dear Dr. Eysenbach, Our protocols.io is not a journal but a repository. Like Dryad for data or GitHub for code, we are a repository for protocols that accompany published papers. We are also like bioRxiv for protocol preprints. We do not do peer review and are not a journal. We are not in competition with your JRP, Nature Protocols, Current Protocols, Nature Methods, MethodsX, Bio-protocol, or any of the other method-centered journals. Kind regards, Lenny Teytelman CEO, protocols.io
          #ASAPbio and bioRxiv        

Back in 2013, the researchers at Cold Spring Harbor Laboratory decided to emulate the physicists using arXiv and create a pre-print repository for biological papers. They called it bioRxiv. Use increased slowly, for several reasons. Biologists didn’t want their work …

          Shannon’s paper available as preprint        
The lab’s latest manuscript is out for peer review. Before it is accepted for publication, you can read the preprint here. Congrats to Shannon and the lab for some great work! Let’s hope the reviewers agree. For more info on preprints, click here for bioRxiv, the preprint server for biology.
          Comment on Rewiring Neuroscience: Letter To A Young Researcher by Benyamin        
All very good points, though it does seem like there is a growing movement away from the old publishing model through platforms like http://www.biorxiv.org/. Hopefully it is the start of a more open and collaborative publishing model.
          TWiV 410: Hurricane Zika        

Guests: Sharon Isern and Scott Michael

Sharon and Scott join the TWiV team to talk about their work on dengue antibody-dependent enhancement of Zika virus infection, and identifying the virus in mosquitoes from Miami.


Become a patron of TWiV!

Links for this episode

This episode is brought to you by CuriosityStream, a subscription streaming service that offers over 1,400 documentaries and nonfiction series from the world's best filmmakers. Get unlimited access starting at just $2.99 a month, and for our audience, the first two months are completely free if you sign up at curiositystream.com/microbe and use the promo code MICROBE.

This episode is also brought to you by Drobo, a family of safe, expandable, yet simple to use storage arrays. Drobos are designed to protect your important data forever. Visit www.drobo.com to learn more. Listeners can save $100 on a Drobo system at drobostore.com by using the discount code Microbe100.

Weekly Science Picks

Sharon - Zika virus comics and cartoons and Florida weekly arbovirus reports
Scott - Real-time tracking of Zika virus evolution
Alan - Evolution of antibiotic resistance on a mega plate
 -  Windytv
Kathy - Zika virus map and timeline
Vincent - Ohsumi Nobel advanced information and HR 5325 funding breakdown

Send your virology questions and comments to twiv@microbe.tv

          TWiV 399: Zika la femme        

Hosts: Vincent Racaniello, Dickson Despommier, Alan Dove, and Rich Condit

The latest Zika virus news from the ConTWiVstadors, including a case of female-to-male transmission, risk of infection at the 2016 Summer Olympics, a DNA vaccine, antibody-dependent enhancement by dengue antibodies, and sites of replication in the placenta.

Links for this episode

This episode is sponsored by CuriosityStream. Get two months free when you sign up at curiositystream.com/microbe and use the promo code MICROBE.

Become a patron of TWiV.

Weekly Science Picks

Alan - CDC postmortem on Ebolavirus outbreak
Rich - Refutations to anti-vaccine memes (Twitter, Facebook)
Dickson - History of urbanization
Vincent - How to cut subject from background in Photoshop

Listener Picks

Marion - Skeptics Guide to the Universe podcast
Jennie - Leatherback turtles in Costa Rica

Send your virology questions and comments to twiv@microbe.tv

          TWiV 388: What could possibly go wrong?        

Hosts: Vincent Racaniello, Dickson Despommier, Alan Dove, and Kathy Spindler

Preprint servers, the structure of an antibody bound to Zika virus, blocking Zika virus replication in mosquitoes with Wolbachia, and killing carp in Australia with a herpesvirus are topics of this episode hosted by Vincent, Dickson, Alan, and Kathy.

Links for this episode

This episode is sponsored by CuriosityStream. Get two months free when you sign up at curiositystream.com/microbe and use the promo code MICROBE.

Also brought to you by ASV 2016

Weekly Science Picks

Dickson - EarthEnv
 - Research funding by lottery
Kathy - Eugenia Cheng Math and Baking
Vincent - Zika Diaries

Listener Picks

Stephen - Virus trading cards
William - Virus trading cards
Norma+Maurice - Virus trading cards
Tom - Virus trading cards

Send your virology questions and comments to twiv@microbe.tv

          TWiV 387: Quaxxed        

Hosts: Vincent Racaniello, Dickson Despommier, Alan Dove, and Kathy Spindler

Guest: Nina Martin

Nina Martin joins the TWiV team to talk about the movie Vaxxed, her bout with dengue fever, and the latest research on Zika virus.


Links for this episode

This episode is sponsored by CuriosityStream. Get two months free when you sign up at curiositystream.com/microbe and use the promo code MICROBE.

Also brought to you by ASV 2016

Weekly Science Picks

Nina - Vaccines and Your Child by Paul Offit and Charlotte Moser
Dickson - The animals of Chernobyl
 - Five rules of lab safety
Kathy - 20 best science images of the year?
Vincent - Massive undersea crab swarm

Listener Picks

JP - Global map of wind
Todd - Doc Martin

Send your virology questions and comments to twiv@microbe.tv

          TWiV 367: Two sides to a Coyne        

Hosts: Vincent Racaniello, Dickson Despommier, Alan Dove, Rich Condit, and Kathy Spindler

Guests: Carolyn Coyne and Coyne Drummond

Two Coynes join the TWiV overlords to explain their three-dimensional culture model of polarized intestinal cells for studying enterovirus infection.


Links for this episode

This episode is sponsored by ASM Microbe

Weekly Science Picks

Kathy - Tardigrade genome sequence (video)
Alan - Antibiotic action nonprofit group
Vincent - Ex Machina and genome editing moratorium
Rich - Launch photography by Ben Cooper
Kathy - HIV life cycle in video (paper)
Dickson - 2015 Nobel Lectures Physiology or Medicine
Carolyn - Metapneumovirus entry

Listener Picks

Tom - Global host-pathogen database
Trudy - Madame Curie by Eve Curie

Send your virology questions and comments to twiv@twiv.tv

          H11 Newsletter, Volume 1, Issue 2, 2017        

H11 Newsletter
 Volume 1 Issue 2 2017

Table of Contents
1.   Background
2.   FT DNA Project
3.   Project Statistics
4.   Family Finder
5.   Upgrading of reports by FT DNA
6.   Recent publication – The Genetic History of Northern Europe, Mittnik et al
7.   Blood of the Isles Database – Dr. Bryan Sykes
8.   Genetic Genealogy in Practice, Blaine T Bettinger and Debbie Parker Wayne

1.   Background:

When I introduced the H11 Newsletter in February 2011 I was not sure whether I would aim for two or four issues per year. I have decided that visiting and reviewing the project four times a year is probably better, and I will attempt to keep to that schedule. Please submit any items of interest to members of the H11 haplogroup to Elizabeth Kipp (kippeeb@rogers.com) for inclusion in this newsletter.

2.   FT DNA Project:

There are now 221 members in our H11 project. Full sequence results are completed on 195 members of the group. There are some members who have not taken any mtDNA tests but I have not removed them from the project as eighteen of them have tested with the Genographic Project but not yet at FT DNA. There are six kits recently purchased not yet returned.

3.   Project Statistics (yDNA statistics removed):

Combined GEDCOMs Uploaded
DISTINCT mtDNA Haplogroups
Family Finder
Genographic 2.0 Transfers
Maternal Ancestor Information
mtDNA Full Sequence
mtDNA Plus
mtDNA Subgroups
Total Members
Unreturned Kits

4.   Family Finder:

Family Finder results within the project may well prove to be interesting, but privacy concerns prevent me from sharing any of these results with the group. However, you can go into your own projects and see your own Family Finder results.

5.   Upgrading of reporting by FT DNA:

FT DNA has now upgraded their assignment of haplogroups to reflect the latest phylotree:

Within the study group we have members in every sub-haplogroup except H11a5 (and it can be seen in the chart above that the mutation C15040T marks this subgrouping). It is very helpful that FT DNA has now started to assign members based on this version of the phylotree.

6.   Recent publication – The Genetic History of Northern Europe, Mittnik et al.

Thank you to the member of our study who sent me this link. The Eupedia article includes the following mention of H11:

"H11 is found across most of northern, central and eastern Europe, but also in Central Asia, where it might have been propagated by the Indo-European migrations (see below). H11a was identified in a Mesolithic hunter-gatherer from the Narva culture in Lithuania by Mittnik et al."

7.   Blood of the Isles Database – Dr. Bryan Sykes:

One of my first introductions to H11 was in the Blood of the Isles Database back in 2007. This database had two members who shared my mutations; before finding these entries, I had seen only a few matches with these mutations on FT DNA. I am in the process of testing at Living DNA and, since I know the locations of all of my lines back to the mid-1600s except my mtDNA line, I am curious what I might discover. The location for the two samples in the Blood of the Isles Database was Argyllshire/Ayrshire, Scotland. Over the years I have found two other individuals who trace back to this area and share my mutations. A number of individuals who share these same mutations trace back to County Antrim, Ireland. One of these individuals is a descendant of the group of emigrants who came with the Rev William Martin to the Carolinas in 1772, and whose ancestry goes back to Argyllshire/Ayrshire.

8.   Genetic Genealogy in Practice, Blaine T Bettinger and Debbie Parker Wayne:

A member of the study mentioned this book (which I have also purchased and am working my way through) and, on pages 57-58, brought to my attention “James Lick’s mtHap Haplogroup Analysis tool” at dna.jameslick.com/mthap/, which may interest readers of this newsletter. The explanation of the colours and terminology, etc.:

You use this tool in conjunction with your FASTA file. If you have completed a full sequence of your mitochondrial DNA, you can download your FASTA file from your mtDNA – Results page: scroll down to the bottom of the page and, on the lower right-hand side, click “Download FASTA File.”
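For readers curious about what a tool does with the FASTA file, here is a minimal sketch that reads a single-record FASTA file and reports the base at a given position, for example 15040, the site of the C15040T mutation mentioned above. It assumes the sequence uses the same numbering as the revised Cambridge Reference Sequence (rCRS), with no insertions or deletions before the position of interest; real downloads may not satisfy that, which is one reason to use a dedicated tool such as James Lick's. The file name is hypothetical, and the demo uses a small in-memory record instead of a real result.

```python
import io
from typing import TextIO


def read_fasta(handle: TextIO) -> str:
    """Join the sequence lines of a single-record FASTA file, skipping the '>' header."""
    parts = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        parts.append(line.upper())
    return "".join(parts)


def base_at(sequence: str, position: int) -> str:
    """Base at a 1-based position; assumes rCRS numbering with no indels before it."""
    if not 1 <= position <= len(sequence):
        raise IndexError(f"position {position} is outside a sequence of length {len(sequence)}")
    return sequence[position - 1]


# With a real download (file name hypothetical):
# with open("mtDNA_full_sequence.fasta") as fh:
#     print(base_at(read_fasta(fh), 15040))  # "T" would be consistent with C15040T

demo = io.StringIO(">demo\nGATCACAGGT\nctatcaccct\n")  # toy 20-base record
seq = read_fasta(demo)
print(len(seq), base_at(seq, 11))  # 20 C
```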

Any submissions to this newsletter can be submitted to Elizabeth Kipp (kippeeb@rogers.com).

          Modeling plant root systems
For ladies and gentlemen in the life sciences, here is an article about software for modeling plant root systems, written as a preprint on the bioRxiv server. The time/date stamp in the header is one safeguard against deliberate scooping (duplication of ideas). Along with the preprint above, the author has also uploaded the database online …
          TWiM #157: Back to the ancestor        

The TWiMbionts explore the role of bacteria in the genesis of moonmilk, and how ancient host proteins can be used to engineer resistance to virus infection.


Vincent Racaniello, Michele Swanson and Elio Schaechter.

Subscribe to TWiM (free) on iPhone, Android, RSS, or by email. You can also listen on your mobile device with the Microbeworld app.

Become a patron of TWiM.

Links for this episode

 Send your microbiology questions and comments (email or recorded audio) to twim@microbe.tv


          TWiM #153: Covert pathogenesis        

The TWiM team ventures into preprint space with an analysis of type VI secretion across human gut microbiomes, and provides insight into urinary tract infection: how bladder exposure to a member of the vaginal microbiota triggers E. coli egress from latent reservoirs.


Vincent Racaniello, Michael Schmidt, Michele Swanson and Elio Schaechter.

Subscribe to TWiM (free) on iPhone, Android, RSS, or by email. You can also listen on your mobile device with the Microbeworld app.

Become a patron of TWiM.

Links for this episode

Send your microbiology questions and comments (email or recorded audio) to twim@microbe.tv

          TWiM #119: Power of one        

Hosts: Vincent Racaniello, Michael Schmidt, and Elio Schaechter.

The microbophiles investigate the ratio of bacterial to human cells in our bodies, and how placing solar panels on a bacterium enables it to carry out photosynthesis.

Subscribe to TWiM (free) on iTunes, Stitcher, Android, RSS, or by email. You can also listen on your mobile device with the Microbeworld app.

Links for this episode 

This episode is sponsored by ASM Grant Writing Institute Online Webinar and 32nd Clinical Virology Symposium

Music used on TWiM is composed and performed by Ronald Jenkees and used with permission.

Send your microbiology questions and comments (email or mp3 file) to twim@twiv.tv.

Thumbnail image: Cell structure of a gram-positive bacterium. Vector image by Ali Zifan (own work), using information from Biology, 10th edition (chapter 4, p. 63) by Peter Raven, Kenneth Mason, Jonathan Losos, and Susan Singer (McGraw-Hill Education).


          Dramatic Growth of Open Access June 30, 2017        

Correction: DOAJ will soon surpass 2.5 million articles, not a quarter of a billion as originally reported. 


Open access continues to demonstrate robust growth on a global scale, in terms of the works made available open access, ongoing growth in infrastructure (new repositories, journals, book publishers), strong growth for new initiatives such as SocArxiv, BioRxiv, the Directory of Open Access Books, and SCOAP3, and continued strong growth in established services such as BASE, PubMed / PubMedCentral, Internet Archive (check out the new Collections, including a Trump archive and FactChecker), DOAJ (almost 2.5 million articles searchable at the article level), RePEC and arXiv. Ongoing growth in infrastructure and OA policy gives every reason to expect this growth to continue.

Open Data Version

Morrison, Heather, 2014, "Dramatic Growth of Open Access", hdl:10864/10660, Scholars Portal Dataverse, V17,


This edition of the Dramatic Growth of Open Access highlights two of the new kids on the OA block, SocArxiv and BioRxiv, modeled on the early OA success story arXiv, which top the quarterly percentage growth with about 30% growth each! SocArxiv now has 1,200 documents and BioRxiv 12,800.

Similarly, a relative newcomer, the Directory of Open Access Books, takes both first and second place for annual percentage growth, with 68% growth in OA books and 40% growth in OA book publishers over the past year, for a total of 8,172 open access books and 217 OA book publishers.

SCOAP3, a global initiative to transform high-energy physics publishing to open access, is showing remarkable growth, 39% in the last year and 8% in the last quarter for a total of 15,790 articles funded.

To celebrate the growth of all OA services, two pictures are presented of the growth of the largest collective OA search engine that I am aware of. Together, the 5,000 content providers who contribute metadata to the Bielefeld Academic Search Engine (BASE) have made available over 112 million documents. Around 60% of these are open access, so the number of OA documents in the world can be estimated at somewhere around 67 million. BASE also posts its own online statistics table and chart - check it out here.

I wish I had the time to applaud and celebrate the growth of each and every OA service, but with 5,000 services contributing to BASE (and others that don't), if I worked on this 365 days a year I would have to cover 14 initiatives every day. So please feel free to help out by applauding and celebrating the services most relevant to you - the journals in your discipline, your institutional repository, the services you find most helpful to search.
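The estimates in the two paragraphs above are simple arithmetic, and so are the growth percentages in the tables below. As a minimal sketch, the code reproduces the 60%-of-112-million estimate and the 14-a-day figure, and back-calculates the BASE document count a year earlier from its listed 20% annual growth. The function names are illustrative, and because the listed growth rates are rounded, the back-calculated count is approximate and is not a figure reported by BASE.

```python
def growth_pct(previous: float, current: float) -> float:
    """Percentage growth from an earlier count to a later one."""
    return (current - previous) / previous * 100


def implied_previous(current: float, pct: float) -> float:
    """Earlier count implied by a current count and its reported growth percentage."""
    return current / (1 + pct / 100)


base_documents = 112_458_360  # BASE documents, annual table
oa_share = 0.60                # "around 60%" open access, per the text
print(f"OA documents via BASE: ~{base_documents * oa_share / 1e6:.1f} million")  # ~67.5 million

providers = 5_000
print(f"Services to celebrate per day: {providers / 365:.1f}")  # 13.7, i.e. about 14

# BASE documents are listed with 20% annual growth; back-calculate a year earlier.
earlier = implied_previous(base_documents, 20)
print(f"Implied count a year earlier: ~{earlier / 1e6:.1f} million")  # ~93.7 million
print(f"Round trip: {growth_pct(earlier, base_documents):.1f}%")  # 20.0%
```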

Below you will find tables listing the top services by quarterly (5% or more) and annual growth (10% or more). For the full numbers download the open data version (link above). As usual Internet Archive is well represented, with 5 items in the list of the top 13 services by quarterly growth and the top 18 services by annual growth. Internet Archive also offers 2 intriguing new services under Collections - a Trump Archive with over a thousand videos and a Fact Checker collection with over 400 items, available at https://archive.org/details/tv

Of course PubMed and PubMedCentral are up there in the growth charts, in this quarter for total number of items (5% quarterly growth) as well as what looks (to me) like hesitant new steps by a substantial number of journals, with a 26% increase in the number of contributing journals that provide some OA and a 14% increase in the number of journals that provide OA to selected articles. The number of journals providing immediate free access and/or all articles open access continues to increase, so this is clearly growth, not backsliding.

DOAJ is included in the top growth services with 14% growth in the number of articles searchable at article level. DOAJ now has over 2.49 million articles searchable at the article level and should soon surpass 2.5 million articles.

arXiv and RePEC are on the list for strong growth in articles, and ROARMAP for growth in OA policies.
Quarterly growth (percentage) June 2017
32% SocArxiv preprints 1,200
29% BioRxiv all articles 12,280
18% # of academic peer-reviewed books (DOAB) 8,172
18% # publishers (DOAB) 217
8% SCOAP3 articles 15,790
8% Internet Archive Software 178,635
7% Video (movies)  (Internet Archive) 3,437,542
7% Texts  (Internet Archive) 12,821,051
5% Images (Internet Archive) 1,476,743
5% # of content providers (BASE) 5,621
5% Audio (recordings)  (Internet Archive) 3,477,033
5% Webpages (Internet Archive) (in billions) 298
5% PubMedCentral (number of items) 4,400,000

Annual growth (percentage) 06/30/17
68% # of academic peer-reviewed books (DOAB) 8,172
40% # publishers (DOAB) 217
39% SCOAP3 articles 15,790
34% Video (movies)  (Internet Archive) 3,437,542
33% Internet Archive: Software 178,635
29% # of content providers (BASE) 5,621
27% Texts  (Internet Archive) 12,821,051
26% PMC journals some OA 609
25% Internet Archive: Images 1,476,743
20% # of documents (BASE) 112,458,360
17% Audio (recordings)  (Internet Archive) 3,477,033
17% RePEc journal articles 1,491,037
14% # of articles searchable at article level (DOAJ) 2,493,835
14% PMC select deposit journals 4,296
13% RePEC downloadable 2,143,844
13% Total Policies (ROARMAP) 872
13% PMC # items 4,400,000
10% arXiv  http://arxiv.org/ 1,278,739

 This post is part of the Dramatic Growth of Open Access Series

Feel free to copy and share - with love.  Note that images are compressed by the software to reduce file size, and they are also quickly outdated. You are welcome to use the images, but my recommendation is to download the data and make your own graphics. It's easier than you think with tools like modern spreadsheet software.
          Dramatic Growth of Open Access December 31, 2016        
Download data here


Arguably the best indicator of the global collaborative growth of open access, whether through archives or publications, is the ongoing impressive growth of what we can access through the Bielefeld Academic Search Engine (BASE), which surpassed two major milestones in 2016: over 100 million documents (about 60% of them open access) and 5,000 content providers. The growth rates (22% for documents, 27% for content providers) are particularly impressive given how large the collection already was. This is an amazing success not just for BASE, but for all of us. If you've published a thesis through an institutional repository that allows metadata harvesting, or published an article in a journal that contributes article-level data for metadata harvesting, your contribution is reflected here. This is a meta-level indicator of our global success.
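The growth percentages in this series appear to be simple year-over-year changes, (new - old) / old. A minimal Python sketch, assuming that formula and rounding to the nearest whole percent (the post states neither), reproduces the BASE figures; `growth_pct` is a hypothetical helper name, not part of any service's tooling:

```python
def growth_pct(old: float, new: float) -> float:
    """Percentage change from an old count to a new count (assumed formula)."""
    return (new - old) / old * 100

# BASE counts: end of 2015 vs. end of 2016, as reported in this post.
documents = growth_pct(84_250_000, 103_090_961)
providers = growth_pct(3_965, 5_023)

print(f"BASE documents: {documents:.0f}%")          # 22%
print(f"BASE content providers: {providers:.0f}%")  # 27%
```

The same helper gives negative values for shrinking counts, such as DOAJ's journal total after the 2016 weeding.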

I've added a new metric for medical open access: a PubMed keyword search for "cancer" with no date limit, and limited to the last 5 years, last 2 years, and last year, each further limited to free full text to determine the percentage of items for which full text is freely available. This ranges from 26% overall (no date limit), to 40-44% for items published in the last 2-5 years, to 32% for articles published in the last year.

Also added this quarter: OECD iLibrary. With more than 11,000 free books, this one publisher's OA collection is nearly double the combined collection of the 167 publishers in the impressively growing Directory of Open Access Books! arXiv, in addition to a growth rate of over 10% last year, inspired the recent development of two similar services, SocArXiv and bioRxiv, newly added here to facilitate future growth tracking. DOAJ's get-tough inclusion policy and major weeding in March 2016 mean that DOAJ's counts of titles, countries, and journals searchable at the article level are all down from last year, while the number of articles searchable at the article level continued to show robust growth of 13%. DOAJ's quarterly growth is back to an impressive rate of just under 3 titles per day. RePEc surpassed a milestone of 2 million downloadable items this year, while Internet Archive surpassed 3 milestones: there are now more than 3 million video recordings, more than 3 million audio recordings, and more than 11 million texts (the number of IA web pages archived is way down, by billions; the difference is so large that it strikes me as likely due to a counting glitch, either before or after). Recently Open Journal Systems announced that OJS is now used by more than 10,000 active journals.
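The "just under 3 titles per day" rate can be sanity-checked from the 246 journals DOAJ added this quarter. A minimal sketch, assuming the quarter runs from October 1 to December 31, 2016 (92 days, counting both ends); `per_day` is a hypothetical helper:

```python
from datetime import date

def per_day(count: int, start: date, end: date) -> float:
    """Average additions per day over the inclusive date range [start, end]."""
    days = (end - start).days + 1  # +1 so both the first and last day count
    return count / days

rate = per_day(246, date(2016, 10, 1), date(2016, 12, 31))
print(f"{rate:.2f} journals per day")  # 2.67 journals per day
```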

Kudos and thanks to everyone in the open access movement: every researcher, author, editor, publisher, archive manager, librarian, policy-maker, and activist who is making open access happen. What of 2017? My advice: let's remember the beautiful vision of open access as a potential, unprecedented public good (a vision forged not in a time of peace and certainty, but within months of the trauma of 9/11, and repeated below), and keep on making it happen.

BOAI vision:
An old tradition and a new technology have converged to make possible an unprecedented public good. The old tradition is the willingness of scientists and scholars to publish the fruits of their research in scholarly journals without payment, for the sake of inquiry and knowledge. The new technology is the internet. The public good they make possible is the world-wide electronic distribution of the peer-reviewed journal literature and completely free and unrestricted access to it by all scientists, scholars, teachers, students, and other curious minds. Removing access barriers to this literature will accelerate research, enrich education, share the learning of the rich with the poor and the poor with the rich, make this literature as useful as it can be, and lay the foundation for uniting humanity in a common intellectual conversation and quest for knowledge.

Selected numbers and growth by service:

Directory of Open Access Journals 

Highlights: in March 2016 DOAJ removed more than 3,000 journals, reflecting a new get-tough inclusion policy: all journals that had not gone through DOAJ's new application process were removed. As a result, in spite of robust quarterly growth since the removal process, most of DOAJ's key figures are lower at the end of 2016 than at the end of 2015, with the exception of the number of articles searchable through DOAJ, which grew by 13%.
  • 9,455 journals (down from 10,963 in 2015, a 14% decrease. Note that this quarter DOAJ added 246 journals for a current growth rate of close to 3 titles per day).
  • 6,634 journals searchable at article level (down from 6,780 in 2015, a 2% decrease. Note that this quarter DOAJ increased the number of searchable journals by 217).
  • 2,400,258 articles (up 13% from 2,123,402 at the end of 2015, very impressive given the journal weeding process)
  • 128 countries (down from 136 at the end of 2015)
Electronic Journals Library
  • 55,562 journals that can be read free of charge (up from 51,983 at the end of 2015, a 7% growth rate)
OECD iLibrary  * (selected data points) (just added, no growth figures)
  • 11,050 e-book titles
  • 5,130 multilingual summaries
  • 5,200 working papers
  • 5 billion data points across 42 databases
Directory of Open Access Books
  • 5,602 books (up from 3,789 at the end of 2015, a 48% growth rate)
  • 167 publishers (up from 134 at the end of 2015, 33 publishers added, a 25% growth rate)

3,000 repository milestone!!!
  • 3,285 repositories (up from 2,991 at the end of 2015, a 10% growth rate)
Registry of Open Access Repositories
  • 4,365 repositories (up from 4,147 at the end of 2015, a 5% growth rate)
Bielefeld Academic Search Engine

100 million document milestone!!!
5,000 content providers milestone!!!
  • 103,090,961 documents (up from 84.25 million at the end of 2015, a 22% growth rate)
  • 5,023 content sources (up from 3,965 at the end of 2015, a 27% growth rate)

PubMedCentral

4 million article milestone!!!
  • 4.1 million articles (up from 3.7 million at the end of 2015, an 11% growth rate)
  • 2,326 journals actively participating in PubMedCentral (up from 2,021 at the end of 2015, a 15% growth rate)
  • 1,720 journals with immediate free access (up from 1,553 at the end of 2015, an 11% growth rate)
  • 1,426 journals with all articles open access (up from 1,331 at the end of 2015, a 7% growth rate)
  • 569 journals with some articles open access (up from 423 at the end of 2015, a 35% growth rate)

arXiv
  • 1,219,224 preprints (up from 1,105,906 at the end of 2015, a 10% growth rate)

SocArXiv Preprints (launched December 7, 2016, inspired by arXiv) **
  • 631 searchable preprints

bioRxiv (in beta December 31, 2016, inspired by arXiv) ***
  • 7,500 articles (based on "all articles" search, 750 pages x 10 articles per page)

RePEc

2 million downloadable items milestone!!!
  • 2,021,534 downloadable items (up from 1,942,541 at the end of 2015, a 4% growth rate)

ROARMAP
  • 803 total open access mandate policies (up from 762 at the end of 2015, a 5% growth rate)

Internet Archive

3 million milestones for video and audio recordings!!!
10 million milestone for texts (now 11 million)!!!
  • 11 million texts (up from 8.8 million at the end of 2015, a 26% growth rate)

       * OECD iLibrary statement on free-to-read (from About page):
      All book and journal content is available to all users to read online by clicking the READ icon. Read editions are optimised for browser-enabled mobile devices and can be read online wherever there is an internet connection - desktop computer, tablets or smart phones. They are also shareable and embeddable.
      The site also features content for all users to access and download such as the OECD Factbook, OECD Working Papers, Indicators, and more.
      Subscribers benefit from full access to all content in all available formats.
      ** about SocArXiv (from the Dec. 7, 2016 launch announcement):
      SocArXiv, the open access, open source archive of social science, is officially launching in beta version today. Created in partnership with the Center for Open Science, SocArXiv provides a free, noncommercial service for rapid sharing of academic papers; it is built on the Open Science Framework, a platform for researchers to upload data and code as well as research results.
      *** about bioRxiv (from about page):
      bioRxiv (pronounced "bio-archive") is a free online archive and distribution service for unpublished preprints in the life sciences. It is operated by Cold Spring Harbor Laboratory, a not-for-profit research and educational institution. By posting preprints on bioRxiv, authors are able to make their findings immediately available to the scientific community and receive feedback on draft manuscripts before they are submitted to journals.
      This post is part of the Dramatic Growth of Open Access series.

                  uBiome Reveals Details of Clinical Screening Test for Gut Health        

        uBiome, the leader in microbial genomics, has created an entirely new approach to support the clinical diagnosis of gut health conditions. The preprint of the study, posted on bioRxiv, gives both citizen scientists and research scientists access to the results of uBiome's research.

        (PRWeb November 03, 2016)

        Read the full story at http://www.prweb.com/releases/2016/11/prweb13821806.htm

                  Biologists debate how to license preprints        

        Source: Nature

        Biology’s zeal for preprints — papers posted online before peer review — is opening up a thorny legal debate: should scientists license their manuscripts on open-access terms? Researchers have now shared more than 11,000 papers at the popular bioRxiv preprints site. But where some researchers allow their bioRxiv manuscripts to be freely redistributed and reused, others have chosen to lock them down with restrictive terms.

                  TelomereHunter: telomere content estimation and characterization from whole genome sequencing data        
        Lars Feuerbach, et al., TelomereHunter: telomere content estimation and characterization from whole genome sequencing data, bioRxiv, 2016. Here, we present TelomereHunter, a new computational tool for determining telomere content that is specifically designed for matched tumor and control pairs. In contrast to existing tools, TelomereHunter takes alignment information into account and reports the abundance of variant […]
                  Our preprint on brain-heart communication in athletes and sedentary young adults, available for peer review        
        Our recent research, revealing significant differences in how the brains of physically trained and sedentary young adults process information from the heart, is now available for commentary and formal peer review in two preprint repositories: SJS (@social_sjs) and bioRxiv (@biorxivpreprint). Each of these repositories comes with advantages and disadvantages. BioRxiv is already backed by a …
                  MERS-CoV spillover at the camel-human interface        
        Via bioRxiv: MERS-CoV spillover at the camel-human interface. The update: Middle East respiratory syndrome coronavirus (MERS-CoV) is a zoonotic virus originating in camels that has been causing significant mortality and morbidity in humans in the Arabian Peninsula. The epidemiology of...
                  Making Sour Beer        
        The science of yeast fermentation in sour beer production is sophisticated stuff explored in a new preprint. There are hundreds of strains of beer yeast and some are particularly well suited to the task.
                  On the link between radiation exposure and illness in rats and other creatures.

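        Only recently (May 2016), the media and public discourse have been in an uproar over a study that demonstrates a direct link between cellular radiation (specifically, the experiment used a 900 MHz radio field with GSM and CDMA modulation) and cancerous tumors of the head and heart in rats.
        The substantive professional criticism of the experiment's preliminary findings, which casts serious doubt on their validity, was drowned out and hidden from view, which is a pity.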

        You can relax: facts versus murmured fears
        But before turning to the criticism of the study itself, as voiced since its preliminary findings were published, let us first note: electromagnetic radio radiation, in particular at frequencies around 1 GHz (the radiation that carries digital communication today), has filled the air in centers of human population for about 100 years. From the early 1990s until now, studies have examined a possible link between exposure to cellular radiation specifically and cancer and other lesions. These studies examine the link between cellphone use and cancer among millions of users around the world (for example, a study of 790,000 women in Britain, an estimate of head cancer patients in Australia over 30 years, and 350,000 users in Denmark[1]), alongside many other studies of every kind, including on animals and cells. So far, since the start of the cellular era, which spread rapidly around the world, no worrying findings or trends have emerged showing that such a link harms public health at large, except, with a major caveat, for very rare tumors of the auditory nerve (probably because the device is held close to the ear).
        But the central evidence, which it is time to put at the center of the discussion, is this: to date, there is no factual finding showing any change in the number of head cancer patients that might set off a warning light. In the US, from 1992 to today, the rate has been 6.4 cases per 100,000 people of all ages per year. There is a great deal of partial data, and/or data from small numbers of cases, which in the view of professionals weakens the conclusions to a mere "possibly". Examples are Israel, where the number of deaths rose fourfold from 1970 to 2006[2], and Sweden[3], where a certain upward trend was found even after controlling for additional factors (such as poor diagnosis of the disease in the not-so-distant past), suggesting a possible link between a rare type of head cancer and, among other factors, increased cellphone use.
        Recently, the World Health Organization published its findings, based on a detailed, professional, and substantive review of all the available reports and of real-world cancer data[4], and in light of the countless studies conducted and under way: the effect of cellphone radiation on cancer in humans is "possible", with the same likelihood as coffee and pickled vegetables being human carcinogens.

        In other words, based on the findings of countless studies over at least two decades, it seems unlikely that a previously unidentified trend, a clear causal link between radiation and cancer significant enough to harm public health, will suddenly emerge.

        We should avoid complacency, but it is equally important to avoid demonization and media panic, which rests on nothing but rationally unfounded existential fears about the complex reality we live in. It is entirely possible to avoid unnecessary extra exposure to radiation from cellphones and electrical devices, especially among children and teenagers, whose bodies are more sensitive and vulnerable, through simple, accessible, and practical rules of behavior, without needing to do what is impossible anyway: give up digital communication in general and cellphones in particular.

        About the study exposing rats to radiation
        The new study conducted by the US Department of Health and Human Services, published on May 20[5], finds a certain link between exposure to cellular radiation and cancerous tumors in rats.
        The study's findings indicate that exposure to cellular radiation increased the risk, in male rats only, of developing brain tumors and, at higher exposure intensity, heart tumors as well.
        In the study, rats were exposed to different levels of the radiation used by cellphones for 9 hours a day, starting before birth (while the mothers were carrying the embryos), for two years.
        ·         About 2 to 3 percent of male rats developed a malignant brain tumor, compared with zero in the control group.
        ·         5% to 7% of males exposed to radiation at an intensity significantly greater than that currently present in public and private spaces developed malignant heart tumors.
        The validity of the findings is borderline, and quite a few experts have raised doubts; some even dismiss the study's findings altogether, including their statistical validity:
        ·         The number of exposed rats was small (the study reports a small number of subjects, in the hundreds, so 1% amounts to one or two animals).
        ·         The different results for males, which developed disease, and females, which did not, have no support from other experiments, not even in rats.
        ·         No radiation-related differences were found in tumors of other organs.
        ·         0% sick animals in the control group is also an implausible result, since other studies have found tumors in about 2% of rats (males and females) regardless of radiation exposure.
        ·         It was also argued that, in any case, the radiation intensity used exceeds the exposure of an average cellphone user.

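        The published findings are preliminary and are part of a broader study that has been under way for some time; the final results will be published next year. The New York Times health section (May 27, 2016)[6] also reports that, in light of the professional criticism, the published study will be re-examined by a different team of experts from the one that conducted it.

        In sum: not as bad as the outcry suggests. More data and studies are needed, but after more than 20 years of public cellphone use, it seems the anxieties can be lowered considerably, even if there is no reason to be entirely complacent. And to be on the safe side, let us not be complacent: we can reduce radiation exposure by keeping the device away from the body in various ways, including cutting unnecessary hours of exposure in the car, in bedrooms, in the living room, and so on, without compromising the digital lifestyle, which will not disappear from the landscape, certainly not in the foreseeable future.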

                  ÐžÐ¿ÑÑ‚а-убийцы оказались крупнейшими организмами в истории Земли        

        An international team of scientists has found that honey fungi carry specific DNA that makes them "serial killers" of trees. These genes help the fungi grow to gigantic sizes. The largest organism on Earth, for example, is the dark honey fungus (Armillaria ostoyae), which grows in the Malheur National Forest in the US state of Oregon. A preprint of the study has been published in the bioRxiv repository.

        The mycelium is the underground part of a fungus and looks like a network of thin, branching threads. The mycelium of Armillaria ostoyae in Oregon is known to form not separate clusters but a single organism weighing 600 tonnes and covering an area of 8.4 square kilometers. Scientists estimate that this largest living thing on the planet is more than two thousand years old.

        Another large organism is the bulbous honey fungus (Armillaria gallica), which grows in Michigan. It covers an area of 0.37 square kilometers.

        To understand what makes Armillaria species so unique and successful, biologists analyzed the genomes of various honey fungi, tracing their origins and evolutionary development. They compared the DNA of species such as A. ostoyae, A. cepistipes, A. gallica, and A. solidipes with the genes of other fungi. It turned out that honey fungi are characterized by a significant expansion of the genome, which led to the emergence of specific genes associated with fungal pathogenicity, as well as genes that promote the formation of rhizomorphs.

        Rhizomorphs are bundles of fungal threads that penetrate trees and transport water and nutrients from them to the fruiting bodies. They also spread from diseased plants, helping to infect healthy ones. This parasitism leads to the mass die-off of trees.

                  BONUS: The Evolutionary Psychology of Mate Preference. 18 Jul 2017        

        Earlier this year I posted a bonus episode featuring contributions from students in my undergraduate seminar here at Basel University. It proved to be one of the more popular episodes of the podcast. This semester I taught a masters level class on the evolutionary psychology of mate preference and, again, gave the students the task of summarising the research papers they found most interesting for a special bonus episode. As before, most of the students are not native English speakers, nor have they recorded audio before. I am super grateful they agreed to be a part of the podcast (especially after I freaked them out by telling them how many people listened to the previous bonus episode!).


        The articles covered in the show (in order of appearance):

        Singh, D., & Luis, S. (1995). Ethnic and gender consensus for the effect of waist-to-hip ratio on judgment of women's attractiveness. Human Nature, 6(1), 51-65.

        Olderbak, S. G., Malter, F., Wolf, P. S. A., Jones, D. N., & Figueredo, A. J. (2017). Predicting romantic interest at zero acquaintance: Evidence of sex differences in trait perception but not in predictors of interest. European Journal of Personality, 31(1), 42-62.

        Ha, T., van den Berg, J. E. M., Engels, R. C. M. E., & Lichtwarck-Aschoff, A. (2012). Effects of attractiveness and status in dating desire in homosexual and heterosexual men and women. Archives of Sexual Behavior, 41(3), 673-682.

        Dixson, B. J., Vasey, P. L., Sagata, K., Sibanda, N., Linklater, W. L., & Dixson, A. F. (2011). Men’s preferences for women’s breast morphology in New Zealand, Samoa, and Papua New Guinea. Archives of Sexual Behavior, 40(6), 1271-1279.

        Sherlock, J. M., Sidari, M. J., Harris, E. A., Barlow, F. K., & Zietsch, B. P. (2016). Testing the mate-choice hypothesis of the female orgasm: Disentangling traits and behaviours. Socioaffective Neuroscience & Psychology, 6(1), 31562.

        Wlodarski, R., & Dunbar, R. I. M. (2013). Menstrual cycle effects on attitudes to kissing. Human Nature, 24(4), 402-413.

        Tracy, J. L., & Beall, A. T. (2014). The impact of weather on women’s tendency to wear red or pink when at high risk for conception. PLoS One, 9(2), e88852.

        Krems, J. A., Neel, R., Neuberg, S. L., Puts, D. A., & Kenrick, D. T. (2016). Women selectively guard their (desirable) mates from ovulating women. Journal of Personality and Social Psychology, 110(4), 551-573.

        Perrett, D. I., Lee, K. J., Penton-Voak, I. S., Rowland, D. R., Yoshikawa, S., Burt, D. M., et al. (1998). Effects of sexual dimorphism on facial attractiveness. Nature, 394, 884-887.

        Dixson, B. J., & Brooks, R. C. (2013). The role of facial hair in women's perceptions of men's attractiveness, health, masculinity and parenting abilities. Evolution and Human Behavior, 34(3), 236-241.

        Lefevre, C. E., & Perrett, D. I. (2015). Fruit over sunbed: Carotenoid skin coloration is found more attractive than melanin coloration. The Quarterly Journal of Experimental Psychology, 68(2), 284-293.

        Kaufman, S. B., Kozbelt, A., Silvia, P., Kaufman, J. C., Ramesh, S., & Feist, G. J. (2016). Who finds Bill Gates sexy? Creative mate preferences as a function of cognitive ability, personality, and creative achievement. The Journal of Creative Behavior, 50(4), 294-307.

        Apostolou, M., Kasapi, K., & Arakliti, A. (2015). Will they do as we wish? An investigation of the effectiveness of parental manipulation of mating behavior. Evolutionary Psychological Science, 1(1), 28-36.

        DeBruine, L. M., Jones, B. C., & Little, A. C. (2017). Positive sexual imprinting for human eye color. bioRxiv, 135244.

                  Pride and Prejudice and journal citation distributions: final, peer reviewed version        
        Today sees the publication on bioRxiv of a revised version of our preprint outlining “A simple proposal for the publication of journal citation distributions.” Our proposal, explained in more detail in this earlier post, encourages publishers to mitigate the distorting effects …
                  Why Elephants Don’t Get Cancer        

        Being an elephant is risky business. I'm not talking about poaching, habitat loss, or fighting with males in musth—I'm talking about the simple fact of living. Every time an elephant cell divides, it runs the risk of going haywire and developing into an out-of-control tumor. Since elephants have 100 times the number of cells that human beings do, they should have 100 times the risk of getting cancer. That's a lot of mistakes waiting to happen.

        In reality, despite their size and prodigious lifespans, elephants have one of the lowest cancer mortality rates in the animal kingdom: 4.8 percent, compared to a range of 11 to 25 percent for humans. How can this be?

        Scientists at the Huntsman Cancer Institute, University of Utah School of Medicine, and Primary Children’s Hospital helped figure out the answer, published Thursday in the Journal of the American Medical Association. Another team, made up of University of Chicago researchers and their colleagues, posted a related paper this week. As it turns out, elephants have developed some ingenious safeguards against developing cancer. Understanding their cellular protections might help us learn more about how to suppress cancer in humans.

        There are countless ways that cell division can go wrong. That’s why—as we learned from the winners of this week’s Chemistry Nobel Prize—your cells come equipped with a host of repair enzymes whose sole purpose is to prevent or repair genetic mistakes. These cellular copy editors proofread each strand of newly divided DNA, identifying errors and repairing the faulty bits to ensure that your DNA stays fresh and clean and functional. In humans, just one of those enzymes can fix a thousand different kinds of errors. Not too shabby!  

        But elephants have one-upped us. For the JAMA study, researchers first compared cancer rates across the animal kingdom and found that elephants were remarkably cancer-free given their size. (Other animals fared well, too. For comparison, rock hyraxes have a 1 percent cancer mortality rate, African wild dogs have an 8 percent rate, and lions have a 2 percent rate.) Then, they scoured the elephant genome to find out why.

        The answer resided in a key tumor-suppressing protein called p53, known as the "guardian of the genome." Compared to humans, elephants had far more copies of the gene for this protein: 38 additional versions on top of the two that humans also have. The result was a superior genetic safety net for correcting errors and ensuring that damaged, tumor-prone cells get nipped in the bud. "The enormous mass, extended life-span, and reproductive advantage of older elephants would have selected for an efficient and fail-safe method for cancer suppression," the authors write.

        To see how the genes suppressed tumors, researchers teamed up with Utah’s Hogle Zoo and Ringling Bros. Center for Elephant Conservation to isolate elephant cells and subject them to cancer triggers. (No elephants were harmed; this was all during routine wellness checks.) When they compared elephant cells to human cells, they found something amazing: The damaged cells in elephants were far more likely to resort to cell suicide—known as apoptosis—to avoid propagating errors in their descendants. It was a brutally efficient, even ruthless, system for protecting the organism at all costs.

        To behold an elephant in the wild is to be humbled before majesty. Yet perhaps it isn't just their tremendous size that contributes to this sense of smallness. It's also all the things we can't see: from their advanced memories, to their long lifespans, to their individual cells, so altruistic that they are willing to die for the benefit of the many. From these cancer-resistant Methuselahs, we humans have much to learn.

                  The Big Ideas in Cognitive Neuroscience, Explained        

        Are emergent properties really for losers? Why are architectures important? What are “mirror neuron ensembles” anyway? My last post presented an idiosyncratic distillation of the Big Ideas in Cognitive Neuroscience symposium, presented by six speakers at the 2017 CNS meeting. Here I’ll briefly explain what I meant in the bullet points. In some cases I didn't quite understand what the speaker meant so I used outside sources. At the end is a bonus reading list.

        The first two speakers made an especially fun pair on the topic of memory: they held opposing views on the “engram”, the physical manifestation of a memory in the brain.1 They also disagreed on most everything else.

        1. Charles Randy Gallistel (Rutgers University) – What Memory Must Look Like

        Gallistel is convinced that Most Neuroscientists Are Wrong About the Brain. This subtly bizarre essay in Nautilus (which was widely scorned on Twitter) succinctly summarized the major points of his talk. You and I may think the brain-as-computer metaphor has outlived its usefulness, but Gallistel says that “Computation in the brain must resemble computation in a computer.” 

        Shannon information concerns a set of possible messages, encoded as bit patterns and sent over a noisy channel to a recipient that will, ideally, decode each message with minimal error. In this purely mathematical theory, the semantic content (meaning) of a message is irrelevant. The brain stores numbers, and that's that.
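        To make the point about meaning concrete: Shannon's measure of information depends only on how probable each message is. A minimal Python sketch of Shannon entropy in bits, using two made-up message sources; this illustrates the general theory only and is not a model of Gallistel's proposal:

```python
import math

def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Two hypothetical sources whose messages mean very different things
# but occur with the same probabilities.
weather = {"sunny": 0.5, "rainy": 0.25, "snow": 0.125, "fog": 0.125}
orders = {"advance": 0.5, "hold": 0.25, "retreat": 0.125, "regroup": 0.125}

# Only the probabilities enter the formula, so the meanings are irrelevant.
print(entropy_bits(weather.values()))  # 1.75
print(entropy_bits(orders.values()))   # 1.75

# Eight equally likely messages need log2(8) = 3 bits each.
print(entropy_bits([1 / 8] * 8))       # 3.0
```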

        • Memories (“engrams”) are not stored at synapses.
        Instead, engrams reside in molecules inside cells. The brain “encodes information into molecules inside neurons and reads out that information for use in computational operations.” A 2014 paper on conditioned responses in cerebellar Purkinje cells was instrumental in overturning synaptic plasticity (strengthening or weakening of synaptic connections) as the central mechanism for learning and memory, according to Gallistel.2 Most other scientists do not share this view.3

        • The engram is inter-spike interval.
        Spike train solutions based on rate coding are wrong. In other words, the code is not carried by the firing rate of neurons. Instead, numbers are conveyed to engrams via a combinatorial interspike interval code. Engrams then reside in cell-intrinsic molecular structures. In the end, memory must look like the DNA code.
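        The difference between a rate code and an interval code can be shown with two hypothetical spike trains. In this minimal Python sketch (spike times in seconds are made up), both trains have the same firing rate, so a pure rate code cannot tell them apart, while their interspike intervals differ. It illustrates only the distinction Gallistel draws; it is not his proposed combinatorial code:

```python
def firing_rate(spike_times, duration):
    """Spikes per second over a window of the given duration (seconds)."""
    return len(spike_times) / duration

def interspike_intervals(spike_times):
    """Gaps between consecutive spikes, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(spike_times, spike_times[1:])]

# Two hypothetical spike trains within the same 1-second window.
train_a = [0.1, 0.3, 0.5, 0.7, 0.9]    # evenly spaced
train_b = [0.1, 0.15, 0.2, 0.8, 0.9]   # same spike count, different timing

print(firing_rate(train_a, 1.0), firing_rate(train_b, 1.0))  # 5.0 5.0
print(interspike_intervals(train_a))  # [0.2, 0.2, 0.2, 0.2]
print(interspike_intervals(train_b))  # [0.05, 0.05, 0.6, 0.1]
```

        The rounding only hides floating-point noise from the subtractions; it does not change which trains differ.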

        • Emergent properties are for losers.
        “Emergent property” is a code word for “we don't know.”

        2. Tomás Ryan (@TJRyan_77) – Information Storage in Memory Engrams

        Ryan began by acknowledging that he had tremendous respect for Gallistel's talk, which was by turns powerful, illuminating, very categorical, polarizing, and rigid. But wrong. Oh so very wrong. Memory is not essentially molecular, we should not approach memory and the brain from a design perspective, and information storage need not mimic a computer.

        • The brain does not use Shannon information.
        More precisely, “the kind of information the brain uses may be very different from Shannon information.” Why is that? Brains evolved, in kludgy ways that don't resemble a computer. The information used by the brain may be encoded without having to reduce it to Shannon form, and may not be quantifiable as units.

        • Memories (“engrams”) are not stored at synapses.
        Memory is not stored by changes in synaptic weights; Ryan and Gallistel agree on this. The dominant view has been falsified by a number of studies, including one by Ryan and colleagues that used engram labeling. Specific “engram cells” can be labeled during learning using optogenetic techniques, and later stimulated to induce the recall of specific memories. These memories can be reactivated even after protein synthesis inhibitors have (1) induced amnesia, and (2) prevented the usual memory consolidation-related changes in synaptic strength.

        • We learn entirely through spike trains.
        Spike trains are necessary but not sufficient to explain how information is coded in the brain. Moreover, instincts are transmitted genetically and are not learned via spike trains.

        • The engram is an emergent property.
        And fitting with all of the above, “the engram is an emergent property mediated through synaptic connections” (not through synaptic weights). Stable connectivity is what stores information, not molecules.
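For reference, “Shannon information” in the sense Ryan is pushing back against is quantified in bits. Here is a minimal sketch of the standard definition, the entropy of a discrete source, H = −Σ p log₂ p. The example distributions are arbitrary.

```python
import math


def shannon_entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution, in bits."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 contribute nothing (p * log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(shannon_entropy_bits([0.5, 0.5]))   # fair coin: 1.0 bit
print(shannon_entropy_bits([0.25] * 4))   # four equiprobable symbols: 2.0 bits
print(shannon_entropy_bits([0.9, 0.1]))   # biased coin: less than 1 bit
```

Ryan's claim is that whatever the brain stores may not reduce to units like these at all.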

        Angela Friederici (Max Planck Institute for Human Cognitive and Brain Sciences) – Structure and Dynamics of the Language Network

        Following on the heels of the rodent engram crowd, Friederici pointed out the obvious limitations of studying language as a human trait.

        • Language is genetically predetermined.
        The human ability to acquire language is based on a genetically predetermined structural neural network. Although the degree of innateness has been disputed, a bias or propensity of brain development towards particular modes of information processing is less controversial. According to Friederici, language capacity is rooted in “merge”, a specific computation that binds words together to form phrases and sentences.

        • The “merge” computation is localized in BA 44.
        This wasn't one of my original bullet points, but I found this statement rather surprising and unbelievable. It implies that our capacity for language is located in the anterior ventral portion of Brodmann's area 44 in the left hemisphere (the tiny red area in the PHRASE > LIST panel below).

        The problem is that acute stroke patients with dysfunctional tissue in left BA 44 do not have impaired syntax. Instead, they have difficulty with phonological short-term memory (keeping strings of digits in mind, like remembering a phone number).

        • There is something called mirror neural ensembles.
          I'll just have to leave this slide here, since I really didn't understand it, even on the second viewing.

          “This is a poor hypothesis,” she said.
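Back to “merge.” In Chomsky's Minimalist Program, where the term originates, Merge takes two syntactic objects X and Y and forms the unordered set {X, Y}. Below is a toy sketch of that textbook definition (not Friederici's model, and not a parser), using Python frozensets. Because Merge can apply to its own output, it builds hierarchy, and the same words can be combined into different structures, which is one source of structural ambiguity. The example phrase and all helper names are my own illustrations.

```python
from functools import lru_cache


def merge(x, y):
    """Merge two syntactic objects into the unordered set {x, y}.

    Toy limitations: sets ignore word order (linearization is a separate
    step), and identical words would collapse, so the example uses distinct words.
    """
    return frozenset([x, y])


def leaves(obj):
    """The words contained in a merged object."""
    if isinstance(obj, str):
        return {obj}
    return set().union(*(leaves(child) for child in obj))


def depth(obj):
    """Levels of embedding: a bare word has depth 0."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(child) for child in obj)


@lru_cache(maxsize=None)
def count_bracketings(n):
    """Binary bracketings of n words in a fixed order (the Catalan number C(n-1))."""
    if n <= 1:
        return 1
    return sum(count_bracketings(k) * count_bracketings(n - k) for k in range(1, n))


# "old men and women": only the men are old, or everyone is old.
reading_1 = merge(merge("old", "men"), merge("and", "women"))
reading_2 = merge("old", merge("men", merge("and", "women")))

print(leaves(reading_1) == leaves(reading_2), reading_1 == reading_2)
print(depth(reading_1), depth(reading_2))
print([count_bracketings(n) for n in range(1, 6)])
```

Same words, different hierarchies. The count of possible bracketings (1, 1, 2, 5, 14, ...) grows quickly with phrase length, which hints at why nesting is such a demanding problem for the listener.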

          Jean-Rémi King (@jrking0) – Parsing Human Minds

          King's expertise is in visual processing (not language), but his talk drew parallels between vision and speech comprehension. A key goal in both domains is to identify the algorithm (sequence of operations) that translates input into meaning.

          • Recursion is big. 
          Despite these commonalities, the structure of language presents the unique challenge of nesting (or recursion): each constituent in a sentence can be made of subconstituents of the same nature, which can result in ambiguity.

          • Architectures are important. 
          Decoding aspects of a sensory stimulus using MEG and machine learning is lovely, but it doesn't tell you the algorithm. What is the computational architecture? Is it sustained or feedforward or recurrent?

            Each architecture predicts a different pattern of brain activity over time. So do the classifiers trained at one time point generalize to other time points? This can be determined by a temporal generalization analysis, which “reveals a repertoire of canonical brain dynamics.”
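The temporal generalization method (King and Dehaene, 2014, in the reading list below) trains a decoder at each time point and tests it at every time point, giving a training-time × testing-time matrix of scores. Here is a pure-Python toy that contrasts two of the dynamics discussed in that paper. Everything is an assumption: simulated data, invented sensor and trial counts, and a simple nearest-centroid classifier standing in for the regularized linear classifiers usually applied to MEG. A sustained code generalizes across time (high off-diagonal accuracy); a code that hops to a new sensor at each time point decodes well only on the diagonal.

```python
import random

N_TIMES = 6      # time points (assumed)
N_CHANNELS = 6   # sensors (assumed)
N_TRIALS = 40    # trials per condition (assumed)
SIGNAL = 2.0     # signal amplitude (assumed)
NOISE_SD = 1.0


def pattern(t, dynamic):
    """Sensor weights carrying the stimulus code at time t."""
    if dynamic:
        # Changing code: a different sensor carries the signal at each time.
        return [1.0 if c == t else 0.0 for c in range(N_CHANNELS)]
    # Sustained code: the same pattern at every time point.
    return [1.0] * N_CHANNELS


def simulate(dynamic, rng):
    """Return (trials, labels); trials[i][t] is a list of sensor values."""
    trials, labels = [], []
    for label in (0, 1):
        sign = 1.0 if label == 1 else -1.0
        for _ in range(N_TRIALS):
            trials.append([
                [sign * SIGNAL * w + rng.gauss(0.0, NOISE_SD) for w in pattern(t, dynamic)]
                for t in range(N_TIMES)
            ])
            labels.append(label)
    return trials, labels


def centroids(trials, labels, t):
    """'Train' a nearest-centroid decoder: class-mean sensor vectors at time t."""
    means = {}
    for k in (0, 1):
        rows = [trial[t] for trial, y in zip(trials, labels) if y == k]
        means[k] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def predict(x, means):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    dist = {k: sum((a - b) ** 2 for a, b in zip(x, m)) for k, m in means.items()}
    return min(dist, key=dist.get)


def temporal_generalization(dynamic, seed=0):
    """Accuracy matrix: rows = training time, columns = testing time."""
    rng = random.Random(seed)
    train_x, train_y = simulate(dynamic, rng)
    test_x, test_y = simulate(dynamic, rng)  # independent test set
    matrix = []
    for t_train in range(N_TIMES):
        means = centroids(train_x, train_y, t_train)
        row = []
        for t_test in range(N_TIMES):
            hits = sum(predict(trial[t_test], means) == y for trial, y in zip(test_x, test_y))
            row.append(hits / len(test_y))
        matrix.append(row)
    return matrix


def diag_offdiag(matrix):
    """Mean accuracy on and off the diagonal."""
    n = len(matrix)
    diag = sum(matrix[i][i] for i in range(n)) / n
    off = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j) / (n * n - n)
    return diag, off


for dynamic in (False, True):
    diag, off = diag_offdiag(temporal_generalization(dynamic))
    print("changing" if dynamic else "sustained", round(diag, 2), round(off, 2))
```

Both toy codes are decodable at every moment (high diagonal), so time-resolved decoding alone cannot distinguish them. Only the generalization across time separates them, which is King's point about needing more than decoding to get at the architecture.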

            Danielle Bassett (@DaniSBassett) – A Network Neuroscience of Human Learning: Potential to Inform Quantitative Theories of Brain and Behavior

            Bassett previewed an arc of exciting ideas where we've shown progress, followed by frustrations and failures, which may ultimately provide an opening for the really Big Ideas. Her focus is on learning from a network perspective, which means patterns of connectivity in the whole brain. What is the underlying network architecture that facilitates the spatially distributed effects?

            What is the relationship between these two notions of modularity?
            [I ask this as an honest question.]
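As a reference point (not necessarily either notion on Bassett's slide), one common network-science sense of modularity is Newman's Q, which scores how much denser the connections within assigned modules are than expected by chance: Q = (1/2m) Σᵢⱼ [Aᵢⱼ − kᵢkⱼ/2m] δ(cᵢ, cⱼ), where A is the adjacency matrix, kᵢ is the degree of node i, m is the number of edges, and δ is 1 when two nodes share a module. The toy graph (two triangles joined by a single edge) and both partitions are invented for illustration.

```python
def modularity(edges, communities):
    """Newman's modularity Q for an undirected, unweighted graph without self-loops."""
    nodes = sorted({n for edge in edges for n in edge})
    adj = {i: {j: 0 for j in nodes} for i in nodes}
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    degree = {i: sum(adj[i].values()) for i in nodes}
    two_m = sum(degree.values())  # equals 2 * number of edges
    q = 0.0
    for i in nodes:
        for j in nodes:
            if communities[i] == communities[j]:
                # Observed minus expected edges under a degree-preserving null model.
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {n: "A" for n in range(6)}

print(round(modularity(edges, two_modules), 3))  # 0.357, i.e. 5/14
print(round(abs(modularity(edges, one_module)), 3))  # 0.0: one big module scores nothing
```

Q rewards partitions that cut few edges between modules; putting every node in one module always gives Q = 0.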

            Major challenges remain, of course.

            • Build a bridge from networks to models of behavior.
            Incorporate well-specified behavioral models such as reinforcement learning and the drift diffusion model of decision making. These models are fit to the data to derive parameters such as the learning rate (alpha) in reinforcement learning. Models of behavior can help generate hypotheses about how the system actually works.

            • Use generative models to construct theories. 
            Network models are extremely useful, but they're not theories. They're descriptors. They don't generate new frameworks for understanding what the data should look like. Theory-building is obviously critical for moving forward.
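To make “fit to the data to derive parameters” concrete, here is a hedged toy. A Rescorla-Wagner (delta-rule) learner plays a hypothetical two-armed bandit, updating the value of the chosen arm by V ← V + α(r − V) and choosing via a softmax. We simulate choices with a known learning rate α, then recover α by grid-search maximum likelihood. The reward probabilities, the softmax inverse temperature β (treated as known), the trial count, and the grid are all assumptions.

```python
import math
import random

REWARD_PROBS = (0.8, 0.2)  # hypothetical bandit arms
BETA = 5.0                 # softmax inverse temperature, assumed known here


def rw_update(value, reward, alpha):
    """Delta rule: move the value estimate toward the outcome by a fraction alpha."""
    return value + alpha * (reward - value)


def choice_probs(values, beta=BETA):
    """Softmax choice probabilities over the arms' current values."""
    exps = [math.exp(beta * v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(alpha, n_trials=500, seed=1):
    """Generate (choice, reward) pairs from a learner with the given alpha."""
    rng = random.Random(seed)
    values, history = [0.0, 0.0], []
    for _ in range(n_trials):
        choice = 0 if rng.random() < choice_probs(values)[0] else 1
        reward = 1.0 if rng.random() < REWARD_PROBS[choice] else 0.0
        values[choice] = rw_update(values[choice], reward, alpha)
        history.append((choice, reward))
    return history


def log_likelihood(alpha, history):
    """Log probability of the observed choices under a candidate alpha."""
    values, ll = [0.0, 0.0], 0.0
    for choice, reward in history:
        ll += math.log(choice_probs(values)[choice])
        values[choice] = rw_update(values[choice], reward, alpha)
    return ll


def fit_alpha(history):
    """Grid-search maximum-likelihood estimate of alpha."""
    grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95
    return max(grid, key=lambda a: log_likelihood(a, history))


data = simulate(alpha=0.3)
best = fit_alpha(data)
print("recovered alpha:", best)
```

With few trials or little exploration, α can be poorly identified, so the recovered value is an estimate, not ground truth. Real analyses typically fit α and β jointly with a proper optimizer.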

            John Krakauer (@blamlab) – Big Ideas in Cognitive Neuroscience: Action

            Krakauer mentioned the Big Questions in Neuroscience symposium at the 2016 SFN meeting, which motivated the CNS symposium as well as a splashy critical paper in Neuron. He raised an interesting point about how the term “connectivity” has different meanings, i.e. the type of embedded connectivity that stores information (engrams) vs. the type of correlational connectivity when modules combine with each other to produce behavior. [BTW, is everyone here using “modules” in the same way?]

            • Machine learning will save us. 
            Krakauer discussed work on motor learning using adaptation paradigms and simple execution tasks. But there's a dirty secret: there is no computational model, no algorithmic theory of how practice makes you better on those tasks. Can the computational view get an upgrade from machine learning? Go out and read the manifesto by Marblestone, Wayne, and Kording: Toward an Integration of Deep Learning and Neuroscience. And you'd better learn about cost functions, because they're very important.4

            • Go back to behavioral neuroscience.
            This is the only way to work out the right cost functions. Bottom line: Networks represent weighting modules into the cost function.4 

            OVERALL, there was an emphasis on computational approaches with nods to the three levels of David Marr:

            computation – algorithm – implementation

            We know from Krakauer et al. 2017 (and from CNS meetings past and present) that co-organizer David Poeppel is a big fan of Marr. The end goal of a Marr-ian research program is to find explanations, to reach an understanding of brain-behavior relations. This requires a detailed specification of the computational problem (i.e., behavior) to uncover the algorithms. The correlational approach of cognitive neuroscience, and even the causal-mechanistic circuit manipulations of optogenetic neuroscience, just don't cut it anymore.


            1 Although neither speaker explicitly defined the term, it is most definitely not the engram as envisioned by Scientology: “a detailed mental image or memory of a traumatic event from the past that occurred when an individual was partially or fully unconscious.” The term was coined by Richard Semon in 1904.

            2 This paper (by Johansson et al., 2014) appeared in PNAS, and Gallistel was the prearranged editor.

            3 For instance, here's Mu-ming Poo: “There is now general consensus that persistent modification of the synaptic strength via LTP and LTD of pre-existing connections represents a primary mechanism for the formation of memory engrams.”

            4 If you don't understand all this, you're not alone. From Machine Learning: the Basics.
            This idea of minimizing some function (in this case, the sum of squared residuals) is a building block of supervised learning algorithms, and in the field of machine learning this function - whatever it may be for the algorithm in question - is referred to as the cost function. 
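As a minimal sketch of that definition in practice (my own toy, not from the quoted source), here is a straight line y ≈ w·x + b fit by gradient descent on the sum of squared residuals, which serves as the cost function. The data points, learning rate, and number of steps are invented assumptions.

```python
def cost(w, b, xs, ys):
    """Cost function: sum of squared residuals for the line y = w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))


def gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Minimize the cost by repeatedly stepping against its gradient."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
        # Partial derivatives of the cost with respect to w and b.
        grad_w = -2 * sum(r * x for r, x in zip(residuals, xs))
        grad_b = -2 * sum(residuals)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated exactly by y = 2x + 1
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3), round(cost(w, b, xs, ys), 6))
```

Swap in a different cost function and the same minimization machinery learns something different, which is why Krakauer stresses getting the cost function right.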

            Reading List

            Everyone is Wrong

            Here's Why Most Neuroscientists Are Wrong About the Brain. Gallistel in Nautilus, Oct. 2015.

            Time to rethink the neural mechanisms of learning and memory. Gallistel CR, Balsam PD. Neurobiol Learn Mem. 2014 Feb;108:136-44.

            Engrams are Cool

            What is memory? The present state of the engram. Poo MM, Pignatelli M, Ryan TJ, Tonegawa S, Bonhoeffer T, Martin KC, Rudenko A, Tsai LH, Tsien RW, Fishell G, Mullins C, Gonçalves JT, Shtrahman M, Johnston ST, Gage FH, Dan Y, Long J, Buzsáki G, Stevens C. BMC Biol. 2016 May 19;14:40.

            Engram cells retain memory under retrograde amnesia. Ryan TJ, Roy DS, Pignatelli M, Arons A, Tonegawa S. Science. 2015 May 29;348(6238):1007-13.

            Engrams are Overrated

            For good measure, some contrarian thoughts floating around Twitter...

            “Can We Localize Merge in the Brain? Yes We Can”

            Merge in the Human Brain: A Sub-Region Based Functional Investigation in the Left Pars Opercularis. Zaccarella E, Friederici AD. Front Psychol. 2015 Nov 27;6:1818.

            The neurobiological nature of syntactic hierarchies. Zaccarella E, Friederici AD. Neurosci Biobehav Rev. 2016 Jul 29. doi: 10.1016/j.neubiorev.2016.07.038.


            Asyntactic comprehension, working memory, and acute ischemia in Broca's area versus angular gyrus. Newhart M, Trupe LA, Gomez Y, Cloutman L, Molitoris JJ, Davis C, Leigh R, Gottesman RF, Race D, Hillis AE. Cortex. 2012 Nov-Dec;48(10):1288-97.

            Patients with acute strokes in left BA 44 (part of Broca's area) do not have impaired syntax.

            Dynamics of Mental Representations

            Characterizing the dynamics of mental representations: the temporal generalization method. King JR, Dehaene S. Trends Cogn Sci. 2014 Apr;18(4):203-10.

            Brain Mechanisms Underlying the Brief Maintenance of Seen and Unseen Sensory Information. King JR, Pescetelli N, Dehaene S. Neuron. 2016;92(5):1122-1134.

            A Spate of New Network Articles by Bassett

            A Network Neuroscience of Human Learning: Potential to Inform Quantitative Theories of Brain and Behavior. Bassett DS, Mattar MG. Trends Cogn Sci. 2017 Apr;21(4):250-264.

            This one is most relevant to Dr. Bassett's talk, as it shares the title of her talk.

            Network neuroscience. Bassett DS, Sporns O. Nat Neurosci. 2017 Feb 23;20(3):353-364.

            Emerging Frontiers of Neuroengineering: A Network Science of Brain Connectivity. Bassett DS, Khambhati AN, Grafton ST. Annu Rev Biomed Eng. 2017 Mar 27. doi: 10.1146/annurev-bioeng-071516-044511.

            Modelling And Interpreting Network Dynamics [bioRxiv preprint]. Khambhati AN, Sizemore AE, Betzel RF, Bassett DS. doi: https://doi.org/10.1101/124016

            Behavior is Underrated

            Neuroscience Needs Behavior: Correcting a Reductionist Bias. Krakauer JW, Ghazanfar AA, Gomez-Marin A, MacIver MA, Poeppel D. Neuron. 2017 Feb 8;93(3):480-490.

            The first author was a presenter and the last author an organizer of the symposium.

            Thanks to @jakublimanowski for the tip on Goldstein (1999).