Ph.D. Defence – Jarrett Phillips

Join us on May 5, 2022, at 9:00 am for the Ph.D. defence of Jarrett Phillips. Jarrett will present HACSim, a novel algorithm for estimating the sample size needed to capture haplotype variation. This work gives researchers a new tool for understanding variation within species, with applications to food fraud, sustainable management, and conservation efforts, to name a few.

During the course of his degree, Jarrett has presented at several conferences and published numerous articles. He has also published an R package via CRAN that allows researchers to use the HACSim algorithm for their work. You can find the R package here. You can also explore the algorithm through an R Shiny app here. A list of some of his papers is included below.

  • Phillips JD, Gillis DJ, Hanner RH. 2022. Lack of statistical rigor in DNA barcoding likely invalidates the presence of a true species’ barcode gap. Frontiers in Ecology and Evolution 10.
  • D’Ercole J, Dincă V, Opler PA, Kondla N, Schmidt C, Phillips JD, Robbins R, Burns JM, Miller SE, Grishin N, Zakharov EV, DeWaard JR, Ratnasingham S, Hebert PDN. 2021. A DNA barcode library for the butterflies of North America. PeerJ 9:e11157.
  • Phillips JD, French SH, Hanner RH, Gillis DJ. 2020. HACSim: an R package to estimate intraspecific sample sizes for genetic diversity assessment using haplotype accumulation curves. PeerJ Computer Science 6:e243.
  • Phillips JD, Gillis DJ, Hanner RH. 2019. Incomplete estimates of genetic diversity within species: Implications for DNA barcoding. Ecology and Evolution 9:2996-3010.

The title, abstract, and examination committee structure for his Ph.D. defence are included below. If you are interested in attending, please reach out.

Title: A Novel Statistical Framework for Assessment of Intraspecific Haplotype Sampling Completeness

Examination Committee: Drs. Joe Sawada (Chair), Dan Gillis (School of Computer Science), Bob Hanner (Integrative Biology), Andrew Hamilton-Wright (School of Computer Science), and Karen Kopciuk (University of Calgary)


Abstract: Determining adequate sample sizes for studies of biodiversity conservation and management is a challenging problem that has received some attention in recent years. One area where assessing sampling completeness is of utmost priority is DNA barcoding. Species show remarkable genomic marker variation within and among taxa, along with differing evolutionary and life histories. Thus, knowing how many specimens of a given animal species likely need to be collected to observe the majority of its standing COI haplotype diversity is a complex question to answer. Estimates of specimen sample sizes for DNA barcoding range from a single individual to hundreds of individuals per species (but are typically around 5-10 individuals). However, due to obstacles surrounding project funding and species rarity, often just one or two specimens per species can be reasonably collected. In addition, numerous other factors, especially sequence quality and integrity, hinder the accurate and reliable estimation of specimen sample sizes from existing species-level sequence data found in large DNA repositories.

Here, a deep examination of the genetic specimen sample size problem (GSSSP) is undertaken. Specifically, a novel nonparametric stochastic local search optimization algorithm based on trends in species haplotype accumulation curves, herein called HACSim (Haplotype Accumulation Curve Simulator), is introduced. The method, available as an R package, is tested on a variety of both hypothetical and real animal species mined from the Barcode of Life Data Systems (BOLD). Through a detailed statistical simulation study, the approach is demonstrated to work well across all examined scenarios. As HACSim makes numerous simplifying assumptions that are unlikely to hold well in practice, such as panmixia (random mating), future work on incorporating elements of population structure is imperative.
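For readers unfamiliar with haplotype accumulation curves, the idea can be illustrated with a minimal Python sketch. This is not the HACSim algorithm or its R interface: it only builds a permutation-averaged accumulation curve for a made-up species and reads off the smallest sample size that recovers a chosen proportion of the known haplotypes. The function names (`accumulation_curve`, `specimens_needed`), the 95% threshold, and the haplotype frequencies are all assumptions chosen for illustration.

```python
import random


def accumulation_curve(haplotype_labels, n_permutations=200, seed=0):
    """Mean number of distinct haplotypes observed after 1..N specimens.

    Each permutation draws the specimens in a random order (sampling
    without replacement) and counts how many distinct haplotypes have
    been seen after each draw; the counts are averaged over permutations.
    """
    rng = random.Random(seed)
    n = len(haplotype_labels)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(haplotype_labels)
        rng.shuffle(order)
        seen = set()
        for i, hap in enumerate(order):
            seen.add(hap)
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


def specimens_needed(curve, total_haplotypes, proportion=0.95):
    """Smallest sample size whose mean haplotype count reaches the target."""
    target = proportion * total_haplotypes
    for sample_size, mean_count in enumerate(curve, start=1):
        if mean_count >= target:
            return sample_size
    return None  # target not reached with the available specimens


# Hypothetical species: 100 specimens, 5 haplotypes with skewed frequencies.
specimens = [1] * 60 + [2] * 20 + [3] * 10 + [4] * 5 + [5] * 5
curve = accumulation_curve(specimens)
needed = specimens_needed(curve, total_haplotypes=5, proportion=0.95)
print(f"Mean haplotypes after 10 specimens: {curve[9]:.2f}")
print(f"Specimens needed for 95% of haplotypes: {needed}")
```

The curve rises quickly while common haplotypes are found and flattens as only rare ones remain, which is why skewed haplotype frequencies push the required sample size up.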

In addition, it is argued that DNA barcoding currently lacks the statistical rigor needed to robustly estimate the DNA barcode gap, an important quantity expressing the difference between intraspecific and interspecific genetic variation. A number of accessible statistical solutions revolving around the sample sizes needed for gap assessment, as well as visualization and inference, are offered in this regard.
