Summer URA Position

If you’re an undergraduate student at the University of Guelph interested in research, you might want to check out this opportunity. Dr. Jarrett Phillips and I are looking for a student to work with us this summer to explore eDNA and data science!

Job Title: Mining Association Rules for Environmental DNA (eDNA) Spatiotemporal Sampling

Job Posting Number: 121872 via Experience Guelph

Applications are due by: February 29th, 2024

Description of the Project:

eDNA sampling is a non-invasive technique for inventorying biodiversity at ecological sites of interest through the collection of water, sediment, or soil samples. Often, invasive species or species at risk, such as brook trout, are the primary target. eDNA is typically collected using sophisticated commercial backpacks fitted with a vacuum, hose, and filter. This bypasses the need to physically observe and retrieve individual organisms for biometric measurement through invasive methods like electrofishing.

However, the success of the eDNA approach is sensitive to numerous physicochemical factors, including water and air temperature, water pH, dissolved oxygen levels, and water conductivity. It is therefore difficult to know which environmental variables contribute most to high eDNA concentrations in a given area, and which variables should be collected alongside eDNA samples to maximize the probabilities of species occupancy and species detection.

One novel way to address these uncertainties is association rule mining, an unsupervised machine learning task that discovers interesting correlations among discrete variables within large datasets. Specifically, association rules take the form of conditional (“if-then”) statements. Association rules were first applied in market basket analysis in supermarket settings to assess buyer preferences (Agrawal et al., 1993; DOI: 10.1145/170036.170072). Based on buyer behaviour, supermarkets could use information gleaned from association rules to optimally stock shelves or to design promotional materials. Since then, association rules have seen myriad applications, including the assessment of university course enrollments and the study of gene expression patterns from cancer microarrays. In addition to determining which environmental variables should be measured in tandem with the retrieval of environmental samples, this work could shed light on when (time of year) and where (sampling sites) to sample.
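To make the “if-then” form concrete, here is a minimal sketch in plain Python of the three measures commonly used to score an association rule: support, confidence, and lift. It is an illustration only, not the ‘arules’ workflow the project will use. The records, the variable bins (such as `temp=low`), and the function names are all hypothetical and invented for this example.

```python
# Hypothetical, made-up sampling records (NOT real eDNA data).
# Each set holds the discretized conditions observed at one sampling event.
records = [
    {"temp=low", "pH=neutral", "eDNA=high"},
    {"temp=low", "pH=neutral", "eDNA=high"},
    {"temp=low", "pH=acidic", "eDNA=low"},
    {"temp=high", "pH=neutral", "eDNA=low"},
    {"temp=high", "pH=acidic", "eDNA=low"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= record for record in records) / len(records)


def rule_metrics(lhs, rhs, records):
    """Return (support, confidence, lift) for the rule `lhs -> rhs`."""
    supp_lhs = support(lhs, records)
    supp_rhs = support(rhs, records)
    if supp_lhs == 0 or supp_rhs == 0:
        raise ValueError("rule sides must each occur in at least one record")
    supp_both = support(set(lhs) | set(rhs), records)
    confidence = supp_both / supp_lhs   # how often rhs holds when lhs holds
    lift = confidence / supp_rhs        # values above 1 suggest a positive association
    return supp_both, confidence, lift


# Rule: IF temperature is low AND pH is neutral THEN eDNA concentration is high.
supp, conf, lift = rule_metrics({"temp=low", "pH=neutral"}, {"eDNA=high"}, records)
print(f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

On this toy data the rule holds in two of five records (support 0.40), is true every time its condition is met (confidence 1.00), and has a lift of 2.50. Real datasets would first need continuous measurements such as temperature or pH to be binned into discrete categories, since association rules operate on discrete variables.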

The selected applicant will have successfully completed CIS*1910 (Discrete Structures in Computing I) or MATH*2000 (Proof, Sets, and Numbers), and will have a solid understanding of set theory and deductive logic, particularly logical implication, on which association rules are based. Strong performance in CIS*4020 (Data Science) or similar courses will also be beneficial. In this role, the ‘arules’ R package will be used to develop plausible association rules for key eDNA sampling datasets, such as the one based on Nolan et al. (2023) (DOI: 10.1007/s13412-022-00800-x). The successful applicant should therefore have previous experience with, and a strong interest in, R as it applies to the data science workflow. Previous experience with statistics and machine learning, particularly probability and correlation analysis, will be considered a strong asset. Past work with eDNA is not required. Subject to strong performance from the student, and if time and interest permit, a supervised associative classifier for targeted species detection could be developed using the ‘arulesCBA’ R package to make predictions for species of concern at as-yet-unsampled sites.
