Machine Learning Classifications for Botanical Collections

June 2019 to August 2019

The Field Museum in Chicago, IL

Summary

In this 10-week internship, my main project was implementing a convolutional neural network to distinguish between herbarium images of two morphologically similar genera of lycophytes (clubmosses and spikemosses): Lycopodium and Selaginella. I worked under Dr. Matt von Konrat, the Head of Botanical Collections, who specializes in the study of bryophytes (non-vascular land plants such as mosses and liverworts). Before this internship, I had no experience with machine learning and had to pick up much of it on the spot. The project builds on a Smithsonian paper whose very similar model reached over 95% accuracy.

The first step was to try to recreate that model, which was harder than it sounds for a few reasons. First, the Smithsonian model was built in Mathematica, which we did not have access to. I used Keras and TensorFlow in Python instead, and the features of the Mathematica model didn't always have direct equivalents in Keras. For example, Mathematica applies regularization to the whole model at once, while in Keras we had to pick one of three types of regularizers and choose which layers to apply them to. Second, our images were different, since they came from the Field Museum's online herbarium. Third, my computer (an Intel Core i5 with 8 GB of RAM) limited the resolution and number of images I could load into the model at once.
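To make the regularization difference concrete, here is a minimal tf.keras sketch of a small binary CNN (for example, Lycopodium vs. Selaginella) in which each regularizer is attached to a specific layer. The layer sizes, the 128 x 128 input, the label encoding, and the regularization strengths are illustrative assumptions, not the model from this project. Keras provides three built-in weight regularizers (L1, L2, and the combined L1L2); a layer left without one is simply not penalized.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_classifier(input_shape=(128, 128, 3)):
    """Small binary CNN; all sizes and strengths here are hypothetical."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        # L2 (weight decay) penalty applied to this conv layer's kernel only
        layers.Conv2D(16, 3, activation="relu",
                      kernel_regularizer=regularizers.L2(1e-4)),
        layers.MaxPooling2D(),
        # No regularizer here: Keras adds no penalty unless one is specified
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        # Combined L1 + L2 penalty on the dense layer's kernel
        layers.Dense(64, activation="relu",
                     kernel_regularizer=regularizers.L1L2(l1=1e-5, l2=1e-4)),
        layers.Dropout(0.5),
        # One sigmoid unit: probability of class 1 (hypothetically Selaginella)
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_classifier()
# One penalty term per regularized kernel (two layers were regularized above)
print(len(model.losses))
```

Each regularized layer contributes one penalty term to model.losses, which Keras adds to the binary cross-entropy loss during training; moving or removing a kernel_regularizer argument is how you control which layers are penalized.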

Additionally, Dr. von Konrat is researching a new species of Frullania, a genus of microplants whose distinguishing details can only be seen under a microscope. The new species he discovered had previously been classified together with another species. To gather further evidence that the two are distinct, we wanted to apply the machine learning model to images of both species (Frullania coastal and Frullania rostrata). If the model can learn to tell the two apart, that is further evidence for his case that they are separate species.

By the end of my 10 weeks, we reached an average accuracy of around 88% on the Smithsonian "replication" and 70% on the Frullania images. The project is now being continued by Beth McDonald, a computer science Master's student at Northeastern Illinois University, who is mentored by computer science professor Dr. Francisco Iacobelli.

Main Takeaways

From this internship, I first learned many of the technical details of machine learning and of implementing my own model. This exposure also made me curious: how do you know which layers, in which order and with which parameters, will make a successful model? There must be a better way than trial and error, and I'd love to explore that! I also learned that machine learning is essentially large-scale math, and this project made me want to explore that aspect further. Additionally, I learned a lot about myself, including my work habits and how I learn on my own. I didn't have much guidance from a machine learning expert, so I relied on various websites and blog posts and adapted the information to my own project. Lastly, this experience confirmed that I would love a role in a cross-disciplinary field, applying computer science to a new industry or field! It's exciting to see something I love benefit others, and knowing the broader impact of my work motivates me.

Photo Gallery

Where I got to work every day!

Group lunch out to Chinatown!

I don't drink coffee, but I love passing by Hero Coffee

Kamila (L) and Suey Yee (R) are new friends from Malaysia!

Part of our crew at the end of the summer!