ML+X Seminar with Andrew Ellington

3:00 PM – 4:00 PM CT

Machine Learning Approaches to Protein Engineering

Friday, March 26, 2021

ApBio: Teaching Machines Biochemistry

Speaker: Andrew Ellington,
Professor of Molecular Bioscience, College of Natural Sciences, UT Austin

Friday, March 26, 2021
3:00 PM – 4:00 PM CT
Register at https://utexas.qualtrics.com/jfe/form/SV_0IndjUuN8AFKlYa

Abstract

Proteins are often referred to as the workhorse of biology because of their ubiquity and functional diversity. They carry out a myriad of functions ranging from digesting food to providing immunity against harmful viruses. On their most basic level, proteins are unique sequences of amino acids which, under the proper conditions, fold into three dimensional structures capable of performing a specific task. In the Ellington lab, we have the ability to modify a protein's amino acid sequence to generate novel proteins with engineered properties and functions. With this technology, we are able to engineer proteins which can address society and industry’s most pressing problems. To better engineer proteins, we have taken a data-driven approach and built the first convolutional neural network (CNN) model experimentally verified to predict gain-of-function mutations. Our CNN model, which has been made publicly available (https://www.mutcompute.com), is currently in use for our protein engineering efforts and has most notably helped us design a protein capable of fully decomposing one of the most common items in a landfill: a plastic bottle. Currently, we are exploring the impact of different datasets and neural network architectures with the intent of developing models which can optimize protein phenotypes such as thermal stability, substrate specificity, and protein-protein interaction. To facilitate this exploration, we have built ApBio: a data engineering library tailored to the interface between protein crystal structure data and machine learning libraries. With ApBio, we lower the barrier to entry for machine learning engineers to work with protein structural data and provide a platform for rapid CNN model exploration. Ultimately, we wish to utilize data to close the gap between protein structure and protein function.

Speaker Bio

Dr. Andrew Ellington received his B.S. in Biochemistry from Michigan State University in 1981, and his Ph.D. in Biochemistry and Molecular Biology from Harvard in 1988. His post-doctoral work was with Dr. Jack Szostak at Massachusetts General Hospital, where he developed methods for the in vitro selection of functional nucleic acids and coined the term 'aptamer.' He has previously received the Office of Naval Research Young Investigator, Cottrell, and Pew Scholar awards.
Dr. Ellington's holds the Nancy Lee and Perry R. Bass Regents Chair in Molecular Biology and Wilson M. and Kathryn Fraser Research Professorship in Biochemistry. His lab works centers on the development of nucleic acid circuitry for point-of-care diagnostics, on accelerating the evolution of proteins and cells through the introduction of novel chemistries and using orthogonal control systems to engineer complex organisms.