Austin Clyde

Previously: Assistant Computational Scientist
Data Science and Learning Division
Argonne National Laboratory

Email:  aclyde@anl.gov

Expertise: AI for science, large language models, social impact of AI & science and technology studies (STS) »

 

Austin Clyde is an Assistant Computational Scientist in the Data Science and Learning Division of Argonne National Laboratory. He holds a Ph.D. in computer science from the University of Chicago, where he continues to lecture in the Pozen Family Center for Human Rights, teaching courses on international human rights law and artificial intelligence. He is an expert in developing and applying artificial intelligence techniques to scientific problems, particularly in high-performance computing, drug design, and large language models. His current research explores interpretable AI-for-science systems and the role of interpretation in algorithmic decision systems. The Association for Computing Machinery (ACM) has twice awarded his work the Gordon Bell Special Prize, in 2020 and 2022. His involvement with the National Virtual Biotechnology Laboratory's COVID-19 Response received a Department of Energy Secretary's Honor Award. Before his current appointment, he was a visiting research fellow at the Harvard Kennedy School's Program on Science, Technology, and Society.

His primary research interests include AI and HPC for science, particularly the application of LLMs and surrogate models to drug discovery and biological design, and science and technology studies (STS), particularly understanding the relationship between rights, democracy, and AI.
I am currently on the academic job market! Feel free to check out my application materials below:

Biography » CV » Research statement » Teaching statement » DEI statement »


Research

    My research interest is developing interpretable AI-for-science systems and analyzing the role of interpretation and power in algorithmic decision systems. My research is both quantitative and qualitative, spanning AI, drug design, synthetic biology, and science and technology studies. I am interested in how computer scientists can develop more flexible AI and HPC systems for doing science. For example, in my dissertation work on AI for drug discovery, I argue that current AI approaches to structure-based drug design fail to align with scientific theories or to be explainable in their terms. This is a problem because interdisciplinary collaboration is key to drug discovery, and the misalignment leaves much opportunity on the table. In response, I developed novel workflows that overcome the scaling limitations of current approaches and are interpretable because they embed and embody existing heuristics and theoretical concepts in medicinal chemistry. In a sense, this scientific work is an exercise in translation between disciplines.

    Having witnessed how practicing interpretive flexibility required a deep understanding of theoretical practices in both AI and drug design, my research aims to take this on as method. I am interested in how normative and social analytics such as explainability, calibration, and scrutability intersect with other sociotechnical systems such as law, democratic theory, and human rights. To that end, I aim to develop novel techniques that advance our understanding of explainability through sample-based explainability, extend our capabilities for calibrating models to data, and transform the underlying glue between science and AI research's goals: large language models and surrogate or edge models.

    Explainability

    Explainability is essential to the current regulatory landscape of AI, to human-computer interaction, and to accelerating scientific discovery with AI and HPC. However, recent explainability research focuses on models as decision makers: explainability is theorized as the relationship between the features of a sample and its prediction, asking how models can explain themselves when inferring. While these methods help ascertain a model's 'mental schema,' there is another sense of explainability that asks what explains the model acting this way rather than another, raising questions about the data science process itself. Sample-based explainability understands data science as a causal process that captures and quantifies data in some way, selects certain data over others for training, and uses an algorithm to construct a computational model that produces inferences. In this research area, I ask how interventions in the data science process, and the choices made among alternatives, lead to one particular model with particular behaviors rather than another. Beyond its impact on regulation, discussed later in the context of democracy, this method of explainability can provide quantitative answers to essential questions in active learning and scientific practice: what is the benefit of adding more samples with specific characteristics, what is the impact of experimental noise on specific predictions, how do certain data points provide evidence for particular predictions, and how could new instrumentation or experiments improve model performance? This can be seen in the context of autonomous discovery initiatives, for example, where active learning and explainability are interrelated. This line of work will focus on cancer drug-response prediction, where obtaining new experimental data from animal models or patients is expensive, data are extremely noisy proxies for real patient responses, and explainability is imperative for clinical translation.
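
    To make the flavor of these questions concrete, the sketch below estimates how much each training point contributes to a single test prediction by brute-force leave-one-out retraining. It is a minimal sketch under assumed synthetic data; the dataset, the Ridge surrogate, and the feature dimensions are hypothetical stand-ins rather than the machinery used in my cancer drug-response work, where influence functions or data Shapley values would be needed at realistic scales.

        # A toy illustration of sample-based explainability: estimate how much each
        # training point contributes to a single test prediction by brute-force
        # leave-one-out retraining. (Synthetic data and a Ridge model are assumptions
        # made for the example; they only show the underlying question being asked.)
        import numpy as np
        from sklearn.linear_model import Ridge

        rng = np.random.default_rng(0)

        # Hypothetical stand-in for a drug-response dataset: rows are samples,
        # columns are molecular/drug features, labels carry experimental noise.
        X_train = rng.normal(size=(200, 16))
        y_train = X_train @ rng.normal(size=16) + 0.3 * rng.normal(size=200)
        x_test = rng.normal(size=(1, 16))

        def fit_predict(X, y, x):
            return Ridge(alpha=1.0).fit(X, y).predict(x)[0]

        baseline = fit_predict(X_train, y_train, x_test)

        # Influence of training sample i = change in the test prediction when i is removed.
        influence = np.empty(len(X_train))
        for i in range(len(X_train)):
            keep = np.arange(len(X_train)) != i
            influence[i] = baseline - fit_predict(X_train[keep], y_train[keep], x_test)

        # The most influential samples 'explain' the prediction in terms of the data
        # chosen for training rather than in terms of the input features alone.
        top = np.argsort(-np.abs(influence))[:5]
        print("Most influential training samples:", top, influence[top])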

    Scrutability

    How can models be scrutinized as trustworthy? What counts as suitable due diligence? As LLMs and other media-generating models such as DALL-E continue to surprise operators with their seemingly oracle-like behavior, I fear that those in charge of automated decision systems are not equipped to 'get to know' models and scrutinize their level of groundedness. Even in AI-for-science initiatives such as designing autonomous laboratories, LLMs have immense potential to fill current capability gaps. Yet LLMs' lack of structure or factual grounding presents a significant challenge in science. Future work: I will research the capacity of LLMs to implicitly and explicitly store knowledge graphs, treating LLMs as a kind of novel database technology.
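
    As a small illustration of this direction, the sketch below probes a masked language model for (subject, relation, object) triples in the spirit of LAMA-style probing. The model name and prompt templates are assumptions chosen for the example, and the probe is far from a working knowledge-graph store; it only shows what querying an LLM's implicit knowledge can look like.

        # A minimal sketch of treating a masked language model as an implicit store of
        # (subject, relation, object) triples. The model and templates are illustrative
        # assumptions, not a production knowledge base.
        from transformers import pipeline

        fill = pipeline("fill-mask", model="bert-base-uncased")

        # Each template expresses a knowledge-graph relation with the object masked out.
        probes = {
            ("France", "capital"): "The capital of France is [MASK].",
            ("aspirin", "treats"): "Aspirin is commonly used to treat [MASK].",
        }

        for (subject, relation), template in probes.items():
            predictions = fill(template, top_k=3)
            candidates = [(p["token_str"], round(p["score"], 3)) for p in predictions]
            print(f"({subject}, {relation}, ?) ->", candidates)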

    Calibration

    Currently, few automated calibration techniques exist for large-scale AI projects, and this divide is only deepening as models are deployed at ever-greater scale across diverse industries and science domains. I focus on developing automatic calibration for large-scale, hierarchical AI workflows, both with existing disparate data sources and in cases where new experimental data can be acquired. Significant progress has been made with workflows such as IMPECCABLE; however, techniques to auto-calibrate these workflows based on available data, experiments, and/or field observables are lacking. Future work: I will develop an HPC framework for the automatic calibration of large workflows that leverages information across an entire campaign and its various scales, and that optimally discovers new data collections that efficiently improve the quality of model calibration.
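
    As a small, self-contained stand-in for this idea, the sketch below recalibrates a 'surrogate' classifier trained on plentiful simulated data against a handful of noisier 'experimental' observations using Platt scaling. The dataset, model choices, and noise level are assumptions made for illustration; an actual IMPECCABLE-style campaign would calibrate hierarchies of models across an HPC workflow rather than a single classifier.

        # Toy recalibration of a surrogate model against scarce 'experimental' data.
        # All data here are synthetic; the setup only illustrates post-hoc calibration.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import brier_score_loss
        from sklearn.model_selection import train_test_split

        # Plentiful 'simulated' data plus a small, noisier 'experimental' set.
        X, y = make_classification(n_samples=2500, n_features=20, random_state=0)
        X_sim, X_exp, y_sim, y_exp = train_test_split(X, y, test_size=500, random_state=0)
        flip = np.random.default_rng(0).random(len(y_exp)) < 0.15  # simulated label noise
        y_exp = np.where(flip, 1 - y_exp, y_exp)
        X_cal, X_test, y_cal, y_test = train_test_split(X_exp, y_exp, test_size=0.5, random_state=0)

        # Surrogate model trained only on the cheap simulated data.
        surrogate = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_sim, y_sim)

        # Platt scaling: a 1-D logistic regression maps raw surrogate scores to
        # calibrated probabilities using the small experimental calibration split.
        scores_cal = surrogate.predict_proba(X_cal)[:, 1].reshape(-1, 1)
        calibrator = LogisticRegression().fit(scores_cal, y_cal)

        scores_test = surrogate.predict_proba(X_test)[:, 1].reshape(-1, 1)
        p_raw = scores_test.ravel()
        p_cal = calibrator.predict_proba(scores_test)[:, 1]

        # A lower Brier score on held-out experimental data indicates better calibration.
        print("Brier score (raw surrogate):", round(brier_score_loss(y_test, p_raw), 4))
        print("Brier score (recalibrated): ", round(brier_score_loss(y_test, p_cal), 4))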

    Furthermore, I believe great effort is needed to advance calibration techniques for LLMs. I envision a radically different AI-for-science paradigm than current foundation-model proposals. Small edge and embedded AI systems are increasingly used for their efficiency and ease of use. Traditional scientists are increasingly drawn to these simple models and use them in their work, and these are the models most likely to be deployed on scientific instruments. At the same time, great advances in large foundational language models have drawn renewed excitement around the performance gains in many tasks due to the synergistic injection of more, and more diverse, data.

    Law, Democracy, and Human Rights

    Courts, regulators, and politicians are increasingly relying on computer models and simulations for evidence and reasoning, developing regulation around AI, and grappling with the application of traditional rights in the face of technological artifacts. The ability to engage in communication and collective understanding of policies, to form political opinions, and even to come to share a unified scientific lifeworld is fundamental to democracy and a basic human interest. How can we afford the same transparency, understanding, and fundamental rights fostered through civic engagement in an increasingly technocratic democracy? In conjunction with developing novel methods for increasing our interpretive flexibility with models through sample-based explainability, calibration, and new methods for scrutiny, I aim to continue scholarship on understanding how interpretations of science and computation entail commitments in law, democracy, and rights.

    For example, the two modes of explainability I outline correspond to modes of democratic integration: epistemic and social explainability. Epistemic explainability refers to the traditional X-AI program, where the goal is to relate the decision structure of a particular model to a scientific or natural-language theory. A local explanation might provide a counterfactual ('had expression of gene X increased, the phenotype prediction would have been Y'), while a global explanation might present generalized rules that summarize the model ('gene X, when expressed with gene Y, yields phenotype Z'). Social explainability refers to the ways in which different collectives, regulators, and courts come to understand and interpret the system, both epistemically and as embedded in political, social, and corporate institutions. This distinction is important because social explainability is a precondition for democratic discourse in civil society, and it is being realized in law (note the distinction between GDPR Recital 71's call for epistemic explainability and the European Union's AI Act art. 13 call for users' ability to "interpret" models). Furthermore, the European Union's AI Liability Directive proposes a presumption of causality when fault has been established with an AI system, and, second, a right to access evidence about a system. By focusing on sample explainability, interpretive flexibility is allowed beyond mere feature attribution: questions such as the inclusion of some data over other data can be causally understood, allowing open interpretation of those data in social contexts and quantitatively illustrating the impact of those choices.

    With this charge, I plan to take on the following projects in the short term. First, a review of case law focused on how expertise with respect to models is used, how intentions are understood in models (a question that arose in the recent oral arguments of Merrill v. Milligan), who authorizes what explanations about them, and what might expand the scope of explanation; one answer, of course, will come from technically developing such an expansion as above. Second, I will continue research into the kinds of civic epistemologies exercised when it comes to understanding algorithms. Third, I aim to analyze the parallels between judicial interpretation and the interpretation of algorithms to ascertain what commitments follow from ways of knowing algorithms.

    While many AI ethics programs focus on explainability and the virtues experts should follow in their practice, few research programs treat the idea with STS reflexivity: how do these technologies open new means for citizens to participate in world-making, and how can citizens drive the kinds of technological innovation needed in their local contexts? My research into AI civics is twofold: (1) how do we develop the kinds of public institutions that afford citizens the same access to decision-making that traditional institutions have in a world of AI? And (2) how do we foster, through public education, civic engagement, and university education, new skills for an informed citizenry in a technological world? I will articulate AI as an opportunity for empowerment through epistemic justice, where citizens are able to confirm their suspicions and bring new calculability to what oppression is. My work touches on human rights law and philosophy, for example in considering the right to the progressive realization of equal access to science and technology.


Current Projects More »

    (In progress)

  • Regression Enrichment Surfaces (RES): Python package and new methodology for analyzing deep learning models' performance on virtual screening tasks.
  • RLMM / RLDock: OpenAI Gym environment for virtual structural docking using reinforcement learning. RLDock allows for docking flexible targets and integration with simulation workflows.
  • CANDLE: I worked on the NCI/DOE Pilot-1 cancer dose-response prediction models.

Teaching

  • UChicago Instructor: Introduction to Computer Science II (CMSC 152), Department of Computer Science, University of Chicago (Summer '21)
  • CAAP Lecturer: Chicago Academic Achievement Program Computer Science Course, University of Chicago (Summer '20, '21)
  • MPCS Practicum Instructor: RL for COVID-19 (Spring '20)
  • CMSC 254: TA for Machine Learning in Medicine (Fall '19)