Inferential Mechanics Part 1: Causal Mechanistic Theories of Machine Learning in Chemical Biology with Implications

2026-02-26 · Machine Learning

AI summary

The authors explain that while machine learning is widely used in natural sciences, it often ignores the hidden cause-and-effect relationships in data, treating models like 'black boxes.' In this first paper of a series, they introduce a new framework combining chemistry, biology, and probability to help machine learning better identify underlying mechanisms, calling this ability 'focus.' They demonstrate their ideas using a group of molecules called Akt inhibitors. Future papers will explore chemical similarity and show more evidence of how missing causal links weaken machine learning in chemical biology.

machine learning, causality, chemical biology, Akt inhibitors, black box models, chemical similarity, probability theory, inferential mechanics, reductionism
Authors
Ilya Balabin, Thomas M. Kaiser
Abstract
Machine learning techniques are now routinely encountered in research laboratories across the globe. ML and AI techniques have made impressive progress in processing large data sets, increasing the experimenter's ability to digest data and make novel predictions about phenomena of interest. However, machine learning predictors generated from data sets in the natural sciences are often treated as black boxes and applied broadly, without detailed consideration of the causal structure of the data set of interest. Attempts have been made to bring causality into discussions of machine learning models of natural phenomena; however, a firm and unified theoretical treatment is lacking. This series of three papers explores the union of chemical theory, biological theory, probability theory, and causality that will correct current causal flaws of machine learning in the natural sciences. This paper, Part 1 of the series, provides a formal framework for the foundational causal structure of phenomena in chemical biology and extends it to machine learning through the novel concept of focus, defined here as the ability of a machine learning algorithm to narrow down to a hidden underpinning mechanism in large data sets. An initial proof of these principles on a family of Akt inhibitors is also provided. Part 2 will provide a formal exploration of chemical similarity, and Part 3 will present extensive experimental evidence of how hidden causal structures weaken all machine learning in chemical biology. This series serves to establish for chemical biology a new kind of mathematical framework for modeling mechanisms in Nature without the need for the tools of reductionism: inferential mechanics.