In recent years, machine learning has achieved strong predictive performance across a broad range of scientific domains. Yet a fundamental limitation has become progressively apparent: the inability of most modern architectures to produce representations that are simultaneously accurate and interpretable. A model that yields low test error but cannot be analytically examined offers little scientific value when the goal is understanding, not merely prediction.
At ActarusLab, our research centres on this epistemological gap. We investigate whether it is possible, starting from noisy and high-dimensional datasets, to recover compact, interpretable mathematical structures that faithfully describe the governing dynamics of the underlying system.
Deep neural networks and ensemble methods have demonstrated remarkable capacity to approximate complex functions. However, the functional form learned by such architectures is encoded implicitly in millions of parameters that admit no direct physical interpretation. In practice this means:
This is not a secondary concern. In domains where interpretation constitutes part of scientific validation — quantitative finance, computational chemistry, fluid dynamics — the inability to explain a model is equivalent to an incomplete proof.
Symbolic regression (SR) differs fundamentally from parametric regression. Rather than fitting the coefficients of a predefined functional form, SR searches the space of mathematical expressions directly, treating both the structure and the parameters of the model as free variables. The objective is to identify an expression of the form:
where f is constructed from primitive operations (addition, multiplication, exponentiation, trigonometric functions, etc.) and where the search is guided by a Pareto-optimal trade-off between predictive accuracy and symbolic complexity. This is sometimes referred to as the Occam's Razor principle instantiated algorithmically.
The output is not a learned weight vector, but an explicit, human-readable mathematical expression that can be inspected, differentiated, integrated, and falsified.
In many scientific applications, empirical data are scarce, noisy, or confounded by measurement artefacts. A robust SR pipeline therefore typically begins with the construction of a high-fidelity simulation environment that maps the relevant state space of the system under study. This serves two purposes:
At ActarusLab we employ simulation frameworks such as QuTiP for quantum dynamics, Monte Carlo samplers for stochastic systems, and custom CFD integrators for continuum problems, generating datasets that can span hundreds of thousands of distinct system states.
As a simplified illustration, a symbolic search applied to a physical dynamical system might recover an expression of the form:
The significance of such a result lies not in its formal complexity, but in what it enables:
It is important to resist the framing of interpretability as an optional property — a desirable but secondary characteristic that can be traded against predictive performance. In scientific contexts, a model that cannot be examined is difficult to validate, and a model that cannot be validated is not science. Interpretability is therefore a hard constraint in our methodology, not a post-hoc consideration.
This implies a deliberate willingness to accept marginally lower predictive performance in exchange for symbolic transparency. The resulting models are verifiable, portable, and compatible with the formal language of scientific publication and regulatory compliance.
Symbolic regression offers a principled route from raw data to mathematical structure. It does not replace scientific reasoning; it automates one of its most laborious components — the enumeration and evaluation of candidate functional forms — allowing researchers to focus on interpretation, validation, and application. In this sense, it provides a genuine bridge between the complexity of data and the clarity of governing equations.