Leo Breiman, in Statistical Modeling: The Two Cultures, argues that there are fundamentally two cultures of statistical modeling: data modeling and algorithmic modeling. Data modeling assumes the data are generated by some stochastic data model, while algorithmic modeling treats the data-generating mechanism as unknown. This distinction in turn highlights two motivations for using linear models: causal inference and prediction.

Causal inference tries to establish causality by determining whether an underlying predictor explains a target. By identifying the data model, we can argue that the target distribution is an instance of it. Thus, we can translate the data model $→$ rules $→$ computer procedures.

Prediction, on the other hand, lets the data guide the process: it becomes unimportant to know exactly which rules work best; we simply use the best empirically found rules as the prediction model. This frame of thinking allows us to model complex data sets for which deducing a data model is extremely hard.
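To make the contrast concrete, here is a minimal sketch (not from Breiman's paper; the synthetic data and the choice of 1-nearest-neighbor as the algorithmic model are my own illustrative assumptions). The data-modeling culture assumes a stochastic model, here $y = a + bx + \varepsilon$, and reads off its parameters; the algorithmic culture assumes nothing about the mechanism and is judged purely by held-out prediction error.

```python
import random

# Synthetic data with a known linear mechanism (known only to us, the simulators).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + random.gauss(0, 0.3) for x in xs]

train_x, train_y = xs[:150], ys[:150]
test_x, test_y = xs[150:], ys[150:]

# --- Culture 1: data modeling. Assume y = a + b*x and estimate a, b by least squares.
n = len(train_x)
mx = sum(train_x) / n
my = sum(train_y) / n
b = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y)) / \
    sum((x - mx) ** 2 for x in train_x)
a = my - b * mx  # interpretable parameters: intercept and slope

# --- Culture 2: algorithmic modeling. No assumed functional form;
# 1-nearest-neighbor just predicts the y of the closest training x.
def knn_predict(x):
    return min(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[1]

def mse(preds, truth):
    return sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)

linear_mse = mse([a + b * x for x in test_x], test_y)
knn_mse = mse([knn_predict(x) for x in test_x], test_y)

print(f"estimated a={a:.2f}, b={b:.2f}")          # inference: recover the parameters
print(f"linear MSE={linear_mse:.3f}, 1-NN MSE={knn_mse:.3f}")  # prediction: compare errors
```

The first culture produces an interpretable claim about the mechanism (the estimated intercept and slope); the second produces only a predictor, evaluated by how well it generalizes to held-out data.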

Put in the context of ML, Mullainathan and Spiess write:

The problem of artificial intelligence has vexed researchers for decades. Even simple tasks such as digit recognition—challenges that we as humans overcome so effortlessly—proved extremely difficult to program. Introspection into how our mind solves these problems failed to translate into procedures. The real breakthrough came once we stopped trying to deduce these rules. Instead, the problem was turned into an inductive one: rather than hand-curating the rules, we simply let the data tell us which rules work best.

In the modern age, where models have billions of parameters and see trillions of tokens, the question to ask is whether we care at all about how the model learns this highly complicated distribution. By shifting the framing from deduction to induction, researchers have been able to make countless breakthroughs in ML.