Scientific theories are what make the world comprehensible, at least for most of us. But then we heard a rumor that there is a new game in town: machine learning. Together with its sibling, big data, it threatened to drive scientific theories out of town. Machine learning, and especially deep learning, has already become a magic box for building ever more accurate predictive models. Using it, one can make predictions based on patterns found in previous observations. Traditionally, making predictions was a complicated business, involving, amongst other things, developing underlying theories for understanding how things work. But now, it seemed, you could throw enough data at a large enough neural network and have predictions come out the other side. So why bother with theories at all?
The rumor dissipated soon enough, because it was based on the false premise that the goal of science is to churn out predictions. It is not. The goal of science is to provide understanding. Understanding comes from explanations, and explanations are provided by theories. The whole edifice of modern science stands on the shoulders of a web of interconnected theories.
The rumor might have died, but its ghost continues to haunt us. Old-school theorists tend to regard this new wave of empiricism as an attack on their profession by the plebs. And many freshly minted data experts, coming from the less analytical lands of our newly democratized landscape, often seem to conflate theory with preconceived bias.
For me, personally, this rather sorry state of affairs is … somewhat awkward. I started out as a theoretical physicist. Theories are what help me make sense of the world. Yet I now make my living by tinkering with machine learning algorithms. I can appreciate, first hand, the power of these algorithms. Yes, machine learning is a tool, but it is a tool like no other. It fundamentally alters our relationship with information. One way or the other, our conception of what constitutes an understanding of reality will be shaped by the role that machine learning plays in science.
If rationalism is to survive this deluge of empiricism, then theorists need to find a way to incorporate machine learning meaningfully into their world. Not as a foreign clerk dealing with the mindless drudgery of mining through data, but as a full citizen and guide to the art of building scientific theories.
It is not such a strange wish. After all, most of the important advancements in how we store, process or convey information, be they new mathematical techniques or electronic computers, have found their use in the development of scientific theories. There is no reason why machine learning should remain the surly exception. The question is, how?
The template that we use for building theories is derived largely from physics. A theory is essentially a set of rules that can be used to derive predictive models of different aspects of phenomena. The explanatory power of theories comes from their ability to provide holistic pictures of aspects of reality, i.e. in being able to show that disparate phenomena emerge from a small set of simple rules. For example, the same rules of statistical mechanics can be used to calculate the thermodynamic properties (such as temperature, pressure, density) of any substance in equilibrium.
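To make this concrete, here is a textbook sketch (not spelled out in the article itself): in the canonical ensemble of statistical mechanics, a single small set of rules, built around the partition function, yields the thermodynamic properties of any substance in equilibrium,

$$ Z = \sum_i e^{-E_i / k_B T}, \qquad F = -k_B T \ln Z, $$
$$ P = -\left(\frac{\partial F}{\partial V}\right)_T, \qquad S = -\left(\frac{\partial F}{\partial T}\right)_V. $$

Only the energy levels $E_i$ change from one substance to another; the rules for turning them into free energy, pressure and entropy do not.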
Historically, our belief in being able to explain the universe on the basis of such theoretical frameworks has been motivated largely by the spectacular successes of physics. However, thanks to insights provided by the seminal work by Kenneth Wilson and others in the last quarter of the previous century, this belief now stands on a healthy foundation of understanding.
Consider a hierarchy of rulesets, with the initial (bottom-level) ruleset representing the mathematical structure of a theory and the final (top-level) one representing the mathematical structure of the observed stable correlations in data. One can now think of a transformation such that the ruleset at each level is obtained by applying this transformation to the ruleset at the previous level. This process of deriving higher-level rulesets from lower-level ones is called the renormalization group flow (I am using the term very loosely).
For certain kinds of transformations and rulesets, something quite remarkable and unexpected happens: starting from very different initial rulesets, you end up with the same final ruleset. The final ruleset in this case is called a fixed point, and the group of initial rulesets that lead to the same fixed point is said to constitute a universality class. The hypothesis of universality (or simply universality, for brevity) states that the rulesets and transformations actually found in nature are of this kind. (See here for an introduction to universality and the renormalization group.)
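As a toy illustration of rulesets flowing to a common fixed point (my own sketch, not from the article), take the one-dimensional Ising chain: summing out every second spin renormalizes the coupling K to K' = (1/2) ln cosh(2K), and very different initial couplings, i.e. very different initial rulesets, all flow to the same fixed point K* = 0.

```python
import math

def decimate(K):
    # One coarse-graining step for the 1D Ising chain:
    # summing out every second spin maps the coupling K to
    # K' = 0.5 * ln(cosh(2K)).
    return 0.5 * math.log(math.cosh(2.0 * K))

# Three very different "initial rulesets" (couplings), chosen arbitrarily.
for K0 in (0.3, 1.0, 2.5):
    K = K0
    for _ in range(12):
        K = decimate(K)
    print(f"K0 = {K0:3.1f}  ->  K after 12 steps = {K:.6f}")
```

All three couplings end up, numerically, at the same fixed point, which is what it means for them to sit in the same universality class; richer models need more couplings and cleverer transformations, but the picture is the same.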
If universality holds, then the observed stable correlations in complex systems are independent of the details of the underlying theory, i.e. simple theories may be good enough. In addition, we should see correlations with the same mathematical structure across various unrelated domains.
Universality was first observed and studied in the behavior of the thermodynamic variables of disparate systems near continuous phase transitions. Since then it has been observed in a variety of unrelated places, such as the dynamics of complex networks, multi-agent systems, the occurrence of pink noise and the bus system of a town in Mexico, to name a few (see here for some interesting examples). There is enough empirical evidence to believe that nature (including many man-made entities) really does favor universality.
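One concrete instance of that shared mathematical structure (standard critical-phenomena lore, not something the article works out): near their critical points, a uniaxial ferromagnet and a liquid-gas system obey power laws with the same exponent, because both belong to the three-dimensional Ising universality class,

$$ M \sim (T_c - T)^{\beta}, \qquad \rho_{\text{liquid}} - \rho_{\text{gas}} \sim (T_c - T)^{\beta}, \qquad \beta \approx 0.33, $$

even though magnets and fluids have completely different microscopic rulesets.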
Although theories belonging to a universality class may have very different origins (with respect to the aspect of reality they are trying to explain) and mathematical details, they share some important mathematical properties which put tight constraints on their mathematical structure. For the universality classes found in physics these properties are usually symmetries, dimensionality and locality. But, in general, they will depend on the specific universality class, and can be determined by carrying out the renormalization group flow of a member of the class.
Universality, by itself, can only partially explain why the theoretical frameworks in physics are so successful. The second part comes from the observation that the hierarchy of rulesets in physical systems corresponds very nicely with our intuition. In physics the hierarchy of rules is the hierarchy of scales or resolution. Intuitively, we expect that big things (macroscopic objects) have rules, and so must small things (microscopic entities). We also know that big things are composed of small things, so the macroscopic patterns should follow from the microscopic theory. And this is exactly what happens in reality. This is the reason why (almost naive) reductionism works so well in most areas of physics.
The final piece in this puzzle has to do with the timeline of technological development. We started off by observing phenomena at the human scale, and only then developed the technology, microscopes and telescopes, to observe phenomena at progressively smaller and larger scales. This timeline corresponds very nicely with the hierarchy of rulesets in physical systems. As a result we could develop a very fruitful feedback between theory and experiment. But, even more importantly, the starting point was crucial: for many physical systems the human scale is the one where universality kicks in. What this meant was that the stable correlations were manifest even with small amounts of data and manual inspection.
To appreciate why the above points are so important, consider a situation where, instead of measurements of thermodynamic properties, we started off with snapshots of all the atoms in a box of gas at different times. How easy would it be to derive thermodynamics or statistical mechanics from this data?
The situation that we currently encounter in fields such as biology, economics or the social sciences is not very different. Unlike physics, in these fields we do not have the luxury of knowing what the hierarchy of rulesets corresponds to in reality. Nor do we know at which stage universality should kick in and where we should expect to see stable correlations.
But what we did not have before, and do have now, is a lot more data and a tool, machine learning, for distilling that data and finding these stable correlations. There is good reason to believe that deep neural networks essentially perform a version of renormalization group flow, and that one of the reasons they are so effective is that, in many situations, the generative processes (rulesets) behind the data are hierarchical. Viewed through the prism of universality, this means that deep neural networks give us access to a renormalization group flow in the universality class containing the correct underlying theory, which can then be used to constrain the mathematical structure of that theory.
Consider a thought experiment where a deep neural network is provided with the snapshots of gas atoms along with the value of some complicated function of the thermodynamic variables, and we train the network to predict that value from the snapshots. Do we expect thermodynamics to emerge in the final layers of the network? Should we be able to constrain the mathematical structure of statistical mechanics from the weights of the network? There is no reason, in principle, to believe otherwise.
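A crude, runnable version of this thought experiment might look like the sketch below. It is my own toy, not the author's setup: scikit-learn's MLPRegressor stands in for the deep network, the box of gas is reduced to velocity snapshots drawn at a hidden temperature, and the "complicated function" is simplified to the temperature itself.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_snapshots, n_particles = 2000, 50

# Hidden thermodynamic variable for each snapshot: the temperature T
# (in units where the velocity variance per component equals T).
T = rng.uniform(0.5, 5.0, size=n_snapshots)

# Each "snapshot" is the raw 2-D velocity of every particle in the box,
# drawn from a Maxwell-Boltzmann distribution at that temperature.
snapshots = rng.normal(0.0, np.sqrt(T)[:, None], size=(n_snapshots, 2 * n_particles))

X_train, X_test, y_train, y_test = train_test_split(snapshots, T, random_state=0)

# A small deep network asked to recover the macroscopic variable
# directly from the microscopic data.
net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=3000, random_state=0)
net.fit(X_train, y_train)
print("R^2 on held-out snapshots:", round(net.score(X_test, y_test), 3))
```

Whether such a network does well is an empirical question; the conjecture in the paragraph above is that, if it does, something like the familiar thermodynamic summaries should be recoverable from its intermediate layers and weights.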
To come back to the question raised earlier: how can machine learning help theoretical science? Machine learning can provide the mathematical scaffolding for scientific theories, to which theorists will then add meaning and the bridge to reality. However, before we can get there we will need to develop a much better understanding of machine learning itself. We will need to understand machine learning algorithms from general principles. In other words, what are the analogs of symmetry, dimensionality and locality in machine learning? Perhaps it is time to start developing a real theory of machine learning.