Deeper Neural Networks Lead To Simpler Embeddings

Andre Ye
March 23, 2021 AI & Machine Learning

A surprising explanation for generalization in neural networks

Recent research has increasingly investigated how neural networks, over-parametrized as they are, manage to generalize at all. According to traditional statistics, the more parameters a model has, the more it should overfit. This notion is directly contradicted by a fundamental axiom of deep learning:

Increased parametrization improves generalization.

Although it may not be explicitly stated anywhere, it’s the intuition behind why researchers continue to push models larger to make them more powerful.

There have been many efforts to explain exactly why this is so. Most are quite interesting: the recently proposed Lottery Ticket Hypothesis frames a large network as a giant lottery that happens to contain a winning subnetwork, and another paper argues through theoretical proofs that the phenomenon is built into the nature of deep learning.

Perhaps one of the most intriguing, though, is the proposal that deeper neural networks lead to simpler embeddings. This is also known as the “simplicity bias”: neural network parameters are biased towards simpler mappings.

Minyoung Huh et al. propose in a recent paper, “The Low-Rank Simplicity Bias in Deep Networks”, that depth increases the proportion of simpler solutions in the parameter space. This makes neural networks more likely, simply by chance, to find simple solutions rather than complex ones.

On terminology: the authors measure the “simplicity” of a matrix by its rank, roughly, the number of linearly independent rows or columns it contains. A higher rank can be considered more complex, since the matrix’s parts are largely independent of one another and thus carry more “independent information”; a lower rank can be considered simpler.
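To make “rank” concrete, here is a minimal NumPy sketch (the paper itself works with a smoother “effective rank” computed from singular values, not the exact algebraic rank shown here): a random matrix is almost surely full rank, while an outer product of two vectors contains only one independent direction and has rank 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random 8x8 matrix: its rows are (almost surely) linearly independent.
full = rng.standard_normal((8, 8))
print(np.linalg.matrix_rank(full))    # 8 -> "complex": lots of independent information

# An outer product u v^T: every row is a multiple of v, so only one
# independent direction of information remains.
u, v = rng.standard_normal(8), rng.standard_normal(8)
simple = np.outer(u, v)
print(np.linalg.matrix_rank(simple))  # 1 -> "simple"
```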

Huh et al. begin by analyzing the rank of linear networks, that is, networks without any nonlinearities such as activation functions.

The authors trained several linear networks of different depths on the MNIST dataset. For each network, they randomly drew 128 sets of weights and computed the associated kernels, plotting their ranks. As the depth increases, the rank of the resulting kernels decreases.
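A minimal sketch in the spirit of this experiment, under assumptions the paper may not share exactly (random Gaussian weights, no data, and “effective rank” defined as the exponential of the entropy of the normalized singular values), shows the effective rank of a linear network’s end-to-end map shrinking as layers are stacked:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

def effective_rank(m):
    """Exponential of the Shannon entropy of the normalized singular values."""
    s = np.linalg.svd(m, compute_uv=False)
    p = s / s.sum()
    return np.exp(-np.sum(p * np.log(p + 1e-12)))

for depth in [1, 2, 4, 8, 16]:
    # End-to-end map of a linear network: the product of its weight matrices.
    end_to_end = np.eye(dim)
    for _ in range(depth):
        w = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        end_to_end = w @ end_to_end
    print(depth, round(effective_rank(end_to_end), 1))

# The exact rank stays at 64 here; it is the effective rank, which reflects how
# evenly the singular values are spread, that collapses as depth grows.
```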

[Figure: kernel rank versus depth for linear networks. Source: Huh et al.]

This can be derived from the fact that the rank of a product of two matrices is at most the smaller of the two factors’ ranks. More intuitively: each matrix contains its own independent information, but when the matrices are multiplied together, the information in one can only get muddled and entangled with the information in the other.

rank(AB) ≤ min(rank(A), rank(B))
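The inequality is easy to verify numerically; in this small sketch, multiplying by a rank-2 matrix caps the rank of the product at 2:

```python
import numpy as np

rng = np.random.default_rng(1)

A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 6))  # rank 2 by construction
B = rng.standard_normal((6, 6))                                # full rank (rank 6)

print(np.linalg.matrix_rank(A))      # 2
print(np.linalg.matrix_rank(B))      # 6
print(np.linalg.matrix_rank(A @ B))  # 2: bounded by min(rank(A), rank(B))
```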

What is more interesting, though, is that the same applies to nonlinear networks. When nonlinear activation functions like tanh or ReLU are applied, the pattern is repeated: higher depth, lower rank.
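A rough way to see the nonlinear version of the effect (a sketch under assumptions the paper does not necessarily use: a randomly initialized tanh network with 1/sqrt(fan-in) Gaussian weights, and effective rank measured on the hidden embeddings of a random input batch) is to push the same batch through deeper and deeper networks and watch the effective rank of the embeddings fall:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, batch = 64, 256

def effective_rank(m):
    s = np.linalg.svd(m, compute_uv=False)
    p = s / s.sum()
    return np.exp(-np.sum(p * np.log(p + 1e-12)))

x = rng.standard_normal((batch, dim))  # a fixed batch of random inputs
h = x
for layer in range(1, 17):
    w = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # 1/sqrt(fan_in) init
    h = np.tanh(h @ w)                                  # one nonlinear layer
    if layer in (1, 2, 4, 8, 16):
        print(layer, round(effective_rank(h), 1))

# With this setup, deeper layers typically yield a lower effective rank of the
# embedding matrix, mirroring the trend reported in the paper.
```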

[Figure: effective rank of kernels. Source: Huh et al.]

The authors also performed hierarchical clustering of the kernels at different network depths for the two nonlinearities (ReLU and tanh). As the depth increases, the emergence of block structure in the clustered kernels reveals decreasing rank: the “independent information” each kernel carries decreases with depth.
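As an illustrative sketch rather than the paper’s exact procedure, one can build a Gram (kernel) matrix from hypothetical low-rank embeddings and let seaborn’s clustermap hierarchically reorder its rows and columns; the reordered kernel tends to show the kind of coarse block structure described above.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical low-rank embeddings: 100 samples lying near a 3-dimensional subspace.
embeddings = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 64))
kernel = embeddings @ embeddings.T  # Gram (kernel) matrix between samples

# clustermap hierarchically clusters and reorders rows and columns; low-rank
# kernels tend to show coarse, blocky structure after reordering.
sns.clustermap(kernel, cmap="viridis")
plt.show()
```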

[Figure: hierarchically clustered kernels at different depths. Source: Huh et al.]

Hence, although it may seem odd, over-parametrizing a network acts as implicit regularization. This is especially true of purely linear layers; a model’s generalization can thus be improved simply by adding extra linear transformations, i.e., linear over-parametrization.

In fact, the authors find that on the CIFAR-10 and CIFAR-100 datasets, linearly expanding the network increases accuracy by 2.2% and 6.5%, respectively, over a simple baseline CNN. On ImageNet, linear over-parametrization increases AlexNet’s accuracy by 1.8%, ResNet-10’s by 0.9%, and ResNet-18’s by 0.4%.

This linear over-parametrization, which adds no real learning capacity to the network, only more linear transformations, even outperforms explicit regularizers such as penalty terms. Moreover, this implicit regularization does not change the objective being minimized.
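In PyTorch terms, a hypothetical minimal version of this idea (a sketch only; the paper’s expansions target convolutional architectures and the exact construction differs) replaces a single linear layer with a chain of linear layers with no activation in between, changing the training dynamics without changing the set of functions the layer can express:

```python
import torch.nn as nn

# Original layer: a single linear map from 512 to 10 dimensions.
head = nn.Linear(512, 10)

# Linearly over-parametrized version: three linear maps composed with NO
# nonlinearities in between. It represents exactly the same set of functions
# (their product is still one 512x10 matrix), but gradient descent now
# behaves differently on the factored parametrization.
expanded_head = nn.Sequential(
    nn.Linear(512, 512, bias=False),
    nn.Linear(512, 512, bias=False),
    nn.Linear(512, 10),
)
```

Because the composition is still linear, the stacked matrices can be multiplied back into a single 512×10 matrix after training, so the extra parameters add no cost at inference time.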

Perhaps what is most satisfying about this contribution is that it agrees with Occam’s razor, a principle many papers have questioned or outright dismissed as irrelevant to deep learning.

The simplest solution is usually the right one.
– Occam’s razor

Indeed, much of the prior discourse has taken more parameters to mean more complex, leading many to assert that Occam’s razor, while relevant in the regime of classical statistics, does not apply to over-parametrized models.

This paper’s fascinating contribution argues instead that simpler solutions are in fact better, and that the most successful, highly parametrized neural networks arrive at those simpler solutions because of, not despite, their over-parametrization.

Still, as the authors term it, this is a “conjecture” — it still needs refinement and further investigation. But it’s a well-founded, and certainly interesting, one at that — perhaps with the potential to shift the conversation on how we think about generalization in deep learning.
