
Introduction

Early in my career, I encountered this question multiple times from people studying machine learning.

It seems that although it’s more or less intuitive how information propagates forward through a neural network, it’s less obvious what happens if you leave out nonlinearities.

So, this is my first attempt to contribute to the collective knowledge with my take on the explanation. I don’t expect it to be better than all the others already available on the internet, but maybe one additional example helps you connect the dots and makes everything click.

It's worth noting that this article doesn’t discuss learning at all. It only concerns what functions a neural network can theoretically represent (e.g. if we set the weights manually). But if the network cannot represent a function, it cannot learn it either.
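To preview the core point in code: without nonlinearities between layers, stacking layers collapses into a single linear map, so the set of representable functions does not grow with depth. A minimal NumPy sketch (layer sizes and random weights chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with no activation function in between: y = W2 @ (W1 @ x)
W1 = rng.standard_normal((4, 3))  # first layer: 3 inputs -> 4 hidden units
W2 = rng.standard_normal((2, 4))  # second layer: 4 hidden units -> 2 outputs
x = rng.standard_normal(3)

two_layer = W2 @ (W1 @ x)

# The exact same function, expressed as a single linear layer W = W2 @ W1
one_layer = (W2 @ W1) @ x

# Matrix multiplication is associative, so the outputs match.
print(np.allclose(two_layer, one_layer))  # True
```

In other words, manually setting the weights of a purely linear two-layer network can never produce anything a single linear layer couldn't.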

Even the best introductory course on machine learning in 2021, “Machine Learning” by Andrew Ng, addresses this topic, but that part has always felt a bit shaky to me.

Lecture 8.1 - Neural Networks Representation | Non Linear Hypotheses - [Andrew Ng]

To recap the relevant part of the video:

[Slides from Machine Learning by Andrew Ng]