Disclaimer: Deep Learning is giving the best results in pretty much all areas of machine learning right now, and it’s much easier to train a deep network than a shallow one. But it has also been shown that in several cases a shallow network can match a deep one in accuracy and number of parameters, while requiring less computational power. In this post I leave aside the problem of training/learning.
This started with me buying several books and trying to read all of them at the same time. Strangely, they all treated the same topics in their opening chapters (the unconscious, perception, natural selection, human behaviour, neurobiology…), and they all disagreed with each other. None of them talked about layers either, but somehow my conclusion from these books is that we might want to give up layered design in AI.
So today I’m going to talk about layers, and how and why we might want to give them up. The “layers” I’m talking about here are like the ones used in Deep Learning (DL). I’m not going to explain the whole DL thing here, but in short it’s an approach to machine learning where you have several layers of “neurons” transmitting data to each other. Each layer takes data from the previous one and does operations on it to reach a “higher level of abstraction”. For example, imagine that Layer 1 is an image made of colored pixels, and Layer 2 does operations on these pixels to detect the edges of objects in the image. So each neuron holds a value (or a group of values).
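The pixels-to-edges example can be sketched in a few lines of Python. This is a toy 1-D “image” with a made-up edge rule, just to make concrete the idea that one layer is computed purely from the layer below:

```python
# A minimal sketch of the layered pipeline described above, using a 1-D
# "image" for simplicity. Layer 1 holds pixel intensities; Layer 2 computes
# a higher-level abstraction (edge strength) purely from the layer below.

def edge_layer(pixels):
    """Each 'neuron' in Layer 2 reads two neighbours in Layer 1."""
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]

pixels = [0, 0, 0, 9, 9, 9, 0, 0]   # Layer 1: a bright band on a dark background
edges = edge_layer(pixels)          # Layer 2: nonzero where intensity changes
print(edges)                        # [0, 0, 9, 0, 0, 9, 0]
```

Note that the flow is strictly bottom-up: `edge_layer` reads `pixels` but never writes to it, and no edge value depends on another edge value.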
Typically, neurons in the same layer don’t talk to each other (no lateral connections); they only perform operations on data coming from the previous layer and send the result to the next layer (bottom-up connections). What would happen if there were lateral connections is that your layer would stop representing “something calculated from below”. Instead, it would hold a mix of lower-abstraction and current-abstraction data. As if, instead of calculating edges from pixels, you calculated edges from pixels and from other edges. Layers also usually don’t have top-down connections (the equivalent of deciding the color of a pixel based on the result of your edge calculation). These properties are shared by many processing architectures, not only DL. I’m not focusing on DL in particular, but rather trying to find what we might be missing by using layers – and what might be used by real brains.
Example of layering – Feedforward neural network. “Artificial neural network” by en:User:Cburnett – Wikimedia Commons
Layers are good for human designers: you know what level of data is calculated where, or at least you can try to guess. We also talk about the human cortex in terms of layers – but these are very different from DL layers, even from a high-level point of view. Neurons in the human brain have lateral and top-down connections.
DL-like layers are a convenient architecture. They keep the levels of abstraction separated from each other: your original pixel data is not modified by your edge detection method, and your edge detection is not modified by object detection. But… why would you want to keep your original data unmodified in the first place? Because you might want to use it for something else? Say that you’re playing a “find the differences” game on two pictures. You don’t want to modify the model picture while looking for the 1st difference; you want to keep the model intact, find difference 1, then use the model again to find difference 2, etc.
But… if, for example, you could look for all the differences in parallel, you wouldn’t care about modifying the images. And if what is being modified is a “layer” of neurons inside your head, you really shouldn’t care about it being modified; after all, the model image is still there on the table, unmodified.
The assumptions behind layers
Let’s analyse that sentence: “you might want to use it for something else.”
It: the original unmodified data. Or rather, the data that will be transmitted to the next layer. That’s not trivial. How do you decide what data should be transmitted? Should you find edges and then send that to another layer? Or is it OK to find edges and objects in the same place and then send that to a different layer? It all depends on the “something else”.
Something else: If you can calculate everything in a messy bundle of neurons and go directly from perception to action in a single breath, you probably should – especially if no learning is needed. But when you have a whole range of behaviors depending on data from the same sensor (the eyes, for example), you might want to “stop” the processing somewhere, preserve the data from modification, and send the results to several different places. You might send the edge detection results to both a sentence-reading module and a face detection module. In that case you want to keep your edge detection clean and unmodified in order to send the results to the different modules.
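A minimal sketch of this “stop and fan out” idea, with toy stand-ins for the downstream modules (the module names and what they compute are illustrative assumptions, not real vision algorithms):

```python
# One shared stage computes edges once; several independent modules then
# consume copies of the result, so none of them can corrupt the shared data.

def edge_detect(pixels):
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]

def reading_module(edges):
    # toy stand-in: counts edge events, as a sentence reader might count strokes
    return sum(1 for e in edges if e > 0)

def face_module(edges):
    # toy stand-in: total edge "energy"
    return sum(edges)

pixels = [0, 0, 9, 9, 0, 9]
edges = edge_detect(pixels)            # computed once, kept unmodified
strokes = reading_module(list(edges))  # each module works on its own copy
energy = face_module(list(edges))
```

The point of the `list(edges)` copies is exactly the preservation argument above: the shared edge result stays intact no matter what each module does with its input.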
Might: But actually, you don’t always want to do that. If different behaviors use data from the same sensors but rely on different cues, you don’t need to preserve the original data. Just send what your sensor senses to the different modules; each one modifying its own data should not cause any problem. Even if your modules use the same cues but in different ways, sending each one a copy of the data and letting it modify that copy can be OK – especially if your modules need to function fast and in parallel. Let’s say that module 1 needs to do some contrast detection in the middle of your field of vision (for face detection, maybe), while module 2 needs to do contrast detection everywhere in your field of vision (obstacle detection?). If we make the (sometimes true in computers) assumption that contrast detection takes more time for a big field than a small one, it will be faster (though more costly) for module 1 to do its own contrast calculation on partial data instead of waiting for the results calculated in module 2.
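The speed argument can be sketched like this, under the stated assumption that contrast detection cost grows with the size of the field (everything here is a toy model):

```python
# Module 1 only needs contrast in the centre of the visual field, so it
# computes on its own small slice instead of waiting for module 2's
# full-field result. Both modules duplicate some work (more costly overall),
# but neither has to wait for the other.

def contrast(pixels):
    return [abs(b - a) for a, b in zip(pixels, pixels[1:])]

field = [0, 2, 2, 9, 9, 1, 1, 0]
centre = field[2:6]                  # module 1's region of interest

module1_result = contrast(centre)    # small job: finishes first
module2_result = contrast(field)     # full-field job: could run in parallel
```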
Did you know that if the main vision center of your brain is destroyed, you will still be able to unconsciously detect the emotions in human faces… while being blind? You will also be able to avoid obstacles when walking. The parts of your brain for conscious vision, face recognition and obstacle detection are located at different places, and function semi-independently. My hypothesis is that these 3 functions rely on different uses of the same cues and need to run fast, therefore in parallel.
If not layers then what?
I would go for modules – so-called “shallow networks”. A network of shallow networks. And I suspect that this is also what happens in the brain, although that discussion will require a completely different blog post.
First, I think that the division into layers or modules needs to be less arbitrary. Yes, it is easy for human designers to use. But it can also come at a cost in performance. I can see some advantages in using messy shallow networks. First, neurons (data) of the same level of abstraction can directly influence each other. I think that’s great for performing simplifications. If you need to do edge detection, you can just try to inhibit (erase) anything that’s not an edge, right there in the “pixel” layer. You don’t need to send all that non-edge data to the next module – after all, very likely, most of the data is actually not edges. If you instead send all the data to be analyzed (combined, added, subtracted…) in an upper layer, you also need more connections.
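Here is a sketch of that in-place lateral inhibition: neurons at the pixel level suppress anything that is not next to an intensity change, so only edge-like data would need forwarding. The threshold and the inhibition rule are assumptions for illustration:

```python
# Lateral connections within one level: each "neuron" looks at its
# neighbours and zeroes itself out if nothing interesting is happening.

def inhibit_non_edges(pixels, threshold=3):
    out = list(pixels)
    for i in range(len(pixels)):
        left = abs(pixels[i] - pixels[i - 1]) if i > 0 else 0
        right = abs(pixels[i] - pixels[i + 1]) if i < len(pixels) - 1 else 0
        if max(left, right) < threshold:
            out[i] = 0                # inhibited: no edge nearby
    return out

pixels = [5, 5, 5, 9, 9, 5, 5]
print(inhibit_non_edges(pixels))      # [0, 0, 5, 9, 9, 5, 0]
```

Only the values around the intensity step survive; a downstream module would receive far less data (and need far fewer connections) than if every pixel were forwarded.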
Furthermore, it seems justified to calculate edges from other edges too, and not just from pixels. Edges are typically continuous both in space and time: using this knowledge might help you calculate edges faster from results that are already available about both pixels and edges, compared to updating your “edge layer” only after having completely updated your “pixel layer”.
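A hedged sketch of “edges from edges”: an edge estimate is updated not only from the pixels below it but also from its lateral neighbours, exploiting spatial continuity. The weight is an arbitrary assumption:

```python
# A weak or noisy edge estimate is reinforced when its neighbours also
# report edges - using already-available edge values instead of waiting
# for a full bottom-up recomputation from pixels.

def refine_edges(edges):
    refined = []
    for i, e in enumerate(edges):
        left = edges[i - 1] if i > 0 else 0
        right = edges[i + 1] if i < len(edges) - 1 else 0
        refined.append(e + 0.25 * (left + right))  # continuity prior
    return refined

noisy = [0, 4, 0, 0]        # one strong edge, silent neighbours
print(refine_edges(noisy))  # [1.0, 4.0, 1.0, 0.0]
```

The positions next to the strong edge get a small boost, reflecting the prior that edges tend to continue rather than stop abruptly.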
Ideally we should only separate modules when there is a need to do so – not because the human designer has a headache, but because the behavior of the system requires it. If the output of a module is required as-is by functionally different behaviors, then the division is justified.
I would also allow top-down connections between modules. Yes, it means that your module’s output is modified by the next module, which represents a higher level of abstraction. It means that you are “constructing” your low-level input from a higher-level output – like deciding the color of pixels based on the result of edge detection… I think it can be justified: sometimes it is faster and more economical to construct a perception than to calculate it (vast subject…); sometimes accurate calculation is just not possible and construction is necessary. Furthermore, if a constructed perception guides your behavior so as to make it more successful, it will stick around thanks to natural selection. I also think that this happens in your brain (just think of that color illusion where two identical squares look like different colors because of the surrounding context, like shadows).
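A toy sketch of such a top-down connection: the higher-level edge result reaches back down and “constructs” a missing pixel value, instead of the pixel layer being purely sensory. The fill-in rule is an assumption for illustration:

```python
def construct_missing(pixels, edges):
    """pixels may contain None where the sensor gave no data;
    edges[i] is the edge strength between pixel i and pixel i+1."""
    out = list(pixels)
    for i, p in enumerate(out):
        if p is None and i > 0 and edges[i - 1] == 0:
            # no edge to the left => assume the surface continues smoothly,
            # so the higher level "paints in" the neighbouring value
            out[i] = out[i - 1]
    return out

pixels = [4, 4, None, 4, 9, 9]   # one sensor reading is missing
edges = [0, 0, 0, 5, 0]          # the edge layer sees one boundary
print(construct_missing(pixels, edges))  # [4, 4, 4, 4, 9, 9]
```

This is the flavor of perceptual fill-in discussed above: the low-level data ends up partly determined by a higher-level result, which a strictly bottom-up layered architecture forbids.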
Lots of unsubstantiated claims in this blog post! As usual. If I could “substantiate”, I’d write papers! But I really think it’s worth considering: are layers the best thing, and if not, then why? This thought actually came from considerations about whether or not we construct our perceptions – my conclusion was yes, constructed perceptions have many advantages (more on that later… maybe?). But what kind of architecture allows us to construct perceptions? The answer: not layers.