Exploring New Neural Architectures for Adaptation to Complex Worlds

Below you’ll find the link to a draft of my thesis. I’d be happy to have comments, especially on the 2nd part about the Epsilon Network. You don’t really need a sciency background to have a look, and comments can be about anything from errors in the figures to general ideas or grammar mistakes.

Open your favorite PDF reader and use the comment function, or just comment below or on twitter with the page number.

If you give me your name I’ll put you in the acknowledgements. Thanks for your help!




Intraspecific Group Arms Race


[Image by Ciar – own work, public domain, https://commons.wikimedia.org/w/index.php?curid=4251990]

We all know the basics behind Darwinian evolution. Individuals from a species are born with different traits and the fittest members have more offspring, which share some of these “winning traits”. Antelopes become faster, giraffes become taller, etc.

Nevertheless, selection pressures do not always affect a species as a uniform group. In fact, the most interesting examples of competition arguably occur between different groups in the same species. This tells a more complicated story than the monolithic interspecies competition (e.g. prey vs predator), or even than individual fitness against the rest of the species (tall giraffe vs all other giraffes, male vs other males). That is why I call it the Intraspecific Group Arms Race.

Today I would like to talk about how these strange pressures appear and what they entail. As usual, this post is not a summary of a current consensus, but a personal reflection.

The most famous example of intraspecific group competition must be the evolutionary arms race between male and female mallards. Sexual arms races exist in dolphins, flies, beetles and many other species, but ducks are the most popular example for online news (and I just know the following paragraph should increase traffic to my blog tenfold…).

Mallards have complicated sex lives. You see, single males like raping females, and females do not like being raped by males. Therefore both sexes evolved complicated genitals: males have long, corkscrew-shaped penises; females have even more gigantic, corkscrew-shaped vaginas. (Maybe I should write this once more to be sure to attract the internet crowds?)
The race for complex genitals occurs because single males have an interest in fathering at least some ducklings rather than zero, while females have an interest in choosing the best father for their offspring (and not just the most rapey one). Thanks to their convoluted vagina, females can control who fertilizes the eggs: their active cooperation is required for the male to succeed. The sexes both continuously evolve to trump each other’s adaptations. Most birds don’t even have penises, so the evolutionary pressure must definitely be strong.

Arms races occur whenever you have competition. But for an intraspecific group arms race to start, you need to satisfy these four requirements:

  1. You need groups that coexist through a generation
  2. These groups must belong to the same species
  3. They need to compete for a limited physical or abstract resource
  4. The offspring must be able to end up in any of the competing groups at birth OR be able to change groups during their lifetime

A group is defined as an ensemble of individuals who are on the same side of the competition for a resource at some point in time: male/female/asexual, cheater/cheated, adults/kids. In the ducks’ case: 1. The 2 groups are single males vs females. 2. They are all mallards (rule 2 is a corollary of rule 4, but let us still write it explicitly). 3. They compete for a choice: who gets to choose the genetic makeup of the ducklings? The cost of losing for the unpaired male is not passing his genes on. If he can father even 1 duckling, that is the difference between the life and death of his entire lineage! The female, on the other hand, always passes her genome on: for her, the cost is spending energy to raise ducklings that have unwanted genes and might not all survive and reproduce. 4. The ducklings can be born male or female: they can end up on either side of the competition. This last condition prevents one side from “winning”: an ideal genome would give you strong male offspring AND strong female offspring, otherwise you’re sentencing half of your own descendants to losing the game.
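
To see why rule 4 keeps the race balanced, here is a minimal toy sketch (the fitness rule and all parameters are my own invention, not a model from the literature): each genome carries both a male-side and a female-side trait, and because a duckling can be born into either group, a genome is only as good as its weaker side.

```python
import random

# Toy sketch: each genome is a (male_trait, female_trait) pair. Fitness is
# the *weaker* side, because half of your descendants will end up in each
# group. Selection therefore drags both traits upward together instead of
# letting one side of the arms race "win".

def evolve(pop_size=200, generations=300, sigma=0.05):
    pop = [(random.random(), random.random()) for _ in range(pop_size)]
    for _ in range(generations):
        # fitness = the weaker side of the genome
        pop.sort(key=lambda g: min(g), reverse=True)
        parents = pop[: pop_size // 2]            # keep the fitter half
        pop = [(p[0] + random.gauss(0, sigma),    # mutate both sides
                p[1] + random.gauss(0, sigma))
               for p in random.choices(parents, k=pop_size)]
    return pop

final = evolve()
avg_m = sum(g[0] for g in final) / len(final)
avg_f = sum(g[1] for g in final) / len(final)
# both sides escalate together; neither trait can lag far behind the other
```

If instead you scored genomes on only one of the two traits, that trait would run away alone, which is exactly the "one side wins" outcome that rule 4 forbids.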

Groups do not have to be as classical as male vs female.
Take human babies vs human parents. Have you noticed that babies cry a lot (Yes, you have…)? Let’s be honest, a lot of the time it seems that they cry for no specific reason and nothing but time will calm them down.
One theory, not uncontroversial but backed up by interesting data, is that babies cry day and night both to exhaust their parents and to extend the post-birth infertility of their mother (linked to breastfeeding). The end goal is that no sibling should be born too early, so the baby can have the parents’ full attention and grow with all the physical and emotional resources available (like milk, cuddles and peekaboo parties). Human babies and human adults compete for the adults’ energy: the conflict arises from the fact that the baby fares better if the parents don’t make another baby too soon, while the parents’ fitness would be increased by 1. not being exhausted and 2. having as many babies as possible. Babies should evolve to be as noisy and stressful as possible, while parents evolve to discriminate unmotivated crying and ignore it. But the arms race is contained within the species: the baby is supposed to in turn join the “parents” group. An ultra-effective crying baby would turn into a very miserable parent. Again, the tug-o-war is balanced.

This brings us to the critical idea of unbalanced costs. Two competing groups have to mitigate the costs for their opponents, who are also their future offspring, but what about the costs at the scale of the species? There is no baked-in rule preventing an arms race from being costly to the species. Some scientists say that a successful mutation must increase the fitness of the whole species, otherwise the mutation would be weeded out. I think that something much more interesting happens.

Can a mutation be beneficial for an individual and damaging for its species? Yes. Some very aggressive male ducks end up drowning unlucky females. From their point of view, there is no loss: if the female resists so much that she dies, the male probably wouldn’t have managed to fertilize the egg anyway. Remember, for those single males, just one duckling is the difference between life and death of their entire lineage. It is highly beneficial for them to be overly aggressive, and to spread their aggressive genes. At the scale of the species, though, the cost is high. These female ducks could have raised many generations of ducklings. But no individual acts with an entire species in mind: even on the brink of extinction, the aggressive males would likely not be able to “just stop drowning uncooperative females”, just as females would not be able to “just stop being uncooperative and accept unwanted genes”. There is no magical stop switch, no “think about the species before yourself” button: arms races can go wrong.

And yet. Restraints can evolve and enhance stability. I will always remember Justin Werfel’s talk about evolved death. The dominant theory is that death has no evolutionary advantage, and therefore no species will ever evolve to have its members die younger rather than older. After all, if you die, you lose opportunities to pass your genes on.
But during the last 15 years, evidence for the evolution of death has accumulated. More generally, in a space where resources are limited, a population that has self-restraint mechanisms will survive better than a population of reckless individuals. Imagine a group of cows devouring all of the grass at once, not allowing it to grow back. They face an immediate reward as they fatten, but long-term starvation as the grass just dies from exhaustion. Now imagine that one of the cows has a mutation for restraint. Maybe it’s a dwarf cow, or maybe it dies after only a few years. Either way, the grass where this cow and its descendants live will be healthier than elsewhere, and in the long term they will, as a group, survive where the others starve and die.
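
The cow story can be written down as a toy simulation (all the numbers here are invented for illustration): a herd that devours everything collapses its own resource, while a restrained herd persists on the regrowing grass.

```python
# Toy sketch: a herd grazes a patch of regrowing grass. "greed" is how much
# each cow tries to eat per year. Reckless herds strip the patch bare, so
# the grass never regrows and the herd starves; restrained herds leave
# enough standing grass for regrowth and persist indefinitely.

def run(greed, years=50, grass=100.0, cows=10):
    for _ in range(years):
        eaten = min(grass, cows * greed)   # herd eats up to its appetite
        grass -= eaten
        grass = min(100.0, grass * 1.5)    # regrowth, capped at carrying capacity
        if eaten < cows * 1.0:             # below subsistence: a cow starves
            cows = max(0, cows - 1)
    return cows

reckless = run(greed=12.0)    # devours everything at once -> herd dies out
restrained = run(greed=3.0)   # self-restraint -> herd survives on regrowth
```

The key assumption, as in the post, is that the herd's descendants inherit the same patch: the benefit of restraint only accrues to the restrained lineage.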

Restraints are counterintuitive in a classical evolutionist view, because evolution is not supposed to look into the future. If I can have more kids than you right now, I am fitter than you and will supplant you. But this view fails to take into consideration environmental feedback, which can be quite immediate. The fasting cows only survive because they share the same physical space, allow their own offspring to inherit it and protect their turf from outsiders. If the voracious cows could just come up and eat what the fasting cows have saved up, this would not work. The environment must be inherited and protected just as the genes are.

So there is no need for evolution to “look into the future” to favor less costly arms races, and there is no guarantee that the arms race offers any advantage to the species either.

This becomes particularly interesting when you start considering other, more abstract groups satisfying the conditions for an intraspecific group arms race. Take cheaters and vigilantes. The transmission does not even have to be genetic. When you figure out a new way to cheat, you have an immediate advantage. But you also become able to spot people who cheat in the same way as you. So if you teach your offspring or other in-groups how to cheat, they also have an advantage, but they cannot use it against you or their siblings. The more successful your cheating strategy is, the more offspring you transmit it to, and… the less successful it becomes. The greater honeyguide is a good example of cheating dynamics. These birds lay an egg in other birds’ nests. But they have also evolved to notice if another honeyguide has already left an egg there, and destroy that egg. So they have to evolve eggs that look more and more like those of the parasitized species, to avoid being crushed by their fellow cheaters; but they also have to become better and better at spotting suspicious eggs… Here the two groups are simply the 1st honeyguide to lay an egg vs the ones that come after. Whether a bird will belong to one or the other group is pretty much random. All conditions are respected for a perfect intraspecific group arms race. Yet there is a big loser in the story: the parasitized bird. None of its own offspring survives, only the honeyguide chick. And if the competition between honeyguides is so harsh already, it must mean that almost all nests get parasitized, several times in a row! The bird could well get parasitized to extinction.
Unless, that is, a mutation for restraint appears first and the conditions for its protection are satisfied…

We have seen that these interesting competitive dynamics can apply to concrete, classical groups defined by their genes, only redistributed once per generation; to groups defined by their age, with a total permutation halfway through the generation; and to the more abstract who-came-first groups that change every time a honeyguide visits a nest. This framework seems highly robust, and its preconditions are so loose that it seems likely to have a large, underestimated influence on the evolution of species.

What about leaders vs followers? 1st child vs following siblings?

How many other groups can you find?

OEE: Videos

I just remembered that I have videos that I took of my OEE simulations. They go along with the OEE posts that you can find here: https://itakoyak.wordpress.com/?s=OEE

They’re not excellent quality and somehow I never thought about including the videos in the data post.




How to detect life, or: did the omelet come before the egg?

Wikimedia Commons

Close your eyes. I give you 2 planets to hold, one in each hand. One harbors life as we know it; the other is lifeless. Which one would you say feels hotter in your palm? Why?

Detecting life on faraway planets is a gigantic and fascinating issue. We can’t just go there and look; we can’t even have a proper look from Earth. We have to rely on noisy measurements that come from instruments very different from the sensors we’re used to in our own bodies.

An even thornier issue is that we are not really sure how to define life.
I once heard someone say that a characteristic of life is that it goes against the 2nd law of thermodynamics (exploration of the relationship between life and entropy actually goes back to Erwin Schrödinger). My first reaction was disbelief, which is ironic considering that my first reaction to hearing the 2nd law in high school was also disbelief.
You might have heard it presented like this: if you break an egg and mix it up, you will never be able to get it back to its original state (separated white and yolk) even if you mix it forever. Why? Because of the 2nd law, which says that the universe must always go towards more disorder (actual formulation: The total entropy of an isolated system must always increase over time or remain constant.)
I strongly disliked the example of the egg, with its fuzzy notion of “disorder”. I felt like the initial state was only special because my physics professor had decided so. What if I define another state as being special? I could record the position of each molecule after having mixed the egg for 20 seconds, and say that this is a very special state, and that any amount of mixing would not bring me back to that exact state. Therefore this state must represent “order”. Then bringing the egg from its separated-white-and-yolk state to my special ordered state would be a decrease in entropy. The 2nd law did not make sense.

The notion of a relationship between “order” and time made more sense in chemistry lessons, where everything “wants” to go to a state of lower energy. Electrons go to the lower orbits of the atom if they can find a free space. Spontaneous chemical reactions release energy, and non-spontaneous ones require energy. The same goes in mechanics, where everything also goes to states of lower energy if given a chance. Balls go down hills, etc. But equating low energy with order in this way was just as wrong as my understanding of the egg example.

Entropy is a measure of statistical disorder. It is not a property of one particular arrangement; it reflects the number of different microscopic states that a system could theoretically be in given a set of macroscopic parameters (formally, it is proportional to the logarithm of that number). If you take a cup of water, there is a given (enormous) number of positions at which each molecule can be: each one can be literally anywhere in the volume of the cup. If you now freeze the cup, each molecule has a reduced number of positions it could be in, because a crystal of ice has a specific structure and the molecules have to arrange themselves following that structure.
And here comes the relationship between entropy and beating an egg. The cup of ice has lower entropy than the cup of water. The non-beaten egg (each yolk particle must be in contact only with other yolk particles, except a fine layer; same for the white particles) has lower entropy than the beaten egg (each particle can be anywhere).
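
A toy calculation makes the counting argument concrete (the 20-site, 10-particle setup is of course an arbitrary miniature of a real egg): the “unbeaten” configuration corresponds to far fewer microstates than the beaten one, hence lower entropy.

```python
from math import comb, log

# Toy sketch: 10 "yolk" particles and 10 "white" particles on a row of
# 20 sites. Unbeaten egg: the yolk is confined to the first 10 sites,
# so there is exactly 1 way to choose which sites hold yolk. Beaten egg:
# the 10 yolk particles may occupy any 10 of the 20 sites.
W_separated = 1                 # yolk fixed to its own half
W_beaten = comb(20, 10)         # 184756 ways to scatter the yolk

# Boltzmann entropy S = ln(W), taking k = 1 for simplicity
S_separated = log(W_separated)  # 0.0
S_beaten = log(W_beaten)        # about 12.1: far higher entropy
```

For a real egg the particle counts are astronomically larger, but the direction of the comparison is the same: mixing multiplies the number of accessible microstates, and that is all the 2nd law needs.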

So what does it have to do with life? Consider the example of the egg. If it is such an organised structure, and the universe goes towards disorder, how could the atoms ever come together from a disorganised state and make such a highly organised, low entropy system as an egg? Order arising from disorder seems to defy the 2nd law. Entropy is sometimes defined as a measure of energy dispersal; does it mean that a planet with organised life everywhere would be colder than a planet without life?

It is mostly accepted that phenomena seemingly going against the 2nd law do respect it when considered as part of a bigger system (there are several such cases besides life itself). You can make ordered ice from disordered water by channelling this disorder into the environment: it is the heat that your freezer pumps out of the water and dumps into the room. Taken as a whole, the entropy of the ice-plus-surroundings system has not decreased. So the egg must also come into existence at the expense of creating disorder somewhere else, and the 2nd law is respected. Maybe the 2 planets in our introduction would have the same average temperature.

These observations about the 2nd law and life give us an interesting starting point for thinking about how to define and detect life. You could say, like Lovelock when asked how to detect life on Mars, that entropy reduction is a characteristic of life.
But I would like to talk in terms of temporal patterns in energy. I haven’t really seen this discussed elsewhere, but I confess not having looked a lot either.

Life requires some chemical reactions to take place. Chemical reactions tend to have preferred directions: those that release energy and therefore lead to lower states of energy. If you want to obtain other reactions, you have to deliver energy to the system. In addition, if you want some reactions to happen at a predefined timing or in a specific order, you have to control when the energy is delivered to the system.
So, if you want to broaden the set of chemical reactions available to you, you need a way to store energy (and some other things to get proper metabolism: a way to get energy for storage, and a way to schedule the desired reactions).
If you store energy, it means that you are taking energy from your environment; it also means that you prevent this energy from being released.
Finally, because no energy transfer can be perfect, by causing chemical reactions to happen you must also be releasing heat in the environment.
So one way to detect life could be to look for pockets of stored energy and heat that are isolated from the environment.

Back to our introduction, which planet would be hotter and why?
Consider what makes the basis of life on Earth: plants. Plants feed on sunlight, animals feed on plants, other animals feed on animals that feed on plants.
Plants use solar energy for immediate chemical reactions; they also use it to store energy in the form of starch. Without plants, a lot of this energy would just disperse back into the atmosphere and back into space. Animals eat the plants, and in turn store energy. Of course, they also disperse some of the energy. But for an organism to survive, the energy it disperses must always be less than the energy it takes in, even if energy is necessary to hunt and digest prey (the sources of energy). Natural selection must favor efficient storage.

Clearly, a planet where life depends heavily on sunlight must harbor more energy than a planet without life. The problem is that some (a lot? How much? Why?) of this energy is stored, and passes from one form of storage to another. The planet would only be hotter if life consistently released more energy than it currently absorbs into stored form, that is, if it released energy that had been stored in the past and not used (for example, animals eating a stock of very old trees, or humans burning fossil fuels). Obviously, that kind of situation can only go on as long as the stock of “old” energy lasts, so it is only a temporary state. Therefore we should try to measure stored energy, not the energy currently being dispersed as heat, which is what temperature measures.

Unfortunately, the only way to measure how much energy is stored somewhere without having access to the history of the object is to burn it down and see how much energy is released in heat form. Burning down entire planets is not a very convenient way to proceed. We are better off looking for indirect signs that energy might be stored somewhere, by detecting small pockets of variable heat isolated from sources of constant heat.

Evolutionary Stability of Altruism


Wolf pack surrounding a bison, via Wikimedia

Wikipedia describes altruism as “an evolutionary enigma”, because under current paradigms it is evolutionarily unstable.

It means that when an altruistic individual appears in a group for the first time (by genetic mutation), it has a lower probability of passing its genes to the next generation, so altruism should always disappear shortly after appearing: altruism may benefit other members of the group, but is detrimental to the altruistic individual itself. Even if altruism genes do spread through the whole group, if a single member evolves a selfishness gene, it will be advantaged by cheating on the other members, and the gene for selfishness should take over the whole group.

Diverse models have been built to explain how altruism can spread through a population without disappearing at the start or losing the competition with selfishness. All are evolutionarily unstable, so the puzzle is not solved.

Here is my model, and I do believe that it is evolutionarily stable. Hopefully I will have time to code a simulation.

Hypothesis I: Vindictive behaviour is a precondition to the formation of societies.

Hypothesis II: A necessary condition for the appearance and continuation of altruistic behaviour is vindictive behaviour.

Hypothesis III: The individual cost of altruistic behaviour must always be balanced by the cost of retribution in case of non-altruistic behaviour.

These are three strong hypotheses… Let me explain what I mean by giving an example: food sharing in wolves. How could this real life behaviour have appeared?

Say you’re a lonely carnivore, an ancestor of today’s wolves, not living in groups. You hunt prey and start eating, but then some creature comes and steals your food. Clearly, if your descendants evolve genes that make them attack anyone who tries, or manages, to steal their food, they will be better off than their naive conspecifics. It is even possible that the same genes that make you attack prey also make you attack other individuals, or other individuals’ prey… Maybe you are even one of the thieves who steal other wolves’ prey in the first place? There is not much difference between a sick rabbit and a freshly killed rabbit, or between your dead rabbit and their dead rabbit… It is difficult to sort out the order in which these related behaviours (hunting, stealing, defending one’s food) appeared, and it is plausible that they all appeared conjointly.

Now say that for some reason, you find yourself stuck with several other pre-wolves in a small area. Maybe the population had a sudden increase in density. Maybe you’re all following the same herbivore migration. Either way, several of you now have to eat your own prey at a relatively short distance from each other. You’re not yet a society, but you do live together (think of today’s bears, which usually live alone or with their cubs but form big groups when it’s salmon season).

The first thing to happen might be that cubs stay closer to their mother, even as young adults, simply because there is not much space. Obviously mums share food with their cubs, but they also protect their cubs while they are eating. If, simply because they live close to each other, this behaviour persists once the cubs are adults, the family will have an obvious evolutionary advantage by protecting each other’s food. They might even team up to steal solitary wolves’ food, or to hunt bigger prey. On the other hand, those who don’t even bother to protect their own food don’t stand a chance in this new setting.

At this point, what prevents one member of the pack from cheating? You could eat more than your share, and stay away from battles to avoid danger. That would give you a big advantage. This is what makes theories of altruism evolutionarily unstable. Altruism should not be able to survive cheaters.

… Except if there is retribution. If you tend to take the biggest part of the prey and go away to eat it in peace, it might trigger the thief detector of your colleagues, and they will attack you. If you don’t take part in the hunt, you may be considered an outsider and attacked when you try to eat with the others. The appearance of such vindictive behaviour may not require much genetic change, but it has obvious advantages: it protects the group from cheaters, and it also represents a disadvantage for the cheater, who can be harmed, killed, or simply starved as a result of its behaviour. In this group, cheating is the evolutionarily unstable behaviour, while cooperation is stable.

But what about altruism? Imagine that instead of hunting all together, some wolves go hunting and then share with the whole pack (maybe because some members have to stay home to protect the cubs). In that case, they must obviously share with those who didn’t go hunting. Keeping cheaters at bay means ensuring that you don’t end up hunting alone while a whole group of lazy adult wolves waits for you to bring food, an easy way to game the system. Being vindictive or resentful is a defence mechanism that should bring the group to punish free riders before reaching that extreme situation.

Meanwhile, altruism should be partly motivated by the fear of social retribution, which is learned, and partly by genetic predispositions. I say that altruism should be learned because cheating remains beneficial for a given individual, provided that the cheating is not big enough to be caught and punished, and behaviours that are beneficial have no reason to disappear from the gene pool; but the punishment threshold depends on the current food resources and the character of other group members, so it cannot be genetically encoded. The same goes for vindictive behaviour, which should be proportional to the offence to make evolutionary sense.

A consequence of this theory is that genes for the fear of social retribution should also evolve, since this fear prevents the individual from getting into too much trouble. At the same time, a race between better cheaters that don’t get caught and those who catch and punish them could also appear. Good cheaters will pass on more genes (and possibly also their tricks, as knowledge), but they might also be better at catching members who use the same tricks as them, maintaining balance.

It is possible to game the system by not exhibiting vindictive behaviour. It is costly to monitor and punish cheaters, so you can try to count on others to do it for you and save your energy for more important things. Except of course if this kind of slacking is also punished (just think about all the people who get angry both at what they see as immoral behaviour and at those who refuse to be indignant at such behaviour). Who would have believed it! Vigilantism, self-righteousness, jealousy and charity, sharing, benevolence, all linked together… (I do not endorse vindictive behaviour, by the way.)

This walkthrough can, I think, be applied to most altruistic behaviours. Some howler monkeys give alarm calls when a predator approaches the group, which makes them more likely to be spotted and killed by said predator. This behaviour is clearly very costly in terms of survival chances. The group can only resist cheaters if there is a form of punishment that is even more costly (I don’t know if cheaters are punished in these groups of monkeys, but I expect so). The loss caused by altruistic behaviour must always be lower than the cost of retribution to maintain evolutionary stability.
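
That last inequality can be written down as a toy payoff sketch (all payoff numbers are invented for illustration): cheating stays unprofitable exactly while the expected retribution exceeds the cost of the altruistic act.

```python
# Toy sketch: in a group of hunters, cooperating costs c, but every member
# receives benefit b from the group's effort. A cheater skips the cost c
# but risks being caught (probability p) and punished (fine f). Cheating
# is unprofitable only while the expected punishment exceeds the cost of
# altruism: p * f > c.

def payoff(cooperates, b=3.0, c=1.0, p=0.5, f=4.0):
    base = b                      # share of the group's hunt
    if cooperates:
        return base - c           # pays the cost of hunting / alarm-calling
    return base - p * f           # expected cost of retribution

stable = payoff(False) < payoff(True)   # here p*f = 2.0 > c = 1.0, so True
```

Drop the fine to `f=1.0` (expected punishment 0.5 < cost 1.0) and cheating becomes the better strategy again, which is the instability the hypotheses are meant to rule out.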

Once it has appeared and found stability, altruistic behaviour can be enforced by means other than retribution, for example by ensuring that the individuals that have the possibility to cheat do not reproduce (as in social bees or mole rats). After all, it is also costly for the group to monitor and punish cheaters…

Open Ended Evolution – At last, some data

Hi tweeps (I know most of you arrive here via Twitter (except for that one US bot who watches the same 10 pages every night (hi you!)))

So I’m sitting on the Tokyo–Sydney plane, which has no power plugs and broken headphones, and I thought, what can I do while waiting to land on the continent with the weirdest fauna in the world?

The answer is: talk about my own computer-generated weird species, of course. This post is the follow-up to this one and this one, and to a lesser extent, this and this. Actually the results have been sitting in my computer since last summer; I posted a bit on Twitter too. In short, OEE is about building a living world that is “forever interesting”. Since you’ve got all the theory summed up in the previous posts, let’s go directly to the implementation and results.

To be honest, I don’t remember all the details exactly. But here we go!

Here is what the 1st batch looked like, at the very beginning of the simulation:


So you have an artificial world with individuals that we hope will evolve into something interesting through interactions with each other. The yellow individuals are how we input free energy in the system. It means that they appear every few steps, “out of nothing”. They cannot move, or eat, or do anything interesting really. They just contain energy, and when the energy is used up, they die. I call it “light”.

Then you have the mutants, which appear on the field like light, but with a random twist. Maybe they can move, or store more energy. Mutants can produce one or more kids if they reach a given energy threshold (they do not need a mate to reproduce). The kids can in turn be copies of their parent, or mutants of mutants.

Then you have the interesting mutants. These have sensors: if something has properties that their sensors can detect, they eat it (or try to). Eating is the only way to store more energy, which can then be used to move, or have kids, or just not die, so it’s pretty important.

Now remember that this sim is about the Interface Theory of Perception. In this case it means that each sensor can only detect a precise value of a precise property. For example, maybe I have a sensor that can only detect individuals who move at exactly 1 pixel/second. Or another sensor that detects individuals that can store a maximum of 4 units of energy. Or have 2 kids. Or have kids when they reach 3 units of energy. Or give birth to kids with a storage of 1 unit of energy.

A second important point: you can only eat individuals who have less energy than you do, otherwise you lose energy or even die. BUT, to make things interesting, there is no sensor that lets you measure how much energy the other guy has right now.

It sounds a bit like the real world, no? You can say that buffaloes that move slowly are maybe not very energetic, so as a lion, you should try to eat them. But there is no guarantee that you will actually be able to overpower them.
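
For concreteness, here is a minimal sketch of the rules described above (the class, names and numbers are mine, not the actual simulation code): a sensor matches one exact value of one property, and the outcome of eating depends on an energy comparison that no sensor can see.

```python
# Toy sketch of the perception/eating rules. Each sensor detects one exact
# value of one property (Interface Theory of Perception: you perceive a
# label, not the underlying state). Eating succeeds only if the eater
# happens to have more stored energy - a fact hidden from all sensors.

class Agent:
    def __init__(self, energy, props, sensors):
        self.energy = energy
        self.props = props        # e.g. {"speed": 1, "max_storage": 4}
        self.sensors = sensors    # e.g. [("speed", 1)]: exact matches only

    def detects(self, other):
        return any(other.props.get(prop) == value
                   for prop, value in self.sensors)

    def try_eat(self, other):
        if not self.detects(other):
            return                           # invisible things can't be eaten
        if self.energy > other.energy:       # hidden test of strength
            self.energy += other.energy      # absorb the prey's stored energy
            other.energy = 0                 # prey dies
        else:
            self.energy -= 1                 # failed attack is costly

light = Agent(energy=2, props={"speed": 0}, sensors=[])
mutant = Agent(energy=3, props={"speed": 0}, sensors=[("speed", 0)])
mutant.try_eat(light)   # detects "speed == 0" things and happens to win
```

Note the buffalo problem built into `try_eat`: the mutant's sensor tells it *what* to attack, but whether the attack pays off depends on the hidden energy comparison.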

Before we get there, there is a gigantic hurdle. Going from “light” to lions or buffaloes is not easy. You need sensors, but sensors require energy. And mutations appear randomly, so it takes a lot of generations to get the first viable mutant: something that can detect and eat light, but doesn’t eat its own kids by mistake. Here is what the origin of life looks like in that world:


The Y axis is the ID of the oldest ancestor, and X is time. Everything on the diagonal is just regular light. All the horizontal branches are mutant lineages; as you can see, there are lots of false starts before an actual species gets off the ground! Then, at a longer timescale, this happens:


This is the same sim as before, but zoomed out. Lots of interesting stuff here. First, our previously successful descendants of #14000 go extinct just as a new successful lineage comes in. This is not a coincidence. I think either both species were trying to survive on the same energy source (light) and one starved out the other, or one just ate the other to extinction.

Seeing as the slope of the “light” suddenly decreases as species 2 starts thriving, I think the 1st hypothesis is the right one.

Now, the fact that our successful individuals all have the same ancestor doesn’t actually mean that they belong to the same species. Here is what the tree of life looks like in my simulated worlds:

[images: tree1, tree1b, four_species]

These images represent the distribution of individuals’ properties through time. I encoded 3 properties (I don’t remember which) in RGB, so differences in colors also approximately represent different species, or variations within species. In images 1 and 2, you can see 2 or 3 different branches with different ancestors. On these I was only looking at the maximum amount of energy that each individual can store, and how much of this energy is passed from the parent to its kids. If I had looked instead at speed, or at the number of sensors, we may have seen the branches divide into even smaller branches.
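The RGB encoding can be sketched like this (the three properties and their ranges here are my guesses, since I don’t remember which ones I actually used):

```python
def properties_to_rgb(individual, ranges):
    """Squash 3 heritable properties into one color: individuals with
    similar properties get similar colors, so distinct species show up
    as distinct hues in the tree-of-life pictures."""
    color = []
    for prop, (lo, hi) in ranges.items():
        t = (individual[prop] - lo) / (hi - lo)      # normalise to [0, 1]
        color.append(int(255 * min(max(t, 0.0), 1.0)))
    return tuple(color)

# Hypothetical ranges for three properties encoded as R, G and B.
ranges = {"max_energy": (0, 10), "energy_to_kids": (0, 5), "speed": (0, 4)}
ind = {"max_energy": 5, "energy_to_kids": 5, "speed": 0}
print(properties_to_rgb(ind, ranges))  # (127, 255, 0)
```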

In the 3rd image, we see much more interesting patterns; the lower branch divides clearly into 2 species that have the same original ancestor; then one branch dies out, and the other divides again, and again! To obtain these results, I just doubled the area of the simulation, and maybe cranked up the amount of free energy too (the original area was extremely tiny compared to what is usually used in Artificial Life. Even after doubling it, I don’t think I have ever heard of such a small-scale simulation).

Still, the area was bigger, but one important driver of speciation was missing. In real life, species tend to branch out because they become separated by physical obstacles like oceans, mountains, or just distance. To simulate that, I made “mobile areas” of light. Instead of having a fixed square, I had several small areas producing light, and these areas slowly move around. It’s like tiny islands that get separated and sometimes meet each other again, and it looks like this:


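In code, a drifting light island might look something like this (a sketch with made-up parameters — the radius, speed and world size are not the sim’s actual values):

```python
import math
import random

class LightIsland:
    """A small patch that produces light and slowly wanders around,
    standing in for the 'mobile areas' described above."""
    def __init__(self, x, y, radius=5.0, speed=0.1):
        self.x, self.y, self.radius = x, y, radius
        self.heading = random.uniform(0, 2 * math.pi)
        self.speed = speed

    def step(self, world_size=100.0):
        # Wander: slightly perturb the heading, then drift,
        # wrapping around the edges of the world.
        self.heading += random.uniform(-0.1, 0.1)
        self.x = (self.x + self.speed * math.cos(self.heading)) % world_size
        self.y = (self.y + self.speed * math.sin(self.heading)) % world_size

    def contains(self, x, y):
        """Is this position currently lit?"""
        return math.hypot(self.x - x, self.y - y) <= self.radius

islands = [LightIsland(20, 20), LightIsland(70, 80)]
for _ in range(1000):
    for island in islands:
        island.step()
```

Because each island wanders independently, two islands occasionally overlap and then separate again, which is exactly the island-meets-island dynamic described above.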
Now the species have to follow the light sources, but they can also meet each other and “infect” each other’s islands, competing for resources (or just to eat each other). The trees you get with this are like this:

[images: snake_tree, wild]

Even more interesting!  So many branches! And just looking at the simulation is also loads of fun. Each one is a different story. Sometimes there is drama…


I was rooting for the “smart guys” (fast and with many sensors) above, but they eventually lost the war and went extinct.

What do we take out of that? First, some of the predictions I made in previous posts were realised. The Interface Theory of Perception does allow for a variety of different worlds with their own histories. Additionally, refusing to encode species in the simulation does lead to interesting interactions, and speciation becomes an emergent property of the world. Individuals do not have a property named “species”, or a “species ID” written somewhere. Despite that, we don’t end up with a “blob of life” with individuals spread everywhere; but we don’t get a “tree of life” as clean and straight as in textbooks either. It’s more of a beautiful mutant broccoli of life, with blobs and branches. And this sim doesn’t even have sexual reproduction in it! That would make the broccoli even cooler.

The next step in the sim was to implement energy arrays, as I mentioned in an earlier post. I already started and then I kinda forgot. Hopefully I’ll find time to do it!

Conclusion: Did I build an OEE world? Ok, probably not. But I like it and it lived up to my expectations.

An opinion on defining life, and a theory of emergence, embodied cognition, OEE and solving causal relationships in AI

Wikimedia Commons

Here comes a really long blog post, so I have included a very brief summary at the end of each section. You can read that and go for the long version after if you think you disagree with the statements!


  • Defining Life: A Depressing Opinion

Deciding what is alive and what is not is an old human pastime. A topic of my lab’s research is Artificial Life, so we also tend to be interested in defining life, especially at the level of categories. What is the difference between the category of alive things and the category of not-alive things? There are lots of definitions trying to pin down the difference, none of them satisfactory enough to convince everyone so far.

One thing we talked about recently is the theory of top-down causation. When I heard it, it sounded like this: big alive things can influence small not-alive things (like you moving the molecules of a table by pushing said table away) and this is top-down causation, as opposed to bottom-up causation where small things influence big things (like the molecules of the table preventing your fingers from going through it via small scale interactions).
I’m not going to lie, it sounded stupid. When you move a table, it’s the atoms of your hands against the atoms of the table. When you decide to push the table, it’s still atoms in your brain doing their thing. Everything is just small particles meeting other small particles.

Or is it? No suspense, I still do not buy top-down causation as a definition of life. But I do think it makes sense and can be useful for Artificial Intelligence, in a framework that I am going to explain here. It will take us from my personal theory of emergence, to social rules, to manifold learning and neural networks. On the way, I will bend concepts to what fits me most, but also give you links to the original definitions.

But first, what is life?
Well here is my depressing opinion: trying to define life using science is a waste of time, because life is a subjective concept rather than a scientific one. We call “alive” things that we do not understand, and when we gain enough insight into how they work, we stop believing they’re alive. We used to think volcanoes were alive because we could not explain their behaviour. We personified as gods things we could not comprehend, like seasons and stars. Then science came along and most of these gods are dead. Nowadays science doesn’t believe that there is a clear-cut frontier between alive and not alive. Are viruses alive? Are computer viruses alive? I think that the day we produce a convincing artificial life, the last gods will die. Personally, I don’t bother too much about classifying things as alive or not; I’m more interested in questions like “can it learn? How?”, “Does it do interesting stuff?” and “What treatment is an ethical one for this particular creature?”. I’m very interested in virtual organisms — not so much in viruses. Now to the interesting stuff.

Summary: Paradoxically, Artificial Life will destroy the need to define what is “alive” and what is not.


  • A Theory of Emergence

Top-down causation is about scale, as its name indicates. It talks about low level (hardware in your computer, neurons in your brain, atoms in a table) and high level (computer software, ideas in your brain, objects) concepts. Here I would like to make it about dimensions, which is quite different.
Let’s call “low level” spaces with a comparatively larger number of dimensions (e.g. a 3D space like your room) and “high level” spaces with fewer dimensions (like a 2D drawing of your room). Let’s take high level spaces as projections of the low level spaces. By drawing it on a piece of paper, you’re projecting your room onto a 2D space. You can draw it from different angles, which means that several projections are possible.
But mathematical spaces can be much more abstract than that. A flock of birds can be seen as a high dimensional space where the position of each bird is represented by 3 values on 3 bird-specific axes: bird A at 3D position (a1, a2, a3), bird B at 3D position (b1, b2, b3), etc. That’s 3 axes per bird: already 30 dimensions even if you have only 10 birds! But the current state of the flock can be represented by a single point in that 30D space.
You can project that space onto a smaller one, for example a 2D space where the flock’s state is represented by one point whose position is just the mean of all the birds’ vertical and horizontal positions. You can see that the trajectory of the flock will look very different depending on whether you are in the low or high dimensional space.
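Here is the flock example in numpy — 10 birds, a 30D state, and a 2D projection where each output axis mixes many input axes:

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 birds in 3D: the flock's state is one point in a 30D space.
flock = rng.normal(loc=[50.0, 50.0, 10.0], scale=2.0, size=(10, 3))
state_30d = flock.reshape(-1)                  # shape (30,)

def project(state):
    """Project the 30D flock state to 2D: the mean horizontal and
    vertical position. Each output axis combines 10 input axes,
    so this is a non-trivial projection, not a subset of axes."""
    birds = state.reshape(-1, 3)
    return np.array([birds[:, 0].mean(), birds[:, 2].mean()])

print(project(state_30d))   # close to [50, 10]
```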

What does this have to do with emergence?
Wikipedia will tell you a lot about the different definitions and opinions about emergence. One thing most people agree on is that an emergent phenomenon must be surprising, and arise from interactions between parts of a system. For example, ants finding the shortest path to food is an emergent phenomenon. Each ant follows local interaction rules (explore, deposit pheromones, follow pheromones). That they can find the shortest path to a source of food, and not a random long winding way to it, is surprising. You couldn’t tell that it was going to happen just by looking at the rules. And if I told you to build an algorithm inside the ants’ heads to make them find the shortest path, that’s probably not the set of rules you would have gone for.

I think that all emergent phenomena are things that happen because of interactions in a high dimensional space, but can be described and predicted in a projection of that space. When there is no emergence, no projection is ever going to give a good description and predictability to your phenomenon. Food, pheromones, but also each ant is a part of the high dimensional original system. Each has states that can be represented on axes: ants move, pheromones decay. Ants interact with each other, which means that their states are not independent from each other. The space can be projected onto a smaller one, where a description with strong predictive power can be found: ants find the shortest path to food. All ants and even the food have been smashed into a single axis. Their path is optimal or not. They might start far from 0, the optimal path, but you can predict that they will end up at 0. If you have two food sources, you can have two axes; the ants’ closeness to the optimal solution depends on their position on these axes.

Another example is insect architecture: termite mounds, bee hives. The original system includes interactions between particles of building material and individual termites, but can be described in much simpler terms: bees build hexagonal cells.

Or take the flock of birds. Let’s say that the flock forms a big swarming ball that follows the flow of hot winds. The 2D projection is a more efficient way to predict the flock’s trajectory than the 30D space, or than following a single bird’s motion (a combination of its trajectory inside the swarm-ball and on the hot winds). Of course, depending on what you want to describe, the projection will have to be different.

Here we meet one important rule: if there is emergence, the projection must not be trivial. The axes must not be a simple subset of the original space’s axes, but a combination (linear or not) of the axes of the high dimensional space.
This is where the element of “surprise” comes in. It is a rather nice improvement on all the definitions of emergence I’ve found: all talk about “surprise” but most do not define objectively what counts as surprising. The rule above is a more practical definition than “an emergent property is not a property of any part of the system, but still a feature of the system as a whole” (Wikipedia).

A second implicit rule follows: trajectories (emergent properties) built in low dimensional spaces cannot be ported to high dimensional spaces without modification.
You could try to build “stay in swarm but also follow hot winds” into the head of each bird. You could try to build “find shortest path” into the head of each ant. It makes sense: that is the simplest description of what you observe. The problem starts when you try to build that with what you have in the real, high dimensional world. Each ant has few sensors. They cannot see far away. Implementing a high level algorithm rather than local interactions may sometimes work, but is not the easiest, most robust or most energy-efficient solution. If you are building rules that depend explicitly on building high level concepts from low level high dimensional input, you are probably not on the right track. You don’t actually need to implement a concept of “shortest path” or “swarm” to achieve these tasks; research shows that you obtain much better results by giving these up. This is a well known problem in AI: complex high level algorithms do very poorly in the real world. They are slow, noise-sensitive and costly.

However, I do not agree that emergent phenomena “must be impossible to infer from the individual parts and interaction”, as anti-reductionists say. Those that fit in the framework I have described so far can theoretically be inferred, if you know how the high dimensional space was projected onto the low dimensional one. Therefore you can generate emergent phenomena by drawing a trajectory in the low dimensional space, and trying to “un-project” it to the high dimensional space. By definition you will have several possible choices, but it should not be too big a problem. I intuitively think this generative approach works, and I have tried it on a few abstract examples; but I need a solid mathematical example to prove it. Nevertheless, if emergent phenomena don’t always have to be a surprise and can be engineered using tools other than insight, it’s excellent news!
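For the linear case, the “un-projection” is easy to check with a pseudoinverse: any pre-image we pick projects back down to exactly the trajectory we drew (a toy check of the idea, not the solid mathematical example I am after):

```python
import numpy as np

rng = np.random.default_rng(1)

# A random linear projection from a 30D space down to 2D.
P = rng.normal(size=(2, 30))

# A trajectory drawn directly in the low dimensional space.
low_traj = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 1.0]])

# "Un-project" with the pseudoinverse. This picks one pre-image out of
# many: any component in P's null space could be added freely,
# which is the "several possible choices" mentioned above.
high_traj = low_traj @ np.linalg.pinv(P).T     # shape (3, 30)

# Projecting back down recovers the drawn trajectory exactly.
print(np.allclose(high_traj @ P.T, low_traj))  # True
```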

Summary: an emergent phenomenon is a simplification (projection to a low dimensional space) of underlying complex (high dimensional) interactions that lets you predict something about the system faster than by going through all the underlying interactions, whatever the input.

Random thought that doesn’t fit in this post: if there were emergent properties of the system “every possible algorithm”, the halting problem would be solvable. There are none, so we actually have to run all algorithms with all possible inputs (see also Rice’s theorem).


  • Top-Down Causation and Embodied Cognition

So far we’ve discussed interactions happening between elements of the low level, high dimensional system, or between elements of the high level, low dimensional system. It is obvious that what happens down there influences what happens up here. Can what happens upstairs influence what goes on downstairs?
Despite my skepticism about what you can read about top-down causation here and there, I think the answer is yes, in a very pragmatic way: if elements of the high dimensional space take inputs directly from the low dimensional space, top-down causation happens. Until now the low-dim spaces were rather abstract, but they can exist in a very concrete way.

Take sensors, for example. Sensors reduce the complexity of the real world in two ways:
– by being imprecise (eyes don’t see individual atoms but their macro properties)
– by mixing different inputs into the same kind of output (although the mechanisms are different, you feel “cold” both when rubbing mint oil or an ice cube on your skin)
This reduction of dimensions is performed before giving your brain the resulting output. Your sensors give you input from a world that is already a non-trivial projection of a richer world. Although your actions impact the entire, rich, high dimensional world, you only perceive their consequences through the projection of that world. Your entire behaviour is based on sensory inputs, so yes, there is some top-down causation going on. You should not take it as a definition of living organisms though: machines have sensors too. “Sensor” is actually a pretty subjective word anyway.
You might not be convinced that this is top-down causation. Maybe it sounds too pragmatic and real to be top-down causation and not just “down-down” causation.

So what about this example: social rules. Take a group of humans. Someone decides that when meeting a person you know, bowing is the polite thing to do. Soon everyone is doing it: it has become a social rule. Even when the first person to launch this rule has disappeared from that society, the rule might continue to be applied. It exists in a world that is different from the high dimensional world formed by all the people in that society, a world that is created by them but has some independence from each of them. I can take a person out and replace them by somebody from a different culture — soon, they too will be bowing. But if I take everyone out and replace them by different people, there is no chance that suddenly everyone will start bowing. The rule exists in a low dimensional world that is inside the head of each member of the society. In that projection, each person that you have seen bowing in your life is mashed up into an abstract concept of “person” and used in a rule saying “persons bow to each other”. This low dimensional rule directs your behaviour in a high dimensional world where each person exists as a unique element. It’s bottom-up causation (you see lots of people bowing and deduce a rule) that took its independence (it exists even if the original person who decided the rule dies, and whether or not some rude person decides not to bow) and now acts as top-down causation (you obey the rule and bow to other people). When you bow, you do not do it because you remember observing Mike bow to John and Ann bow to Peter and Jake bow to Mary. You do it because you remember that “people bow to other people”. It is definitely a low dimensional, general rule dictating your interactions as a unique individual.

We have seen two types of top-down causation. There is a third one, quite close to number one. It’s called embodied cognition.
Embodied cognition is the idea that some of the processing necessary to achieve a task is delegated to the body of an agent instead of putting all the processing load on its brain. It is the idea that through interactions with the environment, the body influences cognition, most often by simplifying it.
My favourite example is the Swiss robot. I can’t find a link to the seminal experiment, but it’s a small wheeled robot that collects objects into clusters. This robot obeys fixed rules, but the result of its interaction with the environment depends on the placement of its “eyes” on its body. With eyes on the side of its body, the robot “cleans up” its environment. For other placements, this emergent behaviour does not appear and the robot randomly moves objects around.
Although the high level description of the system does not change (a robot in an arena with obstacles), changes in the interactions of the parts of the system (position of the eyes on the body) change the explanatory power of the projection. In one case, the robot cleans up. In the others, the behaviour defies prediction in a low dimensional space (no emergence). Here top-down causation works from the body of the robot to its interactions in a higher dimensional world. The low dimension description is “a robot and some obstacles”. The robot is made of physically constrained parts: its body. This body is what defines whether the robot can clean up or not — not the nature of each part, but how they are assembled. For the same local rules, interactions between the robot and the obstacles depend on the high level concept of “body”, not only on each separate part. The swiss robot is embodied cognition by engineered emergence.

In all the systems with top-down causation I have described so far, only one class falls into the anti-reductionist framework: those where the state of the high level space is historically dependent on the low level space. These are systems where states in the low dimensional world depend not only on the current state of the high dimensional one, but also on its past states. If on top of that the high dimensional world takes input from the low dimensional one (for example because directly taking high dimensional inputs is too costly), then the system’s behaviour cannot be described only by looking at the interactions in the high dimensional world.
Simple example: some social rules depend not only on what people are doing now, but on what they were doing in the past. You wouldn’t be able to explain why people still obey archaic social rules just by looking at their present state, and these rules did not survive by recording all instances of people obeying them (high dimensional input in the past), but by being compressed into lower dimensional spaces and passed on from person to person in this more digestible form.
This top-down causation with time delay cannot be understood without acknowledging the high level, low dimensional world. It is real, even if it only exists in people’s heads. That low dimensional world is where the influence of the past high dimensional world persists even after it has stopped in the present high dimensional world. Maybe people’s behaviour cannot be reduced to physical laws alone after all… But there is still no magic in that, and we are not getting “something from nothing” (a pitfall with top-down causation).

A counter argument to this could be that everything is still governed by physical laws, both people and people’s brains, and lateral causation at the lowest level between elementary particles can totally be enough to explain the persistence of archaic social rules, and therefore top-down causation does not need to exist.
I agree. But as soon as you are not looking at the lowest level possible, highest dimensional world (which currently cannot even be defined), top-down causation does happen. Since I am not trying to define life, this is fine with me!

Summary: Top-down causation exists when the “down” takes low dimensional input from the “top”. The key here is the difference in dimensions of the two spaces, not a perceived difference of “scale” as in the original concept of top-down causation. Maybe I should call it low-high causation?


  • Open Ended Systems

In this section I go back to my pet subject, Open Ended Evolution and the Interface Theory of Perception. You probably saw it coming when I talked of imprecise sensors. I define the relationship between OEE and top-down causation as: An open ended system is one where components take inputs from changing projected spaces. It’s top-down causation in a billion flavors.
These changes in projections are driven by interactions between the evolution of sensors and the evolution of high dimensional outputs from individuals.

Two types of projections can make these worlds interesting:
1. sensory projection (see previous section)
2. internal projections (in other words, brains).

The theoretical diversity of projections n.1 depends on the richness of the real world. How many types of energy exist, can be sensed, mixed and used as heuristics for causation?
N.2 depends on n.1 of course (with many types of sensors you can try many types of projections), but also on memory span and capacity (you can use past sensor values as components of your projections). Here, neurons are essentially the same as sensors: they build projections, as we will see in the next section. The main difference is that neurons can be plastic: the projections they build can change during your lifetime to improve your survival (typically, changes in sensors decrease your chances of survival…).
As usual, I think that the secret ingredient to successful OEE is not an infinitely rich physical world, even if it helps… Rather, the richness of choice of projected spaces (interfaces) is important.


  • Neural Networks

I will not go into great detail in this section because it is kind of technical and it would take forever to explain everything. Let’s just set the atmosphere.
I was quite shocked the other day to discover that layered neural networks are actually the same thing as space projection. It’s so obvious that I’m not sure why I wasn’t aware of it. You can represent a neural network as a matrix of weights and, if the model is a linear one, calculate the output for any input by multiplying the input by the weight matrix (matrix multiplication is equivalent to space projection).
The weight matrix is therefore quite important: it determines what kind of projection you will be doing. But research has shown that when you are trying to apply learning algorithms to high dimensional inputs, even a random weight matrix improves the learning results, as long as it reduces the dimension of the input (you then apply learning to the low dimensional input).
Of course, you get even better results by optimizing the weight matrix. But then you have to learn the weight matrix first, and only then apply learning to the low dimensional inputs. That is why manifold learning has been invented, it seems. It finds you a good projection instead of using random stuff. Then you can try to use that projection to perform tasks like clustering and classification.
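A quick illustration of the random-projection claim (the toy data and the nearest-centroid “learner” are my own; the actual research uses real learning algorithms on real data):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data: 1000-dimensional inputs, two classes shifted apart.
n, d, k = 200, 1000, 20
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[y == 1] += 1.0                       # class 1 shifted along every axis

# A fixed *random* weight matrix projecting 1000D down to 20D.
W = rng.normal(size=(d, k)) / np.sqrt(d)
Z = X @ W

# "Learning" on the low dimensional input: nearest-centroid classifier.
c0, c1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
pred = np.linalg.norm(Z - c1, axis=1) < np.linalg.norm(Z - c0, axis=1)
accuracy = (pred.astype(int) == y).mean()
print(accuracy)                        # high, despite the random projection
```

Training and evaluating on the same points keeps the sketch short; the only point here is that even a random projection preserves enough structure for a simple learner to work on the 20D data.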

What would be interesting is to apply that to behavioural tasks (not just disembodied tasks) and find an equivalent for spiking networks. One possible way towards that is prediction tasks.

Say you start with a random weight matrix. Your goal is to learn to predict what projected input will come after the current one. For that, you can change your projection: two inputs that you thought were equivalent because they got projected onto the same point, but ended up having different trajectories, were probably not equivalent to begin with. So you change your projection so as to have these two inputs projected onto different places.
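Here is a deliberately tiny version of that update (the separation rule is my own crude sketch, not a real learning rule): two inputs differ only inside the projection’s null space, so they land on the same point, and one weight nudge along their difference tells them apart.

```python
import numpy as np

rng = np.random.default_rng(3)

d, k = 8, 2
W = rng.normal(size=(k, d)) * 0.1      # the current (random) projection

# Build x2 so it differs from x1 only in W's null space: the current
# projection literally cannot tell these two inputs apart.
_, _, Vt = np.linalg.svd(W)
null_dir = Vt[-1]                      # a direction that W maps to ~0
x1 = rng.normal(size=d)
x2 = x1 + 2.0 * null_dir

def separate(W, a, b, eta=0.5):
    """If a and b project to the same point but have different
    trajectories, bend one output axis along their difference."""
    diff = a - b
    diff = diff / np.linalg.norm(diff)
    W = W.copy()
    W[0] += eta * diff
    return W

before = np.linalg.norm(W @ x1 - W @ x2)
W2 = separate(W, x1, x2)
after = np.linalg.norm(W2 @ x1 - W2 @ x2)
print(before, after)                   # ~0 before, clearly separated after
```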

From this example we can see several things:
– Horror! Layers will become important. Some inputs might require one type of projection, and others a different type. This will be more easily implemented if we have layers (see here).
– A map of the learned predictions (= input trajectories) will be necessary at each layer to manage weight updates. This map can take the form of a Self Organising Map or, more complex, a (possibly probabilistic) vector field where each vector points to the next predicted position. There are as many axes as neurons in the upper layer, and axes have as many values as neurons have possible values. (This vector field can actually itself be represented by a layer of neurons, with each lateral connection representing a probabilistic prediction.) Errors in the prediction drive changes in the input weights to the corresponding layer (would this be closer to Sanger’s rule than to BackProp?). Hypothesis: the time dependence of this makes it possible to implement using Hebbian learning.
– Memory neurons with a defined memory span can improve prediction for lots of tasks, by adding a dimension to the input space. They can be implemented simply with LSTMs in non-spiking models of NN, or with axonal delays for spiking models.
– Layers can be dynamically managed by adding connections when necessary, or more reasonably deleting redundant ones (neurons that have the exact same weights as a colleague can replace said weights by a connection to that colleague)
– Dynamic layer management will turn a feedforward network into a recurrent one, and the topology of the network will transcend layers (some neurons will develop connections bypassing the layer above to go directly to the next). The only remnant of the initial concept of layers will be the prediction vector map.
– Memory neurons in upper layers with connections towards lower layers will be an efficient tool to reduce the total number of connections and computational cost (See previous sections. Yes, top-down causation just appeared all of a sudden).
– Dynamic layer management will optimise resource consumption but make learning new things more difficult with time.
– To tell the difference between a predicted input and that input actually happening, a priming mechanism will be necessary. I believe only spiking neurons can do that, by raising the baseline membrane voltage of the neurons coding for the predicted input.
– Behaviour can be generated (1), as the vector field map tells us where data is missing or ambiguous and needs to be collected.
– Behaviour can be generated (2), because if we can predict some things we can have anticipatory behaviour (run to avoid incoming electric shock)

Clustering and classification are useless in the real world if they are not used to guide behaviour. Actually, the concept of class is subjective to the behaviour it supports; here we take “prediction” as a behaviour, but the properties you are trying to describe or predict depend on which behaviour you are trying to optimise. The accuracy or form of the prediction depends on your own experience history, and the input you get to build predictions from depends on your sensors… Here comes embodied cognition again.

Summary: predictability and understanding are the same thing. Predictability gives meaning to things and thus allows us to form classes. The difference between deep learning and fully recurrent networks is top-down causation.


  • Going further

It might be tempting to generalise top-down causation. Maybe projecting to lower dimensions is not that important? Maybe projecting to different spaces with the same number of dimensions, or projecting to higher dimensional spaces, enhances predictability. After all, top-down projection in layered networks is equivalent to projecting low dimensional input to a higher dimensional space (see also auto-encoders and predictive recurrent neural networks). But if our goal is to make predictions in the least costly way possible (possibly at the cost of accuracy), then of course projection to lower dimensional spaces is a necessity. When predictions must be converted back to behaviour, projection to high dimensional spaces becomes necessary; but in terms of memory storage and learning optimisation, the lowest dimensional layer is the one that allows reduction of storage space and computational cost (an interesting question here is: what happens to memory storage when the concept of layer is destroyed?).

One possible exception would be time (memory). If knowing the previous values of an input improves predictability and you add neuron(s) to keep past input(s) as memory, you are effectively increasing the number of dimensions of your original space. But you use this memory for prediction (i.e. low dimensional projection) in the end. So yes, the key concept is dimension reduction.
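A small check of that memory point (the signal is my own toy, not from any experiment): a sequence where the next value depends on the previous *two* inputs is nearly unpredictable from the current input alone, but adding one memory dimension makes prediction almost exact.

```python
import numpy as np

rng = np.random.default_rng(4)

# x[t+1] = x[t] - x[t-1] + small noise: the current value alone
# is not enough to predict the next one.
T = 500
x = np.zeros(T)
x[0], x[1] = 1.0, 0.5
for t in range(1, T - 1):
    x[t + 1] = x[t] - x[t - 1] + rng.normal(scale=0.01)

def prediction_error(features, target):
    """Mean squared error of the best linear predictor."""
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return float(((features @ coef - target) ** 2).mean())

target = x[2:]
# Predict x[t+1] from x[t] alone, then with one memory neuron (x[t-1]).
err_now = prediction_error(x[1:-1, None], target)
err_mem = prediction_error(np.stack([x[1:-1], x[:-2]], axis=1), target)

print(err_now, err_mem)   # memory reduces the error by orders of magnitude
```

Note the shape of the move: the memory neuron first *raises* the input dimension from 1 to 2, and the linear predictor then projects back down to a single predicted value, which is exactly the “expand for memory, then reduce” pattern described above.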

A nice feature of this theory is that models would be able to capture causal relationships (something that deep learning cannot do, I heard). This whole post was about a concept called “top-down causation” after all. If an input improves predictability of trajectories in the upper layer, certainly this suggests a causal relationship. So what the model is really doing all the way is learning causal relationships.

Summary: dimension reduction is the key to learning causal relationships.

Wow, that was a long post! If you have come this far, thank you for your attention and don’t hesitate to leave a thought in the comments.