Open Ended Evolution – At last, some data

Hi tweeps (I know most of you arrive here via Twitter (except for that one US bot who watches the same 10 pages every night (hi you!)))

So I’m sitting in the Tokyo-Sydney plane, which has no power plugs and broken headphones, and I thought: what can I do while waiting to land on the continent with the weirdest fauna in the world?

The answer is: talk about my own computer-generated weird species, of course. This post is the follow-up to this one and this one, and to a lesser extent, this and this. Actually the results have been sitting on my computer since last summer; I posted a bit on Twitter too. In short, OEE is about building a living world that is “forever interesting”. Since you’ve got all the theory summed up in the previous posts, let’s go directly to the implementation and results.

To be honest, I don’t remember all the details exactly. But here we go!

Here is what the 1st batch looked like, at the very beginning of the simulation:

initial

So you have an artificial world with individuals that we hope will evolve into something interesting through interactions with each other. The yellow individuals are how we input free energy into the system. It means that they appear every few steps, “out of nothing”. They cannot move, or eat, or do anything interesting really. They just contain energy, and when the energy is used up, they die. I call them “light”.

Then you have the mutants, which appear on the field like light, but with a random twist. Maybe they can move, or store more energy. Mutants can produce one or more kids if they reach a given energy threshold (they do not need a mate to reproduce). The kids can in turn be copies of their parent, or mutants of mutants.

Then you have the interesting mutants. These have sensors: if something has properties that their sensors can detect, they eat it (or try to). Eating is the only way to store more energy, which can then be used to move, or have kids, or just not die, so it’s pretty important.

Now remember that this sim is about the Interface Theory of Perception. In this case it means that each sensor can only detect a precise value of a precise property. For example, maybe I have a sensor that can only detect individuals who move at exactly 1 pixel/second. Or another sensor that detects individuals that can store a maximum of 4 units of energy. Or have 2 kids. Or have kids when they reach 3 units of energy. Or give birth to kids with a storage of 1 unit of energy.

A second important point is, you can only eat people who have less energy than you do, otherwise you lose energy or even die. BUT, to make things interesting, there is no sensor allowing you to measure how much energy this other guy has right now.
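To make these mechanics concrete, here is a minimal Python sketch of how such sensors and the eating rule could be coded. The class names, the exact-match detection rule and the penalty for a failed attack are my own illustrative assumptions, not the actual simulation code:

```python
from dataclasses import dataclass, field

@dataclass
class Sensor:
    prop: str      # which property this sensor reads, e.g. "speed"
    value: float   # the ONE exact value it can detect (Interface Theory style)

    def detects(self, other) -> bool:
        # A sensor fires only on an exact match of one property value.
        return getattr(other, self.prop, None) == self.value

@dataclass
class Creature:
    speed: float
    max_storage: float
    energy: float
    sensors: list = field(default_factory=list)

    def try_to_eat(self, other) -> bool:
        # You can only attack what at least one of your sensors detects...
        if not any(s.detects(other) for s in self.sensors):
            return False
        # ...but no sensor tells you the victim's current energy.
        if self.energy > other.energy:
            self.energy = min(self.energy + other.energy, self.max_storage)
            other.energy = 0          # the victim dies
            return True
        self.energy -= 1              # illustrative cost of a failed attack
        return False
```

A “light” individual would then just be a Creature with zero speed, no sensors, and whatever energy it was born with.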

It sounds a bit like the real world, no? You can say that buffaloes that move slowly are maybe not very energetic, so as a lion, you should try to eat them. But there is no guarantee that you will actually be able to overpower them.

Before we get there, there is a gigantic hurdle. Going from “light” to lions or buffaloes is not easy. You need sensors, but sensors require energy. And mutations appear randomly, so it takes a lot of generations to get the first viable mutant: something that can detect and eat light, but doesn’t eat its own kids by mistake. Here is what the origin of life looks like in that world:

ancestor2Leg

The Y axis is the ID of the oldest ancestor, and X is the time. Everything on the diagonal is just regular light. All the horizontal branches are mutant lineages; as you can see, there are lots of false starts before an actual species takes off! Then at a longer timescale this happens:

bifurcation

This is the same sim as before, but zoomed out. Lots of interesting stuff here. First, our previously successful descendants of #14000 go extinct just as a new successful lineage comes in. This is not a coincidence. I think either both species were trying to survive on the same energy source (light) and one starved out the other, or one just ate the other to extinction.

Seeing as the slope of the “light” suddenly decreases as species 2 starts thriving, I think the 1st hypothesis is the right one.

Now the fact that our successful individuals all have the same ancestor doesn’t actually mean that they belong to the same species. Actually, this is what the tree of life looks like in my simulated worlds:

tree1 tree1b four_species

These images represent the distribution of individuals’ properties through time. I encoded 3 properties (don’t remember which) in RGB, so differences in colors also approximately represent different species, or variations in species. In images 1 and 2, you can see 2 or 3 different branches with different ancestors. On these I was only looking at the max amount of energy that each individual can store, and how much of this energy is passed from the parent to its kids. If I had looked instead at speed, or at the number of sensors, we might have seen the branches divide into even smaller branches.

In the 3rd image, we see much more interesting patterns; the lower branch divides clearly into 2 species that have the same original ancestor; then one branch dies out, and the other divides again, and again! To obtain these results, I just doubled the area of the simulation, and maybe cranked up the amount of free energy too (the original area was extremely tiny compared to what is usually used in Artificial Life; even doubled, I don’t think I have ever heard of such a small-scale simulation).

Still, the area was bigger but one important driver of speciation was missing. In real life, species tend to branch out because they become separated by physical obstacles like oceans, mountains, or just distance. To simulate that, I made “mobile areas” of light. Instead of having a fixed square, I had several small areas producing light, and these areas slowly move around. It’s like tiny islands that get separated and sometimes meet each other again, and it looks like this:
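Here is a minimal sketch of what such drifting light islands could look like in Python; the patch radius, drift step and spawning rule are placeholders of mine, not the values used in the actual simulation:

```python
import math
import random

class LightIsland:
    """A small circular patch that spawns 'light' and slowly drifts around."""
    def __init__(self, x, y, radius=10.0, drift=0.2):
        self.x, self.y, self.radius, self.drift = x, y, radius, drift

    def step(self, world_w, world_h):
        # Slow random walk, clamped to the world borders.
        self.x = min(max(self.x + random.uniform(-self.drift, self.drift), 0.0), world_w)
        self.y = min(max(self.y + random.uniform(-self.drift, self.drift), 0.0), world_h)

    def spawn_light(self):
        # Drop a new light individual at a random point inside the patch.
        angle = random.uniform(0.0, 2.0 * math.pi)
        r = random.uniform(0.0, self.radius)
        return (self.x + r * math.cos(angle), self.y + r * math.sin(angle))
```

Islands drifting apart isolate the populations that follow them; islands drifting back together put those populations in contact again.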

niches

Now the species have to follow the light sources, but they can also meet each other and “infect” each other’s islands, competing for resources (or just to eat each other). The trees you get with this setup look like this:

snake_tree wild

Even more interesting!  So many branches! And just looking at the simulation is also loads of fun. Each one is a different story. Sometimes there is drama…

wars

I was rooting for the “smart guys” (fast and with many sensors) above, but they eventually lost the war and went extinct.

What do we take out of that? First, some of the predictions I made in previous posts were realised. The Interface Theory of Perception does allow for a variety of different worlds with their own histories. Additionally, refusing to encode species in the simulation does lead to interesting interactions, and speciation becomes an emergent property of the world. Individuals do not have a property named “species”, or a “species ID” written somewhere. Despite that, we don’t end up with a “blob of life” with individuals spread everywhere; nor do we get a “tree of life” as clean and straight as in textbooks. It’s more of a beautiful mutant broccoli of life, with blobs and branches. And this sim doesn’t even have sexual reproduction in it! That would make the broccoli even cooler.

The next step in the sim was to implement energy arrays, as I mentioned in an earlier post. I already started and then I kinda forgot. Hopefully I’ll find time to do it!

Conclusion: Did I build an OEE world? Ok, probably not. But I like it and it lived up to my expectations.

An opinion on defining life, and a theory of emergence, embodied cognition, OEE and solving causal relationships in AI

Wikimedia Commons

Here comes a really long blog post, so I have included a very brief summary at the end of each section. You can read that and go for the long version after if you think you disagree with the statements!

 

  • Defining Life: A Depressing Opinion

Deciding what is alive and what is not is an old human pastime. A topic of my lab’s research is Artificial Life, so we also tend to be interested in defining life, especially at the level of categories. What is the difference between this category of alive things and this category of not-alive things? There are lots of definitions trying to pin down the difference, none of them satisfactory enough to convince everyone so far.

One thing we talked about recently is the theory of top-down causation. When I heard it, it sounded like this: big alive things can influence small not-alive things (like you moving the molecules of a table by pushing said table away) and this is top-down causation, as opposed to bottom-up causation where small things influence big things (like the molecules of the table preventing your fingers from going through it via small scale interactions).
I’m not going to lie, it sounded stupid. When you move a table, it’s the atoms of your hands against the atoms of the table. When you decide to push the table, it’s still atoms in your brain doing their thing. Everything is just small particles meeting other small particles.

Or is it? No suspense, I still do not buy top-down causation as a definition of life. But I do think it makes sense and can be useful for Artificial Intelligence, in a framework that I am going to explain here. It will take us from my personal theory of emergence, to social rules, to manifold learning and neural networks. On the way, I will bend concepts to what fits me most, but also give you links to the original definitions.

But first, what is life?
Well here is my depressing opinion: trying to define life using science is a waste of time, because life is a subjective concept rather than a scientific one. We call “alive” things that we do not understand, and when we gain enough insight into how they work, we stop believing they’re alive. We used to think volcanoes were alive because we could not explain their behaviour. We personified as gods things we could not comprehend, like seasons and stars. Then science came along and most of these gods are dead. Nowadays science doesn’t believe that there is a clear-cut frontier between alive and not alive. Are viruses alive? Are computer viruses alive? I think that the day we produce a convincing artificial life, the last gods will die. Personally, I don’t bother too much about classifying things as alive or not; I’m more interested in questions like “can it learn? How?”, “Does it do interesting stuff?” and “What treatment is an ethical one for this particular creature?”. I’m very interested in virtual organisms — not so much in viruses. Now to the interesting stuff.

Summary: Paradoxically, Artificial Life will destroy the need to define what is “alive” and what is not.

 

  • A Theory of Emergence

Top-down causation is about scale, as its name indicates. It talks about low level (hardware in your computer, neurons in your brain, atoms in a table) and high level (computer software, ideas in your brain, objects) concepts. Here I would like to make it about dimensions, which is quite different.
Let’s call “low level” spaces with a comparatively larger number of dimensions (e.g. a 3D space like your room) and “high level” spaces with fewer dimensions (like a 2D drawing of your room). Let’s take high level spaces as projections of the low level spaces. By drawing it on a piece of paper, you’re projecting your room onto a 2D space. You can draw it from different angles, which means that several projections are possible.
But mathematical spaces can be much more abstract than that. A flock of birds can be seen as a high dimensional space where the position of each bird is represented by 3 values on 3 bird-dependent axes: bird A at 3D position a1, a2, a3; bird B at 3D position b1, b2, b3; etc. That’s 3 axes per bird, already 30 dimensions even if you have only 10 birds! But the current state of the flock can be represented by a single point in that 30D space.
You can project that space onto a smaller one, for example a 2D space where the flock’s state is represented by one point whose position is just the mean of all the birds’ vertical and horizontal positions. You can see that the trajectory of the flock will look very different depending on whether you are in the low or the high dimensional space.
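As a toy illustration, here is what that 30-dimensional flock state and one possible 2D projection of it could look like in code (the use of NumPy and the choice of the mean as the projection are just for the example):

```python
import numpy as np

n_birds = 10
# High dimensional state: one 3D position per bird -> a 30D vector.
flock_state = np.random.rand(n_birds * 3)

def project_to_2d(state):
    # One possible projection: the mean horizontal (x) and vertical (z)
    # position of the flock. Each output axis is a combination of many
    # original axes, not a simple subset of them.
    positions = state.reshape(-1, 3)        # rows: birds, columns: x, y, z
    return positions[:, 0].mean(), positions[:, 2].mean()

print(project_to_2d(flock_state))   # a single 2D point summarising the flock
```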

What does this have to do with emergence?
Wikipedia will tell you a lot about the different definitions and opinions about emergence. One thing most people agree on is that an emergent phenomenon must be surprising, and arise from interactions between parts of a system. For example, ants finding the shortest path to food is an emergent phenomenon. Each ant follows local interaction rules (explore, deposit pheromones, follow pheromones). That they can find the shortest path to a source of food, and not a random long winding way to it, is surprising. You couldn’t tell that it was going to happen just by looking at the rules. And if I told you to build an algorithm inside the ants’ heads to make them find the shortest path, that’s probably not the set of rules you would have gone for.

I think that all emergent phenomena are things that happen because of interactions in a high dimensional space, but can be described and predicted in a projection of that space. When there is no emergence, no projection is ever going to give a good description and predictability to your phenomenon. Food, pheromones, and each individual ant are all parts of the original high dimensional system. Each has states that can be represented on axes: ants move, pheromones decay. Ants interact with each other, which means that their states are not independent from each other. The space can be projected onto a smaller one, where a description with strong predictive power can be found: ants find the shortest path to food. All ants and even the food have been smashed into a single axis. Their path is optimal or not. They might start far from 0, the optimal path, but you can predict that they will end up at 0. If you have two food sources, you can have two axes; the ants’ closeness to the optimal solution depends on their position on these axes. Another example is insect architecture: termite mounds, bee hives. The original system includes interactions between particles of building material and individual insects, but can be described in much simpler terms: bees build hexagonal cells. Or take the flock of birds. Let’s say that the flock forms a big swarming ball that follows the flow of hot winds. The 2D projection is a more efficient way to predict the flock’s trajectory than the 30D space, or than following a single bird’s motion (a combination of its trajectory inside the swarm-ball and on the hot winds). Of course, depending on what you want to describe, the projection will have to be different.

Here we meet one important rule: if there is emergence, the projection must not be trivial. The axes must not be a simple subset of the original space, but a combination (linear or not) of the axes of the high dimensional space.
This is where the element of “surprise” comes in. This is a rather nice improvement on all the definitions of emergence I’ve found: they all talk about “surprise” but most do not define objectively what counts as surprising. The rule above is a more practical definition than “an emergent property is not a property of any part of the system, but still a feature of the system as a whole” (Wikipedia).

A second, implicit rule follows: trajectories (emergent properties) built in low dimensional spaces cannot be ported to high dimensional spaces without modification.
You could try to build “stay in the swarm but also follow hot winds” into the head of each bird. You could try to build “find the shortest path” into the head of each ant. It makes sense: that is the simplest description of what you observe. The problems start when you try to build that with what you have in the real, high dimensional world. Each ant has few sensors. They cannot see far away. Implementing a high level algorithm rather than local interactions may sometimes work, but it is not the easiest, most robust or most energy-efficient solution. If you are building rules that depend explicitly on building high level concepts from low level, high dimensional input, you are probably not on the right track. You don’t actually need to implement a concept of “shortest path” or “swarm” to achieve these tasks; research shows that you obtain much better results by giving these up. This is a well-known problem in AI: complex high level algorithms do very poorly in the real world. They are slow, noise-sensitive and costly.

However, I do not agree that emergent phenomena “must be impossible to infer from the individual parts and interactions”, as anti-reductionists say. Those that fit in the framework I have described so far can theoretically be inferred, if you know how the high dimensional space was projected onto the low dimensional one. Therefore you can generate emergent phenomena by drawing a trajectory in the low dimensional space, and trying to “un-project” it to the high dimensional space. By definition you will have several possible choices, but that should not be too big a problem. I intuitively think this generative approach works and I tried it on a few abstract examples; but I need a solid mathematical example to prove it. Nevertheless, if emergent phenomena don’t always have to be a surprise and can be engineered using tools other than insight, it’s excellent news!
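For a linear projection, one way to sketch this “un-projection” is the pseudo-inverse: it gives one of the infinitely many pre-images of a low dimensional trajectory (the other choices differ by vectors in the projection’s null space). This is only a toy NumPy illustration of the idea, not a proof:

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.standard_normal((2, 30))      # projection: 30D (low level) -> 2D (high level)

# Draw a desired trajectory directly in the low dimensional space...
low_dim_trajectory = np.stack(
    [np.linspace(0, 1, 50), np.sin(np.linspace(0, 6, 50))], axis=1)

# ...and lift it back: x = pinv(P) @ y is one possible pre-image per point.
lifted = low_dim_trajectory @ np.linalg.pinv(P).T     # shape (50, 30)

# Sanity check: projecting the lifted trajectory recovers the low-dim one.
assert np.allclose(lifted @ P.T, low_dim_trajectory)
```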

Summary: an emergent phenomenon is a simplification (projection to a low dimensional space) of underlying complex (high dimensional) interactions that allows you to predict something about the system faster than by going through all the underlying interactions, whatever the input.

Random thought that doesn’t fit in this post: if there were emergent properties of the system “every possible algorithm”, the halting problem would be solvable. There are none, so we actually have to run all algorithms with all possible inputs (see also Rice’s theorem).

 

  • Top-Down Causation and Embodied Cognition

So far we’ve discussed interactions happening between elements of the low level, high dimensional system, or between elements of the high level, low dimensional system. It is obvious that what happens down there influences what happens up here. Can what happens upstairs influence what goes on downstairs?
Despite my skepticism about what you can read about top-down causation here and there, I think the answer is yes, in a very pragmatic way: if elements of the high dimensional space take inputs directly from the low dimensional space, top-down causation happens. Until now the low-dim spaces were rather abstract, but they can exist in a very concrete way.

Take sensors, for example. Sensors reduce the complexity of the real world in two ways:
– by being imprecise (eyes don’t see individual atoms but their macro properties)
– by mixing different inputs into the same kind of output (although the mechanisms are different, you feel “cold” both when rubbing mint oil and when rubbing an ice cube on your skin)
This reduction of dimensions is performed before giving your brain the resulting output. Your sensors give you input from a world that is already a non-trivial projection of a richer world. Although your actions impact the entire, rich, high dimensional world, you only perceive the consequences of it through the projection of that world. Your entire behaviour is based on sensory inputs, so yes, there is some top-down causation going on. You should not take it as a definition of living organisms though: machines have sensors too. “Sensor” is actually a pretty subjective word anyway.
You might not be convinced that this is top-down causation. Maybe it sounds too pragmatic and real to be top-down causation and not just “down-down” causation.

So what about this example: social rules. Take a group of humans. Someone decides that when meeting a person you know, bowing is the polite thing to do. Soon everyone is doing it: it has become a social rule. Even when the first person to launch this rule has disappeared from that society, the rule might continue to be applied. It exists in a world that is different from the high dimensional world formed by all the people in that society, a world that is created by them but has some independence from each of them. I can take a person out and replace them by somebody from a different culture — soon, they too will be bowing. But if I take everyone out and replace them by different people, there is no chance that suddenly everyone will start bowing. The rule exists in a low dimensional world that is inside the head of each member of the society. In that projection, each person that you have seen bowing in your life is mashed up into an abstract concept of “person” and used in a rule saying “persons bow to each other”. This low dimensional rule directs your behaviour in a high dimensional world where each person exists as a unique element. It’s bottom-up causation (you see lots of people bowing and deduce a rule) that took its independence (it exists even if the original person who decided the rule dies, and whether or not some rude person decides not to bow) and now acts as top-down causation (you obey the rule and bow to other people). When you bow, you do not do it because you remember observing Mike bow to John and Ann bow to Peter and Jake bow to Mary. You do it because you remember that “people bow to other people”. It is definitely a low dimensional, general rule dictating your interactions as a unique individual.

We have seen two types of top-down causation. There is a third one, quite close to number one. It’s called embodied cognition.
Embodied cognition is the idea that some of the processing necessary to achieve a task is delegated to the body of an agent instead of putting all the processing load on its brain. It is the idea that, through interactions with the environment, the body influences cognition, most often by simplifying it.
My favourite example is the Swiss robot. I can’t find a link to the seminal experiment, but it’s a small wheeled robot that collects objects into clusters. This robot obeys fixed rules, but the result of its interaction with the environment depends on the placement of its “eyes” on the body of the robot. With eyes on the side of its body, the robot “cleans up” its environment. For other placements, this emergent behaviour does not appear and the robot randomly moves objects around.
Although the high level description of the system does not change (a robot in an arena with obstacles), changes in the interactions of the parts of the system (position of the eyes on the body) change the explanatory power of the projection. In one case, the robot cleans up. In the others, the behaviour defies prediction in a low dimensional space (no emergence). Here top-down causation works from the body of the robot to its interactions in a higher dimensional world. The low dimensional description is “a robot and some obstacles”. The robot is made of physically constrained parts: its body. This body is what defines whether the robot can clean up or not — not the nature of each part, but how they are assembled. For the same local rules, interactions between the robot and the obstacles depend on the high level concept of “body”, not only on each separate part. The Swiss robot is embodied cognition by engineered emergence.
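To give a rough idea of how simple the fixed rule can be, here is a sketch of a two-wheeled avoidance controller (my own reconstruction for illustration, not the original experiment’s code). Nothing in the rule mentions clusters; whether clustering emerges depends on where the sensors are mounted on the body, which lives entirely outside this function:

```python
def wheel_speeds(left_sensor: bool, right_sensor: bool, base=1.0, turn=0.6):
    """Fixed local rule of a two-wheeled robot: turn away from whatever a
    proximity sensor detects. Sensor placement (pointing straight ahead vs.
    angled to the sides) is a property of the body, not of this rule, yet it
    decides whether object-clustering emerges or not."""
    left_wheel, right_wheel = base, base
    if left_sensor:            # obstacle seen on the left -> slow the right wheel, turn right
        right_wheel -= turn
    if right_sensor:           # obstacle seen on the right -> slow the left wheel, turn left
        left_wheel -= turn
    return left_wheel, right_wheel
```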

In all the systems with top-down causation I have described so far, only one class falls into the anti-reductionist framework: those where the state of the high level space is historically dependent on the low level space. These are systems where states in the low dimensional world depend not only on the current state of the high dimensional one, but also on its past states. If, on top of that, the high dimensional world takes input from the low dimensional one (for example because directly taking high dimensional inputs is too costly), then the system’s behaviour cannot be described only by looking at the interactions in the high dimensional world.
Simple example: some social rules depend not only on what people are doing now, but on what they were doing in the past. You wouldn’t be able to explain why people still obey archaic social rules just by looking at their present state, and these rules did not survive by recording all instances of people obeying them (high dimensional input in the past), but by being compressed into lower dimensional spaces and passed on from person to person in this more digestible form.
This top-down causation with time delay cannot be understood without acknowledging the high level, low dimensional world. It is real, even if it only exists in people’s head. That low dimensional world is where the influence of the past high dimensional world persists even if it has stopped in the present high dimensional world. Maybe people’s behaviour cannot be reduced to only physical laws after all… But there is still no magic in that, and we are not getting “something from nothing” (a pitfall with top-down causation).

A counter-argument to this could be that everything is still governed by physical laws, both people and people’s brains, and that lateral causation at the lowest level, between elementary particles, can totally be enough to explain the persistence of archaic social rules, and therefore top-down causation does not need to exist.
I agree. But as soon as you are not looking at the lowest level possible, highest dimensional world (which currently cannot even be defined), top-down causation does happen. Since I am not trying to define life, this is fine with me!

Summary: Top-down causation exists when the “down” takes low dimensional input from the “top”. The key here is the difference in dimensions of the two spaces, not a perceived difference of “scale” as in the original concept of top-down causation. Maybe I should call it low-high causation?

 

  • Open Ended Systems

In this section I go back to my pet subject, Open Ended Evolution and the Interface Theory of Perception. You probably saw it coming when I talked about imprecise sensors. I define the relationship between OEE and top-down causation as: an open-ended system is one where components take inputs from changing projected spaces. It’s top-down causation in a billion flavors.
These changes in projections are driven by interactions between the evolution of sensors and the evolution of high dimensional outputs from individuals.

Two types of projections can make these worlds interesting:
1. sensory projection (see previous section)
2. internal projections (in other words, brains).

The theoretical diversity of projections n.1 depends on the richness of the real world. How many types of energy exist, can be sensed, mixed and used as heuristics for causation?
N.2 depends on n.1 of course (with many types of sensors you can try many types of projections), but also on memory span and capacity (you can use past sensor values as components of your projections). Here, neurons are essentially the same as sensors: they build projections, as we will see in the next section. The main difference is that neurons can be plastic: the projections they build can change during your lifetime to improve your survival (typically, changes in sensors decrease your chances of survival…).
As usual, I think that the secret ingredient to successful OEE is not an infinitely rich physical world, even if it helps… Rather, the richness of choice of projected spaces (interfaces) is important.

 

  • Neural Networks

I will not go into great detail in this section because it is kind of technical and it would take forever to explain everything. Let’s just set the atmosphere.
I was quite shocked the other day to discover that layered neural networks are actually the same thing as space projection. It’s so obvious that I’m not sure why I wasn’t aware of it. You can represent a neural network as a matrix of weights, and if the model is a linear one, calculate the output for any input by multiplying the input by the weight matrix (matrix multiplication is equivalent to space projection).
The weight matrix is therefore quite important: it determines what kind of projection you will be doing. But research has shown that when you are trying to apply learning algorithms to high dimensional inputs, even a random weight matrix improves the learning results, as long as it reduces the dimension of the input (you then apply learning to the low dimensional input).
Of course, you get even better results by optimizing the weight matrix. But then you have to learn the weight matrix first, and only then apply learning to the low dimensional inputs. That is why manifold learning has been invented, it seems. It finds you a good projection instead of using random stuff. Then you can try to use that projection to perform tasks like clustering and classification.
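A minimal NumPy sketch of that point: a linear layer is just a projection, and even a random one reduces dimension. The sizes (784 in, 32 out) and the scaling are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 784))                  # 1000 inputs of dimension 784
W = rng.standard_normal((784, 32)) / np.sqrt(784)     # random, untrained "layer"

H = X @ W   # one linear layer = one matrix multiplication = one projection
            # (here: from a 784D space down to a 32D space)

# Any subsequent learning (classification, clustering, ...) is then done on H,
# the low dimensional projection, instead of on the raw 784D input.
# Optimizing W (by training, or by manifold learning) just swaps this random
# projection for a better one.
```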

What would be interesting is to apply that to behavioural tasks (not just disembodied tasks) and find an equivalent for spiking networks. One possible way towards that is prediction tasks.

Say you start with a random weight matrix. Your goal is to learn to predict what projected input will come after the current one. For that, you can change your projection: two inputs that you thought were equivalent because they got projected to the same point, but ended up having different trajectories, were probably not equivalent to begin with. So you change your projection so as to have these two inputs projected to different places.
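Here is one crude way such an update could be sketched (a rule I made up for illustration, not a standard algorithm): when two inputs land close together in the projection but their successors land far apart, nudge the weight matrix to push their projections apart.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((2, 50)) * 0.1        # projection: 50D input -> 2D

def update(W, x_a, x_b, next_a, next_b, lr=0.01, eps=0.5):
    """If x_a and x_b project to (almost) the same point but their successors
    do not, treat them as wrongly merged and separate their projections."""
    pa, pb = W @ x_a, W @ x_b
    close_now = np.linalg.norm(pa - pb) < eps
    diverge_next = np.linalg.norm(W @ next_a - W @ next_b) > eps
    if close_now and diverge_next:
        # Gradient ascent on ||W (x_a - x_b)||^2: pushes the two projections apart.
        W = W + lr * np.outer(pa - pb, x_a - x_b)
    return W
```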

From this example we can see several things:
– Horror! Layers will become important. Some inputs might require one type of projection, and others a different type. This will be more easily implemented if we have layers (see here).
– A map of the learned predictions (= input trajectories) will be necessary at each layer to manage weight updates. This map can take the form of a Self Organising Map or, more complex, a (possibly probabilistic) vector field where each vector points to the next predicted position. There are as many axes as neurons in the upper layer, and axes have as many values as neurons have possible values. (This vector field can actually itself be represented by a layer of neurons, with each lateral connection representing a probabilistic prediction.) Errors in the prediction drive changes in the input weights to the corresponding layer (would this be closer to Sanger’s rule than to BackProp?). Hypothesis: the time dependence of this makes it possible to implement with Hebbian learning.
– Memory neurons with defined memory span can improve prediction for lots of tasks, by adding a dimension to the input space. It can be implemented simply with LSTM in non-spiking models of NN, or with axonal delays for spiking models.
– Layers can be dynamically managed by adding connections when necessary or, more reasonably, deleting redundant ones (neurons that have the exact same weights as a colleague can replace said weights by a connection to that colleague)
– Dynamic layer management will turn a feedforward network into a recurrent one, and the topology of the network will transcend layers (some neurons will develop connections bypassing the layer above to go directly to the next). The only remnant of the initial concept of layers will be the prediction vector map.
– Memory neurons in upper layers with connections towards lower layers will be an efficient tool to reduce the total number of connections and computational cost (See previous sections. Yes, top-down causation just appeared all of a sudden).
– Dynamic layer management will optimise resource consumption but make learning new things more difficult with time.
– To make the difference between a predicted input and that input actually happening, a priming mechanism will be necessary. I believe only spiking neurons can do that, by raising the baseline level of the membrane voltage of the neurons coding for the predicted input.
– Behaviour can be generated (1), as the vector field map tells us where data is missing or ambiguous and needs to be collected.
– Behaviour can be generated (2), because if we can predict some things we can have anticipatory behaviour (run to avoid an incoming electric shock)

Clustering and classification are useless in the real world if they are not used to guide behaviour. Actually, the concept of class is relative to the behaviour it supports; here we take “prediction” as a behaviour, but the properties you are trying to describe or predict depend on which behaviour you are trying to optimise. The accuracy or form of the prediction depends on your own experience history, and the input you get to build predictions from depends on your sensors… Here comes embodied cognition again.

Summary: predictability and understanding are the same thing. Predictability gives meaning to things and thus allows us to form classes. The difference between deep learning and fully recurrent networks is top-down causation.

 

  • Going further

It might be tempting to generalise top-down causation. Maybe projecting to lower dimensions is not that important? Maybe projecting to different spaces with the same number of dimensions, or projecting to higher dimensional spaces, enhances predictability. After all, top-down projection in layered networks is equivalent to projecting low dimensional input to a higher dimensional space (see also auto-encoders and predictive recurrent neural networks). But if our goal is to make predictions in the least costly way possible (possibly at the cost of accuracy), then of course projection to lower dimensional spaces is a necessity. When predictions must be converted back to behaviour, projection to high dimensional spaces becomes necessary; but in terms of memory storage and learning optimisation, the lowest dimensional layer is the one that allows reduction of storage space and computational cost (an interesting question here is, what happens to memory storage when the concept of layer is destroyed?).

One possible exception would be time (memory). If knowing the previous values of an input improves predictability and you add neuron(s) to keep past input(s) as memory, you are effectively increasing the number of dimensions of your original space. But you use this memory for prediction (i.e. low dimensional projection) in the end. So yes, the key concept is dimension reduction.

A nice feature of this theory is that models would be able to capture causal relationships (something that deep learning cannot do, I heard). This whole post was about a concept called “top-down causation” after all. If an input improves predictability of trajectories in the upper layer, that certainly suggests a causal relationship. So what the model is really doing all along is learning causal relationships.

Summary: dimension reduction is the key to learning causal relationships.
Wow, that was a long post! If you have come this far, thank you for your attention and don’t hesitate to leave a thought in the comments.

To: Google Scholar’s Dad — Data-driven science hypotheses

A week ago I sent an email to Anurag Acharya, the man behind “Google Scholar”. Scholar is a search engine that lets you browse scientific papers, a specialized version of Google. You can use it for free (although accessing the papers is often not free).

scholar

Scholar is an extraordinary tool. It does something that nobody else can right now. Which is why I think they can solve a weird issue of science: the way researchers come up with hypotheses is anything but scientific. It relies on the same methods that your grandma’s grandma used to cure a cold: tradition and gut feeling.

I believe that the generation of scientific hypotheses must be data-driven, just as science itself is. Here is what I wrote in my proposal (original PDF here). There was no answer, unsurprisingly: I can’t imagine what the timetable of someone like Anurag Acharya looks like. But I put this here in the hope that someone finds it worth debating.


MAKING SCIENCE SCIENCIER

Introduction

By definition, science follows the scientific process. Hypotheses are adopted or discarded based on objective analysis of data. But surprisingly, the process of generating hypotheses itself is hardly scientific: it relies on hunches and intuition.

It often goes like this: a researcher gets an idea from reading a colleague’s paper or listening to a talk. Literature from the field is reviewed, which allows for refinement of the original idea. Then it is time for designing experiments, analysing data and writing a paper. If the researcher is actually a student, things can be more complicated. But in all cases, the original hypothesis relies tremendously on the researcher’s own subjective collection and appreciation of information, which must be selected from the gigantic amount of existing scientific papers.

Clearly, the fact that we now have access to all this scientific information is a giant leap from the situation of a few decades ago; and it has been made possible single-handedly by Google Scholar. But it is also a fact that researchers everywhere have more and more data to look at, and that “to look at” too often becomes “to subjectively pick from”.

Hypothesis generation is the basis of science – arguably the most crucial and exciting part of actually doing science. Yet it is not based on anything scientific. This document summarises 3 proposals to make hypothesis generation a data-driven process. I believe this is not restricting the creativity of scientists, but enhancing it; that it can make science more efficient and limit the waste of time and resources caused by irrelevant, biased, or outdated hypotheses – especially for graduate students. Not only does this respect the philosophy of Google and more specifically Google Scholar, but Google Scholar is currently the only organization that has the resources to make it happen. Here are my 3 proposals, from the easiest to implement to the most hypothetical.

1. Paper Networks

Going through the several dozen references at the end of a paper is far from optimal: the reason why a paper is cited and the paper itself are not physically close; the authors tend to unconsciously cite papers that support their view; the place of the papers in the field and their relationship to each other are virtually inaccessible.

Numerous services suggest papers supposed to be close to the one you have just read, but this is not enough. We need, at a glance, to know which papers support each other’s views and which support conflicting opinions, and we need to know how many there are. A visual map, a graph of networks of papers or of clusters of papers could be the ideal tool to reach this goal. The benefits would go beyond simple graphical structuring of the information:

• Reducing confirmation bias. When we look for papers simply by inputting keywords in Google Scholar, the keyword choice itself tends to be biased. A Paper Network would make supporting and opposing papers equally accessible.

• Promoting interdisciplinarity. It’s easy to say that interdisciplinary approaches are good. It’s better to actually have the tools to make it happen. A Paper Network would make it clear which approaches are related in different fields.

• Sparking inspiration. Standard search methods tell us what is there. But science is about bringing forth what is not yet here. A Paper Network would show existing papers in different fields, helping us to avoid re-doing what has already been done. More importantly, it would make it visually clear where the gaps are, where some zones are still blank, and what may be needed to fill them.

2. Burst Detection

Artificial Intelligence, my field, has known several “winters” and “summers”: periods when it seemed like all had already been done and the field fell into hibernation, and periods when suddenly everyone seemed to do AI (now is such a period). I suspect that other fields know these brisk oscillations as well: several teams announcing the same big discovery in parallel, or a rapid succession of findings that leads to a revival of the field, or even spawns new specialised fields.

These bursts are most likely not completely random. If we could predict, even very roughly, when which field will boom, we could prepare for it, invest in it and maybe even make it happen faster. What are the factors influencing winters and summers? How many steps in advance can we predict? How many more Moore’s Laws are waiting to be discovered? Being able to predict winters would also be an asset, because we could look for the deeper causes that force science to slow down and try to prevent them. Is it the lack of funds? Relying too much on major paradigms? Only analysing data from the past can transform hunches into successful policies for the advance of science.

3. Half Life of Facts

The destiny of scientific facts is to be overturned – it is the proof that science works. Better tools, better theories: these are obvious first-level parameters influencing the shelf life of scientific papers. But we need to go deeper and look for meta-parameters: properties that allow us to predict this shelf life, and identify which papers, which parts of a theory, are statistically more likely to be busted.

As anyone who has witnessed a heated scientific debate can testify, right now, the leading cause for accepting a non-trivial theory or choosing to challenge it is the researcher’s own “common sense”; yet all science is about is rejecting common sense as an explanation for anything and looking for facts in hard data. In these conditions, how can we continue to rely on gut feeling to justify our opinions? We need sounder foundations for our beliefs, even if in the absence of experimental verification they are just that: beliefs.

If a specific part of a theory looks perfectly sound but is statistically close to death, we must start looking at its opponents, or even better, think about what a good opponent theory would look like and choose research topics accordingly.

4. Conclusion

These proposals could change the way we, researchers, do science. They also come with a flurry of ethical issues: new tools would change the way resources (financial and human) are distributed, with desirable and undesirable outcomes. Just like prenatal genetic screening leads to difficult ethical questions, building tools that allow the ranking of research projects should be a very careful enterprise.

But here is the catch: unlike genetic screening, new research tools have an objective component to them. These 3 proposals are about bringing more science to science: allowing the generation of science seeds to be data-driven. Science changes the world, every day. Any tiny improvement to the scientific process is worth striving for – and these 3 changes would, I believe, bring major improvements.

Stories #4: The Sacred Bonds of Marriage

Here comes the 4th episode of my series of translated old travel posts.


September 9th, 2011

The hotel’s clerk is 24 years old.

A bit like my first Japanese friend did, she started by talking to me for hours every evening, while totally aware that I could only understand a fraction of what she told me. When I finally picked up enough vocabulary and got used to her accent (and her to mine), we started having simple exchanges. At last, sitting with her behind the reception desk for 2 hours at the same time every day, I became part of the background just like the plants and the cat.

I feel somewhat like a pet. Clients make me speak a little (“Oh wow, she talks? Where did you find her? What’s her name? Come on, say something!”). I keep an eye on the lobby when nobody is here to do it, and like a good pet, I do tricks for treats. “Look, I brought you a watermelon!* Come on, say something in French for us!”

Like the majority of this city’s inhabitants, the clerk is not from here. Yes, people in Shenzhen come from everywhere in China, looking for a better life and social status. They are factory workers, salespersons, engineers; their parents are moto-taxi drivers, mom and pop restaurant owners.
Here more than elsewhere people ask about each other’s province of origin. “Do you eat noodles?” one inquires to make sure that you come from the North. If you have a Southern accent, people will ask instead: “Do you eat rice?” — Southerners eat rice-based products, and Northerners wheat-based products, as everyone knows.
“Your Mandarin is beautiful. You come from the North, don’t you?”
“You add chili peppers to everything you eat. A Southerner, eh?”

To these questions, one answers simply: “Yes, we people from Hunan like our food spicy” or “Yes, I come from Hubei, we speak Mandarin almost without an accent.”

The clerk arrived here 3 months ago. She ran away from home, fleeing an arranged marriage.
Arranged marriages are not necessarily forced marriages: her parents know where she is, they call her on the phone sometimes, but they did not tell her ex-future husband. They wanted to find her a husband because at 24, she should already have started her own family. I think she ran away because she was heartbroken.
Here is what I understood of her story, with my unreliable Chinese skills:

She used to have a boyfriend.
A rather well-off, older man, not really handsome. But he was nice and sweet to her, always there when she needed help. And so she fell in love. It was not for his money that she loved him: she is young and pretty, and because of the current gender imbalance in China she had her share of admirers. Here, girls don’t really have to torture themselves into ideal beauties. It is the law of supply and demand: there will not be enough wives for everyone and everyone knows it.
It is guys who have to look their best, spend hours at the hairdresser, wear colorful accessories, be unique and remarkable. Just like in Japan, it is in the working class that you will find the most extravagant fashion fads.

One day, the young clerk’s parents told her about their plan to marry her to a man she did not know. On hearing the news, she probably met with her rich lover to convince him to marry her.
It was killing two birds with one stone: reassuring her parents about her future and avoiding the arranged marriage. But her lover was not the angel he appeared to be: as it turned out, he was married and even had children. Her romantic dreams evaporated and her heart broken, with no one left to save her from her unknown future husband, she fled and arrived here.

“You know, men often offer to buy me presents, to treat me to dinner. Lots of women get seduced like that. But not me. I’m a good girl, not someone you can buy, I don’t want to benefit from strangers’ money.

You know, my lover, he said he wanted to marry me. He said he would divorce his wife, abandon his children and save me from the arranged marriage. But I’m not that kind of person. I won’t steal the husband of a pregnant woman who doesn’t even have a job, and has little kids, can you imagine! Now he wants to marry me, but later he’ll find an even younger woman, and I too will end up alone with my kids. But I love him, what should I do?

– You are right, but I don’t know what to say.

– Of course you don’t know! You are but a child.”

Seen from here, you are not much less of a child than I am.

* Watermelon does not have racist connotations for most people in the world, except in the US. I did not even know about that US watermelon-thing at the time… I did not think anything special about the watermelon, except maybe that an entire watermelon was a big present for just a few words of French.


Looking back on this story, I am even more speechless. I wonder what happened to the young, heartbroken girl.
I praise her moral strength and sympathize with her distress and confusion.
“Time heals everything”, hopefully broken hearts too.

Anti-Layer Manifesto

Disclaimer: Deep Learning is giving the best results in pretty much all areas of machine learning right now, and it’s much easier to train a deep network than to train a shallow network. But it has also been shown that in several cases, a shallow network can match a deep one in precision and number of parameters, while requiring less computational power. In this post I leave aside the problem of training/learning.

This started with me buying several books and trying to read all of them at the same time. Strangely, they all treated the same topics in their opening chapters (the unconscious, perception, natural selection, human behaviour, neurobiology…), and all disagreed with each other. Also, none of them talked about layers, but somehow my conclusion from these books is that we might want to give up layered design in AI.

Layers?

So today I’m going to talk about layers, and how and why we might want to give them up. The “layers” I’m talking about here are like the ones used in Deep Learning (DL). I’m not going to explain the whole DL thing here, but in short it’s an approach to machine learning where you have several layers of “neurons” transmitting data to each other. Each layer takes data from the previous one and does operations on it to reach a “higher level of abstraction”. For example, imagine that Layer 1 is an image made of colored pixels, and Layer 2 is doing operations on these pixels to detect edges of objects in the image. So each neuron is holding a value (or a group of values).

Typically, neurons in the same layer don’t talk to each other (no lateral connections); they only perform operations on data coming from the previous layer and send it to the next layer (bottom-up connections). What would happen if there were lateral connections is that your layer would stop representing “something calculated from below”. Instead it would hold data made up of lower abstraction and current abstraction data mixed together. As if instead of calculating edges from pixels, you calculated edges from pixels and from other edges. Layers also usually don’t have top-down connections (equivalent to deciding the color of a pixel based on the result of your edge calculation). These properties are shared by many processing architectures, not only DL. I’m not focusing on DL particularly, but rather trying to find what we might be missing by using layers – and what might be used by real brains.
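In code, the three kinds of connections are easy to contrast. This NumPy fragment is only meant to show the different update rules; the matrices and sizes are placeholders of mine:

```python
import numpy as np

relu = lambda v: np.maximum(v, 0)
rng = np.random.default_rng(3)
x = rng.standard_normal(100)                      # "pixel" layer
W_up = rng.standard_normal((20, 100)) * 0.1       # bottom-up weights
L = rng.standard_normal((20, 20)) * 0.1           # lateral weights
T = rng.standard_normal((100, 20)) * 0.1          # top-down weights

# Pure feedforward (DL-style) layer: computed only from the layer below.
edges = relu(W_up @ x)

# With lateral connections: the layer also depends on its own previous state,
# i.e. "edges" are computed from pixels AND from other edges.
edges = relu(W_up @ x + L @ edges)

# With top-down connections: the lower layer is itself modified from above,
# like re-deciding pixel values based on the edge detection result.
x = x + T @ edges
```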

Example of layering – Feedforward neural network. “Artificial neural network” by en:User:Cburnett – Wikimedia Commons

Layers are good for human designers: you know what level of data is calculated where, or at least you can try to guess it. We also talk about the human cortex in terms of layers – but these are very different from DL layers, even from a high level point of view. Neurons in the human brain have lateral and top-down connections.

DL-like layers are a convenient architecture. They keep the levels of abstraction separated from each other – your original pixel data is not modified by your edge detection method. Your edge detection is not modified by object detection. But… why would you want to keep your original data unmodified in the first place? Because you might want to use it for something else? Say that you’re playing a “find the differences” game on two pictures. You don’t want to modify the model picture while looking for the 1st difference; you want to keep the model intact, find difference 1, then use the model again to find difference 2, etc.

But… for example, if you could look for all the differences in parallel, you wouldn’t care about modifying the images. And if what is being modified is a “layer” of neurons inside your head, you really shouldn’t care about it being modified; after all, the model image is still there on the table, unmodified.

The assumptions behind layers

Let’s analyse that sentence: “you might want to use it for something else.”

It: it is the original unmodified data. Or rather, it is the data that will be transmitted to the next layer. That’s not trivial. How do you decide what data should be transmitted? Should you try to find edges and then send that to another layer? Or is it OK to find edges and objects at the same place and then send that to a different layer? It all depends on the “something else”.

Something else: If you can calculate everything in a messy bundle of neurons and go directly from perception to action in a single breath, you probably should. Especially if there is no learning needed. But when you have a whole range of behaviors depending on data from the same sensor (eyes for example), you might want to “stop” the processing somewhere to preserve the data from modification and send these results to several different places. You might send the edge detection results to both a sentence-reading module and a face detection module. In that case you want to keep your edge detection clean and unmodified in order to send the results to the different modules.

Might: But actually, you don’t always want to do that. If there are different behaviors using data from the same sensors but relying on different cues, you don’t need to preserve the original data. Just send what your sensor senses to the different modules; each one modifying its own data should not cause any problem. Even if your modules use the same cues but in different ways, sending each one a copy of the data and letting it modify it can be OK. Especially if your modules need to function fast and in parallel. Let’s say that module 1 needs to do some contrast detection in the middle of your field of vision (for face detection maybe). Module 2 needs to do contrast detection everywhere in your field of vision (obstacle detection?). If we make the (sometimes true in computers) assumption that contrast detection takes more time for a big field than a small one, it will be faster (but more costly) for module 1 to do its own contrast calculation on partial data instead of waiting for the results calculated in module 2.
Did you know that if the main vision center of your brain is destroyed, you will still be able to unconsciously detect the emotions in human faces… while being blind? You will also be able to avoid obstacles when walking. The parts of your brain for conscious vision, face recognition and obstacle detection are located at different places, and function semi-independently. My hypothesis is that these 3 functions rely on different use of the same cues and need to be running fast, therefore in parallel.

If not layers then what?

I would go for modules – so-called “shallow networks”. A network of shallow networks. And I suspect that this is also what happens in the brain, although that discussion will require a completely different blog post.

First, I think that the division into layers or modules needs to be less arbitrary. Yes, it is easy to use for human designers. But it can also cost performance. I can see some advantages in using messy shallow networks. For one, neurons (data) of the same level of abstraction can directly influence each other. I think it’s great to perform simplifications. If you need to do edge detection, you can just try to inhibit (erase) anything that’s not an edge, right there in the “pixel” layer. You don’t need to send all that non-edge data to the next module – after all, very likely, most of the data is actually not edges. If you actually send all the data to be analyzed (combined, added, subtracted…) in an upper layer, you also need more connections.

Furthermore, it seems justified to calculate edges also from other edges and not just from pixels. Edges are typically continuous both in space and time: using this knowledge might help to calculate edges faster from results that are already available about both pixels and edges than if you just update your “edge layer” after having completely updated your “pixel layer”.

Ideally we should only separate modules when there is a need to do so – not because the human designer has a headache, but because the behavior of the system requires so. If the output of the module is required as is for functionally different behaviors, then division is justified.

I would also allow top-down connections between modules. Yes, it means that your module’s output is modified by the next module, representing a higher level of abstraction. It means that you are “constructing” your low level input from a higher level output. Like deciding the color of pixels based on the result of edge detection… I think it can be justified: sometimes it is faster and more economical to construct a perception than to just calculate it (vast subject…); sometimes accurate calculation is just not possible and construction is necessary. Furthermore, if a constructed perception guides your behavior so as to make it more successful, then it will stick around thanks to natural selection. I also think that it happens in your brain (just think about that color illusion where two squares look like different colors just because of the surrounding context, like shadows).

Concluding words

Lots of unsubstantiated claims in this blog post! As usual. If I could “substantiate”, I’d write papers! But I really think it’s worth considering: are layers the best option, and if not, why not? This thought actually came from considerations about whether or not we construct our perceptions – my conclusion was yes, constructed perceptions have many advantages (more on that later… maybe?). But what kind of architecture allows us to construct perceptions? The answer: not layers.

Energy arrays, ITP and OEE

An antlion larva – Wikimedia Commons

Here is a follow-up to my last 2 posts about Open Ended Evolution. This time I would like to talk about energy arrays as a solution to 2 issues of the simulation:

  1. The environment’s map cannot be modified
  2. The individual agents move at random – they cannot decide where to go.

Introducing energy arrays is also a nice way to generalize the interface (as in Interface Theory of Perception) of the agents. It allows an agent to potentially detect numerous actions with only a few sensors. It goes like this:

Imagine that there are different ways for an agent to emit energy in the simulation. By emitting light, sound, heat, smells, other kinds of vibrations… it does not matter what we call them; what is important is that these forms of energy have different values for the following properties: speed of transmission (how fast it goes from here to there), inverse-square law (how fast the intensity decreases with distance) and dissipation (how long it takes to disappear from a specific point). In reality these values are linked, but in a simulation we don’t need to follow the laws of physics, so we just pick the values ourselves.

Everything an agent does (moving, eating, mating, dying or just standing there) emits energy in different forms. For example, say you have 3 forms of energy and represent them with an array [w1,w2,w3]. Each cell of the map (environment) has such an array. A specific individual doing a specific action in a cell will add a set of values to the cell’s energy array, which will then propagate to the neighboring cells according to the properties of each form of energy. For example, a lion eating might emit a lot of sound, a bit of heat and a strong smell: [+5,+1,+3]. These values are decided by the genes of the individual, so each “species” will have different values for each action. And if you remember, the concept of “species” is just an emergent property of the simulation, so really each individual of a species might have slightly different values for the array of each action.
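Here is a minimal sketch of how I picture this machinery in Python: one small “energy” class with the 3 properties, 3 instances, and a grid where every cell carries an energy array. The numbers, the names and the very naive propagation rule are made up for illustration; in particular, the toy step() below ignores the speed of transmission.

```python
import numpy as np

class Energy:
    def __init__(self, name, transmission_speed, attenuation, dissipation):
        self.name = name
        self.transmission_speed = transmission_speed  # how fast it travels (cells per step)
        self.attenuation = attenuation                # fraction lost per cell of distance
        self.dissipation = dissipation                # fraction lost per step at a given point

SOUND = Energy("sound", transmission_speed=3, attenuation=0.5, dissipation=0.8)
HEAT  = Energy("heat",  transmission_speed=1, attenuation=0.7, dissipation=0.3)
SMELL = Energy("smell", transmission_speed=1, attenuation=0.2, dissipation=0.1)
FORMS = [SOUND, HEAT, SMELL]

# One energy array [w1, w2, w3] per cell of a 50x50 map.
energy_grid = np.zeros((50, 50, len(FORMS)))

def emit(grid, x, y, signature):
    """An action adds its genetically-determined signature to the local cell."""
    grid[x, y] += signature

def step(grid):
    """Naive propagation: each form fades in place and leaks to the 4 neighbours
    (transmission_speed is ignored in this toy version)."""
    for i, form in enumerate(FORMS):
        layer = grid[:, :, i]
        leaked = sum(np.roll(layer, shift, axis)
                     for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]) / 4.0
        grid[:, :, i] = (1 - form.dissipation) * ((1 - form.attenuation) * layer
                                                  + form.attenuation * leaked)

emit(energy_grid, 25, 25, np.array([5.0, 1.0, 3.0]))  # the eating lion: lots of sound, a bit of heat, strong smell
step(energy_grid)
```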

Now let’s solve the 2 issues mentioned earlier.

Making the environment modifiable

Each form of energy has 3 properties: speed of transmission, inverse-square law and dissipation. The values of these properties are different for each form of energy. But we can also make them different for different regions of the environment: after all, sound and light behave differently in water, in air or underground.

Even better, we can allow the agents to change these values, which is equivalent to modifying the environment. In the real world, if you’re a spider, you can build a web that will transmit vibrations to you in a fast and reliable way. Or you can dig a hole in the ground to make yourself invisible to others. This is what modifiable energy properties allow us to do in the simulation.
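As a rough sketch (the names and numbers below are mine, not from the actual code), the modification boils down to per-cell multipliers on the energy properties that agents are allowed to write to:

```python
import numpy as np

N_FORMS = 3                                          # e.g. vibration, light, smell
VIBRATION = 0
# Per-cell, per-form multipliers on the energy properties; 1.0 = default physics.
propagation_modifier = np.ones((50, 50, N_FORMS))    # how well energy escapes a cell
dissipation_modifier = np.ones((50, 50, N_FORMS))    # how quickly it fades there

def build_web(x, y):
    # A spider web: vibrations around these cells travel better and linger longer,
    # so the builder gets fast, reliable information about anything touching them.
    propagation_modifier[x-1:x+2, y-1:y+2, VIBRATION] = 2.0
    dissipation_modifier[x-1:x+2, y-1:y+2, VIBRATION] = 0.5

def dig_hole(x, y):
    # A hole in the ground: whatever the occupant emits barely escapes the cell.
    propagation_modifier[x, y, :] = 0.1
```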

Now, if an agent’s speed per iteration depends on its genes but also on modifiable environmental properties, it becomes possible for a prey to slow down its predator by modifying the environment, or for a predator to trap its prey: the equivalent of a squid inking a predator, or an antlion trapping ants. Which leads us to the next point:

Giving agency to the agents

We don’t want our agents to move simply at random, and we want them to be able to choose whether or not to modify the environment. Energy arrays offer a solution. Back to the example: if you have 3 forms of energy, your agents can have at most 3 types of sensors (eye, ear, nose for example). Say that each sensor takes values from the 4 neighboring cells (front, back, left, right) and transforms them into a 2D vector (coordinates: [x = right – left, y = front – back]).

The sensor map/perceptual interface that we defined 2 posts ago can be rebuilt by adding these new sensor types and mapping them to motion actions: if the sound vector points in one direction, go in the opposite direction, for example. This map is also encoded in genes, so the motion is not an individual choice; but now our agents are no longer moving at random. We can also add “modification actions”: if the sensors have these values, apply that modification to the environment.
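Putting the last two paragraphs together, here is a hedged sketch of a sensor and of a gene-encoded sensor-to-action map. The rule format (pairs of sensor index and reaction) is my own illustration; modification actions would be handled the same way, just writing to the environment instead of returning a move.

```python
import numpy as np

def sense(grid, x, y, form_index):
    # One sensor: [x, y] = [right - left, front - back]
    # (assuming x+1 is "right" and y+1 is "front" on the grid).
    return np.array([grid[x + 1, y, form_index] - grid[x - 1, y, form_index],
                     grid[x, y + 1, form_index] - grid[x, y - 1, form_index]])

def pick_move(genes, grid, x, y):
    for form_index, reaction in genes:        # e.g. genes = [(0, "flee"), (2, "approach")]
        direction = sense(grid, x, y, form_index)
        if not np.any(direction):
            continue                          # this form of energy is silent here
        return -direction if reaction == "flee" else direction
    return np.zeros(2)                        # nothing sensed: stand still (or move at random)

grid = np.zeros((50, 50, 3))
grid[26, 25, 0] = 5.0                         # a loud sound just to the "right" of (25, 25)
print(pick_move([(0, "flee")], grid, 25, 25)) # [-5. -0.]: run away from the sound
```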

Note that sensors cost energy, and if a given sensor can detect a large range of values, it will cost you a lot of energy. Not only must you earn that energy by attacking and eating other agents, but the energy you spend “sensing” around is also dissipated into the environment, making you more detectable to potential predators. In short, having lots of precise sensors is not a viable strategy. Instead you must settle for heuristics that are “good enough”, but never perfect (local fitness).
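A tiny sketch of that trade-off, under an assumed linear cost model: the wider the detection range, the more energy it costs, and whatever you spend leaks straight back into your own cell’s energy array.

```python
def sensing_cost(detection_range, cost_per_unit=0.5):
    return cost_per_unit * detection_range     # wider range, higher cost (assumed linear)

def sense_and_pay(agent_energy, detection_range, cell_energy_array, form_index=0):
    cost = sensing_cost(detection_range)
    cell_energy_array[form_index] += cost      # the energy spent "sensing" is itself detectable
    return agent_energy - cost

cell = [0.0, 0.0, 0.0]
remaining = sense_and_pay(10.0, 4, cell)       # remaining == 8.0, cell == [2.0, 0.0, 0.0]
```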

Concluding words

The implementation of energy arrays and properties requires little effort: in terms of programming, only one new class “energy” with 3 variables and 3 instances, plus some modifications of existing classes. But the benefits are huge, as our agents now have many more potential behaviors: hunting, hiding, building traps, running away, defense mechanisms, even indirect communication between agents (which may lead to group behavior) are now possible, all of that in a rather simple simulation, still based on perceptual interfaces. We also have much more potential for creating environmental niches, as the environment itself can be modified by agents. A big regret is that, visually speaking, it still just looks like little squares moving on a white screen – you have to observe really carefully to understand what is going on, and what an agent is doing may not be obvious. Is it doing random stuff? Is it building a trap?

One serious concern could be that too much is possible in this simulation. With so many possibilities, how can you be sure that meaningful interactions like eating and mating will ever appear, or be maintained, between agents? A first element of an answer is that we start simple at the beginning of the simulation: only one type of agent, with no sensors and no actions at all. Every increase in complexity comes from random mutations, so complex agents will only survive if they can actually interact with the rest of the environment. A second element is that a “species” of agents cannot drift too far away from what already exists. If you suddenly change your actions or the way you emit energy into the environment, you might confuse predators, but you will also confuse your potential mates and lose the precious advantages that come from them (like genome diversity and a reduced cost of producing offspring). Furthermore, as explained 2 posts ago, a species that is “too efficient” is not viable in the long term and will disappear.

Next time I could talk about how generalized perceptual interfaces might lead to sexual dimorphism, or much better, give the first results of the actual simulation.

Perceptual Interface Theory and Open Ended Evolution

For some obscure reason I stumbled upon this paper the other day: “The user-interface theory of perception: Natural selection drives true perception to swift extinction” Download PDF here.
It’s 26 pages, published in 2009, but it’s worth the read; both the contents and the writing are great.
To summarize what interests me today, the author claims that these commonly accepted statements are false:

– A goal of perception is to estimate true properties of the world.
– Evolution has shaped our senses to reach this goal.

These statements feel intuitively true, but the author convincingly argues that:

– A goal of perception is to simplify the world.
– Evolution favors fitness, which can be (and most probably is) different from “exactitude”.

I feel a strong link between these claims and my previous post about OEE. If you remember, in my imaginary world where light can become grass, there is no hardwired definition of species, and therefore two individuals meeting each other can never be sure of each other’s exact identity, including “species”, strengths and weaknesses. They have access to specific properties through their sensors, but must rely on heuristics to guide their behaviour. One heuristic could be “slow animals usually have less energy than me, therefore I should attack and eat them”. But this is not an optimal rule; you may well meet a new individual which is slow so as to save energy for reproduction, and has more energy than you. You will attack it and die. But your heuristic just has to be true “most of the time” for you to survive.
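The heuristic fits in a couple of lines of Python, which is exactly the point: speed is a cheap proxy for energy, and no sensor gives you the real value (the threshold below is arbitrary).

```python
def should_attack(perceived_speed, speed_threshold=1):
    # Speed is only a proxy for energy: this rule is sometimes fatally wrong,
    # but it only has to be right most of the time.
    return perceived_speed <= speed_threshold
```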

The paper, which is not about OEE at all but about the real world, says this at p2:
“(1)[these solutions] are, in general, only local maxima of fitness. (2) […] the fitness function depends not just on one factor, but on numerous factors, including the costs of classification errors, the time and energy required to compute a category, and the specific properties of predators, prey and mates in a particular niche. Furthermore, (3) the solutions depend critically on what adaptive structures the organism already has: It can be less costly to co-opt an existing structure for a new purpose than to evolve de novo a structure that might better solve the problem.”

You might recognize this as exactly the argument from my previous post. To achieve OEE, we want local fitness inside niches (1 and 2), and we want evolution to be directed (3). For that, I introduced this simulated world where individuals do not have access to the exact, direct “identity” of others (2): what we may call, following this paper, a “perceptual interface”, which simplifies the world without representing it faithfully, and which can lead to terrible errors.

Why would perceptual interfaces be a key to OEE?
In most simulations that I have seen, an individual from species A can recognize any individual from species B, or from its own species A, with absolute certainty.
I suspect that often, this is hardcoded inside the program: “if x.species = A then …”. Even if B undergoes a series of mutations increasing its fitness, A might be able to keep up by developing corresponding counter-mutations – *because there is no choice*. A eats B. If B becomes “stronger” (more energy storage), only the strongest members of A will survive and reproduce, making the entire group of A stronger. If some members of B become weaker through mutation, they will die.
Play the same scenario with a perceptual interface: A only detects and eats individuals that have a maximum energy storage of X. Usually these individuals are from species B. If some B mutate to get stronger, then as far as A is concerned, they stop being food. They are not recognized as “B”. To survive, A might mutate to store more than X energy AND detect the new value of energy corresponding to B, but any other mutation is equally likely to help the survival of A: maybe detecting only lower levels of energy would work, if there are weak species around. Maybe exchanging the energy sensor for a speed sensor would help detect the Bs again, or some other species.
What if B becomes weaker? As far as A is concerned, B also stops being food, because A’s sensors can only detect a certain level of energy. Not only does B have several ways to “win” over A, but A also has several ways to survive despite B’s adaptations: by adapting to find B again, or by changing its food source.
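A toy contrast of the two situations (my own illustration, not code from the paper or from my simulation):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    species: str
    max_energy_storage: int

def is_food_hardcoded(other):
    return other.species == "B"                       # B can never stop being "food"

def is_food_interface(other, sensor_value=4):
    return other.max_energy_storage == sensor_value   # only "things storing exactly 4" register

mutated_b = Agent("B", 5)                             # a B that got "stronger"
print(is_food_hardcoded(mutated_b))                   # True: A keeps up no matter what B does
print(is_food_interface(mutated_b))                   # False: as far as A is concerned, not food anymore
```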

You might object that the real world does not work this way. A cat will chase mice even if they get slower.
Or will it? Quite a lot of animals actually evolved so as not to be detected by their predators using tactics involving slow motion, whether that means moving slower in general (like sloths) or only in specific situations (playing dead).

In simulated worlds, going faster / becoming stronger is usually the best way to “win” at evolution.
By introducing perceptual interfaces, we allow the interplay between individuals or “species” to be much richer and more original. What is the limit? If you have ever heard of an OEE simulation with perceptual interfaces, I would be very happy to hear about it. All the papers I found about simulated perceptual interfaces were purely about game theory.

In 1 or 2 posts, I will talk about how to make my model more fun and more general, by overcoming some current shortcomings in a programmatically elegant way. I’m not just theorizing, I’m implementing too, but slowly.