To: Google Scholar’s Dad — Data-driven science hypotheses

A week ago I sent an email to Anurag Acharya, the man behind « Google Scholar ». Scholar is a search engine that allows you to browse through scientific papers, a specialized version of Google. You can use it for free (although accessing the papers is often not free).


Scholar is an extraordinary tool. It does something that nobody else can right now. Which is why I think they can solve a weird problem in science: the way researchers come up with hypotheses is anything but scientific. It relies on the same methods that your grandma’s grandma used to cure a cold: tradition and gut feeling.

I believe that the generation of scientific hypotheses must be data-driven, just as science itself is. Here is what I wrote in my proposal (original PDF here). There was no answer, unsurprisingly: I can’t imagine what the timetable of someone like Anurag Acharya looks like. But I put this here in the hope that someone finds it worth debating.



By definition, science follows the scientific process. Hypotheses are adopted or discarded based on objective analysis of data. But surprisingly, the process of generating hypotheses itself is hardly scientific: it relies on hunches and intuition.

It often goes like this: a researcher gets an idea from reading a colleague’s paper or listening to a talk. Literature from the field is reviewed, which allows for refinement of the original idea. Then it is time for designing experiments, analysing data and writing a paper. If the researcher is actually a student, things can be more complicated. But in all cases, the original hypothesis relies tremendously on the researcher’s own subjective collection and appreciation of information, that must be selected from the gigantic amount of existing scientific papers.

Clearly, the fact that we now have access to all this scientific information is a giant leap from the situation of a few decades ago; and it has been made possible single-handedly by Google Scholar. But it is also a fact that researchers everywhere have more and more data to look at, and that “to look at” too often becomes “to subjectively pick from”.

Hypothesis generation is the basis of science – arguably the most crucial and exciting part of actually doing science. Yet it is not based on anything scientific. This document summarises 3 proposals to make hypothesis generation a data-driven process. I believe this is not restricting the creativity of scientists, but enhancing it; that it can make science more efficient and limit the waste of time and resources caused by irrelevant, biased, or outdated hypotheses – especially for graduate students. Not only does this respect the philosophy of Google and more specifically Google Scholar, but Google Scholar is currently the only organisation that has the resources to make it happen. Here are my 3 proposals, from the easiest to implement to the most hypothetical.

1. Paper Networks

Going through several dozen references at the end of a paper is far from optimal: the reason why a paper is cited and the paper itself are not physically close; the authors tend to unconsciously cite papers that support their view; the place of the papers in the field and their relationship to each other are virtually inaccessible.

Numerous services suggest papers supposed to be close to the one you have just read, but this is not enough. We need, at a glance, to know which papers support each other’s views and which support conflicting opinions, and we need to know how many there are. A visual map, a graph of networks of papers or of clusters of papers could be the ideal tool to reach this goal. The benefits would go beyond simple graphical structuring of the information:

• Reducing confirmation bias. When we look for papers simply by inputting keywords in Google Scholar, the keyword choice itself tends to be biased. A Paper Network would make supporting and opposing papers equally accessible.

• Promoting interdisciplinarity. It’s easy to say that interdisciplinary approaches are good. It’s better to actually have the tools to make it happen. A Paper Network would make it clear which approaches are related in different fields.

• Sparking inspiration. Standard search methods tell us what is there. But science is about bringing forth what is not yet here. A Paper Network would show existing papers in different fields, helping us to avoid re-doing what has already been done. More importantly, it would make it visually clear where the gaps are, where some zones are still blank, and what may be needed to fill them.

2. Burst Detection

Artificial Intelligence, my field, has known several “winters” and “summers”: periods when it seemed like all had already been done and the field fell into hibernation, and periods when suddenly everyone seemed to do AI (now is such a period). I suspect that other fields know these brisk oscillations as well: several teams announcing the same big discovery in parallel, or a rapid succession of findings that leads to a revival of the field, or even spawns new specialised fields.

These bursts are most likely not completely random. If we could predict, even very roughly, which field will boom and when, we could prepare for it, invest in it and even maybe make it happen faster. What are the factors influencing winters and summers? How many steps in advance can we predict? How many more Moore’s Laws are waiting to be discovered? Being able to predict winters would also be an asset, because we could look for the profound causes that force science to slow down and try to prevent it. Is it the lack of funds? Relying too much on major paradigms? Only analysing data from the past can transform hunches into successful policies for the advance of science.

3. Half Life of Facts

The destiny of scientific facts is to be overturned – it is the proof that science works. Better tools, better theories: these are obvious first level parameters influencing the shelf life of scientific papers. But we need to go deeper and look for meta-parameters: properties that allow us to predict this shelf life, and identify which papers, which parts of a theory are statistically more likely to be busted.

As anyone who has attended a heated scientific debate can testify, right now, the leading cause for accepting a non-trivial theory or choosing to challenge it is the researcher’s own “common sense”; yet all science is about is rejecting common sense as an explanation for anything and looking for facts in hard data. In these conditions, how can we continue to rely on gut feeling to justify our opinions? We need sounder foundations for our beliefs, even if in the absence of experimental verification they are just that: beliefs.

If a specific part of a theory looks perfectly sound but is statistically close to death, we must start looking at its opponents, or even better, think about what a good opponent theory would look like and choose research topics accordingly.

4. Conclusion

These proposals could change the way we, researchers, do science. They also come with a flurry of ethical issues: new tools would change the way resources (financial and human) are distributed, with desirable and undesirable outcomes. Just like prenatal genetic screening leads to difficult ethical questions, building tools allowing the hierarchisation of research projects should be a very careful enterprise.

But here is the catch: unlike genetic screening, new research tools have an objective component to them. These 3 proposals are about bringing more science to science: allowing the generation of science seeds to be data-driven. Science changes the world, every day. Any tiny improvement to the scientific process is worth striving for – and these 3 changes would, I believe, bring major improvements.

Stories #4: The Sacred Bonds of Marriage

Here comes the 4th episode of my series of translated old travel posts.

September 9th, 2011

The hotel’s clerk is 24 years old.

A bit like my first Japanese friend did, she started by talking to me for hours every evening, while totally aware that I could only understand a fraction of what she told me. When I finally picked up enough vocabulary and got used to her accent (and her to mine), we started having simple exchanges. At last, sitting with her behind the reception desk 2 hours at the same time every day, I became part of the background just like the plants and the cat.

I feel somewhat like a pet. Clients make me speak a little (« Oh wow, she talks? Where did you find her? What’s her name? Come on, say something! »). I keep an eye on the lobby when nobody is here to do it, and like a good pet, I do tricks for treats. « Look, I brought you a watermelon!* Come on, say something in French for us! »

Like the majority of this city’s inhabitants, the clerk is not from here. Yes, people in Shenzhen come from everywhere in China, looking for a better life and social status. They are factory workers, salespersons, engineers; their parents are moto-taxi drivers, mom and pop restaurant owners.
Here more than elsewhere people ask about each other’s province of origin. « Do you eat noodles? » one inquires to make sure that you come from the North. If you have a Southern accent, people will rather ask: « Do you eat rice? » — Southerners eat rice-based products, and Northerners wheat-based products, as everyone knows.
« Your Mandarin is beautiful. You come from the North, don’t you? »
« You add chili peppers to everything you eat. A Southerner, eh? »

To these questions, one answers simply: « Yes, we people from Hunan like our food spicy » or « Yes, I come from Hubei, we speak Mandarin almost without an accent. »

The clerk arrived here 3 months ago. She ran away from home, fleeing an arranged marriage.
Arranged marriages are not necessarily forced marriages: her parents know where she is, they call her on the phone sometimes, but they did not tell her ex-future husband. They wanted to find her a husband because at 24, she should already have started her own family. I think she ran away because she was heartbroken.
Here is what I understood of her story, with my unreliable Chinese skills:

She used to have a boyfriend.
A rather well-off, older man, not really handsome. But he was nice and sweet to her, always there when she needed help. And so she fell in love. It was not for his money that she loved him: she is young and pretty, and because of the current gender imbalance in China she had her share of admirers. Here, girls don’t really have to torture themselves into ideal beauties. It is the law of supply and demand: there will not be enough wives for everyone and everyone knows it.
It is guys who have to look their best, spend hours at the hairdresser, wear colorful accessories, be unique and remarkable. Just like in Japan, it is in the working class that you will find the most extravagant fashion fads.

One day, the young clerk’s parents told her about their plan to marry her to a man she did not know. On hearing the news, she probably met with her rich lover to convince him to marry her.
It was killing two birds with one stone: reassuring her parents about her future and avoiding the arranged marriage. But her lover was not the angel he appeared to be: as it turns out, he was married, even had children. Her romantic dreams evaporated and her heart broken, no one left to save her from her unknown future husband, she fled and arrived here.

« You know, often men will propose to offer me presents, to treat me to dinner. Lots of women get seduced like that. But not me. I’m a good girl, not someone you can buy, I don’t want to benefit from strangers’ money.

You know, my lover, he said he wanted to marry me. He said he would divorce his wife, abandon his children and save me from the arranged marriage. But I’m not that kind of person. I won’t steal the husband of a pregnant woman who doesn’t even have a job, and has little kids, can you imagine! Now he wants to marry me, but later he’ll find an even younger woman, and I too will end up alone with my kids. But I love him, what should I do?

– You are right, but I don’t know what to say.

– Of course you don’t know! You are but a child. »

Seen from here, you are not much less of a child than I am.

* Watermelon does not have racist connotations for most people in the world, except in the US. I did not even know about that US watermelon-thing at the time… I did not think anything special about the watermelon, except maybe that an entire watermelon was a big present for just a few words of French.

Looking back on this story, I am even more speechless. I wonder what happened to the young, heartbroken girl.
I praise her moral strength and sympathize with her distress and confusion.
« Time heals everything », hopefully broken hearts too.

Anti-Layer Manifesto

Disclaimer: Deep Learning is giving the best results in pretty much all areas of machine learning right now, and it’s much easier to train a deep network than to train a shallow network. But it has also been shown that in several cases, a shallow network can match a deep one in precision and number of parameters, while requiring less computational power. In this post I leave aside the problem of training/learning.

This started by me buying several books and trying to read all of them at the same time. Strangely, they all treated the same topics in their opening chapters (the unconscious, perception, natural selection, human behaviour, neurobiology…), and all disagreed with each other. Also none of them was talking about layers, but somehow my conclusion about these books is that we might want to give up layered design in AI.


So today I’m going to talk about layers, and how and why we might want to give them up. The « layers » I’m talking about here are like the ones used in Deep Learning (DL). I’m not going to explain the whole DL thing here, but in short it’s an approach to machine learning where you have several layers of « neurons » transmitting data to each other. Each layer takes data from the previous one and performs operations on it to reach a « higher level of abstraction ». For example, imagine that Layer 1 is an image made of colored pixels, and Layer 2 is doing operations on these pixels to detect edges of objects in the image. So each neuron is holding a value (or a group of values).

Typically, neurons in the same layer don’t talk to each other (no lateral connections); they only perform operations on data coming from the previous layer and send it to the next layer (bottom-up connections). What would happen if there were lateral connections is that your layer would stop representing « something calculated from below ». Instead it would hold data made up of lower abstraction and current abstraction data mixed up together. As if instead of calculating edges from pixels, you calculated edges from pixels and from other edges. Layers also usually don’t have top-down connections (equivalent to deciding the color of a pixel based on the result of your edge calculation). These properties are shared by many processing architectures, not only DL. I’m not focusing on DL particularly, but rather trying to find what we might be missing by using layers – and what might be used by real brains.
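To make the difference concrete, here is a minimal sketch, assuming a toy one-dimensional « image » and a very crude edge detector (absolute difference between neighbouring pixels). The function names and the `mix` parameter are made up for illustration; this is not how any DL library does it.

```python
def detect_edges(pixels):
    """Bottom-up only: each edge value depends on the pixel layer alone."""
    return [abs(pixels[i + 1] - pixels[i]) for i in range(len(pixels) - 1)]


def detect_edges_lateral(pixels, previous_edges, mix=0.5):
    """Lateral variant: each edge value also depends on neighbouring edge values.

    `mix` (an assumption of this sketch) sets how much weight the
    neighbouring edges get compared to the pixel-based estimate.
    """
    bottom_up = detect_edges(pixels)
    mixed = []
    for i, value in enumerate(bottom_up):
        left = previous_edges[i - 1] if i > 0 else 0.0
        right = previous_edges[i + 1] if i < len(previous_edges) - 1 else 0.0
        mixed.append((1 - mix) * value + mix * (left + right) / 2)
    return mixed


pixels = [0, 0, 10, 10, 0]
edges = detect_edges(pixels)                   # pure « calculated from below »
lateral = detect_edges_lateral(pixels, edges)  # pixels and edges mixed together
print(edges, lateral)
```

In the second function the « edge layer » no longer holds something computed only from below, which is exactly the property that a strict layered design forbids.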

Example of layering – Feedforward neural network. « Artificial neural network » by en:User:Cburnett – Wikimedia Commons

Layers are good for human designers: you know what level of data is calculated where, or at least you can try to guess it. Also we talk about the human brain cortex in terms of layers – but these are very different from the DL layers, even from a high level point of view. Neurons in the human brain have lateral and top-down connections.

DL-like layers are a convenient architecture. It keeps the levels of abstraction separated from each other – your original pixel data is not modified by your edge detection method. Your edge detection is not modified by object detection. But… Why would you want to keep your original data unmodified in the first place? Because you might want to use it for something else? Say that you’re playing a « find the differences » game on two pictures. You don’t want to modify the model picture while looking for the 1st difference; you want to keep the model intact, find difference 1, then use the model again to find difference 2 etc.

But… For example if you could look for all errors in parallel, you wouldn’t care about modifying the images. And if what is being modified is a « layer » of neurons inside your head, you really shouldn’t care about it being modified; after all, the model image is still there on the table, unmodified.

The assumptions behind layers

Let’s analyse that sentence: « you might want to use it for something else. »

It: it is the original unmodified data. Or rather, it is the data that will be transmitted to the next layer. That’s not trivial. How to decide what data should be transmitted? Should you try to find edges and then send that to another layer? Or is it OK to find edges and objects at the same place and then send that to a different layer? All depends on the « something else ».

Something else: If you can calculate everything in a messy bundle of neurons and go directly from perception to action in a single breath, you probably should. Especially if there is no learning needed. But when you have a whole range of behaviors depending on data from the same sensor (eyes for example), you might want to « stop » the processing somewhere to preserve the data from modification and send these results to several different places. You might send the edge detection results to both a sentence-reading module and a face detection module. In that case you want to keep your edge detection clean and unmodified in order to send the results to the different modules.

Might: But actually, you don’t always want to do that. If there are different behaviors using data from the same sensors but relying on different cues, you don’t need to preserve the original data. Just send what your sensor senses to the different modules; each one modifying its own data should not cause any problem. Even if your modules use the same cues but in different ways, sending to each one a copy of the data and letting them modify it can be OK. Especially, if your modules need to function fast and in parallel. Let’s say that module 1 needs to do some contrast detection in the middle of your field of vision (for face detection maybe). Module 2 needs to do contrast detection everywhere in your field of vision (obstacle detection?). If we make the (sometimes true in computers) assumption that contrast detection takes more time for a big field than a small one, it will be faster for module 1 to do its own contrast calculation on partial data instead of waiting for the results calculated in module 2. (but more costly).
Did you know that if the main vision center of your brain is destroyed, you will still be able to unconsciously detect the emotions in human faces… while being blind? You will also be able to avoid obstacles when walking. The parts of your brain for conscious vision, face recognition and obstacle detection are located at different places, and function semi-independently. My hypothesis is that these 3 functions rely on different use of the same cues and need to be running fast, therefore in parallel.

If not layers then what?

I would go for modules – so called « shallow networks ». A network of shallow networks. And I suspect that it is also what happens in the brain, although that discussion will require a completely different blog post.

First, I think that the division in layers or in modules needs to be less arbitrary. Yes, it is easy to use for human designers. But it can also be a cost for performance. I can see some advantages in using messy shallow networks. To begin with, neurons (data) of the same level of abstraction can directly influence each other. I think it’s great for performing simplifications. If you need to do edge detection, you can just try to inhibit (erase) anything that’s not an edge, right there in the « pixel » layer. You don’t need to send all that non-edge data to the next module – after all, very likely, most of the data is actually not edges. If you actually send all the data to be analyzed (combined, added, subtracted…) in an upper layer, you also need more connections.

Furthermore, it seems justified to calculate edges also from other edges and not just from pixels. Edges are typically continuous both in space and time: using this knowledge might help to calculate edges faster from results that are already available about both pixel and edges than if you just update your « edge layer » after having completely updated your « pixel layer ».

Ideally we should only separate modules when there is a need to do so – not because the human designer has a headache, but because the behavior of the system requires so. If the output of the module is required as is for functionally different behaviors, then division is justified.

I would also allow top-down connections between modules. Yes, it means that your module’s output is modified by the next module, representing a higher level of abstraction. It means that you are « constructing » your low level input from a higher level output. Like deciding the color of pixels based on the result of edge detection… I think it can be justified: sometimes it is faster and more economical to construct a perception than to just calculate it (vast subject…); sometimes accurate calculation is just not possible and construction is necessary. Furthermore if a constructed perception guides your behavior so as to make it more successful, then it will stick around thanks to natural selection. I also think that it happens in your brain (just think about that color illusion where two squares look like different colors just because of the surrounding context like shadows).

Concluding words

Lots of unsubstantiated claims in this blog post! As usual. If I could « substantiate » I’d write papers! But I really think it’s worth considering: are layers the best thing, and if not then why? This thought actually came from considerations about whether or not we are constructing our perceptions – my conclusion was yes, constructed perceptions have many advantages (more on that later…maybe?). But what kind of architecture allows us to construct perceptions? The answer: not layers.

Energy arrays, ITP and OEE

An antlion larva – Wikimedia Commons

Here is a follow-up of my 2 last posts about Open Ended Evolution. This time I would like to talk about energy arrays as a solution to 2 issues of the simulation:

  1. The environment’s map cannot be modified
  2. The individual agents move at random – they cannot decide where to go.

Introducing energy arrays is also a nice way to generalize the interface (as in Interface Theory of Perception) of the agents. It allows an agent to potentially detect numerous actions with only few sensors. It goes like this:

Imagine that there are different ways for an agent to emit energy in the simulation. By emitting light, sound, heat, smells, other kinds of vibrations… it does not matter what we call them; what is important is that these forms of energy have different values for the following properties: speed of transmission (how fast does it go from here to there), inverse-square law (how fast does the intensity decrease with the distance) and dissipation (how long does it take to disappear from a specific point). In reality these values are linked, but in simulation we don’t need to follow the laws of physics so we just decide the values by ourselves.

Everything an agent does (moving, eating, mating, dying or just standing there) emits energy in different forms. For example, you have 3 forms of energy and represent them with an array [w1,w2,w3]. Each cell of the map (environment) has such an array. A specific individual doing a specific action in this cell will add a set of values to the cell’s energy array, that will propagate to the neighboring cells according to the properties of each form of energy. For example, a lion eating might emit a lot of sound, a bit of heat and a strong smell: [+5,+1,+3]. These values are decided by the genes of the individual, so each « species » will have different values for each action. And if you remember, the concept of « species » is just an emergent property of the simulation, so really each individual of the species might have slightly different values for the array of each action.
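Here is a rough sketch of what this could look like in code. It only covers the « an action adds values to the cell’s array » part; propagation to neighboring cells is left out. The class names, the action names and the numbers for « rest » are assumptions for illustration, not the actual simulation code.

```python
from dataclasses import dataclass, field

N_FORMS = 3  # e.g. sound, heat, smell (the names are only labels)


@dataclass
class Cell:
    """One cell of the map, holding one value per form of energy."""
    energy: list = field(default_factory=lambda: [0.0] * N_FORMS)


@dataclass
class Agent:
    # Gene-encoded emission array for each action (illustrative values).
    emissions: dict

    def act(self, action, cell):
        """Doing an action adds the agent's emission array to the cell."""
        for i, amount in enumerate(self.emissions[action]):
            cell.energy[i] += amount


lion = Agent(emissions={"eat": [5, 1, 3], "rest": [0, 1, 1]})
cell = Cell()
lion.act("eat", cell)
lion.act("rest", cell)
print(cell.energy)
```

Two individuals of the same « species » would simply carry slightly different `emissions` dictionaries.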

Now let’s solve the 2 issues mentioned earlier.

Making the environment modifiable

Each form of energy has 3 properties: speed of transmission, inverse-square law and dissipation. The values of these properties are different for each form of energy. But we can also make these values different for different regions of the environment: after all, the behavior of sound or light is different in water, air or ground.

Even better, we can allow the agents to change these values, which is equivalent to modifying the environment. In the real world, if you’re a spider, you can build a web that will transmit vibrations to you in a fast and reliable way. Or you can make a hole in the ground, to make yourself invisible to others. This is what the modifiable energy properties allow us to do in the simulation.
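A possible sketch of modifiable properties, under several assumptions of mine: the « inverse-square law » property is reduced to a single falloff exponent, distance is counted in cells, the intensity formula uses `1 + distance` to avoid dividing by zero, and `build_web` is a hypothetical modification action. Only the falloff is actually used in the computation here; speed and dissipation are carried along for completeness.

```python
from dataclasses import dataclass


@dataclass
class EnergyProps:
    speed: int          # cells travelled per iteration
    falloff: float      # exponent of the intensity decrease with distance
    dissipation: float  # fraction of energy lost per iteration in a cell


def intensity_at(source, distance, props):
    """Intensity received `distance` cells away from the emitting cell."""
    return source / (1 + distance) ** props.falloff


# Default properties for one form of energy, plus per-cell overrides.
default_vibration = EnergyProps(speed=1, falloff=2.0, dissipation=0.5)
overrides = {}  # cell coordinates -> EnergyProps


def props_for(cell):
    """Properties that apply in a given cell (region-specific if overridden)."""
    return overrides.get(cell, default_vibration)


def build_web(cell):
    """Hypothetical modification action: vibrations carry further in this cell."""
    overrides[cell] = EnergyProps(speed=3, falloff=0.5, dissipation=0.1)


before = intensity_at(8.0, 3, props_for((2, 2)))
build_web((2, 2))
after = intensity_at(8.0, 3, props_for((2, 2)))
print(before, after)
```

The same mechanism covers the hole in the ground: a modification that makes the falloff steeper instead of flatter.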

Now if an agent’s speed per iteration depends on its genes but also on modifiable environmental properties, it becomes possible for a prey to slow down its predator by modifying the environment, or for a predator to trap its prey. The equivalent of a squid inking a predator, or an antlion trapping ants. Which leads us to the next point:

Giving agency to the agents

We don’t want our agents to move simply at random, and we want them to be able to choose to modify the environment or not. Energy arrays offer a solution. Back to the example: if you have 3 forms of energy, your agents can have at most 3 types of sensors (eye, ear, nose for example). Say that each sensor takes values from 4 neighboring cells (front, back, left, right) and transforms them into a 2D vector (coordinates: [x = right – left, y = front – back]).
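As a sketch, the sensor computation could look like the following. I am assuming that the map is a list of rows (`grid[y][x]` is a cell’s energy array), that the agent faces « up » (decreasing y), and that cells outside the map read as zero; none of this is fixed by the description above.

```python
def sense(grid, x, y, form):
    """Return [right - left, front - back] for one form of energy around (x, y)."""
    def read(cx, cy):
        # Cells outside the map are treated as holding no energy.
        if 0 <= cy < len(grid) and 0 <= cx < len(grid[0]):
            return grid[cy][cx][form]
        return 0.0

    right, left = read(x + 1, y), read(x - 1, y)
    front, back = read(x, y - 1), read(x, y + 1)
    return [right - left, front - back]


# 3x3 map, 3 forms of energy per cell, everything silent at first.
grid = [[[0.0, 0.0, 0.0] for _ in range(3)] for _ in range(3)]
grid[1][2][0] = 4.0  # loud sound in the cell to the right of (1, 1)
grid[2][1][0] = 1.0  # faint sound in the cell behind (1, 1)

sound_vector = sense(grid, 1, 1, form=0)
print(sound_vector)
```

The vector points towards where the energy is strongest, which is all the agent knows: not what emitted it, only from where.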

The sensor map/perceptual interface that we defined 2 posts ago can be rebuilt adding these new sensor types and mapping them to motion actions: if the vector for sound points to that direction, go in the opposite direction for example. This map is also encoded in genes, so the motion is not an individual choice; but now our agents are not moving at random. We can also add « modification actions »: if the sensors have these values, apply that modification to the environment.
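A minimal sketch of such a mapping, assuming each sensor has a gene-encoded response (« approach » or « flee ») and that the agent steps one cell along the dominant axis of the sensor vector. The gene names and the tie-breaking rule are made up for illustration.

```python
def choose_move(vector, response):
    """Turn a sensor vector [x, y] into a one-cell step (right, front)."""
    x, y = vector
    if x == 0 and y == 0:
        return (0, 0)  # nothing sensed: no directed move
    sign = 1 if response == "approach" else -1
    if abs(x) >= abs(y):
        return (sign * (1 if x > 0 else -1), 0)
    return (0, sign * (1 if y > 0 else -1))


# Gene-encoded map from sensor type to response (illustrative values).
genes = {"sound": "flee", "smell": "approach"}

step_from_sound = choose_move([4.0, -1.0], genes["sound"])
step_from_smell = choose_move([0.0, 2.0], genes["smell"])
print(step_from_sound, step_from_smell)
```

A « modification action » would be one more entry of the same kind: a condition on sensor values mapped to a change of the environment instead of a step.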

Note that sensors cost energy, and if you can sense a large range of values for a given sensor, it will cost you a lot of energy. Not only must you earn that energy by attacking and eating other agents, but the energy you spend « sensing » around is dissipated in the environment, making you more detectable to potential predators. In short, having lots of precise sensors is not a viable solution. Instead you must go for a heuristic that will be « good enough », but never perfect (local fitness).

Concluding words

The implementation of energy arrays and properties requires little effort: in terms of programming, only one new class « energy » with 3 variables and 3 instances, and some modifications of existing classes. But the benefits are huge, as we now have a lot more potential behaviors for our agents: hunting, hiding, building traps, running away, defense mechanisms, even indirect communication between agents are now possible (which may lead to group behavior), all of that in a rather simple simulation, still based on perceptual interfaces. We also have much more potential for creating environmental niches, as the environment itself can be modified by agents. A big regret is that, visually speaking, it still just looks like little squares moving on a white screen – you have to observe really well to understand what is going on, and what the agents are doing may not be obvious. Is it doing random stuff? Is it building a trap?

One serious concern could be that too much is possible in this simulation. With so many possibilities, how can you be sure that meaningful interactions like eating and mating will ever appear or be maintained between agents? A first element is that we start simple at the beginning of the simulation: only one type of agent, with no sensors and no actions at all. Every increase in complexity comes from random mutations, therefore complex agents will only be able to survive if they can actually interact with the rest of the environment. A second element is that a « species » of agents cannot drift too far away from what already exists. If you suddenly change the way you spend energy in the environment or your actions, you might confuse predators. But you will also confuse your potential mates and lose precious advantages coming from them (like genome diversity and reduced cost for producing offspring). Furthermore, as explained 2 posts ago, a species that is « too efficient » is not viable in the long term and will disappear.

Next time I could talk about how generalized perceptual interfaces might lead to sexual dimorphism, or much better, give the first results of the actual simulation.

Perceptual Interface Theory and Open Ended Evolution

For some obscure reason I stumbled upon this paper the other day: « The user-interface theory of perception: Natural selection drives true perception to swift extinction » Download PDF here.
It’s 26 pages, published in 2009, but it’s worth the read; both the contents and the writing are great.
To summarize what interests me today, the author claims that the following commonly accepted statements are false:

– A goal of perception is to estimate true properties of the world.
– Evolution has shaped our senses to reach this goal.

These statements feel intuitively true, but the author convincingly argues that:

– A goal of perception is to simplify the world.
– Evolution favors fitness, which can be (and most probably is) different from « exactitude ».

I feel a strong link between these claims and my previous post about OEE. If you remember, in my imaginary world where light can become grass, there is no hardwired definition of species, and therefore two individuals meeting each other can never be sure of each other’s exact identity, including « species », strengths and weaknesses. They can have access to specific properties through their sensors, but must rely on heuristics to guide their behaviour. One heuristic could be « slow animals usually have less energy than me, therefore I should attack and eat them ». But this is not an optimal rule; you can well meet a new individual which is slow so as to save energy for reproduction, and has more energy than you. You will attack them and die. But your heuristic just has to be true « most of the time » for you to survive.

The paper, which is not about OEE at all but about the real world, says this at p2:
« (1)[these solutions] are, in general, only local maxima of fitness. (2) […] the fitness function depends not just on one factor, but on numerous factors, including the costs of classification errors, the time and energy required to compute a category, and the specific properties of predators, prey and mates in a particular niche. Furthermore, (3) the solutions depend critically on what adaptive structures the organism already has: It can be less costly to co-opt an existing structure for a new purpose than to evolve de novo a structure that might better solve the problem. »

You might recognize this as exactly the argumentation in my previous post. To achieve OEE, we want local fitness inside niches (1 and 2); we want evolution to be directed (3). For that, I introduced this simulated world where individuals do not have access to the exact, direct, « identity » of others (2): what we may call according to this paper a « perceptual interface », which simplifies the world while not representing it with fidelity, which can lead to terrible errors.

Why would perceptual interfaces be a key to OEE?
In most simulations that I have seen, an individual from species A can recognize any individual from species B or from its own species A with absolute certainty.
I suspect that often, this is hardcoded inside the program: « if x.species = A then … ». Even if B undergoes a series of mutations increasing its fitness, A might be able to keep up by developing corresponding counter-mutations – *because there is no choice*. A eats B. If B becomes « stronger » (more energy storage), only the strongest members of A will survive and reproduce, making the entire group of A stronger. If some members of B become weaker through mutation, they will die.
Play the same scenario with a perceptual interface: A only detects and eats individuals that have a maximum energy storage of X. Usually these individuals are from species B. If some B mutate to get stronger, as far as A is concerned, they stop being food. They are not recognized as « B ». To survive, A might mutate to store more than X energy AND detect the new value of energy corresponding to B, but any other mutation is equally likely to help the survival of A: maybe detecting only lower levels of energy would work, if there are weak species around. Maybe exchanging the energy sensor for a speed sensor would help detect Bs again, or any other species.
What if B becomes weaker? As far as A is concerned, B also stops being food because A’s sensors can only detect a certain level of energy. Not only does B have several ways to « win » over A, but A also has several ways to survive despite B’s adaptations: by adapting to find B again, or by changing its food source.
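
For the programmers, here is roughly how I picture the difference, as a small Python sketch. Nothing in it comes from an existing simulation: the names (`Individual`, `hardcoded_is_food`, `perceived_as_food`) and the threshold X = 5 are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    species: str     # hidden label; only the hardcoded rule may read it
    max_energy: int  # observable through an energy sensor
    speed: int       # observable through a speed sensor


def hardcoded_is_food(other: Individual) -> bool:
    """Species check: A recognizes B with certainty, whatever B becomes."""
    return other.species == "B"


def perceived_as_food(other: Individual, x: int = 5) -> bool:
    """Perceptual interface: A eats whatever stores exactly x energy
    (x = 5 is an arbitrary value chosen for this example)."""
    return other.max_energy == x


usual_b = Individual("B", max_energy=5, speed=1)
strong_b = Individual("B", max_energy=8, speed=1)     # mutant, stores more
weak_b = Individual("B", max_energy=2, speed=1)       # mutant, stores less
lookalike_c = Individual("C", max_energy=5, speed=3)  # another species

for ind in (usual_b, strong_b, weak_b, lookalike_c):
    print(ind, hardcoded_is_food(ind), perceived_as_food(ind))
```

With the hardcoded rule, the three Bs stay food whatever they mutate into. With the interface, both mutants drop off A’s menu, and an unrelated individual that happens to store X energy gets on it.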

You might object that the real world does not work this way. A cat will chase mice even if they get slower.
Or will they? Quite a lot of animals actually evolved so as not to be detected by their predators using tactics involving slow motion, even if it means moving slower in general (like sloths) or in specific situations (playing dead).

In simulated worlds, going faster / becoming stronger is usually the best way to « win » at evolution.
By introducing perceptual interfaces, we allow the interplay between individuals or « species » to be much richer and more original. What is the limit? If you have ever heard of an OEE simulation with perceptual interfaces, I would be very happy to hear about it. All the papers I found about simulated perceptual interfaces were purely about game theory.

In 1 or 2 posts, I will talk about how to make my model more fun and general, by overcoming some current shortcomings in a programmatically elegant way. I’m not only theory-talking, I’m implementing too, but slowly.

Open Ended Summer Project

The first session I attended at ECAL2015 was the Open Ended Evolution (OEE) session.
It was one of the most interesting sessions, on par with the « Measuring Embodiment » session. Yet I didn’t start thinking seriously about OEE until 3 days later. It happened because of an accumulation of little nudges from totally unrelated talks. One game-theory speaker mentioned that their simulation used ongoing evolution, as opposed to some other genetic algorithms that choose the best individual and kill all others at the same time. A researcher in artificial biology said that parasites created ecological niches. I had a small discussion with my professor about the definition of « environment » in real and artificial worlds. And in the middle of Justin Werfel’s keynote about evolved death, I was suddenly scribbling a plan for a simple OEE project.

Open Ended Evolution

It’s easy to have tons of ideas when you’re illiterate in a field, like I am towards OEE: I don’t know what other people did before me, what worked and what didn’t, so each of my ideas sounds OK to myself. It’s easier to start something when you’re ignorant, than when you know how many people with smart ideas have already been working on the same problem (and failed). So today I will organize my ramblings here, before I become unable to read my own dirty notebook. I haven’t surveyed the field, which is not good; but I always prefer to build plans first and survey after – it gives me an idea of what I should be googling for. And it prevents me from being de-motivated before I have even started anything!
But first, what does OEE mean? If there’s one thing I learned from ECAL, it’s that nobody agrees on definitions except when they’re translated into clean mathematical equations.
There is no equation for OEE. A lot of people agree that Earth is probably an Open Ended world. It means that from the time life appeared, it has always grown more and more complex with time, constantly producing new innovations. Today we have very complex forms of life, which appeared from simpler forms of life as an effect of evolution and natural selection.
The dream of OEE researchers is to build such a world in simulation. But it is much more difficult than it sounds: judging by the critical talks at ECAL, no one so far has been able to build a simulation that produces more and more complex beings with novel adaptations. Or at least no one seems to have convinced others that their own model leads to OEE. What usually happens is that the simulated world evolves fast at first, then reaches an equilibrium where you no longer get innovation or increased complexity. There usually exists such a state, where each species reaches a fitness value that allows for equilibrium, and stays there. Imagine a world with only wolves and sheep. The wolves might evolve over generations to run faster and faster to catch as many sheep as possible. But if wolves kill all the sheep because they are so fast, with no more prey to eat they will starve to death. And your world will have no more sheep and no more wolves. A different species of wolves, living in a different world, might be eating enough sheep to survive but not so many as to make the sheep go extinct. Maybe the wolves are getting faster, but so are the sheep, allowing for an equilibrium. In both cases, you do not achieve OEE. The first world is dead; the second one has reached an equilibrium.

There are some hypotheses about why it is so difficult to achieve OEE. The oldest and probably most popular one says that simulated worlds are not complex enough. In the real world, you have millions of species interacting with each other, competing, collaborating, forcing life out of any long term equilibrium.
Therefore if we could build an extremely complex simulation with lots of computers, maybe we would get OEE. Some say that computational power is not the issue – maybe we just haven’t discovered yet how to design simulations allowing for OEE, and in that case using more computers will not solve the problem. Some say that the issue is not unbounded complexity, but unbounded novelty. But we don’t have unanimous definitions of « complexity » and « novelty ».

My own problem is the concept of « environment ». In many simulations, you have species of interest (like wolves and sheep), and around that you have « the environment ». Often, the environment is not submitted to evolutionary pressure, and sometimes it cannot be modified by the species. I am not quite sure why it is like that. Of course in real life, « the environment » is constituted by other organisms who also undergo evolution, and of non-biological features that can be affected and can affect the organisms around it.
My second problem is the concept of « fitness ». The fitness is a measure of how well adapted an individual is to its environment. In virtual worlds, evolution (almost) means that the fittest individuals will have more chance to reproduce than other less fit individuals. More often than not, it means that the species will reach an optimal, stable value of fitness (at which it stops evolving); or the fitness will oscillate around a constant value. I think it could be interesting to run a simulation where the environment would be subjected to evolutionary pressure, and where long term optimal fitness would not be reachable.
The second point is, of course, the most difficult. We could start by making fitness an extremely local property – both in time and space, creating niche environments. Firstly, we could design inter-individual interactions so as to prevent any « good global solution » to the problem of survival. Secondly, evolution should be biased so as to make mutants from newer species « fitter » than mutants from « old » species, without any reference to « complexity ». Finally, to ensure that the world would never be stuck, evolution should focus on affecting a net of inter-species interactions and the properties they rely on.

The Model

So here is the model.

First, I do not define species. Speciation will emerge from the model, as you will see. But for the sake of clarity, just in this explanation, I will give names to some groups of individuals. The static part of my « environment » is just a map (Hey! You’re doing exactly what you said isn’t good! (Not really, see next paragraph. But yes, I will change that in the next draft)). A 2D space with a grid and cells, like cellular automata, very classical. The rules of the model are: there are individuals on the map. Each individual can store energy. An individual with 0 energy dies. Moving to a different cell requires speed and costs energy. Producing offspring costs energy. Sensing costs energy. Energy can be gained from parents when being born or by eating other individuals (but eating also costs a small amount of energy). Each individual can have simple « sensors » to detect other individuals who are on the same cell: a map of conditions on « others » to actions (if « property » = « value » then « act on individual »; example: if max energy < 7 then eat individual). The complete set of actions is eating, mating, running away. If two individuals try to eat each other, the one with the highest current energy level will win. New individuals are produced by cloning (with a probability of mutation) or by mating (randomly mixing the properties of parent individuals) when the parent(s) energy has reached its max value.
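
To give an idea of what « a map of conditions to actions » could look like in code, here is a rough Python sketch. It is not my actual implementation: the class names, the three comparison operators and the tie-breaking rule in `eat_winner` are assumptions I make for the example, and energy costs are left out entirely.

```python
import operator
from dataclasses import dataclass, field

# Comparison operators a sensor may use (an assumption of this sketch).
OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


@dataclass
class Sensor:
    prop: str    # property read on the other individual, e.g. "max_energy"
    op: str      # one of "<", "=", ">"
    value: int
    action: str  # "eat", "mate" or "run"

    def triggers(self, other: "Individual") -> bool:
        return OPS[self.op](getattr(other, self.prop), self.value)


@dataclass
class Individual:
    speed: int
    max_energy: int
    energy: int
    sensors: list = field(default_factory=list)

    def react_to(self, other: "Individual"):
        """Return the action of the first matching sensor, else None."""
        for sensor in self.sensors:
            if sensor.triggers(other):
                return sensor.action
        return None


def eat_winner(a: Individual, b: Individual) -> Individual:
    """Both try to eat each other: the highest current energy wins.
    Ties go to `a` here; the model does not specify this case."""
    return a if a.energy >= b.energy else b


# The example rule from the text: if max energy < 7 then eat individual.
grazer = Individual(speed=1, max_energy=10, energy=6,
                    sensors=[Sensor("max_energy", "<", 7, "eat")])
grass = Individual(speed=0, max_energy=2, energy=2)

print(grazer.react_to(grass))  # the grazer's rule matches the grass
print(grass.react_to(grazer))  # the grass has no sensor at all
```

Note that nothing in `react_to` ever reads a species: the only thing an individual can do is compare one observable property to one value.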

(Beware, in the following I absolutely do not care about the realism of the simulated world – light is born and dies, reproduces and turns into animals.)
At the beginning of the world, I place on some patches of cells (not all cells) individuals (one per cell) with these properties: the individual cannot move, has a random lifespan and stores a maximum of 1 unit of energy. If for some reason one of these individuals disappears from a cell and the cell is left empty, it is immediately replaced. Because each of these replacing individuals is essentially born from nothing, they represent a constant source of free energy in the model – I call this group of individuals « light ». Light is defined as: [speed=0, max energy=1, current energy=1, number of offspring=1, reproduction cost=1, sensors=[if max energy=1, eat], lifespan=random]. The reproduction cost defines how much energy will be transmitted from a parent to its offspring. You can see that when a « light » is born at a timestep, it has already reached the reproduction cost. It will therefore produce an offspring at the next timestep, but then (current energy – reproduction cost) will be 0 and the individual will die. In short, each light individual will clone itself and die at each timestep. Furthermore, light can eat other light that appears on the same cell, but this has no effect except preventing 2 light individuals from existing on the same cell (which will « eat » which is decided at random – either way the winner’s current energy cannot change and become greater than the max energy).
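
The « clone and die at each timestep » behaviour follows mechanically from the numbers above. Here is a minimal Python sketch of one timestep for one individual; the function `step` and the field names are my own shorthand, and lifespan, sensors and mutation are deliberately ignored.

```python
from dataclasses import dataclass, replace


@dataclass
class Individual:
    # The default values are the « light » genome described above.
    speed: int = 0
    max_energy: int = 1
    energy: int = 1
    n_offspring: int = 1
    reproduction_cost: int = 1


def step(parent: Individual):
    """Advance one individual by one timestep.

    Returns (survivor_or_None, list_of_offspring).
    """
    offspring = []
    if parent.energy >= parent.max_energy:            # ready to reproduce
        for _ in range(parent.n_offspring):
            if parent.energy < parent.reproduction_cost:
                break                                 # cannot pay for more
            # The child starts with the energy handed over by its parent.
            offspring.append(replace(parent, energy=parent.reproduction_cost))
            parent.energy -= parent.reproduction_cost
    survivor = parent if parent.energy > 0 else None  # 0 energy means death
    return survivor, offspring


light = Individual()
survivor, children = step(light)
print(survivor, children)
```

For light, the parent hands its single unit of energy to one child and is left with 0, so `survivor` is `None` and the child is an exact copy, ready to do the same at the next timestep.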

Mutations randomly happen each time a light individual is cloned. Mutations can increase or decrease the max energy, the number of offspring, the reproduction cost, the sensors map. It is easy to see that in these conditions, the only set of mutations that can at first lead to a viable branch of individuals is an improved energy storage (max energy = 2). These mutants will then often be able to eat the « normal » light, as they can store more energy. This branch gets energy from light, stores it, reproduces but does not move – I call it grass. Of course, it is not a proper species – it is not very different from normal light and a mix of the two, if they mated, would still be viable.
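
A possible way to write the cloning step, again as a hedged Python sketch: the mutation rate, the step of ±1 and the clamping at 0 are arbitrary choices of mine, and mutations of the sensors map are not handled.

```python
import random

# Numeric properties that a mutation may change in this sketch.
MUTABLE = ("max_energy", "n_offspring", "reproduction_cost")


def clone_with_mutation(genome: dict, rng: random.Random,
                        rate: float = 0.1) -> dict:
    """Copy a genome; with probability `rate`, nudge one property by +/-1."""
    child = dict(genome)
    if rng.random() < rate:
        key = rng.choice(MUTABLE)
        # Values are clamped at 0 so that a property never goes negative.
        child[key] = max(0, child[key] + rng.choice((-1, 1)))
    return child


light = {"speed": 0, "max_energy": 1, "n_offspring": 1, "reproduction_cost": 1}
rng = random.Random(42)  # fixed seed, only to make runs repeatable
mutants = [clone_with_mutation(light, rng) for _ in range(1000)]
grass_like = [m for m in mutants if m["max_energy"] == 2]
print(len(grass_like), "clones out of 1000 can store 2 units of energy")
```

Most clones are identical to their parent; the few with max energy = 2 are the candidates for « grass ».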

The next successful mutation could be storing more and more energy (leading to slower reproduction but competing with grass for light resources… shall I call it trees?), having more offspring, or more interestingly, moving. Moving costs energy, but it is also a good way to store lots of energy without having to wait standing at the same place. Storing energy fast would allow the individuals to afford more sensors, more offspring who are born with higher energy levels. On the other hand, eating only light should not provide enough energy to move at reasonable speeds. Eating grass or trees would be better, but then you need a mutation in your sensors to be able to detect grass.
This group could be our first herbivores – since it costs less to have 1 sensor than 2, completely herbivore species would likely be more successful than species having a sensor to eat light and another to eat grass. If we let this world evolve for long enough, we might see interesting things.

So What?

First, you can see that evolution in this world is directed: grass can appear once, then trees may appear from grass, then herbivores appear from trees, but not the other way around. Secondly, the interactions do not rely on species identification. They rely on the properties of the individuals. You cannot detect a « species », you must rely on detecting how much energy an individual can store, or how fast they can move. Therefore a strategy that was successful at some point in time and space might be(come) completely irrelevant in another niche or at another time: you can detect « grass » by its speed of 0 and try to eat it, but then come across a giant tree with speed 0, but much more energy than yourself (and you’ll die trying to eat it).

What could « open ended » mean in that world? I guess it could simply be the fact that what defines the state of the world at some place and time is only the interactions between individuals, and not the particular species existing there. Therefore there is a tiny hope that the world will continue to be « interesting » for some time, as each change in interactions (due to a mutation or to niche merging/niche division) would force the complete state of the world to change.
Infinite possibilities for species or individuals are not necessary, as even a limited number of both can lead to virtually limitless types of interactions between all different kinds of individuals.

A word about speciation in this simulated world. If we except the number of sensors, all individuals have the same structure for their « genome » (the set of properties defining an individual). So any pair of individuals can, provided they can detect each other, reproduce and give birth to offspring that would inherit a mix of its parents’ genomes. But if the parents are very different, the offspring might end up with properties that do not allow it to survive. For example, a mix of « parentA’s low energy storage » and « parentB’s high moving speed » would die almost instantly by using all its limited energy at once.
Therefore, we can use the most basic definition of « species » in this world: if 2 individuals cannot mate, or if their offspring cannot reproduce, then the two « parent » individuals are from different species.
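
In the simulation, viability is something you observe by letting the offspring live. Just to illustrate the idea, here is a Python sketch that replaces this observation with a toy rule; `is_viable` and its « storage must cover one move » criterion are pure assumptions of mine, as is the choice of looking at only two properties.

```python
from itertools import product

PROPS = ("max_energy", "speed")  # the only properties mixed in this sketch


def possible_offspring(a: dict, b: dict):
    """Every way of picking each property from one parent or the other."""
    for picks in product((a, b), repeat=len(PROPS)):
        yield {prop: parent[prop] for prop, parent in zip(PROPS, picks)}


def is_viable(ind: dict) -> bool:
    """Toy rule: energy storage must at least cover one move at full speed."""
    return ind["max_energy"] >= ind["speed"]


def viable_fraction(a: dict, b: dict) -> float:
    """Share of the possible mixes of a and b that pass the toy rule."""
    kids = list(possible_offspring(a, b))
    return sum(is_viable(k) for k in kids) / len(kids)


grass = {"max_energy": 2, "speed": 0}
tall_grass = {"max_energy": 3, "speed": 0}
runner = {"max_energy": 10, "speed": 6}

print(viable_fraction(grass, tall_grass))
print(viable_fraction(grass, runner))
```

Every mix of the two grasses passes the rule, while the mix « low energy storage, high moving speed » of grass and runner fails, which is the case described above. Where exactly to draw the line between « same species » and « different species » on that fraction is left open.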

  • So, could this world be open ended? Let’s be realistic: it is still a very simple simulation, and you shouldn’t count on 1 month of programming by an excited student to suddenly solve one of the oldest problems of ALife. So no OEE for you just now!
  • Then what is interesting in this simulation? I think I will have fun seeing big innovations emerging (motion, reproduction…) and their consequences; it will be interesting to see conflict emerge between the big groups of similar individuals, and to compare the diverging evolutionary paths in two niches separated from each other. I predict that there will be mass extinctions (wolves invading a niche of sheep? leading to an invasion of grass? and the survival of only grass-eating wolves?), and even several false starts before I even see the emergence of herbivores. I am also curious to see how the different sensors will be combined to find a heuristic to the big problem: how to identify who I should eat/avoid/mate with if there is no proper concept of species in this world?

In conclusion… I have a fun project for this summer vacation.

Consciousness and the Unconscious as a Memory Management Tool

A few months ago, I wrote something like « I am not interested in consciousness. I don’t even get what is supposed to be interesting about « consciousness ». There is no definition of consciousness that does not sound new-agey. « The issue of consciousness » does not exist ».
It was a radical reaction to a deluge of pseudo-spiritual talking about consciousness. « Consciousness is… is what I am feeling now » « Consciousness is subjectivity » « Consciousness is qualia ». I don’t like this kind of discussion, because the conclusion is always that we cannot scientifically study consciousness, as it is such a « spiritual » whatever it is.

Then I went to a talk given by a scientist who is seriously, and scientifically, trying to study consciousness using applied mathematical methods (along the vein of Tononi, if you’re motivated enough to look it up). I disagreed with almost all his hypotheses and even more with his interpretation of experimental results, but it was a very exciting talk. No new age. No nonsense. Just theory and experimental results. I salute and admire the courage that you must have to bring science where everyone is too afraid to look. For the first time, I thought that studying consciousness might be interesting, after all.

And suddenly, here I am. From next week I’m most likely getting involved in a project that heavily features consciousness and robots. Diving into that field feels like looking for diamonds in sewage water. Someone who feels so strongly against this kind of stuff should be the last one to start such a project, right?
Wrong. I want to do it precisely because I don’t want the new-agey people to corrupt the subject of « consciousness » further. If there is any scientific truth in there, we can’t let it rot. And if there’s nothing, at least we will not have dismissed the hypothesis without investigating, which would be somewhat unfair. That’s why you need me in your consciousness project: I won’t let it fall into the abyss of nonsense gaping at us; if it’s crap I’ll tell you and go do something else. If you have a radical skeptic in the team, it’ll give you an idea of how real people will react to your work when it becomes public. It doesn’t mean that everything I say will be right, just that you might hear the same criticisms from your opponents. That’s the advantage of having an enemy on your side.

So yeah, anyway, what do I mean by consciousness if I’m not talking about religion or « alternative medicine » or parapsychology? To talk about consciousness, it is often easier to first talk about the unconscious. I am not talking about the Freudian vision of the unconscious: if you have read anything by or about Freud, you know that the guy was just really obsessed with making everything about sex and had nothing scientific to say about anything. Unfortunately, just like consciousness, the unconscious too has no definition that everyone agrees on.
So here is what I mean by unconscious: all the processes that you run through mostly automatically. I am not including really trivial reflexes like sneezing, breathing or the beating of your heart. I am talking about when you are so used to doing something, that you don’t have to think about it anymore: it has become (almost) automatic. When you’re walking, you don’t think about moving all the muscles of your legs with the right strength at the right timing. You consciously decide to go for a walk, but the basic processes involved in walking are unconscious. If you use computers a lot, the motion of your fingers on the keyboard may also become unconscious. When you’re eating, the detail of the motion of your hands is unconscious.

Of course, it doesn’t mean that you don’t know what you are doing. On the contrary, you know it so well that you are able to do it without paying attention. Only if something seems to go wrong will you start paying attention again.

Being able to run some processes unconsciously has some obvious advantages. First, if you don’t have to think about something to do it, then it exerts less of a strain on your brain: you are able to direct your attention elsewhere. If your brain uses fewer resources, it also means that it is using less energy, which sounds evolutionarily advantageous. Automatic processes are also much faster than processes that require a lot of thinking or a lot of monitoring, so they are more efficient in terms of reaction time.

If unconscious processes are so great, then why have conscious processes at all? They are slow, cumbersome and require more energy. My hypothesis is that unconscious processes have to be learned from conscious processes first because we live in a complex environment. For situations where there is only one « right » way to behave, this behaviour is likely to be mostly encoded in your genes for example, so you are able to do things « right » without going through complex learning procedures. Most people know how to chew food even before they have grown their baby teeth.

But when the definition of « right » behaviour itself is complex, you will have to learn it and update this definition if necessary. Maybe something is only right to do in some very precise conditions. Maybe you have to learn a behaviour that your ancestors didn’t have to use. Maybe « right » depends on the people you are with, on your culture, on the environment you live in. In all these situations, having an automatic behaviour from the start is either going to get you in trouble, or is just not possible because you cannot encode every possible reaction to every possible situation in a single human brain.

To introduce my opinion about the role and meaning of consciousness, I need to talk about models. When you learn a new behaviour, you try to build a model for it. Say you’re learning to ride a bicycle. At first, you have to think a lot about what to do: if the bike is going too much to the right, push on your right hand to make the handle turn to the left, to cause the bike to go to the left. But not too fast or the tire will turn in a way that will make you fall down.

This process is not instinctive at all. At first you might get the causal links wrong and turn the handle to the right while you really wanted to go to the left. But you end up with a model of how to control the bike, a group of rules going like « if I do this, that happens. If I want to do that, I should do this. ». As long as you are still building the model, things aren’t going so great. Then your reactions get more instinctive. You don’t have to think that much anymore, but you might still refer to the model from time to time. Then all your reactions become instinctive and you don’t need the model anymore, to the point that you don’t use it, you completely bypass it and it eventually disappears. Your body reacts to a selected set of inputs with a learned set of outputs. The process has gone from conscious to unconscious, and very often it even bypasses your memory. Type « computer » on your computer’s keyboard. Which fingers did you use and in which order? Can you answer this question without having to « replay » the sequence in your mind, just from memory? I sure can’t. Unconscious processes free your brain from models that became useless, and they don’t clutter your short and long term memory. Of course, if you think about it, it makes sense to turn a bike’s handle to the right to make it go to the right. But this is just knowledge derived from common sense and your experience of bikes, not a model that you have to use each time you ride a bike (thank goodness).

And now about the role of consciousness. Automatic processes are great, but they are also very dangerous. First, because you are not paying attention when executing them and just anything could happen without you noticing (ask factory robots; but I am sure you had similar experiences as well). Secondly, because they are not able to react to unexpected situations. As long as the inputs are in the normal range, your process will happily produce outputs. But I see two cases that might require your consciousness to suddenly snap back in place: when an unexpected value appears and disturbs normal functioning of a process, and when two competing unconscious processes try to run at the same time. I see processes as separated entities that cannot directly communicate with each other. Several processes can run simultaneously, but if two sets of inputs trigger two distinct processes, conflict might appear in the resulting outputs and disrupt one or both processes. When a process faces unexpected values, or when two processes are competing, you must consciously decide what to do next. I think that consciousness is that thing which conveys information between processes. If an unexpected input is spotted during process 1, this unexpected information must be made accessible to all other processes. One or several processes tuned to this specific input will be kicked off. The role of that information road is to allow information sharing; therefore it must be unique to avoid conflicts and redundancy. My current hypothesis is that this flow of information is consciousness, and this is why you never feel like « having two consciousnesses » in your brain. Consciousness is, by definition, unique.
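
If you think better in code, the hypothesis can be caricatured as a single shared channel between otherwise isolated routines. The Python sketch below is only that, a caricature: `Process`, `Bus` and `feed` are names I invented, the inputs are plain strings, and the competing-processes case is not modelled.

```python
class Process:
    """An unconscious routine: handles its expected inputs on its own."""

    def __init__(self, name, expected, tuned_to=()):
        self.name = name
        self.expected = set(expected)  # inputs handled automatically
        self.tuned_to = set(tuned_to)  # broadcast values that start it


class Bus:
    """The single shared channel (« consciousness » in this hypothesis)."""

    def __init__(self, processes):
        self.processes = list(processes)
        self.trace = []                # everything that was broadcast

    def broadcast(self, value):
        """Share a value with all processes; return the ones it starts."""
        self.trace.append(value)
        return [p.name for p in self.processes if value in p.tuned_to]


def feed(process, value, bus):
    """Give one input to a process; escalate to the bus if unexpected."""
    if value in process.expected:
        return []                      # stays unconscious, nothing shared
    return bus.broadcast(value)        # names of the processes kicked off


walking = Process("walking", expected={"flat ground"})
dodging = Process("dodging", expected={"hole"}, tuned_to={"hole"})
chewing = Process("chewing", expected={"food"})
bus = Bus([walking, dodging, chewing])

print(feed(walking, "flat ground", bus))  # expected input, nothing shared
print(feed(walking, "hole", bus))         # unexpected input, broadcast
print(bus.trace)
```

The processes never call each other directly; everything unexpected has to go through the one `Bus`, and only what goes through it leaves a trace.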

Accordingly, the difference between dreaming and being unconscious (in the medical sense of unconscious) would be that information is flowing during dreams, but not when you are unconscious. So no memories are formed during total unconsciousness (like deep sleep or being knocked out), but you can still have memory of your dreams.

But hey, where is the shared information coming from when you’re dreaming? Why would your unconscious processes run if there is no input? Well, the input does not need to come directly from the outside world. As consciousness is supposed to deliver information between processes, it means that consciousness can take the output of a process and present it as input to other processes. That could be what happens during dreams. (This raises many questions: why is deep sleep (the dreamless phase of sleeping) necessary? Why does it require your consciousness to be out of the way? And what happens when you go from deep sleep to REM (when dreams occur)? How does consciousness come back? Why doesn’t it work for people in a coma?)

I also think that processes cannot access memories, but as a transmitter of information, consciousness can. It is probably needed to build the models that will finally transform into unconscious processes; these models should most likely be built from existing knowledge and associations from existing processes.

These ideas might sound to you like the famous theory of consciousness as the result of information integration. If I remember well, that theory says that information from different types of sources (e.g., color and shape; position and nature of an object…) can only be integrated consciously. This is not at all what I am saying. In my view, if color and shape are necessary inputs for an unconscious process, they will be « integrated » (used simultaneously) by that unconscious process. It does not require your consciousness.

This post is already a bit long and there are still blank parts in my ideas. Is a computer conscious just because it has distributed processors and a bus to share information? What exactly is the relationship between consciousness and memory? Why must sending info from one process to others automatically give access to memory (why can’t you have information sharing that does not leave a trace in your short term memory)? What about split-brain persons? There is information flowing independently in the two halves of their brain; why don’t they feel two consciousnesses? Why is only one half of the brain doing things consciously? (I suspect that the fact that only one half of their brain seems to have access to memory is a hint). Why do some people not remember their dreams, and some do?

Finally, to be falsifiable, a theory needs to make new claims and predictions. Here is a tentative list:

  1. All animals that exhibit learning have consciousness and therefore are conscious beings (of course we need a way to measure consciousness to check this).
  2. Consciousness has nothing to do with your ability to move, to react to external stimuli or to integrate information. This is in complete disagreement with what Google will tell you (Consciousness: the state of being aware of and responsive to one’s surroundings.). To prove this we could try to rely on people who got out of, or are still in a coma, for example.
  3. When an unconscious process is perturbed by unexpected input, information flows to reach all other potentially relevant processes; only some will be started (this would be difficult but not impossible to check using brain imaging; it would require defining precisely what exactly we are looking for…)
  4. The transformation of a conscious process involving a model to an unconscious model-less process must be measurable objectively and subjectively: a process that links directly inputs to outputs without the use of a model must be unconscious. Forcibly making it conscious by paying attention to it is only possible by involving the use of model(s).
  5. Unconscious processes can run independently of your consciousness; therefore a person who is in an unconscious state should still be able to react unconsciously if you find the right set of input stimuli (if they are not paralyzed and can still receive inputs from outside). Maybe blindsight is linked to this point? And sleep-walking.

In conclusion, the unconscious could be a tool for space and energy management in the brain; consciousness would be used « on top of it » to access memory and share information. The pair conscious-unconscious would be the result of the need for memory management.

