A few days ago I had to give a presentation about my master's thesis, so I decided to write a blog post about it, hoping to get somebody interested in the topic and to spread the word about the interesting work being done in this field.
Evolutionary robotics is a branch of robotics that uses evolutionary computation to develop controllers for autonomous robots. A candidate controller is a set of parameters influencing the behavior of the robot. Genetic algorithms operate on these candidates through operators such as crossover, mutation, and selection, guided by a fitness function. The parameters often describe a neural network (e.g. its weights or topology), avoiding any kind of imperative, top-down programming.
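To make the idea concrete, here is a minimal sketch of such a genetic algorithm operating on real-valued parameter vectors (e.g. neural-network weights). All names and hyperparameters here are illustrative choices of mine, not the ones used in the thesis:

```python
import random

def evolve(fitness, n_params, pop_size=20, generations=100,
           mutation_std=0.1):
    """Minimal GA over real-valued parameter vectors.

    `fitness` maps a parameter list to a score (higher is better).
    """
    # Random initial population of candidate controllers.
    pop = [[random.uniform(-1, 1) for _ in range(n_params)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half of the population.
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            # Crossover: each parameter is taken from one of the parents.
            child = [random.choice(pair) for pair in zip(a, b)]
            # Mutation: small Gaussian perturbation of every parameter.
            child = [w + random.gauss(0, mutation_std) for w in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```

In a real ER setting the fitness call would run the robot (or a simulation of it) and observe its behavior; here any scoring function will do.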
Evolutionary robotics is often used for learning and adaptation in embodied and situated agents. Embodied and situated means that the behavior of the robot results from a continuous interaction between the controller, the robot's body, and the environment in which it is located. The approach differs from traditional robotics, where the control system is designed a priori, uploaded into a robot, and only then introduced into the environment. Here, on the contrary, the candidates learn and are evaluated inside the robot, in the final environment, with real sensory information. The continuous interaction between the robot and the environment, with its noisy and incomplete sensory experience, drives an emergent and robust adaptation of the control system to both the features of the robot (embodiment) and those of the environment the robot is placed in (situatedness).
In fact, the fitness function does not describe the behavior that the designer desires; it only measures the goodness of the behavior being observed. This allows the system to autonomously develop an emergent behavior through (global) trial and error, exploiting the characteristics of the world.
My contributions were:
(1) the development of an individual learning algorithm based on simulated annealing (SA).
Individual learning can be implemented through SA as a sequence of mutations from an initial state (the candidate's random initial parameters). After each mutation, the new parameters are accepted if they improve the fitness. They can also be accepted with a probability p (governed by the temperature) even if they degrade the fitness. The probability p decreases as the number of learning cycles increases. The role of p is to avoid premature convergence to a local maximum.
In ER, each candidate behavior has to be evaluated multiple times with different initial conditions (e.g. by placing the robot in a different position inside the arena). The more times the robot is tested, the more robust the fitness evaluation will be (reducing the odds of a lucky test), but this comes at a computational cost. By decreasing the number of evaluations, the probability that an under-fitting candidate looks fit increases, which in turn raises the probability of accepting a degrading mutation. Since, in SA, this role is already played by the temperature p, we can save computation by skipping evaluations whose accurate fitness estimate we would then ignore with probability p anyway. In fact, we can obtain the same temperature behavior by evaluating each candidate fewer times at the beginning of the learning phase (thereby increasing the probability that an under-fitting candidate looks fit) and more times at the end, when we want to accept mutations only on the basis of robust evaluations (using, at that point, the number of evaluations the baseline algorithm would have used throughout). This yields the same result as baseline SA while saving about 50% of the evaluations, and therefore of the computation.
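The savings can be illustrated with a hypothetical linear schedule for the number of noisy trials per candidate (the actual schedule used in the thesis may differ):

```python
def evaluation_schedule(cycles, max_evals):
    """Trials per candidate grow over the learning run: few early
    (evaluation noise itself plays the role of the temperature p),
    the full budget only at the end."""
    return [max(1, round(max_evals * (c + 1) / cycles))
            for c in range(cycles)]

cycles, max_evals = 100, 20
scheduled = sum(evaluation_schedule(cycles, max_evals))
baseline = cycles * max_evals  # fixed max_evals trials every cycle
print(scheduled / baseline)    # roughly 0.5: about half the evaluations
```

With a linear ramp the total is roughly half of the baseline's fixed-budget total, matching the ~50% saving mentioned above.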
(2) the development of an algorithm for social learning and cultural evolution.
Social learning refers to a process in which agents learn new skills by interacting with other agents. Social learning should allow the learning individuals to acquire a strategy or behavior in a shorter time than would be necessary through individual learning (avoiding reinventing the wheel every time). Social learning can also be defined by its uncertainty-reduction function: it allows agents to gather information about the environment without taking the risks and costs of trial and error, raising their chances of survival. Social learning is also believed to be an important actor in what is called "cumulative cultural evolution", a process in which strategies are transmitted generation after generation, accumulating variations and improvements, leading to strategies that would be too difficult to learn individually.
With the rising complexity of robotic systems, it becomes harder to find a suitable solution, through trial and error, inside a solution space that keeps growing in size. Trying to acquire a behavior starting from an already-adaptive one might therefore drastically reduce the search area, as social learning would guide the search towards the promising regions of the solution space.
Consider a control system defined by a set of free parameters. In computational terms, acquiring an adaptive behavior exhibited by another control system (the model) through social learning means searching for a set of free parameters that allows the learning system to exhibit a strategy similar to the expert's. Learning the strategy starting from an existing one, instead of searching through trial and error, would expedite the learning process, since the information provided by the expert control system is richer than that provided by the environment and the fitness function alone. It would lead the search into promising areas of the parameter space.
Different forms of social learning exist in nature, and many of them aren't true imitation: (1) social facilitation, something like "eat only if there are conspecifics around you, otherwise watch out for predators"; (2) contagious behavior, like "flee if your conspecifics flee"; (3) stimulus enhancement, i.e. "follow an adult and learn as much as possible during that time by interacting with the environment around you". We wanted to model social learning through a simple model of imitation or social influence, avoiding top-down approaches that require the student robot to understand the inner goal of the expert agent, match a particular behavior, and translate it into its own perspective. So we decided to model imitation as the individual learning of the expert agent's behavior, in a way similar to stimulus enhancement.
Instead of measuring the goodness of the strategy, the new fitness function measures the average difference between the actuator commands of the student and those of the expert, given the same sensory input. It is like putting the student on the expert's shoulders and measuring how differently they would decide in each situation. The social learning algorithm therefore puts pressure on the student to act like the expert; note, however, that no pressure is put on having similar parameters!
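A minimal sketch of such an imitation fitness, where controllers are modeled simply as functions from sensory inputs to actuator commands (the distance metric and naming are my own illustrative choices):

```python
def imitation_fitness(student, expert, sensory_inputs):
    """Social-learning fitness: how closely the student's actuator
    commands match the expert's on the same sensory inputs
    (0 = identical behavior; more negative = more different).

    Only the observable behavior is compared; nothing constrains
    the two controllers to have similar parameters.
    """
    total = 0.0
    for x in sensory_inputs:
        s_out, e_out = student(x), expert(x)
        # Sum of absolute differences between actuator commands.
        total += sum(abs(a - b) for a, b in zip(s_out, e_out))
    return -total / len(sensory_inputs)
```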
What we realized is that learning through imitation alone would produce an under-fitting agent. The student would basically produce a prototypical behavior, something that could almost solve the problem but would get lost in the details. For this reason we introduced a hybrid form of social learning: the new fitness is a weighted sum of the student's ability to imitate the expert agent and of its ability to solve the given problem. The weights of this hybrid fitness change during the learning process, giving more weight to the first component at the beginning and more weight to the second one towards the end. So the student is able to acquire a prototypical strategy from the expert agent at the beginning, and to optimize it through "its own experience" towards the end. This gave us quite good results: the student could acquire an adaptive behavior in 66% of the learning cycles needed by the individual learner, matching our initial hypothesis.
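The hybrid fitness can be sketched as follows; the linear weight schedule is an illustrative assumption of mine, and the thesis may use a different one:

```python
def hybrid_fitness(imitation_score, task_score, cycle, total_cycles):
    """Weighted sum shifting from imitation (early in the run)
    to the task itself (late in the run)."""
    w = cycle / total_cycles  # grows from 0 to 1 over the run
    return (1 - w) * imitation_score + w * task_score
```

At cycle 0 the student is scored purely on how well it imitates the expert; by the last cycle it is scored purely on how well it solves the task.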
As a possible interpretation of how social learning works in our models, consider that two similar behaviors can be implemented by two very different sets of free parameters. For this reason, acquiring a behavior similar to the expert's through social learning could lead the student to a solution that is distant, in parameter space, from the expert's. And since the two solutions could be far apart, the landscape around the student's solution would also be different (e.g. without the local minima that might characterize the area around the expert's solution). The social learning process could therefore be seen as a technique for selecting the initial conditions of a parameter optimization problem, or as a mechanism for jumping out of local minima.
Where do we go from here?