EvoMan




1. Introduction

A common artificial intelligence application in electronic games consists of making an artificial agent learn how to execute a given task successfully in a game environment. In this research, we introduce EvoMan, a framework for testing optimization algorithms with artificial agent controllers in electronic games. The environment can be configured to run in different experiment modes, such as single evolution, coevolution, and multi-evolution. To demonstrate some of the challenges posed by the proposed platform, as initial experiments we applied Neuroevolution using Genetic Algorithms, the NEAT algorithm, and the LinkedOpt algorithm.


2. EvoMan: Framework

Code at GitHub (python)

The EvoMan framework is an environment for evolving game-playing agents for action-platformer games, inspired by the classic Mega Man II.

The framework contains eight different enemies that the player agent must learn how to fight and defeat by performing sequences of the following simple actions: move left, move right, jump, release jump and shoot.

The game screen is composed of a rectangular area which may contain some obstacles depending on the stage. When the game starts, each character (the player and the enemy) is positioned in opposing sides of the screen.

At every time step, the player and the enemy can perform one or more combined actions to interact with the environment and each other, with the goal of defeating the opponent.

Both characters start the game with 100 points of energy, which decrease whenever they get hit by a projectile or hit each other. The character who reaches 0 points first loses the battle, making the other one the winner.

The eight pre-programmed enemies perform their own distinct attacks, which are stronger than the attack of the default player. They present a standard rule-based behavior mimicking the original game.

The game is composed of discrete time steps and at every time step the player can read from 20 different available sensors:

  • Vertical and horizontal distance: the absolute distance between the two agents on the x- and y-axes (two values).
  • Direction: the direction each character is facing (two values).
  • Projectile distances: the absolute distance on the x- and y-axes from each of the adversary's projectiles to the player; the game allows a maximum of eight projectiles at once (sixteen values).

An illustration of the available sensors is given in Figure 1. Besides these values, the API also provides the current energy of both characters, to be used by the objective function (fitness) of any optimization algorithm.
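
As a concrete illustration, the sketch below shows one way to group such a 20-value sensor vector in Python. The index layout and the function name `split_sensors` are illustrative assumptions, not taken from the framework's source.

```python
import numpy as np

# Hypothetical layout of the 20 sensor readings described above
# (the ordering is an assumption made for illustration only).
def split_sensors(sensors):
    """Group the flat 20-value sensor vector into named parts."""
    s = np.asarray(sensors, dtype=float)
    return {
        "distance_xy": s[0:2],                 # horizontal and vertical distance to the enemy
        "facing": s[2:4],                      # direction each character is facing
        "projectiles": s[4:20].reshape(8, 2),  # (x, y) distances of up to 8 enemy projectiles
    }
```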

Figure 1

The framework allows running experiments using the combination of different simulation modes (Figure 2):

  • Human Player: in which the player-character is controlled by some human input device (e.g., a joystick).
  • AI Player: in which the player-character is controlled by a machine learning algorithm.
  • Static Enemy: in which the enemy-character adopts a rule-based fixed attack/defence strategy, based on the original Mega Man II.
  • AI Enemy: in which the enemy-character is controlled by a machine learning algorithm.
Figure 2

At every time step during a game run, after receiving the value of each sensor (the game state), the agent may perform up to 5 actions: left, right, shoot, jump and release. The release action mimics the release of a joystick's jump button, interrupting the jumping process. The enemy agents perform additional actions whenever they feature different attacks; these actions are labeled shootN, with N ranging from 1 to 6.
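
For example, a minimal Perceptron-style controller could map the 20 sensor values to these 5 boolean actions by thresholding a linear layer. The function below is a hedged sketch, not the framework's own controller interface.

```python
import numpy as np

def perceptron_controller(sensors, weights, bias):
    """Map 20 sensor readings to the 5 boolean actions
    [left, right, jump, shoot, release] by thresholding a linear layer.
    weights has shape (5, 20) and bias has shape (5,)."""
    activation = weights @ np.asarray(sensors, dtype=float) + bias
    return (activation > 0).astype(int).tolist()
```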

The regular flow of an experiment using the framework is depicted in Figure 3. After each game run, the value of the objective function is returned by the framework to the optimization algorithm being tested.

Figure 3
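
The sketch below illustrates this loop with a simple generational Genetic Algorithm over a flat weight vector. The helper `run_game(weights)`, the population size, and the operators are assumptions for illustration, not the exact setup used in the experiments.

```python
import numpy as np

def evolve_weights(run_game, n_params, pop_size=50, generations=30, sigma=0.1):
    """Minimal generational GA: run_game(weights) is assumed to play one
    full game and return the fitness computed by the framework."""
    pop = np.random.uniform(-1, 1, (pop_size, n_params))
    for _ in range(generations):
        fitness = np.array([run_game(ind) for ind in pop])
        order = np.argsort(fitness)[::-1]            # best individuals first
        parents = pop[order[: pop_size // 2]]        # truncation selection
        # uniform crossover between randomly paired parents
        pairs = np.random.randint(len(parents), size=(pop_size, 2))
        mask = np.random.rand(pop_size, n_params) < 0.5
        children = np.where(mask, parents[pairs[:, 0]], parents[pairs[:, 1]])
        children += np.random.normal(0.0, sigma, children.shape)  # Gaussian mutation
        children[0] = parents[0]                     # elitism: keep the current best
        pop = children
    fitness = np.array([run_game(ind) for ind in pop])
    return pop[int(np.argmax(fitness))]
```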

3. Experiments 1: Generalized Strategy

The first part of the experiment assesses the capability of the neuroevolution algorithms to find a specific winning strategy for each enemy in the framework. The second part verifies whether the algorithms can evolve a generalized strategy for different enemies. The simulation modes used in these experiments were AI Player and Static Enemy, in single- and multi-evolution.

Since the Genetic Algorithm (GA) and LinkedOpt (LO) algorithms evolve only the weights of the Neural Network, we tested neuroevolution with these two algorithms using three topologies: a Perceptron and a single hidden layer with either 10 or 50 neurons. The NEAT algorithm evolves the topology together with the weights, so only a single version was used in the experiments.
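
To give a sense of the search-space sizes involved, the snippet below counts the weights evolved for each fixed topology, assuming 20 inputs, 5 outputs, and one bias per neuron (the bias convention is an assumption on our part):

```python
def genome_length(n_inputs, n_hidden, n_outputs):
    """Number of weights (including one bias per neuron) in the controller."""
    if n_hidden == 0:                                  # Perceptron: inputs -> outputs
        return (n_inputs + 1) * n_outputs
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

print(genome_length(20, 0, 5))   # Perceptron:         105 weights
print(genome_length(20, 10, 5))  # 10 hidden neurons:  265 weights
print(genome_length(20, 50, 5))  # 50 hidden neurons: 1305 weights
```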

For the first part of the experiments, the fitness function was defined in terms of p and e, the current energy levels of the player and the enemy, respectively, and t, the total number of time steps elapsed until the end of the fight.
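
The exact expression is not reproduced here. As a reference point, a hedged sketch following the framework's commonly documented default fitness (an assumption, not necessarily the exact formula used in these experiments) is

$$f = 0.9\,(100 - e) + 0.1\,p - \log t,$$

which rewards the damage inflicted on the enemy, gives a smaller reward for the player's remaining energy, and penalizes longer fights.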

Table I reports the final energy of each agent at the end of the fight. A value of 0 means that the agent lost the fight. From this table we can see that every enemy in the framework can be beaten by any of the algorithms given the right network topology. The NEAT algorithm performed much better than the other two algorithms, being surpassed on only two occasions by the LinkedOpt algorithm.

Table I

For the second part of the experiments we performed the training stage of the Neural Network through Neuroevolution using sets of two and three enemies (the training enemies). To do so, we used a new fitness function equal to the average of the fitness obtained against each of the training enemies:

$$\bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i,$$

where $f_i$ is the fitness obtained against the $i$-th training enemy and $N$ is the number of training enemies.

After the evolution process, the evolved agents were tested on the remaining enemies (hereafter the validation enemies) with the objective of maximizing victories.

Table II reports the average gain (the final energy of the agent) achieved by each strategy. From this table we can see that NEAT performed best on three of the seven tests, GAP (the GA with the Perceptron topology) obtained the best result in one test, GA10 (the GA with 10 hidden neurons) was best on two sets, and LOP (LinkedOpt with the Perceptron) achieved the best result in one test. The best gain was obtained by GA10 on the training set (7, 8).

Table II

Tables III and IV show the gain per enemy for the training and validation sets. The best result was obtained by GAP trained on the set (7, 8), with a total of five victories.

Table III
Table IV

The experiments showed that every enemy in the framework can be beaten by simple Neuroevolution agents. On the other hand, a general strategy remains a challenge and was not achieved by the tested algorithms. The best general strategies were obtained by the Genetic Algorithm and LinkedOpt, each beating a total of five enemies.

Videos of the execution of the best result obtained by NEAT, the Genetic Algorithm, and LinkedOpt against each enemy were recorded.


4. Experiments 2: Coevolutionary Algorithms

Before assessing the behavior of a coevolutionary competition, we performed evolutions using the AI Player vs. Static Enemy mode in order to verify the capability of each algorithm to beat every enemy. Afterwards, we experimented with the AI Player vs. AI Enemy mode, in which each agent (player or enemy) had a limited number of iterations to evolve a better strategy than its opponent.

The experiments were performed using a single-layer ANN with its weights evolved by a standard Genetic Algorithm, and an ANN evolved with the NEAT algorithm.

First, in order to generate a baseline, we evolved the player agent against each enemy separately, fixing the enemy behavior with a heuristic approach. Next, we performed Co-Evolution experiments between the main player and the enemies, alternating the evolutionary process between the two populations every three generations.
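
A minimal sketch of this alternation is shown below; `evolve_player`, `evolve_enemy`, and `evaluate` are hypothetical helpers standing in for the actual GA/NEAT runs and game evaluations.

```python
def coevolve(evolve_player, evolve_enemy, evaluate, turns=20, gens_per_turn=3):
    """Alternate the evolutionary process between the two populations,
    three generations per turn, as described above.

    evolve_*(opponent, generations) is assumed to evolve one population
    against a fixed opponent (None meaning the initial behavior) and
    return its best individual; evaluate(player, enemy) plays one game
    and returns the final energies of both characters."""
    best_player, best_enemy = None, None
    for _ in range(turns):
        best_player = evolve_player(opponent=best_enemy, generations=gens_per_turn)
        best_enemy = evolve_enemy(opponent=best_player, generations=gens_per_turn)
    return best_player, best_enemy, evaluate(best_player, best_enemy)
```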

In Table V we can observe that there does not seem to be a correlation between winning against the heuristic enemy and how easy it is to co-evolve against it.

Table V

This table shows that the only enemies beaten by co-evolution are AirMan, MetalMan, BubbleMan and QuickMan. Though WoodMan was also beaten during co-evolution, it was by a small margin. It is worth noting that the enemies in this game have the advantage of more complex forms of attack, which may hit the main player more easily.

Nevertheless, the final analysis of the co-evolutionary runs may not reflect the true performance of the agents involved, since there is no guarantee that the point in time at which performance was measured corresponds to the best performance achieved by the learning algorithm.

In order to assess the overall behavior of the co-evolutionary approach, Figure 4 plots the evolution of the energy of the main player (suffix p) and the enemy (suffix e) throughout the generations, for both the Heuristic and Co-Evolutionary experimental approaches.

Figure 4

From this Figure we learn that the easiest enemy to beat is AirMan, since in the Heuristic experiment the agent learned how to win in an early generation. It is also the enemy that shows the best arms race during the Co-Evolutionary experiment, in which the winning agent alternates along with the alternation of who is evolving. In contrast, against FlashMan the agent reached a good result only in the later generations, meaning that it required a larger number of generations to learn how to beat it. This is reflected in the Co-Evolutionary experiments, where we can see that the agent learns how to attack FlashMan but never actually manages to beat it; similar behavior is observed for WoodMan and HeatMan.

When playing against MetalMan, we can see that even though the agent evolves in early generations in the Heuristic experiment, the NEAT algorithm cannot maintain the perfect solution as it did for AirMan; the CrashMan and BubbleMan enemies follow the same trend. Finally, QuickMan behaves similarly to AirMan: the agent learns how to beat it in early generations and maintains this solution, while in the Co-Evolutionary experiments it sustains an arms race against the enemy.

Regarding the co-evolutionary experiments, we learned that the main player may require additional generations of advantage in order to generate an arms race against some of the enemies.

 

The following is a video showing the execution of the co-evolution between the enemy AirMan and the EvoMan player.

5. Conclusion

Overall, this framework can help the Computational Intelligence in Games community to test different tasks within a controlled environment, such as finding a general strategy able to defeat every opponent, or a constrained co-evolutionary approach in which the enemy behavior evolves so that its difficulty remains challenging for a human player.
