Automating Swords & Souls training — part 2

Gergo Bogacsovics
10 min read · Aug 26, 2020

In today’s article, we will continue automating the minigames in the Swords & Souls flash game. Last time, we managed to create an agent that is unbeatable in the block training, since it makes no mistakes no matter the number and speed of the apples. It was a really fun experience, not to mention that we basically got to create an unbeatable character thanks to the huge amount of health and defensive power acquired from the level-ups and stat-ups. But one thing was still lacking. Power.

Our character desperately needs some juice. Photo by Jonathan Borba on Unsplash

Luckily for us, the sword training mini-game can help us gain some decent attack power. The training process, however, is still as boring in the long run as ever. So let’s figure out how to automate it!

Before starting

Before starting, I would like to point out that this mini-game proved to be more challenging than I initially thought, which resulted in me trying out several approaches in the hope of achieving better results. Therefore, after the problem description, we will briefly go through each of these methods and discuss their strengths and weaknesses.

For mobile readers

Another important thing I’d like to mention before you dive into the article is that it includes several relatively large GIFs. I’d therefore advise anyone with a limited mobile data plan to either use Wi-Fi or save the article for later and read it on a computer.

Problem Description

There are two different kinds of objects to look out for during sword training: apples and stars (just like in the previous mini-game). The only differences are that a) we no longer block the apples but slash them, and b) the stars only come falling from behind the character, and we need to kick them when they are within reach. The dummy keeps throwing these red abominations at us along one of three possible trajectories: upwards, straight at us, or at ground level. This means that we need to categorize every incoming apple into one of these three groups while also keeping an eye on the falling stars.

The sword training environment. If you are perceptive enough, you may already notice a phenomenon that will make automating this kind of training harder than the previous one.

Method 1 — Closest red pixel (Déjà vu)

This method is roughly the same as the one used in the previous article. We slash in the direction of the closest red pixel. This means that the method can be broken down into the following steps:

  1. Take a screenshot.
  2. Filter out all the red and yellow pixels.
  3. Measure the distance between each pixel and the character.
  4. Choose the closest one.
  5. If that pixel is behind our character, then kick backward and go to step 1.
  6. If that pixel is above our character, then slash upwards and go to step 1.
  7. If that pixel is below our character’s center point, then slash downwards and go to step 1.
  8. Otherwise, slash forward and go to step 1.
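To make the steps above concrete, here is a minimal sketch of the decision logic (steps 3 to 8) as a pure function over a boolean mask of red/yellow pixels, i.e. the output of step 2. The character anchor `CHAR_X, CHAR_Y`, the 40-pixel vertical tolerance band, and the function name are my own illustrative assumptions, not the article’s actual code; screenshot capture and key presses are left out.

```python
import numpy as np

# Hypothetical character anchor point within the captured frame;
# the real coordinates depend on the game window's position and size.
CHAR_X, CHAR_Y = 200, 300

def choose_direction(mask):
    """Pick an action from a boolean mask of red/yellow pixels.

    mask[y, x] is True where the screenshot pixel was red or yellow.
    Returns 'back', 'up', 'down', 'forward', or None if there are no targets.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    # Steps 3-4: distance from the character to every candidate pixel.
    dists = np.hypot(xs - CHAR_X, ys - CHAR_Y)
    i = np.argmin(dists)
    x, y = xs[i], ys[i]
    # Steps 5-8: map the closest pixel's position to an action.
    if x < CHAR_X:
        return "back"       # behind the character -> kick
    if y < CHAR_Y - 40:     # 40 px tolerance band, an assumed value
        return "up"
    if y > CHAR_Y + 40:
        return "down"
    return "forward"
```

Keeping the logic separate from the screen capture like this also makes it easy to test on synthetic masks.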

Simple, right? Frankly speaking, I had high hopes for this method despite its simplicity, but the result was underwhelming, to say the least.

Method 1. The agent performs poorly. Also, notice how there are way too many unnecessary slashes.

I’ve run this script multiple times and the results were the same. I personally hate this mini-game because I’m awful at it, but even I can do better than this. After running the script a few more times and recording the results, I examined them by replaying the videos in slow motion, looking for some pattern that occurred in most of the experiments just before the agent’s mistakes. After a bit of looking around, I found a common culprit: animation.

Just look at the GIF above. What happens when an apple is cut? Does it disappear immediately? No. It starts shrinking, but only fades away slowly. What’s even more troublesome is that small red particles get scattered all over the place after an apple is cut. This generates a lot of noise for our agent. Why? Because these particles start drifting away, and some of them end up even closer to us than the apple we just cut. This results in the agent slashing in the same direction again. And again. And again. Until the particles completely disappear. The noise can thus overshadow the real apples heading towards us, leading our character to make wrong decisions and therefore get hit.

Method 2 — Rule of Majority

Since I wanted to keep things as simple as possible (as always), I tried a slight modification. Here is the basic idea. As I wrote earlier, the particles can overshadow the “real” incoming apples. Okay. But these particles tend to be smaller than the apples themselves. So instead of taking only the closest pixel, we assign each (red or yellow) pixel to one of four groups: up, down, forward, or backward, depending on its position relative to our character. Then we slash in the direction that has the most “votes”. One more thing to keep in mind is that we don’t want pixels that are too far away to affect this vote, so we eliminate every pixel outside the character’s reach by defining a constant look-ahead distance and cutting off everything beyond it.
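The voting scheme could be sketched along these lines, under the same assumptions as before (illustrative anchor coordinates; `LOOKAHEAD` and `VOTE_THRESHOLD` are made-up values standing in for the fine-tuned parameters discussed below):

```python
import numpy as np

CHAR_X, CHAR_Y = 200, 300   # assumed character anchor
LOOKAHEAD = 150             # max distance (px) at which a pixel may vote
VOTE_THRESHOLD = 30         # minimum votes required before we act at all

def vote_direction(mask):
    """Majority vote over nearby red/yellow pixels instead of trusting
    the single closest one, so small particle debris gets outvoted."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    # Drop all pixels outside the character's reach.
    near = np.hypot(xs - CHAR_X, ys - CHAR_Y) <= LOOKAHEAD
    xs, ys = xs[near], ys[near]
    votes = {"back": 0, "up": 0, "down": 0, "forward": 0}
    for x, y in zip(xs, ys):
        if x < CHAR_X:
            votes["back"] += 1
        elif y < CHAR_Y - 40:
            votes["up"] += 1
        elif y > CHAR_Y + 40:
            votes["down"] += 1
        else:
            votes["forward"] += 1
    best = max(votes, key=votes.get)
    return best if votes[best] >= VOTE_THRESHOLD else None
```

A large apple contributes hundreds of votes while a handful of stray particles contribute only a few, which is exactly why this version tolerates the cutting animation better.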

This version delivers a way better performance than the previous one. But can we do better?

This method worked a lot better than the previous one, so I started fine-tuning the different parameters: the look-ahead distance, the sleep between actions, the vote thresholds (the minimum number of votes needed to perform any action), and so on. I also decided to ignore all the yellow pixels (stars) and instead focus solely on the apples and on increasing the normal combo. This got our character to a whopping combo of more than 100. In a sense, we had already succeeded in automating the mini-game: even though the character still makes the occasional mistake, it is far better than I am, and we can leave the script running for a while and come back to find our character has gained lots of experience and stats. But I wanted the agent to do even better, so I moved on to another approach.

Method 3 — Object Detection (with contour)

We’ve only worked with pixel-level information so far by filtering the red and yellow pixels, then either taking the closest one or running a vote regarding them. But could things improve at least a little bit by going one step further: detecting the apples/stars as a whole, as objects? This section aims to answer this question.

Object detection is usually done with neural networks, but then we would need to gather data for training the model. Honestly, gathering the data, labeling it precisely, and then hunting for faults in it (imbalances, repetitions, etc.) is a tedious task, not to mention training an unbiased and highly accurate network. Another solution, however, is using the contours of the objects. The idea came from one of Adrian’s blog posts at pyimagesearch, where he manages to track a ball with this method. If you are not familiar with the blog, I highly recommend checking it out, as it has numerous posts about solving fun tasks with computer vision. As for the code, we will extend it a little so that it can handle multiple objects, tune some parameters, and make a few small changes, but the main idea remains: first we filter the red/yellow pixels, then grab the contours of the objects so that we have access to their centers. Armed with this knowledge, we can simply move in the direction of the closest object.

Method 3 in action.

As you can see, the results aren’t bad at all, but the agent still makes mistakes every now and then. So how can we improve it further? After carefully watching the agent’s reactions, we can see that it makes too many unnecessary moves: it strikes the same apple roughly 3–4 times, or even more in some cases. Why is that bad, you ask? Because if you observe the environment carefully by playing the mini-game, you will notice that during the animation phases the agent is practically helpless: it cannot move or do anything until the animation ends. So if you execute the same move two or more times, you create a micro-lag and have less time to counter the next incoming apple, which ultimately leads to worse performance due to the shortened reaction time. This brings us to the last method described in this article.

Method 4 — Full retard

As described above, the root of our problem is the relatively high number of unnecessary slashes. To get around this, we will restrict the agent’s movements so that it cannot repeat the same movement for the next x milliseconds. During this interval, the apple that we cut will shrink considerably or even disappear, so our agent will simply ignore it. Truthfully, I tried to avoid needing this method by fine-tuning the thresholds and parameters of Method 3 so that only uncut apples are detected, but that did not work. I also tried introducing small amounts of sleep into the script, but that led to worse performance, as during these periods the agent was left defenseless. So limiting the agent to moving only at set intervals will not quite cut it; on the other hand, we still need to tell it to stop poking in the same direction over and over again. So what can we do? Well folks, fasten your seat belts, because we are going to introduce threading to our script!

There is no turning back now. With this method, we will go full retard. GIF from tenor.com.

Threading is a way of introducing parallelism to the code. Of course, many of us hate dealing with threads due to their nearly unpredictable and hard-to-grasp nature (unless we are programming in Go), but for this mini-game, it will improve our results both visually (fewer movements, hence a more professional-looking agent) and performance-wise (higher combos). The main idea is that we forbid the agent to slash in the same direction for a given interval. We achieve this by declaring a global list, where each element is a permitted action at any time step t. After the agent decides its action for the current moment (t), we pop this action from the list and set up a timer that only puts it back after a certain amount of time (x milliseconds) has passed.

The method can be summarised in these points:

  1. Take a screenshot.
  2. Filter out all the red and yellow pixels.
  3. Find the different objects by their contours.
  4. Find the closest object.
  5. If that is too far away, go to step 1.
  6. If the direction of the closest object is not in the list of allowed directions, go to step 1.
  7. Slash in the direction of the closest object.
  8. Pop the chosen direction from the list of allowed directions.
  9. Start a thread that will enable the chosen direction after x milliseconds have passed.
  10. Go to step 1.

Now let’s have a look at the results!

While the agent still likes poking into the air by slashing upwards every now and then, its overall performance and stability have greatly increased. In fact, the agent pulls off some awesome combos that I genuinely could not, like catching the stars while still slashing apples coming from two different angles.

That said, our agent is sadly not perfect, as it still makes mistakes sometimes. These usually happen above combos of 100–120 if we want to catch the stars too, or above 150–170 if we focus only on the apples.

Conclusion

After a lot of experimenting, I’ve found that there are several situations in which the combo will break regardless of our agent’s accuracy, so beyond a certain point, improving the agent any further is simply meaningless. This is because there will be cases with apples coming from all three angles at once, and our agent will be helpless: the animation time creates a micro-lag that renders the agent practically powerless and results in it getting hit. In simpler terms, even a perfect agent that always knows what to do and never makes mistakes would not be able to keep increasing its combo indefinitely, because it will be stuck in the previous animation.

Another thing I would like to point out is that this time I used a script available at pythonprogramming.net instead of pyautogui to perform the key presses. The reason is that pyautogui is simply too slow in comparison and therefore generates even more micro-lag, of which, honestly, we’ve had our fair share.

And so, that brings us to the end of this article. In the next one, we will tackle the remaining mini-games and then finally move onto automating even more exciting tasks, so stay tuned! Until then, have fun automating!

The Code

As always, the code is available on the GitHub profile of the AiF series, at https://github.com/automatingisfun/SnSSword. The reason there are only a few illustrative snippets in the article itself is that the full code would simply take up too much space on the reader’s device (especially on phones), making the reading far less enjoyable and harder to follow.


Gergo Bogacsovics

PhD student & AI enthusiast. Owner of the ‘Automating is Fun’ Youtube channel.