Monday, January 23, 2017

Video Games in the Eyes of Deep Neural Networks

Like many researchers, I'm impressed by, and very excited about, the advances in the field of deep learning that have led to state-of-the-art solutions in many tasks. Recent models trained on very large datasets have been shown to achieve performance competitive with (and in some tasks superior to) that of humans.

Yet, despite these encouraging advances, some of the best models were easily misled when presented with slightly modified or noisy inputs [1, 2]. I found this quite interesting, as it reminds us that we are still quite far from human performance when it comes to generalisation, seeing things in context, and transferring what we know about one domain to another.

I have been doing research on video games for a while, and with the recent advances in DNNs I thought it would be interesting to test some of these methods on games. There is already some interesting work on teaching agents to play games with deep reinforcement learning, where the agents learn their actions just by looking at the pixels [3, 4]. My interest, however, lies in training DNNs to pick up similarities between games by looking at gameplay videos. The problem is both interesting and challenging, and I suspect it will keep me busy for a while. I published initial results from classifying 200 games in a previous post; for now, I want to share something more fun.

When I started my experiments, I took the obvious first step: feeding images of games to an accurate pretrained deep neural network and checking what the model thinks about them. I used VGG-16, a popular and accurate model trained on the ImageNet dataset that achieves near-human performance [5]. I planned to fine-tune the model on images from games later, but first I wanted to check how the original model performs without fine-tuning. Here I want to share some of the results, which I think are fun to look at. For each image, the model outputs a probability for each of 1000 categories covering a wide range of objects, and the figures below show the five categories with the highest probabilities.
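To make the "five highest probabilities" concrete, here is a minimal sketch of how a top-5 list can be read off a 1000-way classifier output. It is an illustration, not the pipeline behind the figures: the labels and scores are made-up placeholders, and `softmax` and `top_k` are hypothetical helper names. VGG-16's final layer already applies a softmax, so with the real model only the ranking step is needed; in Keras, `keras.applications.vgg16.decode_predictions(preds, top=5)` does that ranking and maps indices to ImageNet class names.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 (numerically stable)."""
    m = max(logits)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=5):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(labels[i], probs[i]) for i in ranked[:k]]


if __name__ == "__main__":
    # Hypothetical 1000-way output: placeholder labels and made-up scores,
    # standing in for the real ImageNet classes and VGG-16 outputs.
    labels = [f"class_{i}" for i in range(1000)]
    logits = [0.0] * 1000
    logits[412] = 5.0  # a confident (and possibly wrong) top guess
    logits[7] = 3.0
    logits[999] = 2.5
    logits[100] = 2.0
    logits[3] = 1.0
    for label, p in top_k(softmax(logits), labels, k=5):
        print(f"{label}: {p:.3f}")
```

Note that a high top-1 probability only says the model is confident, not that it is right, which is exactly what the misclassified game screenshots show.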

First, let's look at images where the model failed to capture what's in the image and confidently predicted the wrong classes. (The one I particularly like is the image of a treasure box that the network sees as garbage! I guess these networks are of no use to treasure hunters yet!)

I find the results quite intriguing. On the one hand, they demonstrate that we still have a lot of work to do to achieve human-level context awareness and knowledge transfer; on the other, at least in some cases, it is interesting to see why the network makes mistakes. It seems that humans could fall into the same trap if context, domain or experience information were not available (as in the last figure on the right, where the network predicted a balloon for a 2D character with a balloon-shaped head!).

Now, to be fair, the network did a good job of recognising quite a lot of images, especially those that depict real objects with high-quality graphics. Here are some examples from games such as Bridge Project, Star Conflict, Among the Sleep, Alien Rage Unlimited, Maia and Oniken.

There is clearly a lot to be done before we achieve general intelligence. Neural networks are definitely getting better at solving specific tasks and building domain-specific knowledge. They are still far, however, from achieving high-level performance across domains.

So my next task will be to fine-tune these models and see how easily they can learn to make sense of the new domain of games.