
Thursday, October 20, 2016

Stop Writing Dead Papers


The idea struck me while listening to Bret Victor's talk "Stop Drawing Dead Fish" (hence the title of this post).

We have been writing and publishing papers for more than 350 years (according to Wikipedia, some of the earliest journals started in the 17th century!) and yet, somehow, we are still using the same format and writing our papers as if hardcopies are the main, if not the only, medium for distributing them. This is really disturbing, because we now live in the 21st century and have a far more powerful medium available to us: interactive digital interfaces.

Most academic papers written nowadays are dead: they are static, with no interactive content. I'm talking particularly about papers that report empirical results and show graphs and tables filled with numbers and statistics, supported by long discussions to help the reader understand and visualise (in her head) what cannot be fully articulated by static content.

Take for instance the figure below, which appeared in the Cho et al. (2014) paper and is meant to visualise the space of representations of four-word phrases learned by a recurrent neural network. The authors clearly put a lot of effort into visualising the space and presenting their results in a convincing and expressive way. But because of the lack of an interactive medium, they had to present the full graph (clearly hard to understand) along with some close-ups (not fully representative of the space).



2–D embedding of the phrase representation learned with RNN. Cho et al. (2014)
Some zoom-ins from the figure above. Cho et al. (2014)
This is not only inadequate; presenting a number of figures to support an argument also takes up a lot of the limited space available in academic papers, space that could be put to better use. Moreover, despite all these static illustrations, one wishes she could hover over some points to highlight what they represent, or zoom in to get a better understanding.

While this format was totally acceptable in the 17th century, it is way outdated in the 21st and no longer enough!

To compensate for the limitations of this medium, which poorly accommodates our goals, a number of researchers have started a tradition of writing blog posts that serve as fancier versions of their papers, usually supported with interactive visualisations and analytics that are easier to access and understand. Take for instance this great blog with many interactive examples for some of the results in the Dai et al. (2014) paper on clustering Wikipedia articles. You can still see the same figures presented in the paper (like the one below), while also being able to interact with them and play with the parameters.
Visualisation of Wikipedia paragraph vectors using t-SNE. Dai et al. (2014)
Now I understand that this does not apply equally to all fields (I don't expect researchers working in the field of literature to move immediately, or to accept such a new medium). But I believe researchers with a computer science background are capable of making, and would arguably welcome, such a move. Such functionality could be integrated into new editing tools or traditional ones (such as web-based LaTeX editors). Ultimately, if papers could be submitted in a scripting-language format, say in PHP (or an editor built on top of it), where many of the interactive tools already available can be easily integrated, we would have the opportunity to take the creativity and accessibility of academic papers to a whole new level.

As someone who reads, writes and reviews papers, I'm really looking forward to the day when academic papers become more interactive, and I strongly believe this will lead to research that is more accessible, easier to understand and evaluate, and more fun to work with.

Tuesday, October 4, 2016

Experience-Driven Content Generation: A Brief Overview


Exploring and implementing methods for measuring experience, understanding and quantifying emotions, and personalising users' experience have been the focus of my research for quite some time. In this post, I will try to summarise some of my knowledge in this area.

The Big Picture

My theory is that if I can accurately predict a user's affective states at any point during her interaction with a digital system, I can ultimately implement affect-aware methods that automatically personalise the content, leading to an improved and deeper user experience. I'm particularly interested in studying these aspects within the computer games domain, as I believe games are a rich medium for expressing emotions, an interesting platform for collecting, recognising and modelling experience, and an easily controllable environment for personalising content.

These ideas have gathered interest from numerous researchers working in interdisciplinary areas, each trying to solve part of the puzzle. There is, for instance, a whole field of research trying to measure and quantify emotions; a relatively new, but very fast growing, field on automatic generation of content in games (there is a new book about this subject here); and a growing interest in linking ideas from these two areas so that we can build content generators that treat users' experience as a core component of the content creation process.


None of the above is easy, and implementing a complete working system where all modules work together effectively is a big challenge. In my own work, I'm interested in realising the affective loop in games (see the figure below). I have a working implementation of what I believe is a simple, yet easily extendable, prototype of the whole framework in the game Super Mario Bros. (skip forward to the end of the post if you are eager to play!). My work revolves around revising and improving the different parts of the system so that the framework becomes general enough to be applied to predicting users' affect and personalising experience in any game (or, more broadly, any digital interface).

The components of the experience-driven content generation framework.
Ultimately, we want the system to be active (accurately choosing what information is important to learn from), adaptive (continuously learning and improving), reactive (acting in real time), multimodal (utilising information about the user from different sources) and generic (working well in various applications). I have made progress along a number of these lines and I will be sharing it in individual posts that will follow. For now, I want to share insights about some of the main considerations towards realising the framework.

Features for Measuring Player Experience

If you survey the literature you will find numerous methods for gathering information about users' experience or emotion. Here are the three dominant types:

Subjective measures: The most obvious measure is to ask players about their experience. This method is the easiest to implement, and hence the most widely used, but it comes with a number of limitations (including subjectivity, cognitive load and interruption of the experiment) as well as other concerns related to the nature of the experimental design protocol. So, to compensate for these drawbacks, other complementary or alternative methods are usually employed to gather information from other modalities.

Objective measures: These are usually harder for the subject to control and therefore more reliable. Most of them are also universal, making them highly scalable. Your heart rate, for instance, can reveal information about your excitement, and your brain activity can tell whether you are surprised, under cognitive load, or relaxed. Your facial expressions can tell if you are happy or angry, and your head pose can tell whether you are engaged, attentive or bored. Information gathered by such measures is more reliable than the subjective kind but obviously harder to collect, annotate and analyse. Moreover, some of the equipment used for collection is so intrusive that it can't be used in real-life interaction settings. Therefore, most of the widely used methods rely on accessible and unobtrusive devices such as web cameras and Kinect sensors to analyse facial expressions, extract gaze information and capture body gestures to infer emotion.

Interaction data: The interaction between users and the digital interface also holds patterns that can help us understand users' experience. Gameplay data, for instance, is a rich, easy-to-collect and reliable source of information for profiling players. Relying only on this modality, methods have been developed for predicting retention, analysing progression, discovering design flaws, and clustering players for targeted marketing and content customisation.

When it comes to modelling players' experience in games, I believe a multimodal approach that combines and aligns information from multiple sources is the most effective. Gameplay data is the main source of information about experience analysed by most studies in academia and industry. I believe information coming from relatively cheap sources such as the camera (already available in most gaming platforms) will soon become another standard for analysing emotion and improving the prediction of experience, especially with the recent advances in accurate real-time prediction of emotion from videos of faces. I believe that in the not-too-distant future, there will be no need to ask users about their experience, as other reliable modalities will provide accurate, less intrusive sources.

Facial reaction of players playing Super Mario Bros. when losing, winning and faced with a challenging encounter.

Methods for Feature Processing 

The above types of features come in different forms: some are discrete numbers while others are sequences with temporal or spatial relationships. This means that different methods should be employed to handle each type, and special care should be taken when combining sources of a different nature. For instance, gameplay data can be collected as discrete statistics about the different actions taken, or as continuous sequences of actions taken at each timestamp. Different methods can be applied in each case, leading to various insights.

Discrete features are the most common and can be directly processed by most machine learning methods (they should of course be cleaned and normalised in most cases). 
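As a minimal sketch of that preparation step (the feature names and values below are made up for illustration), discrete gameplay statistics can be imputed and standardised before being fed to a learner:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical per-session gameplay statistics: deaths, coins, jumps, seconds played.
sessions = np.array([
    [3, 25, 140, 95.0],
    [0, 40, 180, 120.0],
    [7, np.nan, 90, 60.0],   # a missing value to clean up
], dtype=float)

# Replace missing values with the column mean, then standardise each feature
# to zero mean and unit variance so no single statistic dominates.
col_means = np.nanmean(sessions, axis=0)
sessions = np.where(np.isnan(sessions), col_means, sessions)
normalised = StandardScaler().fit_transform(sessions)
print(normalised)
```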

Features of a continuous nature, such as objective measures of experience, can be processed with methods that are sensitive to time-series data, such as recurrent neural networks and regression analysis. Sequences can also be processed to extract metadata such as frequently occurring patterns, using frequent pattern mining methods, as sketched below.
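Below is a toy sketch of the pattern-extraction idea: counting frequently occurring action n-grams in made-up gameplay sequences (dedicated frequent pattern mining algorithms would scale far better, but the principle is the same):

```python
from collections import Counter

# Two invented gameplay sequences of player actions.
sequences = [
    ["run", "jump", "stomp", "run", "jump", "stomp"],
    ["run", "run", "jump", "stomp", "duck"],
]

def frequent_patterns(seqs, n=2, min_count=2):
    """Count all length-n subsequences and keep those seen at least min_count times."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - n + 1):
            counts[tuple(seq[i:i + n])] += 1
    return {pattern: c for pattern, c in counts.items() if c >= min_count}

print(frequent_patterns(sequences))   # {('run', 'jump'): 3, ('jump', 'stomp'): 3}
```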

Combining features from multiple modalities can be tricky, especially if they are of a different nature. The signals should be aligned and either transformed into the same space or handled at multiple resolutions. Take for instance a system receiving a continuous signal from a facial-emotion recogniser and discrete statistics about the keyboard buttons pressed. To combine such information, one option would be to process the continuous signal and transform it into discrete emotion values calculated within specific intervals. Another option would be to handle each signal with an appropriate method and combine the results at a later stage. A third option would be to sync the features so that we can extract the facial reactions around each key press.
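Here is a rough sketch of the first option (the signal, the frame indices and the interval size are all synthetic): a continuous per-frame valence signal from a hypothetical facial-emotion recogniser is averaged over fixed intervals so it can sit next to discrete key-press counts at the same time resolution:

```python
import numpy as np

valence = np.random.rand(600)          # one valence value per frame (synthetic signal)
key_presses = [12, 30, 95, 250, 400]   # frame indices of key presses (synthetic)

interval = 100                         # frames per interval
rows = []
for i in range(len(valence) // interval):
    lo, hi = i * interval, (i + 1) * interval
    rows.append({
        "mean_valence": float(valence[lo:hi].mean()),
        "key_press_count": sum(lo <= k < hi for k in key_presses),
    })

# Each row now combines both modalities at the same resolution.
print(rows[0])
```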

Methods for Modelling User Experience

You can imagine user experience models as magic black boxes: you feed them with information about your users and the interface they interact with and, in return, you get useful categorisations or profiles that you can use for decision making. The input can be one or a combination of the features presented above. The output can be an estimation of how fit the content is for a particular user, which profile best matches the user (is she a buyer on Amazon, a fighter in a first-person shooter, a puzzle-solver in an online course?), or a recommendation of the adjustment to the interface that will best increase the user's engagement.

Any machine learning method that can accurately estimate the mapping between your input and output can be used. The most widely used methods for profiling, for instance, are supervised or unsupervised clustering and classification methods such as support vector machines, self-organising maps or regression models. Non-linear regression models are more powerful when attempting to predict affective states based on behavioural information. One can use neural networks, multivariate regression spline models or random forests to reduce the size of the search space while optimising the mapping functions. When trying to come up with recommendations or personalised content, efficient search and filtering methods can be applied, such as collaborative filtering, genetic algorithms or reinforcement learning. There are many interesting applications for each approach, and the choice of the appropriate method depends on the type of data you have, the type of problem you are trying to solve and the kind of insights you are interested in.
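To make the black-box view concrete, here is a minimal, entirely hypothetical sketch: a random forest mapping a few gameplay statistics to a reported experience label (the features, labels and data are invented, and any of the methods above could be swapped in):

```python
from sklearn.ensemble import RandomForestClassifier

# Per-session features: [deaths, completion_time_seconds, items_collected]
X = [[5, 120, 3], [1, 80, 9], [7, 200, 1], [0, 60, 12], [4, 150, 2], [2, 90, 8]]
y = ["frustrated", "engaged", "frustrated", "engaged", "frustrated", "engaged"]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(model.predict([[3, 110, 4]]))   # predicted affective label for a new session
```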

Adaptive User Modelling

The experience models I have talked about so far are average models, meaning they apply equally to all players and are not tied to a particular individual. They serve a very good purpose if we want to ship them with the system and if we are looking for methods that work well for the majority of users. But we can do better.

The accuracy of the models is very much confined by the data used for training; your data need to be diverse enough to include representatives of the majority of your users. Even when your data distribution is wide enough, it is very likely that the method will not recognise every individual. People are different and each of us has her unique preferences and ways of interacting. To accommodate different personalities, one could implement adaptive systems that keep learning, improving and personalising as the user interacts with the system. The models learned offline form a good starting point for an initial rich experience and for learning more powerful personalised versions. Model improvement can be achieved through a branch of methods called active learning. Active learners attempt to improve their performance by sampling the instances from the space that lead to the fastest improvement. This means they try to learn as much as possible about the user in the quickest way possible so that they become more accurate predictors of experience. In doing so, they also become more personalised to the specific user.
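As a toy illustration of uncertainty sampling, one common active-learning strategy (the data is synthetic and the logistic regression is only a stand-in for whatever experience model is used), the system asks the player about the candidate it is least certain about:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_labelled = rng.normal(size=(20, 4))            # sessions the player has already rated
y_labelled = (X_labelled[:, 0] > 0).astype(int)  # 1 = "liked", 0 = "disliked" (synthetic)
X_pool = rng.normal(size=(100, 4))               # candidate content not yet rated

model = LogisticRegression().fit(X_labelled, y_labelled)
probs = model.predict_proba(X_pool)[:, 1]
most_uncertain = int(np.argmin(np.abs(probs - 0.5)))  # prediction closest to 50/50

print("Ask the player to rate candidate", most_uncertain)
```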

The Future

We live in a time where more and more data about users is becoming available and where people in academia and industry are eager to understand users and make better decisions. We are also equipped with powerful methods that facilitate the realisation of such goals. There are indeed a number of interesting research directions that can improve our understanding of users, emotions, behaviour and how emotion is manifested through behaviour. Moreover, data-driven content personalisation is a hot and interesting topic where a lot of improvement could be made. I'm confident, however, that much can already be achieved with what we currently have in terms of data and methods.

Now, just for fun, I will leave you with an example where many of the ideas presented here are implemented to improve the experience of interacting with the system.

Example: Content Personalisation in Super Mario Bros.

You will play an initial level of Super Mario Bros. and the system will collect information about how well you did. This information, along with your choice of the type of experience you would like to explore, is used by the system, through machine learning methods, to explore the space of possible content you might prefer. The best piece of content is then chosen and presented to you.
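The selection step can be pictured with a hedged sketch like the one below (this is not the actual demo code; the level descriptors, the preference model and the data are all invented): candidate level parameters are scored by a learned preference model and the highest-scoring candidate is presented next:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Hypothetical level descriptors: [gap_count, enemy_count, powerup_count]
candidates = rng.integers(0, 10, size=(50, 3))

# Stand-in preference model trained on earlier (synthetic) play data.
X_past = rng.integers(0, 10, size=(30, 3))
y_past = (X_past[:, 1] < 5).astype(int)   # pretend the player preferred fewer enemies
model = LogisticRegression().fit(X_past, y_past)

scores = model.predict_proba(candidates)[:, 1]
best = candidates[int(np.argmax(scores))]
print("Parameters of the next level to present:", best)
```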

Let the fun begin! Try it here (PS: you will need a browser with support for Java; otherwise, you can download the demo jar file from here (option 4)).


References

If you would like to read more about the subject, there is a nice paper by Georgios Yannakakis and Julian Togelius that you can find here. The demo above is described in the paper here.