As a Gold Sponsor of the NIPS Conference, Criteo was more than excited to be part of the 30th Annual Conference on Neural Information Processing Systems in Barcelona from the 5th to the 9th of December 2016. This conference is THE conference for Criteo researchers, data scientists and engineers working on machine learning on a daily basis. Among more than five thousand peers, we attended talks and demonstrations and presented two posters on our current research topics.

At Criteo, our research currently focuses on Reinforcement Learning, Recommender Systems, and Game Theory, all in large-scale environments. Hence, attending NIPS every year is like going to Disneyland! Here is what we will keep in mind from the 2016 edition.

**Conference Highlights**

**Deep Reinforcement learning**

Reinforcement Learning is an important subject for Criteo. Taking into account the effect of the action of the decision maker on the state distribution is crucial for our tasks. For example, when participating in an auction, our decision (bidding amount, or not bidding at all) can affect the response of other bidders, hence change the distribution of their actions.

At NIPS, Deep RL was one of the main topics, with many talks, posters, a dedicated tutorial, and workshops.

One of the tutorials that we enjoyed was about deep reinforcement learning through policy optimization, given by Pieter Abbeel (OpenAI / UC Berkeley / Gradescope) and John Schulman (OpenAI / UC Berkeley). It was a great introduction to several recent policy-gradient and actor-critic methods and techniques. Another interesting part of this tutorial was about scaling. Pieter showed how a differential dynamic program could be used to generate suitable samples, and how these samples could then be used for policy search, leading to a regularized, importance-sampled policy optimization (see the paper from NIPS 2015).

Deep RL was in the spotlight during the main conference, notably with the best paper award talk presenting Value Iteration Networks (VIN). This new approach uses RL to train a convolutional neural network that represents a differentiable approximation of the value-iteration algorithm. It was shown that by learning an explicit planning computation, VIN policies generalize better to new, unseen domains.

During the conference two platforms dedicated to RL were released: DeepMind Lab and OpenAI’s Universe. Now you can easily train your RL agents in computer-based environments. Here is a demo of what you can do.

**Deep learning**

Deep learning continues to be a very active research area. Not surprisingly, at Criteo we also apply deep learning, namely for product recommendation. Here is a selection of deep learning topics that captured our attention.

**End-to-end learning**

Deep neural networks are increasingly trained end-to-end, turning previously human-engineered pieces into differentiable modules of the larger neural network.

As mentioned above, the award-winning Value Iteration Networks paper features a differentiable value iteration module, replacing the explicit formula for the value function update with a ConvNet. In more detail, the key insight is that the recurrent formula for the value function update involves a linear operation and a max operation. On a grid, these correspond to a convolution and a max-pooling operation.
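To make the convolution and max-pooling analogy concrete, here is a minimal sketch of plain value iteration on a small deterministic grid world, written so that each sweep is a bank of 3x3 convolutions followed by a max over the action channel. It is not the VIN architecture itself: in the paper the reward map and the kernels are learned, whereas here they are fixed by hand. The helper names (`shift_kernel`, `conv3x3`, `value_iteration`), the five actions, and the toy reward map are our own assumptions for illustration.

```python
import numpy as np


def shift_kernel(di, dj):
    """One-hot 3x3 kernel that reads the value of the neighbor at offset (di, dj)."""
    kernel = np.zeros((3, 3))
    kernel[1 + di, 1 + dj] = 1.0
    return kernel


def conv3x3(grid, kernel):
    """Same-size 3x3 cross-correlation. Edge padding means that moving
    off the grid leaves the agent in place."""
    padded = np.pad(grid, 1, mode="edge")
    height, width = grid.shape
    out = np.zeros((height, width))
    for a in range(3):
        for b in range(3):
            out += kernel[a, b] * padded[a:a + height, b:b + width]
    return out


# One kernel per action: stay, up, down, left, right (hypothetical action set).
KERNELS = [shift_kernel(0, 0), shift_kernel(-1, 0), shift_kernel(1, 0),
           shift_kernel(0, -1), shift_kernel(0, 1)]


def value_iteration(reward, n_iters, gamma=1.0):
    """Each sweep: one convolution per action (linear step),
    then a max over the action channel (max-pooling step)."""
    value = np.zeros_like(reward, dtype=float)
    for _ in range(n_iters):
        q = np.stack([reward + gamma * conv3x3(value, k) for k in KERNELS])
        value = q.max(axis=0)
    return value


# Toy problem: every step costs 1, except at the goal in the top-left corner.
reward = -np.ones((5, 5))
reward[0, 0] = 0.0
values = value_iteration(reward, n_iters=10)
print(values)
```

In this toy setting the resulting value map is simply the negated number of steps to the goal, which is an easy way to check that the convolution-plus-max recursion does what value iteration should do.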

Several papers propose learning the network architecture itself, for example Learning the Number of Neurons or Dynamic Filter Networks.

**Causal representations**

Causality is of growing importance to the community, and at Criteo we pay a lot of attention to discovering causal effects.

The paper Learning Representations for Counterfactual Inference applies the representation learning power of deep networks to counterfactual inference from observational data. The general idea is to learn representations for the treated and control distributions jointly. The learning objective consists of the factual prediction error plus two regularization terms. The first term encourages the factual and counterfactual sets to have similar distributions; the second pushes counterfactual predictions to be close to the nearest observed outcome from the respective treated or control set. In experiments, optimizing this objective with a deep neural net performed best on most metrics.

**Learning algorithms**

There is a lot of theoretical and empirical evidence in favor of deep representation learning, and the learning procedure itself is being actively researched.

R. Sutton presented an alternative to the backpropagation algorithm, called CrossProp, that takes into account the influence of all the past values of the input weights on the current error.

The authors of the Professor Forcing paper apply GANs to generative RNNs and train a discriminator to distinguish between training and self-generated sequences. The generator is trained to fool the discriminator, forcing the observed distribution and the free-running distribution to become similar. This procedure acts as a regularizer and results in better sample quality and generalization, particularly for long sequences.

The authors of Learning to Optimize take a meta-learning approach and propose to parametrize the gradient descent update with an LSTM learned across multiple tasks. Their approach outperformed hand-crafted optimizers, such as Adam and RMSProp.

**GANs**

Generative Adversarial Networks (GANs) are a recent method for generating new samples from a training distribution, for example creating new images or music that resemble those in the training set. Since their introduction only two years ago, they have received a lot of attention and were certainly one of the trendiest subjects this year.

Ian Goodfellow gave a great tutorial to a full room. The authors of Learning What and Where to Draw explained how to use side information with GANs to produce samples with desired features. The paper Adversarially Learned Inference described a generalization of GANs that may make it possible to interpolate between images.

While GANs distinguish themselves from previous methods by their relative simplicity and the quality of their results, they also prove difficult to train, which led to several talks and tricks on how to improve their convergence, such as the GANs hacks talk.

Another challenge in applying GANs is that there is still no good way of measuring their performance: researchers have to assess the quality of the generated samples by looking at them.

**Criteo counterfactual learning dataset**

Detecting causality is key to understanding how a system will react under a new intervention. The “What if” workshop was dedicated to causality and to methods that help answer questions such as: What happens if a robot applies a new behavior policy? Can we predict the consequences of removing a certain gene in a biological cell on its phenotype? How will a user react if Criteo implements a new recommendation engine?

We took part in this workshop by releasing a new dataset aimed at the counterfactual learning community. Counterfactual learning algorithms can iterate in the policy space without the need to collect data on new policies. This is particularly useful since they can be used offline and are safe (e.g., they avoid breaking the robot).

Our dataset contains 250 GB of raw logs with the propensity scores used by our recommendation engine. With the help of Adith Swaminathan, Xiaotao Gu, Thorsten Joachims and Maarten de Rijke, we ran a first benchmark of some of the current state-of-the-art counterfactual learning methods (the dataset is available here and is described in Large-scale Validation of Counterfactual Learning Methods: A Test-Bed). We hope this dataset will help the community and serve as a new large-scale baseline for papers on these types of methods.
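To illustrate why logged propensity scores matter, here is a minimal sketch of inverse propensity scoring (IPS), a standard counterfactual estimator of how a new policy would have performed on logs collected by another policy. The toy log, the function name `ips_estimate`, and the optional `clip` parameter are illustrative assumptions; they do not reflect the schema of the dataset or the code used in the benchmark.

```python
import numpy as np


def ips_estimate(rewards, logging_propensities, target_probs, clip=None):
    """Estimate the average reward of a target policy from logged data.

    rewards              : reward observed for each logged action
    logging_propensities : probability with which the logging policy chose that action
    target_probs         : probability that the target policy would choose the same action
    clip                 : optional cap on the importance weights (trades bias for variance)
    """
    rewards = np.asarray(rewards, dtype=float)
    weights = np.asarray(target_probs, dtype=float) / np.asarray(
        logging_propensities, dtype=float)
    if clip is not None:
        weights = np.minimum(weights, clip)
    return float(np.mean(weights * rewards))


# Toy log: the logging policy picked one of two banners uniformly at random.
logged_actions = np.array([0, 1, 0, 1])
rewards = np.array([1.0, 0.0, 1.0, 0.0])      # banner 0 was clicked, banner 1 was not
logging_propensities = np.full(4, 0.5)

# Target policy to evaluate offline: always show banner 0.
target_probs = (logged_actions == 0).astype(float)

estimate = ips_estimate(rewards, logging_propensities, target_probs)
clipped = ips_estimate(rewards, logging_propensities, target_probs, clip=1.5)
print(estimate, clipped)
```

Without clipping, the estimator is unbiased as long as the logging policy gives non-zero probability to every action the target policy may take, which is why the propensities need to be logged alongside the data.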

**Optimization**

Acceleration was definitely a big trend in optimization this year. One should particularly look at Regularized Nonlinear Acceleration, which proposes a plug-in way to accelerate a given converging sequence of iterates. Another big milestone was Ohad Shamir's analysis of stochastic methods under sampling without replacement.
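As a rough illustration of what "plug-in" means here, the sketch below post-processes the iterates of plain gradient descent on a small quadratic: it looks for a weighted average of the iterates (weights summing to one) whose combined residual is small, with a small regularization for numerical stability. This is our simplified reading of the extrapolation idea, not a reference implementation of the paper; the function name `rna_extrapolate`, the regularization value, and the toy problem are assumptions.

```python
import numpy as np


def rna_extrapolate(iterates, reg=1e-8):
    """Combine a sequence of iterates x_0, ..., x_{k+1} into one extrapolated point."""
    x = np.asarray(iterates, dtype=float)        # shape (k + 2, d)
    residuals = np.diff(x, axis=0)               # r_i = x_{i+1} - x_i, shape (k + 1, d)
    gram = residuals @ residuals.T               # Gram matrix of the residuals
    gram = gram / np.linalg.norm(gram, 2)        # normalize so that reg is scale-free
    ones = np.ones(len(gram))
    z = np.linalg.solve(gram + reg * np.eye(len(gram)), ones)
    coeffs = z / z.sum()                         # weights sum to one
    return coeffs @ x[:-1]                       # weighted average of x_0, ..., x_k


# Toy problem: gradient descent on f(x) = 0.5 * x^T A x - b^T x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)                   # exact minimizer, for comparison only

step = 0.2
x_k = np.zeros(2)
iterates = [x_k]
for _ in range(5):
    x_k = x_k - step * (A @ x_k - b)
    iterates.append(x_k)

err_last = np.linalg.norm(iterates[-1] - x_star)
err_rna = np.linalg.norm(rna_extrapolate(iterates) - x_star)
print(err_last, err_rna)
```

The extrapolation only reads the iterates, so the same function can be applied to the output of any converging method without modifying the method itself.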

**Graphs**

At the Women in Machine Learning Workshop, Jennifer Chayes's talk, Modeling and Estimation of Sparse Massive Networks, was particularly interesting. She presented how graphons could be used to estimate parameters and metrics on very large networks, including real-world sparse ones with power-law distributions. There is some very nice theory she’s working on. The Graphons, mergeons, and so on! paper presented on Tuesday morning builds a graph clustering algorithm on top of that theory. These methods could be of interest in the future at Criteo, as we are building graphs (one being the cross-device graph) that also show power-law patterns.

**Time Series**

Time series (TS) forecasting also received strong focus at NIPS this year, with a dedicated tutorial and workshop. Historically, learning from TS was treated as a side track with specific modeling techniques and tools: ARMA, ARIMA, etc. But many real-world learning applications can in fact be seen as time series, and recently several research works have tried to bridge the gap between TS modeling and classical statistical learning theory (e.g., The Generalization Ability of Online Algorithms for Dependent Data, Prediction of Time Series by Statistical Learning).

A key notion that makes it possible to treat TS within a statistical learning theory framework is the *discrepancy*, which roughly measures the non-stationarity of the observed process. For a detailed view of the generalization bounds using this discrepancy measure, see the tutorial paper Theory and Algorithms for Forecasting Non-Stationary Time Series by Vitaly Kuznetsov (Google) and Mehryar Mohri (Courant Institute, Google Research).

At Criteo, we are particularly interested in advances in TS forecasting, as most of our data collection and modeling are time-dependent: user behavior, conversions, etc.

**Bandits**

Bandits are still quite trendy in the machine learning community, and especially at NIPS, as they focus on optimizing a system while simultaneously estimating its performance from streaming data.
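For readers less familiar with the setting, here is a minimal sketch of UCB1, a classic bandit algorithm that picks at each round the arm with the highest optimistic estimate of its mean reward. It is a textbook illustration only, not the algorithm from any of the papers discussed in this section; the simulated click rates, the function name `ucb1`, and the fixed seed are assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms          # number of pulls per arm
    sums = [0.0] * n_arms          # total reward collected per arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1            # pull every arm once to initialize the estimates
        else:
            # Empirical mean plus an exploration bonus that shrinks with more pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Three hypothetical banners with click rates unknown to the algorithm.
counts = ucb1([0.1, 0.5, 0.9], horizon=2000)
print(counts)
```

The estimation and the optimization happen in the same loop: each observed reward refines the estimate of an arm, and the refined estimates immediately drive the next choice.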

Out of the several dozen papers on bandits, the most relevant for Criteo are those concerned with the combinatorial structure induced by the construction of banners of different shapes and colors, with multiple products at different positions. We published a paper on this topic this year. We assumed that the performances of the products in a set are correlated and that we have some prior on those correlations. Based on these assumptions, we devised an optimal algorithm (up to some logarithmic terms).

Another interesting paper tackled the position bias problem in online display advertising and proved some theoretical bounds on a bandit algorithm that tries to optimally explore the space of products based on the knowledge of the performance of each slot of the banner.

**Criteo Cocktail Party**

On the second day of the conference, Criteo hosted a cocktail party gathering prominent figures from both academia and the industry.

Now, what happens when you put Machine Learning practitioners in the same room? They talk shop, of course! This kind of informal event is a great way to get to know one another, our challenges, and our areas of exploration, in a less constrained format than a talk or poster presentation.

Our events team had done things well: the cocktail venue had a great view overlooking the conference center, but more importantly, it had a swimming pool. A heated swimming pool! And several of our guests, as well as some of our researchers, indulged themselves in a little guilty pleasure.

Overall, it was an extremely interesting conference. It is an exciting time for us to be in this field. We are looking forward to NIPS 2017!

Post written by:

**Criteo AI Lab team.**