Product Recommendation at Criteo
Recommending the right product to the right user at the right time is a critical component of Criteo’s success. Doing this at scale, where we need to choose a dozen relevant products from over 5B candidates within a few milliseconds, makes the task an interesting research problem (for details, see our presentation at last year’s RecSys). Add to this the fact that Criteo optimizes not just for click performance but for the probability of generating a sale, and you have a task that is ripe for new research.
A typical solution to recommendation problems is neighbourhood methods, which rely on similarity between users who viewed/bought similar products, or between products that were co-viewed/co-bought within pre-defined shopping sessions. Neighbourhood-based approaches work best with dense connections, where most of the relationships between products are explicitly expressed in the graph. For advertisers with very large product catalogs, however, this is often not the case. It is therefore interesting to extend the reach of recommender systems beyond simple product co-occurrence. One approach that is fast becoming popular is product embeddings, which map products to a set of latent factors and allow similarities to be computed over this new representation.
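To make the last point concrete, here is a minimal sketch of computing similarity over latent factors. The product names and embedding values below are invented for illustration; the point is that two products can score as similar even if they never explicitly co-occurred in the interaction graph.

```python
import numpy as np

# Invented latent-factor vectors for three hypothetical products.
embeddings = {
    "running_shoes": np.array([0.9, 0.1, 0.3]),
    "trail_shoes":   np.array([0.8, 0.2, 0.4]),
    "coffee_maker":  np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# The two shoe products score much closer to each other than to the
# coffee maker, without any explicit co-view/co-buy edge between them.
print(cosine(embeddings["running_shoes"], embeddings["trail_shoes"]))
print(cosine(embeddings["running_shoes"], embeddings["coffee_maker"]))
```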
Product Embeddings for Recommendation
At Criteo AI Lab, we have recently been experimenting with the Prod2Vec algorithm, which embeds products based on their co-occurrence in user shopping sessions. Like Word2Vec, Prod2Vec produces representations that are very similar for products that are functionally interchangeable in user shopping sequences. A limitation of Prod2Vec is that it relies solely on co-occurrence information. Hence, in order to support recommendations for products with zero or low co-occurrences (i.e., in the cold-start phase), we created Meta-Prod2Vec.
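At its core, Prod2Vec applies the Word2Vec skip-gram model to shopping sessions instead of sentences. The following is a rough, self-contained sketch of that idea (a toy skip-gram trainer with negative sampling in NumPy, not Criteo’s production implementation); the product IDs and sessions are invented, and hyperparameters are arbitrary:

```python
import numpy as np

def train_prod2vec(sessions, dim=8, window=2, negatives=3,
                   lr=0.05, epochs=300, seed=0):
    """Toy skip-gram with negative sampling over product sessions."""
    rng = np.random.default_rng(seed)
    vocab = sorted({p for s in sessions for p in s})
    index = {p: i for i, p in enumerate(vocab)}
    n = len(vocab)
    w_in = (rng.random((n, dim)) - 0.5) / dim   # product embeddings
    w_out = np.zeros((n, dim))                  # context-side embeddings

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for sess in sessions:
            for i, prod in enumerate(sess):
                p = index[prod]
                for j, ctx in enumerate(sess):
                    if j == i or abs(j - i) > window:
                        continue
                    # one positive context pair plus a few random negatives
                    pairs = [(index[ctx], 1.0)]
                    pairs += [(int(rng.integers(n)), 0.0)
                              for _ in range(negatives)]
                    for t, label in pairs:
                        v, u = w_in[p].copy(), w_out[t].copy()
                        g = lr * (label - sigmoid(v @ u))
                        w_in[p] += g * u
                        w_out[t] += g * v
    return vocab, index, w_in

def most_similar(prod, vocab, index, emb, topn=1):
    """Rank the other products by cosine similarity to `prod`."""
    v = emb[index[prod]]
    sims = emb @ v / (np.linalg.norm(emb, axis=1) * np.linalg.norm(v) + 1e-9)
    order = np.argsort(-sims)
    return [vocab[i] for i in order if vocab[i] != prod][:topn]

# Invented toy data: two disjoint clusters of co-occurring products.
sessions = [["p1", "p2", "p3"]] * 20 + [["p4", "p5", "p6"]] * 20
vocab, index, emb = train_prod2vec(sessions)
print(most_similar("p1", vocab, index, emb, topn=2))
```

On this toy data, products from the same cluster end up as nearest neighbours of one another, which is exactly the interchangeability property described above.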
What is Meta-Prod2Vec?
Meta-Prod2Vec is an extension of the Prod2Vec algorithm that uses not only the product sequence information but also product meta-data as a source of side information. Learning with side information can be seen as jointly optimising the main task of explaining product co-occurrences and a secondary task that adds constraints on the model parameters. This is helpful because product content is generally available for all products, whether new or pre-existing.
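The full Meta-Prod2Vec objective adds weighted (product, metadata) interaction terms to the skip-gram loss. A crude but illustrative approximation of the same spirit is to interleave metadata tokens with the product IDs in the training sequences, so that products sharing an attribute get pulled together even when they never co-occur. The item IDs and the `artist:` attribute below are invented for illustration:

```python
def augment_with_metadata(sessions, metadata):
    """Insert each product's metadata token right after the product ID."""
    augmented = []
    for sess in sessions:
        seq = []
        for prod in sess:
            seq.append(prod)
            seq.append(metadata[prod])  # e.g. "artist:x"
        augmented.append(seq)
    return augmented

sessions = [["song_a", "song_b"], ["song_c"]]
metadata = {"song_a": "artist:x", "song_b": "artist:y", "song_c": "artist:x"}

print(augment_with_metadata(sessions, metadata))
# → [['song_a', 'artist:x', 'song_b', 'artist:y'], ['song_c', 'artist:x']]
```

Training a skip-gram model on the augmented sequences would embed `song_a` and `song_c` near their shared `artist:x` token, giving the cold-start song `song_c` a meaningful position in the space despite its lack of co-occurrences.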
In the following illustrative example, we explain the differences between the two approaches and the additional mechanism that allows Meta-Prod2Vec to learn meaningful embeddings for cold-start products.
Let’s say we are trying to build a recommender system for songs. We want a very simple solution that recommends the next song based on the last song the user listened to. We compare the behaviour of Prod2Vec and Meta-Prod2Vec in two recommendation situations: one where the previous song is popular, and one where it is relatively unknown (i.e., suffers from cold start).
For the simple case, let’s say the popular song is ‘Shake It Off’ by Taylor Swift. To produce Prod2Vec-based song embeddings, we feed the algorithm all of the users’ play histories that contain the song and find the songs it co-occurs with most. For the sake of argument, say the top result is ‘All About That Bass’ by Meghan Trainor. Since both songs are very popular and co-occur a lot, both standard collaborative filtering and Prod2Vec work well.
The complete article can be found here.
We will be presenting this work at RecSys 2016 in Boston. If you stop by the Criteo booth, we will be happy to exchange notes and talk shop! And yes, we are hiring in both our Paris and Palo Alto offices!
Post written by:
Senior Staff Data Scientist