The most important developments in data science of 2018

Jairo Mejía

d&a blog

The year 2018 has been one of the most important years in terms of breakthroughs in Machine Learning technologies. It has also been important for the debate on how to move beyond pure optimization toward a more advanced discipline of Data Science and truly applied Artificial Intelligence.

From the realization of the challenges of industrializing AI systems to the future beyond Deep Learning, 2018 has been full of news and meaningful debates on how to make the AI revolution sustainable and inclusive. An article published in recognition of the life and work of Stephen Hawking collects quotes from the genius himself in which he anticipates the need for responsible development of Artificial Intelligence. “Success in creating effective AI could be the biggest event in the history of our civilization”, said Dr. Hawking, whom we lost this year.

Inspired by the success of last year’s recap of what we saw and liked in 2017, this article aims to summarize the progress made in making AI applicable to real products. The following paragraphs also address the most relevant events of the year relating to Natural Language Processing (henceforth, NLP), Deep Learning on graphs, Causal Inference, and the approach to ethics in the implementation of machine learning technologies.

AI industrialization

by Rafael Hernández, Jose A Rodríguez, Roberto Maestre, César de Pablo.

Machine learning and artificial intelligence are no longer ‘just a research topic’, but rather are on their way to becoming commoditized tools, whose development comes with its own challenges. This year has seen efforts towards more professionalized tools to implement machine learning.

One example is the appearance of frameworks to professionalize and automate ML platforms, such as MLflow or TensorFlow Extended (TFX). In the domain of Efficient Neural Architecture Search (ENAS) we have witnessed the integration of AutoML into frameworks such as TensorFlow, Keras and PyTorch. Also, Featuretools, an MIT-born framework to discover combinations of attributes, was applied to credit card fraud detection in collaboration with BBVA.

We have also read posts by companies sharing their lessons from deploying machine learning or artificial intelligence at corporate scale: Uber revisiting its Michelangelo platform; LinkedIn and its AI Academy, which scales AI training to all employees; Amazon opening its internal ML training to any developer; Apple describing its platform for providing deep learning at scale internally; and Booking.com describing how it democratizes online controlled experiments.

2018 has also consolidated events at the intersection of machine learning and software systems: a new edition of the NIPS MLSystems Workshop was held, and a new conference (SysML) was launched.

Finally, we also find remarkable examples of ML breaking the boundaries between ‘departments’ within a company, the ultimate expression of reuse: Amazon offers SageMaker customers a time series forecasting model that it originally developed internally for its own demand prediction, and Uber deployed ML for its own financial forecasting.

Natural Language Processing

by César de Pablo.

This year we have seen a new breed of embedding methods, universal embeddings or, more precisely, language models, that have proven useful in several NLP tasks as a way of exploiting huge amounts of unlabelled text data to help with semi-supervised problems such as text classification or translation. ELMo (Deep Contextualized Word Representations), ULMFiT and the several improvements on the Transformer architecture that finally led to BERT have shown large improvements in text classification, named entity recognition (NER) and machine reading. ELMo provides fixed feature vectors (like word2vec) that are, however, contextualized. In contrast, ULMFiT is a procedure to fine-tune a language model to a new task with a few supervised examples. BERT touches both sides: it is a pre-trained language model that takes context into account, but it can also be used to extract embeddings. Its downside for teams without Google-scale resources is that even fine-tuning is computationally expensive and requires a large amount of memory.

Finally, another great piece of news is that some of these advances are not limited to English: they can be applied to other languages with relatively few changes.

Deep Learning on Graphs

by César de Pablo.

Deep Learning has obtained astonishing results for language and images. In both cases, this is thanks to specialized architectures that deal with sequences (LSTMs) or grids (CNNs). However, a wide range of problems would benefit from a structured representation yet do not exhibit a regular structure. Instead, they have a generic graph structure, with applications in recommender systems, NLP or user modeling. This paper (with almost 30 authors) has served to provide a framework that encompasses different modeling approaches, from GCNs (Graph Convolutional Networks) to a generalized Transformer architecture applied to graphs. Thanks to DeepMind we even have a reference library based on TensorFlow. We have also seen commercial use of related algorithms such as GraphSAGE at Pinterest.
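To make the idea of a graph convolution more concrete, here is a minimal sketch of a single GCN-style propagation step in plain Python. It assumes the commonly used symmetric normalization with self-loops, ReLU(D^-1/2 (A + I) D^-1/2 H W). The function names, the toy graph and the weights are illustrative assumptions, and the sketch is not taken from the paper or from DeepMind's library.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so that every node also keeps its own features.
    a_hat = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    # Node degrees in the self-looped graph.
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization keeps the scale of features stable across nodes.
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    # Aggregate neighbour features, apply the linear map, then ReLU.
    aggregated = matmul(norm, features)
    transformed = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in transformed]


# Toy example (hypothetical data): a path graph 0 - 1 - 2 with 2 features per node.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the aggregation is visible
print(gcn_layer(adjacency, features, weights))
```

Each node's new representation mixes its own features with those of its neighbours. Stacking several such layers lets information travel further across the graph.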

Causal Inference

by Juan Arévalo.

This year has witnessed the arrival of Causal Inference in the field of Data Science, with contributions such as Judea Pearl’s The Book of Why or the article by Miguel Hernán and colleagues on how to incorporate Causality in Data Science, among others. The advent of this Causal revolution coincides with the successful application of Counterfactual Analysis in the Recommender Systems community (see the SIGIR’16 tutorial for a gentle introduction). Indeed, there have been two “Best Paper” awards (at the WSDM’18 and RecSys’18 conferences) for new developments of counterfactual estimators that may prevent selection bias (see, for instance, this great article by Airbnb on selection bias in online A/B testing). This is on top of ongoing efforts at other well-known machine learning conferences, such as these NeurIPS and ICML workshops, where causal inference is gaining momentum.

Thus, although the application of Causal Inference to Machine Learning is still limited, we foresee a tighter interaction between the two fields (as was, for instance, the case between Bayesian Inference and Deep Learning in the past). Indeed, Yoshua Bengio (one of the fathers of Deep Learning) explains in an MIT Technology Review interview that “we need to be able to extend it (Deep Learning) to do things like reasoning, learning causality”, etc., because “If you have a good causal model of the world you are dealing with, you can generalize even in unfamiliar situations. This is crucial. We humans, are able to project ourselves into situations that are very different from our day-to-day experience. Machines are not, because they don’t have these causal models.”
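As an illustration of what a counterfactual estimator looks like, the following sketch implements inverse propensity scoring (IPS), a standard textbook estimator for evaluating a new policy from logged data. It is not the specific method of the awarded papers. The log format, the function names and the toy data are assumptions made for the example.

```python
def ips_estimate(logs, target_policy):
    """Estimate the average reward of target_policy from logged interactions.

    logs: list of (context, action, reward, logging_propensity) tuples, where
          logging_propensity is the probability with which the logging policy
          chose that action in that context.
    target_policy: function (context, action) -> probability under the new policy.
    """
    total = 0.0
    for context, action, reward, propensity in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # new policy is to take the logged action than the old one was.
        weight = target_policy(context, action) / propensity
        total += weight * reward
    return total / len(logs)


# Hypothetical logs: the logging policy showed item "a" or "b" with probability 0.5.
logs = [
    ("user1", "a", 1.0, 0.5),
    ("user2", "b", 0.0, 0.5),
    ("user3", "a", 1.0, 0.5),
    ("user4", "b", 1.0, 0.5),
]


def always_a(context, action):
    """Candidate policy that always recommends item "a"."""
    return 1.0 if action == "a" else 0.0


def uniform(context, action):
    """Candidate policy identical to the logging policy."""
    return 0.5


print(ips_estimate(logs, always_a))
print(ips_estimate(logs, uniform))
```

The reweighting corrects for the fact that the logs only contain outcomes for actions the old policy happened to choose, which is the selection bias mentioned above. In practice the estimate can have high variance when propensities are small, so clipped or self-normalized variants are often used.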

Ethics in AI

by Juan Murillo and Roberto Maestre.

Great efforts are being made to define, develop and integrate Ethics into AI. Regulations, especially in the EU (but also in the US), are beginning to define the rules designed to enable harmless implementation of AI solutions [reference]. Now that we have realized that AI can amplify the bias that exists in data [reference], we must be watchful with AI systems that take actions automatically. An interesting trend in the industry is the development of new metrics (for both classification and regression) to monitor and mitigate such biases. There is also a huge push to develop new models that integrate all these concepts [reference].

When implementing non-discriminatory measures, businesses must make a trade-off between two main goals: avoiding unfair discrimination based on sensitive data, and keeping the discriminative power their models need to protect their shareholders’ interests and keep their activity sustainable and profitable. For example, in finance, models are used to accept new customers (admission models) and to set interest rates on credit (dynamic pricing models).

Leaving aside descriptive customer features in order to improve fairness metrics (through more equal treatment of all customers) may imply assuming a greater risk, with an impact on expected revenue. These costs must therefore be measured when making the aforementioned trade-off, as seen here. Besides, in a recent paper, researchers at the MIT Media Lab have demonstrated that a significant increase in fairness measures, even when working with non-sensitive data, may be achieved with comparatively low impact on business goals. This is an example of how machine learning may help us to maximize both corporate and social benefits.
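To give a flavor of the fairness metrics discussed in this section, the sketch below computes the demographic parity difference of a binary classifier: the gap between the highest and lowest acceptance rates across groups. This is one common classification metric, chosen here only as an illustration. The function names and the toy admission data are assumptions, not the metrics used in the referenced works.

```python
def positive_rate(predictions, groups, group):
    """Share of positive (1) predictions among members of one group."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    return sum(selected) / len(selected)


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest positive rate across groups.

    0.0 means every group is accepted at the same rate; larger values
    indicate a larger disparity in outcomes.
    """
    rates = [positive_rate(predictions, groups, g) for g in set(groups)]
    return max(rates) - min(rates)


# Hypothetical admission decisions (1 = accepted) for two customer groups.
predictions = [1, 1, 0, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(demographic_parity_difference(predictions, groups))
```

Tracking such a value alongside a business metric (for example, expected revenue) is one simple way to make the trade-off described above measurable.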