The most important developments in data science of 2018

Jairo Mejía

d&a blog

2018 has been one of the most important years for breakthroughs in Machine Learning technologies, as well as for the debate on how to move beyond pure optimization toward a more mature discipline of Data Science and real, applied Artificial Intelligence.

From a growing awareness of the challenges of industrializing AI systems to the future beyond Deep Learning, this year has been full of news and meaningful debates on how to make the AI revolution sustainable and inclusive. In this article, published in recognition of the life and work of Stephen Hawking, we collect some quotes from the genius, who anticipated the need for responsible development of Artificial Intelligence. “Success in creating effective AI could be the biggest event in the history of our civilization”, said Dr. Hawking, whom we lost this year.

Inspired by the success of last year’s recap of what we saw and liked in 2017, this article summarizes the progress made in making AI applicable to real products. The following sections also cover this year’s most relevant developments in Natural Language Processing (henceforth, NLP), Deep Learning on graphs, Causal Inference and the approach to ethics in the implementation of machine learning technologies.

AI industrialization

by Rafael Hernández, Jose A Rodríguez, Roberto Maestre, César de Pablo.

Machine learning and artificial intelligence are no longer ‘just a research topic’; they are on their way to becoming commoditized tools whose development brings its own challenges. This year we have seen efforts towards more professional tooling for practicing machine learning.

One example is the appearance of frameworks to professionalize and automate ML platforms, such as MLflow or TensorFlow Extended (TFX). In the domain of Efficient Neural Architecture Search (ENAS), we have witnessed the integration of AutoML into frameworks like TensorFlow, Keras and PyTorch. Also, Featuretools, an MIT-born framework for discovering combinations of attributes, was applied to credit card fraud detection in collaboration with BBVA.

We have also read posts from companies sharing their lessons in deploying machine learning or artificial intelligence at corporate scale: Uber revisited its Michelangelo platform; LinkedIn described its AI Academy, which scales AI training to all employees; Amazon even opened its internal ML training to any developer; Apple described, in this article, its platform for providing deep learning at scale internally; and Booking.com explained, in this other one, how it democratizes online controlled experiments.

2018 has consolidated workshops on machine learning and software systems: a new edition of the NIPS MLSystems Workshop was held, and a new conference (SysML) was launched.

Finally, we also find remarkable examples of ML breaking the boundaries between the ‘departments’ of a company, the ultimate expression of reuse: Amazon offers SageMaker customers a time series forecasting model that it originally developed internally for its own demand prediction, and Uber deployed ML for its own financial forecasting.

Natural Language Processing

by César de Pablo.

This year we have seen a new breed of embedding methods, universal embeddings or, more properly, language models, that have proven useful in several NLP tasks as a way of exploiting huge amounts of unlabelled text to help in semi-supervised problems like text classification or translation. ELMo (Deep Contextualized Word Representations), ULMFiT and the several improvements on the Transformer architecture that finally led to BERT have shown large improvements in text classification, NER and machine reading. ELMo provides fixed feature vectors (like word2vec), but they are contextualized. In contrast, ULMFiT is a procedure to fine-tune a language model to a new task with a few supervised examples. BERT touches both sides: it is a pre-trained language model that takes context into account, but you can also extract embeddings from it. Its downside for anyone not working at Google scale is that even fine-tuning is computationally expensive and requires a large amount of memory.

Finally, another great piece of news is that some of these advances are not limited to English: they can be applied to other languages with relatively few changes.
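To make the contrast between fixed, word2vec-style vectors and contextualized ones more concrete, here is a toy sketch in plain Python. It is not how any of the models named above are implemented: the vocabulary, the 2-d vectors and the `contextualize` helper are made up for illustration, and the "context" step is a single self-attention pass with identity projections and no learned weights.

```python
import math

# Hypothetical static embeddings (a word2vec-style lookup table); values are invented.
STATIC = {
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank": [0.5, 0.5],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextualize(tokens):
    """Return one vector per token, each a softmax-weighted average of
    the static vectors of every token in the same sentence."""
    vectors = [STATIC[t] for t in tokens]
    contextual = []
    for query in vectors:
        weights = softmax([dot(query, key) for key in vectors])
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(query))
        ]
        contextual.append(mixed)
    return contextual


# The static vector for "bank" is the same everywhere...
static_bank = STATIC["bank"]
# ...but its contextual vector depends on the neighbouring word.
bank_near_river = contextualize(["river", "bank"])[1]
bank_near_money = contextualize(["money", "bank"])[1]
```

In this sketch, `bank_near_river` leans towards the "river" direction and `bank_near_money` towards the "money" direction, while `static_bank` never changes. Real models learn the projections and stack many such layers.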

Deep Learning on Graphs

by César de Pablo.

Deep Learning has obtained astonishing results on language and images, in both cases thanks to specialized architectures that deal with sequences (LSTMs) or grids (CNNs). However, a wide range of problems would benefit from a structured representation, yet their structure is not regular but a generic graph, with applications in recommender systems, NLP or user modeling. This paper (with almost 30 authors) has provided a framework that encompasses different modeling approaches, from GCNs (Graph Convolutional Networks) to a generalized Transformer architecture applied to graphs. Thanks to DeepMind we even have a reference library based on TensorFlow. We have also seen commercial use of related algorithms like GraphSAGE at Pinterest.
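As a rough illustration of what a graph convolution does, the sketch below implements one GCN-style propagation step, ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python. It does not use the DeepMind library mentioned above; the function names, the three-node graph and the weights are all assumptions chosen to keep the example small.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weights):
    """One graph-convolution step: each node mixes its own features with
    those of its neighbours, then applies a linear map and a ReLU."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalization: divide each edge by sqrt(deg_i * deg_j).
    norm = [
        [a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    aggregated = matmul(norm, features)
    projected = matmul(aggregated, weights)
    return [[max(0.0, value) for value in row] for row in projected]


# Hypothetical path graph 0 - 1 - 2 with a single feature per node.
adj = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]
features = [[1.0], [0.0], [0.0]]  # only node 0 carries signal
weights = [[1.0]]                 # identity projection, for readability

hidden = gcn_layer(adj, features, weights)
```

After one layer, node 0's signal has reached its neighbour (node 1) but not node 2, which is two hops away. Stacking layers widens the neighbourhood each node can see.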

Causal Inference

by Juan Arévalo.

This year we have witnessed the arrival of Causal Inference in the Data Science field, with contributions like Judea Pearl’s book The Book of Why or the article by Miguel Hernán and colleagues on how to incorporate causality into Data Science, among others. The advent of this Causal revolution coincides with the successful application of Counterfactual Analysis in the Recommender Systems community (see the SIGIR’16 tutorial for a gentle introduction). Indeed, there have been two “Best Paper” awards (at the WSDM’18 and RecSys’18 conferences) for new counterfactual estimators that may counteract selection bias (see, for instance, this great article by Airbnb on selection bias in online A/B testing). This is on top of ongoing efforts at other well-known machine learning conferences, such as these NeurIPS and ICML workshops, where causal inference is gaining momentum.

Thus, although the application of Causal Inference to Machine Learning is still limited, we foresee a tighter interaction between these two fields (just as happened between Bayesian Inference and Deep Learning in the past, for instance). Indeed, Yoshua Bengio (one of the fathers of Deep Learning) explains in an MIT Technology Review interview that “we need to be able to extend it (Deep Learning) to do things like reasoning, learning causality”, etc., because “If you have a good causal model of the world you are dealing with, you can generalize even in unfamiliar situations. That’s crucial. We, humans, are able to project ourselves into situations that are very different from our day-to-day experience. Machines are not, because they don’t have these causal models.”
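For readers unfamiliar with the counterfactual estimators mentioned in this section, a minimal sketch of the classic inverse propensity scoring (IPS) estimator may help. It is not the method of the awarded papers; the log format, the policy functions and the numbers are hypothetical. The idea is to reweight each logged reward by how much more (or less) likely the new policy is to take the logged action than the policy that collected the data.

```python
def ips_estimate(logs, target_prob):
    """Estimate the average reward a target policy would obtain, using
    only data logged under a different (logging) policy.

    logs: list of (context, action, reward, logging_prob) tuples, where
          logging_prob is the probability (> 0) with which the logging
          policy chose that action in that context.
    target_prob: function (context, action) -> probability that the
          target policy would choose that action.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        total += reward * target_prob(context, action) / logging_prob
    return total / len(logs)


# Hypothetical logs: the old recommender showed item "A" 80% of the time
# and item "B" 20% of the time; reward is 1.0 if the user clicked.
logs = [
    ("u1", "A", 1.0, 0.8),
    ("u2", "A", 0.0, 0.8),
    ("u3", "A", 0.0, 0.8),
    ("u4", "A", 0.0, 0.8),
    ("u5", "B", 1.0, 0.2),
]


def always_a(context, action):
    return 1.0 if action == "A" else 0.0


def always_b(context, action):
    return 1.0 if action == "B" else 0.0


naive_average = sum(reward for _, _, reward, _ in logs) / len(logs)
value_always_a = ips_estimate(logs, always_a)
value_always_b = ips_estimate(logs, always_b)
```

The naive average mostly reflects item "A", simply because it was shown more often. The reweighted estimates correct for that selection effect, at the price of high variance when logged propensities are small.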

Ethics in AI

by Juan Murillo and Roberto Maestre.

A significant effort is under way to define, develop and integrate Ethics into AI. Regulations, especially in the EU (but also in the US), are beginning to set rules intended to ensure that AI solutions are implemented without causing harm [reference]. Now that we have realized that AI can amplify the bias that exists in data [reference], we must be watchful of AI systems that take actions automatically. An interesting trend in the industry is the development of new metrics (for both classification and regression) to monitor and mitigate such biases. Beyond metrics, there is also a huge push to develop new models that integrate all these concepts [reference].

When implementing non-discrimination measures, businesses must strike a balance between two main goals: avoiding unfair discrimination based on sensitive data, and preserving the discriminative power their models need to protect their shareholders’ interests and keep their activity sustainable and profitable. For example, in finance, models are used to accept new customers (admission models) and to set interest rates on credit (dynamic pricing models).

Setting aside descriptive customer features in order to improve fairness metrics (through a more equal treatment of customers) may mean taking on greater risk, with an impact on expected revenue. These costs must be measured when making the aforementioned trade-off, as seen here. Moreover, in a recent paper, researchers at the MIT Media Lab have shown that a significant increase in fairness measures, even when working with non-sensitive data, can be achieved with a comparatively low impact on business goals. This is an example of how machine learning may help us maximize both corporate and social benefits.
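Fairness metrics like those discussed in this section come in many flavours. As one simple, widely used example for classification, the sketch below computes the demographic parity difference: the gap between the approval rates of different groups. The function name, the group labels and the decisions are hypothetical, and this is not necessarily the metric used in the works referenced above.

```python
def demographic_parity_difference(decisions, groups):
    """Return (gap, rates), where rates maps each group to its share of
    positive decisions and gap is the largest rate minus the smallest.

    decisions: list of 0/1 model outcomes (1 = accepted).
    groups: list of group labels, aligned with decisions.
    """
    by_group = {}
    for decision, group in zip(decisions, groups):
        by_group.setdefault(group, []).append(decision)
    rates = {group: sum(ds) / len(ds) for group, ds in by_group.items()}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates


# Hypothetical admission-model outputs for two customer groups.
decisions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap, rates = demographic_parity_difference(decisions, groups)
```

A gap of zero means every group is accepted at the same rate. Tracking such a number alongside a business metric, such as expected revenue, is one way to make the trade-off described above measurable.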