As an applied ML company that seeks to make some sense out of financial data, we at BBVA Data & Analytics enjoy sharing articles, blogs and news about Artificial Intelligence (AI) and Machine Learning (ML). Two years ago we created a G+ community to share the latest on Machine Learning technologies that we find interesting and stimulating. This experiment has continued to grow into a thriving site for knowledge sharing and debate around the discipline and its latest developments.
As we enter 2018, we’d like to share the most popular and most discussed posts from our blog in 2017, which reflect what we believe are valuable insights for the company. The following contains a great mix of contents, with a slight bias towards Deep Learning (but who doesn’t have one these days?!).
After intense discussions within the team, we have distilled a list of what we think are some of the most disruptive scientific contributions published in 2017. We’ve grouped them into three parts: the first details some breakthroughs specific to Deep Learning models. The second offers a broader perspective on ML applications, the risks that should be avoided, and how to use these capabilities to build trust and fairness, a critical aspect of our work within the bank. Lastly, in the third part, we examine the growing but intricate relationship between human and machine, and review the inroads of complex networks—an aspect of utmost importance for relational data—and probabilistic programming into Deep Learning, i.e. Bayesian DL methods.
We hope you enjoy it!
Part 1. Scientific contributions in DL
Revisiting the unreasonable effectiveness of data in the Deep Learning era
— by Jose Antonio Rodriguez Serrano and César de Pablo
Every once in a while the ML community witnesses a recurrent discussion: would you invest in more data or in better models? Although this might seem a false dilemma, ML researchers typically strive for better models. From time to time, a counter-example appears suggesting that an investment in massive data collection alone yields unprecedented results. Seminal papers along these lines from the previous decade include The Unreasonable Effectiveness of Data, the work on 80 million tiny images, and the ImageNet collection effort.
In the current era of Deep Learning, we can legitimately ask ourselves whether this debate still makes sense for these kinds of models. The work of Sun et al. (Google) addresses this question by training a deep neural network on an unprecedented amount of data (300 million images, hundreds of times larger than previous datasets such as ImageNet), and shows that in common learning tasks performance keeps increasing with more data and does not saturate (as long as the model’s capacity keeps increasing too). Other conclusions along these lines can be found in the mentioned paper or in this blog post.
The debate on more data vs. better models often revisits the classical nearest-neighbor search algorithm, an example of a method that delivers simple but effective solutions, especially with massive amounts of data. Remarkably, this year NIPS hosted a workshop on “Nearest neighbors for Modern Applications with Massive Data”, while Facebook AI Research also released FAISS, a library with efficient implementations of nearest-neighbor search.
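The nearest-neighbor idea is simple enough to sketch in a few lines. Below is a minimal brute-force version in numpy (our own toy illustration, not FAISS’s API); libraries such as FAISS exist precisely to make this operation fast, and often approximate, at the scale of millions of vectors.

```python
import numpy as np

def nearest_neighbors(queries, database, k=1):
    """Exact k-nearest-neighbor search by brute force.

    Specialised libraries accelerate exactly this operation for huge
    collections; the maths is just a pairwise distance matrix plus a sort.
    """
    # Squared Euclidean distances between every query and database vector.
    d2 = ((queries[:, None, :] - database[None, :, :]) ** 2).sum(-1)
    # Indices of the k closest database vectors per query.
    return np.argsort(d2, axis=1)[:, :k]

database = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
queries = np.array([[0.9, 1.2]])
print(nearest_neighbors(queries, database, k=2))  # closest is [1, 1], then [0, 0]
```

With massive data, even this naive search often beats elaborate models as a baseline, which is exactly why the "more data" camp keeps coming back to it.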
Reinforcement Learning beats itself
— by Roberto Maestre and Juan Duque
For many decades the ancient game of Go has challenged artificial intelligence researchers all over the world. One reason is the huge number of possible board configurations, which makes an exhaustive search of moves infeasible—there are more possible board configurations than atoms in the Universe! In addition, the difficulty of defining an adequate position evaluation function prevents any method from truncating the search with easy-to-compute predictions.
When in January 2016 Google DeepMind’s AlphaGo defeated the European Go champion Fan Hui 5 games to 0, and later triumphed over the 18-time world champion Lee Sedol, winning 4 out of 5 games in the match, it was hard not to feel that a corner had been turned: finally an AI algorithm could crack and master the game of Go! To break this barrier AlphaGo combined deep artificial neural networks, Monte Carlo tree search, and Reinforcement Learning (RL), among other techniques. But the best was yet to come.
As if the above achievements were not enough, in October 2017 DeepMind revealed that a new version of AlphaGo, called AlphaGo Zero, had beaten its predecessor in 100 games out of 100. The new version learned the game entirely by trial and error, playing against itself and omitting any knowledge from human-expert games. Since then, the approach has been extended to master more and more board games. And there is certainly room for improvement, for instance in the way the deep neural networks in the RL loop are trained and extended. How far will artificial intelligence get driven by RL? We will see. In the meantime, what board game would you dare to play?
Intriguing experimental questions and the de-hyping of DL
— by Leonardo Baldassini
The debate around the scope, methodology, real potential and shortcomings of deep learning systems is quite lively, to say the least. Beyond providing the scientific community with entertainment and food for thought, this debate is spurring some long-overdue self-criticism and de-hyping of ML research. This line of questioning requires a deeper understanding of our models, as we’ve seen in many works this year leveraging tools ranging from the statistical to the information-theoretic (and acknowledging that even “explanatory” research is not free from methodological pitfalls). Google’s Ali Rahimi’s talk after receiving the Test of Time Award at this year’s NIPS was a sobering reminder that even the most widespread optimisation tools are not always well understood by DL practitioners, or even researchers. Just as revealing were the results of this year’s International Conference on Learning Representations (ICLR) best paper, which demonstrated that very large Neural Networks have enough capacity to memorize completely random inputs. Along the same lines, we were interested in a research paper showing that changing a single pixel is enough to fool a DL-based computer vision system. Arguably, AI systems deployed in the real world should be required to prove their robustness through much more extensive and intensive testing. Likewise, the promising research on adversarial training is working towards models that learn to be robust by being trained on data specifically designed to reduce their performance. These efforts are not mere academic exercises: in a world where self-driving cars seem around the corner and the concern about artificially intelligent weapons is not unwarranted, assessing the actual reach and limitations of a piece of research becomes paramount.
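To give a flavour of how little it can take to fool a model, here is a minimal sketch of a one-step adversarial perturbation in the spirit of the fast gradient sign method, applied to a toy logistic regression. The classifier, weights and step size below are our own hypothetical example, not taken from the papers above.

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps=0.25):
    """One-step adversarial perturbation of input x for a logistic
    regression classifier sigmoid(w.x + b): move each input coordinate
    a small step eps in the direction that increases the loss for the
    true label y (the sign of the loss gradient w.r.t. the input)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # predicted P(y=1)
    grad_x = (p - y) * w                      # d(cross-entropy)/dx
    return x + eps * np.sign(grad_x)

# Toy classifier and a correctly classified point.
w, b = np.array([2.0, -1.0]), 0.0
x, y = np.array([1.0, 0.5]), 1                # logit w.x + b = 1.5 > 0 -> class 1
x_adv = fgsm_perturb(x, w, b, y, eps=0.5)
print(w @ x_adv + b)                          # logit pushed towards the wrong class
```

A small, structured nudge of each coordinate is enough to move the point towards the decision boundary, which is the same phenomenon the single-pixel attack exploits at much higher dimension.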
There is something wrong with Convolutional Networks, and capsules might have the answer
— by Alejandro Vidal and Juan Arévalo
Not long after disrupting the field of image classification with Deep Convolutional Neural Networks (NIPS 2012), Geoffrey Hinton started questioning the very nature of convolutional networks, as shown in this talk at MIT in December 2014 (only two years after the famous ImageNet paper). This year, Sara Sabour, Nicholas Frosst and Professor Hinton released a novel paradigm, the capsule, which provides a new abstraction for learning representations of entities. Such capsules might be capable of overcoming the difficulties convnets have in acquiring the pose of objects (i.e., the relationship between an entity and the viewer), and appear to be more robust to adversarial attacks. A series of posts explaining these capsules and how they work is being published on Medium.
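One concrete, easily reproduced piece of the capsule paper is its squashing non-linearity, which rescales a capsule’s output vector so that its length lies between 0 and 1 and can be read as the probability that the entity it represents is present. A minimal numpy sketch:

```python
import numpy as np

def squash(s, eps=1e-9):
    """Squashing non-linearity from 'Dynamic Routing Between Capsules':
    v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    Short vectors are shrunk towards zero length, long vectors towards
    unit length, while the direction (the entity's pose) is preserved."""
    norm2 = (s ** 2).sum(-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

short = squash(np.array([0.1, 0.0]))
long_ = squash(np.array([10.0, 0.0]))
print(np.linalg.norm(short), np.linalg.norm(long_))  # ~0.01 vs ~0.99
```

The length encodes presence and the direction encodes pose, which is precisely the separation of concerns that plain convnet activations lack.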
The revolution will be unsupervised, and apparently it has started in language translation
— by César de Pablo and Juan Arévalo
Most successful applications of machine learning use supervised learning; yet, labelled data is costly and relatively scarce in lots of domains. Hence, advances in learning paradigms that require less supervision have always captured the interest of the Machine Learning community, including advances in unsupervised learning of representations, semi-supervised or active learning, as well as transfer learning. In our case—where data is predominantly not labelled—the algorithmic revolution is likely to be unsupervised.
Automatic machine translation is one of the applications that usually require huge amounts of labelled data: a parallel sentence corpus of the language pair (e.g. English-German) you would like to learn to translate. This year, two research papers, by Artetxe et al. and Lample et al., independently and almost simultaneously presented promising results on unsupervised translation using neural architectures. Both works use a sequence encoder-decoder architecture with attention that, crucially, shares the same word embeddings between languages. Smart use of the available data—like denoising and backtranslation, i.e. translating from one language to the other and back while requiring the sentences to be similar enough—provides an additional boost.
In other words, they are able to produce a reasonably good translator without a dictionary. No need for a corpus with translated sentences!
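To give a flavour of the denoising ingredient, here is a sketch of the kind of corruption such systems apply to their input before asking the model to reconstruct it (word dropout plus a local shuffle). The exact noise models and parameters in the papers differ; the ones below are our own illustration.

```python
import random

def add_noise(sentence, p_drop=0.1, k_shuffle=3, rng=None):
    """Corrupt a tokenised sentence for a denoising objective:
    randomly drop words and locally shuffle the rest, so the model
    must learn to reconstruct fluent text from noisy input --
    one ingredient of unsupervised neural machine translation."""
    rng = rng or random.Random(0)
    # Word dropout: remove each token with probability p_drop.
    kept = [w for w in sentence if rng.random() > p_drop]
    # Local shuffle: perturb each token's position by a small random
    # offset, so words move only a few places from where they started.
    keyed = [(i + rng.uniform(0, k_shuffle), w) for i, w in enumerate(kept)]
    return [w for _, w in sorted(keyed)]

print(add_noise("the cat sat on the mat".split(), rng=random.Random(42)))
```

Training the shared encoder-decoder to undo this noise in each language, alternated with backtranslation rounds, is what lets the system bootstrap a translator with no parallel corpus at all.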
Neural architectures are simplifying, since all you need is attention
— by Alberto Rubio
When performing sequential tasks like translating, we focus on the current and surrounding words, but not the whole sentence at the same time. This behaviour can be achieved by LSTMs using the attention mechanism. To do this, we use the same trick as in Neural Turing Machines, where each decoder output word token now depends on a weighted combination of all the input states, not just the last state. The scores are fed into a softmax to create the attention distribution.
This approach can be used with CNNs in image captioning in order to understand what part(s) of the image motivated a certain word in the caption.
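The core computation described above—score every input state against the current decoder state, feed the scores through a softmax, and take the weighted combination of all input states—fits in a few lines of numpy. The dot-product scoring below is a simplification of ours; real models typically learn a scoring function.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Attention over a sequence: score every encoder state against the
    current decoder state, turn the scores into a distribution with a
    softmax, and return the weighted combination of all input states
    (not just the last one)."""
    scores = encoder_states @ decoder_state        # one score per input token
    weights = softmax(scores)                      # the attention distribution
    return weights @ encoder_states, weights       # context vector, weights

encoder_states = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
context, weights = attend(np.array([0.0, 2.0]), encoder_states)
print(weights)   # mass concentrates on the states aligned with the query
```

The weights are exactly the "what part of the input motivated this output" signal exploited in image captioning.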
Unfortunately, such Recurrent Neural Networks (RNNs) are problematic to train, because the input must be processed sequentially, which prevents full parallelization. This bottleneck can be avoided with stacked CNNs, which are highly parallelizable. The downside of this approach is that capturing the relationships between far-away tokens requires large kernels and more computation.
All these issues are alleviated in the Transformer architecture—since, apparently, attention is all you need. The authors achieve better scores and good training performance using an encoder-decoder architecture in which each part implements multi-head attention. They address the sequential-processing issue of RNNs and the large-kernel issue of CNNs by removing recurrence and convolutions altogether, replacing them with multi-head self-attention to handle the dependencies between input and output.
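A stripped-down sketch of the Transformer’s central operation, scaled dot-product self-attention with several heads, might look as follows in numpy (the dimensions, and the omission of per-head output projections, are simplifications of ours):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, n_heads=2):
    """Scaled dot-product self-attention with several heads. Every head
    attends over the whole sequence in parallel -- no recurrence, no
    convolution -- so far-away tokens interact in a single step."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_head = Q.shape[-1] // n_heads
    outputs = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)   # (seq, seq)
        outputs.append(softmax(scores) @ V[:, s])
    return np.concatenate(outputs, axis=-1)              # (seq, d_model)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                              # 5 tokens, d_model=8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(multi_head_attention(X, Wq, Wk, Wv).shape)         # (5, 8)
```

Because every position attends to every other in one matrix multiplication, the whole sequence is processed in parallel, which is the training-speed advantage over RNNs.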
We will have to study more maths, for nature ain’t Euclidean
— by Leonardo Baldassini, Juan Duque and Juan Arévalo
Most scientific disciplines start with a mixture of experimentation and discoveries that drives the research itself, but often lack a complete theoretical framework to explain those findings. It is thus the natural evolution of scientific research that, as a field of study matures, it starts seeking rigorous mathematical explanations of its findings. As a matter of fact, despite the seeming mathematical simplicity behind Neural Networks (NN), the manifold hypothesis—that data of interest for AI tasks lies on a low-dimensional manifold—suggests that the data we are learning from might live in rather intricate non-Euclidean spaces. In addition, several connections with properties observed in Physics—such as symmetry, locality, compositionality, polynomial log-probability or even the Renormalization Group—have been proposed. In this respect, the efforts made by Stéphane Mallat and Joan Bruna to provide a mathematical understanding of Deep Convolutional Networks are also worth a mention.
This past year, we have witnessed an increasing interest in accommodating existing NN architectures in the setting of Riemannian geometry—see for instance the use of Poincaré embeddings to learn hierarchical representations. Moreover, an attempt to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds, known as Geometric Deep Learning (GDL), is gaining momentum, as shown by this year’s NIPS tutorial. Applications of GDL range from ConvNets on biological graphs to matrix completion for recommendations. The field is therefore relevant for any company facing relational data—such as ourselves.
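For the curious, the key object behind Poincaré embeddings is the hyperbolic distance on the open unit ball, which is simple to write down. The sketch below (our own illustration, with made-up points) shows how distances explode near the boundary, which is what lets a flat-looking disc encode tree-like hierarchies.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincare ball (points with norm < 1):
    d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))).
    Distances blow up as points approach the boundary of the ball."""
    uu, vv = (u ** 2).sum(), (v ** 2).sum()
    duv = ((u - v) ** 2).sum()
    return np.arccosh(1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv)))

origin = np.zeros(2)
near = np.array([0.1, 0.0])
edge = np.array([0.999, 0.0])
print(poincare_distance(origin, near))   # small, for points near the origin
print(poincare_distance(origin, edge))   # far larger than the Euclidean 0.999
```

Embedding a hierarchy so that the root sits near the origin and the leaves near the boundary exploits exactly this exponential growth of volume.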
All these contributions highlight how non-Euclidean geometries must be taken into account and incorporated into our models when dealing with highly complex data. Furthermore, an increased understanding of the geometry of our data goes hand in hand with a deeper geometric intuition of how neural networks work. As such, we predict that Riemannian geometry will play an important role in the understanding and development of Neural Networks—so much for them being “a small set of high-school level ideas put together”! For those who want to sink their teeth into some non-high-school maths, we recommend Tu’s introductory text on manifolds. It makes for a highly interesting read—but definitely not a bedtime one.