It may have become a tradition: December is the time when the ripest fruits of 12 months of global research in machine learning are showcased and the future trends defined and outlined in what seems to have become the most important – or at the very least most hyped – gathering on the subject.
Following the escalating success it has enjoyed over the last few years, the 31st NIPS conference, held this month in Long Beach, was by a large margin the biggest and fastest-selling edition: its more than 8000 tickets sold out shortly after registration opened, and 679 of the 3240 papers submitted this year were accepted.
While these numbers strongly reflect that the boundary between academic symposia, industrial conventions and recruitment events is more blurred than ever, they also bear clear testimony to the ever more central role that ML, AI and the technologies and applications revolving around them are bound to occupy in academia as well as in industry. It is therefore essential for a data-centric organisation like ours to be present at such events in order to follow the developments of the state of the art as closely as possible. In this post we outline the most relevant trends that we perceived during NIPS 2017.
Less is more
That the huge number of parameters offered by complex deep neural network architectures helps the expressive power of these models is not in dispute. That this blessing may turn into a curse has also been clear since interest in deep learning started peaking a few years ago. Throughout the NIPS week, we were pleasantly impressed by what seems to us to be an increasing effort to strip deep architectures of unnecessary complexity, exploring through both experiments and theoretical study how a network can be greatly simplified while retaining all its predictive power and, what’s more, improving generalisation. We particularly liked the following works:
- On the complexity of learning in neural networks, showing that not all neural networks may yield good results (without an incredibly long training)
- Net-trim, a method which, because it requires training a network twice, may lack practical applicability in many cases, but effectively shows how the complexity of a network can be greatly reduced
- Runtime neural pruning, in which the authors propose a sparsification of the layers of a network, this time based on a Markov decision process trained via reinforcement learning rather than the solution of a convex program used by Net-trim
- SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability, quite a mouthful indeed, but just as certainly a very rich work combining the SVD and CCA techniques to analyse the similarity of the layers of a deep neural network (here interpreted as vector spaces spanned by neurons, themselves interpreted as vectors). It shows that the convergence of networks, which we may think of as, for example, the formation of the concepts of classes in a classification problem, happens bottom-up. This study also provides grounding for developing stopping conditions for the training of deep networks.
In the same spirit of power-preserving simplification of models, other works that we would warmly recommend for a pleasant and instructive read are:
- K-medoids for K-means seeding, a highly impactful and strongly experimental paper where the simple CLARANS algorithm is used to improve the seeding of Lloyd’s K-means algorithm
- Generalization properties of learning with random features, a learning-theoretic study of how randomly sampling the weights and bias vectors for a ridge regression problem can actually yield reduced sample complexity while producing a good learnt model
- The unreasonable effectiveness of structured random orthogonal embeddings, showing how to easily construct random matrices with orthogonal rows offering powerful dimensionality-reduction properties, among others.
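To make the last item of the list above a little more concrete, here is a minimal sketch of one way to build a structured random matrix with orthogonal rows: a product of blocks, each made of a normalised Hadamard transform applied after a diagonal matrix of random signs. It is an illustration under our own assumptions rather than the authors' code: the function names, the choice of three blocks, and the "keep the first m coordinates and rescale" reduction step are all hypothetical.

```python
import math
import random


def fwht(vec):
    """Normalised fast Walsh-Hadamard transform (an orthogonal map).

    The length of vec must be a power of two.
    """
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def make_sign_diagonals(dim, blocks=3, seed=0):
    """Draw `blocks` random +/-1 diagonals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(blocks)]


def structured_orthogonal_transform(vec, diagonals):
    """Apply H D_k ... H D_1 to vec, one sign-flip plus Hadamard block at a time."""
    out = list(vec)
    for signs in diagonals:
        out = fwht([s * x for s, x in zip(signs, out)])
    return out


def reduce_dimension(vec, diagonals, m):
    """Keep the first m coordinates, rescaled by sqrt(n / m).

    Assumption: a fixed subset of rows is used here for simplicity.
    """
    n = len(vec)
    transformed = structured_orthogonal_transform(vec, diagonals)
    return [x * math.sqrt(n / m) for x in transformed[:m]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Because every block is orthogonal, the full transform preserves dot products and norms up to floating-point error, and it costs O(n log n) per vector instead of the O(n²) of a dense matrix product. After `reduce_dimension` only an approximation of those quantities survives, which is the trade-off dimensionality reduction accepts.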
From alchemy to electricity
One of the most thought-provoking talks of the whole conference was undoubtedly Ali Rahimi’s address as a recipient of the Test of Time Award, a prize recognising the long-lasting impact of the paper Random features for large scale kernel machines. After briefly introducing the work in the paper, Rahimi went on to plead for a return to rigour in the methodological scrutiny of machine learning research, arguing that rather than looking at AI as the new electricity, as suggested by Andrew Ng, we may actually be dealing with a new alchemy: a discipline carrying within itself the seed of a fundamental science, but also promising to turn metals into gold.
The debate on whether we are trading scientific rigour for research throughput is lively and central, and hardly a day goes by without new research being released showing that what we thought true about the functioning of some ML model might actually be the product of a weak methodology. This is why Rahimi’s powerful intervention deserves to be watched and reflected upon.
Another trend we greatly welcomed during this edition of NIPS was an increased focus on research around fairness, usually understood as the absence of unjustified sensitivity of algorithms to some features. A classic example of an unfair algorithm, and one that concerns us very closely, is that of an automated mortgage approval system that would offer you better treatment if, all else unchanged, you had a different gender, race or postcode.
Of the several contributions in this area, we feel the following papers make for some highly recommendable reading:
- Fair clustering through fairlets, a study showing how to enforce fairness as a hard constraint in clustering problems via a preprocessing step that partitions the data into regions where fairness can’t be easily violated
- On fairness and calibration, showing an inherent incompatibility between error-rate fairness and calibration. This is to say, for example, that a predictive model that wants to achieve the same false-positive and false-negative error rates on any two subsets of points won’t in general also have its prediction probabilities reflect the actual probabilities of something happening
- A slightly brighter view is instead offered by From parity to preference-based notions of fairness in classification, an interesting work showing that the usual parity-based idea of fairness may be too stringent to achieve good classification accuracy. Specifically, the paper shows that much better accuracy can be achieved by requiring that no individual would prefer to be in a group other than their current one, rather than by requiring that all groups necessarily receive the same treatment.
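The tension described in the list above between error-rate fairness and calibration is easier to see with numbers. The following is a minimal sketch, not taken from any of the papers: the function name `group_report`, the 0.5 threshold and the toy data are all made up for illustration. It computes, for each group, the false-positive and false-negative rates together with a crude aggregate calibration check (mean predicted score against observed positive rate).

```python
from collections import defaultdict


def group_report(labels, scores, groups, threshold=0.5):
    """Per-group error rates and a coarse calibration check.

    labels: 0/1 outcomes; scores: predicted probabilities;
    groups: a group identifier for each example.
    """
    buckets = defaultdict(list)
    for y, s, g in zip(labels, scores, groups):
        buckets[g].append((y, s))

    report = {}
    for g, rows in buckets.items():
        negatives = [s for y, s in rows if y == 0]
        positives = [s for y, s in rows if y == 1]
        false_pos = sum(1 for s in negatives if s >= threshold)
        false_neg = sum(1 for s in positives if s < threshold)
        report[g] = {
            "fpr": false_pos / len(negatives) if negatives else float("nan"),
            "fnr": false_neg / len(positives) if positives else float("nan"),
            # A calibrated model has mean_score close to positive_rate.
            "mean_score": sum(s for _, s in rows) / len(rows),
            "positive_rate": len(positives) / len(rows),
        }
    return report


# Toy data, invented purely for illustration.
labels = [1, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1, 0.8, 0.2, 0.3, 0.7]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
report = group_report(labels, scores, groups)
```

In this toy example the two groups receive the same mean score but have different positive rates and different error rates, so neither notion of fairness holds. The paper's point, as we read it, is that when base rates differ between groups, fixing one of the two properties generally leaves the other violated.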
These papers show only a fraction of the issues that arise when some concept of fairness needs to be built into a machine learning algorithm, and a still smaller part of the techniques that can be used to tackle them. While relatively new, this line of research is promising and certainly much needed, not only for the improved models that can come out of it, but also because, as Kate Crawford remarked in her must-see keynote, through the lens of unfair models we can realise what is wrong in our society, and try to fix it.
I’ll buy that AI
While the spirit of the NIPS conference remains mainly academic, the increasingly strong presence of industry, both through the many sponsorships and the research contributions, is undeniable. We spent some time talking to some of the sponsors of the event and noticed that companies and VC funds alike are willing to bet on a commoditization of AI. The common belief is that there is only a limited amount of AI-related talent, and most companies would rather be consumers of ready-made AI solutions than struggle with the gory details of algorithms and data processing. It is then plausible that in the next few years we’ll witness a multiplication of providers of AI products or, more likely, services.
On the other hand, for those companies willing to build AI solutions, either for themselves or for others, NIPS is also registering a growing presence of hardware vendors. These don’t stop at the usual suspects Nvidia and Intel: they also include offers like Lambda Labs’ preconfigured machines, meant to let you hit the ground running out of the box, as well as interesting new players like Graphcore, a British startup promising an IPU (intelligence processing unit), a processor specifically designed for massively parallel AI applications (waiting for TPUs, anyone?).
Whether among universities, internet juggernauts, visionary startups, algorithmic trading platforms, automobile companies or consulting firms, it is clearer than ever that the race to shape the future of what promises to be a huge technological (r)evolution is fully on. And yet, as the research we discussed here seems to suggest, we may only be getting started.