What we saw at Open Data Science Conference Europe 2017

Rafael Hernández, Israel Herraiz and Amanda Garci

d&a blog

ODSC London 2017 showed us an amazing variety of tools, libraries, notebooks and data science apps. There were 75 speakers and 1,500 attendees, with literally no room for more.
It doesn’t matter whether you prefer Python, R or even Julia, or whether you’re into data visualization, academic research or open source ML libraries. There we found the usual hot topics, such as deep learning, together with one that is hard to find at an open source conference: quantitative finance.

There were several tracks, depending on the kind of ticket you purchased: workshops, the so-called Accelerate AI conference, and the conference talks. We attended the workshops and the conference, and skipped Accelerate AI.

The workshops were long hands-on sessions, mainly at an introductory level, on different data science topics. We managed to attend the Python Quants workshop on algorithmic trading with Python using pandas, and another workshop on running TensorFlow on Google Cloud.

The conference was focused on six topics: Open Data Science, Machine Learning, Quant Finance, Visualization, Data Science Research and a Kickstarter. The border between some of them was a little fuzzy, as you can imagine. In the ML track, the deep learning hype led several speakers to start their presentations by introducing its basic building block: the perceptron. That was nice for entry-level attendees, but we think the organizers should foster some coordination to avoid repetition.

The rest of the conference was about more advanced topics and current issues in the world of data science.

Keynotes

Neil Lawrence’s keynote pointed out the great need for what he called “data professionalism”. His outstanding reflection comforted us by showing that these issues have not gone unnoticed by many companies and institutions. But it is still an unsolved problem.

What happens if one of the data scientists in your organization leaves or, in his own words, is run over by a bus in the street? The answer is that taking over her work requires a huge effort in terms of talent acquisition, training, etc.

Even if you hire a great new data scientist, the adaptation process and learning curve represent a drain on resources, due to the lack of standardization and common practices. Our company is well aware of this cost and has launched specific projects that aim to smooth this path as soon as possible.

Probabilistic Programming with PyMC3

We were happy to find a talk about a library we are already using in one of the projects we are developing at BBVA Data & Analytics. We have used it in analyses where we want to infer the full posterior probability distribution through sampling methods such as MCMC.

Thomas Wiecki is one of the main contributors to PyMC3. With this Python library you only need to specify the Bayesian model formulation and, under the hood, the library builds up a computation graph in Theano before applying Monte Carlo simulations. Here are some links about this talk:
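PyMC3 automates this kind of inference, but the sampling idea underneath is easy to sketch by hand. Below is a toy Metropolis sampler (plain NumPy, not PyMC3’s API) for the posterior of a coin’s bias under a uniform prior; the function name and step size are our own illustrative choices:

```python
import numpy as np

def metropolis_beta_bernoulli(data, n_samples=20000, step=0.1, seed=0):
    """Metropolis sampler for the posterior of a coin's bias p,
    with a uniform prior and a Bernoulli likelihood."""
    rng = np.random.default_rng(seed)
    heads, n = data.sum(), len(data)

    def log_post(p):
        # Uniform prior contributes a constant; outside (0, 1) it is zero.
        if not 0.0 < p < 1.0:
            return -np.inf
        return heads * np.log(p) + (n - heads) * np.log(1.0 - p)

    p, samples = 0.5, []
    for _ in range(n_samples):
        proposal = p + rng.normal(0.0, step)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < log_post(proposal) - log_post(p):
            p = proposal
        samples.append(p)
    return np.array(samples[n_samples // 2:])  # discard burn-in

data = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])  # 7 heads in 10 flips
posterior = metropolis_beta_bernoulli(data)
# The analytic posterior here is Beta(8, 4), whose mean is 8/12 ≈ 0.667
```

In PyMC3 the same model is a few lines of declarative code, and the library picks a far more efficient sampler (NUTS) for you.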

Where Algorithmic Trading meets Open Source

Before we knew about Quantopian, Python Quants GmbH or some of the speakers at ODSC, algorithmic trading looked like black magic to us. Talking about open source resources in this field was something unexpected and very welcome.

For instance, Quantopian is a platform built around an algorithmic trading community where people write code for investment strategies. The platform checks the performance of these strategies through backtesting. In addition, the best algorithms in the community are eligible to receive money from hedge funds; as a reward, the author gets a commission on the profits generated by her strategies.
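The core of such a backtest fits in a few lines of pandas. Here is a minimal sketch (our own toy example, not Quantopian’s API) of a moving-average crossover strategy evaluated on a price series:

```python
import numpy as np
import pandas as pd

def backtest_sma_crossover(prices, fast=5, slow=20):
    """Backtest a simple moving-average crossover strategy:
    long when the fast SMA is above the slow SMA, flat otherwise.
    Returns the cumulative strategy return."""
    df = pd.DataFrame({"price": prices})
    df["fast"] = df["price"].rolling(fast).mean()
    df["slow"] = df["price"].rolling(slow).mean()
    # The position is held during the *next* bar (shift avoids look-ahead bias).
    df["position"] = (df["fast"] > df["slow"]).astype(float).shift(1)
    df["ret"] = df["price"].pct_change() * df["position"]
    return (1.0 + df["ret"].fillna(0.0)).prod() - 1.0

# On a steadily rising synthetic series the strategy ends up long and profits.
total_return = backtest_sma_crossover(pd.Series(np.linspace(100.0, 120.0, 100)))
```

Real platforms add transaction costs, slippage and risk constraints on top of this basic loop, which is where naive backtests tend to flatter a strategy.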

This conference was a good starting point for people who want hands-on experience applying data science to quant finance.

Find out more about this talk and algorithmic trading at:

How to win Kaggle competitions: Stacking Made Easy

The topic of this talk was a neat piece of work from one of the top grandmasters in the Kaggle community. Basically, stacking is a meta-learning step on top of several machine learning models that builds an ensemble in a smart way. First, multiple base classifiers are trained to predict or classify in a supervised problem. Second, a new learner combines their predictions, using the outputs of the base learners as inputs to train a higher-level learner, creating a stacked ensemble.
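That two-level scheme can be sketched with scikit-learn; the dataset and model choices below are our own illustrative picks, not from the talk. The key detail is using out-of-fold predictions as meta-features, so the meta-learner never sees predictions made on the base learners’ own training labels:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [RandomForestClassifier(n_estimators=50, random_state=0),
        KNeighborsClassifier()]

# Level 1: out-of-fold probability predictions become the meta-features.
meta_tr = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in base])

# Level 2: a simple learner combines the base learners' predictions.
meta = LogisticRegression().fit(meta_tr, y_tr)

# At test time each base learner is refit on the full training set.
meta_te = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in base])
acc = meta.score(meta_te, y_te)
```

StackNet generalizes this idea to many levels, like a feed-forward network whose “neurons” are whole models.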

StackNet is a meta-modeling framework that can easily be used to increase your model’s accuracy by finding the optimal weights for each base learner. It supports some of the most popular machine learning frameworks: scikit-learn, XGBoost, H2O, Keras, etc.

Marios Michailidis explained in detail how this generalized stacking network performs:

“Most algorithms rely on certain parameters or assumptions to perform best, hence each one has advantages and disadvantages. Stacking is a mechanism that tries to leverage the benefits of each algorithm while disregarding (to some extent) or correcting for their disadvantages. In its most abstract form, stacking can be seen as a mechanism that corrects the errors of your algorithms.”

Stacking Made Easy: An Introduction to StackNet by Competitions Grandmaster Marios Michailidis (KazAnova)

In another talk, Piotr Migdał gave us some tips and tricks about reproducibility and a number of best practices for team cooperation and model deployment. It’s important to be aware that even a single untracked parameter tweak can lead to frustration and inefficiency across the whole team. I can’t resist mentioning his good humor in drawing a parallel between the botched Ecce Homo restoration in Borja and model reproducibility issues.

Behind the Scenes of Training, Managing and Deploying Models

We returned to Madrid with our backpacks full of ideas and new tools to test, looking forward to the next edition of this conference in Europe. In the meantime, we’ll satisfy our appetite for more with the ODSC blog, always full of interesting posts:

https://opendatascience.com/blog/