DISPATCHES FROM JUPYTER CON
Jupyter Notebook is the tool of choice for many data workers. Data visualisation experts use it to build dashboards, data scientists use it to test algorithms, and computational scientists use it to study the stars in the sky and the genes in the human body. We use Jupyter a lot in our day-to-day work as well. Usually, we use it on a laptop as a quick and easy way to build prototypes and explore data. Once we have a good idea of what we're working with, we can then write scripts to automate our analysis or train our model on a more powerful virtual machine. But we are only one type of engineer using Jupyter in only one of many ways. Jupyter Con this August gave us the chance to widen our knowledge by meeting other users and gathering ideas from them.
As a former animal biologist, I observed roughly three breeds of people at Jupyter Con. There were the data workers who use Notebooks to visualize and analyze data; there were the engineers who set up Jupyter for data workers, sometimes building on powerful big-data frameworks and serving hundreds of people; and there were the educators who use Jupyter to cultivate data literacy not only in scientists, but in literature majors and high schoolers as well.
The data workers came from many fields, which really showed how different industries are being enhanced by a data-centric approach. There was Mark Hansen from the Columbia Journalism School, who talked about using Notebooks to investigate fake profiles on Twitter and the behaviour of the bots behind these accounts. He and his students eventually published the analyses as the longform article "The Follower Factory" in The New York Times, and the piece is a pioneering work on how statistics and data can be mixed with traditional investigative journalism to inform and to tell a great story. While Mark talked about journalism and data, Michelle Ufford from Netflix looked at how data fuels, well, almost everything at her company, from business decisions to their legendary movie recommendation system. My favourite part of her talk was when she showed the company's organisational chart – there were engineers for algorithms, visualisations, business analytics and compute infrastructure, just to name a few. All of them use data as their raw material, and Jupyter is their data tool.
The infrastructure engineers talked a lot about how Jupyter is not just a stand-alone browser application, but also a powerful extension of existing compute infrastructure. There was CERN, which used Jupyter as a user-friendly interface extending the functionality of its existing big-data processing system, SWAN. Netflix improved their job scheduling system with parameterized Jupyter Notebooks. These teams took what was good about Notebooks – the interactivity and ease of use – to improve, not replace, what they had already built.
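The mechanics behind parameterized Notebooks are simpler than they sound: a Notebook file is just JSON, and tools like Netflix's open-source papermill inject a fresh code cell with override values right after the cell tagged "parameters" before executing the Notebook. Here is a minimal sketch of that injection step using only the standard library – the cell structure and tag names follow papermill's convention, but the function itself is our own illustration, not Netflix's actual code:

```python
import json

def inject_parameters(notebook_json: str, params: dict) -> str:
    """Insert a code cell with parameter overrides right after the cell
    tagged 'parameters', mimicking papermill's injection step."""
    nb = json.loads(notebook_json)
    # One assignment statement per parameter override.
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": [f"{name} = {value!r}\n" for name, value in params.items()],
    }
    for i, cell in enumerate(nb["cells"]):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(i + 1, injected)
            break
    return json.dumps(nb)

# A toy notebook whose tagged parameters cell holds the defaults.
toy = json.dumps({
    "cells": [{
        "cell_type": "code",
        "metadata": {"tags": ["parameters"]},
        "execution_count": None,
        "outputs": [],
        "source": ["region = 'us-east'\n"],
    }],
    "metadata": {}, "nbformat": 4, "nbformat_minor": 5,
})

result = json.loads(inject_parameters(toy, {"region": "eu-west"}))
print(len(result["cells"]))  # 2: the original defaults plus the injected overrides
```

Because the injected cell runs after the defaults, the overrides win – which is how one template Notebook can be scheduled many times with different inputs.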
The educators were inspiring. One highlight was a talk from the UC Berkeley Data Sciences division, where a small team worked with student volunteers to craft Notebooks that applied machine learning to all sorts of undergraduate fields of study. There were Notebooks to analyze text data for English Literature classes; there were Notebooks that used data to teach fundamental theories in Economics. Despite having been running for only a short time, the team has managed to reach many students and faculty members, giving the university community a taste of what data means for their fields.
One last thing
Throughout the event, what was remarkable was how these three breeds didn't move in silos. An educator was just as comfortable talking about hosting Jupyter Notebooks on the cloud as she was talking about working with Economics professors to use Jupyter to teach a class. An infrastructure engineer was clear about the many ways data analysts use Jupyter to build dashboards and visualize data. The atmosphere was interdisciplinary and open, and the fast exchange of ideas was invigorating. If software and data are equal parts technical tooling and community, that feeling of community is one more thing to learn and emulate from the conference.