“Developers, developers, we need more developers,” comes the chorus. But what does it take to train an engineer, especially an engineer familiar with Machine Learning?

This is Part 3 of a series of posts sharing some pain points we’ve seen as our newbies start moving into the ML space. See here for Part 1 and Part 2.

Context

We onboard a lot. Some of our people are apprentices just out of school, some are masters graduates. Our team is made of computer scientists and mathematicians, but also plant biologists and robotics specialists. We work to welcome people to Artificial Intelligence: the field that, broadly, brings together algorithms, machines and people. It’s part of our mission to fix the AI talent pipeline problem.

But we are not just a training program, yet another MOOC, or a university. We focus on practice over theory, doing over learning, projects over certificates. Our engineers and program participants work on real-world industry projects. Through doing so, they learn how to design data architectures, build pipelines and features, and optimise algorithms. They learn to make trade-offs and, perhaps most importantly, learn to own and take pride in their work.

This approach has strengths and weaknesses. Each project is different, which means each learning experience is different. Our varied backgrounds mean everyone has a different learning curve. So, we may not have best practices or answers. What we do have, however, are patterns and pain points we’ve noticed. After giving workshops everywhere from London to Singapore and Indonesia, we find these learning curves seem quite generalisable to most beginners. We hope this list gives a good heads-up for people with teams they need to grow.

Pain point 3: Using Python as an object-oriented language

Unless one starts off as a Python developer, the Object-Oriented Programming basis of Python often goes unnoticed by ML newbies. Usually, they call predict on a model without understanding that predict is a method. (If this sounds foreign to you, I can relate. It took many re-readings for me to understand what Classes, Objects and Methods in Python were.) I call this style a user-centric approach. As users, we use a Python library because we have an end-goal in mind, for example changing datetimes or table indexes. We are less concerned with the deeper architecture of how the Python library is structured. This way of working is alright for beginners or users with defined tasks. For example, if your job is to run a hypothesis test, then you would use Python’s statistical libraries to get a p-value at the end.

The situation is different for an ML practitioner. Sometimes, an ML practitioner will need to write a collection of custom functions to pre-process data. They may also build a selection of models to be used in different situations. In this case, their code, not the code’s output, will be shared. In other words, code needs to be made reusable. Usually this means organising code into a Python module. At this point, the ML practitioner faces a learning curve because they need to move from being a user to being a user-developer, who takes a developer-centric approach. This means thinking about concepts like inheritance, encapsulation and error handling. Thankfully, there are good resources to help with this shift. This is my favourite.
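
To make the vocabulary concrete, here is a minimal sketch in plain Python of what sits behind a call like model.predict(...). The names BaseModel and MeanModel are made up for illustration and do not come from any real library. The sketch only assumes the familiar fit/predict convention. It shows a class, an object created from it, a method, inheritance (MeanModel builds on BaseModel), encapsulation (underscore-prefixed internal state) and basic error handling.

```python
class BaseModel:
    """Hypothetical base class: the interface every model in our module shares."""

    def __init__(self):
        # Leading underscore marks internal state (encapsulation by convention).
        self._is_fitted = False

    def fit(self, xs, ys):
        raise NotImplementedError("subclasses must implement fit()")

    def predict(self, xs):
        # Error handling: fail early with a clear message instead of a
        # confusing AttributeError further down.
        if not self._is_fitted:
            raise RuntimeError("call fit() before predict()")
        return [self._predict_one(x) for x in xs]

    def _predict_one(self, x):
        raise NotImplementedError("subclasses must implement _predict_one()")


class MeanModel(BaseModel):
    """Toy model (inherits from BaseModel): always predicts the training mean."""

    def fit(self, xs, ys):
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        if not ys:
            raise ValueError("cannot fit on empty data")
        self._mean = sum(ys) / len(ys)
        self._is_fitted = True
        return self

    def _predict_one(self, x):
        return self._mean


model = MeanModel()                       # model is an object (an instance of a class)
model.fit([1, 2, 3], [10.0, 20.0, 30.0])
predictions = model.predict([4, 5])       # predict is a method bound to that object
print(predictions)                        # [20.0, 20.0]
```

Saved in a file such as models.py (again, a hypothetical name), these classes become a small module that a teammate can import and extend, rather than a notebook cell whose output is the only thing shared.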

 

One caveat here is that going down the user-developer route is potentially a never-ending rabbit hole. There are practices like Test-Driven Development, building from UML diagrams and Designing Evolutionary Architectures that are entire fields in themselves. This example is my personal benchmark for how much logging and testing an ML practitioner should know. Notice also how Allen Downey has structured his Python library to serve the Bayesian Statistical approach that he’s trying to teach. This probabilistic programming library is another one I like because of how the code mirrors a method used to reason about data. In these two examples, the domain drives the library design.
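
For a rough sense of scale, the sketch below shows the kind of logging and testing a small pre-processing helper might carry, using only Python’s standard logging and unittest modules. It is not the example referred to above. The function fill_missing and the logger name "preprocessing" are invented for illustration.

```python
import logging
import unittest

# Hypothetical logger name; a real module would typically use __name__.
logger = logging.getLogger("preprocessing")


def fill_missing(values, default=0.0):
    """Replace None entries with `default` and log how many were replaced."""
    n_missing = sum(1 for v in values if v is None)
    if n_missing:
        logger.warning("filled %d missing value(s) with %s", n_missing, default)
    return [default if v is None else v for v in values]


class FillMissingTest(unittest.TestCase):
    def test_replaces_none(self):
        self.assertEqual(fill_missing([1.0, None, 3.0]), [1.0, 0.0, 3.0])

    def test_leaves_complete_data_untouched(self):
        self.assertEqual(fill_missing([1.0, 2.0]), [1.0, 2.0])

    def test_logs_a_warning(self):
        # assertLogs captures records emitted on the named logger.
        with self.assertLogs("preprocessing", level="WARNING") as captured:
            fill_missing([None, None], default=-1.0)
        self.assertIn("filled 2 missing", captured.output[0])
```

If this lived in a file, the tests could be run with python -m unittest. The point is proportion: one log line where data is silently changed, and a handful of tests that pin down the behaviour a teammate will rely on.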

Conclusion

We looked at some learning curves beginner ML practitioners face as they start their programming journey. Another observation to add here is that I’ve seen quite a few beginners diverge just before they have to climb these curves. After learning foundational skills like reasoning about data and building visualisations, they choose to move into project management or product management, or prefer to be business developers working with data products. These are extremely useful roles as well, and I think it’s worth pointing out here that being a developer is not the only end-goal after building your data literacy. There are many other fulfilling careers to explore!