During personal discussions or networking events, business owners and department or function heads often ask me, “I believe in the power of data science, but how should I start?” This blog post is my attempt to answer that question.

What to Keep in Mind

It's very important to always remember, while building data science capabilities and working through the roadmap, that “Value has to always move ahead of the costs”. I have seen efforts that were not able to create sustainable momentum in building data science capabilities because the costs (mostly infrastructure) ran way ahead of the value. With costs running way ahead, teams come under pressure to show results, and without proper planning of which projects to work on, the effort is not sustained.

Hire an Experienced Data Scientist

Yes, to start, please hire an experienced(!!!) data scientist. Why an experienced data scientist, you might ask, and not just someone who holds the title of “data scientist”? An experienced data scientist should have the expertise to understand the data quickly and determine whether there is any “low-hanging fruit” that can be plucked with tools the organization can access easily, like Excel or open-source software. This “low-hanging fruit”, together with those tools, provides immediate value to the organization (well, after a wait of about 1–2 months, depending on the data quality). These projects are then used to get buy-in from other parts of the organization.

Sometimes hiring a data scientist may be a high-risk maneuver, given that it is a permanent position (data scientists are in high demand, so please do not even consider trying to hire on a contractual basis). An alternative is to hire a consultant who has done data science projects. The consultant can sift through the available data and determine whether there is sufficient “low-hanging fruit”.

Side note

I have seen organizations hire people who have just completed a Master's degree or a bootcamp and expect them to know how to work with the organization's existing data. Most of these “fresh” trainees require mentors to guide them so that they learn how to sift through the data for insights. Experience really counts for a lot in data science!

Plucking More Fruits

Once sufficient value has been proven from existing data, the next step is to work on TWO paths: (1) data governance & management and (2) infrastructure.

(1) Data Governance & Management

Having proved that data is of value, it is time to set up processes to manage it and ensure that the data is of higher quality, so as to reduce the time between extracting the data and having it at the right quality to be used. This allows data to be turned into insights for decisions quickly, pushing the value envelope further.

Based on the first few projects (aka the “low-hanging fruit”), the organization can also now look at what further data can be captured (at a reasonable cost) to improve its insights.

(2) Infrastructure

Having created more buy-in from management, the organization can now work on the infrastructure. Building the infrastructure generally requires a much larger budget because of the need to integrate with existing systems and to store data. But since we have the “low-hanging fruit” to show for our efforts, it will now be easier to ask for a budget to build, and management will have more confidence that the budget will be used to create more value for the organization.

Side note

I’ve seen many situations where organizations went ahead and purchased “Big Data” technology without proper plans for how to use it or, even worse, without knowing whether there was a need for it. In the end, the momentum to build data science capabilities was not sustained because, for various reasons, the value created (if any was created in the first place) was not enough to cover the infrastructure costs, and these organizations were stuck with a ‘white elephant’. The conclusion drawn from such failed attempts was that management no longer believed in data science (who can blame them?), which to me is very sad, because the organization has lost the chance to be competitive.

So remember what I said: “Value has to always move ahead of the costs.”

With Better Infrastructure & Data, Comes Greater Value

Setting up the infrastructure and data governance processes might take some time, around 6 to 12 months. During this time, the organization should continue to find more data science projects to create value. With better infrastructure and data quality, the ratio of value to time spent will increase. This increase then creates another opportunity to put in more resources to build better infrastructure, grow larger teams and collect more data.

With the value running ahead of the costs, and with care taken to keep it that way, a virtuous cycle is created and, in due time, the data science capabilities will be built up and stay with the organization.
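To make the "value ahead of costs" check a little more concrete, here is a minimal sketch in Python. It assumes you can put a rough number on the value delivered and the cost incurred in each period (say, each quarter). The function name and all figures are hypothetical and chosen only for illustration; they are not taken from any real organization.

```python
from itertools import accumulate


def first_period_costs_lead(value_per_period, cost_per_period):
    """Return the 1-based period in which cumulative cost first exceeds
    cumulative value, or None if value stays ahead (or level) throughout."""
    if len(value_per_period) != len(cost_per_period):
        raise ValueError("value and cost series must have the same length")

    # Running totals of value delivered and cost incurred so far.
    cumulative_value = accumulate(value_per_period)
    cumulative_cost = accumulate(cost_per_period)

    for period, (value, cost) in enumerate(
        zip(cumulative_value, cumulative_cost), start=1
    ):
        if cost > value:
            return period
    return None


# Hypothetical figures (arbitrary currency units per quarter).
# Path A: start with small projects, invest in infrastructure later.
small_projects_first = first_period_costs_lead(
    value_per_period=[30, 40, 60, 90],
    cost_per_period=[20, 25, 50, 60],
)

# Path B: buy the big infrastructure up front, look for value afterwards.
infrastructure_first = first_period_costs_lead(
    value_per_period=[0, 10, 30, 60],
    cost_per_period=[100, 30, 30, 30],
)

print("Path A, costs overtake value in period:", small_projects_first)
print("Path B, costs overtake value in period:", infrastructure_first)
```

In practice the hard part is estimating the value figures, not the arithmetic. Even a rough tally like this can show early on when the costs are starting to run ahead.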


This is, of course, just a very simple description of how to build data science capabilities. There will be other considerations as well, depending on the domain and other factors. But at the end of the day, the most important message I want to get across is that “Value has to always move ahead of the costs”. Otherwise the effort is not sustainable, and organizations may lose the competitive edge that is necessary to survive in this dynamic and harsh environment.