5 Steps to Build an Effective Data Science Team

I’ve built and run data teams at PayPal, Facebook, and now Retina AI. In each of those roles, I’ve been asked how to build a data science team. Below is my step-by-step approach to building a team that uses data effectively from the start.

Step 1: Think Data

The first step to creating an effective data science team is to build a data-driven culture, not only within the team but across the organization. Valuing objective data insights over gut feelings early in your company’s decision-making is the most important part of becoming data driven.

Step 2: Plan Data Maturity Phases

As you build a data science team, plan for it to mature in phases. Each phase lays a foundation for the next and prevents problems later.

Phase 1: Transparency and Single Source of Truth

Executives at early-stage companies often make the mistake of hiring a data scientist when the company is too young. They typically think that if the company has data, it needs a data scientist, but that just leads to trouble and disgruntled employees. Before hiring a data scientist, you need a data engineer or a dataops engineer to create the right data infrastructure.

Data engineers have a very different skill set from data scientists. If your first data scientist chooses the wrong data infrastructure, more time will be spent fixing data problems, and any later changes to the system will be disruptive and expensive. (To learn more about how to build data infrastructure the right way, read: https://retina.ai/blog/dataops-principles/)

Often at this stage there is no clear goal for data scientists or, even worse, no way to measure the data team’s impact. The data scientist is then relegated to running SQL queries and providing minimal value to the business. In addition, they won’t have enough data to work with, especially if the company’s product is still in beta. Consequently, data scientists can lose credibility, and ultimately buy-in and resources.

Phase 2: Experimentation and Forecasting

Once your company has enough data, it’s time to hire a data scientist to develop stronger analytical muscle within your company. Data scientists collaborate with the product team to run experiments and build a solid infrastructure for experimentation. Results from data science projects then advance from historical to predictive analytics.
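
As a rough illustration of what "running an experiment" can mean in practice, here is a minimal sketch of evaluating an A/B test on conversion rates with a two-proportion z-test. The function name and the sample numbers are hypothetical and not taken from any particular team's tooling; a real experimentation platform would also handle sample-size planning, multiple metrics, and guardrails.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variant B against control A.

    Returns (lift, z, p_value), where lift is the absolute difference
    in conversion rate and p_value is two-sided.
    Assumes both groups are large enough for the normal approximation
    and that the pooled rate is strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_b - p_a, z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 users per arm.
    lift, z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
    print(f"lift={lift:.3f}, z={z:.2f}, p={p:.4f}")
```

Reporting the lift alongside the p-value keeps the result readable for product and marketing partners, who usually care about the size of the effect as much as its significance.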

Data scientists gain a lot of credibility at this phase by providing actionable results
to various teams, such as finance and marketing. Experiments and back-testing can be used to prove the value of the data science team’s predictions and insights.
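
One way to make back-testing concrete: replay history, forecast each held-out point using only the data that came before it, and compare the error against a naive baseline. The sketch below assumes a simple moving-average forecast as a stand-in for the team's real model; the function names and the sample series are illustrative only.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def moving_average(past, window=3):
    """Placeholder forecast: mean of the last `window` observations."""
    recent = past[-window:]
    return sum(recent) / len(recent)


def backtest(history, forecast_fn, holdout):
    """Walk forward over the last `holdout` points of `history`.

    Each point is forecast from only the observations that precede it,
    so the model never sees the value it is being scored on.
    The naive baseline simply repeats the last observed value.
    """
    actual, model_preds, naive_preds = [], [], []
    for i in range(len(history) - holdout, len(history)):
        past = history[:i]
        actual.append(history[i])
        model_preds.append(forecast_fn(past))
        naive_preds.append(past[-1])
    return {
        "model_mae": mean_absolute_error(actual, model_preds),
        "naive_mae": mean_absolute_error(actual, naive_preds),
    }


if __name__ == "__main__":
    # Hypothetical weekly metric, e.g. orders in thousands.
    weekly_metric = [10, 12, 11, 13, 12, 14, 13, 15]
    print(backtest(weekly_metric, moving_average, holdout=3))
```

If the model's error is not clearly lower than the naive baseline's, that is a signal the prediction is not yet adding value that finance or marketing can rely on.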

Phase 3: Customization

In this phase, data scientists have more freedom to create customized experiences and models for users, stakeholders, and other teams. It is important to ensure that the data science team demonstrates progress and impact from their data-driven experiments.

Phase 4: Automation

The last stage of data science team maturity is automation. Automating data science workflows can dramatically reduce the time it takes to produce output. Insights are delivered to the right people at the right time, without tedious manual tasks. While automation can significantly reduce the work data scientists need to do, no amount of automation can replace an entire data science workflow.

Step 3: Identify Potential Pitfalls and Remedies

When hiring a team of data scientists, it is important to anticipate some of the
possible problems that may arise and formulate a plan to remedy them.

Potential Pitfalls:

  1. Data Fatigue – Data scientists often encounter data fatigue, which is common when working with only one data set. The opposite, having too much data, is also a problem.
  2. Low Contribution to Revenue – Data scientists help out various teams, such as product, marketing, or sales, which are considered the true drivers of revenue. The closer a team is to creating revenue, the higher its chance of succeeding inside the company.
  3. Failure to Tie Data Science to Business – The biggest obstacle data science teams face is demonstrating business impact. Data science teams find it difficult to quantify the business value of their models, and there is often a disconnect between the data science and business teams.

Remedies:

  • Focus on Impact – Data scientists can undoubtedly drive and impact business decisions for many teams. Focus on data and insights that are non-obvious and directly contribute to revenue. Projects should have clear goals with business impact, and team members should collaborate closely.
  • Build Credibility – Start with analytics, which may not be hard from a data science perspective but provides useful insights for various teams and executives.
  • Align Career Growth – Plan a career path for your data scientists that aligns with the growth of the company. Increasing resources and leadership responsibility can be tied to meaningful results from a data science
    team.

For more, see: The Five Dysfunctions of a Data Science Team

Step 4: Develop Organizational Strategy

There are different ways of structuring a data science team within an organization. One size does not fit all. Let’s dive deeper into how different types of organizations can best use a data science team.

Type 1: Large Organization, Centralized Data Science Team

A centralized data science team in a large organization works across the entire company as various needs and projects arise. The biggest challenge with this model is a potential priority mismatch. Since many different business or executive teams are requesting help, the centralized team can become overwhelmed and unable to meet every team’s expectations at once. In this model, almost every task can become “urgent” and slow the team’s workflow. Careful project management is essential in such a team.

Type 2: Large Organization, Distributed Data Science Teams

Another approach for large organizations is several distributed teams: the data infrastructure team is centralized and the data science/analytics team is distributed. In this model, data scientists are embedded in different business units (e.g., marketing acquisition). The issue with this model, however, is that data scientists are isolated, which limits collaboration with other data scientists. A recurring theme is also the lack of visibility into the work of other data scientists, which often creates duplicate efforts.

Type 3: Large Organization, Hybrid

Large organizations, such as Google or Facebook, deploy a hybrid approach to building data science teams. In this model, data scientists circulate between a core data science/analytics team and other department teams. This lets data scientists have specific functions while still being able to collaborate with others.

Type 4: Medium/Small Organization

In smaller organizations, the data team typically falls under a C-suite executive (e.g., CFO, CTO, or CMO) with only a few people working on the team. This organizational type, when implemented with a proper mix of dataops engineers and data scientists, allows for high-impact work. Building a centralized team in early-stage companies is typically more effective because it leaves room to reallocate resources later.

Step 5: Hiring and Managing your Data Scientists

Data science is an interdisciplinary field that spans machine learning, computer science, statistics, mathematics, data visualization, communication, and deep learning. Keep in mind that it is often unrealistic to find a unicorn who will pull data, synthesize an analysis, and communicate business value. Hire data scientists who have deep expertise and knowledge in a particular field, whether it be marketing or computer science. Above all, keep high but realistic expectations when hiring candidates.

To create an effective team, it is also vital to build career paths for your data scientists. Communicate with individuals on the team to ensure that their personal goals match the organizational goals. Here are some example roadmaps for data scientists:

Levels: Analyst/Scientist > Sr. Analyst/Scientist > Manager/Lead > Director > VP > CDO

Roles: Infrastructure & Operations, Business Intelligence, Research Scientist, Developer, DBA, Measurement

Functions: Product Strategy & Operations, Finance, Marketing/Growth, HR, Supply Chain

For more information on salaries, read this Data Salary Guide.

Relevant statistics:

  • Still a large gender bias – only 23% of professionals are female
  • Job seeker’s market – 72% of professionals would leave current role for the next good opportunity
  • Highest flexibility in work – 64% have flexible work hours or can work from home
  • 2 years remains the typical length of time in a given role

Summary

These five steps are what we’ve found to be the most effective way to build a data
science team. We’ve incorporated many of the learnings from our own experiences and those of others in the industry. Hopefully this helps you as you build your own data and data science teams.

If this sounds like the sort of team you’d like to join, we are always hiring!