Guiding Principles for Data Science Project Managers

July 29, 2020

The number of companies around the globe adopting artificial intelligence (AI) grows every year. Given this trend, data scientists are in high demand, and specializations like natural language processing (NLP) and deep learning are becoming increasingly important. Despite the shortage of AI talent, companies cannot ignore soft skills such as storytelling and effective communication, which are critical to executing data science projects well. Data science project management must bridge the gap between the data science team and other business units, bringing out these essential soft skills and fostering greater collaboration enterprise-wide.

While the data science manager will play an increasingly large role in bringing data science and business units together, job postings for data science managers are currently a very small fraction of postings for data scientists. Since it is an emerging career, the minimum qualifications vary greatly – from PhD degrees in data science with 6-plus years of experience (targeting a very limited pool of applicants) to traditional managers with strong communication skills and knowledge of project management principles and concepts (which is almost equal to “no experience necessary”). Wovenware has taken the approach of training traditional project managers, scrum masters and technical leaders to lead data science projects. As part of our company strategy, we are making sure all our leads are AI-literate and have a general understanding of AI business use cases and the experimental process. This is the first step, and we are extending the program to provide upskilling opportunities in basic statistics, data analytics, and storytelling. We will continue to develop the program as a new generation of specialized and formally trained data science project managers emerges.

The problem is that the role of the data science project manager continues to change and evolve. There are no nationally accepted degrees or professional certifications (yet) to give leaders formal training and a competitive edge in the market. The following guiding principles are a good starting point for leads who are transitioning to data science management.

A New Set of Guiding Principles for Data Science Project Managers

Traditional project managers are almost hard-wired to do whatever it takes to deliver results on time and on budget. To drive innovation in an organization and lead data science teams, managers need to follow a very different set of guiding principles. Fred Brooks, author of The Mythical Man-Month, describes the difference this way: “A scientist builds in order to learn. An engineer learns in order to build.” The contrast is clear when the two management approaches are compared side by side.

Figure 1: Traditional Project Management vs AI-Driven Innovation

Building vs. learning – Projects will no longer be driven by the deterministic goal of building an automated machine; they will be driven by a broader vision of opportunistic goals, attained by acquiring unique knowledge about a business, an industry or perhaps the world. While in some cases the output of a data science project is deployed in a production application, in many cases the result is a paper or report, often accompanied by PowerPoint slides, that presents insights with business impact.

Planning vs. experimenting – There will no longer be a “project plan,” though there may be cycles that resemble agile framework sprints. Data science projects need to be managed like R&D projects, following a scientific process that may introduce some perceived chaos and a high degree of uncertainty, which is often difficult to manage. In reality, experiments are carefully designed and documented, with effort and timelines estimated, and they are executed with a high degree of discipline and commitment.

Mitigating vs. embracing risk – Instead of identifying and mitigating risk factors, data science leaders should accept and manage uncertainty. Risk needs to be quantified and embraced before tackling a problem. At the beginning of a project, data science project leads should be proactive in defining and quantifying the acceptable margin of error for a data science model to be considered successful.

Managing time and budget – Limiting time and budget may have a direct impact on insight quality. Insights can’t be scheduled on a calendar, and the innovation process is neither linear nor progressive. The first experiment may yield the most accurate model, but there is no way of knowing that without completing the rest of the experiments. However, most organizations have a limit on budget, time, or both. Time and budget are best used by setting clear milestones and designing each experimental phase with the goal of achieving them. Data scientists are motivated to run experiments and find answers as quickly as possible. After exploring the data and running a few experiments, a good data science team will have a sense of the feasibility of reaching the milestones.

Providing ongoing maintenance – To realize the full potential of artificial intelligence, data science models need to be continuously fed new data and retrained so that they get smarter and provide more accurate predictions and better insights.
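
The principles on quantifying an acceptable margin of error and on working within a time budget can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names SuccessCriterion, Experiment and run_within_budget, the "accuracy" metric, and all hours and scores are hypothetical and are not part of any Wovenware tooling.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SuccessCriterion:
    """Acceptable margin of error, agreed on before the project starts."""
    metric: str
    target: float     # score the business would like to reach
    tolerance: float  # how far below the target is still acceptable

    def is_met(self, observed: float) -> bool:
        return observed >= self.target - self.tolerance


@dataclass(frozen=True)
class Experiment:
    """One planned experiment with an effort estimate (hypothetical shape)."""
    name: str
    estimated_hours: float
    run: Callable[[], float]  # returns the observed metric value


def run_within_budget(
    experiments: List[Experiment],
    budget_hours: float,
    criterion: SuccessCriterion,
) -> Tuple[Optional[str], float, float, bool]:
    """Run planned experiments in order until the next one would exceed the budget.

    Returns (best experiment name, best score, hours spent, milestone reached).
    """
    best_name: Optional[str] = None
    best_score = float("-inf")
    spent = 0.0
    for exp in experiments:
        if spent + exp.estimated_hours > budget_hours:
            break  # the next experiment does not fit in the remaining budget
        spent += exp.estimated_hours
        score = exp.run()
        if score > best_score:
            best_name, best_score = exp.name, score
    reached = best_name is not None and criterion.is_met(best_score)
    return best_name, best_score, spent, reached


# Illustrative plan: the scores are made-up stand-ins for real model evaluations.
criterion = SuccessCriterion(metric="accuracy", target=0.90, tolerance=0.05)
plan = [
    Experiment("baseline", 8, lambda: 0.81),
    Experiment("tuned", 16, lambda: 0.88),
    Experiment("ensemble", 40, lambda: 0.91),
]
result = run_within_budget(plan, budget_hours=30, criterion=criterion)
```

In this made-up plan the third experiment does not fit in the 30-hour budget, so the team reports the best result found so far and whether it falls within the agreed margin.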

What Has Not Changed

Though the data science project management approach is quite different from that of traditional software development, there are many basic elements that have not changed:

It starts with choosing the right problem to solve – This remains the hardest barrier to overcome. Ask the business questions first, and then see whether AI provides the right solution. Organizations caught up in the hype create turmoil when they try to figure out how to inject AI into the organization before asking the right business questions.

Projects are driven by milestones – Whether through an experimental process or scrum sprints, all project tasks are structured around achieving a clear set of milestones.

Communication and executive support are critical for success – As a leader, communicating vision, expectations, learnings, barriers and opportunities will be critical to managing successful data science projects. Support from executive leadership is key to getting buy-in from the rest of the organization.

Implementing change requires managing change – A lot of people resist change, even with something as exciting and promising as AI. People may worry about losing their jobs; they may not trust insights provided by an abstract mathematical model only a handful of people understand; or they may need to be trained to use new digital products. Change management strategies must be put in place to adopt AI across business units in an organization.

Managing data science teams is an art and a science. As shared in a Harvard Business Review article, humans are at the heart of every technology project. Managing technical and analytical resources is very challenging, especially if you do not have the technical acumen to pose the right questions. Managing people, motivating your team, and communicating clearly and often are traditional skills that are not growing old.

Why the Data Science Process Is Misconstrued

Data science is driven by an experimental process, which implies that exact results cannot be guaranteed. This is often misconstrued as budget potentially being thrown down the drain if the experiments don’t go as planned. The level of uncertainty and experimentation will vary depending on the problem being solved. Object detection models and basic chatbots, where the technology is more advanced and widely used, carry far less risk than building a self-driving car, which requires much more research. When tackling problems that require a higher level of experimentation and research, the exact outcome is not guaranteed, but valuable insights are derived after each phase in the data science process. The outcome may be a 90% accurate model, or it may be insights on additional data that needs to be collected before an AI model can answer the questions being asked. If there is value in determining an organization’s capacity to extract insights and make progress toward achieving those milestones, then the investment in AI will be worth every penny.

The Data Science Process

The first widely used data science process (or back then, data mining process) was the Cross-Industry Standard Process for Data Mining (CRISP-DM), which was conceived in 1996. It includes six major phases:

  • Business Understanding
  • Data Understanding
  • Data Preparation
  • Modeling
  • Evaluation
  • Deployment

CRISP-DM evolved, and in 2010 a new data science framework called OSEMN (Obtain, Scrub, Explore, Model, iNterpret) emerged. Most data science teams implement some variation of it.

Data science managers leading innovation projects that need to be deployed in a production system should create their own flow, based on lessons from industry experts, that aligns with their processes and governance structure. One workflow worth evaluating is that of GitHub’s machine learning team, described in the O’Reilly publication Development Workflows for Data Scientists. A lot of data science projects result in presenting insights in papers or slide decks, but the most advanced teams in the industry take those insights and apply them to a machine learning model and ultimately to a live software application or business process.

Our team at Wovenware extends the OSEMN process by incorporating an AI strategy design. The high-level steps are the following:

  1. Define Problem – What problem will be addressed? What is the business impact?
  2. Define Success Metrics – How will success and failure be tracked and measured?
  3. Gather Data – Identify data sources, define inputs and collect data.
  4. Clean Up & Process Data – Perform data cleansing, scaling, normalizing.
  5. Explore Data – Analyze the data and identify subgroups, outliers, tendencies.
  6. Identify Features – Form a hypothesis and identify features to be used.
  7. Prototype – Build exploratory models and revise the problem and features, iterating as needed.
  8. Build Infrastructure – Build and test the infrastructure for the model.
  9. Operationalize Model – Gather new data, develop integration pipelines, retrain and optimize models, iterating as needed.
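
For readers new to steps 4 and 5, the sketch below shows what scaling, normalizing and a simple outlier check can look like in Python, using only the standard library. It is a minimal illustration, not our production tooling: the function names, the sample values and the z-score threshold of 2.0 are all assumptions chosen for the example.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def min_max_scale(values: Sequence[float]) -> List[float]:
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values identical: nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values: Sequence[float]) -> List[float]:
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(values: Sequence[float], threshold: float = 2.0) -> List[float]:
    """Return the values whose absolute z-score exceeds the threshold.

    The default threshold is an illustrative assumption; the right cutoff
    depends on the data and on the business question.
    """
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Made-up sample: nine typical readings and one suspicious one.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]
scaled = min_max_scale(readings)
suspicious = flag_outliers(readings)
```

A check like this only surfaces candidates. Whether a flagged value is an error to clean up or an insight to explore is a question for the team and the business.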

Operationalizing Data Science

The most challenging and exciting part of data science project management is taking an abstract mathematical model and integrating it into an existing software product for the world to use. Creating this link, not only between the technologies but also between the teams and business units involved, is a journey and a process. As the innovation process matures, productionizing a model will require a more traditional, operational process that includes a timeline, budget, and project schedule. Employing traditional project management skills will be critical.

The AI revolution is exciting. Executing data science projects with the right balance of innovation, experimentation, planning, research and discipline is key for managers shifting from traditional software development to data science project management.
