Data scientists are often tasked with uncovering the unknown, which makes it nearly impossible to create a planned schedule with hard deadlines around specific milestones. Data science projects are packed with uncertainty and almost always involve failed experiments. Business leaders grapple with this reality because they are used to making budget decisions based on a return on investment (ROI) analysis. To get the best results out of AI projects, teams should follow an agile process that blends techniques from the scientific method, design thinking and agile software development. In practice, data science sprints may feel chaotic and unpredictable.
But in reality, employing specific habits, disciplines and techniques makes the exploratory and creative process very efficient. In this blog we explain how each of these methodologies is integrated into an agile data science process.
The Scientific Method
You cannot remove the science from data science. Like research projects in the scientific community, a data science project should have a clearly stated question or problem, a hypothesis, experiments, and observations.
- Defining the problem – A clearly defined problem and success metrics are critical to an AI project. Without direction, budgets may be consumed while searching for answers to the wrong problems.
- Researching the data – Exploratory data analysis is a foundational step in building machine learning models. Data engineers will answer the following questions: Does the data have the right level of quality? Is there enough data to address the problem? Are there any biases? Are there correlations between features and the specific outcome being evaluated? After researching and understanding the dataset, the team will be ready to choose the machine learning algorithms best suited to solve the problem.
- Stating a hypothesis – A data science team will state a hypothesis on how to structure a machine learning (ML) solution around the data provided. Documenting the hypothesis and obtaining feedback from business experts reinforces the team’s focus and direction when designing experiments.
- Conducting experiments and recording observations – Data scientists are expected to conduct multiple experiments throughout a sprint and analyze their results. Some experiments may be successful, and others may fail. Results will not necessarily improve incrementally with every new experiment. The first experiment may yield the best results and the last, the worst.
- Extracting insights – Acquiring knowledge and insights in every sprint is probably the most important step in an experimental process. If an experiment fails, what can we learn from it? Is more data needed? Different data? Better quality data? Are there biases in the dataset?
Identifying actionable insights in each experiment will drive better results in data science projects. An agile data science sprint will include all of the same key phases performed by scientists in experimental research and development projects.
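The data research questions above (quality, bias, correlation with the outcome) can be sketched in a few lines of code. The example below is only an illustration using the Python standard library: the toy dataset, the column names (`tenure_months`, `monthly_spend`, `churned`) and the helper functions are all hypothetical, and a real project would typically rely on a dedicated data analysis library and far more data.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy dataset: each row is one customer; None marks a missing value.
rows = [
    {"tenure_months": 2, "monthly_spend": 80.0, "churned": 1},
    {"tenure_months": 30, "monthly_spend": 45.0, "churned": 0},
    {"tenure_months": 5, "monthly_spend": None, "churned": 1},
    {"tenure_months": 48, "monthly_spend": 40.0, "churned": 0},
    {"tenure_months": 12, "monthly_spend": 60.0, "churned": 0},
    {"tenure_months": 1, "monthly_spend": 95.0, "churned": 1},
]


def missing_rate(rows, column):
    """Fraction of rows where the column is missing (a basic quality check)."""
    return sum(1 for r in rows if r[column] is None) / len(rows)


def class_balance(rows, target):
    """Share of each outcome value (a first look at possible bias)."""
    counts = Counter(r[target] for r in rows)
    return {label: n / len(rows) for label, n in counts.items()}


def pearson(rows, feature, target):
    """Pearson correlation between a feature and the outcome, skipping missing values."""
    pairs = [(r[feature], r[target]) for r in rows
             if r[feature] is not None and r[target] is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant column; report 0
    return cov / sqrt(var_x * var_y)


print("rows available:", len(rows))
print("missing monthly_spend:", missing_rate(rows, "monthly_spend"))
print("outcome balance:", class_balance(rows, "churned"))
print("tenure vs. churn correlation:", pearson(rows, "tenure_months", "churned"))
```

Checks like these do not replace a conversation with domain experts, but they give the team concrete numbers to bring to that conversation before choosing an algorithm.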
Design Thinking
The design lifecycle is very similar to the data science lifecycle in that it involves unstructured discovery and experimentation that improve through non-linear iterations. However, the design discovery process is driven by multidisciplinary teams with diverse skills and perspectives and is focused on the humans it is designing for. At Wovenware, we focus on augmenting human capabilities with transformational AI innovations. The only way we can achieve this is by deeply understanding and empathizing with the people we are serving and thinking through the lens of customer experience. The design-thinking approach provides a framework to understand if the solutions we are looking to build are desirable, technically feasible and economically viable.
- Empathize – Before diving deep into exploratory data analysis, our data science projects begin with a series of workshops and interviews to understand the business objectives, as well as the experiences and pain points of everyone who interacts with the organization. We talk to real people who may be impacted by the solution and place them front and center in every stage of the data science process.
- Define – Defining user personas, challenges, and pain points helps set the stage, so that we can define a very specific problem that AI can solve and how we can measure success. The user personas will become the central focus of every iteration of the project. In AI projects we take this a bit further and define “bot personas.” If an AI product needs to have human-like qualities to interact with real people, what should they be? For example, if it has natural language capabilities, what tone and language should it use?
- Ideate – To spark innovation, people with diverse skills, backgrounds and roles participate in ideation sessions. An open and creative environment generates new ideas that challenge assumptions and what people envisioned for the project. Teams often emerge from ideation workshops with ideas they never imagined possible.
- Prototype – Building quick prototypes before spending large sums of money is a pivotal step for designers. In data science projects, we build prototypes and proofs of concept (POCs) to validate that an AI solution is feasible and has practical potential.
- Test – In the validation stage, the design team will have real people interact with the prototype and will record their behavior and reactions to analyze what works and what needs refining in the next design iteration.
Integrating design thinking principles and diverse perspectives in an agile data science process will help teams create a desirable, feasible and viable design where humans and machines work seamlessly together to solve real-life problems.
Agile Software Development
The Scrum framework was designed to help solve complex problems, and data science almost always addresses complex problems. It helps teams learn and adapt in short iterative cycles. Agile data science sprints resemble Scrum because they incorporate a lot of its artifacts and events.
- Sprint Length – Sprints are generally 2-4 weeks. Short experiments help avoid going down rabbit holes and keep the team focused on objectives. Short sprints also set the stage for frequent discussions of assumptions, results and future experiments with domain experts.
- Team Roles – As in Scrum, the data science team will receive priorities and direction from a business owner but will otherwise self-organize and be responsible for defining, planning, and executing the experiments, tasks and activities required to achieve the desired results.
- Effort Estimation – Estimating the time it will take to complete an AI project is daunting if not impossible for a data science team. There is just too much uncertainty surrounding experimentation and research. Using story points and relative order of magnitude helps establish a baseline for planning and forecasting without stressing the team with hard deadlines.
- Daily Scrums – Data science requires thoughtful collaboration between team members, and Daily Scrums are a great way to get discussions going, not just within the data science team but also with business experts, designers, and other members of the team. They are especially useful when teams are working remotely and have limited or no physical contact.
- Retrospectives – Inspection and self-reflection are important habits and disciplines to promote in data science teams. Inspection goes beyond the results of the sprint and experiment and focuses on the agile data science process itself. What is working for the team? What is not? What can we do better?
Innovation in data science does not come with a step-by-step handbook. The iterative and retrospective process should focus not only on the solution and the end-user but on the project team and how it can improve the ways to work together.
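To make the effort estimation idea more concrete, here is a minimal sketch of how past story point velocity could turn into a forecast range rather than a hard deadline. It is an illustration under simple assumptions, not a prescribed method: the function name `forecast_sprints`, the sprint history and the backlog size are all hypothetical.

```python
from math import ceil


def forecast_sprints(completed_points, remaining_points):
    """Return (best_case, expected, worst_case) sprints left, based on past velocity."""
    if not completed_points or min(completed_points) <= 0:
        raise ValueError("need at least one past sprint with positive velocity")
    fastest = max(completed_points)
    slowest = min(completed_points)
    average = sum(completed_points) / len(completed_points)
    return (
        ceil(remaining_points / fastest),   # optimistic: every sprint is the best one
        ceil(remaining_points / average),   # expected: sprints match the average
        ceil(remaining_points / slowest),   # pessimistic: every sprint is the worst one
    )


# Hypothetical history: story points completed in the last four sprints.
history = [13, 8, 21, 10]
backlog = 60  # hypothetical story points still to be done

best, expected, worst = forecast_sprints(history, backlog)
print(f"{best} to {worst} sprints, most likely about {expected}")
```

Reporting a range like this keeps the uncertainty visible to business owners while still giving them a baseline for planning.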
The Agile Data Science Manifesto
The following principles, inspired by the original Agile Manifesto, will help data science teams uncover better ways of working together.
- Outcomes vs. Metrics – Building AI models often turns into a purely technical challenge of obtaining the best accuracy. Data science teams should focus on business outcomes and use model performance metrics as supporting evidence of the potential impact of the solution.
- Multi-disciplinary Teams vs. Technical Teams – While data scientists can wrangle data and create sophisticated algorithms and models, few have the business expertise to have a 360-degree understanding of the data and the problem they are solving. A holistic data science project team should include domain experts, service designers and business analysts who can engage in interdisciplinary thinking to validate assumptions on data, possible biases and interpretation of results.
- Simplicity vs. Sophistication – A lot of new and advanced research in artificial intelligence is being released every day, continuously increasing the potential and sophistication of AI models. To keep up with the pace of research, data scientists must learn and implement new tools, techniques, and algorithms. However, when creating industry solutions, the team should aim for simplicity and not sophistication. Do not use a neural network if a simpler linear regression solves the problem, and use the minimum amount of data needed to achieve business outcomes.
- Knowledge vs. Product Features – The greatest value in building artificial intelligence solutions is uncovering knowledge about individuals, organizations or the world. While software solutions focus on feature development, data science solutions focus on creating insights.
- Balancing Uncertainty vs. Avoiding Risk – The process of research and experimentation is filled with uncertainty. Innovation requires creativity and boldness and should not be thwarted by risk aversion. Uncertainty should be embraced as part of the process yet contained within carefully planned sprints and introspective sessions.
Managing AI projects can be complicated, but the principles in this manifesto will help teams work effectively and obtain excellent results.
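The simplicity principle can be expressed as a small selection rule: among the candidate models that meet the business target, pick the least complex one. The sketch below is purely illustrative. The `Candidate` class, the complexity ranks, the scores and the target are made-up values, and in practice the outcome score would come from the team's own evaluation against the agreed business metric.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    complexity: int       # hypothetical rank: 1 = simplest
    outcome_score: float  # hypothetical business metric, higher is better


def pick_simplest(candidates, target):
    """Return the least complex candidate whose outcome meets the target, or None."""
    good_enough = [c for c in candidates if c.outcome_score >= target]
    if not good_enough:
        return None
    return min(good_enough, key=lambda c: c.complexity)


# Made-up experiment results for illustration only.
candidates = [
    Candidate("linear regression", 1, 0.82),
    Candidate("gradient boosting", 2, 0.85),
    Candidate("neural network", 3, 0.86),
]

choice = pick_simplest(candidates, target=0.80)
print(choice.name if choice else "no candidate meets the target yet")
```

With these made-up numbers the neural network scores slightly higher, but the linear regression already meets the target, so the rule selects it.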
Agile Data Science at Wovenware
As a design-driven AI and software development consultancy, Wovenware has created a proprietary agile data science process that we call the “Innovation Sprint.”
By combining the best practices, tools and techniques from design thinking, agile software development and data science project management, we drive positive business outcomes and better customer experiences.