Most software development projects go through the same four phases: discovery, research, prototype, and production. Usually, the research and prototype stages are fairly light because experienced engineers can design a solution and when necessary, test their ideas with a quick proof-of-concept (PoC). AI systems development cycle, on the other hand, depends heavily on research to find whether we can actually build a machine learning model that performs well. In addition, putting an AI system in production operationally involves much more than building the models. Therefore, a working prototype is typically required for AI systems in order to have the confidence that the system will work end-to-end.
Let’s look at each of the four stages of AI systems development cycle in more detail.
Discovery
The discovery phase is responsible for defining the project: what are its goals, what is the business problem it is solving, why solving it is important, what is the value of solving it, what are the constraints, and how will we know that we’ve succeeded. Frequently, such information is captured in a Product Requirements Document (PRD) or a similar document, defining the “what” of the project. Some aspects of discovery are described in another article on reducing the risk of machine learning projects.
For AI systems, feasibility and quality of a solution to the problem at hand are usually not obvious from the start. Carefully defining the constraints can dramatically narrow down the choice of technology and algorithms. However, creating new machine learning models still largely remains to be a task for an expert. As a result, a research stage is needed in order to find whether or not an AI solution is possible, as well as to estimate its value and cost.
Research
The research phase answers in detail how we are going to solve the business problem. Relevant documentation of a typical software project may include a system design, various options considered during design and their trade-offs, specifications, etc., with enough information for an engineering team to build the software.
The research phase of AI systems development cycle is highly iterative, often manual, and heavy on visualizations and analytics. First, we need to check whether we can solve the problem with machine learning given the available data and constraints established in the discovery phase. We collect the data, extract it and transform it into inputs to a machine learning algorithm. We usually build many variants of a model, experiment with input data and algorithms, test and evaluate the models. Then we frequently go back to collecting and transforming the data. This cycle stops when, after a few (and sometimes many) iterations at every step, there’s a model that makes predictions with an acceptable accuracy. Information gathered during this process is passed back into the discovery phase.
Prototype
A prototype for an AI system is proof that a system reflecting the production design, without all the bells and whistles, can run end-to-end as code and produce predictions within the predefined constraints. Sometimes, the output of the research phase is close to a prototype, after a little clean-up and converting some manual steps into scripts. As we are getting closer to production, it is better to keep the prototype code at production quality and involve engineers who will be working on the production AI system.
Note that the goal of the prototype stage is not for a data scientist to create something that will then be rewritten by an engineer in a different language. Often referred to as “over the wall” development, such a pattern is extremely inefficient and should be avoided.
Production
The production stage of the AI systems development cycle is responsible for the final system that is able to reliably build, deploy and operate machine learning models. The reliability requirements lead to a plethora of components that can easily take much of the time and effort of the whole project. Such components include testing, validation, model tracking and versioning, deployment, automation, logging, monitoring, alerting, and error handling, to name a few.
Summary
The AI systems development cycle has the same stages as most other software. It is different in the much higher proportion of the effort allocated to the research and prototype stages. The operational components of AI systems at the production stage may also require much effort, especially in the first iteration of the whole cycle. Once the first AI system is in production, the frameworks used for operationalizing machine learning can be reused and improved on in the subsequent cycles.