Life Cycle of an AI or Machine Learning Project
If you are a beginner in the field of AI, a young intern, or someone who was recently hired to carry out Data Science or Machine Learning tasks, you are just a small part of a big puzzle. Right now you might be doing what you are told without having the full picture in mind. Your responsibilities may include Data Acquisition, Data Preparation, or working on models, all of which are amazing tasks to work on.
When we know the full life cycle of the project we work on, it helps us understand how the effort we put in connects with the effort of our co-workers. Knowing this connection makes working as a team more fun. It helps you see how the small efforts you put in add up with those of your co-workers and turn into a success: something that serves as a solution to a real-life problem.
I have divided the whole project life cycle of an AI/ML project into nine steps:
1. Understanding Business Requirements
Enterprises and businesses normally look for data science solutions to help their organization grow. Depending on the scale of the enterprise, the amount of data available to them can differ a lot. They want to use Artificial Intelligence and Data Science to find solutions and growth opportunities in the huge amount of data their firm holds.
So the first step is to understand the business requirements of the project. What sort of solution is the client looking for? Do they want to increase sales, or do they want better decisions to be made inside and outside the organization based on insights from the data?
Fully defining the scope of the project and understanding the requirements can take anywhere from days to weeks. It involves a lot of back-and-forth meetings between the client and the development team.
2. Data Collection
The next step is to extract the data from the client. Data is the most important part of the whole project. It can be stored in various forms that vary from enterprise to enterprise: SQL databases, log files, Spark or Hadoop systems, etc. An expert in data mining, such as a Data Engineer, deals with the extraction of the data. Data Engineers, via the Project Manager and Business Analysts, reach out to the client to discuss the data extraction.
The size of this data can vary from business to business. It can range from hundreds of gigabytes to hundreds of terabytes.
3. Data Preparation
This step deals with taking care of the data that has been collected. The data we receive can be unordered and unorganized. Some values may be missing, and some of the data may be unnecessary or unimportant. To fix these issues, a Data Scientist cleans and pre-processes the data.
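The cleaning described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the column names (customer_id, age, monthly_spend, notes), the sample rows, and the helper clean_rows are all hypothetical, and filling gaps with the median is just one common choice among several.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"customer_id": 1, "age": 34, "monthly_spend": 120.0, "notes": "vip"},
    {"customer_id": 2, "age": None, "monthly_spend": 80.0, "notes": ""},
    {"customer_id": 3, "age": 45, "monthly_spend": None, "notes": "n/a"},
    {"customer_id": 3, "age": 45, "monthly_spend": None, "notes": "n/a"},  # duplicate row
    {"customer_id": 4, "age": 29, "monthly_spend": 60.0, "notes": ""},
]


def clean_rows(rows, drop_columns, numeric_columns):
    """Drop unneeded columns, remove duplicate rows, fill missing numbers with the median."""
    # 1. Keep only the columns we need (builds new dicts, so the input is not modified).
    trimmed = [{k: v for k, v in row.items() if k not in drop_columns} for row in rows]

    # 2. Remove exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for row in trimmed:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 3. Fill missing numeric values with the median of the observed values.
    for column in numeric_columns:
        observed = [row[column] for row in unique if row[column] is not None]
        fill_value = median(observed)
        for row in unique:
            if row[column] is None:
                row[column] = fill_value
    return unique


cleaned = clean_rows(raw_rows, drop_columns={"notes"}, numeric_columns=["age", "monthly_spend"])
for row in cleaned:
    print(row)
```

On real projects this work is usually done with a data-frame library rather than by hand, but the steps are the same: drop what is not needed, remove duplicates, and decide how to handle missing values.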
4. Exploratory Data Analysis
The cleaned and pre-processed data is now used to extract insights. This analysis includes visualizing the data by plotting graphs, which lets the Data Scientists learn about the patterns present in the data and find correlations that exist between different variables.
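As a small illustration of looking for a correlation, the sketch below computes the Pearson correlation coefficient between two columns by hand. The series ad_spend, sales and discount are made-up example data, and pearson_correlation is a hypothetical helper, not part of any particular library.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / (spread_x * spread_y)


# Hypothetical columns from a cleaned data set.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]   # rises exactly in step with ad_spend
discount = [5, 4, 3, 2, 1]      # falls exactly as ad_spend rises

print(pearson_correlation(ad_spend, sales))     # close to +1: strong positive relationship
print(pearson_correlation(ad_spend, discount))  # close to -1: strong negative relationship
```

A value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Correlation alone does not show that one variable causes the other.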
5. Modeling & Evaluation
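After getting the details from the Data Scientist, a Machine Learning Engineer uses this data and applies algorithms to it to train a model that helps achieve the business requirements of the project. The trained model is then evaluated on data it did not see during training to check how well it meets those requirements.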
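To make the train-then-evaluate idea concrete, here is a minimal sketch that fits a straight line with ordinary least squares and scores it on held-out data using mean absolute error. The data set, the 75/25 split, and the helpers fit_line and mean_absolute_error are assumptions chosen for illustration. A real project would pick the algorithm and metric to match the business requirement.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical (feature, target) pairs, e.g. ad spend vs. sales.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 9.0), (5, 10.8), (6, 13.1), (7, 15.0), (8, 16.9)]

# Deterministic split: first 75% for training, the rest held out for evaluation.
split = int(len(data) * 0.75)
train, test = data[:split], data[split:]

slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Evaluate only on rows the model did not see during training.
predictions = [slope * x + intercept for x, _ in test]
mae = mean_absolute_error([y for _, y in test], predictions)

print(f"slope={slope:.3f} intercept={intercept:.3f} test MAE={mae:.3f}")
```

The split here is fixed so the example is reproducible. In practice the rows are often shuffled before splitting, unless the data is ordered in time.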
6. Communicate Result
The results so far now have to be conveyed back to the client, showcasing all the analysis and results the team has produced. The developers should make sure that the full report is no longer than 6 pages.
7. Deployment
Once all the above steps are done and the client is satisfied with the results presented in step 6, the whole model is deployed on a server, which can be hosted on AWS, Azure, etc. This task is done by the software developers, as they have more expertise in deployment than anyone else on the team.
8. Real-World Testing
After deployment, data that the model has never seen is brought in and the model is put to work in the main business. The impact of the model on the business is then measured. For example: how much has it helped increase sales? How much has it helped the enterprise generate more revenue?
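One simple way to put a number on that impact is to compare a business metric before and after the model went live. The sketch below does this for weekly sales. The figures are invented and percent_change is a hypothetical helper.

```python
def percent_change(before, after):
    """Percentage change of the average of `after` relative to the average of `before`."""
    baseline = sum(before) / len(before)
    if baseline == 0:
        raise ValueError("baseline average is zero, percentage change is undefined")
    current = sum(after) / len(after)
    return (current - baseline) * 100 / baseline


# Hypothetical weekly sales figures (in thousands) around the deployment date.
weekly_sales_before = [100, 110, 90, 100]
weekly_sales_after = [115, 120, 110, 115]

uplift = percent_change(weekly_sales_before, weekly_sales_after)
print(f"Average weekly sales changed by {uplift:.1f}%")
```

A plain before/after comparison cannot separate the effect of the model from seasonality or other changes in the business. A controlled comparison, such as an A/B test, gives stronger evidence.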
9. Optimization
The final step deals with making the model perform better by making tweaks to the code and comparing the results with the previous performance. It also takes care of any small room for improvement that was left in the previous steps and only became evident during real-world testing.
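Comparing a tweaked model with the previous one can be as simple as scoring both on the same data and keeping the new version only if it is actually better. The sketch below assumes mean absolute error as the metric. The prediction lists, the min_improvement threshold, and the helper pick_better_model are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_better_model(actual, baseline_predictions, candidate_predictions, min_improvement=0.0):
    """Return ("candidate", error) only if it beats the baseline by more than min_improvement."""
    baseline_error = mean_absolute_error(actual, baseline_predictions)
    candidate_error = mean_absolute_error(actual, candidate_predictions)
    if baseline_error - candidate_error > min_improvement:
        return "candidate", candidate_error
    return "baseline", baseline_error


# Hypothetical real-world outcomes and the predictions of both model versions.
actual = [10, 12, 14, 16]
baseline_predictions = [9, 13, 12, 18]    # previous model
candidate_predictions = [10, 12, 13, 17]  # model after the tweak

winner, error = pick_better_model(actual, baseline_predictions, candidate_predictions)
print(winner, error)
```

Requiring a minimum improvement before switching helps avoid replacing a working model because of a difference that is only noise.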
Understanding the whole life cycle of a project helps you connect all the dots. It allows you to appreciate the efforts put in by you and your co-workers, and to see how each individual contribution adds up to a complete success. This article should also help those who have heard about data science job profiles such as Data Engineer, Data Scientist and Machine Learning Engineer but never actually understood how they are all connected.