In a previous post, we discussed what data science is all about. In doing so, we focused mainly on the CRISP-DM model and the steps of a typical data science process within a project. These steps usually start with understanding the business and the data, followed by data processing, preparation, and modeling, before one evaluates the models and finally deploys them. Traditionally, data science often focuses on the data processing and modeling parts, which is also evident when looking at the curricula of popular online courses and university classes. Nonetheless, when conducting projects in companies or as a freelancer, it is often just as important to first collaborate with business departments and stakeholders in order to get the project on track, and at the end to deploy the models and put them into production. Sometimes the business understanding part is the responsibility of a separate business analyst, while deployment and production are the task of a data engineer or someone responsible for DevOps. However, in many companies no dedicated employees exist to support data scientists with these tasks. Traditional business analysts often have no expertise in data science, machine learning, or artificial intelligence. Similarly, existing production systems are not set up for, and IT employees are not trained in, putting machine learning models into production. At the same time, the work of data scientists is often tightly coupled with business requirements and the understanding that follows from them, as well as with final production and deployment.
This is why I see the need for data scientists to have knowledge and skills across the wide range of processes in a typical data science project. Consequently, I define a full stack data scientist as a person who can support and execute a data science project from start to finish, following all of these steps and processes. In a nutshell, one could say that a full stack data scientist is a combination of a business analyst, a modern data analyst, and a data engineer. This blog is dedicated to helping interested people widen their data science skills and become great full stack data scientists.
Explanation of a full stack data scientist based on an example
To better envision a full stack data scientist, let us go through a simple example. Let’s say you start working at a medium-sized web hosting company. The company is just getting started with data science and has no fully developed processes for data projects yet. Consequently, either there are no dedicated business analysts, data engineers, or similar employees who can help you with the various tasks necessary for executing data science projects, or they need constant support in order to strengthen their skills in their respective areas. As your first task, you are asked to develop an artificial intelligence system that automatically routes incoming support request emails to the appropriate service employee, for example by splitting them between domain support and hosting support.
To get started, you need to better understand the business around this task. How does the process of handling support requests currently work? Is someone manually assigning support emails to the responsible experts, or are there other ways of enforcing this categorization? Among many other questions, you should also figure out at the start of the project how the finished system will be incorporated into existing IT systems and workflows. Similarly, you need to understand the underlying data and get a feeling for the historical data available for training and model building. Once you have your hands on some data, you proceed with data preparation, pre-processing, modeling, and evaluation. Most likely, you will encounter irregularities and further questions that you need to discuss with the responsible business departments and stakeholders. After you have successfully modeled the data and are ready to put your model into production, you run into the issue that the company has no experience with deploying machine learning models. As a result, you develop a simple API around your model that other IT systems can connect to. You should also think about continuous re-training, re-deployment, and other DevOps topics.
This example is only meant to illustrate the long process of going from project demand to a final system in production. A real project involves many more sub-tasks, questions, and steps. Sometimes you might not be responsible for all of the steps and will instead collaborate with other employees who are more specialized in the respective areas. Nonetheless, a data science project should be well integrated with both the business and the IT systems involved. Consequently, having knowledge and skills in the areas described above, and in others, is highly beneficial for successfully executing such a data science project, even if you might not need to do everything yourself in the end.
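To make the modeling and API steps a bit more tangible, here is a minimal sketch in Python. It is only an illustration under several assumptions of mine: the training emails are invented toy data rather than real support tickets, the names `NaiveBayesRouter` and `handle_request` are hypothetical, and a small hand-written naive Bayes classifier stands in for whatever model you would actually choose. The request handler is a plain function that takes and returns JSON; in a real project you would wrap it in a proper web framework.

```python
import json
import math
import re
from collections import Counter, defaultdict

# Invented toy data; a real project would use historical support emails.
TRAINING_EMAILS = [
    ("My domain transfer is stuck and the DNS records are missing", "domain"),
    ("How do I renew my domain name before it expires", "domain"),
    ("I want to register a new domain and point the nameserver", "domain"),
    ("My website is down and the server returns an error", "hosting"),
    ("I need more disk space and a database on my hosting plan", "hosting"),
    ("The server is slow and my website keeps timing out", "hosting"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRouter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of emails
        self.vocabulary = set()

    def fit(self, examples):
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus smoothed log likelihood of every token.
            score = math.log(doc_count / total_docs)
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


def handle_request(router, body):
    """Hypothetical API handler: JSON request in, JSON response out."""
    payload = json.loads(body)
    department = router.predict(payload["text"])
    return json.dumps({"department": department})


if __name__ == "__main__":
    router = NaiveBayesRouter().fit(TRAINING_EMAILS)
    request = json.dumps({"text": "My domain DNS records are wrong"})
    print(handle_request(router, request))
```

Keeping the handler separate from the model means the model can be re-trained and swapped out without changing the interface that other IT systems connect to, which is one small step towards the re-training and re-deployment topics mentioned above.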
The term “full stack data scientist”
One example of previous content that has used the term is a workshop by Anand Chitipothu, who describes a full stack data scientist as being responsible for coding, testing, shipping, and maintaining. The workshop covers several important aspects and facets of the data science process outlined on this website. The question of what a full stack data scientist is also came up on Quora, where the top-rated response postulates that a full stack data scientist is one who is responsible for a diverse set of tasks across the whole data science process. As a final example, I want to mention an excellent blog post that turns the description around and elaborates on how companies can become full stack data science companies.
By and large, we have seen that while the term has been used in the data science community in the past, it is by far not as popular as in other fields such as web development. However, I strongly believe that a modern data scientist needs to further strengthen his or her skills in areas beyond traditional statistical modeling and machine learning in order to be fully effective in companies and other settings. I hope that with this website, I can help aspiring data scientists widen their skills and build a community where full stack data scientists can connect.
Also published on Medium.