The Flavors of Data Science and Engineering

Data Science means something different to everyone and is more of a marketing term than a job description nowadays. That said, from what I've seen, certain definitions for it and the related disciplines are starting to emerge, so I wanted to write down my perceptions.

  • Data Analytics: The practice of deriving insights, trends and patterns from data without advanced machine learning or statistical techniques. You need good domain expertise for the business in question and the ability to present derived insights in a way that non-technical consumers can understand. You may apply some pre-canned machine learning and statistical tools, but you are not expected to understand their technical aspects. Technical skills include SQL, Excel and BI tools, although knowing a scripting language (Python, R, etc.) helps.
  • Data Science: The practice of deriving insights, trends and patterns from data with advanced machine learning or statistical techniques. You still need good domain expertise and the ability to present insights in a consumable way. In some cases you’ll be building the non-production machine learning models that will eventually get deployed by someone else. In others you will just be building models to analyze the past, with no goal of them ever becoming production components. Mathematical knowledge of the models in question may or may not be required depending on the exact role and company. As a rule of thumb, the bigger a company's Data Science function, the more mathematics matters, as the low-hanging fruit that doesn't require it has already been picked. Technical skills include SQL, Python/R, and various ML libraries.
  • Machine Learning Engineering: These are the people building production machine learning systems that users interact with and that improve over time. Intimate knowledge of the mathematics behind the models is not generally required; however, at larger scales a solid understanding of numerical optimization and numerical computation issues is a must. Machine Learning Engineers may be doing the end-to-end work of model building or simply implementing a model designed by a Data Scientist. Technical skills include system and software engineering with a focus on data access and, depending on the company, numerical computation optimization. That means knowledge of SQL, NoSQL, various ML libraries, and general-purpose programming languages (Python, Scala, Java, etc.).
  • Data Engineering: The data must flow somehow, and Data Engineers are the ones managing the process and building the infrastructure that everyone else uses. They don’t need to know machine learning beyond the requirements it puts on the data infrastructure, but they do need to understand the intricacies of data flows. In some cases they may work almost strictly within SQL, writing batch ETL logic, while in other cases they may be writing complex streaming logic in a language like Scala. As a result, technical skills can range across SQL, NoSQL, Big Data technologies and ETL knowledge, along with software engineering. On the Big Data side, tools could include Hadoop, Spark, Kafka, RabbitMQ and their proprietary brethren.
  • Backend Engineering: Backend Engineers build the systems that respond to user requests, including the API layer, authentication, micro-services, and so on. In a certain sense they too manage the flow of data, especially in a micro-service architecture, so the line between them and Data Engineers can be blurry at times. Machine Learning Engineers delve into this area in the sense that they need to expose the models they deploy to other services, but they are not masters of it. Technical skills include SQL, NoSQL, system engineering and software engineering.
  • DevOps: In charge of the base infrastructure such as servers, databases, load balancing, and CI/CD. In some companies these roles are more engineering-focused, while in others they are more sysadmin-focused. If more engineering-focused, they may be called Site Reliability Engineers instead and would spend more of their time implementing software solutions than on operational tasks. In smaller companies this role may fall to the rest of the engineering team. Technical skills include system engineering plus scripting and infrastructure-as-code tools (Python, Bash, Terraform, etc.).

Let us know what you think of these descriptions.

And if you liked this post, be sure to follow us, reach out on Twitter, or comment.