So you want to be a data scientist. Now what?

Jul 19, 2023

In 2012, Harvard Business Review published an article with the title, “Data Scientist: The Sexiest Job of the 21st Century,” which brought a surge of excitement to the field.[1] More recently, with the advent of ChatGPT, generative artificial intelligence (GenAI) has become all the rage. For students interested in tech, the data science field can be particularly attractive, especially for those who enjoy using a blend of technical and soft skills.

At the same time, data science is a nascent industry compared to its older cousin, computer science. College degree programs in data science vary widely in the skills they teach. Similarly, there’s a lot of variability within the industry itself, and different companies often have different expectations for the same title. It’s no surprise that early-career professionals in data science are confused about what skills they need to break into or grow within the industry. It is equally confusing to figure out how to apply those skills in the real world when interviewing or working.

In this article, I discuss:

How to reflect on your motivations, strengths, and interests when thinking about data roles in a rapidly changing industry
My take on the 8 key skills to learn for data roles
Suggested next steps on how to grow your skills and apply them to data roles

REFLECT - How to think about the data science field, now and in the future

Over the course of my career in data science, I have experienced a marked shift in the types of projects, expectations, and skills required of an average data scientist. It’s no surprise that it’s hard to keep up with the latest technique or method in the industry while meeting business demands and project goals.

I don’t expect this to change anytime soon. In fact, the evolution of the field allows data scientists to take on different challenges and grow. When architecting your career in the field, it’s important to consider what is happening now versus what is on the horizon.

Ask yourself these questions and reflect. What skills do you need to learn to have a solid foundation? How excited are you about the skills required for the particular position you’re interviewing for? What skills are durable to industry changes, e.g., the AI revolution or the continued progression of augmented and virtual reality? What are your unique strengths you can lean into?

LEARN - What skills to learn to break into data science

When I entered the field, I was fixated on securing a job at a globally recognized company with a high-performance culture. Looking back, I should have approached things differently. Rather than obsessing about entering the field with a brand-name company on my resume or acquiring a fancy-sounding title, I should have focused my job search on learning the fundamentals and honing the key skills critical in the industry. From my trials, I’ve identified 8 key skills to explore when thinking about data jobs. Each of these topics is learnable with practice. Importantly, a theoretical understanding of each topic is not enough; experiential learning is key.

Skill 1. Product or business sense

In a nutshell: understanding the product or business

To succeed as a data professional, it is crucial to hone your ability to connect your work to the broader product or business direction. Domain-specific expertise, for example in the advertising industry or the ride-sharing economy, makes a data scientist even more successful. This is the number one priority topic to focus on. Without context, data science and analytics are not impactful to product teams or the business.

Skill 2. Data analysis

In a nutshell: data wrangling, cleaning, transforming, and pattern identification

Another core skill to focus on is analyzing data (on any topic). Practice taking a raw set of data points, cleaning it, and making sense of it, over and over; the repetition hones your ability to identify patterns faster. Data analysis is also commonly referred to as exploratory data analysis (EDA) or data exploration at the earlier stages of a project.
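To make the clean-then-summarize loop concrete, here is a minimal sketch using only the Python standard library. The ride-duration records, the field names, and the rule that non-positive durations are logging errors are all made up for illustration; a real project would more likely use a dataframe library and its own cleaning rules.

```python
from statistics import mean, median

# Hypothetical raw records: made-up ride durations in minutes, as they might
# arrive from a log export (strings, blanks, and an obvious data-entry error).
raw_rows = [
    {"city": "Boston ", "duration_min": "12.5"},
    {"city": "boston", "duration_min": ""},
    {"city": "Austin", "duration_min": "8"},
    {"city": "AUSTIN", "duration_min": "-3"},
    {"city": "Austin", "duration_min": "10"},
    {"city": "Boston", "duration_min": "15.5"},
]

def clean(rows):
    """Normalize city names, parse durations, and drop unusable rows."""
    cleaned = []
    for row in rows:
        city = row["city"].strip().title()
        try:
            duration = float(row["duration_min"])
        except ValueError:
            continue  # skip blank or non-numeric durations
        if duration <= 0:
            continue  # assumption: a non-positive duration is a logging error
        cleaned.append({"city": city, "duration_min": duration})
    return cleaned

def summarize(rows):
    """Group cleaned rows by city and compute simple summary statistics."""
    by_city = {}
    for row in rows:
        by_city.setdefault(row["city"], []).append(row["duration_min"])
    return {
        city: {"n": len(values), "mean": mean(values), "median": median(values)}
        for city, values in by_city.items()
    }

cleaned_rows = clean(raw_rows)
summary = summarize(cleaned_rows)
for city, stats in sorted(summary.items()):
    print(city, stats)
```

Even on six rows, most of the work is deciding what counts as a bad record, which is typical of early-stage data exploration.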

Skill 3. Data visualization

In a nutshell: making sense of data visually

The ability to visualize data trends via charts, graphs, tables, and frameworks allows you to abstract away from the details of your analysis and focus on the main takeaway. Importantly, it’s about conveying your message succinctly, which is especially helpful for visual thinkers and learners. At a larger scale, several data visualizations can be combined into a dashboard that refreshes daily to allow technical and non-technical stakeholders alike to monitor trends.

Skill 4. Data engineering

In a nutshell: processing data from raw tables to production systematically

Data engineering comes into play when the product or business requires systematic creation and transformation of data via production-ready pipelines. Another name for this process is extract, transform, and load (ETL). At a high level, the process is as follows:

Process and clean raw data logs produced by products
Join and modify data tables together based on matching attributes
Feed resulting table into analyses, systems, and dashboards
Implement data quality checks to ensure a consistent, reliable stream of data (serving as on-call when the data breaks)
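The steps above can be sketched in a few lines of standard-library Python. Everything here is hypothetical: the event records, the user table, the function names, and the two quality checks are stand-ins for whatever a real pipeline and its orchestration tooling would provide.

```python
# Hypothetical raw event logs and a user dimension table (all values made up).
raw_events = [
    {"user_id": "u1", "event": "post", "ts": "2023-07-10"},
    {"user_id": "u1", "event": "post", "ts": "2023-07-10"},  # exact duplicate
    {"user_id": "u2", "event": "like", "ts": "2023-07-11"},
    {"user_id": None, "event": "post", "ts": "2023-07-11"},  # missing key
    {"user_id": "u3", "event": "post", "ts": "2023-07-12"},
]
users = {"u1": {"country": "AU"}, "u2": {"country": "US"}}

def extract_and_clean(events):
    """Step 1: drop rows with a missing user_id and remove exact duplicates."""
    seen, cleaned = set(), []
    for e in events:
        key = (e["user_id"], e["event"], e["ts"])
        if e["user_id"] is None or key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(e))
    return cleaned

def join_users(events, user_table):
    """Step 2: left-join events to the user table on user_id."""
    return [
        {**e, "country": user_table.get(e["user_id"], {}).get("country")}
        for e in events
    ]

def quality_checks(rows):
    """Step 4: return a list of human-readable data quality problems."""
    problems = []
    if not rows:
        problems.append("table is empty")
    unmatched = [r["user_id"] for r in rows if r["country"] is None]
    if unmatched:
        problems.append(f"no country for users: {sorted(set(unmatched))}")
    return problems

# Step 3 would load `table` into whatever the analyses or dashboards read from.
table = join_users(extract_and_clean(raw_events), users)
issues = quality_checks(table)
print(len(table), "rows;", issues or "all checks passed")
```

In this made-up data the check flags user u3, who has events but no row in the user table. That is the kind of break an on-call data engineer would be paged to investigate.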

Skill 5. Causal inference

In a nutshell: experimentation, A/B testing, propensity matching, difference-in-differences

One of the core skills data professionals use in their toolkit is causal inference, or the ability to understand what is causing what and to assess whether the company should do more or less of it. The gold standard causal inference technique is an A/B test (experiment) that uses the scientific method to understand the difference between a changed state and a business-as-usual state. When measuring the experiment, a data professional assesses how likely it is that a difference this large would appear by chance alone if the change had no real effect, and whether the difference is large enough to matter.

A classic example is testing out whether a webpage button should be green or blue to drive more sales or to improve click-through rates. Measuring this experiment would include assessing if the difference in the volume of sales or click-through rates between the test group (green button) and control group (blue button) is statistically significant.
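One common way to check significance for click-through rates is a two-proportion z-test. The sketch below assumes that test is appropriate (large samples, independent visitors) and uses made-up click counts; the function name and numbers are illustrative, not taken from a real experiment.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conversions_a, n_a, conversions_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value

# Made-up counts: clicks out of visitors for each button color.
diff, z, p_value = two_proportion_z_test(
    conversions_a=260, n_a=2000,  # test group (green button)
    conversions_b=200, n_b=2000,  # control group (blue button)
)
print(f"difference={diff:.3f}, z={z:.2f}, p-value={p_value:.4f}")
```

The p-value is the probability of seeing a difference at least this large if the two buttons truly performed the same. A small value (conventionally below 0.05) is what "statistically significant" refers to.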

Skill 6. Machine learning and generative artificial intelligence

In a nutshell: building models to find patterns in current and similar data to predict future trends

Machine learning, under the artificial intelligence umbrella, is a popular, sought-after topic in the data field. It involves using modeling techniques to infer future trends from past trends or from comparable data. Some of it overlaps with causal inference (e.g., A/B testing or propensity matching applied to time series data). It can be applied to all types of data: numbers, text, and pictures.
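As a toy illustration of inferring a future value from a past trend, here is a least-squares line fit written from scratch. A straight line is about the simplest model there is, and the weekly numbers are made up; real machine learning work would typically use a dedicated library and hold out data to check the model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up weekly active users (in thousands) for weeks 1 through 6.
weeks = [1, 2, 3, 4, 5, 6]
active_users = [10.0, 12.1, 13.9, 16.2, 18.0, 19.8]

intercept, slope = fit_line(weeks, active_users)
forecast_week_7 = intercept + slope * 7
print(f"slope={slope:.2f} per week, week 7 forecast={forecast_week_7:.1f}")
```

The fitted slope is the model's estimate of weekly growth, and extending the line one step gives the forecast. The same fit-then-predict pattern carries over to more complex models.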

Generative AI (GenAI) is another subset of artificial intelligence. It involves leveraging large language models and other generative models to produce sentences, paragraphs, images, and videos based on the patterns identified from millions of inputted sources.[2]

Skill 7. Data storytelling

In a nutshell: crafting a narrative around the data

No matter what technique is being used to analyze the data (data analysis, basic statistics, causal inference, or machine learning), ultimately one needs to craft a story with the data to make it useful. Based on the data observed, a data professional determines what the data says about what happened in the past, what is happening in the present, or what should happen in the future.

Skill 8. Communicating to stakeholders

In a nutshell: sharing your work and conclusions to cross-functional partners focused on the product or business

Communication is king for an effective data professional. This is partially why I believe studying fields that improve your writing and speaking skills is paramount to becoming one.[3]

GROW - How to learn and apply data skills

I will now dive into what jobs require what skills, an example of how the skills flow together in a sample data project, and guidance on how to learn these skills.

What jobs require what skills

As previously mentioned, the data science industry can be confusing due to the different role titles in the field and the varied skills each role requires. Often, job descriptions are generic. To cut through the noise, observe the types of interviews in a company’s recruiting process, connect each interview with the associated skills, and assess whether the skills emphasized in that role are interesting to you. Generally speaking, I’ve seen more specialized roles pop up in medium and large companies, while at startups it’s more common to flex skills in all areas as a full-stack data scientist (e.g., as the first data science hire of a company).

Below is a table of the various types of data roles (and synonyms) as well as the associated skills required. Green indicates a crucial skill or expertise, light green indicates a secondary skill or nice to have, and yellow indicates a potentially relevant skill depending on the company stage.

Example of how skills can flow through a project

To connect all of the skills, let’s look at how they come together through an example project:

Repeatable Framework

Start with the problem (product or business sense)
Explore the data (data analysis)
Leverage technical toolkit to understand what is happening, why it’s happening, and what to do about it (data analysis, data experimentation, data visualization, machine learning and generative AI)
Craft a story around it, often using powerful visuals or frameworks (data storytelling, data visualization, communicating to stakeholders)
Share your work via presentations or research notes (communicating to stakeholders)
Build repeatable ways to gather and understand data (data engineering, data visualization especially dashboards)

Illustrative example (made-up)

Problem: Users want to converse candidly with whoever they follow on Instagram, inclusive of friends, influencers, and businesses. Solution: Enter Threads, a new text-based application built on top of the Instagram social graph.
Identify the market need: There are several companies in this space demonstrating a real customer need (e.g., Mastodon, Bluesky).
Data infrastructure & engineering: Transform raw data logs from product to be used for analyses with a daily refresh rate and data quality checks.
A/B test: Build a small-scale experiment to assess whether a 50-character limit or a 150-character limit on each thread influences retention trends on the application. Example metric: average number of threads per user created in the last 7 days.
Conclusion and next steps: Through experimentation, we find the 150-character limit works better in most countries, particularly in Australia. We suggest using a 150-character limit around the world, with further testing and refinement in any countries experiencing a regression in retention.
Communication: Present work to stakeholders in meetings and a measurement note.
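For readers who want to see the measurement step of this made-up example in code, the sketch below computes the example metric per country and experiment arm, then lists countries where the 150-character arm came out lower. The rows, the arm labels `limit_50` and `limit_150`, and the countries other than Australia are invented for illustration. It compares point estimates only; a real readout would also test significance.

```python
from collections import defaultdict
from statistics import mean

# Made-up per-user rows: (country, arm, threads created in the last 7 days).
# "limit_50" and "limit_150" are hypothetical labels for the two arms.
rows = [
    ("AU", "limit_50", 3), ("AU", "limit_50", 4),
    ("AU", "limit_150", 6), ("AU", "limit_150", 7),
    ("US", "limit_50", 5), ("US", "limit_50", 5),
    ("US", "limit_150", 6), ("US", "limit_150", 5),
    ("BR", "limit_50", 4), ("BR", "limit_50", 6),
    ("BR", "limit_150", 4), ("BR", "limit_150", 5),
]

def average_by_country_and_arm(data):
    """Average threads per user for each (country, arm) pair."""
    grouped = defaultdict(list)
    for country, arm, threads in data:
        grouped[(country, arm)].append(threads)
    return {key: mean(values) for key, values in grouped.items()}

def lift_by_country(averages):
    """Difference in the metric: 150-character arm minus 50-character arm."""
    countries = sorted({country for country, _ in averages})
    return {
        c: averages[(c, "limit_150")] - averages[(c, "limit_50")]
        for c in countries
    }

averages = average_by_country_and_arm(rows)
lifts = lift_by_country(averages)
# Countries where the longer limit looks worse and needs further testing.
needs_follow_up = [c for c, lift in lifts.items() if lift < 0]
print(lifts, needs_follow_up)
```

A per-country breakdown like this is what would support the recommendation to roll out globally while continuing to test in the countries that regress.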

How to learn these skills

In the majority of cases, it’s important to have both a theoretical understanding of the skills related to the role(s) you’re targeting and experience applying them to real-world problems. This can be achieved in many ways:

Degree program or bootcamp focused on experiential learning (learn by doing)
Internships or entry-level jobs in related quantitative fields (e.g., consulting, finance, accounting)
Quantitative research on any topic
Data competitions (e.g., Kaggle)
Side projects
Self learning on YouTube, books, online tutorials, or articles
Data science or analytics coaching
Data science or analytics conferences

Happy learning!

—

To those trying to break in: Ask questions in the comments below.

To those already in the industry: I would love to hear from you if you’ve approached your data science career journey in the same or in a different way.

[1] 2012 HBR article: https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
2022 HBR article: https://hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century

[2] Learn more about the difference between artificial intelligence and machine learning here: https://cloud.google.com/learn/artificial-intelligence-vs-machine-learning

[3] Why a liberal arts degree gives you an edge in data science


Vinayak Kudva

Jul 29, 2023

Great read. My current role involves data engineering, data viz, data cleaning, and essentially being the middle man between our stakeholders and the machine learning team.

I totally agree that looking at YouTube, doing kaggle challenges, learning through websites like leetcode is one of the best ways for self motivated people who want to hone their skills.

But in my experience, the best way I’ve learnt whatever skills I have is on the job. Sources like leetcode and kaggle don’t foresee different obstacles that might need a use case which is outside of what you might think the problem is. I feel that kind of exposure you can get only while you’re on the job. You become more flexible in understanding what to expect. Also the business sense that is the first skill is also something that you get only while working in a corporate setting.
