6 Phases of Data Analytics Projects: Google’s Methodology

Vinícius Ramos
9 min read · May 28, 2023


Created by me in Stablecog

When I took the courses for the Professional Data Analyst Certification issued by Google, what had the biggest effect on my day-to-day job was learning about the 6 phases of a Data Analytics project.

This class was presented by Cassie Kozyrkov (if you have followed me for a while, you know I quote her a lot here), and it basically consists of dividing the project into stages, each with its own analytical problems. I have to admit that I’m writing this article based on notes I took almost 2 years ago, so some concepts are mixed with my own experience of using this methodology ever since. To get straight to the point, the stages are the following:

Ask — Problem Discovery

The first thing you should think about is what problem you are trying to solve with analytics. That means asking stakeholders the right questions, asking yourself whether data is worth it in this case, and understanding which type of problem you are dealing with.

What do you need to accomplish at this stage?

  1. Curiosity: What is the problem? Has it been addressed before? What are the hypotheses and key factors to be addressed?
  2. Understanding the context: What opinion do the current stakeholders have? Was this problem caused by an internal or external variable? What other processes or problems can be influenced or affected by this?
  3. Having a technical mindset: Use SMART questions (I’ll explain this below) to understand what you should ask. Is this problem solvable with analytics? Is there a need for an extra data pipeline? Is there a need for extra hands on the project? Which teams need to be consulted, and what is their current bandwidth?
  4. Data design: Is the data needed internal or external? Do I need extra tools or resources for this? Which database or data source do I need? If there’s a need for an extra pipeline, how much time will it take to build and what are the costs? Does this need to be scalable?
  5. Data strategy: Is the analysis going to assist a decision, build an argument for a decision that is already in progress, or suggest something new? Who is the stakeholder for this strategy, and what is the best way of delivering the analysis to this specific person?

Once you have answered and documented all these questions, you can consider the Ask stage finished. And as I mentioned, it is all about SMART Questions! The SMART methodology is used for setting goals in strategic planning, and it stands for Specific, Measurable, Achievable, Relevant and Time-bound. However, in this case it has a little twist: we are building SMART questions and not goals, so the A stands for Action-oriented.

Image from Google’s course that I had in my notes

I’m sure you have all heard, or said, things such as: “My stakeholder wasn’t clear”, “My stakeholder didn’t give me a proper briefing”, “My stakeholder does not know what she/he wants”… The sad truth, in my opinion, is that this is 100% your fault for not knowing how to extract the information.

With SMART Questions, we want to ask something specific, easily measurable, action-oriented, relevant to the problem faced and with a specific time period in mind. The idea is that this will make your stakeholders give you better information. Let’s pretend we have an obesity problem in a given population, and we want to work on the hypothesis that it’s because people are not exercising. The common question would be:

Are kids getting enough exercise these days?

The answer to this can go multiple ways and only a few of them will help you, so in the end you blame the stakeholder for not saying something helpful… But what if you ask it this way?

What percentage of kids achieve the recommended 60 minutes of physical activity at least five days a week?

That way, you meet all the requirements for SMART Questions and you get meaningful information for the problem you want to solve, just by changing how you ask questions.
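To make this concrete, here is a minimal Python sketch of how the SMART version of the question turns into a single measurable number. The activity logs, child names and helper functions are made up for illustration; only the 60-minute and five-day thresholds come from the question above.

```python
# Hypothetical weekly activity logs: minutes of physical activity per day, per child.
weekly_minutes = {
    "child_a": [70, 65, 0, 80, 60, 90, 30],
    "child_b": [20, 45, 60, 10, 0, 75, 30],
    "child_c": [60, 60, 60, 60, 60, 0, 0],
    "child_d": [90, 15, 30, 60, 45, 20, 10],
}

RECOMMENDED_MINUTES = 60  # threshold taken from the question in the text
REQUIRED_DAYS = 5         # "at least five days a week"


def meets_recommendation(minutes_per_day, threshold=RECOMMENDED_MINUTES, days=REQUIRED_DAYS):
    """Return True if the child reached the threshold on at least `days` days."""
    active_days = sum(1 for minutes in minutes_per_day if minutes >= threshold)
    return active_days >= days


def share_meeting_recommendation(logs):
    """Percentage of children in `logs` who meet the recommendation."""
    if not logs:
        raise ValueError("no activity logs provided")
    hits = sum(meets_recommendation(week) for week in logs.values())
    return 100 * hits / len(logs)


share = share_meeting_recommendation(weekly_minutes)
print(f"{share:.1f}% of kids meet the recommendation")
```

The vague version of the question has no equivalent function: there is nothing to count until you agree on what “enough” means.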

Prepare — Setting the grounds

The main thing you will be working on at this stage is answering this question: “What do I need to do this?”. It is related to the third part of the Ask phase we mentioned before, so it’s all about understanding which technical tools and information you are going to need for the project and mapping them out.

  1. What type of data is needed? Qualitative or quantitative? Is the data structured or unstructured? Internal or external?
  2. How is the data collected? Interviews, observations, forms, questionnaires, surveys, CRM systems, internal databases?
  3. What is the data quality, and which sources are meaningful?

Image from Google’s course that I had in my notes

After you map all the sources and types of data you will be using, you’ll need to focus on data modelling: checking whether the data is clean, whether it needs extra transformations and whether you need to connect new data sources, deciding which schema you’ll use for the tables, selecting which features are relevant and which are not, and so on!
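As a rough illustration of the “which schema” decision, the sketch below uses Python’s built-in sqlite3 module to model one fact table and one dimension table in memory. The table names, columns and rows are hypothetical; your real schema depends on the sources you mapped.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one dimension table and one fact table that references it.
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    category   TEXT NOT NULL
);
CREATE TABLE fact_sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    sold_on    TEXT NOT NULL,
    revenue    REAL NOT NULL CHECK (revenue >= 0)
);
""")

conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Notebook", "Stationery"), (2, "Backpack", "Accessories")],
)
conn.executemany(
    "INSERT INTO fact_sales (product_id, sold_on, revenue) VALUES (?, ?, ?)",
    [(1, "2023-05-01", 12.5), (1, "2023-05-02", 7.5), (2, "2023-05-02", 40.0)],
)

# With the relationship modelled up front, the later analysis is a simple join.
revenue_by_category = dict(
    conn.execute(
        "SELECT p.category, SUM(s.revenue) "
        "FROM fact_sales AS s JOIN dim_product AS p USING (product_id) "
        "GROUP BY p.category"
    )
)
print(revenue_by_category)
```

Keeping the descriptive attributes in their own table means a category rename happens in one place instead of in every sales row.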

But apart from all the standard data engineering practices, as an Analyst, you need to look into Data Biases! A bias, in this case, refers to a type of error that systematically skews results in a certain direction. You need to check that the way the data for a given population was collected and treated does not push the results toward a conclusion that was already assumed. This is a complex topic we can talk about in another article, but to put it simply, you have to make sure your data sources are Reliable, Original, Comprehensive, Current and Cited by other sources.
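One simple check for this kind of skew, sketched below in Python, is to compare how each group is represented in your sample against a trusted reference for the population. The group names, the reference shares and the 5-point tolerance are all assumptions made for the example, and this covers only sampling bias, not every bias you may face.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Sample share minus population share per group (positive = over-represented)."""
    if not sample_groups:
        raise ValueError("sample is empty")
    counts = Counter(sample_groups)
    total = len(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in population_shares.items()
    }


# Hypothetical survey: 80 urban and 20 rural respondents.
sample = ["urban"] * 80 + ["rural"] * 20
# Hypothetical reference shares, for example taken from a census.
population = {"urban": 0.6, "rural": 0.4}

gaps = representation_gaps(sample, population)
for group, gap in gaps.items():
    # The 5-point tolerance is an arbitrary choice for this illustration.
    flag = "check this group" if abs(gap) > 0.05 else "ok"
    print(f"{group}: {gap:+.0%} vs population ({flag})")
```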

Process — Creating the workspace

This phase is what Data Analysts are already used to doing, or at least they should be: taking care of data processing, mining and transformations! In summary, everything you identified during preparation needs to be done here, such as:

  1. Connecting Data Sources: make sure all the needed sources are connected to the environment where the analysis will take place, be it a database, a local drive, cloud drives or whatever.
  2. Cleaning the Data: look into human error, misleading data, poor-quality categories, badly formatted measures and the like.
  3. Feature Selection: select only relevant attributes and measures, filtering the tables and sources to be ready for analytics.
  4. Data modelling: design the schemas and the connections between tables, making it smooth for analysis later on.
  5. Data Testing: check sample sizes against the population, calculate the confidence level for basic statistical assumptions, calculate the margin of error and document everything to make the analysis better.
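For the Data Testing step, the sketch below shows one common way to relate sample size, confidence level and margin of error when estimating a proportion. It uses the normal approximation, with an optional finite population correction. The function names are mine, and the default proportion of 0.5 is the usual conservative assumption, not something prescribed by the course.

```python
import math
from statistics import NormalDist


def z_score(confidence_level):
    """Two-sided critical value of the standard normal (about 1.96 for 0.95)."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


def required_sample_size(target_margin, confidence_level=0.95, proportion=0.5, population=None):
    """Smallest sample size that reaches `target_margin` for a proportion estimate."""
    z = z_score(confidence_level)
    n = z**2 * proportion * (1 - proportion) / target_margin**2
    if population is not None:
        # Finite population correction: small populations need fewer responses.
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)


def margin_of_error(sample_size, confidence_level=0.95, proportion=0.5):
    """Margin of error of a proportion estimate (no finite population correction)."""
    z = z_score(confidence_level)
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


n_large = required_sample_size(0.05)                    # very large population
n_small = required_sample_size(0.05, population=10_000) # population of 10,000
print(n_large, n_small, round(margin_of_error(n_large), 4))
```

Running the check in the other direction is just as useful: if you already have the data, `margin_of_error` tells you how much uncertainty to report alongside your percentages.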

An important note for this stage: make sure to document the whole cleaning process so that, in a future analysis with the same data sources, you do not have to rediscover every issue.
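One lightweight way to do this, sketched below in Python, is to write each cleaning rule as a small documented function and let a runner record what every step did. The rules, field names and sample rows are hypothetical; the point is that the log doubles as documentation.

```python
def strip_names(rows):
    """Trim stray whitespace around product names."""
    return [{**row, "product": row["product"].strip()} for row in rows]


def drop_missing_price(rows):
    """Remove rows with no price recorded."""
    return [row for row in rows if row["price"] is not None]


def drop_negative_price(rows):
    """Remove rows with a negative price (assumed here to be typing errors)."""
    return [row for row in rows if row["price"] >= 0]


def run_cleaning(rows, steps):
    """Apply each step in order and log its name, reason and row counts."""
    log = []
    for step in steps:
        rows_in = len(rows)
        rows = step(rows)
        log.append({
            "step": step.__name__,
            "why": step.__doc__,
            "rows_in": rows_in,
            "rows_out": len(rows),
        })
    return rows, log


# Hypothetical raw extract with one example of each issue.
raw = [
    {"product": " Notebook ", "price": 12.5},
    {"product": "Backpack", "price": None},
    {"product": "Pen", "price": -3.0},
    {"product": "Pencil", "price": 1.0},
]

clean, cleaning_log = run_cleaning(raw, [strip_names, drop_missing_price, drop_negative_price])
for entry in cleaning_log:
    print(f'{entry["step"]}: {entry["rows_in"]} -> {entry["rows_out"]} rows ({entry["why"]})')
```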

Analyse — Writing a story with insights

In this phase, the work is focused on performing an EDA (Exploratory Data Analysis) at first, and then progressing to checking a hypothesis. In summary, the first steps should be:

  1. Data Organization: Does the data have outliers? Does it need to be filtered before being analysed? Does it need to be transformed for better category grouping?
  2. Data Aggregation: What does the data tell about itself when you combine given measures with given categories? (Example: Which product has the best margin for a given dataset?)
  3. Correlation: How are given measures correlated to each other? Do they have a causal relationship? Which categories are correlated to each other? What is the trend for a given combination?
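The three steps above can be sketched with nothing but Python’s standard library. The datasets are invented, the 1.5 × IQR rule is just one common rule of thumb for outliers, and the helper names are mine.

```python
import math
from collections import defaultdict
from statistics import mean, quantiles


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def margin_by_product(sales):
    """Sum revenue and cost per product, then compute (revenue - cost) / revenue.

    Assumes every product has positive total revenue.
    """
    totals = defaultdict(lambda: {"revenue": 0.0, "cost": 0.0})
    for row in sales:
        totals[row["product"]]["revenue"] += row["revenue"]
        totals[row["product"]]["cost"] += row["cost"]
    return {p: (t["revenue"] - t["cost"]) / t["revenue"] for p, t in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# 1. Data Organization: spot the suspicious order value.
order_values = [10, 12, 11, 13, 12, 11, 95]
outliers = iqr_outliers(order_values)

# 2. Data Aggregation: which product has the best margin?
sales = [
    {"product": "Notebook", "revenue": 100.0, "cost": 60.0},
    {"product": "Notebook", "revenue": 50.0, "cost": 30.0},
    {"product": "Backpack", "revenue": 200.0, "cost": 150.0},
]
margins = margin_by_product(sales)
best_product = max(margins, key=margins.get)

# 3. Correlation: do two measures move together?
ad_spend = [1, 2, 3, 4]
units_sold = [2, 4, 6, 8]
r = pearson(ad_spend, units_sold)

print(outliers, best_product, round(r, 2))
```

Keep in mind that a high coefficient only answers the first question in the Correlation step. Whether the relationship is causal needs context from the business or a proper experiment.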

After this initial stage of EDA, the analysis itself can start, and what I mean by that is that you need to start answering questions. Most Analysts get lost in this stage because they don’t know which questions to answer, but if you followed the process here, you have them mapped out in the Ask stage.

When you have answered all the hypotheses with data, the next step is to understand them and organize them by priority before seeking external feedback. In summary, you need to see which hypotheses produced better insights through data analysis and why, and then check with stakeholders whether those insights indeed make sense.

Share — Debating and getting different views

When the analysis is ready, you need to deliver it to stakeholders and get feedback. The most important aspect is deciding how to present it in a way that is comprehensible and effective for the stakeholders involved in the project. This is where you will use Decision Intelligence the most. Some questions that can guide you are:

  1. How can the analysis be presented? Does this stakeholder prefer reports, dashboards, documents, presentations or meetings?
  2. What is relevant for the stakeholders? Do they need to see what was done in Prepare and Process phases, or can it be just the analysis?
  3. Do they care about the relationships, or just about the forecast of the problem? Do they need to hear about the project, or just about the problem and the solutions discovered?

After presenting it in the best way, you need to collect feedback. Use the SMART Questions from the Ask phase to ask what you could have done better, which trends the stakeholders identified that you didn’t and which business context you missed, and ask about expectations for the next delivery. This will help you improve outcomes, increase the usage of your analysis and build better relationships with the people you work with.

Act — It’s time to solve it!

This is the last stage of this methodology, and it is built on a concept I personally really like and identify with: the analyst needs to solve the problem, not just deliver the solution!

The idea is that here you take the results from the analysis and the feedback given in the Share stage and build an action plan for the solution to be implemented, making the actual decision along with the stakeholders. So here you need to clarify:

  1. Which hypothesis was chosen as the decision to solve this problem?
  2. Which teams need to be involved in the solution? Do the stakeholders need help delivering the plan? Do you need to make an action plan and manage the project?
  3. How can this problem be forecast going forward? What can you do from the analytics end to report the probability of a similar problem happening in the future?

Here you need to shine at delivering data-oriented decisions (not data-driven ones; here’s “Why Decisions, not data, should drive analytics and businesses”). Make sure to bring your A game in building actionable recommendations and proving they are worth it using the analysis you carried out!

Methodologies can help Analysts

This is just an introduction. I could write an entire article for each phase, but I wanted to share this brief version of Google’s methodology because it helped me a lot in my day-to-day work. Depending on the project, I mix this methodology with Double Diamond and PDCA, but before leaving I want you to clearly understand the following:

You should not limit yourself to a project methodology. This is just guidance on how to think about problem-solving, and you should adapt it to the needs of the problem and to your personal preference!

Thanks for reading! Follow me to continue with the series.

Vinícius A. R. Z.

I’m a Senior Data Analyst and when people ask me what I work with, I always say “I work with decision intelligence” because I try not to limit myself to data! It’s like they say, you have to be smart as a fox… 🦊
