Google Cloud partners with SADA to build COVID-19 public dataset pipeline

AT A GLANCE

SADA and Google Cloud created a series of public domain datasets to help researchers, data scientists, and analysts better understand the spread of COVID-19.

INDUSTRY

Software & Technology

DEVELOPED

New public datasets and modern pipeline architecture

ENABLED

Access to reusable code for legacy data sources

Data always plays a critical role in the ability to research, study, and combat public health emergencies, and nowhere is this more true than in the case of a global pandemic. Access to datasets, and to tools that can analyze that data at cloud scale, is increasingly essential to the research process and has been particularly useful in the global response to the novel coronavirus.

The Google Cloud Public Datasets Program (PDP) facilitates access to nearly 150 high-demand datasets from different industry verticals, and new datasets are added regularly. These datasets are onboarded and maintained by Google Cloud, with input and guidance from a variety of data providers, such as the Census Bureau, the National Weather Service, and the U.S. Geological Survey.

Additional examples include the National Water Model, which details flooding and water movement across the continental United States; Broad References, which contains human genomics reference files used for sequencing analytics; and the Global Surface Summary of the Day, which provides daily meteorological observations from weather stations around the world going back over 100 years.

This powerful resource is a playground for analysts and data scientists to unlock new insights from their own data by contextualizing it with data from the PDP. It helps create a more complete picture of customers, patients, or products by linking data to vast public datasets with a single line of code. That's the power of the PDP.

“We pull data from a lot of public sources that you normally would have to research, source, clean, prep, correctly format, and download,” says Michael Hamamoto Tribble, Head of Google Cloud Datasets. “Traditionally, that data would need to be sorted and uploaded to a database before end-users could work with it.”

“Our Public Datasets Program takes care of all the prep work, wrangling, cleaning, and aligning the data for easy access in BigQuery tables. Also, it can easily connect different databases together. It’s remarkable.”
Michael Hamamoto Tribble | Head of Google Cloud Datasets

Business challenge

To help organizations adapt and meet their customers’ changing needs during the COVID-19 pandemic, SADA and Google Cloud set out to create a series of public domain COVID-19 datasets to aid researchers, data scientists, and analysts in developing data-driven models to better understand the spread of COVID-19.

Knowing that COVID-19 datasets would evolve rapidly and at high volume, SADA and Google Cloud wanted to update the PDP infrastructure to address the unique challenges of these datasets, such as expanding capabilities for data validation and alerts and implementing quality controls to ensure datasets remain up to date.
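
The kind of validation and freshness check described above might look something like the following. This is only a minimal sketch, not the PDP's actual implementation: the row layout, the `validate_rows` function, and the two-day staleness threshold are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical threshold; the case study does not state actual values.
MAX_AGE_DAYS = 2


def validate_rows(rows, today):
    """Return a list of human-readable alerts for a batch of daily rows."""
    if not rows:
        return ["dataset is empty"]

    alerts = []

    # Freshness check: flag the dataset if its newest row is too old.
    latest = max(row["date"] for row in rows)
    if today - latest > timedelta(days=MAX_AGE_DAYS):
        alerts.append(f"stale data: latest row is {latest.isoformat()}")

    # Quality check: case counts should never be negative.
    for row in rows:
        if row["cases"] < 0:
            alerts.append(f"negative case count on {row['date'].isoformat()}")

    return alerts


# Made-up sample rows, for illustration only.
rows = [
    {"date": date(2020, 4, 1), "cases": 120},
    {"date": date(2020, 4, 2), "cases": -5},
]
alerts = validate_rows(rows, today=date(2020, 4, 10))
print(alerts)
```

In a real pipeline, the returned alerts would presumably be routed to a monitoring or notification system rather than printed.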

Solution

SADA, a Google Cloud Premier Partner and three-time Google Cloud Reseller Partner of the Year, partnered with Google Cloud to develop COVID-19 dataset pipelines for ten states, including a number of key states, while also leveraging information from healthcare organizations such as the American Hospital Association and The COVID Tracking Project.

SADA developed a framework to obtain the required information and to autogenerate specific schemas and tables, so that the same approach could be easily applied to the health-related websites of the remaining forty state governments.
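
To make the idea of autogenerating schemas concrete, here is a small sketch that infers column types from a CSV sample. It is an assumption about how such a step could work, not SADA's framework: the `infer_type` and `infer_schema` helpers and the sample data are hypothetical, and the type names simply follow BigQuery's `INTEGER`, `FLOAT`, `DATE`, and `STRING` naming.

```python
import csv
import io
from datetime import date


def infer_type(values):
    """Pick the first type, from narrowest to widest, that fits every non-empty value."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return "STRING"
    casters = (("INTEGER", int), ("FLOAT", float), ("DATE", date.fromisoformat))
    for type_name, caster in casters:
        try:
            for value in non_empty:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "STRING"


def infer_schema(csv_text):
    """Build a list of {name, type} column definitions from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return [
        {"name": name, "type": infer_type([row[name] for row in rows])}
        for name in reader.fieldnames
    ]


# Made-up sample in the shape a state health site might publish.
sample = (
    "state,date,cases,positivity_rate\n"
    "CA,2020-04-01,120,0.12\n"
    "NY,2020-04-01,340,\n"
)
schema = infer_schema(sample)
for column in schema:
    print(column)
```

Because the schema is derived from the data itself, the same code could in principle be pointed at a new state's feed without hand-writing a table definition, which is the property the framework is described as providing.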

Through this partnership, SADA’s team of technical and professional services experts worked with Google Cloud to design and implement new backend data pipelines for the PDP to capture data from public data sources. SADA developed a reference implementation for the COVID-19 dataset, with reusable code for refactoring the existing PDP pipeline for other datasets in a fully automated way that requires minimal configuration.

Results

This project required a quick turnaround due to the emerging nature of the pandemic, with customers’ needs changing virtually overnight. Meeting three times a week, SADA and Google Cloud technical teams worked closely together to deliver the comprehensive COVID-19 dataset in only 90 days.

SADA also delivered the new standardized pipeline framework as a reference implementation that enables customers to publish work on a small virtual machine running Python code, on Dataflow, or on Kubernetes, depending on the size of the dataset to be loaded into BigQuery.

“Now anyone can refer to the ingestion framework and build pipelines for collecting data from disparate sources and push data into BigQuery by providing endpoints and a few configuration details, requiring only minimal coding efforts.”
Michael Hamamoto Tribble | Head of Google Cloud Datasets
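
A configuration-driven pipeline of the kind described here, combined with a size-based choice of execution target, could be sketched roughly as follows. Everything in this example is hypothetical: the `PipelineConfig` fields, the example endpoint, the size thresholds, and in particular the assumption that Dataflow handles mid-sized loads and Kubernetes the largest ones are not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical size cutoffs in megabytes; the case study gives no actual values.
SMALL_MAX_MB = 100
MEDIUM_MAX_MB = 10_000


@dataclass(frozen=True)
class PipelineConfig:
    name: str
    endpoint: str            # where the source data is fetched from
    destination_table: str   # BigQuery table in "dataset.table" form
    estimated_size_mb: int


def choose_runner(config):
    """Map dataset size to one of the three targets named in the case study.

    The ordering of the targets by size is an assumption.
    """
    if config.estimated_size_mb <= SMALL_MAX_MB:
        return "vm-python"
    if config.estimated_size_mb <= MEDIUM_MAX_MB:
        return "dataflow"
    return "kubernetes"


def build_plan(config):
    """Turn a declarative config into a plan a scheduler could act on."""
    return {
        "pipeline": config.name,
        "source": config.endpoint,
        "destination": config.destination_table,
        "runner": choose_runner(config),
    }


# Placeholder values for illustration only.
config = PipelineConfig(
    name="state_daily_cases",
    endpoint="https://example.com/covid/daily.csv",
    destination_table="covid19.state_daily_cases",
    estimated_size_mb=25,
)
plan = build_plan(config)
print(plan)
```

The point of the design is that onboarding a new source means adding a config entry rather than writing a new pipeline, which matches the "endpoints and a few configuration details" described in the quote.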

SADA also achieved advanced automation by implementing the underlying infrastructure as code, in addition to the data pipeline, making it simple to replicate the complete environment in support of customers’ evolving needs.

Overall, SADA’s partnership with Google Cloud helped customers by:

Delivering the public domain COVID-19 Open Data datasets in only 90 days
Accelerating a shared understanding of how coronavirus spreads
Developing reusable code to easily support legacy data sources
Replacing the existing PDP pipeline with a modern architecture to support the unique challenges of COVID-19 datasets and beyond

With COVID-19 data open and available in BigQuery, researchers and public health officials have been able to better understand, study, and analyze the impact of this disease.

“The team at SADA played an invaluable role developing data pipelines in support of customers. Additionally, our work together serves as a framework that can facilitate processing of other datasets that we want to onboard for this project, as well as some potential legacy datasets.”
Michael Hamamoto Tribble | Head of Google Cloud Datasets
