The COVID-19 pandemic has underscored the impact of data on infectious disease research and public health emergency response. Today, warehousing raw data is a common practice; however, the real value lies in the ability to apply smart analytics and distill real-time, actionable intelligence from extremely large data sets. In addition to aiding researchers’ quests to develop treatments and vaccines, smart analytics empower public health officials and local governments to assess a population’s risk of contagion, track an outbreak’s spread, determine when, where, and how to allocate resources, and dispense timely and accurate information to concerned citizens.
Here are 6 ways that Google Cloud’s fully managed, serverless smart analytics solutions give government agencies and public-sector organizations the tools they need to make data-driven decisions during COVID-19 and other public health emergencies:
1. Eliminate Overhead & Run Analytics at Scale
During a public health emergency, time is of the essence, and agencies’ financial resources are stretched thin. Agencies that use traditional data warehouse solutions spend the overwhelming majority of their time (85%) on systems engineering work, leaving scant resources to devote to data analysis.
BigQuery, Google’s serverless warehousing solution, eliminates this massive operational overhead. Data analysts can concentrate on deriving critical insights without having to concern themselves with the underlying infrastructure or overcome scaling, performance, or cost constraints. Automatic resource provisioning on a multi-tenant distributed architecture enables analysts to execute even the largest, most complex queries quickly, and as datasets grow from gigabytes to petabytes, BigQuery continues to scale without any additional effort from the user.
In addition to saving time, BigQuery also saves cash-strapped government agencies money. No upfront commitment is required — pay for only what you use. And, according to Enterprise Strategy Group, BigQuery’s three-year TCO is 26% to 34% lower than cloud data warehouse alternatives.
2. Enable Everyone to Develop Machine Learning Models
The development of data-driven models is critical to combat the spread of infectious disease, improve care, and accelerate research efforts.
BigQuery ML abstracts away the complexity of traditional ML solutions and enables users to build and deploy ML models using only basic SQL. This allows users who understand the data but don’t have extensive knowledge of coding or advanced algorithms to automate common ML tasks and hyperparameter tuning. BigQuery ML operates inside BigQuery, working on the data right at the source.
Through September 15, 2020, organizations can use BigQuery ML to train advanced machine learning models using data from the COVID-19 Public Dataset Program right inside BigQuery at no additional cost.
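To make the "only basic SQL" point concrete, here is a minimal sketch of what a BigQuery ML training statement can look like. The Python helper simply assembles a `CREATE MODEL` statement as text. The project, dataset, table, and column names are hypothetical placeholders, not names from the COVID-19 Public Dataset Program, and linear regression is just one of the model types BigQuery ML supports.

```python
from typing import Sequence


def build_create_model_sql(
    model_name: str,
    source_table: str,
    label_column: str,
    feature_columns: Sequence[str],
) -> str:
    """Assemble a BigQuery ML CREATE MODEL statement for a linear regression.

    All identifiers passed in are illustrative; swap in your own tables.
    """
    select_list = ",\n  ".join(list(feature_columns) + [label_column])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = 'linear_reg', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT\n  {select_list}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label_column} IS NOT NULL"
    )


# Hypothetical names, used only to show the shape of the statement.
sql = build_create_model_sql(
    model_name="my_project.health.admissions_model",
    source_table="my_project.health.daily_county_metrics",
    label_column="new_hospital_admissions",
    feature_columns=["new_confirmed_cases", "population"],
)
print(sql)
```

The printed statement can be pasted into the BigQuery console or submitted through a client library. The analyst writes a `SELECT` over data that already lives in the warehouse, and the `OPTIONS` clause tells BigQuery ML which column to predict.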
3. Cut Down on Prep Work When Integrating Other Agencies’ Data
By integrating internal data sets with information provided by other government agencies and public health organizations, local public health and government agencies can identify infection, hospitalization, and mortality trends, which allows for educated resource planning. Unfortunately, before analysts can begin working with this data, it must be extracted, transformed, and loaded.
Cloud Data Fusion, Google’s fully managed, cloud-native data integration service, significantly simplifies and speeds up this process. Analysts can choose from over 150 preconfigured connectors and transformations that support a wide variety of data sources and formats, or they can create custom connections and transformations that can be validated, shared, and reused across teams. Cloud Data Fusion’s graphical, no-code interface enables analysts to deploy ETL/ELT data pipelines simply by pointing and clicking, then manage and explore data pipelines and data sets within a central control center.
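For readers less familiar with the extract, transform, and load steps mentioned above, the sketch below walks through them by hand in plain Python. It does not use Cloud Data Fusion, which is a graphical, no-code tool. It only illustrates the kind of work such a pipeline automates. The CSV contents, county names, and table name are made up for illustration, and an in-memory SQLite database stands in for a data warehouse.

```python
import csv
import io
import sqlite3

# Made-up sample of another agency's export (not real data).
RAW_CSV = """county,report_date,new_cases
Northfield,2020-08-01,120
 northfield ,2020-08-02,95
Southgate,2020-08-01,
Southgate,2020-08-02,14
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize county names and drop rows with no case count."""
    cleaned = []
    for row in rows:
        cases = row["new_cases"].strip()
        if not cases:
            continue  # skip records with a missing count
        county = row["county"].strip().title()
        cleaned.append((county, row["report_date"].strip(), int(cases)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_cases "
        "(county TEXT, report_date TEXT, new_cases INTEGER)"
    )
    conn.executemany("INSERT INTO daily_cases VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse
load(transform(extract(RAW_CSV)), conn)

totals = dict(
    conn.execute("SELECT county, SUM(new_cases) FROM daily_cases GROUP BY county")
)
print(totals)
```

Even this toy pipeline has to handle inconsistent casing, stray whitespace, and missing values. Preconfigured connectors and transformations are meant to take that preparation work off analysts' plates.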
4. Query Data Stored in Other Clouds Without Paying Egress Fees
Since about 93% of cloud-powered organizations use more than one public cloud, the data that analysts need to make decisions is often scattered across multiple clouds. Historically, working with data across clouds has been difficult and expensive, because exporting data incurs data transfer fees, also known as egress fees. These fees are substantial, typically ranging from $0.05 to $0.20 (USD) per GB. Additionally, analysts must wait for the data export to complete before the information can be loaded and analyzed.
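As a rough worked example of how these fees add up, the sketch below applies the per-GB range quoted above to a hypothetical 10 TB export (taken as 10,000 GB). The dataset size is an assumption chosen for illustration, and actual pricing varies by provider, region, and volume tier.

```python
def egress_cost_usd(data_gb: float, rate_per_gb: float) -> float:
    """Return the transfer fee in USD for moving data_gb at a flat per-GB rate."""
    return round(data_gb * rate_per_gb, 2)


LOW_RATE = 0.05   # USD per GB, low end of the range quoted in the text
HIGH_RATE = 0.20  # USD per GB, high end of the range quoted in the text

dataset_gb = 10_000  # hypothetical 10 TB export

low = egress_cost_usd(dataset_gb, LOW_RATE)
high = egress_cost_usd(dataset_gb, HIGH_RATE)
print(f"Exporting {dataset_gb:,} GB: ${low:,.2f} to ${high:,.2f}")
```

At these rates, a single 10 TB export costs between $500 and $2,000, and the fee recurs every time the data is refreshed and exported again.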
BigQuery Omni, currently available in private alpha, solves this problem by allowing GCP customers to query data stored in GCP, AWS, and (coming soon) Azure without having to move or copy data and pay egress fees. Because BigQuery decouples compute from storage, BigQuery Omni’s query engine can run the necessary compute on clusters in the same region where the data resides. There is no need to move or copy raw data out of the other public cloud, perform data prep, manage clusters, or provision resources. All computation occurs within BigQuery’s multi-tenant service, in the region where the data is located.
5. Make Insights Easily Accessible to All Decision Makers
Most of the decision makers at government agencies and public health organizations are not data scientists, yet they are tasked with making high-stakes, data-driven decisions. Looker democratizes data analytics by bringing the power of BigQuery to the entire organization through an intuitive self-service analytics platform. The platform makes it simple to describe data and define analytics, giving everyone in the organization a consistent and reliable picture of conditions on the ground. Using Looker, anyone in the organization can analyze, explore, and create visualizations, then share them with a simple link.
By combining BigQuery and Looker, Commonwealth Care Alliance (CCA) built an analytics data architecture to deliver valuable information and predictive insights to its clinicians. When the COVID-19 outbreak began, the team was able to build COVID-19 monitoring dashboards within a day, then integrate COVID-19 information into existing clinical dashboards. As the pandemic evolves, CCA rapidly iterates its dashboards so that clinicians can utilize the most current information to update and guide data and care strategies.
HCA Healthcare and SADA combined BigQuery with Looker, in addition to a variety of other GCP tools, to build the National Response Portal (NRP), which collects COVID-19 data and makes it easily accessible to healthcare providers and policymakers. Through this centralized platform, decision-makers can view data on critical metrics such as case and death counts, as well as advanced, ML-based predictions for cases, deaths, and hospital admissions. Data is collected directly from hospitals as well as public data sources, with a special emphasis on the impact of public policy and behavior on the spread of the disease.
6. Use Streaming Analytics to Respond in Real Time
As COVID-19 has proven, public health emergencies move fast, and organizations may have real-time data streaming in from multiple sources, including social media activity, websites, and mobile apps. Analyzing this data in real time can provide agencies with valuable insights into metrics such as hospital capacity, emergency call center volume, and compliance with social distancing and other health mandates. Agencies can also gauge public sentiment and design outreach programs.
GCP’s streaming analytics tools enable organizations to capture immediate insights from large-scale real-time data streams. Resource provisioning is automated and abstracted, so analytics are accessible to data analysts as well as data scientists.
Organizations can use Pub/Sub to ingest and analyze hundreds of millions of events per second, from applications or devices located virtually anywhere; leverage BigQuery’s streaming API to directly stream millions of events per second into their data warehouse for lightning-fast, SQL-based analysis; or deploy more responsive, efficient, and supportable streaming pipelines with Dataflow. Dataflow, based on the open source Apache Beam project, is especially powerful because it provides a harmonized interface for working with both batch and streaming data.
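The idea of one interface for both batch and streaming data can be illustrated with a small, library-free sketch. This is not the Apache Beam or Dataflow API. It is a plain-Python analogy in which the same windowed-count function accepts either a finished list (batch) or a generator that yields events one at a time (stream). The event types, timestamps, and 60-second window are made up for illustration, and the sketch assumes events arrive in timestamp order, an assumption real streaming systems cannot make.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

WINDOW_SECONDS = 60  # illustrative fixed window size

Event = Tuple[int, str]  # (timestamp in seconds, event type)


def count_per_window(events: Iterable[Event]) -> Iterator[Tuple[int, Counter]]:
    """Yield (window_start, counts by event type) for each fixed window.

    Assumes events arrive ordered by timestamp.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, kind in events:
        start = timestamp - timestamp % WINDOW_SECONDS
        if current_start is not None and start != current_start:
            yield current_start, counts  # the previous window is complete
            counts = Counter()
        current_start = start
        counts[kind] += 1
    if current_start is not None:
        yield current_start, counts  # flush the final window


# Made-up events: emergency calls and hospital admissions.
batch = [(0, "call"), (15, "call"), (59, "admission"), (61, "call"), (130, "admission")]


def stream() -> Iterator[Event]:
    """Simulate a live feed by yielding the same events one at a time."""
    for event in batch:
        yield event


batch_result = list(count_per_window(batch))
stream_result = list(count_per_window(stream()))
print(batch_result == stream_result)
```

Because the function only iterates over its input, the analysis logic is written once and reused for historical and live data. A production framework adds what this sketch leaves out, such as handling late or out-of-order events and distributing work across machines.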
During an infectious disease outbreak, combining historical data with real-time and forecast reporting helps public health and government authorities to allocate resources and save lives. GCP’s smart analytics tools give authorities easy access to the most current and accurate information, enabling them to make critical data-driven decisions.
To learn more about generating instant insights from data at any scale with a serverless, fully managed analytics platform, be sure to check out SADA’s Next OnAir post-session recaps on Thursdays. Here’s what’s coming up on August 13th:
Join SADA’s Chris Lehman, Head of Engineering, and Gautam Pandya, Senior Data Engineer, as they discuss the Week 5 Next OnAir Keynote: What’s New and What’s Next in Smart Analytics.
Listen in as SADA’s Gautam Pandya, Senior Data Engineer, and Ankit Mukhija, Data Engineer, discuss Building a Petabyte Scale Reporting Pipeline on GCP from Week 5 of Next OnAir.
Listen in as Ankit Mukhija and Prakash Gunasekaran, Data Engineers at SADA, discuss What’s New in BigQuery, Google Cloud’s Modern Data Warehouse from Week 5 of Next OnAir.