Back in February, I wrote a blog post touting the portability benefits of containerization and Google Anthos in an increasingly multi-cloud world, and predicting a future where true workload portability is the norm, not the exception. Little did I know that one of the hypotheticals I posited was already in the works as I typed:
"When I consider that Google leverages containers and container orchestration to deliver applications like Gmail, YouTube, BigQuery, and Compute Engine, I start to dream about what the future could hold. Can we one day deploy BigQuery on Anthos on AWS so that it can query on the data stored in Redshift with no egress cost and no latency hit?"
In July, that future arrived in the form of BigQuery Omni, now available in private alpha. BigQuery Omni is Google Cloud’s multi-cloud analytics solution that enables BigQuery users to access and analyze data on AWS, with Azure coming soon, without having to move or copy datasets.
How BigQuery Omni works
Up until now, BigQuery could analyze data residing in any cloud a user wanted, so long as that cloud was Google Cloud Platform (GCP). If a user wanted to run a query on data stored in another cloud, they’d first have to move or copy it to GCP. With the overwhelming majority of enterprises running multi-cloud environments, and with egress fees eating up one of the biggest chunks of organizational cloud budgets, this hindered organizations’ ability to use BigQuery despite its many advantages.
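To make the egress problem concrete, here is a back-of-the-envelope sketch in Python. The flat per-gigabyte rate and the data sizes are hypothetical placeholders, not quoted prices; real egress pricing is usually tiered and varies by provider and region.

```python
def egress_cost(data_gb: float, rate_per_gb: float) -> float:
    """Cost of moving `data_gb` gigabytes out of a cloud at a flat per-GB rate."""
    return data_gb * rate_per_gb


# Hypothetical inputs, chosen only for illustration.
RATE_USD_PER_GB = 0.09   # assumed flat egress rate, not a real price list entry
dataset_gb = 50_000      # size of the dataset that lives in the other cloud
result_gb = 2            # size of the query result, if it leaves that cloud at all

# Copy-then-query: the whole dataset crosses the cloud boundary first.
copy_then_query = egress_cost(dataset_gb, RATE_USD_PER_GB)

# Query in place: at most the (much smaller) result set crosses the boundary.
query_in_place = egress_cost(result_gb, RATE_USD_PER_GB)

print(f"Copy, then query: ${copy_then_query:,.2f}")
print(f"Query in place:   ${query_in_place:,.2f}")
```

The point is the ratio rather than the absolute numbers: the cost of copying scales with the size of the dataset, while the cost of querying in place scales only with the size of the answer.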
Unlike some other data warehouses, including Amazon Redshift, BigQuery decouples storage and compute. Separating storage from processing was a novel concept 10 years ago, when Google Cloud first introduced BigQuery. This design decision allowed users to take advantage of inexpensive storage, and it also unwittingly set the stage for BigQuery Omni.
By running on Anthos clusters inside AWS, BigQuery Omni can directly and securely access data stored in AWS, such as in Amazon S3, allowing BigQuery to treat this data as if it were stored on GCP and removing the need to move or copy it. BigQuery Omni’s query engine runs compute functions on clusters in the same region where the data resides, and users can also choose to store the results there.
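As a rough illustration of what this looks like from the user's side, the sketch below builds the DDL for an external table that points at data in Amazon S3 and submits SQL to an AWS-hosted BigQuery location. It assumes the `CREATE EXTERNAL TABLE ... WITH CONNECTION` form from BigQuery's public documentation and the `google-cloud-bigquery` Python client; the private alpha may differ. The dataset, table, connection, and bucket names are all hypothetical.

```python
def external_table_ddl(table: str, connection: str, s3_uri: str,
                       fmt: str = "PARQUET") -> str:
    """Build DDL for a BigQuery external table whose data stays in Amazon S3."""
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (format = '{fmt}', uris = ['{s3_uri}'])"
    )


def run_in_aws_region(sql: str, location: str = "aws-us-east-1"):
    """Submit SQL so that it executes in the AWS region that holds the data.

    Needs the google-cloud-bigquery package and valid credentials, so the
    import is done lazily and this function is not called in this sketch.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(sql, location=location).result()


ddl = external_table_ddl(
    table="my_dataset.orders",                    # hypothetical dataset and table
    connection="aws-us-east-1.my_s3_connection",  # hypothetical connection name
    s3_uri="s3://my-bucket/orders/*",             # hypothetical bucket and prefix
)
print(ddl)
```

Once the table exists, an ordinary query such as `SELECT COUNT(*) FROM my_dataset.orders` could be passed to `run_in_aws_region`, and the compute would run next to the data rather than pulling it into GCP.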
Because it lets BigQuery analyze data in situ, BigQuery Omni eliminates egress fees, reduces latency, and helps organizations comply with data locality and data sovereignty mandates. As a result, organizations can use BigQuery as a cross-platform data analysis tool and enjoy a number of advantages.
Increased Flexibility Without Increased Overhead
Like BigQuery, BigQuery Omni is a serverless solution with a fully managed infrastructure, powered by Anthos. Compute resources run in the same cloud region where data is stored, and users don’t have to concern themselves with underlying infrastructure.
Faster Time to Insights
BigQuery automatically provisions resources on a multi-tenant distributed architecture, enabling analysts to execute even very large and very complex queries quickly. It doesn’t matter whether your datasets are measured in gigabytes or petabytes; BigQuery scales horizontally to meet any size workload, without users having to do a thing.
Reduced Administrative Costs
BigQuery’s fully managed serverless architecture means out-of-the-box simplicity, with no nodes to plan, configure, or scale, and no infrastructure to maintain. The complexities of cluster management and database configuration are abstracted away. For these reasons, Enterprise Strategy Group estimates BigQuery’s three-year TCO to be 26% to 34% lower than cloud data warehouse alternatives.
Reduced Complexity & Consistent User Experience
Different clouds have different analysis tools. BigQuery Omni enables organizations to standardize their data analysis on BigQuery, reducing complexity and providing users with a consistent experience across cloud platforms. Users can write queries in standard SQL from the BigQuery UI, which democratizes data analysis and opens it up to people who know the data but don’t know how to code.
Unlock Insights Hidden in Data Silos
Freed from high egress fees and latency issues, organizations can combine datasets stored in different public clouds and uncover hidden insights. For example, you can combine Google Analytics data stored in GCP with first-party market research data stored in AWS, then use Looker to build a dashboard to visualize the results and share them with the rest of your team.
Integration With GCP’s Smart Analytics Tools
Standardizing data analysis on BigQuery allows analysts to make use of GCP’s many smart analytics tools, including:
- BigQuery ML, which abstracts away the complexity of traditional machine learning solutions, enabling users to build and deploy ML models using only basic SQL.
- Looker, which democratizes data analytics through an intuitive, self-service platform that enables anyone in the organization to analyze, explore, create, and share visualizations.
- Analytics 360, which combines analytics and marketing insights to give marketers a deeper understanding of the customer journey.
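To give a sense of what "using only basic SQL" means for BigQuery ML, here is a minimal sketch that builds a `CREATE MODEL` statement and a matching `ML.PREDICT` query as strings. The statement shapes follow BigQuery ML's documented syntax; the model, table, and column names are hypothetical, and logistic regression is just one of the available model types.

```python
def create_model_sql(model: str, training_table: str, label: str) -> str:
    """SQL that trains a logistic regression model on every column of a table."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}'])\n"
        f"AS SELECT * FROM `{training_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """SQL that scores the rows of a table with a previously trained model."""
    return f"SELECT * FROM ML.PREDICT(MODEL `{model}`, TABLE `{input_table}`)"


# Hypothetical names, chosen only for illustration.
train = create_model_sql(
    model="my_dataset.churn_model",
    training_table="my_dataset.customers",
    label="churned",
)
score = predict_sql("my_dataset.churn_model", "my_dataset.new_customers")

print(train)
print(score)
```

Both statements can be run from the BigQuery UI like any other query, which is what makes the workflow accessible to analysts who are comfortable with SQL but not with a separate ML toolchain.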
In my previous blog post, I predicted a seamless multi-cloud future where Anthos would act as an “operating system” on which services could be “installed,” allowing organizations to run the right workloads in the right clouds at the right time. BigQuery Omni is a big step towards that future.