Nobody knows artificial intelligence, machine learning, and predictive analytics as well as Google, and businesses have flocked to Google Cloud Platform to take advantage of cutting-edge yet easy-to-use solutions that allow them to distill actionable insights from massive amounts of data. Google reports that the volume of data businesses have analyzed using BigQuery, its serverless data warehouse solution, increased by over 300% in the last year alone.
Google recently unveiled a wide variety of updates and new tools for its data analytics solutions, including significant upgrades to BigQuery ML. BigQuery ML is an extension for BigQuery that enables data analysts who are proficient with Structured Query Language (SQL) to build and deploy machine learning models without expertise in data science or knowledge of programming languages such as R or Python.
What is BigQuery ML?
Released in beta during the summer of 2018, BigQuery ML seeks to make it easier and less expensive for enterprises to take advantage of machine learning by bridging the gap between data analysts and data scientists and eliminating the need to export data from a data warehouse.
BigQuery ML empowers users to build and deploy ML models using only basic SQL statements, and it automates common ML tasks and hyperparameter tuning. Because BigQuery ML operates inside BigQuery, it works on the data right at the source, which reduces complexity and lets users run predictive analytics in a fraction of the time that traditional ML systems require. It also allows companies to work on data that they are legally prohibited from exporting and reformatting, such as healthcare data covered by HIPAA.
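To make "basic SQL statements" concrete, here is a minimal sketch of the two statements a typical workflow revolves around: one to train a model and one to score new rows. It is written in Python only to compose the statement text; nothing is sent to BigQuery. The dataset, table, model, and column names (`mydataset.churn_model`, `mydataset.customers`, `churned`, and so on) are hypothetical.

```python
# Sketch only: composes BigQuery ML statements as text; nothing is sent to BigQuery.
# The model, table, and column names used below are hypothetical.


def create_model_sql(model: str, model_type: str, label: str, source_table: str) -> str:
    """Return a CREATE MODEL statement that trains on every column of source_table."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Return a query that scores every row of input_table with the trained model."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


if __name__ == "__main__":
    # Train a logistic regression on a hypothetical customer table, then score new rows.
    print(create_model_sql("mydataset.churn_model", "logistic_reg",
                           "churned", "mydataset.customers"))
    print()
    print(predict_sql("mydataset.churn_model", "mydataset.new_customers"))
```

The printed statements could be pasted into the BigQuery console as they are. The point of the sketch is that training and prediction are both ordinary queries against tables that already live in the warehouse.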
Google adds support for new, non-linear machine learning models
For the first few months of its beta release, BigQuery ML supported only linear and logistic regression models, which limited the tool’s potential business uses. To better serve the needs of its customers, Google announced support for additional models, including:
- K-means clustering (beta), which can be used for customer segmentation and for ensuring data quality. The model works on a mix of numerical and categorical features and supports all major SQL data types, including GIS. During Google Cloud’s Next ’19 conference in April, travel reservations site Booking.com demonstrated how it had used k-means clustering in BigQuery ML to ensure that its website returned accurate results to customers searching across 176 offered dimensions for specific hotel amenities, such as an in-room microwave or free toiletries.
- Matrix factorization (private alpha), which businesses can use for product recommendations, offer matching, and group recommendations, as in the Netflix movie recommendation challenge.
- The ability to build and directly import deep neural networks using TensorFlow (private alpha), which Google recently demonstrated by building a tool for the NCAA March Madness tournament that predicted how many three-point shot attempts each team would make. Google found that a non-linear ML model was much more accurate than a linear one in predicting three-point attempts by top teams.
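For readers unfamiliar with k-means, the idea behind customer segmentation can be sketched in a few lines of plain Python. This is a textbook version of the algorithm (Lloyd's algorithm) on numeric features only, not BigQuery ML's implementation, and the customer data (orders per year, average order value) is invented for illustration.

```python
# Sketch only: a plain-Python k-means (Lloyd's algorithm) on numeric features.
# This is not BigQuery ML's implementation; the customer data is invented.
from math import dist


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centroids, assignments)."""
    # Deterministic start: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    # Each tuple is (orders per year, average order value) for one customer.
    customers = [(1, 20), (2, 25), (1, 22), (30, 200), (28, 210), (32, 190)]
    centroids, segments = kmeans(customers, k=2)
    print("centroids:", centroids)
    print("segments:", segments)
```

On this toy data the occasional low-spend customers and the frequent high-spend customers end up in separate segments. A production system would also need to handle categorical features, feature scaling, and a better choice of starting centroids.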
Additionally, Google announced new model evaluation charts on the BigQuery UI, as well as new pre-processing functions. The latter feature, which is available in private alpha, allows users to define feature transformations during model creation and use SQL functions for common ML-related preprocessing.
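As a rough illustration of defining feature transformations during model creation, the sketch below composes a `CREATE MODEL` statement that standardizes numeric columns inside the model definition. It assumes the `TRANSFORM` clause and the `ML.STANDARD_SCALER` function as BigQuery ML documents them; the exact syntax available in the private alpha may have differed. The model, table, and column names are hypothetical, and the Python only builds the statement text.

```python
# Sketch only: builds a CREATE MODEL statement with a TRANSFORM clause as text.
# Model, table, and column names are hypothetical; nothing is sent to BigQuery.


def create_model_with_transform_sql(model, source_table, label,
                                    scaled_cols, passthrough_cols):
    """Standardize scaled_cols inside the model; pass other columns through unchanged."""
    # One scaler expression per numeric column, aliased with a _scaled suffix.
    transforms = [f"ML.STANDARD_SCALER({c}) OVER() AS {c}_scaled" for c in scaled_cols]
    # Columns used as-is, plus the label, must also appear in the TRANSFORM list.
    transforms += list(passthrough_cols) + [label]
    transform_body = ",\n  ".join(transforms)
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"TRANSFORM(\n  {transform_body}\n)\n"
        f"OPTIONS(model_type='linear_reg', input_label_cols=['{label}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


if __name__ == "__main__":
    print(create_model_with_transform_sql(
        "mydataset.price_model", "mydataset.listings", "price",
        scaled_cols=["square_feet", "age_years"], passthrough_cols=["city"]))
```

Because the transformations are part of the model definition, the same preprocessing is applied when the model is later used for prediction, so analysts do not have to repeat it in every query.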
BigQuery ML democratizes machine learning
Traditionally, using ML to analyze extremely large data sets required users with expertise in ML frameworks and data science programming languages such as R and Python. In other words, enterprises needed to have data scientists on staff, an insurmountable obstacle for many organizations. Even in companies that did employ data scientists, innovation suffered because of silos between the data scientists and the data analysts, the users who were closest to the data and who truly understood what solutions their enterprises needed.
By putting ML tools in the hands of data analysts, BigQuery ML bridges this knowledge gap, democratizes machine learning, and empowers innovation. Data analysts can build models to further organizational objectives and solve the specific business problems they know are getting in the way.
BigQuery ML is still in beta, but Google noted that general availability is coming soon, likely with additional enhancements and features as beta users provide feedback.