When it comes to preparing your healthcare organization for ML workflows in Google Cloud, there are a few hurdles to overcome: Do you have enough data available? Is it formatted in a way you can use? And how are you allowed to access it? While these first steps may require data aggregation, parsing and anonymization, this article will assume you’ve ported your data to the cloud (maybe using Google Cloud Healthcare’s FHIR, HL7v2, or DICOM integrations) and are ready to glean some valuable information from it.
Where do we go from here?
There is no need to start from scratch. Your healthcare organization’s unique goals and existing knowledge of business problems will guide you towards a type of machine learning model to implement first. Your language of choice will likely be Python, and Google Cloud’s batch and online ML engine will allow you to snap in pre-built libraries so that the logistics of getting started are a breeze (again, no need to re-invent machine learning algorithms on your own). Your method of choice will come down to starting with a supervised learning model, an unsupervised learning model, or a combination of the two, and this decision hinges on your input data and business goal. Is your raw training data packaged with pre-labeled output? If it is, supervised learning is the natural place to start; if it is not, unsupervised learning is likely the better fit.
Use Supervised Learning Models to Classify and Predict Outcomes
While classification models are becoming prevalent in everyday consumer technology, such as face recognition and predictive shopping carts, healthcare organizations are just starting to see where these models fit into clinical, administrative and financial workflows.
Supervised learning models rely on input-output pairs to learn a function that maps one to the other. As a healthcare example, the inputs to a classification model might be a dataset containing patients’ diagnosis codes for a specific type of surgical encounter, and the output might classify whether or not a patient would experience a complication during surgery. If there are enough input-output examples for a model to train on (usually thousands), a classification model could go as far as predicting which specific complication a patient might experience.
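As a minimal sketch of the input-output idea described above, the snippet below trains a tiny logistic regression classifier in plain Python. The diagnosis codes (`DX_A`, `DX_B`, `DX_C`), the six encounters, and their complication labels are hypothetical placeholders invented for illustration, and the hand-written training loop is only there to keep the example dependency-free; in practice a pre-built library would do this work.

```python
import math

# Hypothetical placeholder codes, not real diagnosis codes.
VOCABULARY = ["DX_A", "DX_B", "DX_C"]

def one_hot(codes, vocabulary=VOCABULARY):
    """Turn a set of diagnosis codes into a 0/1 feature vector."""
    return [1.0 if code in codes else 0.0 for code in vocabulary]

def predict_proba(x, weights, bias):
    """Estimated probability of the positive class (complication)."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(x, weights, bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Invented input-output pairs: (codes on the encounter, 1 = complication).
encounters = [
    ({"DX_A", "DX_B"}, 1),
    ({"DX_A"}, 1),
    ({"DX_A", "DX_C"}, 1),
    ({"DX_B"}, 0),
    ({"DX_C"}, 0),
    ({"DX_B", "DX_C"}, 0),
]
rows = [one_hot(codes) for codes, _ in encounters]
labels = [label for _, label in encounters]
weights, bias = train_logistic(rows, labels)

# Score a new (also invented) encounter.
new_patient = one_hot({"DX_A", "DX_B", "DX_C"})
print(f"Estimated complication probability: {predict_proba(new_patient, weights, bias):.2f}")
```

A real model would need far more than six encounters, plus a held-out set to check that the learned function generalizes beyond the data it was trained on.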
The beauty of machine learning is that classification algorithms are easier to use now than ever before, packaged in popular open-source libraries and compatible with Google Cloud ML Engine. Some of the most popular ML libraries, such as Google’s TensorFlow, began as private initiatives that were later released as open source.
Take the Facebook-originated FastText library, for example, which allows developers to build custom classifications and predictions based on word groupings. In the previous example of predicting surgical outcomes, you can imagine how this text-based model could scan through patients’ histories and physicals, or even an operative report, and make a similar prediction regarding complications from surgery. The purpose of this library is to allow developers to build an enterprise-scale text model in just a few lines of code.
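To make the text-based approach a little more concrete, here is a hedged sketch of the data preparation step. FastText's supervised mode reads a plain text file in which each line starts with a `__label__` prefix followed by the text. The helper name `to_fasttext_line`, the two notes, and their labels are made up for illustration, and the training call is shown only in comments so the snippet runs without the library installed.

```python
import re

def to_fasttext_line(label, text):
    """Format one labeled note as a line of fastText supervised training data."""
    # Lowercase, then pad punctuation with spaces so it becomes separate tokens.
    cleaned = re.sub(r"([.,;:!?()/])", r" \1 ", text.lower())
    # Collapse repeated whitespace.
    cleaned = " ".join(cleaned.split())
    return f"__label__{label} {cleaned}"

# Invented example notes; real notes would need de-identification first.
notes = [
    ("complication", "Patient with prior MI; significant blood loss noted."),
    ("no_complication", "Routine procedure, patient tolerated well."),
]
lines = [to_fasttext_line(label, text) for label, text in notes]
for line in lines:
    print(line)

# With the fasttext package installed, training would look roughly like this
# (the file name is an assumption):
#   with open("notes.train", "w") as f:
#       f.write("\n".join(lines) + "\n")
#   import fasttext
#   model = fasttext.train_supervised(input="notes.train")
#   model.predict("significant blood loss noted")
```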
Every library, however, has its pros and cons. For datasets that contain numeric data (think charges, white blood cell counts, or body weight), you might pick a different library for regression so that the prediction can be a number instead of a category. Regression models are very common, and as with FastText, several pre-built libraries compatible with Google Cloud are available. A regression model might provide predictions such as the expected recovery room time needed after surgery, and it can always work in parallel with other types of models to provide more insight.
While classification and regression ML models have surged in popularity for clinical areas such as radiological image recognition and precision medicine, they are only beginning to permeate the rest of the healthcare ecosystem. In both examples discussed above, the cost barrier is relatively low: machine learning model code can be relatively lightweight and run in batch, allowing your usage and costs to scale up and down with your needs. The biggest investment is employee time.
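The recovery room example can be sketched as a one-variable least-squares regression. This is only an illustration under stated assumptions: the single input feature (surgery duration) is chosen here for simplicity, the numbers are synthetic, and the function names `fit_line` and `predict` are hypothetical. A real model would use more features and a pre-built regression library.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def predict(x, slope, intercept):
    """Predict a numeric outcome for a new input value."""
    return slope * x + intercept

# Synthetic data: surgery duration and recovery room time, both in minutes.
durations = [30, 60, 90, 120, 150]
recovery_minutes = [42, 53, 71, 84, 100]

slope, intercept = fit_line(durations, recovery_minutes)
estimate = predict(100, slope, intercept)
print(f"Estimated recovery room time for a 100-minute surgery: {estimate:.1f} minutes")
```

Unlike the classification case, the output here is a number on a continuous scale, which is what makes regression the better fit for questions such as "how long" or "how much".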
Unsupervised Learning Models Generate Patterns from Messy Data
For many problems in the healthcare space, we now have mountains of collected data to use as input and no correlated output to attach to it. In situations such as this, we can look to unsupervised learning models (models that infer relationships from raw, unclassified input data) to draw out patterns for human use. One common unsupervised learning approach is called clustering, and clustering algorithms are built into many popular open-source libraries that are compatible with Google Cloud ML Engine.
A clustering algorithm might help provide insights into patient populations or case complexity. Perhaps we have data on all the surgeries performed at a hospital system for a given year. Clustering output could help an administrator understand the comparative resource use of surgeries across the system, commonalities between patient demographics across surgical visits, or actions that can be taken to improve patient outcomes across seemingly different clinical scenarios. Clustering often serves up patterns that are not intuitive or discoverable by the human eye: imagine an unbiased, eager, superintelligent whiz employee who brings an outside perspective to a vague, muddled issue.
Clustering results can be visualized and packaged up into easy-to-understand reports using Google Data Studio, and the link between ML models and actionable insight is as simple as refreshing a dashboard.
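As a rough sketch of how clustering groups unlabeled records, the snippet below runs a bare-bones k-means on six surgeries. The two features (operating room hours and length of stay in days) and all values are invented for illustration. Seeding the centroids with the first `k` points is a simplification to keep the example deterministic; library implementations generally use more careful initialization.

```python
import math

def kmeans(points, k, iterations=20):
    """Minimal k-means: returns (cluster label per point, final centroids)."""
    # Simplifying assumption: seed the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Invented surgeries: (operating room hours, length of stay in days).
surgeries = [
    (1.0, 1.0), (5.0, 6.0), (1.5, 0.5),
    (5.5, 7.0), (1.2, 1.3), (4.8, 6.5),
]
labels, centroids = kmeans(surgeries, k=2)
for surgery, label in zip(surgeries, labels):
    print(surgery, "-> cluster", label)
```

No outcome labels were supplied, yet the short, low-stay cases and the long, high-stay cases end up in separate groups. With real data, features on very different scales would usually be normalized first so that no single column dominates the distance calculation.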
Your Organization May Be Ready for an Experiment with ML in Google Cloud (Even If You Didn’t Know It)
By understanding your healthcare organization’s various business cases for ML, and recognizing that pre-established, Google Cloud-compatible libraries exist to help you hit the ground running, you might find the time is already ripe to implement an ML workflow.