How a Digital Workforce Will Save Healthcare


Enterprises have been digitizing data and processes furiously over the past few decades, and these efforts have unlocked a wealth of capabilities to offer new products and better experiences. Healthcare is no exception. Partially driven by government mandates and subsidies, healthcare organizations systematically bought large electronic medical record systems (EMRs and EHRs) and other software to bring themselves into the digital era. Unfortunately, this tidal wave of adoption, although extraordinarily valuable, had negative side effects as well: 


The digitization of healthcare created silos. Database fortresses were built at every organization. They weren’t built to share. They weren’t built to interoperate – not between software systems, and certainly not between organizations. No connection to insurers. No connection to other providers. And until recently, almost no connection to patients. 

Instead, healthcare employees have taken on the job of the data router, shifting hours from being in front of patients to being in front of computers, shepherding patient data into the right fields. This administrative burden is driving skyrocketing costs, rising attrition, and a backlog of work in an industry already suffering from razor-thin margins. Healthcare can’t continue to operate like this – there must be a solution that rescues nearly a trillion dollars of administrative costs and reallocates these precious resources to the delivery of care, the creation of new drugs and therapies, and the research to eradicate diseases.


The answer is an AI-powered digital workforce.

Today, most healthcare executives are familiar with robotic process automation, or RPA – it’s used to automate common workflows or business practices like patient scheduling, supply chain management, claims management, and more. That’s because many of the time-consuming, manual processes that make up healthcare administration are simple, rule-based and high volume – the perfect candidate processes for automation. But for many organizations looking to deploy artificial intelligence, RPA alone will not allow them to realize the full benefits of AI – a digital workforce is required.


A digital workforce goes beyond traditional RPA in three very important ways:
  1. It uses deep learning
  2. It gets smarter over time and adjusts its work
  3. It interacts with human management

Not every task a digital employee performs could be handled by a human – in many cases, a digital employee uses deep learning techniques to accomplish far more complex tasks

Although a digital employee depends on RPA as a building block of its capabilities, it leverages other advanced technologies to handle more complex tasks that RPA can’t accomplish alone. For instance, while RPA can quickly and accurately process large volumes of data, Olive, the first AI-powered digital employee built for healthcare, layers a degree of artificial cognition on top of an automation, allowing her to make decisions or take action with cognitive “thinking” involved.

The processes involved in deep learning are similar to those of data mining and predictive modeling – this is how a digital employee gets smarter over time. Deep learning techniques provide better and faster information that improves efficiency and capacity and reduces costs: they surface insights into bottlenecks – and the reasons behind them – identifying systemic, recurring issues and making adjustments or recommendations to solve them. 

A digital workforce can learn and adapt, changing its work based on new intelligence.

Most of the value of a digital worker is created after a bot is deployed – that’s because, much like a human employee, if a bot is doing the same thing on day 100 of employment as it was on day 1, a huge opportunity is lost. Through predictive analytics, deep learning, and a continual stream of insights, a digital employee gets smarter over time, providing lasting value.

Olive turns insights into actionable intelligence, identifying potential problems from a mile away, so organizations are learning about solutions before they even learn about the problem. By consuming large amounts of historical data already in your system, Olive finds trends and data anomalies in your workflows and learns to respond the same way a human would – only smarter, faster, and more accurately – making continual improvements to provide better, more meaningful data and insight as she learns. And by pairing a digital employee with key hospital administrators, they can streamline and improve the management of data-heavy tasks like insurance eligibility checks or patient scheduling, using data to uncover and resolve recurring issues.


A digital employee interacts with managers to provide business intelligence and recommendations on improved ways to handle tasks, so they continue to generate value after deployment.
 

Olive works with human managers to determine the best way to communicate actionable insights, and that intelligence gives organizations a ‘Decision Advantage’ in where and how they apply their resources – toward current workflow improvements or new candidate processes for automation. 

For instance, at one health system, Olive was hired to automate claim status checks. But unlike a traditional RPA bot, as soon as Olive was live, she started collecting data that became actionable insights – like dollar amounts associated with denials – to communicate back to her manager for process improvement opportunities. Based on these learnings, her manager recommended that she focus on a specific subset of denials, which led to another key discovery: millions of dollars of denials stemmed from a specific drug denial due to missing prior authorizations and medical necessity. This insight allowed the hospital to target the specific department in its organization where this recurring issue could be resolved. This “always on” analysis of information allows a digital employee to proactively offer new solutions for workflow improvements as she gets smarter over time.

A digital employee has Global Awareness and can connect disparate sets of information

Lastly, “global awareness” is another important concept that’s core to a digital employee – the understanding or awareness of information across multiple networks, systems, databases, or contexts. Interoperability is a consistent and growing challenge facing healthcare, and the ability of our digital employees to transcend those silos opens up great opportunity. One example is quickly identifying a portal outage and alerting managers – as well as other organizations where Olives are employed – before a failure. In the future, it could mean knowing a particular patient’s identity across multiple doctors’ offices or hospitals – even across different systems globally. This identification and matching of people is monumentally important to building the interoperability our industry so desperately needs.


That’s why we built Olive: to work side by side with healthcare employees, with access to a limitless amount of data. 

As AI becomes more advanced – using applications humans have already developed to organize and interpret larger datasets than a human ever could – the opportunity to build and scale a digital workforce is greater than ever before. And at Olive, we think healthcare employees should handle the functions that are uniquely suited to humans, not the job of data entry clerk or data router. Olive can perform these tasks much more accurately and efficiently, working to resolve recurring issues over time and allowing human employees to focus on higher-value initiatives.

 

Working alongside healthcare employees, Olive is trained to think and make complex decisions that are driven by data. She never misses a day of work. She never makes unprogrammed mistakes. And every Olive learns collectively, like a network, so that healthcare organizations never have to solve the same problem twice. 

We’re making healthcare more efficient, more affordable, and more human with a growing digital workforce, so humans finally have the time, energy, and bandwidth to focus on what matters most: the patient experience. Just think of all the time digital employees will give back to our human employees – clinicians, providers, administrators, payers, and more. And with every organization that employs a digital employee, carving millions of dollars out of the cost of healthcare comes closer to reality.

If you want to learn more about Olive, contact us to schedule a demo.

 

Machine Learning Basics Part 3: Basic model training using Linear Regression and Gradient Descent


If you missed part one in the series, you can start here (Machine Learning Basics Part 1: An Overview).

Linear Regression is a straightforward way to find the linear relationship between one or more variables and a predicted target using a supervised learning algorithm. In simple linear regression, the model predicts the relationship between two variables. In multiple linear regression, additional variables that influence the relationship can be included. Output for both types of linear regression is a value within a continuous range.

Simple Linear Regression: Linear Regression works by finding the best fit line to a set of data points.

For example, a plot of the linear relationship between study time and test scores allows the prediction of a test score given the number of hours studied.


To calculate this linear relationship, use the following:

ŷ = θ0 + θ1x

In this example, ŷ is the predicted value, x is a given data point, θ1 is the feature weight, and θ0 is the intercept point, also known as the bias term. The best fit line is determined by using gradient descent to minimize the cost function. This is a complex way of saying the best line is the one that makes predictions closest to actual values. In linear regression, the cost function is calculated using mean squared error (MSE):

MSE(θ) = (1/m) Σ (θᵀx⁽ⁱ⁾ – y⁽ⁱ⁾)²

Mean Squared Error for Linear Regression¹

In the equation above, the letter m represents the number of data points, θᵀ is the transpose of the model parameters theta, x is the feature value, and y is the actual value (θᵀx gives the prediction). Essentially, the line is evaluated by the distance between the predicted values and the actual values. Any difference between predicted value and actual value is an error. Minimizing mean squared error increases the accuracy of the model by selecting the line where the predictions and actual values are closest together.
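As a minimal sketch, the prediction line and its mean squared error can be computed in a few lines of Python. The study-time data and the theta values below are hypothetical (assumed, not fitted):

```python
import numpy as np

# Hypothetical study-time example: hours studied (x) and actual test scores (y).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 57.0, 61.0, 68.0, 71.0])

theta0 = 47.0  # intercept point (bias term), assumed
theta1 = 4.9   # feature weight, assumed

y_hat = theta0 + theta1 * x       # predicted value for each data point
mse = np.mean((y_hat - y) ** 2)   # mean squared error over the m = 5 points
```

A lower `mse` means the line’s predictions sit closer to the actual scores; gradient descent searches for the theta values that minimize it.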

Gradient descent is the method of iteratively adjusting the parameter theta (θ) to find the lowest possible MSE. A random parameter is used initially, and each iteration of the algorithm takes a small step – the size of which is determined by the learning rate – to gradually change the value of the parameter until the MSE has reached the minimum value. Once this minimum is reached, the algorithm is said to have converged.
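The loop above can be sketched as batch gradient descent for simple linear regression. The toy data is drawn from an assumed noiseless relationship y = 2x + 1, and the learning rate and iteration count are arbitrary choices for illustration:

```python
import numpy as np

# Toy data from an assumed relationship y = 2x + 1 (no noise).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0
m = len(x)

theta0, theta1 = 0.0, 0.0  # initial parameters
lr = 0.05                  # learning rate controls the step size

for _ in range(5000):
    y_hat = theta0 + theta1 * x
    # Partial derivatives of the MSE with respect to each parameter
    grad0 = (2.0 / m) * np.sum(y_hat - y)
    grad1 = (2.0 / m) * np.sum((y_hat - y) * x)
    theta0 -= lr * grad0   # step downhill on the cost surface
    theta1 -= lr * grad1
```

After enough iterations the parameters settle near the true intercept (1) and slope (2), at which point the algorithm has converged.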

 

Be aware that choosing a learning rate that is smaller than ideal will result in an algorithm that converges extremely slowly because the steps it takes with each iteration are too small. Choosing a learning rate that is too large can result in a model that never converges because step size is too large and it can overshoot the minimum.

Learning Rate set too small¹

Learning Rate set too large¹
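The effect of the learning rate can be seen empirically by running the same descent with different step sizes. This sketch reuses the toy y = 2x + 1 data; the specific rates are assumptions chosen to land in each regime:

```python
import numpy as np

def run_gd(lr, steps=200):
    """Batch gradient descent on toy data y = 2x + 1; returns the final MSE."""
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x + 1.0
    m = len(x)
    t0, t1 = 0.0, 0.0
    for _ in range(steps):
        err = (t0 + t1 * x) - y
        t0 -= lr * (2.0 / m) * np.sum(err)
        t1 -= lr * (2.0 / m) * np.sum(err * x)
    return np.mean(((t0 + t1 * x) - y) ** 2)

mse_small = run_gd(lr=1e-4)  # too small: barely moves in 200 steps
mse_good = run_gd(lr=0.05)   # reasonable: converges to a low MSE
mse_large = run_gd(lr=0.2)   # too large: overshoots the minimum and diverges
```

Comparing the three final MSE values shows exactly the behavior described above: the tiny rate is still far from the minimum, the moderate rate converges, and the large rate blows up.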

 

Multiple Linear Regression: Multiple linear regression, or multivariate linear regression, works similarly to simple linear regression but adds additional features. If we revisit the previous example of hours studied to predict test scores, a multiple linear regression example could be using hours studied and hours of sleep the night before the exam to predict test scores. This model allows us to use multiple distinct features of a single data point to make a prediction about that data point. This can be represented visually as finding the plane that best fits the data. In the example below, we can see the relationship between horsepower, weight, and miles per gallon.

Multiple Linear Regression³
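The two-feature study example can be sketched with a closed-form least-squares solve instead of gradient descent. The data below is fabricated from an assumed relationship (score = 40 + 5·study + 2·sleep) purely to show that the fit recovers one weight per feature plus a bias:

```python
import numpy as np

# Columns: bias term (constant 1), hours studied, hours of sleep.
X = np.array([
    [1.0, 2.0, 8.0],
    [1.0, 4.0, 6.0],
    [1.0, 6.0, 7.0],
    [1.0, 8.0, 5.0],
])
# Scores generated from the assumed relationship score = 40 + 5*study + 2*sleep.
y = 40.0 + 5.0 * X[:, 1] + 2.0 * X[:, 2]

# Least-squares solve finds theta = (bias, study weight, sleep weight).
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Geometrically, `theta` describes the plane that best fits the data points in (study, sleep, score) space.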

Thanks for reading our machine learning series, and keep an eye out for our next blog!

 

References:

  1. Géron, Aurélien (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O’Reilly.
  2. https://www.mathworks.com/help/stats/regress.html
  3. https://xkcd.com/1725/
Machine Learning Basics Part 1: An Overview


This is the first in a series of Machine Learning posts meant to act as a gentle introduction to Machine Learning techniques and approaches for those new to the subject. The material is strongly sourced from Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron and from the Coursera Machine Learning class by Andrew Ng. Both are excellent resources and are highly recommended.

Machine Learning is often defined as “the field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).

More practically, it is a program that employs a learning algorithm or neural net architecture that once trained on an initial data set, can make predictions on new data.

Common Learning Algorithms:¹

Linear and polynomial regression

Logistic regression

K-nearest neighbors

Support vector machines

Decision trees

Random forests

Ensemble methods

While the above learning algorithms can be extremely effective, more complex problems – like image classification and natural language processing (NLP) – often require a deep neural net approach.

Common Neural Net (NN) Architectures:¹

Feed forward NN

Convolutional NN (CNN)

Recurrent NN (RNN)

Long short-term memory (LSTM)

Autoencoders

We will go into further detail on the above learning algorithms and neural nets in later blog posts.

Some Basic Terminology:

Features – These are attributes of the data. For example, a common dataset used to introduce Machine Learning techniques is the Pima Indians Diabetes dataset, which is used to predict the onset of diabetes given additional health indicators. For this dataset, the features are pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, etc.

Labels – These are the desired model predictions. In supervised training, this value is provided to the model during training so that it can learn to associate specific features with a label and increase prediction accuracy. In the Pima Indians Diabetes example, this would be a 1 (indicating diabetes onset is likely) or a 0 (indicating low likelihood of diabetes).

Supervised Learning – This is a learning task in which the training set used to build the model includes labels. Regression and classification are both supervised tasks.

Unsupervised Learning – This is a learning task in which training data is not labeled. Clustering, visualization, dimensionality reduction and association rule learning are all unsupervised tasks.

Some Supervised Learning Algorithms:¹

K-nearest neighbors

Linear regression

Logistic regression

Support vector machines (SVMs)

Decision trees and random forests

Neural networks

Unsupervised Learning Algorithms:¹

Clustering

• K-means

• Hierarchical cluster analysis (HCA)

• Expectation maximization

Visualization and Dimensionality Reduction

• Principal component analysis (PCA)

• Kernel PCA

• Locally-linear embedding (LLE)

• t-distributed Stochastic Neighbor Embedding (t-SNE)

Association Rule Learning

• Apriori

• Eclat
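To make the unsupervised idea concrete, here is a minimal sketch of one K-means-style iteration on hypothetical 1-D data. No labels are provided; the points are grouped purely by similarity, and the starting centers are assumed:

```python
import numpy as np

# Six unlabeled points that visibly form two groups.
data = np.array([1.0, 1.2, 0.9, 8.0, 8.3, 7.9])
centers = np.array([1.0, 8.0])  # two assumed starting centers

# Assignment step: each point joins the cluster of its nearest center.
labels = np.argmin(np.abs(data[:, None] - centers[None, :]), axis=1)

# Update step: each center moves to the mean of its assigned points.
centers = np.array([data[labels == k].mean() for k in range(2)])
```

Real K-means repeats these two steps until the assignments stop changing; contrast this with supervised learning, where the labels would be given up front.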

Dimensionality Reduction: This is the act of simplifying data without losing important information. An example of this is feature extraction, where correlated features are merged into a single feature that conveys the importance of both. For example, if you are predicting housing prices, you may be able to combine square footage with the number of bedrooms to create a single feature representing living space.
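One way to sketch that feature-extraction idea is to project two correlated features onto their first principal component. The housing numbers below are hypothetical, and PCA is just one technique for the merge:

```python
import numpy as np

# Two correlated features from hypothetical house listings.
sqft = np.array([800.0, 1200.0, 1500.0, 2000.0, 2400.0])
bedrooms = np.array([1.0, 2.0, 3.0, 3.0, 4.0])

X = np.column_stack([sqft, bedrooms])
Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize both features

# First right singular vector = direction of maximum variance (PCA).
_, _, vt = np.linalg.svd(Xs, full_matrices=False)
living_space = Xs @ vt[0]  # one combined "living space" feature per house
```

The single `living_space` column tracks both original features closely, so a downstream model can use one input instead of two.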

Batch Learning: This is a system that is incapable of learning incrementally and must be trained using all available data at once¹. To learn new data, it must be retrained from scratch.

Online Learning: This is a system that is trained incrementally by feeding it data instances sequentially. This system can learn new data as it arrives.
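An online learner can be sketched as a model that takes one stochastic gradient step per arriving instance, so no full retraining is needed. The stream below is simulated from an assumed noiseless relationship y = 3x, with an arbitrary learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, theta1 = 0.0, 0.0  # current model parameters
lr = 0.01                  # learning rate for each incremental step

for _ in range(20000):
    x = rng.uniform(0.0, 1.0)           # the next instance arrives
    y = 3.0 * x                         # its observed target
    err = (theta0 + theta1 * x) - y
    theta0 -= lr * 2.0 * err            # update immediately on this one
    theta1 -= lr * 2.0 * err * x        # instance, then discard it
```

Because each instance is processed and discarded, the same loop keeps working as new data arrives; a batch learner would instead have to refit on the whole accumulated dataset.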

Underfitting:  This is what happens when you create a model that generalizes too broadly. It does not perform well on the training or test set.

Overfitting:  This is what occurs when you create a model that performs well on the training set, but has become too specialized and no longer performs well on new data.

Common Notations:

m: The total number of instances in the dataset

X: A matrix containing all of the feature values of every instance of the dataset

x(i): A vector containing all of the feature values of a single instance of the dataset, the ith instance.

y: A vector containing the labels of the dataset. This is the value the model should predict.

References:

  1. Géron, Aurélien (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O’Reilly.
