Machine Learning Basics Part 3: Basic model training using Linear Regression and Gradient Descent


If you missed part one in the series, you can start here (Machine Learning Basics Part 1: An Overview).

Linear Regression is a straightforward way to find the linear relationship between one or more variables and a predicted target using a supervised learning algorithm. In simple linear regression, the model predicts the relationship between two variables. In multiple linear regression, additional variables that influence the relationship can be included. Output for both types of linear regression is a value within a continuous range.

Simple Linear Regression: Linear Regression works by finding the best fit line to a set of data points.

For example, a plot of the linear relationship between study time and test scores allows the prediction of a test score given the amount of hours studied.

To calculate this linear relationship, use the following:

In this example, ŷ is the predicted value, x is a given data point, θ1 is the feature weight, and θ0 is the intercept point, also known as the bias term. The best fit line is determined by using gradient descent to minimize the cost function. This is a complex way of saying the best line is the one that makes predictions closest to the actual values. In linear regression, the cost function is calculated using mean squared error (MSE):

Mean Squared Error for Linear Regression1

In the equation above, the letter m represents the number of data points, 𝛉T is the transpose of the model parameter vector theta, x is the feature vector, and y is the actual target value. Essentially, the line is evaluated by the distance between the predicted values and the actual values. Any difference between a predicted value and an actual value is an error. Minimizing mean squared error increases the accuracy of the model by selecting the line where the predictions and actual values are closest together.
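To make the cost function concrete, here is a small NumPy sketch; the study-hours data and candidate parameters below are invented for illustration:

```python
import numpy as np

# Invented data: hours studied (x) and test scores (y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 61.0, 70.0, 79.0, 88.0])

# Candidate parameters: intercept theta0 and feature weight theta1
theta0, theta1 = 40.0, 9.0

# Predicted values for each data point
y_pred = theta0 + theta1 * x

# MSE: the mean of the squared differences between predictions and actuals
mse = np.mean((y_pred - y) ** 2)
print(mse)  # → 9.0 (every prediction is off by exactly 3 points)
```

A better-fitting line would drive this number toward zero, which is exactly what gradient descent works toward.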

Gradient descent is the method of iteratively adjusting the parameter theta (𝛉) to find the lowest possible MSE. A random parameter is used initially and each iteration of the algorithm takes a small step—the size of which is determined by the learning rate—to gradually change the value of the parameter until the MSE has reached the minimum value. Once this minimum is reached, the algorithm is said to have converged.
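A minimal batch gradient descent sketch for a one-feature model; the learning rate and iteration count are illustrative choices, not prescribed values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 61.0, 70.0, 79.0, 88.0])  # exactly y = 43 + 9x
m = len(x)

theta0, theta1 = 0.0, 0.0  # arbitrary starting parameters
learning_rate = 0.02       # size of the step taken each iteration

for _ in range(20000):
    error = (theta0 + theta1 * x) - y
    # Partial derivatives of MSE with respect to each parameter
    grad0 = (2.0 / m) * error.sum()
    grad1 = (2.0 / m) * (error * x).sum()
    # Step each parameter a small amount against its gradient
    theta0 -= learning_rate * grad0
    theta1 -= learning_rate * grad1

print(round(theta0, 3), round(theta1, 3))  # → 43.0 9.0
```

After enough iterations the parameters stop changing meaningfully, which is the convergence described above.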


Be aware that choosing a learning rate that is smaller than ideal will result in an algorithm that converges extremely slowly because the steps it takes with each iteration are too small. Choosing a learning rate that is too large can result in a model that never converges because step size is too large and it can overshoot the minimum.

Learning Rate set too small1

Learning Rate set too large1


Multiple Linear Regression: Multiple linear regression, or multivariate linear regression, works similarly to simple linear regression but adds additional features. If we revisit the previous example of hours studied to predict test scores, a multiple linear regression example could be using hours studied and hours of sleep the night before the exam to predict test scores. This model allows us to use multiple features of a single data point to make a prediction about that data point. This can be represented visually as finding the plane that best fits the data. In the example below, we can see the relationship between horsepower, weight, and miles per gallon.
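As a sketch, the plane-fitting version can be solved directly with NumPy's least-squares routine; the study and sleep figures below are made up:

```python
import numpy as np

# Made-up features: [hours studied, hours of sleep before the exam]
X = np.array([[1.0, 8.0],
              [2.0, 7.0],
              [3.0, 6.0],
              [4.0, 8.0],
              [5.0, 5.0]])
y = np.array([60.0, 64.0, 68.0, 84.0, 80.0])  # test scores

# Prepend a column of ones so the bias term is learned with the weights
X_b = np.c_[np.ones((len(X), 1)), X]

# theta = [bias, weight_study, weight_sleep], chosen to minimize MSE
theta, *_ = np.linalg.lstsq(X_b, y, rcond=None)

# Predict a score for 3.5 hours of study and 7 hours of sleep
new_student = np.array([1.0, 3.5, 7.0])
print(round(float(new_student @ theta), 1))  # → 76.0
```

Gradient descent would arrive at the same parameters; the closed-form solver is just a convenient shortcut at this small scale.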

Multiple Linear Regression3

Thanks for reading our machine learning series, and keep an eye out for our next blog!



  1. Geron, Aurelien (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O’Reilly.
George S. Barrett Elected to Olive’s Board of Directors


COLUMBUS OH, January 17th, 2019 — Olive, the only healthcare-focused artificial intelligence and robotic process automation company, is pleased to announce the appointment of George S. Barrett, former chairman and chief executive officer of Cardinal Health, Inc., as a new director, effective immediately.

“I am exceptionally pleased to welcome George as a new Board member,” said Sean Lane, CEO of Olive. “George brings a wealth of experience and expertise in healthcare transformation and will be a tremendous asset to Olive as we continue our mission of radically reducing the operational costs of healthcare through artificial intelligence. George complements our existing board of directors’ skills and experiences, and will provide valuable perspective as we continue to execute our growth strategies.”

“I’m delighted to be joining the Olive board in its mission to improve operational efficiency in the healthcare system.  By streamlining workflows and reducing administrative errors, Olive’s technologies will help enable more time and resources within hospitals to be directed to what’s most important — patient care,” said George Barrett.

The appointment of George Barrett adds to the momentum the organization has seen in 2018, including a $32.8 million Series D fundraising round and adoption in hospitals and healthcare organizations, large and small, across the United States.

Background on George S. Barrett

Mr. Barrett is the former chairman and chief executive officer of Cardinal Health, Inc., a role he held from August 2009 through the end of December 2017, when he was named executive chairman of the board, a role he held through early November 2018.

Under Barrett’s leadership, Cardinal Health grew to rank 15th on the Fortune 500 with annual revenue of more than $130 billion and 50,000 employees in nearly 60 countries. During his tenure, the company was recognized as one of Fortune’s World’s Most Admired Companies, Forbes America’s Best Employers, The Wall Street Journal’s Drucker Institute’s Top Companies for Corporate Social Responsibility, National Association for Female Executives’ Top Companies for Executive Women and Chief Executive Magazine’s Top Companies for Talent Development, as well as a number of awards for its public health and philanthropic initiatives.

Barrett is on the boards of Target Corporation, Nationwide Children’s Hospital, Brown University and Children’s Hospitals’ Solutions for Patient Safety. He is vice chair of the board of trustees of The Conference Board and a trustee of the Committee for Economic Development. Additionally, he serves on the National Academy of Medicine’s President’s Advisory Council on Healthy Longevity. Barrett is also a member of the governing committee of The Columbus Foundation, one of the largest community foundations in the United States, as well as participates in leadership positions in other community initiatives in the Columbus area.   

Barrett earned his Bachelor of Arts degree from Brown University and a Master of Business Administration from New York University. He holds an Honorary Doctor of Humane Letters degree from Long Island University’s Arnold & Marie Schwartz College of Pharmacy and Health Sciences and an Honorary Doctorate in Fine Arts from the Columbus College of Art & Design.

Olive’s other directors include Chris Olsen, Partner, Drive Capital; Billy Deitch, Principal, Healthcare, Oak HC/FT; John Kuelper, Investment Director, Ascension Ventures; and Dr. David Agus, Physician and Professor, University of Southern California.



Olive is a healthcare-specific artificial intelligence and process automation company that empowers healthcare organizations to improve efficiency and patient care while reducing costly administrative errors. Olive acts as the intelligent router between systems and data by automating repetitive, high-volume tasks and workflows, providing true interoperability. Olive is helping some of the nation’s top healthcare and hospital organizations reduce data and billing errors, eliminate denials for no coverage, improve cash collections and more. To learn more, visit


Machine Learning Basics Part 2: Regression and Classification


If you missed part one in the series, you can start here (Machine Learning Basics Part 1: An Overview).


Common real-world problems that are addressed with regression models are predicting housing values, financial forecasting, and predicting travel commute times. Regression models can have a single input feature, referred to as univariate, or multiple input features, referred to as multivariate. When evaluating a regression model, performance is determined by calculating the mean squared error (MSE) cost function. MSE is the average of the squared errors of each data point from the hypothesis, or simply how far each prediction was from the desired outcome. A model that has a high MSE cost function fits the training data poorly and should be revised.

A visual representation of MSE:

In the image above,1 the actual data point values are represented by red dots. The hypothesis, which is used to make any predictions on future data, is represented by the blue line. The difference between the two is indicated by the green lines. These green lines are used to compute MSE and evaluate the strength of the model’s predictions.

Regression Problem Examples:

  • Given BMI, current weight, activity level, gender, and calorie intake, predict future weight.
  • Given calorie intake, fitness level, and family history, predict percent probability of heart disease.


Commonly Used Regression Models:

Linear Regression: This is a model that represents the relationship between one or more input variables and a linear scalar response. Scalar refers to a single real number.

Ridge Regression: This is a linear regression model that incorporates a regularization term to prevent overfitting. If the regularization term (𝝰) is set to 0, ridge regression acts as simple linear regression. Note that data must be scaled before performing ridge regression.
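Since scaling matters here, a Scikit-Learn sketch that pairs a standard scaler with ridge regression; the synthetic data and the alpha value are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Two features on very different scales (e.g. age-like and income-like values)
X = np.column_stack([rng.normal(40, 10, 200), rng.normal(50000, 15000, 200)])
y = 2.0 * X[:, 0] + 0.001 * X[:, 1] + rng.normal(scale=1.0, size=200)

# Scale features to zero mean and unit variance before regularizing
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(X, y)
print(round(model.score(X, y), 2))  # R² on the training data
```

Without the scaler, the regularization term would penalize the two weights very unevenly because of their differing feature scales.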

Lasso Regression: Lasso is an abbreviation for least absolute shrinkage and selection operator regression. Similar to ridge regression, lasso regression includes a regularization term. One benefit to using lasso regression is that it tends to set the weights of the least important features to zero, effectively performing feature selection.2 You can implement lasso regression in Scikit-Learn using the built-in model library.
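A sketch of what that looks like in Scikit-Learn; the data is synthetic and the alpha value is an arbitrary illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
# Only the first feature matters; the other two are pure noise
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.5)  # alpha controls regularization strength
model.fit(X, y)
print(model.coef_)  # the weights for the two noise features are driven to 0
```

Inspecting `coef_` after fitting is a quick way to see which features lasso kept.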

Elastic Net: This model uses a regularization term that is a mix of both ridge and lasso regularization terms. By setting r=0 the model behaves as a ridge regression, and setting r=1 makes it behave like a lasso regression. This additional flexibility in customizing regularization can provide the benefits of both models.2 Implement elastic net in Scikit-Learn using the built-in model library. Select an alpha value to control regularization and an l1_ratio to set the mix ratio r.
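A comparable Scikit-Learn sketch for elastic net; the alpha and l1_ratio below are illustrative values, not recommendations:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# alpha sets overall regularization strength; l1_ratio is the mix ratio r
# (l1_ratio=0 behaves like ridge, l1_ratio=1 behaves like lasso)
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_)  # close to the true weights [3.0, 2.0], shrunk slightly
```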

Classification: Classification problems predict a class. They can also return a probability value, which is then used to determine the class most likely to be correct. For classification problems, model performance is determined by calculating accuracy.

model accuracy = (correct predictions / total predictions) × 100
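In code, the accuracy calculation is just a ratio of matches; a plain-Python sketch:

```python
def accuracy(predictions, actuals):
    """Percentage of predictions that match the actual class labels."""
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals) * 100

# Three of four predictions match the actual labels
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # → 75.0
```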

Classification Problem Examples: Classification has its benefits for predictions in the healthcare industry. For example, given a dataset with features including glucose levels, pregnancies, blood pressure, skin thickness, insulin, and BMI, predictions can be made on the likelihood of the onset of diabetes. Because this prediction should be a 0 or 1, it is considered a binary classification problem.

Commonly Used Classification Models:

Logistic Regression: This is a model that uses a regression algorithm, but is most often used for classification problems since its output can be used to determine the probability of belonging to a certain class.2 Logistic regression uses the sigmoid function to output a value between 0 and 1. If the probability is >= 0.5 that an instance is in the positive class (represented by a 1), the model predicts 1. Otherwise, it predicts 0.
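A small NumPy sketch of the sigmoid-and-threshold step; the parameter values are invented for illustration:

```python
import numpy as np

def sigmoid(t):
    # Squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-t))

def predict_class(theta, x):
    probability = sigmoid(np.dot(theta, x))
    return 1 if probability >= 0.5 else 0

theta = np.array([-4.0, 1.0])   # invented parameters: bias weight, feature weight
x = np.array([1.0, 6.0])        # 1 for the bias term, then the feature value
print(predict_class(theta, x))  # dot product is 2.0, sigmoid(2.0) ≈ 0.88 → 1
```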

Softmax Regression: This is a logistic regression model that can support multiple classes. Softmax predicts the class with the highest estimated probability. It can only be used when classes are mutually exclusive.2
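The softmax step itself is a short function; a NumPy sketch with invented class scores:

```python
import numpy as np

def softmax(scores):
    # Subtracting the max is a standard numerical-stability trick;
    # it does not change the resulting probabilities
    exps = np.exp(scores - np.max(scores))
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # one raw score per class
probabilities = softmax(scores)
print(probabilities.argmax())        # → 0, the class with the highest probability
```

The probabilities always sum to 1, which is why softmax only makes sense when the classes are mutually exclusive.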

Naive Bayes: This is a classification system that assumes that the value of a feature is independent from the value of any other feature and ignores any possible correlations between features in making predictions. The model then predicts the class with the highest probability.4

Support Vector Machines (SVM): This is a classification system that identifies a decision border, or hyperplane, as wide as possible between class types and predicts class based on the side of the border that any point falls on. This system does not use probability to assign a class label. SVM models can be fine-tuned by adjusting kernel, regularization, gamma, and margin. We will explore these hyperparameters further in an upcoming blog post focused solely on SVM. Note that SVM can also be used to perform regression tasks.
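A minimal Scikit-Learn SVM sketch on two made-up, well-separated clusters; the kernel and C values are illustrative hyperparameter choices:

```python
from sklearn.svm import SVC

# Made-up, linearly separable points: [feature1, feature2] → class 0 or 1
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]

# kernel and C (regularization) are two of the tunable hyperparameters
model = SVC(kernel="linear", C=1.0)
model.fit(X, y)
print(model.predict([[2, 2], [6, 5]]))  # → [0 1]
```

Each new point is classified by which side of the learned hyperplane it falls on, with no probability involved.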

Decision Trees and Random Forests: A decision tree is a model that separates data into branches by asking a binary question at each fork. For example, in a fruit classification problem one tree fork could ask if a fruit is red. Each fruit instance would either go to one branch for yes or the other for no. At the end of each branch is a leaf with all of the training instances that followed the same decision path. The common problem of overfitting can often be avoided by combining multiple trees into a random forest and aggregating the predictions of all the trees, for example by majority vote.
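A sketch of the fruit example with Scikit-Learn's random forest; the features and labels are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical fruit data: [is_red (0 or 1), diameter in cm]
X = [[1, 7], [1, 8], [0, 7], [0, 3], [1, 2], [0, 2]]
y = ["apple", "apple", "pear", "lime", "cherry", "lime"]

# An ensemble of decision trees; the forest aggregates their predictions
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)
print(forest.predict([[1, 7.5]]))  # a red, 7.5 cm fruit
```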

Neural Networks (NN): This is a model composed of layers of connected nodes. The model takes information in via an input layer and passes it through one or more hidden layers composed of nodes. These nodes are activated by their input, make some determination, and generate output for the next layer of nodes. Connections between nodes have edges, which have a weight that can be adjusted to influence learning. A bias term can also be added to the edges to create a threshold theta (𝛉), which is customizable and determines if the node’s output will continue to the next layer of nodes. The final layer is the output layer, which generates class probabilities and makes a final prediction. When a NN has two or more hidden layers, it’s called a deep neural network. There are multiple types of neural networks and we will explore this in more detail in later blog posts.

K-nearest Neighbor: This model evaluates a new data point by its proximity to training data points and assigns a class based on the majority class of its closest neighbors as determined by feature similarity. K is an integer set when the model is built and determines how many neighbors the model should consider. The boundary circle is set when it includes k neighbors.
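Because the mechanics are simple, k-nearest neighbors can be sketched directly in NumPy; the points and labels below are invented:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, new_point, k=3):
    # Distance from the new point to every training point
    distances = np.linalg.norm(X_train - new_point, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Majority vote among those k neighbors decides the class
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]])
y_train = ["A", "A", "B", "B"]
print(knn_predict(X_train, y_train, np.array([1.2, 1.5])))  # → A
```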


  2. Geron, Aurelien (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O’Reilly.
The Problem with Healthcare


Over the past two decades, the use of software systems has led to a paradigm shift in healthcare, particularly in the United States. Medical records went digital at a rapid rate, driven in large part by federal mandates (e.g. the American Recovery and Reinvestment Act pushing “meaningful use” of electronic health records). As the industry responded to the operational and legislative incentives to digitize medical records, a number of EMR (Electronic Medical Record) systems emerged to meet demands. Overall, this push towards a digital age in medical record keeping has been a success; over 95% of all hospitals in the United States have certified Health IT according to The Office of the National Coordinator for Health Information Technology.

While this shift to digital had significant benefits, it was not without its drawbacks. One of the side effects of the switch to digital was that, as a whole, staff now often spend more time at a PC than with patients. This is due in large part to one of the main problems in healthcare: the lack of interoperability between systems. Lack of interoperability means that the various systems in a healthcare environment are often unable to communicate with one another in an efficient and scalable manner. This leads to significant friction and manual action in business processes.

EMR systems were designed to be secure and reliable means of storing and recording sensitive patient data. Interoperability wasn’t a primary requirement when these systems were designed, but with the benefit of hindsight, we can now see the shift to digital record keeping and lack of EMR interoperability has created a groundswell of administrative workloads across the organization. An organization full of siloed systems leads to an environment where data transfers between systems can become tedious, time-consuming, and costly. In this piece, we will review this problem in more detail and dive into one of the most promising solutions: Artificial Intelligence (AI).

Understanding the impact of poor interoperability

In modern healthcare facilities, it is a given that many employees with clinical skills, like nurses and technicians, will spend a non-trivial amount of time moving data from one format or system to another. Humans now effectively act as data routers and processors between discrete systems, moving from one interface to another to type and retype data. Not only does this keep them away from patients, it significantly contributes to increased administrative costs.

To help quantify the astounding administrative costs impacting healthcare in the United States, check out the statistics cited in this New York Times article. The article cites research that puts the administrative cost of healthcare in the U.S. higher than anywhere else in the world and data that indicates that in the U.S. administrative costs account for over 25% of healthcare spending while our neighbor to the North, Canada, spends about 12% of their healthcare dollars on administration.

This isn’t to say that the shift in how data is input into EMR systems will solve all of the nation’s healthcare spending woes, or even to suggest that the makers of EMR software are at fault (after all, they built solutions based on market demands and requirements). The point here is that today this is an area where healthcare organizations have significant bottlenecks and inefficiencies in business processes. Viewed differently, given the right solution, this is an opportunity for healthcare businesses to reduce cost and drive down overhead.

As we will see, automated intelligence is an ideal way to address many of these bottlenecks and organizations that adopt AI to help optimize their work processes can take advantage of this opportunity. In so doing, they will be able to save a significant amount of time and money, while also freeing up staff to do the more important and creative work humans excel at.  The takeaway here is this is one of the health care issues we can solve pragmatically without the need for legislators to take action (which can always be a roadblock when dealing with problems within healthcare).

Understanding how AI can solve interoperability issues with healthcare today

The problem is clear, but the solution is still up for debate. Some have suggested that an overhaul of systems is required. Creating an “Internet for Healthcare” that enables secure, reliable, and fast data exchanges is the ideal for many. However, there is some concern that such a “scorched Earth” approach goes too far and creates more of an administrative hassle than it is worth. Agreeing to standards and implementing entirely new systems at scale, while also meeting the stringent requirements healthcare organizations must adhere to (e.g. HIPAA), would take a significant amount of time, effort, and coordination.

A solution that is able to work with current systems would enable healthcare organizations to continue to leverage many of the trusted and secure EMRs they are comfortable with today, while still resolving the interoperability problem. Given that, AI is uniquely capable of resolving these challenges and helping to address one of the biggest issues in healthcare. One of the ideal use cases for intelligent automation technologies like AI and RPA (Robotic Process Automation) is one where humans are tasked with high-volume work that is done in a similar way every time. Offloading these tasks to software enables human workers to focus more time on the complex and creative work they should be focused on (e.g. caring for patients), while also increasing speed and reducing exposure to human error.

The counterargument some make to leveraging AI in healthcare is that APIs (Application Programming Interfaces) or HL7 (Health Level Seven) data streams aren’t always readily available or require complex development work to feed into AI software. However, when AI is built with the idea in mind of being able to pass the Turing Test for AI (as mentioned in our Artificial Intelligence 101 article), these APIs and HL7 feeds aren’t required. AI is capable of using the same user interfaces (UIs) a human would use to complete the task. This leads to a new paradigm where AI is treated as an employee. For example, in the “onboarding” of our AI Olive, Olive can often be assigned user accounts and email addresses much in the same way a new employee would.

This creates a scenario where healthcare organizations are able to continue to leverage existing systems, while still freeing up human capital to focus on core healthcare functions like patient care. This helps to drive down costs, increase efficiency & speed in administrative processes, and improve patient satisfaction. In short, while there is a myriad of current problems in healthcare, you can resolve many of your interoperability problems with AI.

Conceptualizing the benefits of AI to healthcare administration

To help conceptualize the power of AI to healthcare administration, let’s walk through a real-world example, eligibility checks, and compare the manual process to Olive. Manual eligibility checks are often time-consuming and prone to human error, with technical errors causing 61% of initial medical billing denials for eligibility. This is an excellent microcosm for how many small errors can scale to create bigger issues affecting healthcare providers. There are multiple disjointed systems involved in completing a single eligibility check and if you are having a human go through these processes repeatedly, you can expect a typo or oversight fairly regularly.

Looking at the AI approach with Olive, she will:

  • Automatically pull patient information from the existing EHR
  • Use the information to check the same eligibility portals a human would
  • Report the information back for review
  • Make recommendations

This means that the mundane, repetitive tasks associated with eligibility checks are now quickly completed by software that is significantly less typo-prone than a human, and processing of eligibility is sped up not only due to fewer technical errors, but also because AI can work 24/7. Using this example, you can now see how AI can be leveraged to elegantly resolve many of the problems in healthcare today.

Conclusion: AI helps optimize healthcare administration

While asking anyone in the industry “what are some health care issues?” will often lead to a laundry list of healthcare problems, not all of those problems have readily apparent, pragmatic solutions. Fortunately, the EHR interoperability challenges faced by healthcare organizations do have a solution that can be implemented today in the form of AI that is built specifically for the healthcare industry. By leveraging AI, healthcare organizations can improve work processes, minimize human error, decrease turnaround times, lower expenses, and free up human resources to focus on more valuable work like patient care.

Here at Olive, we are dedicated to building world-class automated intelligence solutions specifically designed to solve the unique challenges facing the healthcare industry. If you have questions about how AI can help drive your healthcare business forward, please contact us today to work with our team of automation experts.  

3 Benefits of Intelligent Process Automation for Your Business


As a healthcare business professional, what are the biggest benefits of intelligent process automation to your business? If you have never considered that question, then you may be missing out on a myriad of potential benefits. The use cases for intelligent process automation in healthcare businesses are seemingly endless. Given that these technologies are ideal for processes that are high-volume and similar every time, the healthcare industry, with its wide variety of administrative tasks taking up valuable human time, is a prime candidate to leverage the power of automation.

In this article, we will dive into the details of some of the more common intelligent process automation technologies and 3 specific benefits intelligent process automation can bring to healthcare businesses.

Explaining Process Automation & RPA

Before we dive into the specific benefits intelligent process automation can bring to your healthcare business, let’s dive into some of the nuts and bolts of process automation and RPA (Robotic Process Automation).

What is Process Automation?

Process automation is as simple as the title suggests: a process done automatically, not by a human being but by computer software that does not get tired or frustrated and doesn’t need to sleep or eat. Automated processes require less human involvement and far less human time to execute efficiently.

Robotic Process Automation

One of the most apt examples of process automation and its benefit to a healthcare business is Olive’s Robotic Process Automation. Olive utilizes RPA to automate cyclical and time-consuming tasks that are rule-based and trigger-driven, freeing your staff from countless hours of work that could be better applied elsewhere in your business.

Now we’re sure when you hear the word “robotic”, you might immediately think of Rosie from The Jetsons, the robot who performed menial household tasks for a futuristic cartoon family in the Hanna-Barbera cartoons, or, if you would like a more modern example, the Transformers, who can transform into vehicles or other machinery to assist human beings in various ways. Those aren’t the types of robots we’re talking about here.

Robots exist in other forms in technology, as detailed by Olive’s blog post that also touches upon Robotic Process Automation. These technologies can include soft-bots, computer programs that act on behalf of another user or program, and sensor networks, groups of spatially separated, dedicated sensors that monitor and record an environment’s physical conditions and organize the collected data at a central location.

Oftentimes, RPA is considered the simplest form of Artificial Intelligence and is therefore used in business practices that require little skill. RPA specifically reaps benefits by giving skilled and specialized workers the opportunity to focus all of their attention on jobs that demand full human cognition and subjective decision making.

RPA vs Cognitive Automation

To put it simply, RPA takes a given set of inputs and produces a predictable, repeatable set of outputs, not unlike a robot designed solely to follow instructions without the freedom to think independent of its design. Other, more advanced forms of intelligent automation, like cognitive automation, autonomously improve in performance over time using machine learning. Machine learning is similar to humans gaining experience and figuring out more efficient ways to do things, but it is computers doing the iterating and learning instead of people. Both cognitive automation and RPA are beneficial tools for a myriad of work processes, ranging from simple rule-based processes (RPA) to more complex judgement-based processes (cognitive automation).

Benefit 1: Minimize errors

In order for your organization to fire on all cylinders with maximum profitability and productivity, the main things you have to invest in are saving time and decreasing or outright eliminating the risk of errors. Why? As they say, time is money, and errors are setbacks that can be avoided if you leverage the benefits of process automation. Software like Olive can assist healthcare organizations, hospitals, and their staff in remedying human-made mistakes and miscommunications.

To help conceptualize and quantify the benefits, let’s consider a common healthcare business process: eligibility checks. Oftentimes, eligibility checks require a human to manually transfer data from one system to another system, and then make a decision (or have one provided to them) about eligibility. This mundane but important process is prone to typos and human error given the same data being entered multiple times into different forms and user interfaces (UIs). It is no surprise then that technical errors cause 61% of initial medical billing denials for eligibility. By offloading this business process to Olive, healthcare organizations can benefit from a high level of automation and repeatability in executing these tasks that minimizes susceptibility to human error and typos while still enabling businesses to use existing EHRs.

Benefit 2: Enhance problem-solving capacity

Automating processes within your organization doesn’t simply stop at the ‘cyclical and time-consuming tasks’. Enter intelligent automation. Intelligent automation is what it says on the tin: software that actually thinks for you, which is the wonder of artificial intelligence and its role in intelligent automation. It isn’t simply mind-numbing repeatable tasks with minimal human monitoring, but actual problem-solving software that can think independent of human guidance and assist problem-solving on every level imaginable.

As best described in Olive’s article, 3 Trends to Consider Before AI Deployment, by 2026, intelligent automation might save the US healthcare economy a total of $150 billion annually according to a recent analysis by Accenture. It’s no wonder healthcare organizations are investing in intelligent automation, powered by AI software like Olive.

It doesn’t stop at healthcare, however. We’re seeing intelligent automation overtake the workplace and our daily lives all around us, from automated tellers at banks replacing human tellers, to booking hotel rooms online without needing to speak to a live person, to letting Google Maps navigate your next drive, just to name a few.

That being said, the most astounding example of intelligent automation may indeed lie in healthcare. AI-assisted robots, as the article further explains, are aiding surgeons with medical decision support, image analysis, and diagnostics, reducing and eliminating the potential for human error by joining human and machine in order to achieve the best results possible.

Benefit 3: Free clinical staff to work on clinical tasks

Another aspect of intelligent automation is cognitive automation software, which brings intelligence to information-intensive procedures. Cognitive automation is effectively the combination of Artificial Intelligence and Cognitive Computing. What sets cognitive automation apart is its performance of jobs that only human beings used to be able to do.

Oftentimes, healthcare employees are bogged down with tedious administrative tasks that, while important to business, are inherently time-consuming and repetitive (e.g. insurance verification and data recording). These responsibilities can easily be outsourced to RPA in order to free up said staff so they can concentrate on tasks that humans excel at, which require uniquely human skills like empathy and creativity (e.g. corresponding with patients, resolving more complex issues, etc.).

Cognitive automation incorporates machine learning so that computing technology can imitate human operations to complete tasks. While RPA must operate on a rule base that limits its decision making, cognitive automation applies its artificial intelligence as a resource that learns as a human would, adapting to execute a job with the utmost efficiency, without ever becoming fatigued as a human would.


In conclusion, in the field of healthcare alone, studies cited in this Olive white paper have found that increased automation of processing and data recording has decreased the in-hospital mortality rate by 15%, and organizations that have adopted RPA have seen a 200% return on investment in the first year of use. Given the power of the technology and the myriad high-volume tasks ripe for an intelligent automation solution in healthcare, it’s no wonder that intelligent process automation is a problem solver and a driver of profitability growth in the industry.

Machine Learning Basics Part 1: An Overview


This is the first in a series of Machine Learning posts meant to act as a gentle introduction to Machine Learning techniques and approaches for those new to the subject. The material is strongly sourced from Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron and from the Coursera Machine Learning class by Andrew Ng. Both are excellent resources and are highly recommended.

Machine Learning is often defined as “the field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).

More practically, it is a program that employs a learning algorithm or neural net architecture that, once trained on an initial data set, can make predictions on new data.

Common Learning Algorithms:¹

Linear and polynomial regression

Logistic regression

K-nearest neighbors

Support vector machines

Decision trees

Random forests

Ensemble methods
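To give a sense of how simple some of these algorithms are at their core, here is a minimal sketch of a 1-nearest-neighbor classifier in NumPy. The data points are made up for illustration; a real implementation would use scikit-learn’s `KNeighborsClassifier`.

```python
import numpy as np

# Minimal 1-nearest-neighbor classifier: predict the label of the
# closest training point by Euclidean distance. Toy data, two clusters.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 8.0], [7.8, 8.3]])
y_train = np.array([0, 0, 1, 1])

def predict_1nn(x):
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to each training point
    return y_train[np.argmin(dists)]             # label of the nearest neighbor

print(predict_1nn(np.array([1.1, 0.9])))  # near the class-0 cluster
print(predict_1nn(np.array([8.1, 7.9])))  # near the class-1 cluster
```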

While the above learning algorithms can be extremely effective, more complex problems, like image classification and natural language processing (NLP), often require a deep neural net approach.

Common Neural Net (NN) Architectures:¹

Feed forward NN

Convolutional NN (CNN)

Recurrent NN (RNN)

Long short-term memory (LSTM)
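To make “feed forward” concrete, here is a sketch of a single forward pass through a tiny feed-forward network with random, untrained weights; in practice the weights would be learned during training, and the layer sizes here are arbitrary.

```python
import numpy as np

# Hypothetical feed-forward network: 2 inputs -> 3 hidden units (ReLU)
# -> 1 output. Weights are random placeholders, not trained values.
rng = np.random.default_rng(3)
W1 = rng.normal(size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 1))
b2 = np.zeros(1)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2               # linear output layer

print(forward(np.array([0.5, -0.2])).shape)  # (1,)
```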


We will go into further detail on the above learning algorithms and neural nets in later blog posts.

Some Basic Terminology:

Features – These are attributes of the data. For example, a common dataset used to introduce Machine Learning techniques is the Pima Indians Diabetes dataset, which is used to predict the onset of diabetes given additional health indicators. For this dataset, the features are pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, etc.

Labels – These are the desired model predictions. In supervised training, this value is provided to the model during training so that it can learn to associate specific features with a label and increase prediction accuracy. In the Pima Indians Diabetes example, this would be a 1 (indicating diabetes onset is likely) or a 0 (indicating low likelihood of diabetes).

Supervised Learning – This is a learning task in which the training set used to build the model includes labels. Regression and classification are both supervised tasks.

Unsupervised Learning – This is a learning task in which the training data is not labeled. Clustering, visualization, dimensionality reduction, and association rule learning are all unsupervised tasks.
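The distinction can be made concrete with a small NumPy sketch (all data here is made up): a supervised fit sees both the features X and the labels y, while an unsupervised step sees only X.

```python
import numpy as np

# Supervised: features X come paired with labels y; the model learns X -> y.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])           # roughly y = 2x

# Ordinary least squares via the normal equation (a supervised fit).
Xb = np.c_[np.ones((4, 1)), X]               # add a bias column
theta = np.linalg.inv(Xb.T @ Xb) @ Xb.T @ y
print(theta)                                  # slope close to 2

# Unsupervised: only X is available. Here we group points by distance
# to two hand-picked centers (one k-means-style assignment step).
points = np.array([[0.1], [0.2], [9.8], [10.1]])
centers = np.array([[0.0], [10.0]])
labels = np.argmin(np.abs(points - centers.T), axis=1)
print(labels)                                 # points grouped by nearest center
```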

Some Supervised Learning Algorithms:¹

K-nearest neighbors

Linear regression

Logistic regression

Support vector machines (SVMs)

Decision trees and random forests

Neural networks

Unsupervised Learning Algorithms:¹


Clustering

• K-means

• Hierarchical cluster analysis (HCA)

• Expectation maximization

Visualization and Dimensionality Reduction

• Principal component analysis (PCA)

• Kernel PCA

• Locally-linear embedding (LLE)

• t-distributed Stochastic Neighbor Embedding (t-SNE)

Association Rule Learning

• Apriori

• Eclat

Dimensionality Reduction: This is the act of simplifying data without losing important information. An example of this is feature extraction, where correlated features are merged into a single feature that conveys the importance of both. For example, if you are predicting housing prices, you may be able to combine square footage with number of bedrooms to create a single feature representing living space.
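As a rough sketch of that housing example (with synthetic, made-up data), principal component analysis computed from an SVD can collapse two correlated features into one combined feature:

```python
import numpy as np

# Two strongly correlated synthetic features (square footage and a
# bedroom count derived from it) reduced to one principal component.
rng = np.random.default_rng(0)
sqft = rng.uniform(800, 3000, size=50)
beds = sqft / 700 + rng.normal(0, 0.3, size=50)   # correlated with sqft
X = np.column_stack([sqft, beds])

Xc = X - X.mean(axis=0)                 # center each feature
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
living_space = Xc @ Vt[0]               # project onto the first component

# The single combined feature carries nearly all of the variance.
explained = S[0] ** 2 / (S ** 2).sum()
print(round(explained, 3))
```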

Batch Learning: This is a system that is incapable of learning incrementally and must be trained using all available data at once¹. To learn new data, it must be retrained from scratch.

Online Learning: This is a system that is trained incrementally by feeding it data instances sequentially. This system can learn new data as it arrives.
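A minimal sketch of online learning, using stochastic gradient descent on a linear model fed synthetic instances one at a time; the data stream and learning rate are made up for illustration.

```python
import numpy as np

# Online learning: the model parameters are updated after each new
# instance, so fresh data is learned without retraining from scratch.
rng = np.random.default_rng(1)
theta = np.zeros(2)                     # [bias, weight]
lr = 0.01

def sgd_step(theta, x, y, lr):
    xb = np.array([1.0, x])             # prepend the bias term
    error = xb @ theta - y              # prediction error for this instance
    return theta - lr * 2 * error * xb  # one-instance MSE gradient step

# Stream instances drawn from the line y = 3x + 1 plus a little noise.
for _ in range(5000):
    x = rng.uniform(0, 2)
    y = 3 * x + 1 + rng.normal(0, 0.1)
    theta = sgd_step(theta, x, y, lr)

print(theta)  # approaches [1, 3]
```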

Underfitting:  This is what happens when you create a model that generalizes too broadly. It does not perform well on the training or test set.

Overfitting:  This is what occurs when you create a model that performs well on the training set but has become too specialized and no longer performs well on new data.
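Both effects can be seen on a toy dataset by fitting polynomials of different degrees and comparing training and test error (all data here is synthetic; degree 2 is the “right-sized” model for a quadratic trend):

```python
import numpy as np

# Noisy samples from y = x^2; a held-out test set from the same curve.
rng = np.random.default_rng(2)
x_train = np.linspace(-1, 1, 10)
y_train = x_train ** 2 + rng.normal(0, 0.05, 10)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = x_test ** 2

def fit_eval(degree):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# Degree 1 underfits: high error on both sets. Degree 9 overfits:
# near-zero training error but noticeably worse error on new data.
for d in (1, 2, 9):
    print(d, fit_eval(d))
```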

Common Notations:

m: The total number of instances in the dataset

X: A matrix containing all of the feature values of every instance of the dataset

x(i): A vector containing all of the feature values of a single instance of the dataset, the ith instance.

y: A vector containing the labels of the dataset. This is the value the model should predict.
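The notation above can be made concrete with a tiny made-up dataset in NumPy:

```python
import numpy as np

# m = 3 instances, each with 2 feature values.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # X: all feature values, one row per instance
y = np.array([0, 1, 1])      # y: one label per instance
m = X.shape[0]               # m: total number of instances

x_i = X[1]                   # x(i): feature vector of a single instance (here i = 2)
print(m, x_i)
```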


  1. Géron, Aurélien (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O’Reilly.