Machine Learning (M.L.)

Machine learning, at its most basic, is the practice of using algorithms to parse data, learn from it, and then make a decision or prediction about something in the world. Rather than hand-coding software routines with a specific set of instructions to accomplish a particular task, the machine is "trained" using large amounts of data and algorithms that give it the ability to learn how to perform the task.

Careful! Machine learning is not to be confused with Deep Learning.

Deep Learning (D.L.)

Deep learning is a subfield of M.L. concerned with artificial neural networks, algorithms inspired by the structure and function of the brain. Deep learning excels in problem domains where the inputs (and even the outputs) are analog.

That is, the inputs are not a few quantities in a tabular format but images of pixel data, documents of text data, or files of audio data.



Classification is concerned with building models that separate data into distinct classes. These models are built by inputting a set of training data for which the classes are pre-labeled, so that the algorithm can learn from it. Well-known classification schemes include decision trees and support vector machines. As this type of algorithm requires explicit class labeling, classification is a form of supervised learning.
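As an illustration of learning from pre-labeled training data, here is a minimal sketch of a decision stump, that is, a decision tree with a single split. The data set (hours studied, hours slept, and a pass/fail label) and the function names train_stump and predict are made up for this example; real decision tree learners use more refined split criteria and grow deeper trees.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


# Hypothetical training data: (hours studied, hours slept) -> exam result.
rows = [(1.0, 8.0), (2.0, 6.0), (3.0, 7.0), (6.0, 5.0), (7.0, 8.0), (8.0, 6.0)]
labels = ["fail", "fail", "fail", "pass", "pass", "pass"]

stump = train_stump(rows, labels)
print(predict(stump, (2.5, 7.0)))  # fail
print(predict(stump, (6.5, 6.0)))  # pass
```

On this toy data the stump splits on the first feature (hours studied), because that split separates the two classes without any training errors.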


Regression is very closely related to classification.

While classification is concerned with the prediction of discrete classes, regression is applied when the value to be predicted is a continuous number.

Linear regression is an example of the regression technique.
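To make this concrete, here is a minimal sketch of simple linear regression with one input variable, fitted by ordinary least squares. The function name fit_line and the data points are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6.0 + intercept  # predict a continuous value for x = 6
print(slope, intercept, prediction)   # 2.0 1.0 13.0
```

Unlike a classifier, the model returns a number on a continuous scale rather than one of a fixed set of labels.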


Clustering is used for analyzing data that does not include pre-labeled classes or even a class attribute at all. Data instances are grouped by maximizing the similarity within each group (intraclass similarity) and minimizing the similarity between different groups (interclass similarity).

k-means clustering is perhaps the most well-known example of a clustering algorithm. 
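A minimal sketch of k-means on two-dimensional points follows. It assumes a naive initialization (the first k points serve as the starting centroids) so that the run is reproducible; practical implementations usually pick starting centroids more carefully. The function names and the data are made up for this example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, max_iterations=100):
    centroids = list(points[:k])  # assumption: naive initialization
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_point(members)
    return labels, centroids


# Hypothetical unlabeled data with two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

labels, centroids = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No class labels are supplied anywhere; the cluster numbers 0 and 1 are arbitrary names for the groups the algorithm finds.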


Association is most easily explained by introducing market basket analysis, a typical task for which it is well-known.

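Market basket analysis attempts to identify associations between the items that a particular shopper has placed in their market basket, be it real or virtual, and assigns support and confidence measures to those associations for comparison. Support is the fraction of all baskets that contain a given set of items; confidence is the fraction of baskets containing one set of items that also contain another.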

The value of this lies in cross-marketing and customer behavior analysis.
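The support and confidence measures can be sketched in a few lines. The baskets below are made up for this example, and the function names count, support, and confidence are illustrative rather than taken from any particular library.

```python
# Hypothetical shopping baskets, one set of items per shopper.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with the antecedent, the fraction that also has the consequent."""
    return count(antecedent | consequent, baskets) / count(antecedent, baskets)


print(support({"bread", "milk"}, baskets))       # 0.6
print(confidence({"bread"}, {"milk"}, baskets))  # 0.75
```

Here bread and milk appear together in 3 of 5 baskets (support 0.6), and 3 of the 4 baskets containing bread also contain milk (confidence 0.75).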

Cross Validation

Cross-validation is a method for building and evaluating models. The dataset is divided into k segments, or folds; one segment is left out, the model is trained on the other k-1 segments, and the left-out segment is used for testing.

This process is then repeated k times, so that each segment is used for testing exactly once, and the individual prediction error results are combined and averaged into a single overall estimate.
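The procedure can be sketched as follows. This is a minimal illustration that assumes contiguous folds (in practice the data is usually shuffled first) and uses a deliberately trivial model that always predicts the mean of its training targets. The names k_fold_indices, cross_validate, fit_mean, and predict_mean are made up for this example.

```python
def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, fit, predict):
    """Return the mean squared error averaged over k held-out folds."""
    errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)  # train on the other k-1 folds
        fold_error = sum((predict(model, xs[i]) - ys[i]) ** 2
                         for i in test_idx) / len(test_idx)
        errors.append(fold_error)      # test on the held-out fold
    return sum(errors) / k


# Trivial stand-in model: always predict the mean of the training targets.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = cross_validate(xs, ys, 3, fit_mean, predict_mean)
print(score)  # 6.25
```

Any pair of fit and predict functions with the same shape can be passed in, which makes it possible to compare different models on the same folds.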

Supervised Learning

Supervised learning relies on data for which the true class of each example is known. For example, if we want to teach the computer to distinguish between pictures of cats and dogs, we run the algorithm on lots of those pictures. To supervise the learning so that the algorithm classifies images the right way, we label each picture as a cat or a dog.

Once our algorithm learns how to classify images, we can use it on new data and predict labels (cat or dog in our case) on unseen images. 
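The train-then-predict workflow can be sketched with a 1-nearest-neighbor classifier, chosen here only because it is short. Real image classifiers work on pixel data; as a stand-in, this example assumes each animal is described by two made-up measurements (weight in kg, height in cm). The data and the function name nearest_neighbor are hypothetical.

```python
import math

# Hypothetical labeled training data: (weight kg, height cm) -> label.
labeled = [
    ((4.0, 25.0), "cat"),
    ((3.5, 23.0), "cat"),
    ((5.0, 24.0), "cat"),
    ((20.0, 55.0), "dog"),
    ((30.0, 60.0), "dog"),
    ((12.0, 40.0), "dog"),
]


def nearest_neighbor(labeled, query):
    """Predict the label of the closest labeled example."""
    _, label = min(labeled, key=lambda pair: math.dist(pair[0], query))
    return label


# Unseen examples that were not part of the labeled data.
print(nearest_neighbor(labeled, (4.5, 26.0)))   # cat
print(nearest_neighbor(labeled, (18.0, 50.0)))  # dog
```

The labels are what make this supervised: without them, the algorithm would have nothing to return for a new example.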

Unsupervised Learning

Unsupervised learning means that the learning algorithm does not have any labels attached to the data to supervise the learning.

We just provide the algorithm with a large amount of data and the characteristics of each observation. Imagine there are no labels on the images of cats and dogs in the above example.

In such a case, the algorithm itself cannot decide what a cat or a dog is, but it can divide the data into groups.

Reinforcement Learning

This class of problems learns from the end outcome alone. Let's illustrate with the example of learning to play chess. As input to this problem, the M.L. algorithm receives information about whether a game it played was won or lost. So the algorithm does not have every move in the game labeled as successful or not; it only has the result of the whole game.

The more games the algorithm plays, the more it learns about the winning moves. 
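Chess is far too large for a short example, so the sketch below illustrates the same idea on a much smaller game chosen for this purpose: a pile of 5 stones, players alternately take 1 to 3 stones, and whoever takes the last stone wins. The learner plays against a random opponent and, after each game, is told only whether it won or lost. The game, the function names, and the parameter values (5000 games, 20% exploration, seed 0) are all assumptions made for this illustration, not a standard algorithm from a library.

```python
import random

MAX_TAKE = 3    # a player may take 1, 2, or 3 stones per turn
START_PILE = 5  # hypothetical starting position


def legal_moves(pile):
    return list(range(1, min(MAX_TAKE, pile) + 1))


def win_rate(stats, pile, move):
    wins, plays = stats.get((pile, move), (0, 0))
    return wins / plays if plays else 0.0


def best_move(stats, pile):
    return max(legal_moves(pile), key=lambda m: win_rate(stats, pile, m))


def play_game(stats, rng, explore=0.2):
    """Play one game against a random opponent; return moves made and result."""
    pile, history = START_PILE, []
    while True:
        # Learner: usually the best-known move, sometimes a random one.
        if rng.random() < explore:
            move = rng.choice(legal_moves(pile))
        else:
            move = best_move(stats, pile)
        history.append((pile, move))
        pile -= move
        if pile == 0:
            return history, True   # learner took the last stone: win
        # Opponent: a uniformly random legal move.
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return history, False  # opponent took the last stone: loss


def train(games=5000, seed=0):
    rng = random.Random(seed)
    stats = {}  # (pile, move) -> (wins, plays)
    for _ in range(games):
        history, won = play_game(stats, rng)
        # The only feedback is the final result, credited to every move played.
        for key in history:
            wins, plays = stats.get(key, (0, 0))
            stats[key] = (wins + int(won), plays + 1)
    return stats


stats = train()
learned = {pile: best_move(stats, pile) for pile in (5, 3, 2, 1)}
print(learned)
```

No individual move is ever labeled as good or bad. The win rates per move emerge only from the results of whole games, and they become more reliable as more games are played.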

Source: Data Science App