Supervised Learning uses labeled inputs mapped to known outputs. There are two kinds of data, inputs and outputs (X & Y); we train a model on the relationship between the two, then test it and make predictions.
Examples: predicting annual promotion scores, predicting top songs on Spotify
Unsupervised Learning uses unlabeled data to train the machine; there are only inputs. The model finds patterns, trends, and structures in the data and discovers the output on its own.
Examples: grouping customers by purchasing behavior, churn analysis, customer segmentation
Reinforcement Learning follows a trial-and-error method to arrive at the desired solution. Inputs are given, and the model is “rewarded” or “punished” based on its output.
Examples: Building games, robotics, self-driving cars, online stock trading
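To make the supervised setting concrete, here is a minimal sketch, assuming scikit-learn is available (the notes do not name a library): a model is trained on labeled pairs (X, y), then tested and asked to predict outputs for unseen inputs.

```python
# A minimal supervised-learning sketch on synthetic labeled data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Labeled data: inputs X with known outputs y
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Train on the relationship between X and y, then test on held-out data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print("Held-out accuracy:", model.score(X_test, y_test))
print("Prediction for one unseen input:", model.predict(X_test[:1]))
```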
Naive Bayes Classifier
Some common use cases are spam filtering, sentiment analysis, and article classification based on the words used. But before we dig deep, let’s first review the concept of Bayes’ theorem, which the Naive Bayes classifier is based on.
Bayes’ theorem gives the conditional probability of an event A given that another event B has occurred: P(A|B) = P(B|A) · P(A) / P(B).
For example, suppose you toss 2 coins; the sample space is {HH, HT, TH, TT}:
- P(Getting two heads) = 1/4
- P(At least 1 tail) = 3/4
- P(2nd coin is heads given the 1st coin is tails) = 1/2
- P(2 heads given the first coin is heads) = 1/2
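These four values can be verified by brute-force enumeration of the sample space. A minimal sketch (the helper name `p` is illustrative, not from the notes):

```python
# Check the coin-toss probabilities above by counting outcomes.
from itertools import product

space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

def p(event, given=lambda o: True):
    """P(event | given), computed by counting equally likely outcomes."""
    outcomes = [o for o in space if given(o)]
    return sum(event(o) for o in outcomes) / len(outcomes)

print(p(lambda o: o == ("H", "H")))                               # 0.25
print(p(lambda o: "T" in o))                                      # 0.75
print(p(lambda o: o[1] == "H", given=lambda o: o[0] == "T"))      # 0.5
print(p(lambda o: o == ("H", "H"), given=lambda o: o[0] == "H"))  # 0.5
```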
The Naive Bayes classifier is used in the following:
- Face recognition
- Weather prediction
- Medical diagnosis
- News classification
Sample problem statement: predict whether a person will purchase a product given a specific combination of Day, Discount, and Free Delivery, using a Naive Bayes classifier.
Given that we want the Day to be Holiday and both Discount and Free Delivery to be Yes, the unnormalized Naive Bayes score for purchase is 0.986 and for non-purchase is 0.178. We then normalize these scores so they sum to 1, giving the likelihood of each event.
Thus, the likelihood of purchase is 84.71% against non-purchase at 15.29%. We can conclude that an average customer will buy on a holiday with a discount and free delivery.
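The normalization step is just dividing each score by their sum. A minimal sketch (the two raw scores are taken from the example above; the per-feature conditional probabilities behind them are not reproduced in these notes):

```python
# Normalize the raw Naive Bayes scores from the purchase example so they
# sum to 1. The scores 0.986 and 0.178 are the unnormalized products of
# the prior and the per-feature conditional probabilities.
score_buy = 0.986
score_no_buy = 0.178

total = score_buy + score_no_buy
p_buy = score_buy / total        # ≈ 0.8471
p_no_buy = score_no_buy / total  # ≈ 0.1529

print(f"P(Buy | Holiday, Discount, Free Delivery)    = {p_buy:.2%}")    # 84.71%
print(f"P(No Buy | Holiday, Discount, Free Delivery) = {p_no_buy:.2%}") # 15.29%
```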
Advantages of a Naive-bayes classifier:
- Very simple and easy to implement
- Needs less training data
- Handles both continuous and discrete data
- Highly scalable with number of predictors and data points
- Fast, and can be used for real-time predictions
- Not sensitive to irrelevant features
Layman’s terms: Bayes’ theorem updates your belief in an event once you learn that a related event has occurred.
Decision Tree Classifier
This is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction.
Problems a Decision Tree can solve
- Classification — a classification tree determines a set of logical if-then conditions to classify instances. For example, discriminating between three types of flowers based on certain features.
- Regression — a regression tree is used when the target variable is numerical or continuous in nature. We fit the regression model to the target variable using each of the independent variables, and each split is chosen to minimize the sum of squared errors.
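As a concrete classification example, here is a minimal sketch, assuming scikit-learn is available (the notes do not name a library); the iris dataset matches the “three types of flowers” example above:

```python
# Fit a small classification tree on the iris dataset (3 flower species).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# criterion="entropy" makes splits by information gain, as defined below
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X_train, y_train)
print("Test accuracy:", tree.score(X_test, y_test))
```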
Advantages of using decision tree
- Simple to understand, interpret, and visualize
- Little effort is required for data preparation
- Can handle both numerical and non-numerical data
- Non-linear parameters don’t affect its performance
Disadvantages of using a decision tree
- Overfitting
- Low bias: the tree fits the training data very closely, which makes it difficult for the model to generalize to new data
- High variance: the model can become unstable due to small variations in the data
Important Terms
- Entropy — the measure of randomness or unpredictability in the dataset: H = -Σᵢ pᵢ log₂ pᵢ over the class proportions pᵢ
- Information gain — the measure of the decrease in entropy after the dataset is split
- Leaf Node — carries the classification or the decision
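A minimal sketch of these two quantities in code (the function names are illustrative, not from the notes):

```python
# Entropy and information gain over lists of class labels.
from math import log2
from collections import Counter

def entropy(labels):
    """H = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    """Decrease in entropy after splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

mixed = ["cat"] * 5 + ["dog"] * 5                         # maximally mixed
print(entropy(mixed))                                     # 1.0 bit
print(information_gain(mixed, ["cat"] * 5, ["dog"] * 5))  # 1.0: perfect split
```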
Sample problem statement: classify different types of animals based on their features using a decision tree. The variables are color, height, and label.
- Split the data in such a way that the information gain is at its highest, based on entropy
- Calculate the entropy after every split to compute the gain. The gain is the difference between the entropy before the split and the weighted entropy after it
- Choose the condition that gives the highest information gain
- The condition with the highest gain is used to make the first split (see the sketch below)
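Putting the steps together, a toy sketch of the split-selection loop (the animal rows and candidate conditions are illustrative assumptions; the notes only name color, height, and label):

```python
# Pick the first split by trying candidate conditions and keeping the one
# with the highest information gain. The data and conditions are made up.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# (color, height, label) rows, loosely following the animal example
rows = [("grey", 10, "elephant"), ("grey", 9, "elephant"),
        ("yellow", 3, "tiger"), ("yellow", 2, "tiger"),
        ("brown", 1, "monkey"), ("yellow", 1, "monkey")]
labels = [r[2] for r in rows]

conditions = {
    "color == yellow": lambda r: r[0] == "yellow",
    "height >= 5": lambda r: r[1] >= 5,
}

def gain(cond):
    left = [r[2] for r in rows if cond(r)]
    right = [r[2] for r in rows if not cond(r)]
    weighted = sum(len(s) / len(rows) * entropy(s) for s in (left, right))
    return entropy(labels) - weighted

# The condition with the highest gain makes the first split
best = max(conditions, key=lambda name: gain(conditions[name]))
print("First split:", best, "| gain:", round(gain(conditions[best]), 3))
```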
Layman’s terms: just a bunch of if-else statements, but finding the optimal split at each step