The decision tree is one of the simplest and most useful approaches in machine learning. It is a supervised learning method that can solve both classification and regression problems, and it is one of the predictive modeling approaches used in data mining, statistics, and machine learning. The method represents a problem as a tree in which internal nodes test attributes and leaf nodes hold class labels. Any Boolean function of discrete attributes can be represented by a decision tree.

## Decision Tree:

As an example, consider the following table. It records 10 days along with the factors that determine whether we play tennis on each day.
We can use this table to decide whether to play tennis, which makes it a classification problem that a decision tree can explain. The problem above can be represented as a decision tree in the following way:

### Data Format

Data is collected in the form:
`(x, Y) = (x1, x2, x3, ..., xk, Y)`
We are trying to understand, classify, or generalize the dependent variable Y. Here x is the feature vector made up of x1, x2, x3, ..., xk, which are used for the task.

### Example

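```python
# build_tree and print_tree are not library functions; they are assumed
# to be defined elsewhere (for example, a from-scratch CART implementation).
training_data = [
    ['Green', 3, 'Apple'],
    ['Yellow', 3, 'Apple'],
    ['Red', 1, 'Grape'],
    ['Red', 1, 'Grape'],
    ['Yellow', 3, 'Lemon'],
]
# Header = ["Color", "Diameter", "Label"]
# The first two columns are features; the last column is the label.

my_tree = build_tree(training_data)

print_tree(my_tree)
```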
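The snippet above calls `build_tree` and `print_tree` without defining them. Below is a minimal sketch of one possible implementation, assuming a CART-style binary tree that picks each question by the largest decrease in Gini impurity. Numeric features are tested with `>=` and categorical features with `==`. All names here (`build_tree`, `print_tree`, `gini`, `matches`, `partition`, `best_split`, `classify`, `header`) are hypothetical and do not come from any library.

```python
# Hypothetical from-scratch helpers; none of these names come from a library.
from collections import Counter

training_data = [
    ['Green', 3, 'Apple'],
    ['Yellow', 3, 'Apple'],
    ['Red', 1, 'Grape'],
    ['Red', 1, 'Grape'],
    ['Yellow', 3, 'Lemon'],
]
header = ["color", "diameter", "label"]


def gini(rows):
    """Gini impurity of the labels stored in the last column of rows."""
    counts = Counter(row[-1] for row in rows)
    n = len(rows)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def matches(row, col, value):
    """Numeric features are compared with >=, categorical ones with ==."""
    if isinstance(value, (int, float)):
        return row[col] >= value
    return row[col] == value


def partition(rows, col, value):
    """Split rows into those that answer the question True and False."""
    true_rows = [r for r in rows if matches(r, col, value)]
    false_rows = [r for r in rows if not matches(r, col, value)]
    return true_rows, false_rows


def best_split(rows):
    """Return (gain, col, value) for the question with the largest Gini decrease."""
    best = (0.0, None, None)
    current = gini(rows)
    for col in range(len(rows[0]) - 1):          # every feature column
        for value in set(r[col] for r in rows):  # every observed value
            true_rows, false_rows = partition(rows, col, value)
            if not true_rows or not false_rows:
                continue                         # question does not split the data
            p = len(true_rows) / len(rows)
            gain = current - p * gini(true_rows) - (1 - p) * gini(false_rows)
            if gain > best[0]:
                best = (gain, col, value)
    return best


def build_tree(rows):
    """Leaves are Counters of labels; internal nodes are (col, value, true, false)."""
    gain, col, value = best_split(rows)
    if gain == 0:
        return Counter(row[-1] for row in rows)
    true_rows, false_rows = partition(rows, col, value)
    return (col, value, build_tree(true_rows), build_tree(false_rows))


def print_tree(node, indent=""):
    if isinstance(node, Counter):
        print(indent + "Predict", dict(node))
        return
    col, value, true_branch, false_branch = node
    op = ">=" if isinstance(value, (int, float)) else "=="
    print(f"{indent}Is {header[col]} {op} {value}?")
    print(indent + "--> True:")
    print_tree(true_branch, indent + "  ")
    print(indent + "--> False:")
    print_tree(false_branch, indent + "  ")


def classify(row, node):
    """Follow the questions down to a leaf and return its label counts."""
    while not isinstance(node, Counter):
        col, value, true_branch, false_branch = node
        node = true_branch if matches(row, col, value) else false_branch
    return node


my_tree = build_tree(training_data)
print_tree(my_tree)
```

When two questions tie on gain (here, `color == Red` and `diameter >= 3` separate the grapes equally well), the first one found is kept, so the printed tree can differ in shape while still making the same predictions.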

### Types of Decision Tree

There are two main types of decision tree:

#### Classification trees

In a classification tree, the output is a discrete class label, such as "play" or "don't play".

#### Regression trees

In a regression tree, the output is a continuous value, such as 123.4, rather than a class label.
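To illustrate continuous output, here is a minimal sketch of the simplest possible regression tree: a single split (a depth-one "stump") on one numeric feature. It assumes the split is chosen to minimize the total squared error of the two leaves, which is the criterion CART uses for regression, and each leaf predicts the mean target of the training examples that reach it. The names `fit_stump` and `predict` and the toy data are hypothetical.

```python
def fit_stump(xs, ys):
    """Fit a depth-one regression tree on a single numeric feature.

    Returns (threshold, left_mean, right_mean) for the split that
    minimizes the total squared error of the two leaves.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between equal feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, x):
    """Each leaf predicts the mean target value of its training examples."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical toy data: two clusters of targets.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
stump = fit_stump(xs, ys)
print(stump[0], predict(stump, 2), predict(stump, 11))
```

A full regression tree applies the same split search recursively to each side until a stopping rule (such as a maximum depth) is reached.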

### Working of Decision Tree

Now that we have a general idea of what a decision tree is, we will look at how it actually works. Many algorithms can be used to build a decision tree, such as ID3, C4.5, and Classification and Regression Trees (CART). ID3, which stands for Iterative Dichotomiser 3, is one of the best known of these. We will go through the ID3 algorithm in detail after defining some terms that are vital when constructing a decision tree.

#### Entropy

For a finite set S, the entropy H(S), also called Shannon entropy, measures the amount of uncertainty in the data. Entropy tells us how predictable an outcome is. For example, when tossing a fair coin, heads and tails each have probability 0.5, so the outcome is as unpredictable as possible and the entropy is 1 bit. If both sides of the coin were heads, the probability of heads would be 1 and of tails 0; the outcome is fully predictable and the entropy is 0.
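Entropy is usually defined as H(S) = -Σ_c p_c log2(p_c), where p_c is the fraction of elements of S that belong to class c. The short sketch below computes it from a list of labels (the function name `entropy` is our own) and reproduces the coin example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(S), in bits, of a list of class labels.

    H(S) = -sum_c p_c * log2(p_c), computed here in the equivalent form
    sum_c p_c * log2(1 / p_c), where p_c is the fraction of labels in class c.
    """
    n = len(labels)
    return sum((count / n) * math.log2(n / count) for count in Counter(labels).values())


print(entropy(["heads", "tails"]))  # fair coin: 1.0
print(entropy(["heads", "heads"]))  # two-headed coin: 0.0
```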

#### Information Gain

For a set S, the Information Gain IG(S, A) is the reduction in entropy obtained by splitting S on attribute A: the entropy of S minus the weighted average entropy of the subsets produced by the split. Information gain is closely related to the Kullback-Leibler divergence; in the decision-tree setting it equals the mutual information between the attribute and the class label.
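Written out, IG(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v), where S_v is the subset of S in which attribute A takes value v. A minimal sketch, assuming each example is a dictionary keyed by attribute name; `entropy`, `information_gain`, and the toy weather rows are hypothetical and made up for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])  # labels of S_v for each value v
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical toy rows: outlook fully determines play, windy tells us nothing.
rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "rain", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0
print(information_gain(rows, "windy", "play"))    # 0.0
```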

#### Gini Impurity

Impurity here refers to how mixed the classes in a set of data are. Gini impurity is the probability that a randomly chosen instance from the set would be misclassified if it were labeled randomly according to the class distribution of that set. The more impure the data, the higher the chance of a wrong label, so splits that produce purer subsets, that is, subsets with a lower Gini index, are preferred.
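Formally, the Gini impurity of a set is 1 - Σ_c p_c², where p_c is the fraction of items in class c. The sketch below (the function name `gini_impurity` is our own) evaluates it on a pure set, an evenly mixed set, and the fruit labels from the earlier example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_c p_c**2, where p_c is the fraction of class c.

    This is the chance of mislabeling a randomly drawn item when its label
    is itself drawn at random from the set's class distribution.
    """
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


print(gini_impurity(["Apple", "Apple"]))  # pure set: 0.0
print(gini_impurity(["Apple", "Grape"]))  # evenly mixed, two classes: 0.5
print(gini_impurity(["Apple", "Apple", "Grape", "Grape", "Lemon"]))
```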

### ID3 Algorithm

ID3 recursively performs the following steps:

1. Create the root node of the tree.
1. Examine the examples: if they are all positive, return a leaf node labeled 'positive'; if they are all negative, return a leaf node labeled 'negative'.
1. Otherwise, calculate the entropy H(S) of the current set of examples.
1. For each attribute X, calculate the entropy of S after splitting on X, H(S, X).
1. Calculate IG(S, X) = H(S) - H(S, X) and select the attribute with the maximum IG value as the decision node.
1. Remove that attribute from the set of candidate attributes and create a branch for each of its values.
1. Repeat the process on each branch until its examples are pure or all attributes have been used; a branch with no attributes left becomes a leaf labeled with the majority class.
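The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes examples are dictionaries, represents the tree as nested dictionaries, generalizes step 2 to any label (not only positive/negative), and does not handle attribute values that were never seen during training. The names `entropy`, `information_gain`, `id3`, and the toy weather rows are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """H(S) minus H(S, X): the weighted entropy of the subsets after the split."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def id3(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Step 2: every example has the same label, so return a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left: return a leaf with the majority label.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Steps 3-5: pick the attribute with the maximum information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    # Step 6: drop it from the candidates and branch on each of its values.
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in dict.fromkeys(row[best] for row in rows):  # first-seen order
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)  # Step 7: recurse
    return tree


# Hypothetical toy data.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
    {"outlook": "rain", "windy": True, "play": "no"},
]
print(id3(rows, ["outlook", "windy"], "play"))
# {'outlook': {'sunny': 'no', 'overcast': 'yes', 'rain': {'windy': {False: 'yes', True: 'no'}}}}
```

Here `outlook` is chosen at the root because it leaves only the rainy days mixed, while `windy` barely reduces the entropy.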

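#### Advantages

- Can handle both categorical and numerical (continuous) data.
- Requires little data preparation.
- Easy to understand and use.

#### Disadvantages

- Prone to overfitting, especially when the tree grows deep.
- Can create biased trees if some classes dominate the training data.
- Requires careful tuning of parameters such as the maximum depth.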

### Implementing a Decision Tree in Code

```python
import pydotplus
from sklearn import datasets, tree
from IPython.display import Image, display
__author__ = "Poshan Pandey <poshan.xyz.com>"


# Loader for the iris data set (the function name load_data is assumed).
def load_data():
    """
    Loads the iris data set

    :return:        data set instance
    """
    iris = datasets.load_iris()
    return iris


def train_model(iris):
    """
    Train decision tree classifier

    :param iris:    iris data set instance
    :return:        classifier instance
    """
    clf = tree.DecisionTreeClassifier()
    clf = clf.fit(iris.data, iris.target)
    return clf


def display_image(clf, iris):
    """
    Displays the decision tree image

    :param clf:     classifier instance
    :param iris:    iris data set instance
    """
    # Export the fitted tree in Graphviz DOT format as a string.
    dot_data = tree.export_graphviz(clf, out_file=None,
                                    feature_names=iris.feature_names,
                                    class_names=iris.target_names,
                                    filled=True, rounded=True)

    # Render the DOT source to PNG and show it inline (e.g. in Jupyter).
    graph = pydotplus.graph_from_dot_data(dot_data)
    display(Image(data=graph.create_png()))

if __name__ == '__main__':