Building a Decision Tree Classifier for the Breast Cancer Dataset

Learn how to build a decision tree classifier using the scikit-learn API

Decision trees are supervised machine learning algorithms capable of performing both classification and regression tasks, including tasks with multiple outputs. In this project, we will focus on the decision tree classifier. This algorithm predicts the value of a response by learning decision rules derived from the features used for training.
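To make the idea of "learning decision rules" concrete, here is a minimal sketch on a tiny made-up dataset (the data, the feature name "x", and the random_state value are assumptions for illustration only, not part of the project). It fits a tree and prints the rule the tree learned.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical toy data: one feature, labelled 0 for small values and 1 for large ones.
X = [[1], [2], [3], [7], [8], [9]]
y = [0, 0, 0, 1, 1, 1]

# Fit a decision tree; random_state is fixed only to make the run repeatable.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Print the learned rules as plain text (a single threshold split on "x").
rules = export_text(clf, feature_names=["x"])
print(rules)

# Predict the class of two unseen values, one on each side of the split.
predictions = clf.predict([[4], [6]])
print(predictions)
```

A single split is enough to separate the two classes here, so the tree has a depth of one. Real datasets, such as the one used in this project, usually need several nested rules.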

scikit-learn trains decision trees with an optimized version of the Classification and Regression Tree (CART) algorithm. For this project, we will use the scikit-learn API and its modules to build a decision tree for the breast cancer dataset, a binary classification problem (benign or malignant). This dataset ships with scikit-learn and can be loaded directly into our working environment (Google Colab).
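The workflow described above can be sketched roughly as follows. This is an illustrative outline, not necessarily the exact code in the Colab document: the 80/20 split, the random_state values, and the variable names are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the breast cancer dataset bundled with scikit-learn.
data = load_breast_cancer()
X, y = data.data, data.target
print(X.shape)            # number of samples and features
print(data.target_names)  # class labels: malignant and benign

# Hold out part of the data for evaluation (assumed 80/20 split).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train a decision tree classifier with default settings.
clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out data.
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Stratifying the split keeps the proportion of benign and malignant samples similar in the training and test sets, which makes the accuracy estimate a little more stable on a dataset of this size.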

I have created an interactive Google Colab document that describes each step of this project in detail. Feel free to experiment with the document below by changing different parameters and seeing how they affect the output. Have fun!
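As one example of such an experiment, the sketch below varies the max_depth parameter of DecisionTreeClassifier and compares training and test accuracy. The choice of parameter, the depth values, and the split settings are assumptions made for illustration; the Colab document may explore different ones.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Map each max_depth setting to (training accuracy, test accuracy).
# None lets the tree grow until every leaf is pure.
results = {}
for depth in [1, 2, 3, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    results[depth] = (clf.score(X_train, y_train), clf.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A fully grown tree typically fits the training data almost perfectly, so a gap between its training and test accuracy is a sign of overfitting. Limiting the depth is one common way to reduce that gap.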

More information regarding the scikit-learn API can be found here.

Archana Tikayat Ray

My research interests include applied data science, machine learning, and Natural Language Processing.