R Tree Package | How does the Tree Package work? (2024)

Updated March 10, 2023

Introduction to R Tree Package

The R tree package is designed specifically for working with decision trees. It lets us build, modify, and use both classification and regression trees in R, which can help us make precise decisions about business problems.


This article walks you through the tree package in R: how to install it and how to use it to build decision trees, with a hands-on classification example.

How to Install the tree package?

To install the package, run the code below:

# Install the tree package (and ISLR, which provides the Carseats data used below)
install.packages("tree")
install.packages("ISLR")

See the output for the installation as shown below:

How does the Tree Package work?

For this article, we are going to use the Carseats data. This dataset is not part of base R; it ships with the ISLR package. We will load this data and store a copy of it under a new object.

# Load the packages and store a copy of the Carseats data as a new object
library(tree)
library(ISLR)
data <- Carseats
head(data)  # first six rows of every column

The code above copies the Carseats data into the data object, and the head() function returns the first six rows of the dataset. See the output for this code below:

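Here, the data describes sales of child car seats at 400 different stores, with 11 variables: Sales, CompPrice, Income, Advertising, Population, Price, ShelveLoc, Age, Education, Urban, and US. Because we want a classification tree, we convert the numeric Sales variable into a binary factor, Sales_bin, which is "yes" when Sales is at least 8 and "no" otherwise, and then drop the original Sales column: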

# Create Sales_bin from the Sales variable: "yes" if Sales >= 8, otherwise "no"
data$Sales_bin <- as.factor(ifelse(data$Sales >= 8, "yes", "no"))
# Drop the original Sales variable so it cannot be used as a predictor
data$Sales <- NULL
# Take a look at the data
head(data)

Let us see the output of this code:
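The ifelse() rule used above can also be checked in isolation. This is a minimal base R sketch on a short, hypothetical vector of sales values (not taken from Carseats). It shows that a value exactly equal to 8 falls into the "yes" group because the comparison is >=, and that as.factor() sorts the levels alphabetically, so "no" comes before "yes".

```r
# Hypothetical sales values for six stores (illustration only, not Carseats data)
sales <- c(9.5, 11.2, 7.4, 4.2, 8.0, 10.8)

# Same rule as the tutorial: "yes" when sales >= 8, otherwise "no"
sales_bin <- as.factor(ifelse(sales >= 8, "yes", "no"))

print(sales_bin)         # the 8.0 store is labelled "yes"
print(table(sales_bin))  # no: 2, yes: 4
```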

An important part of any predictive modeling workflow is splitting the data into two portions: training data and testing data. We train the model on the training data and then, before deploying it, evaluate it on the testing data. A common ratio for this split is 70:30, meaning we use 70% of the data to train the model and the remaining 30% to test it. Let us split our data with this proportion. Remember that the split needs to be random: the sample() function draws the training row indices at random, set.seed() makes that random draw reproducible, and rm() simply removes the objects we no longer need.

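# Fix the random seed so the split is reproducible
set.seed(200)
# Randomly draw 70% of the row indices for training
train_m <- sample(1:nrow(data), nrow(data) * 0.70)
# Make the split: drawn rows go to training, the rest to testing
Train_data <- data[train_m, ]
Test_data <- data[-train_m, ]
# Remove objects that are no longer needed
rm(data, train_m)
head(Train_data)
head(Test_data)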

See the output for this code as below:
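The same split logic can be sanity-checked on any data frame. This is a minimal base R sketch that uses R's built-in iris data (150 rows) as a stand-in for Carseats; the names train_idx, train_set, and test_set are illustrative only. It confirms that the negative index puts every row not drawn for training into the test set, so the two parts never overlap. round() is used here so the training size is always a whole number.

```r
# Stand-in data: R's built-in iris data frame (150 rows), not Carseats
set.seed(200)                              # make the random draw reproducible
n <- nrow(iris)
train_idx <- sample(seq_len(n), size = round(0.70 * n))  # 105 random row indices
train_set <- iris[train_idx, ]             # rows drawn for training
test_set  <- iris[-train_idx, ]            # all remaining rows

# Sanity checks: sizes follow 70:30 and no row appears in both parts
print(c(train = nrow(train_set), test = nrow(test_set)))  # train 105, test 45
print(length(intersect(rownames(train_set), rownames(test_set))))  # 0
```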

Now it is time to build the decision tree model with the tree package in R.

We will use the tree() function to grow a tree on the training dataset and then use the same tree to predict the classes of the testing dataset. See the example below:

# Train the decision tree: Sales_bin is the response, all other columns are predictors
Des_tree_model <- tree(Sales_bin ~ ., data = Train_data)
plot(Des_tree_model)
text(Des_tree_model, pretty = 0)
# Use the model on the testing dataset to check how well it performs
Pred_tree <- predict(Des_tree_model, Test_data, type = "class")
mean(Pred_tree != Test_data$Sales_bin)

Here, we have fitted the decision tree model for the Sales_bin variable on Train_data, and used plot() and text() to draw the tree and label its splits (pretty = 0 prints the factor level names unchanged). See the image below, which shows the generated decision tree.

After this, we used the model on the testing data to generate predictions. Finally, mean(Pred_tree != Test_data$Sales_bin) gives the misclassification rate: the comparison returns TRUE for every wrongly predicted observation, and the mean of a logical vector is the proportion of TRUE values.

See the output below:

The output shows that around 21% of the test observations are misclassified by the decision tree. In other words, the model has an error rate of about 21%, or an accuracy of about 79%. For the sake of this example, that is a reasonable result, and I will be using the predictions made by this model.
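The link between the two numbers is simply accuracy = 1 - error rate. This base R sketch uses hypothetical labels for eight test observations (not the Carseats results) to show how mean() turns the mismatches into an error rate, and how table() builds a confusion matrix that shows which class the errors come from.

```r
# Hypothetical predicted and actual labels for eight test observations
actual    <- factor(c("yes", "no", "no", "yes", "no", "yes", "no", "no"))
predicted <- factor(c("yes", "no", "yes", "yes", "no", "no", "no", "no"))

# predicted != actual is a logical vector; its mean is the share of TRUEs,
# i.e. the misclassification rate
error_rate <- mean(predicted != actual)  # 2 mismatches out of 8 -> 0.25
accuracy   <- 1 - error_rate             # 0.75

# Confusion matrix: correct predictions sit on the diagonal
print(table(Predicted = predicted, Actual = actual))
```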

Note: The split into training and testing data is random, so if you use a different seed (or skip set.seed()), your split and therefore your error rate will differ from mine. With set.seed(200) the split is reproducible, although R versions older than 3.6.0 use a different default sampling algorithm and can give a different split for the same seed. So expect an accuracy close to, but not necessarily identical to, the one I got.
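The effect of set.seed() on the split can be checked directly. This minimal base R sketch repeats the tutorial's draw of 280 training indices out of 400 rows: resetting the same seed reproduces exactly the same draw, while a different seed (123, chosen arbitrarily for illustration) produces a different training set.

```r
# Same seed -> same "random" draw, so the split can be reproduced exactly
set.seed(200)
first_draw <- sample(1:400, 280)

set.seed(200)
second_draw <- sample(1:400, 280)

print(identical(first_draw, second_draw))  # TRUE

# A different seed gives a different set of training rows
set.seed(123)
other_draw <- sample(1:400, 280)
print(identical(first_draw, other_draw))
```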

This is where the article ends: we used the tree package to build a decision tree, analyze it, and make predictions with it.

Let us wrap things up with a few conclusions.

Conclusion

The tree package in R can be used to build, analyze, and make predictions with decision trees.
The tree() function in this package grows a decision tree from the input data provided.
It is recommended to divide the data into two parts, namely training and testing.
A common proportion for the training and testing split is 70:30.

