In this blog, we will discuss more features of the Orange tool. To learn the basics of Orange, see my previous blog, Introduction to Orange Tool. Here I will show how to split data into training and testing data in Orange, and how to use cross-validation in Orange.
Now let’s start creating the workflow.
First, we use the File widget and select titanic.tab as the dataset. We can also import our own datasets through the file explorer, or load them from a URL.
Next, we send this data as input to the Data Sampler widget. Data Sampler selects a subset of data instances from an input data set.
It outputs a sampled data set and a complementary data set (the instances that were not selected). The output is produced once the input data set is provided and Sample Data is pressed.
Here I sampled the data so that 70% goes to the sampled data set and the remaining 30% forms the complementary data set. Next, we send the sampled data from Data Sampler to Test and Score. This widget tests learning algorithms, and different sampling schemes are available, including using separate test data.
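To make the 70/30 idea concrete, here is a minimal plain-Python sketch of a fixed-proportion sample and its complement. It is not Orange's implementation; the function name `sample_fixed_proportion`, the `seed` parameter, and the toy rows are hypothetical stand-ins for what Data Sampler does with titanic.tab.

```python
import random

def sample_fixed_proportion(rows, proportion=0.7, seed=0):
    """Hypothetical helper: split rows into a random sample and its complement.

    `proportion` of the rows are picked at random (the sampled data set);
    every row that was not picked goes to the complementary data set.
    """
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(len(rows) * proportion)
    sampled = [rows[i] for i in indices[:cut]]
    remaining = [rows[i] for i in indices[cut:]]
    return sampled, remaining

# Toy data: 10 rows standing in for data instances (assumption, not titanic.tab).
rows = list(range(10))
sampled, remaining = sample_fixed_proportion(rows, 0.7)
print(len(sampled), len(remaining))  # 7 3
```

Every row ends up in exactly one of the two outputs, which is what makes the complement usable as held-out data later.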
Test and Score does two things. First, it shows a table with different classifier performance measures, such as classification accuracy and area under the curve. Second, it outputs evaluation results, which other widgets, such as ROC Analysis or Confusion Matrix, can use to analyze the performance of classifiers.
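As a rough illustration of two of these measures, the sketch below computes classification accuracy and the counts behind a confusion matrix from actual and predicted class labels. It is a plain-Python sketch, not Orange's scoring code; the labels and function names are hypothetical.

```python
from collections import Counter

def classification_accuracy(actual, predicted):
    """Share of instances whose predicted class equals the actual class."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def confusion_counts(actual, predicted):
    """Count each (actual, predicted) pair, the numbers a confusion matrix displays."""
    return Counter(zip(actual, predicted))

# Toy labels (assumption): 4 of 5 predictions are correct.
actual    = ["yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "yes", "no"]
print(classification_accuracy(actual, predicted))        # 0.8
print(confusion_counts(actual, predicted)[("no", "yes")])  # 1
```
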
Sampling using cross-validation in Orange
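Cross-validation splits the data into a given number of folds (usually 5 or 10). Each fold is used once as test data while the model is trained on the remaining folds, and the scores from all folds are averaged. Cross-validation is primarily used in applied machine learning to estimate how well a model will perform on unseen data.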
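The fold mechanics can be sketched in a few lines of plain Python. This is an assumption-laden simplification: it assigns consecutive indices to folds without shuffling or stratification, whereas a real tool typically shuffles (and Orange can stratify) the data first. The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Indices 0..n-1 are dealt into k consecutive folds of nearly equal size.
    Each fold serves once as the test set; the other k-1 folds form the
    training set. No shuffling is done here, to keep the sketch readable.
    """
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size

# With 10 instances and 5 folds, each test fold holds 2 instances.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(test_idx)
```

In practice, a model would be trained on `train_idx` and scored on `test_idx` in each iteration, and the k scores averaged into one estimate.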
Split data into training data and testing data in Orange
To split the data into training and test datasets, we send the 70% sample from Data Sampler to Test and Score as the training data,
and the remaining 30% as the test data. To do this, double-click the link between Data Sampler and Test and Score, and in the link editor connect the Data Sample box to the Data box and the Remaining Data box to the Test Data box, as shown in the figure below. Then select Test on test data as the sampling scheme in Test and Score.
So far, we have seen how to sample our data in Orange and compare different learning algorithms to find the one that performs best on our data set.