Machine learning with R programming
Machine learning
- Setting a random seed --> fixes the starting point of R's random number generator, so the results are reproducible every time the code is run
- set.seed(123) --> seeds the generator used for random sampling
o Any integer works as the seed, e.g. set.seed(1000000)
o The seed is a 32-bit integer --> over 4 billion (2^32) possible random sequences (-2,147,483,648 to 2,147,483,647)
Ref: https://stat.ethz.ch/pipermail/r-help/2006-June/107399.html
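To make the reproducibility point concrete, here is a minimal sketch in base R. The variable names (first_draw, second_draw, other_draw) and the choice of sampling 5 numbers from 1:100 are only for illustration.

```r
# Same seed, same random draws: the seed fixes the generator's starting state.
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)
second_draw <- sample(1:100, 5)

# The two draws match exactly because the generator was reset to the same state.
identical(first_draw, second_draw)

# Any other integer seed is just as valid and just as reproducible.
set.seed(1000000)
other_draw <- sample(1:100, 5)

# Largest integer value R can represent (2^31 - 1).
.Machine$integer.max
```

Calling set.seed() once at the top of a script is usually enough; it only needs to be called again when the same random sequence has to be replayed.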
- Classification
o Using the caret library to classify the data --> caret is a package that gives a common interface to many machine learning models
- Data splitting (I am not sure which ratio I should use to build the model)
o Training set: 80%
o Testing set: 20%
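An 80/20 split like the one above could be sketched in base R as follows. The built-in iris data set and the names train_idx, training and testing are assumptions for illustration; the original note does not say which data it uses.

```r
set.seed(123)  # make the random split reproducible

n         <- nrow(iris)                          # 150 rows in the example data
train_idx <- sample(seq_len(n), size = round(0.8 * n))

training <- iris[train_idx, ]    # 80% of the rows, used to build the model
testing  <- iris[-train_idx, ]   # remaining 20%, held back for evaluation

nrow(training)
nrow(testing)
```

This is a simple random split. If the classes are unbalanced, caret's createDataPartition(y, p = 0.8, list = FALSE) is an alternative that samples within each class.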
The method (svmPoly) is not included in library(caret) itself: it requires kernlab, and then another package, e1071
##svmPoly --> refers to a support vector machine (here with a polynomial kernel), a supervised learning model that analyzes data for:
o Classification
o Regression
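As a rough sketch of how svmPoly might be fitted through caret, assuming caret and kernlab are both installed: the iris data, the 80/20 split, and the fixed values for degree, scale and C are illustrative assumptions, not settings taken from the note. trainControl(method = "none") skips resampling so that a single model is fitted.

```r
library(caret)  # assumes caret is installed; kernlab must be installed too

set.seed(123)
idx      <- sample(seq_len(nrow(iris)), size = round(0.8 * nrow(iris)))
training <- iris[idx, ]
testing  <- iris[-idx, ]

# Fit one polynomial-kernel SVM with fixed parameters.
# degree, scale and C are svmPoly's tuning parameters; the values are assumptions.
fit <- train(
  Species ~ .,
  data      = training,
  method    = "svmPoly",
  trControl = trainControl(method = "none"),
  tuneGrid  = data.frame(degree = 2, scale = 0.1, C = 1)
)

# Predict the held-out 20% and compute the share of correct labels.
pred     <- predict(fit, newdata = testing)
accuracy <- mean(pred == testing$Species)
accuracy
```

In practice the parameters would normally be tuned rather than fixed, for example with cross-validation.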
10-fold cross-validation (k-value = 10)
- When there is no way to get a separate validation dataset --> cross-validation is used to check the performance of the model
- The k-value is the number of subsets (folds) into which the training data is divided
o The split is done so that the subsets contain a similar distribution of the outcomes of interest
o 1 subset is selected as the testing subset, and the remaining k-1 subsets are used as the training set to build the model
o The process is repeated until every subset of the training data set has been used once as the testing subset
Ref: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
o Thus, we can be more confident that the model is ready to accurately predict independent data
Ref: R. Sullivan, Introduction to Data Mining for the Life Sciences, 2012
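The cross-validation steps above can be sketched by hand in base R. This is only an illustration under stated assumptions: it uses the iris data, assigns folds at random (not stratified by outcome), and uses a hypothetical stand-in classifier, nearest_centroid(), in place of the real model so that the example needs no extra packages.

```r
set.seed(123)
k <- 10
n <- nrow(iris)

# Assign every row to one of k folds, in random order (15 rows per fold here).
fold_id <- sample(rep(seq_len(k), length.out = n))

# Hypothetical stand-in model: predict the class whose feature means are closest.
nearest_centroid <- function(train, test) {
  features  <- setdiff(names(train), "Species")
  centroids <- aggregate(train[features], by = list(Species = train$Species), FUN = mean)
  # Squared distance from every test row to each class centroid.
  dists <- sapply(seq_len(nrow(centroids)), function(i) {
    rowSums(sweep(as.matrix(test[features]), 2, unlist(centroids[i, features]))^2)
  })
  as.character(centroids$Species[max.col(-dists, ties.method = "first")])
}

# Each fold is the testing subset once; the other k-1 folds train the model.
fold_accuracy <- numeric(k)
for (i in seq_len(k)) {
  test_rows <- fold_id == i
  pred <- nearest_centroid(iris[!test_rows, ], iris[test_rows, ])
  fold_accuracy[i] <- mean(pred == as.character(iris$Species[test_rows]))
}

mean(fold_accuracy)  # average accuracy over the k folds
```

With caret, the same idea is usually requested through trainControl(method = "cv", number = 10) passed to train(), which handles the fold bookkeeping internally.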
Code related to this note: https://github.com/tlerksuthirat/R-learning/blob/master/Machine%20learning.R