Machine learning with R programming

Self-learning note

Machine learning

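-        Setting random seed --> fixes the starting point of R's random number generator, so the same "random" numbers are drawn and the result is reproducible every time the code is run

-        set.seed(123) --> the seed value itself (123 here) is arbitrary; any later random sampling then follows the same sequence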


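-        If set.seed(1000000) --> also valid, since the seed can be any 32-bit integer

32-bit integer --> over 4 billion possible seeds (2^32), each starting a different random sequence --> -2,147,483,648 to 2,147,483,647 (R reserves the lowest value for NA, so usable seeds run from -2,147,483,647 to 2,147,483,647)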

Ref: https://stat.ethz.ch/pipermail/r-help/2006-June/107399.html
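A minimal sketch of what the seed does, using only base R. The variable names (first_draw, second_draw) are made up for illustration; the point is just that resetting the same seed replays the same draws.

```r
# Same seed, same "random" numbers
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)                         # reset to the same starting point
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)    # TRUE

# Largest integer R can store, and hence the largest possible seed
.Machine$integer.max                  # 2147483647
```

Without the second set.seed(123), the second sample() call would continue the sequence and return different numbers.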

-        Classification

o   Use the caret package to classify the data --> caret gives one common interface to many machine learning models (several of the models themselves come from other packages)

-        Data splitting (I am not sure which split ratio is best for building the model; the split used here is 80/20)

o   Training set

§  80%

o   Testing set

§  20%

The method (svmPoly) is not implemented inside library(caret) itself: it requires the kernlab package, and then another package, e1071

##svmPoly -- a support vector machine with a polynomial kernel -- a supervised learning model used to analyze data for classification and regression analysis

Classification

Regression
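A hedged sketch of fitting svmPoly through caret, assuming the caret and kernlab packages are installed. iris is only a stand-in dataset, and the object names (ctrl, svm_fit) are made up for this example; the code linked at the end of this note may differ.

```r
library(caret)   # train() and trainControl(); method = "svmPoly" also needs kernlab

set.seed(123)    # resampling is random, so fix the seed first

# Ask caret to evaluate each candidate model with 10-fold cross validation
ctrl <- trainControl(method = "cv", number = 10)

# Species is the outcome (classification); all other columns are predictors
svm_fit <- train(Species ~ ., data = iris,
                 method    = "svmPoly",
                 trControl = ctrl)

svm_fit$bestTune                 # tuning values caret picked: degree, scale, C
max(svm_fit$results$Accuracy)    # cross-validated accuracy of the best candidate
```

With a numeric outcome instead of a factor, the same train() call fits a regression model.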

10-fold cross validation (k-value)

-        When a separate validation dataset is not available --> cross validation is used to estimate the performance of the model

-        k is the number of subsets (folds) into which the training data is divided

o   the split is usually stratified, so that each subset contains a similar distribution of the outcomes of interest

o   1 subset is held out as the testing subset; the other k-1 subsets are used as the training set to build the model

o   The process is repeated k times, so that each subset serves as the testing subset exactly once

Ref: https://en.wikipedia.org/wiki/Cross-validation_(statistics)

o   A model is built in each round and evaluated on its testing subset --> the performance of all rounds is averaged --> this average is used to estimate the performance of the final model

o   Thus, we can be more confident that the model will predict independent data accurately

Ref: R. Sullivan, Introduction to Data Mining for the Life Sciences, 2012
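The mechanics of k-fold cross validation described above can be written out by hand. This is a simplified sketch, not what caret does internally: it uses base R only, a plain linear regression on the built-in mtcars data as a stand-in model, and random (not stratified) folds. The names fold_id and rmse_per_fold are made up for this example.

```r
set.seed(123)

k       <- 10
n_rows  <- nrow(mtcars)                                   # mtcars ships with R: 32 rows
fold_id <- sample(rep(seq_len(k), length.out = n_rows))   # random fold label for each row

rmse_per_fold <- numeric(k)
for (fold in seq_len(k)) {
  held_out <- mtcars[fold_id == fold, ]   # 1 subset used for testing
  kept     <- mtcars[fold_id != fold, ]   # the other k-1 subsets used for training

  fit  <- lm(mpg ~ wt, data = kept)       # build the model on the training part
  pred <- predict(fit, newdata = held_out)
  rmse_per_fold[fold] <- sqrt(mean((held_out$mpg - pred)^2))
}

mean(rmse_per_fold)   # average error over the 10 rounds
```

Each row ends up in the testing subset exactly once, and the final number is the average of the 10 per-fold errors.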

Code related to this note: https://github.com/tlerksuthirat/R-learning/blob/master/Machine%20learning.R
