Machine learning with R programming

- June 18, 2020

Self-learning note

Machine learning

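- Setting a random seed --> fixes the starting point of the random number generator, so any random step (e.g. sampling or splitting the data) gives the same result every time the code is run, which makes the results reproducible

- set.seed(123) --> 123 is the seed value; running the same seed again reproduces the same random numbers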

- set.seed(1000000) also works: any whole number can be used as the seed

The seed is a 32-bit integer --> over 4 billion possible random sequences --> 2^32 values (-2,147,483,648 to 2,147,483,647)

Ref: https://stat.ethz.ch/pipermail/r-help/2006-June/107399.html
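A minimal base-R sketch of the seed idea above: drawing with the same seed twice gives identical samples. The seed values and the sample(1:100, 5) call are arbitrary choices for illustration, and the last two lines only check the 32-bit arithmetic.

```r
# Same seed -> same "random" numbers, so any sampling step is reproducible
set.seed(123)
first_draw <- sample(1:100, 5)

set.seed(123)                        # reset to the same seed
second_draw <- sample(1:100, 5)

identical(first_draw, second_draw)   # TRUE: both draws are the same

# Any whole number can serve as a seed, including a large one
set.seed(1000000)
large_seed_draw <- sample(1:100, 5)

# 32-bit arithmetic behind "over 4 billion" possible seeds
2^32                                 # 4294967296
.Machine$integer.max                 # 2147483647, the largest R integer
```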

- Classification

o The caret package is used to classify the data --> it provides one interface, train(), to many machine learning models

- Data splitting (I am not sure which split ratio I should use to build the model?)

o Training set

§ 80%

o Testing set

§ 20%
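A minimal sketch of the 80/20 split, assuming base R and the built-in iris dataset (chosen only for illustration). caret's createDataPartition(y, p = 0.8, list = FALSE) does a similar split but samples within each class, so the training and testing sets keep similar class proportions.

```r
# 80/20 train/test split of the built-in iris data (base R only)
set.seed(123)                                      # reproducible split
n <- nrow(iris)                                    # 150 rows
train_idx <- sample(seq_len(n), size = round(0.8 * n))

training <- iris[train_idx, ]                      # 80% -> 120 rows
testing  <- iris[-train_idx, ]                     # remaining 20% -> 30 rows

c(train = nrow(training), test = nrow(testing))
```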

The svmPoly method is not implemented in caret itself: it requires the kernlab package, and after that another package, e1071, was also required

## svmPoly refers to a support vector machine with a polynomial kernel: a supervised learning model that can be used for both classification and regression analysis

Classification

Regression
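A sketch of fitting svmPoly through caret, assuming caret and kernlab are installed. The iris data, the object names, and the accuracy check are illustrative assumptions, not part of the original notes. trainControl(method = "cv", number = 10) asks caret to tune the model with 10-fold cross-validation on the training set.

```r
# Polynomial-kernel SVM via caret; caret calls kernlab's ksvm() for "svmPoly"
library(caret)

set.seed(123)
# Stratified 80/20 split: sampling is done within each Species class
idx <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
training <- iris[idx, ]
testing  <- iris[-idx, ]

# Tune the model with 10-fold cross-validation on the training set
ctrl <- trainControl(method = "cv", number = 10)

svm_model <- train(Species ~ ., data = training,
                   method = "svmPoly",
                   trControl = ctrl)

# Predict the held-out 20% and compute simple accuracy
pred <- predict(svm_model, newdata = testing)
accuracy <- mean(pred == testing$Species)
accuracy
```

By default caret tries a small grid of the svmPoly tuning parameters (degree, scale and cost C) and keeps the combination with the best cross-validated accuracy.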

10-fold cross-validation (k = 10)

- When no separate validation dataset is available --> cross-validation is used to estimate how well the model performs

- k is the number of roughly equal-sized subsets (folds) that the training data is divided into

o The split is usually stratified, so that each fold contains a similar distribution of the outcome of interest

o 1 fold is held out as the testing subset, and the remaining k-1 folds are used as the training set to build the model

o The process is repeated k times, so that each fold is used as the testing subset exactly once

Ref: https://en.wikipedia.org/wiki/Cross-validation_(statistics)

o Each of the k models is evaluated on its held-out fold --> the k performance results are averaged --> the average is used to estimate the performance of the final model

o This gives more confidence that the model will accurately predict independent (unseen) data

Ref: R. Sullivan, Introduction to Data Mining for the Life Sciences, 2012
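The k-fold procedure above can be sketched by hand in base R. This version assumes a linear regression on the built-in mtcars data (mpg ~ wt + hp is an arbitrary illustrative model) and uses plain random, unstratified fold assignment; for classification, caret's createFolds() can build folds stratified by the outcome instead.

```r
# Manual k-fold cross-validation in base R (k = 10), linear model on mtcars
set.seed(123)
k <- 10
n <- nrow(mtcars)                                # 32 rows
folds <- sample(rep(1:k, length.out = n))        # assign each row to one of k folds

fold_rmse <- numeric(k)
for (i in 1:k) {
  test_rows  <- which(folds == i)                # 1 fold held out for testing
  train_data <- mtcars[-test_rows, ]             # remaining k-1 folds for training
  test_data  <- mtcars[test_rows, ]

  fit  <- lm(mpg ~ wt + hp, data = train_data)
  pred <- predict(fit, newdata = test_data)
  fold_rmse[i] <- sqrt(mean((test_data$mpg - pred)^2))
}

mean(fold_rmse)   # average over the k folds = cross-validated RMSE estimate
```

caret performs the same bookkeeping automatically when trainControl(method = "cv", number = 10) is passed to train().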
