Machine Learning for Everyone (recommended blog - easy to read and understand)

Machine Learning for Everyone: In simple words. With real-world examples. Yes, again -- a good source to learn from

Machine learning algorithms

Image from vas3k blog

Explanation of support vector machine (SVM) kernels:

Linear

Polynomial

Radial basis function

https://towardsdatascience.com/svm-and-kernel-svm-fed02bef1200
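
The three kernels listed above can be written out as plain functions. This is only a minimal sketch to show what each kernel computes for a pair of points; the parameter names and defaults (degree, coef0, gamma) are assumptions chosen for illustration and are not taken from the linked article.

```python
import math

def linear_kernel(x, y):
    """Plain dot product of the two points."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Dot product shifted by coef0 and raised to a power (assumed defaults)."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function: decays with the squared distance between points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x = [1.0, 2.0]
y = [3.0, 0.5]
print(linear_kernel(x, y))      # 1*3 + 2*0.5 = 4.0
print(polynomial_kernel(x, y))  # (4 + 1)^2 = 25.0
print(rbf_kernel(x, x))         # identical points give 1.0
```

The RBF value is always between 0 and 1 and gets smaller as the two points move apart, which is why it can separate data that a straight line cannot.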

Learning from DataCamp

What can machine learning do?

Predict -- data can be used to predict the future by building a model
Infer -- it can be used to analyze the likely cause of an event or a behavior
Infer pattern -- it can be used to tell different patterns apart (detect patterns and group them)

How many types of machine learning are there?

1. Reinforcement learning -- learning to make decisions in sequence, for example a robot deciding which move to play in chess. This kind of learning is fairly complex and needs a fair amount of mathematics to build the model that supports the decisions.
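
A very small illustration of learning sequential decisions: tabular Q-learning on a made-up corridor of five cells, where the goal is the rightmost cell. Everything here (the corridor, the reward of 1, GAMMA, ALPHA, the number of episodes) is a hypothetical setup for illustration, far simpler than chess.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
GAMMA, ALPHA = 0.9, 0.5

def step(state, action_index):
    """Move one cell, staying inside the corridor; reward 1 on reaching the goal."""
    nxt = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action_index]
    for _episode in range(episodes):
        state = 0
        for _step in range(1000):              # safety cap on episode length
            a = rng.randrange(2)               # explore with purely random actions
            nxt, reward = step(state, a)
            done = nxt == N_STATES - 1
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
    return q

q = train()
# Greedy decision in every non-goal cell, read off the learned table.
policy = ["right" if q[s][1] > q[s][0] else "left" for s in range(N_STATES - 1)]
print(policy)
```

The table ends up preferring "right" in every cell, because moving right leads to the reward sooner; each decision is valued by what it makes possible later, which is the sequential part.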

2. Supervised learning (more information on model building is available at scikit-learn -- determine the type of data first, then pick the algorithm)

Classification -- category as the output
Regression -- continuous values as the output
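
To make the difference between the two outputs concrete, here is a minimal sketch with two toy models: a least-squares line (regression) and a nearest-centroid classifier (classification). The data (hours studied, exam scores, pass/fail labels) is hypothetical, and these two models are just simple stand-ins, not the algorithms used in the course.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (regression)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

def fit_centroids(xs, labels):
    """Mean feature value per class (a nearest-centroid classifier)."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: sum(values) / len(values) for label, values in groups.items()}

def classify(centroids, x):
    """Return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical toy data: hours studied, exam score, pass/fail label.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]
passed = ["fail", "fail", "fail", "pass", "pass"]

slope, intercept = fit_line(hours, scores)
print(slope * 6.0 + intercept)                        # continuous value: 62.0
print(classify(fit_centroids(hours, passed), 4.2))    # category: pass
```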

Example of exercise

3. Unsupervised learning -- no target column; a learning technique that can find patterns in the data without guidance

Clustering -- looking for similarities

K-means -- focuses on the number of clusters you want to split the data into
DBSCAN -- focuses on what constitutes a cluster

Association -- finding relationships between items within a group of data
Anomaly detection -- a method for detecting outliers
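
A minimal sketch of the K-means idea mentioned above, in plain Python: pick k, assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. The points are made up, and starting from the first k points is a naive assumption for illustration; real libraries use smarter initialisation.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, k, iterations=10):
    centroids = list(points[:k])  # naive start: the first k points (assumption)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [mean_point(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are given anywhere; the two groups come only from which points are close to each other.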

Example of exercise

The difference between supervised and unsupervised learning lies in the training data.
Supervised -- the outcome has a label, whereas in unsupervised learning the outcome has no label, i.e. nothing has been assigned to say what it is; the data is grouped by finding similarities within it.

Features are what we use to predict the outcome. To train the machine so that it learns, we feed in the values of the features together with the labels, which are the known outcomes for those features. Machine learning picks up the pattern and builds a model that predicts what the outcome will be for new data.

An example exercise that makes this easier to understand

Machine learning workflow

Analyze the features that will be used as data for building the model.
Split the data so that one part is used to build the model and the other part is used to test it, to assess whether the model is a good one.
The reason we split is that we want to evaluate the model on data whose outcome we already know; that part is the test dataset.
There are several ways to evaluate a model, depending on the criteria we set. If the model is not satisfactory, we can tune it to try to get a better outcome; if it still does not improve, we may have to train again. Sometimes, though, the training data is simply not sufficient to build a model that predicts correctly (no pattern can be found).
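
The split-train-evaluate steps above can be sketched as follows. The dataset, the 25% test fraction, the seed and the toy "model" (a single threshold halfway between the two class means) are all assumptions for illustration.

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(rows):
    """Toy model: cut point halfway between the mean of class 0 and class 1."""
    zeros = [x for x, label in rows if label == 0]
    ones = [x for x, label in rows if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x above threshold' matches the known label."""
    return sum(int(x > threshold) == label for x, label in rows) / len(rows)

# Hypothetical labelled data: one feature x, label 1 when x >= 10.
data = [(x, 1 if x >= 10 else 0) for x in range(20)]

train_rows, test_rows = train_test_split(data)
model = fit_threshold(train_rows)           # built from the training part only
print(accuracy(model, test_rows))           # evaluated on rows the model never saw
```

If the score on the held-out rows is poor, that is the signal to tune, retrain, or collect more data.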

Model evaluation

Overfitting

The model predicts well only on the training dataset, but predicts poorly when it is actually put to use.

The green line is a model that can be called overfit: if we use this green line (the model) to separate unknown data, it may fail to separate it at all, because the model fits only the data it was trained on. The black line is better because it generalizes more, so it predicts unknown data better.
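
A small numeric version of the same idea, using made-up one-dimensional data with one deliberately noisy training label. A 1-nearest-neighbour model memorises the training set (like the green line), while a single threshold is the simpler model (like the black line). Both models are stand-ins chosen for illustration.

```python
# Hypothetical data: label is 1 when x > 5, except the noisy training row (2, 1).
train = [(1, 0), (2, 1), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
test = [(1.9, 0), (2.2, 0), (3.6, 0), (6.5, 1), (8.4, 1)]

def accuracy(predict, rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

def predict_1nn(x):
    """Memorising model: copy the label of the closest training point."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def fit_threshold():
    """Simple model: one cut point, chosen to maximise training accuracy."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x > t), train))

threshold = fit_threshold()

def predict_simple(x):
    return int(x > threshold)

print(accuracy(predict_1nn, train), accuracy(predict_1nn, test))        # 1.0 0.6
print(accuracy(predict_simple, train), accuracy(predict_simple, test))  # 0.875 1.0
```

The memorising model is perfect on the training data because it also learned the noisy point, and that is exactly what hurts it on new data.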

Confusion matrix -- used with classification data
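
A minimal sketch of how a confusion matrix is counted: rows are the actual classes, columns are the predicted classes. The spam/ham labels and predictions are made up for illustration.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class, in the order of `labels`."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical results from some classifier.
actual = ["spam", "spam", "ham", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam"]

matrix = confusion_matrix(actual, predicted, ["spam", "ham"])
print(matrix)  # [[2, 1], [1, 2]]

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(matrix)))
print(correct / len(actual))
```

The off-diagonal cells show which kind of mistake the model makes, which a single accuracy number hides.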

Improving performance -- there are several options, for example:

Dimensionality reduction -- reducing the number of features in the data. Having many features does not necessarily make the model predict more accurately, because some features are completely irrelevant and should be removed.

Hyperparameter tuning -- tuning based on the characteristics of the data. It is compared to the instruments in an orchestra: each instrument sounds different, so each one has to be tuned properly for the overall performance to sound smooth. Hyperparameter tuning works in a similar way, and there are several methods depending on the algorithm used to build the model.

Ensemble methods -- combining several models to obtain a better overall model

For classification, voting across the models is used to choose the final prediction.

For regression, the predictions from the individual models are averaged.
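
The two ways of combining models can be sketched in a few lines. The individual predictions are hypothetical; in practice they would come from separately trained models.

```python
from collections import Counter
from statistics import mean

def majority_vote(predictions):
    """Classification: the class predicted by most models wins.
    On a tie, the class that appears first in the list is returned."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Regression: the ensemble prediction is the mean of the model outputs."""
    return mean(predictions)

# Hypothetical outputs of three models for one input.
print(majority_vote(["cat", "dog", "cat"]))  # cat
print(average([10.0, 12.0, 14.0]))           # 12.0
```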

Example of improving performance
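
Hyperparameter tuning, as described above, can be sketched as trying several values of one setting and keeping the value that scores best on held-out data. Here the hyperparameter is k in a k-nearest-neighbours classifier; the data, the candidate values (1, 3, 5) and the choice of k-NN are assumptions for illustration.

```python
from collections import Counter

# Hypothetical data: label is 1 when x > 5, with two noisy training rows.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 1), (8, 0), (9, 1)]
validation = [(2.9, 0), (3.2, 0), (3.9, 0), (5.4, 1), (7.8, 1), (8.3, 1)]

def knn_predict(x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(rows, k):
    return sum(knn_predict(x, k) == y for x, y in rows) / len(rows)

# Simple grid search: score every candidate on the validation rows.
scores = {k: accuracy(validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)  # first candidate with the top score
print(scores)
print(best_k)  # 3
```

With k = 1 the model follows the noisy rows; a slightly larger k smooths them out, which is the kind of effect tuning is meant to find.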

Deep learning

Uses algorithms called neural networks, which are one family of machine learning algorithms.
Suited to complex data in fairly large quantities, and to cases where there is no domain knowledge for identifying the features.
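
To give a feel for what a neural network computes, here is a minimal forward pass through a tiny network with one hidden layer. The weights are picked by hand so that the network computes XOR; this is an assumption for illustration only, since a real network learns its weights from data.

```python
def relu(v):
    """Common activation function: negative values become 0."""
    return max(0.0, v)

def dense(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus a bias, then the activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

# Hand-picked (not learned) weights for a 2-input, 2-hidden-unit, 1-output network.
HIDDEN_W, HIDDEN_B = [[1.0, 1.0], [1.0, 1.0]], [0.0, -1.0]
OUTPUT_W, OUTPUT_B = [[1.0, -2.0]], [0.0]

def forward(x1, x2):
    hidden = dense([x1, x2], HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, lambda v: v)[0]

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, forward(a, b))  # outputs 0.0, 1.0, 1.0, 0.0
```

The hidden layer builds intermediate features on its own, which is why this approach helps when we cannot identify good features by hand.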

Flow for deep learning

Limits of machine learning

Garbage in, garbage out -- the quality of the data used as input is very important for building the model
Explainability -- for example, a deep learning model cannot readily show how it analyzes the data, which matters when the result has to be explained

It depends on the purpose: whether we need to know how the model works in order to answer the question behind the prediction.

Machine learning has many algorithms, but each one is used for different tasks. If the data is not very complex, a neural network is not necessary.

Notes from a machine learning lecture on YouTube (2013)

Let the machine find (extract) the features or patterns in the data after we collect the data for it.
For example, we collect face images and feed them to the machine to train it to learn what kind of image should be categorized as a face.
Or we feed in pedestrian images, which can then be used for self-driving cars.

Gaussian processes:

Application

Tracking objects such as airplanes

Naive Bayes

Use a small amount of data to reduce the bias?

