Note: Public discourse and sentiment during the COVID 19 pandemic: Using Latent Dirichlet Allocation for topic modeling on Twitter
doi: 10.1371/journal.pone.0239441
Aim:
Twitter users' discourse and psychological reactions to COVID-19 (output based on words)
Method
Machine learning used to analyse 1.9 million tweets (written in English) related to the coronavirus
Collection period: 23 Jan 2020 - 7 Mar 2020
11 salient topics, grouped into 10 themes:
“updates about confirmed cases,”
“COVID-19 related death,”
“cases outside China (worldwide),”
“COVID-19 outbreak in South Korea,”
“early signs of the outbreak in New York,”
“Diamond Princess cruise,”
“economic impact,”
“preventive measures,”
“authorities,”
“supply chain.”
Results
Treatment- and symptom-related messages were not prevalent topics on Twitter
Sentiment analysis
Fear of the unknown nature of the coronavirus was dominant across all topics
Implications and limitations -- discussed in the main text
Machine learning approach (no code deposited)
Latent topics related to COVID-19 identified from tweets
Themes of these identified topics
How users emotionally react to the COVID-19 pandemic
Sentiment changes over time
Methods
Observational study design
Purposive sampling approach: tweets containing defined hashtags related to COVID-19 on Twitter
Natural language processing methods used to find salient topics and terms related to COVID-19
Overall workflow
Data collections
Data cleaning
Data analysis (unit of analysis: one tweet)
Unsupervised ML to cluster the tweets
Quantitative methods
Sentiment analysis
Sample collection
19 trending hashtags
#Coronaoutbreak, #CoronavirusChina, #Wuhan, #Coronavirus, #ChinaCoronavirus,
#Wuhan, #WuhanCoronavirus, #Wuhanoutbreak, #ChinaVirus, #2019nCoV,
#ChineseDon'tComeToJapan, #NoSoyUnVirus, #IamNotVirus, #JeNeSuisPasUnVirus, #Xenophobia, #PrayForChina, #DrLiWenLiang, #ItWillGetBetter, #BeStrongChina
Collected tweets published 23 Jan 2020 - 7 Mar 2020 (~1.5 months)
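The purposive sampling step (hashtag match plus date window) can be sketched as a simple filter. This is a minimal illustration, assuming each tweet is a dict with hypothetical `text` and `created_at` fields, and it uses only a subset of the 19 hashtags; it is not the authors' collection code, since none was deposited.

```python
from datetime import datetime, timezone

# Subset of the study's hashtags, lower-cased for case-insensitive matching (assumption)
HASHTAGS = {"#coronaoutbreak", "#coronavirus", "#wuhan", "#2019ncov"}
START = datetime(2020, 1, 23, tzinfo=timezone.utc)
END = datetime(2020, 3, 7, 23, 59, 59, tzinfo=timezone.utc)


def in_sample(tweet: dict) -> bool:
    """Keep a tweet if it carries a target hashtag and falls inside the window.

    `tweet` is a hypothetical record with 'text' and 'created_at' (timezone-aware).
    """
    # Collect hashtag tokens, stripping trailing punctuation such as "#Wuhan!"
    tags = {tok.lower().rstrip(".,!?:;") for tok in tweet["text"].split() if tok.startswith("#")}
    return bool(tags & HASHTAGS) and START <= tweet["created_at"] <= END
```

Applied as `sample = [t for t in tweets if in_sample(t)]`, this keeps only tweets inside the 23 Jan to 7 Mar 2020 window that carry at least one listed hashtag.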
Data cleaning
Remove non-English tweets
Remove duplicates and retweets
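A sketch of the three cleaning rules follows. It assumes each tweet dict has a language tag `lang` (e.g. `"en"`) and a `text` field, that retweets can be recognised by the conventional `RT @` prefix, and that duplicates are compared case-insensitively. All of these are assumptions, because the notes do not say how each rule was applied.

```python
def clean(tweets: list[dict]) -> list[dict]:
    """Drop non-English tweets, retweets and duplicate texts (first copy kept)."""
    seen = set()
    kept = []
    for t in tweets:
        text = t["text"].strip()
        # Non-English or retweet: skip
        if t.get("lang") != "en" or text.startswith("RT @"):
            continue
        key = text.lower()  # case-insensitive duplicate check (assumption)
        if key in seen:
            continue
        seen.add(key)
        kept.append(t)
    return kept
```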
Feature extraction (per tweet):
(1) each message-level tweet (full text);
(2) function features of
(a) hashtags;
(b) the number of favorites;
(c) the number of followers;
(d) the number of friends;
(e) number of retweets;
(f) user location; and
(g) user description.
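The per-tweet features above map naturally onto a flat record. This sketch assumes the raw data follow the Twitter API v1.1 tweet JSON layout in extended mode (`full_text`, `entities.hashtags`, `favorite_count`, `retweet_count`, and the `user` object's `followers_count`, `friends_count`, `location`, `description`). The notes do not say which API or format the authors used.

```python
def extract_features(tweet: dict) -> dict:
    """Flatten one raw tweet (assumed v1.1 JSON layout) into features (1) and (2a)-(2g)."""
    user = tweet.get("user", {})
    return {
        "text": tweet.get("full_text", tweet.get("text", "")),  # (1) full message text
        "hashtags": [h["text"] for h in tweet.get("entities", {}).get("hashtags", [])],  # (a)
        "favorites": tweet.get("favorite_count", 0),            # (b)
        "followers": user.get("followers_count", 0),            # (c)
        "friends": user.get("friends_count", 0),                # (d)
        "retweets": tweet.get("retweet_count", 0),              # (e)
        "location": user.get("location") or "",                 # (f) may be null in raw data
        "description": user.get("description") or "",          # (g) may be null in raw data
    }
```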
Ethical considerations (Twitter data)
Data use granted for academic purposes
Pre-processing
Using Python
Remove the hashtag symbol and its content
Remove all non-English characters, because the analysis focuses on English messages
Remove repeated letters within words, e.g. "Sooo" to "so"
Remove special characters, punctuation, and numbers from the dataset
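The pre-processing rules can be sketched with Python's `re` module. This is not the authors' script. It approximates "non-English characters" as non-ASCII characters, and it collapses a run of three or more identical letters to a single letter so that ordinary double letters ("good") survive; both choices are assumptions.

```python
import re


def preprocess(text: str) -> str:
    """Apply the cleaning rules listed above to one tweet (illustrative sketch)."""
    text = re.sub(r"#\w+", " ", text)                # hashtag symbol and its content
    text = text.encode("ascii", "ignore").decode()   # drop non-ASCII characters (approximation)
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1", text)  # 3+ repeated letters -> one: "Sooo" -> "So"
    text = re.sub(r"[^A-Za-z\s]", " ", text)         # special characters, punctuation, numbers
    return re.sub(r"\s+", " ", text).strip().lower() # normalise whitespace and case
```

For example, `preprocess("Sooo scared!!! 2 new cases #COVID19 in NYC")` yields `"so scared new cases in nyc"`.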
Data analysis
Using unsupervised learning
Topic modelling -- finding topics with LDA
Qualitative approach used to interpret the output of the unsupervised model:
Labelling popular words
Labelling tweet topics
Assigning meanings and themes to topics
Interpreting themes and patterns
Sentiment analysis
Analysing sentiment, emotion and attitude
Emotions classified into Plutchik's 8 basic emotions (four opposing pairs on Plutchik's wheel)
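A lexicon-based count is one common way to score a tweet against Plutchik's eight emotions. The sketch below uses a tiny hypothetical lexicon purely for illustration; the notes do not say which lexicon or tool the authors used, and a real analysis would rely on a curated word-emotion lexicon.

```python
from collections import Counter

# Plutchik's eight basic emotions, listed as four opposing pairs
EMOTIONS = ("joy", "sadness", "trust", "disgust", "fear", "anger", "surprise", "anticipation")

# Hypothetical toy lexicon (word -> emotions it evokes); not the study's lexicon
LEXICON = {
    "scared": {"fear"},
    "death": {"fear", "sadness"},
    "hope": {"anticipation", "joy", "trust"},
    "panic": {"fear"},
    "lies": {"anger", "disgust"},
}


def emotion_profile(tokens: list[str]) -> Counter:
    """Count how many tokens of a pre-processed tweet evoke each emotion."""
    counts = Counter({e: 0 for e in EMOTIONS})  # keep zero entries for all 8 emotions
    for tok in tokens:
        for emotion in LEXICON.get(tok, ()):
            counts[emotion] += 1
    return counts
```

Summing these profiles over all tweets assigned to a topic gives a per-topic emotion distribution, which is the kind of table in which one emotion (here, fear) can show up as dominant.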
Results
Descriptive results after pre-processing
19 hashtags in total, but only the top 9 hashtags were picked
Fig 2. The number of Tweets under the top 9 hashtags by dates.
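Counting tweets per hashtag per day and keeping the most frequent hashtags (as in Fig 2) can be sketched with `collections.Counter`. The input format, one `(date, hashtags)` pair per tweet, is a hypothetical choice for illustration.

```python
from collections import Counter, defaultdict
from datetime import date


def top_hashtags(records, n=9):
    """Return {hashtag: {date: tweet count}} for the n most frequent hashtags.

    `records` is a hypothetical iterable of (date, [hashtags]) pairs, one per tweet.
    """
    totals = Counter()
    daily = defaultdict(Counter)  # hashtag -> Counter(date -> number of tweets)
    for day, tags in records:
        for tag in {t.lower() for t in tags}:  # count each hashtag once per tweet
            totals[tag] += 1
            daily[tag][day] += 1
    top = [tag for tag, _ in totals.most_common(n)]
    return {tag: dict(sorted(daily[tag].items())) for tag in top}
```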
LDA approach (my understanding: the input is the text of each tweet, and LDA is left to find co-occurring words)
Generates co-occurring words and organises them into different topics
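To make "LDA finds co-occurring words and organises them into topics" concrete, here is a from-scratch collapsed Gibbs sampler for LDA in plain Python. This is a swapped-in technique for illustration only: gensim's `LdaModel`, which the notes imply was used, fits LDA with online variational Bayes instead. The hyperparameters `alpha`, `beta` and `iters` are arbitrary assumptions. `top_words` returns the highest-count words per topic, which are the "popular words" that get labelled by hand.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenised docs (lists of words) by collapsed Gibbs sampling.

    Returns (ndk, nkw): per-document topic counts and per-topic word counts.
    """
    rng = random.Random(seed)
    n_vocab = len({w for d in docs for w in d})
    ndk = [[0] * k for _ in docs]               # tokens of each topic in each doc
    nkw = [defaultdict(int) for _ in range(k)]  # count of each word in each topic
    nk = [0] * k                                # total tokens per topic
    z = []                                      # current topic of every token

    def update(d, w, t, delta):
        ndk[d][t] += delta
        nkw[t][w] += delta
        nk[t] += delta

    for d, doc in enumerate(docs):              # random initial topic assignment
        z.append([rng.randrange(k) for _ in doc])
        for w, t in zip(doc, z[d]):
            update(d, w, t, +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)       # remove token from the counts
                # p(z = j | rest) is proportional to (n_dj + alpha)(n_jw + beta) / (n_j + V*beta)
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + n_vocab * beta)
                           for j in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, w, z[d][i], +1)       # add it back under the sampled topic
    return ndk, nkw


def top_words(nkw, n=10):
    """Highest-count words per topic (candidates for manual labelling)."""
    return [[w for w, c in sorted(t.items(), key=lambda x: (-x[1], x[0]))[:n] if c > 0]
            for t in nkw]
```

A usage sketch: `ndk, nkw = lda_gibbs(tokenised_tweets, k=11)` followed by `top_words(nkw)`, where `tokenised_tweets` is a hypothetical list of pre-processed, split tweets.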
Appropriate number of topics calculated with the gensim coherence model
Picked 11 topics, which gave the highest coherence score
Fig 3. Coherence score for the number of topics. -- 11 topics selected, as this number gave the highest score
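Choosing the number of topics by coherence means scoring each candidate model's topics and taking the best one. The sketch below uses the UMass measure (one of the measures gensim's `CoherenceModel` offers as `"u_mass"`). The notes do not say which coherence measure the authors used, so this choice is an assumption. Each topic needs at least two top words, and every top word must appear in `docs`.

```python
import math
from itertools import combinations


def umass_coherence(topics, docs):
    """Mean UMass coherence over topics given as top-word lists (most probable first).

    Each pair (w_l, w_m) with l < m scores log((D(w_m, w_l) + 1) / D(w_l)),
    where D counts the documents that contain all the given words.
    """
    doc_sets = [set(d) for d in docs]

    def df(*words):
        return sum(1 for s in doc_sets if all(w in s for w in words))

    per_topic = []
    for words in topics:
        pairs = [math.log((df(wl, wm) + 1) / df(wl)) for wl, wm in combinations(words, 2)]
        per_topic.append(sum(pairs) / len(pairs))
    return sum(per_topic) / len(per_topic)


def best_k(scores):
    """Return the number of topics with the highest coherence score."""
    return max(scores, key=scores.get)
```

In practice, you would fit one model per candidate `k`, store its coherence in a dict `{k: score}`, and pass that dict to `best_k`. In the study, this procedure led to 11 topics.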
Sentiment analysis
Tweets contain information about people's thoughts and emotions
Fear is prominent
Fear is found in all 11 topics
Statistical tests run to compare emotions across topics
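One standard way to compare emotion distributions across topics is a chi-square test of independence on a topics × emotions count table. The notes do not name the test the authors ran, so this is an assumed example. The sketch computes the Pearson statistic and degrees of freedom; every row and column total must be non-zero.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a topics x emotions count table."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof
```

If SciPy is available, `scipy.stats.chi2.sf(stat, dof)` turns the statistic into a p-value.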
Discussion and conclusion
Twitter is a good source for observing public (social) discourse
Fast and real-time compared with traditional methods
Traditional methods -- time-consuming and costly (e.g. interviews, feasible only at small scale)
Usefulness
Sentiment analysis can guide targeted intervention programmes
Could be divided into three phases
Early recognition of COVID-19
Discussion of COVID-19 symptoms
Fear is the prominent emotion in all topics
Limitations of this study
Only sampled 19 trending hashtags -- trends may change over time
Twitter users do not represent the whole population (only digitally literate users)
Non-English tweets were removed
Supplementary
Total hashtags