Note: Public discourse and sentiment during the COVID 19 pandemic: Using Latent Dirichlet Allocation for topic modeling on Twitter

doi: 10.1371/journal.pone.0239441


Aim: 

  • Twitter users' discourse and psychological reactions to COVID-19 (output based on words)

Method

  • Using machine learning to analyse

    • 1.9 million tweets (written in English) related to coronavirus

    • Collected 23 Jan 2020 - 7 Mar 2020

    • 11 salient topics, grouped into 10 themes

  1. “updates about confirmed cases,” 

  2. “COVID-19 related death,” 

  3. “cases outside China (worldwide),” 

  4. “COVID-19 outbreak in South Korea,” 

  5. “early signs of the outbreak in New York,” 

  6. “Diamond Princess cruise,” 

  7. “economic impact,”

  8. “Preventive measures,”

  9. “Authorities,”

  10. “supply chain.”

Results

  • Treatment- and symptom-related messages did not emerge as topics

  • Sentiment analysis

    • Fear about the nature of the coronavirus

    • Implications and limitations are discussed in the main text


Machine learning approach (no code deposited)

  • Latent topics related to COVID-19 identified from tweets

  • Themes of the identified topics

  • Emotional reactions to the COVID-19 pandemic

  • Changes in sentiment over time


Methods

  • Observational study design

  • Purposive sampling: tweets containing predefined COVID-19-related hashtags on Twitter

  • Using natural language processing methods

    • Find salient topics and terms related to COVID-19


Overall workflow

  • Data collections

  • Data cleaning

  • Data analysis (per tweet)

    • Unsupervised ML to group the tweets

    • Qualitative methods

    • Sentiment analysis

Sample collection

19 trending hashtags

#Coronaoutbreak, #CoronavirusChina, #Wuhan, #Coronavirus, #ChinaCoronavirus,

#Wuhan, #WuhanCoronavirus, #Wuhanoutbreak, #ChinaVirus, #2019nCoV,

#ChineseDon'tComeToJapan, #NoSoyUnVirus, #IamNotVirus, #JeNeSuisPasUnVirus, #Xenophobia, #PrayForChina, #DrLiWenLiang, #ItWillGetBetter, #BeStrongChina


  • Collect tweets published

    • 23 Jan 2020 - 7 March 2020 ~ 1.5 months

  • Data cleaning

    • Remove non-English tweets

    • Remove duplicates and retweets

  • Feature extraction (per tweet)

    • (1) Message-level tweet text (full text)

    • (2) The following features:

      • (a) hashtags

      • (b) number of favorites

      • (c) number of followers

      • (d) number of friends

      • (e) number of retweets

      • (f) user location

      • (g) user description

  • Pre-processing

    • Using Python

    • Remove the hashtag symbol and its content

    • Remove all non-English characters, because the analysis focuses on messages in English

    • Collapse repeated letters, e.g. "Sooo" to "So"

    • Remove special characters, punctuation, and numbers from the dataset

  • Data analysis

    • Using unsupervised learning

    • Using a qualitative approach to analyse the output of the unsupervised step

      • Labelling popular words

      • Labelling tweet topics

      • Assigning meanings and themes to topics

      • Interpreting themes and patterns

    • Sentiment analysis

      • Analysing sentiment, emotion and attitude

      • Emotions classified into the 8 primary emotions of Plutchik’s wheel, which are arranged as opposing pairs
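
The cleaning and pre-processing steps listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' code (none was deposited). The function names are hypothetical, and two assumptions are made: retweets are recognised by a leading "RT @", and "non-English characters" is approximated as "non-ASCII characters".

```python
import re

def drop_retweets_and_duplicates(tweets):
    """Keep the first copy of each tweet and skip retweets."""
    seen, kept = set(), []
    for text in tweets:
        # Assumption: retweets are recognisable by a leading "RT @".
        if text.startswith("RT @") or text in seen:
            continue
        seen.add(text)
        kept.append(text)
    return kept

def preprocess(text):
    """Apply the four pre-processing steps to one tweet."""
    # 1. Remove the hashtag symbol together with its content.
    text = re.sub(r"#\w+", " ", text)
    # 2. Remove non-English characters (assumption: anything non-ASCII).
    text = text.encode("ascii", "ignore").decode("ascii")
    # 3. Remove special characters, punctuation, and numbers.
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # 4. Collapse letters repeated three or more times ("Sooo" -> "So").
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1", text)
    # Tidy up the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

# Hypothetical example tweets.
raw = [
    "Sooo scared!!! 200 cases in #Wuhan 武汉",
    "RT @someone: Sooo scared!!!",
    "Sooo scared!!! 200 cases in #Wuhan 武汉",
]
cleaned = [preprocess(t) for t in drop_retweets_and_duplicates(raw)]
print(cleaned)
```

Duplicates are dropped before pre-processing here, so two tweets that differ only in their hashtags would both be kept; whether the paper deduplicated before or after cleaning is not stated in these notes.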


Results

  • Descriptive results after pre-processing

  • 19 hashtags in total, but only the top 9 are shown

Fig 2. The number of Tweets under the top 9 hashtags by dates.
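
A count like the one behind Fig 2 (tweets per day for the most frequent hashtags) could be produced along these lines. The records and the function name are hypothetical and only illustrate the counting. The paper's actual data is not reproduced here.

```python
from collections import Counter

# Hypothetical records: one (date, hashtag) pair per hashtag occurrence.
records = [
    ("2020-01-23", "#Coronavirus"),
    ("2020-01-23", "#Wuhan"),
    ("2020-01-24", "#Coronavirus"),
    ("2020-01-24", "#Coronavirus"),
    ("2020-01-24", "#PrayForChina"),
    ("2020-01-25", "#Wuhan"),
]

def daily_counts_for_top(records, top_n):
    """Return the top_n hashtags overall and their per-day counts."""
    totals = Counter(tag for _, tag in records)
    top_tags = [tag for tag, _ in totals.most_common(top_n)]
    daily = Counter((date, tag) for date, tag in records if tag in top_tags)
    return top_tags, daily

# The notes use the top 9 of 19 hashtags; top_n=2 fits the toy data.
top_tags, daily = daily_counts_for_top(records, top_n=2)
print(top_tags)
print(daily[("2020-01-24", "#Coronavirus")])
```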


  • LDA approach (my understanding: the input is the text of the tweets, and LDA is left to find the co-occurring words)

    • Generate co-occurring words and organise them into different topics

    • Calculate the appropriate number of topics with the coherence model in gensim

    • Pick 11 topics according to the highest coherence score

Fig 3. Coherence score for the number of topics. 11 topics were chosen because that number had the highest score.
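
The idea of choosing the number of topics by the highest coherence score can be illustrated without a full LDA run. The sketch below is an assumption-laden toy: it computes a UMass-style coherence in plain Python (the notes do not say which coherence measure the paper used), and the candidate topics are hand-written stand-ins for what an LDA model would output at each number of topics. In practice gensim's `LdaModel` and `CoherenceModel` would supply the topics and scores.

```python
import math
from itertools import combinations

def umass_coherence(topic_words, docs):
    """Mean UMass-style score over the ranked word pairs of one topic."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*words):
        # Number of documents that contain all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    # combinations() preserves order, so `higher` is ranked above `lower`.
    pair_scores = [
        math.log((doc_freq(higher, lower) + 1) / doc_freq(higher))
        for higher, lower in combinations(topic_words, 2)
    ]
    return sum(pair_scores) / len(pair_scores)

def mean_coherence(topics, docs):
    """Average coherence over all topics of one candidate model."""
    return sum(umass_coherence(t, docs) for t in topics) / len(topics)

# Hypothetical tokenised tweets.
docs = [
    ["confirmed", "cases", "china"],
    ["new", "cases", "confirmed"],
    ["cruise", "ship", "quarantine"],
    ["diamond", "princess", "cruise", "ship"],
    ["mask", "supply", "shortage"],
    ["mask", "shortage", "supply", "chain"],
]

# Hypothetical top words per topic for two candidate numbers of topics.
candidate_topics = {
    2: [["cases", "cruise", "mask"], ["confirmed", "ship", "supply"]],
    3: [["cases", "confirmed"], ["cruise", "ship"], ["mask", "supply", "shortage"]],
}

scores = {k: mean_coherence(topics, docs) for k, topics in candidate_topics.items()}
best_k = max(scores, key=scores.get)  # number of topics with the highest score
print(scores, best_k)
```

With the toy data, the 3-topic candidate groups words that actually co-occur, so it scores higher than the 2-topic candidate, which mixes unrelated words.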





Sentiment analysis

  • Tweets contain information about people’s thoughts and emotions

  • Fear is the prominent emotion

  • Fear is found in all 11 topics

  • Statistical tests were run to compare emotions across the topics
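
A lexicon-based tally is one simple way to see which emotion dominates each topic. The sketch below is only an illustration: the lexicon and tweets are invented, the function name is hypothetical, and the statistical comparison mentioned above is not implemented because these notes do not say which test was used. The eight categories are the primary emotions of Plutchik's wheel.

```python
from collections import Counter

PLUTCHIK = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Toy lexicon invented for this sketch; a real study would use a
# published emotion lexicon.
TOY_LEXICON = {
    "scared": "fear", "panic": "fear", "deadly": "fear",
    "hope": "anticipation", "thanks": "trust", "sad": "sadness",
}

def emotion_shares(tweets):
    """Share of each Plutchik emotion among the emotion words found."""
    counts = Counter(
        TOY_LEXICON[word]
        for tweet in tweets
        for word in tweet.lower().split()
        if word in TOY_LEXICON
    )
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in PLUTCHIK}

# Hypothetical tweets, already assigned to two of the topics.
tweets_by_topic = {
    "confirmed cases": ["so scared of new cases", "panic as cases rise", "hope it slows"],
    "supply chain": ["panic buying masks", "thanks to the delivery staff", "deadly shortage"],
}

shares = {topic: emotion_shares(tweets) for topic, tweets in tweets_by_topic.items()}
dominant = {topic: max(s, key=s.get) for topic, s in shares.items()}
print(dominant)
```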

Discussion and conclusion

  • Strength of Twitter for observing public discourse

    • Fast and real-time compared with traditional methods

    • Traditional methods (e.g. interviews) take time and effort and are done only at small scale

  • Usefulness

    • Sentiment analysis can guide targeted intervention programs

    • Could be divided into three phases

      • Early recognition of COVID-19

      • Discussions of COVID-19 symptoms

      • Fear is the prominent emotion in all topics

  • Limitations of this study

    • Only samples the 19 trending hashtags; trends may change over time

    • Twitter users do not represent the whole population (only the digitally literate part)

    • Non-English tweets were removed


Supplementary

Total hashtags



