
2020 Qualitative Datasets

DATASET 1: ENGLISH ONLINE DISCUSSIONS THAT MENTION “CORONAVIRUS” OR “CORONA VIRUS” OR “COVID” (BY WEBHOSE.IO)
Link: https://webhose.io/free-datasets/online-message-boards-that-mention-corona-virus/
Format: JSON | Size: 2.5GB | Crawled: Dec, 2019 – Mar, 2020
Access: Free, but you have to create a profile on webhose.io
Main variables: Social media shares and likes; Site name; Site section; Section title; Country; Entities; Participants count; Replies count; Spam score; Performance score; Text; External links

DATASET 2: ENGLISH BLOG POSTS THAT MENTION “CORONAVIRUS” OR “CORONA VIRUS” OR “COVID” (BY WEBHOSE.IO)
Link: https://webhose.io/free-datasets/blog-posts-that-mention-corona-virus/
Format: JSON | Size: 6GB | Crawled: Dec, 2019 – Mar, 2020
Access: Free, but you have to create a profile on webhose.io
Main variables: Social media shares and likes; Site name; Site section; Section title; Country; Entities; Participants count; Replies count; Spam score; Performance score; Text; External links

DATASET 3: ENGLISH NEWS ARTICLES THAT MENTION “CORONA VIRUS” OR “CORONAVIRUS” OR “COVID” (BY WEBHOSE.IO)
Link: https://webhose.io/free-datasets/news-articles-that-mention-corona-virus/
Format: JSON | Size: 13.7GB | Crawled: Dec, 2019 – Mar, 2020
Access: Free, but you have to create a profile on webhose.io
Main variables: Social media shares and likes; Site name; Site section; Section title; Country; Entities; Participants count; Replies count; Spam score; Performance score; Text; External links
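Because these three JSON downloads run to several gigabytes, it can help to process them one record at a time rather than loading a whole file into memory. The following is a minimal sketch only: it assumes each file stores one JSON object per line, and the field name `text` is a guess based on the "Text" variable listed above. Check the actual layout and key names of your download before relying on it.

```python
import io
import json
from typing import Iterator, TextIO


def iter_records(handle: TextIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line (assumes one JSON object per line)."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def count_mentions(handle: TextIO, keyword: str, text_field: str = "text") -> int:
    """Count records whose text field contains the keyword, ignoring case.

    `text_field` is a hypothetical key name; adjust it to the real schema.
    """
    needle = keyword.lower()
    return sum(
        1
        for record in iter_records(handle)
        if needle in str(record.get(text_field, "")).lower()
    )


# Tiny in-memory stand-in for a downloaded file (made-up records).
sample = io.StringIO(
    '{"text": "Coronavirus update from the forum", "replies_count": 3}\n'
    "\n"
    '{"text": "Unrelated post", "replies_count": 0}\n'
)
mention_count = count_mentions(sample, "coronavirus")
```

With a real download, replace the `io.StringIO` sample with `open(path, encoding="utf-8")`, where `path` points to your extracted file.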

DATASET 4: COVID-19 TWEETS DATASET (BY RABINDRA LAMSAL)
Link: https://ieee-dataport.org/open-access/corona-virus-covid-19-tweets-dataset
Format: CSV | Crawled: Mar, 2020 – present
Access: Free
Main variables: Tweet ids; sentiment score (positive, negative, neutral)
IMPORTANT NOTE: This .csv dataset contains tweet IDs. To obtain the text of the tweets and other identifying information, you will need to download and use a hydrator to retrieve the information. You can find one example of a hydrator here.
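Before hydrating, the IDs usually need to be read from the CSV and split into batches, since Twitter's tweet lookup endpoints accept at most 100 IDs per request. The sketch below assumes the tweet ID is in the first column and the sentiment score in the second, with no guarantee about a header row; the sample IDs are invented. It only prepares the batches and does not call any API or hydrator.

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_tweet_ids(handle: Iterable[str]) -> List[str]:
    """Collect the first column of each row, keeping only purely numeric values.

    IDs stay as strings so that long numbers are never rounded.
    Non-numeric first cells (for example a header row) are skipped.
    """
    ids = []
    for row in csv.reader(handle):
        if row and row[0].strip().isdigit():
            ids.append(row[0].strip())
    return ids


def batched(ids: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Made-up rows standing in for the downloaded CSV (assumed layout: id, score).
sample = io.StringIO(
    "1240000000000000001,0.25\n"
    "1240000000000000002,-0.10\n"
    "1240000000000000003,0.0\n"
)
tweet_ids = read_tweet_ids(sample)
batches = list(batched(tweet_ids, size=2))
```

Each batch can then be passed to whichever hydrator you choose.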

DATASET 5: AUTOMATED TWITTER ACCOUNTS (BY KUNGFU.AI)
Link: https://data.world/kungfuaiteam/coronavirus-twitter-analysis
Format: XLSX | Size: 1.02MB | Crawled: Feb 2-15 & 23-27; Mar 8-11
Access: Free, but you have to create a profile on data.world
Example variables: Rank; Influence score; Date of max influence; Summary tweet; YouTube videos referenced

DATASET 6: NEWS MEDIA AND GOVERNMENT/INTERNATIONAL ORGANIZATION TWEETS (BY JINGYUAN YU)
Link: https://github.com/narcisoyu/Institional-and-news-media-tweet-dataset-for-COVID-19-social-science-research
Format: TXT | Crawled: Mar 21-present
Access: Free
Example variables: Created_at; Hashtags; In_reply_to; Tweet id; User_screen_name
IMPORTANT NOTE: This dataset contains tweet IDs. To obtain the text of the tweets and other identifying information, you will need to download and use a hydrator to retrieve the information. You can find one example of a hydrator here.

Metadata

INTERNATIONAL DATASETS INCLUDING TRACKING, CASE COUNTS, GOVERNMENT MEASURES, EDUCATION, AND MORE
Link: https://data.world/datasets/covid-19
Access: Free, but you have to create a profile on data.world

US SPECIFIC COVID TRACKING PROJECT
Link: https://covidtracking.com/data
Format: JSON, CSV | Size: 6GB | Crawled: Dec, 2019 – Mar, 2020
Access: Free
Example variables: Positive; Negative; totalTestResults; Hospitalized; Death; Pending; State
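As one illustration of working with the CSV version, the sketch below computes the share of positive results among all reported tests for each state. The column names `state`, `positive`, and `totalTestResults` are assumptions based on the variables listed above, and the sample rows are invented. It also assumes one row per state (a single-day snapshot); with multiple dates per state, a later row would overwrite an earlier one.

```python
import csv
import io
from typing import Dict, Iterable


def positive_share_by_state(handle: Iterable[str]) -> Dict[str, float]:
    """Map each state to positive / totalTestResults, skipping rows with no tests.

    Column names are assumed, not confirmed; empty cells are treated as zero.
    """
    shares = {}
    for row in csv.DictReader(handle):
        total = int(row["totalTestResults"] or 0)
        if total > 0:
            shares[row["state"]] = int(row["positive"] or 0) / total
    return shares


# Made-up snapshot standing in for the downloaded CSV.
sample = io.StringIO(
    "state,positive,negative,totalTestResults\n"
    "AA,50,150,200\n"
    "BB,0,0,0\n"
)
shares = positive_share_by_state(sample)
```

Rows with zero reported tests are left out rather than assigned a share, so a missing state in the result means the share is undefined, not zero.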

INTERNATIONAL COVID-19 CONTAINMENT AND MITIGATION MEASURES
Link: http://epidemicforecasting.org/containment
Format: CSV | Crawled: Dec, 2019 – Mar 30, 2020
Access: Free
Example variables: ID column; Intensity of targeted symptomatic isolation measures; Intensity of non-targeted symptomatic isolation measures; Intensity of isolation of confirmed case contacts; Intensity of blanket isolation measures (curfews and lockdowns); Domestic travel restriction; International travel restriction; Number of tests; Contact tracing; Mask wearing; Hand washing; Gatherings banned; Intensity of measures in place to isolate confirmed cases in the healthcare system; Public education and incentives; Assisting people to stay home; Intensity of public hygiene measures; Miscellaneous hygiene measures; Public interaction reduction; Intensity of business shutdowns; School closure; Activity cancellation; Intensity of activity resumptions reported; Indicator for tightening of diagnostic criteria; Indicator for broadening of diagnostic criteria; Approximate coverage of testing criteria; Date on which measure came into force; Country; Confirmed Cases; Deaths

Additional Datasets and Resources

Papers on H1N1 Pandemic (For Coding/Analysis Inspiration)

  1. Chew, C., & Eysenbach, G. (2010). Pandemics in the age of Twitter: Content analysis of Tweets during the 2009 H1N1 outbreak. PLoS ONE, 5(11).

  2. Oh, H. J., Hove, T., Paek, H. J., Lee, B., Lee, H., & Song, S. K. (2012). Attention cycles and the H1N1 pandemic: A cross-national study of US and Korean newspaper coverage. Asian Journal of Communication, 22(2), 214-232.

  3. Lee, S. T., & Basnyat, I. (2013). From press release to news: mapping the framing of the 2009 H1N1 A influenza pandemic. Health Communication, 28(2), 119-132.

  4. Signorini, A., Segre, A. M., & Polgreen, P. M. (2011). The use of Twitter to track levels of disease activity and public concern in the US during the influenza A H1N1 pandemic. PLoS ONE, 6(5).

  5. Vasterman, P. L., & Ruigrok, N. (2013). Pandemic alarm in the Dutch media: Media coverage of the 2009 influenza A (H1N1) pandemic and the role of the expert sources. European Journal of Communication, 28(4), 436-453.