2022 Qualitative Datasets
Dataset #1: Supreme Court nominee confirmation hearing transcripts from 1971 to 2018
Link: https://www.rstreet.org/2019/04/04/supreme-court-confirmation-hearing-transcripts-as-data/
Format: CSV
Access: Free, available from the R Street Institute website
Main variables: Year; Hearing (session); Statement (utterance); Speaker (name/title/party)
Dataset #2: CNN news articles from 2011 to 2022
Link: https://www.kaggle.com/datasets/hadasu92/cnn-articles-after-basic-cleaning
Format: CSV
Access: Free, but you will need to create a free Kaggle account to download
Main variables: Author; Date published; Category; Section; URL; Headline; Key words; Article text
Dataset #3: Harry Potter and the Philosopher’s Stone
Link: https://docs.google.com/spreadsheets/d/1aPQkohYK8h29M4_0YuEch8cZj1s0uGUf/edit?usp=sharing&ouid=101384278014467918956&rtpof=true&sd=true
Format: CSV
Main variables: Name (speaker); Scene; Text
Dataset #4: Star Wars Episodes 4, 5, 6
Link: https://docs.google.com/spreadsheets/d/1mJaBkcAi-NNTvrcw01q1FqCZwKZfFmpv/edit?usp=sharing&ouid=101384278014467918956&rtpof=true&sd=true
Format: CSV
Main variables: Episode; Speaker; Dialogue (text)
Dataset #5: Interviews of 7 Ukrainian refugees
Link: https://www.politico.com/news/magazine/2022/04/03/7-ukrainian-refugees-escaping-russias-war-00022175
Format: Text (Politico magazine article)
Access: Free, available on the Politico website
Main variables: Name; Age; Occupation; Hometown; Current location; Interview text (questions/responses)
Dataset #6: Multimodal data on young adults’ experiences of loneliness (Interviews + free association task data)
Link to article: https://www.frontiersin.org/articles/10.3389/fpsyg.2021.660791/full
Link to dataset: https://rdr.ucl.ac.uk/articles/dataset/Qualitative_and_output_data_on_loneliness_among_young_adults/17212991
Format: Interviews (DOCX); Free association task image (JPEG)
Access: Free, available from UCL research repository
Main variables: Participant ID; Gender; Age; Hometown; Speaker; Utterance
Dataset #7: Video interviews of juvenile inmates
Link: https://www.youtube.com/playlist?list=PLsJsTF6yJel3Au9pnYG01keTeVxDd_xZJ
Format: Videos (96 interviews)
Access: Free, available on the Calamari Productions YouTube channel
Other Datasets Available on the ISQE Website
- COVID-related datasets from the Data Challenges in 2020 & 2021
- Sample coded datasets