
2023 Qualitative Datasets

Dataset #1: BTS English Translations

Link: None; request from Jonathon Sun

Format: CSV

Access: Free; please cite Jonathon Sun

Main variables: Album, Song, Lyrics


Dataset #2: ChatGPT Sentiment Tweets

Link: https://www.kaggle.com/datasets/charunisa/chatgpt-sentiment-analysis

Format: CSV

Access: Public Domain

Main variables: ID, Tweets, Labels


Dataset #3: Top 10 Songs from Billboard 2022

Link: None; request from Jonathon Sun

Format: CSV

Access: Free; please cite Jonathon Sun

Main variables: Line, Section, Song name, Artist name


Dataset #4: Australian Legal Cases from the Federal Court of Australia (2006 - 2009)

Link: https://www.kaggle.com/datasets/shivamb/legal-citation-text-classification?select=legal_text_classification.csv

Format: CSV

Access: Public Domain

Main variables: Case ID, case outcome, case title, case text


Dataset #5: Disney+ Movies and TV Shows

Link: https://www.kaggle.com/datasets/shivamb/disney-movies-and-tv-shows

Format: CSV

Access: Public Domain

Main variables: showID, type, title, director, cast, country, date added, release year, rating, duration, listed in, description


Dataset #6: Netflix Movies and TV Shows

Link: https://www.kaggle.com/datasets/shivamb/netflix-shows?select=netflix_titles.csv

Format: CSV

Access: Public Domain

Main variables: showID, type, title, director, cast, country, date added, release year, rating, duration, listed in, description


Dataset #7: Avatar: The Last Airbender

Link: https://www.kaggle.com/datasets/ekrembayar/avatar-the-last-air-bender

Format: CSV

Access: Available on Kaggle

Main variables: #, ID, Book, Book number, chapter, chapter number, character, full text, character words, writer, director, IMDb rating


Dataset #8: Tweets with the hashtag #ChatGPT

Link: https://www.kaggle.com/datasets/konradb/chatgpt-the-tweets

Format: CSV

Access: Public Domain

Main variables: username, text, user location, user description, user created, user followers, user friends, user favorites, user verified, date, hashtags, source


Dataset #9: Turkey Earthquake Tweets

Link: https://www.kaggle.com/datasets/gpreda/turkey-earthquake-tweets

Format: CSV

Access: Public Domain

Main variables: ID, username, user location, user description, user created, user followers, user friends, user favorites, user verified, date, text, hashtags, source, retweets, favorites, is retweet


Dataset #10: BBC News

Link: https://www.kaggle.com/datasets/gpreda/bbc-news

Format: CSV

Access: Public Domain

Main variables: Title, pubDate, guid, link, description
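Every dataset above is distributed as a CSV file. The main variables listed are descriptive names, though, and may not match a file's header exactly. The minimal Python sketch below uses only the standard library. It reads a downloaded file's header row and reports which listed variables it cannot find. The function name `check_main_variables` is hypothetical. The loose matching rule (ignoring case, spaces, and underscores) is an assumption about how the listed names relate to the real column names.

```python
import csv


def check_main_variables(csv_path, expected):
    """Return the names in `expected` that have no matching column in the
    header row of the CSV file at `csv_path`.

    Matching is deliberately loose (an assumption): case, spaces, and
    underscores are ignored, so "release year" matches "release_year"
    and "showID" matches "show_id".
    """

    def normalize(name):
        # Lowercase and drop spaces/underscores so naming styles compare equal.
        return name.strip().lower().replace(" ", "").replace("_", "")

    # utf-8-sig also strips a byte-order mark, which some CSV exports include.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        header = next(csv.reader(f), [])  # empty file -> empty header

    present = {normalize(column) for column in header}
    return [name for name in expected if normalize(name) not in present]
```

Suppose the BBC News file has been downloaded to a hypothetical local path such as `bbc_news.csv`. Calling `check_main_variables("bbc_news.csv", ["Title", "pubDate", "guid", "link", "description"])` returns an empty list if every listed variable appears in the header. Check any returned names by hand, because a column may be present under a spelling the loose rule does not cover.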