In this homework, we scrape textual news data and run a sentiment analysis on it. Start by creating a new RMarkdown document named hw6-news.Rmd using the standard homework format.

The purpose of this analysis is to see whether the sentiment of OpEd pieces on the conservative and liberal sides matches the expected positions. For example, for the January 2022 Supreme Court vaccine ruling, we would expect conservative articles to be more positive than liberal articles.

library(tidyverse)
library(robotstxt)
library(rvest)
library(tidytext)

Data

We’ll take our data from The FlipSide, a news analysis site that collects opinion pieces from respectable sources on both sides of various issues and publishes sample texts on one issue per day. These postings are maintained in the FlipSide Archives.

Start by pulling the raw HTML data from the posting on the Supreme Court on Vaccine Rules.
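A minimal sketch of this step, assuming the posting lives at the address below. The URL and the helper name fetch_posting are hypothetical, so substitute the real address from the archives. The helper checks robots.txt before downloading anything.

```r
library(robotstxt)
library(rvest)

# Hypothetical URL: replace with the actual address of the posting.
posting_url <- "https://www.theflipside.io/archives/supreme-court-on-vaccine-rules"

# Hypothetical helper: download a page only if robots.txt permits it.
fetch_posting <- function(url) {
  # paths_allowed() returns a logical vector, one entry per path checked.
  if (!all(paths_allowed(url))) {
    stop("robots.txt disallows scraping: ", url)
  }
  # Parse the page into an xml_document for later extraction.
  read_html(url)
}

# Uncomment to run; this needs network access.
# page <- fetch_posting(posting_url)
```

Checking permission first keeps the scrape polite, and storing the parsed page in one object avoids downloading it again for each extraction step.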

Now, find and extract the raw text values sampled from the two sides. Your goal in this step is to create a single dataframe with one column of words and one column indicating which side each word came from (the left column or the right column of the page).
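One possible shape for this step is sketched below on a small stand-in page, so that it runs without network access. The class names left-column and right-column and the helper extract_side are assumptions made for illustration. Inspect the real page in your browser to find the selectors that actually apply.

```r
library(tidyverse)
library(rvest)
library(tidytext)

# Stand-in page; the class names here are hypothetical, not the site's.
page <- minimal_html('
  <div class="left-column"><p>The ruling is a terrible mistake.</p></div>
  <div class="right-column"><p>The court made a wise and fair decision.</p></div>
')

# Hypothetical helper: collect the text under one selector and label its side.
extract_side <- function(page, selector, side) {
  tibble(
    side = side,
    text = page %>% html_elements(selector) %>% html_text2()
  )
}

# Stack both sides, then split each passage into one lowercase word per row.
words <- bind_rows(
  extract_side(page, ".left-column p", "left"),
  extract_side(page, ".right-column p", "right")
) %>%
  unnest_tokens(word, text)

words
```

The result has one row per word, with a side column and a word column, which is the layout the sentiment step expects.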

Analysis

Now, compare the sentiment on both sides. To do this, use the basic sentiment lexicon from the tidytext library to mark each word with the given sentiment and compute the ratio of negative to positive words on each side.
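The comparison might be sketched as follows. This assumes that the "basic" lexicon means the Bing lexicon returned by get_sentiments("bing"), and it uses a few made-up words in place of the scraped data.

```r
library(tidyverse)
library(tidytext)

# Made-up words standing in for the scraped dataframe.
words <- tibble(
  side = c("left", "left", "left", "right", "right", "right"),
  word = c("terrible", "mistake", "fair", "wise", "fair", "mistake")
)

sentiment_ratio <- words %>%
  # Keep only words that appear in the lexicon, tagged positive or negative.
  inner_join(get_sentiments("bing"), by = "word") %>%
  # Count positive and negative words on each side.
  count(side, sentiment) %>%
  # One row per side, with a column for each sentiment.
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  # Ratio of negative to positive words.
  mutate(neg_to_pos = negative / positive)

sentiment_ratio
```

Words that are missing from the lexicon drop out in the join, so the ratio reflects only the words the lexicon can score.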

Run this analysis on at least two daily issues, preferably on issues that you believe will yield polarized views in both directions.
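To cover several issues in one pass, one option is to stack the per-issue word tables and group by issue as well as side. The sketch below assumes the Bing lexicon and uses made-up issue names and words purely for illustration.

```r
library(tidyverse)
library(tidytext)

# Made-up word tables for two hypothetical issues.
issues <- list(
  issue_a = tibble(
    side = c("left", "left", "right", "right"),
    word = c("terrible", "fair", "wise", "fair")
  ),
  issue_b = tibble(
    side = c("left", "left", "left", "right", "right", "right"),
    word = c("wise", "fair", "mistake", "terrible", "mistake", "fair")
  )
)

ratios <- bind_rows(issues, .id = "issue") %>%
  # Tag each word with its lexicon sentiment; unscored words drop out.
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(issue, side) %>%
  summarize(
    negative = sum(sentiment == "negative"),
    positive = sum(sentiment == "positive"),
    .groups = "drop"
  ) %>%
  # A side with no positive words would give Inf here.
  mutate(neg_to_pos = negative / positive)

ratios
```

Keeping all issues in one table makes it straightforward to compare the left and right ratios side by side, or to plot them with the issue as a facet.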

Conclusion

Present your results and include your conclusions on what you found.