Homework 6 - News Sentiment Analysis

In this homework, we scrape some textual news data and do a sentiment analysis of that data. Start by creating a new RMarkdown document named hw6-news.Rmd using the standard homework format.

The purpose of this analysis is to see whether the sentiment of OpEd pieces on the conservative and liberal sides matches the expected positions. For example, for the January 2022 Supreme Court vaccine ruling, we would expect conservative articles to be more positive than liberal articles.

Data

We’ll take our data from The FlipSide, a news analysis site that collects opinion pieces from respectable sources on both sides of various issues and publishes sample texts on one issue per day. These postings are maintained in the FlipSide Archives.

Start by pulling the raw HTML data from the posting on the Supreme Court on Vaccine Rules.
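One way to do this is with the rvest package. The following is a minimal sketch, not the required approach: the URL is a hypothetical placeholder (the real address comes from the FlipSide Archives), and fetch_posting is an illustrative helper name.

```r
library(rvest)

# Hypothetical placeholder: replace with the actual URL of the posting
# as listed in the FlipSide Archives.
posting_url <- "https://example.com/supreme-court-on-vaccine-rules"

# Download and parse a page. Wrapping the call in a function means
# nothing is fetched until the function is actually called.
fetch_posting <- function(url) {
  read_html(url)
}

# Uncomment once posting_url points at the real posting:
# page <- fetch_posting(posting_url)
```

The parsed document returned by read_html can then be queried with CSS selectors or XPath expressions in the next step.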

Now, find and extract the raw text values sampled from the two sides. Your goal in this step is to create a single dataframe with a column of words and a column indicating which side the word came from (left column or right column). Notes:

The left and right columns on the webpage do not always represent the liberal and conservative sides respectively. This doesn’t matter so much to us; we simply want to observe the difference in sentiment on the two sides.
The left and right columns have slightly different HTML structures.
Not all daily issues provide two separate columns; e.g., both sides had kind things to say about Sidney Poitier. Skip postings like that. Further, not all postings that have two columns are structured in exactly the same way. We suggest getting your code to work on the Supreme Court ruling entry first and then finding other entries that match its structure. There are several in the January archives.
It may be easier to create two dataframes, one for the left and one for the right, add the “side” column with the appropriate, hard-coded value (i.e., “left” or “right”), and then combine the two tables into one.
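The two-dataframe approach suggested in the last note might look roughly like the sketch below. It runs on a small inline HTML stand-in rather than the real page, and the class names left-col and right-col are assumptions made for illustration; inspect the actual posting to find the right selectors for each column. The helper name extract_side is also hypothetical.

```r
library(rvest)
library(dplyr)
library(tidytext)

# Stand-in for the downloaded page. The class names "left-col" and
# "right-col" are hypothetical, and the two columns are nested
# differently on purpose to mimic slightly different HTML structures.
page <- minimal_html('
  <div class="left-col"><p>The ruling is a terrible mistake.</p></div>
  <div class="right-col"><div><p>The court made a good decision.</p></div></div>
')

# Pull the paragraph text under a CSS selector, split it into one
# lower-cased word per row, and tag every row with a hard-coded side.
extract_side <- function(doc, css, side) {
  text <- doc %>% html_elements(css) %>% html_text2()
  tibble(text = text) %>%
    unnest_tokens(word, text) %>%
    mutate(side = side)
}

left_words  <- extract_side(page, ".left-col p", "left")
right_words <- extract_side(page, ".right-col p", "right")

# Stack the two tables into a single dataframe with columns word and side.
words <- bind_rows(left_words, right_words)
```

Passing the selector as an argument keeps the per-side differences in one place, so the same helper can be reused on other postings that match this structure.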

Analysis

Now, compare the sentiment on both sides. To do this, use the basic sentiment lexicon from the tidytext library to mark each word with the given sentiment and compute the ratio of negative to positive words on each side.
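As a rough sketch of this computation, the example below assumes that the "basic" lexicon is the Bing lexicon returned by get_sentiments("bing"); check this against the course materials. It uses a small made-up word table in place of the scraped data.

```r
library(dplyr)
library(tidytext)

# Toy stand-in for the word/side dataframe built in the Data step.
words <- tibble(
  word = c("terrible", "mistake", "good", "the",
           "good", "great", "bad", "a"),
  side = c("left", "left", "left", "left",
           "right", "right", "right", "right")
)

# Keep only words found in the lexicon (assumed here to be Bing),
# then count negative and positive words per side and take the ratio.
sentiment_ratio <- words %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(side) %>%
  summarise(
    negative   = sum(sentiment == "negative"),
    positive   = sum(sentiment == "positive"),
    neg_to_pos = negative / positive
  )

print(sentiment_ratio)
```

The inner join drops words that carry no sentiment in the lexicon, so the ratio is based only on scored words. If a side has no positive words, the ratio will come out as Inf, which is worth checking for on short postings.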

Run this analysis on at least two daily issues, preferably on issues that you believe will yield polarized views in both directions.

Homework 6 - News Sentiment Analysis

Data

Analysis

Conclusion