Objectives

Students who complete this lab will demonstrate that they can:

In this lab, you’ll work with basic array techniques in Java in the context of simple text analysis. You will turn in a single version of your program, but be sure to complete the exercises in the order given.

Exercise 8.1
  1. Create an appropriate package for lab 8.
  2. Create a class named TextAnalysis.

The first step in text analysis is to find and format an appropriate text. Your analysis will focus on the popular children’s book Green Eggs and Ham by Theodor Seuss Geisel, aka Dr. Seuss, but the techniques can be applied to any text.

Exercise 8.2

Start by copying this definition of an array of strings: GreenEggsAndHam.java. Place this definition into your program, inside the class definition, at the very bottom, below your main() method.

Verify that your definition is usable by adding code to your main() method that prints out the total number of words in the array. Mark this test in your file with a comment that says "Exercise 8.2".
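A minimal sketch of this check follows. It assumes the provided definition is a static String[] field; the name words and its three-element contents are hypothetical placeholders for whatever GreenEggsAndHam.java actually contains. It also shows that Java lets a field be declared below the main() method that uses it.

```java
class WordCountCheck {

    public static void main(String[] args) {
        // Exercise 8.2
        // Printing the array's length confirms the definition compiles and is reachable.
        System.out.println("Total words: " + words.length);
    }

    // Hypothetical placeholder for the array in GreenEggsAndHam.java.
    // Paste the real definition here, below main(), in its place.
    static String[] words = {"I", "am", "Sam"};
}
```

With the placeholder array this prints "Total words: 3"; with the real array it prints the size of the full text.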

Searching Arrays

Search is a common computational task.

Exercise 8.3

Adapt the linear search algorithm in the text to search through an array of words looking for a given target word. Your method should receive an array of strings and a target string from the calling program and return the index of the first occurrence of the target word, or -1 if it doesn't find the target word. Be sure to document your method appropriately.
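One way such a method might look is sketched below. It is not the textbook's version: the name linearSearch is hypothetical, matching is exact and case-sensitive, and a null array is treated as "not found" rather than as an error.

```java
class LinearSearchSketch {

    /**
     * Searches an array of words for a target word using linear search.
     *
     * @param words  the array to search; may be null
     * @param target the word to look for (compared case-sensitively)
     * @return the index of the first element equal to target, or -1 if
     *         target does not occur or words is null
     */
    static int linearSearch(String[] words, String target) {
        if (words == null) {
            return -1; // a null array contains nothing, so nothing is found
        }
        for (int i = 0; i < words.length; i++) {
            // equals() compares string contents; == would compare references
            if (words[i] != null && words[i].equals(target)) {
                return i; // stop at the first match
            }
        }
        return -1; // checked every element without a match
    }

    public static void main(String[] args) {
        // Exercise 8.3 (sample tests, including boundary cases)
        String[] sample = {"I", "do", "not", "like", "them"};
        System.out.println(linearSearch(sample, "like"));              // 3
        System.out.println(linearSearch(sample, "ham"));               // -1
        System.out.println(linearSearch(new String[] {"Sam"}, "Sam")); // 0
        System.out.println(linearSearch(new String[0], "Sam"));        // -1
        System.out.println(linearSearch(null, "Sam"));                 // -1
    }
}
```

Using equals() rather than == matters here: two strings with the same letters can be different objects, and == would report them as different.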

To test that your search method is correct, try a couple of test cases on the Green Eggs and Ham array, and then create two or three other test arrays of words and try test cases on them as well. Be sure to test the boundary cases (e.g., a null array, an empty array, and an array of one element). Mark the section of your main method that implements these tests as "Exercise 8.3".

You will use this method frequently in later exercises, so save it and be glad for all the testing you’ve done on it.

Computing Statistics on Array Data

It can be very useful to compute statistics on the data stored in arrays.

Exercise 8.4

Write an algorithm for a method that receives an array of words and prints the lengths of the shortest and longest words in that array. The method does not return anything. Put your algorithm in your code file as a multi-line comment. When you’re satisfied that your algorithm is correct, implement it and try it out. Leave your algorithm as your method documentation, and document your tests in the main method as "Exercise 8.4".
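The sketch below shows the general shape, with a short algorithm comment standing in as documentation. The method name printShortestAndLongest and the "No words to measure." message for a null or empty array are hypothetical choices, and the code assumes the array contains no null elements.

```java
class WordLengthStats {

    /*
     * Algorithm (one possible version):
     *   1. If words is null or empty, print a message and stop.
     *   2. Set shortest and longest to the length of words[0].
     *   3. For each remaining word, update shortest if its length is
     *      smaller and longest if its length is larger.
     *   4. Print shortest and longest.
     */
    static void printShortestAndLongest(String[] words) {
        if (words == null || words.length == 0) {
            System.out.println("No words to measure.");
            return;
        }
        // Start from the first word so both values are real word lengths.
        int shortest = words[0].length();
        int longest = words[0].length();
        for (int i = 1; i < words.length; i++) {
            int len = words[i].length();
            if (len < shortest) {
                shortest = len;
            }
            if (len > longest) {
                longest = len;
            }
        }
        System.out.println("Shortest word length: " + shortest);
        System.out.println("Longest word length: " + longest);
    }

    public static void main(String[] args) {
        // Exercise 8.4
        printShortestAndLongest(new String[] {"Would", "you", "eat", "them", "in", "a", "box"});
    }
}
```

Seeding both values from words[0] avoids picking an arbitrary starting number such as 0 or 1000 that might not correspond to any actual word.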

Languages tend to use certain utility words, such as "a," "an," and "the," over and over again. These are commonly called stop words. Search engines and other text analysis tools often ignore them because they are peripheral to the central meaning of a text.

Exercise 8.5

Start by copying this definition of an array of stop words: StopWords.java. Place this definition into your program, inside the class definition, at the very bottom, below your definition of the Green Eggs and Ham array.

Write an algorithm for a method that counts the number of non-stop words in an array of words. Your method should receive an array of words and an array of stop words and should return the number of words in the first array that are not in the second array. You do not need to keep track of words that you’ve seen before; if a non-stop word occurs in the first array more than once, count it multiple times.

Use the search method you implemented in Exercise 8.3 in this algorithm. As in Exercise 8.4, put your algorithm in a multi-line comment in your code file. When you are confident in your algorithm, implement it and try it out. Leave your algorithm as your method documentation, and document your tests in the main method as "Exercise 8.5."
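Here is a sketch of how the search method can be reused. The name countNonStopWords is hypothetical, and the short stop-word array in main() is a placeholder, not the list in StopWords.java. Matching is case-sensitive, so "The" would not match "the"; converting both sides with toLowerCase() before searching is one option if your arrays mix capitalizations.

```java
class StopWordCounter {

    // Linear search as in Exercise 8.3, repeated so this sketch compiles on its own.
    static int linearSearch(String[] words, String target) {
        if (words == null) {
            return -1;
        }
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && words[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    /*
     * Algorithm (one possible version):
     *   1. Receive an array of words and an array of stop words.
     *   2. If words is null, return 0.
     *   3. Set count = 0.
     *   4. For each word in words:
     *        if searching stopWords for the word returns -1, add 1 to count.
     *   5. Return count.
     */
    static int countNonStopWords(String[] words, String[] stopWords) {
        if (words == null) {
            return 0;
        }
        int count = 0;
        for (String word : words) {
            // Repeated non-stop words are counted every time they appear.
            if (linearSearch(stopWords, word) == -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Exercise 8.5 (placeholder arrays, not the real StopWords.java list)
        String[] words = {"I", "do", "not", "like", "green", "eggs", "and", "ham"};
        String[] stopWords = {"I", "do", "not", "and", "a", "the"};
        System.out.println(countNonStopWords(words, stopWords)); // 4
    }
}
```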

Extra Credit Exercise

Green Eggs and Ham is famous for using only 50 different words (51 if you count the proper name “Sam-I-am”). You can compute this statistic too, as this optional exercise shows.

Exercise 8.6 (Extra-credit)

Write a method that counts the number of unique words in the text. Consider all the words, including stop words. This problem requires that you keep track of the unique words you’ve seen so far. The following algorithm implements this behavior:

  1. Receive an array of strings called list.
  2. Set count = 0.
  3. If list is not null then
    1. Declare a new array of strings of the same length as list. Call it uniqueWords and initialize each of its elements to an empty string (i.e., "", not null).
    2. Loop for all words in list
      1. If the current word in list is not already in uniqueWords then
        • Set uniqueWords[count] = the current word in list.
        • Increment count by 1.
  4. Return count.

Notice a couple of things about this algorithm before implementing it:

Implement this algorithm in a method and test it appropriately. Document your method in the standard way (not with the algorithm), and document your tests in the main method as "Exercise 8.6."
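A sketch that follows the algorithm above step by step is shown below; countUniqueWords is a hypothetical name, and the search method is repeated so the example compiles on its own. Because uniqueWords starts filled with empty strings, an empty-string entry in list would always be "found" and never counted, and because matching is case-sensitive, "Sam" and "sam" count as two different words.

```java
import java.util.Arrays;

class UniqueWordCounter {

    // Linear search as in Exercise 8.3, repeated so this sketch compiles on its own.
    static int linearSearch(String[] words, String target) {
        if (words == null) {
            return -1;
        }
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && words[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Counts the distinct words in an array, treating differently
     * capitalized spellings as different words.
     *
     * @param list the words to examine; may be null
     * @return the number of distinct words, or 0 if list is null
     */
    static int countUniqueWords(String[] list) {
        int count = 0;
        if (list != null) {
            // Same length as list, since every word could be unique.
            String[] uniqueWords = new String[list.length];
            Arrays.fill(uniqueWords, ""); // empty strings, not null
            for (String word : list) {
                if (linearSearch(uniqueWords, word) == -1) {
                    uniqueWords[count] = word; // store in the next free slot
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Exercise 8.6
        String[] sample = {"I", "am", "Sam", "Sam", "I", "am"};
        System.out.println(countUniqueWords(sample)); // 3
    }
}
```

count does double duty here: it is both the number of unique words found so far and the index of the next free slot in uniqueWords.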

Checking In

Submit all the code and supporting files for the exercises in this lab. We will grade this lab according to the following criteria: