### Objectives

Students who complete this lab will demonstrate that they can:

• Implement array processing

In this lab, you’ll work with basic array techniques in Java in the context of simple text analysis. You will turn in a single version of your program, but be sure to complete the exercises in the order given.

Exercise 8.1
1. Create an appropriate package for lab 8.
2. Create a class named `TextAnalysis`.

The first step in text analysis is to find and format an appropriate text. Your analysis will focus on the popular children’s book Green Eggs and Ham by Theodor Geisel, better known as Dr. Seuss, but the techniques can be applied to any text.

Exercise 8.2

Start by copying this definition of an array of strings: GreenEggsAndHam.java. Place this definition into your program, inside the class definition, at the very bottom, below your `main()` method.

Verify that your definition is usable by adding code to your `main()` method that prints out the total number of words in the array. Mark this test in your file with a comment that says "Exercise 8.2".
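The following is a minimal sketch of the class layout that Exercises 8.1 and 8.2 describe. It assumes the array in GreenEggsAndHam.java is declared as a `static` field. The package name `lab08`, the field name `greenEggsAndHam`, and the six-word list are hypothetical placeholders, so substitute the real definition you copied.

```java
// package lab08;  // hypothetical package name for Exercise 8.1; use your own

public class TextAnalysis {

    public static void main(String[] args) {
        // Exercise 8.2: confirm the copied array is usable by printing its size.
        System.out.println("Total words: " + greenEggsAndHam.length);
    }

    // Placeholder only: replace with the full definition from GreenEggsAndHam.java.
    // A static field declared below main() can still be used inside main().
    static final String[] greenEggsAndHam = {
        "I", "am", "Sam", "Sam", "I", "am"
    };
}
```

Because `length` is a field of every Java array, no loop is needed to count the words.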

# Searching Arrays

Search is a common computational task.

Exercise 8.3

Adapt the linear search algorithm in the text to search through an array of words looking for a given target word. Your method should receive an array of strings and a target string from the calling program and return the index at which it finds the target word or -1 if it doesn't find the target word. Be sure to document your method appropriately.

To test that your search method is correct, try a couple of test cases on the Green Eggs and Ham array, then create two or three other test arrays of words and run test cases on them as well. Be sure to test the boundary cases (e.g., a `null` array, an empty array, and an array of one element). Mark the section of your main method that implements these tests as "Exercise 8.3".
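Here is one possible adaptation of linear search for words, sketched under the assumption that a `null` array should simply report "not found." It is not necessarily identical to the version in the textbook. The class name `SearchSketch` is hypothetical; in your lab the method belongs in `TextAnalysis`.

```java
public class SearchSketch {

    /**
     * Searches an array of words for a target word using linear search.
     *
     * @param words  the array to search (may be null or empty)
     * @param target the word to look for
     * @return the index of the first occurrence of target, or -1 if it is absent
     */
    public static int linearSearch(String[] words, String target) {
        if (words == null) {
            return -1; // a null array contains nothing
        }
        for (int i = 0; i < words.length; i++) {
            // equals() compares string contents; == would compare object references
            if (words[i] != null && words[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Exercise 8.3: boundary cases plus ordinary hits and misses
        String[] one = {"ham"};
        String[] several = {"green", "eggs", "and", "ham"};
        System.out.println(linearSearch(null, "ham"));          // -1
        System.out.println(linearSearch(new String[0], "ham")); // -1
        System.out.println(linearSearch(one, "ham"));           // 0
        System.out.println(linearSearch(several, "ham"));       // 3
        System.out.println(linearSearch(several, "Sam"));       // -1
    }
}
```

This comparison is case-sensitive, so "Sam" and "sam" are different words. The lab does not say whether case should be ignored; if you decide it should, `equalsIgnoreCase` is the standard `String` alternative.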

You will use this method frequently in later exercises, so save it and be glad for all the testing you’ve done on it.

# Computing Statistics on Array Data

It can be very useful to compute statistics on the data stored in arrays.

Exercise 8.4

Write an algorithm for a method that receives an array of words and prints the lengths of the shortest and longest words in that array. The method does not return anything. Put your algorithm in your code file as a multi-line comment. When you’re satisfied that your algorithm is correct, implement it and try it out. Leave your algorithm as your method documentation, and document your tests in the main method as "Exercise 8.4".
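A minimal sketch of one such algorithm and method follows. It tracks both lengths in a single pass, which matters for the efficiency criterion below. It assumes the array contains no `null` elements, and printing a message for a `null` or empty array is an assumed choice, since the lab does not specify that output. The class name `WordLengthStats` is hypothetical.

```java
public class WordLengthStats {

    /*
     * Algorithm (one possible version):
     * 1. Receive an array of words.
     * 2. If the array is null or empty, print a message and stop.
     * 3. Set shortest and longest to the length of the first word.
     * 4. For each remaining word:
     *    a. If its length is less than shortest, update shortest.
     *    b. If its length is greater than longest, update longest.
     * 5. Print shortest and longest.
     */
    public static void printShortestAndLongest(String[] words) {
        if (words == null || words.length == 0) {
            System.out.println("No words to measure.");
            return;
        }
        int shortest = words[0].length();
        int longest = words[0].length();
        // One loop updates both statistics, so the array is traversed only once.
        for (int i = 1; i < words.length; i++) {
            int len = words[i].length();
            if (len < shortest) {
                shortest = len;
            }
            if (len > longest) {
                longest = len;
            }
        }
        System.out.println("Shortest word length: " + shortest);
        System.out.println("Longest word length: " + longest);
    }

    public static void main(String[] args) {
        // Exercise 8.4
        printShortestAndLongest(new String[] {"I", "do", "not", "like", "them"});
        printShortestAndLongest(new String[] {"Sam-i-am"});
        printShortestAndLongest(new String[0]);
        printShortestAndLongest(null);
    }
}
```

Finding the shortest and longest in two separate loops would also be correct, but it traverses the array twice.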

Languages tend to use certain utility words, such as "a," "an," and "the," over and over again. These are commonly called stop words. Search engines and other text analysis tools often ignore them because they are peripheral to the central meaning of the text.

Exercise 8.5

Start by copying this definition of an array of stop words: StopWords.java. Place this definition into your program, inside the class definition, at the very bottom, below your definition of the Green Eggs and Ham array.

Write an algorithm that counts the number of non-stop words in an array of words. Your method should receive an array of words and an array of stop words and should return the number of words in the first array that are not in the second array. You do not need to keep track of words that you’ve seen before; if a non-stop word occurs in the first array more than once, count it multiple times.

Make use of the search method you implemented in Exercise 8.3 for this algorithm. As in Exercise 8.4, put your algorithm in a multi-line comment in your code file. When you are confident in your algorithm, implement it and try it out. Leave your algorithm in place of the method documentation, and document your tests in the main method as "Exercise 8.5."
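The sketch below shows one way to count non-stop words by reusing linear search. The search method is repeated so the example compiles on its own; in your lab you would call the one from Exercise 8.3. The class name `NonStopWordCounter` and the short stop-word list are hypothetical, since the real list comes from StopWords.java.

```java
public class NonStopWordCounter {

    // Linear search as in Exercise 8.3, repeated here so this sketch is self-contained.
    public static int linearSearch(String[] words, String target) {
        if (words == null) {
            return -1;
        }
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && words[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    /*
     * Algorithm (one possible version):
     * 1. Receive an array of words and an array of stop words.
     * 2. Set count to 0.
     * 3. If the words array is not null, then for each word in it:
     *    a. If searching the stop-word array for the word returns -1,
     *       increment count by 1.
     * 4. Return count.
     */
    public static int countNonStopWords(String[] words, String[] stopWords) {
        int count = 0;
        if (words == null) {
            return count;
        }
        for (String word : words) {
            // Repeated non-stop words are counted each time they appear.
            if (linearSearch(stopWords, word) == -1) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Exercise 8.5 (hypothetical stop-word list)
        String[] stop = {"a", "an", "and", "the", "do", "not"};
        String[] text = {"I", "do", "not", "like", "green", "eggs", "and", "ham"};
        System.out.println(countNonStopWords(text, stop));                            // 5
        System.out.println(countNonStopWords(new String[] {"ham", "the", "ham"}, stop)); // 2
        System.out.println(countNonStopWords(null, stop));                            // 0
    }
}
```

Because the comparison is case-sensitive, "I" is not matched by a lowercase "i" in a stop list; whether to normalize case is a design decision the lab leaves to you.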

# Extra Credit Exercise

Green Eggs and Ham is famous for using only 50 words (51 if you count the proper name “Sam-i-am”). This statistic can be computed as well, as shown in this optional exercise.

Exercise 8.6 (Extra-credit)

Write a method that counts the number of unique words in the text. Consider all the words, including stop words. This problem requires that you keep track of the unique words you’ve seen so far. The following algorithm implements this behavior:

1. Receive an array of strings called `list`.
2. Set `count` to 0.
3. If `list` is not `null` then
   1. Declare a new array of strings of the same length as `list`. Call it `uniqueWords` and initialize each of its elements to an empty string (i.e., `""`, not `null`).
   2. Loop for all words in `list`
      1. If the current word in `list` is not already in `uniqueWords` then
         • Set `uniqueWords[count]` = the current word in `list`.
         • Increment `count` by 1.
4. Return `count`.

This algorithm has two notable features:

• It uses a temporary array of unique words to keep track of what it has seen so far as it works through the array. Making this array the same size as the original guarantees room for every word even if every word in the text is unique, but it usually uses more space than necessary. We’ll look at alternate data structures that address this concern later in the course.
• It handles a `null`, empty, or one-word list gracefully.

Implement this algorithm in a method and test it appropriately. Document your method in the standard way (not with the algorithm), and document your tests in the main method as "Exercise 8.6."
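The following is a minimal Java rendering of the algorithm above, offered as a sketch rather than a reference solution. The search method is repeated so the example compiles alone, and the class name `UniqueWordCounter` is hypothetical.

```java
import java.util.Arrays;

public class UniqueWordCounter {

    // Linear search as in Exercise 8.3, repeated here so this sketch is self-contained.
    public static int linearSearch(String[] words, String target) {
        if (words == null) {
            return -1;
        }
        for (int i = 0; i < words.length; i++) {
            if (words[i] != null && words[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Counts the distinct words in an array, including stop words.
     *
     * @param list the words to examine (may be null)
     * @return the number of distinct words, or 0 if list is null or empty
     */
    public static int countUniqueWords(String[] list) {
        int count = 0;
        if (list != null) {
            // Same length as list, so there is room even if every word is unique.
            String[] uniqueWords = new String[list.length];
            // Step 3.1: fill with empty strings rather than leaving the default nulls.
            Arrays.fill(uniqueWords, "");
            for (String word : list) {
                if (linearSearch(uniqueWords, word) == -1) {
                    uniqueWords[count] = word;
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Exercise 8.6
        System.out.println(countUniqueWords(new String[] {"I", "am", "Sam", "Sam", "I", "am"})); // 3
        System.out.println(countUniqueWords(new String[] {"ham"}));                             // 1
        System.out.println(countUniqueWords(new String[0]));                                    // 0
        System.out.println(countUniqueWords(null));                                             // 0
    }
}
```

One quirk of filling with `""`: an empty-string "word" in `list` would match an unused slot and never be counted. Real text rarely contains empty words, but restricting the search to the first `count` slots would avoid the issue.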

# Checking In

Submit all the code and supporting files for the exercises in this lab. We will grade this lab according to the following criteria:

• Correctness:
    • 20% - Search - The search method should function properly.
    • 20% - Longest and Shortest - The method finding the longest and shortest words should function properly.
    • 20% - Non-stop Count - The method counting the non-stop words should function properly.
    • 10% - Boundary cases - All the methods should handle boundary cases properly.
• Efficiency:
    • 20% - List Traversal - The methods should function efficiently with respect to the number of times the lists are traversed (see, in particular, the longest-shortest counter).
• Understandability:
    • 5% - Header Documentation - Document the code’s basic purpose, authors, and assignment number.
    • 5% - Code Documentation - Separate the logical blocks of your program with useful comments and white space.