Your instructor may assign one or more of the following problems. Don’t feel that you must do all the problems; for homework, you are required to do only those that are explicitly assigned via Moodle.

  1. Biologists use a sequence of the letters A, C, T, and G to model a genome. A gene is a substring of a genome that starts after a triplet ATG and ends before a triplet TAG, TAA, or TGA. Furthermore, the length of a gene string is a multiple of 3, and the gene does not contain any of the triplets ATG, TAG, TAA, or TGA. Write a program called find_genes.py that prompts the user to enter a genome and displays all genes in the genome. If no gene is found in the input sequence, the program displays “no gene is found”. Below are a couple of sample runs:
    	==================================================================
    	Enter a genome string: TTATGTTTTAAGGATGGGGCGTTAGTT
    	Gene 1: TTT
    	Gene 2: GGGCGT
    	==================================================================
    	Enter a genome string: TGTGTGTATAT
    	no gene is found
    	==================================================================
    	
  2. Write a function that assigns the appropriate part of speech (POS) to each word in a sentence. It should receive a sentence like “John kicked the dog.”, and return the list of POS tags: ['n', 'v', 'd', 'n'], indicating that John is a noun, kicked is a verb, the is a determiner and dog is another noun. Include “unknown” for words that don’t match any known words. Helpful functions for this include string.split(), which creates a list of strings corresponding to the words in the sentence, and string.endswith(subString), which checks to see if a string ends with a given sub-string.

    The sentence could contain different forms of the words. For example, the nouns could include “dog” and “dogs” and the verbs could include “kick”, “kicks”, and “kicked”. Thus, your system should stem each word, that is, remove known suffixes in order to find the unadorned stem of the word. Ignore all punctuation in the input sentence.

    Your system should support the following words:

    It should also support the following suffixes: -s, -es, -ed.

    To make this problem easier to solve, you may make the following assumptions:

  3. Write a function that returns the longest common prefix of two strings. This function should receive two strings and return a string. If the strings have no common prefix, the function should return an empty string. Put this function in a program called find_prefix.py that prompts the user to enter two strings and then displays their common prefix.
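
For problem 3, one possible way to compute the prefix is to walk both strings in step and stop at the first mismatch. The sketch below is only an illustration, not a required design; the function name `common_prefix` is an assumption, since the problem does not name the function.

```python
def common_prefix(first, second):
    """Return the longest prefix shared by both strings ('' if none)."""
    length = 0
    # zip stops at the end of the shorter string, so no index can run past either one.
    for a, b in zip(first, second):
        if a != b:
            break
        length += 1
    return first[:length]


# Small demonstration with made-up inputs.
print(common_prefix("interstate", "internet"))
print(repr(common_prefix("apple", "banana")))
```

In find_prefix.py, the two arguments would come from two `input()` prompts instead of the hard-coded demonstration strings.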

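For problem 1, the gene rules can be sketched as a left-to-right scan: find an ATG, locate the earliest stop triplet after it, and keep the text in between if it is non-empty, has a length that is a multiple of 3, and contains no further ATG. This is a minimal sketch under stated assumptions, not the only valid reading of the problem: it treats the forbidden triplets as substrings at any position, and it resumes scanning after the stop triplet once a gene is found. The names `find_genes` and `describe` are illustrative.

```python
START = "ATG"
STOPS = ("TAG", "TAA", "TGA")


def find_genes(genome):
    """Return the genes found in genome, in left-to-right order."""
    genes = []
    start = genome.find(START)
    while start != -1:
        begin = start + len(START)
        # Positions of the first occurrence of each stop triplet at or after begin.
        stops = [pos for pos in (genome.find(s, begin) for s in STOPS) if pos != -1]
        # Cutting at the earliest stop guarantees the candidate holds no stop triplet.
        candidate = genome[begin:min(stops)] if stops else ""
        if candidate and len(candidate) % 3 == 0 and START not in candidate:
            genes.append(candidate)
            # Assumption: continue searching after the stop triplet that ended this gene.
            start = genome.find(START, min(stops) + 3)
        else:
            start = genome.find(START, start + 1)
    return genes


def describe(genome):
    """Format the result as the lines shown in the sample runs."""
    genes = find_genes(genome)
    if not genes:
        return ["no gene is found"]
    return [f"Gene {n}: {gene}" for n, gene in enumerate(genes, start=1)]


# First sample run from the problem statement.
for line in describe("TTATGTTTTAAGGATGGGGCGTTAGTT"):
    print(line)
```

In find_genes.py, the genome would come from `input("Enter a genome string: ")` rather than the hard-coded sample string.
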
Checking In

Submit all appropriate files for grading, including code files, screen captures, supplemental files (e.g., image files), and text files. We will grade this exercise according to the following criteria: