Our goals for the homework assignment this week are:
- Practice working with tokenization, so that you have a clear idea about how you might convert strings into sequences of numbers that you can provide to a neural net
- Try out some NLP tasks on real data.
- Review what we’ve done in the past.
This Week’s Fundamentals
There’s only one notebook this week, designed to help you practice with tokenizers. As usual, do your work in the corresponding notebook.
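As a warm-up for the idea of turning strings into sequences of numbers, here is a minimal sketch of a toy word-level tokenizer. It is not the tokenizer used in the notebook. The names `build_vocab`, `split_tokens`, `encode`, and `decode`, the `[UNK]` token, and the two training sentences are all made up for illustration. Real tokenizers usually split text into subword pieces rather than whole words.

```python
import re


def split_tokens(text):
    # Lowercase, then split into runs of word characters or single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text.lower())


def build_vocab(texts):
    # Give every distinct token an integer id; id 0 is reserved for unknown tokens.
    vocab = {"[UNK]": 0}
    for text in texts:
        for token in split_tokens(text):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    # Map each token to its id, falling back to the [UNK] id for unseen tokens.
    return [vocab.get(token, vocab["[UNK]"]) for token in split_tokens(text)]


def decode(ids, vocab):
    # Invert the vocabulary and join tokens with spaces (original spacing is lost).
    inverse = {i: token for token, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


# Hypothetical two-sentence "training corpus".
vocab = build_vocab(["The cat sat.", "The dog sat down."])
ids = encode("The dog sat on the cat.", vocab)
print(ids)                 # [1, 5, 3, 0, 1, 2, 4]
print(decode(ids, vocab))  # the dog sat [UNK] the cat .
```

Note that "on" never appeared in the corpus, so it maps to id 0 and decodes as `[UNK]`. Subword tokenization is one way to reduce this kind of information loss.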
Try Out NLP Tasks
Try out two different pre-built Transformers models on some real data. For each model, write a brief summary (bullet points are fine) including:
- What you ran (see below for a list of options). Include the full URL or pipeline() construction code.
- A specific example where it works well (copy and paste the input and output). Use public data as examples (e.g., Wikipedia, review sites, news articles).
- A specific example where it breaks (returns incorrect results).
- A brief reaction. You might discuss: is this a useful model? How easy was it to break it? How did its behavior compare with your expectations?
Where to find a model? The Hugging Face course discussed several different NLP tasks, many of which are bundled up into ready-to-run “pipelines”. You might run that code on Colab or a lab machine. In addition to the out-of-the-box pipelines, you can find tasks and models on the Models page, or look through the list of demos on Spaces.
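For reference, pipeline() construction code might look roughly like the sketch below. It assumes the `transformers` package is installed and that you chose the `"sentiment-analysis"` task; swap in whichever task you actually try. The helper names `run_sentiment`, `format_example`, and `main`, and the sample sentences, are hypothetical. For your write-up, use public data instead.

```python
def run_sentiment(texts):
    # Imported lazily so format_example below works even without transformers installed.
    from transformers import pipeline

    # With no model specified, the library picks a default model for the task
    # and downloads it on first use.
    classifier = pipeline("sentiment-analysis")
    # Returns one dict per input, with "label" and "score" keys.
    return classifier(texts)


def format_example(text, result):
    # Format one input/output pair for pasting into the write-up.
    return f"Input: {text!r}\nOutput: {result['label']} (score={result['score']:.3f})"


def main():
    # Hypothetical inputs; replace with text from a public source.
    texts = [
        "The plot was predictable, but the acting was wonderful.",
        "I would not say I disliked it.",
    ]
    for text, result in zip(texts, run_sentiment(texts)):
        print(format_example(text, result))


# Call main() in a notebook cell to run it (needs network access the first time).
```

Sentences with negation or mixed sentiment, like the second one above, are one place to start when looking for an example that breaks a model.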
Review Prior Weeks
Many students have mentioned wanting to review material from prior weeks. This exercise might help:
- Pick two previous learning objectives (see the list under each unit).
- For each one, write a quiz question and a correct answer about that topic. (You may need to review prior material to do this.)
- Optional but encouraged: ask another student in the class your question. Compare your answers.
- Aim for one question about concepts/math and one question about implementation/coding.
- Aim for at least one question on a topic that you weren’t solid on.
I recommend chatting in office hours about this, and sharing questions/answers before the due date.
Submitting
I recommend writing your non-notebook answers outside of Moodle (e.g., in Word or Google Docs), then copy-pasting them in.
In the Moodle assignment for this Homework:
- Attach the ipynb files.
- Copy and paste your responses to the Analysis questions into the text box.
- Copy and paste your Try Out and Review responses here too, separated by headings so we know which part is which.