Project

This course will culminate in a semester project. Successful projects will demonstrate that you can:

Projects should generally be done in pairs, though allowances may be made on request.

Successful projects will also demonstrate mastery of various other skills, but the specific skills will vary between people and projects. Some options include:

You are encouraged to demonstrate mastery of several of these topics even before the final project submission. To do so, either write a note or arrange a brief meeting.

One measure of a successful project is that it has a path towards commercialization. We have connections to funding for students interested in pursuing this.

See the rubric in Moodle for more specifics on the areas of evaluation.

Initiatives

As an experiment this year, we’ll group projects into initiatives of related ideas. For the first milestone, simply choose which initiative you’d like to be a part of, if any. Projects in the same initiative can share ideas, resources, and debugging help, enabling each individual project to be more ambitious but less risky.

Milestones and Deliverables

Proposal

First, read the Project Guidelines for a description of what sort of projects are expected in this class.

Then, think of two or three potential project ideas. (Note that there are three types of projects; you might try to think of an idea for each type.)

For each idea, write a paragraph (or informative bullet points) to address:

Mention whom you might work with (the ideal team size is probably 2, but 1 or 3 is OK too). Your teammates may differ from one idea to another. Submit individually, though.

Updated Proposal

Submit an enhanced version of your vision statement as a Jupyter notebook (proposal.ipynb). It should include:

  1. Who is working on this project. (One person submits the document; other teammates just submit a note saying who submitted it.) Describe how you plan to work together so that everyone feels ownership of the result.

  2. Very preliminary drafts of all of the sections of your final report (leave clearly marked placeholders as necessary):

    • Vision: Overview of your project and its purpose. What are you trying to do? Why is it important or interesting? What does a successful project outcome look like?
    • Background:
      • What data are you using? Describe what you chose and why. Include a “backup” dataset in case the primary one doesn’t work out (or give specific evidence for your confidence in the primary dataset).
      • What technologies are you using? Briefly describe a few options you’re considering and what criteria you’ll use to evaluate them.
      • Your final report will describe the technologies you’re using and why you chose to use them. Include citations of the work on which you’ve based your system, both what we’ve used in class and new technologies you’ve experimented with (include descriptions of these if applicable).
    • Implementation:
      • What prior code can you build on?
      • Your final report will summarize your implementation and, if appropriate, how it extends the work you’ve referenced.
    • Results: Include quantitative (tables, plots) and qualitative (examples) results, including comparisons with similar work if applicable.
    • Implications: Discuss the social and ethical implications of using the technologies you’ve chosen for your project.
  3. A description of what concrete steps you’ve taken towards the project, typically trying out an example of some related system. Some concrete step is expected; it could be “I tried out this example notebook (URL). It worked on Colab but failed on the lab machines …”

  4. What help you think you’ll need from the course staff. (If this is substantial, follow up in person or on Teams.)

Walkthrough

Presentations

The final course meeting (during the designated final exam period) will be devoted to final project presentations. Feedback on others’ projects is expected, so attendance is mandatory.

Presentations should communicate the key points (not every detail) of your project, such as:

Slides are not strictly required (you could talk as you scroll through a notebook) but are probably helpful. Aim for 5-10 minutes of content. All team members should participate.

Final Deliverables

By the end of the day of final presentations, submit the following:

The following sections provide additional detail about each component.

Technical Report

The report should be at the level of polish and formality of a blog post (more than a class homework assignment, less than an academic paper). Precise technical language should be used in descriptions of methods.

Not all reports need every element listed here, and reports may include other elements. Reports should generally include:

Artistic or exploratory projects may need other elements.

Reflection

Write, individually, about a page on:

  1. What was your role or contribution to the project (if it was a team project)? Look at some examples of Author contributions statements, such as this one.
  2. How you would describe the project in a technical job interview.
  3. A summary of the main things you learned from the process of doing the project.
  4. Superlatives: most fun part? most proud of part? frustrating? surprising? interesting? challenging? rewarding? most useful part of the course for your project?
  5. Wishes: what would you do differently next time? advice for someone else doing a similar project? material you wish you had learned in the course?

At the end of your report, include a brief summary of how the project demonstrates mastery of various components.

Supporting Material

Submit code needed to replicate the visual and quantitative results in your report.

Picking a Project

Several types of projects are permissible, with different criteria for success.

Application

You could apply an ML technique (from class or otherwise) to some real-world problem. For example, you might train an image classifier on a new set of images, or a text classification model on a new domain.

Replication

One way you could extend a replication project is to add constraints: limited compute (e.g., lab computers, your laptop, Raspberry Pi), limited data (a small subset of the original dataset), limited model size (fits in xx MB), etc.
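Two of these constraints are easy to make concrete early on. The following is a minimal sketch, not a prescribed method: the helper names (subsample, model_size_mb), the stand-in dataset, and the 25-million-parameter count are all assumptions for illustration, and the size estimate assumes 4-byte (float32) weights and ignores everything except the weights themselves.

```python
import random

def subsample(examples, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of the examples."""
    rng = random.Random(seed)
    k = max(1, round(len(examples) * fraction))
    return rng.sample(examples, k)

def model_size_mb(num_parameters, bytes_per_parameter=4):
    """Approximate weight storage in MB (2**20 bytes), assuming float32 weights."""
    return num_parameters * bytes_per_parameter / (1024 ** 2)

# Stand-in for a real dataset: any list of examples works.
full_dataset = list(range(10_000))
small_dataset = subsample(full_dataset, fraction=0.05)
print(len(small_dataset))  # 500

# Hypothetical model with 25 million parameters.
print(f"{model_size_mb(25_000_000):.1f} MB")  # 95.4 MB
```

Fixing the seed means the same subset is drawn on every run, which keeps your comparisons against the original results repeatable.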

One example I’d really like to see: Train the best language model you can on our lab computers (or your laptop).

Expectations for Replication Projects

Choosing a Replication Project

If you’re choosing a replication project, ask yourself:

  1. Is there some specific write-up, with quantitative results clearly reported, that I can use to anchor the project?
  2. Can I easily access the same data that the original authors used? (Does it fit on computing hardware I can easily access?)
  3. Do I understand the basic approach? Maybe there’s fancy stuff too, but you should be able to think of how you’d implement a simple version of it.

Expository Notebooks (“Notebookify”)

One strategy to take when starting with existing code is to “Notebookify” it. Most notebooks you’ll find are demo notebooks, designed to show off the best results but hide a lot of details behind opaque code chunks or external libraries. In contrast, an expository notebook walks the reader through what’s going on.

The code part of such a project is relatively straightforward: find a demo notebook, step through it, pull in the contents of the “do-all-the-stuff” functions (test that it still works), split things up into individual cells (test that it still works), and show intermediate results and shapes. But you’ll also write up descriptions of what’s happening.
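As a rough illustration of pulling in a “do-all-the-stuff” function, here is a sketch using a made-up text-preparation helper. The function prepare_batch and the example sentences are hypothetical, not taken from any particular demo notebook. The opaque version comes first; the same steps then follow as they might appear one per cell, each printing its intermediate result.

```python
# Demo-notebook style: one opaque call (hypothetical helper).
def prepare_batch(texts, max_len=5):
    tokens = [t.lower().split() for t in texts]
    vocab = {w: i + 1 for i, w in enumerate(sorted({w for ts in tokens for w in ts}))}
    ids = [[vocab[w] for w in ts][:max_len] for ts in tokens]
    return [row + [0] * (max_len - len(row)) for row in ids]

texts = ["The cat sat", "A dog barked at the cat"]

# Expository style: the same steps, one per cell, with intermediate results.

# Cell 1: lowercase and split on whitespace.
tokens = [t.lower().split() for t in texts]
print(tokens)

# Cell 2: build a vocabulary; id 0 is reserved for padding.
vocab = {w: i + 1 for i, w in enumerate(sorted({w for ts in tokens for w in ts}))}
print(vocab)

# Cell 3: map words to ids and truncate long sequences.
max_len = 5
ids = [[vocab[w] for w in ts][:max_len] for ts in tokens]
print([len(row) for row in ids])  # sequence lengths before padding: [3, 5]

# Cell 4: pad every sequence to the same length and show the final shape.
padded = [row + [0] * (max_len - len(row)) for row in ids]
print(len(padded), "x", len(padded[0]))  # 2 x 5

# Test that it still works: the unrolled cells match the original function.
assert padded == prepare_batch(texts)
```

The final assertion is the “test that it still works” step: keep the original function around until your unrolled cells reproduce its output.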

You will almost certainly want to refer to a paper by the original authors. It’ll usually explain the names of variables and methods, and it’ll show what parameters and data are likely to work well.

If the original has big loops, flatten them. For example, show one example of how the data is prepared, run one minibatch of the model training, and show how the evaluation scores are computed for one datapoint.
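Here is one way that flattening might look, sketched on a toy linear model in plain Python. The data, model, learning rate, and batch size are all assumptions chosen so the example runs anywhere; a real notebook would use the original project’s model and framework instead.

```python
import random

rng = random.Random(0)

# Toy dataset (an assumption for illustration): y = 3x + 1, no noise.
xs_all = [rng.uniform(-1.0, 1.0) for _ in range(64)]
data = [(x, 3.0 * x + 1.0) for x in xs_all]

w, b = 0.0, 0.0        # model parameters
learning_rate = 0.1

# 1. Data preparation, shown for ONE example.
x0, y0 = data[0]
print("one example:", (round(x0, 3), round(y0, 3)))

# 2. ONE minibatch of training, instead of nested epoch/batch loops.
batch = data[:8]
xs = [x for x, _ in batch]
ys = [y for _, y in batch]
print("batch size:", len(batch))

preds = [w * x + b for x in xs]                      # forward pass
errors = [p - y for p, y in zip(preds, ys)]
loss_before = sum(e * e for e in errors) / len(batch)  # mean squared error

grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(batch)
grad_b = sum(2 * e for e in errors) / len(batch)
w -= learning_rate * grad_w                          # gradient step
b -= learning_rate * grad_b

new_errors = [(w * x + b) - y for x, y in zip(xs, ys)]
loss_after = sum(e * e for e in new_errors) / len(batch)
print("loss before:", round(loss_before, 4), "after:", round(loss_after, 4))

# 3. Evaluation score, computed for ONE datapoint.
x_eval, y_eval = data[-1]
squared_error = (w * x_eval + b - y_eval) ** 2
print("squared error on one datapoint:", round(squared_error, 4))
```

Once a reader has seen one pass in full, you can mention that the original simply repeats steps 2 and 3 in a loop.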

Simplify the code as needed. For example, if there are if statements that do different things depending on configuration, remove the code that isn’t actually run in your case.
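A small before-and-after sketch of this kind of simplification follows. The function build_optimizer_config and its configuration keys are hypothetical, invented only to show the pattern; they are not from any real library.

```python
# Before: the original code supports several configurations (hypothetical).
def build_optimizer_config(config):
    if config["optimizer"] == "sgd":
        return {"name": "sgd", "lr": config["lr"],
                "momentum": config.get("momentum", 0.0)}
    elif config["optimizer"] == "adam":
        return {"name": "adam", "lr": config["lr"],
                "betas": config.get("betas", (0.9, 0.999))}
    else:
        raise ValueError(f"unknown optimizer: {config['optimizer']}")

# After: this notebook only ever uses SGD without momentum, so the
# branches that never run are removed and the values are written out directly.
lr = 0.01
optimizer_config = {"name": "sgd", "lr": lr, "momentum": 0.0}
print(optimizer_config)

# Test that the simplified version matches what the original produced.
assert optimizer_config == build_optimizer_config({"optimizer": "sgd", "lr": 0.01})
```

The simplified cell is shorter and tells the reader exactly which settings are in play, at the cost of generality that an expository notebook doesn’t need.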

Most importantly, explain what is going on. Start with an intro about the overall goal of the approach you’re demoing, and the basic outline of what the process looks like. Then dive in. End with a conclusion summarizing the main points that you highlighted about what’s going on. Perhaps end with some questions and future directions: what decisions did the original authors make that aren’t clear to you? What ideas might you have for doing something differently?

How to replicate without duplicating

One strategy: the Benjamin Franklin replication. Here’s how I adapt it to code:

  1. Read the original. Take notes in a separate document. Make them mostly in human language or math; put code in your notes only sparingly.
  2. Close the original. Try to write a replication based on your notes.
  3. Fail at some point because your notes aren’t detailed enough. So close your replication and open the original again, and return to step 1.

Tips for Replication Projects

Basic outline of a project here:

Ideas of what to replicate

See https://paperswithcode.com/ for some examples. Their newsletter is particularly approachable.

Also, see proceedings of general conferences like NeurIPS, ICML, ICLR, …, or domain-focused conferences: text (EMNLP, ACL), speech and music (ISMIR, InterSpeech), computer vision (ICCV, SIGGRAPH), recommender systems (RecSys), etc.

Some potential papers to replicate

A very incomplete list of things that crossed my radar once.

Some potential libraries or codebases:

Exploration Project

You’ll probably do this as part of one of the initiatives; see above.

General Advice

Technically: keep it simple. A thoughtful analysis of a technically simple thing is much better than a hasty analysis of a technically fancy thing.

See the Resources page here, especially Tools.
