Research Projects

Warning: This content has not yet been fully revised for this year.

Professor Arnold is willing to advise one of the following research-y projects. Others may be permitted if you have a very clear proposal, discussed well in advance.

(You may notice some commonalities among these ideas. That’s intentional.)

Decompose and Recompose Complex Sentences using Simple Sentences

Writers sometimes like to write really long and complicated sentences because those are the first things that come to mind and it’s easy to just keep typing and get your ideas out there but it’s not really clear what you’re trying to say and you’re thinking while you’re writing so you end up with this big long train of thought that’s hard for people to follow and it would be really helpful to readers if the writer could split the big sentence apart into little sentences that are simpler but sometimes there are actually complicated things that the writer is trying to explain and the simple little sentences get hard to follow so we don’t necessarily want to do this entirely automatically so it would be helpful to have the writer stay in control of this process. So:

Possible dataset: the BiSECT dataset (listed on Papers With Code). There are also “sentence combination” exercises that language students do; there are probably some datasets from those.

Predictive Text from Very Rough Drafts (e.g., rambling speech)

Speech recognition technology is a powerful and efficient way to enter text on a touchscreen device, but many people don’t use it. One reason is that it is cognitively challenging: you must think of exactly what to say, and how to say it clearly enough to be understood, the first time, potentially in a distracting or non-private environment. But what if you could first “think out loud” about what you want to say, perhaps whispering a stream of consciousness to your phone? Your phone would then give you (a) an outline of the main points you wanted to make and (b) really accurate predictions about what word to type next in order to say it.

Other language tasks

De-EQ

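Given a sound corrupted by a random EQ curve or other processing step, predict the parameters for that processing step. This kind of task is called self-supervised learning: the labels (the processing parameters) come from the corruption you apply yourself, not from human annotation. See Microsoft’s HEXA.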
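
To make the setup concrete, here is a minimal sketch of how one training pair might be generated. It is only an illustration under stated assumptions: the function name `random_eq_pair`, the choice of 8 equal-width frequency bands, and the ±12 dB gain range are all made up for this example, and a real project would likely use a more realistic EQ (e.g., parametric filters) rather than flat per-band gains applied in the FFT domain.

```python
import numpy as np


def random_eq_pair(signal, n_bands=8, max_gain_db=12.0, rng=None):
    """Return (corrupted_signal, gains_db), one self-supervised training pair.

    Hypothetical helper: the band layout and gain range are illustrative
    assumptions, not part of the project description.
    """
    rng = np.random.default_rng() if rng is None else rng

    # One random gain (in dB) per band; these are the labels a model would predict.
    gains_db = rng.uniform(-max_gain_db, max_gain_db, size=n_bands)

    spectrum = np.fft.rfft(signal)
    n_bins = len(spectrum)

    # Assign each frequency bin to one of n_bands contiguous, roughly equal bands.
    band_of_bin = np.arange(n_bins) * n_bands // n_bins

    # Convert dB to linear amplitude and scale each bin by its band's gain.
    spectrum = spectrum * 10.0 ** (gains_db[band_of_bin] / 20.0)

    corrupted = np.fft.irfft(spectrum, n=len(signal))
    return corrupted, gains_db


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)  # stand-in for one second of audio
    corrupted, gains_db = random_eq_pair(clean, rng=rng)
    print(corrupted.shape, gains_db.shape)
```

A model would then take `corrupted` (or a spectrogram of it) as input and regress `gains_db`. Since the clean signal is never shown to the model, it has to learn what "normal" audio sounds like in order to guess how it was altered.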

Project Scratch