class: center, middle, inverse, title-slide

.title[
# Data and Ethics
]
.author[
### Ken Arnold
]
.date[
### 2022-02-26
]

---

## Reflection: Surrounded by rich data

[Genesis 1](https://www.biblegateway.com/passage/?search=Genesis%201&version=NIV): Ctrl-F "saw"

---

## AI Ethics?

.tc[  ]

.floating-source[Source: [pixabay](https://pixabay.com/photos/robot-robotic-future-technology-3d-2658699/)]

---

## AI Ethics Beyond Killer Robots

- AI needs data (privacy)
  - ...and empowers organizations that aggregate data
- AI needs lots of computation
- AI's outputs affect people
  - predictions and decisions (bias, discrimination, gig labor, ...)
  - how we perceive each other (and ourselves)
  - how we perceive ideas (misinformation)
- AI does things that people once did
- AI systems can be attacked in new ways
- AI embeds values about what is human
- AI tells us about our own intelligence

---

## Hypothetical Example: Shoplifting Prediction System

- Problem: busy store, too many cameras to monitor
- Input: surveillance camera video
- Output: people, labeled by likelihood to shoplift

What issues might come up?

--

- **bias**
- **transparency / explainability**

---

## Hypothetical Example: Resume Screening

- Problem: many candidates, many jobs
- Input: resumes, job description
- Output: ranked list of most promising matches

What issues might come up?

--

- **bias**
- **feedback loops**

---

## Hypothetical Example: News Recommendation

- Problem: too much going on (information overload)
- Input: articles
- Output: ranked list of articles for me to read

What issues might come up?

--

- **feedback loops**
- **measurement bias**
- potential for **adversarial manipulation**

---

## Hypothetical Example: Popularity Prediction

- Problem: past ad campaigns fell flat.
- Input: image to be used in social media marketing
- Output: predicted level of engagement (likes, shares, comments)

What issues might come up?
--

- **measurement bias**
- **stereotyping**
- **clickbait**

---

# A few questions to ask about ML

- General
  - Is our data *representative*? (vs, e.g., feedback loops)
  - Were stakeholders involved in the design?
  - How vulnerable is the system to attack?
- Performance
  - Robust? Reliable?
  - Testable
- Clarity
  - Are results accurate in the real world?
  - Are the system's biases understood and reported?
  - Is there recourse if the system is wrong?
  - Can decisions be explained?

.floating-source[Shneiderman, Human-Centered AI, pp 246-247]

---

## Why transparency?

Did this classifier successfully learn to recognize a "dumbbell"?

.center[<img src="https://3.bp.blogspot.com/-dc6B2h_o1fc/VYITir_QCgI/AAAAAAAAAlU/Ysi0_reQTpI/s1600/dumbbells.png" style="width: 100%">]

.small[.right[*"Dumbbell"*]]

.floating-source[<https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html>]

---

## Data Splitting

**Why**: Confidence about deployment

**How**:

* Hide a test set
* Single split (train-valid): [`RandomSplitter`](https://docs.fast.ai/data.transforms.html#RandomSplitter) or similar
* or: cross-validation

---

## Data Loading

.pull-left[
**Why**: Model expects data in ideal format, cleanly chunked... real world is messy.

**How**: pipeline of transformations (to *items*, to *batches*)
]

---

**Code**: Work through the [`DataBlock` tutorial](https://docs.fast.ai/tutorial.datablock.html). `ImageDataLoaders.from_name_func` is a convenient way to construct a `DataBlock` and `DataLoaders`

---

## Batches

**Why**: Often more efficient to process several items at once; more confident in how to update weights

**How**:

* need to align sizes of each image (or text document, or sound, or ...)
* limited by GPU memory (especially for the *backward* pass)

---

class: middle, center

## How to evaluate a classifier

---

<https://youtu.be/TJgUiZgX5rE?t=121>

<https://youtu.be/ridS396W2BY?t=32>

---

## Diagnosing Classifiers

- **Why**: Better *quantify* performance (e.g., sensitivity vs specificity), better *understand* performance (analyze errors)
- **How**: [Confusion matrix](https://en.wikipedia.org/wiki/Sensitivity_and_specificity#Confusion_matrix); False Positive vs False Negative.

Example: Automotive collision avoidance system *predicts imminent collisions*

- consequence of false positive? false negative?
- effect of adjustments

---

## Averages Hide.

.floating-source[[Racial Discrimination in Face Recognition Technology](https://sitn.hms.harvard.edu/flash/2020/racial-discrimination-in-face-recognition-technology/)]

---

## Data Augmentation

**Why**: Discourage overfitting. Encourage generalization.

**How**: move the camera around (images), skip words (text), add / subtract stuff.

*Related*: mess with the model itself, e.g., Dropout. more in later chapters.

???

Examples of what specific transforms can be used

---

## Running Experiments

* Reproducibility
  * `set_seed`
  * Restart and Run All
* Variability
  * try different seeds
  * try different modeling choices
* **Keep a log** (e.g., spreadsheet with one line per run)

---

class: center, middle

## Tools

---

## List Comprehensions

Often readable and declarative. Use judiciously.

.pull-left[
```python
result = []
for i in range(4):
    if i % 2 == 1:
        for j in range(i):
            result.append(f'i={i} j={j}')
result
```

```
## ['i=1 j=0', 'i=3 j=0', 'i=3 j=1', 'i=3 j=2']
```
]

.pull-right[
```python
[
    f'i={i} j={j}'
    for i in range(4)
    if i % 2 == 1
    for j in range(i)
]
```

```
## ['i=1 j=0', 'i=3 j=0', 'i=3 j=1', 'i=3 j=2']
```
]

- `dict` and `set` comprehensions work similarly.
- Numerical libraries often provide "broadcast" alternatives: `x * y` works elementwise when either is an `np.ndarray` or `torch.Tensor`

---

## Jupyter Notebooks

- execution order issues: Restart and Run All
- kernel vs webpage: it's still running when you close the page!
- Tracebacks:
  - *What is the error*? look at the end. "What is it talking about?"
  - *What triggered it*? look for your own code (farther up)

---

## Markdown

```markdown
*italic*, **bold**

# heading

## heading 2

[link](https://example.com)

- list
- list 2
```

---

## How to ask a good question

- There are no stupid questions in this class. But there are *lazy* questions.
- You must learn to *ask well*.

*What makes a well-asked question?*

--

<https://stackoverflow.com/help/how-to-ask>
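
---

## Backup: `dict` and `set` Comprehensions

Following up on the List Comprehensions slide, here is a minimal sketch of the same syntax building a `dict` and a `set` instead of a list. The `words` list and the variable names are made up for illustration; they are not from the course materials.

```python
# Hypothetical example data, made up for this sketch.
words = ["data", "ethics", "bias", "data", "bias"]

# dict comprehension: map each word to its length
# (duplicate keys collapse to a single entry)
lengths = {w: len(w) for w in words}

# set comprehension: collect the distinct first letters
initials = {w[0] for w in words}

# Same filter-then-nest structure as the list comprehension,
# but collecting (i, j) tuples into a set
pairs = {(i, j) for i in range(4) if i % 2 == 1 for j in range(i)}

print(lengths)
print(sorted(initials))
print(sorted(pairs))
```

Sets are unordered, so the sketch sorts them before printing to keep the display stable from run to run.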