{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0bf5e729",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "# Initialize Otter\n",
    "import otter\n",
    "grader = otter.Notebook(\"class_wed.ipynb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "db744bc5",
   "source": "> **About the autograder:** This notebook uses [otter-grader](https://otter-grader.readthedocs.io/) for instant feedback. After completing each task, run the `grader.check(...)` cell below it (for example, `grader.check(\"t1\")` after Task 1). A ✅ means your answer is correct. This exercise is **not graded**; the checks are just for your own feedback. If a check fails, read the message and try again.",
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "id": "38621294-f3c8-4770-89e2-580e1db6c5e2",
   "metadata": {},
   "source": [
    "# Week 1 — Wednesday: Python Review for Machine Learning\n",
    "\n",
    "**DATA 202 · Calvin University**\n",
    "\n",
    "Work through each section with a partner: read the explanation, run the examples, then complete the task together.\n",
    "\n",
    "> *What do we need to remember about Python to get started with machine learning?*\n",
    "\n",
    "1. Lists and list comprehensions\n",
    "2. Tuples\n",
    "3. Dictionaries\n",
    "4. Objects: attributes and methods\n",
    "5. String manipulation\n",
    "6. NumPy arrays"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4860ccac-1084-4f11-b2e1-bb50bfb72107",
   "metadata": {},
   "source": [
    "---\n",
    "## 1. Lists and List Comprehensions\n",
    "\n",
    "Lists are **ordered, mutable** collections — you can change them after creation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a0e87bb-8e8c-43f1-b339-8e110ecbef9c",
   "metadata": {},
   "outputs": [],
   "source": [
    "nums = [1, 2, 3, 4]\n",
    "print(nums[0])      # first element\n",
    "nums.append(5)      # add to end\n",
    "nums[2] = 10        # replace an element\n",
    "print(nums)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d43c3006-fb4f-447c-8b0c-79af75c0bfc3",
   "metadata": {},
   "source": [
    "**List comprehensions** are a concise way to build a new list from an existing one:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "30eb45fc-42e9-4c59-8471-a8391cf6289a",
   "metadata": {},
   "outputs": [],
   "source": [
    "squares = [x**2 for x in range(6)]\n",
    "print(squares)\n",
    "\n",
    "evens = [x for x in range(10) if x % 2 == 0]\n",
    "print(evens)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4336f360-8828-4c02-bf51-cf1b746906c0",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 1\n",
    "\n",
    "1. Create a list `temps_c = [0, 20, 37, 100]` (temperatures in Celsius).\n",
    "2. Use a list comprehension to convert it to Fahrenheit: `f = c * 9/5 + 32`. Assign the result to `temps_f`.\n",
    "3. Print the Fahrenheit list.\n",
    "\n",
    "> **Discuss:** did any step modify `temps_c` in place, or did the comprehension create a new list?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10d6553e-684b-457e-8bec-eb1ea4cf62fd",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "temps_c = ...\n",
    "temps_f = ...\n",
    "print(temps_f)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9025dfca",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"t1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0644cd7b-b91d-4941-b19a-36f8c8608ac3",
   "metadata": {},
   "source": [
    "---\n",
    "## 2. Tuples\n",
    "\n",
    "Tuples are **ordered but immutable**: you cannot change them after creation. They are commonly used to return multiple values from a function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "993da4e3-a95d-47d7-8d73-04a8661bb874",
   "metadata": {},
   "outputs": [],
   "source": [
    "point = (3, 4)\n",
    "x, y = point          # tuple unpacking\n",
    "print(f'x={x}, y={y}')\n",
    "\n",
    "def min_max(values):\n",
    "    return (min(values), max(values))\n",
    "\n",
    "lo, hi = min_max([5, 2, 9, 1])\n",
    "print(f'min={lo}, max={hi}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3772034b-bad1-4939-a4e4-c419cef63bbb",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 2\n",
    "\n",
    "1. Write a function `summary(values)` that returns a tuple `(min, max, mean)`. Hint: `sum(values) / len(values)` gives the mean.\n",
    "2. Call it with `[10, 20, 30, 40, 50]` and unpack into `lo, hi, avg`.\n",
    "3. Print: `Range: 10-50, average: 30.0`\n",
    "\n",
    "> **Discuss:** why is a tuple more appropriate here than a list?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ad88201-4da6-475e-9256-a1144f3ec52c",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "...\n",
    "    ...\n",
    "\n",
    "lo, hi, avg = ...\n",
    "print(f'Range: {lo}-{hi}, average: {avg}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7de60a9",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"t2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e687a51c-97b2-4e2c-91de-6194129fbe08",
   "metadata": {},
   "source": [
    "---\n",
    "## 3. Dictionaries\n",
    "\n",
    "Dictionaries store **key-value pairs** with fast lookups. In ML you will encounter them constantly: model parameters, configuration, label encodings, evaluation metrics."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6b9655bb-e451-4361-b76c-a04bb98440cc",
   "metadata": {},
   "outputs": [],
   "source": [
    "student = {'name': 'Alice', 'age': 20}\n",
    "print(student['age'])       # access by key\n",
    "student['grade'] = 'A'      # add a new key\n",
    "\n",
    "for key, value in student.items():\n",
    "    print(f'{key}: {value}')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a499cb0-250f-4fb1-90a3-d7d759a62fc9",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 3\n",
    "\n",
    "1. Create `label_map = {'low': 0, 'medium': 1, 'high': 2}`.\n",
    "2. Given `raw = ['high', 'low', 'medium', 'high', 'low']`, use a list comprehension to produce numeric codes. Assign to `codes`.\n",
    "3. Print the result: `[2, 0, 1, 2, 0]`\n",
    "\n",
    "> **Discuss:** where else in a data pipeline might you use a mapping like this?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21aaf9ab-70d7-4384-8856-f17bde7c393c",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "label_map = ...\n",
    "raw = ...\n",
    "codes = ...\n",
    "print(codes)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aae2790b",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"t3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "130ecd82-ab81-4a04-9627-55b885bb3d0e",
   "metadata": {},
   "source": [
    "---\n",
    "## 4. Objects: Attributes and Methods\n",
    "\n",
    "Almost everything in Python is an **object** with attributes (stored values) and methods (functions attached to it).\n",
    "\n",
    "- `object.method()` — called *on* the object\n",
    "- `function(object)` — called *with* the object as argument\n",
    "\n",
    "Watch for **mutability**: some methods modify the object in place, while others leave it untouched and return a new object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "89f9b333-d4f1-4609-9365-7ba29c886324",
   "metadata": {},
   "outputs": [],
   "source": [
    "s = 'hello'\n",
    "print(s.upper())    # method — returns NEW string (strings are immutable)\n",
    "print(s.islower())  # method — returns True/False\n",
    "print(len(s))       # function, not method\n",
    "print(s)            # s is still 'hello' — nothing changed it"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "69c910a0-ff13-43c9-8e6e-b216ed111748",
   "metadata": {},
   "outputs": [],
   "source": [
    "nums = [3, 1, 4, 1, 5]\n",
    "nums.sort()                   # modifies IN PLACE\n",
    "print('after sort():', nums)\n",
    "\n",
    "nums2 = [3, 1, 4]\n",
    "sorted_nums = sorted(nums2)   # returns a NEW list\n",
    "print('nums2 unchanged:', nums2)\n",
    "print('sorted_nums:', sorted_nums)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c44e67a7-9369-49ea-9acc-6c731338bef7",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 4\n",
    "\n",
    "1. Given `words = ['Machine', 'learning', 'IS', 'fascinating']`, use a list comprehension with a string method to lowercase every word. Assign to `lower_words`.\n",
    "2. Sort the result with `sorted()` and store it in `sorted_words`.\n",
    "3. Print both `words` and `sorted_words` — confirm `words` is unchanged.\n",
    "\n",
    "> **Discuss:** why does this distinction (in-place vs. new object) matter in a data pipeline?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f98fa73-17b1-478e-8814-04d1518941d2",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "words = ['Machine', 'learning', 'IS', 'fascinating']\n",
    "lower_words = ...\n",
    "sorted_words = ...\n",
    "print(words)\n",
    "print(sorted_words)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22ab3512",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"t4\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84d12f37-cc47-48b3-b9a7-bea21b51b9d1",
   "metadata": {},
   "source": [
    "---\n",
    "## 5. String Manipulation\n",
    "\n",
    "Real datasets are messy. Text columns almost always need cleaning before a model can use them."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4692dc4e-08e6-4635-be70-c46288945df0",
   "metadata": {},
   "outputs": [],
   "source": [
    "text = '  data,science,rocks!  '\n",
    "print(text.strip())                    # remove surrounding whitespace\n",
    "print(text.strip().split(','))         # split into a list\n",
    "print('-'.join(['data', 'science']))   # join with a separator\n",
    "print(text.replace('rocks', 'rules'))  # find and replace"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0825f0a7-3b60-48c7-875c-9f67e339f813",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 5\n",
    "\n",
    "You receive a raw string from a form submission:\n",
    "\n",
    "```python\n",
    "entry = \"  Grand Rapids ; Michigan ; 49546  \"\n",
    "```\n",
    "\n",
    "1. Strip leading/trailing whitespace, then split by `\";\"` to get a list of parts.\n",
    "2. Strip whitespace from each part using a list comprehension.\n",
    "3. Build a dictionary `location = {'city': ..., 'state': ..., 'zip': ...}` from the parts.\n",
    "\n",
    "> **Discuss:** why would inconsistent formatting (extra spaces, different separators) break downstream analysis?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b0e937b5-f5f3-4fa0-a47e-8abc9de81d85",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "entry = \"  Grand Rapids ; Michigan ; 49546  \"\n",
    "parts = ...\n",
    "location = ...\n",
    "print(location)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60fa76fe",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"t5\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c537c091-733a-451b-a31c-54e645dc0ac7",
   "metadata": {},
   "source": [
    "---\n",
    "## 6. NumPy Arrays\n",
    "\n",
    "Python lists are flexible but not designed for numerical computing:\n",
    "\n",
    "```python\n",
    "[1, 2, 3] + [4, 5, 6]  # concatenation, NOT addition\n",
    "[1, 2, 3] * 2           # repetition, NOT multiplication\n",
    "```\n",
    "\n",
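    "**NumPy arrays** apply arithmetic operations **element-wise** and, because the work runs in compiled code rather than in a Python loop, are typically much faster on large numeric data."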
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "51ae94e9-13b3-4e3e-96a3-a54fc9011910",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "arr = np.array([1, 2, 3, 4, 5])\n",
    "print(arr * 2)                          # [ 2  4  6  8 10]\n",
    "print(arr + np.array([10,20,30,40,50])) # element-wise add"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e555b62-d7fb-4fc0-b514-61dfe95b97bf",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(arr.sum(), arr.mean(), arr.std())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f12ffa9a-1147-482c-8d23-2351e670549a",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2D arrays: rows = samples, columns = features\n",
    "mat = np.array([[1, 2, 3],\n",
    "                [4, 5, 6]])\n",
    "print(mat.shape)   # (2, 3)\n",
    "print(mat[0, 1])   # row 0, col 1 -> 2\n",
    "print(mat[:, 1])   # all rows, column 1 -> [2 5]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "87b6f4f1-b11e-4a80-8bd5-5a45adb7fb2d",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 6\n",
    "\n",
    "You have scores for 4 students (rows) across 3 assignments (columns):\n",
    "\n",
    "```python\n",
    "scores = np.array([[88, 92, 95],\n",
    "                   [78, 85, 80],\n",
    "                   [90, 91, 89],\n",
    "                   [70, 72, 68]])\n",
    "```\n",
    "\n",
    "1. Compute the average score per student (mean across each row, `axis=1`). Assign to `avg_per_student`.\n",
    "2. Compute the average score per assignment (mean down each column, `axis=0`). Assign to `avg_per_assignment`.\n",
    "3. Find the index of the student with the highest overall average using `np.argmax()`. Assign to `best_student`.\n",
    "\n",
    "> **Discuss:** how does this 2D structure relate to how ML models see training data?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1897817-d95e-4e3f-8486-4a482f48f8b0",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "scores = np.array([[88, 92, 95],\n",
    "                   [78, 85, 80],\n",
    "                   [90, 91, 89],\n",
    "                   [70, 72, 68]])\n",
    "\n",
    "avg_per_student = ...\n",
    "avg_per_assignment = ...\n",
    "best_student = ...\n",
    "print('Avg per student:', avg_per_student)\n",
    "print('Avg per assignment:', avg_per_assignment)\n",
    "print('Best student index:', best_student)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c0bcbcce",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"t6\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b31c3d6-ec2f-4fe0-9690-acbf0ea4bab1",
   "metadata": {},
   "source": [
    "---\n",
    "## Wrap-Up\n",
    "\n",
    "These building blocks appear constantly in the ML stack:\n",
    "\n",
    "| Concept | Where you will see it |\n",
    "|---|---|\n",
    "| Lists / comprehensions | Feature lists, label encoding, data pipelines |\n",
    "| Tuples | Train/test splits, function returns, coordinate pairs |\n",
    "| Dictionaries | Model parameters, configuration, label mappings |\n",
    "| Methods vs. functions | `.fit()`, `.transform()`, `.predict()` in scikit-learn |\n",
    "| String manipulation | Cleaning text columns before analysis |\n",
    "| NumPy arrays | Feature matrices, model inputs/outputs, vectorized math |"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  },
  "otter": {
   "OK_FORMAT": true,
   "assignment_name": "class01_wed",
   "tests": {
    "t1": {
     "name": "t1",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert temps_c == [0, 20, 37, 100], 'temps_c should be unchanged'\n>>> assert len(temps_f) == 4 and all(abs(f - e) < 1e-9 for f, e in zip(temps_f, [32.0, 68.0, 98.6, 212.0])), 'check your Fahrenheit formula'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "t2": {
     "name": "t2",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert lo == 10\n>>> assert hi == 50\n>>> assert avg == 30.0\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "t3": {
     "name": "t3",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert codes == [2, 0, 1, 2, 0]\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "t4": {
     "name": "t4",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert words == ['Machine', 'learning', 'IS', 'fascinating'], 'words should be unchanged'\n>>> assert lower_words == ['machine', 'learning', 'is', 'fascinating']\n>>> assert sorted_words == ['fascinating', 'is', 'learning', 'machine']\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "t5": {
     "name": "t5",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert location['city'] == 'Grand Rapids'\n>>> assert location['state'] == 'Michigan'\n>>> assert location['zip'] == '49546'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "t6": {
     "name": "t6",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert avg_per_student.shape == (4,)\n>>> assert avg_per_assignment.shape == (3,)\n>>> assert best_student == 0, 'student index 0 has the highest average'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}