{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f6fd9600",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "# Initialize Otter\n",
    "import otter\n",
    "grader = otter.Notebook(\"practice01.ipynb\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb41c2e8",
   "source": "> **About the autograder:** This practice uses [otter-grader](https://otter-grader.readthedocs.io/) for autograding. After completing each task, run the `grader.check(...)` cell below it — a ✅ means your answer passes the tests. If it fails, read the message and try again. The same tests are used when your notebook is submitted for grading. If you are running locally and don't have otter installed: `pip install otter-grader`.",
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "id": "2c70ac38-cae8-43e2-9315-e644b4c07dd1",
   "metadata": {},
   "source": [
    "**Student names and e-mails:**\n",
    "\n",
    "_YOUR NAME — your@calvin.edu_\n",
    "\n",
    "_YOUR NAME — your@calvin.edu_"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7da75610-420b-4468-b6eb-3360d9913dbe",
   "metadata": {},
   "source": [
    "# Practice 01 — DataFrame Basics\n",
    "\n",
    "In this practice you will work with a real dataset about forest fires in Brazil. Each task is tagged with the SLO it covers:\n",
    "\n",
    "| SLO | Description |\n",
    "|-----|-------------|\n",
    "| **02A** | Manipulate the structure and contents of pandas DataFrames |\n",
    "| **02B** | Sort, filter, and query DataFrames |\n",
    "| **02C** | Choose appropriate visual encodings in a visualization |"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "eca89e2a-a458-4020-82c7-c05660919d67",
   "metadata": {},
   "source": [
    "---\n",
    "## The Dataset: Amazon Forest Fires (Brazil, 1998–2017)\n",
    "\n",
    "![Amazon forest fire](https://images.unsplash.com/photo-1511027643875-5cbb0439c8f1?q=80&w=1200&auto=format&fit=crop)\n",
    "\n",
    "This dataset reports monthly counts of forest fires by state, recorded by Brazil's national space research agency INPE. Each row represents one state–month–year combination."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f374b740-ecef-423e-accf-607b8f397d45",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import plotly.express as px\n",
    "import plotly.graph_objects as go"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bc8b7713-8eac-485d-8ad5-1fceee9c1945",
   "metadata": {},
   "source": [
    "The cell below loads the data from a CSV file into a pandas DataFrame called `fires`. Run it and look at the first few rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31a65ad6-9c82-4ad9-af23-abb760f930e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "fires = pd.read_csv(\n",
    "    'https://cs.calvin.edu/courses/data/202/26fa/datasets/amazon.csv',\n",
    "    encoding='latin-1'\n",
    ")\n",
    "fires.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2a55394c-aa4a-4056-a35a-d420e4042406",
   "metadata": {},
   "source": [
    "Notice that the `month` column contains **Portuguese names** (e.g., `Janeiro`, `Fevereiro`). The cell below translates them to integers (1–12) and corrects the `date` column to match. Run it — you don't need to modify it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15da55e6-81ae-4fac-9a41-0d9358543cac",
   "metadata": {},
   "outputs": [],
   "source": [
    "month_map = {\n",
    "    'Janeiro': 1, 'Fevereiro': 2, 'Marco': 3, 'Abril': 4,\n",
    "    'Maio': 5, 'Junho': 6, 'Julho': 7, 'Agosto': 8,\n",
    "    'Setembro': 9, 'Outubro': 10, 'Novembro': 11, 'Dezembro': 12\n",
    "}\n",
    "fires['month'] = fires['month'].map(month_map)\n",
    "fires['date'] = pd.to_datetime(fires[['year', 'month']].assign(day=1))\n",
    "fires.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "252ecebc-9aa9-4899-a0e9-b826929a5297",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 1 — Understanding the DataFrame\n",
    "\n",
    "Before any analysis, explore what you have. Key tools:\n",
    "\n",
    "| Expression | What it gives you |\n",
    "|---|---|\n",
    "| `df.shape` | Tuple `(rows, columns)` |\n",
    "| `df.columns` | Index of column names |\n",
    "| `df['col']` | A Series (one column) |\n",
    "| `df['col'].sum()` | Total of that column |\n",
    "| `df['col'].idxmax()` | Index of the max value |\n",
    "| `df.loc[idx, 'col']` | Value at row `idx`, column `'col'` |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3654b30e-0277-433b-98a5-12b1bdbb1f27",
   "metadata": {},
   "outputs": [],
   "source": [
    "fires.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1603e53a-a950-4b96-b245-bea5901d95df",
   "metadata": {},
   "outputs": [],
   "source": [
    "fires.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "23004bc4-3497-4164-91b3-41ec8ecdca34",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02A.1 — Shape and Columns *(1 pt)*\n",
    "\n",
    "Using `fires.shape`, assign:\n",
    "- `n_rows` to the number of rows\n",
    "- `n_cols` to the number of columns\n",
    "\n",
    "Use the attribute — don't type the literal numbers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d75645e1-cfcb-44ec-afdb-0a0e82a5f45f",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "n_rows = ...\n",
    "n_cols = ...\n",
    "print(f'Rows: {n_rows}  |  Columns: {n_cols}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c9aa4eb7",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02A.1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f98fde2e-44b2-4422-b9b0-725b0edc2f6a",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02A.2 — Accessing Columns and Computing *(2 pts)*\n",
    "\n",
    "Using the `number` column (fires per record) and the `state` column:\n",
    "\n",
    "1. Assign the **total** number of fires across all rows to `total_fires`.\n",
    "2. Find the **state that had the single highest monthly fire count** in the dataset. Assign its name (a string) to `peak_state`.\n",
    "\n",
    "*Hints: `.sum()`, `.idxmax()`, and `.loc[]` will be useful.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4c9b17d3-b568-4d50-aa87-72347b3225f9",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "total_fires = ...\n",
    "peak_idx = ...\n",
    "peak_state = ...\n",
    "print(f'Total fires across all records: {total_fires:,.0f}')\n",
    "print(f'State with highest single-month count: {peak_state}')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ef574085",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02A.2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d79f0f6e-bd88-483a-9ce7-0984d5235919",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02A.3 — Adding a Column *(2 pts)*\n",
    "\n",
    "Add a new column called `'decade'` to `fires` that records the **decade** of each row as a string:\n",
    "\n",
    "- 1990–1999 → `'1990s'`\n",
    "- 2000–2009 → `'2000s'`\n",
    "- 2010–2019 → `'2010s'`\n",
    "\n",
    "*Hint: integer division (`//`) on the `year` column gets you the decade start; string concatenation adds the `'s'`.*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1639aa4-d36c-4888-921b-95f997bb2078",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "fires['decade'] = ...\n",
    "fires[['year', 'decade']].drop_duplicates().sort_values('year').head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bf30964d",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02A.3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2aa3896f-034a-471f-a964-19064307b9b5",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 2 — Sorting and Filtering\n",
    "\n",
    "Pandas lets you focus on subsets of your data:\n",
    "\n",
    "| Operation | Syntax | Example |\n",
    "|---|---|---|\n",
    "| Boolean filter | `df[condition]` | `df[df['year'] > 2010]` |\n",
    "| Multiple conditions | `df[cond1 & cond2]` | Use `&` not `and` |\n",
    "| Filter by list | `df['col'].isin(lst)` | checks membership |\n",
    "| Sort | `df.sort_values('col', ascending=False)` | largest first |\n",
    "| Top N | `.head(N)` after sort | |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "857d291a-7111-48eb-b3de-7ef837da2f94",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Records from 2015 onward, during August (peak fire season)\n",
    "late_august = fires[(fires['year'] >= 2015) & (fires['month'] == 8)]\n",
    "print(f'{len(late_august)} rows matched')\n",
    "late_august.sort_values('number', ascending=False).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b66f4a3a-f842-4a26-97e5-c5a7b610c397",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02B.1 — Filtering *(2 pts)*\n",
    "\n",
    "Filter `fires` to include only rows for the state of **`'Mato Grosso'`**. Assign the result to `mato_grosso`.\n",
    "\n",
    "Then print the number of rows in your filtered dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "117689a1-5d14-45a6-a219-7d1989d9b574",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "mato_grosso = ...\n",
    "print(f'Mato Grosso rows: {len(mato_grosso)}')\n",
    "mato_grosso.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31da4645",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02B.1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8f2d8db7-6332-4e71-8174-b1ce5ff98064",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02B.2 — Sorting *(2 pts)*\n",
    "\n",
    "From the full `fires` dataframe, find the **5 rows with the highest fire counts**. Assign the result to `top5`.\n",
    "\n",
    "Then display only the `state`, `year`, `month`, and `number` columns of `top5`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2dc5fb1a-53f6-44e0-ab84-5c079a0e9887",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "top5 = ...\n",
    "top5[['state', 'year', 'month', 'number']]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2e2fdcb7",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02B.2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ae280250-7275-40a3-ac87-6baeca115dc4",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02B.3 — Compound Filtering *(2 pts)*\n",
    "\n",
    "The north region of Brazil (the Amazon basin) includes these states:\n",
    "\n",
    "```python\n",
    "north_states = ['Acre', 'Amapa', 'Amazonas', 'Para', 'Rondonia', 'Roraima', 'Tocantins']\n",
    "```\n",
    "\n",
    "Create a new dataframe `north_recent` containing only rows where **both** conditions hold:\n",
    "1. The state is in `north_states`\n",
    "2. The year is **2010 or later**\n",
    "\n",
    "Print its shape."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "135e0216-8845-47fb-85d1-31601daf76dd",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "north_states = ['Acre', 'Amapa', 'Amazonas', 'Para', 'Rondonia', 'Roraima', 'Tocantins']\n",
    "north_recent = ...\n",
    "print(f'north_recent shape: {north_recent.shape}')\n",
    "north_recent.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "359a8120",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02B.3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0c581cc7-ffff-41a0-af89-eee8bc80aba3",
   "metadata": {},
   "source": [
    "---\n",
    "## Part 3 — Visual Encodings\n",
    "\n",
    "A chart maps data to **visual properties** called **encodings**:\n",
    "\n",
    "| Encoding | Plotly Express argument | Best for |\n",
    "|---|---|---|\n",
    "| x-axis | `x=` | time, ordered categories |\n",
    "| y-axis | `y=` | numeric values |\n",
    "| color | `color=` | categories or continuous gradient |\n",
    "| size | `size=` | quantity (use carefully) |\n",
    "| facet | `facet_col=` | small multiples by category |"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f0558851-257d-4f12-9da4-e709874ac635",
   "metadata": {},
   "outputs": [],
   "source": [
    "fires_per_year = fires.groupby('year')['number'].sum().reset_index()\n",
    "\n",
    "fig_example = px.bar(\n",
    "    fires_per_year,\n",
    "    x='year',\n",
    "    y='number',\n",
    "    title='Total Forest Fires in Brazil per Year',\n",
    "    labels={'number': 'Number of Fires', 'year': 'Year'}\n",
    ")\n",
    "fig_example.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "654c01a1-909c-4d99-9a84-bd2dc3ac754e",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02C.1 — Line Plot for One State *(2 pts)*\n",
    "\n",
    "Using your `mato_grosso` dataframe from Task 02B.1, create a **line plot** showing fire counts over time:\n",
    "\n",
    "- x-axis: `'date'`\n",
    "- y-axis: `'number'`\n",
    "- A descriptive title\n",
    "- Axis labels via the `labels=` argument\n",
    "\n",
    "Assign the figure to `fig1` and display it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5453343a-1ad6-40e0-8299-3b87c309b788",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "...\n",
    "fig1.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4bf20863",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02C.1\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b242cb23-4226-4a72-b326-93a4ccf86259",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "### Task 02C.2 — Comparing Multiple States *(3 pts)*\n",
    "\n",
    "One state is interesting, but comparison reveals the bigger picture. Using your `north_recent` dataframe (Task 02B.3), create a line plot comparing fire trends across all north region states.\n",
    "\n",
    "Requirements:\n",
    "- Chart type: **line**\n",
    "- x-axis: `'date'`\n",
    "- y-axis: `'number'`\n",
    "- **Color** encoding: `'state'`\n",
    "- A title and axis labels\n",
    "\n",
    "Assign to `fig2` and display it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "393ef7f4-a2fb-4361-a3b1-e0b1084dcd46",
   "metadata": {
    "tags": [
     "otter_answer_cell"
    ]
   },
   "outputs": [],
   "source": [
    "...\n",
    "fig2.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7ac8e4a1",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "outputs": [],
   "source": [
    "grader.check(\"02C.2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7c4911ca-9562-4efd-be15-798888cfa8b8",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "<!-- BEGIN QUESTION -->\n",
    "\n",
    "### Task 02C.3 — Evaluate Your Visualization *(2 pts)*\n",
    "\n",
    "Look critically at `fig2`. In **3–5 sentences**, answer:\n",
    "\n",
    "1. Does the color encoding help you compare states? Why or why not?\n",
    "2. What would you change to make the patterns clearer?\n",
    "3. What does this plot **not** show — what information is hidden or lost?\n",
    "\n",
    "*Edit the cell below and write your answer.*"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7c61106-f1cb-438d-a9d5-0cfe5b53bbba",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "_Your answer here._"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "858bf761",
   "metadata": {
    "deletable": false,
    "editable": false
   },
   "source": [
    "<!-- END QUESTION -->\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.10.0"
  },
  "otter": {
   "OK_FORMAT": true,
   "assignment_name": "practice01",
   "tests": {
    "02A.1": {
     "name": "02A.1",
     "points": 1,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert isinstance(n_rows, int), 'n_rows should be an integer'\n>>> assert isinstance(n_cols, int), 'n_cols should be an integer'\n>>> assert n_cols == 5, 'fires should have 5 columns'\n>>> assert n_rows > 1000, 'n_rows seems too small'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02A.2": {
     "name": "02A.2",
     "points": 2,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert isinstance(peak_state, str), 'peak_state should be a string'\n>>> assert total_fires > 0, 'total_fires should be positive'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02A.3": {
     "name": "02A.3",
     "points": 2,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert 'decade' in fires.columns, \"Column 'decade' not found\"\n>>> assert fires['decade'].dtype == object, \"'decade' should be a string column\"\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02B.1": {
     "name": "02B.1",
     "points": 2,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert len(mato_grosso) > 0, 'mato_grosso is empty'\n>>> assert mato_grosso['state'].nunique() == 1, 'mato_grosso should contain only one state'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02B.2": {
     "name": "02B.2",
     "points": 2,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert len(top5) == 5, 'top5 should have exactly 5 rows'\n>>> nums = list(top5['number'])\n>>> assert nums == sorted(nums, reverse=True), 'top5 should be sorted by number descending'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02B.3": {
     "name": "02B.3",
     "points": 2,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert 'north_recent' in dir(), 'north_recent is not defined'\n>>> assert north_recent['year'].min() >= 2010, 'north_recent contains years before 2010'\n>>> assert north_recent['state'].isin(north_states).all(), 'north_recent contains states outside north_states'\n>>> assert len(north_recent) > 0, 'north_recent is empty'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02C.1": {
     "name": "02C.1",
     "points": 2,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert isinstance(fig1, go.Figure), 'fig1 should be a Plotly Figure'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    },
    "02C.2": {
     "name": "02C.2",
     "points": 3,
     "suites": [
      {
       "cases": [
        {
         "code": ">>> assert isinstance(fig2, go.Figure), 'fig2 should be a Plotly Figure'\n",
         "hidden": false,
         "locked": false
        }
       ],
       "scored": true,
       "setup": "",
       "teardown": "",
       "type": "doctest"
      }
     ]
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}