Dear Heavenly Father, as we gather here today to embark on a new journey of learning, we invite Your presence into this classroom. Bless each student with wisdom, understanding, and a thirst for knowledge. Let Your light shine upon us, illuminating the path of learning, so we may contemplate Your beauty and love in everything You made.
May this classroom be a place of respect, fellowship, and growth. Guide our imaginations and desires towards Your love and justice, so that we may respond adequately to Your call to be Christ’s agents of renewal in the world.
Through our Lord Jesus Christ, Amen.
“Using data to search for meaningfulness in creation.”
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.” — Tom Mitchell, Machine Learning (1997)
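Mitchell's definition can be made concrete with a toy sketch. Everything here is a hypothetical illustration, not part of the course material: the task T is deciding whether a number is at or above a hidden cutoff, the experience E is a hand-picked stream of labeled examples, and the performance measure P is accuracy on a fixed test set. The stream is chosen so that accuracy never drops as more examples arrive; real learners do not always improve this smoothly.

```python
# Toy illustration of Mitchell's definition (all names and data are hypothetical).
# Task T: decide whether a number is at or above a hidden cutoff.
# Experience E: labeled examples, seen a few at a time.
# Performance P: accuracy on a fixed test set.

HIDDEN_CUTOFF = 50  # used only to generate labels; the learner never reads it


def true_label(x):
    return 1 if x >= HIDDEN_CUTOFF else 0


def learn_threshold(examples):
    """Guess the cutoff as the midpoint between the largest 0 and the smallest 1."""
    largest_zero = max(x for x, y in examples if y == 0)
    smallest_one = min(x for x, y in examples if y == 1)
    return (largest_zero + smallest_one) / 2


def accuracy(threshold, test_points):
    """Fraction of test points whose predicted label matches the true label."""
    correct = sum((1 if x >= threshold else 0) == true_label(x) for x in test_points)
    return correct / len(test_points)


# Hand-picked training stream, so the run is deterministic.
stream = [(x, true_label(x)) for x in [5, 80, 20, 70, 40, 60, 48, 51]]
test_points = list(range(100))

accuracies = []
for n in (2, 4, 6, 8):  # growing amounts of experience E
    threshold = learn_threshold(stream[:n])
    accuracies.append(accuracy(threshold, test_points))

print(accuracies)  # [0.93, 0.95, 1.0, 1.0]
```

Reading the output against the definition: the same program, on the same task, scores higher on P as E grows from two examples to eight.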
Supervised learning: answering a question (classification, estimation, prediction) from past examples whose answers are known
Unsupervised learning: finding structure in data when no answer is given in advance
Reinforcement learning: learning how to act from the rewards or penalties that followed earlier actions
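The three kinds of learning listed above can be contrasted in a few lines each. This is a minimal sketch using only the Python standard library; the data, the function names, and the choice of techniques (nearest centroid, two-group averaging, and a greedy agent with fixed rewards) are illustrative assumptions, not methods prescribed by the course.

```python
# Hypothetical toy data: six measurements and, for the supervised case, their labels.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]


def fit_centroids(xs, ys):
    """Supervised: use the given answers to compute one mean per label."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(g) / len(g) for y, g in groups.items()}


def predict(centroids, x):
    """Answer a new question with the label whose mean is closest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(xs, rounds=10):
    """Unsupervised: split the values into two groups without any labels."""
    centers = [min(xs), max(xs)]
    for _ in range(rounds):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [sorted(g) for g in groups]


def greedy_agent(rewards, steps):
    """Reinforcement: try each action once, then repeat the best-rewarded one."""
    totals = {a: 0.0 for a in rewards}
    counts = {a: 0 for a in rewards}
    history = []
    for _ in range(steps):
        untried = [a for a in rewards if counts[a] == 0]
        if untried:
            action = untried[0]
        else:
            action = max(rewards, key=lambda a: totals[a] / counts[a])
        totals[action] += rewards[action]  # fixed rewards are a simplification
        counts[action] += 1
        history.append(action)
    return history


centroids = fit_centroids(values, labels)
print(predict(centroids, 1.1), predict(centroids, 4.5))  # small large
print(two_means(values))  # [[0.8, 1.0, 1.2], [4.9, 5.0, 5.3]]
print(greedy_agent({"left": 0.0, "right": 1.0}, steps=5))
```

Note that the unsupervised function finds the same two groups as the supervised one but cannot name them: without labels there is structure, but no answer to check it against.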
| Traditional Programming | Machine Learning |
|---|---|
| Rules + Data → Output | Data + Output → Rules |
| We write the logic | The model learns the logic |
| Brittle to new situations | Generalizes (if trained well) |
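The contrast in the table can be sketched with temperature conversion. In the first function we write the rule; in the second, the program is given only data and outputs and recovers the rule by a least-squares line fit. The example, its function names, and its five data points are assumptions chosen for illustration; the data are noise-free, which real data rarely are.

```python
def to_fahrenheit_by_rule(celsius):
    """Traditional programming: we write the rule ourselves."""
    return 1.8 * celsius + 32


def fit_line(xs, ys):
    """Machine learning in miniature: recover slope and intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Data + Output: paired readings, with no formula supplied.
celsius = [0, 10, 20, 30, 40]
fahrenheit = [32, 50, 68, 86, 104]

# Rules: the fitted slope and intercept play the role of the learned logic.
slope, intercept = fit_line(celsius, fahrenheit)


def to_fahrenheit_learned(c):
    return slope * c + intercept


print(round(slope, 6), round(intercept, 6))
print(to_fahrenheit_by_rule(25), round(to_fahrenheit_learned(25), 6))
```

Both functions should agree on 25 °C, but only the first one was told the formula.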
Data is one way we record experience and make it shareable.
A model trained on bad data learns bad patterns.
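A small sketch of how bad data becomes bad patterns, assuming a 1-nearest-neighbor classifier and a hypothetical dataset in which two "large" items were recorded as "small". The model has no way to know the records are wrong, so it reproduces the error faithfully.

```python
def nearest_neighbor_label(training, x):
    """Predict the label of the closest training point (1-nearest-neighbor)."""
    closest_x, closest_label = min(training, key=lambda pair: abs(pair[0] - x))
    return closest_label


def accuracy(training, test):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(nearest_neighbor_label(training, x) == y for x, y in test)
    return correct / len(test)


clean = [(1, "small"), (2, "small"), (3, "small"),
         (7, "large"), (8, "large"), (9, "large")]

# Same measurements, but two "large" items were recorded as "small".
mislabeled = [(1, "small"), (2, "small"), (3, "small"),
              (7, "small"), (8, "small"), (9, "large")]

# Test items carry the correct labels.
test = [(1.2, "small"), (2.2, "small"), (3.2, "small"),
        (6.8, "large"), (7.8, "large"), (8.8, "large")]

clean_accuracy = accuracy(clean, test)
noisy_accuracy = accuracy(mislabeled, test)
print(clean_accuracy, noisy_accuracy)
```

With the clean records every test item is classified correctly; with the mislabeled records the two test items nearest to the bad entries are misclassified, so accuracy falls to four out of six.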
| Criterion | Python | R |
|---|---|---|
| Community and Support | Large and growing community, with extensive resources and tutorials available. Popular in industry and academia. | Strong community in academia, especially in fields like statistics, bioinformatics, and social sciences. |
| Ease of Learning | Easier for beginners, especially those with some prior programming experience. | Steeper learning curve, particularly for those new to programming. |
| Data Visualization | Strong, with libraries like Matplotlib, Seaborn, Plotly, and Bokeh. Interactive visualizations are well-supported. | Excellent, with ggplot2 being one of the most powerful visualization libraries. However, interactive visualizations are less integrated. |
| Machine Learning | Extensive support with libraries like scikit-learn, TensorFlow, and PyTorch. Broad adoption in industry. | Adequate support for machine learning, though Python libraries are more robust and widely used in industry. |
| Statistical Analysis | Good for general-purpose analysis; extensive libraries, though more basic for advanced statistical techniques. | Excellent for complex statistical analysis; originally designed for statisticians and excels in this area. |
| Integration and Flexibility | Highly flexible, integrates well with other languages and systems (e.g., C, C++, Java, SQL). Versatile for many tasks beyond data science. | Primarily focused on statistical computing, less flexible for other types of programming or integration with non-statistical systems. |
| Performance and Scalability | Generally faster for large datasets, especially with optimized libraries (e.g., NumPy, Dask). Better for large-scale production environments. | Can be slower with large datasets, though packages like data.table improve performance. Not as well-suited for big data as Python. |
| Deployment | Strong tools for deploying models in production (e.g., Flask, FastAPI, Streamlit). Easy to integrate with web services and databases. | More challenging to deploy in production; Shiny can be used for web applications but is less flexible than Python tools. |
Skills, knowledge, and dispositions — all three need to be developed.
A disposition is a habit of using skills wisely — formed only in community, through practice.
Thomas Aquinas distinguishes studiositas (virtue) from curiositas (vice).
Some failure modes:
We will practice: noticing and reporting our decisions, acknowledging limitations, validating results.
It is tempting to massage data, cherry-pick results, or report conclusions you wanted rather than found.
We will practice:
We can use our tools to illuminate — or to obscure.
Data science can cause harm and reveal it.
We will practice:
No practice or quiz this week.
Complete before next Monday: