Data Engineering
Keith VanderLinden
Calvin University
Big Data
The Vs of big data:
- Volume
- Variety
- Velocity
- Veracity
- Value
Data engineering is the process of designing, building, and managing the infrastructure for big data.
Row vs Column Storage
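The choice between row-major and column-major storage can affect performance. Row-major layouts keep each record together, which favors reading or writing whole records. Column-major layouts keep each field together, which favors scanning or aggregating a few fields across many records.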
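To make the two layouts concrete, here is a minimal sketch in plain Python. The table contents and the function names (`total_from_rows`, `total_from_columns`, `record_from_columns`) are illustrative assumptions, not part of the original slides. Real systems store these layouts on disk or in contiguous memory, which this sketch does not model.

```python
# Row-oriented layout: all fields of one record are stored together.
rows = [
    {"id": 1, "name": "a", "amount": 10.0},
    {"id": 2, "name": "b", "amount": 20.0},
    {"id": 3, "name": "c", "amount": 30.0},
]

# Column-oriented layout: all values of one field are stored together.
columns = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "amount": [10.0, 20.0, 30.0],
}


def total_from_rows(rows):
    """Aggregate one field; every record must be visited to pick it out."""
    return sum(record["amount"] for record in rows)


def total_from_columns(columns):
    """Aggregate one field; only that field's values are touched."""
    return sum(columns["amount"])


def record_from_columns(columns, i):
    """Rebuild one whole record; every column must be visited."""
    return {field: values[i] for field, values in columns.items()}
```

Aggregating `amount` touches a single list in the column layout, while fetching one complete record is a single lookup in the row layout. That is the trade-off in miniature.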
Versioning
Supporting reproducibility requires that we maintain version histories for everything:
- Code
- Configuration
- Data
- Models
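One possible way to tie the four items above together is to record a content fingerprint for each in a single manifest per run. The sketch below uses only the Python standard library. The function names (`fingerprint`, `config_fingerprint`, `build_manifest`) and the manifest fields are hypothetical. Dedicated version-control and data-versioning tools do this far more thoroughly.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies this exact content."""
    return hashlib.sha256(data).hexdigest()


def config_fingerprint(config: dict) -> str:
    """Hash a configuration dict; sorted keys make the result order-independent."""
    canonical = json.dumps(config, sort_keys=True).encode("utf-8")
    return fingerprint(canonical)


def build_manifest(code_version: str, config: dict,
                   data: bytes, model: bytes) -> dict:
    """Collect one identifier per artifact so a run can be reproduced later."""
    return {
        "code": code_version,  # e.g. a commit identifier (assumed to be supplied)
        "config": config_fingerprint(config),
        "data": fingerprint(data),
        "model": fingerprint(model),
    }
```

If any artifact changes by even one byte, its fingerprint changes, so two runs with identical manifests used identical code, configuration, data, and model.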
Data Storage
Data storage systems are classified into different types.