Systems Engineering for Data Science

Keith VanderLinden
Calvin University

Systems Engineering

Systems engineering conceives of, designs, builds, and deploys systems that satisfy business requirements.

DMLS Figure 1-1

Machine Learning

Machine learning is an approach to learn complex patterns from existing data and use these patterns to make predictions on unseen data.

Machine Learning: Patterns

Machine learning is an approach to learn complex patterns from existing data and use these patterns to make predictions on unseen data. To be viable:

The patterns should be complex enough, and change often enough, that they can’t be specified in advance. Distinguish:

  • Traditional Programming
    inputs + algorithms = outputs

  • Machine learning
    inputs + outputs = patterns
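
The contrast above can be sketched in a few lines of Python. This is a toy illustration, not a real spam filter: the function names, the link-count feature, and the data are all made up for the example. The first function is a rule a programmer writes by hand; the second derives a comparable rule from example inputs and outputs.

```python
# Traditional programming: a person writes the rule (the "algorithm").
def is_spam_by_rule(num_links: int) -> bool:
    # Hypothetical hand-picked threshold.
    return num_links > 3


# Machine learning: the rule is derived from example inputs and outputs.
def learn_threshold(inputs: list[int], outputs: list[bool]) -> float:
    """Return the cut-off that misclassifies the fewest training examples."""
    values = sorted(set(inputs))
    # Candidate cut-offs: below the minimum, midway between neighbours,
    # and above the maximum.
    candidates = (
        [values[0] - 0.5]
        + [(a + b) / 2 for a, b in zip(values, values[1:])]
        + [values[-1] + 0.5]
    )

    def errors(threshold: float) -> int:
        return sum((x > threshold) != y for x, y in zip(inputs, outputs))

    return min(candidates, key=errors)


# Toy, made-up training data: link counts and whether each message was spam.
link_counts = [0, 1, 2, 5, 7, 9]
was_spam = [False, False, False, True, True, True]

threshold = learn_threshold(link_counts, was_spam)


def is_spam_learned(num_links: int) -> bool:
    # The "pattern" here is just the learned threshold.
    return num_links > threshold
```

In the first case the threshold comes from the programmer; in the second it comes from the data, so it can be re-learned when the data changes.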

Machine Learning: Data

Machine learning is an approach to learn complex patterns from existing data and use these patterns to make predictions on unseen data. To be viable:

There must be data for learning that’s:

  • Appropriate
  • Voluminous
  • Balanced
  • Available
  • Unbiased
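
Two of the properties above, volume and balance, can be checked directly on a labeled dataset. The following is a minimal sketch under stated assumptions: the function name `summarize_labels`, the label values, and the 90/10 split are hypothetical, and what counts as "enough" data or "balanced enough" depends on the problem.

```python
from collections import Counter


def summarize_labels(labels: list[str]) -> dict:
    """Report dataset size and how evenly the classes are represented."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    shares = {label: count / total for label, count in counts.items()}
    # Ratio of the largest class to the smallest; 1.0 means perfectly balanced.
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return {"total": total, "shares": shares, "imbalance_ratio": imbalance_ratio}


# Made-up labels for illustration only.
labels = ["ok"] * 90 + ["fraud"] * 10
report = summarize_labels(labels)
```

Appropriateness, availability, and bias are harder to reduce to a single number and usually need domain review rather than a script.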

Machine Learning: Predictions

Machine learning is an approach to learn complex patterns from existing data and use these patterns to make predictions on unseen data. To be viable:

Predictions must be:

  • Possible
  • Non-Mission-Critical
  • Valuable
  • Appropriate

Software vs ML

Software: Code & Data

  • Code/data are separate.
  • Only code is versioned.
  • Code-bases are small.
  • Code is unit-tested.
  • Code updates are infrequent.

ML: Datasets and Models

  • Data/models are coupled.
  • Data/models are versioned.
  • Data/models are huge.
  • ML is hard to test.
  • Data/model updates are frequent.
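
One way to picture "data/models are coupled" and "data/models are versioned" is to store a fingerprint of the training data alongside each model's settings. The sketch below is an assumption-laden illustration, not a description of any particular versioning tool: `dataset_fingerprint`, `model_record`, and the toy rows are hypothetical names and data.

```python
import hashlib
import json


def dataset_fingerprint(rows: list[dict]) -> str:
    """Return a SHA-256 digest of the rows, stable across dictionary key order."""
    canonical = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def model_record(model_name: str, params: dict, rows: list[dict]) -> dict:
    """Bundle model settings with the fingerprint of the data it was trained on."""
    return {
        "model": model_name,
        "params": params,
        "data_sha256": dataset_fingerprint(rows),
    }


# Made-up training rows: the second version adds one example.
rows_v1 = [{"x": 1, "y": 0}, {"x": 2, "y": 1}]
rows_v2 = rows_v1 + [{"x": 3, "y": 1}]

record_v1 = model_record("toy-classifier", {"threshold": 1.5}, rows_v1)
record_v2 = model_record("toy-classifier", {"threshold": 1.5}, rows_v2)
```

The two records share the same code and parameters but carry different data fingerprints, so they count as different versions.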

Systems Engineering Process

Software Engineering

  1. Analysis
  2. Design
  3. Implementation
  4. Testing
  5. Deployment & Maintenance

ML/Data Engineering

  1. Project Scoping
  2. Data & Model Engineering
  3. System Deployment
  4. System Monitoring
  5. Business Analysis