Fundamentals of Data Engineering

Data Science 205

3 units

Course Description

Storing, managing, and processing datasets are foundational processes in data science. This course introduces the fundamental knowledge and skills of data engineering that are required to be effective as a data scientist. This course focuses on the basics of data pipelines, data pipeline flows and associated business use cases, and how organizations derive value from data and data engineering. As these fundamentals of data engineering are introduced, learners will interact with data and data processes at various stages in the pipeline, understand key data engineering tools and platforms, and use and connect critical technologies through which one can construct storage and processing architectures that underpin data science applications.

Skill Sets

Analytics Solution Architectures / Data at Scale Concerns and Tradeoffs / Distributed Data Processing / Relational Databases / Graph Databases / Streaming Data Applications / Cube Technology

Tools

Python / Relational databases / Hadoop / MapReduce / Spark / Cloud Computing (AWS)

Course Designers

Mark Mims
Former Lecturer

Previously listed as DATASCI W205. Prior to Spring 2018, this course was titled “Storing and Retrieving Data”.

Prerequisites

MIDS students only. Intermediate competency in Python, C, or Java, and competency in Linux, GitHub, and relevant Python libraries. Knowledge of database management, including SQL, is recommended but not required.
Last updated: October 6, 2022