Fundamentals of Data Engineering
Data Science
205
3 units
Course Description
Storing, managing, and processing datasets are foundational processes in data science. This course introduces the fundamental data engineering knowledge and skills required to be effective as a data scientist. It focuses on the basics of data pipelines, pipeline flows and their associated business use cases, and how organizations derive value from data and data engineering. As these fundamentals are introduced, learners will interact with data and data processes at various stages in the pipeline, understand key data engineering tools and platforms, and use and connect the critical technologies used to construct the storage and processing architectures that underpin data science applications.
Skill Sets
Analytics Solution Architectures / Data at Scale Concerns and Tradeoffs / Distributed Data Processing / Relational Databases / Graph Databases / Streaming Data Applications / Cube Technology
Tools
Python / Relational databases / Hadoop / MapReduce / Spark / Cloud Computing (AWS)
Course Designers
Previously listed as DATASCI W205. Prior to Spring 2018, this course was titled “Storing and Retrieving Data”.