Bringing Software Engineering Practices to Data and Artificial Intelligence
In the rush to store everything and parallelize data processing, the art and rigor of building reliable data systems have been lost. Denny will begin this session with the fundamentals of reliable data, offering a retrospective of the database and data warehousing age and examining how its concepts both differ from and resonate in the age of data.
Storing data without regard for why, what, and how it is being stored can create a new set of problems. Denny suggests that these problems can be avoided by applying rigorous software engineering practices to data and data engineering.
In this talk, Denny will explore how the distribution of data, microservices, and the shift to the cloud have solved many problems but have also introduced new ones. He’ll then review data engineering best practices and the machine learning life cycle. The key theme is the merging (finally) of data engineering and software engineering practices.
Denny Lee is a developer advocate at Databricks. He is a hands-on distributed systems and data sciences engineer with extensive experience developing internet-scale infrastructure, data platforms, and predictive analytics systems for both on-premises and cloud environments. He also has a master’s in biomedical informatics from Oregon Health & Science University and has architected and implemented powerful data solutions for enterprise healthcare customers. His current technical focuses include distributed systems, Apache Spark, deep learning, machine learning, and genomics.