Test-Driven Data Engineering Using DBT & Python
"In data science and engineering, automated testing is far from commonplace. One way to commit to automated testing is to identify methods for practicing test-driven development (TDD). Test-driven data engineering (TDDE) is a special case because automated tests are often very complex to build, owing to a lack of frameworks, limited access to tool internals, or even a lack of localized environments. In addition, new tools and technologies are frequently introduced that abstract away the ability to develop automated tests for code that touches data. With data being one of an organization’s most critical assets, doesn’t it make sense to invest in the skill sets needed to achieve quantifiable quality for that data?
In this presentation and demonstration, you will learn some of the fundamental patterns for building automated tests as part of the data engineering process and running them in a CI/CD pipeline. DBT (https://www.getdbt.com/) and Python will be used to demonstrate different types of data engineering use cases that require creative thinking to achieve TDDE. Topics will start with the use of DBT and Python with Docker and SQL Server. Strategies around Spark, Snowflake, orchestration, and applying the concepts to other technologies may also be discussed or demonstrated."
Donald Sawyer is the Director of Data Engineering at Object Partners. He has many years of experience applying software engineering skills such as testing, Scrum, UX, and architecture design to data science and engineering. He also built, and for the past five years has taught, the course "Big Data Engineering and Architecture" at the University of Minnesota and St. Cloud State University. He has built numerous TDDE frameworks for clients and has given many talks on the quality side of data engineering.