What are data unit tests and why we need them
-
Theodore Meynard from GetYourGuide
-
Talk: https://2022.pycon.de/program/MPWLWP/
-
What?
- Frameworks for data unit testing
- In practice at GetYourGuide
What?
- Data product = Code + Data
- Data product test = Code test + Data test
How to do data unit testing?
- Verify expectations about the data. Check:
  - Range and mean
  - Missing values
  - Duplicates
  - Number of samples
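The checks listed above could be sketched in plain Python as follows. This is a minimal illustration under assumptions: the `(booking_id, price)` layout, the thresholds, and the function name `check_batch` are all made up for the example.

```python
from statistics import mean

# Made-up sample of (booking_id, price) pairs; None marks a missing value.
rows = [(1, 20.0), (2, 35.0), (3, 50.0), (4, 15.0)]


def check_batch(rows, min_rows=3, price_range=(0.0, 1000.0), mean_range=(10.0, 100.0)):
    """Return the list of failed expectations (an empty list means the batch passes)."""
    failures = []
    ids = [r[0] for r in rows]
    prices = [r[1] for r in rows]

    # Number of samples
    if len(rows) < min_rows:
        failures.append("too few rows")
    # Missing values
    if any(p is None for p in prices):
        failures.append("missing price")
    present = [p for p in prices if p is not None]
    # Range
    lo, hi = price_range
    if any(not lo <= p <= hi for p in present):
        failures.append("price out of range")
    # Mean
    if present and not mean_range[0] <= mean(present) <= mean_range[1]:
        failures.append("mean price out of range")
    # Duplicates
    if len(ids) != len(set(ids)):
        failures.append("duplicate ids")
    return failures


assert check_batch(rows) == []
```

Returning every failed expectation, rather than stopping at the first one, gives a fuller picture of what is wrong with a batch.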
Frameworks
- Great Expectations
  - Supports SQL, Pandas, Spark
  - Data profiling: generates a draft set of expectations
  - Data validation
  - Data documentation
  - Supports distribution and statistical tests
- Pandera
  - Pandas and Spark
- TFDV
  - TensorFlow
- SODA
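
The frameworks above share a declarative style: expectations are declared once and then validated against each new batch, which yields a report. The sketch below imitates that idea in plain Python. It is not the API of any of the listed frameworks, and every name in it (`Expectation`, `validate`, the expectation names, the sample rows) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # one record, e.g. {"id": 1, "price": 20.0}


@dataclass
class Expectation:
    name: str
    check: Callable[[list[Row]], bool]  # returns True when the batch passes


def validate(rows: list[Row], suite: list[Expectation]) -> dict[str, bool]:
    """Run every expectation and return a name -> passed report."""
    return {e.name: e.check(rows) for e in suite}


# The "suite" is data, so it can be reviewed and versioned like code.
suite = [
    Expectation("price_not_null",
                lambda rows: all(r["price"] is not None for r in rows)),
    Expectation("price_between_0_and_1000",
                lambda rows: all(0 <= r["price"] <= 1000
                                 for r in rows if r["price"] is not None)),
    Expectation("id_unique",
                lambda rows: len({r["id"] for r in rows}) == len(rows)),
]

report = validate([{"id": 1, "price": 20.0}, {"id": 2, "price": None}], suite)
print(report)
# {'price_not_null': False, 'price_between_0_and_1000': True, 'id_unique': True}
```

Real frameworks add what this sketch leaves out, such as profiling to draft the suite, generated documentation, and connectors to SQL or Spark.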