Great Expectations details:

Great Expectations is a tool for validating, documenting, and profiling data. It helps maintain data quality, improves communication between teams, and makes failures easier to debug.

It asserts what you expect of the data when you load and transform it. An Expectation is essentially a unit test for data, and Great Expectations also uses Expectations to publish data documentation and data quality reports.

Useful for Data Scientists/Data Engineers:

  1. Validate the correctness of data received from another team or a raw source before applying any transformation.
  2. Prevent faulty data from slipping into production.
  3. Capture the implicit domain knowledge that subject matter experts hold about the data and make it explicit.

Key Features

Expectations:

An Expectation is simply an assertion about your data. In Great Expectations, these assertions are written declaratively, as simple, human-readable Python methods.
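To make the idea concrete, here is a minimal sketch of an assertion about data written in plain Python. It is a hand-rolled imitation, not the Great Expectations API: the function body, the shape of the result dictionary, and the sample rows are all assumptions made for illustration.

```python
# Plain-Python imitation of an Expectation; this is NOT the library's API.
def expect_column_values_to_not_be_null(rows, column):
    """Assert that no row has a missing value in `column`."""
    # Collect the positions of rows where the column is missing or None.
    unexpected = [i for i, row in enumerate(rows) if row.get(column) is None]
    return {
        "expectation": "expect_column_values_to_not_be_null",
        "column": column,
        "success": not unexpected,
        "unexpected_index_list": unexpected,
    }


# Hypothetical sample data.
rows = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 51},
]

result = expect_column_values_to_not_be_null(rows, "age")
print(result["success"], result["unexpected_index_list"])
```

The point of the declarative style is that the function name states what should be true of the data, and the result says whether it held and which rows broke it.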

Automated data profiling:

Writing data assertions by hand is a complex and tedious task. Great Expectations provides a way to generate them automatically: the library profiles your data to get basic statistics and automatically generates a suite of Expectations based on what it observes in the data.
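The profiling idea can be sketched roughly as follows. This is a simplified stand-in, not how the library's profiler is implemented: the `profile_column` helper, the "not_null" and "between" rule names, and the sample rows are hypothetical.

```python
# Simplified stand-in for automated profiling; NOT the library's profiler.
def profile_column(rows, column):
    """Derive candidate expectations for one column from observed values."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    suite = []
    # If no value is missing in the sample, propose a not-null rule.
    if len(present) == len(values):
        suite.append({"type": "not_null", "column": column})
    # If every observed value is numeric, propose the observed range.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if present and numeric:
        suite.append(
            {"type": "between", "column": column,
             "min": min(present), "max": max(present)}
        )
    return suite


# Hypothetical sample data.
rows = [
    {"price": 9.5, "sku": "a"},
    {"price": 12.0, "sku": None},
    {"price": 7.25, "sku": "c"},
]

suite = profile_column(rows, "price") + profile_column(rows, "sku")
print(suite)
```

Rules generated this way only reflect the sample that was profiled, so they are best treated as a starting point to be reviewed rather than a finished suite.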

Data validation:

Once the assertions are written, Great Expectations loads several batches of data, runs the assertions against them, and returns any results that do not match what was expected.
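A rough sketch of that validation loop, again in plain Python rather than the library's own API: the `check` and `validate` helpers, the rule dictionaries, and the batch names are assumptions chosen for illustration.

```python
# Plain-Python sketch of validating batches against a suite; NOT the library's API.
def check(expectation, rows):
    """Evaluate one expectation against one batch of rows."""
    column = expectation["column"]
    values = [row.get(column) for row in rows]
    if expectation["type"] == "not_null":
        bad = [v for v in values if v is None]
    elif expectation["type"] == "between":
        low, high = expectation["min"], expectation["max"]
        # Missing values are left to the not_null rule.
        bad = [v for v in values if v is not None and not (low <= v <= high)]
    else:
        raise ValueError("unknown expectation type: " + expectation["type"])
    return {"success": not bad, "unexpected_values": bad}


def validate(suite, batches):
    """Run every expectation on every batch and collect the failures."""
    failures = []
    for name, rows in batches.items():
        for expectation in suite:
            result = check(expectation, rows)
            if not result["success"]:
                failures.append(
                    (name, expectation["type"], result["unexpected_values"])
                )
    return failures


# Hypothetical suite and batches.
suite = [
    {"type": "not_null", "column": "price"},
    {"type": "between", "column": "price", "min": 0, "max": 100},
]
batches = {
    "batch_1": [{"price": 10}, {"price": 20}],
    "batch_2": [{"price": None}, {"price": 250}],
}

failures = validate(suite, batches)
for failure in failures:
    print(failure)
```

Reporting the offending values along with the failing rule is what makes debugging easier: the result points at the batch and the data that caused the failure.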

Data Docs:

Great Expectations generates an HTML report describing the data, built from the Expectations and their validation results.
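As a very reduced illustration of turning validation results into a readable report, the sketch below renders a list of results as an HTML table. The `render_report` helper and the result format are hypothetical; the real Data Docs are far richer than this.

```python
from html import escape


# Toy report renderer; NOT how the library builds its Data Docs.
def render_report(results):
    """Render validation results as a small HTML table."""
    rows = "".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            escape(r["expectation"]), "passed" if r["success"] else "failed"
        )
        for r in results
    )
    header = "<tr><th>Expectation</th><th>Status</th></tr>"
    return "<html><body><table>" + header + rows + "</table></body></html>"


# Hypothetical validation results.
results = [
    {"expectation": "price is not null", "success": True},
    {"expectation": "price is between 0 and 100", "success": False},
]

report = render_report(results)
print(report)
```

Because the report is generated from the same results the validation produces, the documentation stays in step with what was actually tested.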
