# getting-started
a
Hello team, a general question about `data replay strategy`. For example, in our case, we need to calculate each dataset's data quality, which is computed from the dataset's aspects. Since all datasets are already in the datastore (MySQL, Neo4j and Elasticsearch), we need a way to pull the data and do the calculation. Right now we are pulling data from MySQL using a Python script. Do you have any suggestions?
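For context, the kind of pull-and-score script described above might look roughly like this. It is only a sketch under assumptions: the table name `metadata_aspect`, its columns, the rule that `version = 0` is the latest version, the short aspect names, and the "fraction of expected aspects present" score are all illustrative, and an in-memory SQLite database stands in for MySQL so the snippet runs on its own.

```python
import sqlite3

# Hypothetical list of aspects a "complete" dataset is expected to have.
EXPECTED_ASPECTS = ("ownership", "schemaMetadata", "datasetProperties")


def quality_scores(conn):
    """Return {urn: fraction of EXPECTED_ASPECTS present at the latest version}."""
    # Assumption: version 0 holds the latest value of each aspect.
    rows = conn.execute(
        "SELECT urn, aspect FROM metadata_aspect WHERE version = 0"
    )
    present = {}
    for urn, aspect in rows:
        present.setdefault(urn, set()).add(aspect)
    expected = set(EXPECTED_ASPECTS)
    return {
        urn: len(aspects & expected) / len(expected)
        for urn, aspects in present.items()
    }


# Demo data: SQLite stands in for the MySQL source of truth.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_aspect "
    "(urn TEXT, aspect TEXT, version INTEGER, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO metadata_aspect VALUES (?, ?, ?, ?)",
    [
        ("urn:li:dataset:a", "ownership", 0, "{}"),
        ("urn:li:dataset:a", "schemaMetadata", 0, "{}"),
        ("urn:li:dataset:a", "datasetProperties", 0, "{}"),
        ("urn:li:dataset:b", "ownership", 0, "{}"),
    ],
)
scores = quality_scores(conn)
```

Here dataset `a` has all three expected aspects and dataset `b` only one, so the scores differ accordingly. A real scoring rule would likely look inside the aspect payloads rather than only checking presence.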
m
@acceptable-architect-70237: do you mean "data quality" as metadata quality scores computed on the aspects stored in datahub?
a
Yes, it's one of my use cases. Another use case: suppose I add a new property to an aspect, e.g. `number of schema fields`, and need ES to index this property. I would need to `replay` all the data to do so.
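To make the replay idea concrete, here is a rough sketch of what I mean: read every stored schema aspect from the source of truth, derive the new property, and emit documents in batches for reindexing. Everything named here is an assumption for illustration (the `metadata_aspect` table, the `schemaMetadata` aspect name, the `fields` key in the JSON payload, and the `numSchemaFields` document field), and SQLite again stands in for MySQL. Sending the batches to ES is left out.

```python
import json
import sqlite3


def replay_documents(conn, batch_size=500):
    """Yield batches of search documents rebuilt from stored schema aspects."""
    # Assumption: version 0 is the latest version of each aspect.
    cursor = conn.execute(
        "SELECT urn, metadata FROM metadata_aspect "
        "WHERE aspect = 'schemaMetadata' AND version = 0 ORDER BY urn"
    )
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        batch = []
        for urn, metadata in rows:
            # Derive the new property from the stored aspect payload.
            fields = json.loads(metadata).get("fields", [])
            batch.append({"urn": urn, "numSchemaFields": len(fields)})
        yield batch


# Demo data: three datasets with 2, 0 and 1 schema fields.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_aspect "
    "(urn TEXT, aspect TEXT, version INTEGER, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO metadata_aspect VALUES (?, ?, ?, ?)",
    [
        ("urn:li:dataset:a", "schemaMetadata", 0,
         json.dumps({"fields": [{"name": "id"}, {"name": "ts"}]})),
        ("urn:li:dataset:b", "schemaMetadata", 0, json.dumps({})),
        ("urn:li:dataset:c", "schemaMetadata", 0,
         json.dumps({"fields": [{"name": "id"}]})),
    ],
)
batches = list(replay_documents(conn, batch_size=2))
```

Each batch could then be handed to whatever indexing path is appropriate. Reading in fixed-size batches keeps memory bounded when the table is large.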
m
@acceptable-architect-70237: sorry for dropping this thread. The answer is not as straightforward today as we would like, even though we have a good source-of-truth story with our metadata. I'll create an issue to track this.
a
Thanks. I was trying to get an idea of what you think, and I see you have already provided your feedback on possible solutions in the issue.