# getting-started
a
Hi team, is there an easy way to find the number of columns in the `schemametadata` aspect across all datasets? For example, if I am interested in all datasets that have more than 200 columns, what's the easiest way to do that? Can we use Elasticsearch to do the aggregation for us? I feel we could, but it doesn't seem easy.
e
It should be reasonably straightforward to create a derived field in Elasticsearch that holds the total column count, computed from the schema aspect.
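Once such a field exists, "more than 200 columns" becomes a plain range filter. Here is a minimal Python sketch that only builds the search request body. The field name `numberOfColumns` is a hypothetical name for the derived field, not an existing DataHub field.

```python
import json

# Hypothetical name for the derived column-count field (assumption, not an
# existing DataHub field).
COLUMN_COUNT_FIELD = "numberOfColumns"


def datasets_with_more_columns_than(threshold: int) -> dict:
    """Build an Elasticsearch search body matching documents whose
    column count is strictly greater than `threshold`."""
    return {
        "query": {
            "bool": {
                # Filter context: no relevance scoring needed for a numeric cutoff.
                "filter": [
                    {"range": {COLUMN_COUNT_FIELD: {"gt": threshold}}}
                ]
            }
        }
    }


if __name__ == "__main__":
    # The body you would POST to <dataset index>/_search.
    print(json.dumps(datasets_with_more_columns_than(200), indent=2))
```

A range filter on a stored integer is cheap, which is the main argument for materializing the count at index time.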
c
You can refer to the `hasOwners` field in DatasetDocument.pdl; the derived value is calculated on the fly in DatasetIndexBuilder.java. For your case, it looks like a count of fieldPaths is good enough? You can add a new derived field, `numberOfXXX`. If you have use cases where the calculation is complicated, you might need a derived aspect or offline processing. FYI @ambitious-battery-33996.
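A rough Python sketch of that kind of on-the-fly derivation, assuming the aspects arrive as plain dicts shaped like the PDL records (a `fields` list whose entries carry `fieldPath`, and an `owners` list for ownership). The real logic would live in DatasetIndexBuilder.java; `build_dataset_document` and the `numberOfColumns` name are hypothetical stand-ins for `numberOfXXX`.

```python
from typing import Any, Optional


def build_dataset_document(
    urn: str,
    schema_metadata: Optional[dict[str, Any]] = None,
    ownership: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical stand-in for an index builder: derive search-document
    fields from raw aspect payloads."""
    doc: dict[str, Any] = {"urn": urn}
    if ownership is not None:
        # Analogous to hasOwners: a boolean derived from the ownership aspect.
        doc["hasOwners"] = len(ownership.get("owners", [])) > 0
    if schema_metadata is not None:
        field_paths = [f["fieldPath"] for f in schema_metadata.get("fields", [])]
        doc["fieldPaths"] = field_paths
        # The derived count: one entry per fieldPath.
        doc["numberOfColumns"] = len(field_paths)
    return doc


schema = {"fields": [{"fieldPath": "id"}, {"fieldPath": "name"}, {"fieldPath": "address.city"}]}
doc = build_dataset_document("urn:li:dataset:(urn:li:dataPlatform:hive,example_table,PROD)", schema)
print(doc["numberOfColumns"])  # 3
```

If nested fields are flattened into separate `fields` entries, this counts nested paths as well, so decide whether "columns" should mean top-level fields only.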
a
+1, we could populate another field in the ES index.
@cool-river-24902 instead of materializing a new field in ES, can we have a derived field containing the count?
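One query-time option that avoids adding a field is an Elasticsearch script query that counts the values of an existing multi-valued field. This sketch assumes the dataset index already stores column paths in a keyword field with doc values, named `fieldPaths` here as an assumption (it may be `fieldPaths.keyword` or something else in your mapping). Script queries run per document, so expect them to be much slower than a range filter on a stored count.

```python
import json

# Assumption: a keyword field with doc values, one value per column path.
FIELD_PATHS = "fieldPaths"


def more_columns_than_script_query(threshold: int) -> dict:
    """Build a search body that computes the column count at query time
    with a Painless script instead of reading a materialized field."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "script": {
                            "script": {
                                "lang": "painless",
                                # Keyword doc values are de-duplicated, so this
                                # counts distinct paths; that equals the column
                                # count only if paths are unique.
                                "source": f"doc['{FIELD_PATHS}'].size() > params.threshold",
                                "params": {"threshold": threshold},
                            }
                        }
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(more_columns_than_script_query(200), indent=2))
```

The trade-off: no reindexing is needed, but every matching search pays the scripting cost.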
a
I created `numberOfInputs` and `numberOfOutputs` for the `dataprocess` entity and its search result, so I know getting a count of an aspect field is doable. The `dataprocess` entity has the `dataprocessinfo` aspect, which has two fields: an inputs list and an outputs list. Similarly, for the number of columns in the Dataset entity, we have the `schemametadata` aspect, and under this aspect there is a `fields` field. Probably I can just count the size of this `fields` list.
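For the `dataprocess` case, the derivation reduces to taking list lengths. A minimal sketch, assuming the `dataprocessinfo` aspect is available as a dict with optional `inputs` and `outputs` lists of dataset URNs; `data_process_counts` is a hypothetical name.

```python
from typing import Any


def data_process_counts(dataprocess_info: dict[str, Any]) -> dict[str, int]:
    """Derive numberOfInputs/numberOfOutputs from the inputs and outputs
    lists of a dataprocessinfo-like aspect."""
    # Treat a missing or null list as empty so both counts are always present.
    inputs = dataprocess_info.get("inputs") or []
    outputs = dataprocess_info.get("outputs") or []
    return {"numberOfInputs": len(inputs), "numberOfOutputs": len(outputs)}


info = {
    "inputs": [
        "urn:li:dataset:(urn:li:dataPlatform:hive,raw_events,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:hive,users,PROD)",
    ],
    "outputs": ["urn:li:dataset:(urn:li:dataPlatform:hive,daily_summary,PROD)"],
}
print(data_process_counts(info))  # {'numberOfInputs': 2, 'numberOfOutputs': 1}
```

Counting the `fields` list of `schemametadata` is the same `len()` call on a different aspect.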