Hi team, I have successfully ingested hive dataset...
# getting-started
n
Hi team, I have successfully ingested Hive datasets from a Hive metastore in MySQL using the presto-on-hive recipe. I was able to ingest a few datasets for testing using `database_pattern` and `table_pattern`. I'm now trying to do the following and would appreciate any advice.

1. I'm trying to ingest most of the datasets from the `hive metastore`, and I was wondering whether there's a way to apply pattern filtering to other fields in the `hive metastore` as well:
• DBS.DB_LOCATION_URI (e.g. allow only the pattern "hdfs://cluster1/dsc/.*")
• DBS.OWNER_NAME (e.g. deny accounts matching ".*test")
• If this is impossible via recipes, is there any other way to do it?

2. Our organization manages additional metadata for Hive databases, tables, and columns in another MySQL DB (say `our_meta`), which is separate from the `hive metastore`. For example, a table named `customer.cust_mst` in Hive exists in the `hive metastore`, and the separate MySQL DB also holds information about this table. Given this, I'd like to ingest the metadata from `our_meta` into DataHub. What would be the best way to do it?
• Some of the items in `our_meta` (mainly technical metadata) look like they could be managed as custom properties, and these should be synced in batch mode.
• Some of the items in `our_meta` (mainly business metadata) could be managed as business glossary terms or tags, and both batch sync and API calls should be possible.
• I am looking into a custom ingestion source and a metadata ingestion transformer, but I am not sure how to approach this. I'd like to handle it without forking the source if possible.

Thanks in advance.
a
Hi, you can set regex-based allow/deny patterns, as described in the config details: https://datahubproject.io/docs/generated/ingestion/sources/hive/#config-details
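To illustrate how regex-based allow/deny filtering generally behaves, here is a minimal Python sketch. It does not use DataHub's own classes: `AllowDeny` is a hypothetical stand-in, and it assumes that deny patterns take precedence over allow patterns and that patterns are matched from the start of the name. The database and table names are made up for illustration, and the exact name format a given source matches against should be checked in the config details linked above.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowDeny:
    """Hypothetical stand-in for an allow/deny pattern block in a recipe."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Assumption: a deny match always wins over an allow match.
        if any(re.match(pattern, name) for pattern in self.deny):
            return False
        # Assumption: patterns are matched from the start of the name.
        return any(re.match(pattern, name) for pattern in self.allow)


# Illustrative patterns only; the names below are not from a real metastore.
database_pattern = AllowDeny(allow=[r"^customer$", r"^sales$"])
table_pattern = AllowDeny(allow=[r"^customer\.cust_.*"], deny=[r".*_tmp$"])

for candidate in ["customer.cust_mst", "customer.cust_tmp", "sales.orders"]:
    print(candidate, table_pattern.allowed(candidate))
```

With these patterns, a table is kept only if it matches an allow pattern and no deny pattern, so a deny list is a convenient way to carve exceptions out of a broad allow list.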
n
@astonishing-answer-96712 thanks for the reply! So I guess it is not possible to filter on db_location_uri or owner_name, since they are not included in the recipe's config. Maybe I should find a way to filter outside the recipe. Could you comment on my second use case too?
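One possible way to filter outside the recipe, sketched below in Python, is to read the DBS rows from the metastore first, apply the location and owner rules there, and turn the surviving database names into an explicit allow list for `database_pattern`. This is only a sketch under assumptions: the rows are passed in as plain dicts keyed by the DBS column names (NAME, DB_LOCATION_URI, OWNER_NAME), the sample rows are invented, and `select_databases` and `to_allow_patterns` are hypothetical helper names, not part of DataHub.

```python
import re
from typing import Dict, Iterable, List, Optional

# Rules taken from the examples in the question above.
LOCATION_ALLOW = re.compile(r"hdfs://cluster1/dsc/.*")
OWNER_DENY = re.compile(r".*test")


def select_databases(dbs_rows: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Return names of databases with an allowed location and a non-denied owner."""
    selected = []
    for row in dbs_rows:
        location = row.get("DB_LOCATION_URI") or ""
        owner = row.get("OWNER_NAME") or ""
        if not LOCATION_ALLOW.fullmatch(location):
            continue  # location is outside the allowed prefix
        if OWNER_DENY.fullmatch(owner):
            continue  # owner looks like a test account
        selected.append(row["NAME"])
    return selected


def to_allow_patterns(names: Iterable[str]) -> List[str]:
    """Build exact-match regexes suitable for an allow list."""
    return [f"^{re.escape(name)}$" for name in names]


# Invented sample rows standing in for the result of `SELECT ... FROM DBS`.
sample_rows = [
    {"NAME": "dsc_sales", "DB_LOCATION_URI": "hdfs://cluster1/dsc/sales.db", "OWNER_NAME": "etl"},
    {"NAME": "scratch", "DB_LOCATION_URI": "hdfs://cluster2/tmp/scratch.db", "OWNER_NAME": "etl"},
    {"NAME": "dsc_qa", "DB_LOCATION_URI": "hdfs://cluster1/dsc/qa.db", "OWNER_NAME": "qa_test"},
]

allowed_names = select_databases(sample_rows)
allow_patterns = to_allow_patterns(allowed_names)
print(allowed_names)
```

The generated patterns could then be written into the recipe's `database_pattern` allow list before each run, which keeps the filtering logic outside the source and avoids forking it.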