dazzling-insurance-83303
11/10/2022, 6:43 PM
COMMENT ingestion.
I am working on making Postgres table/column `COMMENT`s available in DataHub documentation. For some reason the `COMMENT`s are not getting picked up.
The CLI and DataHub versions we are using are both 0.9.1.
The table DDL is as below:
CREATE TABLE IF NOT EXISTS public.accounts
(
    id bigint NOT NULL DEFAULT nextval('accounts_id_seq'::regclass),
    account_uuid character varying COLLATE pg_catalog."default",
    status character varying COLLATE pg_catalog."default",
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT accounts_pkey PRIMARY KEY (id)
);
ALTER TABLE IF EXISTS public.accounts OWNER to mse_accounting_qa_user;
COMMENT ON TABLE public.accounts IS 'Representation of a user account.';
COMMENT ON COLUMN public.accounts.account_uuid IS 'Unique identifier for the account across all services';
COMMENT ON COLUMN public.accounts.status IS 'The current status of the account, default("created")';
-- Recipe file (redacted) is as below
# accounts
source:
  type: postgres
  config:
    # Coordinates
    host_port: xxxx:65432
    database: accounts_db
    # Credentials
    username: datahub_user
    password: ${DATAHUB_USER_DB_PWD}
    env: 'QA'
    # allow or deny tables for ingestion
    table_pattern:
      allow:
        - .*
      deny: []
    # allow or deny schemas for ingestion
    schema_pattern:
      allow:
        - .*
      deny:
        - "information_schema"
    # allow or deny views for ingestion - 'schema_name.view_name'
    view_pattern:
      allow:
        - .*
      deny: []
    # PostgreSQL DataHub profiler settings
    # See README.md for details
    profile_pattern:
      allow:
        - .*
      deny: []
    profiling:
      enabled: true # default false
      profile_table_level_only: False # default false
      include_field_sample_values: False # default is True.
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpGroup:d94f1f51-xxxx-4cbc-xxxx-3197b0d9862d" # Team accounts
        - "urn:li:corpGroup:ccbf944a-xxxx-4b39-xxxx-65d19ae967d6" # Data Dictionary
  - type: "simple_add_dataset_domain"
    config:
      domains:
        - "urn:li:domain:xxxxxxx-51bc-4f87-bc2f-b44dfb8b977d" # Domain
sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "xxxx:9999"
      producer_config:
        security.protocol: "ssl"
        ssl.ca.location: "/secrets/vault_ca_chain.pem"
        ssl.certificate.location: "/secrets/vault_cert.pem"
        ssl.key.location: "/secrets/vault_key.pem"
      schema_registry_url: "https://schema-registryxxx"
      schema_registry_config:
        ssl.ca.location: "/secrets/vault_ca_chain.pem"
        ssl.certificate.location: "/secrets/vault_cert.pem"
        ssl.key.location: "/secrets/vault_key.pem"
# for `- type: "simple_add_dataset_domain"` to work
datahub_api:
  server: "https://datahub-gms.xxxx:443"
Could someone please advise if anything is amiss?
TIA! 🙏
hundreds-photographer-13496
11/14/2022, 6:24 AM
dazzling-insurance-83303
11/14/2022, 2:05 PM
hundreds-photographer-13496
11/14/2022, 2:28 PM
dazzling-insurance-83303
11/14/2022, 3:07 PM
dazzling-insurance-83303
11/14/2022, 9:39 PM
`source` section to `transformers` section. I have reverted those changes and I am able to see the comments ingested.
I plan to investigate further and see if I should report a bug. Thanks!
hundreds-photographer-13496
11/15/2022, 5:00 AM
`simple_add_dataset_domain` transformer from recipe, comments start showing up?
dazzling-insurance-83303
11/15/2022, 9:48 PM
`simple_add_dataset_domain` as well as `datahub_api` specification and that worked. I didn’t get a chance to narrow it down… yet… let me see if I can shortly
hundreds-photographer-13496
11/16/2022, 2:14 AM
dazzling-insurance-83303
11/18/2022, 3:26 PM
`datahub_api` spec I wasn’t able to get `simple_add_dataset_domain` to work, it would throw the following error:
[2022-10-17 15:28:01,615] ERROR {***.entrypoints:165} - AddDatasetDomain requires a ***_api to connect to. Consider using the ***-rest sink or provide a ***_api: configuration on your ingestion recipe
hundreds-photographer-13496
11/18/2022, 3:57 PM
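The error text quoted in this thread names two ways to give `simple_add_dataset_domain` a DataHub connection: a top-level `datahub_api` block (as in the recipe at the top of the thread) or the REST sink. A minimal sketch of the second option follows. It reads the masked `***-rest` as `datahub-rest`, which is an assumption, and reuses the redacted GMS address from the original recipe as a placeholder. Whether this also avoids the missing-comments problem was not established in this thread.

```yaml
# Sketch only: swaps the datahub-kafka sink for the REST sink.
# The server value is the redacted placeholder from the recipe in this thread,
# not a real host.
sink:
  type: "datahub-rest"
  config:
    server: "https://datahub-gms.xxxx:443"
```

Going by the error message, a recipe using this sink should not need a separate `datahub_api` block for the domain transformer, which would make it possible to test the transformer and the `COMMENT` ingestion independently.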