Hello, need help with table/column `COMMENT` inges...
# troubleshoot
d
Hello, need help with table/column `COMMENT` ingestion. I am working on making Postgres table/column `COMMENT`s available in DataHub documentation. For some reason the `COMMENT`s are not getting picked up. The CLI and DataHub versions we are using are both 0.9.1. The table DDL is as below:
CREATE TABLE IF NOT EXISTS public.accounts
(
    id bigint NOT NULL DEFAULT nextval('accounts_id_seq'::regclass),
    account_uuid character varying COLLATE pg_catalog."default",
    status character varying COLLATE pg_catalog."default",
    created_at timestamp(6) without time zone NOT NULL,
    updated_at timestamp(6) without time zone NOT NULL,
    CONSTRAINT accounts_pkey PRIMARY KEY (id)
);

ALTER TABLE IF EXISTS public.accounts OWNER to mse_accounting_qa_user;

COMMENT ON TABLE public.accounts IS 'Representation of a user account.';
COMMENT ON COLUMN public.accounts.account_uuid IS 'Unique identifier for the account across all services';
COMMENT ON COLUMN public.accounts.status IS 'The current status of the account, default("created")';
-- Recipe file (redacted) is as below
# accounts

source:
  type: postgres
  config:
    # Coordinates
    host_port: xxxx:65432
    database: accounts_db

    # Credentials
    username: datahub_user
    password: ${DATAHUB_USER_DB_PWD}
    env: 'QA'

    # allow or deny tables for ingestion
    table_pattern:
      allow:
        - .*
      deny: []

    # allow or deny schemas for ingestion
    schema_pattern:
      allow:
        - .*
      deny:
        - "information_schema"

    # allow or deny views for ingestion - 'schema_name.view_name'
    view_pattern:
      allow:
        - .*
      deny: []

    # PostgreSQL DataHub profiler settings
    # See README.md for details
    profile_pattern:
      allow:
        - .*
      deny: []

    profiling:
      enabled: true # default false
      profile_table_level_only: False # default false
      include_field_sample_values: False # default is True. 

transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpGroup:d94f1f51-xxxx-4cbc-xxxx-3197b0d9862d" # Team accounts
        - "urn:li:corpGroup:ccbf944a-xxxx-4b39-xxxx-65d19ae967d6" # Data Dictionary

  - type: "simple_add_dataset_domain"
    config:
      domains:
        - "urn:li:domain:xxxxxxx-51bc-4f87-bc2f-b44dfb8b977d" # Domain

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "xxxx:9999"
      producer_config:
        security.protocol: "ssl"
        ssl.ca.location: "/secrets/vault_ca_chain.pem"
        ssl.certificate.location: "/secrets/vault_cert.pem"
        ssl.key.location: "/secrets/vault_key.pem"
      schema_registry_url: "https://schema-registryxxx"
      schema_registry_config:
        ssl.ca.location: "/secrets/vault_ca_chain.pem"
        ssl.certificate.location: "/secrets/vault_cert.pem"
        ssl.key.location: "/secrets/vault_key.pem"

# for `- type: "simple_add_dataset_domain"` to work
datahub_api:
  server: "https://datahub-gms.xxxx:443"
Could someone please advise if anything is amiss? TIA! 🙏
h
Hey @dazzling-insurance-83303, the recipe looks fine. Are you getting any runtime errors or warnings?
d
Thanks @hundreds-photographer-13496. The only errors I see are around psycopg2; however, I recall them being there all along (IIRC they fail on ingesting/profiling JSONB data).
h
Got it. Did the comments disappear from DataHub recently, or were they missing from the start? I would suggest using the file sink to write the generated JSON to a file and checking whether it contains the comments/descriptions.
d
I see… where would I find the file sink?
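For reference, the file sink is configured in the recipe itself rather than installed separately. A minimal sketch, assuming the `sink:` block of the recipe above is temporarily swapped for the one below (the output filename is a hypothetical placeholder):

```yaml
# Temporary replacement for the datahub-kafka sink, for debugging only.
# Ingestion output is written to a local JSON file instead of being sent to DataHub.
sink:
  type: file
  config:
    filename: ./accounts_ingestion_output.json  # hypothetical path, pick any writable location
```

After running the recipe, searching the output file for the comment text (for example `Representation of a user account.`) should show whether the source is emitting the descriptions at all.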
Hello @hundreds-photographer-13496. Based on my troubleshooting, it appears that the comments are not showing up due to the asset domain tagging changes I made, moving it from the `source` section to the `transformers` section. I have reverted those changes and I am able to see the comments ingested. I plan to investigate further and see if I should report a bug. Thanks!
h
So after you remove the `simple_add_dataset_domain` transformer from the recipe, the comments start showing up?
d
Actually I removed both the `simple_add_dataset_domain` transformer and the `datahub_api` specification, and that worked. I didn’t get a chance to narrow it down… yet… let me see if I can shortly
h
This issue with `simple_add_dataset_domain` was fixed in DataHub version 0.9.2.1. Is it possible for you to use an updated DataHub CLI version?
d
Ah I see… we are at 0.9.1 at the moment. Could you please share the release note link? And I wanted to circle back with you to say that without the `datahub_api` spec I wasn’t able to get `simple_add_dataset_domain` to work; it would throw the following error:
[2022-10-17 15:28:01,615] ERROR    {***.entrypoints:165} - AddDatasetDomain requires a ***_api to connect to. Consider using the ***-rest sink or provide a ***_api: configuration on your ingestion recipe
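The error names two ways to give the transformer a connection. The recipe above already uses the second (a top-level `datahub_api:` block alongside the Kafka sink). A rough sketch of the first option follows, assuming the masked `***-rest` in the log refers to the `datahub-rest` sink and reusing the redacted GMS address from the recipe as a placeholder:

```yaml
# Alternative to datahub-kafka plus datahub_api: send metadata straight to GMS over REST.
# The server value is the redacted placeholder from the recipe above, not a real endpoint.
sink:
  type: datahub-rest
  config:
    server: "https://datahub-gms.xxxx:443"
```

Whether this is preferable to keeping the Kafka sink depends on the deployment; it is shown only to illustrate what the error message is pointing at.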
h
https://github.com/acryldata/datahub/releases/tag/v0.9.2.1 Let me check on the need for `datahub_api` in `simple_add_dataset_domain`.