# ingestion
f
Hi Team, I'm having some issues ingesting Redshift lineage into DataHub. I've followed the documentation here, yet I'm unable to see any lineage in DataHub. Posting the recipe file and other details in the thread.
```yaml
source:
  type: redshift
  config:
    # Coordinates
    host_port: "***"
    database: "***"

    # Credentials
    username: "***"
    password: "***"

    # Options
    # options:

    # driver_option:
    include_table_lineage: True
    include_views: True # whether to include views, defaults to True
    include_tables: True # whether to include tables, defaults to True
    start_time: 2022-02-17T00:00:00.001Z
    end_time: 2023-02-18T00:00:00.001Z
    table_lineage_mode: sql_based
    schema_pattern:
      allow:
        - "demo"
        - "data_model"

transformers:
  - type: "set_dataset_browse_path"
    config:
      replace_existing: True
      path_templates:
        - /Redshift/DATASET_PARTS

sink:
  # type: "console"
  # # sink configs
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
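One quick way to sanity-check `schema_pattern` is to run candidate schema names through the same kind of regex test. Below is a minimal Python sketch. It assumes the allow/deny entries are matched like Python's `re.match`, meaning they are anchored at the start of the name but not at the end. The helper `schema_allowed` and the schema `demo_archive` are hypothetical. Check the exact matching rules, including case sensitivity, against the DataHub docs for your version.

```python
import re
from typing import Iterable, Sequence

# Patterns copied from the recipe's schema_pattern.allow list.
ALLOW: Sequence[str] = ("demo", "data_model")
DENY: Sequence[str] = ()


def schema_allowed(schema: str, allow: Iterable[str] = ALLOW, deny: Iterable[str] = DENY) -> bool:
    """Assumed semantics: any deny match drops the schema, else any allow match keeps it."""
    if any(re.match(pattern, schema) for pattern in deny):
        return False
    return any(re.match(pattern, schema) for pattern in allow)


if __name__ == "__main__":
    # 'demo_archive' is a hypothetical schema used to show prefix matching.
    for name in ("demo", "data_model", "demo_archive", "public", "sales"):
        print(name, schema_allowed(name))
```

Under that assumption, `"demo"` also admits any schema whose name merely starts with `demo`. Writing `"^demo$"` restricts it to the exact schema.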
I've been running this query to check whether any lineage appears:
```sql
insert into demo.lineage_check(select * from demo.lead_event_fct);
```
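As I understand it, `table_lineage_mode: sql_based` derives lineage by reading the SQL text of logged queries like the one above. For that to work, the query has to expose a recognisable target table and source tables. The sketch below is a hypothetical, deliberately naive regex extractor. It only illustrates the idea: the downstream table comes from `INSERT INTO`, and the upstream tables come from `FROM`/`JOIN`. It is not DataHub's code, and a real implementation would need a proper SQL parser.

```python
import re
from typing import List, Optional, Tuple

# A table reference: 'table', 'schema.table' or 'db.schema.table'.
_TABLE = r"[A-Za-z_][\w$]*(?:\.[A-Za-z_][\w$]*){0,2}"
_INSERT_RE = re.compile(r"\binsert\s+into\s+(" + _TABLE + r")", re.IGNORECASE)
_SOURCE_RE = re.compile(r"\b(?:from|join)\s+(" + _TABLE + r")", re.IGNORECASE)


def extract_lineage(sql: str) -> Optional[Tuple[str, List[str]]]:
    """Hypothetical helper: return (downstream, upstreams) for INSERT ... SELECT, else None."""
    target = _INSERT_RE.search(sql)
    if target is None:
        return None
    # Deduplicate upstream tables while keeping their order of appearance.
    upstreams = list(dict.fromkeys(m.group(1).lower() for m in _SOURCE_RE.finditer(sql)))
    return target.group(1).lower(), upstreams


if __name__ == "__main__":
    query = "insert into demo.lineage_check(select * from demo.lead_event_fct);"
    print(extract_lineage(query))
```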
b
What is the output of running this recipe?
f
It's throwing this warning:
```
WARNING: dev.demo.lead_event_fct missing table
```
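For lineage to render, the upstream and downstream names have to resolve to the same dataset URNs the source emitted. DataHub dataset URNs have the shape `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. The warning suggests Redshift tables are named `database.schema.table`, here `dev.demo.lead_event_fct`. One plausible reading of the warning is that an upstream table was found in the lineage data but not among the datasets ingested in this run. That reading is an assumption and is not confirmed in this thread. The sketch below builds URNs in the same shape as `datahub.emitter.mce_builder.make_dataset_urn`, using plain Python. `qualify` and `missing_upstreams` are hypothetical helpers, and `PROD` is assumed as the environment.

```python
from typing import Iterable, List


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's urn:li:dataset:(...) shape."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def qualify(database: str, table: str) -> str:
    """Prefix 'schema.table' with the database; leave 'db.schema.table' unchanged."""
    return table if table.count(".") == 2 else f"{database}.{table}"


def missing_upstreams(database: str, upstreams: Iterable[str], ingested: Iterable[str]) -> List[str]:
    """Hypothetical check: URNs of upstream tables that are not in the ingested set."""
    known = {qualify(database, t) for t in ingested}
    return [
        make_dataset_urn("redshift", qualify(database, u))
        for u in upstreams
        if qualify(database, u) not in known
    ]


if __name__ == "__main__":
    # 'dev' is the database shown in the warning; the ingested list is made up.
    print(missing_upstreams("dev", ["demo.lead_event_fct"], ["demo.lineage_check"]))
```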
But otherwise, it shows a success message:
```
'warnings': {},
 'failures': {},
 'tables_scanned': 41,
 'views_scanned': 0,
 'entities_profiled': 0,
 'filtered': ['cx.*', 'dw.*', 'growth.*', 'information_schema.*', 'ops.*', 'prediction.*', 'public.*', 'sales.*', 'temp.*'],
 'soft_deleted_stale_entities': [],
 'query_combiner': None}
Sink (datahub-rest) report:
{'records_written': 93,
 'warnings': [],
 'failures': [],
 'downstream_start_time': datetime.datetime(2022, 2, 22, 11, 58, 18, 982654),
 'downstream_end_time': datetime.datetime(2022, 2, 22, 11, 58, 32, 257416),
 'downstream_total_latency_in_seconds': 13.274762}

Pipeline finished successfully
```
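The `filtered` entry in the source report lists what `schema_pattern` dropped. In this report, each entry has the form `<schema>.*`. Assuming that form holds, a small Python sketch can check whether any table in the test query sits in a dropped schema. `dropped_schemas`, `schemas_dropped_by_query` and the table `sales.orders` are hypothetical.

```python
from typing import Iterable, List, Set

# 'filtered' values copied from the source report above.
FILTERED = ["cx.*", "dw.*", "growth.*", "information_schema.*", "ops.*",
            "prediction.*", "public.*", "sales.*", "temp.*"]


def dropped_schemas(filtered: Iterable[str]) -> Set[str]:
    """Collect schema names from report entries shaped like '<schema>.*'."""
    return {entry[: -len(".*")] for entry in filtered if entry.endswith(".*")}


def schemas_dropped_by_query(tables: Iterable[str], filtered: Iterable[str]) -> List[str]:
    """Return the dropped schemas that the given table names belong to."""
    dropped = dropped_schemas(filtered)
    # The schema is the second-to-last dotted part of 'schema.table' or 'db.schema.table'.
    return sorted({t.split(".")[-2] for t in tables if t.split(".")[-2] in dropped})


if __name__ == "__main__":
    print(schemas_dropped_by_query(["demo.lineage_check", "demo.lead_event_fct"], FILTERED))  # []
    print(schemas_dropped_by_query(["sales.orders"], FILTERED))  # ['sales']
```

With the report above, neither `demo.lineage_check` nor `demo.lead_event_fct` falls in a dropped schema. That is consistent with the recipe's `allow` list.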
s
I am working on improving the source and sink reports for Redshift so that we have more debugging information for problems like this. I will update you once the new CLI is ready for use.
f
Thanks!
s
Can you please share the CLI version and server version that you are using currently?
f
DataHub CLI version: 0.8.27. Server version: 0.8.26 (not sure about this; I found it in the UI).
s
Yes, the one from the UI should be fine. In the latest CLI version we have changed this so that you will see the server version in the sink report itself. But for CLI version `0.8.28` to work, you will have to upgrade the server too. May I suggest upgrading both the server and the CLI to the `0.8.28` release? That way, when the new changes are released, you will be able to easily update the CLI and check whether it helps in your case.
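The upgrade advice amounts to keeping the CLI from running ahead of the server. Below is a hypothetical Python helper that compares dotted version strings as integer tuples. The rule it encodes, flagging any CLI newer than the server, simplifies the advice in this thread. It is not DataHub's official compatibility policy.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.8.27' (or 'v0.8.27') into (0, 8, 27) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def cli_ahead_of_server(cli: str, server: str) -> bool:
    """Hypothetical rule of thumb: True when the CLI version is newer than the server's."""
    return parse_version(cli) > parse_version(server)


if __name__ == "__main__":
    print(cli_ahead_of_server("0.8.28", "0.8.26"))  # True
    print(cli_ahead_of_server("0.8.28", "0.8.28"))  # False
```

Comparing integer tuples rather than raw strings keeps `0.8.10` correctly above `0.8.9`.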
f
Cool, I will do this
I think I figured out part of the issue. I will try to raise a pull request for this