Hi... I'm trying to ingest metadata from Redshift ...
# ingestion
m
Hi... I'm trying to ingest metadata from Redshift into datahub. It appears that the ingestion works ok for regular tables, but those tables defined in external schemas (and accessed using Redshift Spectrum) are simply skipped - and not even mentioned in the logs ? Is this a known restriction ?
l
@mammoth-bear-12532 has started to look into Redshift spectrum since others have reported problems as well.
not intentional
m
Thanks @loud-island-88694 I think it has something to do with the following query not returning external schemas:
Copy code
2021-08-18 06:58:09,235 INFO sqlalchemy.engine.base.Engine
        SELECT
          c.relkind,
          n.oid as "schema_oid",
          n.nspname as "schema",
          c.oid as "rel_oid",
          c.relname,
          CASE c.reldiststyle
            WHEN 0 THEN 'EVEN' WHEN 1 THEN 'KEY' WHEN 8 THEN 'ALL' END
            AS "diststyle",
          c.relowner AS "owner_id",
          u.usename AS "owner_name",
          TRIM(TRAILING ';' FROM pg_catalog.pg_get_viewdef(c.oid, true))
            AS "view_definition",
          pg_catalog.array_to_string(c.relacl, '
') AS "privileges"
        FROM pg_catalog.pg_class c
             LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
             JOIN pg_catalog.pg_user u ON u.usesysid = c.relowner
        WHERE c.relkind IN ('r', 'v', 'm', 'S', 'f')
          AND n.nspname !~ '^pg_'
        ORDER BY c.relkind, n.oid, n.nspname;
Haven't quite figured it out yet but the LOJ appears to be filtering out the external schemas
@mammoth-bear-12532 Do you have an issue number for this so I can track it... cheers