Hmmm. I have a postgres ingestion where I am togg...
# troubleshoot
m
Hmmm. I have a postgres ingestion where I am toggling profiling to true and when I do ingestion fails with
permission denied for table someschema.blahblahblah
…but the user connecting to the instance can query said tables, validated via Postgres CLI. It is failing every table even though the user has select on all tables via role membership. Help! Ingestion script follows:
Copy code
source:
    type: postgres
    config:
        host_port: 'somepostgreshost:port'
        database: someuserdatabase
        username: avaliduser
        password: '${userpassword}'
        schema_pattern:
            allow:
                - public
        include_tables: true
        include_views: true
        profiling:
            enabled: true
sink:
    type: datahub-rest
    config:
        server: '<http://datahub-gms:8080>'
Error text snippet:
Copy code
['Profiling exception (psycopg2.errors.InsufficientPrivilege) permission denied for "
           "table '\n"
l
@chilly-holiday-80781 ^
c
Hey @mysterious-portugal-30527! This looks pretty weird. Does one of your tables have any name that looks like this? Might be some string parsing error.
m
That was a partial clip of the error I did not want to share my actual table names but the table names are valid identifiers, they are not reserved words.
l
@mysterious-portugal-30527 can you try with
Copy code
"schema_pattern": {
    "deny": ["information_schema"]
}
m
will give er a go!
c
I think the issue here isn’t with the schema_pattern but with the permission setup / potentially any manipulation of the table names if it’s actually a table not found error. The schema_pattern here only has the effect of excluding tables, whereas here we are worried about the ingestion trying to access a table it supposedly does not have access to
m
…and it crashed datahub
Copy code
datahub docker check
The following issues were detected:
- datahub-gms is not running
More detailed error, still failed after restart of datahub with requested change to ingestion script:
Copy code
'prodcat_mlflow_svc.public.tags': ['Profiling exception (psycopg2.errors.InsufficientPrivilege) permission denied for "
           "table tags\\n'\n"
           "                                                 '\\n'\n"
           "                                                 '[SQL: SELECT count(*) AS count_1 \\n'\n"
           "                                                 'FROM public.tags]\\n'\n"
           "                                                 '(Background on this error at: <http://sqlalche.me/e/13/f405)']>},\n"
           " 'tables_scanned': 10,\n"
           " 'views_scanned': 0,\n"
           " 'entities_profiled': 10,\n"
           " 'filtered': ['information_schema.*'],\n"
           " 'soft_deleted_stale_entities': [],\n"
           " 'query_combiner': {'total_queries': 50,\n"
           "                    'uncombined_queries_issued': 50,\n"
           "                    'combined_queries_issued': 10,\n"
           "                    'queries_combined': 10,\n"
           "                    'query_exceptions': 10}}\n"
           'Sink (datahub-rest) report:\n'
           "{'records_written': 27,\n"
           " 'warnings': [],\n"
           " 'failures': [],\n"
           " 'downstream_start_time': datetime.datetime(2022, 3, 2, 22, 21, 24, 213217),\n"
           " 'downstream_end_time': datetime.datetime(2022, 3, 2, 22, 21, 26, 371107),\n"
           " 'downstream_total_latency_in_seconds': 2.15789}\n"
           '\n'
           'Pipeline finished with failures\n',
           "2022-03-02 22:21:28.116367 [exec_id=4e3ce6f9-767b-40f2-ad08-93729d4634ab] INFO: Failed to execute 'datahub ingest'",
           '2022-03-02 22:21:28.122718 [exec_id=4e3ce6f9-767b-40f2-ad08-93729d4634ab] INFO: Caught exception EXECUTING '
           'task_id=4e3ce6f9-767b-40f2-ad08-93729d4634ab, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
           '    self.event_loop.run_until_complete(task_future)\n'
           '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
           '    return f.result()\n'
           '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
           '    raise self._exception\n'
           '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
           '    result = coro.send(None)\n'
           '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
c
So it looks like this is a profiling error—can you check if the table is successfully ingested when profiling is turned off?
Then we can be sure it’s localized just to the SQL profiler
m
Yes the ingestion runs successfully with profiling turned off.
c
Ok, so I think it’s just the profiler then. @gray-shoe-75895 do you have a hunch?
b
Some additional privileges that we don't know about? Or some way to specify a particular role?
👍 1
g
If you run a Postgres client like
pgcli
and run the query
Copy code
SELECT count(*) AS count_1 FROM public.tags
does that work? It seems to be complaining about a permissions issue which is pretty odd, so just want to make sure it’s an issue on our end
m
Copy code
SELECT count(*) AS count_1 FROM public.tags;
 count_1
---------
     678
(1 row)
g
Yep so the privileges are all good. Can you try running with
profiling -> query_combiner_enabled
set to false?
m
I am re-validating my connection I think my user had select on information schema but not public. resetting everything for a clean test
sigh. and now when I go to the ingestion page I get:
Failed to load ingestion sources! An unknown error occurred.
l
@big-carpet-38439 ^ is there a way to debug this more on managed ingestion
m
OK. I am sorry to have been wasting everyone’s time. I did in fact have a permissions issue on my end. After getting the grants squared away, sample data loads as expected. The thought that the datahub user could see the information schema I think was correct. I could load the data structure, but not the data until after I fixed the grants for the user. More specifically, my. datahub user was a member of a read-only role, but the read-only role had no grants! Oops! Again, sorry for the fire-drill, thanks for the help!
😅 1
b
@mysterious-portugal-30527 Most importantly, you were able to get the data in! Hopefully ingestion sources are loading 🙂
And please do not apologize, this is what we're here for!