# ingestion
s
👋 Hello 🙂 Is it possible to ingest BigQuery metadata, with the bigquery plugin, for datasets in a project in which I can't submit jobs? i.e. I have datasets in project A, which I'd like to ingest, and I can only submit jobs in project B. I thought that by setting credential.project_id to project B I would be good to go, but that doesn't seem to be the case. I'm on v0.8.38:
```yaml
source:
    type: bigquery
    config:
        project_id: A
        use_exported_bigquery_audit_metadata: false
        profiling:
            enabled: false
        credential:
            project_id: B
            private_key_id: '${GCP_PRIVATE_KEY_ID}'
            private_key: '${GCP_PRIVATE_KEY}'
            client_id: '${GCP_CLIENT_ID}'
            client_email: '${GCP_CLIENT_EMAIL}'
        domain:
            foo:
                allow:
                    - 'A\..*'
sink:
    type: datahub-rest
    config:
        server: 'https://xxx/api/gms'
        token: '${GMS_TOKEN}'
```
Error:
```
Forbidden: 403 POST https://bigquery.googleapis.com/bigquery/v2/projects/A/jobs?prettyPrint=false: Access Denied: Project A: User does not have bigquery.jobs.create permission in project A.
```
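For reference, the same 403 should be reproducible outside DataHub with the BigQuery Python client (a minimal sketch; the key-file path is a placeholder):
```python
from google.cloud import bigquery
from google.oauth2 import service_account

# The credentials belong to the service account from project B, but the
# client submits jobs in whatever project it is constructed with, so
# pointing it at A fails with 403 bigquery.jobs.create.
creds = service_account.Credentials.from_service_account_file(
    "/path/to/service-account.json"  # placeholder
)
client = bigquery.Client(project="A", credentials=creds)
client.query("SELECT 1").result()  # raises Forbidden (403)
```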
g
This looks to me like the service account doesn't have the necessary permissions to execute jobs on project A in BigQuery.
s
That's correct
g
Did you try creating another service account or granting it the necessary permissions?
s
It's a common pattern in BQ to grant somebody access to a given dataset without allowing jobs to be submitted in the project where the dataset lives. The user accesses that dataset by submitting a job in a project under their own control. This has mostly to do with the way billing works in GCP.
As an example: https://cloud.google.com/bigquery/public-data You can read the data of these datasets, but you won't be able to submit any job in the project where they live (bigquery-public-data)
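Concretely, the pattern looks something like this with the BigQuery Python client (just a sketch; the dataset and table names are made up):
```python
from google.cloud import bigquery

# The job runs (and is billed) in project B, where we hold
# bigquery.jobs.create; the query reads a table living in project A,
# where read access on the dataset is enough.
client = bigquery.Client(project="B")

# `shared_dataset.some_table` is a made-up name for illustration.
query = "SELECT COUNT(*) AS n FROM `A.shared_dataset.some_table`"
for row in client.query(query).result():
    print(row.n)
```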
In my case the datasets I'd like to ingest were shared with me by a third party
g
Hmm, I see. So, it's necessary to understand how the ingestion is creating the jobs on BigQuery. One thing I'm sure of is that they are using the BigQuery Python client.
Did you try to execute a query using this service account on the dataset in project A to see if it has the necessary permissions?
s
AFAIK DataHub is using SQLAlchemy to interact with BQ
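Something along these lines, if I understand the dialect correctly (a sketch assuming the sqlalchemy-bigquery/pybigquery dialect; the key-file path is a placeholder):
```python
from sqlalchemy import create_engine, text

# The project in the connection URL is the one jobs get submitted in,
# so an engine pointed at A would hit the same 403 even though the
# credentials themselves come from project B.
engine = create_engine(
    "bigquery://A",
    credentials_path="/path/to/service-account.json",  # placeholder
)
with engine.connect() as conn:
    conn.execute(text("SELECT 1"))
```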
Did you try to execute a query using this service account on the dataset in project A to see if it has the necessary permissions?
This service account can only submit queries in project B and it can reference datasets in project A in such queries
g
hmm, so I'm completely wrong haha. Just trying to help
s
No problem, I appreciate it 🙂
g
This service account can only submit queries in project B and it can reference datasets in project A in such queries
I see. But nothing comes to mind right now that could be the problem.
g
So, it would be better to use the service account's project id to create the jobs but still reference the project_id at the root of the config, right? As you mentioned, the service account has permissions on project B but not on A.
s
That's a possible solution, and that was my understanding of what credential.project_id is there for, but it seems to be passed only for authentication purposes. Another solution would be an extra field under config to specify a "working_project_id" to be used to submit queries while ingesting datasets in "project_id"
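Something like this (just a sketch of the idea; working_project_id is the hypothetical new field and doesn't exist today):
```yaml
source:
    type: bigquery
    config:
        project_id: A               # project whose datasets get ingested
        working_project_id: B       # hypothetical new field: project used to submit jobs
        credential:
            project_id: B
            # key id / key / client id / client email as in the config above
```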
g
Cool! This seems like a genuine issue to me.
s
I'll create a feature request 🙂
g
I already added my thumbs up!
s
Thank you! 😉
i
facing the same issue. this was working earlier in 0.8.33, but with 0.8.38 it's not
is there any quick fix for this?
s
interesting 🙂 did you try running a diff on this part of the code between .33 and .38? In my case it's not something I urgently need, so I've parked it for now