cuddly-butcher-39945
07/28/2022, 7:35 PM
Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.
I've never seen anything about a pipeline_name being needed.
Please let me know if there is something you can think of to help me get my first ingestion to complete 🙂
More context around this error:
[2022-07-28 19:30:04,041] INFO {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.41
[2022-07-28 19:30:04,091] INFO {datahub.ingestion.run.pipeline:160} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2022-07-28 19:30:06,244] INFO {datahub.ingestion.source_config.sql.snowflake:231} - using authenticator type 'DEFAULT_AUTHENTICATOR'
[2022-07-28 19:30:06,244] ERROR {datahub.ingestion.run.pipeline:126} - pipeline_name must be provided if stateful ingestion is enabled.
[2022-07-28 19:30:06,244] INFO {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-07-28 19:30:06,244] INFO {datahub.cli.ingest_cli:133} - Finished metadata pipeline

Failed to configure source (snowflake) due to pipeline_name must be provided if stateful ingestion is enabled.
2022-07-28 19:30:08.102905 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Failed to execute 'datahub ingest'
2022-07-28 19:30:08.103382 [exec_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4] INFO: Caught exception EXECUTING task_id=70cf693c-d13f-42a5-b0fd-04ca739e33b4, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 114, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
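For anyone hitting the same error: pipeline_name is a top-level recipe key, alongside source and sink. A minimal sketch, with placeholder account and credential values (exact snowflake config fields may differ by CLI version):

pipeline_name: snowflake_prod_pipeline  # any stable id; ingestion state is keyed on this
source:
  type: snowflake
  config:
    host_port: my_account        # placeholder
    username: my_user            # placeholder
    password: ${SNOWFLAKE_PASS}  # placeholder secret reference
    stateful_ingestion:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: http://datahub-gms:8080

Since stateful ingestion keys its state on this name, it should stay stable between runs.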
faint-translator-23365
07/28/2022, 8:51 PM
microscopic-mechanic-13766
07/29/2022, 10:23 AM
CentralLogoutController.java
(datahub/datahub-frontend/app/controllers) is not used in any part of the code.
I have also found that the default logout URL used is "/". Shouldn't it be something like this for Keycloak?
https://<keycloak_host>/auth/realms/<realm>/protocol/openid-connect/logout?redirect_uri=https://<datahub_host>/logout
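For reference, a minimal sketch of the documented datahub-frontend OIDC settings for Keycloak; host, realm, and secret values are placeholders. The end_session_endpoint the question refers to is advertised by Keycloak in the discovery document below:

AUTH_OIDC_ENABLED=true
AUTH_OIDC_CLIENT_ID=datahub
AUTH_OIDC_CLIENT_SECRET=<client_secret>
AUTH_OIDC_DISCOVERY_URI=https://<keycloak_host>/auth/realms/<realm>/.well-known/openid-configuration
AUTH_OIDC_BASE_URL=https://<datahub_host>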
victorious-pager-14424
07/29/2022, 1:10 PM
api/v2/graphql
route.
Any tips on how I can debug this issue? More info in 🧵
wide-printer-24185
07/29/2022, 1:52 PM
red-vr-34382
07/29/2022, 3:42 PM
delightful-barista-90363
07/29/2022, 6:27 PM
s3://bucket_name/{table}/20220729/*.csv
but it doesn't seem to be working. Spark gets initialized but that's about it. Thanks in advance. Actually, it doesn't look like profiling is run at all. Spark gets initialized but isn't used 🤔
More specifically, I'm getting this error:
Unable to infer schema for CSV. It must be specified manually.
Looks like in the debug logs it's only going to the {table} level
when trying to open up Spark:
DEBUG:datahub.ingestion.source.s3.source:Opening file s3://bucket/jordan-test/dataset_a for profiling in spark
when the file lives 2 folders down
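Assuming the date folder (20220729) is a partition level rather than part of the table path, a path_spec sketch that lets profiling find the files one level below the table (the option key and the partition token follow the s3 source docs, but exact names may vary by version):

source:
  type: s3
  config:
    path_specs:
      - include: "s3://bucket_name/{table}/{partition[0]}/*.csv"
    profiling:
      enabled: true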
chilly-elephant-51826
07/31/2022, 5:48 PM
square-solstice-69079
08/01/2022, 6:51 AM
helpful-painting-48754
08/01/2022, 8:29 AM
cold-autumn-7250
07/31/2022, 5:46 PM
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from datahub_provider.entities import Dataset

s3_sensor = S3KeySensor(
    task_id="s3_file_check",
    aws_conn_id="aws_prod",
    bucket_key=bronze_path,
    bucket_name=bronze_bucket,
    poke_interval=60,
    mode="reschedule",
    timeout=60 * 60 * 8,
    dag=dag,
    inlets=[Dataset("s3", "test/{{ ds }}")],
)
Reason is that I would like to connect the actual file on S3 with the Airflow run.
Thanks a lot for any suggestion 🙂
PS: I am using the following versions:
apache-airflow-providers-amazon==4.1.0
acryl-datahub-airflow-plugin==0.8.41.2
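A small note on what the inlet resolves to, sketched under the assumption that the datahub_provider Dataset helper maps (platform, name) to a dataset URN; the date below stands in for a rendered {{ ds }}:

from datahub_provider.entities import Dataset

# Dataset("s3", "test/2022-07-31") resolves to the dataset URN
# urn:li:dataset:(urn:li:dataPlatform:s3,test/2022-07-31,PROD),
# so the "{{ ds }}" template must already be rendered at the point
# where lineage is emitted for the run.
inlet = Dataset("s3", "test/2022-07-31")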
handsome-football-66174
07/29/2022, 8:54 PM
{
  searchAcrossEntities(
    input: {start: 0, count: 20, query: "*", types: [CONTAINER], filters: [{field: "subTypes", value: "Database"}]}
  ) {
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
big-zoo-81740
08/01/2022, 9:00 PM
base_folder
config option, /home/ubuntu/github/myreponame
, but I keep getting an error saying it can't find the directory or that it doesn't exist. The folder has r/w permissions, so DataHub should be able to read from it. Is there something I am obviously doing wrong? Does the repo need to be located in a specific folder in order for the base_folder
config to be able to read from it?
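In case a config sketch helps: base_folder is, to my knowledge, an option of the lookml source; a minimal sketch assuming that source and a locally checked-out repo (the path is the one from the question):

source:
  type: lookml
  config:
    base_folder: /home/ubuntu/github/myreponame
    # if this recipe runs via UI ingestion, note the executor runs in its
    # own container, so a path on the host machine may not be visible to it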
little-breakfast-38102
08/02/2022, 4:05 AM
2022/08/01 22:22:56 Waiting for: http//health
2022/08/01 22:22:56 Problem with request: Get "http//health": http: no Host in request URL. Sleeping 1s
Values.yaml
tag: "v0.8.40"
datahub-gms:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-gms
datahub-frontend:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-frontend-react
acryl-datahub-actions:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-actions
datahub-mae-consumer:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-mae-consumer
datahub-mce-consumer:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-mce-consumer
datahub-ingestion-cron:
  image:
    repository: ${ecr_image_repo}/acryldata/datahub-ingestion:v0.8.40
    # customized by adding additional drivers
elasticsearchSetupJob:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-elasticsearch-setup
kafkaSetupJob:
  image:
    repository: ${cg_image_repo}/linkedin/datahub-kafka-setup
mysqlSetupJob:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-mysql-setup
postgresqlSetupJob:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-postgres-setup
datahubUpgrade:
  image:
    repository: ${cg_image_repo}/acryldata/datahub-upgrade
Appreciate any help.
astonishing-guitar-79208
08/01/2022, 1:31 PM
icy-portugal-26250
07/25/2022, 12:08 PM
<datahub-url>/policies
only to get an
Unauthorized to perform this action. Please contact your DataHub administrator. (code 403)
I wanted to log in as the datahub user, but the logout just redirects me to the homepage, and in the logs of the datahub-frontend
pod is the following error:
13:45:34 [application-akka.actor.default-dispatcher-47862] ERROR auth.sso.oidc.OidcCallbackLogic - Unable to renew the session. The session store may not support this feature
I also tried adding myself to the user.props
but it has no effect.
Are there other ways to add policies? How would I go about debugging this?
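In case the file format is the issue, user.props entries are plain username:password pairs, one per line; a sketch with placeholder credentials (the frontend has to be restarted to pick up changes):

# user.props (datahub-frontend)
datahub:datahub
myuser:mypassword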
silly-room-64336
07/28/2022, 9:12 AM
sql:
  datasource:
    host: "xxx.privatelink.azure.com:5432"
    hostForpostgresqlClient: "xxx.privatelink.azure.com"
    port: "5432"
    url: "jdbc:postgresql://xxx.privatelink.azure.com:5432/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8"
    driver: "org.postgresql.Driver"
    username: "pgadmin"
    password:
      secretRef: mysql-secrets
      secretKey: password
The error that I am getting:
2022/07/28 10:48:26 Waiting for: tcp://xxx.privatelink.postgres.database.azure.com:5432
2022/07/28 10:48:26 Connected to tcp://xxx.privatelink.postgres.database.azure.com:5432
psql: error: connection to server at "xxx.privatelink.postgres.database.azure.com", port 5432 failed: FATAL: password authentication failed for user "datahub"
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
connection to server at "xxx.privatelink.postgres.database.azure.com", port 5432 failed: FATAL: SSL connection is required. Please specify SSL options and retry.
I verified the pgadmin password is correct and I am able to log in with the same password in the pgAdmin 4 app.
Please help here.
nutritious-finland-99092
07/26/2022, 2:19 PM
--restore-indices
command on ECS.
Doc: https://datahubproject.io/docs/how/restore-indices/
Can someone help me please?
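Not ECS-specific, but for reference: restore-indices is implemented by the datahub-upgrade image, so one option on ECS is a one-off task that runs that image with the same database/Kafka/Elasticsearch environment as GMS. A container-definition sketch with placeholder values ("-u RestoreIndices" is the documented argument; the tag is an assumption):

image: acryldata/datahub-upgrade:v0.8.41
command: ["-u", "RestoreIndices"]
# plus the same EBEAN_*/KAFKA_*/ELASTICSEARCH_* environment as datahub-gms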
adamant-van-21355
07/13/2022, 7:51 AM
stateful ingestion
feature on DBT. Once we enable the stateful configuration we get the following stack trace (in thread) with an assertion error, while metadata is ingested successfully. This happens either on top of an old DBT ingestion config or on a new one after enabling stateful ingestion with "remove_stale_metadata": True
. I would appreciate any clues on how we can make this work properly so any stale metadata is removed on future ingestion runs 🙏
...
"pipeline_name": "my-dbt-pipeline",
...
"stateful_ingestion": {
    "enabled": True,
    "remove_stale_metadata": True
},
...
jolly-traffic-67085
08/02/2022, 8:59 AM
delightful-zebra-4875
08/02/2022, 11:22 AM
The table structure of flink-hivecatalog is in the properties after adding the data source
delightful-zebra-4875
08/02/2022, 11:23 AM
What should I do at this time?
miniature-journalist-76345
07/19/2022, 8:09 AM
ancient-apartment-23316
07/28/2022, 5:55 PM
kubectl port-forward pod/datahub-datahub-frontend-67986b756-tsrv4 9002:9002
but there is no access from the internet. I have the k8s service external IP (ac5408daf1d594c16b937f80a18e0218-1264659697.us-east-1.elb.amazonaws.com) but this link doesn't work (port 9002). I also have a k8s ingress and it doesn't work either. Can you help me please? I checked everything but can't find the root cause.
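For comparison, the chart can expose the frontend directly through its service settings instead of a port-forward; a sketch assuming the standard datahub helm chart (whether a LoadBalancer or an ingress is right depends on the cluster):

datahub-frontend:
  service:
    type: LoadBalancer
    port: 9002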
numerous-account-62719
08/01/2022, 6:55 PM
numerous-account-62719
08/02/2022, 6:43 AM
purple-soccer-81736
08/02/2022, 4:00 PM
METADATA_SERVICE_AUTH_ENABLED=true
after the installation of DataHub? Or is the only way to reinstall?
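If this is the helm deployment, it should be possible to turn this on after installation with a values change and a helm upgrade rather than a reinstall; a sketch using the documented values key:

global:
  datahub:
    metadata_service_authentication:
      enabled: true

GMS and the frontend pick the setting up once their pods restart with the new environment.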
08/02/2022, 4:48 PMfaint-translator-23365
08/02/2022, 7:12 PM
delightful-jelly-56633
08/02/2022, 7:13 PM
docker compose version
Docker Compose version v2.6.0
python --version
Python 3.9.9
docker --version
Docker version 20.10.17, build 100c701
docker version
Client: Docker Engine - Community
  Version: 20.10.17
  ...
Server: Docker Engine - Community
  Engine:
    Version: 20.10.17
lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
datahub --version
acryl-datahub, version 0.8.41.2