wonderful-egg-79350
05/09/2022, 7:33 AM
fresh-napkin-5247
05/09/2022, 12:08 PM
source:
  type: athena
  config:
    # Coordinates
    aws_region: "region"
    s3_staging_dir: "s3_staging_dir"
    work_group: "work_group"
    profiling:
      enabled: true
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
However, the Stats tab on the Athena tables is still not filled in. Isn't that the purpose of the profiling flag? What am I missing?
Additionally, why is the Athena connector so much slower than the Glue connector? The Athena connector takes around 1h for fewer tables than the Glue connector.
Finally, after enabling profiling for Redshift, I started getting this error:
... cursor.execute(statement, parameters)
psycopg2.errors.InsufficientPrivilege: permission denied for schema 'schema'
However, I have granted all the permissions from the Redshift documentation page. What other permissions do I need to run the profiling?
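One thing worth checking on the Redshift error: profiling runs SELECT queries against the tables themselves, so beyond the catalog permissions listed in the docs the ingestion user also needs USAGE on the schema and SELECT on its tables. A sketch of the grants (host, schema and user names are placeholders; psycopg2 is already pulled in by the connector, as the traceback shows):
```python
# Hypothetical grants for the DataHub ingestion user -- adjust names, run as an admin.
import psycopg2

conn = psycopg2.connect(
    host="my-redshift-host", port=5439, dbname="mydb",
    user="admin_user", password="...",
)
conn.autocommit = True
with conn.cursor() as cur:
    # Metadata-only ingestion can get by with catalog access, but the profiler
    # SELECTs from the tables, so it needs schema USAGE plus table SELECT.
    cur.execute('GRANT USAGE ON SCHEMA "schema" TO datahub_user')
    cur.execute('GRANT SELECT ON ALL TABLES IN SCHEMA "schema" TO datahub_user')
conn.close()
```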
Thank you 🙂
adamant-furniture-37835
05/09/2022, 2:04 PM
agreeable-army-26750
05/09/2022, 2:40 PM
billowy-flag-4217
05/09/2022, 3:17 PM
orange-coat-2879
05/10/2022, 12:32 AM
source:
  type: mysql
  config:
    # Coordinates
    host_port: localhost:port
    database: database
    # Credentials
    username: username
    password: password
    profiling:
      enabled: true
sink:
  # sink configs
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
orange-coat-2879
05/10/2022, 12:53 AM
pip install acryl-datahub[airflow]: nbconvert requires jinja2>=3.0 meanwhile flask requires jinja2<3.0? Thanks!
cuddly-arm-8412
05/10/2022, 1:04 AM
brave-insurance-80044
05/10/2022, 2:41 AM
swift-breakfast-25077
05/10/2022, 10:02 AM
alert-football-80212
05/10/2022, 10:39 AM
agreeable-army-26750
05/10/2022, 11:42 AM
source:
  type: postgres
  config:
    # Coordinates
    host_port: localhost:5432
    database: postgres
    # Credentials
    username: admin
    password: admin
    # Options
    database_alias: DatabaseNameToBeIngested
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
gifted-bird-57147
05/10/2022, 5:13 PM
File "/python3.9/site-packages/datahub/ingestion/transformer/base_transformer.py", line 252, in transform
    for urn, state in self.entity_map.items():
AttributeError: 'AddCustomOwnership' object has no attribute 'entity_map'
Is the example still current, or has something changed recently?
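That AttributeError usually points at the base class never being initialized: entity_map lives on BaseTransformer, so if AddCustomOwnership defines its own __init__ without calling super().__init__(), the attribute is missing by the time transform() runs. A minimal sketch of the constructor only, assuming the class follows the docs example (keep your existing aspect/transform methods as they are):
```python
from datahub.ingestion.api.common import PipelineContext
from datahub.ingestion.transformer.base_transformer import BaseTransformer


class AddCustomOwnership(BaseTransformer):
    # Constructor sketch only; the config type and the rest of the class are
    # whatever you already have -- the point is the super().__init__() call.
    def __init__(self, config, ctx: PipelineContext):
        super().__init__()  # initializes base state such as entity_map
        self.config = config
        self.ctx = ctx
```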
orange-coat-2879
05/10/2022, 10:30 PM
modern-artist-55754
05/11/2022, 3:52 AM
astonishing-dusk-99990
05/11/2022, 8:13 AM
square-solstice-69079
05/11/2022, 8:21 AM
straight-telephone-84434
05/11/2022, 10:21 AM
fresh-napkin-5247
05/11/2022, 12:28 PM
source:
  type: tableau
  config:
    # Coordinates
    connect_uri: https://region.online.tableau.com
    site: site
    workbooks_page_size: 1
    token_name: token
    token_value: token
    projects: ['project1', 'project2', …]
    # Options
    ingest_tags: True
    ingest_owner: True
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
There are no errors or warning logs:
Sink (datahub-rest) report:
{'records_written': 4778,
'warnings': [],
'failures': [],
'downstream_start_time': datetime.datetime(2022, 5, 11, 15, 0, 49, 396728),
'downstream_end_time': datetime.datetime(2022, 5, 11, 15, 2, 0, 111509),
'downstream_total_latency_in_seconds': 70.714781,
'gms_version': 'v0.8.34'}
What could be the problem?
DataHub version: acryl-datahub, version 0.8.34.1
elegant-article-21703
05/11/2022, 1:35 PM
My ingestion is now failing with a failed to parse error. The change I introduced since my previous ingestion is that I changed one customProperties entry (customProp2) from a string value to a list. Here is a sample:
[
  {
    "auditHeader": null,
    "proposedSnapshot": {
      "com.linkedin.pegasus2avro.metadata.snapshot.DashboardSnapshot": {
        "urn": "urn:li:dashboard:(powerbi,analytics_update)",
        "aspects": [
          {
            "com.linkedin.pegasus2avro.common.Ownership": {
              "owners": [
                {
                  "owner": "urn:li:corpGroup:some_owner",
                  "type": "DATAOWNER",
                  "source": null
                }
              ],
              "lastModified": {
                "time": 0,
                "actor": "urn:li:corpuser:dev",
                "impersonator": null
              }
            }
          },
          {
            "com.linkedin.pegasus2avro.dashboard.DashboardInfo": {
              "title": "Analytics_Update",
              "description": "Explanatory text about what this power BI is and what information the user can get",
              "dashboardUrl": "google.com",
              "customProperties": {
                "customProp1": "MainDomain",
                "customProp2": ["Charizard", "Pikachu"]
              },
              "lastModified": {
                "created": {
                  "time": 1650279002,
                  "actor": "urn:li:corpuser:devn",
                  "impersonator": null
                },
                "deleted": null
              },
              "access": null,
              "lastRefreshed": null
            }
          }
        ]
      }
    }
  }
]
Isn't a list value supported in customProperties? Is there any other workaround?
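As far as I know, customProperties is modeled as a string-to-string map, so a JSON array as a value will not validate; the usual workaround is to serialize the list into a single string and parse it back on the consumer side. A small sketch (the serialization format is just a choice):
```python
import json

# customProperties values must be plain strings, so encode the list as one
# string (JSON here; a delimiter-separated value works just as well).
custom_properties = {
    "customProp1": "MainDomain",
    "customProp2": json.dumps(["Charizard", "Pikachu"]),
}

# Whoever reads the property turns it back into a list.
values = json.loads(custom_properties["customProp2"])
print(values)  # ['Charizard', 'Pikachu']
```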
Thank you all in advance!
agreeable-army-26750
05/11/2022, 2:46 PM
(cd docker && COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose -p datahub -f docker-compose-without-neo4j.yml -f docker-compose-without-neo4j.override.yml -f docker-compose.dev.yml up -d --no-deps --force-recreate datahub-actions)
But when I try the datahub CLI, my changes are not picked up. I am sure I am missing a build step, so the Docker image won't change…
Can you tell me what I have to rebuild and run to test my changes? Thanks in advance!
rich-policeman-92383
05/11/2022, 5:08 PM
source:
  type: oracle
  config:
    host_port: mydb:1521
    env: "PROD"
    username: myuser
    password: mypass
    service_name: myservice # omit database if using this option
    schema_pattern:
      allow:
        - "schema.tablename"
    table_pattern:
      allow:
        - "schema.tablename"
    profiling:
      enabled: True
    profile_pattern:
      allow:
        - "schema.tablename"
sink:
  type: "datahub-rest"
  config:
    server: 'https://mydatahubinstance.com:8080'
microscopic-controller-88617
05/11/2022, 8:31 PM
I have provided url, source_ref and source_url for the Glossary, but I can't find that information in DataHub after ingesting it. Is it just not implemented yet, or am I missing a step? Thanks in advance!
nice-country-99675
05/11/2022, 8:37 PM
I am having an issue with the Superset ingestion, using DataHub 0.8.34. In the method `emit_dashboard_mces` there is this piece of code:
dashboard_response = self.session.get(
    f"{self.config.connect_uri}/api/v1/dashboard",
    params=f"q=(page:{current_dashboard_page},page_size:{PAGE_SIZE})",
)
payload = dashboard_response.json()
The request is failing with Missing Authorization Header even though the session object already has a token:
{'User-Agent': 'python-requests/2.26.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Bearer .....', 'Content-Type': 'application/json'}
Has anybody faced a similar issue with Superset? By the way, all these requests worked fine with Postman...
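One way to narrow the Superset issue down is to replay the same call outside DataHub with plain requests and a freshly issued token; if that works, the problem is on the connector side, and if it fails too, check resp.history for redirects, since auth headers can get dropped when a redirect changes host or scheme. A sketch with placeholder host and credentials (the login payload follows Superset's /api/v1/security/login API):
```python
import requests

connect_uri = "https://superset.example.com"  # placeholder
session = requests.Session()

# Superset's login endpoint returns a JWT access token.
login = session.post(
    f"{connect_uri}/api/v1/security/login",
    json={"username": "admin", "password": "admin", "provider": "db", "refresh": True},
)
session.headers.update(
    {
        "Authorization": f"Bearer {login.json()['access_token']}",
        "Content-Type": "application/json",
    }
)

resp = session.get(
    f"{connect_uri}/api/v1/dashboard",
    params="q=(page:0,page_size:25)",
)
# Any entries in resp.history mean the request was redirected along the way.
print(resp.status_code, [r.url for r in resp.history])
```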
orange-coat-2879
05/12/2022, 1:00 AM
I tried to install acryl-datahub-actions but got the error below. Appreciate any help.
Building wheels for collected packages: confluent-kafka
Building wheel for confluent-kafka (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [50 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/confluent_kafka
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.10 -c /tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/Admin.c -o build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/Admin.o
In file included from /tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/Admin.c:17:
/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/confluent_kafka.h:23:10: fatal error: librdkafka/rdkafka.h: No such file or directory
23 | #include <librdkafka/rdkafka.h>
| ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for confluent-kafka
Running setup.py clean for confluent-kafka
Failed to build confluent-kafka
Installing collected packages: confluent-kafka, fastavro, acryl-datahub-actions
Running setup.py install for confluent-kafka ... error
error: subprocess-exited-with-error
× Running setup.py install for confluent-kafka did not run successfully.
│ exit code: 1
╰─> [52 lines of output]
running install
/home/ubuntu/.local/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/confluent_kafka
copying src/confluent_kafka/error.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka
copying src/confluent_kafka/serializing_producer.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka
copying src/confluent_kafka/deserializing_consumer.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka
copying src/confluent_kafka/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka
creating build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
copying src/confluent_kafka/schema_registry/json_schema.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
copying src/confluent_kafka/schema_registry/error.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
copying src/confluent_kafka/schema_registry/avro.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
copying src/confluent_kafka/schema_registry/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
copying src/confluent_kafka/schema_registry/protobuf.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
copying src/confluent_kafka/schema_registry/schema_registry_client.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/schema_registry
creating build/lib.linux-x86_64-cpython-310/confluent_kafka/kafkatest
copying src/confluent_kafka/kafkatest/verifiable_client.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/kafkatest
copying src/confluent_kafka/kafkatest/verifiable_consumer.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/kafkatest
copying src/confluent_kafka/kafkatest/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/kafkatest
copying src/confluent_kafka/kafkatest/verifiable_producer.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/kafkatest
creating build/lib.linux-x86_64-cpython-310/confluent_kafka/avro
copying src/confluent_kafka/avro/cached_schema_registry_client.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/avro
copying src/confluent_kafka/avro/error.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/avro
copying src/confluent_kafka/avro/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/avro
copying src/confluent_kafka/avro/load.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/avro
creating build/lib.linux-x86_64-cpython-310/confluent_kafka/serialization
copying src/confluent_kafka/serialization/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/serialization
creating build/lib.linux-x86_64-cpython-310/confluent_kafka/admin
copying src/confluent_kafka/admin/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/admin
creating build/lib.linux-x86_64-cpython-310/confluent_kafka/avro/serializer
copying src/confluent_kafka/avro/serializer/message_serializer.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/avro/serializer
copying src/confluent_kafka/avro/serializer/__init__.py -> build/lib.linux-x86_64-cpython-310/confluent_kafka/avro/serializer
running build_ext
building 'confluent_kafka.cimpl' extension
creating build/temp.linux-x86_64-cpython-310
creating build/temp.linux-x86_64-cpython-310/tmp
creating build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd
creating build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202
creating build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src
creating build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka
creating build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/include/python3.10 -c /tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/Admin.c -o build/temp.linux-x86_64-cpython-310/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/Admin.o
In file included from /tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/Admin.c:17:
/tmp/pip-install-5ufi6nkd/confluent-kafka_98e4581a7dd144dcaaf7c59075c25202/src/confluent_kafka/src/confluent_kafka.h:23:10: fatal error: librdkafka/rdkafka.h: No such file or directory
23 | #include <librdkafka/rdkafka.h>
| ^~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> confluent-kafka
great-nest-9369
05/12/2022, 1:04 AM
most-plumber-32123
05/12/2022, 6:44 AM
[2022-05-12 12:11:31,452] INFO {datahub.cli.ingest_cli:96} - DataHub CLI version: 0.8.34.1
[2022-05-12 12:11:31,738] ERROR {datahub.entrypoints:165} - Unable to connect to http://localhost:9002/api/gms/config with status_code: 401. Maybe you need to set up authentication? Please check your configuration and make sure you are talking to the DataHub GMS (usually <datahub-gms-host>:8080) or Frontend GMS API (usually <frontend>:9002/api/gms).
[2022-05-12 12:11:31,738] INFO {datahub.entrypoints:176} - DataHub CLI version: 0.8.34.1 at C:\Users\*****\AppData\Local\Programs\Python\Python39\lib\site-packages\datahub\__init__.py
[2022-05-12 12:11:31,738] INFO {datahub.entrypoints:179} - Python version: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)] at C:\Users\*****\AppData\Local\Programs\Python\Python39\python.exe on Windows-10-10.0.22000-SP0
[2022-05-12 12:11:31,738] INFO {datahub.entrypoints:182} - GMS config {}
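The 401 above is against the frontend proxy (localhost:9002/api/gms), which usually means Metadata Service Authentication is enabled; the CLI then needs either the GMS address directly or a personal access token. A quick connectivity check with the Python emitter, assuming the rest emitter's token parameter as described in the Python emitter docs (server and token values are placeholders):
```python
from datahub.emitter.rest_emitter import DatahubRestEmitter

# Either point at GMS directly (usually :8080) or keep the frontend proxy URL,
# and pass a personal access token if Metadata Service Authentication is on.
emitter = DatahubRestEmitter(
    gms_server="http://localhost:8080",
    token="<personal-access-token>",
)
emitter.test_connection()  # fails loudly if the server/token combination is rejected
```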
polite-orange-57255
05/12/2022, 7:09 AM
many-morning-40345
05/12/2022, 8:34 AM
alert-football-80212
05/12/2022, 10:59 AM
ERROR: Please set env variable SPARK_VERSION
Does anyone know anything about this?
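That message looks like the check pydeequ performs at import time (it is used by the data lake profiling); if that is indeed the source, setting the variable before anything imports pydeequ should get past it. A minimal sketch for a programmatic run; with the CLI you would export SPARK_VERSION in the shell before `datahub ingest` instead (the 3.0 value is an assumption, match your Spark):
```python
import os

# pydeequ reads SPARK_VERSION from the environment; set it before the
# profiler (and therefore pydeequ) is imported.
os.environ["SPARK_VERSION"] = "3.0"
```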