melodic-dentist-23675
07/27/2021, 5:27 PM

fresh-fish-73471
07/28/2021, 4:44 AM

square-activity-64562
07/28/2021, 8:26 AM

chilly-holiday-80781
07/28/2021, 9:49 PM

lemon-receptionist-88902
07/29/2021, 12:16 PM

prehistoric-yak-75049
07/29/2021, 8:13 PM
I looked in datahub but didn't find event dictionary jars/classes, like we have for Python in the ingestion module.

polite-flower-25924
07/29/2021, 8:56 PM

proud-church-91494
07/30/2021, 12:59 AM
I set up the lineage backend in my airflow.cfg file, following this documentation: https://datahubproject.io/docs/metadata-ingestion#using-datahubs-airflow-lineage-backend-recommended
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {
    "datahub_conn_id": "datahub_rest_default",
    "capture_ownership_info": true,
    "capture_tags_info": true,
    "graceful_exceptions": true }
Airflow shows me this error:
airflow-webserver_1 | [2021-07-30 00:47:44,154] {configuration.py:468} ERROR - No module named 'datahub_provider'
airflow-webserver_1 | [2021-07-30 00:47:59,084] {configuration.py:468} ERROR - No module named 'datahub_provider'
airflow-webserver_1 | [2021-07-30 00:48:13,803] {configuration.py:468} ERROR - No module named 'datahub_provider'
Is there something to do before these steps, like a "pip install" of something?
I'm using Airflow 2.1.2.
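For reference: the datahub_provider module ships with the acryl-datahub package, so an error like the above usually means the package isn't installed in the environment Airflow runs in. A likely fix, assuming the install path from the linked docs:

# install into the same Python environment that runs Airflow
pip install 'acryl-datahub[airflow]'
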
salmon-cricket-21860
07/30/2021, 5:39 AM
I have users with a "." in their user name. GMS says they were ingested and posted, but they can't be found in search, and I can't find the ES index (I used Kibana to look for them on ES).
I am using datahub 0.8.6.

cool-iron-6335
07/30/2021, 8:52 AM

faint-hair-91313
07/30/2021, 12:45 PM
transformers:
- type: "simple_add_dataset_ownership"
...
- type: "simple_add_dataset_tags"
little-van-63930
07/31/2021, 9:47 PM

mysterious-lamp-73086
08/01/2021, 10:19 AM

most-cricket-43285
08/02/2021, 10:36 AM

narrow-policeman-29290
08/03/2021, 3:45 AM
What is the difference between MetadataChangeProposal and MetadataChangeEvent? Both seem to have similar descriptions, except for what is emitted after the change occurs:
class MetadataChangeProposalClass(DictWrapper):
"""Kafka event for proposing a metadata change for an entity. A corresponding MetadataChangeLog is emitted when the change is accepted and committed, otherwise a FailedMetadataChangeProposal will be emitted instead."""
class MetadataChangeEventClass(DictWrapper):
"""Kafka event for proposing a metadata change for an entity. A corresponding MetadataAuditEvent is emitted when the change is accepted and committed, otherwise a FailedMetadataChangeEvent will be emitted instead."""
adventurous-scooter-52064
08/03/2021, 8:49 AM
Do I need to set producer_config in order to use datahub-kafka?
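For context, a sketch of a datahub-kafka sink block, assuming the documented recipe shape and local broker/schema-registry addresses; as I read the docs, producer_config is an optional dict passed through to the underlying Kafka producer rather than a required setting:

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
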
08/03/2021, 10:37 AM"No root resource defined for path '/entities'","status":404}(base)
If I try to ingest via curl. The other option via acryl addon works...
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
"entity":{
"value":{
"com.linkedin.metadata.snapshot.DatasetSnapshot":{
"aspects":[
{
"com.linkedin.common.Ownership":{
"owners":[
{
"owner":"urn:li:corpuser:fbar",
"type":"DATAOWNER"
}
],
"lastModified":{
"time":0,
"actor":"urn:li:corpuser:fbar"
}
}
},
{
"com.linkedin.common.InstitutionalMemory":{
"elements":[
{
"url":"<https://www.linkedin.com>",
"description":"Sample doc",
"createStamp":{
"time":0,
"actor":"urn:li:corpuser:fbar"
}
}
]
}
},
{
"com.linkedin.schema.SchemaMetadata":{
"schemaName":"FooEvent",
"platform":"urn:li:dataPlatform:foo",
"version":0,
"created":{
"time":0,
"actor":"urn:li:corpuser:fbar"
},
"lastModified":{
"time":0,
"actor":"urn:li:corpuser:fbar"
},
"hash":"",
"platformSchema":{
"com.linkedin.schema.KafkaSchema":{
"documentSchema":"{\"type\":\"record\",\"name\":\"MetadataChangeEvent\",\"namespace\":\"com.linkedin.mxe\",\"doc\":\"Kafka event for proposing a metadata change for an entity.\",\"fields\":[{\"name\":\"auditHeader\",\"type\":{\"type\":\"record\",\"name\":\"KafkaAuditHeader\",\"namespace\":\"com.linkedin.avro2pegasus.events\",\"doc\":\"Header\"}}]}"
}
},
"fields":[
{
"fieldPath":"foo",
"description":"Bar",
"nativeDataType":"string",
"type":{
"type":{
"com.linkedin.schema.StringType":{
}
}
}
}
]
}
}
],
"urn":"urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"
}
}
}
}'
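One hedged sanity check, assuming the default deployment where GMS itself serves on port 8080: the Python REST emitter verifies connectivity against the /config endpoint, so if the request below doesn't return GMS's config JSON, the curl above is likely hitting something other than GMS:

curl http://localhost:8080/config
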
adventurous-scooter-52064
08/04/2021, 6:49 AM
self = <datahub.ingestion.api.registry.Registry object at 0x7f10a1cdef70>
key = 'glue'
Type = typing.Type
T = ~T
self._mapping = {'athena': <class 'datahub.ingestion.source.sql.athena.AthenaSource'>,
'bigquery': <class 'datahub.ingestion.source.sql.bigquery.BigQuerySource'>,
'bigquery-usage': <class 'datahub.ingestion.source.usage.bigquery_usage.BigQueryUsageSource'>,
'dbt': <class 'datahub.ingestion.source.dbt.DBTSource'>,
'druid': <class 'datahub.ingestion.source.sql.druid.DruidSource'>,
'feast': <class 'datahub.ingestion.source.feast.FeastSource'>,
'file': <class 'datahub.ingestion.source.file.GenericFileSource'>,
'glue': ModuleNotFoundError("No module named 'mypy_boto3_glue'"),
'hive': <class 'datahub.ingestion.source.sql.hive.HiveSource'>,
'kafka': <class 'datahub.ingestion.source.kafka.KafkaSource'>,
'kafka-connect': <class 'datahub.ingestion.source.kafka_connect.KafkaConnectSource'>,
'ldap': <class 'datahub.ingestion.source.ldap.LDAPSource'>,
'looker': <class 'datahub.ingestion.source.looker.LookerDashboardSource'>,
'lookml': <class 'datahub.ingestion.source.lookml.LookMLSource'>,
'mongodb': <class 'datahub.ingestion.source.mongodb.MongoDBSource'>,
'mssql': <class 'datahub.ingestion.source.sql.mssql.SQLServerSource'>,
'mysql': <class 'datahub.ingestion.source.sql.mysql.MySQLSource'>,...
tp = ModuleNotFoundError("No module named 'mypy_boto3_glue'")
ConfigurationError = <class 'datahub.configuration.common.ConfigurationError'>
.
.
.
ConfigurationError: glue is disabled; try running: pip install 'acryl-datahub[glue]'
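For reference, the last line is the actionable part: the glue source's extra dependencies (including the mypy_boto3_glue module named above) are installed via the glue extra, exactly as the error suggests:

pip install 'acryl-datahub[glue]'
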
faint-hair-91313
08/04/2021, 2:11 PM

faint-hair-91313
08/04/2021, 3:40 PM

fast-leather-13054
08/05/2021, 1:28 PM

adventurous-scooter-52064
08/05/2021, 4:12 PM

narrow-kitchen-1309
08/05/2021, 5:03 PM

future-waitress-970
08/05/2021, 5:56 PM
Caused by: java.net.URISyntaxException: Urn doesn't start with 'urn:'. Urn: at index 0:
at com.linkedin.common.urn.Urn.<init>(Urn.java:80)
at com.linkedin.common.urn.Urn.createFromString(Urn.java:231)
at com.linkedin.common.urn.DataPlatformUrn.createFromString(DataPlatformUrn.java:26)
at com.linkedin.common.urn.DataPlatformUrn$1.coerceOutput(DataPlatformUrn.java:60)
I already tried nuking DataHub several times, fixing things within the file, updating from GitHub, etc. Anyone got any tips?

bland-easter-53873
08/06/2021, 2:36 PM

future-waitress-970
08/06/2021, 4:21 PM
Sink (datahub-rest) report:
{'failures': [], 'records_written': 1, 'warnings': []}
Pipeline finished successfully
After ingesting the attached JSON file, I go to the GUI and it crashes, giving me the following error once you dig through the logs:
Caused by: java.net.URISyntaxException: Urn doesn't start with 'urn:'. Urn: at index 0:
at com.linkedin.common.urn.Urn.<init>(Urn.java:80)
at com.linkedin.common.urn.Urn.createFromString(Urn.java:231)
at com.linkedin.common.urn.DataPlatformUrn.createFromString(DataPlatformUrn.java:26)
at com.linkedin.common.urn.DataPlatformUrn$1.coerceOutput(DataPlatformUrn.java:60)
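A hedged reading of the trace: DataPlatformUrn.createFromString received an empty string (note there is nothing after "Urn:" in the message), which points at a blank platform field somewhere in the ingested JSON. For comparison, well-formed values follow the shapes used elsewhere in this thread:

# an empty string in either position triggers this URISyntaxException
platform = "urn:li:dataPlatform:foo"
dataset = "urn:li:dataset:(urn:li:dataPlatform:foo,bar,PROD)"
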
witty-butcher-82399
08/09/2021, 2:50 PM
The Redshift connector sets the dataPlatform as redshift. This is noted here and here.
Since I want to ingest tables from multiple Redshift clusters, I would like to differentiate them by having different values for the dataPlatform.
I have thought of changing this with a custom transform, but since dataPlatform is part of the URN, a custom transform wouldn't work, so this would have to be managed from the connector itself; please correct me if I'm wrong.
Actually, the model is what prevents this: the current approach seems to treat dataPlatform as a sort of platform categorization. Are there any plans to model dataPlatform as platform instances instead?
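To make the constraint concrete: the platform URN is embedded in every dataset URN, so rewriting it changes the entity's identity rather than one of its aspects. Hypothetical per-cluster identifiers would have to differ inside the URN itself:

urn:li:dataset:(urn:li:dataPlatform:redshift,db.schema.table,PROD)
urn:li:dataset:(urn:li:dataPlatform:redshift_cluster_b,db.schema.table,PROD)
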
curved-jordan-15657
08/10/2021, 11:06 AM
When I try to roll back an ingestion run, I get:
No entities touched by this run. Double check your run id?
rolling back deletes the entities created by a run and reverts the updated aspects
this rollback deleted 0 entities and rolled back 0 aspects
showing first 0 of 0 aspects reverted by this run
+-------+---------------+--------------+
| urn | aspect name | created at |
+=======+===============+==============+
+-------+---------------+--------------+
I know the runId is correct because I've used it with the "show" method and clearly saw all the tables I ingested (71 tables). How do I resolve this issue? Thanks in advance!
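For reference, the two CLI calls being contrasted here, assuming the documented rollback subcommands and a placeholder run id:

datahub ingest show --run-id <run_id>      # lists what the run wrote
datahub ingest rollback --run-id <run_id>  # reverts it
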
handsome-football-66174
08/10/2021, 8:02 PM

magnificent-camera-71872
08/11/2021, 5:10 AM