ancient-policeman-73437 (08/30/2022, 10:07 AM)

silly-finland-62382 (08/30/2022, 12:19 PM):
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.datahub.metadata.dataset.platformInstance", "dataset") \
    .enableHiveSupport() \
    .getOrCreate()
df = spark.sql("select * from parkdatabricks_table_test1")
but I am seeing the upstream as hdfs, not hive. Can you suggest how to show the upstream in DataHub as hive for the same dataset using Spark lineage?

millions-sundown-65420 (08/30/2022, 12:35 PM)

modern-monitor-68945 (08/30/2022, 1:07 PM)

narrow-toothbrush-13209 (08/30/2022, 1:11 PM):
Task exited with return code Negsignal.SIGSEGV

sparse-advantage-78335 (08/30/2022, 1:30 PM)

big-barista-70811 (08/30/2022, 1:44 PM)

lemon-engine-23512 (08/30/2022, 3:55 PM)
silly-finland-62382 (08/30/2022, 5:41 PM):
(reposted the same Spark lineage question, edited)

brave-nail-85388 (08/30/2022, 8:13 PM)

brave-nail-85388 (08/30/2022, 8:13 PM)

cool-actor-73767 (08/30/2022, 3:36 PM)
few-carpenter-93837 (08/31/2022, 5:35 AM):
In _get_column_info in vertica.py, the timestamptz and timestamp types carry an argument called precision. However, the TIMESTAMP(DateTime) class imported from sqlalchemy.sql.sqltypes currently accepts no precision argument, only timezone. A dirty fix from our side was to add precision=None and self.precision = precision to the class, which now gives correct output, but as you might think, patching our custom code over the sqlalchemy dependency in the CLI codebase isn't a perfect solution. Any ideas or directions on how to tackle this would be appreciated.
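The "dirty fix" described above (patching precision directly into sqlalchemy's class) can also be done without editing the dependency, by subclassing. A minimal self-contained sketch, using a stand-in for sqlalchemy's TIMESTAMP so it runs on its own (the real change would subclass sqlalchemy.sql.sqltypes.TIMESTAMP inside the Vertica dialect; names here are illustrative):

```python
# Stand-in for sqlalchemy.sql.sqltypes.TIMESTAMP, which accepts only `timezone`.
class TIMESTAMP:
    def __init__(self, timezone=False):
        self.timezone = timezone


# Subclass used by the dialect instead of patching the library in place:
# it accepts and stores the extra precision that Vertica reports.
class VerticaTIMESTAMP(TIMESTAMP):
    def __init__(self, timezone=False, precision=None):
        super().__init__(timezone=timezone)
        self.precision = precision


ts = VerticaTIMESTAMP(timezone=True, precision=6)
print(ts.precision)  # 6
```

_get_column_info would then return VerticaTIMESTAMP for timestamp/timestamptz columns, leaving the sqlalchemy dependency untouched.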
flaky-soccer-57765 (08/31/2022, 8:15 AM)

better-orange-49102 (08/31/2022, 10:13 AM)

bright-receptionist-94235 (08/31/2022, 11:02 AM)
silly-finland-62382 (08/31/2022, 4:12 PM):
(reposted the same Spark lineage question again, edited)

kind-whale-32412 (08/31/2022, 8:01 PM)

lemon-engine-23512 (08/31/2022, 9:25 PM)

full-chef-85630 (08/31/2022, 9:07 AM)

jolly-traffic-67085 (09/01/2022, 7:34 AM)
hallowed-kilobyte-916 (09/01/2022, 12:54 PM):
source:
  type: glue
  config:
    aws_region: ${aws_region}
    aws_access_key_id: ${aws_access_key_id}
    aws_secret_access_key: ${aws_secret_access_key}
I created a .env file where I defined the environment variables like aws_region. How do I reference the .env file? I can't seem to find any documentation on this.
millions-sundown-65420 (09/01/2022, 1:39 PM):
sparkSession = SparkSession.builder \
    .appName("Events to Datahub") \
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.23") \
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener") \
    .config("spark.datahub.rest.server", "http://localhost:9002") \
    .enableHiveSupport() \
    .getOrCreate()
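On hallowed-kilobyte-916's recipe question above: the ${aws_region}-style references in the YAML are expanded from environment variables, so one approach (an assumption about the setup, not a documented .env feature of the CLI) is to load the .env file into the environment before running ingestion, e.g. with `set -a; source .env; set +a` in the shell. A minimal Python sketch of that substitution logic:

```python
import os
import re


def load_dotenv_text(text):
    """Load simple KEY=value lines (comments and blanks skipped) into os.environ."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip()


def expand_vars(recipe_text):
    """Replace ${var} references with values from the environment; leave unknowns as-is."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        recipe_text,
    )


load_dotenv_text("aws_region=us-east-1")
print(expand_vars("aws_region: ${aws_region}"))  # aws_region: us-east-1
```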
little-spring-72943 (09/01/2022, 4:00 PM)

rapid-fall-7147 (09/01/2022, 4:34 PM)
late-truck-7887 (09/01/2022, 5:14 PM):
aspects = [
    DatasetPropertiesClass(
        name=nice_human_readable_name,
        customProperties=properties,
        description=description,
        externalUrl=url,
    ),
]
or alternatively:
aspects = [
    DatasetPropertiesClass(
        qualifiedName=nice_human_readable_name,
        customProperties=properties,
        description=description,
        externalUrl=url,
    ),
]
The MCPs that are generated have the proper format and are passed to connection.submit_change_proposals:
[MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='ownership', aspect=OwnershipClass({'owners': [OwnerClass({'owner': 'urn:li:corpuser:etl', 'type': 'DATAOWNER', 'source': OwnershipSourceClass({'type': 'SERVICE', 'url': None})})], 'lastModified': AuditStampClass({'time': 1661399154, 'actor': 'urn:li:corpuser:etl', 'impersonator': None, 'message': None})}), systemMetadata=None), MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='datasetProperties', aspect=DatasetPropertiesClass({'customProperties': {'here3567c322-fd92-4417-98f0-90a66e32101b': 'are some fake properties', 'that_are': 'used_for_testing'}, 'externalUrl': None, 'name': 'test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b', 'qualifiedName': None, 'description': 'This is a fake description of a dataset', 'uri': None, 'tags': []}), systemMetadata=None), MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, aspectName='institutionalMemory', aspect=InstitutionalMemoryClass({'elements': [InstitutionalMemoryMetadataClass({'url': '<https://www.google.com/>', 'description': 'link3567c322-fd92-4417-98f0-90a66e32101b', 'createStamp': AuditStampClass({'time': 1661399154, 'actor': 'urn:li:corpuser:etl', 'impersonator': None, 'message': None})})]}), systemMetadata=None), MetadataChangeProposalWrapper(entityType='dataset', changeType='UPSERT', entityUrn='urn:li:dataset:(urn:li:dataPlatform:s3,test_s3_dataset3567c322-fd92-4417-98f0-90a66e32101b,PROD)', entityKeyAspect=None, auditHeader=None, 
aspectName='globalTags', aspect=GlobalTagsClass({'tags': [TagAssociationClass({'tag': 'urn:li:tag:tag13567c322-fd92-4417-98f0-90a66e32101b', 'context': None}), TagAssociationClass({'tag': 'urn:li:tag:tag_23567c322-fd92-4417-98f0-90a66e32101b', 'context': None})]}), systemMetadata=None)]
but then I get this rather cryptic error message (see attached screenshot).
Any advice appreciated! Thanks!
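One detail worth double-checking in the MCP dump above (a guess, not a confirmed cause of the error): DataHub's AuditStamp time field is expected in milliseconds since the Unix epoch, while the value shown (1661399154) looks like seconds. A milliseconds timestamp would be produced as:

```python
import time

# AuditStamp-style timestamp: milliseconds since the Unix epoch (13 digits today,
# versus 10 digits for a seconds-based value like 1661399154).
audit_time_ms = int(time.time() * 1000)
print(audit_time_ms)
```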
clever-garden-23538 (09/01/2022, 6:15 PM)

creamy-tent-10151 (09/01/2022, 11:30 PM)

alert-fall-82501 (09/02/2022, 4:53 AM)

steep-laptop-41463 (09/02/2022, 7:19 AM)