nutritious-train-7865
04/23/2022, 3:51 AM
I get a timeout exception waiting on the `hive-metastore-postgresql` container service when running the integration tests for the presto-on-hive and trino sources on my M1 Mac. Could anyone help me with it? Thanks!
many-guitar-67205
04/25/2022, 7:44 AM
I'm running into a problem ingesting the upstreamLineage aspect. This particular dataset (a Kafka topic) is reported by Atlas to have a lineage of over 6000 HDFS files. (Background: these are JSON files that are generated every 15 minutes by some external process.)
The ingestion reports that all workunits have succeeded, but the gms logs show the following:
07:23:56.574 [qtp544724190-15] INFO c.l.m.r.entity.AspectResource:125 - INGEST PROPOSAL proposal: {aspectName=upstreamLineage, systemMetadata={lastObserved=1650871434684, runId=file-2022_04_25-09_23_54}, entityUrn=urn:li:dataset:(urn:li:dataPlatform:kafka,udexprd
.RESOURCEPERFORMANCE.PROD.STREAM.FAST.15MIN.RAW.FAMILIES,PROD), entityType=dataset, aspect={contentType=application/json, value=ByteString(length=1912263,bytes=7b227570...227d5d7d)}, changeType=UPSERT}
07:23:57.343 [qtp544724190-15] ERROR c.l.m.d.producer.KafkaEventProducer:146 - Failed to emit MCL for entity with urn urn:li:dataset:(urn:li:dataPlatform:kafka,udexprd.RESOURCEPERFORMANCE.PROD.STREAM.FAST.15MIN.RAW.FAMILIES,PROD)
org.apache.kafka.common.errors.RecordTooLargeException: The message is 1856310 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
It's clear that the message is too large for Kafka. I could play around with the Kafka configuration and increase max.request.size, but that's not a good long-term solution.
Several questions:
1. The ingest should report a failure. Why doesn't it? (Probably because the Kafka publish is async?)
2. Is there any other way to add lineage than as an update to the single aspect?
3. I could try to put some hard limits on the lineage ingestion, but then you lose information. Are there any other ways this could be modeled/ingested?
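If raising the limits does end up being the stopgap, the settings involved are max.request.size on the GMS Kafka producer and max.message.bytes on the MCL topic. A minimal sketch of the topic-side change only, assuming confluent-kafka is installed, a broker at localhost:9092, and the default topic name MetadataChangeLog_Versioned_v1 (check your deployment); the producer-side setting has to be raised on datahub-gms itself and is deployment-specific:
from confluent_kafka.admin import AdminClient, ConfigResource

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
topic = ConfigResource(
    ConfigResource.Type.TOPIC,
    "MetadataChangeLog_Versioned_v1",  # default MCL topic name; adjust if yours differs
    set_config={"max.message.bytes": "5242880"},  # ~5 MB instead of the ~1 MB default
)
# alter_configs returns {resource: future}; block until the broker applies the change.
for resource, future in admin.alter_configs([topic]).items():
    future.result()
    print(f"updated {resource}")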
lemon-terabyte-66903
04/26/2022, 4:24 PM
I'm hitting the error "Failed to merge incompatible data types double and bigint". Is it possible to handle this error and record this schema change in the schemaMetadata aspect? cc @hundreds-photographer-13496
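In the meantime, the schemaMetadata aspect can also be written directly if you want to record the field with whichever type you settle on. A rough sketch, assuming the acryl-datahub Python package and a GMS at http://localhost:8080; the URN, field name, and types are illustrative only:
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    NumberTypeClass,
    OtherSchemaClass,
    SchemaFieldClass,
    SchemaFieldDataTypeClass,
    SchemaMetadataClass,
)

# Illustrative schema containing the single field whose type changed.
schema = SchemaMetadataClass(
    schemaName="example_table",
    platform="urn:li:dataPlatform:hive",
    version=0,
    hash="",
    platformSchema=OtherSchemaClass(rawSchema=""),
    fields=[
        SchemaFieldClass(
            fieldPath="amount",
            nativeDataType="double",  # the type you decided to keep
            type=SchemaFieldDataTypeClass(type=NumberTypeClass()),
        )
    ],
)
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,db.example_table,PROD)",
    aspectName="schemaMetadata",
    aspect=schema,
)
DatahubRestEmitter(gms_server="http://localhost:8080").emit_mcp(mcp)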
mysterious-lamp-91034
04/27/2022, 6:54 AM
I ran ./gradlew :metadata-ingestion:codegen
Then I don't see the snapshot class generated in metadata-ingestion/src/datahub/metadata/com/linkedin/pegasus2avro/metadata/snapshot/__init__.py, but I do see lots of other snapshots generated. Is that expected?
I want the snapshot because I want to ingest an MCE, and MCE takes a snapshot as its first parameter.
I know snapshots are legacy; is there a way to ingest without MCE?
Thanks
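Yes, you can ingest without MCE: individual aspects can be sent as MetadataChangeProposals, which don't use the Snapshot classes at all. A minimal sketch, assuming the acryl-datahub Python package and a GMS at http://localhost:8080; the dataset URN and aspect are just examples:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# One MCP carries one aspect for one entity; no Snapshot wrapper is involved.
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=make_dataset_urn(platform="hive", name="db.example_table", env="PROD"),
    aspectName="datasetProperties",
    aspect=DatasetPropertiesClass(description="emitted via MCP, no snapshot needed"),
)
emitter.emit_mcp(mcp)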
fresh-coat-71059
04/27/2022, 12:39 PM
The naming doesn't match: one is `<db>.<table>` and the dataset of Superset is `<connection_name>.<db>.<table>`, so the same table shows up under two different URNs:
# mysql dataset
urn:li:dataset:(urn:li:dataPlatform:mysql,MySQL.test.test1,PROD)
# dataset used in a superset dashboard
urn:li:dataset:(urn:li:dataPlatform:mysql,test.test1,PROD)
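For reference, the name part of the URN is just whatever string the connector passes when building the dataset URN, so the two forms above are kept as two unrelated datasets. A small sketch with the Python SDK, assuming acryl-datahub; the names are taken from the example above:
from datahub.emitter.mce_builder import make_dataset_urn

# <db>.<table> form
print(make_dataset_urn(platform="mysql", name="test.test1", env="PROD"))
# <connection_name>.<db>.<table> form
print(make_dataset_urn(platform="mysql", name="MySQL.test.test1", env="PROD"))
# The two strings differ, so DataHub treats them as separate dataset entities.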
curved-football-28924
04/27/2022, 7:02 PM
I'm hitting this error:
File "/home/karthickaravindan/.local/lib/python3.8/site-packages/datahub/emitter/mcp.py", line 17, in _make_generic_aspect
    serialized = json.dumps(pre_json_transform(codegen_obj.to_obj()))
AttributeError: 'str' object has no attribute 'to_obj'
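That AttributeError means a plain string reached the point where the emitter expects a generated aspect object (something with a to_obj() method). One common way to trigger it is passing a string as the aspect of an MCP; a hedged sketch of the broken versus working pattern, assuming the acryl-datahub Python package (URN and aspect are illustrative):
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.metadata.schema_classes import ChangeTypeClass, StatusClass

urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.example_table,PROD)"

# Broken: aspect is a plain string, so .to_obj() fails with the error above.
# MetadataChangeProposalWrapper(entityType="dataset", changeType=ChangeTypeClass.UPSERT,
#                               entityUrn=urn, aspectName="status", aspect="removed")

# Working: pass an instance of the generated aspect class instead.
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=urn,
    aspectName="status",
    aspect=StatusClass(removed=False),
)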
rhythmic-stone-77840
04/27/2022, 7:47 PM
I ran:
datahub delete --entity_type glossaryTerm --query "*" -f --hard
datahub delete --entity_type glossaryNode --query "*" -f --hard
And the run says that it did hard delete rows for the entries found, but I'm still seeing the nodes and terms show up in the DataHub UI and I can still click through them. Anyone have an idea on what's going on?