numerous-ram-92457
01/19/2023, 9:58 PM

cool-tiger-42613
01/20/2023, 9:31 AM
sales is the dataset belonging to the realestate_db, hence the display name should only be sales. The output is not as expected.
The code is from the example here: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_schema.py
Can I get some help with this please?

kind-dusk-91074
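Editorial note: the display name shown in the UI comes from the datasetProperties aspect's name field, so one way to get just "sales" is to upsert that aspect for the dataset. A minimal stdlib-only sketch; the platform (postgres) and GMS URL are assumptions on my part, and the payload shape follows the `ingestProposal` curl calls that appear later in this thread:

```python
import json

def make_display_name_proposal(dataset_urn: str, display_name: str) -> dict:
    """Build an ingestProposal payload that sets datasetProperties.name."""
    return {
        "proposal": {
            "entityType": "dataset",
            "entityUrn": dataset_urn,
            "changeType": "UPSERT",
            "aspectName": "datasetProperties",
            "aspect": {
                # The aspect value is itself JSON, serialized as a string.
                "value": json.dumps({"name": display_name}),
                "contentType": "application/json",
            },
        }
    }

# Hypothetical URN; the thread does not state the platform.
urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,realestate_db.sales,PROD)"
proposal = make_display_name_proposal(urn, "sales")
print(json.dumps(proposal, indent=2))
# POST this to http://<gms-host>:8080/aspects?action=ingestProposal
# with header X-RestLi-Protocol-Version: 2.0.0
```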
01/20/2023, 11:35 AM

calm-dinner-63735
01/20/2023, 12:48 PM

witty-butcher-82399
01/20/2023, 2:01 PM

alert-fall-82501
01/20/2023, 5:24 PM

lemon-lock-92370
01/21/2023, 3:38 AM
I modified the metadata-ingestion/src/datahub/ingestion/source/aws/glue.py file and built an image for datahub-ingestion (using docker/datahub-ingestion/Dockerfile).
But I’d like to apply this code to UI ingestion. It seems there’s no such AWS Glue file in the datahub-actions code..?
How can I apply code in the metadata-ingestion directory to UI ingestion..? I couldn’t find any right place in values.yaml either.. 😢
Please help 🙏 Thank you in advance 🙇

careful-ability-12984
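One possible route (an assumption on my part, not verified against the helm chart): UI ingestion runs inside the datahub-actions container, so a patched source file has to be baked into a custom actions image, which values.yaml is then pointed at. A sketch of such a Dockerfile, with the base image tag and paths as assumptions:

```dockerfile
# Untested sketch: extend the actions image with a locally patched
# metadata-ingestion checkout (containing the modified glue.py).
FROM acryldata/datahub-actions:head

COPY metadata-ingestion /tmp/metadata-ingestion
# Install over the published acryl-datahub package inside the image;
# "[glue]" pulls in the Glue plugin's dependencies.
RUN pip install --no-cache-dir "/tmp/metadata-ingestion[glue]"
```

The helm chart would then reference the custom image, e.g. via the acryl-datahub-actions image.repository / image.tag values (key names assumed, check your chart version).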
01/23/2023, 5:31 AM

elegant-salesmen-99143
01/23/2023, 10:26 AM

steep-family-13549
01/23/2023, 10:37 AM

lively-spring-5482
01/23/2023, 10:42 AM
CREATE TEMPORARY TABLE tmp1 AS (
SELECT
t1.id
, t1.attr_column_1
, t1.attr_column_2
, t2.attr_column_3
, t3.not_used
FROM src_table_1 AS t1
INNER JOIN src_table_2 AS t2 USING (id_1)
INNER JOIN src_table_3 AS t3 USING (id_2)
);
INSERT INTO target_table
SELECT
t.id
, t.attr_column_1
, t.attr_column_2
, t.attr_column_3
, s.attr_column_4
FROM tmp1 AS t
INNER JOIN src_table_4 AS s USING (id);
What was observed when ingesting the lineage for target_table is that the (somewhat unfortunate) use of a temporary table in the script results in partial sourcing information. Specifically: target_table shows src_table_4 as its only upstream source, while technically speaking this is not the case -> it is sourced from src_table_1, src_table_2 & src_table_4 (whether or not src_table_3 should be included is a separate discussion).
I wonder if this behaviour can be modified by configuration in release 0.9.6? If not, then is it a limitation that you plan to remove? Is there a workaround you could suggest other than, of course, refactoring to use CTEs?
Thanks in advance for looking into it. Have an excellent day :)

wooden-jackal-88380
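As a concrete illustration of the CTE workaround: folding the temporary table into a single INSERT ... WITH statement keeps all source tables visible to a statement-level lineage parser. A simplified schema (columns trimmed from the original script), run against sqlite3 just to show the rewrite executes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Minimal stand-ins for the tables in the thread's script.
cur.executescript("""
CREATE TABLE src_table_1 (id INTEGER, id_1 INTEGER, attr_column_1 TEXT);
CREATE TABLE src_table_2 (id_1 INTEGER, attr_column_3 TEXT);
CREATE TABLE src_table_4 (id INTEGER, attr_column_4 TEXT);
CREATE TABLE target_table (id INTEGER, attr_column_1 TEXT,
                           attr_column_3 TEXT, attr_column_4 TEXT);
INSERT INTO src_table_1 VALUES (1, 10, 'a');
INSERT INTO src_table_2 VALUES (10, 'c');
INSERT INTO src_table_4 VALUES (1, 'd');
""")

# One statement with a CTE instead of CREATE TEMPORARY TABLE + INSERT:
cur.execute("""
WITH tmp1 AS (
    SELECT t1.id, t1.attr_column_1, t2.attr_column_3
    FROM src_table_1 AS t1
    INNER JOIN src_table_2 AS t2 USING (id_1)
)
INSERT INTO target_table
SELECT t.id, t.attr_column_1, t.attr_column_3, s.attr_column_4
FROM tmp1 AS t
INNER JOIN src_table_4 AS s USING (id)
""")
print(cur.execute("SELECT * FROM target_table").fetchall())
# -> [(1, 'a', 'c', 'd')]
```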
01/23/2023, 10:55 AM

straight-camera-35934
01/23/2023, 10:59 AM

lively-dusk-19162
01/23/2023, 5:44 PM

nutritious-yacht-6205
01/23/2023, 7:40 PM

lively-dusk-19162
01/23/2023, 9:53 PM

lively-dusk-19162
01/23/2023, 9:53 PM

rich-state-73859
01/24/2023, 12:10 AM
I am using the datahub-protobuf lib to ingest a protobuf schema, but it could not parse the message comment correctly after I updated the lib to the latest version (v0.9.6). Here is the detailed issue info. Could someone help me with that?

microscopic-machine-90437
01/24/2023, 9:02 AM

blue-rainbow-97669
01/24/2023, 9:55 AM

best-umbrella-88325
01/24/2023, 3:22 PM

elegant-salesmen-99143
01/24/2023, 4:23 PM

helpful-tent-87247
01/24/2023, 5:56 PM
'2023-01-24 17:53:10.177088 [exec_id=22af9dd4-b420-4021-880f-4eec9c4e0677] INFO: Caught exception EXECUTING '
'task_id=22af9dd4-b420-4021-880f-4eec9c4e0677, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/asyncio/streams.py", line 525, in readline\n'
' line = await self.readuntil(sep)\n'
' File "/usr/local/lib/python3.10/asyncio/streams.py", line 620, in readuntil\n'
' raise exceptions.LimitOverrunError(\n'
'asyncio.exceptions.LimitOverrunError: Separator is found, but chunk is longer than limit\n'
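For context on the log above: asyncio's LimitOverrunError is raised by StreamReader.readuntil when a single "line" in the stream is longer than the reader's buffer limit, which here suggests the ingestion subprocess emitted an oversized output line. A minimal stdlib reproduction of the same exception:

```python
import asyncio

async def main() -> str:
    # A StreamReader whose buffer limit is smaller than one line:
    # the separator is present, but readuntil refuses to return the
    # oversized chunk and raises LimitOverrunError, as in the log above.
    reader = asyncio.StreamReader(limit=8)
    reader.feed_data(b"x" * 32 + b"\n")
    reader.feed_eof()
    try:
        await reader.readuntil(b"\n")
    except asyncio.LimitOverrunError as exc:
        return str(exc)
    return "no error"

print(asyncio.run(main()))
# -> Separator is found, but chunk is longer than limit
```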
rhythmic-glass-37647
01/24/2023, 8:53 PM

elegant-salesmen-99143
01/25/2023, 11:38 AM
Regarding the stateful_ingestion.ignore_old_state and stateful_ingestion.ignore_new_state parameters, the description is not clear to me.
It says "If set to True, ignores the previous/current checkpoint state". But what is a checkpoint state? How does it ignore it?

limited-forest-73733
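For intuition (this is a toy model, not DataHub's actual implementation): stateful ingestion saves a "checkpoint" of what the last run saw, e.g. the set of entity URNs, and the next run compares against it, typically to soft-delete entities that have disappeared from the source. ignore_old_state skips reading the saved checkpoint; ignore_new_state skips saving a new one:

```python
# Toy sketch of checkpoint-based stateful ingestion (not DataHub's code):
# the checkpoint is just the list of URNs the previous run emitted.
def run_ingestion(saved_checkpoint, current_urns,
                  ignore_old_state=False, ignore_new_state=False):
    previous = set() if ignore_old_state else set(saved_checkpoint)
    stale = previous - set(current_urns)   # candidates for soft-deletion
    # ignore_new_state: leave the stored checkpoint untouched.
    new_checkpoint = saved_checkpoint if ignore_new_state else sorted(current_urns)
    return stale, new_checkpoint

stale, ckpt = run_ingestion(["urn:li:a", "urn:li:b"], ["urn:li:b", "urn:li:c"])
print(stale)  # {'urn:li:a'} -> gone from the source since the last run
print(ckpt)   # ['urn:li:b', 'urn:li:c'] -> saved for the next run
```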
01/25/2023, 12:22 PM

blue-rainbow-97669
01/25/2023, 3:12 PM

magnificent-lawyer-97772
01/25/2023, 3:19 PM
GenericCheckpointState. @gray-shoe-75895 I noticed that you did a lot of work in that area.

stocky-energy-24880
01/25/2023, 3:20 PM
If the timestampMillis value is the same for multiple datasets, then fetching the TimeSeries aspect for one dataset URN returns the aspect values for the other datasets as well. Please find the details below.
I have created a TimeSeries aspect with the .pdl files mentioned below:
DatasetTimeSeriesTest.pdl
namespace com.mine.tests
import com.linkedin.timeseries.TimeseriesAspectBase
@Aspect = {
  "name": "datasetTimeSeriesTest",
  "type": "timeseries"
}
record DatasetTimeSeriesTest includes TimeseriesAspectBase {
  @TimeseriesFieldCollection = {"key": "urn"}
  testItems: optional array[TestItem]
}
TestItem.pdl
namespace com.mine.tests
import com.linkedin.common.Urn
record TestItem {
  urn: Urn
  @TimeseriesField = {}
  name: string
  @TimeseriesField = {}
  count: long
}
With the TimeSeries aspect (datasetTimeSeriesTest) I am able to ingest values correctly for 2 different datasets, urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.company,PROD) and urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.department,PROD), with the same `timestampMillis` (1674658250386):
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"proposal" : {
"entityType": "dataset",
"entityUrn" : "urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.company,PROD)",
"changeType" : "UPSERT",
"aspectName" : "datasetTimeSeriesTest",
"aspect" : {
"value" : "{ \"timestampMillis\":1674658250386, \"testItems\": [ {\"urn\": \"urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.company,PROD)\",\"name\": \"company1\", \"count\": 101}]}",
"contentType": "application/json"
}
}
}'
curl --location --request POST 'http://localhost:8080/aspects?action=ingestProposal' \
--header 'X-RestLi-Protocol-Version: 2.0.0' \
--header 'Content-Type: application/json' \
--data-raw '{
"proposal" : {
"entityType": "dataset",
"entityUrn" : "urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.department,PROD)",
"changeType" : "UPSERT",
"aspectName" : "datasetTimeSeriesTest",
"aspect" : {
"value" : "{ \"timestampMillis\":1674658250386, \"testItems\": [ {\"urn\": \"urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.department,PROD)\",\"name\": \"department1\", \"count\": 102}]}",
"contentType": "application/json"
}
}
}'
But then, when I queried the aspect for one dataset URN (urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.company,PROD)), I got the response for the other dataset URN as well (urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.department,PROD)).
Query:
curl -X POST 'http://localhost:8080/aspects?action=getTimeseriesAspectValues' \
--data '{
"urn": "urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.company,PROD)",
"entity": "dataset",
"aspect": "datasetTimeSeriesTest",
"latest": true
}'
Response:
{
  "value": {
    "aspectName": "datasetTimeSeriesTest",
    "entityName": "dataset",
    "values": [
      {
        "aspect": {
          "value": "{\"timestampMillis\":1674658250386,\"testItems\":[{\"urn\":\"urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.company,PROD)\",\"name\":\"company1\",\"count\":101}]}",
          "contentType": "application/json"
        }
      },
      {
        "aspect": {
          "value": "{\"timestampMillis\":1674658250386,\"testItems\":[{\"urn\":\"urn:li:dataset:(urn:li:dataPlatform:postgres,lusiadas.public.department,PROD)\",\"name\":\"department1\",\"count\":102}]}",
          "contentType": "application/json"
        }
      }
    ],
    "limit": 10000
  }
}
Is this a known issue? Or am I doing something wrong? Can you please suggest?
Also, does "autoRender": true not work with TimeSeries aspects?
I mean, when I tried the code below with versioned aspects I was able to view the aspect on the DataHub UI for a dataset, but I am not able to view it for TimeSeries aspects.
"autoRender": true,
"renderSpec": {
  "displayType": "tabular", // or properties
  "key": "tests",
  "displayName": "My Tests"
}
Can we fetch custom TimeSeries aspects with GraphQL?

lively-dusk-19162
01/25/2023, 4:13 PM