Hello, I get an error when I execute /opt/data...
# getting-started
b
Hello, I get an error when I execute /opt/datahub/datahub-master/metadata-ingestion/sql-etl/mysql_etl.py. The error is: avro.io.AvroTypeException: The datum is not an example of the schema
m
@bulky-lunch-72217: I would suggest moving to the new and improved python ingestion scripts if you can.
b
Thanks, I'll try it.
I successfully executed
datahub ingest -c ./examples/recipes/mysql_to_datahub.yml
but DataHub's PROD data has not changed.
g
Are you using the datahub-kafka sink or the datahub-rest sink?
Also, it seems curious that "workunits_produced" is empty - maybe all the tables are being filtered out by the allow/deny rules?
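To illustrate how allow/deny rules can silently filter out every table and leave no work units, here is a minimal sketch. It is not DataHub's actual filtering code: the `is_allowed` helper, the "deny wins, then match any allow pattern" semantics, and the `other.db.orders` table name are assumptions made for illustration.

```python
import re
from typing import Iterable


def is_allowed(name: str,
               allow: Iterable[str] = (".*",),
               deny: Iterable[str] = ()) -> bool:
    """Hypothetical filter: a deny match rejects, otherwise any allow match accepts."""
    if any(re.match(pattern, name) for pattern in deny):
        return False
    return any(re.match(pattern, name) for pattern in allow)


# Table names: the first two appear in this thread, the third is made up.
tables = ["my.test.user", "my.test.metatable", "other.db.orders"]

# An allow rule that matches nothing filters out every table.
too_strict = [t for t in tables if is_allowed(t, allow=[r"prod\..*"])]

# A corrected allow rule lets the intended tables through.
fixed = [t for t in tables if is_allowed(t, allow=[r"my\.test\..*"])]

print(too_strict)
print(fixed)
```

If the first list is empty for your own patterns, an empty "workunits_produced" is the expected result.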
b
Yes, I was wrong; I had set the wrong allow/deny rules.
It works now, but I can't find the lineage of the MySQL table, just a single node. How do I create these relationships?
I am creating these relationships via curl and the API, but I don't know what "($params:(),name:barUp,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)" is.
http://localhost:8080/datasets/($params:(),name:barUp,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/downstreamLineage
github: https://github.com/linkedin/datahub/blob/928444928a1618d0a861cfff371d12951d39a3ab/gms/README.md#create-group
g
Where are you getting the lineage information from?
b
I created the lineage via:
curl 'http://localhost:8080/dashboards?action=ingest' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '{"snapshot": {"aspects":[{"com.linkedin.dataset.UpstreamLineage":{"upstreams":[{"auditStamp":{"time":1612576061011,"actor":"urn:li:corpuser:fbar"},"dataset":"urn:li:dataset:(urn:li:dataPlatform:mysql,my.test.user,PROD)","type":"TRANSFORMED"},{"auditStamp":{"time":1612576061011,"actor":"urn:li:corpuser:fbar"},"dataset":"urn:li:dataset:(urn:li:dataPlatform:mysql,my.test.metatable,PROD)","type":"TRANSFORMED"}]}}],"urn":"urn:li:dataset:(urn:li:dataPlatform:mysql,my.test.test,PROD)"}}'
Now I want to get the downstream datasets of user. How do I do that?
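The JSON body in the curl command above is hard to read on one line. As a rough sketch, the same snapshot structure can be assembled in Python and serialized with `json.dumps`. The helper names `dataset_urn` and `upstream_lineage_snapshot` are hypothetical; the field names and values are copied from the command above, and nothing is sent over the network here.

```python
import json
import time
from typing import List, Optional


def dataset_urn(platform: str, name: str, origin: str = "PROD") -> str:
    """Build a dataset URN in the form used in this thread."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{origin})"


def upstream_lineage_snapshot(downstream: str,
                              upstreams: List[str],
                              actor: str,
                              timestamp_ms: Optional[int] = None) -> dict:
    """Hypothetical helper: wrap an UpstreamLineage aspect in a snapshot."""
    ts = int(time.time() * 1000) if timestamp_ms is None else timestamp_ms
    return {
        "snapshot": {
            "aspects": [{
                "com.linkedin.dataset.UpstreamLineage": {
                    "upstreams": [
                        {
                            "auditStamp": {"time": ts, "actor": actor},
                            "dataset": upstream,
                            "type": "TRANSFORMED",
                        }
                        for upstream in upstreams
                    ]
                }
            }],
            "urn": downstream,
        }
    }


payload = upstream_lineage_snapshot(
    downstream=dataset_urn("mysql", "my.test.test"),
    upstreams=[dataset_urn("mysql", "my.test.user"),
               dataset_urn("mysql", "my.test.metatable")],
    actor="urn:li:corpuser:fbar",
    timestamp_ms=1612576061011,
)
body = json.dumps(payload)  # string suitable for curl's --data argument
print(body)
```

Note the direction: `my.test.user` and `my.test.metatable` are declared as upstreams of `my.test.test`, so `my.test.test` is the dataset that should show up downstream of `my.test.user`.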
g
The upstream and downstream information should appear in the UI under the Lineage tab of the dataset.
b
Yes, I can see the lineage in the UI. Can I get the lineage via the API, like this?
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:barUp,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)/downstreamLineage' | jq
b
That should be the correct endpoint to fetch downstream lineage for your dataset named barUp.
btw I don't see the "barUp" table name in the curl ^^
b
I saw this example, but I actually don't know how to format this part:
($params:(),name:barUp,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Afoo)
b
Oh i see -- so that is basically a stringified DatasetKey.pdl struct... the name = the name of the dataset you are querying for, the origin = the environment (PROD, STAGING), the platform = the dataset platform type (kafka, hdfs, hive, etc!)
@bulky-lunch-72217 can you try the following:
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:my.test.user,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Amysql)/downstreamLineage' | jq
By the way, is the name "my.test.test" or "my.test.user"? If it is "my.test.test", you can try:
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:my.test.test,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Amysql)/downstreamLineage' | jq
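The key format described above (name, origin, and a percent-encoded platform URN) can be sketched as a small helper that builds the request URL. The function names `dataset_key` and `downstream_lineage_url` are hypothetical, and the sketch assumes the dataset name needs no escaping of its own, which holds for the names in this thread but may not in general.

```python
from urllib.parse import quote


def dataset_key(name: str, platform: str, origin: str = "PROD") -> str:
    """Stringify the dataset key in the form used by the curl examples."""
    # The platform URN contains colons, which are percent-encoded as %3A.
    platform_urn = quote(f"urn:li:dataPlatform:{platform}", safe="")
    return f"($params:(),name:{name},origin:{origin},platform:{platform_urn})"


def downstream_lineage_url(base_url: str, name: str, platform: str,
                           origin: str = "PROD") -> str:
    """Build the downstreamLineage URL for one dataset."""
    return f"{base_url}/datasets/{dataset_key(name, platform, origin)}/downstreamLineage"


url = downstream_lineage_url("http://localhost:8080", "my.test.user", "mysql")
print(url)
```

The printed URL can be passed to curl inside single quotes, together with the two `X-RestLi` headers shown in the commands above.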
b
Yes, it is my.test.user.
Thanks!!! It works.
I used this:
curl -H 'X-RestLi-Protocol-Version:2.0.0' -H 'X-RestLi-Method: get' 'http://localhost:8080/datasets/($params:(),name:my.test.user,origin:PROD,platform:urn%3Ali%3AdataPlatform%3Amysql)/downstreamLineage'
b
wooo!!
r
@bulky-lunch-72217 and @big-carpet-38439 Where did you submit the curl request? When I submitted a similar request on the command line, I got "URL using bad/illegal format or missing URL" as output.