few-sugar-84064
09/22/2022, 3:17 AM
Error parsing DAG for Glue job. The script s3://steadio-glue-info/scripts/test-datahub-lineage.py cannot be processed by Glue (this usually occurs when it has been user-modified): An error occurred (InvalidInputException) when calling the GetDataflowGraph operation: line 11:87 no viable alternative at input '## @type: DataSource\n## @args: [catalog_connection = "redshiftconnection", connection_options = {"database" ='
• Dataset job code: I have no idea what I need to put for the job id and flow id
hundreds-photographer-13496
09/22/2022, 7:04 AM
few-sugar-84064
09/22/2022, 7:37 AM
hundreds-photographer-13496
09/22/2022, 8:36 AM
You can use builder.make_data_job_urn to construct the URN for a data job. Alternatively, you can directly use the URN of the Glue data job that is already ingested in DataHub, i.e. the job you want to set lineage for.
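A data job URN nests a dataFlow URN inside a dataJob URN, which is the same shape as the URN quoted later in this thread. The sketch below only reproduces that string layout, assuming urn:li:dataFlow:(orchestrator,flow_id,cluster) and urn:li:dataJob:(flow_urn,job_id). make_flow_urn and make_job_urn are hypothetical stand-ins written for illustration, not DataHub functions; in real code prefer builder.make_data_job_urn, and pass the cluster explicitly since its default may not be "PROD".

```python
def make_flow_urn(orchestrator: str, flow_id: str, cluster: str = "PROD") -> str:
    """Hypothetical stand-in: build a DataHub-style dataFlow URN string."""
    return f"urn:li:dataFlow:({orchestrator},{flow_id},{cluster})"


def make_job_urn(orchestrator: str, flow_id: str, job_id: str, cluster: str = "PROD") -> str:
    """Hypothetical stand-in: wrap the flow URN and the job id in a dataJob URN."""
    flow_urn = make_flow_urn(orchestrator, flow_id, cluster)
    return f"urn:li:dataJob:({flow_urn},{job_id})"


# "my-glue-job" and "my-job-id" are made-up identifiers for illustration.
print(make_job_urn("glue", "my-glue-job", "my-job-id"))
# urn:li:dataJob:(urn:li:dataFlow:(glue,my-glue-job,PROD),my-job-id)
```

Lineage is attached to whatever URN you emit to, so every component (orchestrator, flow id, cluster, job id) has to match the ingested Glue job exactly; copying the URN of the ingested job, as suggested above, avoids guessing.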
```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DataJobInputOutputClass

# Lineage for one data job: the datasets it reads from and writes to.
datajob_input_output = DataJobInputOutputClass(
    inputDatasets=["<placeholder for input redshift table urn>"],
    outputDatasets=["<placeholder for output redshift table urn>"],
)

# NOTE: UPSERT replaces the whole dataJobInputOutput aspect, so any lineage
# already stored on this job is overwritten.
datajob_input_output_mcp = MetadataChangeProposalWrapper(
    entityType="dataJob",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="<placeholder for glue job urn>",
    aspectName="dataJobInputOutput",
    aspect=datajob_input_output,
)

# Create an emitter to the GMS REST API.
emitter = DatahubRestEmitter("http://localhost:8080")
# Emit metadata!
emitter.emit_mcp(datajob_input_output_mcp)
```
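The two dataset placeholders expect dataset URNs. A sketch of how they could be filled in, assuming DataHub's dataset URN layout urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>) and that Redshift tables are named <database>.<schema>.<table>. make_dataset_urn_str, lineage_fields and the table names are hypothetical; the real SDK helper for the URN part is builder.make_dataset_urn(platform, name, env).

```python
from typing import Dict, List


def make_dataset_urn_str(platform: str, name: str, env: str = "PROD") -> str:
    """Hypothetical stand-in: build a DataHub-style dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def lineage_fields(inputs: List[str], outputs: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: turn Redshift table names into the
    inputDatasets/outputDatasets lists that DataJobInputOutputClass takes."""
    return {
        "inputDatasets": [make_dataset_urn_str("redshift", t) for t in inputs],
        "outputDatasets": [make_dataset_urn_str("redshift", t) for t in outputs],
    }


# Made-up <database>.<schema>.<table> names for illustration.
fields = lineage_fields(["dev.public.orders"], ["dev.public.orders_summary"])
print(fields["inputDatasets"][0])
# urn:li:dataset:(urn:li:dataPlatform:redshift,dev.public.orders,PROD)
```

With the SDK installed, DataJobInputOutputClass(**fields) would replace the placeholder arguments above. As with the job URN, the dataset URNs should match the Redshift tables as they were ingested into DataHub.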
few-sugar-84064
09/27/2022, 3:58 AM
urn:li:dataJob:(urn:li:dataFlow:(glue,flow name,PROD),flow name)
and ran the code. It ran successfully, but the Glue flow still doesn't show any lineage in the frontend view.