brainy-balloon-97302

05/25/2023, 9:40 PM
Hi all! I have a Glue ingestion job that constantly fails. It's failing with this error, and I was wondering whether anyone has come across it before and was able to fix it?
'failures': {'s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled job.py': ['Unable to download DAG for Glue job from s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled job.py, so job subtasks and lineage will be missing: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.', 'Unable to download DAG for Glue job from s3://aws-glue-assets-XXXXXX-us-west-2/scripts/Untitled job.py, so job subtasks and lineage will be missing: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.']}
I don't have that file in S3, nor a Glue job called Untitled job.py, so I am trying to see what I can do to resolve this. The rest of the metadata is being pulled over, but it's annoying that the run is marked as a failure.

gentle-hamburger-31302

05/31/2023, 11:38 AM
Hi @brainy-balloon-97302, the Glue ingestion is calling get_all_jobs, so one of the jobs somewhere in that list is pointing at that file. The ingestion itself will not fail.

brainy-balloon-97302

05/31/2023, 3:23 PM
@gentle-hamburger-31302 thank you for your help, I really appreciate it. I found the problem-child Glue jobs with this script, which I am attaching in case others run into a similar issue down the road.
import boto3

session = boto3.Session(profile_name='XXXXXX')  # Replace 'XXXXXX' with your profile name
glue = session.client('glue')

count = 0
next_token = None  # pagination token; get_jobs returns one while more pages remain

while True:
    if next_token:
        response = glue.get_jobs(NextToken=next_token)
    else:
        response = glue.get_jobs()

    for job in response['Jobs']:
        # Replace XXXXXXXXX with your AWS Account ID or put the proper s3 location
        if job['Command']['ScriptLocation'] == 's3://aws-glue-assets-XXXXXXXXX-us-west-2/scripts/Untitled job.py':
            print("The name of the job with the script location 'Untitled job.py' is:", job['Name'])
        count += 1

    next_token = response.get('NextToken')
    if not next_token:
        break

print("Total number of Glue jobs: ", count)
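
For anyone who hits the same NoSuchKey error with a different script path, here is a rough sketch of a more general check: rather than matching one hard-coded location, it lists every Glue job whose ScriptLocation points at an S3 object that no longer exists. The helper names (split_s3_uri, jobs_with_missing_scripts, scan_account) are made up for this example and are not part of DataHub or boto3. It uses head_object as a stand-in for the GetObject call in the error above, and it assumes the profile has permission to list Glue jobs and read the script buckets.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    return bucket, key


def jobs_with_missing_scripts(jobs, object_exists):
    """Return (job name, script location) for jobs whose script is gone.

    `jobs` is a list of job dicts as returned by Glue get_jobs;
    `object_exists(bucket, key)` is any callable that returns True or False.
    """
    missing = []
    for job in jobs:
        location = job.get("Command", {}).get("ScriptLocation")
        # Skip jobs without an S3 script location.
        if not location or not location.startswith("s3://"):
            continue
        bucket, key = split_s3_uri(location)
        if not object_exists(bucket, key):
            missing.append((job["Name"], location))
    return missing


def scan_account(profile_name):
    """Hypothetical wiring against a real account; needs boto3 installed."""
    import boto3
    from botocore.exceptions import ClientError

    session = boto3.Session(profile_name=profile_name)
    glue = session.client("glue")
    s3 = session.client("s3")

    def object_exists(bucket, key):
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return True
        except ClientError as err:
            # Treat "not found" codes as missing; re-raise anything else
            # (for example access denied) so it is not silently hidden.
            if err.response["Error"]["Code"] in ("404", "NoSuchKey", "NotFound"):
                return False
            raise

    # The paginator follows NextToken for us.
    jobs = []
    for page in glue.get_paginator("get_jobs").paginate():
        jobs.extend(page["Jobs"])
    return jobs_with_missing_scripts(jobs, object_exists)
```

To run it, call scan_account('XXXXXX') with your own profile name and print the returned (name, location) pairs. Those are the jobs to fix or delete before the next ingestion run.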