# ingestion
stocky-midnight-78204
I tried to integrate DataHub with Spark. I am able to get the task lineage, but the Spark job gets stuck and keeps running forever.
careful-pilot-86309
Are you using a Jupyter notebook or spark-submit? Have you called spark.stop() at the end?
loud-island-88694
@stocky-midnight-78204 Were you able to get this working?
stocky-midnight-78204
@careful-pilot-86309 @loud-island-88694 I fixed it by adding spark.stop() at the end myself. Thanks for your help!
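For reference, a minimal sketch of what the fixed script looks like when run via spark-submit. The listener config keys follow the DataHub Spark lineage agent docs; the package version, server URL, app name, and file paths below are placeholders:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("my-lineage-job")  # placeholder app name
    # DataHub Spark agent: emits lineage events to the DataHub GMS endpoint
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.23")  # example version
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    .config("spark.datahub.rest.server", "http://localhost:8080")  # placeholder GMS URL
    .getOrCreate()
)

df = spark.read.csv("input.csv", header=True)  # placeholder input
df.write.mode("overwrite").parquet("output")   # placeholder output

# Without this, the listener never receives the application-end event
# and the job appears to hang / keep running.
spark.stop()
```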
k
@careful-pilot-86309 I had the same problem: the Spark job keeps running. How can I stop it after it finishes? I am using spark-submit. Thanks in advance!
careful-pilot-86309
Have you called spark.stop() in your script?
k
I run my Spark job with Airflow from Python code. When I added spark.stop() at the end of my code, the DAG gave me the error NameError: name 'spark' is not defined. Sorry if this is a dumb question.
careful-pilot-86309
What's the name of your SparkSession or SparkContext variable? Basically, you need to stop the Spark context gracefully.
You have an interesting case here. The actual Spark context creation and submission is being handled internally by Airflow, and it seems like it is not stopping the context gracefully.
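When another layer owns the session and you don't have the variable in scope, one option is to look up the active session and stop that. A sketch, assuming Spark >= 3.0 for SparkSession.getActiveSession():

```python
from pyspark.sql import SparkSession

# Stop whatever session is active, without knowing the variable name
# the surrounding code used to create it.
session = SparkSession.getActiveSession()  # returns None if nothing is active
if session is None:
    # Fallback for older Spark versions: getOrCreate() returns the
    # existing session if one is already running.
    session = SparkSession.builder.getOrCreate()
session.stop()
```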
k
Indeed! Without the Spark config integrated, the job finishes gracefully and succeeds, but with it, the application-end event is never fired at the end.
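A sketch of one way to keep the DataHub config and still end cleanly: create and stop the session inside the Airflow task's own Python callable, so the listener always receives the application-end event. The DAG wiring, app name, and input path here are placeholders:

```python
from pyspark.sql import SparkSession

def run_spark_job():
    # Own the session lifecycle inside the task callable so spark.stop()
    # always runs, even when the job itself fails.
    spark = (
        SparkSession.builder
        .appName("airflow-spark-lineage-job")  # placeholder
        .getOrCreate()
    )
    try:
        df = spark.read.parquet("s3://bucket/input/")  # placeholder input
        print(df.count())
    finally:
        # Guarantees the DatahubSparkListener sees the application end,
        # so lineage is flushed and the job does not hang.
        spark.stop()

# Wire it up inside your DAG definition, e.g.
# PythonOperator(task_id="spark_job", python_callable=run_spark_job)
```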