# troubleshoot
g
I have one more interesting issue that I’m trying to resolve now. When I added great_expectations action to ingest validation data to DataHub, my task failed with memory leak.
```yaml
- name: datahub_action
  action:
    module_name: datahub.integrations.great_expectations.action
    class_name: DataHubValidationAction
    server_url: http://datahub-gms:8080
```
Do you have any idea what can be the problem?
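(For context when reproducing: this action block normally sits under a checkpoint's `action_list`, next to the default Great Expectations actions. A minimal sketch of the surrounding config — the `store_validation_result` entry is the standard GE default, and the rest matches the snippet above:)

```yaml
action_list:
  # default Great Expectations action: persist the validation result
  - name: store_validation_result
    action:
      class_name: StoreValidationResultAction
  # DataHub action that pushes validation results to datahub-gms
  - name: datahub_action
    action:
      module_name: datahub.integrations.great_expectations.action
      class_name: DataHubValidationAction
      server_url: http://datahub-gms:8080
```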
i
Could you share the logs?
g
```
[2022-03-30, 11:20:37 UTC] {great_expectations.py:80} INFO - Running validation with Great Expectations...
[2022-03-30, 11:20:37 UTC] {great_expectations.py:83} INFO - Ensuring data context is valid...
[2022-03-30, 11:20:37 UTC] {data_context.py:620} INFO - Usage statistics is disabled; skipping initialization.
[2022-03-30, 11:20:58 UTC] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL
[2022-03-30, 11:20:58 UTC] {taskinstance.py:1280} INFO - Marking task as FAILED. dag_id=dwh_process_dim_tables, task_id=ge_dim_truck, execution_date=20220329T000000, start_date=20220330T112036, end_date=20220330T112058
[2022-03-30, 11:20:58 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
```
Nothing special in the logs. We are running this task in a Docker container.
i
How much memory is the Docker container configured to have? I’m trying to understand how you identified the issue as a memory leak.
g
As I understand from the documentation, this error means insufficient resources:
```
Task exited with return code Negsignal.SIGKILL
```
Then I monitored how much memory the task uses and saw it quickly increase to the limit. When I removed the DataHub action, the task completed with a peak of about 500 MB of memory.
i
How much memory was the container using when it still failed? 10 GB?
g
When I removed the memory limits from my docker-compose file, the task grew to about 10 GB and still failed.
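(For anyone reproducing this: a memory cap in Compose is set per service, so the "limits removed" run simply omits it. A minimal sketch — the service name here is hypothetical:)

```yaml
# docker-compose fragment: cap the Airflow worker at 10 GB
# (remove mem_limit to let the container use all host memory)
services:
  airflow-worker:      # hypothetical service name
    mem_limit: 10g
```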
i
I see. Could you please open an issue in https://github.com/datahub-project/datahub/issues so that we can track this issue? Please add as much information as possible. If you can define a reproducible test case for this it would be perfect!
If you could also include the action config, the great_expectations code you’re using, and the characteristics of the data being processed, that would be awesome.
g