# ingestion
c
Hi team, I tried S3 data ingestion using the Custom ingestion source option. Ingestion failed with profiling enabled (true) with the error below. JAVA_HOME and SPARK_VERSION are set. We have deployed DataHub on an EC2 instance with Docker. Please help.
c
@cool-vr-73109 DataHub uses Spark to do profiling, so Spark needs to be set up and running on your machine.
c
Yes, we have installed Spark (with Hadoop) and Java on our machine.
All paths like JAVA_HOME, SPARK_HOME and SPARK_VERSION are set.
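For reference, a minimal sketch of the kind of environment setup being described; the install paths and Spark version below are placeholder assumptions, not our actual values:

```shell
# Hypothetical locations -- adjust to where Java and Spark are actually installed.
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export SPARK_HOME=/opt/spark
export SPARK_VERSION=3.3.0
# Make the java and spark binaries visible on PATH.
export PATH="$JAVA_HOME/bin:$SPARK_HOME/bin:$PATH"
```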
c
Are you using a different environment?
c
No, we set these paths in /etc/profile as the root user.
Do we need to set these variables per container?
c
If you are running the recipe from a container, then yes.
c
No, we are running from the UI.
c
Oh, got it. You are using managed ingestion.
In this case,
you need to do the setup on the server, not on your local machine.
c
Ok, I will try ingestion from the server. If these paths are container-specific, could you please let me know which container?
c
Managed ingestion happens on the server itself.
c
Is setting the paths in /etc/profile enough?
c
You can try. Let me check and get back.
@cool-vr-73109 Two options: 1. Run the recipe using the CLI, so that you have control over the environment. 2. Do the setup in the following container: acryldata/datahub-actions
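As a sketch of option 1, a minimal S3 recipe with profiling enabled might look like the following; the bucket path, pattern, and server address are placeholder assumptions:

```yaml
# Hypothetical recipe sketch -- bucket path and server are assumptions.
source:
  type: s3
  config:
    path_specs:
      - include: "s3://my-bucket/data/*.parquet"
    profiling:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"
```

Running it via `datahub ingest -c recipe.yaml` means JAVA_HOME, SPARK_HOME, and SPARK_VERSION are taken from your own shell environment rather than the actions container.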
c
Thanks @careful-pilot-86309, we tried CLI ingestion and, after a couple of Python setup steps, S3 ingestion with profiling enabled is working fine.
c
Glad to hear. 👍
c
@careful-pilot-86309 Is it possible to delete metadata from DataHub? I tried the datahub hard delete command with a URN and got a Java UnsupportedOperationException saying only the upsert operation is supported. How can I delete with this command?
c
Have you tried the CLI delete? There are a bunch of options available.
c
Yes, I tried `datahub delete --urn "<urn>" --hard`.
Then I got the exception below. Can you suggest any other option?
c
Can you try with --soft?
It will just keep a few entries in the DB,
so not much of an overhead.
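For anyone following along, a sketch of the two delete variants being discussed; the URN below is purely hypothetical and must be replaced with your dataset's actual URN:

```shell
# Hypothetical URN -- substitute the real URN of your dataset.
URN='urn:li:dataset:(urn:li:dataPlatform:s3,my_bucket/my_table,PROD)'

# Soft delete: marks the entity as removed but keeps a few rows in the DB:
#   datahub delete --urn "$URN" --soft
# Hard delete: removes the metadata rows entirely (the variant that errored above):
#   datahub delete --urn "$URN" --hard
echo "target urn: $URN"
```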
c
Hi @careful-pilot-86309, thanks for your continuous support. I tried the soft delete and got a success message, but the dataset was still visible in the UI. So I tried the delete command for the whole s3 platform, and the s3 platform got deleted from the UI. The problem now is that new ingestions succeed, but the s3 platform is not visible in the UI. Is there any option to bring the s3 platform back in the UI?
c
Can you check with the CLI that your new entities are present in GMS? Also try refreshing the cache and checking the UI again.
w
Hi there! I got the same error when deleting a container using the CLI `delete` command:
Failed to execute operation
java.lang.UnsupportedOperationException: Only upsert operation is supported
Do you know a working solution for this case?
c
It's better to do a factory reset and restart DataHub,
then start ingestion from a fresh state.
w
Hm, that's strange. But thanks for the reply! First I'll try just restarting DataHub and hope that helps.