# ingestion
c
Hello. I’ve deployed DataHub in our company’s k8s cluster, and it runs in a private network. I ingested our Athena schemas and tables with the DataHub CLI, using the “datahub ingest -c athena.yml” command with the datahub-rest sink. Now I want to roll back, but even when I get the runId from “datahub ingest list-runs” and pass it to “rollback”, it says:
No entities touched by this run. Double check your run id?
rolling back deletes the entities created by a run and reverts the updated aspects
this rollback deleted 0 entities and rolled back 0 aspects
showing first 0 of 0 aspects reverted by this run
+-------+---------------+--------------+
| urn   | aspect name   | created at   |
+=======+===============+==============+
+-------+---------------+--------------+
I know the runId is correct because I used it with the “show” command and clearly saw all the tables I ingested (71 tables). How do I resolve this issue? Thanks in advance!
m
This is quite surprising. Are you sure there isn’t a typo from copy-pasting the run id?
@curved-jordan-15657 There might be some clues in the gms logs if the run id is valid. Do you know how to check them?
c
@mammoth-bear-12532 I’m pretty sure there is no typo. As I mentioned, I can use that runId with the “show” command.
Is there a command I can run to get the gms logs? My teammate did the k8s deployment.
Btw, it works perfectly with my local docker containers. To reach our server, I changed “localhost:8080” to “our-gms-endpoint:8080” in the .datahubenv file.
m
@green-football-43791 might be able to help you debug.
c
@green-football-43791 that would be great.
g
Hey @curved-jordan-15657! One thing to check first- have you ingested athena data multiple times or just once?
c
I ingested multiple times while I was testing something.
g
I see- and what aspects are listed when you run “show --run-id” with your Athena run id?
c
I see all the rows I ingested.
g
In the aspect column, do you see aspects other than DatasetKey aspect? E.g. SchemaMetadata, DatasetInfo, anything like that?
c
All 71 rows are datasetKey.
g
Ok- I think I know what's happening.
When we released the deletes feature, we also started creating a Key aspect for each entity when we see it for the first time.
This way, when you roll back a Key aspect, DataHub knows to delete the whole entity.
However, there was one tricky case: entities that had already been ingested and did not have keys.
Here, we still created a Key aspect. However, we didn't want deleting this key to delete the whole entity, since the entity was in fact created by an earlier run id.
You are seeing this because your Athena datasets were created before the delete API was released. Then, you ingested more data after the delete API was released. There is no one-command way to delete these aspects. You can delete them by rolling back each Athena ingestion you've run, and then finally rolling back the “no-run-id-provided” run, which will roll back all pre-deletes data.
Sorry for the trouble- deleting data ingested before the delete API was released is unfortunately somewhat tricky. The good news is that data you ingest from here on out will be easy to delete 🙂
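The rollback order described above could be sketched roughly as follows. This is only a planning helper, not part of DataHub: the function name `rollback_plan` and the example run ids are hypothetical, and it assumes the CLI form `datahub ingest rollback --run-id <run-id>`. It only builds and prints the commands, so nothing is deleted until you run them yourself.

```python
import shlex

# Run id named in the message above for data ingested before the delete API existed.
LEGACY_RUN_ID = "no-run-id-provided"


def rollback_plan(run_ids):
    """Return rollback commands: every given run first, the legacy run last."""
    # Drop the legacy id if the caller passed it, so it appears exactly once, at the end.
    ordered = [run_id for run_id in run_ids if run_id != LEGACY_RUN_ID]
    ordered.append(LEGACY_RUN_ID)
    return [["datahub", "ingest", "rollback", "--run-id", run_id] for run_id in ordered]


if __name__ == "__main__":
    # Hypothetical run ids; the real ones come from `datahub ingest list-runs`.
    example_runs = ["athena-run-2", "athena-run-1"]
    for command in rollback_plan(example_runs):
        # Print instead of executing, so each step can be reviewed first.
        print(shlex.join(command))
```

Printing the plan rather than executing it is deliberate, since a rollback deletes entities and is worth checking with “show --run-id” before each step.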
c
Umm, but I guess something is wrong, because I ingested the Athena data yesterday. I assume the delete API was already available by then?
I’m doing a POC and I’m a pretty new user 😄
g
Hmm- interesting! Maybe we can take this to a Slack huddle to sort out more quickly?
b
yeah we really want to figure this one out ^^
@curved-jordan-15657 @green-football-43791 Were you able to get it resolved?
g
Not quite- Nareg had to hop off but there are a few things he's going to try next and we hope they will resolve the issue.
c
I guess my ingestion is somehow corrupted. There is nothing more to do with the CLI, but I will try accessing the MySQL db and clearing the data, if I’m right.
I believe we will solve the problem that way, and then I will try the ingest-rollback thing again 🙂
b
Okay got it. Thanks Nareg, please keep us posted!
c
Thank you all, i will!
I’ve resolved the issue. I connected to our remote MySQL db and dropped the tables. Then I copied over the tables from my local MySQL db, which contains the DataHub tables created by quickstart. At that point I checked the remote DataHub: I could see the tables, but they were empty. Lastly, I ingested Athena again, and done!
g
Nice!! Glad you got it working!
So after ingesting the data again, you were able to successfully rollback?