# troubleshooting
t
I am trying to regenerate a table. I have deleted it, but cannot regenerate it; the error states to try deleting the table to remove all metadata associated with it.
- What metadata do I need to manually remove for this to work?
Schrödinger’s table: it both exists and doesn’t exist at the same time
m
What version of Pinot are you using? @Tim Santos is this the same issue you fixed a while back?
t
tag: release-0.11.0
t
I notice that you have both an offline and a realtime table. Just wondering if you tried deleting both of them first?
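For reference, deleting both table types can also be done through the controller REST API; a minimal sketch, where the controller address is an assumption for this cluster:
# Controller address is an assumption; adjust for your deployment.
curl -X DELETE "http://localhost:9000/tables/uplinkpayloadevent?type=realtime"
curl -X DELETE "http://localhost:9000/tables/uplinkpayloadevent?type=offline"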
t
yes, that is what I did
it seems to leave stuff behind in zk (store>config>table)
I fully cleared it out and the automated AddTable CLI command returned 400: table already exists. However, it did not exist (at least in any functional way). I had to manually re-add the tables in the UI, where only the offline table shows up in the table/ returns (except for the broker, which is the only place the existence of the realtime table is acknowledged)
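For context, the 400 comes from the AddTableCommand (as a later log line shows); a rough sketch of the underlying admin invocation, where the file paths and controller host are assumptions:
# File paths and controller host are assumptions for illustration.
bin/pinot-admin.sh AddTable \
  -tableConfigFile /config/uplinkpayloadevent_REALTIME.json \
  -schemaFile /config/uplinkpayloadevent_schema.json \
  -controllerHost pinot-controller \
  -controllerPort 9000 \
  -exec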
m
Is this repeatable? If so, @Tim Santos let’s capture the workflow that ends up in this state.
t
I have deleted and recreated these tables 3 times with the same results
m
Ok, so sounds like repeatable. Could you describe the steps on how you create/delete?
And we can try to reproduce on our side
t
I see the table config is defined but the ideal state is null for the REALTIME table
The steps were the following:
1. Create Realtime Table
2. Ingest data via Kafka
3. Create Offline Table
4. Realtime-to-Offline task runs
5. Batch ingest data with overlapping timestamp values
6. Realtime-to-Offline task runs (every day)
7. Some time passes, data is working fine
8. Have some malformed records that can't be deleted - need to regenerate the table
9. Delete Realtime Table in the UI
10. Delete Offline Table in the UI
11. Delete the pinot controller folder for that table in S3
12. ZK still has a record of the Realtime table in the config
13. Manually delete the ZK record of the realtime config
14. Run the automated AddTable k8s job: both failed with Error 400: table already exists. Not true - querying for the tables returned no results.
15. Manually create the REALTIME and OFFLINE tables in the UI
The Realtime table returns
"error": "Failed to get status (ingestion status) for table uplinkpayloadevent. Reason: Specified table name: uplinkpayloadevent of type: realtime does not exist."
Offline table returns
"Cannot retrieve ingestion status for Table : uplinkpayloadevent_OFFLINE since it does not use the built-in SegmentGenerationAndPushTask task"
Broker returns:
http://localhost:56954/v2/brokers/tables/uplinkpayloadevent?type=REALTIME

[
  {
    "port": 8099,
    "host": "pinot-broker-0.pinot-broker-headless.datalake.svc.cluster.local",
    "instanceName": "Broker_pinot-broker-0.pinot-broker-headless.datalake.svc.cluster.local_8099"
  }
]
I can’t delete the uplinkpayload REALTIME table because it doesn’t exist and I can’t create the uplinkpayload OFFLINE table because it does exist.
If I remove the ZK table config, I can create the table again, but it is in the same state. I also tried restarting the whole cluster, to no avail.
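To confirm the earlier observation that the table config exists while the ideal state is null, the controller exposes both views; a hedged sketch, assuming the controller address and that these endpoints are available in this version:
# Controller address is an assumption.
curl "http://localhost:9000/tables/uplinkpayloadevent/idealstate?tableType=realtime"
curl "http://localhost:9000/tables/uplinkpayloadevent/externalview?tableType=realtime"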
t
Do you see any un-deleted segments for the realtime table? Can you try hitting the /segments API?
@Rong R could this be related to segments not being instantly deleted when controller.deleted.segments.retentionInDays is not set to 0?
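For reference, a minimal sketch of hitting the /segments API mentioned above, plus the controller property in question; the controller address and the example value are assumptions:
# Controller address is an assumption for this cluster.
curl "http://localhost:9000/segments/uplinkpayloadevent"   # lists remaining segments for both table types
# Cluster-wide controller setting referenced above; it delays permanent deletion of removed segments:
# controller.deleted.segments.retentionInDays=7   # example value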
r
No, it doesn't matter; that only affects cleanup.
t
@Thomas Steinholz did you see anything in the logs when trying to recreate the realtime table after cleaning up from ZK?
If there is any issue with the realtime consumption, the expected behavior is to clean up the table + metadata. But wondering if the automatic cleanup is not happening correctly.
t
I do indeed have segments
I manually deleted the controller-data folder for that whole table, so the only place in S3 it could be seeing those is the Deleted_Segments folder
yes, those segments are exactly the pinot/controller-data/Deleted_Segments/uplinkpayloadevent S3 dir
r
The Deleted_Segments folder is indeed controlled by the retention setting Tim mentioned above
t
so, I should remove them as well as the ZK config and try to create the table again?
r
Hmm. But you should not have anything left in ZK related to these deleted segments, though
t
I did not always have the retention configured, but I recently updated it to 70 days to reduce some of the data we have there
I think that is true, I was talking about the propertystore>config>table>realtime table - since I have no other way of deleting this table at the moment
r
There's 2 "retention" settings. One is how long files are kept in the deleted_segment folder before permenantly deleted. One is for keeping realtime segments from being deleted.
Yeah in that case cleaning it up and restart from fresh should work
Unfortunately, deleted segments encode the retention setting at the time of deletion, so changing your cluster config afterwards will not retroactively remove those in the deleted_segment folder. You have to manually clean them up
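A sketch of that manual cleanup with the AWS CLI, where the bucket name is an assumption and the key prefix follows the directory quoted above:
# Bucket name is an assumption; the prefix follows the Deleted_Segments dir mentioned above.
aws s3 rm --recursive "s3://<bucket>/pinot/controller-data/Deleted_Segments/uplinkpayloadevent/"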
t
okay, that is good to know - I had a feeling
The difference between the two retention settings is one is on the realtime table and the other is on the offline table, right?
r
One is for deleting a realtime segment (moving it into the deleted_segment folder); the other is for permanently deleting files (e.g. removing them from the deleted_segment folder for good)
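Put concretely, the two settings live in different places; a sketch with example values, not this cluster's actual values:
# Per-table retention, in the table config's segmentsConfig section:
#   controls how long segments stay in the table before being deleted (moved to Deleted_Segments)
#   "retentionTimeUnit": "DAYS", "retentionTimeValue": "70"
# Cluster-wide controller setting:
#   controls how long files stay in the Deleted_Segments folder before being permanently removed
#   controller.deleted.segments.retentionInDays=7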
t
Deleting the S3 Deleted_Segments folder is not enough for the table to stop existing; there are still ZK records of the old segments as well
2022/11/05 15:21:18.660 INFO [AddTableCommand] [main] {"code":400,"error":"TableConfigs: uplinkpayloadevent already exists. Use PUT to update existing config"}
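The 400 above mentions TableConfigs, which is the combined config resource rather than the per-type /tables endpoints; a hedged sketch for checking and removing it, assuming the controller address and that this endpoint behaves this way in 0.11.0:
# Controller address is an assumption.
curl "http://localhost:9000/tableConfigs/uplinkpayloadevent"             # see what the controller still has registered
curl -X DELETE "http://localhost:9000/tableConfigs/uplinkpayloadevent"   # removes the combined (schema + offline + realtime) config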
r
which version of pinot are you running?
t
tag: release-0.11.0
Is there a way for me to delete the thousands of orphaned ZK nodes corresponding to the pinot segments?
r
hmm. that table deletion race condition bug should be fixed by 0.11
unfortunately, using a _2 suffix might be easier.
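If the manual route is still needed, the orphaned segment metadata lives under the Helix property store in ZK; a rough sketch, where the ZK address and the cluster name ("pinot") are assumptions and the path should be verified before deleting anything:
# ZK address and cluster name are assumptions; verify with ls before deleting.
zkCli.sh -server zookeeper.datalake.svc.cluster.local:2181
# inside the ZK shell (deleteall needs ZooKeeper 3.5+; older CLIs use rmr):
ls /pinot/PROPERTYSTORE/SEGMENTS/uplinkpayloadevent_REALTIME
deleteall /pinot/PROPERTYSTORE/SEGMENTS/uplinkpayloadevent_REALTIME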
t
Is it possible that leaving the delete tab in the UI early would cause this incomplete transaction with the pinot DDL?
r
Shouldn't, since the API is async.
t
is there a pod that would have logs if something ran into an error? I could check and see if there’s anything there
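On the logs question: table creation/deletion is handled by the controller, so its pod is the first place to look; a sketch where the pod name is an assumption and the namespace is taken from the broker hostname above:
# Pod name is an assumption; namespace "datalake" is taken from the broker hostname above.
kubectl logs pinot-controller-0 -n datalake | grep -i uplinkpayloadevent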
yeah, I think I will have to use the _2 method
I tried with _v1 and it doesn't show up in the UI (like the original table). I tried -v1, which broke the Kafka subscription part, and that ended me up with just v1, giving us uplinkpayloadeventv1
2022/11/07 21:12:31.301 INFO [AddTableCommand] [main] {"code":500,"error":"org.apache.pinot.shaded.org.apache.kafka.common.KafkaException: Failed to construct kafka consumer"}
r
Hmm, I am not sure I understand the question. Even if the table name is different, your stream config should still use the same Kafka topic name, right?
t
The error actually ended up being
Caused by: org.apache.pinot.shaded.org.apache.kafka.common.config.ConfigException: No resolvable bootstrap urls given in bootstrap.servers
It seems like I have an unrelated issue with coredns after updating EKS.
It wasn’t clear from the app-layer message why the 500 error occurred, so I assumed it was because of a special character I added - but that seems to be incorrect
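For reference, the consumer's bootstrap servers come from the realtime table's streamConfigs; a minimal sketch where the topic name and broker address are assumptions:
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "uplinkpayloadevent",
  "stream.kafka.broker.list": "kafka.datalake.svc.cluster.local:9092"
}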