# troubleshooting
s
I'm having trouble with my tiered storage configuration when moving segments from one server to another. Here's the error message I get:
Segment fetcher is not configured for protocol: http, using default
Download and move segment immutable_events__0__0__20220527T1606Z from peer with scheme http failed.
java.lang.IllegalArgumentException: The input uri list is null or empty
m
What tiered storage are you using? And does it support http access (as seen in the log)?
s
For this test I'm trying to move segments older than 5 minutes from one local server to another, so my local dev environment. But it mimics our production plan: keep a certain amount of data (7d or 30d) on a more expensive server and move older data to a cheaper server. It looks like it mostly works, and the segment does think it's on the cheaper server, but it's in a bad state and can't be queried, I think b/c of the error above.
I see this in the docs:
"As long as your remote jobs send the Pinot controller the corresponding URI to the files, it will pick up the file and allocate it to the proper Pinot servers and brokers"
I'm running the rebalancer when I get this error though, is that considered a remote job?
I followed the steps here
Here's the relevant portion of my table config
{
  "tableName": "immutable_events",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "event_timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "immutable_events",
    "replicasPerPartition": "1",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": 90,
    "peerSegmentDownloadScheme": "http"
  },
  "routing": {
    "instanceSelectorType": "strictReplicaGroup"
  },
  "tenants": {
    "server": "Immutable"
  },
  "tierConfigs": [
    {
      "name": "tierA",
      "segmentSelectorType": "time",
      "segmentAge": "5m",
      "storageType": "pinot_server",
      "serverTag": "Immutable_OFFLINE"
    }
  ],
m
s
Yep, that's the doc I patterned after
But got the error above when running the rebalance
m
I am not sure why you need to run rebalance explicitly
s
It was so I could test that the segment moves properly
Instead of waiting for the job to run
b/c according to the docs running the rebalancer manually accomplishes the same thing as the job
This area: "Under the hood, this job runs a rebalance. So you can achieve the same effect as a manual trigger by running a rebalance"
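For reference, here's roughly how I've been triggering it (a sketch of the call, assuming $PINOT_URL points at the controller like in my tag script later in this thread):
# Hypothetical invocation: manually trigger a rebalance of the REALTIME table
# so the tier-based segment moves happen now instead of waiting for the job
curl -X POST "$PINOT_URL/tables/immutable_events/rebalance?type=realtime&dryRun=false" \
  -H "accept: application/json"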
m
I see, so you are trying to exercise the rebalance path for manual testing
s
correct
m
What's the deep store configured as?
s
Just to watch the tiered storage happen. I set my data age to something really low (5m) and my consuming segment threshold to 10k records. Then I ingest 11k records to fill my consuming segment, make sure the first segment is > 5m old, and run the rebalancer, hoping to see the older segment get pushed to my cheaper server.
It's all local, so it's using whatever the default is for deep store; in our GKE env it's GCS
I'm trying to get it working locally first
my local docker environment
m
How many controllers do you have?
s
1
m
All components running on the same JVM?
s
all of the jvm stuff is defaulted
I can triple confirm that though if that could be causing my issue
m
I am guessing that a QuickStart-like setup may be short-circuiting the deep store, and hence rebalance might not be working
s
Do you have a link to sample deepstore configuration?
m
Can you query the segment metadata using Swagger?
It should give a download URL
s
sure, let me skaffold back up and try that
m
If that is empty then my theory is correct
s
So this is before the rebalance, when I can query the table and the two segments are healthy:
{
  "immutable_events__0__1__20220528T1550Z": {
    "segmentName": "immutable_events__0__1__20220528T1550Z",
    "schemaName": "immutable_events",
    "crc": -9223372036854776000,
    "creationTimeMillis": 1653753057230,
    "creationTimeReadable": "2022-05-28T15:50:57:230 UTC",
    "timeColumn": null,
    "timeUnit": null,
    "timeGranularitySec": null,
    "startTimeMillis": null,
    "startTimeReadable": null,
    "endTimeMillis": null,
    "endTimeReadable": null,
    "segmentVersion": null,
    "creatorName": null,
    "totalDocs": 0,
    "custom": {},
    "indexes": {},
    "star-tree-index": null
  },
  "immutable_events__0__0__20220528T1546Z": {
    "segmentName": "immutable_events__0__0__20220528T1546Z",
    "schemaName": null,
    "crc": 2160189803,
    "creationTimeMillis": 1653753056975,
    "creationTimeReadable": "2022-05-28T15:50:56:975 UTC",
    "timeColumn": "event_timestamp",
    "timeUnit": "MILLISECONDS",
    "timeGranularitySec": 0,
    "startTimeMillis": 1653750000000,
    "startTimeReadable": "2022-05-28T15:00:00.000Z",
    "endTimeMillis": 1653750000000,
    "endTimeReadable": "2022-05-28T15:00:00.000Z",
    "segmentVersion": "v3",
    "creatorName": null,
    "totalDocs": 10000,
    "custom": {},
    "columns": [],
    "indexes": {},
    "star-tree-index": null
  }
}
I don't see a URI attribute there though
First question: is that the metadata you were looking for? And second: are you thinking the download URI will show up after I run the rebalance?
oh but looking in the UI for the segment I see this
{
  "segment.crc": "2160189803",
  "segment.creation.time": "1653752792603",
  "segment.download.url": "",
  "segment.end.time": "1653750000000",
  "segment.flush.threshold.size": "10000",
  "segment.index.version": "v3",
  "segment.realtime.endOffset": "10000",
  "segment.realtime.numReplicas": "1",
  "segment.realtime.startOffset": "0",
  "segment.realtime.status": "DONE",
  "segment.start.time": "1653750000000",
  "segment.time.unit": "MILLISECONDS",
  "segment.total.docs": "10000"
}
so def empty download URL
That was for the segment that is not in consuming state
And here's the metadata for the one that's still consuming
{
  "segment.creation.time": "1653753057230",
  "segment.flush.threshold.size": "10000",
  "segment.realtime.numReplicas": "1",
  "segment.realtime.startOffset": "10000",
  "segment.realtime.status": "IN_PROGRESS"
}
So for local, how do I configure things so that the download URL is not empty?
I'm guessing all of this would work fine in our GKE env; I'm just struggling b/c we don't have a deep store configured for local dev. But we'd like to be able to run and test locally before pushing to GKE
m
Oh, these are IN_PROGRESS segments, i.e. they have not been committed yet
s
The last one is in progress, the first one is done
the one that is done is the one the rebalance tries to move to my other server, which is exactly what I want, but fails due to empty URL
m
For the one that is done, can you check the endpoint
/tables/<tableName>/segments/<segmentName>
That should show the
segment.download.url
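Something like this, a sketch assuming the controller is reachable on localhost:9000 (table and segment names taken from your messages above):
# Hypothetical check: fetch the segment metadata from the controller REST API
curl -s "http://localhost:9000/tables/immutable_events/segments/immutable_events__0__0__20220528T1546Z" \
  -H "accept: application/json"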
s
Yep, so the done one shows this: "segment.download.url": "",
as you speculated
m
Yeah, so that's why rebalance won't work.
Are you using QuickStart?
If so, it might be short-circuiting the deep store by making it and the server data dir the same location (to make it fast).
s
not using QuickStart; we're using Skaffold, and we have a values.local.yaml to replace certain settings
m
What's the value of the data dir on the controller and the server?
s
So maybe the answer is to configure the deepstore for the local use case?
controller.data.dir=/var/pinot/controller/data
for the controller
m
Hmm, does that contain the segment that is DONE?
s
And here's the server config
pinot.server.instance.dataDir=/var/pinot/server/data/index
m
You don’t need to configure deep-store for this experiment, btw
s
ok good
m
Check these data dirs to see if both contain the DONE segment
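Something like this, a sketch using the dataDir values you quoted; the per-table subdirectory layout is my assumption and may vary by version:
# Hypothetical checks, run inside the controller and server pods respectively
ls /var/pinot/controller/data/immutable_events
ls /var/pinot/server/data/index/immutable_events_REALTIME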
s
The done segment is sitting on a server in an ONLINE state now b/c I haven't run the rebalance, so it's currently sitting here:
Server_ica-pinot-server-realtime-immutable-0.ica-pinot-server-realtime-immutable-headless.default.svc.cluster.local_8098
m
I meant inside the dataDir
s
ah let me check
m
on controller as well as server
s
I don't see either of the segments in the controller right now
checking server
m
Ok, so the segment never made it to the controller; that's why it cannot be downloaded by the other server.
But that is the confusing part to me: for a segment to be in the DONE state, it has to be committed to the controller dir (deep store or not)
s
I see the done segment in the server
So the consuming segment is not in the data dir on the server, but the committed/done one is persisted there
m
My guess is that the local setup has some quirks to reduce unnecessary copies of data, and that might be responsible here. I am pretty sure it will work in a non-local setup, deep store or not (the latter only with a single controller)
s
So this segment is the one that is persisted
{
  "segment.crc": "2160189803",
  "segment.creation.time": "1653752792603",
  "segment.download.url": "",
  "segment.end.time": "1653750000000",
  "segment.flush.threshold.size": "10000",
  "segment.index.version": "v3",
  "segment.realtime.endOffset": "10000",
  "segment.realtime.numReplicas": "1",
  "segment.realtime.startOffset": "0",
  "segment.realtime.status": "DONE",
  "segment.start.time": "1653750000000",
  "segment.time.unit": "MILLISECONDS",
  "segment.total.docs": "10000"
}
I think you're right
It makes sense, but I'd love to try to get this sorted locally so I don't have to go through our build pipeline to verify things
So the flow is that done segments will move to the controller data dir first?
And then move to their final destination w/ the rebalance?
just curious about the flow
Oh and I do see the consuming segment in the server as well in the consumers folder
I'm going to run the rebalance to see what happens
OK, so after the rebalance the segment doesn't seem to be on the controller, the first server it was on, or the server I want it to go to
I'll keep battling it, I have to go be dad for a while and I'm sure you have Saturday things going on too. Thanks for your help, it gives me a path!
n
Afaik, there are no local setup optimizations to avoid copying to deep store. One question regarding your table config: why have you set the peer download scheme? That setting makes it bypass the deep store, I think
m
Ah, I didn't notice that; that would also explain this behavior
s
That's a great question. Another team member configured the table; I just grabbed it and started on the tiered portion. I researched it though, and it looked like it could be either http or https. Are you saying we should remove it?
n
Yes try without it
If it still doesn't work, I'll try on my end with QuickStart
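If you want to try it without a full redeploy, something like this might work (a sketch; assumes $PINOT_URL points at your controller and that jq is installed):
# Hypothetical: fetch the realtime table config, drop peerSegmentDownloadScheme, and PUT it back
curl -s "$PINOT_URL/tables/immutable_events?type=realtime" \
  | jq '.REALTIME | del(.segmentsConfig.peerSegmentDownloadScheme)' \
  > /tmp/immutable_events.json
curl -X PUT "$PINOT_URL/tables/immutable_events" \
  -H "Content-Type: application/json" \
  -d @/tmp/immutable_events.json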
s
OK will do
May take me some time, need to knock a few other things out, but I'll keep this thread posted on that one. Thanks all!
👍 2
Oh just so you can see my tags, here's how I added them
echo "adding offline server tags:"
curl -X PUT "$PINOT_URL/instances/Server_ica-pinot-server-offline-0.ica-pinot-server-offline-headless.default.svc.cluster.local_8098/updateTags?tags=DefaultTenant_OFFLINE%2CDefaultTenant_REALTIME%2CImmutable_OFFLINE&updateBrokerResource=true" -H "accept: application/json" -o /dev/null -s
echo "adding realtime mutable server tags:"
curl -X PUT "$PINOT_URL/instances/Server_ica-pinot-server-realtime-mutable-0.ica-pinot-server-realtime-mutable-headless.default.svc.cluster.local_8098/updateTags?tags=DefaultTenant_OFFLINE%2CDefaultTenant_REALTIME%2CMutable_REALTIME&updateBrokerResource=true" -H "accept: application/json" -o /dev/null -s
echo "adding realtime immutable server tags:"
curl -X PUT "$PINOT_URL/instances/Server_ica-pinot-server-realtime-immutable-0.ica-pinot-server-realtime-immutable-headless.default.svc.cluster.local_8098/updateTags?tags=DefaultTenant_OFFLINE%2CDefaultTenant_REALTIME%2CImmutable_REALTIME&updateBrokerResource=true" -H "accept: application/json" -o /dev/null -s
n
Looks like all your servers are tagged with all 3 tags (default OFFLINE, default Realtime, immutable OFFLINE)? Would be better to have them mutually exclusive.
Oh wait, first command has Mutable, second has Immutable. Looks right
s
I was wondering if I needed to keep the default tags on there; I leaned toward keeping them, as you can see.
We have 3 types of servers in our configuration: an offline one (cheaper), a realtime immutable one (super beefy), and a realtime mutable one (pretty beefy)
Same error
Download and move segment immutable_events__0__0__20220528T1640Z from peer with scheme http failed.
The entire log:
Failure in getting online servers for segment immutable_events__0__0__20220528T1640Z
org.apache.pinot.spi.utils.retry.RetriableOperationException: java.lang.NullPointerException
    at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:58) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.util.PeerServerSegmentFinder.getPeerServerURIs(PeerServerSegmentFinder.java:77) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadSegmentFromPeer(RealtimeTableDataManager.java:501) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadAndReplaceSegment(RealtimeTableDataManager.java:445) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:337) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:170) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.lang.NullPointerException
    at org.apache.pinot.core.util.PeerServerSegmentFinder.getOnlineServersFromExternalView(PeerServerSegmentFinder.java:98) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.util.PeerServerSegmentFinder.lambda$getPeerServerURIs$0(PeerServerSegmentFinder.java:78) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.spi.utils.retry.BaseRetryPolicy.attempt(BaseRetryPolicy.java:50) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    ... 18 more
Segment fetcher is not configured for protocol: http, using default
Download and move segment immutable_events__0__0__20220528T1640Z from peer with scheme http failed.
java.lang.IllegalArgumentException: The input uri list is null or empty
    at org.apache.pinot.common.utils.fetcher.BaseSegmentFetcher.fetchSegmentToLocal(BaseSegmentFetcher.java:88) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadSegmentFromPeer(RealtimeTableDataManager.java:503) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.downloadAndReplaceSegment(RealtimeTableDataManager.java:445) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:337) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:170) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:164) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
    at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:829) [?:?]
n
It still says failed to download from peer. It shouldn't say "peer" in a regular setup
s
Interesting ok I'll try to troubleshoot that one
Any direction on where I've set up my local env incorrectly, making it think it needs to download from a peer, is appreciated
Is it just this table config setting, peerSegmentDownloadScheme?
I'm going to tear down and re-skaffold my local just to triple confirm that table config is gone and get new logs
s
omg it finally worked
I tore it all down and triple confirmed all the settings for decoupling were correct and it worked. Thanks all!
m
Noice fistbump
s
The grind is why we love to do this I keep telling myself 🙂
Here's my download url now
So all good
m
Glad you were able to get it going. Sorry I missed the peer download setting earlier.
s
All good I really really appreciate both of you helping me out today!
❤️ 2
m
Hopefully this made you more knowledgeable about the inner workings of Pinot