https://pinot.apache.org/ logo
l

Laxman Ch

12/23/2020, 1:49 PM
Hi, anyone facing issues with Segment purging with GCS as deep store.
We are facing the following issue
Copy code
2020/12/22 02:35:25.329 ERROR [SegmentDeletionManager] [pool-6-thread-1] Had trouble deleting directories: <gs://test-bucket-dev/pinot-data/Deleted_Segments>
java.io.IOException: java.io.IOException: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.listFiles(GcsPinotFS.java:322) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.controller.helix.core.SegmentDeletionManager.removeAgedDeletedSegments(SegmentDeletionManager.java:233) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.controller.helix.core.retention.RetentionManager.postprocess(RetentionManager.java:89) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask.postprocess(ControllerPeriodicTask.java:131) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask.processTables(ControllerPeriodicTask.java:97) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.controller.helix.core.periodictask.ControllerPeriodicTask.runTask(ControllerPeriodicTask.java:68) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.periodictask.BasePeriodicTask.run(BasePeriodicTask.java:120) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.periodictask.PeriodicTaskScheduler.lambda$start$0(PeriodicTaskScheduler.java:73) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:834) [?:?]
Caused by: java.io.IOException: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.getBase(GcsPinotFS.java:135) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.normalizeToDirectoryPrefix(GcsPinotFS.java:105) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.listFiles(GcsPinotFS.java:307) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	... 13 more
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
	at java.net.URI$Parser.fail(URI.java:2913) ~[?:?]
	at java.net.URI$Parser.failExpecting(URI.java:2919) ~[?:?]
	at java.net.URI$Parser.parse(URI.java:3119) ~[?:?]
	at java.net.URI.<init>(URI.java:685) ~[?:?]
	at java.net.URI.<init>(URI.java:786) ~[?:?]
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.getBase(GcsPinotFS.java:133) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.normalizeToDirectoryPrefix(GcsPinotFS.java:105) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.plugin.filesystem.GcsPinotFS.listFiles(GcsPinotFS.java:307) ~[pinot-gcs-0.6.0-shaded.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	... 13 more
Had gone through this codepath multiple times but I couldn’t see any issue with config or code.
d

Daniel Lavoie

12/23/2020, 1:53 PM
The exceptions complains of a malformed URI
Copy code
java.net.URISyntaxException: Expected scheme-specific part at index 5: file:
Can you share your non sensitive gcs configuration?
l

Laxman Ch

12/23/2020, 3:53 PM
Yeah. GCS path is also there in the above stacktrace.
Copy code
ERROR [SegmentDeletionManager] [pool-6-thread-1] Had trouble deleting directories: <gs://test-bucket-dev/pinot-data/Deleted_Segments>
Segments are getting uploaded up to gcs cleanly. Also, they are moved to Deleted_Segments too in gcs. But, they are not getting deleted from there after the retention period is over
controller config here (masked some data)
Copy code
root@pinot-controller-0:/opt/pinot# cat /var/pinot/controller/config/pinot-controller.conf
controller.helix.cluster.name=my-views
controller.port=9000
controller.zk.str=zookeeper.host.svc.cluster.local:2181/pinot
pinot.set.instance.id.to.hostname=true
controller.data.dir=<gs://test-bucket-dev/pinot-data>
controller.enable.split.commit=true
controller.local.temp.dir=/var/pinot/controller/data/temp
<http://pinot.controller.storage.factory.class.gs|pinot.controller.storage.factory.class.gs>=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=XXXXXXXXX
pinot.controller.storage.factory.gs.gcpKey=XXXXXXXX.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
x

Xiang Fu

12/23/2020, 4:16 PM
@Elon have you seen similar issue or any resolution is recommended?
e

Elon

12/23/2020, 5:16 PM
Checking, I think I know why, will update shortly
Working on a fix, will update shortly
thanks for catching this @Laxman Ch! - we are affected by this as well
x

Xiang Fu

12/23/2020, 7:43 PM
Thanks Elon!
l

Laxman Ch

12/24/2020, 1:17 AM
Thanks for the update @Elon. Curious to know where this bug is in the code. Also, is there workaround?
e

Elon

01/04/2021, 7:54 PM
Happy new year everyone! I was on break but will begin to work on this today. cc @Laxman Ch
x

Xiang Fu

01/05/2021, 2:22 AM
We had some findings @Laxman Ch may fill it up
message has been deleted
message has been deleted
so this tableNameURI should be
gs://
however they are actually using local file system
l

Laxman Ch

01/05/2021, 9:48 AM
@Elon: I did some debugging on this issue last week. As posted in above screenshots, the bugs is in the implementation of org.apache.pinot.plugin.filesystem.GcsPinotFS#listFiles. GCS implementation of listFiles API is returning relative path. Whereas, other FS implementations (S3, Local) are returning full path including scheme.
We need to fix GcsPinotFS.listFiles and also other we need to fix the callers of this API that are explicitly expecting a relative path.
e

Elon

01/05/2021, 5:21 PM
Yep, that's what I see to, working on the fix.
👍 1
Ok, I think I got it - only question: I have unit tests using our gcs file auth. Is there a projectid, bucket that we can use for pinot?
l

Laxman Ch

01/05/2021, 6:42 PM
I was also having the same question when I noticed there are no unit tests for GcsPinotFS.
e

Elon

01/05/2021, 6:43 PM
I have local unit tests but definitely cannot share our gcp key 🙂
l

Laxman Ch

01/05/2021, 6:43 PM
com.google.cloud.storage.contrib.nio.testing.LocalStorageHelper
💡 1
d

Daniel Lavoie

01/05/2021, 6:43 PM
I guess unit test should mock that. Integration tests however should have a chance to configure private gcp key and cover that integration scenario.
l

Laxman Ch

01/05/2021, 6:43 PM
Check this once.
e

Elon

01/05/2021, 6:44 PM
iirc that one doesn't allow for all the fs operations
nice find though! will take a look at that. Just testing the fix locally for now
l

Laxman Ch

01/05/2021, 6:45 PM
yeah. s3 has unit test libraries which I used in past. I was searching for something similar for GCS too.
I came across this one.
e

Elon

01/05/2021, 6:46 PM
Yep, s3 has better testing libraries. Looks like gcp wants $$ :)
l

Laxman Ch

01/05/2021, 6:46 PM
As per my understanding LocalStorageHelper should be sufficient for our unit testing
Copy code
*   <li>Unsupported operations
 *       <ul>
 *         <li>bucket create
 *         <li>bucket get
 *         <li>bucket delete
 *         <li>list all buckets
 *         <li>generations
 *         <li>file attributes
 *         <li>patch
 *         <li>continueRewrite
 *         <li>createBatch
 *         <li>checksums, etags
 *         <li>IAM operations
 *       </ul>
👍 1
e

Elon

01/05/2021, 6:47 PM
Yep, saw the same thing, I'll try it out - will see what required functionality is missing if anything.
👍 1
Either way, right now I'm working on the fix and testing with a real bucket - will create a pr and use that library.
https://github.com/apache/incubator-pinot/pull/6426 - will add the tests tomorrow, converting from using a real gcs bucket to local storage tests.
@Laxman Ch ^ when you have some time it would be great if you can take a look
@Laxman Ch - unit tests are done, pushing changes after work, busy today:)
l

Laxman Ch

01/12/2021, 4:46 PM
@Elon: Change looks good to me. Can you please update the PR with latest changes. Also, I can verify these changes in our cluster.
e

Elon

01/12/2021, 5:08 PM
Actually - we have it working on production also, it's good.
The issue is that in order to test I would have to change the code since everything is based on getBucket() but that is not supported by the local storage.
I think that should be done in a separate pr.
wdyt?
l

Laxman Ch

01/15/2021, 3:41 PM
@Elon - I think its fine to add these tests as a followup PR. However, my vote is a non-binding vote. @Xiang Fu/@Jackie has some comments on the PR. I think we should clarify/fix and proceed with merge.
e

Elon

01/15/2021, 4:56 PM
I updated yesterday and addressed the comments, if @Xiang Fu and @Jackie are ok with it I'm fine to merge. This has been running in production for more than a week with no issues.
👍 2