# general
  • k

    Kishore G

    12/10/2019, 3:48 AM
    we did the refactor already.
  • a

    Alex

    12/10/2019, 4:09 AM
    hah 🙂
  • k

    Kishore G

    12/10/2019, 6:14 AM
    see this https://github.com/apache/incubator-pinot/pull/4874
  • a

    Alex

    12/10/2019, 7:27 PM
    I was talking about updates to the way SegmentIndexCreationDriverImpl operates. It has loops (using RecordReader as an iterator inside), so reading from a stream is a bit challenging. It could be refactored to have openWriter, addRow, and closeWriter methods, or something like that.
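The openWriter / addRow / closeWriter idea can be sketched as a push-style interface, where the caller owns the loop instead of the driver pulling from a RecordReader. This is only an illustration under assumptions: `SegmentRowWriter`, `InMemorySegmentRowWriter`, and `PushWriterDemo` are hypothetical names, not Pinot classes, and the toy writer just buffers rows instead of building a segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical push-style API: the caller drives the loop and hands rows to the writer.
interface SegmentRowWriter {
  void openWriter();

  void addRow(Map<String, Object> row);

  void closeWriter();
}

// Toy implementation that only buffers rows in memory; a real one would build indexes.
final class InMemorySegmentRowWriter implements SegmentRowWriter {
  private final List<Map<String, Object>> rows = new ArrayList<>();
  private boolean open;
  private boolean sealed;

  @Override
  public void openWriter() {
    if (open || sealed) {
      throw new IllegalStateException("writer was already opened");
    }
    open = true;
  }

  @Override
  public void addRow(Map<String, Object> row) {
    if (!open) {
      throw new IllegalStateException("writer is not open");
    }
    rows.add(row);
  }

  @Override
  public void closeWriter() {
    open = false;
    sealed = true;
  }

  int rowCount() {
    return rows.size();
  }

  boolean isSealed() {
    return sealed;
  }
}

final class PushWriterDemo {
  // A pull-style source (an iterator, as a batch reader would offer) layered on the push API.
  static int drain(Iterator<Map<String, Object>> source, SegmentRowWriter writer) {
    int count = 0;
    writer.openWriter();
    try {
      while (source.hasNext()) {
        writer.addRow(source.next());
        count++;
      }
    } finally {
      writer.closeWriter();
    }
    return count;
  }

  public static void main(String[] args) {
    List<Map<String, Object>> rows = Arrays.asList(
        Collections.<String, Object>singletonMap("id", 1),
        Collections.<String, Object>singletonMap("id", 2));
    InMemorySegmentRowWriter writer = new InMemorySegmentRowWriter();
    System.out.println("rows written: " + drain(rows.iterator(), writer));
  }
}
```

With this shape, a streaming job could call addRow once per element, while a batch job could keep its iterator loop through a helper like `drain`.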
  • k

    Kishore G

    12/10/2019, 7:32 PM
    is this for real-time?
  • a

    Alex

    12/10/2019, 9:26 PM
    Not real-time per se. We are planning to use Flink to transform data before ingesting into Pinot. Having similar code responsible for generating offline segments will reduce complexity.
  • k

    Kishore G

    12/10/2019, 9:29 PM
    What's the data source - Kafka?
  • a

    Alex

    12/10/2019, 9:32 PM
    Kafka, Presto, or Hudi (https://github.com/apache/incubator-hudi)
  • k

    Kishore G

    12/10/2019, 9:36 PM
    ok. Pinot has two interfaces to ingest data depending on the data sources (batch vs streaming)
  • k

    Kishore G

    12/10/2019, 9:37 PM
    RecordReader for batch sources and StreamConsumer for streaming sources
  • k

    Kishore G

    12/10/2019, 10:20 PM
    Alex and I discussed this and I like his idea. This will actually help us get rid of the RecordReader dependency inside SegmentIndexCreationDriverImpl
  • k

    Kishore G

    12/10/2019, 10:21 PM
    @User, @User I feel this will help with the encryption and compression as well
  • k

    Kishore G

    12/10/2019, 10:22 PM
    @User can you file an issue and describe your idea?
  • a

    Alex

    12/10/2019, 10:23 PM
    will do
  • a

    Alex

    12/12/2019, 2:49 AM
    Question regarding uploaded segments. We dumped them to GCS, so: 1. Do they need to be in tar.gz form, or can they be plain folders? 2. What is the correct URL to tell Pinot where to discover them? Is it like Hadoop fs (gs://bucket_name/…) or something else?
  • m

    Mayank

    12/12/2019, 2:50 AM
    They need to be tar.gz
  • m

    Mayank

    12/12/2019, 2:50 AM
    Yes something like Hadoop fs for the uri
  • a

    Alex

    12/12/2019, 2:52 AM
    we are not using Hadoop fs, we are trying to use: https://github.com/apache/incubator-pinot/pull/4911
  • m

    Mayank

    12/12/2019, 2:55 AM
    Yeah, so long as you implement the GcsPinotFs correctly, I think it should be able to download segments if you push a URI like gs://bucket...
  • a

    Alex

    12/12/2019, 2:55 AM
    We tried, but we are getting an NPE:
  • m

    Mayank

    12/12/2019, 2:55 AM
    where's the NPE?
  • a

    Alex

    12/12/2019, 2:56 AM
    2019/12/12 02:54:36.258 ERROR [PinotSegmentUploadDownloadRestletResource] [jersey-server-managed-async-executor-0] Caught internal server exception while uploading segment
    org.apache.pinot.controller.api.resources.InvalidControllerConfigException: Caught exception while initializing file upload path provider
    	at org.apache.pinot.controller.api.resources.FileUploadPathProvider.<init>(FileUploadPathProvider.java:91) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.controller.api.resources.PinotSegmentUploadDownloadRestletResource.uploadSegment(PinotSegmentUploadDownloadRestletResource.java:201) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.controller.api.resources.PinotSegmentUploadDownloadRestletResource.uploadSegmentAsJsonV2(PinotSegmentUploadDownloadRestletResource.java:408) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_232]
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_232]
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_232]
    	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_232]
    	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:381) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:819) ~[jersey-server-2.28.jar:?]
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) ~[jersey-common-2.28.jar:?]
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) ~[jersey-common-2.28.jar:?]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) ~[jersey-common-2.28.jar:?]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) ~[jersey-common-2.28.jar:?]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) ~[jersey-common-2.28.jar:?]
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) ~[jersey-common-2.28.jar:?]
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:814) ~[jersey-server-2.28.jar:?]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
    Caused by: java.lang.RuntimeException: java.lang.NullPointerException
    	at org.apache.pinot.common.utils.URIUtils.decode(URIUtils.java:85) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.filesystem.LocalPinotFS.toFile(LocalPinotFS.java:159) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.filesystem.LocalPinotFS.exists(LocalPinotFS.java:94) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.controller.api.resources.FileUploadPathProvider.mkdirIfNotExists(FileUploadPathProvider.java:97) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.controller.api.resources.FileUploadPathProvider.<init>(FileUploadPathProvider.java:65) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	... 26 more
    Caused by: java.lang.NullPointerException
    	at java.net.URLDecoder.decode(URLDecoder.java:136) ~[?:1.8.0_232]
    	at org.apache.pinot.common.utils.URIUtils.decode(URIUtils.java:82) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.filesystem.LocalPinotFS.toFile(LocalPinotFS.java:159) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.filesystem.LocalPinotFS.exists(LocalPinotFS.java:94) ~[pinot-common-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.controller.api.resources.FileUploadPathProvider.mkdirIfNotExists(FileUploadPathProvider.java:97) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	at org.apache.pinot.controller.api.resources.FileUploadPathProvider.<init>(FileUploadPathProvider.java:65) ~[pinot-controller-0.3.0-SNAPSHOT.jar:0.3.0-SNAPSHOT-171e9a7e889636e1ae966255011c5826793df7b2]
    	... 26 more
  • m

    Mayank

    12/12/2019, 2:58 AM
    org.apache.pinot.controller.api.resources.InvalidControllerConfigException: Caught exception while initializing file upload path provider
  • m

    Mayank

    12/12/2019, 2:58 AM
    looking at the code
  • a

    Alex

    12/12/2019, 2:59 AM
    I find it a bit confusing that it is going through the local FS (LocalPinotFS)
  • m

    Mayank

    12/12/2019, 3:02 AM
    public static URI getUri(String basePath, String... parts) {
      String path = getPath(basePath, parts);
      try {
        URI uri = new URI(path);
        if (uri.getScheme() != null) {
          return uri;
        } else {
          return new URI(CommonConstants.Segment.LOCAL_SEGMENT_SCHEME + ":" + path);
        }
      } catch (URISyntaxException e) {
        throw new IllegalArgumentException("Illegal URI path: " + path, e);
      }
    }
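The scheme fallback in the pasted getUri can be tried in isolation with plain `java.net.URI`. The sketch below is a standalone re-creation, not Pinot code: `SchemeFallbackDemo` and `withDefaultScheme` are hypothetical names, the path-joining step is left out, and `"file"` is assumed as the value of `CommonConstants.Segment.LOCAL_SEGMENT_SCHEME`. It shows what each kind of data directory string turns into. Whether any of these cases is what produced the NPE in the trace above is not established by this log.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SchemeFallbackDemo {
  // Assumption: stands in for CommonConstants.Segment.LOCAL_SEGMENT_SCHEME.
  static final String LOCAL_SCHEME = "file";

  // Same branching as the pasted getUri, minus the path joining.
  static URI withDefaultScheme(String path) {
    try {
      URI uri = new URI(path);
      if (uri.getScheme() != null) {
        return uri;
      }
      return new URI(LOCAL_SCHEME + ":" + path);
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException("Illegal URI path: " + path, e);
    }
  }

  public static void main(String[] args) {
    String[] dataDirs = {"gs://bucket/segments", "/var/pinot/data", "pinot/data"};
    for (String dir : dataDirs) {
      URI uri = withDefaultScheme(dir);
      // getRawPath() is null for opaque URIs such as "file:pinot/data".
      System.out.println(dir + " -> scheme=" + uri.getScheme() + ", rawPath=" + uri.getRawPath());
    }
  }
}
```

A string that already carries a scheme keeps it, an absolute path gets the local scheme, and a relative path becomes an opaque `file:` URI whose raw path is null, so a decoder that receives that raw path would be handed null.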
  • m

    Mayank

    12/12/2019, 3:02 AM
    _baseDataDirURI = URIUtils.getUri(dataDir);
  • m

    Mayank

    12/12/2019, 3:03 AM
    _baseDataDirURI = URIUtils.getUri(dataDir);
    LOGGER.info("Data directory: {}", _baseDataDirURI);
    _schemasTmpDirURI = new URI(_baseDataDirURI + SCHEMAS_TEMP);
    LOGGER.info("Schema temporary directory: {}", _schemasTmpDirURI);
    String scheme = _baseDataDirURI.getScheme();
    PinotFS pinotFS = PinotFSFactory.create(scheme);
  • m

    Mayank

    12/12/2019, 3:03 AM
    So based on the data dir in the controller conf, it gets the URI and then the scheme
  • m

    Mayank

    12/12/2019, 3:04 AM
    and based on the scheme it gets the PinotFS
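The lookup described here (data dir, then URI, then scheme, then file system) can be pictured as a small registry keyed by scheme. This is a minimal sketch with hypothetical names (`SegmentFs`, `LocalFs`, `GcsFs`, `SchemeRegistry`, `SchemeRegistryDemo`); it does not reproduce PinotFS or PinotFSFactory, whose registration mechanism is not shown in this thread.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a pluggable file system; not Pinot's PinotFS interface.
interface SegmentFs {
  String describe();
}

final class LocalFs implements SegmentFs {
  @Override
  public String describe() {
    return "local";
  }
}

final class GcsFs implements SegmentFs {
  @Override
  public String describe() {
    return "gcs";
  }
}

// Maps a URI scheme to a factory for the file system that handles it.
final class SchemeRegistry {
  private final Map<String, Supplier<SegmentFs>> factories = new HashMap<>();

  void register(String scheme, Supplier<SegmentFs> factory) {
    factories.put(scheme, factory);
  }

  SegmentFs create(String scheme) {
    Supplier<SegmentFs> factory = factories.get(scheme);
    if (factory == null) {
      throw new IllegalArgumentException("No file system registered for scheme: " + scheme);
    }
    return factory.get();
  }
}

final class SchemeRegistryDemo {
  static SchemeRegistry defaultRegistry() {
    SchemeRegistry registry = new SchemeRegistry();
    registry.register("file", LocalFs::new);
    registry.register("gs", GcsFs::new);
    return registry;
  }

  public static void main(String[] args) {
    SchemeRegistry registry = defaultRegistry();
    // The data directory's scheme alone decides which implementation is used.
    URI dataDir = URI.create("gs://bucket/segments");
    System.out.println(registry.create(dataDir.getScheme()).describe());
  }
}
```

Read this way, a data directory configured without a scheme would resolve to the local implementation even when the segments themselves live in GCS, which may be relevant to the LocalPinotFS frames in the trace.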