# general
  • j

    Jackie

    02/20/2020, 1:19 AM
    But smaller segments can hurt performance in general, since you need to process and merge more segments.
  • h

    Hemavathi

    02/20/2020, 9:11 AM
    We have a column whose value is the content of a JSON file. It seems that the Pinot STRING type has a length limit, so when we write this value to the string column, it gets truncated. Is there any configuration available to resolve this?
  • s

    Sidd

    02/20/2020, 11:12 AM
    The default is 512 characters. Also, I am not sure the content of a JSON file qualifies as a string column value; it is more in BLOB/CLOB territory.
  • a

    Adli

    02/20/2020, 11:22 AM
    Hi @here, could anyone tell me the current status of this feature? https://cwiki.apache.org/confluence/display/PINOT/%5BProposal%5D+Pinot+upsert+design+doc
  • k

    Kishore G

    02/20/2020, 4:09 PM
    @User @User can give you more info on that. That feature is available on a branch.
  • x

    Xiang Fu

    02/20/2020, 11:01 PM
    @User do we have any tooling for this? ^^
  • s

    Subbu Subramaniam

    02/21/2020, 12:41 AM
    No tooling; you can update the ZK metadata. Tooling for this (and other realtime stuff) is open for contribution. I believe I have an issue filed somewhere.
  • h

    Hemavathi

    02/21/2020, 6:35 AM
    @User Thanks for the details. The schema allows changing the size of the string column; we will post an update once we have loaded the data.
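Raising the limit discussed above is a per-column schema setting. The sketch below assumes Pinot's field spec exposes it as a "maxLength" property (default 512, matching Sidd's figure); check the schema reference for your version. The schema name, the "jsonPayload" column and the 65536 value are hypothetical.

```python
import json

# Assumption: Pinot STRING columns default to a 512-character limit that can be
# raised per column via a "maxLength" entry in the field spec.
DEFAULT_MAX_LENGTH = 512


def string_dimension(name: str, max_length: int = DEFAULT_MAX_LENGTH) -> dict:
    """Build a dimensionFieldSpecs entry for a STRING column."""
    spec = {"name": name, "dataType": "STRING"}
    if max_length != DEFAULT_MAX_LENGTH:
        # Only emit the override when it differs from the default.
        spec["maxLength"] = max_length
    return spec


def would_truncate(value: str, max_length: int = DEFAULT_MAX_LENGTH) -> bool:
    """True if a value is longer than the column's configured limit."""
    return len(value) > max_length


# Hypothetical schema: "jsonPayload" and 65536 are illustrative only.
schema = {
    "schemaName": "myTable",
    "dimensionFieldSpecs": [
        string_dimension("id"),
        string_dimension("jsonPayload", max_length=65536),
    ],
}

print(json.dumps(schema, indent=2))
```

Running would_truncate over a sample of the real JSON values before loading is a cheap way to pick a limit that avoids silent truncation.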
  • h

    Hemavathi

    02/21/2020, 6:56 AM
    We have configured "Column1" in "noDictionaryColumns" and "bloomFilterColumns" and tried to filter the query on "Column1", but got a timeout error. As we understand it, for a high-cardinality string column that needs to be filtered on, the combination of "noDictionaryColumns" and "bloomFilterColumns" is preferred. But it didn't work as expected. If we keep "Column1" dictionary-encoded and apply "bloomFilterColumns", we could not find any performance difference. We need your help to understand this better.
    "tableIndexConfig" : {
      "noDictionaryColumns": ["Column1"],
      "bloomFilterColumns" : ["Column1"],
      "loadMode" : "MMAP",
      "lazyLoad" : "false"
    }
    select * from table where Column1 = 'XXX'
  • s

    Seunghyun

    02/21/2020, 7:15 AM
    @User try adding an inverted index on Column1 and run the same query.
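A minimal sketch of the change suggested above, starting from the tableIndexConfig posted in the question. It assumes, as an unverified premise, that the inverted index is built over dictionary ids, so Column1 is also dropped from noDictionaryColumns; "invertedIndexColumns" is the table-config key for inverted indexes. The helper name with_inverted_index is hypothetical.

```python
import copy
import json

# tableIndexConfig as posted in the question above.
original = {
    "noDictionaryColumns": ["Column1"],
    "bloomFilterColumns": ["Column1"],
    "loadMode": "MMAP",
    "lazyLoad": "false",
}


def with_inverted_index(index_config: dict, column: str) -> dict:
    """Return a copy of index_config with an inverted index on `column`.

    Assumption: the inverted index needs a dictionary-encoded column, so the
    column is removed from noDictionaryColumns as well.
    """
    cfg = copy.deepcopy(index_config)
    cfg["noDictionaryColumns"] = [
        c for c in cfg.get("noDictionaryColumns", []) if c != column
    ]
    inverted = cfg.setdefault("invertedIndexColumns", [])
    if column not in inverted:
        inverted.append(column)
    return cfg


updated = with_inverted_index(original, "Column1")
print(json.dumps({"tableIndexConfig": updated}, indent=2))
```

The bloom filter entry is left in place; the copy keeps the original config untouched so both variants can be compared.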
  • h

    Hemavathi

    02/21/2020, 7:16 AM
    Yes, it works fine with the inverted index. We need to understand the exact use case for the bloom filter.
  • s

    Seunghyun

    02/21/2020, 7:17 AM
    if your cardinality is very high, the bloom filter may not perform well
  • s

    Seunghyun

    02/21/2020, 7:17 AM
    what’s your cardinality of the column per segment?
  • s

    Seunghyun

    02/21/2020, 7:18 AM
    we currently put a 1 MB limit on the bloom filter size per segment
  • s

    Seunghyun

    02/21/2020, 7:20 AM
    so it will work well up to ~1M cardinality
  • s

    Seunghyun

    02/21/2020, 7:22 AM
    https://krisives.github.io/bloom-calculator/ : set error = 0.05 and play with the count
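Bloom filter calculators like the one linked use the standard sizing formula m = -n ln p / (ln 2)^2 bits for n distinct values at false-positive rate p. The sketch inverts it to estimate how many values fit under the 1 MB cap, assuming "1MB" means 1 MiB and p = 0.05 as suggested; the function names are illustrative.

```python
import math

LN2_SQ = math.log(2) ** 2


def bloom_bits(n: int, p: float) -> float:
    """Optimal filter size in bits for n distinct values at false-positive rate p."""
    return -n * math.log(p) / LN2_SQ


def max_cardinality(size_bytes: int, p: float) -> int:
    """Largest n whose optimally sized filter fits in size_bytes at rate p."""
    return int(size_bytes * 8 * LN2_SQ / -math.log(p))


def optimal_hashes(bits: float, n: int) -> int:
    """Optimal number of hash functions, k = (m / n) * ln 2, rounded."""
    return max(1, round(bits / n * math.log(2)))


ONE_MB = 1024 * 1024  # assumption: the "1MB" cap means 1 MiB
P = 0.05              # error rate suggested for the calculator

cap = max_cardinality(ONE_MB, P)
k = optimal_hashes(bloom_bits(cap, P), cap)
need_mb = bloom_bits(10_000_000, P) / 8 / ONE_MB

print(f"~{cap:,} distinct values fit in 1 MiB at {P:.0%} error, k = {k}")
print(f"10M distinct values would need {need_mb:.1f} MiB")
```

With these inputs the cap works out to roughly 1.3 million distinct values per segment, in line with the ~1M figure above, while 10 million values would need about 7.4 MiB.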
  • h

    Hemavathi

    02/21/2020, 7:23 AM
    In our case the segment size is around 240 MB, and I think the bloom filter size may exceed 1 MB.
  • s

    Seunghyun

    02/21/2020, 7:24 AM
    so if the cardinality is too high, our current implementation makes the bloom filter not useful
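Why a capped filter stops helping: once n outgrows the available bits, the false-positive rate (1 - e^(-kn/m))^k approaches 1, so nearly every lookup answers "maybe present". The sketch assumes the filter is used to skip segments on equality predicates and fixes m at 1 MiB with k = 4 (the optimum near 1.3M values at 5%); these are illustrative assumptions, not a description of Pinot's internals.

```python
import math


def false_positive_rate(n: int, m_bits: int, k: int) -> float:
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m_bits)) ** k


M_BITS = 8 * 1024 * 1024  # assumed 1 MiB cap
K = 4                     # assumed hash count, optimal near 1.3M values at 5%

rates = {n: false_positive_rate(n, M_BITS, K)
         for n in (500_000, 1_300_000, 5_000_000, 20_000_000)}

for n, rate in rates.items():
    # A rate near 1.0 means almost every lookup says "maybe present",
    # so (under the pruning assumption) almost no segment is skipped.
    print(f"{n:>12,} distinct values -> false-positive rate {rate:.3f}")
```

Near the design point the rate stays around 5%, but at tens of millions of values per segment it is effectively 100%, which is the "not useful" regime described above.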
  • s

    Seunghyun

    02/21/2020, 7:24 AM
    this is because our bloom filter implementation is on-heap based
  • h

    Hemavathi

    02/21/2020, 7:24 AM
    OK, got it. Will try with a smaller size.
  • h

    Hemavathi

    02/21/2020, 7:24 AM
    thank you
  • s

    Seunghyun

    02/21/2020, 7:25 AM
    Yeah, we can improve the bloom filter feature by making the size limit configurable, but that should come along with an off-heap implementation.
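To make the two proposed improvements concrete, here is a Python analogy, not Pinot's Java code: a bloom filter whose size is a constructor argument and whose bits live in an anonymous mmap region rather than in Python objects, loosely mirroring the on-heap versus off-heap distinction. The class name and the SHA-256 double-hashing scheme are assumptions chosen for illustration.

```python
import hashlib
import mmap


class MmapBloomFilter:
    """Bloom filter with a configurable size whose bit array lives in an
    anonymous mmap region instead of Python objects (a loose analogy to
    off-heap storage on the JVM)."""

    def __init__(self, size_bytes: int, num_hashes: int):
        if size_bytes <= 0 or num_hashes <= 0:
            raise ValueError("size_bytes and num_hashes must be positive")
        self.num_bits = size_bytes * 8
        self.num_hashes = num_hashes
        # Anonymous mapping: zero-filled memory not backed by a file.
        self._buf = mmap.mmap(-1, size_bytes)

    def _positions(self, value: str):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, value: str) -> None:
        for pos in self._positions(value):
            self._buf[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self._buf[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))

    def close(self) -> None:
        self._buf.close()


# Usage: the size limit is a constructor argument rather than a fixed constant.
bf = MmapBloomFilter(size_bytes=1024 * 1024, num_hashes=4)
keys = [f"key-{i}" for i in range(10_000)]
for key in keys:
    bf.add(key)
false_hits = sum(bf.might_contain(f"absent-{i}") for i in range(10_000))
print(f"false positives among 10,000 absent keys: {false_hits}")
```

Call close() when the filter is no longer needed to release the mapping.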
  • s

    Sidd

    02/21/2020, 12:09 PM
    @here, while working on PR (https://github.com/apache/incubator-pinot/pull/5074), I hit a bug: adding a new column and then enabling an inverted index is not supported on the segment reload path for the V1 segment format. We hit an NPE. I don't know whether this is intentionally unsupported. In my PR I was adding tests for text index reload on both V1 and V3, and that's when I discovered this.
  • s

    Sidd

    02/21/2020, 12:10 PM
    I have put a fix here -- https://github.com/apache/incubator-pinot/pull/5087
    👍 1
  • s

    Seunghyun

    02/21/2020, 6:14 PM
    Did we change our distribution to shade everything?
  • k

    Kishore G

    02/21/2020, 10:02 PM
    Yes, is that failing?
  • x

    Xiang Fu

    02/21/2020, 10:04 PM
    I think we also renamed quick-start-offline.sh to quick-start-batch.sh
  • s

    Seunghyun

    02/21/2020, 11:16 PM
    let me retry with a clean checkout
  • s

    Seunghyun

    02/21/2020, 11:16 PM
    maybe an old file didn’t get deleted
  • s

    Seunghyun

    02/21/2020, 11:40 PM
    ~/workspace/pinot/pinot-distribution/target/apache-pinot-incubating-0.3.0-SNAPSHOT-bin/apache-pinot-incubating-0.3.0-SNAPSHOT-bin/bin master* 1m 10s
    ❯ ./quick-start-batch.sh
    ***** Starting Zookeeper, controller, broker and server *****
    Executing command: StartZookeeper -zkPort 2123 -dataDir /var/folders/1s/11z0n1j9057dk1nhgjdgfcp0000mp7/T//PinotAdmin/zkData
    Start zookeeper at localhost:2123 in thread main
    Executing command: StartController -clusterName QuickStartCluster -controllerHost 172.25.113.39 -controllerPort 9000 -dataDir /var/folders/1s/11z0n1j9057dk1nhgjdgfcp0000mp7/T//PinotController -zkAddress localhost:2123
    Invalid instance setup, missing znode path: /QuickStartCluster/CONFIGS/PARTICIPANT/Controller_172.25.113.39_9000
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Controller_172.25.113.39_9000/MESSAGES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Controller_172.25.113.39_9000/CURRENTSTATES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Controller_172.25.113.39_9000/STATUSUPDATES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Controller_172.25.113.39_9000/ERRORS
    Feb 21, 2020 3:37:50 PM org.glassfish.grizzly.http.server.NetworkListener start
    INFO: Started listener bound to [0.0.0.0:9000]
    Feb 21, 2020 3:37:50 PM org.glassfish.grizzly.http.server.HttpServer start
    INFO: [HttpServer] Started.
    Executing command: StartBroker -brokerHost null -brokerPort 8000 -zkAddress localhost:2123
    Feb 21, 2020 3:37:58 PM org.glassfish.grizzly.http.server.NetworkListener start
    INFO: Started listener bound to [0.0.0.0:8000]
    Feb 21, 2020 3:37:58 PM org.glassfish.grizzly.http.server.HttpServer start
    INFO: [HttpServer-1] Started.
    Invalid instance setup, missing znode path: /QuickStartCluster/CONFIGS/PARTICIPANT/Broker_172.25.113.39_8000
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Broker_172.25.113.39_8000/MESSAGES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Broker_172.25.113.39_8000/CURRENTSTATES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Broker_172.25.113.39_8000/STATUSUPDATES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Broker_172.25.113.39_8000/ERRORS
    Executing command: StartServer -clusterName QuickStartCluster -serverHost 172.25.113.39 -serverPort 7000 -serverAdminPort 7500 -dataDir /tmp/1582328263750/PinotServerData0 -segmentDir /tmp/1582328263750/PinotServerSegment0 -zkAddress localhost:2123
    Invalid instance setup, missing znode path: /QuickStartCluster/CONFIGS/PARTICIPANT/Server_172.25.113.39_7000
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Server_172.25.113.39_7000/MESSAGES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Server_172.25.113.39_7000/CURRENTSTATES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Server_172.25.113.39_7000/STATUSUPDATES
    Invalid instance setup, missing znode path: /QuickStartCluster/INSTANCES/Server_172.25.113.39_7000/ERRORS
    Feb 21, 2020 3:38:04 PM org.glassfish.grizzly.http.server.NetworkListener start
    INFO: Started listener bound to [0.0.0.0:7500]
    Feb 21, 2020 3:38:04 PM org.glassfish.grizzly.http.server.HttpServer start
    INFO: [HttpServer-2] Started.
    ***** Adding baseballStats table *****
    Executing command: AddTable -tableConfigFile /Users/snlee/workspace/pinot/pinot-distribution/target/apache-pinot-incubating-0.3.0-SNAPSHOT-bin/apache-pinot-incubating-0.3.0-SNAPSHOT-bin/bin/quickStartData1582328263666/baseballStats_offline_table_config.json -schemaFile /Users/snlee/workspace/pinot/pinot-distribution/target/apache-pinot-incubating-0.3.0-SNAPSHOT-bin/apache-pinot-incubating-0.3.0-SNAPSHOT-bin/bin/quickStartData1582328263666/baseballStats_schema.json -controllerHost 172.25.113.39 -controllerPort 9000 -exec
    {"status":"Table baseballStats_OFFLINE succesfully added"}
    ***** Launch data ingestion job to build index segment for baseballStats and push to controller *****
    Exception in thread "main" java.lang.IllegalStateException: PinotFS for scheme: jar has not been initialized
    	at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518)
    	at org.apache.pinot.spi.filesystem.PinotFSFactory.create(PinotFSFactory.java:78)
    	at org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner.run(SegmentGenerationJobRunner.java:115)
    	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:96)
    	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.runIngestionJob(IngestionJobLauncher.java:77)
    	at org.apache.pinot.tools.admin.command.QuickstartRunner.launchDataIngestionJob(QuickstartRunner.java:183)
    	at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:154)
    	at org.apache.pinot.tools.Quickstart.main(Quickstart.java:209)
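The trace shows the ingestion job resolving a filesystem from the input URI's scheme and failing because nothing was registered for "jar". The thread does not confirm why the input URI pointed into a jar; the toy registry below (hypothetical names, not Pinot's PinotFSFactory) only illustrates why any "jar:" URI fails such a lookup while plain paths succeed. It assumes scheme-less paths default to the local filesystem, and the jar path is a made-up placeholder.

```python
from urllib.parse import urlparse


class FileSystemRegistry:
    """Toy registry that resolves a filesystem by URI scheme (hypothetical,
    not Pinot's PinotFSFactory)."""

    def __init__(self):
        self._by_scheme = {}

    def register(self, scheme: str, fs: object) -> None:
        self._by_scheme[scheme.lower()] = fs

    def create(self, uri: str) -> object:
        # Assumption: a path with no scheme is treated as a local file.
        scheme = urlparse(uri).scheme.lower() or "file"
        if scheme not in self._by_scheme:
            raise RuntimeError(f"no filesystem registered for scheme: {scheme}")
        return self._by_scheme[scheme]


registry = FileSystemRegistry()
registry.register("file", "local-fs")  # placeholder for a real filesystem object

print(registry.create("/tmp/data"))
try:
    # Hypothetical classpath-style URI: urlparse reports its scheme as "jar".
    registry.create("jar:file:/path/to/app.jar!/data.csv")
except RuntimeError as err:
    print(err)
```

Registering a handler for "jar", or resolving the sample data to a local path before launching the job, would both avoid the lookup failure in this toy model.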