# general
  • t

    Tymm

    12/21/2020, 8:19 AM
    Hello, is it possible to use flink to sink data into pinot?
    k
    y
    • 3
    • 2
  • t

    Tymm

    12/24/2020, 6:32 AM
Hello, I'm running Pinot on Docker, and am creating and pushing new data/segments (from CSV files) into Pinot every minute. I've noticed that the time to push a segment into Pinot grows as the segment size grows, to the point where it takes more than a minute to push a new segment. How can I make the segment push faster? Thanks.
    k
    v
    • 3
    • 4
  • m

    Mark.Tang

    12/28/2020, 1:49 AM
    Hi, Is there any doc/post detailing comparison of Pinot with Kylin somewhere? Thanks.
    k
    • 2
    • 4
  • c

    Chundong Wang

    12/29/2020, 6:01 PM
    Hi team, when we tried to run the query below,
    SELECT facility_name as key_col, COUNT(*) as val_col
    FROM enriched_station_orders_v1_OFFLINE
    WHERE created_at_seconds BETWEEN 1606756268 AND 1609175468
    AND (facility_organization_id <> 'ac56d23b-a6a2-4c49-8412-a0a0949fb5ef') 
    GROUP BY key_col
    ORDER BY val_col DESC
    LIMIT 5
    We get exceptions on the pinot-server like the following (the index number seems to vary):
    Caught exception while processing and combining group-by order-by for index: 1
    However if we change from
    facility_organization_id <> 'ac56d23b-a6a2-4c49-8412-a0a0949fb5ef'
    to
    facility_organization_id = 'ac56d23b-a6a2-4c49-8412-a0a0949fb5ef'
    there's no such exception. Or if we switch to
    facility_id
    instead of
    facility_name
    it doesn't throw an exception either. Have you seen this issue before?
    m
    k
    j
    • 4
    • 38
  • w

    Will Briggs

    12/30/2020, 10:07 PM
    I apologize in advance for my ignorant question, but I’m struggling conceptually a bit with how to handle dateTime column definitions in my table schema and segmentsConfig. I have a millisecond-level epoch field on my incoming realtime data (creatively named
    eventTimestamp
    ). I would like to maintain this when querying / filtering my records at the individual event level. However, I would also like to define an hourly derived timestamp to be used for pre-aggregating with a star tree index. My segments config looks like this:
    "segmentsConfig": {
            "timeColumnName": "eventTimestamp",
            "timeType": "MILLISECONDS",
            "retentionTimeUnit": "HOURS",
            "retentionTimeValue": "48",
            "segmentPushType": "APPEND",
            "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
            "schemaName": "mySchema",
            "replication": "1",
            "replicasPerPartition": "1"
          },
    My star tree index looks like this:
    "starTreeIndexConfigs": [{
              "dimensionsSplitOrder": [
                "dimension1",
                "dimension2"
              ],
              "skipStarNodeCreationForDimensions": [
              ],
              "functionColumnPairs": [
                "SUM__metric1",
                "SUM__metric2",
                "SUM__metric3",
                "DISTINCT_COUNT_HLL__dimension3",
                "DISTINCT_COUNT_HLL__dimension4"
              ],
              "maxLeafRecords": 10000
            }],
    And my dateTimeFieldSpecs:
    "dateTimeFieldSpecs": [
            {
              "name": "eventTimestamp",
              "dataType": "LONG",
              "format": "1:MILLISECONDS:EPOCH",
              "granularity": "1:HOUR",
              "dateTimeType": "PRIMARY"
            }
          ],
    Can anyone confirm that this is the correct approach? Should I be using an ingestion transformation of
    toEpochHoursRounded
    instead, and specifying that as a DERIVED dateTimeField in the dateTimeFieldSpecs configuration, and manually adding that to the dimensionsSplitOrder of my star tree index?
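    [Editor's note] The derived-column approach described above can be sketched as two config fragments; the hourly column name eventTimestampHours and the toEpochHours transform here are illustrative assumptions, not settings confirmed in this thread. The schema would carry both dateTime fields:

    ```json
    "dateTimeFieldSpecs": [
      {
        "name": "eventTimestamp",
        "dataType": "LONG",
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS"
      },
      {
        "name": "eventTimestampHours",
        "dataType": "LONG",
        "format": "1:HOURS:EPOCH",
        "granularity": "1:HOURS"
      }
    ]
    ```

    and the table config would derive the hourly column at ingestion time, so it can then be listed in the star-tree dimensionsSplitOrder:

    ```json
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "eventTimestampHours",
          "transformFunction": "toEpochHours(eventTimestamp)"
        }
      ]
    }
    ```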
    x
    j
    • 3
    • 33
  • c

    Chethan UK

    01/04/2021, 1:30 PM
    Thread: Please drop the logo of your company here if you are using Pinot. I will update this section in the docs: https://pinot.apache.org/#who-uses
    🍷 1
    k
    • 2
    • 1
  • j

    Jinwei Zhu

    01/04/2021, 7:25 PM
    Hi team, I'm new to Pinot and trying to get logs for troubleshooting using logz. I deployed Pinot using k8s, and want to confirm: do the logs of the different components exist in their corresponding pods, like gc-pinot-broker.log and pinotBroker.log? What's the difference between them? How do I change the log levels? Is the log seen in kubectl logs the same as the one inside the pod's log file?
    d
    • 2
    • 6
  • k

    Kishore G

    01/05/2021, 1:01 AM
    It automatically sorts it while ingesting from Kafka
    m
    • 2
    • 3
  • m

    Mayank

    01/05/2021, 1:50 AM
    The expectation is to have the partition function used by producer with the one defined in Pinot
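    [Editor's note] A minimal sketch of the Pinot side of that alignment in the table config; the column name memberId, the Murmur function, and the partition count are illustrative assumptions:

    ```json
    "tableIndexConfig": {
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "memberId": {
            "functionName": "Murmur",
            "numPartitions": 4
          }
        }
      }
    }
    ```

    The Kafka producer must partition the topic with the same function and partition count for the mapping to hold.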
    w
    • 2
    • 5
  • w

    Will Briggs

    01/05/2021, 4:46 AM
    Sorry to be a never-ending fount of questions, folks… is it expected / necessary to create a rangeIndex on dateTime fields, or are those automatically indexed efficiently? Likewise, should I add dateTime fields to the noDictionaryColumns list?
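    [Editor's note] For reference, a range index is opted into per column in the table config; a minimal sketch, assuming the eventTimestamp column discussed earlier in the channel:

    ```json
    "tableIndexConfig": {
      "rangeIndexColumns": [
        "eventTimestamp"
      ]
    }
    ```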
    m
    k
    • 3
    • 18
  • m

    Mark.Tang

    01/06/2021, 2:13 AM
    Hi Team, I have seen that in 0.4.0, Pinot implemented the initial version of the theta-sketch based distinct count aggregation function, utilizing the Apache DataSketches library. Compared with the latest Druid release, which also includes a DataSketches extension (Theta sketch, Tuple sketch, Quantiles sketch, HLL sketch), does Pinot have any plan to implement sketches other than the Theta sketch? Thanks.
    m
    • 2
    • 13
  • o

    Oguzhan Mangir

    01/06/2021, 12:05 PM
    Hello, does Pinot support upsert for offline tables, or does it only support it for realtime tables? For example, when late data arrives after the real-time segment is flushed, can Pinot update it?
    m
    y
    • 3
    • 3
  • m

    Mahesh Yeole

    01/06/2021, 9:45 PM
    Hello, do we have any Pinot benchmarks we can refer to?
    k
    • 2
    • 5
  • j

    Jinwei Zhu

    01/06/2021, 10:21 PM
    Hi, is it possible to monitor Pinot DB metrics with Wavefront instead of Prometheus and Grafana? Are there any docs I can refer to? Thanks
    k
    • 2
    • 9
  • m

    Mark.Tang

    01/07/2021, 6:24 AM
    Hi, Team, a streaming app often does the following:
    1. Read local files into Kafka using Flume
    2. Do ETL transformation on the Kafka topic using Flink
    3. Push data from Flink into LinkedIn's Pinot
    So I am not doing a direct map from Kafka to a Pinot table as in https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion; any suggestion or example would help me, thanks!
    k
    • 2
    • 6
  • m

    Mark.Tang

    01/07/2021, 9:58 AM
    Hi team, Uber made a contribution about schema inference that saves a lot of manual effort. I think this capability is important when landing in production. So, is there any plan to add this capability to the 2021 roadmap, or has it already been implemented? Thanks! (https://eng.uber.com/operating-apache-pinot/)
    👍 3
    y
    • 2
    • 3
  • j

    Jinwei Zhu

    01/11/2021, 11:01 PM
    Hi @User, I'm working with @User and trying to use our new Pinot Kinesis support. I'd like to know: do we have any images built with that? With just the branch, we cannot use it directly. Thanks
    n
    x
    d
    • 4
    • 55
  • j

    Jackie

    01/13/2021, 7:24 PM
    Yes, please add the
    enableDynamicStarTreeCreation
    into your index config, see https://docs.pinot.apache.org/configuration-reference/table#table-index-config for more details
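    [Editor's note] A minimal sketch of where that flag sits, per the linked configuration reference (alongside the existing star-tree settings in the table config):

    ```json
    "tableIndexConfig": {
      "enableDynamicStarTreeCreation": true
    }
    ```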
    👍 1
    a
    • 2
    • 4
  • n

    Neha Pawar

    01/13/2021, 9:37 PM
    @User we have the reload status API already. It works only for offline tables so far. You can check it out in the cluster manager on the table details page. @User is working on adding the API support for realtime tables.
    👍 1
    a
    • 2
    • 5
  • a

    Amit Chopra

    01/13/2021, 11:39 PM
    A few questions on sorted index:
    1. I was trying to create a sorted index on a STRING column, but it was not working. Then I tried it on an INT column and it worked. Is sorted index only supported on INT (or LONG) types?
    2. I see isSorted = true in the metadata.properties file for the event time as well as the metric column, though I did not enable a sorted index for those. What does this imply? I remember it being mentioned that only one column can be used as the sorted index.
    3. Related to the above: if most queries will have time in the WHERE clause, should we add a sorted index on the time field? Or is it more beneficial to add a sorted index on a field (used often to filter) other than the time field?
    m
    • 2
    • 16
  • y

    Yupeng Fu

    01/14/2021, 1:17 AM
    Hey, the new cluster management UI is very convenient and powerful (e.g. delete table)… is there a plan to add access control to it?
    m
    s
    k
    • 4
    • 7
  • m

    Mahesh Yeole

    01/14/2021, 3:33 AM
    I am trying to fetch PARQUET files from S3 and load them into Pinot. I am using an offline table. I am running this command with my job spec:
    ./bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile examples/batch/metrics/ingestionJobSpec.yaml
    I am seeing the following errors, any idea how to solve this issue?
    Jan 13, 2021 6:34:24 PM WARNING: org.apache.parquet.CorruptStatistics: Ignoring statistics because created_by could not be parsed (see PARQUET-251): parquet-mr
    org.apache.parquet.VersionParser$VersionParseException: Could not parse created_by: parquet-mr using format: (.+) version ((.) )?\(build ?(.)\)
    	at org.apache.parquet.VersionParser.parse(VersionParser.java:112)
    	at org.apache.parquet.CorruptStatistics.shouldIgnoreStatistics(CorruptStatistics.java:60)
    	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetStatistics(ParquetMetadataConverter.java:263)
    	at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:567)
    	at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:544)
    	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:431)
    	at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:238)
    	at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:234)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadP
    Failed to generate Pinot segment for file - s3://cdca-metrics-prod-us-east-1-eedr/eedr/events/event_date=2021-01-12/event_hour=12/20210112_235508_00031_tgepm_5672f969-021f-4dfd-a0ad-c209aaf7e84d
    java.lang.IllegalArgumentException: INT96 not yet implemented.
    	at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:251) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    	at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:236) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    	at org.apache.parquet.schema.PrimitiveType$PrimitiveTypeName$7.convert(PrimitiveType.java:222) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    	at org.apache.parquet.avro.AvroSchemaConverter.convertField(AvroSchemaConverter.java:235) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    	at org.apache.parquet.avro.AvroSchemaConverter.convertFields(AvroSchemaConverter.java:215) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    	at org.apache.parquet.avro.AvroSchemaConverter.convert(AvroSchemaConverter.java:209) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    	at org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:124) ~[pinot-all-0.7.0-SNAPSHOT-jar-with-dependencies.jar:0.7.0-SNAPSHOT-125402b4b3595d61fcc702ba57143d927b00fe7f]
    x
    • 2
    • 16
  • s

    Sean Chen

    01/14/2021, 9:13 AM
    Hi team, is there a limit on number of znodes per parent node in ZK today?
    k
    • 2
    • 3
  • s

    Sean Chen

    01/15/2021, 4:39 AM
    Hi team, when should I set
    exclude.sequence.id
    ? Is it used just for naming the segment? If I create 3 segments, each with a unique name but having the same time-range, can I set
    exclude.sequence.id
    to true all the time?
    x
    • 2
    • 10
  • s

    Sean Chen

    01/15/2021, 11:49 AM
    I see. There is an explicit reload command
    n
    m
    • 3
    • 4
  • a

    Amit Chopra

    01/15/2021, 4:53 PM
    Hi, I have a question around broker / server pruning. I have 2 servers and 4 segments. The mapping is:
    • server-0
    1. metrics_OFFLINE_26835599_26835666_3
    2. metrics_OFFLINE_26835733_26835799_2
    • server-1
    1. metrics_OFFLINE_26835799_26835866_0
    2. metrics_OFFLINE_26835666_26835733_1
    When I run a query like
    select device, count(device) as aggreg from metrics where eventTime > 26835599 and eventTime < 26835626 group by device order by aggreg desc limit 10
    I see:
    • numServersQueried = 2
    • numServersResponded = 2
    • numSegmentsQueried = 4
    • numSegmentsProcessed = 1
    • numSegmentsMatched = 1
    Questions:
    1. Given the above query, the eventTime range falls within a single segment, metrics_OFFLINE_26835599_26835666_3, so I was expecting numServersQueried to be 1 (instead of 2). Do I need to set something up for broker pruning to take effect?
    2. Similarly, I was expecting numSegmentsQueried to be 1 (instead of 4).
    3. I always see numSegmentsProcessed and numSegmentsMatched with the same value. What is the difference between the two? I looked at https://docs.pinot.apache.org/users/api/querying-pinot-using-standard-sql/response-format, but it wasn't super clear to me from reading there.
    s
    j
    • 3
    • 9
  • k

    Ken Krugler

    01/15/2021, 4:59 PM
    Hi @User - I think you want to check out partitioning on https://docs.pinot.apache.org/operators/operating-pinot/tuning/routing, as a way of avoiding sending the query to all servers (with broker-side pruning).
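    [Editor's note] A minimal sketch of the routing side of that setup; it assumes a matching segmentPartitionConfig on the partitioned column, as described on the linked tuning page:

    ```json
    "routing": {
      "segmentPrunerTypes": [
        "partition"
      ]
    }
    ```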
    a
    m
    j
    • 4
    • 16
  • t

    troywinter

    01/18/2021, 6:33 AM
    Hi, I’m getting an error when using lookup on a local cluster, does anyone know how to solve it?
    [
      {
        "errorCode": 200,
        "message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Caught exception while initializing transform function: lookup\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:207)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:56)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:52)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:83)\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:100)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:294)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:215)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:157)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:141)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)"
      }
    ]
    k
    l
    x
    • 4
    • 7
  • t

    troywinter

    01/18/2021, 3:18 PM
    Another question regarding using HDFS as Pinot deep storage: I have put hadoop-client-3.1.1.3.1.0.0-78.jar, hadoop-common-3.1.1.3.1.0.0-78.jar, hadoop-hdfs-3.1.1.3.1.0.0-78.jar, and hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar in the pinot controller’s classpath, but the controller still reports class not found for org/apache/hadoop/fs/FSDataInputStream. What other jars should I include? Below is the stack trace for this error:
    2021/01/18 10:26:32.704 INFO [ControllerStarter] [main] Initializing PinotFSFactory
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
    	at java.lang.Class.getDeclaredConstructors0(Native Method)
    	at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
    	at java.lang.Class.getConstructor0(Class.java:3075)
    	at java.lang.Class.getConstructor(Class.java:1825)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:295)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:264)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:245)
    	at org.apache.pinot.spi.filesystem.PinotFSFactory.register(PinotFSFactory.java:53)
    	at org.apache.pinot.spi.filesystem.PinotFSFactory.init(PinotFSFactory.java:74)
    	at org.apache.pinot.controller.ControllerStarter.initPinotFSFactory(ControllerStarter.java:481)
    	at org.apache.pinot.controller.ControllerStarter.setUpPinotController(ControllerStarter.java:329)
    	at org.apache.pinot.controller.ControllerStarter.start(ControllerStarter.java:287)
    	at org.apache.pinot.tools.service.PinotServiceManager.startController(PinotServiceManager.java:116)
    	at org.apache.pinot.tools.service.PinotServiceManager.startRole(PinotServiceManager.java:91)
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.lambda$startBootstrapServices$0(StartServiceManagerCommand.java:234)
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startPinotService(StartServiceManagerCommand.java:286)
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.startBootstrapServices(StartServiceManagerCommand.java:233)
    	at org.apache.pinot.tools.admin.command.StartServiceManagerCommand.execute(StartServiceManagerCommand.java:183)
    	at org.apache.pinot.tools.admin.command.StartControllerCommand.execute(StartControllerCommand.java:130)
    	at org.apache.pinot.tools.admin.PinotAdministrator.execute(PinotAdministrator.java:162)
    	at org.apache.pinot.tools.admin.PinotAdministrator.main(PinotAdministrator.java:182)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
    	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
    	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    	... 21 more
    And below are the startup opts:
    JAVA_OPTS	-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:/opt/pinot/gc-pinot-controller.log -Dlog4j2.configurationFile=/opt/pinot/conf/pinot-controller-log4j2.xml -Dplugins.dir=/opt/pinot/plugins -Dplugins.include=pinot-hdfs -classpath /opt/hadoop-lib/hadoop-common-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-client-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-hdfs-3.1.1.3.1.0.0-78.jar:/opt/hadoop-lib/hadoop-hdfs-client-3.1.1.3.1.0.0-78.jar
    k
    • 2
    • 6
  • d

    Davide Berdin

    01/18/2021, 9:54 PM
    Hello everybody! fantastic project 🚀 I’m totally in love with Apache Pinot ❤️ keep up the great work!
    👋 4
    🍷 4
    k
    m
    • 3
    • 2