# general
  • k

    Kishore G

    11/15/2019, 3:29 AM
    Flink in streaming mode or batch mode?
  • a

    Alex

    11/15/2019, 3:29 AM
    batch mode. Can you point me to a spark example?
  • k

    Kishore G

    11/15/2019, 3:30 AM
    @User added Pinot-spark module recently
  • a

    Alex

    11/15/2019, 3:31 AM
    I see it in a separate branch, will take a look, ty!
  • k

    Kishore G

    11/15/2019, 3:32 AM
    https://github.com/apache/incubator-pinot/tree/pinot-ingestion-refactor
  • s

    Shivakumar

    11/15/2019, 8:15 PM
    I'm new to Pinot. Joins are not supported, but what about a Union operation? Please let me know if there is any way to achieve a Union. Apologies for the basic question.
  • m

    Mayank

    11/15/2019, 8:17 PM
    Union isn’t supported either. However, if you have multiple aggregation functions with a group by, the behavior is somewhat like a union.
  • m

    Mayank

    11/15/2019, 8:17 PM
    As in, each aggregation is returned with its own groups.
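To illustrate Mayank's point, here is a minimal Java sketch, assuming a group-by response is modeled as a map from each aggregation function to its own {group → value} map (a hypothetical shape, not Pinot's actual response classes). Flattening those per-aggregation results into one list of rows gives output that reads like a UNION ALL of separate queries. The class, column, and group names (`AggregationUnionSketch`, `revenue`, `US`, `DE`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each aggregation function carries its own list of groups.
final class AggregationUnionSketch {

    // One output row of the "union": which aggregation, which group, what value.
    static final class Row {
        final String aggregation;
        final String group;
        final double value;

        Row(String aggregation, String group, double value) {
            this.aggregation = aggregation;
            this.group = group;
            this.value = value;
        }

        @Override
        public String toString() {
            return aggregation + " | " + group + " | " + value;
        }
    }

    // Flattens {aggregation -> {group -> value}} into a single list of rows,
    // preserving the iteration order of the input maps.
    static List<Row> flatten(Map<String, Map<String, Double>> perAggregation) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, Map<String, Double>> agg : perAggregation.entrySet()) {
            for (Map.Entry<String, Double> g : agg.getValue().entrySet()) {
                rows.add(new Row(agg.getKey(), g.getKey(), g.getValue()));
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative response for something like SUM(revenue), COUNT(*) ... GROUP BY country.
        Map<String, Double> sums = new LinkedHashMap<>();
        sums.put("US", 120.0);
        sums.put("DE", 45.0);
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("US", 3.0);

        Map<String, Map<String, Double>> response = new LinkedHashMap<>();
        response.put("sum(revenue)", sums);
        response.put("count(*)", counts);

        flatten(response).forEach(System.out::println);
    }
}
```

Note that this only emulates UNION ALL (no de-duplication) and only across aggregations of one query on one table; anything closer to full SQL UNION is what the presto-pinot connector is for.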
  • s

    Shivakumar

    11/15/2019, 8:23 PM
    Thanks. I was trying that option for a small query, but as the query gets more complex I was thinking about Union support. Thanks for clarifying.
  • k

    Kishore G

    11/15/2019, 8:24 PM
    for full sql support, you can use the presto-pinot connector
  • s

    Shivakumar

    11/15/2019, 8:27 PM
    oh okay. can you please point me to any documentation or example.
  • x

    Xiang Fu

    11/15/2019, 8:32 PM
    @User You can follow this readme to run presto and connect to pinot (https://github.com/apache/incubator-pinot/tree/master/kubernetes/examples/helm#access-pinot-using-presto)
  • s

    Shivakumar

    11/15/2019, 8:34 PM
    thanks. Appreciate it!
  • e

    Elon

    11/15/2019, 10:47 PM
    Is there a limit to the size of a result set the broker can/should return? For queries with very large result sets (e.g. millions of rows), is it more idiomatic to send requests to the servers and gather the results? Or are queries meant to return relatively small result sets (e.g. thousands of rows)?
  • k

    Kishore G

    11/15/2019, 10:50 PM
    For large result sets it's better to send requests to the servers and gather the results. This is exactly what happens in the presto-pinot connector. But note that servers expose Thrift endpoints, not HTTP endpoints.
  • k

    Kishore G

    11/15/2019, 10:50 PM
    What's the scenario for requesting millions of rows?
  • k

    Kishore G

    11/15/2019, 10:53 PM
    when querying servers you can specify the exact segments to query
  • e

    Elon

    11/15/2019, 10:55 PM
    We are thinking of using pinot as a realtime datastore and using presto to query for more complex aggregates.
  • k

    Kishore G

    11/15/2019, 10:56 PM
    Got it. So, at a high level, this is what the presto-pinot connector does:
  • k

    Kishore G

    11/15/2019, 11:00 PM
    • Get the list of segments for a table from Helix (also referred to as Routing Table)
    • RoutingTable will contain Map<Segment, List<Servers>>
    • Distribute the S segments among N servers depending on the W presto workers
    • Each worker will query the individual servers and specify the exact segments to process
    • Rest of the query processing is done in Presto
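The steps above can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the presto-pinot connector's code: the routing table is modeled as a plain `Map<String, List<String>>` (segment → servers hosting a replica), one replica is chosen per segment by simple rotation, and each server's segments are cut into splits of at most `maxSegmentsPerSplit` that are dealt round-robin to the W workers. `SegmentSplitPlanner`, `Split`, and `maxSegmentsPerSplit` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical planner: names are illustrative, not the connector's API.
final class SegmentSplitPlanner {

    // One unit of work: a server plus the exact segments it should process.
    static final class Split {
        final String server;
        final List<String> segments;

        Split(String server, List<String> segments) {
            this.server = server;
            this.segments = segments;
        }

        @Override
        public String toString() {
            return server + " -> " + segments;
        }
    }

    // routingTable: segment -> servers hosting a replica of it.
    // Returns worker index -> the splits that worker should execute.
    static Map<Integer, List<Split>> plan(Map<String, List<String>> routingTable,
                                          int workers, int maxSegmentsPerSplit) {
        if (workers <= 0 || maxSegmentsPerSplit <= 0) {
            throw new IllegalArgumentException("workers and maxSegmentsPerSplit must be > 0");
        }
        // Step 1: pick one replica per segment, rotating through replicas to spread load.
        Map<String, List<String>> segmentsByServer = new TreeMap<>();
        int rotation = 0;
        for (Map.Entry<String, List<String>> e : new TreeMap<>(routingTable).entrySet()) {
            List<String> replicas = e.getValue();
            if (replicas.isEmpty()) {
                throw new IllegalStateException("no replica for segment " + e.getKey());
            }
            String server = replicas.get(rotation++ % replicas.size());
            segmentsByServer.computeIfAbsent(server, k -> new ArrayList<>()).add(e.getKey());
        }
        // Step 2: cut each server's segment list into splits and deal them to workers round-robin.
        Map<Integer, List<Split>> byWorker = new TreeMap<>();
        for (int w = 0; w < workers; w++) {
            byWorker.put(w, new ArrayList<>());
        }
        int nextWorker = 0;
        for (Map.Entry<String, List<String>> e : segmentsByServer.entrySet()) {
            List<String> segs = e.getValue();
            for (int from = 0; from < segs.size(); from += maxSegmentsPerSplit) {
                int to = Math.min(from + maxSegmentsPerSplit, segs.size());
                Split split = new Split(e.getKey(), new ArrayList<>(segs.subList(from, to)));
                byWorker.get(nextWorker++ % workers).add(split);
            }
        }
        return byWorker;
    }

    public static void main(String[] args) {
        Map<String, List<String>> routing = new TreeMap<>();
        routing.put("seg_0", Arrays.asList("server_a", "server_b"));
        routing.put("seg_1", Arrays.asList("server_a", "server_b"));
        routing.put("seg_2", Arrays.asList("server_a", "server_b"));
        routing.put("seg_3", Arrays.asList("server_b"));
        plan(routing, 2, 1).forEach((w, splits) -> System.out.println("worker " + w + ": " + splits));
    }
}
```

Each split names one server and the exact segments it should process, which is what lets a worker query a server directly instead of pulling a large result set through the broker. A real planner would likely also weigh segment sizes and server load; rotation is just the simplest deterministic choice.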
  • a

    Alex

    11/16/2019, 6:13 AM
    Hey, I was doing some load testing of the kube setup and encountered an interesting issue. Pinot was consuming from Kafka (almost no read load), and at some point I saw errors in the server logs:
    2019/11/16 01:37:26.158 ERROR [CombineOperator] [pqr-0] Caught ExecutionException.
    java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out while polling result from first thread
    	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_232]
    	at java.util.concurrent.FutureTask.get(FutureTask.java:206) ~[?:1.8.0_232]
    	at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:158) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.CombineOperator.getNextBlock(CombineOperator.java:44) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:48) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:37) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.InstanceResponseOperator.getNextBlock(InstanceResponseOperator.java:26) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:48) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:48) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:213) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:152) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:136) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) ~[guava-20.0.jar:?]
    	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) ~[guava-20.0.jar:?]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) ~[guava-20.0.jar:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
    Caused by: java.util.concurrent.TimeoutException: Timed out while polling result from first thread
    	at org.apache.pinot.core.operator.CombineOperator$2.callJob(CombineOperator.java:133) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.CombineOperator$2.callJob(CombineOperator.java:126) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.util.trace.TraceCallable.call(TraceCallable.java:44) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	... 8 more
  • a

    Alex

    11/16/2019, 6:14 AM
    2019/11/16 01:37:26.160 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot.svc.cluster.local:2181)] Client session timed out, have not heard from server in 52000ms for sessionid 0x3012ce8d36d0009
    2019/11/16 01:37:26.432 WARN [ZKHelixManager] [ZkClient-EventThread-12-pinot-zookeeper:2181] KeeperState:Disconnected, SessionId: 3012ce8d36d0009, instance: Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098, type: PARTICIPANT
    2019/11/16 01:37:26.483 ERROR [ConsumerCoordinator] [Thread-166] [Consumer clientId=consumer-1, groupId=flattened_orders_REALTIME_1573783918483_0] Offset commit failed on partition flattened-orders-json-7 at offset 970894: The coordinator is not aware of this member.
    2019/11/16 01:37:26.484 WARN [ConsumerCoordinator] [Thread-166] [Consumer clientId=consumer-1, groupId=flattened_orders_REALTIME_1573783918483_0] Asynchronous auto-commit of offsets {flattened-orders-json-7=OffsetAndMetadata{offset=970894, metadata=''}, flattened-orders-json-8=OffsetAndMetadata{offset=971076, metadata=''}, flattened-orders-json-9=OffsetAndMetadata{offset=978515, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    2019/11/16 01:37:26.485 WARN [ConsumerCoordinator] [Thread-166] [Consumer clientId=consumer-1, groupId=flattened_orders_REALTIME_1573783918483_0] Synchronous auto-commit of offsets {flattened-orders-json-7=OffsetAndMetadata{offset=970894, metadata=''}, flattened-orders-json-8=OffsetAndMetadata{offset=971076, metadata=''}, flattened-orders-json-9=OffsetAndMetadata{offset=978515, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
    2019/11/16 01:37:27.229 ERROR [CombineOperator] [pqw-0] Caught exception while executing query.
    java.lang.RuntimeException: Thread has been interrupted
    	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:37) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:79) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.query.SelectionOnlyOperator.getNextBlock(SelectionOnlyOperator.java:39) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:48) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.operator.CombineOperator$1.runJob(CombineOperator.java:104) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at org.apache.pinot.core.util.trace.TraceRunnable.run(TraceRunnable.java:40) ~[pinot-core-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111) ~[guava-20.0.jar:?]
    	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:58) ~[guava-20.0.jar:?]
    	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:75) ~[guava-20.0.jar:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
    2019/11/16 01:37:27.887 WARN [ZKHelixManager] [ZkClient-EventThread-12-pinot-zookeeper:2181] KeeperState:Expired, SessionId: 3012ce8d36d0009, instance: Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098, type: PARTICIPANT
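The Kafka warning above ties the rebalance to the gap between `poll()` calls exceeding `max.poll.interval.ms`. Under a simplified model, the worst-case gap is roughly `max.poll.records` × per-record processing time, which must stay below `max.poll.interval.ms`. The Java sketch below encodes only that arithmetic; it ignores stalls outside the poll loop (such as a long GC pause, which would stretch the gap regardless of batch size). The property keys are standard Kafka consumer settings, but whether and how Pinot's stream-level consumer forwards them from the table's stream config is an assumption to verify against the Pinot docs. `PollBudgetSketch` and all numbers are illustrative.

```java
import java.util.Properties;

// Illustrative only: models the poll-loop budget described in the Kafka warning.
final class PollBudgetSketch {

    // Worst-case gap between two poll() calls if a full batch is processed before polling again.
    static long worstCaseGapMs(int maxPollRecords, double perRecordMs) {
        return (long) Math.ceil(maxPollRecords * perRecordMs);
    }

    // True when a full batch can be processed inside max.poll.interval.ms.
    static boolean fitsInterval(int maxPollRecords, double perRecordMs, long maxPollIntervalMs) {
        return worstCaseGapMs(maxPollRecords, perRecordMs) < maxPollIntervalMs;
    }

    // Plain Kafka consumer settings named in the warning (how Pinot passes them on is an assumption).
    static Properties consumerOverrides(int maxPollRecords, long maxPollIntervalMs, long sessionTimeoutMs) {
        Properties p = new Properties();
        p.setProperty("max.poll.records", Integer.toString(maxPollRecords));
        p.setProperty("max.poll.interval.ms", Long.toString(maxPollIntervalMs));
        p.setProperty("session.timeout.ms", Long.toString(sessionTimeoutMs));
        return p;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 500 records at ~2 ms each against a 300 s interval.
        System.out.println(worstCaseGapMs(500, 2.0) + " ms per batch, fits: "
                + fitsInterval(500, 2.0, 300_000L));
        System.out.println(consumerOverrides(100, 600_000L, 30_000L));
    }
}
```

If the budget comfortably fits but the rebalance still happens, that points away from batch size and toward the process itself stalling.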
  • a

    Alex

    11/16/2019, 6:14 AM
    then bunch of:
    2019/11/16 01:37:27.887 WARN [ClientCnxn] [main-SendThread(pinot-zookeeper.pinot.svc.cluster.local:2181)] Unable to reconnect to ZooKeeper service, session 0x3012ce8d36d0009 has expired
    2019/11/16 01:37:28.030 WARN [StateModel] [ZkClient-EventThread-12-pinot-zookeeper:2181] Default reset method invoked. Either because the process longer own this resource or session timedout
  • a

    Alex

    11/16/2019, 6:14 AM
    and
    2019/11/16 01:37:28.909 WARN [flattened_orders_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Skipping adding existing segment: flattened_orders_REALTIME_1573783918483_0__1__1573867451903 for table: flattened_orders_REALTIME with data manager class: ImmutableSegmentDataManager
    2019/11/16 01:37:28.909 WARN [flattened_orders_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Skipping adding existing segment: flattened_orders_REALTIME_1573783918483_0__1__1573865344545 for table: flattened_orders_REALTIME with data manager class: ImmutableSegmentDataManager
  • a

    Alex

    11/16/2019, 6:15 AM
    and finally:
  • a

    Alex

    11/16/2019, 6:15 AM
    .local_8098/MESSAGES/f9fc6bf7-9624-4339-b05f-d79e37d3dd07=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/dd41ca56-5cf8-4557-96a3-3a77dd19a7ae=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/e9e1e106-8056-4511-a973-38774cbe46aa=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/4936681d-cb42-4e8e-a24c-daa37eea54d9=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/599d985a-4258-4c0f-8842-1c83ff3abaf1=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/d5e4d24a-2d93-46bf-895e-e886acc9f0c7=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/aeb5cb54-e6d4-4c9c-973f-489e5805b815=-101}
    2019/11/16 01:37:29.206 WARN [flattened_orders_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_STATE_TRANSITION] Skipping adding existing segment: flattened_orders_REALTIME_1573783918483_0__1__1573865385326 for table: flattened_orders_REALTIME with data manager class: ImmutableSegmentDataManager
    2019/11/16 01:37:29.234 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-12-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/278a9794-fd47-4237-9026-df420a31f56a=-101, /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/MESSAGES/c140d331-873e-41fb-94a5-ee1a9a525fd8=-101}
    2019/11/16 01:38:28.826 ERROR [HLRealtimeSegmentDataManager_flattened_orders_REALTIME_1573783918483_0__1__1573868193909_flattened-orders-json] [Thread-166] FATAL: Exception committing or shutting down consumer commitSuccessful=false
    java.lang.IllegalStateException: No current assignment for partition flattened-orders-json-7
    	at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:259) ~[kafka-clients-2.0.0.jar:?]
    	at org.apache.kafka.clients.consumer.internals.SubscriptionState.seek(SubscriptionState.java:264) ~[kafka-clients-2.0.0.jar:?]
    	at org.apache.kafka.clients.consumer.KafkaConsumer.seek(KafkaConsumer.java:1508) ~[kafka-clients-2.0.0.jar:?]
    	at org.apache.pinot.core.realtime.impl.kafka2.KafkaStreamLevelConsumer.resetOffsets(KafkaStreamLevelConsumer.java:101) ~[pinot-connector-kafka-2.0-0.2.0-SNAPSHOT.jar:0.2.0-SNAPSHOT-eb45b438c5053f5caaf289614f386706a472947e]
  • a

    Alex

    11/16/2019, 6:17 AM
    It looks like one of them killed the Kafka consumer, as no new data is being ingested. It still responds to queries. What could cause this?
  • a

    Alex

    11/16/2019, 6:17 AM
    And what is the way out of this situation?
  • k

    Kishore G

    11/16/2019, 6:20 AM
    GC? What's the table config, MMAP or HEAP?
  • b

    Buchi Reddy

    11/16/2019, 8:20 AM
    Is the zookeeper session timeout a cause or effect there?
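One way to test the GC hypothesis, and to answer whether the ZooKeeper session timeout is a cause or an effect, is to check whether the whole JVM stalled for roughly the 52000ms reported by `ClientCnxn`. The Java sketch below is a minimal in-process pause detector, assuming it can run inside (or alongside) the server JVM: a daemon thread sleeps for a fixed interval, treats any extra elapsed time as a stall, and reports how much GC time the standard `GarbageCollectorMXBean`s accumulated over the same window. `PauseDetectorSketch` and the thresholds are hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: detects JVM-wide stalls and correlates them with GC time.
final class PauseDetectorSketch {

    // Extra delay beyond the expected sleep is time the JVM (or host) was not running us.
    static long stallMs(long expectedSleepMs, long observedElapsedMs) {
        return Math.max(0L, observedElapsedMs - expectedSleepMs);
    }

    // Total accumulated GC time over all collectors; undefined (-1) values are skipped.
    static long totalGcTimeMs() {
        long total = 0L;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Starts a daemon thread that reports stalls above thresholdMs plus GC time in the same window.
    static Thread start(long intervalMs, long thresholdMs) {
        Thread t = new Thread(() -> {
            long gcBefore = totalGcTimeMs();
            while (!Thread.currentThread().isInterrupted()) {
                long start = System.nanoTime();
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return; // stop quietly when interrupted
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
                long gcNow = totalGcTimeMs();
                long stall = stallMs(intervalMs, elapsedMs);
                if (stall > thresholdMs) {
                    System.err.printf("stall of %d ms; GC time in window: %d ms%n", stall, gcNow - gcBefore);
                }
                gcBefore = gcNow;
            }
        }, "pause-detector");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        start(1_000L, 500L);
        Thread.sleep(5_000L); // in a real server this would run for the process lifetime
    }
}
```

A stall close to the session timeout that is mostly covered by GC time would suggest GC as the cause and the session expiry as an effect; a large stall with little GC time would point elsewhere, for example CPU throttling of the pod.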