# general
  • k

    Kewei Shang

    06/16/2021, 2:13 PM
    Hi team, I downloaded Pinot 0.7.1 and followed the "Using launcher scripts" section of the Manual cluster setup guide (link). I ran
    export JAVA_OPTS="-Xms4G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc-pinot-controller.log"
    bin/pinot-admin.sh StartController \
        -zkAddress localhost:2191 \
        -controllerPort 9000
    to start the controller. However, the controller logs lots of warnings like the following (in the thread), and http://localhost:9000/ returns a blank web UI. May I have some help please? Thanks. The Docker version works for me, but I want to install Pinot on our EC2 nodes for a further PoC.
  • m

    Mark Needham

    06/16/2021, 3:48 PM
    Hi, I'm trying to learn how to use dimension tables, but I'm doing something wrong and I'm not sure what. I have a regions dim table and a cases normal table, and then I run this query:
    select areaName, lookUp('regions', 'Region', 'LTLAName', areaName)
    from cases limit 10
    But the error message says it doesn't find the lookup function:
    [
      {
        "errorCode": 200,
        "message": "QueryExecutionError:\norg.apache.pinot.core.query.exception.BadQueryRequestException: Unsupported function: lookup with 4 parameters\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:189)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:56)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:52)\n\tat org.apache.pinot.core.plan.SelectionPlanNode.run(SelectionPlanNode.java:83)\n\tat org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:94)\n\tat org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33)\n\tat org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45)\n\tat org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:234)\n\tat org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155)\n\tat org.apache.pinot.core.query.scheduler.QueryScheduler.lambda$createQueryFutureTask$0(QueryScheduler.java:139)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)\n\tat shaded.com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:111)"
      }
    ]
    Any ideas?
  • k

    Kewei Shang

    06/16/2021, 8:01 PM
    Hi team, the query to return the earliest row’s timestamp
    select DATETIMECONVERT(MIN(created), '1:MILLISECONDS:EPOCH', '1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:SECONDS') as min_created from delivery_order limit 1
    failed with the following error (in the Slack thread). The created column is of type:
    {
      "name": "created",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
    Interestingly, the query
    select DATETIMECONVERT(created, '1:MILLISECONDS:EPOCH', '1:SECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss', '1:SECONDS') as min_created from delivery_order limit 1
    without MIN() works fine. May I have some advice? Thanks.
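One possible workaround, sketched here under the assumption that the failure comes from wrapping the MIN() aggregation in a transform function: fetch MIN(created) as raw epoch milliseconds and format it on the client. The helper name format_epoch_millis and the sample value are hypothetical, and UTC is assumed as the time zone.

```python
from datetime import datetime, timezone


def format_epoch_millis(epoch_millis: int) -> str:
    """Format epoch milliseconds as 'yyyy-MM-dd HH:mm:ss' in UTC, truncated to seconds."""
    seconds = epoch_millis // 1000  # drop the millisecond part (1:SECONDS granularity)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Hypothetical value standing in for the result of:
#   select MIN(created) from delivery_order
min_created = 1623852780123
print(format_epoch_millis(min_created))  # prints 2021-06-16 14:13:00
```

This keeps the aggregation in Pinot and moves only the formatting step out of the query.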
  • s

    Sidd

    06/16/2021, 9:12 PM
    Hi All, we published the blog post today that I had referred to in yesterday's talk https://engineering.linkedin.com/blog/2021/text-analytics-on-linkedin-talent-insights-using-apache-pinot
    🍷 6
    🎉 4
  • j

    Jai Patel

    06/16/2021, 10:11 PM
    Pinot Upsert Question: Upsert is supported only for realtime tables. That’s fine. The time column is used to determine the order of the updates and choose the latest one. What time is used to determine when to evict a row (visible or not)? The documentation tends to point to segment age as the criterion for evicting messages. In practice, it seems to evict based on when the row was actually imported. What’s the expected behavior for a realtime (upsert) table?
  • k

    Ken Krugler

    06/16/2021, 10:40 PM
    For an offline (batch-generated) table, if I don’t specify a segmentIngestionFrequency, then are APPEND and REFRESH values for segmentIngestionType essentially equivalent?
  • r

    RK

    06/17/2021, 10:12 AM
    I have one table in Pinot. I have been using the same realtime table for the last 15 days and it was working fine, loading realtime data from Kafka. At any given time I could see one segment in consuming state and the others in online state. But today it suddenly stopped consuming data. When I checked the segment status, all segments show as online and no segment is consuming data. What could be the issue here? @User @User
  • m

    Mark Needham

    06/17/2021, 4:26 PM
    I'm trying to query Pinot using the Python 'pinotdb' library. This is my Docker Compose file:
    version: '3.7'
    
    services:
      pinot:
        image: apachepinot/pinot:0.7.1
        command: "QuickStart -type batch"
        container_name: "pinot-quickstart"
        volumes:
          - ./data:/data
        ports:
          - "9000:9000"
          - "8000:8000"
    And this is my query script (query.py):
    from pinotdb import connect
    
    conn = connect(host='localhost', port=9000, path='/query/sql', scheme='http')
    curs = conn.cursor()
    curs.execute("""
        SELECT * from cases LIMIT 10
    """)
    for row in curs:
        print(row)
    I get this error when I run the query:
    Traceback (most recent call last):
      File "query.py", line 5, in <module>
        curs.execute("""
      File "/home/markhneedham/.local/share/virtualenvs/pinot-playground-V0PLiJ36/lib/python3.8/site-packages/pinotdb/db.py", line 44, in g
        return f(self, *args, **kwargs)
      File "/home/markhneedham/.local/share/virtualenvs/pinot-playground-V0PLiJ36/lib/python3.8/site-packages/pinotdb/db.py", line 289, in execute
        self.check_sufficient_responded(
      File "/home/markhneedham/.local/share/virtualenvs/pinot-playground-V0PLiJ36/lib/python3.8/site-packages/pinotdb/db.py", line 253, in check_sufficient_responded
        raise exceptions.DatabaseError(
    pinotdb.exceptions.DatabaseError: Query
    
    
        SELECT * from cases LIMIT 10
     timed out: Out of -1, only -1 responded, while needed was -1
    Am I querying on the right port? In the examples port 8009 is used, but I tried that and got a different error!
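A minimal sketch for checking the port question without pinotdb, assuming that in this quickstart setup port 9000 is the controller and port 8000 (also mapped in the Compose file above) is the broker, and that the broker accepts SQL as a JSON POST on /query/sql. The helper names build_query_request and run_query are hypothetical; only the standard library is used.

```python
import json
import urllib.request


def build_query_request(sql: str, host: str = "localhost", port: int = 8000) -> urllib.request.Request:
    """Build (but do not send) a POST request for the broker's /query/sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql: str) -> dict:
    """Send the query and return the parsed JSON response (needs a running broker)."""
    with urllib.request.urlopen(build_query_request(sql), timeout=10) as resp:
        return json.load(resp)


# Only builds the request; call run_query(...) against a live cluster to execute it.
req = build_query_request("SELECT * FROM cases LIMIT 10")
print(req.full_url)  # prints http://localhost:8000/query/sql
```

If run_query returns rows on port 8000, the same host and port would be the natural values to try in the pinotdb connect() call.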
  • p

    Pedro Silva

    06/17/2021, 5:13 PM
    Hello, I've had to downgrade pinot from a 0.8.0 snapshot version to 0.7.1 (I needed some features from 0.8.0, but due to shifting needs was forced to 0.7.1). I deleted my old table and am currently re-ingesting into the 0.7.1 equivalent. However, I note that the UI is extremely slow with the following error when I try to query the table:
    [
      {
        "errorCode": 410,
        "message": "BrokerResourceMissingError"
      }
    ]
    The broker logs show this exception:
    2021/06/17 16:47:13.084 WARN [BaseInstanceSelector] [ClusterChangeHandlingThread] Failed to find servers hosting segment: HitExecutionView__10__19__20210616T1108Z for table: HitExecutionView_REALTIME (all ONLINE/CONSUMING instances: [] and OFFLINE instances: [] are disabled, counting segment as unavailable)
    2021/06/17 17:05:18.183 ERROR [BrokerResourceOnlineOfflineStateModelFactory] [HelixTaskExecutor-message_handle_thread] Caught exception while processing transition from OFFLINE to ONLINE for table: hitexecutionview_REALTIME
    java.lang.IllegalStateException: Failed to find ideal state for table: hitexecutionview_REALTIME
    	at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.pinot.broker.routing.RoutingManager.buildRouting(RoutingManager.java:309) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.pinot.broker.broker.helix.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_282]
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
    	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
    	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
    2021/06/17 17:05:18.283 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Exception while executing a state transition task hitexecutionview_REALTIME
    java.lang.reflect.InvocationTargetException: null
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_282]
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
    	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
    	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:404) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.helix.messaging.handling.HelixStateTransitionHandler.handleMessage(HelixStateTransitionHandler.java:331) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:97) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.helix.messaging.handling.HelixTask.call(HelixTask.java:49) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_282]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
    Caused by: java.lang.IllegalStateException: Failed to find ideal state for table: hitexecutionview_REALTIME
    	at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:518) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.pinot.broker.routing.RoutingManager.buildRouting(RoutingManager.java:309) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	at org.apache.pinot.broker.broker.helix.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-afa4b252ab1c424ddd6c859bb305b2aa342b66ed]
    	... 12 more
    2021/06/17 17:05:18.301 ERROR [StateModel] [HelixTaskExecutor-message_handle_thread] Default rollback method invoked on error. Error Code: ERROR
    2021/06/17 17:05:18.383 ERROR [HelixTask] [HelixTaskExecutor-message_handle_thread] Message execution failed. msgId: 366af265-24b4-4b59-a28f-1e6387e5d2aa, errorMsg: java.lang.reflect.InvocationTargetException
    2021/06/17 17:05:18.392 ERROR [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Skip internal error. errCode: ERROR, errMsg: null
    2021/06/17 17:06:50.697 WARN [RoutingManager] [HelixTaskExecutor-message_handle_thread] Routing does not exist for table: hitexecutionview_REALTIME, skipping refreshing segment
    2021/06/17 17:06:51.179 WARN [RoutingManager] [HelixTaskExecutor-message_handle_thread] Routing does not exist for table: hitexecutionview_REALTIME, skipping refreshing segment
    2021/06/17 17:07:22.699 WARN [RoutingManager] [HelixTaskExecutor-message_handle_thread] Routing does not exist for table: hitexecutionview_REALTIME, skipping refreshing segment
    2021/06/17 17:07:23.997 WARN [RoutingManager] [HelixTaskExecutor-message_handle_thread] Routing does not exist for table: hitexecutionview_REALTIME, skipping refreshing segment
    2021/06/17 17:07:25.800 WARN [RoutingManager] [HelixTaskExecutor-message_handle_thread] Routing does not exist for table: hitexecutionview_REALTIME, skipping refreshing segment
    Any ideas?
  • s

    Sumit Gupta

    06/18/2021, 12:34 AM
    Hey All, Glad to be part of the community 🙂 I work as a Tableau developer and was recently asked to load data from Pinot into Tableau. While searching the net, I found that Pinot has a JDBC connector to use with Tableau, but I was confused about how to actually do it. I landed on the community Slack and was wondering if anyone has had any luck connecting Pinot with Tableau using the JDBC connector? Link: https://docs.pinot.apache.org/users/clients/jdbc P.S: I also understand that this question might be too silly or "non-technical" for folks out here, but any kind of help will be appreciated! Coffee is on me!
  • c

    Carl

    06/18/2021, 11:38 AM
    Hi, is there any documentation on how to configure the Java client to authenticate against Pinot with TLS enabled? Does it support both 1-way and 2-way authentication?
  • n

    Neil Teng

    06/18/2021, 3:06 PM
    Hey, we are using Presto on top of Pinot, and we want to build a star-tree index on the table. The aggregation function is DistinctCountHLL. I will also use approx_distinct in PrestoDB, which is also backed by HLL. I am wondering: will Presto respect this star-tree index in Pinot?
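For reference, a star-tree index with a DistinctCountHLL pre-aggregation is declared in the table config roughly as sketched below. All column names (country, category, id) are hypothetical placeholders, and the sketch says nothing about whether a query arriving through Presto is actually served from the index.

```python
import json

# All column names below are hypothetical placeholders.
star_tree_index_config = {
    # Dimensions the tree splits on, in order.
    "dimensionsSplitOrder": ["country", "category"],
    "skipStarNodeCreationForDimensions": [],
    # "<FUNCTION>__<column>" pair; here an HLL distinct count on the id column.
    "functionColumnPairs": ["DISTINCTCOUNTHLL__id"],
    "maxLeafRecords": 10000,
}

# The star-tree configs live under tableIndexConfig in the table config.
table_index_config = {"starTreeIndexConfigs": [star_tree_index_config]}
print(json.dumps(table_index_config, indent=2))
```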
  • n

    Neil Teng

    06/18/2021, 9:36 PM
    Hey, can anyone recommend other materials related to the "Raw value forward index"? I am having a really difficult time understanding the Raw value forward index example.
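As a rough illustration of the idea (a toy model, not Pinot's actual on-disk layout): a dictionary-encoded forward index stores, per document, the id of the value in a sorted dictionary, whereas a raw value forward index stores the value itself per document. The function names and sample column below are hypothetical.

```python
def build_dictionary_forward_index(values):
    """Return (dictionary, forward_index) where forward_index[doc_id] is a dictionary id."""
    dictionary = sorted(set(values))
    id_of = {value: dict_id for dict_id, value in enumerate(dictionary)}
    forward_index = [id_of[value] for value in values]
    return dictionary, forward_index


def build_raw_forward_index(values):
    """Return a forward index where forward_index[doc_id] is the value itself."""
    return list(values)


column = ["pear", "apple", "pear", "kiwi"]  # one value per document, doc ids 0..3
dictionary, dict_fwd = build_dictionary_forward_index(column)
raw_fwd = build_raw_forward_index(column)

print(dictionary)  # ['apple', 'kiwi', 'pear']
print(dict_fwd)    # [2, 0, 2, 1]
print(raw_fwd)     # ['pear', 'apple', 'pear', 'kiwi']
```

Reading document 2 takes two steps with the dictionary (dict_fwd[2] gives 2, then dictionary[2] gives 'pear') and one step with the raw index (raw_fwd[2]).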
  • c

    Carl

    06/19/2021, 3:17 AM
    Currently we are deploying Pinot for customer-facing online queries. We also have a use case to store 2 years of data, which could be hundreds of millions of records every day, and to build an offline report generator that queries the offline data, aggregates on different dimensions, and converts the result to a CSV report. Is Pinot able to handle this kind of use case? Would the offline report queries affect online customer query latency? How cost-efficient is it to host a Pinot cluster for this kind of use case?
  • a

    Atri Sharma

    06/19/2021, 6:26 AM
    Hello!!
    👋 3
  • s

    Santhosh CT

    06/21/2021, 6:14 AM
    If we use S3 as the PinotFS and set the data dir to an S3 location, is that a deep store, or does it replace the local storage engine with S3?
  • n

    Neil Teng

    06/21/2021, 7:59 PM
    Hey, I am wondering: will "Sorted forward index with run-length encoding" help group by on that column? I am not sure how the execution plan and optimizer work in Pinot. (MySQL will take advantage of a physically sorted column in group by in some cases.)
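To make the structure in question concrete, here is a toy sketch of a sorted, run-length-encoded forward index: each dictionary id maps to one contiguous (start, end) doc id range, so per-value counts fall out of the ranges directly. The function names are hypothetical, and the sketch does not establish whether Pinot's group-by execution exploits this.

```python
def build_sorted_forward_index(sorted_dict_ids):
    """Map each dictionary id to its inclusive (start_doc_id, end_doc_id) range.

    The input must already be sorted, so each id occupies one contiguous run.
    """
    ranges = {}
    for doc_id, dict_id in enumerate(sorted_dict_ids):
        if dict_id in ranges:
            start, _ = ranges[dict_id]
            ranges[dict_id] = (start, doc_id)  # extend the current run
        else:
            ranges[dict_id] = (doc_id, doc_id)  # first doc of a new run
    return ranges


def count_per_group(ranges):
    """COUNT(*) per value, computed from the ranges without scanning documents."""
    return {dict_id: end - start + 1 for dict_id, (start, end) in ranges.items()}


doc_dict_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2]  # dictionary id per document, sorted
ranges = build_sorted_forward_index(doc_dict_ids)
print(ranges)                   # {0: (0, 2), 1: (3, 4), 2: (5, 8)}
print(count_per_group(ranges))  # {0: 3, 1: 2, 2: 4}
```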
  • n

    Neil Teng

    06/22/2021, 3:26 PM
    Hey, I have a query like this. Will the star-tree index recognize a range filter? (All dates are truncated to day granularity.)
    SELECT approx_distinct(id) AS "count"
    FROM table
    WHERE start_date <= current_date()  AND end_date >= current_date()
    BTW, what about where cat in ('a', 'b', 'c')?
  • c

    Carl

    06/22/2021, 4:21 PM
    Hi, is there a default length limit for string dimension fields in Pinot? We are seeing partial strings returned from field queries.
  • j

    Jackie

    06/22/2021, 5:50 PM
    Correction to the forward index reader optimization availability in the meetup talk: it is available in 0.7.1, not 0.6.0
  • e

    Evan Galpin

    06/22/2021, 8:32 PM
    hi folks 🙂 Good to be here! I’d really love to learn more about the capabilities of the star-tree index. In particular, I’m curious to know how it might enable ingesting raw data and creating “materialized views” for specific use cases later once they are known. This might be considered an anti-pattern for Pinot, and if so that would be good to know too 👍
  • l

    Lian Jiang

    06/22/2021, 8:40 PM
    I am new to Pinot. From what I read online, people choose Pinot over Druid because it has better performance. https://leventov.medium.com/comparison-of-the-open-source-olap-systems-for-big-data-clickhouse-druid-and-pinot-8e042a5ed1c7 heavily compares ClickHouse vs. Druid/Pinot as opposed to Druid vs. Pinot. I know the star-tree data structure is unique to Pinot, but I do not have a sense of how much it helps Pinot win the race. Which tool is better in which scenario? Or is one obviously better than the other in general? Could a Pinot guru shed some light? I'd love to hear the Pinot advantages mapped to key design differences. Thanks very much.
  • n

    Neil Teng

    06/22/2021, 10:06 PM
    I am wondering: if my time dimension is in millisecond granularity, how will it be used in the star-tree? Should I truncate it to day or week first? P.S. I see that the star-tree will automatically include dictionary-encoded Time/DateTime columns in the dimensionsSplitOrder property.
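A small sketch of what truncating to day granularity does to the number of distinct values in a time dimension, which is the property that matters when a column is used as a split dimension. The helper name and sample timestamps are hypothetical; days are taken as UTC epoch days.

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def truncate_to_day(epoch_millis: int) -> int:
    """Round an epoch-millisecond timestamp down to the start of its UTC day."""
    return epoch_millis - (epoch_millis % MILLIS_PER_DAY)


# Hypothetical sample: three events on one day, one on the next.
timestamps = [1623852780123, 1623852781999, 1623801600000, 1623888000000]
days = sorted({truncate_to_day(ts) for ts in timestamps})

print(len(set(timestamps)), "distinct millisecond values")  # 4 distinct millisecond values
print(len(days), "distinct day values")                     # 2 distinct day values
```

Whether to truncate at ingestion time or keep the raw column alongside a truncated one depends on the queries, which this sketch does not address.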
  • z

    Zsolt Takacs

    06/23/2021, 3:47 PM
    In the docs at https://docs.pinot.apache.org/users/tutorials/schema-evolution#update-the-schema Is this section still accurate? I can't find any usage of the property in the code.
    Real-Time Pinot table: In case of real-time tables, make sure the "_pinot.server.instance.reload.consumingSegment_" config is set to true inside Server config. Without this, the current consuming segment(s) will not reflect the default null value for newly added columns.
  • z

    Zsolt Takacs

    06/23/2021, 3:49 PM
    There's a comment:
    // Whether to reload consuming segment on scheme update. Will change default behavior to true when this feature is stabilized
    I assume this has happened already
  • i

    II

    06/23/2021, 8:29 PM
    Hi, does Pinot provide any benefits for use cases where no aggregation is applied in queries (meaning there are no metric columns; all are dimension columns)? And is that the reason why metric columns do not allow other types like String?
  • q

    Qianbo Wang

    06/23/2021, 8:51 PM
    Hi, I have a question on the lookup UDF join. Can we use the return value of this function in a group by statement? Thanks in advance.
  • j

    Juraj Pohanka

    06/24/2021, 2:37 PM
    Thank you @User. Glad to be a part of a very vibrant community of a project with huge potential! 🙂
  • j

    Jai Patel

    06/24/2021, 5:03 PM
    I have some questions about Pinot realtime/upsert tables:
    1. Retention. We have an upsert table whose retention is set to 10 days. However, I’m seeing “latest” rows where the value is 14 days old. Is the cleanup process “lazy”?
    2. Although it is not required, are there any advantages/disadvantages to the time column being the same as the sorted column? Am I correct in understanding that the recommendation is to have only one sorted column?
  • n

    Neil Teng

    06/24/2021, 6:51 PM
    Hi, if I do a hot update on the table config of a realtime table, what will happen to the old indexes? Will they be cleaned up automatically?