# troubleshooting
  • Elon
    06/09/2020, 12:39 AM
    We will update to 3 controllers. In general, do you scale brokers to handle QPS, and scale the servers as the data grows?
  • Elon
    06/09/2020, 12:41 AM
    And another question: does the controller need a lot of CPU and memory, or just the brokers and servers (i.e. servers have a relatively small JVM heap but a lot of extra memory for mmap'ed segments)?
  • Subbu Subramaniam
    06/12/2020, 5:37 PM
    @srisudha it will be useful to know a few things about your use case. How many partitions are you ingesting? What is a rough ingestion rate? How many segments did any one partition (say, 0) make before you reached OOM? Realtime completed segments respect the `loadMode` setting, so if you have set that to `HEAP`, I suggest you move it to `MMAP` and restart your servers. Realtime servers have a setting `pinot.server.instance.realtime.alloc.offheap`. Setting this to `true` makes sure that we use as little heap as possible during consumption, and memory-map files for the rest. If you do not want memory map (and want to use direct memory instead), you can set `pinot.server.instance.realtime.alloc.offheap.direct` to `true`, but I don't think you have set this config. If you have, then please remove it.
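    A minimal server-config sketch of the advice above (the file layout is an assumption; only the two property names and the `loadMode` value come from the message):

    ```
    # pinot server config (sketch)
    # keep heap usage low during consumption; memory-map the rest
    pinot.server.instance.realtime.alloc.offheap=true
    # leave the direct-memory variant unset unless you explicitly want it
    # pinot.server.instance.realtime.alloc.offheap.direct=true
    ```

    and in the table config, `"loadMode": "MMAP"` under `tableIndexConfig`.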
  • Pradeep
    06/12/2020, 10:54 PM
    Cool, thanks. Also, I see in the release notes that pinot-s3 is included, but I don't see it in the plugins (I just see adls, gcs, hdfs). Is it supposed to be downloaded from somewhere else?
    ```
    ~/apache-pinot-incubating-0.4.0-bin$ ls plugins/pinot-file-system/
    pinot-adls  pinot-gcs  pinot-hdfs
    ```
  • Pradeep
    06/14/2020, 5:57 PM
    I tried using S3 as deep storage; there seems to be some issue with older versions of httpclient and httpcore (see https://github.com/aws/aws-sdk-java/issues/1919). With the changes below, I see segments getting uploaded to S3:
    ```diff
     <dependency>
       <groupId>org.apache.httpcomponents</groupId>
       <artifactId>httpclient</artifactId>
    -  <version>4.5.3</version>
    +  <version>4.5.9</version>
     </dependency>
     <dependency>
       <groupId>org.apache.httpcomponents</groupId>
       <artifactId>httpcore</artifactId>
    -  <version>4.4.6</version>
    +  <version>4.4.9</version>
     </dependency>
    ```
  • Pradeep
    06/16/2020, 7:18 PM
    Is there support for SSL connections to Kafka for the KafkaConsumer used by Pinot? https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/import-from-apache-kafka I don't see an option to pass in keystore/truststore etc. as part of the configuration.
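    A hedged sketch of what this could look like, assuming the `stream.kafka.consumer.prop.` prefix forwards arbitrary Kafka consumer properties (the same prefix carries `auto.offset.reset` in the table configs elsewhere in this thread); the property names are the standard Kafka client SSL settings, and all paths/passwords are placeholders:

    ```json
    "streamConfigs": {
      "stream.kafka.consumer.prop.security.protocol": "SSL",
      "stream.kafka.consumer.prop.ssl.truststore.location": "/path/to/truststore.jks",
      "stream.kafka.consumer.prop.ssl.truststore.password": "<truststore-password>",
      "stream.kafka.consumer.prop.ssl.keystore.location": "/path/to/keystore.jks",
      "stream.kafka.consumer.prop.ssl.keystore.password": "<keystore-password>"
    }
    ```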
  • Pradeep
    06/24/2020, 9:44 PM
    I created a table with the following tagOverrideConfig:
    ```json
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant",
      "tagOverrideConfig": {
        "realtimeConsuming": "DefaultTenant_REALTIME",
        "realtimeCompleted": "DefaultTenant_OFFLINE"
      }
    },
    ```
    But the REST API /tables/{tableName} only shows:
    ```json
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    ```
    Wondering if this config is coupled with something else?
  • Neha Pawar
    06/24/2020, 10:34 PM
    Segments are moved by a periodic task on the controller, so you might have to wait up to an hour to see the completed segments moving; it runs every hour.
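    A hedged aside on that periodic task: in Pinot releases of this era the mover was the RealtimeSegmentRelocator, and its run interval was, I believe, a controller config along these lines (the key name and value are an assumption from memory, not confirmed in this thread):

    ```
    # controller config (sketch; verify the key against your Pinot version)
    controller.realtime.segment.relocator.frequency=1h
    ```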
  • Elon
    06/29/2020, 11:43 PM
    We enabled istio in our k8s Pinot deployment and are getting zk timeouts. Wanted to know if setting the system properties `zk.connection.timeout` and `helixmanager.waitForConnectedTimeout` is possible in a config, or just set with `-Dzk.connection.timeout=...`
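    A sketch of the `-D` route, assuming the start scripts honor the JAVA_OPTS environment variable (the timeout values are illustrative):

    ```
    JAVA_OPTS="-Dzk.connection.timeout=60000 -Dhelixmanager.waitForConnectedTimeout=60000" \
      bin/pinot-admin.sh StartServer -zkAddress zookeeper:2181
    ```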
  • Damiano
    07/01/2020, 1:16 PM
    Hello everybody! I need support to set up a cluster. I followed the instructions explained by @Neha Pawar in her video. Everything works as expected, but I did that test locally. Now I should organize all the components inside a real cluster of 2-3 servers, and I need to understand how to arrange them for a high-availability architecture. Obviously I am talking about a very small cluster, so take "high-availability" with a grain of salt :) My doubt is about the distribution of the components over the servers. For example, in the video the Zookeeper instance is just one, started with `pinot-admin.sh StartZookeeper -zkPort 2181`, so the first question is: what happens if the server running Zookeeper goes down? Can we spread two or more Zookeeper instances over multiple servers? And supposing we can run multiple Zookeeper instances, should every machine also have its own Controller, Broker and Server components? Having more than one broker/controller on the same machine does not make much sense to me, except maybe for very high traffic? Could someone explain it a little bit more? Thanks.
  • Kishore G
    07/01/2020, 3:52 PM
    @Dan Hill can you paste the response stats?
  • Kishore G
    07/01/2020, 3:56 PM
    so the latency was also related to this?
  • Cinto Sunny
    07/01/2020, 5:54 PM
    Hi Team, I just installed Pinot locally and ran the script:
    ```
    ./bin/quick-start-batch.sh
    ```
    It is throwing an error:
    ```
    ***** Offline quickstart setup complete *****
    Total number of documents in the table
    Query : select count(*) from baseballStats limit 0
    Executing command: PostQuery -brokerHost 127.0.0.1 -brokerPort 8000 -queryType pql -query select count(*) from baseballStats limit 0
    Exception in thread "main" java.lang.NullPointerException
    	at org.apache.pinot.tools.Quickstart.prettyPrintResponse(Quickstart.java:75)
    	at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:174)
    	at org.apache.pinot.tools.Quickstart.main(Quickstart.java:207)
    ```
    This is how the UI looks:
  • Dan Hill
    07/01/2020, 10:41 PM
    I'm hitting a slow query case where my combined offline/realtime table `metrics` is slow to query, but the individual `metrics_OFFLINE` and `metrics_REALTIME` tables are quick to query separately. Any ideas?
    ```sql
    select utc_date, sum(impressions) from metrics_OFFLINE where utc_date >= 1591142400000 and utc_date < 1593648000000 group by utc_date order by utc_date ASC limit 1831
    ```
    This returns pretty fast (200ms) over roughly 400 million rows. If I switch to `metrics_REALTIME`, it's also fast and returns zero rows.
    ```sql
    select utc_date, sum(impressions) from metrics_REALTIME where utc_date >= 1591142400000 and utc_date < 1593648000000 group by utc_date order by utc_date ASC limit 1831
    ```
    However, if I query `metrics`, it's very slow.
    ```sql
    select utc_date, sum(impressions) from metrics where utc_date >= 1591142400000 and utc_date < 1593648000000 group by utc_date order by utc_date ASC limit 1831
    ```
  • Mayank
    07/05/2020, 12:10 AM
    @Dan Hill great questions. Let me take a stab at answering them:
    1. Backward-compatible schema changes are safe (e.g. adding a new column, safe type changes like int -> long). Backward-incompatible changes, such as deleting a column or changing to an incompatible data type, are not allowed.
    2. At LinkedIn, we usually ensure that a change is done in phases so as not to break a deployment. For example, you could deploy the change off by default to all components, and then turn it on in a way that does not break. Would need a bit more info on your specific change to comment on how to achieve that.
    3. We have internal tools at LinkedIn, but it would be great to have them in open source as well. One project on our roadmap in this direction is to build a performance validation framework.
    4. There are different ways we evaluate changes. For changes that are limited to a single node, you can use PerfBenchmarkRunner along with QueryRunner (to run a specific QPS) on two different setups. For a change that impacts scatter/gather and needs the entire cluster, we have tools internally to do so. But hoping that the project mentioned above can evolve into something that the community can also use.
  • Somanshu Jindal
    07/06/2020, 10:29 AM
    I need help with hardware requirements for the various components: cores, memory, etc. Also, which components are memory-intensive, IO-intensive, CPU-intensive? Currently I am thinking of:
    • Controller - 2
    • Broker - 2
    • Servers - 3 (for realtime ingestion)
    • Zookeeper (should I go with standalone or a cluster?)
    As far as I know, segments are stored on the servers and on the controller (segment store), right?
  • Kishore G
    07/06/2020, 3:50 PM
    @Somanshu Jindal For prod, here is a good setup:
    ```
    Controller
    - min 2 (for fault tolerance), ideal 3
    - 4 core, 4 GB (disk space should be sufficient for logs and temp segments) - 100 GB

    Broker
    - min 2, add more nodes as needed later to scale
    - 4 core, 4 GB (disk space should be sufficient for logs) - 10 GB min

    Zookeeper (cluster mode)
    - min 3 (this is where the entire cluster state is stored)
    - 4 core, 4 GB, disk space sufficient to store logs, transaction logs and snapshots. If you can afford it, go with SSD; if not, disk will be fine. 100 GB

    Pinot server
    - min 2 (this is where the segments will be stored), you can add more servers anytime without downtime
    - 8 core, 16 GB, SSD boxes (pick any size that works for your use case: 500 GB to 2 TB or even more)
    - If you are running on cloud, you can use mounted SSD instead of local SSD
    ```
  • Elon
    07/06/2020, 11:40 PM
    This is just updating the spec for the table.
  • Mayank
    07/06/2020, 11:43 PM
    Makes sense, that could be backward incompatible.
  • Somanshu Jindal
    07/07/2020, 9:38 AM
    I am getting failures at the time of segment commit on the controller, and the server keeps retrying indefinitely. I have attached controller and server log screenshots. (v0.3.0) Table config:
    ```json
    {
      "tableName": "transcript",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "timestamp",
        "timeType": "MILLISECONDS",
        "schemaName": "transcript",
        "replicasPerPartition": "1"
      },
      "tenants": {},
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "lowlevel",
          "stream.kafka.topic.name": "transcript-topic",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "localhost:9876",
          "realtime.segment.flush.threshold.time": "5m",
          "realtime.segment.flush.threshold.size": "5",
          "stream.kafka.consumer.prop.auto.offset.reset": "largest"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
    ```
    Schema:
    ```json
    {
      "schemaName": "transcript",
      "dimensionFieldSpecs": [
        {
          "name": "studentID",
          "dataType": "INT"
        },
        {
          "name": "firstName",
          "dataType": "STRING"
        },
        {
          "name": "lastName",
          "dataType": "STRING"
        },
        {
          "name": "gender",
          "dataType": "STRING"
        },
        {
          "name": "subject",
          "dataType": "STRING"
        }
      ],
      "metricFieldSpecs": [
        {
          "name": "score",
          "dataType": "FLOAT"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "timestamp",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
    ```
  • Damiano
    07/07/2020, 6:07 PM
    Hello everybody, I have three servers where I want to put the Zookeeper cluster and the other Pinot components. I wonder how I can create a single entry address/port to let other components connect to Zookeeper using the same address and port. I did a similar thing for the brokers: I set up a load balancer to always call the same address and port without knowing the IP of each broker. But setting up a load balancer for Zookeeper seems unnecessary.
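    (A hedged note on why a load balancer is usually unnecessary here: ZooKeeper clients accept a comma-separated host list and handle failover themselves, so each Pinot component can simply be pointed at all nodes; the host names below are illustrative.)

    ```
    bin/pinot-admin.sh StartBroker -zkAddress zk1:2181,zk2:2181,zk3:2181
    ```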
  • Elon
    07/08/2020, 12:14 AM
    Is it possible to add columns to a Pinot table in place, or do we have to save the data and recreate/reload?
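    (A hedged cross-reference: per Mayank's earlier point 1, adding a column is a backward-compatible schema change, so it can be done in place; the usual follow-up is a segment reload via the controller REST API. The endpoint below is from memory and may differ by version.)

    ```
    POST /segments/{tableName}/reload
    ```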
  • Pradeep
    07/08/2020, 12:23 AM
    Hi, is there a simple way to negate `REGEXP_LIKE` (rather than trying to achieve that in the regular expression)?
  • Alan H
    07/08/2020, 7:00 AM
    Hi. I'm standing up a Kubernetes/Helm cluster as per the docs, modified to use S3, again as per the docs: https://docs.pinot.apache.org/plugins/pinot-file-system. Controller & Server are throwing:
    ```
    ERROR [PinotFSFactory] [main] Could not instantiate file system for class org.apache.pinot.plugin.filesystem.S3PinotFS with scheme s3
    java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.S3PinotFS
            at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_252]
            at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_252]
            at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
    ```
  • Mayank
    07/08/2020, 10:38 PM
    As per the code, the precondition is:
    ```java
    Preconditions.checkArgument(!isNullOrEmpty(config.getProperty(REGION)));
    ```
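    A hedged example of satisfying that precondition, i.e. giving S3PinotFS a region (the key names follow the Pinot S3 filesystem docs; the region value is illustrative):

    ```
    # controller config (sketch)
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-west-2
    ```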
  • Cinto Sunny
    07/09/2020, 6:57 PM
    Hi Team, I was playing around with the getting-started stream data (meetupRsvp). I am able to load data and query it. I was trying to understand the size implications of the input data vs the output table. I see a REST API with table size info, however it is giving the size as 0. Any idea what could be causing this, since I can query the table?
    ```
    /tables/{tableName}/size
    {
      "tableName": "meetupRsvp",
      "reportedSizeInBytes": 0,
      "estimatedSizeInBytes": 0,
      "offlineSegments": null,
      "realtimeSegments": {
        "reportedSizeInBytes": 0,
        "estimatedSizeInBytes": 0,
        "missingSegments": 0,
        "segments": {}
      }
    }
    ```
    Also, where is the actual location on disk where the segments are stored? Is there some config for this?
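    (For the second question, a hedged pointer: the on-disk segment location is governed by server config, along the lines of the sketch below; the values are illustrative and the keys should be checked against your version's docs.)

    ```
    pinot.server.instance.dataDir=/path/to/server/data/index
    pinot.server.instance.segmentTarDir=/path/to/server/data/segmentTar
    ```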
  • Raúl G.
    07/13/2020, 12:27 PM
    Hello, I'm a newbie with Pinot and I need some help. I have deployed Pinot into a swarm cluster with docker: there is a controller, a broker and a server. I defined a schema and a realtime table. Zookeeper and Kafka were running before Pinot, and are shared with other applications. I can see the table in the Data Explorer, but when I run a query (select * from dmaTestData limit 10), I only get a timeout error. Can someone help me? What's wrong in my configuration?
    docker-compose:
    ```yaml
    version: '3.7'
    services:
      pinot-controller:
        image: apachepinot/pinot:latest
        ports:
          - "9091:9000"
        command: ["StartController", "-zkAddress", "zookeeper:2181"]
        volumes:
          - ./data/controller_data:/opt/pinot/data
          - ./data/controller_configs:/opt/pinot/configs
        networks:
          - public
      pinot-broker:
        image: apachepinot/pinot:latest
        command: ["StartBroker", "-zkAddress", "zookeeper:2181"]
        volumes:
          - ./data/broker_data:/opt/pinot/data
          - ./data/broker_configs:/opt/pinot/configs
        networks:
          - public
      pinot-server:
        image: apachepinot/pinot:latest
        command: ["StartServer", "-zkAddress", "zookeeper:2181"]
        volumes:
          - ./data/server_data:/opt/pinot/data
          - ./data/server_configs:/opt/pinot/configs
        networks:
          - public
    networks:
      public:
        external: true
    ```
    Schema:
    ```json
    {
      "schemaName": "dmaTestSchema",
      "dimensionFieldSpecs": [
        { "name": "dma_name", "dataType": "STRING" },
        { "name": "dma_id", "dataType": "INT" }
      ],
      "metricFieldSpecs": [
        { "name": "value", "dataType": "FLOAT" },
        { "name": "value_old", "dataType": "FLOAT" }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "time_stamp",
          "dataType": "STRING",
          "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
          "granularity": "1:DAYS"
        }
      ]
    }
    ```
    Table:
    ```json
    {
      "tableName": "dmaTestData",
      "tableType": "REALTIME",
      "segmentsConfig": {
        "timeColumnName": "time_stamp",
        "timeType": "DAYS",
        "schemaName": "dmaTestSchema",
        "replication": "1",
        "replicasPerPartition": "1"
      },
      "tenants": {
        "broker": "pinot-broker",
        "server": "pinot-server"
      },
      "tableIndexConfig": {
        "loadMode": "MMAP",
        "streamConfigs": {
          "streamType": "kafka",
          "stream.kafka.consumer.type": "simple",
          "stream.kafka.topic.name": "transcript-topic",
          "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
          "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
          "stream.kafka.broker.list": "kafka:9092",
          "stream.kafka.hlc.zk.connect.string": "zookeeper:2181/kafka",
          "stream.kafka.zk.broker.url": "zookeeper:2181/kafka",
          "realtime.segment.flush.threshold.time": "12h",
          "realtime.segment.flush.threshold.size": "100000",
          "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
        }
      },
      "metadata": {
        "customConfigs": {}
      }
    }
    ```
  • Damiano
    07/19/2020, 2:23 PM
    Hello, I need to configure a ZK cluster. I read a tutorial and it is quite easy (https://dzone.com/articles/how-to-setup-zookeeper-cluster); the "problem" is that I have to create a zoo.cfg, and I did not find any details in the Pinot documentation. I know I can set the port and the dataDir options, but how can I use a custom configuration? I should add server.1=..., server.2=..., etc.
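    (A hedged sketch of such a zoo.cfg for a 3-node quorum; host names and paths are illustrative. Pinot's bundled StartZookeeper is aimed at quickstarts, so a production quorum is typically run from the standard ZooKeeper distribution, where this file lives under conf/.)

    ```
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    server.1=zk1:2888:3888
    server.2=zk2:2888:3888
    server.3=zk3:2888:3888
    ```

    Each node also needs a myid file in dataDir containing just its server number (1, 2 or 3).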
  • Mayank
    07/22/2020, 5:14 AM
    So failing queries can pass when run on the console?
  • Mayank
    07/22/2020, 5:15 AM
    Do you have a way to observe the metrics Pinot emits?