# troubleshooting
  • Sadim Nadeem
    07/05/2021, 3:53 PM
    @Mayank @Xiang Fu @Jackie @Kishore G @Daniel Lavoie @Ken Krugler @Neha Pawar: we are trying the upsert setup that @Radhika described above, but the table count comes up as zero. We have followed all the steps listed at https://docs.pinot.apache.org/basics/data-import/upsert and are using the Apache Samza API (see the attached code snippet, and the producer sketch below) to partition the input stream, as the doc requires under "Partition the input stream by the primary key": "An important requirement for the Pinot upsert table is to partition the input stream by the primary key. For Kafka messages, this means the producer shall set the key in the send API. If the original stream is not partitioned, then a streaming processing job (e.g. Flink) is needed to shuffle and repartition the input stream into a partitioned one for Pinot's ingestion."
    upsert samza java streaming app snippet.txt
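    For reference, a minimal sketch of the keying requirement using the plain Kafka producer API (topic name, key, and payload here are hypothetical; the Samza job above should achieve the same effect):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KeyedUpsertProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Hypothetical primary key and payload. What matters for upsert is
                // that the record KEY is set: the default partitioner hashes it,
                // so every update for the same key lands on the same partition.
                String primaryKey = "order-123";
                String payload = "{\"orderId\":\"order-123\",\"status\":\"SHIPPED\"}";
                producer.send(new ProducerRecord<>("upsert-output-topic", primaryKey, payload));
            }
        }
    }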
  • Sadim Nadeem
    07/05/2021, 3:58 PM
    We can see this streaming application publishing data to the output topic that the Pinot upsert table is built on. Before writing, we use the Samza API to shuffle the data so that records with the same key land on the same partition. Still, the data is not ingested by Pinot and the table count comes back as zero. The table schema and table creation script were shared above by @Radhika.
  • Carlos Domínguez
    07/08/2021, 9:40 PM
    Hi guys!
  • Carlos Domínguez
    07/08/2021, 9:40 PM
    I have a question regarding Kafka integration with Pinot
  • Carlos Domínguez
    07/08/2021, 9:42 PM
    Thanks in advance!
  • Prashant Pandey
    07/13/2021, 5:47 AM
    Hi everyone, good morning 🙂. I need some help debugging slow queries in our Pinot cluster. We are running the following query:
    SELECT api_id, service_name, service_id, api_name, COUNT(*)
    FROM myTable
    WHERE tenant_id = 'someTenantId'
      AND (api_id IS NOT NULL AND start_time_millis >= 1625039026768 AND start_time_millis < 1625643826768)
    GROUP BY api_id, service_name, service_id, api_name
    ORDER BY PERCENTILETDIGEST99(duration_millis) DESC
    LIMIT 10000
    And these are the query stats:
    timeUsedMs: 1077
    numDocsScanned: 560325713
    totalDocs: 3103044892
    numServersQueried: 8
    numServersResponded: 8
    numSegmentsQueried: 623
    numSegmentsProcessed: 115
    numSegmentsMatched: 115
    numConsumingSegmentsQueried: 4
    numEntriesScannedInFilter: 25000000
    numEntriesScannedPostFilter: 2801628565
    numGroupsLimitReached: false
    partialResponse: -
    minConsumingFreshnessTimeMs: 1626154723247
    The most conspicuous of these stats is numEntriesScannedInFilter. The troubleshooting guide says that if this number is too high, we should consider adding an index on the filtered column. We don't have an index on it; our segment config is:
    "segmentsConfig": {
          "timeType": "MILLISECONDS",
          "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
          "timeColumnName": "start_time_millis",
          "retentionTimeUnit": "DAYS",
          "retentionTimeValue": "7",
          "replicasPerPartition": "1",
          "schemaName": "rawServiceView"
        }
    As you can see, the timeColumnName is start_time_millis, and therefore we haven't added any index on that column (our reasoning is that segments would be pruned on this column anyway, so we don't need an extra index). myTable is a real-time table. If I remove the filter on start_time_millis, then numEntriesScannedInFilter becomes 0. What are we doing wrong here?
  • Kishore G
    07/13/2021, 6:20 AM
    there is nothing wrong, it's working as expected
  • Kishore G
    07/13/2021, 6:20 AM
    a segment is either:
    • no match
    • full match
    • partial match
  • Kishore G
    07/13/2021, 6:21 AM
    no match or full match will not add to numEntriesScannedInFilter
  • Kishore G
    07/13/2021, 6:21 AM
    but the partial ones will have to scan to evaluate the time filter
  • Kishore G
    07/13/2021, 6:22 AM
    if you want to bring this down further, you can try a range index on the time column
  • Prashant Pandey
    07/13/2021, 6:38 AM
    Thanks for the reply @Kishore G, we'll add an index on start_time_millis and report back.
  • Kishore G
    07/13/2021, 6:41 AM
    it should be a range index
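    For reference, a range index is declared in tableIndexConfig, not segmentsConfig. A minimal sketch, assuming the two existing inverted indexes mentioned below are on tenant_id and api_id:

    "tableIndexConfig": {
      "rangeIndexColumns": ["start_time_millis"],
      "invertedIndexColumns": ["tenant_id", "api_id"]
    }

    Existing segments typically need a reload (e.g. via the controller's reload API) before the new index is built for them.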
  • Prashant Pandey
    07/13/2021, 6:45 AM
    Yes, adding a range index only, @Kishore G. We already have inverted indices on the other two fields. Let me test it and report back.
  • Bruce Ritchie
    07/13/2021, 10:01 PM
    So, um, the dependencies seem to be a wee bit out of date for some things. Hitting this while attempting to run the Spark job on EMR 6.3.0/JDK 11: https://issues.apache.org/jira/browse/LANG-1384
    Caused by: java.lang.NullPointerException
            at org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(SystemUtils.java:1626)
            at org.apache.spark.storage.StorageUtils$.<init>(StorageUtils.scala:207)
            at org.apache.spark.storage.StorageUtils$.<clinit>(StorageUtils.scala)
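    One hedged workaround, assuming the NPE comes from an old commons-lang3 on the classpath (LANG-1384 is fixed in commons-lang3 3.8+, which Spark on JDK 11 requires): put a fixed jar ahead of the stale copy when submitting. Paths and versions here are illustrative, not the documented invocation:

    # Hypothetical fix: let a commons-lang3 with the LANG-1384 patch (>= 3.8)
    # win the classpath race over the outdated copy in the bundled jars.
    spark-submit \
      --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
      --conf "spark.driver.extraClassPath=/opt/jars/commons-lang3-3.12.0.jar:/opt/pinot/lib/pinot-all-jar-with-dependencies.jar" \
      /opt/pinot/lib/pinot-all-jar-with-dependencies.jar \
      -jobSpecFile /path/to/spark-job-spec.yml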
  • Saurabh Dwivedy
    07/14/2021, 11:32 AM
    hello
  • Saurabh Dwivedy
    07/14/2021, 11:32 AM
    I am trying to follow the steps outlined at https://docs.pinot.apache.org/basics/data-import/batch-ingestion to set up a schema and table in Pinot
  • Saurabh Dwivedy
    07/14/2021, 11:33 AM
    I am able to upload the schema and the table structure using the command:
    bin/pinot-admin.sh AddTable \
      -tableConfigFile /path/to/table-config.json \
      -schemaFile /path/to/table-schema.json -exec
  • Saurabh Dwivedy
    07/14/2021, 11:34 AM
    But when I try to load the CSV data into the table using:
    bin/pinot-admin.sh LaunchDataIngestionJob \
      -jobSpecFile /tmp/pinot-quick-start/batch-job-spec.yml
  • Saurabh Dwivedy
    07/14/2021, 11:34 AM
    I get the following error: "Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: <http://localhost:9000>/tables/transcript/schema"
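    "Illegal character in scheme name at index 0" means the URI starts with a character that cannot begin a scheme; here that is the literal < in <http://localhost:9000>, which suggests angle brackets were pasted into the job spec along with the URL. The relevant fragment of batch-job-spec.yml, sketched with the quickstart names from the linked doc, should use plain URIs with no <...>:

    tableSpec:
      tableName: 'transcript'
      schemaURI: 'http://localhost:9000/tables/transcript/schema'
      tableConfigURI: 'http://localhost:9000/tables/transcript'
    pinotClusterSpecs:
      - controllerURI: 'http://localhost:9000'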
  • Saurabh Dwivedy
    07/14/2021, 11:34 AM
    I am unable to understand why - I am doing nothing special - just following the steps outlined in the document
  • Saurabh Dwivedy
    07/14/2021, 11:34 AM
    Can anyone help me with this issue?
  • Saurabh Dwivedy
    07/14/2021, 12:07 PM
    I’m running Pinot on local Mac
  • Saurabh Dwivedy
    07/14/2021, 12:07 PM
    Not on Spark etc.
  • Saurabh Dwivedy
    07/14/2021, 1:56 PM
    That's why it was unable to locate the file and gave the error accordingly.
  • Saurabh Dwivedy
    07/14/2021, 1:56 PM
    Pinot is amazing
  • Luiz Gabriel Lima Pinheiro
    07/14/2021, 2:59 PM
    Hello! I am trying to create or update a job spec over HTTP. Is there a LaunchDataIngestionJob HTTP endpoint that can be called? I could not find anything in the Swagger interface for uploading the jobSpec YAML file.
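    As far as I know there is no endpoint that takes a jobSpec YAML directly, but recent controllers expose ad-hoc ingestion endpoints (ingestFromFile / ingestFromURI) that are intended for small test loads rather than production. A hedged sketch against the quickstart table (table name and file path are hypothetical; batchConfigMapStr is URL-encoded JSON, here {"inputFormat":"csv"}):

    curl -X POST -F file=@/tmp/data.csv \
      "http://localhost:9000/ingestFromFile?tableNameWithType=transcript_OFFLINE&batchConfigMapStr=%7B%22inputFormat%22%3A%22csv%22%7D"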
  • Kishore G
    07/14/2021, 3:22 PM
    @Luiz Gabriel Lima Pinheiro is this for production or poc?