# troubleshooting
  • murat migdisoglu

    10/14/2020, 9:49 PM
    https://apache-pinot.slack.com/archives/CDRCA57FC/p1602711095152100 I'm following up on my thread about real-time ingestion here.
  • Elon

    10/15/2020, 7:04 PM
    Hi, we are upgrading to Pinot 0.5 and wanted to change table schemas to use a dateTimeFieldSpec instead of a timeFieldSpec. Can we just POST to the schema endpoint, or will we have to create new tables?
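    For context, a dateTimeFieldSpec replacing a legacy timeFieldSpec in the schema looks like the sketch below; the column name, format, and granularity here are illustrative assumptions, not taken from the thread:

    ```json
    {
      "dateTimeFieldSpecs": [
        {
          "name": "timestampMillis",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ]
    }
    ```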
  • Tanmay Movva

    10/20/2020, 5:35 PM
    Hello, I am trying to setup s3 as segment store for pinot, which is deployed on kubernetes. Unfortunately it is a cross account bucket and we have to pass bucket ACL also. I couldn’t find any way to pass acl policy in the docs. Can anyone please help me with this?
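    For reference, S3 as segment store is configured through controller properties along these lines (a sketch; the bucket, path, and region are assumptions, and whether your Pinot version's S3PinotFS exposes a bucket-ACL option should be checked against its configs):

    ```properties
    controller.data.dir=s3://my-segment-bucket/pinot-segments
    pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
    pinot.controller.storage.factory.s3.region=us-east-1
    pinot.controller.segment.fetcher.protocols=file,http,s3
    pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcherFactory
    ```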
  • Elon

    10/27/2020, 11:36 PM
    Hi, we are migrating our Pinot installation to a different region in GKE. Has anyone done this before? Do you recommend shutting Pinot down, creating snapshots, and then redeploying in the target region? Also, the IPs of the Kafka brokers and schema registry will change; is it possible to modify the table config to reflect that?
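    On the last point: the Kafka connection details live in the table config's streamConfigs, so one plausible path is to update the broker list (and, if applicable, the schema registry URL) and PUT the table config back via the controller's /tables/{tableName} endpoint. A sketch, with illustrative host names:

    ```json
    {
      "streamConfigs": {
        "streamType": "kafka",
        "stream.kafka.topic.name": "myTopic",
        "stream.kafka.broker.list": "new-broker-1:9092,new-broker-2:9092",
        "stream.kafka.decoder.prop.schema.registry.rest.url": "http://new-schema-registry:8081"
      }
    }
    ```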
  • Elon

    10/29/2020, 8:00 PM
    Our offset reset strategy is "earliest", but is there an API call to tell Pinot to reset realtime ingestion?
  • Yash Agarwal

    11/04/2020, 5:45 AM
    Hey team, I was working on batch replacement of multiple segments. I am looking to atomically replace all the segments together. I understand that there is an API for replacing segments, but how do I configure the whole flow? We are fine with doubling the storage during the ingestion phase.
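    For background, the controller exposes a segment-lineage flow for atomic replacement: startReplaceSegments, then upload the new segments, then endReplaceSegments. A sketch of the startReplaceSegments request body (the segment names below are illustrative, not from the thread):

    ```json
    {
      "segmentsFrom": ["mytable_OFFLINE_2020-10-01_0", "mytable_OFFLINE_2020-10-01_1"],
      "segmentsTo": ["mytable_OFFLINE_2020-10-01_merged_0"]
    }
    ```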
  • lâm nguyễn hoàng

    11/04/2020, 7:42 PM
    Hi team, I have a problem: when I delete a realtime table and recreate it with the same name, the segment status always shows BAD. Can you help me fix this?
  • Noah Prince

    11/08/2020, 5:56 PM
    https://apache-pinot.slack.com/archives/CDRCA57FC/p1604856017384000 Thread for this
  • Noah Prince

    11/09/2020, 3:34 PM
    But I thought it would get removed from the ideal state when it’s dead? And another server has no segments so can replace it?
  • Jackie

    11/10/2020, 6:31 PM
    @Elon Inside Pinot, BOOLEAN is stored and handled as STRING.
  • Pradeep

    11/11/2020, 6:41 PM
    Hi, I added an OFFLINE version to an existing REALTIME table. When I try to query the offline table using select * from table_OFFLINE, I get a BrokerResourceMissing error; the tenant is the same as the broker tenant I have. Wondering if anybody has ideas on what's going on?
  • Elon

    11/12/2020, 12:54 AM
    We have users who want to query offline data that is newer than 24 hours old through the broker (for aggregations). Is there any way to remove the time boundary restriction for offline tables, or how involved would it be to change (add a config, etc.)? Let me know if there's a workaround, or if someone can point me to the code.
  • Xiang Fu

    11/12/2020, 11:47 PM
    https://github.com/apache/incubator-pinot/blob/master/docker/images/pinot/Dockerfile#L32
  • Noah Prince

    11/13/2020, 3:42 PM
    I’m not seeing any configuration option for tagging a server. Do you just have to create the server and then manually tag it afterward?
  • Ken Krugler

    11/13/2020, 11:13 PM
    I was running Pinot locally, and wanted to reload some revised data. So I first deleted all segments for the target table via curl -v "http://localhost:9000/segments/mytable?type=OFFLINE" -XDELETE, which seemed to work, then re-ran my import job via bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile <spec file>, which also seemed to work. But when I run a query, I get:
    ProcessingException(errorCode: 450, message: InternalError: java.net.SocketException: Host is down (connect failed)
  • Elon

    11/16/2020, 2:41 AM
    Hi, is there a way to throttle Pinot ingestion? We resolved the issue by scaling up, but the Pinot server pods were in a crash loop due to a spike in ingestion. We were thinking that creating backpressure would be more acceptable. Let me know if there is a way to do this. By the way, Pinot rebalance is amazing, saved us again, no data loss :) Thanks!
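    Note: some Pinot versions accept a per-topic consumption rate limit in streamConfigs; availability depends on the version, so treat this as an assumption to verify rather than a confirmed setting:

    ```json
    {
      "streamConfigs": {
        "topic.consumption.rate.limit": "1000"
      }
    }
    ```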
  • Tanmay Movva

    11/16/2020, 8:08 AM
    Hello, I was using the realtime provisioning tool with the following sample data:
    Size of segment directory = 239.4mb
    Number of documents = 3529197
    retention = 30 days
    ingestion rate = 1000
    numPartitions = 1
    Table Replicas = 1
    I am running this by building pinot from source and the results are quite surprising.
  • Elon

    11/19/2020, 10:49 PM
    [two untitled attachments]
  • Elon

    11/20/2020, 7:17 AM
    Trying to delete a broker instance that is no longer there, getting:
    "error": "Failed to drop instance Broker_pinot-broker-3.pinot-broker-headless.pinot.svc.cluster.local_8099 - Instance Broker_pinot-broker-3.pinot-broker-headless.pinot.svc.cluster.local_8099 exists in ideal state for brokerResource"
  • Elon

    11/21/2020, 2:26 AM
    We had users who accidentally produced malformed data to a Kafka topic. The realtime segments went into an OFFLINE state, and then we saw log messages that the segments were removed; we do not see them in deleted segments, the deep store, or on the servers. The users showed that non-corrupt data from what were the consuming segments was also missing (maybe from the bad segments?). Is there anything that would cause that behavior? Is it expected?
  • João Comini

    11/30/2020, 9:08 PM
    Hello guys, how are you? I'm having some trouble understanding the results from the RealtimeProvisioningHelper, can you help me? These are my doubts: • Why do we need a numHours parameter? What's the impact of having a consuming segment for a certain amount of time (pros/cons)? • And what does Mapped mean in the Memory used per host result? Is it about the segments on disk? These are the results that I got:
    RealtimeProvisioningHelper -tableConfigFile /tmp/transaction-table.json -numPartitions 20 -pushFrequency null -numHosts 4,8,12,16,20 -numHours 24,48,72,96 -sampleCompletedSegmentDir /tmp/out/transaction_1606528528_1606614928_0 -ingestionRate 4 -maxUsableHostMemory 16G -retentionHours 768
    
    Note:
    
    * Table retention and push frequency ignored for determining retentionHours since it is specified in command
    * See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>
    
    Memory used per host (Active/Mapped)
    
    numHosts --> 4               |8               |12              |16              |20              |
    numHours
    24 --------> 6.8G/71.9G      |3.4G/35.95G     |2.72G/28.76G    |2.04G/21.57G    |1.36G/14.38G    |
    48 --------> 7.33G/72.62G    |3.66G/36.31G    |2.93G/29.05G    |2.2G/21.79G     |1.47G/14.52G    |
    72 --------> 8.01G/73.11G    |4.01G/36.55G    |3.2G/29.24G     |2.4G/21.93G     |1.6G/14.62G     |
    96 --------> 8.39G/74.08G    |4.2G/37.04G     |3.36G/29.63G    |2.52G/22.22G    |1.68G/14.82G    |
    
    Optimal segment size
    
    numHosts --> 4               |8               |12              |16              |20              |
    numHours
    24 --------> 20.02M          |20.02M          |20.02M          |20.02M          |20.02M          |
    48 --------> 40.04M          |40.04M          |40.04M          |40.04M          |40.04M          |
    72 --------> 60.05M          |60.05M          |60.05M          |60.05M          |60.05M          |
    96 --------> 80.07M          |80.07M          |80.07M          |80.07M          |80.07M          |
    
    Consuming memory
    
    numHosts --> 4               |8               |12              |16              |20              |
    numHours
    24 --------> 756.05M         |378.02M         |302.42M         |226.81M         |151.21M         |
    48 --------> 1.47G           |750.11M         |600.09M         |450.07M         |300.04M         |
    72 --------> 2.15G           |1.07G           |878.76M         |659.07M         |439.38M         |
    96 --------> 2.92G           |1.46G           |1.17G           |896.57M         |597.71M         |
    
    Total number of segments queried per host (for all partitions)
    
    numHosts --> 4               |8               |12              |16              |20              |
    numHours
    24 --------> 320             |160             |128             |96              |64              |
    48 --------> 160             |80              |64              |48              |32              |
    72 --------> 110             |55              |44              |33              |22              |
    96 --------> 80              |40              |32              |24              |16              |
  • Tanmay Movva

    12/01/2020, 2:37 AM
    Hello, I have updated the table with an indexing config to add indices on some columns. After this I triggered Reload All Segments to apply the indexes. When I try to check the Reload Status, I get this error in the UI:
    Table type : REALTIME not yet supported.
  • Tanmay Movva

    12/01/2020, 4:04 AM
    Hello, when I am querying min/max of a column which is not present in the table, pinot returns Infinity/-infinity. Shouldn’t the ideal behaviour be to throw an error saying the column is not present?
  • Yupeng Fu

    12/01/2020, 9:25 PM
    hey, any good way to optimize such a query?
    SELECT hour_start_timestamp_utc FROM downtime WHERE (secondsSinceEpoch > 1606247126) ORDER BY secondsSinceEpoch DESC, hour_start_timestamp_utc DESC LIMIT 1
    It scans the past week of data but returns only 1 record. Since the table is large, it ends up scanning about 100 million records per query and takes seconds. Query output is like:
    {
      "selectionResults": {
        "columns": [
          "hour_start_timestamp_utc"
        ],
        "results": [
          [
            "2020-12-01 09:00:00"
          ]
        ]
      },
      "exceptions": [],
      "numServersQueried": 9,
      "numServersResponded": 9,
      "numSegmentsQueried": 1059,
      "numSegmentsProcessed": 1059,
      "numSegmentsMatched": 18,
      "numConsumingSegmentsQueried": 0,
      "numDocsScanned": 142101504,
      "numEntriesScannedInFilter": 0,
      "numEntriesScannedPostFilter": 284203008,
      "numGroupsLimitReached": false,
      "totalDocs": 7374174837,
      "timeUsedMs": 3522,
      "segmentStatistics": [],
      "traceInfo": {},
      "minConsumingFreshnessTimeMs": 0
    }
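    One common approach for this pattern is a range index on the filter column, and sorting segments on it so that only the newest segments match. A sketch of the relevant tableIndexConfig fragment, using the column from the query above (whether sorting fits depends on the ingestion order):

    ```json
    {
      "tableIndexConfig": {
        "sortedColumn": ["secondsSinceEpoch"],
        "rangeIndexColumns": ["secondsSinceEpoch"]
      }
    }
    ```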
  • Pradeep

    12/02/2020, 1:46 AM
    Hi, we are trying NFS as deep store for pinot in an off cloud setting, wondering if there’s anything we should be careful about? Also, did anybody try MinIO as deepstore for pinot?
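    For NFS, one plausible setup is simply to mount the share on the controllers and point the data dir at it, since the default local-filesystem implementation works on any mounted path (the paths below are illustrative):

    ```properties
    controller.data.dir=/mnt/nfs/pinot-segments
    pinot.controller.segment.fetcher.protocols=file,http
    # MinIO is S3-compatible; the S3 filesystem plugin may work against it with an
    # endpoint override, but verify this against your Pinot version's S3PinotFS configs.
    ```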
  • Tanmay Movva

    12/02/2020, 4:31 AM
    Hello, I am getting this error on one of the realtime tables:
    ERROR [LLRealtimeSegmentDataManager_spanEventView__0__15__20201201T0448Z] [spanEventView__0__15__20201201T0448Z] Could not build segment
    java.lang.IllegalStateException: Cannot create output dir: /var/pinot/server/data/index/spanEventView_REALTIME/_tmp/tmp-spanEventView__0__15__20201201T0448Z-160688315943
    because of which pinot is not able to build segments/ingest data. How to debug this?
  • Neer Shay

    12/02/2020, 1:28 PM
    Hi! I am interested in creating an infrastructure for monitoring machine learning models running in production, and am very intrigued by what Pinot has to offer. I had some questions regarding my use case, and it would be great to hear some feedback before I get started.
    1. I want to monitor input features & output predictions - here I am essentially interested in anomaly detection, and it appears ThirdEye answers this.
    2. I am interested in calculating business KPIs (precision, recall, accuracy, etc.) once labels are available - is it possible to do this during ingestion? Is it possible to run custom scripts (Python?) to calculate KPIs during ingest?
    3. Visualization - I would like the ability to see data in dashboards as well as slice & dice. What tools are available for this?
    Thanks in advance for the assistance!
  • Jinwei Zhu

    12/02/2020, 8:37 PM
    Hi, I'm creating a table using the Pinot UI. I followed the example https://docs.pinot.apache.org/basics/getting-started/pushing-your-streaming-data-to-pinot and copied the table config into the UI, but when I save the table it says "Invalid table config: transcript_REALTIME". Can anyone help me?
  • Taran Rishit

    12/03/2020, 6:31 AM
    I tried to load this CSV into Pinot, but other than the schema, the data is not shown in the query console. What could be wrong? All the related files are in the attachment.
    data.txt
  • Ken Krugler

    12/03/2020, 2:58 PM
    I ran into an issue where a segment I created was > 8GB when tarred, and thus failed during the “converting segment” phase:
    Converting segment: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0 to v3 format
    v3 segment location for segment: crawldata_OFFLINE_2018-10-13_2020-10-11_0 is /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3
    Deleting files in v1 segment directory: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0
    Computed crc = 1033854200, based on files [/tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/columns.psf, /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/index_map, /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0/v3/metadata.properties]
    Driver, record read time : 236809
    Driver, stats collector time : 0
    Driver, indexing time : 122449
    Tarring segment from: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0 to: /tmp/pinot-d6bab609-8906-4c84-966b-5f96d41b1d80/output/crawldata_OFFLINE_2018-10-13_2020-10-11_0.tar.gz
    Failed to generate Pinot segment for file - s3://adbeat-pinot-files/compressed/3.gz
    java.lang.RuntimeException: entry size ‘8991809155’ is too big ( > 8589934591 ). at org.apache.commons.compress.archivers.tar.TarArchiveOutputStream.failForBigNumber(TarArchiveOutputStream.java:636) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]