https://pinot.apache.org/
# general
  • s

    Shadab Anwar

    02/16/2022, 7:10 PM
    I want to create a tenant in my Pinot cluster, but the documentation does not clearly explain how I should do that in Kubernetes. Does "tagging" here mean labelling, as in Kubernetes, as mentioned in the docs? Please find my release file:
    pinot-release.yaml
  • m

    Mohemmad Zaid Khan

    02/17/2022, 7:11 AM
    @User @User Is it deliberate that we don’t call toString() before calling the hashCode() function here in HashCodePartitionFunction? If it is not, then it’s a bug. https://github.com/apache/pinot/blob/master/pinot-segment-spi/src/main/java/org/ap[…]ache/pinot/segment/spi/partition/HashCodePartitionFunction.java Since we don’t call toString(), a different hashCode is generated for the same value when segment pruning is done by PartitionSegmentPruner, because it always calls toString() on the literal value before invoking getPartitionId.
  • p

    Pavel Stejskal

    02/17/2022, 10:39 AM
    Hello! I’m running a PoC with Pinot on quite heavy data. I’ve got a table with ~20 billion rows and 5 predicates (pred1 has a cardinality of ~5 million unique values, pred2 and pred3 are the same and tightly correlated, and the last one, pred5, has very low cardinality, tens of values). I need to achieve the best possible lookup speed by these predicates over the whole range (20-50 billion rows). Currently my table creates bloom filters and inverted indices for these predicates.
    The second problem is ingestion rate: apparently there is no problem getting ~160k documents/s, which is insane compared to the resources needed, but at the same time query performance is very bad. The 6 servers are busy with ingestion and GC, so queries take 20-50 seconds. My current setup is 6 servers and 1 controller, with split commit to S3 enabled. Because QPS will be low, I need to keep memory allocation for indices/segments low. Do I need to consider some kind of bucketing/hidden partitioning for predicate values, or can Pinot handle this data within an SLA of ~1000-3000 ms with proper indexing alone? I can imagine some sort of work delegation across servers, e.g. ~3-4 servers for consuming/segment creation and 6 servers allocated for querying.
    PS: I’ve got replication 1 to save space, as the final total will be ~20 TB. Segment size is currently 460MB (but the table config sets 1GB). Ingesting from 36 Kafka partitions. Any improvements, thoughts or tricks are welcome! 🙂
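A back-of-the-envelope sketch of the numbers in this message, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), evenly sized segments and an even spread of segments across servers. sizing() and hours_to_ingest() are hypothetical helpers for illustration, not Pinot tooling.

```python
# Decimal units are assumed throughout (1 TB = 10**12 bytes).
MB, GB, TB = 10**6, 10**9, 10**12


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def sizing(total_bytes: int, segment_bytes: int, num_servers: int, replication: int = 1):
    """Return (total segments, segments per server), assuming an even spread."""
    segments = ceil_div(total_bytes, segment_bytes)
    per_server = ceil_div(segments * replication, num_servers)
    return segments, per_server


def hours_to_ingest(rows: int, rows_per_sec: float) -> float:
    """Wall-clock hours to ingest `rows` at a sustained rate."""
    return rows / rows_per_sec / 3600


if __name__ == "__main__":
    print(sizing(20 * TB, 460 * MB, 6))  # (43479, 7247)
    print(sizing(20 * TB, 1 * GB, 6))    # (20000, 3334)
    print(round(hours_to_ingest(20 * 10**9, 160_000), 1))  # 34.7
    print(round(160_000 / 36))           # docs/s per Kafka partition: 4444
```

At the current 460MB segment size, ~20 TB works out to roughly 43.5 thousand segments (about 7.2 thousand per server with replication 1); reaching the configured 1GB size would roughly halve that.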
  • w

    Weixiang Sun

    02/18/2022, 9:28 PM
    A quick question about realtime tables: all data inside the in-memory segment (mutable segment) should be in memory even though Pinot is columnar, right? As for offline segments, are only the columns in use loaded into memory?
  • t

    Trust Okoroego

    02/21/2022, 9:46 AM
    Hello! I need to connect Presto to Pinot with basic auth. Could anyone point me to how I can set this in the pinot.properties Presto catalog configuration?
  • m

    Minglei Zhang

    02/21/2022, 12:44 PM
    Hi, why do we use DISTINCTCOUNT instead of COUNTDISTINCT here?
  • d

    Dan DC

    02/21/2022, 1:06 PM
    Hello, I've got a question about realtime tables. If I'm correct, the Kafka consumer group ID is built in the code from the table name and replica ID; however, I'm not able to find a consumer group for the table in my Kafka cluster. Is there a way to list all the consumer groups that a realtime table is using? It looks like those IDs are stored in ZK under ideal states, but I can't find them. Thanks
  • c

    Chengxuan Wang

    02/22/2022, 4:29 AM
    hello everyone, wondering if we have a function like st_setsrid in CRDB to change the spatial reference system?
  • p

    Prashant Pandey

    02/22/2022, 6:14 AM
    Hello team, I am trying to run the Realtime Provisioner for one of my tables with the following config:
    RealtimeProvisioningHelper -tableConfigFile /Users/prashant.pandey/table_config.json -numPartitions 4 -pushFrequency null -numHosts 12 -numHours 2 -sampleCompletedSegmentDir /Users/prashant.pandey/segment_dir -ingestionRate 4750 -maxUsableHostMemory 10G -retentionHours 24
    The sample segment is around 426M in size, but this returns the following:
    Note:
    
    * Table retention and push frequency ignored for determining retentionHours since it is specified in command
    * See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>
    2022/02/22 11:41:31.825 INFO [RealtimeProvisioningHelperCommand] [main] 
    Memory used per host (Active/Mapped)
    
    numHosts --> 12              |
    numHours
     2 --------> NA              |
    2022/02/22 11:41:31.826 INFO [RealtimeProvisioningHelperCommand] [main] 
    Optimal segment size
    
    numHosts --> 12              |
    numHours
     2 --------> NA              |
    2022/02/22 11:41:31.826 INFO [RealtimeProvisioningHelperCommand] [main] 
    Consuming memory
    
    numHosts --> 12              |
    numHours
     2 --------> NA              |
    2022/02/22 11:41:31.827 INFO [RealtimeProvisioningHelperCommand] [main] 
    Total number of segments queried per host (for all partitions)
    
    numHosts --> 12              |
    numHours
     2 --------> NA              |
    Class transformation time: 0.271994872s for 4134 classes or 6.579459893565553E-5s per class
    Why am I getting N/As? Is the config incorrect?
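For reference, a rough sketch of two quantities behind this command. It assumes -ingestionRate means messages per second on each stream partition and that a consuming segment stays open for -numHours; the replication factor is not given in the thread, so it is a parameter. The helper names are hypothetical, and the sketch does not reproduce RealtimeProvisioningHelper's own logic.

```python
def rows_per_consuming_segment(ingestion_rate_per_partition: int, num_hours: int) -> int:
    """Rows one partition accumulates in a consuming segment over num_hours."""
    return ingestion_rate_per_partition * num_hours * 3600


def consuming_segments_per_host(num_partitions: int, replication: int, num_hosts: int) -> float:
    """Average number of consuming segments per host (can be below 1)."""
    return num_partitions * replication / num_hosts


if __name__ == "__main__":
    # Values taken from the command above: -ingestionRate 4750 -numHours 2
    print(rows_per_consuming_segment(4750, 2))  # 34200000
    # -numPartitions 4 -numHosts 12, for a few assumed replication factors
    for replication in (1, 2, 3):
        print(replication, consuming_segments_per_host(4, replication, 12))
```

The sketch only shows the arithmetic: with 4 partitions spread over 12 hosts, fewer than 3 replicas means fewer consuming segments than hosts. It does not model when the helper chooses to print NA.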
  • a

    Ali Atıl

    02/22/2022, 7:49 AM
    Hello everyone 🙂 https://github.com/apache/pinot/issues/6921 I was wondering if there is any update on this issue? Has any work been done on it, or are you planning to implement this feature in the near future? Wish everybody a great day!
  • k

    KISHORE B R

    02/22/2022, 12:52 PM
    Hi, is there any way to view the contents stored in a segment?
  • k

    Karin Wolok

    02/22/2022, 1:36 PM
    Meetup tomorrow!! Feel free to share with friends who you think would benefit. 🙂 https://www.meetup.com/apache-pinot/events/283880626/
  • k

    Karin Wolok

    02/22/2022, 5:52 PM
    Welcome 👋 to all the new Apache Pinot 🍷 community members! Please tell us who you are and what brought you here! 😃 @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User
  • t

    Tiger Zhao

    02/22/2022, 8:43 PM
    Hi, just wondering how replicas work for realtime tables in terms of choosing which replica to query? From what I can see, it appears that the broker randomly chooses which replica to use when querying.
  • k

    KISHORE B R

    02/23/2022, 2:08 PM
    Hi, I was performing stream ingestion through Kafka on a standalone machine. I had 5 partitions created, and hence 5 segments in Pinot. The parameter "segment.flush.threshold.size" is set to 10000. When I try ingesting 100k records, only 50k records are available. Will flushing the consuming segment take time to update, or is 50k the upper bound for the configuration mentioned?
  • s

    sunny

    02/25/2022, 3:15 AM
    Hi :-) I am new to Pinot. I am trying to test ACLs. I want to set table ACLs per user. I checked that the controller and broker have ACL configs, but whenever I add a table or change a table ACL, should I restart the controller/broker???
  • k

    kaivalya apte

    02/25/2022, 10:34 AM
    Hello, I want to run RealtimeProvisioningHelper; where can I find a sample completed segment?
  • d

    Dan DC

    02/28/2022, 12:37 PM
    Hi, I've noticed my RealtimeToOfflineSegmentTask is not working anymore. I'm losing segments because they are not moved to the offline table. I only see 2 errors in the logs: one says "Job TaskQueue_RealtimeToOfflineSegmentsTask_Task_RealtimeToOfflineSegmentsTask_.... exists in JobDAG but JobConfig is missing! Job might have been deleted manually from the JobQueue: TaskQueue_RealtimeToOfflineSegmentsTask, or left in the DAG due to failed clean-up attempt from last purge" the other error is specific to a table and says "Got unexpected instance state map: {<list of pinot servers here>} for segment: <segment name here>"
  • s

    Saravanan Arumugam

    03/01/2022, 5:31 PM
    Hi everyone. My name is Saravanan. I got to know about Pinot from one of the YouTube videos by Kishore. It's interesting to see how things work and amazing to see the practical applications of this system. I am here to learn more about it and, along the way, contribute in any possible manner.
  • j

    Jaromir Hamala

    03/02/2022, 8:10 AM
    Hello, congrats on the tiered storage! I'm reading the announcement and it says: "Note that this is not implemented as lazy-loading - Pinot servers directly query data on the cloud and are never downloading the entire segments locally." May I ask how that works? I know close to nothing about S3, but I believe it's a dumb blob store. You have to download the blobs containing segments before querying them, don't you? Am I missing anything? Thanks for any hint!
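One general mechanism that lets a client read part of an S3 object without downloading all of it is an HTTP range request (S3's GetObject accepts a Range header). The sketch below only builds such headers from a hypothetical per-column offset table inside one segment file; it is not how Pinot's tiered storage is implemented, which this thread does not describe. COLUMN_LAYOUT and its numbers are made up for illustration.

```python
# Hypothetical layout: column name -> (byte offset, length) inside one segment file.
COLUMN_LAYOUT = {
    "userId": (0, 4_000_000),
    "country": (4_000_000, 250_000),
    "revenue": (4_250_000, 8_000_000),
}


def range_header(offset: int, length: int) -> str:
    """Value for an HTTP Range header; the end byte position is inclusive."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length > 0")
    return f"bytes={offset}-{offset + length - 1}"


def bytes_for_columns(columns) -> int:
    """Total bytes fetched when only the requested columns are read."""
    return sum(COLUMN_LAYOUT[c][1] for c in columns)


if __name__ == "__main__":
    offset, length = COLUMN_LAYOUT["country"]
    print(range_header(offset, length))  # bytes=4000000-4249999
    total = bytes_for_columns(COLUMN_LAYOUT)
    print(bytes_for_columns(["country"]), "of", total, "bytes")
```

With boto3, for example, such a value can be passed as the Range argument of an S3 client's get_object call, so only the requested bytes cross the network.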
  • a

    Ayush Kumar Jha

    03/03/2022, 5:55 AM
    hey everyone, this tiered storage thing sounds cool. Is it available for Azure Blob Storage, or is it in the pipeline??
  • s

    Shadab Anwar

    03/03/2022, 9:21 AM
    Hi, just need a confirmation. When I created my tables, they did not have any data, but segments were created. I checked my S3 and no segment had been uploaded. However, as soon as data arrived in my tables, I checked again and saw that segments were then uploaded to S3. So I wanted to confirm: are segments uploaded only when they contain some data?
  • l

    Lakshmanan Velusamy

    03/03/2022, 9:56 PM
    Hi Community, can the timezone argument for DATETRUNC come from another column in the table?
  • d

    Diana Arnos

    03/04/2022, 9:45 AM
    Hello everyone 😄 Out of curiosity, do you have any idea when the next version will be released? 👀
  • c

    Chengxuan Wang

    03/04/2022, 2:12 PM
    hey, I was trying to use the geoindex feature in Pinot, but it seems the index doesn’t apply, because numEntriesScannedInFilter is high (equal to the number of docs). The Pinot version is 0.8.0. The query is
    select count(*) from some_table where  st_distance(resto_st_point, st_point(116.459717 , 39.955734, 1)) < 3000
    the table config is
    "fieldConfigList": [
          {
            "name": "resto_st_point",
            "encodingType": "RAW",
            "indexType": "H3",
            "properties": {
              "resolutions": "12"
            }
          }
        ],
    .....
          "noDictionaryColumns": [
            "resto_st_point"
          ],
    and if I change the threshold to 300 (meters), the index hits.
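A rough sketch of how differently sized the two query areas are at H3 resolution 12. It assumes an average hexagon edge length of about 9.4 m at resolution 12 (approximately the value in H3's resolution table; treat it as an assumption), treats cells as regular hexagons, and uses the fact that a hexagonal disk of k rings around a centre cell contains 3k(k+1)+1 cells. It does not describe how Pinot decides whether to use the H3 index.

```python
import math

# Assumed average H3 edge length at resolution 12, in metres (approximate).
EDGE_M_RES12 = 9.4


def rings_to_cover(radius_m: float, edge_m: float) -> int:
    """Rough number of hexagon rings needed to reach radius_m from the centre cell.

    Centres of adjacent regular hexagons are sqrt(3) * edge apart.
    """
    return math.ceil(radius_m / (math.sqrt(3) * edge_m))


def cells_in_disk(k: int) -> int:
    """Cells in a hexagonal disk: the centre cell plus k rings (1 + 6 + 12 + ...)."""
    return 3 * k * (k + 1) + 1


if __name__ == "__main__":
    for radius in (300, 3000):
        k = rings_to_cover(radius, EDGE_M_RES12)
        print(radius, k, cells_in_disk(k))
```

Under these assumptions, a tenfold larger radius needs roughly a hundredfold more cells (about 1.1 thousand for 300 m versus about 103 thousand for 3000 m), since the count grows with the square of the radius.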
  • c

    Chengxuan Wang

    03/04/2022, 4:26 PM
    another question related to the geoindex: from the doc, it seems only ST_Distance can take advantage of the H3 index. How about ST_Contains?
  • w

    Weixiang Sun

    03/04/2022, 7:08 PM
    Does Pinot provide any tool to merge small segments into bigger segments? We have mis-configuration creating a lot of small segments. This is problematic. I am wondering if we can mitigate it by merging the small segments.
  • k

    Karin Wolok

    03/08/2022, 1:39 PM
    Hey hey! 👋 Welcome all you new Pinot slack members! 🍷 ❤️ Would love to know who you are and what brought you here! Please take a moment and give us a short 1 liner about yourself! 👂 @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User
  • m

    Monica

    03/09/2022, 4:21 AM
    hey everyone, what size of data do you store in Pinot, how many machines are used, and what are the machine configurations like? Our current business is about PB size, but we store data differently from Pinot. We use HBase to store fields' inverted indexes and write row positions in another HBase table. Then we fetch filtered records from HDFS. We use some techniques to reduce random IO, like compression, encoding, storing data in batches, caching, etc. Because our data is stored in a row format, it's really bad when a query hits a large number of results. As far as I know, when a query needs to read large segments (if it can't prune data by partition, star-tree...), is that painful for Pinot, because Pinot may need to download lots of segments from the segment store and rebuild each segment's index in the servers' memory?
  • w

    Weixiang Sun

    03/09/2022, 6:27 AM
    When ingesting streaming data from Kafka, how do I concatenate an array of strings from one source column into a destination column as part of the ingestion configuration?