# general
  • t

    Tejaswini Edara

    04/26/2022, 11:52 AM
    Hi team, is anyone aware of a dbt integration with Pinot? I am trying to use dbt to transform data and push it to Pinot.
  • d

    Diana Arnos

    04/26/2022, 12:44 PM
    Hello, everyone 👋 Does Pinot have (or plan to have) something available inside AWS marketplace?
  • k

    KISHORE B R

    04/27/2022, 12:52 PM
    Hi all, I have a question regarding historical data. How will Pinot handle data that has existed for quite a few years? Will there be any change in performance metrics when such historical data is queried after a very long time?
  • n

    Nisheet

    04/27/2022, 3:06 PM
    Hi team, I am trying to bootstrap a realtime upsert-enabled table. I have around 2-3 years of data that I want to upload to this realtime table. I was trying to use the Spark segment generation job to create segments and then upload those segments to the realtime table, but the initial segment creation job itself fails because it looks for an OFFLINE table in the table config. I couldn't find a guide or documentation for this. I was going through the changes in this PR https://github.com/apache/pinot/pull/6567 and trying accordingly.
  • a

    Alice

    04/28/2022, 2:12 AM
    Hi team, can Pinot ingest a single partition of a Kafka topic that has many partitions?
  • c

    Chengxuan Wang

    04/28/2022, 9:36 AM
    Wondering if this works for Kafka streaming ingestion:
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "brand_name_facility_id_tuple",
          "transformFunction": "concat(brand_name, facility_id, ':')"
        }
      ]
    },
    Not sure if concat works here. The examples here are mostly Groovy functions: https://docs.pinot.apache.org/developers/advanced/ingestion-level-transformations#column-transformation
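For reference, here is a minimal Python sketch of the value the transform config above appears intended to produce. It only emulates the expression; it is not Pinot code, and it assumes that the third argument of concat is used as the separator between the first two. The sample record values are made up for illustration.

```python
def concat(first, second, separator):
    """Stand-in for concat(brand_name, facility_id, ':').

    Assumption: the third argument is the separator placed
    between the first two arguments.
    """
    return f"{first}{separator}{second}"


def transform(record):
    """Add the derived column named in the transform config to a copy of the record."""
    out = dict(record)
    out["brand_name_facility_id_tuple"] = concat(
        record["brand_name"], record["facility_id"], ":"
    )
    return out


# Hypothetical input record, for illustration only.
row = transform({"brand_name": "acme", "facility_id": 42})
print(row["brand_name_facility_id_tuple"])  # acme:42
```

Whether the same expression is evaluated during Kafka streaming ingestion is exactly the open question in the message; the sketch only pins down the expected result.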
  • a

    Alice

    04/28/2022, 11:10 AM
    Hi team, is it possible for two tables that belong to different server and broker tenants to have the same table name?
  • a

    Alice

    04/29/2022, 3:14 AM
    Hi team, is it a requirement to enable partitioning in Pinot in order to use the upsert feature?
  • j

    Joe Lane

    04/29/2022, 10:21 PM
    I’m interested in building a segment fetcher that builds virtual segments on the fly from an OLTP transaction log.
  • f

    francoisa

    05/02/2022, 8:44 AM
    Hi. Is there any way to retrieve monitoring information from the REST API, such as nb_messages read by the consumer / nb messages indexed? The goal is to monitor ingestion and ensure we are not missing messages. I've found messages like that in pinot-all.log, but I want them from the API if possible. Is there a recommended way?
  • v

    Vishnu Ghanta

    05/02/2022, 12:22 PM
    Hey guys, I am trying to establish a JDBC connection to execute queries on a Pinot cluster. The Pinot cluster is deployed in a production environment and I am connecting from local (port-forwarded Pinot controller) to test the JDBC feature. I think that while executing the query, the controller is resolving the broker by its name rather than its IP, hence the UnknownHostException.
    Caused by: org.apache.pinot.client.PinotClientException: java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: java.net.UnknownHostException: pinot-broker-0.pinot-broker-headless.xxxxx-v2.svc.cluster.local: nodename nor servname provided, or not known
    	at org.apache.pinot.client.JsonAsyncHttpPinotClientTransport.executeQuery(JsonAsyncHttpPinotClientTransport.java:104)
    	at org.apache.pinot.client.Connection.execute(Connection.java:127)
    	at org.apache.pinot.client.Connection.execute(Connection.java:96)
    	at org.apache.pinot.client.PinotStatement.executeQuery(PinotStatement.java:63)
    	... 1 more
    Is there a way I can avoid this error? The same might happen when I move to production (the application is in a different k8s cluster). TIA
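A small Python sketch that pulls the unresolved host out of a trace like the one above and checks whether it is a Kubernetes-internal DNS name. The helper names are hypothetical, and the suffix check assumes the cluster uses the default cluster domain `.svc.cluster.local`, which is normally resolvable only from inside that cluster.

```python
import re

# Assumption: the cluster uses the default Kubernetes cluster domain.
K8S_INTERNAL_SUFFIX = ".svc.cluster.local"


def unresolved_host(stack_trace):
    """Return the host named in an UnknownHostException message, or None."""
    match = re.search(r"UnknownHostException: ([^\s:]+)", stack_trace)
    return match.group(1) if match else None


def is_cluster_internal(host):
    """True if the host looks like a Kubernetes-internal service name."""
    return host.endswith(K8S_INTERNAL_SUFFIX)


trace = (
    "java.net.UnknownHostException: "
    "pinot-broker-0.pinot-broker-headless.xxxxx-v2.svc.cluster.local: "
    "nodename nor servname provided, or not known"
)
host = unresolved_host(trace)
print(host, is_cluster_internal(host))
```

If the check is true, the client is being handed a broker address that only exists inside the Pinot cluster's own DNS, which is consistent with the failure seen from a local machine.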
  • a

    Aswini Nellimarla

    05/02/2022, 12:28 PM
    Hi, can Apache Pinot talk directly to datastores like Cassandra/Cosmos NoSQL DB stores?
  • j

    Jinal Panchal

    05/02/2022, 12:42 PM
    Hello, I've started exploring Pinot. Is there any way to define primary key and foreign key relationships so that we can maintain the mapping? Otherwise, how will it support joins without maintaining relationships?
  • e

    erik bergsten

    05/02/2022, 1:06 PM
    We started using the "latest" tagged Docker image so we can use timestamp indexes, but in this version Kafka sasl_plain authentication doesn't work (class not found). Is it broken, or will we just have to wait for an official release to get timestamp indexes and full Kafka support in one image?
  • a

    Alice

    05/02/2022, 3:05 PM
    Hi team, I noticed that the Timestamp Index is supported and tried to use it, but I get this error:
    {"code":400,"error":"Cannot deserialize value of type org.apache.pinot.spi.config.table.FieldConfig$IndexType from String \"TIMESTAMP\": not one of the values accepted for Enum class: [INVERTED, FST, JSON, H3, TEXT, SORTED, RANGE]\n at [Source: (String)\"{\"tableName\":\"test time index\",\"tableType\":\"REALTIME\",\"segmentsConfig\":{\"schemaName\":\"test_time_index\",\"timeColumnName\":\"created on\",\"timeType\":\"MILLISECONDS\",\"allowNullTimeValue\":true,\"replicasPerPartition\":\"1\",\"retentionTimeUnit\":\"DAYS\",\"retentionTimeValue\":\"30\",\"segmentPushType\":\"APPEND\",\"completionConfig\":{\"completionMode\":\"DOWNLOAD\"}},\"tenants\":{},\"fieldConfigList\":[{\"name\":\"timestamp\",\"encodingType\":\"DICTIONARY\",\"indexTypes\":[\"TIMESTAMP\"],\"time\"[truncated 3199 chars]; line: 1, column: 483] (through reference chain: org.apache.pinot.spi.config.table.TableConfig[\"fieldConfigList\"]->java.util.ArrayList[0]->org.apache.pinot.spi.config.table.FieldConfig[\"indexTypes\"]->java.util.ArrayList[0])"}
    Part of my table schema is:
    "dateTimeFieldSpecs": [
      {
        "name": "timestamp",
        "dataType": "TIMESTAMP",
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS"
      }
    ]
    And part of my table config is:
    "fieldConfigList": [
      {
        "name": "timestamp",
        "encodingType": "DICTIONARY",
        "indexTypes": ["TIMESTAMP"],
        "timestampConfig": {
          "granularities": ["DAY", "WEEK", "MONTH"]
        }
      }
    ]
    Any idea how to fix it?
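The error lists the index types that this particular controller accepts, and TIMESTAMP is not among them. A minimal Python sketch of checking a table config against that list before submitting it; the helper name is hypothetical, and the accepted set is copied from the error message above rather than taken from Pinot's source.

```python
# Index types the server reported as accepted in the error message above.
ACCEPTED_INDEX_TYPES = {"INVERTED", "FST", "JSON", "H3", "TEXT", "SORTED", "RANGE"}


def unsupported_index_types(table_config, accepted=ACCEPTED_INDEX_TYPES):
    """Return (field name, index type) pairs that are not in the accepted set."""
    rejected = []
    for field in table_config.get("fieldConfigList", []):
        for index_type in field.get("indexTypes", []):
            if index_type not in accepted:
                rejected.append((field["name"], index_type))
    return rejected


# The fieldConfigList fragment quoted in the message.
table_config = {
    "fieldConfigList": [
        {
            "name": "timestamp",
            "encodingType": "DICTIONARY",
            "indexTypes": ["TIMESTAMP"],
            "timestampConfig": {"granularities": ["DAY", "WEEK", "MONTH"]},
        }
    ]
}

print(unsupported_index_types(table_config))  # [('timestamp', 'TIMESTAMP')]
```

A non-empty result means the config uses an index type that the running version does not recognize, which points at a version mismatch rather than a syntax problem in the config.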
  • p

    Padma Malladi

    05/02/2022, 11:01 PM
    Hi all, I am working on improving the query latency for my realtime time series table. There is no corresponding offline table and all the data is realtime data. It has about 61 billion records with 3.5 million unique ids and a size of 2.7 TB. I have the range index set on the timestamp and the inverted index on the unique id. The incoming streaming data from Kafka is partitioned. I have the segmentation strategy set to the default of balanced segmentation. Stats are saying that there are 2 servers queried, 34 segments matched, 34 segments processed and 34 segments matched. I am getting a query response time of ~2 seconds and sometimes 4 seconds, and repeated querying is giving me 50 ms. Would the following changes improve the query performance?
    1. Changing the segmentation strategy to Partitioned Replica-Group Segment Assignment
    2. Bloom filter (does it improve the performance for individual queries or aggregate queries only?)
    3. I am assuming the star-tree index helps with aggregation and not independent records
    4. We have the partitioning set as murmur in the table config
    5. How can I allocate / increase the hot/warm memory?
    6. Tenants are set to DefaultTenant for both server and broker. Would changing this improve performance? If so, what should be changed?
    7. Would enabling default star-tree and dynamic star-tree creation help?
    8. Would disabling null handling affect the performance? It's currently set to true, but I don't expect null values for the indexed id and timestamp fields
    9. Should I set autoGeneratedInvertedIndex and createInvertedIndexDuringSegmentGeneration to true? They are false currently
  • w

    Weixiang Sun

    05/04/2022, 4:04 AM
    What is the difference between timeColumnName and sortedColumn inside the tableConfig from a query performance perspective? If my queries are mainly based on timeColumnName, should I use the same column as sortedColumn?
  • b

    BUNTY kumar

    05/04/2022, 10:19 AM
    Hi all, is it possible to launch a Pinot cluster on Kubernetes and point it to an already deployed ZooKeeper that holds 1-month-old metadata? This is more of a migration of all the components except ZooKeeper to another Kubernetes cluster within the same VPN.
  • s

    Saumya Upadhyay

    05/04/2022, 11:33 AM
    Hi all, if we increase the number of Kafka partitions later as requirements change, how will Pinot behave? Do we need to change some config in Pinot to avoid any issues, or will Pinot simply create new segments as soon as the new partitions are added to the Kafka topic?
  • k

    Karin Wolok

    05/04/2022, 4:45 PM
    Just a reminder! 📣 StarTree's FIRST in-person conference is scheduled and we're looking for speakers!!! 📣 Real Time Analytics Summit (August 16/17 in San Francisco) You can submit a session or register here: https://www.startree.ai/real-time-analytics-summit Sponsorship opps also available. If interested, please shoot me a message! 🙂
  • x

    Xiang Fu

    05/04/2022, 7:50 PM
    Dear Community, TL;DR: Pinot removed the PQL query endpoint and response format from the current master branch. Only the SQL endpoint is supported starting from the 0.11.0 release. More info: https://github.com/apache/pinot/issues/7430 Thanks @Jackie for all the work!
  • r

    Ryan Ruane

    05/05/2022, 11:16 AM
    Pinot Client Rust Hi there. I wrote in the other day about multi-value column ingestion jobs, and at the request of @Mayank, I created the issue: https://github.com/apache/pinot/issues/8635. The reason I was trying to create a table with ingestion of all possible types is because I am writing a rustlang client modelled after https://github.com/startreedata/pinot-client-go. Here is the repo, if anyone is interested: https://github.com/yougov/pinot-client-rust
  • t

    Tonya Moore

    05/05/2022, 5:32 PM
    Hi, folks! 👋 StarTree and Cisco Webex are co-hosting a virtual meetup on May 12 at 7 PM CDT called "WebEx: Real-Time Observability and Analytics with Apache Pinot".

    Presenters are Sachin Joshi, Vaibhav Mittal, and Tim Berglund.

    Please join us! 💻
  • m

    Mohemmad Zaid Khan

    05/06/2022, 4:56 AM
    Hi, I have started PinotController, PinotBroker and PinotServer using the code from the git branch multi_stage_query_engine, but the join query is still not working. Do I need to do something else?
  • j

    Jinal Panchal

    05/06/2022, 12:04 PM
    Hello, I didn't quite get the concept of dimension columns in Pinot. If we have datatypes well-defined for the columns, then what's the significance of specifying the Pinot field specifications, like metricsField, dimensionFields, etc.?
  • a

    ashutosh singh

    05/06/2022, 2:35 PM
    👋 Hi everyone!
  • d

    Diogo Baeder

    05/06/2022, 3:12 PM
    So, I just created a table with >40k rows, but with daily segments, 318 segments in total - not good, I want to rollup to monthly segments later -, and defined a JSON index for my main columns which contain dynamic data (data that just can't be defined as static columns). Even trying to brutalize this thing by querying all the data with a limit that surpasses the amount of rows I still get ~600ms queries! Geez, this thing is fast! 🙂
  • m

    Mathieu Druart

    05/06/2022, 10:52 PM
    Hi! This PR: https://github.com/apache/pinot/pull/7272/files removed the Pulsar plug-in from the Pinot build because of this issue: https://github.com/apache/pinot/issues/7270. Now that the issue is marked as closed, does anyone know if the plug-in will be added back to the build? Thank you!
  • a

    Alice

    05/07/2022, 1:22 AM
    Hi team, I have a question that I don't know how to solve. How can I extract numOfStas.Policy from a Kafka message and save it to a Pinot table field? When I use a transformFunction, it doesn't work:
    {
      "columnName": "stas_policy",
      "transformFunction": "jsonPathString(stats, '$.text_body.fields.numOfStas.Policy')"
    }
    A sample Kafka message looks like this:
    {
      "name": "telemetry_signal_gfw_api_usage",
      "stats": {
        "text_body": {
          "fields": {
            "numOfStas": 0,
            "numOfStas.Policy": 21
          }
        }
      }
    }
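A plain-Python sketch of one likely reason the path fails: the key itself contains a dot, so a dot-separated path is read as numOfStas followed by a nested Policy key. The walker below is a naive stand-in written for this illustration, not Pinot's or any JSONPath library's implementation. JSONPath implementations commonly offer a bracket form such as `['numOfStas.Policy']` for keys like this, but whether it works inside jsonPathString is an assumption to verify.

```python
import json

# The sample Kafka message from the question.
message = json.loads("""
{
  "name": "telemetry_signal_gfw_api_usage",
  "stats": {
    "text_body": {
      "fields": {
        "numOfStas": 0,
        "numOfStas.Policy": 21
      }
    }
  }
}
""")


def walk_dotted(obj, path):
    """Naively follow a dot-separated path, treating every dot as a nesting level."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


stats = message["stats"]

# Every dot is a level: numOfStas resolves to 0, which has no "Policy" child.
dotted = walk_dotted(stats, "text_body.fields.numOfStas.Policy")

# Treating "numOfStas.Policy" as one literal key finds the value.
literal = stats["text_body"]["fields"]["numOfStas.Policy"]

print(dotted, literal)  # None 21
```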
  • d

    Diogo Baeder

    05/09/2022, 12:45 PM
    Hey guys, I'd like to ask a question which is not really a problem, but rather just a curiosity about how an aspect of the system works: every time I spin up Pinot with my docker-compose, create the tables, add data and query it for the first time, it doesn't respond as fast as I'd like, but then on the second and subsequent queries it gets blazing fast, even if I change many constraints in my query. I know that Pinot doesn't do "caching", but why is there such a big difference in query times? For example, it may drop from 900ms on the first query to 40ms, 30ms or even lower on the second, third, fourth etc. queries.