# general
  • c

    Chundong Wang

    02/15/2021, 6:41 PM
    I’m wondering if there’s a way other than Groovy to get a filter like “past 7 days” to work? Found this question back in 2017 about
    select count(*) as cnt from  log where date >= DATE_SUB(NOW(),INTERVAL 1 HOUR);
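One way to sidestep server-side date arithmetic is to compute the cutoff in the client and embed it as a literal. The following is a minimal sketch only, assuming the time column stores epoch milliseconds; the column name `ts_millis` and the helper `past_days_filter` are hypothetical and do not come from the thread.

```python
import time
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000


def past_days_filter(column: str, days: int, now_ms: Optional[int] = None) -> str:
    """Return a WHERE-clause fragment selecting rows from the last `days` days.

    Assumes `column` holds epoch milliseconds (an assumption, not stated in the thread).
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - days * MS_PER_DAY
    return f"{column} >= {cutoff_ms}"


# Hypothetical usage: `log` is the table from the quoted query, `ts_millis` is made up.
query = f"SELECT COUNT(*) AS cnt FROM log WHERE {past_days_filter('ts_millis', 7)}"
```

Because the cutoff is a plain number, the query needs no scripting support on the server; the trade-off is that the client has to rebuild the query string each time it runs.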
  • k

    Karin Wolok

    02/16/2021, 3:53 PM
    Online Meetup starting in 3 hours : Advanced Pinot Features: Upsert and JSON Indexing https://www.meetup.com/apache-pinot/events/275731277/
    👍 5
  • e

    Elon

    02/17/2021, 1:37 AM
    Hi, the meetup today was great! Wanted to know when the meetup slides will be available. We have some users very interested in upsert.
    ➕ 5
  • n

    Nick Bowles

    02/19/2021, 3:11 AM
    Based on the docs, Pinot doesn’t have a specific date-time type, and dates are converted to strings, longs, or ints. Does this hinder performance in any way? If it does, are there plans to add support for a date-time type?
  • v

    vmarchaud

    02/22/2021, 1:31 PM
    Hey, question: is there any target date / milestone for the 0.7.0 release? Thanks
  • s

    Shawn Peng

    02/23/2021, 1:09 AM
    Hi, I’m trying to build a query for data within 7 days, but Pinot is throwing an error for
    DATETRUNC('hour', second(now()), 'SECONDS')
    , is this expected?
  • k

    Karin Wolok

    02/23/2021, 1:28 AM
    🎉 We officially passed 1K slack members!!! 🎉 🥳 👋 Welcome to the newbie Pinot community members who brought us over the edge! 🍷 Would love to know what brought you here and what you're working on. 😃 @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User
    🍷 3
    🎉 5
    👋 7
  • a

    ayush sharma

    02/23/2021, 10:42 PM
    Hi people, I am facing an issue with starting ThirdEye on top of Pinot. I have Pinot successfully set up and running. Now I am trying to run ThirdEye on top of this Pinot using the docker apachepinot/thirdeye image. After running the following docker command, I get an error stating
    Database may be already in use
    Please find the attached log file. Any help is appreciated!
    docker run \
        --network=pinot-demo \
        --name thirdeye \
        -p 1426:1426 \
        -p 1427:1427 \
        -d apachepinot/thirdeye:latest
  • n

    Nick Bowles

    02/26/2021, 5:35 PM
    I put in a request for Gitbook access, if someone could check on that so I can start contributing to the docs I would appreciate it 🙂
  • k

    Ken Krugler

    02/26/2021, 6:00 PM
    Really interesting article about Uber doing schema-agnostic log aggregations…but they went with ClickHouse, not Pinot?!? https://eng.uber.com/logging/
  • v

    Vince Vinci

    02/27/2021, 2:46 AM
    Hi, not sure if this has been asked before: is there a way for Pinot to aggregate data into 15-minute / hourly buckets in a new table and remove the data from the raw table? And if late data also arrives in the raw table, can it easily be added back into the aggregated table? We want to reduce the storage required, and we wouldn't need the data for long (we can keep it for 30 / 90 days).
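To make the question concrete, the arithmetic behind a 15-minute rollup with late-arriving rows can be sketched outside Pinot. This is only an illustration of the bucketing logic, not a Pinot feature or API; the row shape `(epoch_ms, key, value)` and the function names are hypothetical.

```python
from collections import defaultdict

BUCKET_MS = 15 * 60 * 1000  # 15 minutes in milliseconds


def rollup(rows, bucket_ms=BUCKET_MS):
    """Sum values per (bucket_start_ms, key).

    `rows` is an iterable of (epoch_ms, key, value) tuples (hypothetical shape).
    """
    totals = defaultdict(int)
    for ts_ms, key, value in rows:
        bucket_start = ts_ms - ts_ms % bucket_ms  # floor to the bucket boundary
        totals[(bucket_start, key)] += value
    return dict(totals)


def merge_late(aggregated, late_rows, bucket_ms=BUCKET_MS):
    """Fold late raw rows into an existing rollup without mutating it."""
    merged = dict(aggregated)
    for bucket_key, value in rollup(late_rows, bucket_ms).items():
        merged[bucket_key] = merged.get(bucket_key, 0) + value
    return merged
```

Sums can be merged this way because addition is associative; aggregates such as distinct counts or percentiles would need a mergeable sketch instead of a plain number.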
  • a

    Anupam Mukherjee

    03/02/2021, 7:23 AM
    Hi, I am from Cisco. We have recently decided to evaluate Apache Pinot for our cloud-based analytics project. However, during the evaluation I got stuck on one of our non-functional requirements: backup and restore. Can you please suggest how we can take periodic backups of Pinot to S3 for disaster recovery purposes?
  • j

    Josh Highley

    03/02/2021, 3:55 PM
    Do low-level realtime tables support ingestionConfig-transformConfig?
  • a

    Alex

    03/02/2021, 11:18 PM
    and what about Upserts?
  • j

    Josh Highley

    03/03/2021, 2:55 PM
    Ingesting JSON data into a realtime table. A field in the JSON is a JSON string with leading spaces but is always numeric data otherwise:
    { "account":"      123", .....}
    If my realtime table defines the account column as DOUBLE, then the record loads with no issue -- the spaces appear to be ignored. However, if I define the column as INT then the record does not load. More troublesome, I can't find any error messages in any of the logs -- I would expect some kind of error message?
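One possible explanation, which is an assumption and not confirmed in the thread, is that the two column types are parsed by routines that differ in whitespace tolerance (in Java, for instance, `Double.parseDouble` trims surrounding whitespace while `Integer.parseInt` rejects it). Independent of the cause, a producer-side workaround could trim such fields before the message is published. The sketch below is hypothetical: the helper name is made up and only the `account` field comes from the example above.

```python
import json


def normalize_numeric_fields(raw_message: str, fields=("account",)) -> str:
    """Strip surrounding whitespace from the named string fields of a JSON record."""
    record = json.loads(raw_message)
    for name in fields:
        value = record.get(name)
        if isinstance(value, str):
            record[name] = value.strip()
    return json.dumps(record)


cleaned = normalize_numeric_fields('{"account":"      123"}')
```

This leaves non-string values and unlisted fields untouched, so it can be applied to every message without changing records that are already clean.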
  • j

    Josh Highley

    03/04/2021, 1:45 AM
    When streaming data via Kafka to a realtime table, does it have to be 1 record per message or is there a way to put multiple records in a single message?
  • t

    troywinter

    03/05/2021, 3:48 AM
    Does Pinot support changing the name of an existing column in a schema? I tried changing a column name, but got the following exception on query:
    [
      {
        "errorCode": 500,
        "message": "MergeResponseError:\nData schema mismatch between merged block: [time_to_hour(LONG),age_decade(STRING),age_level(STRING),city(STRING),company_id(STRING),company_name(STRING),count_impression(LONG),count_in(LONG),count_passby(LONG),create_time(LONG),day(STRING),day_in_week(STRING),district(STRING),gate_id(STRING),gender(STRING),holiday_id(STRING),holiday_name(STRING),hour(STRING),is_holiday(STRING),month(STRING),province(STRING),region(STRING),shop_id(STRING),shop_name(STRING),temperature(STRING),temperature_id(STRING),total_duration(LONG),total_impression_duration(LONG),weather_cate_id(STRING),weather_cate_name(STRING),year(STRING)] and block to merge: [time_to_hour(LONG),age_decade(STRING),age_level(STRING),city(STRING),company_id(STRING),company_name(STRING),count_impression(LONG),count_in(LONG),count_passby(LONG),create_time(LONG),day(STRING),day_in_week(STRING),district(STRING),gate_id(STRING),gender(STRING),holiday_id(STRING),holiday_name(STRING),hour(STRING),is_holiday(STRING),month(STRING),province(STRING),region(STRING),shop_id(STRING),shop_name(STRING),temperature(STRING),temperature_id(STRING),total_duration(LONG),total_impression_duraion(LONG),weather_cate_id(STRING),weather_cate_name(STRING),year(STRING)], drop block to merge"
      }
    ]
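The two column lists in the error message are long and differ in a single name (`total_impression_duration` versus `total_impression_duraion`). A small helper can surface such a difference mechanically. This is a sketch under the assumption that each column is printed as `name(TYPE)` separated by commas, as in the message above; the function names are hypothetical.

```python
import re


def parse_columns(schema_text: str):
    """Parse 'name(TYPE),name(TYPE),...' into a list of (name, type) pairs."""
    return re.findall(r"(\w+)\((\w+)\)", schema_text)


def diff_schemas(left: str, right: str):
    """Return the columns present only on the left and only on the right."""
    left_cols, right_cols = parse_columns(left), parse_columns(right)
    only_left = [col for col in left_cols if col not in right_cols]
    only_right = [col for col in right_cols if col not in left_cols]
    return only_left, only_right


merged_block = "time_to_hour(LONG),total_impression_duration(LONG),year(STRING)"
block_to_merge = "time_to_hour(LONG),total_impression_duraion(LONG),year(STRING)"
print(diff_schemas(merged_block, block_to_merge))
```

The sample strings are shortened excerpts of the two lists; pasting the full lists from the error message works the same way.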
  • p

    Pankaj Thakkar

    03/05/2021, 6:54 AM
    If we extend a table schema in Pinot to add new columns (so it does not break backward compatibility); do we have to backfill data or can Pinot use null/default values to handle the older segments?
    👍 1
  • a

    ayush sharma

    03/05/2021, 7:36 PM
    How to ingest Data into pinot on kubernetes using native batch ingestion? Hi, I am trying to ingest csv data into pinot deployed on kubernetes using LaunchDataIngestionJob arg. I have verified that the table has been created on pinot and the job-spec and csv data are present on the node. This is my job-spec file
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: pinot-case-offline-ingestion
      namespace: my-pinot-kube
    spec:
      template:
        spec:
          containers:
            - name: pinot-load-case-offline
              image: apachepinot/pinot:0.3.0-SNAPSHOT
              args: ["LaunchDataIngestionJob", "-jobSpecFile", "/opt/data/table-configs/case_history/job-spec.yml"]
              volumeMounts:
                - name: mount-data
                  mountPath: /opt/data
          restartPolicy: OnFailure
          volumes:
            - name: mount-data
              hostPath:
                path: /opt/data
      backoffLimit: 100
    After applying this job to the node, nothing happens, and this is the log of the pod.
    SegmentGenerationJobSpec: 
    !!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
    excludeFileNamePattern: null
    executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
      segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
      segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
    includeFileNamePattern: glob:**/*.csv
    inputDirURI: /opt/data/csv_data/case_prod_data
    jobType: SegmentCreationAndTarPush
    outputDirURI: /pinot-segments/case_history
    overwriteOutput: true
    pinotClusterSpecs:
    - {controllerURI: 'http://192.168.49.2:30892/'}
    pinotFSSpecs:
    - {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
    pushJobSpec: null
    recordReaderSpec:
      className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
      configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig
      configs: {delimiter: '|', multiValueDelimiter: ''}
      dataFormat: csv
    segmentNameGeneratorSpec:
      configs: {segment.name.prefix: case_history, exclude.sequence.id: 'true'}
      type: normalizedDate
    tableSpec: {schemaURI: null, tableConfigURI: null, tableName: case_history}
    
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
    Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
    Am I ingesting the data incorrectly ?
  • j

    Jai

    03/09/2021, 2:06 PM
    What is Apache Pinot all about?
  • m

    Manish Bhoge

    03/10/2021, 3:00 PM
    I'm trying to set up the Docker image of Pinot, and for that I'm running the Maven build:
    # Build Pinot
    $ mvn clean install -DskipTests -Pbin-dist
    But it is failing with the error below. Any idea? [ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on project pinot-yammer: Execution default of goal org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade failed: Plugin org.apache.maven.plugins:maven-shade-plugin:3.2.1 or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.apache.maven.shared:maven-artifact-transfer:jar:0.10.0, org.ow2.asm:asm:jar:7.0: Could not transfer artifact org.apache.maven.shared:maven-artifact-transfer:jar:0.10.0 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/151.101.12.215] failed: Connection timed out (Connection timed out) -> [Help 1]
  • j

    Josh Highley

    03/10/2021, 3:08 PM
    what's the difference between
    bin/pinot-admin.sh StartServer
    and
    bin/start-server.sh
    ? Which way should be used?
    k
    d
    • 3
    • 9
  • k

    Ken Krugler

    03/11/2021, 11:47 PM
    If we want to get the total number of groups for a
    group by
    , I assume currently we have to do a separate
    distinctcount
    or
    distinctcounthll
    , right? But if the group by uses multiple columns, what’s the best approach to getting this total group count?
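For reference, the quantity being asked about is the number of distinct value tuples across the grouping columns. The sketch below only illustrates that definition on in-memory rows; it says nothing about how Pinot would compute it, and the sample rows and function name are hypothetical.

```python
def count_groups(rows, group_columns):
    """Count the groups a GROUP BY over `group_columns` would produce.

    A group is one distinct tuple of values across the grouping columns.
    """
    return len({tuple(row[col] for col in group_columns) for row in rows})


sample_rows = [
    {"city": "Oslo", "gender": "F", "n": 1},
    {"city": "Oslo", "gender": "M", "n": 2},
    {"city": "Oslo", "gender": "F", "n": 3},
    {"city": "Rome", "gender": "F", "n": 4},
]
print(count_groups(sample_rows, ["city", "gender"]))
```

Note that the multi-column count is not the product of the per-column distinct counts: in the sample, the combination (Rome, M) never occurs.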
  • a

    Anupam Mukherjee

    03/12/2021, 11:30 AM
    Hi, we will be installing a Pinot cluster in AWS on top of EKS. We know that EKS offers multi (three) Availability Zone (AZ) based HA within a specific region. So I would like to understand whether an EKS-based Pinot cluster will by default be fault tolerant and HA within the region in case of an AZ failure. I know that Pinot servers have segment replicas and replica groups, which provide HA within the cluster in case of server failure. But what will happen if the controller has an issue, multiple servers get corrupted, or the cluster (on EKS) as a whole goes down? Considering that the servers will use EBS as the data-serving file system (and EBS multi-AZ replication/sync will be on), will EKS by default bring up replacement nodes such as a controller or server (or even broker)? Net-net, can we expect 100% service availability for Pinot on EKS in any region? Or do we need to set up another Pinot cluster on EKS in another AZ, i.e. a minimum of two Pinot clusters (on EKS) in two AZs within a region? Please suggest.
  • r

    Ravikumar Maddi

    03/12/2021, 4:25 PM
    I have a question: is it possible to store nested JSON data as a Pinot table? Avro supports nested entities (JSON) by using the record type in the Avro schema. Does the Pinot table configuration support nested JSON entities like Avro does (for example, an account JSON that contains an embedded address JSON)?
  • r

    Ravikumar Maddi

    03/12/2021, 4:28 PM
    I have gone through the Pinot documentation, which says Pinot supports Avro, but I am not able to find any samples or sample code for it. Can you help by pointing me to some code that combines Pinot and Avro?
  • a

    ayush sharma

    03/12/2021, 7:18 PM
    Hi all, I am writing to explain the loop of problems we are facing with an architecture made of Superset (v1.0.1), Pinot (latest docker image) and Presto (starburstdata/presto:350-e.3 docker image). Working around a problem in one framework causes a problem in another. I do not know which community can help me solve this, hence posting it on both.
    Until now: we have successfully pushed 1 million records into a Pinot table and would like to build charts on Superset over it.
    Problem # 1
    We connected Superset to Pinot successfully and were able to build SQL Lab queries, only to find out that Superset does not support exploring SQL Lab virtual data as a chart if the connected database is Apache Pinot (the "Explore" button is disabled). Please let me know if this can be solved or if we interpreted it incorrectly, as it would solve the whole problem at once. As a workaround, we learned that a Superset-Presto connection would enable this Explore button, and we had planned to implement Presto anyway. So we implemented Presto on top of Pinot.
    Problem # 2
    We found that Presto cannot aggregate more than 50k Pinot records, throwing the error
    Segment query returned '50001' rows per split, maximum allowed is '50000' rows. with query "SELECT * FROM pinot_table  LIMIT 50001"
    Presto cannot even query something like this:
    presto:default> select count(*) from pinot.default.pinot_table;
    Even if we increase the 50k limit in Presto's pinot.properties
    pinot.max-rows-per-split-for-segment-queries
    to 1 million, the Presto server crashes stating heap memory exceeded. As a workaround, we learned that we can make Pinot do the aggregations and feed the aggregated result to Presto, which in turn feeds Superset to visualize the charts, by writing the aggregation logic inside a Presto sub query like
    presto:default> select * from pinot.default."select count(*) from pinot_table"
    This returns the expected result.
    Problem # 3
    We found that, though we can make Pinot do the aggregations, we cannot use the supported transformation functions of Pinot listed here inside the Presto sub query. The query
    select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10
    works fine in Pinot, but does not work when embedded in Presto as a sub query like below
    presto:default> select * from pinot.default."select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10";
    Query failed: Column datetrunc('day',epoch_ms_col,'milliseconds') not found in table default.select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10
    I do not know if we are doing something wrong while querying/implementing or have missed some useful config setting that could solve our problem. The SQL Lab query which we want to run against Pinot, and whose result we eventually want to chart, is like
    SELECT 
        day_of_week(epoch_ms_col),
        count(*)
    from pinot_table
    group by day_of_week(epoch_ms_col)
    Any help is really appreciated !!!
  • r

    Ravikumar Maddi

    03/13/2021, 8:32 AM
    @All -- A few questions: 1. Is a Pinot Kafka connector with Avro possible? 2. If so, is any detailed document available online? I have been searching for a day with no luck. Need help 🙂
  • r

    Ravikumar Maddi

    03/15/2021, 8:28 AM
    @All -- I created a flattened JSON from a heavily nested JSON file. How can I create a Pinot schema for the flattened JSON? Are any samples available?
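As a starting point, a draft schema can be derived mechanically from one flattened record. The sketch below is an assumption-laden illustration: it lists every field under `dimensionFieldSpecs` and guesses the data type from the Python value, so metrics and time columns would still need to be moved by hand into `metricFieldSpecs` and `dateTimeFieldSpecs`. The helper names and the sample record are hypothetical.

```python
import json


def pinot_type(value) -> str:
    """Guess a Pinot data type from a Python value (a rough heuristic)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "STRING"  # assumption: keep booleans as strings
    if isinstance(value, int):
        return "LONG"
    if isinstance(value, float):
        return "DOUBLE"
    return "STRING"


def draft_schema(schema_name: str, flat_record: dict) -> dict:
    """Build a draft schema that treats every flattened field as a dimension."""
    return {
        "schemaName": schema_name,
        "dimensionFieldSpecs": [
            {"name": name, "dataType": pinot_type(value)}
            for name, value in flat_record.items()
        ],
    }


sample = {"account_id": 42, "address_city": "Pune", "balance": 10.5}
print(json.dumps(draft_schema("account", sample), indent=2))
```

A single record can mislead the type guess (for example, a null or an integer-valued float), so the draft is best reviewed against several records before it is uploaded.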
  • r

    Ravikumar Maddi

    03/16/2021, 12:40 AM
    Pinot - Not able to start ZooKeeper. I am starting the Pinot components, and as a first step I am trying to start ZooKeeper. I am running this command to start ZooKeeper:
    bin/pinot-admin.sh StartZookeeper -zkPort 2181
    But after some time I get this:
    zookeeper state changed (SyncConnected)
    Waiting for keeper state SyncConnected
    Terminate ZkClient event thread.
    Session: 0x1000014c3150000 closed
    Start zookeeper at localhost:2181 in thread main
    EventThread shut down for session: 0x1000014c3150000
    Unable to read additional data from client sessionid 0x1000014c3150005, likely client has closed socket
    Unable to read additional data from client sessionid 0x1000014c3150004, likely client has closed socket
    Unable to read additional data from client sessionid 0x1000014c3150002, likely client has closed socket
    Expiring session 0x1000014c3150004, timeout of 30000ms exceeded
    Expiring session 0x1000014c3150005, timeout of 30000ms exceeded
    Expiring session 0x1000014c3150002, timeout of 30000ms exceeded
    Unable to read additional data from client sessionid 0x1000014c3150009, likely client has closed socket
    Unable to read additional data from client sessionid 0x1000014c315000a, likely client has closed socket
    Unable to read additional data from client sessionid 0x1000014c3150007, likely client has closed socket
    Expiring session 0x1000014c315000a, timeout of 30000ms exceeded
    I restarted the server based on suggestions found online. Still no luck. Need help 🙂