# general
    Mayank

    04/29/2021, 4:45 PM
    Hello Pinot community, wondering if there might be interest in talks about your use cases with Pinot at ApacheCon: https://www.apachecon.com/acah2021/
    Yupeng Fu

    04/29/2021, 9:29 PM
    hey, Pinot community, I want to share this Uber engineering blog (https://eng.uber.com/charon/) published today on how Uber combats COVID-related challenges for restaurants and other merchants across the world, using Apache Pinot and real-time analytics. Nice blog from @User @User
    🍷 7
    🎉 6
    🚕 4
    🚗 4
    👍 6
    kauts shukla

    05/01/2021, 9:23 AM
    Hello, if I add a new column to “primaryKeyColumns” in the schema, how much time would it take to build the index for the new column?
    kauts shukla

    05/02/2021, 11:12 AM
    Hi all, I have a realtime table consuming from Kafka. As of now it has 5 billion records. I’m performing lookups [predicate] on inverted-index columns [userid, eventcategory, eventlabel], using the “metricFieldSpecs” column timestampist for a range condition. My query is taking too long to finish, almost > 10 seconds. How can I configure the table for the best performance? Query:
```sql
select userid, eventlabel, sessionid,
       MIN(timestampist) as mint,
       MAX(timestampist) as maxt,
       (MAX(timestampist) - MIN(timestampist)) as diff_time
from default.click_stream
where eventlabel != 'null'
  and timestampist between 1615833000000 and 1616225312000
group by userid, eventlabel, sessionid
```
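    For queries like this, a range index on the time column is often the first lever to try, since a BETWEEN predicate on a plain metric column is otherwise evaluated by scanning. A minimal sketch of the relevant tableIndexConfig section (column names taken from the question; illustrative, not a tuned config):

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["userid", "eventcategory", "eventlabel"],
    "rangeIndexColumns": ["timestampist"]
  }
}
```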
    Vengatesh Babu

    05/03/2021, 6:01 AM
    Hello, for real-time tables, is there a condition that one Kafka topic should contain data for only one table? In our case, we have multiple tables' data produced to a single topic, i.e. we have a group of topics in Kafka, and each topic serves a set of tables based on the use case. Is it possible to consume multiple tables' events from a single topic in Pinot?
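    One pattern that can help here (a sketch, assuming each event carries a field identifying its logical table, here hypothetically called `eventType`): Pinot's ingestion-level filterConfig lets every table consume the same topic and drop the records that don't belong to it. Records for which the filter function evaluates to true are excluded:

```json
{
  "ingestionConfig": {
    "filterConfig": {
      "filterFunction": "Groovy({eventType != \"orders\"}, eventType)"
    }
  }
}
```

    With one such table config per logical table, a single topic can feed several tables, at the cost of each table's consumers reading the whole topic.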
    Jonathan Meyer

    05/03/2021, 9:55 AM
    Hello 👋 Has anyone got experience with Pinot on ADLS (Gen 2)? Specifically: • Any idea on minimum IOPS for running Pinot smoothly under lowish load? (i.e. is a standard Storage account "enough"? If so, how "far" can we push it?) • Is it recommended to create a dedicated PVC for `controller.local.temp.dir`?
    🌟 1
    Vengatesh Babu

    05/03/2021, 1:10 PM
    A few doubts regarding streaming data: 1. Pinot supports data ingestion via a streaming (Kafka) or batch (Hadoop) process. Is there any direct API available for pushing data into Pinot? 2. Does Pinot have a segment compaction process like HBase compaction? Won't creating a lot of small segments affect query performance?
    Pedro Silva

    05/03/2021, 1:35 PM
    Hello, Pinot docs related to deep-storage in K8s seem to be broken: https://docs.pinot.apache.org/operators/tutorials/deployment-pinot-on-kubernetes#deep-storage, can anyone point to the right resource?
    Pedro Silva

    05/03/2021, 5:28 PM
    Hello again, are Pinot helm charts published to any hub? They don't exist in https://artifacthub.io/; are they just available in the GitHub repo?
    Pedro Silva

    05/04/2021, 10:46 AM
    Hello, are the Pinot helm charts designed to put sensitive information, such as credentials for deep storage, in ConfigMaps?
    Jonathan Meyer

    05/04/2021, 2:41 PM
    Hello! What is the recommended (prod) way of ingesting batch data without Hadoop? I'm thinking about having a Python component generate Parquet files + copy them to the deep store, then triggering an ingestion. Something like the `/ingestFromFile` API endpoint but prod-compatible (where can segment creation be done in that case? Minion?) Thanks!
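    One Hadoop-free option (a sketch under the assumptions in the question; bucket, table, and path names below are hypothetical) is the standalone batch ingestion job, which builds segments locally and pushes them to the controller:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://my-bucket/input/'
outputDirURI: 's3://my-bucket/segments/'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

    Launched with `bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile spec.yaml`, e.g. from a cron job or the Python component itself.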
    Josh Highley

    05/04/2021, 7:29 PM
    in 'normal' sql queries, I can use an aggregate function with * to select all columns:
```sql
select sum(a+b), * from my_table
```
    Pinot query browser gives an error when I try this -- is there another way, without explicitly listing all the columns?
    Mus

    05/04/2021, 11:29 PM
    Hi! Is there a way to have the data streamed from Kafka and then put into S3 in Parquet format?
    Karin Wolok

    05/05/2021, 2:16 AM
    Hey Pinot community! 🍷  Please, help us welcome our newest community members! pinot 👋 @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User 🎉 Please, tell us who you are and what brought you here! 😃
    Pedro Silva

    05/05/2021, 10:31 AM
    Hello, regarding Kafka-based streaming ingestion: when does Pinot commit offsets to Kafka? Is it after creating a segment? Can Pinot be configured to commit offsets only after a segment has been stored in deep storage, to ensure no data is lost in case segments that are on the server but not in deep storage are deleted?
    Pedro Silva

    05/05/2021, 2:26 PM
    Hello, is there documentation explaining the meaning of each configuration property of Pinot's components? I found https://docs.pinot.apache.org/configuration-reference/controller but it does not explain what each property is for, only defaults (for some). For instance, there is nothing for `controller.local.temp.dir`.
    Ambika

    05/05/2021, 2:29 PM
    Hello -- I am trying to load some 100M records into an offline table. At the first attempt it was a simple table with no additional indexes other than what was in the tutorial doc... that went fine. Now I am trying to add a star-tree index on it and the loading has been going on for 30+ mins (last time it took 12 min)... This is where it has been for the last 20 mins... Is there any way to monitor the progress of this?
```text
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 6.)
Submitting one Segment Generation Task for file:/opt/pinot/ai/weather/global_weather100M.csv
Using class: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader to read segment, ignoring configured file format: AVRO
RecordReaderSegmentCreationDataSource is used
Finished building StatsCollector!
Collected stats for 100000000 documents
Created dictionary for INT column: date with cardinality: 30, range: 0 to 29
Using fixed length dictionary for column: country, size: 110
Created dictionary for STRING column: country with cardinality: 10, max length in bytes: 11, range: Australia to USA
Created dictionary for INT column: pincode with cardinality: 10, range: 12324 to 3243678
Created dictionary for INT column: week with cardinality: 53, range: 0 to 52
Using fixed length dictionary for column: city, size: 80
Created dictionary for STRING column: city with cardinality: 10, max length in bytes: 8, range: AMD to SRI
Created dictionary for INT column: year with cardinality: 50, range: 1970 to 2019
Created dictionary for INT column: temperature with cardinality: 50, range: 0 to 49
Using fixed length dictionary for column: state, size: 20
Created dictionary for STRING column: state with cardinality: 10, max length in bytes: 2, range: AS to WB
Using fixed length dictionary for column: day, size: 63
Created dictionary for STRING column: day with cardinality: 7, max length in bytes: 9, range: Friday to Wednesday
Created dictionary for LONG column: ts with cardinality: 530768, range: 1620214278776 to 1620214809690
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /tmp/pinot-00edd913-441c-4958-8555-9b380f12991b/output/weather_1_OFFLINE_1620214278776_1620214809690_0 to v3 format
v3 segment location for segment: weather_1_OFFLINE_1620214278776_1620214809690_0 is /tmp/pinot-00edd913-441c-4958-8555-9b380f12991b/output/weather_1_OFFLINE_1620214278776_1620214809690_0/v3
Deleting files in v1 segment directory: /tmp/pinot-00edd913-441c-4958-8555-9b380f12991b/output/weather_1_OFFLINE_1620214278776_1620214809690_0
Skip creating default columns for segment: weather_1_OFFLINE_1620214278776_1620214809690_0 without schema
Successfully loaded segment weather_1_OFFLINE_1620214278776_1620214809690_0 with readMode: mmap
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[country, state, city, pincode, day, date, week],skipStarNodeCreation=[],functionColumnPairs=[max__temperature, minMaxRange__temperature, avg__temperature, min__temperature],maxLeafRecords=1000]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[country, state, city, pincode, day, date, week],skipStarNodeCreation=[],functionColumnPairs=[max__temperature, minMaxRange__temperature, avg__temperature, min__temperature],maxLeafRecords=1000]

Generated 65977917 star-tree records from 100000000 segment records
```
    Srini Kadamati

    05/05/2021, 3:21 PM
    Congrats to the StarTree team on the announcement! pinot https://www.startree.ai/startree-press-release.html
    dancingcharmander 1
    👍 12
    ❤️ 3
    🎉 19
    Arun Vasudevan

    05/05/2021, 7:59 PM
    Hello everyone…. a quick question as I am reading through the docs….. Is environment separation (Test/Stage/Prod) also achieved through Tenants in Pinot?
    Arun Vasudevan

    05/05/2021, 9:46 PM
    One more question…. how is a schema change of a table handled?
    Pedro Silva

    05/06/2021, 10:23 AM
    Hello, I've been reading the Pinot documentation and I'm a bit confused regarding the data that Controller & Server are responsible for respectively. My understanding is that Server instances store actual data segments/partitions of a table. Controllers store only a mapping of which servers store which segments for a given table. If this is the case, what does it mean when a segment is uploaded to a Controller? As mentioned in: "Controller - When a segment is uploaded to controller, the controller saves it in the DFS configured."
    RK

    05/06/2021, 11:52 AM
    Hello everyone, I have recently started using Apache Pinot and have integrated my Kafka with Pinot. I did the setup locally and started Pinot using bin/quick-start-batch.sh; I am able to see all the Pinot details on localhost:9000. I want to add a user authentication feature so that when someone opens localhost:9000 it asks for credentials before going to the Pinot home page. I checked multiple documents and YouTube videos but could not find any reference for this. Kindly suggest/guide me on how I can implement it.
    RK

    05/06/2021, 1:23 PM
    What is the process to use HDFS as Pinot deep storage?
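    For reference, the usual shape of the controller-side config for HDFS deep storage (values below are placeholders; the HDFS filesystem plugin jars must also be on the classpath):

```properties
controller.data.dir=hdfs://namenode:9000/pinot/controller-data
controller.local.temp.dir=/tmp/pinot/temp
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```

    Servers need the matching `pinot.server.*` storage-factory and segment-fetcher properties.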
    Pedro Silva

    05/07/2021, 4:10 PM
    Hello, can field transformations be composed? I.e. `fromDateTime(JSONPATHSTRING(result,'$.AudioLength','00:00:00.000'), 'HH:mm:ss.SSS')` where the transformed field is of type `LONG`?
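    Nesting transform functions as in the question generally works; the composed expression just goes into the table's transformConfigs (the output column name below is hypothetical):

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "audioLengthMillis",
        "transformFunction": "fromDateTime(jsonPathString(result, '$.AudioLength', '00:00:00.000'), 'HH:mm:ss.SSS')"
      }
    ]
  }
}
```

    with `audioLengthMillis` declared as a `LONG` in the schema.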
    Grace Walkuski

    05/07/2021, 5:06 PM
    Hello! I am wondering, if no fields in a query are aggregated, is there an advantage to using `distinct` over grouping by all the fields? For example, is there a difference in efficiency between these two?
```sql
select distinct species, name from dataSource

select species, name from dataSource group by species, name
```
    Akash

    05/07/2021, 9:14 PM
    Segment loading question: currently I am loading data into Pinot via a Spark job with the following config:
```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  extraConfigs:
    stagingDir: 'hdfs://hadoop/tmp/pinot_staging/'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://hadoop/hp/input/Event1/dateid=2020-12-30/'
outputDirURI: 'hdfs://hadoop/pinot/output/Event1/dateid=2020-12-30/'
```
    Now this generates the segments under pinot/output/Event1/dateid=2020-12-30/. I have Pinot deep storage on HDFS, where the controller data lives under `/hp/pinot/data/controller/Event1/`. Currently, AFAIU, the data is moved from HDFS => Pinot controller => HDFS. Is there a way to short-circuit the whole network process? I can see there is a configuration in the table config where we can specify batchIngestionConfig => segmentIngestionType as REFRESH, though there is no example anywhere. Do we have any test in the codebase, or some blog/docs, etc.?
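    One way to avoid routing segment bytes through the controller (a sketch; availability depends on the Pinot version) is the metadata push job type, where segments are written directly to the deep store and only their metadata is posted to the controller:

```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'
jobType: SegmentCreationAndMetadataPush
outputDirURI: 'hdfs://hadoop/pinot/controller-data/Event1/'
```

    Here `outputDirURI` points inside the configured deep-store location, so no copy through the controller is needed afterwards.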
    Akash

    05/07/2021, 10:23 PM
    Currently, when I am uploading a lot of segments into Pinot, the table status moves to BAD state for a long period of time. Is this expected, or have I misconfigured the system?
    Ambika

    05/08/2021, 12:54 AM
    Hi Team -- is there a way to run a rank function on top of the data in Pinot?
    troywinter

    05/08/2021, 4:55 PM
    Can I use the `datetimeconvert` inbuilt function for an ingestion transform in Pinot? Are there any limitations when transforming time columns? I’m getting an error when adding a transform function to the table config, but no specific error msg is logged.
    RK

    05/09/2021, 11:33 AM
    I am working on Kerberos Kafka-Pinot integration. I have completed that and am able to see Kafka topic data in a Pinot table. Now I am working on the user authentication part.
    When I go to localhost:9000 I can see all the tables and Pinot details directly. Instead of displaying all the details directly, I want to add a user authentication page, i.e. if someone opens localhost:9000 it should ask for a userid and password before moving to the Pinot home page.
    Till now I have tried the below steps: created controller and broker files inside the apache-pinot-incubating-0.7.1-bin/bin folder: 1. controller.properties 2. broker.properties 3. Started ZooKeeper 4. Started broker 5. Started controller 6. Started Pinot
    controller.properties content:
```properties
controller.segment.fetcher.auth.token=Basic YWRtaW46dmVyeXNlY3JldA
controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
controller.admin.access.control.principals=admin,user
controller.admin.access.control.principals.admin.password=verysecret
controller.admin.access.control.principals.user.password=secret
controller.admin.access.control.principals.user.tables=myusertable,baseballStats,stuff
controller.admin.access.control.principals.user.permissions=READ
controller.port=9000
controller.host=localhost
controller.helix.cluster.name=PinotCluster
controller.zk.str=localhost:2123
controller.data.dir=/user/username/Mypinot
```
    broker.properties content:
```properties
pinot.broker.access.control.class=org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
pinot.broker.access.control.principals=admin,user
pinot.broker.access.control.principals.admin.password=very secret
pinot.broker.access.control.principals.user.password=secret
pinot.broker.access.control.principals.user.tables=baseballStats,otherstuff
```
    Command to start broker: bin/pinot-admin.sh StartBroker -configFileName bin/broker.properties. Command to start controller: bin/pinot-admin.sh StartController -configFileName bin/controller.properties. Command to start Pinot: bin/quick-start-batch.sh. But still it's not asking for a username and password on localhost:9000. I tried to pull the latest code from GitHub which @User added yesterday and build it with Maven, but for me it shows multiple errors in all the files, i.e. "The forked VM terminated without properly saying goodbye. VM crash or System.exit called?" Kindly suggest what else I need to add.