# general
    Mayank

    04/29/2021, 4:45 PM
    Hello Pinot community, wondering if there might be interest in talks about your use cases with Pinot at ApacheCon: https://www.apachecon.com/acah2021/
    Yupeng Fu

    04/29/2021, 9:29 PM
    hey, Pinot community, I want to share this Uber engineering blog (https://eng.uber.com/charon/) published today on how Uber combats COVID-related challenges for restaurants and other merchants across the world, using Apache Pinot and real-time analytics. Nice blog from @User @User
    🍷 7
    🎉 6
    🚕 4
    🚗 4
    👍 6
    kauts shukla

    05/01/2021, 9:23 AM
    Hello, if I add a new column to “primaryKeyColumns” in the schema, how much time would it take to build the index for the new column?
    kauts shukla

    05/02/2021, 11:12 AM
    Hi all, I have a realtime table consuming from Kafka. As of now it has 5 billion records. I’m performing lookups [predicate] on inverted-index columns [userid, eventcategory, eventlabel], using the “metricFieldSpecs” column timestampist for a range condition. My query is taking too long to finish, almost > 10 seconds. How can I configure the table for the best performance? Query:
```sql
select userid, eventlabel, sessionid,
       MIN(timestampist) as mint,
       MAX(timestampist) as maxt,
       (MAX(timestampist) - MIN(timestampist)) as diff_time
from default.click_stream
where eventlabel != 'null'
  and timestampist between 1615833000000 and 1616225312000
group by userid, eventlabel, sessionid
```
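    For queries like this, a range index on the time column is often the first lever to try, since a BETWEEN predicate on a plain metric column is otherwise evaluated by scanning. A minimal sketch of the relevant tableIndexConfig section (column names taken from the question; illustrative, not a tuned config):

```json
{
  "tableIndexConfig": {
    "invertedIndexColumns": ["userid", "eventcategory", "eventlabel"],
    "rangeIndexColumns": ["timestampist"]
  }
}
```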
    Vengatesh Babu

    05/03/2021, 6:01 AM
    Hello, for real-time tables, is there a condition that one Kafka topic should contain data for only one table? In our case, we have multiple tables' data produced to a single topic, i.e. we have a group of topics in Kafka, and each topic serves a set of tables based on the use case. Is it possible to consume multiple tables' events from a single topic in Pinot?
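    One pattern that can help here (a sketch, assuming each event carries a field identifying its logical table, here hypothetically called `eventType`): Pinot's ingestion-level filterConfig lets every table consume the same topic and drop the records that don't belong to it. Records for which the filter function evaluates to true are excluded:

```json
{
  "ingestionConfig": {
    "filterConfig": {
      "filterFunction": "Groovy({eventType != \"orders\"}, eventType)"
    }
  }
}
```

    With one such table config per logical table, a single topic can feed several tables, at the cost of each table's consumers reading the whole topic.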
    Jonathan Meyer

    05/03/2021, 9:55 AM
    Hello 👋 Has anyone got experience with Pinot on ADLS (Gen 2)? Specifically: • Any idea on minimum IOPS for running Pinot smoothly under lowish load? (i.e. is a standard Storage account "enough"? If so, how "far" can we push it?) • Is it recommended to create a dedicated PVC for `controller.local.temp.dir`?
    🌟 1
    Vengatesh Babu

    05/03/2021, 1:10 PM
    A few doubts regarding streaming data: 1. Pinot supports data ingestion via a streaming (Kafka) or batch (Hadoop) process. Is there any direct API available for pushing data into Pinot? 2. Does Pinot have a segment compaction process like HBase compaction? Won't creating a lot of small segments affect query performance?
    Pedro Silva

    05/03/2021, 1:35 PM
    Hello, Pinot docs related to deep-storage in K8s seem to be broken: https://docs.pinot.apache.org/operators/tutorials/deployment-pinot-on-kubernetes#deep-storage, can anyone point to the right resource?
    Pedro Silva

    05/03/2021, 5:28 PM
    Hello again, are Pinot helm charts published to any hub? They don't exist in https://artifacthub.io/; are they just available in the GitHub repo?
    Pedro Silva

    05/04/2021, 10:46 AM
    Hello, are the Pinot helm charts designed to put sensitive information, such as credentials for deep storage, in ConfigMaps?
    Jonathan Meyer

    05/04/2021, 2:41 PM
    Hello! What is the recommended (prod) way of ingesting batch data without Hadoop? I'm thinking about having a Python component generate Parquet files + copy them to the deep store, then triggering an ingestion. Something like the `/ingestFromFile` API endpoint but prod-compatible (where can segment creation be done in that case? Minion?) Thanks!
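    One Hadoop-free option (a sketch under the assumptions in the question; bucket, table, and path names below are hypothetical) is the standalone batch ingestion job, which builds segments locally and pushes them to the controller:

```yaml
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://my-bucket/input/'
outputDirURI: 's3://my-bucket/segments/'
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'myTable'
pinotClusterSpecs:
  - controllerURI: 'http://localhost:9000'
```

    Launched with `bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile spec.yaml`, e.g. from a cron job or the Python component itself.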
    Josh Highley

    05/04/2021, 7:29 PM
    in 'normal' sql queries, I can use an aggregate function with * to select all columns:
```sql
select sum(a+b), * from my_table
```
    Pinot query browser gives an error when I try this -- is there another way, without explicitly listing all the columns?
    Mus

    05/04/2021, 11:29 PM
    Hi! Is there a way to have the data streamed from Kafka and then put into S3 in Parquet format?
    Karin Wolok

    05/05/2021, 2:16 AM
    Hey Pinot community! 🍷  Please, help us welcome our newest community members! pinot 👋 @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User @User 🎉 Please, tell us who you are and what brought you here! 😃
    Pedro Silva

    05/05/2021, 10:31 AM
    Hello, regarding Kafka-based streaming ingestion: when does Pinot commit offsets to Kafka? Is it after creating a segment? Can Pinot be configured to commit offsets only after a segment has been stored in deep storage, to ensure no data is lost in case segments that are on the server but not in deep storage are deleted?
    Pedro Silva

    05/05/2021, 2:26 PM
    Hello, is there documentation explaining the meaning of each configuration property of Pinot's components? I found https://docs.pinot.apache.org/configuration-reference/controller but it does not explain what each property is for, only defaults (for some). For instance, there is nothing for `controller.local.temp.dir`.
    Ambika

    05/05/2021, 2:29 PM
    Hello -- I am trying to load some 100M records into an offline table. At the first attempt it was a simple table with no additional indexes other than what was in the tutorial doc... that went fine. Now I am trying to add a star-tree index on it and the loading has been going on for 30+ mins (last time it took 12 min)... This is where it has been for the last 20 mins... Is there any way to monitor the progress of this?
```text
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Creating an executor service with 1 threads(Job parallelism: 0, available cores: 6.)
Submitting one Segment Generation Task for file:/opt/pinot/ai/weather/global_weather100M.csv
Using class: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader to read segment, ignoring configured file format: AVRO
RecordReaderSegmentCreationDataSource is used
Finished building StatsCollector!
Collected stats for 100000000 documents
Created dictionary for INT column: date with cardinality: 30, range: 0 to 29
Using fixed length dictionary for column: country, size: 110
Created dictionary for STRING column: country with cardinality: 10, max length in bytes: 11, range: Australia to USA
Created dictionary for INT column: pincode with cardinality: 10, range: 12324 to 3243678
Created dictionary for INT column: week with cardinality: 53, range: 0 to 52
Using fixed length dictionary for column: city, size: 80
Created dictionary for STRING column: city with cardinality: 10, max length in bytes: 8, range: AMD to SRI
Created dictionary for INT column: year with cardinality: 50, range: 1970 to 2019
Created dictionary for INT column: temperature with cardinality: 50, range: 0 to 49
Using fixed length dictionary for column: state, size: 20
Created dictionary for STRING column: state with cardinality: 10, max length in bytes: 2, range: AS to WB
Using fixed length dictionary for column: day, size: 63
Created dictionary for STRING column: day with cardinality: 7, max length in bytes: 9, range: Friday to Wednesday
Created dictionary for LONG column: ts with cardinality: 530768, range: 1620214278776 to 1620214809690
Start building IndexCreator!
Finished records indexing in IndexCreator!
Finished segment seal!
Converting segment: /tmp/pinot-00edd913-441c-4958-8555-9b380f12991b/output/weather_1_OFFLINE_1620214278776_1620214809690_0 to v3 format
v3 segment location for segment: weather_1_OFFLINE_1620214278776_1620214809690_0 is /tmp/pinot-00edd913-441c-4958-8555-9b380f12991b/output/weather_1_OFFLINE_1620214278776_1620214809690_0/v3
Deleting files in v1 segment directory: /tmp/pinot-00edd913-441c-4958-8555-9b380f12991b/output/weather_1_OFFLINE_1620214278776_1620214809690_0
Skip creating default columns for segment: weather_1_OFFLINE_1620214278776_1620214809690_0 without schema
Successfully loaded segment weather_1_OFFLINE_1620214278776_1620214809690_0 with readMode: mmap
Starting building 1 star-trees with configs: [StarTreeV2BuilderConfig[splitOrder=[country, state, city, pincode, day, date, week],skipStarNodeCreation=[],functionColumnPairs=[max__temperature, minMaxRange__temperature, avg__temperature, min__temperature],maxLeafRecords=1000]] using OFF_HEAP builder
Starting building star-tree with config: StarTreeV2BuilderConfig[splitOrder=[country, state, city, pincode, day, date, week],skipStarNodeCreation=[],functionColumnPairs=[max__temperature, minMaxRange__temperature, avg__temperature, min__temperature],maxLeafRecords=1000]

Generated 65977917 star-tree records from 100000000 segment records
```
    Srini Kadamati

    05/05/2021, 3:21 PM
    Congrats to the StarTree team on the announcement! pinot https://www.startree.ai/startree-press-release.html
    dancingcharmander 1
    👍 12
    ❤️ 3
    🎉 19
    Arun Vasudevan

    05/05/2021, 7:59 PM
    Hello everyone…. a quick question as I am reading through the docs….. Is environment separation (Test/Stage/Prod) also achieved through Tenants in Pinot?
    Arun Vasudevan

    05/05/2021, 9:46 PM
    One more question…. how is a schema change of a table handled?
    Pedro Silva

    05/06/2021, 10:23 AM
    Hello, I've been reading the Pinot documentation and I'm a bit confused regarding the data that Controller & Server are responsible for respectively. My understanding is that Server instances store actual data segments/partitions of a table. Controllers store only a mapping of which servers store which segments for a given table. If this is the case, what does it mean when a segment is uploaded to a Controller? As mentioned in: "Controller - When a segment is uploaded to controller, the controller saves it in the DFS configured."
    RK

    05/06/2021, 11:52 AM
    Hello everyone, I have recently started using Apache Pinot and have integrated my Kafka with Pinot. I did the setup locally and started Pinot using bin/quick-start-batch.sh; I am able to see all the Pinot details on localhost:9000. I want to add a user authentication feature so that when someone opens localhost:9000 it asks for credentials before going to the Pinot home page. I checked multiple documents and YouTube videos but could not find any reference for this. Kindly suggest/guide me on how I can implement it.
    RK

    05/06/2021, 1:23 PM
    What is the process to use HDFS as Pinot deep storage?
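    For reference, the usual shape of the controller-side config for HDFS deep storage (values below are placeholders; the HDFS filesystem plugin jars must also be on the classpath):

```properties
controller.data.dir=hdfs://namenode:9000/pinot/controller-data
controller.local.temp.dir=/tmp/pinot/temp
pinot.controller.storage.factory.class.hdfs=org.apache.pinot.plugin.filesystem.HadoopPinotFS
pinot.controller.storage.factory.hdfs.hadoop.conf.path=/path/to/hadoop/conf
pinot.controller.segment.fetcher.protocols=file,http,hdfs
pinot.controller.segment.fetcher.hdfs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```

    Servers need the matching `pinot.server.*` storage-factory and segment-fetcher properties.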
    Pedro Silva

    05/07/2021, 4:10 PM
    Hello, can field transformations be composed? I.e. `fromDateTime(JSONPATHSTRING(result,'$.AudioLength','00:00:00.000'), 'HH:mm:ss.SSS')` where the transformed field is of type `LONG`?
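    Nesting transform functions as in the question generally works; the composed expression just goes into the table's transformConfigs (the output column name below is hypothetical):

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "audioLengthMillis",
        "transformFunction": "fromDateTime(jsonPathString(result, '$.AudioLength', '00:00:00.000'), 'HH:mm:ss.SSS')"
      }
    ]
  }
}
```

    with `audioLengthMillis` declared as a `LONG` in the schema.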
    Grace Walkuski

    05/07/2021, 5:06 PM
    Hello! I am wondering, if no fields in a query are aggregated, is there an advantage to using `distinct` over grouping by all the fields? For example, is there a difference in efficiency between these two?
```sql
select distinct species, name from dataSource

select species, name from dataSource group by species, name
```
    Akash

    05/07/2021, 9:14 PM
    Segment loading question: currently I am loading data into Pinot via a Spark job with the following config:
```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentUriPushJobRunner'
  extraConfigs:
    stagingDir: 'hdfs://hadoop/tmp/pinot_staging/'
jobType: SegmentCreationAndTarPush
inputDirURI: 'hdfs://hadoop/hp/input/Event1/dateid=2020-12-30/'
outputDirURI: 'hdfs://hadoop/pinot/output/Event1/dateid=2020-12-30/'
```
    Now this generates the segments under pinot/output/Event1/dateid=2020-12-30/. I have Pinot deep storage on HDFS, where the controller data lives under `/hp/pinot/data/controller/Event1/`. Currently, AFAIU, the data is moved from HDFS => Pinot controller => HDFS. Is there a way to short-circuit the whole network process? I can see there is a configuration in the table config where we can specify batchIngestionConfig => segmentIngestionType as REFRESH, though there is no example anywhere. Do we have any test in the codebase, or some blog/docs, etc.?
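    One way to avoid routing segment bytes through the controller (a sketch; availability depends on the Pinot version) is the metadata push job type, where segments are written directly to the deep store and only their metadata is posted to the controller:

```yaml
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentMetadataPushJobRunner'
jobType: SegmentCreationAndMetadataPush
outputDirURI: 'hdfs://hadoop/pinot/controller-data/Event1/'
```

    Here `outputDirURI` points inside the configured deep-store location, so no copy through the controller is needed afterwards.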
    Akash

    05/07/2021, 10:23 PM
    Currently, when I am uploading a lot of segments into Pinot, the table status moves to BAD state for a long period of time. Is this expected, or have I misconfigured the system?
    Ambika

    05/08/2021, 12:54 AM
    Hi Team -- is there a way to run a rank function on top of the data in Pinot?
    troywinter

    05/08/2021, 4:55 PM
    Can I use the `datetimeconvert` inbuilt function for an ingestion transform in Pinot? Are there any limitations when transforming time columns? I’m getting an error when adding a transform function to the table config, but no specific error msg is logged.
    RK

    05/09/2021, 11:33 AM
    I am working on Kerberos Kafka-Pinot integration. I have completed that and am able to see Kafka topic data in a Pinot table. Now I am working on the user authentication part.
    When I go to localhost:9000 I can see all the tables and Pinot details directly. Instead of displaying all the details directly, I want to add a user authentication page, i.e. if someone opens localhost:9000 it should ask for a userid and password before moving to the Pinot home page.
    Till now I have tried the below steps: created controller and broker files inside the apache-pinot-incubating-0.7.1-bin/bin folder: 1. controller.properties 2. broker.properties 3. Started ZooKeeper 4. Started broker 5. Started controller 6. Started Pinot
    controller.properties content:
```properties
controller.segment.fetcher.auth.token=Basic YWRtaW46dmVyeXNlY3JldA
controller.admin.access.control.factory.class=org.apache.pinot.controller.api.access.BasicAuthAccessControlFactory
controller.admin.access.control.principals=admin,user
controller.admin.access.control.principals.admin.password=verysecret
controller.admin.access.control.principals.user.password=secret
controller.admin.access.control.principals.user.tables=myusertable,baseballStats,stuff
controller.admin.access.control.principals.user.permissions=READ
controller.port=9000
controller.host=localhost
controller.helix.cluster.name=PinotCluster
controller.zk.str=localhost:2123
controller.data.dir=/user/username/Mypinot
```
    broker.properties content:
```properties
pinot.broker.access.control.class=org.apache.pinot.broker.broker.BasicAuthAccessControlFactory
pinot.broker.access.control.principals=admin,user
pinot.broker.access.control.principals.admin.password=very secret
pinot.broker.access.control.principals.user.password=secret
pinot.broker.access.control.principals.user.tables=baseballStats,otherstuff
```
    Command to start broker: bin/pinot-admin.sh StartBroker -configFileName bin/broker.properties. Command to start controller: bin/pinot-admin.sh StartController -configFileName bin/controller.properties. Command to start Pinot: bin/quick-start-batch.sh. But still it's not asking for a username and password on localhost:9000. I tried to pull the latest code from GitHub which @User added yesterday and build it with Maven, but for me it shows multiple errors in all the files, i.e. "The forked VM terminated without properly saying goodbye. VM crash or System.exit called?" Kindly suggest what else I need to add.