# general
l
Hi fellows, have you seen any documentation about using PySpark to ingest into an offline table? If yes, would you be able to send me the link?
m
I personally have not, but am also curious if someone has tried it.
l
I really want to build something around this. I have a couple of people who want to try offline tables with Spark, but they only have a PySpark skillset, no Java or Scala 😞 any thoughts?
k
can you share some more detail on how you envision the workflow?
if it makes it easier, we can file a GitHub issue and continue the discussion there
l
Hi @Kishore G, today we're loading data for one of our customers following this process
basically, we're using PySpark to harmonize the data and deliver it in Parquet format, so I can use file-system ingestion to put the data into Pinot
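roughly something like this (a minimal sketch; the paths, columns, and job-spec file name are placeholders, not our real pipeline):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("harmonize-for-pinot").getOrCreate()

# Harmonize the raw data (illustrative transformations only).
raw = spark.read.json("s3://our-lake/raw/events/")
harmonized = (raw
    .select("event_id", "event_time", "payload")
    .dropDuplicates(["event_id"]))

# Deliver as Parquet so Pinot's file-system batch ingestion can pick it up.
harmonized.write.mode("overwrite").parquet("s3://our-lake/pinot-staging/events/")

# Then, outside of Spark, run Pinot's standalone batch ingestion job
# against that directory:
#   bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile ingestion-job-spec.yaml
```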
I'm wondering if we could unlock the following use case. I'm 100% sure it would open the door for data engineers to test out Spark + Pinot using SQL + PySpark!
the proposed workflow would be this:
• enable the capability to connect from PySpark straight to the Pinot cluster (see the sketch after this list)
• that would eliminate a lot of the extra work we're doing right now: compressing Parquet, managing the lifecycle of files arriving on the lake, handling schema evolution, and working around the inability to load a DataFrame into Pinot using offline tables
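from PySpark, the write side could look something like this (purely hypothetical — no "pinot" sink format or these options exist today; it's just the API shape I'd love to have):
```python
# HYPOTHETICAL: write a DataFrame straight into a Pinot offline table.
# The "pinot" write format and all of these options are the proposal,
# not an existing API.
(harmonized.write
    .format("pinot")
    .option("table", "events")
    .option("tableType", "offline")
    .option("controllerUrl", "http://pinot-controller:9000")
    .mode("append")
    .save())
```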
not only would that reduce the hops, it would also speed up overall pipeline execution. If you want, I can file the GitHub issue if you give me an example.
k
will read this and get back to you
l
alright, see if it makes sense and let me know