# general

    Ryan Clark

    07/13/2021, 4:31 PM
    JDBC: I've added the pinot-client jar file to my DataGrip as a new Driver. It detects org.apache.pinot.client.PinotDriver. When I test the connection, I get this error. Any ideas why? Does Pinot integrate well with Tableau yet?
    Copy code
    Driver class 'org.apache.commons.lang3.tuple.Pair' not found.

    Xiang Fu

    07/14/2021, 8:19 AM
    can you try to cast it to long?

    sriramdas sivasai

    07/14/2021, 5:38 PM
    hello everyone, does anyone have any idea on this? I'm using the latest release version of Pinot (0.7.1). While doing the Spark batch ingestion, it throws this. Thanks
    Copy code
    Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/VoidFunction
    	at java.base/java.lang.Class.getDeclaredConstructors0(Native Method)
    	at java.base/java.lang.Class.privateGetDeclaredConstructors(Class.java:3137)
    	at java.base/java.lang.Class.getConstructor0(Class.java:3342)
    	at java.base/java.lang.Class.getConstructor(Class.java:2151)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:295)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:264)
    	at org.apache.pinot.spi.plugin.PluginManager.createInstance(PluginManager.java:245)
    	at org.apache.pinot.spi.ingestion.batch.IngestionJobLauncher.kickoffIngestionJob(IngestionJobLauncher.java:135)

    sriramdas sivasai

    07/14/2021, 8:03 PM
    hello everyone, I see the queries are not returning any response if I add any UDFs in the query, specifically on time. Here is an example query:
    Copy code
    select SUM(total_run_time) from events where user_id = 'XXXXX' GROUP BY TIMECONVERT(time,'SECONDS','HOURS')
    here is my table config
    Copy code
    {
      "OFFLINE": {
        "tableName": "events_OFFLINE",
        "tableType": "OFFLINE",
        "segmentsConfig": {
          "timeType": "SECONDS",
          "timeColumnName": "time",
          "replication": "1"
        },
        "tenants": {
          "broker": "DefaultTenant",
          "server": "DefaultTenant"
        },
        "tableIndexConfig": {
          "autoGeneratedInvertedIndex": false,
          "createInvertedIndexDuringSegmentGeneration": false,
          "loadMode": "MMAP",
          "enableDefaultStarTree": true,
          "enableDynamicStarTreeCreation": false,
          "aggregateMetrics": true,
          "nullHandlingEnabled": false
        },
        "metadata": {},
        "ingestionConfig": {
          "batchIngestionConfig": {
            "segmentIngestionType": "APPEND",
            "segmentIngestionFrequency": "DAILY"
          }
        },
        "isDimTable": false
      }
    }
    I'm actually trying this out with a small number of records (0.5 million), and the table has 1 metric, 1 timestamp, and 5 dimensions. Please let me know whether any change needs to be made in the table config to make the queries run faster. Thanks
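For context on what the query computes: TIMECONVERT(time,'SECONDS','HOURS') buckets the epoch-seconds time column into whole hours, and the query then sums total_run_time per bucket. A minimal sketch of that aggregation in plain Python, over hypothetical rows (column names follow the query; the data is made up):

```python
from collections import defaultdict

def timeconvert_seconds_to_hours(epoch_seconds: int) -> int:
    # TIMECONVERT(time, 'SECONDS', 'HOURS') is integer division by 3600
    return epoch_seconds // 3600

# Hypothetical rows: (time in epoch seconds, total_run_time)
rows = [
    (7200, 10),   # hour bucket 2
    (7260, 5),    # hour bucket 2
    (10800, 7),   # hour bucket 3
]

sums = defaultdict(int)
for t, run_time in rows:
    sums[timeconvert_seconds_to_hours(t)] += run_time

print(sums[2], sums[3])  # 15 7
```

One thing to check: grouping on a transform of the time column generally cannot use pre-aggregated indexes such as a star-tree, so this bucketing happens at query time.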

    sriramdas sivasai

    07/15/2021, 12:10 AM
    hello everyone, I'm trying to run the Spark batch ingestion job with spark-submit. While running the command, it's not able to pick up the plugins and throws the error below.
    Copy code
    2021/07/15 00:07:42.306 ERROR [PluginManager] [main] Failed to load plugin [pinot-avro] from dir [/data_ssd/spark-retry/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-avro]
    java.lang.IllegalArgumentException: object is not an instance of declaring class
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    	at org.apache.pinot.spi.plugin.PluginClassLoader.<init>(PluginClassLoader.java:50) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.createClassLoader(PluginManager.java:196) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.load(PluginManager.java:187) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:157) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:123) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.<init>(PluginManager.java:104) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.<clinit>(PluginManager.java:46) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:54) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.11-2.4.6.jar:2.4.6]
    2021/07/15 00:07:42.338 ERROR [PluginManager] [main] Failed to load plugin [pinot-batch-ingestion-spark] from dir [/data_ssd/spark-retry/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark]
    java.lang.IllegalArgumentException: object is not an instance of declaring class
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    	at org.apache.pinot.spi.plugin.PluginClassLoader.<init>(PluginClassLoader.java:50) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.createClassLoader(PluginManager.java:196) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.load(PluginManager.java:187) ~[pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:157) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.init(PluginManager.java:123) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.<init>(PluginManager.java:104) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.spi.plugin.PluginManager.<clinit>(PluginManager.java:46) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand.main(LaunchDataIngestionJobCommand.java:54) [pinot-all-0.7.1-jar-with-dependencies.jar:0.7.1-e22be7c3a39e840321d3658e7505f21768b228d6]
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
    	at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:?]
    	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) [spark-core_2.11-2.4.6.jar:2.4.6]
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) [spark-core_2.11-2.4.6.jar:2.4.6]
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) [spark-core_2.11-2.4.6.jar:2.4.6]
    Does anyone face this issue?
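Both of the Spark stack traces above are classpath problems: when launching via spark-submit, the Pinot plugins directory has to be passed to the JVM explicitly. A sketch of the invocation shape from the batch-ingestion docs of this era (paths and the plugin list are placeholders to adapt; verify the flags against the docs for your exact Pinot version):

```shell
export PINOT_VERSION=0.7.1
export PINOT_DISTRIBUTION_DIR=/path/to/apache-pinot-incubating-${PINOT_VERSION}-bin

spark-submit \
  --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  --master "local[2]" \
  --conf "spark.driver.extraJavaOptions=-Dplugins.dir=${PINOT_DISTRIBUTION_DIR}/plugins -Dplugins.include=pinot-batch-ingestion-spark,pinot-avro" \
  --conf "spark.driver.extraClassPath=${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar" \
  ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  -jobSpecFile /path/to/sparkIngestionJobSpec.yaml
```

The key parts are -Dplugins.dir (so the PluginManager can find the plugin directories) and putting the pinot-all jar on the driver classpath.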

    Sevvy Yusuf

    07/15/2021, 1:44 PM
    Hi everyone, I'm trying to use the controller API to create tenants and assign them brokers and servers, but I'm running into some issues. All of our broker and server instances are created with a "DefaultTenant" tag, and when I make a POST request to /tenants I end up with a 500 error with the message "Failed to allocate broker instances to Tag", due to not having enough untagged instances. Is there a way to create the instances without the "DefaultTenant" tag? I've tried manually changing the tag to "untagged" using the /instances endpoint per this page in the docs, but I'm still running into the same issue. It works OK if I just use the /instances endpoint to update the tag, but it feels like a hack doing it that way. Can someone advise on whether I'm missing a step and/or the best approach, please? Thanks

    Evan Galpin

    07/15/2021, 3:35 PM
    Would anyone be able to point me to either docs or code that would provide lower-level detail on the structure of a segment file and how to create one? Not how to use the admin tools to create a segment, but rather what the admin tool is doing to create a segment from, for example, an Avro input file. I'm curious about the Segment Metadata Push bulk ingestion strategy[1], which seems to imply writing segments to one of a few distributed file systems first, and then informing the controller about the segments and their associated metadata. I suppose I'm looking for the generic internals to create a segment from input data. Is `SegmentGenerationUtils.java`[2] the right starting place? Thanks! [1] https://docs.pinot.apache.org/basics/data-import/batch-ingestion#3-segment-metadata-push [2] https://github.com/apache/incubator-pinot/blob/master/pinot-common/src/main/java/o[…]che/pinot/common/segment/generation/SegmentGenerationUtils.java

    Ronie Paolo

    07/15/2021, 4:59 PM
    Hello! I have an instance on which I want to deploy Pinot, and 3 other instances where a ZooKeeper cluster is deployed. I would like to connect Pinot to this quorum (the 3 ZooKeeper servers). How can I set my 3 ZooKeeper URLs in the Pinot Controller configuration file (controller.zk.str)? Or am I going about this the wrong way? I would appreciate some orientation on using Pinot with my 3 ZooKeeper servers. Thanks!
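For reference, controller.zk.str accepts the standard ZooKeeper connect-string convention: a comma-separated list of host:port pairs, so all three servers fit in the one property. A sketch of the controller.conf line, with placeholder hostnames:

```properties
controller.zk.str=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
```

The same comma-separated form should work wherever a ZooKeeper address is expected, e.g. the -zkAddress argument when starting brokers and servers.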

    Evan Galpin

    07/15/2021, 7:22 PM
    Is there any performance implication associated with the number of segments that compose a given table?

    kelv

    07/16/2021, 3:54 AM
    Hi! I'd like to build a realtime dashboard on a webpage, with panels that show the last N messages, top values in a defined recent time period, etc. Updates should be reflected on the webpage within a second, ideally. My questions are: is such a use case suited for Pinot? Is there any intention to provide a long-poll query interface, so I can minimize the number of queries repeatedly polling Pinot?

    Pedro Silva

    07/16/2021, 10:54 AM
    Hello, does Pinot have support for ingesting Avro Kafka messages? Is it on the roadmap?
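For what it's worth, Pinot does ship Avro decoders for Kafka in the pinot-avro input-format plugins, including a Confluent Schema Registry variant. A sketch of the streamConfigs entries involved (the registry URL is a placeholder; verify the property names against the docs for your version):

```json
"stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
"stream.kafka.decoder.prop.schema.registry.rest.url": "http://schema-registry.example.com:8081"
```

For plain Avro payloads without a registry, there is also org.apache.pinot.plugin.inputformat.avro.SimpleAvroMessageDecoder.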

    Anusha

    07/16/2021, 3:38 PM
    Hello, in Pinot I have 2 tenants, tenant A and tenant B. I want to create the same table in both tenants. Is that possible?

    suraj kamath

    07/19/2021, 5:12 AM
    Hi, I am exploring the possibility of using Apache Spark to move the segments from a realtime table to an offline table. What job type can I use in the ingestion job spec to achieve this? Has anyone achieved this? If so, it would be helpful if you could point me to a doc/wiki.
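Besides a custom Spark job, recent Pinot versions include a minion-based RealtimeToOfflineSegmentsTask that periodically moves completed realtime segments into the paired offline table. A sketch of the realtime table-config fragment that enables it, assuming a minion is running (option names should be checked against the docs for your release):

```json
"task": {
  "taskTypeConfigsMap": {
    "RealtimeToOfflineSegmentsTask": {
      "bucketTimePeriod": "1d",
      "bufferTimePeriod": "1d"
    }
  }
}
```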

    Ananth Packkildurai

    07/19/2021, 2:34 PM
    I noticed an interesting comment from Uber's recent article on Pinot usage for its support system analytical infrastructure. The article was published three days back, but is this statement still true for Pinot?
    While Pinot is good at handling our SLAs, it comes with its own challenges. Pinot is an append-only database, which means users can only append records, rather than being able to update or delete existing records. This makes it difficult to compute even simple metrics, like the number of open orders by city. Query needs to identify the latest record for each order and count if the status is open.
    Pinot also has limited query capabilities. When we started working with Pinot it was lagging in its capability to support JOIN operations with other tables. This forced us to denormalize the data before insertion into the database. Denormalizing multi-value fields, such as tags or badges, will result in an explosion of records if the database does not support complex data types like arrays. Pinot’s limited capabilities for upsert, join, and complex data types made our data modeling challenging for certain metrics.

    Map

    07/19/2021, 9:18 PM
    Hi, I have several applications and I would like to watch for a metric they expose and send it over Kafka as a message. When I ingest the Kafka messages into Pinot, is there a way to aggregate them so that only the latest message sent by each application is kept? If not (which is to say we have to keep all the messages), is there a way to query Pinot to show only the latest message for each application?
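What this describes is last-value-per-key deduplication, which is what Pinot's upsert mode provides for realtime tables (in the releases where it is available). As a sketch of the intended semantics in plain Python, with hypothetical (app_id, timestamp, value) messages:

```python
# Keep only the latest message per application, ordered by timestamp.
messages = [
    ("app-a", 100, "v1"),
    ("app-b", 105, "v1"),
    ("app-a", 110, "v2"),  # supersedes app-a's earlier message
]

latest = {}
for app_id, ts, value in messages:
    if app_id not in latest or ts > latest[app_id][0]:
        latest[app_id] = (ts, value)

print(latest)  # {'app-a': (110, 'v2'), 'app-b': (105, 'v1')}
```

Without upsert, the query-side equivalent is a "latest record per key" lookup, which is what the Uber quote further down this page is describing as expensive.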

    Lakshmanan Velusamy

    07/20/2021, 6:20 AM
    Hi community, we have a table that records events emitted by an entity (timestamp, entity_id, status (OPEN/CLOSED)). Events are sparse, emitted only when there is a state change. We want to compute, at any given point in time, how many entities are open (and also track the trend: in a time range, plot the number of entities open). Are there any time series functions to help with this?
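The metric being asked for is a running count over sparse state-change events: +1 when an entity OPENs, -1 when it CLOSEs, evaluated at each event time. A minimal sketch of that computation in plain Python, over made-up events:

```python
# Sparse state-change events: (timestamp, entity_id, status)
events = [
    (1, "e1", "OPEN"),
    (2, "e2", "OPEN"),
    (3, "e1", "CLOSED"),
    (5, "e3", "OPEN"),
]

open_count = 0
trend = []  # (timestamp, number of open entities at that point)
for ts, _entity, status in sorted(events):
    open_count += 1 if status == "OPEN" else -1
    trend.append((ts, open_count))

print(trend)  # [(1, 1), (2, 2), (3, 1), (5, 2)]
```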

    Yupeng Fu

    07/20/2021, 4:01 PM
    hey, we just published this blog on geospatial support in Pinot. If you are interested, you can also tune in to the meetup talk above

    Ryan Clark

    07/20/2021, 7:54 PM
    🧵 Complex schema (un-nesting json) not showing up in table

    suraj kamath

    07/20/2021, 9:28 PM
    Hi all, I have written down my (modest) understanding of Apache Pinot tables and segments and tried to put it in simple and fun terms. I would love it if you folks could check it out and help me build many such articles around Pinot: https://medium.com/@surajkmth29/apache-pinot-tables-and-segments-a72dc5854876 PS: If there are any comments/suggestions on the details of the blog, please drop a comment so that we can make it better and more accessible to the Pinot community.

    Abhijeet Kushe

    07/21/2021, 2:17 PM
    I am interested in getting the latest updates on kinesis-integration. This issue https://github.com/apache/incubator-pinot/issues/5648 mentions joining #kinesis-integration, but I don't see the channel here in Slack. Can someone point me to the right place to get more details?

    Neil Teng

    07/21/2021, 4:14 PM
    Hi, I am interested in how the system time is synced across nodes. I pass a Presto query like `date > now() - interval '30' minute` to Pinot. How much can I be sure about the now() function? Is it translated to an exact time in Presto and then passed to Pinot? And how much difference can it have across different Pinot nodes?
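For what it's worth: in Presto, now() is fixed per query, so the coordinator should fold now() - interval '30' minute into a single literal before the predicate is pushed to Pinot; the remaining skew is between the coordinator's clock and the data's timestamps, not between Pinot nodes. A small Python sketch of that constant-folding, assuming an epoch-milliseconds time column named "date":

```python
def rewrite_predicate(now_ms: int, minutes: int = 30) -> str:
    # Presto folds now() - interval '30' minute into one literal at planning
    # time; this mimics that fold for an epoch-milliseconds column.
    threshold_ms = now_ms - minutes * 60 * 1000
    return f"date > {threshold_ms}"

# With a fixed "query start" instant, every worker sees the same literal:
print(rewrite_predicate(1_626_883_200_000))  # date > 1626881400000
```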

    Maitraiyee Gautam

    07/21/2021, 4:34 PM
    @User and I are facing an issue with select * on Pinot: select * is not reflecting all the columns of the table, but when we select individual column names, they are returned correctly. Has anyone else faced any such problems?

    Mark Needham

    07/21/2021, 9:14 PM
    I wrote a blog post showing how to analyse GitHub events using Pinot + Streamlit - https://markhneedham.medium.com/analysing-github-events-with-apache-pinot-and-streamlit-2ed555e9fb78 piggybacking on the work of @User and @User!

    Karin Wolok

    07/22/2021, 3:04 PM
    Don't miss today's meetup! Presented by @User! 🙂 https://www.meetup.com/apache-pinot/events/277818762/

    Ryan Clark

    07/22/2021, 6:55 PM
    Can Pinot be hosted in one AWS account and read a stream from another account? Perhaps a way to give the table config an account number.

    Map

    07/22/2021, 10:01 PM
    Hi, I know we can flush a segment based on size, number of rows, or time since creation. I wonder if there is a way to only trigger a flush at a certain time of day, say midnight? I am asking because I notice it can take minutes to flush a segment, during which Pinot stops consuming new messages, and hence there would be a delay of minutes. This may throw the users off. We might be doing it totally wrong, and any suggestions would be appreciated!
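As far as I know there is no wall-clock (cron-style) flush trigger; the time threshold counts from when the consuming segment was created, so a 24h threshold only approximates a fixed time of day if consumption started at that time. A sketch of the stream-level settings involved (property names are from the 0.7.x docs; verify against your version):

```properties
"realtime.segment.flush.threshold.time": "24h",
"realtime.segment.flush.threshold.size": "0",
"realtime.segment.flush.desired.size": "200M"
```

If segment builds pause consumption for minutes, the usual levers are smaller segments or more stream partitions rather than scheduling the flush.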

    Trust Okoroego

    07/23/2021, 2:13 AM
    Hi, I get a blank screen when I open a table in the Pinot query console. Pinot version 0.7.1. I guess it's something with the UI. Anyone noticed this?

    Ryan Clark

    07/23/2021, 5:02 PM
    I'm trying to implement S3 deep storage with a controller.conf, but I believe the controller is not reaching ZK. I'm providing the zookeeper.zk.str by giving it the pinot-zookeeper endpoint. 🧵

    Trust Okoroego

    07/24/2021, 11:01 AM
    Hi group, I am trying to do a join of two realtime tables, but I get an error that my segment is empty: `presto error: null value in entry: Server_172.23.0.5_8098=null.` When I check the realtime table, I don't have segments already created, but when I query the same table without a join it returns a result.

    prateek nigam

    07/26/2021, 12:00 PM
    Data encryption at rest in Apache Pinot: when using HDFS as the deep store, does Apache Pinot support that?