# troubleshooting
  • Santosh
    05/25/2021, 4:42 PM
    bin/pinot-admin.sh AddTenant -name Liquidation -role SERVER -instanceCount 2 -offlineInstanceCount 1 -realTimeInstanceCount 1 -exec
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/opt/pinot/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See <http://www.slf4j.org/codes.html#multiple_bindings> for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.pinot.spi.plugin.PluginClassLoader (file:/opt/pinot/lib/pinot-all-0.7.1-jar-with-dependencies.jar) to method java.net.URLClassLoader.addURL(java.net.URL)
    WARNING: Please consider reporting this to the maintainers of org.apache.pinot.spi.plugin.PluginClassLoader
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    Executing command: AddTenant -controllerProtocol http -controllerHost 10.222.86.148 -controllerPort 9000 -name Liquidation -role SERVER -instanceCount 2 -offlineInstanceCount 1 -realTimeInstanceCount 1 -exec
    {"code":500,"error":"Failed to create tenant"}
    {"code":500,"error":"Failed to create tenant"}
  • Surendra
    05/25/2021, 5:05 PM
    Hi, we are testing segment partitioning for REALTIME tables (Kafka as source), but we are unable to find the configuration on the documentation page, except for:
    When emitting an event to Kafka, a user needs to feed the partitioning key and partition function to the Kafka producer API.
    Can someone give insights into how it works internally? How do we configure a schema registry for Kafka record keys?
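    For reference, the table-config side of segment partitioning looks like the sketch below; the column name memberId is a hypothetical stand-in for whatever field is used as the Kafka record key, and the partition function must match the producer's partitioner:
      "tableIndexConfig": {
        "segmentPartitionConfig": {
          "columnPartitionMap": {
            "memberId": {
              "functionName": "Murmur",
              "numPartitions": 3
            }
          }
        }
      },
      "routing": {
        "segmentPrunerTypes": ["partition"]
      }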
  • Machhindra
    05/26/2021, 3:41 AM
    @Neha Pawar I need help with Kafka streaming with an Avro schema. Here is my Avro schema. The main object is Metric. It contains one nested MetricSource object. I could stream the Metric.avsc fields like product/productVersion/metricPath. I don't know how to map MetricSource. I would like to map it as follows: MetricSource_time, MetricSource_metric, MetricSource_metricValue, MetricSource_category, MetricSource_subCategory, Metric_product, Metric_productVersion, Metric_metricPath
    Metric.avsc
    ===========
    {
     "namespace": "com.blah",
     "name": "Metric",
     "type": "record",
     "fields": [{
        "name": "product",
        "type": ["string", "null"]
        },{
        "name": "productVersion",
        "type": ["string", "null"]
        },{
        "name": "MetricSource",
        "type": ["com.blah.MetricSource", "null"]
        },{
        "name": "metricPath",
        "type":{
           "type": "array",
           "items": ["string", "null"]
        }
       }]
    }
    
    MetricSource.avsc
    ===========
    {
     "namespace": "com.blah",
     "name": "MetricSource",
     "type": "record",
     "fields": [{
         "name": "metric",
         "type": ["string", "null"]
         },{
         "name": "metricValue",
         "type": ["string", "null"]
         },{
         "name": "time",
         "type": "long"
         },{
         "name": "timeOffset",
         "type": "double"
         },{
         "name": "category",
         "order": "ignore",
         "type": ["null", "string"],
         "default": null
         },{
         "name": "subCategory",
         "order": "ignore",
         "type": ["null", "string"],
         "default": null
         }
        ]
     }
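    One possible approach (a sketch, not an official recommendation) is to flatten the nested record at ingestion time with Groovy transform functions, declaring the MetricSource_* columns in the Pinot schema and assuming the decoder exposes the nested record as a map:
      "ingestionConfig": {
        "transformConfigs": [
          {"columnName": "MetricSource_time", "transformFunction": "Groovy({MetricSource.time}, MetricSource)"},
          {"columnName": "MetricSource_metric", "transformFunction": "Groovy({MetricSource.metric}, MetricSource)"},
          {"columnName": "MetricSource_metricValue", "transformFunction": "Groovy({MetricSource.metricValue}, MetricSource)"},
          {"columnName": "MetricSource_category", "transformFunction": "Groovy({MetricSource.category}, MetricSource)"},
          {"columnName": "MetricSource_subCategory", "transformFunction": "Groovy({MetricSource.subCategory}, MetricSource)"}
        ]
      }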
  • Jonathan Meyer
    05/26/2021, 8:40 AM
    Hello 🙂 When ingesting batch data with data partitioning (Parquet) using a key, that key is "missing" from the Parquet file parts (makes sense). However, from what I've seen, Pinot then cannot find that key and fails to generate the segments. My current workaround is to duplicate the partition column. Is that a known issue / possible to adjust settings?
  • RK
    05/26/2021, 4:56 PM

    https://apache-pinot.slack.com/files/U020N9ADN5D/F0233EVB6P5/img_20210526_174024.jpg

  • Neil Teng
    05/26/2021, 7:16 PM
    Hi there, does anyone know what the schema should be for a time format like this? 2021-05-19T180823.583Z. I had tried the following, but no luck:
    {
          "name": "date",
          "dataType": "STRING",
          "format": "1:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss.SSS'Z'",
          "granularity": "1:MILLISECONDS"
        },
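    Two things look off, assuming the raw values really have no colons in the time part (2021-05-19T180823.583Z): the format string above is missing its time-unit component (Pinot expects size:timeUnit:timeFormat:pattern), and the pattern must match the colon-less time. A sketch of a corrected spec:
      {
        "name": "date",
        "dataType": "STRING",
        "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HHmmss.SSS'Z'",
        "granularity": "1:MILLISECONDS"
      }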
  • Sadim Nadeem
    05/27/2021, 3:04 PM
    Is there some way to delete a few rows in a Pinot table? I want to get rid of some garbage data consumed into Pinot, so can I run a delete query to remove only the rows matching that query?
  • Jonathan Meyer
    05/27/2021, 3:35 PM
    Hello 😄 Does Pinot have any support for hierarchical aggregations? Say we have a tree-like structure with values at the leaves; is there an efficient way to get the values at the intermediate and root nodes? (The tree can be ~10 layers deep, with ~10000-100000 leaves.)
  • Shailesh Jha
    05/27/2021, 3:54 PM
    Hi @Mayank @Daniel Lavoie Something is wrong with the GCS integration. The Pinot server pods went into CrashLoopBackOff when I tried to integrate with GCS. Error log:
    ERROR [PinotFSFactory] [Start a Pinot [SERVER]] Could not instantiate file system for class org.apache.pinot.plugin.filesystem.GcsPinotFS with scheme gs
    @Mohamed Sultan
  • Jonathan Meyer
    05/27/2021, 5:04 PM
    Does any sort of query cache exist in Pinot?
  • Machhindra
    05/27/2021, 6:23 PM
    Trying to browse Pinot in Superset. The following query works fine in the Pinot query console, BUT fails in Superset. Notice the special character in the metric. Is that something caused by the Pinot SQLAlchemy driver?
    SELECT DATETIMECONVERT(metricTime, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES'),
           AVG(metricValue) AS "AVG_1"
    FROM metric_v6.metric_v6
    WHERE metricTime >= 1621555200000
      AND metricTime < 1622160000000
      AND metric = 'CECCP%'
    GROUP BY DATETIMECONVERT(metricTime, '1:MILLISECONDS:EPOCH', '1:MILLISECONDS:EPOCH', '1:MINUTES')
    LIMIT 10000
    Screenshot of the Superset SQL Editor -
  • Shailesh Jha
    05/28/2021, 6:01 AM
    Hi @Mayank @Daniel Lavoie Our API team is not able to connect to the Pinot broker. Can you help me with this? I am attaching the broker log.
    SEVERE: An I/O error has occurred while writing a response message entity to the container output stream.
    org.glassfish.jersey.server.internal.process.MappableException: java.io.IOException: Connection closed
    CC: @Mohamed Sultan @Sadim Nadeem
    pinot-broker-0-28may.txt
  • RK
    05/28/2021, 8:00 AM
    These are the config file details. I also checked the classpath_prefix value; it's pointing to the correct jar locations. Kindly help.
  • Sadim Nadeem
    05/28/2021, 1:08 PM
    Suppose I created a realtime table with Kafka SSL-enabled configs. Now my Kafka (Strimzi Kafka) crashed, or its SSL configs changed along with the broker IP. Can I point the same Pinot table to the new Kafka broker IP and updated SSL configs without losing the old data? Is backup & restore necessary while doing this, or can I directly update the table config? @Xiang Fu @Mayank cc: @Mohamed Sultan @Shailesh Jha @Mohamed Kashifuddin @Mohamed Hussain @Pugal
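    For context, the broker address and security settings live in the table's streamConfigs, which can be edited without touching segments that have already been committed; a sketch, where the SSL property names are assumptions passed through to the underlying Kafka consumer:
      "streamConfigs": {
        "stream.kafka.broker.list": "new-broker-host:9093",
        "security.protocol": "SSL",
        "ssl.truststore.location": "/path/to/truststore.jks",
        "ssl.truststore.password": "changeit"
      }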
  • Pedro Silva
    05/31/2021, 1:09 PM
    Hello, when defining a datetime field from a string as:
    "dateTimeFieldSpecs": [{
          "name": "dateOfBirth",
          "dataType": "STRING",
          "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'",
          "granularity": "1:DAYS"
    }, ...]
    Should you be able to apply datetime function transformations at query time? For example, retrieving the year of the field:
    select year("dateOfBirth") from ....
    I'm getting parsing errors:
    2021/05/31 12:52:50.536 ERROR [BaseCombineOperator] [pqw-1] Caught exception while executing operator of index: 0 (query: QueryContext{_tableName='HitExecutionView_REALTIME', _selectExpressions=[year(dateOfBirth)], _aliasList=[null], _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9994}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:HitExecutionView_REALTIME), pinotQuery:PinotQuery(dataSource:DataSource(tableName:HitExecutionView_REALTIME), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:YEAR, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:dateOfBirth))]))], orderByList:[], limit:10, queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9994}))})
    java.lang.NumberFormatException: For input string: "1997-02-06T00:00:00Z"
    	at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) ~[?:1.8.0_292]
    	at java.lang.Long.parseLong(Long.java:589) ~[?:1.8.0_292]
    	at java.lang.Long.parseLong(Long.java:631) ~[?:1.8.0_292]
  • Charles
    06/01/2021, 9:10 AM
    Hi all. If we build a Pinot cluster without deep storage, the controller will store all the segments on the controller disk configured in controller.data.dir. Is there a method to delete controller segments based on retention time, since we don't have enough disk space to store them on the controller?
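    For reference, segment retention is driven by the table config, and the controller's retention manager periodically deletes expired segments (which should also reclaim their copies under controller.data.dir); a sketch, assuming a 30-day retention:
      "segmentsConfig": {
        "retentionTimeUnit": "DAYS",
        "retentionTimeValue": "30"
      }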
  • Pedro Silva
    06/01/2021, 10:50 AM
    Hi guys, is there a safeguard when applying ingestion transformations if the input field is null / the default value? I.e., given this transformation:
    {
      "columnName": "dateOfBirthMs",
      "transformFunction": "fromDateTime(dateOfBirth, 'yyyy-MM-dd''T''HH:mm:ss''Z')"
    }
    And schema definitions:
    "dimensionFieldSpecs": [
        ,...,
        {
          "name": "dateOfBirth",
          "dataType": "STRING"
        },...,
    ],
    "dateTimeFieldSpecs": [
        ...,
        {
          "name": "dateOfBirthMs",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        }
      ],
    I get this exception:
    java.lang.IllegalStateException: Caught exception while invoking method: public static long org.apache.pinot.common.function.scalar.DateTimeFunctions.fromDateTime(java.lang.String,java.lang.String) with arguments: [null, yyyy-MM-dd'T'HH:mm:ss'Z]
    I was under the impression that Pinot would not apply the transformation if the input field is null or that the transformation itself would be resilient. Is there any way around this?
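    One workaround is to make the transform itself null-safe; a sketch using a Groovy transform, where the 0L sentinel for missing dates is an assumption rather than a recommendation (java.time.Instant.parse handles ISO-8601 strings like 1997-02-06T00:00:00Z directly):
      {
        "columnName": "dateOfBirthMs",
        "transformFunction": "Groovy({dateOfBirth == null ? 0L : java.time.Instant.parse(dateOfBirth).toEpochMilli()}, dateOfBirth)"
      }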
  • RK
    06/02/2021, 11:26 AM
    Hi all, I am trying to push HDFS data into a hybrid table. I have added the offline table in Pinot and am now trying to push the HDFS file. When I execute the final hadoop jar command, it says pinot-plugins.tar.gz doesn't exist. Someone kindly suggest.
    Error: File file:/home/rah/hybrid/staging/pinot-plugin.tar.gz doesn't exist.
    I am attaching my config file. Here /user/hdfs is my HDFS location and /home/rah is my local location. P.S. for stagingDir and outputDir, if I give an HDFS location then it gives the error "Wrong FS: hdfs://location-of-inputdir/filename.txt, expected: file:///". @Ken Krugler @Elon @Alexander Pucher @Ting Chen @Neha Pawar @Xiang Fu @Mayank Kindly suggest.
  • Machhindra
    06/02/2021, 8:03 PM
    Team, I added a new index and sortedColumn in the table config, which was already ingesting data from a Kafka stream. I used the “AddTable” command to update the index: “jsonIndexColumns”: [“entityMap”], “sortedColumn”: [“metric”]. I performed “Reload All Segments” in the UI. Is there any way to know if the indexing is complete?
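    For reference, those settings live under tableIndexConfig in the table config; a sketch (note that, as far as I know, a sorted column on a realtime table only takes effect on segments committed after the change, since a reload cannot re-sort existing segments):
      "tableIndexConfig": {
        "jsonIndexColumns": ["entityMap"],
        "sortedColumn": ["metric"]
      }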
  • Ken Krugler
    06/02/2021, 9:33 PM
    I’m running into an issue when building segments with 0.7.1 that didn’t occur with 0.6.0, due to (I think) using a Unicode code point for my multiValueDelimiter.
  • Elon
    06/03/2021, 4:26 AM
    We ran into an issue where a thread was blocked, and it caused the entire cluster to stop processing queries due to the worker threadpool (pqw threads in the default scheduler) on one server being blocked. Would it help to make the worker thread pool use a cached threadpool while still keeping the query runner threadpool (pqr threads in the default scheduler) fixed? Or do you recommend using one of the other query schedulers? Here is the thread dump:
  • RK
    06/03/2021, 4:36 PM
    Hi everyone, I am loading data from an HDFS location into a Pinot hybrid table. I have pushed data for 5 days and executed this command 5 times, once for each day's file:
      hadoop jar \
        ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
        org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
        -jobSpecFile /home/rtrs/hybrid/config/final/executionFrameworkSpec.yaml
    In the end, when I do select * from tablename_OFFLINE, I am able to see only the latest data, i.e. the 5th day's data. This is the timestamp column value in my data: "current_ts":"2021-05-30T233431.624000". These are the details for the timestamp column from the schema file:
      "dateTimeFieldSpecs": [
        {
          "name": "current_ts",
          "dataType": "STRING",
          "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HHmmss.SSSSSS",
          "granularity": "1:MILLISECONDS"
        }
      ]
    These are the details from the offline_config.json file:
      "tableType": "OFFLINE",
      "segmentsConfig": {
        "timeColumnName": "current_ts",
        "replication": "1",
        "replicasPerPartition": "1",
    Looks like some timestamp issue. Kindly suggest what I need to change here.
  • RK
    06/04/2021, 8:10 AM
    While loading data from HDFS to a Pinot table I am getting this exception:
      [r-2 apache-pinot-incubating-0.7.1-bin]$ hadoop jar ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand -jobSpecFile /home/rah/executionFrameworkSpec.yaml
      Exception in thread "main" java.io.FileNotFoundException: /tmp/hadoop-unjar7575411926296177023/shaded/com/google/common/collect/ImmutableSetMultimap$EntrySet.class (No space left on device)
          at java.io.FileOutputStream.open0(Native Method)
          at java.io.FileOutputStream.open(FileOutputStream.java:270)
          at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
          at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
          at org.apache.hadoop.util.RunJar.unJar(RunJar.java:110)
          at org.apache.hadoop.util.RunJar.unJar(RunJar.java:85)
          at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
          at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
    Someone kindly suggest.
  • Jonathan Meyer
    06/04/2021, 12:19 PM
    Hello 🙂 I've just changed the topic from which a REALTIME table is consuming, and some new messages are being published on that topic. However, it looks like Pinot isn't consuming them. I can see a segment with "segment.realtime.status": "IN_PROGRESS" / "CONSUMING". Also, I'm not seeing any related logs. On the previous topic, consumption was OK ✔️
  • Jonathan Meyer
    06/04/2021, 4:38 PM
    Hello again. Not a Pinot-only question, but I'm sure most of you have had to deal with this issue, so here I go: given a limited Kafka retention, how do you handle recreating a table with past data that is no longer available in Kafka? Basically, what is the "workflow" you use to repopulate a Pinot table from past data?
  • RK
    06/07/2021, 5:14 AM
    @Shailesh Jha For this use case you can try a hybrid table. It's a combination of a realtime and an offline table, and then you can push the segments into the hybrid table.
  • Kaushik Ranganath
    06/07/2021, 5:19 AM
    Just started exploring Apache Pinot, setting it up on AWS using the documentation here - https://docs.pinot.apache.org/basics/getting-started/public-cloud-examples/aws-quickstart. After creating an EKS cluster, I installed Pinot on top of it using Helm, following this document - https://docs.pinot.apache.org/basics/getting-started/kubernetes-quickstart
  • Josh Highley
    06/07/2021, 3:31 PM
    I have a realtime table consuming messages from a 3-partition Kafka topic. Possibly due to some network issues over the weekend, all 3 consumers are repeating the same error messages about a bad offset:
    2021/06/07 15:24:27.918 INFO [Fetcher] [agent_daily__2__2__20210605T0819Z] [Consumer clientId=consumer-71, groupId=] Fetch offset 22 is out of range for partition agent_daily-2, resetting offset
    2021/06/07 15:24:27.919 INFO [Fetcher] [agent_daily__2__2__20210605T0819Z] [Consumer clientId=consumer-71, groupId=] Resetting offset for partition agent_daily-2 to offset 5.
    2021/06/07 15:24:27.938 INFO [Fetcher] [agent_daily__1__2__20210605T0819Z] [Consumer clientId=consumer-73, groupId=] Fetch offset 20 is out of range for partition agent_daily-1, resetting offset
    2021/06/07 15:24:27.939 INFO [Fetcher] [agent_daily__1__2__20210605T0819Z] [Consumer clientId=consumer-73, groupId=] Resetting offset for partition agent_daily-1 to offset 0.
    2021/06/07 15:24:27.942 INFO [Fetcher] [agent_daily__0__2__20210605T0819Z] [Consumer clientId=consumer-72, groupId=] Fetch offset 24 is out of range for partition agent_daily-0, resetting offset
    2021/06/07 15:24:27.943 INFO [Fetcher] [agent_daily__0__2__20210605T0819Z] [Consumer clientId=consumer-72, groupId=] Resetting offset for partition agent_daily-0 to offset 1.
    
    2021/06/07 15:24:33.018 INFO [Fetcher] [agent_daily__2__2__20210605T0819Z] [Consumer clientId=consumer-71, groupId=] Fetch offset 22 is out of range for partition agent_daily-2, resetting offset
    2021/06/07 15:24:33.018 INFO [Fetcher] [agent_daily__2__2__20210605T0819Z] [Consumer clientId=consumer-71, groupId=] Resetting offset for partition agent_daily-2 to offset 5.
    2021/06/07 15:24:33.038 INFO [Fetcher] [agent_daily__1__2__20210605T0819Z] [Consumer clientId=consumer-73, groupId=] Fetch offset 20 is out of range for partition agent_daily-1, resetting offset
    2021/06/07 15:24:33.039 INFO [Fetcher] [agent_daily__1__2__20210605T0819Z] [Consumer clientId=consumer-73, groupId=] Resetting offset for partition agent_daily-1 to offset 0.
    2021/06/07 15:24:33.042 INFO [Fetcher] [agent_daily__0__2__20210605T0819Z] [Consumer clientId=consumer-72, groupId=] Fetch offset 24 is out of range for partition agent_daily-0, resetting offset
    2021/06/07 15:24:33.043 INFO [Fetcher] [agent_daily__0__2__20210605T0819Z] [Consumer clientId=consumer-72, groupId=] Resetting offset for partition agent_daily-0 to offset 1.
    The 'reset' offsets of 5, 0, and 1 are correct: I created a new 'test' table for the same topic and it used those offsets with no issue. I've tried disabling/enabling the table but it resumes those error messages. Is there some other way to reset the table consumers?
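    For context, each consuming segment records its start offset in its ZK metadata, so disabling/enabling the table does not move it. The table-level default for where a new consumer starts is the streamConfigs entry sketched below (assuming the kafka-2.0 consumer plugin); resetting the already-created consuming segments may additionally require deleting and recreating them:
      "streamConfigs": {
        "stream.kafka.consumer.prop.auto.offset.reset": "smallest"
      }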
  • Sadim Nadeem
    06/07/2021, 7:53 PM
    Hi, are there any docs/videos on pushing the tar backup on GCS to a hybrid/offline table?
  • RK
    06/08/2021, 6:39 AM
    If I do a select from any other table, i.e. Hive or Druid, I am able to select data, but in the case of Pinot it's throwing this error. There is some problem with the Presto-Pinot connector. Kindly suggest if there is any way to select data directly from Pinot without the Presto connector, or any way to increase this limit. @Xiang Fu @Jackie
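    One way to bypass the connector entirely is to query the Pinot broker's REST endpoint directly, e.g. a POST to http://<broker>:8099/query/sql (default broker port) with a body like the sketch below, using a hypothetical table name:
      {"sql": "SELECT * FROM mytable LIMIT 10"}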