# troubleshooting
  • shivam sood

    12/16/2021, 11:25 AM
    I am using Pinot's Java client and getting my data in a ResultSet. For a field of type STRING, I can call resultSet.getString(rowIndex, columnIndex), and the same works for INT, LONG, DOUBLE, and FLOAT. But I have a field in the table with a LONG_ARRAY type. How can I retrieve collection-typed data from my ResultSet?
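    No answer landed in this thread. One hedged workaround (an assumption on my part, not something the thread confirms) is that the broker's REST /query/sql endpoint returns multi-value columns such as LONG_ARRAY as JSON arrays inside the resultTable rows, which are easy to consume from any language. A minimal Python sketch with a made-up response fragment:

    ```python
    import json

    # Hypothetical fragment of a Pinot broker /query/sql response; multi-value
    # columns appear as JSON arrays inside resultTable.rows.
    response = '{"resultTable": {"rows": [["user1", [1624000000, 1624000001]]]}}'

    rows = json.loads(response)["resultTable"]["rows"]
    long_array = rows[0][1]  # the LONG_ARRAY cell is already a Python list
    print(long_array)        # [1624000000, 1624000001]
    ```

    If you must stay on the Java client, inspecting what resultSet.getString returns for the multi-value column is a reasonable first step.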
  • Mahesh babu

    12/17/2021, 10:08 AM
    Hi Team, my source date format is "2020-07-13 000001" and I tried using 1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HHmmss, i.e.
    Copy code
    "dateTimeFieldSpecs": [
      {
        "name": "Date_Time",
        "dataType": "STRING",
        "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
        "granularity": "1:HOURS"
      }
    ]
    But the data in the table shows null.
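    Worth noting: the pasted config uses HH:mm:ss, but the raw value "2020-07-13 000001" has no colons in its time part, so it cannot match that pattern — a plausible reason for the nulls. A quick Python sketch using strptime equivalents of the Java patterns (an illustration only, not Pinot's parsing code):

    ```python
    from datetime import datetime

    raw = "2020-07-13 000001"  # source value from the message above

    # Java pattern yyyy-MM-dd HH:mm:ss corresponds to %Y-%m-%d %H:%M:%S.
    try:
        datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        matched = True
    except ValueError:
        matched = False
    print(matched)  # False: the time part has no colons, so parsing fails

    # A colon-free pattern (Java: yyyy-MM-dd HHmmss) does parse the raw value.
    parsed = datetime.strptime(raw, "%Y-%m-%d %H%M%S")
    print(parsed)   # 2020-07-13 00:00:01
    ```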
  • Priyank Bagrecha

    12/17/2021, 6:28 PM
    I am seeing
    Copy code
    Dec 17, 2021 5:50:05 PM org.jvnet.mimepull.WeakDataFile close
    INFO: File /tmp/MIME4374416604904663586.tmp was not deleted
    in the controller logs. At the same time I am seeing connection timeouts from servers to the controller. I just restarted the controller to see if that fixes the issue.
  • Priyank Bagrecha

    12/17/2021, 6:33 PM
    controller restart didn't stop connection timeouts from server instances.
  • Ayush Kumar Jha

    12/21/2021, 5:11 AM
    Any help??
  • Priyank Bagrecha

    12/28/2021, 10:56 PM
    table_config.json
  • Syed Akram

    01/03/2022, 8:49 AM
    @Mayank @Xiang Fu @Neha Pawar
  • Valentin Mahrwald

    01/05/2022, 12:43 PM
    Hi, I have a question on the intersection of Pinot & Trino, particularly the interaction of query options (such as skipUpsert) with the case-insensitive cluster setting (enable.case.insensitive). From what I can see, enabling case insensitivity works for tables and columns but not for query options (and that also appears to be the case in the code). This causes a problem in the interaction with Trino: as far as I can tell, I would use a pass-through query in Trino. However, that pass-through query gets converted to lowercase (Trino behavior ...), so Pinot ends up receiving skipupsert=true, which doesn't seem to do anything 🙂 The ultimate use case is to query a historic snapshot on an upsert table (something like
    select positionid, LASTWITHTIME(quantity, asoftime, 'DOUBLE') from position where asoftime < "some snapshot time" group by positionid
    , which would then be further enriched in Trino). Would it be reasonable to expect the enable.case.insensitive setting to also apply to query options?
  • xtrntr

    01/06/2022, 9:59 AM
    in_subquery query fails:
    Copy code
    SELECT userid FROM table WHERE IN_SUBQUERY(cell_id, 'SELECT ID_SET(id) FROM dimTable WHERE location IN (...)')
    Copy code
    ProcessingException(errorCode:150, message:PQLParsingError:
    org.apache.pinot.sql.parsers.SqlCompilationException: Unsupported filter kind: IN_SUBQUERY
    	at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.updateComparisonPredicate(PredicateComparisonRewriter.java:58)
    	at org.apache.pinot.sql.parsers.rewriter.PredicateComparisonRewriter.rewrite(PredicateComparisonRewriter.java:37)
    	at org.apache.pinot.sql.parsers.CalciteSqlParser.queryRewrite(CalciteSqlParser.java:373)
    	at org.apache.pinot.sql.parsers.CalciteSqlParser.compileCalciteSqlToPinotQuery(CalciteSqlParser.java:367))
    Am I using it wrong?
  • Mark Needham

    01/06/2022, 11:04 AM
    https://docs.pinot.apache.org/users/user-guide-query/filtering-with-idset
  • Mark Needham

    01/06/2022, 11:05 AM
    There are some examples here. Yours looks pretty similar.
  • Mark Needham

    01/06/2022, 11:06 AM
    Do you compare the result with 1 or 0?
  • xtrntr

    01/06/2022, 11:22 AM
    ah yes, you're right — I missed the
    = 1
    at the end. Thanks for the catch.
  • Sadim Nadeem

    01/07/2022, 5:42 AM
    Hi, I am seeing the below error in the SQL query editor on the Pinot UI when I try to run a simple select query with limit 10 on my old realtime tables. They were working fine for the last month, then this error suddenly appeared. I also saw that one of the pinot-server pods automatically restarted 8 hours ago; I tried restarting all server pods, but the same error still comes up. cc: @Xiang Fu @Mayank I also checked the disk usage of the pinot-zookeeper pods and they have enough available space, with disk usage below 20%.
    Copy code
    [
      {
        "message": "8 segments [test_alerts__13__79__20220106T0532Z, test_alerts__43__77__20220106T0528Z, test_alerts__16__79__20220106T0533Z, test_alerts__1__78__20220106T0538Z, test_alerts__19__79__20220106T0535Z, test_alerts__4__81__20220106T0539Z, test_alerts__10__80__20220106T0540Z, test_alerts__37__78__20220106T0539Z] unavailable",
        "errorCode": 305
      }
    ]
  • Sadim Nadeem

    01/09/2022, 11:25 AM
    Getting this error when creating a schema. (We updated to the latest Pinot Helm chart version after experiencing a crash because the hard disk filled up recently; the Pinot deployment from the earlier Helm chart version was able to add the schema and table with the script below.) cc: @Shailesh Jha please add the version here. The schema is:
    Copy code
    {
      "schemaName": "audit_log_schema",
      "dimensionFieldSpecs": [
        {
          "name": "TenantID",
          "dataType": "STRING"
        },
        {
          "name": "Request",
          "dataType": "STRING"
        },
        {
          "name": "Response",
          "dataType": "STRING"
        },
        {
          "name": "APIEndpoint",
          "dataType": "STRING"
        }
      ],
      "dateTimeFieldSpecs": [
        {
          "name": "TimestampInEpoch",
          "dataType": "LONG",
          "format": "1:MILLISECONDS:EPOCH",
          "granularity": "1:MILLISECONDS"
        },
        {
          "name": "Timestamp",
          "dataType": "STRING",
          "format": "1:HOURS:SIMPLE_DATE_FORMAT:dd/MM/yyyy HH:mm:ss",
          "granularity": "1:SECONDS"
        }
      ]
    }
    The error response from the Swagger REST API endpoint for adding a schema is:
    Copy code
    "error": "Cannot add invalid schema: audit_log_schema. Reason: SIMPLE_DATE_FORMAT pattern dd/MM/yyyy HH:mm:ss has to be sorted by both lexicographical and datetime order"
  • Sadim Nadeem

    01/09/2022, 11:36 AM
    Note: if I use "dateTimeFieldSpecs" as below, then the schema is created successfully.
    Copy code
    "dateTimeFieldSpecs": [
        {
          "name": "time",
          "dataType": "STRING",
          "format": "1:HOURS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'",
          "granularity": "1:SECONDS"
        }
      ]
    Even this one below fails, with the following error (Error: Bad Request):
    Copy code
    {
      "code": 400,
      "error": "Cannot add invalid schema: cilium_alerts_test2. Reason: SIMPLE_DATE_FORMAT pattern dd/MM/yyyy'T'HH:mm:ss'Z' has to be sorted by both lexicographical and datetime order"
    }
    The format that produces the error:
    Copy code
    {
                "name": "time",
                "dataType": "STRING",
                "format": "1:HOURS:SIMPLE_DATE_FORMAT:dd/MM/yyyy'T'HH:mm:ss'Z'",
                "granularity": "1:SECONDS"
            }
    This format below also works successfully in the schema. cc: @Xiang Fu @Mayank
    Copy code
    "dateTimeFieldSpecs": [{
            "name": "timestamp",
            "dataType": "LONG",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS"
        }]
  • Sadim Nadeem

    01/09/2022, 4:27 PM
    @Mayank I don't think 1:HOURS is the root cause, since the format below is working fine, i.e.
    Copy code
    "dateTimeFieldSpecs": [
        {
          "name": "time",
          "dataType": "STRING",
          "format": "1:HOURS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HH:mm:ss'Z'",
          "granularity": "1:SECONDS"
        }
      ]
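    The validation error above says the SIMPLE_DATE_FORMAT pattern must sort the same way lexicographically as it does chronologically. A small Python sketch (made-up dates, illustration only) shows why yyyy-MM-dd'T'HH:mm:ss passes this check while dd/MM/yyyy does not:

    ```python
    # Pinot requires patterns whose plain string order matches time order.
    # yyyy-MM-dd puts the most significant field first, so string sorting
    # equals chronological sorting; dd/MM/yyyy puts the day first, so it doesn't.
    iso = ["2021-01-09T00:00:00Z", "2021-02-01T00:00:00Z", "2021-12-31T00:00:00Z"]
    dmy = ["09/01/2021", "01/02/2021", "31/12/2021"]  # same dates, day-first

    print(sorted(iso) == iso)  # True: lexicographic order == datetime order
    print(sorted(dmy) == dmy)  # False: "01/02/2021" sorts before "09/01/2021"
    ```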
  • xtrntr

    01/10/2022, 12:57 PM
    The reason I ask is that after sorting my parquet files globally (and confirming it by checking the parquet metadata), I don't see
    $MY_SORTED_COLUMN.isSorted = true
    in the segment
    metadata.properties
    file, even though https://docs.pinot.apache.org/configuration-reference/table says:
    The column which is sorted in the data and hence will have a sorted index. This does not need to be specified for the offline table, as the segment generation job will automatically detect the sorted column in the data and create a sorted index for it.
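    For reference, the automatic detection the docs describe amounts to checking that the column's values are non-decreasing within the segment's input data. A tiny illustrative sketch (hypothetical helper, not Pinot's actual implementation):

    ```python
    def is_column_sorted(values):
        """Return True if values are non-decreasing — the property the
        segment generation job checks before flagging a column as sorted."""
        return all(a <= b for a, b in zip(values, values[1:]))

    print(is_column_sorted([1, 2, 2, 5]))  # True
    print(is_column_sorted([1, 3, 2]))     # False
    ```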
  • Kamal Chavda

    01/12/2022, 9:49 PM
    Hi All, I'm trying to hit my Pinot cluster from Superset and I'm getting this error on the broker:
    Copy code
    2022/01/12 21:41:23.242 ERROR [PinotClientRequest] [jersey-server-managed-async-executor-7] Caught exception while processing POST request
    java.lang.NullPointerException: null
    	at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleSQLRequest(BaseBrokerRequestHandler.java:243) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:194) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.apache.pinot.broker.requesthandler.BaseBrokerRequestHandler.handleRequest(BaseBrokerRequestHandler.java:99) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.apache.pinot.broker.api.resources.PinotClientRequest.processSqlQueryPost(PinotClientRequest.java:175) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at jdk.internal.reflect.GeneratedMethodAccessor30.invoke(Unknown Source) ~[?:?]
    	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    	at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
    	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.model.ResourceMethodInvoker.lambda$apply$0(ResourceMethodInvoker.java:381) ~[pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2$1.run(ServerRuntime.java:819) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at org.glassfish.jersey.server.ServerRuntime$AsyncResponder$2.run(ServerRuntime.java:814) [pinot-all-0.8.0-jar-with-dependencies.jar:0.8.0-c4ceff06d21fc1c1b88469a8dbae742a4b609808]
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    	at java.lang.Thread.run(Thread.java:829) [?:?]
    Anyone come across this?
  • xtrntr

    01/13/2022, 4:37 AM
    setting these 3 settings:
    Copy code
    pinot.server.query.executor.min.segment.group.trim.size=-1
    pinot.server.query.executor.min.server.group.trim.size=-1
    pinot.server.query.executor.groupby.trim.threshold=-1
    doesn't seem to affect my group-by query limit; I'm still getting limited to 2M rows (2 servers). Postfixing my query with
    Copy code
    OPTION(minServerGroupTrimSize=-1)
    also doesn’t work
  • xtrntr

    01/13/2022, 4:38 AM
    I also see
    groupLimitReached=false
    in my queries. This is for a group-by query on a column with a cardinality of 8.5M.
  • xtrntr

    01/13/2022, 4:39 AM
    can confirm my server is running on
    0.9.1
  • Mayank

    01/13/2022, 4:40 AM
    Trim is for trimming the groups not necessarily for limiting the size
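    Mayank's distinction can be illustrated with a small sketch: trimming caps how many groups each server keeps (by aggregate value) before merging, which is separate from the query's final LIMIT. Illustration only — the helper trim_groups is hypothetical, not Pinot's actual code:

    ```python
    def trim_groups(groups, trim_size):
        """Keep only the top `trim_size` groups by aggregate value;
        a negative trim_size disables trimming, as in the configs above."""
        if trim_size < 0:
            return dict(groups)
        top = sorted(groups.items(), key=lambda kv: kv[1], reverse=True)[:trim_size]
        return dict(top)

    groups = {"a": 10, "b": 7, "c": 3, "d": 1}
    print(trim_groups(groups, 2))   # {'a': 10, 'b': 7}
    print(trim_groups(groups, -1))  # all 4 groups kept
    ```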
  • Aditya

    01/13/2022, 5:01 PM
    Hi, I am setting up a test Pinot cluster. I'm setting up the controller host using the following config (the IP will be changed to a proper URI in future):
    Copy code
    controller.host=10.11.5.105
    controller.port-9000
    controller.access.protocols=http
    controller.access.protocols.http.port=9000
    controller.zk.str=10.11.5.105:2181
    
    controller.helix.cluster.name=defaultpinot
    
    controller.data.dir=/tmp/pinot/data/controller
    The controller right now uses the local fs for storing segments. After uploading a segment using a SegmentCreationAndTarPush job, the segment is in a bad state: the port in the download url is null.
    "segment.download.url": "http://10.11.5.105:null/segments/transcript/transcript_OFFLINE_1570863600000_1572418800700_0"
    What config parameter could be missing here? I also tried assigning the host name via the command param -controllerHost; with this option the host name is correctly assigned and the segment url has the port number. I'll be setting up an s3 deep store, so using the config file may be the only option in future.
  • Mark Needham

    01/13/2022, 5:42 PM
    is your config copy/pasted correctly? Because you don't have an
    =
    next to
    controller.port
    , there's a hyphen there instead
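    A typo like this (a hyphen where = should be) is easy to catch mechanically before starting the controller. A small, hypothetical validation sketch for .properties-style configs:

    ```python
    def find_malformed_lines(config_text):
        """Return (line_number, line) pairs for non-empty, non-comment lines
        that lack the key=value separator a .properties file expects."""
        bad = []
        for i, line in enumerate(config_text.splitlines(), start=1):
            stripped = line.strip()
            if stripped and not stripped.startswith("#") and "=" not in stripped:
                bad.append((i, stripped))
        return bad

    config = """controller.host=10.11.5.105
    controller.port-9000
    controller.access.protocols=http"""

    print(find_malformed_lines(config))  # [(2, 'controller.port-9000')]
    ```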
  • abhinav wagle

    01/14/2022, 2:53 AM
    Hello, I'm trying to run a batch ingestion job using Spark on one of our clusters. I followed https://docs.pinot.apache.org/users/tutorials/batch-data-ingestion-in-practice and am running into this. Any pointers? I checked, and
    LaunchDataIngestionJobCommand
    does not have a main method.
  • Syed Akram

    01/20/2022, 10:40 AM
    When I run a query using the java-client (0.9.2), I'm facing this issue...