# Troubleshooting
b
hello everyone, I found an error with the transformFunction jsonPathString. I cannot use the word order in jsonPathString --> "transformFunction": "jsonPathString(order,'$.channel')" --> this does not work. As a test I modified the JSON, replacing order with hello, and used this --> "transformFunction": "jsonPathString(hello,'$.channel')" --> it works. Why can I not use "order"? My real JSON messages use "order". Please help.
Invalid transform function 'jsonPathString(order,'$.channel')' for column 'channel' exception: Invalid transform function 'jsonPathString(order,'$.channel')' for column 'channel' Handled request from 172.23.188.107 POST http://172.19.131.116:9000/tables, content-type application/json status code 400 Bad Request
m
without looking into it any further yet, it might be b/c 'order' is a reserved word in SQL and this is likely going through a SQL parser... but maybe you can quote the word order and see if that works:
"transformFunction": "jsonPathString(\"order\",'$.channel')"
b
oh, it works! thank you very much @User
@User "submissionDate" : "2022-03-15T173144.540+0700" how can i use date format with this? I try to use this but not work "dateTimeFieldSpecs": [ { "name": "submissionDate", "dataType": "TIMESTAMP", "format": "1MILLISECONDSSIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HHmmss.SSSZ", "granularity": "1:MILLISECONDS" } ]
m
you'll need to use a transformation function for that as well. You can see an example here - https://dev.startree.ai/docs/pinot/recipes/datetime-string-to-timestamp#add-schema-and-table
b
thank you.
m
you'll have to use a different column name than what is in the source data btw
or you'll get an error
b
on my JSON the data type is a string: "submissionDate" : "2022-03-15T173144.540+0700". I tested creating the table; it can be created but I do not see any data. The schema structure that I tested is:
{
  "schemaName": "omx_order_20",
  "dimensionFieldSpecs": [
    { "name": "channel", "dataType": "STRING" },
    { "name": "orderId", "dataType": "STRING" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "submissionDate",
      "dataType": "STRING",
      "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HHmmss.SSSZ",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
I will check your link and test it. Thank you for the answer about the reserved word.
m
oh, as a string it should work. But if it's a string I don't think you'd be able to do proper date operations (in case you wanted to do that)
b
normally for a real-time table, do we use the date format from the system or from inside the JSON message?
m
unless you have a datetime column in your schema/table there won't be one, so you do need to specify the date yourself (e.g. via the JSON message)
b
I tried to create the table. It can be created, but no data is found.
{
  "tableName": "omx_order_20",
  "tableType": "REALTIME",
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant",
    "tagOverrideConfig": {}
  },
  "segmentsConfig": {
    "schemaName": "omx_order_20",
    "timeColumnName": "submissionDate",
    "replication": "1",
    "replicasPerPartition": "1",
    "retentionTimeUnit": null,
    "retentionTimeValue": null,
    "completionConfig": null,
    "crypterClassName": null,
    "peerSegmentDownloadScheme": null
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [],
    "createInvertedIndexDuringSegmentGeneration": false,
    "rangeIndexColumns": [],
    "sortedColumn": [],
    "bloomFilterColumns": [],
    "bloomFilterConfigs": null,
    "noDictionaryColumns": [],
    "onHeapDictionaryColumns": [],
    "varLengthDictionaryColumns": [],
    "enableDefaultStarTree": false,
    "starTreeIndexConfigs": null,
    "enableDynamicStarTreeCreation": false,
    "segmentPartitionConfig": null,
    "columnMinMaxValueGeneratorMode": null,
    "aggregateMetrics": false,
    "nullHandlingEnabled": false,
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "omx_order20",
      "stream.kafka.broker.list": "172.19.131.55:9092",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "24h",
      "realtime.segment.flush.segment.size": "100M"
    }
  },
  "metadata": {},
  "ingestionConfig": {
    "filterConfig": null,
    "transformConfigs": [
      { "columnName": "channel", "transformFunction": "jsonPathString(\"order\",'$.channel')" },
      { "columnName": "orderId", "transformFunction": "jsonPathString(\"order\",'$.orderId')" },
      { "columnName": "submissionDate", "transformFunction": "FromDateTime('$.submissionDate','YYYY-MM-dd''T''HHmmss.SSSZ')" }
    ]
  },
  "quota": { "storage": null, "maxQueriesPerSecond": null },
  "task": null,
  "routing": { "segmentPrunerTypes": null, "instanceSelectorType": null },
  "query": { "timeoutMs": null },
  "fieldConfigList": null,
  "upsertConfig": null,
  "tierConfigs": null
}
i use "columnName": "submissionDate", "transformFunction": "FromDateTime('$.submissionDate','YYYY-MM-dd''T''HHmmss.SSSZ')"
my json like this --> {"order" : { "channel" : "ABC", "orderId" : "22031500DRS020020017"}, "submissionDate" : "2022-03-15T173144.540+0700"}
any recommend for Datetime format converse. @User
m
You can't use the JSON path syntax inside
FromDateTime
- it doesn't know what it means
b
oh
m
also I think you can only run one function at a time
so you'll need to do it in two steps
b
I just want to create a table that consumes from Kafka. I tested with EPOCH in the JSON and it doesn't need anything extra. But for this --> "submissionDate" : "2022-03-15T173144.540+0700" do we need to convert it?
m
with epoch you don't need any time conversion
it'll handle that automatically
it's only cause of it being a date string that you have to do some conversion
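(A rough sketch of the two message shapes being compared here; the field values are made up for illustration.) With an epoch-millis value Pinot can use the time column as-is, while the date-string variant needs the FromDateTime transform shown below:

```json
{"order": {"channel": "ABC", "orderId": "22031500DRS020020017"}, "submissionDate": 1647340304540}
{"order": {"channel": "ABC", "orderId": "22031500DRS020020017"}, "submissionDate": "2022-03-15T173144.540+0700"}
```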
b
how can I do it, bro?
m
{
  "columnName": "submissionTs",
  "transformFunction": "FromDateTime(submissionDate,'YYYY-MM-dd''T''HH:mm:ss.SSSZ')"
}
rename the column to something else
and then like this should work
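(Sketch, not from the original thread: the matching schema field for the renamed column, assuming FromDateTime stores its result as epoch milliseconds.) Something along these lines should pair with the transform above, and segmentsConfig.timeColumnName would then also need to point at submissionTs instead of submissionDate:

```json
"dateTimeFieldSpecs": [
  {
    "name": "submissionTs",
    "dataType": "TIMESTAMP",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }
]
```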
b
ok, I'll modify the column name to another name.
m
yeh - if you use a transform fn it can't transform a value to the same name
it's a bit annoying, but that's how it is!
b
oh, it works.
I have a question about the JSON message.
if my JSON has many objects: {"order": {"channel": "SFF","orderId": "22031500DRS020020016"},"customer": "570809","omxtrackingID": "99-d173c048-8bf2-4261-a440-36d1045c63e2","submissionDate": "2022-03-15T173144.540+0700"}
but I only need the columns channel, orderId and submissionDate. Can I specify and pick just the objects that I want to use, or do I need to create a column for everything in that JSON? @User
you are an expert in Pinot, bro. Thank you so much for your answers, they help me a lot @User. I have just begun a POC of Pinot and plan to use it with Trino for a real-time data platform.
m
you only need to create the columns that you need - don't need to create columns for every JSON property
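(Illustrative sketch only, reusing the earlier example and the submissionTs rename; nothing here is from the thread verbatim.) The schema would declare just the three columns of interest, populated by the transforms shown earlier, while customer and omxtrackingID are simply never mapped:

```json
{
  "schemaName": "omx_order_20",
  "dimensionFieldSpecs": [
    { "name": "channel", "dataType": "STRING" },
    { "name": "orderId", "dataType": "STRING" }
  ],
  "dateTimeFieldSpecs": [
    { "name": "submissionTs", "dataType": "TIMESTAMP", "format": "1:MILLISECONDS:EPOCH", "granularity": "1:MILLISECONDS" }
  ]
}
```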
b
Thank you bro, you've helped me a lot. Tomorrow I will pick some columns to test the POC with Trino.
Do you know how to see more logs? I start Pinot via systemctl and its log is too sparse to help with anything. I want to see more logs from Pinot. I added this to the systemd unit --> Environment="JAVA_OPTS=-Xms6G -Xmx8G -Dlog4j2.logLevel=DEBUG" but the logs are still sparse...
m
logs are under
logs/pinot-all.log
on each component
will be much more in there
b
thank you bro.
now I can create a real-time table consuming a real topic on my production Kafka.
but I don't see any data stored on my Pinot server. I need to plan storage space to keep the Pinot data. Where does Pinot store the data @User? I don't see any data on the pinot-controller (NFS share) and no data stored on the Pinot server. How does Pinot keep data for a real-time table?
m
it stores it under
data.dir
which will be a tmp dir unless you explicitly set it
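(A minimal sketch of making those directories explicit; the paths are placeholders, not from the thread.) The relevant properties live in the controller and server config files:

```properties
# controller.conf
controller.data.dir=/data/apache-pinot/controller-data

# server.conf
pinot.server.instance.dataDir=/data/apache-pinot/server-data
pinot.server.instance.segmentTarDir=/data/apache-pinot/server-segment-tars
```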
b
how can I use a separate Kerberos in the real-time table config? Do you have an example for configuring a separate KDC? This is my stream config. Right now I can only connect to 1 KDC, the default one in /etc/krb5.conf, but we have multiple Kafka clusters and KDCs that we need to connect to...
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "PROD-json-MB-Postpaid-Sales-Online",
  "stream.kafka.broker.list": "tykbpr01:9092,tykbpr02:9092,tykbpr01:9092",
  "stream.kafka.consumer.type": "lowlevel",
  "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
  "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
  "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
  "realtime.segment.flush.threshold.rows": "0",
  "realtime.segment.flush.threshold.time": "1h",
  "realtime.segment.flush.segment.size": "100m",
  "stream.kafka.consumer.group.id": "rdp_lookup",
  "security.protocol": "SASL_SSL",
  "sasl.mechanism": "GSSAPI",
  "sasl.kerberos.service.name": "kafka",
  "ssl.truststore.location": "/data/apache-pinot/keytab/tykbpr.client.truststore.jks",
  "ssl.truststore.password": "xxxx",
  "sasl.jaas.config": "com.sun.security.auth.module.Krb5LoginModule required useTicketCache=true useKeyTab=true storeKey=true keyTab=\"/data/apache-pinot/keytab/U-SVC-RDP.keytab\" principal=\"U-SVC-RDP@TRUE.TH\" doNotPrompt=false;"
} @User
it stores it under data.dir, which will be a tmp dir unless you explicitly set it --> does it only flush data after the threshold? I consumed 180000 offsets and still don't see data being written on the server. "realtime.segment.flush.threshold.rows": "0", "realtime.segment.flush.threshold.time": "24h", "realtime.segment.flush.segment.size": "100m" @User
m
https://docs.pinot.apache.org/basics/components/deep-store take a look at this and then the S3 example as well
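(For reference, a sketch of the controller settings the S3 example in those docs boils down to; the bucket, paths and region below are placeholders, so check the linked pages for the exact, current property names.)

```properties
controller.data.dir=s3://your-bucket/pinot-data/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=ap-southeast-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```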
b
thank you.
hi, it's me again bro @User. How can I have multiple Kerberos servers (multiple Kafka clusters with separate KDCs)? How can I specify the krb5.conf in the real-time table config? Right now when I try to configure a real-time table with a different KDC, only the default realm works...