Matt

03/19/2021, 1:55 AM
Hello, I have JSON data in a column called log and want to extract values based on keys (urlpath). I tried a JSON index, but it fails during parsing. So I ingested the column as a normal string and tried JSONEXTRACTSCALAR/*json_extract_scalar*, but this also fails during parsing. Finally, I ended up using a Groovy function like this:
GROOVY('{"returnType": "STRING", "isSingleValue": true}', 'java.util.regex.Pattern p = java.util.regex.Pattern.compile("(\"urlpath\":\")((?:\\\"|[^\\\"]*))"); java.util.regex.Matcher m = p.matcher(arg0); if(m.find()){ return m.group(2); } else { return "";}',log)
This works in SQL. Now I want to add this Groovy function to the table config as an ingestion transform that defines a new column (columnName). Is this possible? For ingestion transforms, can we use a multi-line, semicolon-separated script?
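For comparison, here is the same regex-extraction idea as plain Java, since the Groovy script above calls java.util.regex directly. This is a minimal sketch, not Pinot code: the class name UrlPathExtractor is hypothetical, and the pattern is a swapped-in variant that also tolerates whitespace around the colon and escaped characters inside the value. Like the Groovy version, it returns "" when nothing matches, and it returns the value as it appears in the raw text (JSON escapes are not decoded).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlPathExtractor {
    // "urlpath", optional whitespace, colon, optional whitespace, then a quoted value.
    // Inside the value, accept either an escaped character (backslash + any char)
    // or any character that is not a quote or a backslash.
    private static final Pattern URLPATH =
            Pattern.compile("\"urlpath\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"");

    // Returns the first urlpath value found, or "" when there is none
    // (same fallback as the else branch of the Groovy script).
    static String extract(String log) {
        Matcher m = URLPATH.matcher(log);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String log = "{\"level\":\"info\",\"urlpath\":\"/operations/checksettings/v1/simplefireconsumer-chs/healthcheck\"}";
        System.out.println(extract(log));
        System.out.println(extract("plain text, not JSON").isEmpty());
    }
}
```

Because the fallback is an empty string rather than an exception, rows that are not JSON at all simply produce an empty urlpath.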

Neha Pawar

03/19/2021, 2:09 AM
Yes, the same Groovy function will work during ingestion. There are some subtle differences in the syntax, which the docs should help with.
But before that, what exactly is the problem when using the JSON index/jsonExtractScalar? Perhaps someone can help with those, and we can fix anything that needs fixing.

Matt

03/19/2021, 2:56 AM
I was getting two errors
server java.lang.IndexOutOfBoundsException: Index: 6279, Size: 1
at java.util.ArrayList.get(ArrayList.java:435) ~[?:1.8.0_282]
shaded.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('/' (code 47)): maybe a (non-standard) comment? (not recognized as one since Feature 'ALLOW_COMMENTS' not enabled for parser)
The JSON validates fine in JSONLint; however, it seems not to be compatible with the parser. Also, a few records are not JSON at all. I'm wondering whether it's possible to have an option to select a dirty XML parser to handle these situations.

Kishore G

03/19/2021, 4:34 AM
Do you have a sample JSON?

Matt

03/19/2021, 4:45 AM
{
  "message": "\"Executing query =: \"",
  "messageobj": "{\"statement\":\"select sysdate from dual\",\"binds\":{},\"opts\":{\"outFormat\":4002,\"autoCommit\":true}}",
  "executionContext": "{\"system\":\"test\",\"subsystem\":\"operations\",\"capability\":\"checksettings\",\"resource\":\"simplefireconsumer-chs\",\"transactionid\":\"10ba4d0d-0a63-4a40-be70-92c2a7837e3c\",\"username\":\"--REDACTED--\",\"urlpath\":\"/operations/checksettings/v1/simplefireconsumer-chs/healthcheck\",\"requestheaders\":{\"host\":\"145.72.134.21:3000\",\"user-agent\":\"kube-probe/1.16+\",\"accept-encoding\":\"gzip\",\"connection\":\"close\",\"messageauditid\":\"23cf3b61-acc7-42a7-8fac-2f22506f4652\"}}",
  "transactionid": "10ba4d0d-0a63-4a40-be70-92c2a7837e3c",
  "correlationid": "",
  "sessionid": "",
  "sender": "",
  "system": "test",
  "subsystem": "operations",
  "capability": "checksettings",
  "resource": "simplefireconsumer-chs",
  "urlpath": "/operations/checksettings/v1/simplefireconsumer-chs/healthcheck",
  "testloggerversion": "2.0",
  "level": "info",
  "timestamp": "2021-03-19T04:39:43.822Z"
}
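In this sample, executionContext (and messageobj) are themselves JSON documents stored as escaped strings, so the urlpath inside executionContext shows up as \"urlpath\":\"...\" in the raw text and only becomes an ordinary key after one more round of string decoding. The sketch below, using a hypothetical NestedJsonField class and no JSON library, reads a string field, decodes its JSON escapes, and then reads a key from the decoded text. It assumes well-formed input and is regex-based, so it only handles flat string values, not arbitrary nesting; a real JSON parser would be the more robust choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedJsonField {
    // Returns the raw (still escaped) contents of the first "key": "..." string value, or null.
    static String rawStringField(String json, String key) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:\\\\.|[^\"\\\\])*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Decodes JSON string escapes: quote, backslash, slash, b, f, n, r, t,
    // and unicode escapes (backslash-u plus 4 hex digits). Assumes well-formed input.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) {
                sb.append(c);
                continue;
            }
            char e = s.charAt(++i);
            switch (e) {
                case 'b': sb.append('\b'); break;
                case 'f': sb.append('\f'); break;
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case 't': sb.append('\t'); break;
                case 'u':
                    sb.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default: sb.append(e); // quote, backslash and slash map to themselves
            }
        }
        return sb.toString();
    }

    // Reads outerKey (a string holding JSON text), decodes it, then reads innerKey from it.
    static String nestedStringField(String json, String outerKey, String innerKey) {
        String raw = rawStringField(json, outerKey);
        if (raw == null) return null;
        String inner = rawStringField(unescape(raw), innerKey);
        return inner == null ? null : unescape(inner);
    }

    public static void main(String[] args) {
        String log = "{\"executionContext\":\"{\\\"system\\\":\\\"test\\\",\\\"urlpath\\\":\\\"/operations/checksettings/v1/simplefireconsumer-chs/healthcheck\\\"}\"}";
        System.out.println(nestedStringField(log, "executionContext", "urlpath"));
    }
}
```

The top-level urlpath in the sample can be read directly, while the copy inside executionContext needs the extra decode step.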

Kishore G

03/19/2021, 4:49 AM
As you mentioned, it looks like the JSON library we are using is failing to parse it. Can you file an issue?
Please give it a shot; it looks easy to fix.

Matt

03/19/2021, 2:18 PM
Created PR 6699 for this.
@Neha Pawar I am not able to specify Groovy functions with semicolons in the ingestion transformations. Do you have some examples that I can try?
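For reference, the Pinot docs write ingestion transforms as entries under ingestionConfig.transformConfigs, each with a columnName and a transformFunction of the form Groovy({script}, arg1, ...), rather than the SQL-style GROOVY('{"returnType": ...}', 'script', arg). Putting a script into the table config JSON also means escaping its quotes and backslashes. The Java helper below is a hypothetical sketch (GroovyTransformConfig is not a Pinot class) that does that escaping and assembles one transformConfigs entry. The sample Groovy script, which contains a semicolon, is illustrative only; whether Pinot accepts semicolon-separated scripts is exactly the open question in this thread.

```java
final class GroovyTransformConfig {
    // Escapes a string so it can be placed inside a double-quoted JSON value.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters must use the 4-hex-digit escape form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Builds one transformConfigs entry whose transformFunction has the form
    // Groovy({script}, arg1, arg2, ...).
    static String transformConfig(String columnName, String script, String... arguments) {
        String function = "Groovy({" + script + "}, " + String.join(", ", arguments) + ")";
        return "{\"columnName\": \"" + jsonEscape(columnName)
                + "\", \"transformFunction\": \"" + jsonEscape(function) + "\"}";
    }

    public static void main(String[] args) {
        // Illustrative two-statement Groovy script; its value is the last expression.
        String script = "def m = (log =~ /\"urlpath\"\\s*:\\s*\"([^\"]*)\"/); m.find() ? m.group(1) : ''";
        System.out.println(transformConfig("urlpath", script, "log"));
    }
}
```

The printed object is meant to be dropped into the transformConfigs array; generating it programmatically avoids hand-escaping the regex twice.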