David Cyze
09/01/2021, 3:17 PM

David Cyze
09/01/2021, 3:17 PM

Mayank

David Cyze
09/01/2021, 3:18 PM

David Cyze
09/01/2021, 3:19 PM

Mayank

David Cyze
09/01/2021, 3:20 PM
David Cyze
09/01/2021, 3:23 PM
org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
java.lang.RuntimeException: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
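(Background: this TimeoutException generally means the Pinot consumers could not fetch topic metadata from the Kafka brokers configured for the table. A rough sketch of the streamConfigs fields usually involved; the topic name and broker address below are placeholders, not values from this thread:)
```
"streamConfigs": {
  "streamType": "kafka",
  "stream.kafka.topic.name": "simplejson",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.consumer.type": "lowlevel"
}
```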
David Cyze
09/01/2021, 3:29 PM

David Cyze
09/01/2021, 3:29 PM
2021/08/31 21:54:12.977 ERROR [JSONMessageDecoder] [simplejson__0__1__20210831T2011Z] Caught exception while decoding row, discarding row. Payload is {"uid":"ad23a2ea-1fac-4a57-8d47-597d3b77a52a","attr_json": {"A": "{"type": "numTickets", "val": 83}","B": "{"type": "numTickets", "val": 51}","C": "{"type": "numTickets", "val": 61}"},"createdDateInEpoch":1570000000247}
shaded.com.fasterxml.jackson.core.JsonParseException: Unexpected character ('t' (code 116)): was expecting comma to separate Object entries
    at [Source: (ByteArrayInputStream); line: 1, column: 70]
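(The payload fails to parse because the values under attr_json are raw JSON fragments wrapped in unescaped quotes, so the parser trips over the 't' in the inner "type" key. For comparison, one well-formed shape for the same record, assuming the intent was a nested object rather than an escaped string, would be:)
```
{
  "uid": "ad23a2ea-1fac-4a57-8d47-597d3b77a52a",
  "attr_json": {
    "A": {"type": "numTickets", "val": 83},
    "B": {"type": "numTickets", "val": 51},
    "C": {"type": "numTickets", "val": 61}
  },
  "createdDateInEpoch": 1570000000247
}
```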
Neha Pawar

Neha Pawar

Neha Pawar
"columnName":"attr_json_str", "transformFunction":"jsonFormat(attr_json)"
and change the column name in the schema to attr_json_str
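(Neha's snippet corresponds to an entry under transformConfigs in the table's ingestionConfig. A minimal sketch, assuming the new string column is named attr_json_str as suggested:)
```
"ingestionConfig": {
  "transformConfigs": [
    {
      "columnName": "attr_json_str",
      "transformFunction": "jsonFormat(attr_json)"
    }
  ]
}
```
jsonFormat serializes the incoming attr_json object into a JSON string, which is what gets stored in the attr_json_str column.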
David Cyze
09/01/2021, 4:27 PM
After adding the ingestionConfig, I'm now able to ingest data into the table with a JSON column. I'm seeing some behavior I don't quite understand, however.
Prior to adding the ingestionConfig, I ingested some rows where attr_json was null. After adding the config, I saw new rows where attr_json was populated.
In my schema, I have defined uid as the primary key column. I am seeding 1,000 rows at a time, so I would expect to see (number of runs prior to the ingestionConfig * 1,000) + (number of runs after the config * 1,000) rows. However, after adding the ingestionConfig and seeding 1,000 more rows, my table now has 1,002 rows.
My understanding of upserts is that the primary key column and event time are used in conjunction to determine which records should be overwritten. That being the case, how is it that so many of my rows were overwritten / deleted?* It is of course exceedingly unlikely that I managed to generate 998 of the same UIDs during my second round of ingestion.
* I'm aware that Pinot does not support deletes. I'm using "Delete" here because I'm not sure how else to explain my n(docs) going from 2,000 (prior to fixing the ingestion config) to 1,002.
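(For reference, the upsert setup being described maps to two pieces of config; a minimal sketch, assuming the column names used in this thread. In the schema:)
```
"primaryKeyColumns": ["uid"]
```
(and in the realtime table config:)
```
"upsertConfig": {
  "mode": "FULL"
}
```
By default the comparison column is the table's time column, so for a given uid the row with the latest time value is the one queries return; older rows are not physically deleted, just hidden from query results.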
Neha Pawar

Jackie
09/01/2021, 5:38 PM

Jackie
09/01/2021, 5:39 PM
uid
David Cyze
09/01/2021, 5:58 PM
…attr_json values). There were 2,000 records before I ran ingestion with the fixed application. That means that the minimum number of records that should have been present would be 2,000, and that's assuming the exceedingly unlikely possibility that every randomly generated UID was a duplicate of a previously randomly generated UID.
David Cyze
09/01/2021, 6:00 PM
Even if something were wrong with UUID.randomUUID() such that each run of my app produced identical uid values, the total # of records should never have exceeded 1,000.
David Cyze
09/01/2021, 6:28 PM
When I update the transformConfig, does Pinot re-process all records with the updated config? This could explain the record loss:
• 2k records where JSON is malformed
• update transformConfig
• Pinot re-processes these records; they fail the transformFunction; Pinot writes a new segment with them excluded
• 0 records now
• ingest records with fixed application
• 1k well-formed records are ingested (actually 1,001, as I have an off-by-one "error" in my app and actually generate 1,001 records each run. This doesn't explain why I saw 1,002 records, however)
Jackie
09/01/2021, 6:58 PM

Jackie
09/01/2021, 6:59 PM

David Cyze
09/01/2021, 7:01 PM

Jackie
09/01/2021, 7:03 PM

David Cyze
09/01/2021, 7:07 PM