# troubleshooting
m
Hi Team, I'm facing an issue while loading data into Pinot through Kafka: I'm getting the value _N in string fields even though there is data for those fields. Can anyone help me with this?
k
What's the format of the data in Kafka, and what decoder are you using?
m
The data is JSON, and I'm using org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder.
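For context, this decoder is wired into the real-time table's streamConfigs. A minimal illustrative sketch of that wiring (topic and broker values below are placeholders, not the actual config from this thread):

```json
{
  "streamConfigs": {
    "streamType": "kafka",
    "stream.kafka.consumer.type": "lowlevel",
    "stream.kafka.topic.name": "edw-transactions",
    "stream.kafka.broker.list": "localhost:9092",
    "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
  }
}
```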
k
Do you have a sample Kafka message and the Pinot table config/schema?
m
edw_data.txt, schema.json, table.json
Hi Kishore, any inputs?
m
Your data seems to have _N for the columns in your screenshot:
{
  "NEW_CONTRACT_START_DATE": "1900-01-01 00:00:00.000",
  "EVENT_DATE": "2021-12-14 00:00:00.000",
  "MSISDN": "609041336",
  "NEW_CONTRACT_END_DATE": "1900-01-01 00:00:00.000",
  "ACCOUNT_ID": "_N",
  "PREVIOUS_IMEI": "_N",
  "EBA_PRODUCT_SK": "NOF",
  "ACQUISITION_MSISDN": "_N",
  "LOAD_PLATFORM_ID": "102",
  "EBA_PRICE_PLAN_SK": "0",
  "STATUS": "ACT",
  "EBA_CONTRACT_TYPE_SK": "0",
  "EBA_EVENT_TYPE_SK": "4",
  "BILL_ACCOUNT_ID": "_N",
  "EBA_EDW_TRANS_SK": "153023692",
  "CONNECTION_DATE": "2016-10-14 00:00:00.000",
  "LOADING_ID": "982",
  "TRANSACTION_DEALER_ID": "_N",
  "OLD_TENURE": "0.0",
  "OLD_CONTRACT_END_DATE": "1900-01-01 00:00:00.000",
  "UPGRADE_FLAG": "_N",
  "CURRENT_TENURE": "0.0",
  "NET_AMOUNT": "0.0",
  "EBA_BUSINESS_UNIT_SK": "0",
  "IMEI": "_N",
  "CONTRACT_ID": "_N",
  "EBA_FEATURE_GROUP_SK": "0",
  "EVENT_LOAD_DATE": "2016-10-01 00:00:00.000",
  "DISCONNECTION_DATE": "1900-01-01 00:00:00.000",
  "COMP_PERIOD": "2021-12-01 00:00:00.000",
  "CONNECTING_DEALER_ID": "43257",
  "ORDER_ID": "852083/5",
  "CUSTOMER_NAME": "_N",
  "SIM": "89037000000002108567",
  "PREVIOUS_SIM": "_N",
  "OLD_CONTRACT_START_DATE": "1900-01-01 00:00:00.000",
  "EBA_FROM_PRODUCT_SK": "_N"
}
@Mahesh babu ^^
m
Yes Mayank, in my data there is _N in two records out of six total. Only those two records are loading; the other records are not loading into Pinot.
m
What's the output of select count(*)?
m
two
m
I think the other records are being skipped because they may not be deserialized correctly as per the schema.
m
image.png
m
Any errors in the debug API or in the server log?
I suspect the time column for those rows is null, which is why they are dropped.
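For reference, the time column's expected format is declared in the schema's dateTimeFieldSpecs. A minimal sketch matching the "yyyy-MM-dd HH:mm:ss.SSS" strings in the sample message above (the format string is an assumption inferred from that sample, not taken from the attached schema.json):

```json
{
  "dateTimeFieldSpecs": [
    {
      "name": "EVENT_DATE",
      "dataType": "STRING",
      "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss.SSS",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```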
m
OK, let me check that.
I changed the data type to STRING for the columns that have null dates, but still only two records are coming in.
m
The time column is not allowed to be null for a real-time table.
m
I used EVENT_DATE as the time column; that column has values for all the records. The other date columns I changed to the STRING data type.
m
Are there records ingested into Pinot where other columns are null?
Also, for the 2 records that are ingested, what values do you see in the other time columns?
m
image.png
I can see date values.
m
So this means that if time columns are null, then those are dropped, right?
m
Yes, the records that have null values are dropped.
m
Right, the time column(s) are not expected to be null for real-time tables.
Is this a real use case, or just test data? Trying to understand if there's a business use case that requires null time columns for some of them.
Can you check if the debug API (in Swagger) bubbles up any errors from the server, @Mahesh babu?
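Side note: for non-time columns, a Pinot schema can substitute a default when a value is missing via defaultNullValue in the field spec. A minimal sketch (the column name comes from the sample message; the default value is an assumption):

```json
{
  "dimensionFieldSpecs": [
    {
      "name": "OLD_CONTRACT_END_DATE",
      "dataType": "STRING",
      "defaultNullValue": "1900-01-01 00:00:00.000"
    }
  ]
}
```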
m
In the server log there are errors like: 2022/05/31 03:17:38.622 ERROR [JSONMessageDecoder] [EDW_TRANSACTION_SCHEMA3__0__0__20220531T0317Z] Caught exception while decoding row, discarding row. Payload is java.lang.NullPointerException: null
This is a real use case.
Can we have a CSV record reader on the Kafka stream?
k
CSV is typically a file, and the first line (header) contains the schema. We could easily add a decoder to Pinot that takes a CSV and a schema as part of the decoder config, but this is not a good design because the input data schema could never change. I suggest sticking to formats like Avro that support schema evolution. JSON does not really support it, but it does not need to, since each record has the schema embedded.
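To make that concrete, such a decoder would presumably be configured like the sketch below. Note that this decoder class and its properties are hypothetical and do not exist in Pinot; this is only an illustration of the point that the header/schema would be frozen into the config:

```json
{
  "streamConfigs": {
    "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaCSVMessageDecoder",
    "stream.kafka.decoder.prop.delimiter": ",",
    "stream.kafka.decoder.prop.header": "EVENT_DATE,MSISDN,ACCOUNT_ID,OLD_TENURE"
  }
}
```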
m
Right, I am trying to see if the debug API also surfaces this error.
m
In Swagger there is no server tab to check. Where do I have to check those logs in Swagger?
m
There is a debug API under Table.
m
There is no debug under the Table tab.
image.png
m
It is under Cluster.
image.png
m
There is an issue with the data type where we have FLOAT and are getting a null value:
"stackTrace": "java.lang.RuntimeException: Caught exception while transforming data type for column: OLD_TENURE\
I'm also getting this error in the debug API: errorMessage": "Did not get any response from servers for segment:
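That transform error is consistent with the sample data above: OLD_TENURE arrives as a string ("0.0" in the sample, and presumably the _N sentinel in other rows), and _N cannot be cast to FLOAT. One way Pinot can map such sentinel values is a Groovy ingestion transform. A sketch under two assumptions, neither taken from this thread: the destination column is renamed (OLD_TENURE_VAL) since the transform derives a new column from the raw field, and _N should map to 0.0:

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "OLD_TENURE_VAL",
        "transformFunction": "Groovy({OLD_TENURE == '_N' ? 0.0 : Double.parseDouble(OLD_TENURE)}, OLD_TENURE)"
      }
    ]
  }
}
```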