Hi Everyone, I am Loading Data from HDFS location...
# troubleshooting
r
Hi everyone, I am loading data from an HDFS location into a Pinot hybrid table. I have pushed data for 5 days and executed this command 5 times, once for each day's file:
hadoop jar \
  ${PINOT_DISTRIBUTION_DIR}/lib/pinot-all-${PINOT_VERSION}-jar-with-dependencies.jar \
  org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand \
  -jobSpecFile /home/rtrs/hybrid/config/final/executionFrameworkSpec.yaml
In the end, when I run select * from tablename_OFFLINE, I can see only the latest data, i.e. the 5th day's data.
This is the timestamp column value in my data:
"current_ts":"2021-05-30T233431.624000"
These are the details from the schema file for the timestamp column:
"dateTimeFieldSpecs": [
  {
    "name": "current_ts",
    "dataType": "STRING",
    "format": "1:MILLISECONDS:SIMPLE_DATE_FORMAT:yyyy-MM-dd'T'HHmmss.SSSSSS",
    "granularity": "1:MILLISECONDS"
  }
These are the details from the offline_config.json file:
"tableType": "OFFLINE",
"segmentsConfig": {
  "timeColumnName": "current_ts",
  "replication": "1",
  "replicasPerPartition": "1",
It looks like some timestamp issue. Kindly suggest what I need to change here.
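A quick way to rule the timestamp in or out is to check that the sample value matches the pattern given in the schema. The sketch below does this in plain Python, not with Pinot itself. The strptime pattern is a hand translation of yyyy-MM-dd'T'HHmmss.SSSSSS as it appears in the message and should be treated as an assumption.

```python
from datetime import datetime

# Sample value exactly as it appears in the message above.
SAMPLE = "2021-05-30T233431.624000"

# Assumed Python equivalent of yyyy-MM-dd'T'HHmmss.SSSSSS.
# Hand-translated for illustration only; Pinot does not use strptime.
PY_PATTERN = "%Y-%m-%dT%H%M%S.%f"

# Raises ValueError if the value does not match the pattern.
parsed = datetime.strptime(SAMPLE, PY_PATTERN)
print(parsed.isoformat())
```

If the value parses cleanly, the timestamp format is a less likely cause of the missing days than the way segments are named or pushed.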
j
Can you please check how many segments are there in this table?
r
@Jackie give me 2 min, I'm trying to reload the files. Will share the details within 5 min.
It's creating only one segment @Jackie
Segment status is showing as good and the server status is Online for this.
j
Do you mean there is only one segment in the table? What is the segment name? I suspect the job is generating a segment with the same name every day and keeps replacing the existing segment.
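The suspected behavior can be pictured with a toy model in which a table's segments are keyed by name, so a push with an existing name overwrites the earlier segment. This is only a sketch of the idea, not Pinot code: push_segment, the dates, and the tablename placeholder are all hypothetical.

```python
# Toy model (not Pinot code): a table's segments keyed by segment name.
def push_segment(table, segment_name, rows):
    """Upload a segment; an existing segment with the same name is replaced."""
    table[segment_name] = rows
    return table

# Hypothetical dates standing in for the five daily files.
days = ["2021-05-26", "2021-05-27", "2021-05-28", "2021-05-29", "2021-05-30"]

# Same name on every run: each push overwrites the previous day's segment.
same_name = {}
for day in days:
    push_segment(same_name, "tablename", [day])

# One name per day: all five segments survive.
per_day = {}
for day in days:
    push_segment(per_day, f"tablename_{day}", [day])

print(len(same_name), same_name)
print(len(per_day))
```

In this model the first table ends up holding only the last day's rows, which matches the symptom described above.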
r
The segment name is exactly the same as my table name,
which I have set as segment.name.prefix in my conf file.
This is the segment name @Jackie
j
In your table config, do you specify the pushType as REFRESH?
r
Yes, you were right @Jackie, it's replacing the segments. I tried a different segment name for the next file, so now 2 segments are there and data is there for 2 days.
Where do I need to specify this @Jackie? I am not using this property.
In the offline table config I am also not using this property.
j
What is your ingestionConfig?
You can have it like:
"ingestionConfig": {
  ...,
  "batchIngestionConfig": {
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "DAILY"
  }
}
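As a rough sketch of where this block ends up, the following Python merges the suggested batchIngestionConfig into an offline table config. The helper with_append_ingestion is hypothetical, and the config dictionary contains only the fields quoted in this thread, so it is not a complete table config.

```python
import json

def with_append_ingestion(table_config, frequency="DAILY"):
    """Return a copy of table_config with APPEND batch ingestion settings merged in."""
    updated = dict(table_config)
    # Keep any existing ingestionConfig entries and add the batch settings.
    ingestion = dict(updated.get("ingestionConfig", {}))
    ingestion["batchIngestionConfig"] = {
        "segmentIngestionType": "APPEND",
        "segmentIngestionFrequency": frequency,
    }
    updated["ingestionConfig"] = ingestion
    return updated

# Partial offline table config using only fields quoted in this thread.
offline_config = {
    "tableType": "OFFLINE",
    "segmentsConfig": {
        "timeColumnName": "current_ts",
        "replication": "1",
        "replicasPerPartition": "1",
    },
}

print(json.dumps(with_append_ingestion(offline_config), indent=2))
```

The original dictionary is left untouched, so the result can be compared with the existing config before it is uploaded.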
r
OK, so I only need to add this change in offline-table-config.json, right? Not in realtime-config.json?
j
Yes, no need to add it to the realtime-table-config
k
What is in your executionFrameworkSpec.yaml file? You typically need a section like:
segmentNameGeneratorSpec:

  # type: Current supported types are 'simple' and 'normalizedDate'.
  type: normalizedDate

  configs:
    segment.name.prefix: 'ads_batch'
This is so that segment names vary by date. If you would otherwise wind up with multiple segments that have the same name (same date), you can add exclude.sequence.id: false to the configs section.
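To illustrate why date-based names avoid collisions, here is a small Python sketch of a naming scheme built from a prefix, a date range, and an optional sequence id. The function dated_segment_name and its output format are hypothetical; the names Pinot actually generates with normalizedDate may look different.

```python
from datetime import date

def dated_segment_name(prefix, start, end, sequence_id=None):
    """Hypothetical naming scheme: prefix, start date, end date, optional sequence id."""
    parts = [prefix, start.isoformat(), end.isoformat()]
    if sequence_id is not None:
        parts.append(str(sequence_id))
    return "_".join(parts)

# Different days give different names, so a later push does not replace an earlier one.
day1 = dated_segment_name("ads_batch", date(2021, 5, 29), date(2021, 5, 29))
day2 = dated_segment_name("ads_batch", date(2021, 5, 30), date(2021, 5, 30))

# Two input files for the same day collide unless a sequence id is included.
file_a = dated_segment_name("ads_batch", date(2021, 5, 30), date(2021, 5, 30), 0)
file_b = dated_segment_name("ads_batch", date(2021, 5, 30), date(2021, 5, 30), 1)

print(day1, day2, file_a, file_b)
```

The sequence id plays the role that exclude.sequence.id: false is described as playing above: it keeps names distinct when the dates alone are not.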
r
Thanks @Jackie @Ken Krugler. It worked.
k
Hi @RK - what worked? Wondering what you had to change.
r
I added this configuration @Ken Krugler:
"ingestionConfig": {
  ...,
  "batchIngestionConfig": {
    "segmentIngestionType": "APPEND",
    "segmentIngestionFrequency": "DAILY"
  }
}
The segmentNameGeneratorSpec section you described was already there in my executionFrameworkSpec.yaml file.