Kha
02/03/2021, 5:08 PM
baseballStats offline table with some modifications to the files. When I am uploading my own batch data, I get the following error:
400 (Bad Request) with reason: "Cannot add invalid schema: rows_10m. Reason: null"
I currently have a CSV that's formatted like this
# /DIRECTORIES/rawdata/rows_10m.csv
id, hash_one, text_one
0, (large integer), a
1, (large integer), b
...
A schema.json that has this
# /DIRECTORIES/rows_10m_schema.json
{
  "schemaName": "rows_10m",
  "dimensionFieldSpecs": [
    {
      "datatype": "STRING",
      "name": "text_one"
    }
  ],
  "metricFieldSpecs": [
    {
      "datatype": "INT",
      "name": "id"
    },
    {
      "datatype": "INT",
      "name": "hash_one"
    }
  ]
}
and a table config that has this
# /DIRECTORIES/rows_10m_offline_table_config.json
{
  "tableName": "rows_10m",
  "tableTypes": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "rows_10m",
    "replication": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "HEAP",
    "invertedIndexColumns": [
      "id",
      "hash_one"
    ]
  },
  "metadata": {
    "customConfigs": {
    }
  }
}
This is very similar to what I used when I manually added the default baseballStats. Am I missing anything in my schema.json file?
Will Briggs
02/03/2021, 5:31 PM
With the APPEND push type, even with an offline table, I am pretty sure a primary time column is mandatory. Your schema doesn’t define one.
Will Briggs
02/03/2021, 5:33 PM
Will Briggs
02/03/2021, 5:34 PM
The primary time column is used by Pinot, for maintaining the time boundary between offline and realtime data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND and optional if the push type is REFRESH.
(see DateTime here)
Kha
02/05/2021, 9:26 PM
dataType was misspelled. Thanks for your help!
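For anyone hitting the same "Cannot add invalid schema ... Reason: null" error, here is a minimal sketch of the schema from the original question with only the key spelling changed from datatype to dataType, which is the fix Kha reported. It assumes nothing else in the file needed to change. It does not add a primary time column; the thread does not confirm whether one was also required for this table.

```json
{
  "schemaName": "rows_10m",
  "dimensionFieldSpecs": [
    {
      "dataType": "STRING",
      "name": "text_one"
    }
  ],
  "metricFieldSpecs": [
    {
      "dataType": "INT",
      "name": "id"
    },
    {
      "dataType": "INT",
      "name": "hash_one"
    }
  ]
}
```

Field spec keys are case-sensitive, so a lowercase datatype key is not picked up as the column's data type.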