# troubleshooting
k
Hi everyone. I'm trying to import some batch data into my Pinot cluster and I'm running into some issues. I have the latest version of Pinot (0.7.0) in a Docker container, and I set everything up manually. I followed the Docker version of this guide: https://docs.pinot.apache.org/basics/getting-started/advanced-pinot-setup. I am able to configure the `baseballStats` offline table with some modifications to the files. When I upload my own batch data, I get the following error:
400 (Bad Request) with reason: "Cannot add invalid schema: rows_10m. Reason: null"
I currently have a CSV that's formatted like this
# /DIRECTORIES/rawdata/rows_10m.csv
id, hash_one, text_one
0, (large integer), a
1, (large integer), b
...
A schema.json that has this
# /DIRECTORIES/rows_10m_schema.json
{
    "schemaName": "rows_10m",
    "dimensionFieldSpecs": [
        {
            "datatype": "STRING",
            "name": "text_one"
        }
    ],
    "metricFieldSpecs": [
        {
            "datatype": "INT",
            "name": "id"
        },
        {
            "datatype": "INT",
            "name": "hash_one"
        }
    ]
}
and a table config that has this
# /DIRECTORIES/rows_10m_offline_table_config.json
{
    "tableName": "rows_10m",
    "tableTypes": "OFFLINE",
    "segmentsConfig": {
        "segmentPushType": "APPEND",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "rows_10m",
        "replication": "1"
    },
    "tenants": {},
    "tableIndexConfig": {
        "loadMode": "HEAP",
        "invertedIndexColumns": [
            "id",
            "hash_one"
        ]
    },
    "metadata": {
        "customConfigs": {
        }
    }
}
This is very similar to what I used when I manually added the default `baseballStats`. Am I missing anything in my schema.json file?
w
@Kha With the `APPEND` push type, even with an offline table, I am pretty sure a primary time column is mandatory. Your schema doesn’t define one.
Your table definition doesn’t contain a `timeColumnName` value either. However, the example you’re running is likely trying to push the schema first, and that’s where it’s failing, so you’re not even getting to the table creation or loading the batch CSV.
According to the docs:
The primary time column is used by Pinot, for maintaining the time boundary between offline and realtime data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND and optional if the push type is REFRESH.
(see `DateTime` here)
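For anyone who does need a primary time column, here is a rough Python sketch of what adding one to the schema and table config might look like. The column name `created_at_ms` is hypothetical (the CSV in this thread has no time column), and the `format` and `granularity` strings assume epoch milliseconds; adjust them to your data.

```python
import copy


def add_primary_time_column(schema, table_config, column_name):
    """Return copies of schema and table config with a time column declared.

    Assumes the column holds epoch milliseconds; this is an illustration,
    not a statement about what the data in this thread contains.
    """
    new_schema = copy.deepcopy(schema)
    new_table = copy.deepcopy(table_config)

    # Declare the time column in the schema's dateTimeFieldSpecs list.
    new_schema.setdefault("dateTimeFieldSpecs", []).append({
        "name": column_name,
        "dataType": "LONG",
        "format": "1:MILLISECONDS:EPOCH",
        "granularity": "1:MILLISECONDS",
    })

    # Point the table's segmentsConfig at the same column.
    new_table.setdefault("segmentsConfig", {})["timeColumnName"] = column_name
    return new_schema, new_table


if __name__ == "__main__":
    schema = {"schemaName": "rows_10m"}
    table = {"tableName": "rows_10m", "segmentsConfig": {"segmentPushType": "APPEND"}}
    # "created_at_ms" is a hypothetical column name used only for this example.
    new_schema, new_table = add_primary_time_column(schema, table, "created_at_ms")
    print(new_schema["dateTimeFieldSpecs"])
    print(new_table["segmentsConfig"])
```

The CSV would also need a matching column for this to load.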
k
It turns out `dataType` was misspelled. Thanks for your help!
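Since the "Reason: null" error gives no hint about which key is wrong, a small pre-flight check can catch this kind of typo before uploading. Below is a minimal sketch in Python, assuming that field-spec keys are matched case-sensitively (which the fix above suggests). The function name and the set of checked keys are my own choices, not part of Pinot.

```python
import json

# Keys every field spec is expected to carry, spelled exactly as in the docs.
REQUIRED_KEYS = ("name", "dataType")
# Top-level schema lists that hold field specs.
SPEC_LISTS = ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")


def find_field_spec_problems(schema):
    """Return a list of messages for field specs with missing or miscased keys."""
    problems = []
    for list_name in SPEC_LISTS:
        for index, spec in enumerate(schema.get(list_name, [])):
            # Map lowercased key -> key as written, to spot case-only typos.
            lowered = {key.lower(): key for key in spec}
            label = spec.get("name", f"#{index}")
            for required in REQUIRED_KEYS:
                if required in spec:
                    continue
                near_miss = lowered.get(required.lower())
                if near_miss is not None:
                    problems.append(
                        f"{list_name}[{label}]: found '{near_miss}', expected '{required}'"
                    )
                else:
                    problems.append(f"{list_name}[{label}]: missing '{required}'")
    return problems


if __name__ == "__main__":
    # A cut-down version of the schema from this thread, typo included.
    schema = json.loads("""
    {
        "schemaName": "rows_10m",
        "dimensionFieldSpecs": [{"datatype": "STRING", "name": "text_one"}],
        "metricFieldSpecs": [{"datatype": "INT", "name": "id"}]
    }
    """)
    for problem in find_field_spec_problems(schema):
        print(problem)
```

This only checks key spelling; it does not replace the controller's own schema validation.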