Hi everyone. I'm currently trying to import some batch data t... | Apache Pinot #troubleshooting


Kha

02/03/2021, 5:08 PM

Hi everyone. I'm currently trying to import some batch data into my Pinot cluster, and I'm running into some issues. I have the latest version of Pinot (0.7.0) in a Docker container, and I set everything up manually. I followed the Docker version of this guide: https://docs.pinot.apache.org/basics/getting-started/advanced-pinot-setup. I am able to configure the `baseballStats` offline table with some modifications to the files. When I upload my own batch data, I get the following error:

```
400 (Bad Request) with reason: "Cannot add invalid schema: rows_10m. Reason: null"
```

I currently have a CSV formatted like this:

```
# /DIRECTORIES/rawdata/rows_10m.csv
id, hash_one, text_one
0, (large integer), a
1, (large integer), b
...
```

a schema.json that contains this:

```
# /DIRECTORIES/rows_10m_schema.json
{
    "schemaName": "rows_10m",
    "dimensionFieldSpecs": [
        {
            "datatype": "STRING",
            "name": "text_one"
        }
    ],
    "metricFieldSpecs": [
        {
            "datatype": "INT",
            "name": "id"
        },
        {
            "datatype": "INT",
            "name": "hash_one"
        }
    ]
}
```

and a table config that contains this:

```
# /DIRECTORIES/rows_10m_offline_table_config.json
{
    "tableName": "rows_10m",
    "tableTypes": "OFFLINE",
    "segmentsConfig": {
        "segmentPushType": "APPEND",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "rows_10m",
        "replication": "1"
    },
    "tenants": {},
    "tableIndexConfig": {
        "loadMode": "HEAP",
        "invertedIndexColumns": [
            "id",
            "hash_one"
        ]
    },
    "metadata": {
        "customConfigs": {
        }
    }
}
```

This is very similar to what I used when I manually added the default `baseballStats` table. Am I missing anything in my schema.json file?

Will Briggs

02/03/2021, 5:31 PM

@Kha With the `APPEND` push type, even with an offline table, I am pretty sure a primary time column is mandatory. Your schema doesn’t define one.

Will Briggs

02/03/2021, 5:33 PM

And your table definition doesn’t contain a `timeColumnName` value either. However, the example you’re running is likely trying to push the schema first, and that’s where it’s failing, so you’re not even getting to the table creation or loading the batch CSV.

Will Briggs

02/03/2021, 5:34 PM

According to the docs:

> The primary time column is used by Pinot, for maintaining the time boundary between offline and realtime data in a hybrid table and for retention management. A primary time column is mandatory if the table's push type is APPEND and optional if the push type is REFRESH.

(see DateTime here)
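A minimal Python sketch of the rule quoted above, written as a pre-upload sanity check on a schema and table config loaded as dicts. It assumes `timeColumnName` lives under `segmentsConfig` and must name a column declared in `dateTimeFieldSpecs` (the legacy `timeFieldSpec` is ignored here). The `ts` column and its `format`/`granularity` values are hypothetical, and the CSV would need a matching column too. This encodes only the APPEND rule from the docs, not Pinot's full validation.

```python
import copy
from typing import Dict, List


def time_column_problems(schema: Dict, table_config: Dict) -> List[str]:
    """Sketch: flag a missing or undeclared primary time column for APPEND tables.

    Only dateTimeFieldSpecs is consulted; the legacy timeFieldSpec is ignored.
    """
    problems = []
    segments = table_config.get("segmentsConfig", {})
    push_type = str(segments.get("segmentPushType", "")).upper()
    time_col = segments.get("timeColumnName")
    declared = {spec.get("name") for spec in schema.get("dateTimeFieldSpecs", [])}

    # Per the quoted docs: mandatory for APPEND, optional for REFRESH.
    if push_type == "APPEND" and not time_col:
        problems.append("APPEND push type but segmentsConfig has no timeColumnName")
    # If a time column is named, it should be declared in the schema.
    if time_col and time_col not in declared:
        problems.append(f"timeColumnName {time_col!r} is not declared in dateTimeFieldSpecs")
    return problems


# Configs from the question, trimmed to the fields this check reads.
schema = {
    "schemaName": "rows_10m",
    "dimensionFieldSpecs": [{"name": "text_one", "dataType": "STRING"}],
    "metricFieldSpecs": [{"name": "id", "dataType": "INT"}, {"name": "hash_one", "dataType": "INT"}],
}
table = {
    "tableName": "rows_10m",
    "tableType": "OFFLINE",
    "segmentsConfig": {"segmentPushType": "APPEND", "schemaName": "rows_10m", "replication": "1"},
}
print(time_column_problems(schema, table))
# ['APPEND push type but segmentsConfig has no timeColumnName']

# Hypothetical fix: add an epoch-millis column "ts" and point timeColumnName at it.
fixed_schema = copy.deepcopy(schema)
fixed_schema["dateTimeFieldSpecs"] = [{
    "name": "ts",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS",
}]
fixed_table = copy.deepcopy(table)
fixed_table["segmentsConfig"]["timeColumnName"] = "ts"
print(time_column_problems(fixed_schema, fixed_table))  # []
```

The check returns a list of messages instead of raising, so several config pairs can be linted in one pass before anything is pushed to the controller.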

Kha

02/05/2021, 9:26 PM

It turns out `dataType` was misspelled. Thanks for your help!
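Pinot's field specs spell the key `dataType`, so `datatype` presumably left each column's type unset, which would fit the `Reason: null` in the error. A minimal Python sketch of a pre-upload lint that flags near-misses of known keys with `difflib.get_close_matches`. `KNOWN_KEYS` is a hypothetical, deliberately incomplete list taken from the configs in this thread, not Pinot's full key set. Run on the posted table config, it also flags `tableTypes`, which appears to be a typo for Pinot's `tableType` key.

```python
import difflib
from typing import Any, List, Tuple

# Hypothetical, incomplete list of correctly spelled keys seen in this thread.
KNOWN_KEYS = [
    "schemaName", "dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs",
    "name", "dataType", "format", "granularity",
    "tableName", "tableType", "segmentsConfig", "segmentPushType",
    "segmentAssignmentStrategy", "replication", "timeColumnName", "tenants",
    "tableIndexConfig", "loadMode", "invertedIndexColumns", "metadata", "customConfigs",
]
_KNOWN = set(KNOWN_KEYS)
_BY_LOWER = {key.lower(): key for key in KNOWN_KEYS}


def suspicious_keys(node: Any, path: str = "$") -> List[Tuple[str, str]]:
    """Return (json_path, suggested_key) for keys that look like typos of KNOWN_KEYS."""
    hits: List[Tuple[str, str]] = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key not in _KNOWN:
                # Case-insensitive fuzzy match, e.g. "datatype" -> "dataType".
                match = difflib.get_close_matches(key.lower(), list(_BY_LOWER), n=1, cutoff=0.8)
                if match:
                    hits.append((child, _BY_LOWER[match[0]]))
            hits.extend(suspicious_keys(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(suspicious_keys(item, f"{path}[{i}]"))
    return hits


# The configs exactly as posted in the question (minus the path comments).
posted_schema = {
    "schemaName": "rows_10m",
    "dimensionFieldSpecs": [{"datatype": "STRING", "name": "text_one"}],
    "metricFieldSpecs": [
        {"datatype": "INT", "name": "id"},
        {"datatype": "INT", "name": "hash_one"},
    ],
}
posted_table = {
    "tableName": "rows_10m",
    "tableTypes": "OFFLINE",
    "segmentsConfig": {
        "segmentPushType": "APPEND",
        "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
        "schemaName": "rows_10m",
        "replication": "1",
    },
    "tenants": {},
    "tableIndexConfig": {"loadMode": "HEAP", "invertedIndexColumns": ["id", "hash_one"]},
    "metadata": {"customConfigs": {}},
}

# Flags the three "datatype" keys in the schema and "tableTypes" in the table config.
for json_path, suggestion in suspicious_keys(posted_schema) + suspicious_keys(posted_table):
    print(f"{json_path}: did you mean {suggestion!r}?")
```

Fuzzy matching against a short allow-list keeps false positives low, but any correctly spelled Pinot key missing from `KNOWN_KEYS` that happens to resemble one in it would be flagged, so the list should grow with the configs it checks.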
