# troubleshooting
m
Hi, I faced this error when trying to do batch ingestion from the local file system:
```
Failed to generate Pinot segment for file - file:data/orders.csv
java.lang.NumberFormatException: For input string: "2019-05-02 17:49:53"
```
here is the dateTimeFieldSpecs in the schema file:
```json
"dateTimeFieldSpecs": [
  {
    "dataType": "STRING",
    "name": "start_date",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  },
  {
    "dataType": "STRING",
    "name": "end_date",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  },
  {
    "dataType": "STRING",
    "name": "created_at",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  },
  {
    "dataType": "STRING",
    "name": "updated_at",
    "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
    "granularity": "1:DAYS"
  }
]
```
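For context, the exception above means Pinot's segment generator tried to parse the timestamp string as a number, which usually happens when a value from a date column lands in a field declared as INT, LONG, FLOAT, or DOUBLE. A minimal Python analogue of the failing parse (Pinot itself uses Java's number parsers, which throw `NumberFormatException` in the same situation):

```python
# A "yyyy-MM-dd HH:mm:ss" string is not a number, so any numeric parse fails.
# Java's Double.parseDouble throws NumberFormatException here; Python's float()
# raises the equivalent ValueError.
try:
    float("2019-05-02 17:49:53")
except ValueError as exc:
    print(f"parse failed: {exc}")
```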
k
What’s the full schema? It looks like you’ve got a numeric (metric or dimension) field, but the corresponding data in your input file is a date.
m
```json
{
  "schemaName": "orders",
  "metricFieldSpecs": [
    {
      "dataType": "DOUBLE",
      "name": "total"
    },
    {
      "dataType": "FLOAT",
      "name": "percentage"
    }
  ],
  "dimensionFieldSpecs": [
    {
      "dataType": "INT",
      "name": "id"
    },
    {
      "dataType": "STRING",
      "name": "user_id"
    },
    {
      "dataType": "STRING",
      "name": "worker_id"
    },
    {
      "dataType": "INT",
      "name": "job_id"
    },
    {
      "dataType": "DOUBLE",
      "name": "lat"
    },
    {
      "dataType": "DOUBLE",
      "name": "lng"
    },
    {
      "dataType": "INT",
      "name": "work_place"
    },
    {
      "dataType": "STRING",
      "name": "note"
    },
    {
      "dataType": "STRING",
      "name": "address"
    },
    {
      "dataType": "STRING",
      "name": "canceled_by"
    },
    {
      "dataType": "INT",
      "name": "status"
    },
    {
      "dataType": "STRING",
      "name": "canceled_message"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "dataType": "STRING",
      "name": "start_date",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
      "granularity": "1:DAYS"
    },
    {
      "dataType": "STRING",
      "name": "end_date",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
      "granularity": "1:DAYS"
    },
    {
      "dataType": "STRING",
      "name": "created_at",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
      "granularity": "1:DAYS"
    },
    {
      "dataType": "STRING",
      "name": "updated_at",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd HH:mm:ss",
      "granularity": "1:DAYS"
    }
  ]
}
```
k
I’d take a few rows of your input data and dump them into Excel to confirm that the order and number of columns match what you’ve defined in your schema.
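That column check can also be scripted. Here is a rough sketch (the schema and CSV contents below are hypothetical stand-ins; in practice you would load your real schema file and `data/orders.csv`): it flattens every field spec into a name-to-dataType map, then reports schema columns missing from the CSV header and numeric columns whose first value doesn't parse as a number.

```python
import csv
import io
import json

def check_csv_against_schema(schema_json: str, csv_text: str):
    """Return (schema columns missing from the CSV, numeric columns with non-numeric values)."""
    schema = json.loads(schema_json)
    # Collect every declared column, regardless of which spec list it lives in.
    cols = {
        spec["name"]: spec["dataType"]
        for key in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")
        for spec in schema.get(key, [])
    }
    reader = csv.DictReader(io.StringIO(csv_text))
    row = next(reader)  # inspect the first data row
    missing = sorted(set(cols) - set(row))
    numeric_types = {"INT", "LONG", "FLOAT", "DOUBLE"}
    problems = []
    for name, dtype in cols.items():
        if dtype in numeric_types and name in row:
            try:
                float(row[name])
            except ValueError:
                problems.append((name, row[name]))
    return missing, problems

# Hypothetical sample: the "total" column accidentally holds a timestamp.
schema = json.dumps({
    "metricFieldSpecs": [{"dataType": "DOUBLE", "name": "total"}],
    "dimensionFieldSpecs": [{"dataType": "INT", "name": "id"}],
})
good_csv = "id,total\n1,9.5\n"
bad_csv = "id,total\n1,2019-05-02 17:49:53\n"
print(check_csv_against_schema(schema, good_csv))  # → ([], [])
print(check_csv_against_schema(schema, bad_csv))   # → ([], [('total', '2019-05-02 17:49:53')])
```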
m
I've fixed the error; the raw data was corrupted.