# troubleshooting
Me again. New issue, not sure if it's me or if it's a genuine bug: I'm trying to ingest JSON data for one of my columns, but I keep getting an error for that column: `Cannot read single-value from Collection`. More on this thread.
This is part of the exception, which I believe contains the necessary info to investigate it:
```
Caused by: java.lang.IllegalStateException: Cannot read single-value from Collection: [1, 2, 1, 1, 1, 1, 2, 1, 1, 1, 9, 1, 1] for column: brands_responses
	at shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:721) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardizeCollection(DataTypeTransformer.java:176) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.standardize(DataTypeTransformer.java:119) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	at org.apache.pinot.segment.local.recordtransformer.DataTypeTransformer.transform(DataTypeTransformer.java:63) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
	... 13 more
2022/04/25 00:19:39.735 ERROR [SegmentGenerationJobRunner] [pool-2-thread-1] Failed to generate Pinot segment for file - file:/sensitive-data/outputs/cases/br/20150501.json
```
This is the JSON file I'm trying to consume (the 13 numbers in the exception are exactly the values of the brands_responses object here, just in a different order):
```json
[
  {
    "brands_responses": {
      "first_1000226": 2,
      "second_1000226": 1,
      "third_1000226": 1,
      "fourth_1000226": 1,
      "fifth_1000226": 9,
      "sixth_1000226": 2,
      "seventh_1000226": 1,
      "eighth_1000226": 1,
      "ninth_1000226": 1,
      "tenth_1000226": 1,
      "eleventh_1000226": 1,
      "twelfth_1000226": 1,
      "thirteenth_1000226": 1
    },
    "caseid": 251214750,
    "date_": 20150501,
    "pmxid": 52735743,
    "region": "br",
    "sector_id": 1010,
    "uuid": "6702e33a-e961-4f62-b9df-2d65e4fe3fd5",
    "weight": 0.935066
  }
]
```
This is my schema:
```json
{
  "schemaName": "cases_schema",
  "dimensionFieldSpecs": [
    {
      "name": "brands_responses",
      "dataType": "JSON",
      "maxLength": 2147483647
    },
    {
      "name": "caseid",
      "dataType": "INT"
    },
    {
      "name": "pmxid",
      "dataType": "INT"
    },
    {
      "name": "region",
      "dataType": "STRING"
    },
    {
      "name": "sector_id",
      "dataType": "INT"
    },
    {
      "name": "uuid",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "weight",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "date_",
      "dataType": "INT",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyyMMdd",
      "granularity": "1:DAYS"
    }
  ]
}
```
Sorry, while gathering the information I think I found what the issue is, hold on...
Nah, it's still erroring. I had forgotten to add the field to `noDictionaryColumns`, but even with that added it still errors out.
This is the table definition I send to Pinot when creating it:
```json
{
  "tableName": "cases",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "schemaName": "cases_schema",
    "timeColumnName": "date_",
    "timeType": "DAYS",
    "replicasPerPartition": "1",
    "replication": "1"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "noDictionaryColumns": [
      "brands_responses"
    ],
    "jsonIndexColumns": [],
    "invertedIndexColumns": [],
    "nullHandlingEnabled": true,
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "region": {
          "functionName": "Murmur",
          "numPartitions": 400
        }
      }
    }
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "metadata": {
    "customConfigs": {}
  },
  "routing": {
    "instanceSelectorType": "balanced",
    "segmentPrunerTypes": [
      "partition",
      "time"
    ]
  },
  "transformConfigs": [
    {
      "columnName": "brands_responses",
      "transformFunction": "jsonFormat(\"brands_responses\")"
    }
  ]
}
```
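(Side note for anyone reading along: as far as I understand it, `jsonFormat(...)` just serializes the nested object into a JSON string, so after the transform the record should effectively carry a single string value for that column, roughly like the sketch below, truncated to two keys.)

```json
{
  "brands_responses": "{\"first_1000226\":2,\"second_1000226\":1}",
  "caseid": 251214750
}
```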
I noticed, however, that `transformConfigs` is missing from the table definition when looking at how the table got created.
This is the table config I see in the incubator UI:
```json
{
  "OFFLINE": {
    "tableName": "cases_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "timeType": "DAYS",
      "schemaName": "cases_schema",
      "replication": "1",
      "timeColumnName": "date_",
      "allowNullTimeValue": false,
      "replicasPerPartition": "1"
    },
    "tenants": {
      "broker": "DefaultTenant",
      "server": "DefaultTenant"
    },
    "tableIndexConfig": {
      "invertedIndexColumns": [],
      "noDictionaryColumns": [
        "brands_responses"
      ],
      "segmentPartitionConfig": {
        "columnPartitionMap": {
          "region": {
            "functionName": "Murmur",
            "numPartitions": 400
          }
        }
      },
      "rangeIndexVersion": 2,
      "jsonIndexColumns": [],
      "autoGeneratedInvertedIndex": false,
      "createInvertedIndexDuringSegmentGeneration": false,
      "loadMode": "MMAP",
      "enableDefaultStarTree": false,
      "enableDynamicStarTreeCreation": false,
      "aggregateMetrics": false,
      "nullHandlingEnabled": true
    },
    "metadata": {
      "customConfigs": {}
    },
    "routing": {
      "segmentPrunerTypes": [
        "partition",
        "time"
      ],
      "instanceSelectorType": "balanced"
    },
    "isDimTable": false
  }
}
```
Alright, I figured out that I was missing an `ingestionConfig` as part of the table config. The documentation about JSON indexing is wrong: it doesn't mention this field. But even with this field added, it still doesn't work; if I send the corrected payload I get:
```json
{
  "code": 400,
  "error": "Arguments of a transform function '[brands_responses]' cannot contain the destination column 'brands_responses'"
}
```
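(For reference, this is roughly the shape of the `ingestionConfig` excerpt in question, reconstructed from the transform block above and the error message:)

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "brands_responses",
        "transformFunction": "jsonFormat(\"brands_responses\")"
      }
    ]
  }
}
```

With both the destination column and the function argument named `brands_responses`, that is what trips the 400 above.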
Alright, it's working now. I figured out that the source column the JSON is transformed from cannot have the same name as the destination column. I wish Pinot could just do the transformation for us without all this extra configuration, though.
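In other words, the shape that satisfies the check is roughly this sketch; `brands_responses_json` is just an example destination name, anything different from the raw field in the file works, and the matching `dimensionFieldSpec` in the schema gets the same new name:

```json
{
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "brands_responses_json",
        "transformFunction": "jsonFormat(\"brands_responses\")"
      }
    ]
  }
}
```

(The `noDictionaryColumns` entry in `tableIndexConfig` would point at the new name as well.)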
Fixed on my side. I just opened a ticket to propose changes to the docs to improve them: https://github.com/apache/pinot/issues/8586
cc: @Mark Needham
Thanks man!