# troubleshooting
r
Hello Pinot team, I’m learning to work with Pinot and have hit a couple of edge cases that I couldn’t find answers to in the docs. I’ll post them here as two separate threads. There’s some weirdness around using _id as a column name. I’m trying to ingest data into Pinot from an OLTP data store, and I wanted the primary key to be a column named “_id”. During ingestion, I found that our 32-digit hexadecimal string is converted into a much longer string if the column is named “_id”. Renaming the column to “id” works just fine. Is _id a reserved name in Pinot? I’ll attach screenshots with both _id and id as column names in this thread.
Screen Shot 2022-04-27 at 9.45.12 AM.png,Screen Shot 2022-04-26 at 9.58.32 PM.png
k
Hi can you also add schema and table config
r
Here you go @User. I’ve removed all the other columns and redacted the broker URL. I don’t think this matters because the only thing that changed between the transform that worked and the one that didn’t was the name of the id field.
schema.json,table_config.json
k
thanks
JSONPATHSTRING(fullDocument, '$._id.$oid')
It seems like id is not _id here, which seems to be a whole object, but the $oid field inside the _id object. That should be the difference between the two values.
r
I see what’s going on here. The pre-ingested event already has an _id attribute, so it looks like this is being used directly instead of the one described in my transformation. Is there any way to force the ingestion config to use my transformation when there’s a conflict with the keys in the event source? Here’s a quick look at what the input to my Pinot ingestion looks like.
{
  "_id": {
    "_id": {
      "$oid": "6246a32a8b5a712b500f1eec"
    },
    "copyingData": true
  },
  "operationType": "insert",
  "documentKey": {
    "_id": {
      "$oid": "6246a32a8b5a712b500f1eec"
    }
  },
  "fullDocument": {
    "_id": {
      "$oid": "6246a32a8b5a712b500f1eec"
    },
    "isDeleted": false,
    "createdAt": {
      "$date": {
        "$numberLong": "1648796458544"
      }
    },
    "updatedAt": {
      "$date": {
        "$numberLong": "1648796459023"
      }
    }
  }
}
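To make the difference between the two values concrete, here is a minimal Python sketch over a trimmed copy of the event above. It only mimics the JSON path $._id.$oid with plain dictionary lookups; it does not call Pinot, and serializing the top-level object with json.dumps is an assumption about how an object might end up as a long string.

```python
import json

# Trimmed copy of the sample event from this thread.
event = json.loads("""
{
  "_id": {"_id": {"$oid": "6246a32a8b5a712b500f1eec"}, "copyingData": true},
  "fullDocument": {"_id": {"$oid": "6246a32a8b5a712b500f1eec"}, "isDeleted": false}
}
""")

# The top-level _id is a nested object, so stringifying it gives a long value.
top_level_id = json.dumps(event["_id"])

# Plain-dict equivalent of the JSON path $._id.$oid applied to fullDocument.
extracted_oid = event["fullDocument"]["_id"]["$oid"]

print(top_level_id)
print(extracted_oid)
```

The first printed value is the whole nested object as a string, the second is just the hexadecimal id.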
k
I don't understand what you mean by conflict
r
The input event that’s being ingested already has an _id field.
"_id": {
  "_id": {
    "$oid": "6246a32a8b5a712b500f1eec"
  }
}
When I add a transformation like the following, I imagine that the _id field is picked up as is from the event instead of the transformation.
{
  "columnName": "_id",
  "transformFunction": "JSONPATHSTRING(fullDocument, '$._id.$oid')"
}
This is the only way I can explain what’s going on. If I set columnName to id instead of _id in the transformation, then the transformation works as expected.
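The behaviour guessed at here can be sketched in a few lines of Python. This is a hypothetical model, not Pinot's code: apply_transforms and extract_oid are made-up names, and the rule "a transform only fills a column that is missing from the source record" is an assumption taken from this thread rather than from the docs.

```python
def apply_transforms(record, transforms):
    """Hypothetical model: a transform only fills a column absent from the record."""
    out = dict(record)
    for column, fn in transforms.items():
        if out.get(column) is None:  # assumed rule: an existing source value wins
            out[column] = fn(record)
    return out


def extract_oid(r):
    # Plain-dict stand-in for JSONPATHSTRING(fullDocument, '$._id.$oid').
    return r["fullDocument"]["_id"]["$oid"]


record = {
    "_id": {"_id": {"$oid": "6246a32a8b5a712b500f1eec"}, "copyingData": True},
    "fullDocument": {"_id": {"$oid": "6246a32a8b5a712b500f1eec"}},
}

# Destination column clashes with a key already in the event: transform is skipped.
clash = apply_transforms(record, {"_id": extract_oid})

# Destination column is new: transform runs and yields the hexadecimal id.
no_clash = apply_transforms(record, {"id": extract_oid})

print(clash["_id"])
print(no_clash["id"])
```

Under that assumption, the column named _id keeps the whole nested object from the event, while the column named id gets the extracted value, which matches what the screenshots show.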