# troubleshooting

Tommaso Peresson

09/27/2022, 1:02 PM
Hello everybody, I have a question for you. Is it possible to modify the metadata of a segment? I would like to:
• create the segments with Spark and store them in HDFS
• move them with distcp to GCS
• load them with a metadata push to the cluster
but this leaves me with segments having
"custom.map": "{\"input.data.file.uri\":\"hdfs://***\"}",
and instead I would want to have something like
"custom.map": "{\"input.data.file.uri\":\"gs://***\"}",
so that the segment fetcher would know where to get the data from. Do you know if it's possible to do what I'm asking? Thanks
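For context, here is a minimal sketch of what the metadata-push job spec for the pipeline above might look like, assuming the standalone push runner; the bucket, project, paths and controller address are placeholders, not values from this thread (check the exact spec keys against the docs for your Pinot version):

# Hypothetical job spec for the metadata-push step; segments are assumed to already be in GCS after distcp.
executionFrameworkSpec:
  name: 'standalone'
  segmentMetadataPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentMetadataPushJobRunner'
jobType: SegmentMetadataPush
# The GCS copy of the segments; this is the location the push job advertises to the controller.
outputDirURI: 'gs://<bucket>/<path-to-segments>/'
includeFileNamePattern: 'glob:**/*.tar.gz'
pinotFSSpecs:
  - scheme: gs
    className: org.apache.pinot.plugin.filesystem.GcsPinotFS
    configs:
      projectId: '<gcp-project-id>'
      gcpKey: '<path-to-service-account-json>'
tableSpec:
  tableName: '<table_name>'
pinotClusterSpecs:
  - controllerURI: 'http://<controller-host>:9000'

The point of the sketch is that outputDirURI is the same GCS location the segments were copied to with distcp, so the push job reads the segment metadata from gs:// and registers those URIs with the controller.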

Neha Pawar

09/27/2022, 3:55 PM
if you’ve moved segments to gcs and provided that path as “outputDir” in your segment push job, according to the code you should not be seeing “hdfs” anymore. Are you talking about the metadata.properties inside the segment or the SegmentMetadata in zookeeper? The data fetcher only uses the one from zookeeper, which I believe should be correctly set

Tommaso Peresson

09/27/2022, 4:01 PM
Hello, I'm talking about the metadata that I can fetch from
/segments/{tableName}/{segmentName}/metadata
API, which from my understanding is stored in ZK. Correct?

Neha Pawar

09/27/2022, 4:03 PM
i believe so. but the field you want to look at is “segment.download.url”

Tommaso Peresson

09/27/2022, 4:03 PM
{
  "id": "<segment_id>",
  "simpleFields": {
    "segment.crc": "2408365506",
    "segment.creation.time": "1664198489345",
    "segment.download.url": "http://<controller-host>:9000/segments/<table_name>/<segment_id>",
    "segment.end.time": "1651536000000",
    "segment.index.version": "v3",
    "segment.push.time": "1664199765768",
    "segment.start.time": "1651536000000",
    "segment.time.unit": "MILLISECONDS",
    "segment.total.docs": "74478"
  },
  "mapFields": {
    "custom.map": {
      "input.data.file.uri": "hdfs://<path-to-segment-in-hdfs>"
    }
  },
  "listFields": {}
}
this field points to the controller directly
also, what populates the
"mapFields": {
    "custom.map": {
      "input.data.file.uri": "hdfs://<path-to-segment-in-hdfs>"
    }
  },
field?

Neha Pawar

09/27/2022, 4:05 PM
that means deep store is not set up correctly, and it is defaulting to using the controller disk. have you added the configs for gcs deep store in controller/server config? https://docs.pinot.apache.org/basics/data-import/pinot-file-system/import-from-gcp
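For reference, the linked page boils down to settings along these lines; this is a sketch with the project, key path and data dir as placeholders, so double-check the exact keys against the docs for your Pinot version:

# Controller config: use GCS as deep store and register the gs:// scheme.
controller.data.dir=gs://<bucket>/<deep-store-path>
controller.local.temp.dir=/tmp/pinot-controller-tmp
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=<gcp-project-id>
pinot.controller.storage.factory.gs.gcpKey=<path-to-service-account-json>
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcherFactory

# Server config: same plugin and fetcher so servers can pull segments straight from gs://.
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=<gcp-project-id>
pinot.server.storage.factory.gs.gcpKey=<path-to-service-account-json>
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcherFactory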

Tommaso Peresson

09/27/2022, 4:06 PM
That might be the issue. I'll work on this
Is there a way to check the current config of the instances?
Where are the controller/server properties read from? Are they fetched from the live controller through the API or from the local deployment? I'm asking because in my case the live cluster runs on k8s and the ingestion job runs from a different machine that just has the default config. If I modify the segment download URI manually to GCS I can reload them from the cluster and the config seems to be propagating correctly on the cluster.
OK, I fixed it by adding the config to the local deployment used to run the ingestion, thanks
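Once the GCS filesystem config is also present where the push job runs, the ZK segment metadata should come out with a deep-store download URL instead of a controller URL, roughly like the fragment below (illustrative placeholder values, not taken from the thread):

"simpleFields": {
  "segment.download.url": "gs://<bucket>/<path-to-segments>/<segment_id>.tar.gz"
}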