Apache Pinot

Hi Team,

I am trying to create hour-based segments in Pinot, but it is creating more than one folder under segments for the same hour. I guess this is due to some default row or data size limit. Can I modify these default configurations, and if so, how?

What is the preferable size of a data segment in Pinot? What is the philosophy here: many small files, or fewer files of a decent size?

Is there any reference on the above?

schema:

```
{
  "schemaName": "svd",
  "dimensionFieldSpecs": [
    {
      "name": "serviceId",
      "dataType": "STRING"
    },
    {
      "name": "currentCity",
      "dataType": "STRING"
    },
    {
      "name": "currentCluster",
      "dataType": "STRING"
    },
    {
      "name": "phone",
      "dataType": "STRING"
    },
    {
      "name": "epoch",
      "dataType": "LONG"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "surge",
      "dataType": "DOUBLE"
    },
    {
      "name": "subTotal",
      "dataType": "DOUBLE"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "dateString",
      "dataType": "STRING",
      "format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd-HH",
      "granularity": "1:DAYS"
    }
  ]
}
```


table config:

```
{
  "tableName": "svd",
  "ingestionConfig": {
    "transformConfigs": [
      {
        "columnName": "dateString",
        "transformFunction": "toDateTime(epoch, 'yyyy-MM-dd-HH')"
      }
    ]
  },
  "segmentsConfig": {
    "timeColumnName": "dateString",
    "timeType": "MILLISECONDS",
    "replication": "1",
    "schemaName": "svd"
  },
  "tableIndexConfig": {
    "invertedIndexColumns": ["serviceId"],
    "loadMode": "MMAP",
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "currentCity": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "routing": {
    "segmentPrunerTypes": ["partition"]
  },
  "tenants": {
    "broker": "DefaultTenant",
    "server": "DefaultTenant"
  },
  "tableType": "OFFLINE",
  "metadata": {}
}
```


Hi, you can refer to the ingestion FAQ:
<https://docs.pinot.apache.org/basics/getting-started/frequent-questions/ingestion-faq#data-processing>

For offline tables, you have to control the number of rows in each output file you produce (each file is converted to a segment later). Pinot just converts each input file into a segment, so one file equals one segment.
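Since one input file becomes one segment, one way to get a single segment per hour is to pre-split the raw data into one file per hour before running the ingestion job. Below is a minimal Python sketch of that idea, not an official Pinot tool. It assumes the rows are dicts with an `epoch` column in milliseconds (as in the schema above), that hours are bucketed in UTC, and that CSV input is acceptable; the function names and the `svd_<hour>.csv` file naming are hypothetical.

```python
import csv
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def hour_bucket(epoch_millis: int) -> str:
    """Format epoch milliseconds as a UTC 'yyyy-MM-dd-HH' style label."""
    dt = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d-%H")


def split_by_hour(rows, out_dir):
    """Write one CSV file per hour bucket and return {bucket: path}.

    Each resulting file is meant to be fed to the ingestion job as a
    separate input file, so that each hour maps to a single segment.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by the hour their 'epoch' value falls into.
    buckets = defaultdict(list)
    for row in rows:
        buckets[hour_bucket(int(row["epoch"]))].append(row)

    written = {}
    for bucket, bucket_rows in sorted(buckets.items()):
        # Hypothetical naming scheme: one file per hour.
        path = out_dir / f"svd_{bucket}.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(bucket_rows[0].keys()))
            writer.writeheader()
            writer.writerows(bucket_rows)
        written[bucket] = path
    return written
```

If a single hour holds too much data for one reasonably sized segment, the same grouping could be extended to cap the rows per file, at the cost of more than one segment for that hour.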

For realtime tables, you can check the stream ingestion configurations:
<https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion>