# troubleshooting
  • k

    Kishore G

    04/28/2020, 10:54 PM
    the parallelism is at a segment level
  • d

    Damiano

    04/28/2020, 10:55 PM
    hey, yes but when i run my query the trace has the following output
  • d

    Damiano

    04/28/2020, 10:56 PM
    "numServersQueried": 1,
    "numServersResponded": 1,
    "numSegmentsQueried": 1,
    "numSegmentsProcessed": 1,
    "numSegmentsMatched": 1,
  • d

    Damiano

    04/28/2020, 10:56 PM
    so i think all my records have been saved inside one segment only
  • k

    Kishore G

    04/28/2020, 10:56 PM
    yes
  • k

    Kishore G

    04/28/2020, 10:56 PM
    every file you ingest creates one segment
  • d

    Damiano

    04/28/2020, 10:57 PM
    ah ok, so 1 segment per file... ok so i can simply split my big csv into smaller chunks
  • d

    Damiano

    04/28/2020, 10:57 PM
    in that way pinot will do a parallel search on each segment
  • d

    Damiano

    04/28/2020, 10:57 PM
    and i think it will be faster, right?
  • k

    Kishore G

    04/28/2020, 10:57 PM
    yes
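
A minimal Python sketch of the splitting step discussed above, assuming the CSV's first row is a header that every chunk needs to repeat. The function name `split_csv`, the `rows_per_chunk` parameter, and the chunk-file naming are illustrative and not part of Pinot.

```python
import csv
from pathlib import Path


def split_csv(src: Path, out_dir: Path, rows_per_chunk: int) -> list[Path]:
    """Split src into files of at most rows_per_chunk data rows each.

    Assumes the first row of src is a header; it is copied into every chunk.
    """
    if rows_per_chunk <= 0:
        raise ValueError("rows_per_chunk must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with src.open(newline="") as fin:
        reader = csv.reader(fin)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        fout = None
        writer = None
        for i, row in enumerate(reader):
            if i % rows_per_chunk == 0:
                # Start a new chunk file every rows_per_chunk data rows.
                if fout is not None:
                    fout.close()
                path = out_dir / f"{src.stem}_{i // rows_per_chunk}.csv"
                fout = path.open("w", newline="")
                writer = csv.writer(fout)
                writer.writerow(header)
                written.append(path)
            writer.writerow(row)
        if fout is not None:
            fout.close()
    return written
```

For example, `split_csv(Path("big.csv"), Path("/tmp/pinot-quick-start/rawdata"), 1_000_000)` would write files of about one million rows each into the input directory used later in this thread (`big.csv` is a hypothetical file name). The right chunk size is a tuning choice that this sketch does not settle.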
  • d

    Damiano

    04/28/2020, 10:58 PM
    however i set REFRESH, so if i put another file i think it will override the previous one, no?
  • k

    Kishore G

    04/28/2020, 10:58 PM
    yes, the segment names need to match
  • k

    Kishore G

    04/28/2020, 10:59 PM
    typically you partition the data on some key and name them segment_0, segment_1 ... segment_N
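
A hedged Python sketch of partitioning the input on a key, as suggested above. It assumes a CSV with a header row; the names `partition_csv` and `part_{n}.csv` are hypothetical, and CRC32 is just one convenient stable hash, not something Pinot requires.

```python
import csv
import zlib
from pathlib import Path


def partition_csv(src: Path, out_dir: Path, key: str, num_partitions: int) -> list[Path]:
    """Write each row of src to part_<n>.csv, where n is derived from row[key]."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"part_{n}.csv" for n in range(num_partitions)]
    files = [p.open("w", newline="") for p in paths]
    try:
        with src.open(newline="") as fin:
            reader = csv.DictReader(fin)
            if reader.fieldnames is None:
                raise ValueError("input CSV has no header row")
            writers = [csv.DictWriter(f, fieldnames=reader.fieldnames) for f in files]
            for w in writers:
                w.writeheader()
            for row in reader:
                # crc32 gives the same value on every run, unlike built-in
                # hash() on strings, so a key always lands in the same file.
                n = zlib.crc32(row[key].encode("utf-8")) % num_partitions
                writers[n].writerow(row)
    finally:
        for f in files:
            f.close()
    return paths
```

Because the mapping from key to file is deterministic, rerunning it on updated data produces the same set of file names with the same keys in each. Whether that leads to matching segment names on a REFRESH push depends on how the ingestion job names segments, which this sketch does not control.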
  • d

    Damiano

    04/28/2020, 11:00 PM
    i added the csv doing
  • d

    Damiano

    04/28/2020, 11:00 PM
    executionFrameworkSpec:
      name: 'standalone'
      segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
      segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
      segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
    jobType: SegmentCreationAndTarPush
    inputDirURI: '/tmp/pinot-quick-start/rawdata/'
    includeFileNamePattern: 'glob:**/*.csv'
    outputDirURI: '/tmp/pinot-quick-start/segments/'
    overwriteOutput: true
    pinotFSSpecs:
      - scheme: file
        className: org.apache.pinot.spi.filesystem.LocalPinotFS
    recordReaderSpec:
      dataFormat: 'csv'
      className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
      configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
    tableSpec:
      tableName: 'test'
      schemaURI: 'http://pinot-quickstart:9000/tables/test/schema'
      tableConfigURI: 'http://pinot-quickstart:9000/tables/test'
    pinotClusterSpecs:
      - controllerURI: 'http://pinot-quickstart:9000'
  • d

    Damiano

    04/28/2020, 11:00 PM
    as described in the docs
  • d

    Damiano

    04/28/2020, 11:01 PM
    where is the segment name?
  • d

    Damiano

    04/28/2020, 11:01 PM
    i mean... if i put more .csv files inside rawdata
  • d

    Damiano

    04/28/2020, 11:01 PM
    each csv will have a different segment (it will be automatically generated)
  • d

    Damiano

    04/28/2020, 11:01 PM
    or i need to force the name somehow ?
  • x

    Xiang Fu

    04/28/2020, 11:03 PM
    it will have a segment name like table_{idx_id}
  • d

    Damiano

    04/28/2020, 11:04 PM
    ok so if i have rawdata/1.csv and rawdata/2.csv it will create the segments automatically, right? i mean, i should not change that yaml file, right?
  • d

    Damiano

    04/28/2020, 11:08 PM
    @Xiang Fu ^
  • x

    Xiang Fu

    04/28/2020, 11:09 PM
    yes
  • x

    Xiang Fu

    04/28/2020, 11:09 PM
    the layout will be /segments/test_0 /segments/test_1 …
  • d

    Damiano

    04/28/2020, 11:13 PM
    ok
  • d

    Damiano

    04/28/2020, 11:19 PM
    ok, i'll do the same test but splitting my giant csv file into smaller chunks
  • d

    Damiano

    04/28/2020, 11:24 PM
    @Xiang Fu 3x faster! now there are 38 segments, before only 1... for ~37M documents
  • d

    Damiano

    04/28/2020, 11:24 PM
    in the trace i also see
  • d

    Damiano

    04/28/2020, 11:24 PM
    "numDocsScanned": 37991563,
    "numEntriesScannedInFilter": 0,
    "numEntriesScannedPostFilter": 75983126,
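
A quick arithmetic check on the trace figures above, written as a small Python sketch. The numbers are copied from the thread; the helper names are illustrative. Reading the ratio as "column values read per matched document" is an interpretation of the counters, not something the trace itself states.

```python
# Figures copied from the trace in this thread.
stats = {
    "numDocsScanned": 37991563,
    "numEntriesScannedInFilter": 0,
    "numEntriesScannedPostFilter": 75983126,
}
NUM_SEGMENTS = 38  # segment count reported after splitting the CSV


def entries_per_doc(s: dict) -> float:
    """Average number of post-filter entries read per scanned document."""
    return s["numEntriesScannedPostFilter"] / s["numDocsScanned"]


def docs_per_segment(s: dict, num_segments: int) -> float:
    """Average scanned documents per segment, assuming an even spread."""
    return s["numDocsScanned"] / num_segments


print(entries_per_doc(stats))                        # 2.0
print(round(docs_per_segment(stats, NUM_SEGMENTS)))  # 999778
```

The ratio of exactly 2 suggests the query reads two values for each scanned document, and the zero in `numEntriesScannedInFilter` suggests no entries had to be scanned to evaluate a filter. With 38 segments, each one holds about a million documents on average, which is the unit of work that can run in parallel.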