# Troubleshooting
a
The segments are not being copied to S3. I get this message:
```
Moved segment airlineStats_batch_2014-01-01_2014-01-01 from temp location /tmp/pinot-tmp-data/fileUploadTemp/tmp-66a57920-be10-41a0-a5e3-3f752b660d7c to /var/pinot/controller/data,s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data/airlineStats/airlineStats_batch_2014-01-01_2014-01-01
```
l
Are you getting any errors in the logs?
a
They are actually being copied to a location “/var/pinot/controller/data\,s3\:/ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data/” instead of to S3.
No errors.
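The destination in that log line is being treated as one literal local path: “/var/pinot/controller/data,s3://…”. That pattern suggests two controller.data.dir values were concatenated with a comma rather than one of them taking effect. A sketch of the suspected effective property versus the single S3 URI the controller needs:

```
# Suspected effective value, judging by the log line above
# (two settings joined into one literal path):
controller.data.dir=/var/pinot/controller/data,s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data

# What a single S3 deep-store data dir would look like:
controller.data.dir=s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data
```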
l
In this case you are configuring the controller to upload to S3, yeah?
a
```
controller:
  name: controller
  replicaCount: 1
  podManagementPolicy: Parallel
  podSecurityContext: {}
  # fsGroup: 2000
  securityContext: {}

  probes:
    endpoint: "/health"
    livenessEnabled: false
    readinessEnabled: false

  persistence:
    enabled: true
    accessMode: ReadWriteOnce
    size: 1G
    #mountPath: s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data/mount
    storageClass: ""

#  data:
#    dir: s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data

  vip:
    enabled: false
    host: pinot-controller
    port: 9000

  jvmOpts: "-Xms256M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-controller.log"

  log4j2ConfFile: /opt/pinot/conf/log4j2.xml
  pluginsDir: /opt/pinot/plugins

  service:
    annotations: {}
    clusterIP: "None"
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    type: ClusterIP
    port: 9000
    nodePort: ""
    protocol: TCP
    gcpInternalLB: false
    name: controller

  external:
    enabled: true
    type: LoadBalancer
    port: 9000
    gcpInternalLB: false

  resources: {}

  nodeSelector: {}

  tolerations: []

  affinity: {}

  podAnnotations: {}

  updateStrategy:
    type: RollingUpdate

  # Use envFrom to define all of the ConfigMap or Secret data as container environment variables.
  # ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-pod-configmap/#configure-all-key-value-pairs-in-a-configmap-as-container-environment-variables
  # ref: https://kubernetes.io/docs/tasks/inject-data-application/distribute-credentials-secure/#configure-all-key-value-pairs-in-a-secret-as-container-environment-variables
  envFrom: []
  #  - configMapRef:
  #      name: special-config
  #  - secretRef:
  #      name: test-secret

  # Use extraEnv to add individual key value pairs as container environment variables.
  # ref: https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
  extraEnv: []
  #  - name: PINOT_CUSTOM_ENV
  #    value: custom-value

  # Extra configs will be appended to pinot-controller.conf file
  extra:
    configs: |-
      pinot.set.instance.id.to.hostname=true
      controller.task.scheduler.enabled=true
      controller.task.frequencyInSeconds=3600
      controller.data.dir=s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data
      controller.persistence.mountPath=s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data/mount
      controller.local.temp.dir=/tmp/pinot-tmp-data/
      pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
      pinot.controller.storage.factory.s3.region=us-east-1
      controller.enable.split.commit=true
      pinot.controller.segment.fetcher.protocols=s3
      pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
      pinot.controller.storage.factory.s3.disableAcl=false
```
Yes, this is the config.
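For comparison, here is a minimal sketch of the controller-side properties the Pinot documentation uses for S3 as deep storage, with the bucket path taken from the config above. One difference: the docs list file and http alongside s3 in the fetcher protocols, while the config above registers only s3.

```
# Minimal controller S3 deep-store setup (sketch, following the Pinot S3 docs):
controller.data.dir=s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data
controller.local.temp.dir=/tmp/pinot-tmp-data/
controller.enable.split.commit=true
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-east-1
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```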
Server config:
```
  # Extra configs will be appended to pinot-server.conf file
  extra:
    configs: |-
      pinot.set.instance.id.to.hostname=true
      pinot.server.instance.realtime.alloc.offheap=true
      pinot.server.instance.currentDataTableVersion=2
      pinot.server.instance.dataDir=/tmp/pinot-tmp/server/index
      pinot.server.instance.segmentTarDir=/tmp/pinot-tmp/server/segmentTars
      pinot.server.instance.enable.split.commit=true
      pinot.server.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
      pinot.server.storage.factory.s3.region=us-east-1
      pinot.server.segment.fetcher.protocols=s3
      pinot.server.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
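One gap worth noting: the GCS server config shared later in this thread also sets pinot.server.instance.segment.store.uri, which this S3 config omits. That property matters for realtime segment completion rather than the batch push being debugged here, but the analogous line would look like this (the bucket path is assumed to match the controller's):

```
# Assumed counterpart of the GCS config's segment.store.uri; used by realtime
# split commit to locate the deep store, not by the offline batch push:
pinot.server.instance.segment.store.uri=s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data
```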
l
```
controller.helix.cluster.name=pinot
controller.port=9000
controller.data.dir=gs://pinot-data
controller.zk.str=pinot-zookeeper:2181
pinot.set.instance.id.to.hostname=true
controller.task.scheduler.enabled=true
controller.local.temp.dir=/var/pinot/controller/data
controller.allow.hlc.tables=false
controller.enable.split.commit=true
pinot.controller.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.controller.storage.factory.gs.projectId=sandbox
pinot.controller.storage.factory.gs.gcpKey=pinot-gcp-dev-cred.json
pinot.controller.segment.fetcher.protocols=file,http,gs
pinot.controller.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```
This is what I have for the controller; it's GCS, but it should be the same thing overall.
How often do you flush segments?
a
This was actually a batch ingestion job … the airline stats example,
which did not push to S3.
Not part of a realtime table, but an offline table.
Can you share your server config?
l
```
pinot.server.netty.port=8098
pinot.server.adminapi.port=8097
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segment
pinot.set.instance.id.to.hostname=true
pinot.server.instance.realtime.alloc.offheap=true
pinot.server.instance.currentDataTableVersion=2
pinot.server.storage.factory.class.gs=org.apache.pinot.plugin.filesystem.GcsPinotFS
pinot.server.storage.factory.gs.projectId=sandbox
pinot.server.storage.factory.gs.gcpKey=pinot-gcp-dev-cred.json
pinot.server.segment.fetcher.protocols=file,http,gs
pinot.server.segment.fetcher.gs.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
pinot.server.instance.enable.split.commit=true
pinot.server.instance.segment.store.uri=gs://pinot-data
```
This is what I have for the server,
but I'm working with a realtime table; it does upload to GCS.
a
Hmm, did you try the airline stats example?
l
I didn't try that; I just did it with custom data.
a
OK, but you tried batch ingestion; that's what I was trying to imply.
l
Ohh lol, yeah I haven't, just realtime stuff at the moment.
But based on the architecture diagram, it seems the job itself is the one that should upload to the segment store,
and then have the server pull from S3.
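That is the standalone batch ingestion flow: the job spec registers the filesystem plugin, writes the built segments to the deep store, and the push step hands the controller a URI to the segment. A sketch of the relevant job-spec fields for S3, where the outputDirURI path is an assumed example:

```
# Sketch of the S3-related pieces of a batch ingestion job spec;
# outputDirURI below is an assumed example path.
jobType: SegmentCreationAndUriPush
outputDirURI: 's3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/airlineStats/segments'
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'
```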
a
Ohh, I will try a realtime table and see if that goes through.
n
It should work for batch too. The configs look correct to me. @Xiang Fu ^ any idea?
a
No, we found the issue: there are two configs which are causing the problem. We are trying to fix this.
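A plausible reading of the “two configs”, given the comma-joined path in the log at the top of the thread: the Helm chart's persistence settings already generate a local controller.data.dir, and the extra.configs block appends a second one, leaving both in the generated pinot-controller.conf:

```
# Hypothesis: duplicate keys in the generated pinot-controller.conf,
# one from the chart's persistence settings, one from extra.configs:
controller.data.dir=/var/pinot/controller/data
controller.data.dir=s3://ca-ctct-transient-dev-us-east-1-eigi-datalake/pinot-data/pinot-s3-example/controller-data
# Keeping only the s3:// value (and dropping controller.persistence.mountPath,
# which looks like a Helm values key rather than a controller property)
# should send segments to S3.
```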
x
Is this pinot-gcp-dev-cred.json file located in the correct place?
You can run:
```
kubectl exec -it pod/pinot-server-0 -n pinot -- bash
```
to enter the server container.
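Once inside, one quick check is whether the credential file the config references actually exists. The gcpKey value above is a relative path, so it resolves against the Pinot process's working directory; /opt/pinot is an assumed location here.

```
# Inside the server container: look for the key file the config points at.
# /opt/pinot is an assumption; adjust to the image's actual layout.
ls -l /opt/pinot/pinot-gcp-dev-cred.json
```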
l
Ours works properly, I think; @Abhijeet Kushe is the one having problems.