Chetan Anand
04/06/2022, 8:25 AM
Can the outputDirURI in the ingestionJobSpec be the same as the controller.data.dir? We are assuming yes. Config files, table config, and job spec attached.
André Siefken
04/06/2022, 8:58 AM
We run SegmentGenerationAndPushTask via the Minions, and all jobs keep failing with:
"INFO": "java.lang.RuntimeException: Failed to execute SegmentGenerationAndPushTask"
Can you help me find a means to trace and debug the issue? AFAIK it is not easy to monitor Minion tasks, but do you know of any hints as to where it may fail: at the config level and/or while parsing the files (although each individually seems to be valid)?
Aparna Razdan
04/06/2022, 11:28 AM
{
"name": "sql_date_entered",
"dataType": "INT",
"format": "1:DAYS:EPOCH",
"granularity": "1:DAYS"
},
{
"name": "sql_date_entered_str",
"dataType": "STRING",
"format": "1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd",
"granularity": "1:DAYS"
}
Another way to handle this is using query transformations:
select sql_date_entered,
DATETIMECONVERT(dateTrunc('week', sql_date_entered, 'DAYS'), '1:DAYS:EPOCH', '1:DAYS:SIMPLE_DATE_FORMAT:yyyy-MM-dd', '1:DAYS') as week
from kepler_product_pipegen
Is there any way that I can load the date in 'yyyy-MM-dd' format and still run transformations like dateTrunc on top of it?
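For reference, what dateTrunc('week', …) computes over an epoch-days column can be sketched outside Pinot. This is a rough Python emulation, not Pinot code; it assumes ISO weeks starting on Monday, and the sample date is illustrative:

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def to_epoch_days(date_str):
    # parse a 'yyyy-MM-dd' string into days since the Unix epoch
    return (datetime.strptime(date_str, "%Y-%m-%d") - EPOCH).days

def trunc_to_week(epoch_days):
    # truncate to the Monday of that week (ISO week semantics)
    day = EPOCH + timedelta(days=epoch_days)
    monday = day - timedelta(days=day.weekday())
    return (monday - EPOCH).days

def to_date_str(epoch_days):
    # render epoch days back as 'yyyy-MM-dd'
    return (EPOCH + timedelta(days=epoch_days)).strftime("%Y-%m-%d")

# a Wednesday truncates to the preceding Monday
print(to_date_str(trunc_to_week(to_epoch_days("2022-04-06"))))  # 2022-04-04
```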
Pinot version = 0.7.1
Ali Atıl
04/06/2022, 1:41 PM
select fieldA, fieldB, fieldC, messageTime from mytable where messageTime >= 1648813002065 and messageTime <= 1649245002065 order by messageTime limit 3000000, 1000000 option(timeoutMs=120000)
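As a side note, the two epoch-millisecond bounds in the query above can be decoded to see the window being paged over; a small Python sketch (rendering in UTC is my assumption about the intended timezone):

```python
from datetime import datetime, timezone

def ms_to_utc(ms):
    # render epoch milliseconds as a UTC timestamp string
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")

# the bounds used in the messageTime filter above: a five-day window
print(ms_to_utc(1648813002065))  # 2022-04-01 11:36:42
print(ms_to_utc(1649245002065))  # 2022-04-06 11:36:42
```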
Grace Lu
04/06/2022, 7:34 PM
Luis Fernandez
04/06/2022, 9:03 PM
SELECT product_id, COUNT(*) as views
FROM table
ORDER BY views, product_id DESC
in a regular RDBMS I would expect it to order by views first and then by product_id, so something like
views, product_id
3 3
3 2
3 1
but Pinot does something completely different; it is doing
views, product_id
1 10
1 9
1 8
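For reference, in standard SQL `ORDER BY views, product_id DESC` sorts views ascending (the default direction) and applies DESC only to product_id; a small Python sketch of that two-key ordering, using hypothetical rows:

```python
# hypothetical (views, product_id) rows
rows = [(1, 8), (3, 1), (1, 9), (3, 3), (1, 10), (3, 2)]

# views ascending (the SQL default), product_id descending
ordered = sorted(rows, key=lambda r: (r[0], -r[1]))
print(ordered)  # [(1, 10), (1, 9), (1, 8), (3, 3), (3, 2), (3, 1)]
```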
anyone have an idea why this may be?
Ali Atıl
04/07/2022, 6:39 AM
[Consumer clientId=consumer-null-808, groupId=null] Seeking to offset 59190962 for partition mytopic-0
Ali Atıl
04/07/2022, 8:45 AM
Fizza Abid
04/07/2022, 12:07 PM
Luis Fernandez
04/07/2022, 8:25 PM
Diogo Baeder
04/07/2022, 9:13 PM
Alice
04/08/2022, 1:43 AM
Mayank
francoisa
04/11/2022, 8:20 AM
java.lang.RuntimeException: Exception getting FSM for segment candidates__0__0__20220408T1410Z
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.lookupOrCreateFsm(SegmentCompletionManager.java:175) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.controller.helix.core.realtime.SegmentCompletionManager.segmentConsumed(SegmentCompletionManager.java:202) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.apache.pinot.controller.api.resources.LLCSegmentCompletionHandlers.segmentConsumed(LLCSegmentCompletionHandlers.java:144) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at jdk.internal.reflect.GeneratedMethodAccessor128.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:469) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:391) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:80) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:253) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:232) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:679) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:353) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:200) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:569) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:549) [pinot-all-0.10.0-jar-with-dependencies.jar:0.10.0-30c4635bfeee88f88aa9c9f63b93bcd4a650607f]
at java.lang.Thread.run(Thread.java:829) [?:?]
My controller looks for a non-existing segment 😕 Is there an API way to tell it the segment does not exist anymore? 😄 The table is still accessible and everything is OK, but I found it weird.
Alice
04/11/2022, 11:42 AM
Eduardo Cusa
04/11/2022, 12:59 PM
Caught exception in state transition from OFFLINE -> ONLINE for resource: adv1_OFFLINE, partition: adv1_OFFLINE_2022-03-01_2022-03-01_0
Could this be related to the data itself, or to something like OOMs/resources as Mayank mentioned in the thread? Any suggestions on how to debug it?
Thanks
Shailesh Jha
04/11/2022, 1:05 PM
erik bergsten
04/11/2022, 2:19 PM
{
"tableName": "environment",
"tableType": "OFFLINE",
"tenants": {
"broker": "DefaultTenant",
"server": "DefaultTenant"
},
"segmentsConfig": {
"schemaName": "environment",
"timeColumnName": "ts",
"replication": "1",
"replicasPerPartition": "1",
"retentionTimeUnit": null,
"retentionTimeValue": null,
"segmentPushType": "APPEND",
"segmentPushFrequency": "DAILY",
"crypterClassName": null,
"peerSegmentDownloadScheme": null
},
"tableIndexConfig": {
"loadMode": "MMAP",
"invertedIndexColumns": [],
"createInvertedIndexDuringSegmentGeneration": false,
"rangeIndexColumns": [],
"sortedColumn": [],
"bloomFilterColumns": [],
"bloomFilterConfigs": null,
"noDictionaryColumns": [],
"onHeapDictionaryColumns": [],
"varLengthDictionaryColumns": [],
"enableDefaultStarTree": false,
"starTreeIndexConfigs": null,
"enableDynamicStarTreeCreation": false,
"segmentPartitionConfig": null,
"columnMinMaxValueGeneratorMode": null,
"nullHandlingEnabled": false
},
"metadata": {},
"ingestionConfig": {
"filterConfig": null,
"transformConfigs": [
{
"columnName": "ts",
"transformFunction": "FromDateTime(\"DepartureDate\", 'yyyy-MM-dd''T''HH:mm:ss.SSSZ')"
}
]
},
"quota": {
"storage": null,
"maxQueriesPerSecond": null
},
"task": null,
"routing": {
"segmentPrunerTypes": null,
"instanceSelectorType": null
},
"instanceAssignmentConfigMap": null,
"query": {
"timeoutMs": null
},
"fieldConfigList": null,
"upsertConfig": null,
"tierConfigs": [
{
"name": "tierA",
"segmentSelectorType": "time",
"segmentAge": "5m",
"storageType": "pinot_server",
"serverTag": "DefaultTenant_a_OFFLINE"
}
]
}
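For what it's worth, the FromDateTime transform in the config above turns DepartureDate strings into epoch milliseconds for the ts time column. A rough Python equivalent (the sample value is hypothetical, and the pattern is translated from Joda syntax to strptime):

```python
from datetime import datetime

def from_date_time(value):
    # parse a "yyyy-MM-dd'T'HH:mm:ss.SSSZ" string (Joda syntax) into epoch
    # milliseconds, roughly what the FromDateTime transform produces for ts
    return int(datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z").timestamp() * 1000)

# hypothetical sample value
print(from_date_time("2022-03-01T00:00:00.000+0000"))  # 1646092800000
```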
We expected to see segments moved from DefaultTenant_OFFLINE to the other server 5 mins after ingestion (using batch ingestion) but nothing seems to happen. Is there anything obviously wrong in the config?
How should we pursue solving this problem? We cannot find any errors or interesting messages in any log.
Luis Fernandez
04/11/2022, 7:19 PM
For the health endpoint, what do you check for the livenessProbe there?
Grace Lu
04/11/2022, 7:37 PM
yelim yu
04/12/2022, 1:26 AM
Sumit Lakra
04/12/2022, 11:29 AM
francoisa
04/12/2022, 12:08 PM
Saumya Upadhyay
04/13/2022, 4:24 AM
Kevin Liu
04/13/2022, 5:55 AM
erik bergsten
04/13/2022, 12:47 PM
Caused by: java.nio.file.FileSystemException: /var/pinot/server/data/index/environment_OFFLINE/environment_OFFLINE_1618208070664_1649743939567_7/v3/.nfs000000000134004000000058: Device or resource busy
in the logs from server-b. Could this be a problem with how the server is implemented, or is it strictly an NFS problem on our end? The end result is that some or all segments go into an error state and the data goes missing during a rebalance.
Nikhil
04/14/2022, 8:34 PM
sudo spark-submit --class org.apache.pinot.tools.admin.command.LaunchDataIngestionJobCommand --master local --deploy-mode client --conf spark.local.dir=/mnt --conf "spark.driver.extraJavaOptions=-Dplugins.dir=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins -Dplugins.include=pinot-s3,pinot-parquet -Dlog4j2.configurationFile=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/conf/pinot-ingestion-job-log4j2.xml" --conf "spark.driver.extraClassPath=/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-batch-ingestion/pinot-batch-ingestion-spark/pinot-batch-ingestion-spark-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar:/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar" /mnt/pinot/apache-pinot-incubating-0.7.1-bin/lib/pinot-all-0.7.1-jar-with-dependencies.jar -jobSpecFile /mnt/pinot/spark_job_spec_v8.yaml
the ingestion spec used is this:
executionFrameworkSpec:
  name: 'spark'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.spark.SparkSegmentGenerationJobRunner'
  extraConfigs:
    stagingDir: 's3://nikhil-dw-dev/pinot/staging/'
    dependencyJarDir: 's3://nikhil-dw-dev/pinot/apache-pinot-incubating-0.7.1-bin/plugins'
jobType: SegmentCreation
inputDirURI: 's3://nikhil-dw-dev/pinot/pinot_input/'
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://nikhil-dw-dev/pinot/pinot_output3/'
overwriteOutput: true
pinotFSSpecs:
  - className: org.apache.pinot.plugin.filesystem.S3PinotFS
    scheme: s3
    configs:
      region: us-east-1
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'students'
  schemaURI: 's3://nikhil-dw-dev/pinot/students_schema.json'
  tableConfigURI: 's3://nikhil-dw-dev/pinot/students_table.json'
But when running this in cluster mode, I get the class-not-found issue. The plugins.dir is available on all the EMR nodes, and we can see that the plugins are getting successfully loaded. I have tried passing the s3 location as well as the /mnt path, and both fail with the same error. I looked at these two previous posts [1] and [2] and they did not help in resolving it.
Here is the error
22/04/14 07:06:44 INFO PluginManager: Plugins root dir is [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugins: [[pinot-s3, pinot-parquet]]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-s3] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-s3] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3/pinot-s3-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-s3] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-file-system/pinot-s3]
22/04/14 07:06:44 INFO PluginManager: Trying to load plugin [pinot-parquet] from location [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:44 INFO PluginManager: Successfully loaded plugin [pinot-parquet] from jar file [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet/pinot-parquet-0.7.1-shaded.jar]
22/04/14 07:06:44 INFO PluginManager: Successfully Loaded plugin [pinot-parquet] from dir [/mnt/pinot/apache-pinot-incubating-0.7.1-bin/plugins/pinot-input-format/pinot-parquet]
22/04/14 07:06:45 ERROR LaunchDataIngestionJobCommand: Got exception to generate IngestionJobSpec for data ingestion job -
Can't construct a java object for tag:yaml.org,2002:org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec; exception=Class not found: org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
in 'string', line 1, column 1:
executionFrameworkSpec:
^
Will thread the different commands used to submit this job.
Thank you for your help 🙇
Alice
04/15/2022, 12:31 PM
Diogo Baeder
04/15/2022, 9:51 PM
Diogo Baeder
04/16/2022, 10:12 PM