# troubleshooting
j
Hello guys! I'm having some trouble running a hybrid table, could someone help me please? I'm receiving these warnings in the Broker when pushing offline segments to Pinot:
[BaseBrokerRequestHandler] [jersey-server-managed-async-executor-1] Failed to find time boundary info for hybrid table: transaction
When I try to run a query, I get a timeout. Server log:
Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='transaction_REALTIME', _selectExpressions=[count(*)], _aliasMap={}, _filter=transactionDate > '1606971455132', _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:transaction_REALTIME), filterQuery:FilterQuery(id:0, column:transactionDate, value:[(1606971455132		*)], operator:RANGE, nestedFilterQueryIds:[]), aggregationsInfo:[AggregationInfo(aggregationType:COUNT, aggregationParams:{column=*}, isInSelectList:true, expressions:[*])], filterSubQueryMap:FilterQueryMap(filterQueryMap:{0=FilterQuery(id:0, column:transactionDate, value:[(1606971455132		*)], operator:RANGE, nestedFilterQueryIds:[])}), queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:transaction_REALTIME), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:COUNT, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:*))]))], filterExpression:Expression(type:FUNCTION, functionCall:Function(operator:GREATER_THAN, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:transactionDate)), Expression(type:LITERAL, literal:<Literal longValue:1606971455132>)]))), limit:10)})
If I try to use Tracing, I get an NPE in the offline servers:
ERROR [QueryScheduler] [pqr-0] Encountered exception while processing requestId 83 from broker Broker_pinot-broker-0.pinot-broker-headless.pinot.svc.cluster.local_8099
java.lang.NullPointerException: null
	at org.apache.pinot.core.util.trace.TraceContext.getTraceInfo(TraceContext.java:188) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:235) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.query.executor.QueryExecutor.processQuery(QueryExecutor.java:60) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.query.scheduler.QueryScheduler.processQueryAndSerialize(QueryScheduler.java:155) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
I'm running Pinot 0.6.0 btw.
If I delete the offline segment, everything works fine.
m
You may want to check if you have the time column set up correctly
j
Offline table config:
{
  "tableName": "transaction",
  "tableType": "OFFLINE",
  "tenants": {
    "broker": "fraud",
    "server": "fraud"
  },
  "segmentsConfig": {
    "schemaName": "transaction",
    "timeColumnName": "transactionDate",
    "timeType": "MILLISECONDS",
    "replication": "2",
    "segmentPushType": "APPEND",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "365"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "invertedIndexColumns": [
      "customerUuid"
    ],
    "noDictionaryColumns": [
      "totalValue"
    ],
    "sortedColumn": [
      "customerUuid"
    ],
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "customerUuid": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "metadata": {},
  "routing": {
    "segmentPrunerTypes": [
      "partition"
    ]
  }
}
Realtime table config:
{
  "tableName": "transaction",
  "tableType": "REALTIME",
  "tenants": {
    "broker": "fraud",
    "server": "fraud",
    "tagOverrideConfig": {
      "realtimeConsuming": "fraud_REALTIME",
      "realtimeCompleted": "fraud_OFFLINE"
    }
  },
  "segmentsConfig": {
    "schemaName": "transaction",
    "timeColumnName": "transactionDate",
    "timeType": "MILLISECONDS",
    "replicasPerPartition": "2",
    "segmentPushType": "APPEND",
    "segmentPushFrequency": "DAILY",
    "retentionTimeUnit": "DAYS",
    "retentionTimeValue": "365",
    "completionConfig": {
      "completionMode": "DOWNLOAD"
    },
    "peerSegmentDownloadScheme": "http"
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "LowLevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
      "realtime.segment.flush.threshold.rows": "0",
      "realtime.segment.flush.threshold.time": "4h",
      "realtime.segment.flush.segment.size": "10M"
    },
    "invertedIndexColumns": [
      "customerUuid"
    ],
    "noDictionaryColumns": [
      "totalValue"
    ],
    "sortedColumn": [
      "customerUuid"
    ],
    "aggregateMetrics": true,
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "customerUuid": {
          "functionName": "Murmur",
          "numPartitions": 4
        }
      }
    }
  },
  "metadata": {},
  "routing": {
    "segmentPrunerTypes": [
      "partition"
    ]
  }
}
Schema:
{
  "schemaName": "transaction",
  "dimensionFieldSpecs": [
    {
      "name": "customerUuid",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "totalValue",
      "dataType": "DOUBLE"
    }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "transactionDate",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
Omitted some things for security reasons 🙂
I can't see anything wrong here. Maybe I'm missing something 😞
m
Do you see any error messages in the broker for setting time boundary? Like:
Failed to find segment with valid end time for table: {}, no time boundary generated
j
Yes:
[TimeBoundaryManager] [ClusterChangeHandlingThread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated
I have these logs at the controller:
2020/12/04 16:43:50.103 WARN [ZkBaseDataAccessor] [HelixController-pipeline-default-pinot-(afa2c547_DEFAULT)] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96000a/transaction_OFFLINE=-101}
2020/12/04 16:43:50.104 WARN [AbstractDataCache] [HelixController-pipeline-default-pinot-(afa2c547_DEFAULT)] znode is null for key: /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96000a/transaction_OFFLINE
2020/12/04 16:55:24.471 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-(cfba642e_DEFAULT)] Event cfba642e_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
m
can you check select max(time)?
j
sometimes I get an exception while running this query, and sometimes it returns a result:
1607100225939
ERROR [ServerQueryExecutorV1Impl] [pqr-1] Exception processing requestId 179
java.lang.RuntimeException: Caught exception while running CombinePlanNode.
	at org.apache.pinot.core.plan.CombinePlanNode.run(CombinePlanNode.java:151) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.plan.InstanceResponsePlanNode.run(InstanceResponsePlanNode.java:33) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.plan.GlobalPlanImplV0.execute(GlobalPlanImplV0.java:45) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
	at org.apache.pinot.core.query.executor.ServerQueryExecutorV1Impl.processQuery(ServerQueryExecutorV1Impl.java:294) ~[pinot-all-0.6.0-jar-with-dependencies.jar:0.6.0-bb646baceafcd9b849a1ecdec7a11203c7027e21]
m
What is the exact query you are running? Do you have both offline and realtime tables right now?
j
if I try select max(transactionDate) from transaction_OFFLINE, it just times out
What is the exact query you are running? Do you have both offline and realtime tables right now?
select max(transactionDate) from transaction
m
Hmm, select max(transactionDate) from transaction_OFFLINE should be really fast as it only looks at metadata. As in, max without a predicate from the _OFFLINE/_REALTIME table.
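For context, a max aggregation with no filter can be answered from per-segment column metadata (the column.<name>.minValue/maxValue entries in metadata.properties) instead of scanning rows. A minimal Python sketch of the idea, with a hypothetical SegmentMetadata record standing in for Pinot's internals:

from dataclasses import dataclass
from typing import List

@dataclass
class SegmentMetadata:
    # Stand-in for the column.<name>.minValue / maxValue entries stored in
    # each segment's metadata.properties (hypothetical type, not Pinot's API).
    name: str
    min_value: int
    max_value: int

def metadata_only_max(segments: List[SegmentMetadata]) -> int:
    # No document scan: take the max over the per-segment max values.
    return max(s.max_value for s in segments)

segments = [SegmentMetadata(
    "transaction_OFFLINE_1607011097024_1607097496995_0",
    1607011097024, 1607097496995)]
print(metadata_only_max(segments))  # 1607097496995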
j
Right, maybe some issues with ZooKeeper?
m
NO, it is segment metadata
Check if the offline servers got the query
The only thing that stands out is that you have MILLIS as time unit. Initially we did not support MILLIS for hybrid tables, but IIRC it was added a while back
j
the offline servers just time out:
ERROR [BaseCombineOperator] [pqr-0] Timed out while polling results block, numBlocksMerged: 0 (query: QueryContext{_tableName='transaction_OFFLINE', _selectExpressions=[max(transactionDate)], _aliasMap={}, _filter=null, _groupByExpressions=null, _havingFilter=null, _orderByExpressions=null, _limit=10, _offset=0, _queryOptions={responseFormat=sql, groupByMode=sql, timeoutMs=9999}, _debugOptions=null, _brokerRequest=BrokerRequest(querySource:QuerySource(tableName:transaction_OFFLINE), aggregationsInfo:[AggregationInfo(aggregationType:MAX, aggregationParams:{column=transactionDate}, isInSelectList:true, expressions:[transactionDate])], queryOptions:{responseFormat=sql, groupByMode=sql, timeoutMs=9999}, pinotQuery:PinotQuery(dataSource:DataSource(tableName:transaction_OFFLINE), selectList:[Expression(type:FUNCTION, functionCall:Function(operator:MAX, operands:[Expression(type:IDENTIFIER, identifier:Identifier(name:transactionDate))]))]), limit:10)})
m
Is that from a query to the _OFFLINE table?
j
yes
m
Hmm, that really does not make sense. It should just loop over segment metadata and find max. How many segments do you have?
j
just one
I just ran a job for it
m
how big is the segment?
j
221 MB
it has 4M docs in it
m
That seems fine
Do you really need a MILLIS timestamp?
You are getting "Failed to find segment with valid end time for table: {}, no time boundary generated" because the query to get max time from offline is timing out
And due to that, you don't have a time boundary
Can you delete and re-create the offline table?
j
nope, I can change it to SECONDS
m
And check if you can get max time from offline table
Let's do the delete -> recreate of offline table first
j
ok!
m
before changing the time unit
j
ok, should I recreate the segment already?
m
can you paste the segment metadata here?
j
right, one sec
m
the metadata.properties file in the segment folder
j
hmm, the segment folder is empty 😧
m
I thought you said it is 221MB?
j
yes, it is
but there are no segments in the segment folder
m
By segment folder I mean the segment itself
The untarred file
j
oh, right
segment.padding.character = \u0000
segment.name = transaction_OFFLINE_1607011097024_1607097496995_0
segment.table.name = transaction_OFFLINE
segment.dimension.column.names = customerUuid,transactionId
segment.datetime.column.names = transactionDate
segment.time.column.name = transactionDate
segment.total.docs = 4198229
segment.start.time = 1607011097024
segment.end.time = 1607097496995
segment.time.unit = MILLISECONDS
column.customerUuid.cardinality = 1523662
column.customerUuid.totalDocs = 4198229
column.customerUuid.dataType = STRING
column.customerUuid.bitsPerElement = 21
column.customerUuid.lengthOfEachEntry = 36
column.customerUuid.columnType = DIMENSION
column.customerUuid.isSorted = false
column.customerUuid.hasNullValue = false
column.customerUuid.hasDictionary = true
column.customerUuid.textIndexType = NONE
column.customerUuid.hasInvertedIndex = true
column.customerUuid.isSingleValues = true
column.customerUuid.maxNumberOfMultiValues = 0
column.customerUuid.totalNumberOfEntries = 4198229
column.customerUuid.isAutoGenerated = false
column.customerUuid.partitionFunction = Murmur
column.customerUuid.numPartitions = 20
column.customerUuid.partitionValues = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
column.customerUuid.minValue = 0000167e-d426-4686-b315-31b756daace3
column.customerUuid.maxValue = fffffca6-52a8-4324-bd1c-c26ab3511720
column.customerUuid.defaultNullValue = null
column.transactionDate.cardinality = 3769234
column.transactionDate.totalDocs = 4198229
column.transactionDate.dataType = LONG
column.transactionDate.bitsPerElement = 22
column.transactionDate.lengthOfEachEntry = 0
column.transactionDate.columnType = DATE_TIME
column.transactionDate.isSorted = true
column.transactionDate.hasNullValue = false
column.transactionDate.hasDictionary = true
column.transactionDate.textIndexType = NONE
column.transactionDate.hasInvertedIndex = true
column.transactionDate.isSingleValues = true
column.transactionDate.maxNumberOfMultiValues = 0
column.transactionDate.totalNumberOfEntries = 4198229
column.transactionDate.isAutoGenerated = false
column.transactionDate.datetimeFormat = 1:MILLISECONDS:EPOCH
column.transactionDate.datetimeGranularity = 1:MILLISECONDS
column.transactionDate.minValue = 1607011097024
column.transactionDate.maxValue = 1607097496995
column.transactionDate.defaultNullValue = -9223372036854775808
column.transactionId.cardinality = 4198229
column.transactionId.totalDocs = 4198229
column.transactionId.dataType = LONG
column.transactionId.bitsPerElement = 23
column.transactionId.lengthOfEachEntry = 0
column.transactionId.columnType = DIMENSION
column.transactionId.isSorted = false
column.transactionId.hasNullValue = false
column.transactionId.hasDictionary = true
column.transactionId.textIndexType = NONE
column.transactionId.hasInvertedIndex = true
column.transactionId.isSingleValues = true
column.transactionId.maxNumberOfMultiValues = 0
column.transactionId.totalNumberOfEntries = 4198229
column.transactionId.isAutoGenerated = false
column.transactionId.minValue = 1956565750
column.transactionId.maxValue = 1960763978
column.transactionId.defaultNullValue = -9223372036854775808
segment.index.version = v3
m
column.transactionDate.minValue = 1607011097024
column.transactionDate.maxValue = 1607097496995
Seems the min/max values of time are set correctly
What folder are you referring to that is empty?
can you push the segment and see if it is ONLINE in external view?
j
the folder inside the server
1 sec
"OFFLINE": {
    "transaction_OFFLINE_1607011097024_1607097496995_0": {
      "Server_pinot-server-0.pinot-server-headless.pinot.svc.cluster.local_8098": "ONLINE",
      "Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098": "ONLINE"
    }
  }
m
Have you deleted->recreated already?
j
yes
m
can you get max time from offline table now?
j
still no 😞
m
can you run any query in offline?
what about count(*)?
j
min(transactionDate) works, select * works
count(*) doesn't work, max(transactionDate) doesn't work
m
Hmm, that really does not make sense
does min return 1607011097024?
j
yes
m
did you say max works sometimes?
or it never works?
Are your JVM settings correct?
j
it works sometimes when querying both realtime and offline tables
m
For offline only, does it ever work?
j
nope
jvmOpts: "-Xms512M -Xmx1G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:/opt/pinot/gc-pinot-server.log"
I can increase these parameters
m
can you change Xms and Xmx both to 4G?
j
yes
m
I don't think it matters for a metadata query, but I am very puzzled by what you are describing
j
restarting the servers here
well, actually, looks like it matters, I'm not getting any errors anymore
m
hmm
j
max(transactionDate) works fine now
m
That should fix the time-boundary issue too
is your original problem solved now?
j
but count(*) returns odd results actually
m
odd?
j
[two screenshots attached showing inconsistent count(*) results]
m
it is probably coming from realtime?
or is the query from offline table?
j
it's from both
m
try both individually
j
if I run select count(*) from transaction_REALTIME, I get 14226
individually they work
m
Hmm
What is the time granularity you care about? Does DAYS work?
j
I think it's related to Tracing
after I made a query with tracing enabled, everything broke
max(transactionDate) doesn't work anymore
I need at most MINUTES granularity
I'll try to recreate the tables with MINUTES granularity
m
If you feel Tracing is causing an issue, I recommend filing an issue and also posting the issue on pinot-dev
s
I think the time boundary is not being recorded correctly and is probably throwing an exception while processing the EV event?
m
It is not being recorded because for some reason the offline query times out
j
2020/12/04 18:38:42.604 WARN [TimeBoundaryManager] [HelixTaskExecutor-message_handle_thread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated
2020/12/04 18:38:56.598 WARN [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-1] Failed to find time boundary info for hybrid table: transaction
got these at the broker after recreating the tables
m
and your offline query is timing out for max, right?
j
Two things may happen:
• If I enable tracing, I get an exception and after that no other query executes, it just times out or throws an exception
• If I don't enable tracing, the queries run fine, but the results are wrong
select count(*) from transaction_REALTIME returns 14243
select count(*) from transaction_OFFLINE returns 4198229
select count(*) from transaction returns 2406
the first two results are right, but clearly the last one isn't
s
Did you double-check that the time units are correct? Your column is named Date but the units are millis. Is the offline segment populated in millis or days?
j
yes, it has an AVRO schema, transactionDate is a LONG. The offline segment is populated in MINUTES now
added this:
"ingestionConfig": {
        "transformConfigs": [
            {
                "columnName": "minutesSinceEpoch",
                "transformFunction": "toEpochMinutes(transactionDate)"
            }
        ]
    },
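For reference, toEpochMinutes converts epoch milliseconds to epoch minutes by integer division; a quick Python sanity check of the conversion (the input value matches the offline segment start time that appears later in this thread):

def to_epoch_minutes(epoch_millis: int) -> int:
    # What toEpochMinutes(transactionDate) computes: epoch millis -> epoch minutes.
    return epoch_millis // 60_000

print(to_epoch_minutes(1_606_878_000_000))  # 26781300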
one thing, does the segmentPartitionConfig have to be the same for both realtime and offline tables?
after changing the tables to MINUTES, tracing is not breaking the server anymore
m
Are you saying that your problem is resolved with minutes?
j
about the timeouts and exceptions, yes, but I'm still getting wrong results like above
I'm probably missing something here. Do the segmentPushFrequency and segmentPushType configs influence time boundary management? Is there something I should care about when pushing offline segments?
m
@Jackie ^^ can you comment on time boundary with minutes and above configs?
Also count(*) seems to return incorrect answers
j
Reading the context
j
One more piece of info: my topic has a retention of 10 days, that is, the realtime table has data from 2020-11-23 18:46:00 to 2020-12-04 11:23:00 as I created it today, and the offline segment that I pushed is a daily segment from 2020-12-03 to 2020-12-04
j
@João Comini Can you please check the segment ZK metadata? You may use the zookeeper browser
Besides, the segment is not really partitioned (not related to the time boundary issue)
j
{
  "id": "transaction_OFFLINE_26781300_26782739_0",
  "simpleFields": {
    "segment.crc": "2804350904",
    "segment.creation.time": "1607129889618",
    "segment.end.time": "26782739",
    "segment.index.version": "v3",
    "segment.name": "transaction_OFFLINE_26781300_26782739_0",
    "segment.offline.download.url": "<s3://dev-pinot-deep-store/controller-data/transaction/transaction_OFFLINE_26781300_26782739_0>",
    "segment.offline.push.time": "1607129929914",
    "segment.offline.refresh.time": "-9223372036854775808",
    "segment.partition.metadata": "{\"columnPartitionMap\":{\"customerUuid\":{\"numPartitions\":5,\"partitions\":[0,1,2,3,4],\"functionName\":\"Murmur\"}}}",
    "segment.start.time": "26781300",
    "segment.table.name": "transaction_OFFLINE",
    "segment.time.unit": "MINUTES",
    "segment.total.docs": "3736571",
    "segment.type": "OFFLINE"
  },
  "mapFields": {},
  "listFields": {}
}
the realtime segments look like this:
{
  "id": "transaction__0__0__20201205T0031Z",
  "simpleFields": {
    "segment.crc": "-1",
    "segment.creation.time": "1607128311373",
    "segment.end.time": "-1",
    "segment.flush.threshold.size": "100000",
    "segment.flush.threshold.time": null,
    "segment.index.version": null,
    "segment.name": "transaction__0__0__20201205T0031Z",
    "segment.partition.metadata": "{\"columnPartitionMap\":{\"customerUuid\":{\"numPartitions\":20,\"partitions\":[0],\"functionName\":\"Murmur\"}}}",
    "segment.realtime.download.url": null,
    "segment.realtime.endOffset": "9223372036854775807",
    "segment.realtime.numReplicas": "2",
    "segment.realtime.startOffset": "8407",
    "segment.realtime.status": "IN_PROGRESS",
    "segment.start.time": "-1",
    "segment.table.name": "transaction_REALTIME",
    "segment.time.unit": "null",
    "segment.total.docs": "-1",
    "segment.type": "REALTIME"
  },
  "mapFields": {},
  "listFields": {}
}
they're all consuming segments
j
Do you have other realtime segments? Or this is the only one?
j
I have one for each partition of the topic
{
  "id": "transaction__10__0__20201205T0031Z",
  "simpleFields": {
    "segment.crc": "-1",
    "segment.creation.time": "1607128311373",
    "segment.end.time": "-1",
    "segment.flush.threshold.size": "100000",
    "segment.flush.threshold.time": null,
    "segment.index.version": null,
    "segment.name": "transaction__10__0__20201205T0031Z",
    "segment.partition.metadata": "{\"columnPartitionMap\":{\"customerUuid\":{\"numPartitions\":20,\"partitions\":[10],\"functionName\":\"Murmur\"}}}",
    "segment.realtime.download.url": null,
    "segment.realtime.endOffset": "9223372036854775807",
    "segment.realtime.numReplicas": "2",
    "segment.realtime.startOffset": "8509",
    "segment.realtime.status": "IN_PROGRESS",
    "segment.start.time": "-1",
    "segment.table.name": "transaction_REALTIME",
    "segment.time.unit": "null",
    "segment.total.docs": "-1",
    "segment.type": "REALTIME"
  },
  "mapFields": {},
  "listFields": {}
}
2020/12/05 00:55:04.939 WARN [TimeBoundaryManager] [HelixTaskExecutor-message_handle_thread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated
2020/12/05 00:55:04.975 WARN [TimeBoundaryManager] [ClusterChangeHandlingThread] Failed to find segment with valid end time for table: transaction_OFFLINE, no time boundary generated
j
Do you see other warnings besides these?
Also, does this happen before you push the first offline segment?
j
nope, just after I push the segment
2020/12/05 00:58:49.971 WARN [TopStateHandoffReportStage] [HelixController-pipeline-default-pinot-(c508fcc8_DEFAULT)] Event c508fcc8_DEFAULT : Cannot confirm top state missing start time. Use the current system time as the start time.
2020/12/05 00:58:50.004 WARN [ZkBaseDataAccessor] [ZkClient-EventThread-31-pinot-zookeeper:2181] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96001c/transaction_OFFLINE=-101}
2020/12/05 00:58:50.005 WARN [ZkBaseDataAccessor] [HelixController-pipeline-task-pinot-(c508fcc8_TASK)] Fail to read record for paths: {/pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96001c/transaction_OFFLINE=-101}
2020/12/05 00:58:50.005 WARN [AbstractDataCache] [HelixController-pipeline-task-pinot-(c508fcc8_TASK)] znode is null for key: /pinot/INSTANCES/Server_pinot-server-1.pinot-server-headless.pinot.svc.cluster.local_8098/CURRENTSTATES/102888f0a96001c/transaction_OFFLINE
got these at the controller as well
j
I think I know the reason for this behavior
There is no overlap between the realtime and offline data, thus the time boundary won't merge the results properly
Can you try select min(transactionDate) from transaction_REALTIME and also select min(transactionDate) from transaction?
In order for hybrid table to work, there has to be time overlapping between the realtime table and offline table
With the current config, the time boundary should be set at offline end time - 1 DAY = Dec 2nd, 2:59:00 AM
It won't query any data from the offline table, but the realtime side does not have the data for this time span, and that's the reason why the result does not match
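A worked version of that arithmetic in Python, assuming (per the explanation above) the broker places the boundary at the offline segment end time minus one day; all values are minutes since epoch, matching segment.time.unit = MINUTES:

from datetime import datetime, timezone

OFFLINE_END_MINUTES = 26782739  # segment.end.time from the offline ZK metadata
ONE_DAY_MINUTES = 24 * 60       # 1440

boundary = OFFLINE_END_MINUTES - ONE_DAY_MINUTES
print(boundary)                                             # 26781299
print(datetime.fromtimestamp(boundary * 60, timezone.utc))  # 2020-12-02 02:59:00+00:00

# The broker then splits a hybrid query along this boundary, roughly:
#   offline side:  transactionDate <= 26781299  (matches nothing in this segment)
#   realtime side: transactionDate >  26781299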
j
min(minutesSinceEpoch) from realtime -> 26769026 -> Monday, 23 November 2020 14:26:00
min(minutesSinceEpoch) from both -> 26781777 -> Wednesday, 2 December 2020 10:57:00
j
Can you also try select min(transactionDate) from transaction_REALTIME where transactionDate > 26781299? I think it should return the same result as querying both
j
yes, you're right: 26781777
j
Here the time boundary is correctly generated, but because the offline data and realtime data are not in sync, it returns unexpected results
j
what do you mean by "in sync"?
j
For the time span of the offline segment (26781300 to 26782739), the realtime table should have the same data already consumed
m
I thought real-time has last 7 days of data?
j
Based on this, I don't think that is the case
select count(*) from transaction_REALTIME returns 14243
select count(*) from transaction_OFFLINE returns 4198229
Realtime side has a longer time span, but way fewer records
j
yes, the realtime table has way fewer records
m
Oh, is the consuming offset latest instead of earliest?
j
It is smallest per the config
j
Yes, it is smallest.
j
Maybe the offline data is generated with a different flow?
m
Then why does RT have so few records?
j
Yes, they are generated by different flows
We're using our development environment so I don't have access to production topics, but I can get access to offline batch data
j
I see, and that explains why the results mismatch
m
I didn’t get it
j
I thought that the offline table would have some kind of priority over the realtime table even with different data
m
Are we saying offline and real-time data are not really in sync at all?
j
based on the segments' start time and end time
j
They need to be generated from the same source data, but you may configure more advanced indexing for the offline segment, or do some pre-aggregation if necessary
m
@Jackie so the issue is due to the data being from different sources, and there is a time window without data that causes the issue?
j
Besides the time boundary issue, in order to configure a sorted column and partition column for the offline segment, the records within the input file need to be sorted or partitioned accordingly
@Mayank If there is only one offline segment, we won't query that segment due to the way of computing the time boundary
The time span for the offline and real-time tables should be as follows in order to work:
Offline:  |[                               ]                     |
Realtime: |                       [                             ]|
Currently the time span is as follows, which causes the unexpected results:
Offline:  |                       [        ]                     |
Realtime: |[                                                    ]|
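A small sanity check of that layout requirement (a hypothetical Python helper, using minutes since epoch with one day = 1440 minutes; realtime_end is a rough "now" assumed for illustration): the offline table must cover all history up to the boundary, and the realtime window must straddle the boundary.

def hybrid_layout_ok(offline_start: int, offline_end: int,
                     realtime_start: int, realtime_end: int,
                     one_day: int = 1440) -> bool:
    boundary = offline_end - one_day
    # Offline must reach at least as far back as the oldest retained data,
    # approximated here by the realtime start time.
    offline_covers_history = offline_start <= realtime_start
    realtime_straddles_boundary = realtime_start <= boundary < realtime_end
    return offline_covers_history and realtime_straddles_boundary

# This thread's setup: offline covers only ~Dec 2-3 while realtime goes back
# to Nov 23, so everything realtime held before the boundary is dropped.
print(hybrid_layout_ok(offline_start=26781300, offline_end=26782739,
                       realtime_start=26769026, realtime_end=26785000))  # False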
j
right, got it
so, the problem would be resolved if I add more backfill data?
j
Yes, and if both sides are from the same data source
If you push 2 days of offline data, then the result should come from the offline segment for the first day, and the realtime segments for the following days
j
I still don't understand why the data needs to come from the same data source even if the schema is the same.
It's not our case in production though, but I'm curious about that.
j
If the data is from a different data source, when you push a new offline segment that moves the time boundary, the result will change
The purpose of a hybrid table is to maintain the historical data separately for easier management, and it should not change the query result
j
Oh, I see, it makes sense.
m
Thanks @Jackie
j
Sorry about that guys! I'm still learning 😄
Thanks for all your support!
m
👍