# troubleshooting
e
Hi, we have a hybrid table that is not serving realtime data (unless _REALTIME is explicitly used). We noticed that the time boundary is given in ms, but the time column is a datetime field with input in ms from another field (the base column in Kafka) and output granularity in seconds. Anyone else experience this? Is there a way to set the time boundary?
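(To illustrate the symptom - a sketch using the table and column names from later in this thread. The broker splits hybrid queries at the time boundary, so a boundary far in the future starves the realtime side:)
```
-- Served mostly/entirely by the OFFLINE side when the boundary is wrong:
SELECT MAX(order_timestamp_seconds) FROM enriched_orders

-- Bypasses the time boundary and hits only realtime servers:
SELECT MAX(order_timestamp_seconds) FROM enriched_orders_REALTIME
```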
m
Yes, the root cause is likely an incorrect time boundary.
e
Nice! Is there a way we can reset it?
m
An incorrect time boundary comes from an incorrect setup/config; fixing that should solve the problem. AFAIK, there isn't a way to explicitly set the time boundary
e
Ah ok, checking - do you mean table config or server config? Or both? Checking table configs and comparing offline vs realtime
m
Table config
They should have the same unit
Also, there might be a min granularity (seconds?) needed for hybrid tables. @Jackie?
e
The segment configs use the same column - it's a datetime field. Both realtime and offline:
```
"OFFLINE": {
    "tableName": "enriched_orders_OFFLINE",
    "tableType": "OFFLINE",
    "segmentsConfig": {
      "schemaName": "enriched_orders",
      "segmentPushFrequency": "daily",
      "segmentPushType": "APPEND",
      "timeColumnName": "order_timestamp_seconds",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "365",
      "replication": "3",
      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy"
    },
...
```
```
"REALTIME": {
    "tableName": "enriched_orders_REALTIME",
    "tableType": "REALTIME",
    "segmentsConfig": {
      "schemaName": "enriched_orders",
      "segmentPushFrequency": "daily",
      "segmentPushType": "APPEND",
      "timeColumnName": "order_timestamp_seconds",
      "retentionTimeUnit": "DAYS",
      "retentionTimeValue": "100",
      "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
      "replicasPerPartition": "3"
    },
...
```
and here is the fieldspec from the schema:
```
"dateTimeFieldSpecs": [
    {
      "name": "order_timestamp_seconds",
      "dataType": "LONG",
      "defaultNullValue": 0,
      "transformFunction": "toEpochSeconds(order_timestamp_ms)",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:SECONDS"
    }
]
```
Note that the source column order_timestamp_ms is not included in the table.
m
Hmm, why do you see time boundary in ms?
e
Not sure :) I just did the curl from the broker, and the value, if converted to seconds, is as expected: it's the newest offline timestamp minus 24 hours
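(For reference, a minimal sketch of that check against the broker's debug endpoint - the host, port, and exact path here are assumptions and may vary by Pinot version:)
```
# Hypothetical broker host/port; recent Pinot brokers expose a
# time-boundary debug endpoint, but verify the path for your version.
curl "http://broker-host:8099/debug/timeBoundary/enriched_orders"
# Response shape (illustrative):
# {"timeColumnName": "order_timestamp_seconds", "timeValue": "1611964639000"}
```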
m
Can you select min time from RT and max time from OFFLINE?
e
Yep, already did - and there are realtime records newer than the boundary:
m
what's the unit there? seconds or ms?
e
```
--offline:  min: 1601424056 max: 1612051039
--realtime: min: 1607816472 max: 1612317600 
--time boundary in ms: 1611964639000
```
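(A sketch of the queries that would produce those numbers, assuming the standard _OFFLINE/_REALTIME suffixes to target each side of the hybrid table:)
```
-- Offline side only:
SELECT MIN(order_timestamp_seconds), MAX(order_timestamp_seconds) FROM enriched_orders_OFFLINE
-- Realtime side only:
SELECT MIN(order_timestamp_seconds), MAX(order_timestamp_seconds) FROM enriched_orders_REALTIME
```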
m
Interesting
e
The unit for the time column should be seconds - that's what it is when we select. Maybe the code expects that granularity unit == format unit?
n
```
"transformFunction": "toEpochSeconds(order_timestamp_ms)",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:SECONDS"
this is incorrect ^. toEpochSeconds will convert it to millis/1000, but the format here says MILLISECONDS
e
but granularity is seconds - it seems correct when we do selects on the datetime field (both realtime and offline)
I thought format should be what the input is and granularity is what the output is, right?
n
not really, they both represent what the final value in the time column should be
granularity is not used anywhere atm
in your case, millis will get converted using the function millis/1000, so your format should be 1:SECONDS:EPOCH
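(Putting that together, the corrected fieldspec would be the one below - identical to the spec above except for the format line:)
```
"dateTimeFieldSpecs": [
    {
      "name": "order_timestamp_seconds",
      "dataType": "LONG",
      "defaultNullValue": 0,
      "transformFunction": "toEpochSeconds(order_timestamp_ms)",
      "format": "1:SECONDS:EPOCH",
      "granularity": "1:SECONDS"
    }
]
```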
e
Oh wow, I didn't know that, thanks :) I tried using millis/1000 and got an error, but when I use the epoch* functions it works
n
if you want to keep it in millis, but simply round to nearest seconds, use “round(order_timestamp, 1000)”
👍 1
if you do this ^, then your current datetimespec is correct
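(The keep-it-in-millis variant would look roughly like this - a sketch; applying round() to the source column order_timestamp_ms is an assumption, since the suggestion above wrote round(order_timestamp, 1000):)
```
"dateTimeFieldSpecs": [
    {
      "name": "order_timestamp_seconds",
      "dataType": "LONG",
      "defaultNullValue": 0,
      "transformFunction": "round(order_timestamp_ms, 1000)",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:SECONDS"
    }
]
```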
e
thx!
I think they want it in seconds - so should I just change the format to 1:SECONDS:EPOCH?
n
yeah
is this a new table? i'm wondering how any segments got created. the segment creation should've failed due to the inconsistent value and spec
e
No, it has tons of offline and realtime segments, and they are correctly named as well - not sure if that matters
so right now we can consider granularity to be meaningless? i.e. unused anywhere?
n
yes
😮 1
e
thanks for clearing that up @Neha Pawar!
If I change the table spec now would it cause any issues?
n
not quite sure about that..
e
ok, we'll try it in staging 🙂 And let you know. Thanks again!
👍 1
n
how did that go?
e
We are still just doing basic testing of Pinot 0.6.0 - probably today/tomorrow we will try it. I'll let you know.
Hi @Neha Pawar, just wanted to give you an update. I updated the schema, and it now shows the changed format for the time column, but there was no change in behavior.
I disabled the table and am reloading all segments to see if that works.
Reloading did not work, and neither did restarting the servers, but restarting the brokers did! Now the data is being served as it should.
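(For completeness, a rough sketch of the reload step via the controller REST API - the endpoint path and port are assumptions and may vary by Pinot version:)
```
# Ask the controller to reload all segments so servers pick up the
# schema change (verify the endpoint for your Pinot version).
curl -X POST "http://controller-host:9000/segments/enriched_orders/reload?type=OFFLINE"
curl -X POST "http://controller-host:9000/segments/enriched_orders/reload?type=REALTIME"
```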
Thanks for all the help @Neha Pawar! Maybe this should go in a doc somewhere? lmk where I can take a stab at it.
n
there’s a schema evolution page, maybe add the notes from your observations there as tips?
👍 1
e
Sure, I'll look for it