Hello, I’m using an hybrid table and I’m having an...
# troubleshooting
e
Hello, I’m using an hybrid table and I’m having an OFFLINE segment containing 1k docs created July 1st 2022, 20733, and I have a REALTIME segment containing 0 docs created July 2nd 2022, 11937 But when I query my table
Copy code
SELECT * FROM datasource_6298afcc7527000300387fdf
only my consuming segment is queried and the query result 0 docs. And if I do
Copy code
SELECT * FROM datasource_6298afcc7527000300387fdf_OFFLINE
I get 1k docs It seems that the OFFLINE table/segment is ignored (like if the table wasn’t hybrid anymore?), do you have any idea on how I can troubleshoot this? Or any tips to fix it? I cannot see something useful in logs
Copy code
2022/07/06 08:21:52.843 INFO [QueryScheduler] [pqr-1] Processed requestId=16284217,table=datasource_6298afcc7527000300387fdf_REALTIME,segments(queried/processed/matched/consuming)=1/0/0/1,schedulerWaitMs=0,reqDeserMs=0,totalExecMs=0,resSerMs=0,totalTimeMs=0,minConsumingFreshnessMs=9223372036854775807,broker=Broker_<ip>_8099,numDocsScanned=0,scanInFilter=0,scanPostFilter=0,sched=FCFS,threadCpuTimeNs(total/thread/sysActivity/resSer)=0/0/0/0
2022/07/06 08:21:52.844 INFO [BaseBrokerRequestHandler] [jersey-server-managed-async-executor-13771] requestId=16284217,table=datasource_6298afcc7527000300387fdf,timeMs=2,docs=0/0,entries=0/0,segments(queried/processed/matched/consuming/unavailable):1/0/0/1/0,consumingFreshnessTimeMs=9223372036854775807,servers=1/1,groupLimitReached=false,brokerReduceTimeMs=0,exceptions=0,serverStats=(Server=SubmitDelayMs,ResponseDelayMs,ResponseSize,DeserializationTimeMs,RequestSentDelayMs);<ip>_R=0,1,927,0,-1,offlineThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,realtimeThreadCpuTimeNs(total/thread/sysActivity/resSer):0/0/0/0,query=SELECT * FROM datasource_6298afcc7527000300387fdf
Thank you
l
whats the retention of REALTIME and OFFLINE table? Can you reduce the overlap and check once? https://docs.pinot.apache.org/basics/components/broker
e
REALTIME:
Copy code
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "3",
OFFLINE:
Copy code
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "90",
My OFFLINE segment
end.time
is
1656518367582
, if I do
Copy code
SELECT * FROM datasource_6298afcc7527000300387fdf WHERE "timestamp" < 1656518367582
I get 0 docs too (with _OFFLINE suffix I get results, and the
creation.time
of my REALTIME segment is
1656760777780
)
l
i don't see any issue. will let other devs with context to respond. cc: @Mayank
e
thank you
n
It's likely the time boundary calculation. At the bottom of the same link shared above https://docs.pinot.apache.org/basics/components/broker
e
probably but how I can fix this? Pinot should query my OFFLINE segment since it doesn’t overlap with the REALTIME one
n
what is
select min(time), max(time) from table_OFFLINE
and same for realtime? what’s the segmentPushType set in OFFLINE table?
e
OFFLINE min: 1656515667518 OFFLINE max: 1656518367582 REALTIME min: Infinity REALTIME max: -Infinity (since it’s empty) segmentPushType is
APPEND
in both tables
Any idea? May be related to the REALTIME segment being empty?
cc @Neha Pawar @Mayank
m
Yeah currently for time boundary to work time column has to be non null
e
so you’re saying that it won’t work since the REALTIME table is empty, and if I push some data to the REALTIME table I won’t have the issue?
Even when pushing a doc into my REALTIME table I don’t get all my docs REALTIME min: 1657724264475 REALTIME max: 1657724264475 since OFFLINE max is 1656518367582, time boundary shouldn’t be an issue @Neha Pawar @Mayank ty
m
Are these values from running the query (select max(time) from …) or from data? One possibility is that your time units don’t match
e
from the select max(), min() query
every timestamp is pushed in ms
m
What does count(*) return, and what does it return when called for OFFLINE and REALTIME tables individually? Also, what’s the time boundary value (check swagger)
e
Copy code
select count(*) FROM datasource_6298afcc7527000300387fdf

1
Copy code
select count(*) FROM datasource_6298afcc7527000300387fdf_REALTIME

1
Copy code
select count(*) FROM datasource_6298afcc7527000300387fdf_OFFLINE

1014
With which route can I get the time boundary value?
m
There is an api in swagger to get time boundary iirc
e
I see
Copy code
<http://10.1.1.12:9000/segments/datasource_6298afcc7527000300387fdf/select?excludeOverlapping=true>
which returns
Copy code
[
  {
    "OFFLINE": [
      "datasource_6298afcc7527000300387fdf_1656515667518_1656518367582_0"
    ]
  },
  {
    "REALTIME": [
      "datasource_6298afcc7527000300387fdf__0__18__20220714T1520Z",
      "datasource_6298afcc7527000300387fdf__0__19__20220715T0244Z",
      "datasource_6298afcc7527000300387fdf__0__20__20220715T1055Z",
      "datasource_6298afcc7527000300387fdf__0__21__20220715T1703Z",
      "datasource_6298afcc7527000300387fdf__0__22__20220716T0013Z",
      "datasource_6298afcc7527000300387fdf__0__23__20220718T0813Z"
    ]
  }
]
(dunno if it’s the route you were expecting)
Any idea? Thank you for the help