# troubleshooting
  • e

    Elon

    08/05/2020, 10:54 PM
    Nice, is there an example?
  • e

    Elon

    08/05/2020, 11:01 PM
    Is that expected?
  • e

    Elon

    08/05/2020, 11:02 PM
    We are doing regexp_like on the last hour and combine with a separate query doing text_match as a workaround
  • k

    Kishore G

    08/05/2020, 11:03 PM
    realtime does support text_match
  • k

    Kishore G

    08/05/2020, 11:03 PM
    30 mins seems a lot @Sidd
  • e

    Elon

    08/05/2020, 11:22 PM
    Sorry, didn't mean empty results, meant the max timestamp is delayed by 30 mins, now it's ~ 1 hour
  • e

    Elon

    08/05/2020, 11:22 PM
    select log_timestamp_seconds, kubernetes_container, log_payload 
    from logging_avro_test
    where kubernetes_container = 'istio-proxy' and TEXT_MATCH(log_payload, 'default') 
    order by log_timestamp_seconds desc
    limit 10
  • e

    Elon

    08/05/2020, 11:22 PM
    vs
  • e

    Elon

    08/05/2020, 11:22 PM
    select log_timestamp_seconds, kubernetes_container, log_payload 
    from logging_avro_test 
    where kubernetes_container = 'istio-proxy' and regexp_like(log_payload, 'default') 
    order by log_timestamp_seconds desc 
    limit 10
  • e

    Elon

    08/05/2020, 11:23 PM
    The timestamp is 1 hour behind for the text match query
  • s

    Sidd

    08/05/2020, 11:23 PM
how many realtime segments are there with a text index, and how many columns with a text index per segment?
  • s

    Sidd

    08/05/2020, 11:24 PM
    the lag in timestamp is due to text index refresh. realtime text index reads the index snapshot and a background thread refreshes the snapshot periodically
  • s

    Sidd

    08/05/2020, 11:24 PM
    the refresh threshold is configurable but currently it is hardcoded to 10ms
  • s

    Sidd

    08/05/2020, 11:26 PM
with refresh happening every 10ms, ideally you should see an increasing number of hits/matches
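[Editor's note] Sidd's description above — queries read an immutable index snapshot while a background thread periodically republishes it — can be sketched with a minimal model. This is illustrative only: the class and method names are hypothetical, not Pinot's actual realtime Lucene reader API.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the snapshot-refresh pattern: the ingestion
// thread appends to a live buffer, query threads only see an immutable
// snapshot, and a background thread periodically republishes it.
class SnapshotRefreshSketch {
    private final List<String> liveBuffer = new CopyOnWriteArrayList<>();
    private final AtomicReference<List<String>> snapshot =
            new AtomicReference<>(List.of());

    // Called by the ingestion path for each new document.
    void ingest(String doc) {
        liveBuffer.add(doc);
    }

    // Called periodically by the background refresh thread.
    void refresh() {
        snapshot.set(List.copyOf(liveBuffer));
    }

    // Called by query threads: sees only documents published by the
    // last refresh, which is where the observed timestamp lag comes from.
    long match(String term) {
        return snapshot.get().stream().filter(d -> d.contains(term)).count();
    }
}
```

Until `refresh()` runs, newly ingested rows are invisible to `match()` even though a snapshot-free scan (like `regexp_like` over the raw column) would already see them.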
  • e

    Elon

    08/05/2020, 11:27 PM
    Yep, but the delay in regexp_like vs text_match is more than 10ms - it's about an hour
  • e

    Elon

    08/05/2020, 11:28 PM
    36 segments, 2 columns for text index
  • e

    Elon

    08/05/2020, 11:28 PM
    realtime ^^
  • s

    Sidd

    08/05/2020, 11:30 PM
yeah so currently the background refresh thread has a queue of all realtime segments and their text indexes. So essentially it has 72 index snapshots to refresh. Every time it wakes up, it picks the realtime segment at the head of the queue, refreshes its (in this case 2) text indexes and adds it to the back of the queue. So this needs to be changed to refresh sooner rather than later
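[Editor's note] The round-robin refresh queue Sidd describes can be modeled as follows. With N segments and one segment refreshed per wake-up, a segment's snapshot can be up to roughly N wake-up intervals (plus per-refresh cost) stale. Names here are hypothetical, not Pinot's `RealtimeLuceneIndexReaderRefreshThread` internals.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Illustrative model of a single background thread cycling through a
// queue of realtime segments, refreshing each segment's text indexes
// when it reaches the head, then requeueing it at the tail.
class RefreshQueueSketch {
    private final ArrayDeque<String> queue = new ArrayDeque<>();
    final Map<String, Integer> refreshCount = new HashMap<>();

    RefreshQueueSketch(int numSegments) {
        for (int i = 0; i < numSegments; i++) {
            String seg = "segment-" + i;
            queue.add(seg);
            refreshCount.put(seg, 0);
        }
    }

    // One wake-up of the background thread: refresh the head segment's
    // text indexes (2 per segment in Elon's setup), then requeue it.
    void tick() {
        String seg = queue.poll();
        refreshCount.merge(seg, 1, Integer::sum);
        queue.add(seg);
    }
}
```

Back-of-the-envelope for this thread: 36 segments at one per 10ms wake-up is a full pass every ~360ms, so a much larger observed lag presumably also reflects the cost of each individual index refresh, not just queue position.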
  • e

    Elon

    08/05/2020, 11:34 PM
    Oh nice! Can you point me to the code? This is great stuff to learn about
  • s

    Sidd

    08/05/2020, 11:35 PM
    https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/realtime/impl/invertedindex/RealtimeLuceneIndexReaderRefreshThread.java
  • e

    Elon

    08/05/2020, 11:40 PM
    Thanks!
  • e

    Elon

    08/07/2020, 10:49 PM
FYI, when we start up the controller we notice about 385k exceptions from Reflections during BeanConfig.setScan. Are these just because the classes are not on the classpath? (i.e. we don't use all the plugins):
  • e

    Elon

    08/07/2020, 10:49 PM
    org.reflections.ReflectionsException: could not get type for name org.osgi.framework.BundleListener
            at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:390) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at org.reflections.Reflections.expandSuperTypes(Reflections.java:381) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at org.reflections.Reflections.<init>(Reflections.java:126) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at io.swagger.jaxrs.config.BeanConfig.classes(BeanConfig.java:276) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at io.swagger.jaxrs.config.BeanConfig.scanAndRead(BeanConfig.java:240) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at io.swagger.jaxrs.config.BeanConfig.setScan(BeanConfig.java:221) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at org.apache.pinot.controller.api.ControllerAdminApiApplication.setupSwagger(ControllerAdminApiApplication.java:126) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
            at org.apache.pinot.controller.api.ControllerAdminApiApplication.start(ControllerAdminApiApplication.java:87) ~[pinot-all-0.4.0-SNAPSHOT-jar-with-dependencies.jar:0.4.0-SNAPSHOT-9e7da0349baa23dd02987a3142818dbc6a144fbe]
  • e

    Elon

    08/07/2020, 10:56 PM
    Are these acceptable or does this indicate some misconfiguration or other issue?
  • x

    Xiang Fu

    08/07/2020, 11:22 PM
I’m also seeing those swagger-related exceptions; they're raised the first time you hit the controller UI
  • e

    Elon

    08/08/2020, 4:24 AM
    Yep
  • e

    Elon

    08/08/2020, 4:26 AM
Doesn’t seem to cause any failures though. But disabling the liveness check for the controller fixed a k8s issue where controllers were crashing.
  • x

    Xiang Fu

    08/08/2020, 4:48 AM
    oh, do you mean that the liveness check doesn’t pass for controller?
  • e

    Elon

    08/10/2020, 5:59 AM
    Yep, under heavy cpu load. There were no gc issues, just latency on the liveness check.
  • d

    Dan Hill

    08/10/2020, 4:19 PM
    If I try to fix historic data with Pinot, at what point would the new data start serving? E.g. after the whole batch ingestion job completes? Or is it after each segment gets uploaded? I'm curious if I can get into a spot where inconsistent data is served if I use a single batch ingestion job. Does it matter depending on ingestion route (standalone, spark, hive)?