Hi Team, While executing a query on the query con...
# troubleshooting
a
Hi Team, While executing a query on the query console -
select * from packages_test limit 10
, it's giving different results on different execution i.e. 1 record and 10 records alternatively. Could anyone please help with this issue? Thanks
n
Hi @Awadesh Kumar: if you have replication enabled on your table, the query may get routed to a different replica each time. Ideally, all replicas should be consistent and you should see the same query results. But there may be a difference of a few records now and then, depending on how fast each server is consuming.
What is your replication factor for your table? What source are you consuming from?
a
We are consuming from Kafka and replication is 2
Copy code
"segmentsConfig": {
      "schemaName": "packages_test",
      "replication": "2",
      "timeColumnName": "updated_at",
      "replicasPerPartition": "2"
    }
Any possible reason for replica inconsistency?
n
hmm kafka is typically more stable consumption-wise. do you have upserts enabled? also, please double check that IdealState (IS) and External view (EV) match for this table.
a
yes, upsert is enabled.
Also, both IS and EV are the same for this table except for one field -
"BUCKET_SIZE": "0"
is present in EV but not in IS. not sure if it can cause any issues.
n
I don't think BUCKET_SIZE is going to cause the issue. which version of pinot are you using? I know that there were some upsert related issues that were recently fixed, that could lead to inconsistent query results. can you try your query with `option(skipUpsert=true)`s and see if you get consistent results?
a
Getting the same issue with this query as well -
select * from packages_test limit 10 option(skipUpsert=true)
we are using pinot version - 0.10.0
n
oh that's weird. are both of the replicas ingesting data ? or is one of them stuck ?
a
@Navina we have two server pods that are up and running. How can we check replicas ingesting/stuck?
n
you can check the graphs to see if the metric for document count is increasing similarly on both replicas. I think there is also a metric called LLC_PARTITION_CONSUMING. If that metric is 0 , then ingestion may not be happening as expected
m
select * without any predicates or order by will return the first N rows it reads, which may not be deterministic. If you want consistent results you will need to either have some predicates or apply ordering. Your current query is asking for any 10 rows in the table.
n
@Mayank even if the row contents are not same, the result should have at least 10 rows, right? if one replica can return it, then shouldn't the other return any 10 rows as well?
m
Yeah, both should return 10 rows (assuming there are at least 10 rows).
a
Hi @Mayank @Navina After some time it started working fine and we started getting 10 rows. Can replication lag cause this issue?
n
pinot itself doesn't have a replication layer. The replicas independently consume data from the source topic. so, you would have to look at the replica server logs to see what happened during the time the query returned different results. Again, the document count metric per server can tell you the time-window to look into.
👍 1
a
Hi @Navina, Again facing a strange issue while running the query on the console-
select * from test_table where id=454715266949122
gives no records found
select id, reference_number from test_table where id=454715266949122
gives the data Can you please help, how to debug this?
n
@Awadesh Kumar assuming this is still running with upserts, can you try these queries with
option(skipUpsert=true)
and see if the behavior remains the same?
a
it didn't work with
option(skipUpsert=true)
as well. After a few hours, we started getting results... not able to figure out the root cause.
n
Can you elaborate on what you mean by "it didn't work"? If you can provide the steps to reproduce I can try on my end. So far, we haven't encountered such an issue where the data is available for query after hours of ingest