Evan Galpin
12/18/2024, 4:44 PM
Jackie
12/18/2024, 7:19 PM
Evan Galpin
12/18/2024, 10:48 PM
POST /LogicalTable/myLogicalTable
{
"REALTIME": ["myTable", "myOtherTable_REALTIME"],
"OFFLINE": [],
"HYBRID": ["myHybrid"],
"LOGICAL": []
}
We can do some validation, e.g. verify that myHybrid has both a realtime and an offline component, before finalizing the creation of myLogicalTable.
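A minimal sketch of what that pre-creation validation could look like. It assumes the controller can supply the set of existing physical table names with their type suffix (`_OFFLINE` / `_REALTIME`); the function names and error strings are hypothetical and are not part of Pinot's API.

```python
def with_type(name, table_type):
    """Append the _OFFLINE/_REALTIME suffix unless the name already carries it."""
    suffix = "_" + table_type
    return name if name.endswith(suffix) else name + suffix


def validate_logical_table(request, existing_tables):
    """Return a list of problems found in a logical table creation request.

    request: dict shaped like the POST body above (REALTIME/OFFLINE/HYBRID lists).
    existing_tables: set of physical table names with type suffix (assumed input).
    """
    errors = []
    # Tables listed under a single type must exist with that type.
    for table_type in ("REALTIME", "OFFLINE"):
        for name in request.get(table_type, []):
            if with_type(name, table_type) not in existing_tables:
                errors.append(f"{table_type} table '{name}' does not exist")
    # A hybrid table needs both a realtime and an offline component.
    for name in request.get("HYBRID", []):
        for table_type in ("REALTIME", "OFFLINE"):
            if with_type(name, table_type) not in existing_tables:
                errors.append(
                    f"HYBRID table '{name}' is missing its {table_type} component"
                )
    return errors
```

An empty list means the request can proceed; anything else would be returned to the caller instead of creating myLogicalTable.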
Evan Galpin
12/19/2024, 12:18 AM
Rajat Venkatesh
12/19/2024, 4:12 AM
Evan Galpin
12/19/2024, 6:33 AM
Rajat Venkatesh
12/19/2024, 7:53 AM
"This is not a good constraint because it greatly limits the use cases"
From a comment in the doc: what are the use cases for physical tables to be part of multiple logical tables?
Rajat Venkatesh
12/19/2024, 7:54 AM
Evan Galpin
12/20/2024, 4:56 PM
Rajat Venkatesh
01/06/2025, 3:18 PM
Rajat Venkatesh
01/06/2025, 3:23 PM
List<ServerRequest> offlineTableRequests, List<ServerRequest> realtimeTableRequests instead of List<ServerRequest> tableRequests? Why is it more correct to keep the offline & realtime table requests separate in classes like SingleConnectionBrokerRequestHandler?
I tried this approach in https://github.com/apache/pinot/compare/master...vrajat:rv-routing-hack-2
Jackie
01/07/2025, 12:16 AM
Jackie
01/07/2025, 12:18 AM
Evan Galpin
01/07/2025, 5:25 PM
Rajat Venkatesh
01/30/2025, 4:54 AM
Rajat Venkatesh
01/30/2025, 4:56 AM
Rajat Venkatesh
01/30/2025, 4:57 AM
❯ curl http://localhost:9000/logicalTables/nation_l
{
"tableName" : "nation_l",
"brokerTenant" : "DEFAULT_TENANT",
"physicalTableNames" : [ "nation1", "nation2" ]
}%
❯ curl --request POST http://localhost:8000/query/sql --data '{"sql":"SELECT COUNT(*) FROM nation1"}' | jq '.resultTable, .tablesQueried'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1238 100 1200 100 38 119k 3883 --:--:-- --:--:-- --:--:-- 134k
{
"dataSchema": {
"columnNames": [
"count(*)"
],
"columnDataTypes": [
"LONG"
]
},
"rows": [
[
100
]
]
}
[
"nation1"
]
❯ curl --request POST http://localhost:8000/query/sql --data '{"sql":"SELECT COUNT(*) FROM nation2"}' | jq '.resultTable, .tablesQueried'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1239 100 1201 100 38 160k 5194 --:--:-- --:--:-- --:--:-- 172k
{
"dataSchema": {
"columnNames": [
"count(*)"
],
"columnDataTypes": [
"LONG"
]
},
"rows": [
[
100
]
]
}
[
"nation2"
]
❯ curl --request POST http://localhost:8000/query/sql --data '{"sql":"SELECT COUNT(*) FROM nation_l"}' | jq '.resultTable, .tablesQueried'
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1241 100 1202 100 39 189k 6289 --:--:-- --:--:-- --:--:-- 201k
{
"dataSchema": {
"columnNames": [
"count(*)"
],
"columnDataTypes": [
"LONG"
]
},
"rows": [
[
200
]
]
}
[
"nation_l"
]
Rajat Venkatesh
01/30/2025, 5:01 AM
ServerQueryExecutorV1Impl should pick up TableDataManagers if a list consists of segments from multiple tables. Right now it uses the table in the query request. With logical tables, the query request will contain the logical table name. Changing the data structure will mean the protocol will change. Previously I had opened server channels per table, so I could send different query requests for each physical table.
Ref: https://github.com/apache/pinot/blob/bc831733c5b7be20325e51bec52a2520fef2f2c6/pino[…]apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
Evan Galpin
04/08/2025, 4:13 PM
Rajat Venkatesh
04/14/2025, 3:44 AM
TableRouteProvider interface to collect and calculate table routes.
• Use a TableRouteInfo to store the route AND generate server requests.
• Send a map of (Table, SegmentList) to the servers rather than just a list of segments. That change is in requests.thrift.
The rest of the changes support these main ideas.
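To make the third point concrete, here is a rough sketch of grouping routed segments into a per-server map of (table, segment list) instead of one flat segment list. The tuple input shape, the function name and the server and segment names in the usage note are assumptions for illustration; this does not mirror the actual structures in requests.thrift or TableRouteInfo.

```python
from collections import defaultdict


def group_segments_by_table(routed_segments):
    """Build {server: {physical_table: [segments]}} from routing output.

    routed_segments: iterable of (server, physical_table, segment) tuples
    (a hypothetical shape chosen for this sketch).
    """
    per_server = defaultdict(lambda: defaultdict(list))
    for server, table, segment in routed_segments:
        per_server[server][table].append(segment)
    # Convert to plain dicts so the result is easy to serialize or compare.
    return {server: dict(tables) for server, tables in per_server.items()}
```

With a map like `{"server_1": {"nation1": [...], "nation2": [...]}}`, a server can resolve the right data manager for each physical table even though the query itself names only the logical table.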
We've started breaking down that large POC PR and merging chunks to master. The ones currently in progress are:
• https://github.com/apache/pinot/pull/15388
• https://github.com/apache/pinot/pull/15523
• https://github.com/apache/pinot/pull/15515
Rajat Venkatesh
04/28/2025, 11:00 AM
Rajat Venkatesh
04/28/2025, 11:07 AM
BaseSingleStageBrokerRequestHandler requires many config parameters that are not, and should not be, part of the logical table config. At the same time, it is too error-prone to pick a pseudo-random schema, realtime and offline table config from the list, such as the first one listed. I suggest explicitly specifying the schema, offline table and realtime table to pick up the config from, e.g.
{
"schema": "<schemaName>",
"offlineTableConfig": "<offlineTableName_OFFLINE>",
"realtimeTableConfig": "<realtimeTableName_REALTIME>"
}
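A sketch of how a reference block like this might be checked when the logical table is created. It assumes the physical table list holds names with their type suffix, and it applies two rules: referenced tables must be in that list, and the schema name must match the logical table or one of the physical tables. All function names here are hypothetical.

```python
def strip_type(name):
    """Drop a trailing _OFFLINE/_REALTIME suffix, if present."""
    for suffix in ("_OFFLINE", "_REALTIME"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def validate_reference_config(logical_table_name, physical_table_names, ref):
    """Return a list of problems with the explicit schema/table references."""
    errors = []
    # Each referenced table config must come from a listed physical table.
    for key in ("offlineTableConfig", "realtimeTableConfig"):
        table = ref.get(key)
        if table is not None and table not in physical_table_names:
            errors.append(f"{key} '{table}' is not in the physical table list")
    # The schema must be named after the logical table or a physical table.
    allowed = {logical_table_name} | {strip_type(n) for n in physical_table_names}
    if ref.get("schema") not in allowed:
        errors.append(f"schema '{ref.get('schema')}' matches no known table")
    return errors
```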
The table names should be part of the physical table list, and the schema should match either the name of the logical table or the name of one of the physical tables.
Jackie
04/29/2025, 10:14 PM
QueryConfig and QuotaConfig, anything else?
I think we can support adding a reference table to use its schema and table config, but we also need to support the logical table having its own config. This allows it to have a different quota, query rewrite and columns (useful for column access control) from the physical tables.
Abhishek Bafna
05/02/2025, 6:18 AM
Rajat Venkatesh
05/02/2025, 11:41 AM
RoutingConfig used in BrokerRoutingManager used to build the routing table as that is not part of query planning.
Note that some of these are used from both offline and realtime tables.
Ref: https://docs.google.com/document/d/1iS0wtG_V2-W9sQsKkziMjTljT5qKAH_hhsksXr1Tvys/edit?tab=t.0#heading=h.bi1j2rhyt991
Jackie
05/02/2025, 5:31 PM
Rajat Venkatesh
05/05/2025, 4:54 AM
Rajat Venkatesh
05/05/2025, 4:55 AM
Abhishek Bafna
05/05/2025, 10:14 AM
modifications map which could be used to replace configs within the table config, e.g.
POST /tables/my_table/clone
Content-Type: application/json
{
"newTableName": "my_new_table",
"modifications": {
"segmentsConfig": {
"retentionTimeUnit": "DAYS",
"retentionTimeValue": "10"
},
"tableIndexConfig": {
"loadMode": "MMAP"
}
}
}
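For illustration, the proposed modifications map could be applied as a recursive merge over a copy of the source table config. This is only a sketch of one possible semantics (nested objects are merged, everything else is replaced); the clone endpoint is a proposal in this thread, and the helper names below are hypothetical.

```python
import copy


def apply_modifications(table_config, modifications):
    """Return a copy of table_config with modifications merged in recursively."""
    result = copy.deepcopy(table_config)
    for key, value in modifications.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections so untouched sibling keys survive.
            result[key] = apply_modifications(result[key], value)
        else:
            # Scalars, lists and new keys simply replace the old value.
            result[key] = copy.deepcopy(value)
    return result


def clone_table_config(source_config, new_table_name, modifications):
    """Build the config for the cloned table; the source config is not mutated."""
    cloned = apply_modifications(source_config, modifications)
    cloned["tableName"] = new_table_name
    return cloned
```

Under this semantics, the request above would change only the retention settings and loadMode, leaving every other field of my_table's config as it was.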
What do you folks think about it?
PS: This is not a P0/P1 at the moment, but something to keep in mind and develop to ease onboarding/migration. These APIs will find other use cases too over time.
cc: @Rajat Venkatesh @Jackie
Shaurya Chaturvedi
06/26/2025, 1:59 AM