murat migdisoglu
10/14/2020, 9:49 PM
Elon
10/15/2020, 7:04 PM
Tanmay Movva
10/20/2020, 5:35 PM
Elon
10/27/2020, 11:36 PM
Elon
10/29/2020, 8:00 PM
Yash Agarwal
11/04/2020, 5:45 AM
lâm nguyễn hoàng
11/04/2020, 7:42 PM
Noah Prince
11/08/2020, 5:56 PM
Noah Prince
11/09/2020, 3:34 PM
Jackie
11/10/2020, 6:31 PM
BOOLEAN is stored and handled as STRING
Pradeep
11/11/2020, 6:41 PM
Elon
11/12/2020, 12:54 AM
Xiang Fu
<https://github.com/apache/incubator-pinot/blob/master/docker/images/pinot/Dockerfile#L32>
Noah Prince
11/13/2020, 3:42 PM
Ken Krugler
11/13/2020, 11:13 PM
curl -v "http://localhost:9000/segments/mytable?type=OFFLINE" -XDELETE, which seemed to work, then re-ran my import job via bin/pinot-admin.sh LaunchDataIngestionJob -jobSpecFile <spec file>, which also seemed to work. But when I run a query, I get
ProcessingException(errorCode: 450, message: InternalError: java.net.SocketException: Host is down(connect failed)
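For anyone scripting the same cleanup step, here is a minimal Python sketch of the request that the curl command above sends. The controller address and table name are simply the ones from the message; the helper name `build_delete_segments_request` is hypothetical. The sketch only builds the request, and the actual send is left commented out because it needs a running controller.

```python
import urllib.request


def build_delete_segments_request(controller, table, table_type="OFFLINE"):
    """Build (but do not send) the DELETE request used in the curl command."""
    url = f"{controller}/segments/{table}?type={table_type}"
    return urllib.request.Request(url, method="DELETE")


req = build_delete_segments_request("http://localhost:9000", "mytable")
print(req.get_method(), req.full_url)

# To send it against a running controller (assumption: reachable at that address):
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, resp.read().decode())
```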
Elon
11/16/2020, 2:41 AM
Tanmay Movva
11/16/2020, 8:08 AM
Size of segment directory = 239.4mb
Number of documents = 3529197
retention = 30 days
ingestion rate = 1000
numPartitions = 1
Table Replicas = 1
I am running this by building Pinot from source, and the results are quite surprising.
Elon
11/19/2020, 10:49 PM
Elon
11/20/2020, 7:17 AM
"error": "Failed to drop instance Broker_pinot-broker-3.pinot-broker-headless.pinot.svc.cluster.local_8099 - Instance Broker_pinot-broker-3.pinot-broker-headless.pinot.svc.cluster.local_8099 exists in ideal state for brokerResource"
Elon
11/21/2020, 2:26 AM
João Comini
11/30/2020, 9:08 PM
RealtimeProvisioningHelper, could you help me?
These are my questions:
• Why do we need a numHours parameter? What is the impact of keeping a consuming segment open for a given amount of time (pros/cons)?
• What does Mapped mean in the "Memory used per host" result? Does it refer to the segments on disk?
These are the results that I got:
RealtimeProvisioningHelper -tableConfigFile /tmp/transaction-table.json -numPartitions 20 -pushFrequency null -numHosts 4,8,12,16,20 -numHours 24,48,72,96 -sampleCompletedSegmentDir /tmp/out/transaction_1606528528_1606614928_0 -ingestionRate 4 -maxUsableHostMemory 16G -retentionHours 768
Note:
* Table retention and push frequency ignored for determining retentionHours since it is specified in command
* See <https://docs.pinot.apache.org/operators/operating-pinot/tuning/realtime>
Memory used per host (Active/Mapped)
numHosts --> 4 |8 |12 |16 |20 |
numHours
24 --------> 6.8G/71.9G |3.4G/35.95G |2.72G/28.76G |2.04G/21.57G |1.36G/14.38G |
48 --------> 7.33G/72.62G |3.66G/36.31G |2.93G/29.05G |2.2G/21.79G |1.47G/14.52G |
72 --------> 8.01G/73.11G |4.01G/36.55G |3.2G/29.24G |2.4G/21.93G |1.6G/14.62G |
96 --------> 8.39G/74.08G |4.2G/37.04G |3.36G/29.63G |2.52G/22.22G |1.68G/14.82G |
Optimal segment size
numHosts --> 4 |8 |12 |16 |20 |
numHours
24 --------> 20.02M |20.02M |20.02M |20.02M |20.02M |
48 --------> 40.04M |40.04M |40.04M |40.04M |40.04M |
72 --------> 60.05M |60.05M |60.05M |60.05M |60.05M |
96 --------> 80.07M |80.07M |80.07M |80.07M |80.07M |
Consuming memory
numHosts --> 4 |8 |12 |16 |20 |
numHours
24 --------> 756.05M |378.02M |302.42M |226.81M |151.21M |
48 --------> 1.47G |750.11M |600.09M |450.07M |300.04M |
72 --------> 2.15G |1.07G |878.76M |659.07M |439.38M |
96 --------> 2.92G |1.46G |1.17G |896.57M |597.71M |
Total number of segments queried per host (for all partitions)
numHosts --> 4 |8 |12 |16 |20 |
numHours
24 --------> 320 |160 |128 |96 |64 |
48 --------> 160 |80 |64 |48 |32 |
72 --------> 110 |55 |44 |33 |22 |
96 --------> 80 |40 |32 |24 |16 |
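As a rough cross-check of the last table, the sketch below reproduces every "Total number of segments queried per host" cell from the command-line parameters. The formula is inferred from the pasted numbers, not taken from the Pinot source, and it only matches if the table has 2 replicas, which is an assumption because the table config is not shown. The function name `segments_per_host` is hypothetical.

```python
import math


def segments_per_host(retention_hours, num_hours, num_partitions, num_hosts, replication):
    # Completed segments that one partition accumulates over the retention window.
    segments_per_partition = math.ceil(retention_hours / num_hours)
    # Partition replicas that a single host ends up serving, rounded up.
    partitions_per_host = math.ceil(num_partitions * replication / num_hosts)
    return segments_per_partition * partitions_per_host


hosts = [4, 8, 12, 16, 20]
table = {}
for num_hours in [24, 48, 72, 96]:
    # retentionHours=768 and numPartitions=20 come from the command above;
    # replication=2 is an assumption that happens to fit the output.
    table[num_hours] = [segments_per_host(768, num_hours, 20, h, replication=2) for h in hosts]
    print(num_hours, table[num_hours])
```

Read together with the "Consuming memory" table, this suggests the trade-off behind numHours: a longer consuming window means fewer, larger completed segments per host, at the cost of more memory held by the consuming segment.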
Tanmay Movva
12/01/2020, 2:37 AM
Reload All Segments to apply the indexes. When I try to check the Reload Status, I get this error on the UI:
Table type : REALTIME not yet supported.
Tanmay Movva
12/01/2020, 4:04 AM
Yupeng Fu
12/01/2020, 9:25 PM
SELECT hour_start_timestamp_utc FROM downtime WHERE (secondsSinceEpoch > 1606247126) ORDER BY secondsSinceEpoch DESC, hour_start_timestamp_utc DESC LIMIT 1
It scans the past week of data but returns only 1 record.
Since the table is large, it ends up scanning about 100 million records per query and takes a few seconds.
The query output looks like this:
{
  "selectionResults": {
    "columns": [
      "hour_start_timestamp_utc"
    ],
    "results": [
      [
        "2020-12-01 09:00:00"
      ]
    ]
  },
  "exceptions": [],
  "numServersQueried": 9,
  "numServersResponded": 9,
  "numSegmentsQueried": 1059,
  "numSegmentsProcessed": 1059,
  "numSegmentsMatched": 18,
  "numConsumingSegmentsQueried": 0,
  "numDocsScanned": 142101504,
  "numEntriesScannedInFilter": 0,
  "numEntriesScannedPostFilter": 284203008,
  "numGroupsLimitReached": false,
  "totalDocs": 7374174837,
  "timeUsedMs": 3522,
  "segmentStatistics": [],
  "traceInfo": {},
  "minConsumingFreshnessTimeMs": 0
}
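To make the cost of this query easier to see, here is a small Python sketch that derives a few ratios from the response above. The field values are copied from the output; the variable names are hypothetical. The reading that two post-filter entries per document correspond to the two columns the query touches (hour_start_timestamp_utc and secondsSinceEpoch) is an interpretation, not something the response states.

```python
# Subset of the broker response fields shown above.
stats = {
    "numSegmentsQueried": 1059,
    "numSegmentsMatched": 18,
    "numDocsScanned": 142101504,
    "numEntriesScannedInFilter": 0,
    "numEntriesScannedPostFilter": 284203008,
    "totalDocs": 7374174837,
    "timeUsedMs": 3522,
}

# Share of the whole table that was scanned to return a single row.
scanned_fraction = stats["numDocsScanned"] / stats["totalDocs"]
# Column values read per scanned document after filtering.
entries_per_doc = stats["numEntriesScannedPostFilter"] / stats["numDocsScanned"]
# Share of queried segments that contained matching documents.
matched_fraction = stats["numSegmentsMatched"] / stats["numSegmentsQueried"]

print(f"scanned {scanned_fraction:.2%} of all docs")
print(f"{entries_per_doc:.1f} entries read per scanned doc")
print(f"{matched_fraction:.2%} of queried segments matched")
```

With numEntriesScannedInFilter at 0, the filter itself appears cheap here; the time seems to go into reading and sorting the roughly 142 million matching documents to pick one row.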
Pradeep
12/02/2020, 1:46 AM
Tanmay Movva
12/02/2020, 4:31 AM
ERROR [LLRealtimeSegmentDataManager_spanEventView__0__15__20201201T0448Z] [spanEventView__0__15__20201201T0448Z] Could not build segment
java.lang.IllegalStateException: Cannot create output dir: /var/pinot/server/data/index/spanEventView_REALTIME/_tmp/tmp-spanEventView__0__15__20201201T0448Z-160688315943
because of which Pinot is not able to build segments or ingest data. How can I debug this?
Neer Shay
12/02/2020, 1:28 PM
Jinwei Zhu
12/02/2020, 8:37 PM
Taran Rishit
12/03/2020, 6:31 AM
Ken Krugler
12/03/2020, 2:58 PM