Elon
06/09/2020, 12:39 AM
Elon
06/09/2020, 12:41 AM
Subbu Subramaniam
06/12/2020, 5:37 PM
There is a loadMode setting, so if you have set that to HEAP, I suggest you move it to MMAP and restart your servers. Realtime servers have a setting pinot.server.instance.realtime.alloc.offheap. Setting this to true makes sure that we use as little heap as possible during consumption, and memory-map files for the rest. If you do not want memory mapping (and want to use direct memory instead), you can set pinot.server.instance.realtime.alloc.offheap.direct to true, but I don't think you have set this config. If you have, then please remove it.
Pradeep
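A note on where these settings live: loadMode is a table-level setting (under tableIndexConfig in the table config), while the offheap settings are server instance properties. A minimal sketch of the server config lines (the file path is deployment-specific):

```
# Server instance config - realtime memory settings discussed above
pinot.server.instance.realtime.alloc.offheap=true
# Only if you explicitly want direct memory instead of mmap; normally leave this unset
# pinot.server.instance.realtime.alloc.offheap.direct=true
```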
06/12/2020, 10:54 PM
~/apache-pinot-incubating-0.4.0-bin$ ls plugins/pinot-file-system/
pinot-adls pinot-gcs pinot-hdfs
Pradeep
06/14/2020, 5:57 PM
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
- <version>4.5.3</version>
+ <version>4.5.9</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpcore</artifactId>
- <version>4.4.6</version>
+ <version>4.4.9</version>
</dependency>
Pradeep
06/16/2020, 7:18 PM
Pradeep
06/24/2020, 9:44 PM
"tenants": {
  "broker": "DefaultTenant",
  "server": "DefaultTenant",
  "tagOverrideConfig": {
    "realtimeConsuming": "DefaultTenant_REALTIME",
    "realtimeCompleted": "DefaultTenant_OFFLINE"
  }
},
But the REST API /tables/{tableName} only shows
"tenants": {
  "broker": "DefaultTenant",
  "server": "DefaultTenant"
},
Wondering if this config is coupled with something else?
Neha Pawar
Elon
06/29/2020, 11:43 PM
Is setting zk.connection.timeout and helixmanager.waitForConnectedTimeout possible to do in a config, or just with -Dzk.connection.timeout=...?
Damiano
07/01/2020, 1:16 PMpinot-admin.sh StartZookeeper -zkPort 2181
so the first question is: what about if the server with Zookeeper goes down? Can we share two or more zookeeper instances over multiple servers? Supposing we can create multiple zookeeper instances does every machine should also have its own Controller, Broker and Server components? Because having more than one broker/controller on the same machine does not have much sense to me, maybe for very high traffic? Could someone explain it a little bit more? Thanks.Kishore G
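On the fault-tolerance part of the question: ZooKeeper is normally run as its own 3-node (or 5-node) ensemble on separate machines, so losing one ZooKeeper host does not take the cluster down, and Pinot controllers/brokers/servers do not need to be co-located with it. A sketch of an ensemble config, with hypothetical host names:

```
# zoo.cfg - identical on all three ZooKeeper machines (host names are hypothetical)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk-host-1:2888:3888
server.2=zk-host-2:2888:3888
server.3=zk-host-3:2888:3888
```

Each machine additionally needs a myid file under dataDir containing its own server number (1, 2, or 3).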
Kishore G
Cinto Sunny
07/01/2020, 5:54 PM
./bin/quick-start-batch.sh
It is throwing an error:
***** Offline quickstart setup complete *****
Total number of documents in the table
Query : select count(*) from baseballStats limit 0
Executing command: PostQuery -brokerHost 127.0.0.1 -brokerPort 8000 -queryType pql -query select count(*) from baseballStats limit 0
Exception in thread "main" java.lang.NullPointerException
at org.apache.pinot.tools.Quickstart.prettyPrintResponse(Quickstart.java:75)
at org.apache.pinot.tools.Quickstart.execute(Quickstart.java:174)
at org.apache.pinot.tools.Quickstart.main(Quickstart.java:207)
This is what the UI looks like.
Dan Hill
07/01/2020, 10:41 PM
The metrics table is slow to query, but the individual metrics_OFFLINE and metrics_REALTIME tables are quick to query separately. Any ideas?
select utc_date, sum(impressions) from metrics_OFFLINE where utc_date >= 1591142400000 and utc_date < 1593648000000 group by utc_date order by utc_date ASC limit 1831
This returns pretty fast (200ms) over roughly 400 million rows.
If I switch to metrics_REALTIME, it's also fast and returns zero rows.
select utc_date, sum(impressions) from metrics_REALTIME where utc_date >= 1591142400000 and utc_date < 1593648000000 group by utc_date order by utc_date ASC limit 1831
However, if I query metrics, it's very slow.
select utc_date, sum(impressions) from metrics where utc_date >= 1591142400000 and utc_date < 1593648000000 group by utc_date order by utc_date ASC limit 1831
Mayank
1. Backward compatible schema changes are safe (e.g. adding a new column, or a safe type change such as int -> long). Backward incompatible changes, such as deleting a column or changing to an incompatible data type, are not allowed.
2. At LinkedIn, we usually ensure that a change is rolled out in phases so as not to break a deployment. For example, you could deploy the change off by default to all components, and then turn it on in a way that does not break. I would need a bit more info on your specific change to comment on how to achieve that.
3. We have internal tools at LinkedIn, but would be great to have them in the open source as well. One project in our roadmap that is in this direction is to build a performance validation framework.
4. There are different ways we evaluate changes. For changes that are limited to a single node you can use PerfBenchmarkRunner along with QueryRunner (to run a specific qps) on two different setups. For a change that impacts scatter/gather and needs entire cluster we have tools internally to do so. But hoping that the project mentioned above can evolve into something that the community can also use.
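As a concrete illustration of point 1, adding a new dimension column with a default null value is a backward-compatible schema change; a sketch (the column name and default are hypothetical):

```
{
  "name": "countryCode",
  "dataType": "STRING",
  "defaultNullValue": "unknown"
}
```

This entry would be appended to the schema's dimensionFieldSpecs; existing segments can then serve the default for the new column (typically after a segment reload).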
Somanshu Jindal
07/06/2020, 10:29 AM
Kishore G
Controller
- min 2 (for fault tolerance), ideally 3
- 4 core, 4 GB; disk space should be sufficient for logs and temp segments - 100 GB
Broker
- min 2; add more nodes later as needed to scale
- 4 core, 4 GB; disk space should be sufficient for logs - 10 GB min
Zookeeper (cluster mode)
- min 3 (this is where the entire cluster state is stored)
- 4 core, 4 GB; disk space sufficient to store logs, transaction logs and snapshots. If you can afford it, go with SSD; if not, spinning disk will be fine. 100 GB
Pinot server
- min 2 (this is where the segments will be stored); you can add more servers anytime without downtime
- 8 core, 16 GB, SSD boxes; pick any size that works for your use case (500 GB to 2 TB or even more)
- If you are running in the cloud, you can use mounted SSD instead of local SSD
Elon
07/06/2020, 11:40 PM
Mayank
Somanshu Jindal
07/07/2020, 9:38 AM
{
  "tableName": "transcript",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "timestamp",
    "timeType": "MILLISECONDS",
    "schemaName": "transcript",
    "replicasPerPartition": "1"
  },
  "tenants": {},
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.topic.name": "transcript-topic",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.broker.list": "localhost:9876",
      "realtime.segment.flush.threshold.time": "5m",
      "realtime.segment.flush.threshold.size": "5",
      "stream.kafka.consumer.prop.auto.offset.reset": "largest"
    }
  },
  "metadata": {
    "customConfigs": {}
  }
}
schema
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    { "name": "studentID", "dataType": "INT" },
    { "name": "firstName", "dataType": "STRING" },
    { "name": "lastName", "dataType": "STRING" },
    { "name": "gender", "dataType": "STRING" },
    { "name": "subject", "dataType": "STRING" }
  ],
  "metricFieldSpecs": [
    { "name": "score", "dataType": "FLOAT" }
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "LONG",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
Damiano
07/07/2020, 6:07 PM
Elon
07/08/2020, 12:14 AM
Pradeep
07/08/2020, 12:23 AM
REGEXP_LIKE? (rather than trying to achieve that in the regular expression)
Alan H
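Pinot's REGEXP_LIKE patterns follow Java regex semantics, so pattern-wide options can be expressed as inline flags inside the pattern itself; a sketch of the case-insensitive flag (case-insensitivity as the use case is an assumption here, since the original question is truncated):

```java
public class RegexInlineFlagDemo {
    public static void main(String[] args) {
        // (?i) is Java's inline case-insensitivity flag; it applies to the rest of the pattern
        System.out.println("Foo-BAR".matches("(?i).*bar.*"));  // true
        // Without the flag, matching is case-sensitive
        System.out.println("Foo-BAR".matches(".*bar.*"));      // false
    }
}
```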
07/08/2020, 7:00 AMERROR [PinotFSFactory] [main] Could not instantiate file system for class org.apache.pinot.plugin.filesystem.S3PinotFS with scheme s3
java.lang.ClassNotFoundException: org.apache.pinot.plugin.filesystem.S3PinotFS
at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_252]
at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_252]
at org.apache.pinot.spi.plugin.PluginClassLoader.loadClass(PluginClassLoader.java:80) ~[pinot-all-0.4.0-jar-with-dependencies.jar:0.4.0-8355d2e0e489a8d127f2e32793671fba505628a8]
Mayank
Preconditions.checkArgument(!isNullOrEmpty(config.getProperty(REGION)));
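The ClassNotFoundException lines up with the earlier plugins listing: plugins/pinot-file-system/ in the 0.4.0 distribution contains pinot-adls, pinot-gcs and pinot-hdfs but no S3 plugin, so S3PinotFS is simply not on the classpath. With a build that does include the S3 plugin, the wiring looks roughly like this (a sketch; property names should be checked against the docs for your version), and per Mayank's snippet the region must be set or the precondition fails:

```
# Controller config sketch for S3 as deep store (verify property names for your version)
pinot.controller.storage.factory.class.s3=org.apache.pinot.plugin.filesystem.S3PinotFS
pinot.controller.storage.factory.s3.region=us-west-2
pinot.controller.segment.fetcher.protocols=file,http,s3
pinot.controller.segment.fetcher.s3.class=org.apache.pinot.common.utils.fetcher.PinotFSSegmentFetcher
```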
Cinto Sunny
07/09/2020, 6:57 PM
/tables/{tableName}/size
{
  "tableName": "meetupRsvp",
  "reportedSizeInBytes": 0,
  "estimatedSizeInBytes": 0,
  "offlineSegments": null,
  "realtimeSegments": {
    "reportedSizeInBytes": 0,
    "estimatedSizeInBytes": 0,
    "missingSegments": 0,
    "segments": {}
  }
}
Also, where is the actual location on disk where the segments are stored? Is there some config for this?
Raúl G.
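On the on-disk location: segments served by a server live under the server's data directory, and the controller keeps its copy under its own data dir; both are configurable (the paths below are hypothetical examples; check the property names against the docs for your version):

```
# Server-side local segment directories (hypothetical paths)
pinot.server.instance.dataDir=/var/pinot/server/data/index
pinot.server.instance.segmentTarDir=/var/pinot/server/data/segmentTar
# Controller-side segment store (acts as the deep store when no remote PinotFS is configured)
controller.data.dir=/var/pinot/controller/data
```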
07/13/2020, 12:27 PM
Damiano
07/19/2020, 2:23 PM
Mayank
Mayank