Dan Hill
05/01/2020, 12:42 AM
Xiang Fu
Xiang Fu
pinot.infer-date-type-in-schema=true
pinot.infer-timestamp-type-in-schema=true
Xiang Fu
Xiang Fu
Xiang Fu
Xiang Fu
presto:default> select DATE_TRUNC('day', "dayssinceepoch") as mydate from airlineStats limit 10;
Query 20200501_003454_00012_y857r failed: line 1:8: Unexpected parameters (varchar(3), integer) for function date_trunc. Expected: date_trunc(varchar(x), date) , date_trunc(varchar(x), time) , date_trunc(varchar(x), time with time zone) , date_trunc(varchar(x), timestamp) , date_trunc(varchar(x), timestamp with time zone)
select DATE_TRUNC('day', "dayssinceepoch") as mydate from airlineStats limit 10
presto:default> select DATE_TRUNC('day', "dayssinceepoch") as mydate from airlineStats limit 10;
mydate
------------
2014-04-01
2014-04-01
2014-04-01
2014-04-02
2014-04-02
2014-04-02
2014-04-02
2014-04-02
2014-04-02
2014-04-03
(10 rows)
Query 20200501_022002_00000_9icjz, FINISHED, 1 node
Splits: 17 total, 17 done (100.00%)
0:02 [10 rows, 40B] [5 rows/s, 23B/s]
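For reference, a minimal sketch of what the date inference amounts to, assuming (as the column name suggests) that "dayssinceepoch" holds an integer count of days since 1970-01-01. The function name and the sample value 16161 are hypothetical and chosen only for illustration; this is not the connector's code.

```python
from datetime import date, timedelta

# Unix epoch start; assumption: the column counts whole days from this date.
EPOCH = date(1970, 1, 1)

def days_since_epoch_to_date(days: int) -> date:
    """Convert an integer count of days since 1970-01-01 to a calendar date."""
    return EPOCH + timedelta(days=days)

# 16161 is an illustrative value, not read from the table in this thread.
print(days_since_epoch_to_date(16161))  # 2014-04-01
```

Without such a conversion the column is a plain integer, which is why date_trunc rejects it in the first query above.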
Xiang Fu
presto:default> describe airlinestats;
        Column        |  Type   | Extra |  Comment
----------------------+---------+-------+-----------
 flightnum            | integer |       | DIMENSION
 origin               | varchar |       | DIMENSION
 quarter              | integer |       | DIMENSION
 lateaircraftdelay    | integer |       | DIMENSION
 divactualelapsedtime | integer |       | DIMENSION
 divwheelsons         | varchar |       | DIMENSION
 divwheelsoffs        | varchar |       | DIMENSION
 airtime              | integer |       | DIMENSION
 arrdel15             | integer |       | DIMENSION
 divtotalgtimes       | varchar |       | DIMENSION
 deptimeblk           | varchar |       | DIMENSION
 destcitymarketid     | integer |       | DIMENSION
 divairportseqids     | varchar |       | DIMENSION
 dayssinceepoch       | date    |       | TIME
 deptime              | integer |       | DIMENSION
 month                | integer |       | DIMENSION
.....
Dan Hill
05/01/2020, 2:25 AM
2020-04-30T19:06:21.526-0700 ERROR main com.facebook.presto.server.PrestoServer com.google.inject.CreationException: Unable to create injector, see the following errors:
1) Configuration property 'pinot.infer-date-type-in-schema' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:238)
2) Configuration property 'pinot.infer-timestamp-type-in-schema' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:238)
2 errors
java.lang.RuntimeException: com.google.inject.CreationException: Unable to create injector, see the following errors:
1) Configuration property 'pinot.infer-date-type-in-schema' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:238)
2) Configuration property 'pinot.infer-timestamp-type-in-schema' was not used
at com.facebook.airlift.bootstrap.Bootstrap.lambda$initialize$2(Bootstrap.java:238)
2 errors
at com.facebook.presto.pinot.PinotConnectorFactory.create(PinotConnectorFactory.java:94)
at com.facebook.presto.connector.ConnectorManager.createConnector(ConnectorManager.java:364)
at com.facebook.presto.connector.ConnectorManager.addCatalogConnector(ConnectorManager.java:222)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:214)
at com.facebook.presto.connector.ConnectorManager.createConnection(ConnectorManager.java:200)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:123)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalog(StaticCatalogStore.java:98)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:80)
at com.facebook.presto.metadata.StaticCatalogStore.loadCatalogs(StaticCatalogStore.java:68)
at com.facebook.presto.server.PrestoServer.run(PrestoServer.java:135)
at com.facebook.presto.server.PrestoServer.main(PrestoServer.java:77)
Xiang Fu
Xiang Fu
Xiang Fu
0.234.3
?
Dan Hill
05/01/2020, 5:00 AM
Dan Hill
05/01/2020, 6:04 AM
expired
2020-05-01T05:51:54.338Z INFO main Bootstrap transaction.max-finishing-concurrency 1 1 Maximum parallelism for committing or aborting a transaction
I've tried a few different run setups and can't get any of them to work, e.g. the docker run command in the GitBook. I merged the etc directory from the docker directory with my own etc.
docker run \
--network kafka_default \
--name=presto-coordinator \
-v "$(pwd)"/etc:/home/presto/etc \
-p 8080:8080 \
-d apachepinot/pinot-presto:0.234.3
Xiang Fu
Xiang Fu
Xiang Fu
2020-05-01T06:20:58.656Z INFO main Bootstrap transaction.max-finishing-concurrency 1 1 Maximum parallelism for committing or aborting a transaction
2020-05-01T06:21:04.353Z WARN main com.facebook.airlift.jmx.JmxAgent Cannot determine if JMX agent is already running (not an Oracle JVM?). Will try to start it manually.
2020-05-01T06:21:04.452Z INFO main com.facebook.airlift.jmx.JmxAgent JMX agent started and listening on 7b634ba97f04:42235
2020-05-01T06:21:11.644Z WARN node-state-poller-0 com.facebook.presto.metadata.HttpRemoteNodeState Node state update request to http://172.19.0.4:8080/v1/info/state has not returned in 290503.50s
2020-05-01T06:21:13.851Z WARN query-management-2 com.facebook.presto.memory.RemoteNodeMemory Memory info update request to http://172.19.0.4:8080/v1/memory has not returned in 290505.71s
It took a few seconds for me to get past those logs as well.
Damiano
05/01/2020, 3:44 PM
ORDER BY id ASC
(where id is an Integer). Are the documents inside each partition/segment sorted before the aggregation (let's suppose before processing the MAX() aggregator)? Or will the sorting be done at the end? I am asking because, obviously, the order matters in my case: if each segment is ordered, I can optimize my aggregator much more. I know that I can use a timestamp, as recommended by @Kishore G, to understand the "order" of each document, but if the documents come in a "random" order I must deal with that, and this means more code and more checks, which is slower.
Kishore G
Damiano
05/01/2020, 4:09 PM
Damiano
05/01/2020, 4:10 PM
Kishore G
Damiano
05/01/2020, 4:10 PM
Damiano
05/01/2020, 4:11 PM
Kishore G
Damiano
05/01/2020, 4:11 PM
Kishore G
Damiano
05/01/2020, 4:12 PM
Damiano
05/01/2020, 4:12 PM
Kishore G