
Xiang Fu

05/01/2020, 6:19 AM
you can put options like --memory="1g" --cpus=".5" in the docker run command

Dan Hill

05/01/2020, 6:17 PM
@Xiang Fu - Even doing that and going lower, Docker still uses a ton of CPU and memory. Maybe there's a weird interaction with Presto and the other containers while starting up? I'm going to see if I can get the basic Docker quick start instructions to work separately from my setup.
Even if I wait 10 minutes, the logs don't make additional progress.

Xiang Fu

05/01/2020, 6:20 PM
For presto? Are you using docker for desktop?
Maybe enlarge the system disk and memory

Dan Hill

05/01/2020, 6:47 PM
Yea, I'm using docker for desktop.
I'm going to restart docker

Xiang Fu

05/01/2020, 7:09 PM
this is my set up

Dan Hill

05/01/2020, 7:10 PM
Yea, something is off. I'm going to give Docker some more resources.
Thanks!
Restarting helped me get further, but I got to a point where I'm killing something in Docker.
Cool, I can get to
com.facebook.presto.server.PrestoServer	======== SERVER STARTED ========
but any queries just get queued and don't run.
Hmm, Presto server died
No logs
I'll play with it some more.
Sweet! I got Presto in a docker container working. I still get the same error even with 0.234.3.
presto:default> SELECT DATE_TRUNC('DAYS', "timestamp") as mydate, SUM(cost_usd_micros) FROM pinot.default.events_testing GROUP BY mydate;
Query 20200501_195448_00002_878qh failed: line 1:8: Unexpected parameters (varchar(4), bigint) for function date_trunc. Expected: date_trunc(varchar(x), date) , date_trunc(varchar(x), time) , date_trunc(varchar(x), time with time zone) , date_trunc(varchar(x), timestamp) , date_trunc(varchar(x), timestamp with time zone)

Xiang Fu

05/01/2020, 7:55 PM
hmm
could you describe the table ?
is this field showing as type string or date/timestamp?

Dan Hill

05/01/2020, 7:56 PM
presto:default> DESCRIBE  pinot.default.events_testing;
     Column      |  Type  | Extra |  Comment  
-----------------+--------+-------+-----------
 cost_usd_micros | bigint |       | METRIC    
 insertions      | bigint |       | METRIC    
 content_id      | bigint |       | DIMENSION 
 platform_id     | bigint |       | DIMENSION 
 clicks          | bigint |       | METRIC    
 impressions     | bigint |       | METRIC    
 customer_id     | bigint |       | DIMENSION 
 ad_group_id     | bigint |       | DIMENSION 
 campaign_id     | bigint |       | DIMENSION 
 advertiser_id   | bigint |       | DIMENSION 
 timestamp       | bigint |       | TIME      
(11 rows)

Query 20200501_195558_00003_878qh, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:04 [11 rows, 926B] [2 rows/s, 214B/s]

presto:default>

Xiang Fu

05/01/2020, 7:56 PM
it’s time field now
SELECT date(timestamp) as mydate, SUM(cost_usd_micros) FROM pinot.default.events_testing group by date(timestamp)
can you try this first

Dan Hill

05/01/2020, 7:59 PM
presto:default> SELECT DATE("timestamp") as mydate, SUM(cost_usd_micros) FROM pinot.default.events_testing GROUP BY mydate;
Query 20200501_195938_00006_878qh failed: line 1:8: Unexpected parameters (bigint) for function date. Expected: date(varchar(x)) , date(timestamp) , date(timestamp with time zone)

Xiang Fu

05/01/2020, 7:59 PM
do you have some sample values for timestamp?
is it millis since epoch?
oh
question: does your pinot schema define this column as a dimension or a time field?
if the type inference is working, it should show TIMESTAMP as the type, not bigint

Dan Hill

05/01/2020, 8:01 PM
The simple queries are causing issues with Docker Presto
Query 20200501_200108_00013_878qh failed: java.net.UnknownHostException: pinot-broker-0.pinot-broker-headless.events-local.svc.cluster.local
All of my queries for data are timing out
Be back in 15 minutes
I got it further along
presto:default> SELECT "timestamp", "impressions" FROM pinot.default.events_testing LIMIT 3;
   timestamp   | impressions 
---------------+-------------
 1554366550435 |           1 
 1565861533888 |           1 
 1569063656996 |           1 
(3 rows)
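
The sample values above look like milliseconds since the Unix epoch, which is why Presto reports the column as bigint until it is told to treat it as a timestamp. A minimal Python sketch, assuming the values are epoch milliseconds and that days are cut in UTC, of the calendar day each value falls on (roughly what a day-level truncation would group by). The function name is made up for illustration.

```python
from datetime import datetime, timezone

# Sample "timestamp" values copied from the query output in this thread.
SAMPLE_MILLIS = [1554366550435, 1565861533888, 1569063656996]


def millis_to_utc_day(ms: int) -> str:
    """Treat ms as milliseconds since the Unix epoch and return the UTC day (YYYY-MM-DD)."""
    seconds = ms // 1000  # drop the millisecond part with integer division
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()


if __name__ == "__main__":
    for ms in SAMPLE_MILLIS:
        print(ms, "->", millis_to_utc_day(ms))
```

With the three sample values this gives 2019-04-04, 2019-08-15 and 2019-09-21, so the column does hold plausible event times in milliseconds.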

Xiang Fu

05/01/2020, 8:30 PM
ic
could you try to update the pinot schema by setting timestamp as either datetime field or timeField?
"timeFieldSpec": {
    "incomingGranularitySpec": {
      "dataType": "LONG",
      "timeType": "MILLISECONDS",
      "name": "timestamp"
    }
  },

Dan Hill

05/01/2020, 8:32 PM
Yea, I'll try it in ten minutes
"timeFieldSpec": {
    "incomingGranularitySpec": {
      "name": "timestamp",
      "dataType": "LONG",
      "timeFormat" : "EPOCH",
      "timeType": "MILLISECONDS"
    }
  }
That's my current
Do you want me to remove timeFormat?

Xiang Fu

05/01/2020, 9:04 PM
this is good
hmm
then why is the timestamp inference not working…

Dan Hill

05/01/2020, 9:06 PM
I'm happy to forward the configs and stuff if it'd help

Xiang Fu

05/01/2020, 9:06 PM
sure
or maybe your schema

Dan Hill

05/01/2020, 9:10 PM
Here's the schema (in the Kubernetes yaml and the etc for Presto).

Xiang Fu

05/01/2020, 9:14 PM
hmmm
seems like it works for me
presto:default> SELECT DATE_TRUNC('DAYS', "timestamp") as mydate, SUM(cost_usd_micros) FROM pinot.default.events GROUP BY DATE_TRUNC('DAYS', "timestamp");
 mydate | _col1
--------+-------
(0 rows)

Query 20200501_211538_00009_5qeb6, FINISHED, 1 node
Splits: 49 total, 49 done (100.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]

Dan Hill

05/01/2020, 9:16 PM
presto:default> DESCRIBE pinot.default.events;
     Column      |  Type  | Extra |  Comment  
-----------------+--------+-------+-----------
 cost_usd_micros | bigint |       | METRIC    
 insertions      | bigint |       | METRIC    
 content_id      | bigint |       | DIMENSION 
 platform_id     | bigint |       | DIMENSION 
 clicks          | bigint |       | METRIC    
 impressions     | bigint |       | METRIC    
 customer_id     | bigint |       | DIMENSION 
 ad_group_id     | bigint |       | DIMENSION 
 campaign_id     | bigint |       | DIMENSION 
 advertiser_id   | bigint |       | DIMENSION 
 timestamp       | bigint |       | TIME      
(11 rows)

Xiang Fu

05/01/2020, 9:16 PM
can you add
pinot.use-date-trunc=true
pinot.infer-date-type-in-schema=true
pinot.infer-timestamp-type-in-schema=true
into pinot.properties under etc?
this is my pinot config for presto catalog

Dan Hill

05/01/2020, 9:19 PM
Ah, yea, I am missing infer-timestamp-type-in-schema. That's probably it.
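
Since the problem here was a single missing flag, a small check can catch it before Presto is restarted. This is only a sketch: it assumes the catalog file is plain key=value text, the three flag names are the ones quoted in this thread, and the helper names and example contents are hypothetical.

```python
# Flags quoted in this thread that should all be set to true in pinot.properties.
REQUIRED_FLAGS = (
    "pinot.use-date-trunc",
    "pinot.infer-date-type-in-schema",
    "pinot.infer-timestamp-type-in-schema",
)


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_flags(text: str) -> list:
    """Return the required flags that are absent or not set to 'true'."""
    props = parse_properties(text)
    return [flag for flag in REQUIRED_FLAGS if props.get(flag) != "true"]


if __name__ == "__main__":
    # Hypothetical file contents with the timestamp flag left out.
    example = "pinot.use-date-trunc=true\npinot.infer-date-type-in-schema=true\n"
    print(missing_flags(example))
```

An empty list means all three flags are present and enabled.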

Xiang Fu

05/01/2020, 9:22 PM
cool, let me know

Dan Hill

05/01/2020, 9:28 PM
Yes! That's it. I hit issues iterating on it.
Thanks so much!