Chundong Wang
02/15/2021, 6:41 PM
select count(*) as cnt from log where date >= DATE_SUB(NOW(), INTERVAL 1 HOUR);
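A hedged Pinot-flavoured sketch of the same query, since DATE_SUB/INTERVAL is MySQL syntax that Pinot does not parse; timestampMillis is an assumed epoch-millis column, and now() plus the filter arithmetic assume a recent Pinot release:
-- count rows whose timestamp falls within the last hour (3600000 ms)
select count(*) as cnt from log where timestampMillis >= now() - 3600000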
Karin Wolok
Elon
02/17/2021, 1:37 AM
Nick Bowles
02/19/2021, 3:11 AM
vmarchaud
02/22/2021, 1:31 PM
Shawn Peng
02/23/2021, 1:09 AM
DATETRUNC('hour', second(now()), 'SECONDS'), is this expected?
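A hedged aside: in Pinot, second(ts) extracts the second-of-minute (0-59) rather than converting to epoch seconds, so the value being truncated here may not be the intended one. A sketch of the epoch-seconds form, where toEpochSeconds and myTable are assumptions:
-- truncate the current time, expressed in epoch seconds, down to the hour boundary
SELECT DATETRUNC('hour', toEpochSeconds(now()), 'SECONDS') FROM myTable LIMIT 1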
Karin Wolok
ayush sharma
02/23/2021, 10:42 PM
Database may be already in use
Please find the attached log file.
Any help is appreciated!
docker run \
--network=pinot-demo \
--name thirdeye \
-p 1426:1426 \
-p 1427:1427 \
-d apachepinot/thirdeye:latest
Nick Bowles
02/26/2021, 5:35 PM
Ken Krugler
02/26/2021, 6:00 PM
Vince Vinci
02/27/2021, 2:46 AM
Anupam Mukherjee
03/02/2021, 7:23 AM
Josh Highley
03/02/2021, 3:55 PM
Alex
03/02/2021, 11:18 PM
Josh Highley
03/03/2021, 2:55 PM
{ "account": " 123", ..... }
If my realtime table defines the account column as DOUBLE, then the record loads with no issue -- the spaces appear to be ignored. However, if I define the column as INT, then the record does not load. More troublesome, I can't find any error messages in any of the logs -- I would expect some kind of error message?
Josh Highley
03/04/2021, 1:45 AM
troywinter
03/05/2021, 3:48 AM[
  {
    "errorCode": 500,
    "message": "MergeResponseError:\nData schema mismatch between merged block: [time_to_hour(LONG),age_decade(STRING),age_level(STRING),city(STRING),company_id(STRING),company_name(STRING),count_impression(LONG),count_in(LONG),count_passby(LONG),create_time(LONG),day(STRING),day_in_week(STRING),district(STRING),gate_id(STRING),gender(STRING),holiday_id(STRING),holiday_name(STRING),hour(STRING),is_holiday(STRING),month(STRING),province(STRING),region(STRING),shop_id(STRING),shop_name(STRING),temperature(STRING),temperature_id(STRING),total_duration(LONG),total_impression_duration(LONG),weather_cate_id(STRING),weather_cate_name(STRING),year(STRING)] and block to merge: [time_to_hour(LONG),age_decade(STRING),age_level(STRING),city(STRING),company_id(STRING),company_name(STRING),count_impression(LONG),count_in(LONG),count_passby(LONG),create_time(LONG),day(STRING),day_in_week(STRING),district(STRING),gate_id(STRING),gender(STRING),holiday_id(STRING),holiday_name(STRING),hour(STRING),is_holiday(STRING),month(STRING),province(STRING),region(STRING),shop_id(STRING),shop_name(STRING),temperature(STRING),temperature_id(STRING),total_duration(LONG),total_impression_duraion(LONG),weather_cate_id(STRING),weather_cate_name(STRING),year(STRING)], drop block to merge"
  }
]
Pankaj Thakkar
03/05/2021, 6:54 AM
ayush sharma
03/05/2021, 7:36 PM
apiVersion: batch/v1
kind: Job
metadata:
  name: pinot-case-offline-ingestion
  namespace: my-pinot-kube
spec:
  template:
    spec:
      containers:
        - name: pinot-load-case-offline
          image: apachepinot/pinot:0.3.0-SNAPSHOT
          args: ["LaunchDataIngestionJob", "-jobSpecFile", "/opt/data/table-configs/case_history/job-spec.yml"]
          volumeMounts:
            - name: mount-data
              mountPath: /opt/data
      restartPolicy: OnFailure
      volumes:
        - name: mount-data
          hostPath:
            path: /opt/data
  backoffLimit: 100
After applying this job to the node, nothing happens; this is the log of the pod.
SegmentGenerationJobSpec:
!!org.apache.pinot.spi.ingestion.batch.spec.SegmentGenerationJobSpec
excludeFileNamePattern: null
executionFrameworkSpec: {extraConfigs: null, name: standalone, segmentGenerationJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner,
segmentTarPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner,
segmentUriPushJobRunnerClassName: org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner}
includeFileNamePattern: glob:**/*.csv
inputDirURI: /opt/data/csv_data/case_prod_data
jobType: SegmentCreationAndTarPush
outputDirURI: /pinot-segments/case_history
overwriteOutput: true
pinotClusterSpecs:
- {controllerURI: 'http://192.168.49.2:30892/'}
pinotFSSpecs:
- {className: org.apache.pinot.spi.filesystem.LocalPinotFS, configs: null, scheme: file}
pushJobSpec: null
recordReaderSpec:
className: org.apache.pinot.plugin.inputformat.csv.CSVRecordReader
configClassName: org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig
configs: {delimiter: '|', multiValueDelimiter: ''}
dataFormat: csv
segmentNameGeneratorSpec:
configs: {segment.name.prefix: case_history, exclude.sequence.id: 'true'}
type: normalizedDate
tableSpec: {schemaURI: null, tableConfigURI: null, tableName: case_history}
Trying to create instance for class org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner
Initializing PinotFS for scheme file, classname org.apache.pinot.spi.filesystem.LocalPinotFS
Am I ingesting the data incorrectly?
Jai
03/09/2021, 2:06 PM
Manish Bhoge
03/10/2021, 3:00 PM
# Build Pinot
$ mvn clean install -DskipTests -Pbin-dist
But it is failing with an error; any idea on the error below?
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade (default) on project pinot-yammer: Execution default of goal org.apache.maven.plugins:maven-shade-plugin:3.2.1:shade failed: Plugin org.apache.maven.plugins:maven-shade-plugin:3.2.1 or one of its dependencies could not be resolved: The following artifacts could not be resolved: org.apache.maven.shared:maven-artifact-transfer:jar:0.10.0, org.ow2.asm:asm:jar:7.0: Could not transfer artifact org.apache.maven.shared:maven-artifact-transfer:jar:0.10.0 from/to central (https://repo.maven.apache.org/maven2): Connect to repo.maven.apache.org:443 [repo.maven.apache.org/151.101.12.215] failed: Connection timed out (Connection timed out) -> [Help 1]
Josh Highley
03/10/2021, 3:08 PM
bin/pinot-admin.sh StartServer and bin/start-server.sh? Which way should be used?
Ken Krugler
03/11/2021, 11:47 PM
To get the total group count from a group by, I assume currently we have to do a separate distinctcount or distinctcounthll, right? But if the group by uses multiple columns, what's the best approach to getting this total group count?
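A hedged sketch of one workaround for the multi-column case, assuming hypothetical columns colA and colB and Pinot's CONCAT(col1, col2, separator) string transform:
-- approximate the number of distinct (colA, colB) groups by concatenating
-- the two columns into one value first; the '|' separator is an assumption
SELECT DISTINCTCOUNTHLL(CONCAT(colA, colB, '|')) FROM myTable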
Anupam Mukherjee
03/12/2021, 11:30 AM
Ravikumar Maddi
03/12/2021, 4:25 PM
Ravikumar Maddi
03/12/2021, 4:28 PM
ayush sharma
03/12/2021, 7:18 PM
Segment query returned '50001' rows per split, maximum allowed is '50000' rows. with query "SELECT * FROM pinot_table LIMIT 50001"
Presto cannot even query something like this:
presto:default> select count(*) from pinot.default.pinot_table;
Even if we increase the 50k limit of Presto's pinot.properties setting pinot.max-rows-per-split-for-segment-queries
to 1 million, the Presto server crashes stating heap memory exceeded.
To work around it, we learned that we can make Pinot do the aggregations and feed the aggregated result to Presto, which in turn feeds Superset to visualize the charts, by writing the aggregation logic inside the Presto sub-query, like:
presto:default> select * from pinot.default."select count(*) from pinot_table"
This returns the expected result.
Problem # 3
We found that, though we can make Pinot do the aggregations, we cannot use the supported transformation functions of Pinot listed here inside the Presto sub-query.
The query
select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10
works fine in Pinot, but does not work when embedded in Presto as a sub-query like below:
presto:default> select * from pinot.default."select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10";
Query failed: Column datetrunc('day',epoch_ms_col,'milliseconds') not found in table default.select datetrunc('day', epoch_ms_col, 'milliseconds') from pinot_table limit 10
I do not know if we are doing something wrong while querying/implementing or have missed some useful config setting that can solve our problem.
The SQL Lab query that we want to run against Pinot and eventually use to make a chart is like:
SELECT
day_of_week(epoch_ms_col),
count(*)
from pinot_table
group by day_of_week(epoch_ms_col)
Any help is really appreciated!!!
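A hedged sketch tying this back to the problem # 3 workaround: push the whole aggregation, transform included, into the Pinot dynamic table so Presto never has to resolve the transform expression itself. Pinot's dayOfWeek() function and the dow alias are assumptions here, not verified against this setup:
-- Presto sees only plain output columns; Pinot evaluates the transform
select * from pinot.default."select dayOfWeek(epoch_ms_col) as dow, count(*) from pinot_table group by dayOfWeek(epoch_ms_col)";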
Ravikumar Maddi
03/13/2021, 8:32 AM
Ravikumar Maddi
03/15/2021, 8:28 AM
Ravikumar Maddi
03/16/2021, 12:40 AM
bin/pinot-admin.sh StartZookeeper -zkPort 2181
But after some time I am getting this:
zookeeper state changed (SyncConnected)
Waiting for keeper state SyncConnected
Terminate ZkClient event thread.
Session: 0x1000014c3150000 closed
Start zookeeper at localhost:2181 in thread main
EventThread shut down for session: 0x1000014c3150000
Unable to read additional data from client sessionid 0x1000014c3150005, likely client has closed socket
Unable to read additional data from client sessionid 0x1000014c3150004, likely client has closed socket
Unable to read additional data from client sessionid 0x1000014c3150002, likely client has closed socket
Expiring session 0x1000014c3150004, timeout of 30000ms exceeded
Expiring session 0x1000014c3150005, timeout of 30000ms exceeded
Expiring session 0x1000014c3150002, timeout of 30000ms exceeded
Unable to read additional data from client sessionid 0x1000014c3150009, likely client has closed socket
Unable to read additional data from client sessionid 0x1000014c315000a, likely client has closed socket
Unable to read additional data from client sessionid 0x1000014c3150007, likely client has closed socket
Expiring session 0x1000014c315000a, timeout of 30000ms exceeded
I restarted the server based on suggestions found online. Still no luck.
Need help 🙂