Jelena Zanko
09/21/2022, 8:48 PM
MBM
09/22/2022, 11:22 AM
David Cromberge
09/23/2022, 3:19 PM
victor regalado
09/23/2022, 4:29 PM
SELECT
"key", SUM("impression")
FROM "event_agg_metrics"
WHERE TIME_FORMAT("__time", 'yyyy-MM-dd') = '2022-09-15' AND "user_id" = '123'
GROUP BY "key"
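One note on the WHERE clause above: wrapping __time in TIME_FORMAT forces Druid to evaluate the expression row by row and prevents it from pruning segments by time. If that matters for this query, the same day can be expressed as a plain __time range; a sketch against the same table:
SELECT
"key", SUM("impression")
FROM "event_agg_metrics"
WHERE "__time" >= TIMESTAMP '2022-09-15' AND "__time" < TIMESTAMP '2022-09-16' AND "user_id" = '123'
GROUP BY "key"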
Cory Johannsen
09/23/2022, 4:38 PM
Samarth Jain
09/23/2022, 6:30 PM
fieldName in the ingestion spec by using the column name in the table.
{
"fieldName": <table_column_name>,
"name": "name",
"type": "longSum"
}
Iceberg allows columns to be renamed, so the column name in the table may not match the column name in data files that were written before the rename. This is problematic because the Druid indexer extracts the value from a row by column name; since it can't find a column with that name in the row, it ends up assigning the default value for the datatype. Note that this isn't a problem for other engines like Spark or Trino, because they rely on the Iceberg read path, which resolves a column by its fieldId rather than its name. The only workaround for such a situation is to backfill the table with the new column name.
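To make the mismatch concrete, a minimal hypothetical sketch: say the Iceberg column was originally named imp_count and was later renamed to impression (both names made up for illustration). The ingestion spec naturally references the new name:
{
"fieldName": "impression",
"name": "impression",
"type": "longSum"
}
Data files written before the rename still store the column as imp_count, so the name-based lookup finds nothing in those rows and the aggregator falls back to the default value (0 for longSum), exactly as described above.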
Has anyone else in the community encountered something similar? If so, how did you solve it?
Eyal Yurman
09/23/2022, 6:33 PM
MBM
09/27/2022, 6:23 AM
Sergio Ferragut
09/27/2022, 8:57 PM
tilak chowdary
09/27/2022, 11:44 PM
SELECT
id
FROM view
WHERE __time BETWEEN '2022-09-22 15:09:17.0' AND '2022-09-27 15:09:17.0'
and MV_CONTAINS(tags, ARRAY["tag1", "tag2"])
and MV_CONTAINS(ARRAY["tag1", "tag2"], tags)
GROUP BY id
LIMIT 100 OFFSET 0
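A note on the two MV_CONTAINS calls above: MV_CONTAINS(tags, ARRAY[...]) is true only when tags contains all of the listed values, and swapping the arguments checks containment in the other direction, so requiring both effectively demands an exact match of the two sets. If the intent is "rows whose tags include at least one of the values", MV_OVERLAP does that. Also note that in Druid SQL double quotes denote identifiers, so the string literals should use single quotes. A sketch of the "any of" variant against the same view:
SELECT
id
FROM view
WHERE __time BETWEEN '2022-09-22 15:09:17.0' AND '2022-09-27 15:09:17.0'
and MV_OVERLAP(tags, ARRAY['tag1', 'tag2'])
GROUP BY id
LIMIT 100 OFFSET 0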
İbrahim Ercan
09/28/2022, 12:33 PM
$.raw_info.foo
I've tried jq but couldn't succeed.
{
"timestamp": 1664192716,
"raw_info": "{\"foo\":\"bar\"}"
}
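If the goal is to pull foo out of that stringified raw_info at ingestion time, one approach is a flattenSpec field of type jq, since the JSONPath-style "path" type can't parse a string-encoded object. A minimal sketch, assuming the jq implementation Druid bundles (jackson-jq) supports fromjson, which is the step that turns the raw_info string back into an object; the output column name raw_info_foo is just a made-up example:
"flattenSpec": {
  "useFieldDiscovery": true,
  "fields": [
    {
      "type": "jq",
      "name": "raw_info_foo",
      "expr": ".raw_info | fromjson | .foo"
    }
  ]
}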
Jelena Zanko
09/29/2022, 3:12 PM
Ellen Shen
09/29/2022, 11:07 PM
David Palmer
09/30/2022, 12:38 AM
Mark Veidemanis
09/30/2022, 2:00 PM
Mark Veidemanis
09/30/2022, 2:01 PM
{
  "limit": 5,
  "queryType": "scan",
  "dataSource": "main",
  "intervals": [
    "1000-01-01/4000-01-01"
  ],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "search",
        "dimension": "msg",
        "query": {
          "type": "insensitive_contains",
          "value": "s"
        }
      },
      {
        "type": "in",
        "dimension": "src",
        "values": [
          "4ch"
        ]
      }
    ]
  },
  "order": "descending"
}
Yang Ou
10/01/2022, 10:35 PM
Yang Ou
10/01/2022, 10:37 PM
Yang Ou
10/02/2022, 3:45 AM
David Cromberge
10/03/2022, 2:31 PM
Mark Herrera
10/03/2022, 6:31 PM
We are creating a data repository where users will upload various CSV files. The inconvenience is that, over time, a huge number of data resources will be created that we will need to work with. Our idea is to give each user one data resource to which new files are added, i.e. the data resource is constantly replenished with new records and columns. We are also considering something like nested tables.
I would like to get advice from those who have implemented something like this.
I need a solution for how to store the data from the files uploaded by users. So far my idea is to create one datasource per user, in which each column is a JSON array of data from each new file. It goes something like this:
file_name1, file_name2, file_name3
ingestion_date json_array json_array
ingestion_date json_array json_array
ingestion_date json_array json_array json_array
But it is inconvenient in terms of selecting the data.
victor regalado
10/03/2022, 11:30 PM
Gian Merlino
10/03/2022, 11:44 PM
Vishal
10/04/2022, 11:43 AM
# Number of tasks per middleManager
druid.worker.capacity=4
# Task launch parameters
druid.indexer.runner.javaOpts=-server -XX:+UseG1GC -Xms1024M -Xmx1024M -XX:MaxDirectMemorySize=1g -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+ExitOnOutOfMemoryError -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
druid.indexer.task.baseTaskDir=var/druid/task
# HTTP server threads
druid.server.http.numThreads=60
# Processing threads and buffers on Peons
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=100000000
druid.indexer.fork.property.druid.processing.numThreads=2
druid.query.groupBy.maxMergingDictionarySize=2000000000
druid.query.groupBy.maxOnDiskStorage=10000000000
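A quick sizing check on the peon settings above, using the guidance from the Druid docs that each task's direct memory should cover (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) processing buffers:
# (2 + 2 + 1) * 100000000 bytes = 500000000 bytes ≈ 477 MiB per task,
# which fits within the -XX:MaxDirectMemorySize=1g set in druid.indexer.runner.javaOpts above.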
Mark Veidemanis
10/04/2022, 4:16 PM
# Java tuning
DRUID_XMX=1g
DRUID_XMS=1g
DRUID_MAXNEWSIZE=250m
DRUID_NEWSIZE=250m
DRUID_MAXDIRECTMEMORYSIZE=500m
druid_emitter_logging_logLevel=debug
druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-kafka-indexing-service"]
druid_zk_service_host=zookeeper
druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=hunter2
druid_coordinator_balancer_strategy=cachingCost
druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=128MiB
druid_processing_buffer_sizeBytes=268435456 # 256MiB
druid_storage_type=local
druid_storage_storageDirectory=/opt/shared/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs
druid_processing_numThreads=1
druid_processing_numMergeBuffers=1
DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>
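A sizing observation on the settings above, using the same rule of thumb from the Druid docs that direct memory should cover (druid.processing.numThreads + druid.processing.numMergeBuffers + 1) processing buffers; the figures below are just arithmetic on the values shown, not a verified diagnosis:
# Services using druid_processing_* above:
#   (1 + 1 + 1) * 268435456 bytes ≈ 768 MiB of direct memory needed,
#   which is more than DRUID_MAXDIRECTMEMORYSIZE=500m, so services picking up these
#   values may fail Druid's startup check for direct memory unless they override it.
# Peons (javaOptsArray and fork property above):
#   (1 + 1 + 1) * 128 MiB = 384 MiB, comfortably under the -XX:MaxDirectMemorySize=3g in javaOptsArray.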
Pedro Garcia
10/05/2022, 6:20 AM
Nikhil Agrawal
10/06/2022, 7:43 AM
JAY PATEL
10/06/2022, 5:18 PM
druid.metadata.storage.connector.password={ "type": "aws-rds-token", "user": "USER", "host": "HOST", "port": PORT, "region": "AWS_REGION" }
but I get an exception that the user can't connect.
But if I set the password directly, it works.
Can someone help me with how to use aws-rds-token, or is using the password directly fine?
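For what it's worth, the aws-rds-token password provider is supplied by Druid's AWS RDS community extension (druid-aws-rds-extensions, if I have the id right), so it has to be on the extensions load list, and the RDS database user has to be set up for IAM authentication on the AWS side; a "user can't connect" error is consistent with either of those missing. A sketch of the relevant properties, with placeholder values and mysql-metadata-storage shown purely as a stand-in for whichever metadata-storage extension is actually in use:
druid.extensions.loadList=["druid-aws-rds-extensions", "mysql-metadata-storage"]
druid.metadata.storage.connector.user=USER
druid.metadata.storage.connector.password={ "type": "aws-rds-token", "user": "USER", "host": "HOST", "port": PORT, "region": "AWS_REGION" }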
JAY PATEL
10/07/2022, 3:14 PM
Nikhil Agrawal
10/07/2022, 6:59 PM