Will Briggs
12/30/2020, 10:07 PM
eventTimestamp). I would like to maintain this when querying / filtering my records at the individual event level. However, I would also like to define an hourly derived timestamp to be used for pre-aggregating with a star tree index.
My segments config looks like this:
"segmentsConfig": {
  "timeColumnName": "eventTimestamp",
  "timeType": "MILLISECONDS",
  "retentionTimeUnit": "HOURS",
  "retentionTimeValue": "48",
  "segmentPushType": "APPEND",
  "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
  "schemaName": "mySchema",
  "replication": "1",
  "replicasPerPartition": "1"
},
My star tree index looks like this:
"starTreeIndexConfigs": [{
  "dimensionsSplitOrder": [
    "dimension1",
    "dimension2"
  ],
  "skipStarNodeCreationForDimensions": [],
  "functionColumnPairs": [
    "SUM__metric1",
    "SUM__metric2",
    "SUM__metric3",
    "DISTINCT_COUNT_HLL__dimension3",
    "DISTINCT_COUNT_HLL__dimension4"
  ],
  "maxLeafRecords": 10000
}],
And my dateTimeFieldSpecs:
"dateTimeFieldSpecs": [
  {
    "name": "eventTimestamp",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:HOUR",
    "dateTimeType": "PRIMARY"
  }
],
Can anyone confirm that this is the correct approach? Should I instead be using an ingestion transformation of toEpochHoursRounded, specifying that as a DERIVED dateTimeField in the dateTimeFieldSpecs configuration, and manually adding that to the dimensionsSplitOrder of my star tree index?
Xiang Fu
Will Briggs
12/30/2020, 10:17 PM
timeFieldSpec, which meant I had to go digging in the wiki and read the 0.4.0 release notes talking about deprecating timeFieldSpec before I realized I should be using dateTimeFieldSpecs instead. I might take a stab at updating the example + docs once I get this all straight in my head, to save other people the pain (as long as I'm on the right track here).
Xiang Fu
Will Briggs
12/30/2020, 10:22 PM
latest image for submitting admin commands as jobs 🙂
Xiang Fu
Will Briggs
12/30/2020, 10:23 PM
Will Briggs
12/30/2020, 10:25 PM
dateTimeType (e.g., PRIMARY, SECONDARY, or DERIVED) is no longer necessary?
Xiang Fu
Xiang Fu
ingestionConfig in table, e.g.
{
  "tableName": "githubEvents",
  "tableType": "OFFLINE",
  "segmentsConfig": {
    "segmentPushType": "APPEND",
    "segmentAssignmentStrategy": "BalanceNumSegmentAssignmentStrategy",
    "schemaName": "githubEvents",
    "replication": "1",
    "timeColumnName": "event_time",
    "timeType": "MILLISECONDS"
  },
  "tenants": {},
  "tableIndexConfig": {
    "starTreeIndexConfigs": [
      {
        "dimensionsSplitOrder": [
          "type",
          "repo_id"
        ],
        "skipStarNodeCreationForDimensions": [],
        "functionColumnPairs": [
          "SUM__pull_request_additions",
          "SUM__pull_request_deletions",
          "SUM__pull_request_changed_files",
          "COUNT__star",
          "DISTINCT_COUNT_HLL__actor_id"
        ],
        "maxLeafRecords": 1000
      }
    ],
    "enableDynamicStarTreeCreation": true,
    "loadMode": "MMAP",
    "invertedIndexColumns": [],
    "segmentPartitionConfig": {
      "columnPartitionMap": {
        "repo_id": {
          "functionName": "Murmur",
          "numPartitions": 1024
        }
      }
    },
    "noDictionaryColumns": []
  },
  "routing": {
    "segmentPrunerTypes": [
      "partition"
    ]
  },
  "metadata": {
    "customConfigs": {}
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY",
      "batchConfigMaps": [],
      "segmentNameSpec": {},
      "pushSpec": {}
    },
    "transformConfigs": [
      {
        "columnName": "event_time",
        "transformFunction": "fromDateTime(created_at, \"yyyy-MM-dd'T'HH:mm:ssZ\")"
      }
    ]
  }
}
Xiang Fu
yyyy-MM-dd format string column created_at in raw data to millis epoch value to event_time
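The same transformConfigs pattern could, in principle, produce the hourly derived column asked about at the start of the thread. The following is only a sketch under stated assumptions, not a configuration confirmed by anyone in this conversation. The column name eventHour is hypothetical. The outer "schema" and "tableConfig" keys are labels showing which file each fragment would live in; they are not Pinot fields. The sketch uses toEpochHours rather than the toEpochHoursRounded function mentioned in the question, so the derived column holds hours since epoch. The existing eventTimestamp spec would stay in the schema unchanged next to the new one.

```json
{
  "schema": {
    "dateTimeFieldSpecs": [
      {
        "name": "eventHour",
        "dataType": "LONG",
        "format": "1:HOURS:EPOCH",
        "granularity": "1:HOURS"
      }
    ]
  },
  "tableConfig": {
    "ingestionConfig": {
      "transformConfigs": [
        {
          "columnName": "eventHour",
          "transformFunction": "toEpochHours(eventTimestamp)"
        }
      ]
    },
    "tableIndexConfig": {
      "starTreeIndexConfigs": [
        {
          "dimensionsSplitOrder": [
            "eventHour",
            "dimension1",
            "dimension2"
          ],
          "skipStarNodeCreationForDimensions": [],
          "functionColumnPairs": [
            "SUM__metric1",
            "SUM__metric2",
            "SUM__metric3",
            "DISTINCT_COUNT_HLL__dimension3",
            "DISTINCT_COUNT_HLL__dimension4"
          ],
          "maxLeafRecords": 10000
        }
      ]
    }
  }
}
```

With a layout like this, eventTimestamp remains the table's time column for event-level filtering, while queries that filter or group only on eventHour and the other split-order dimensions are the ones that would be candidates for the star tree. Whether a given query actually uses it would still need to be checked on the cluster.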
Xiang Fu
Will Briggs
12/30/2020, 10:30 PM
EXPLAIN?
Xiang Fu
Will Briggs
12/30/2020, 10:36 PM
Will Briggs
12/30/2020, 10:37 PM
Xiang Fu
Will Briggs
12/30/2020, 10:38 PM
Xiang Fu
Will Briggs
12/30/2020, 10:39 PM
Jackie
12/30/2020, 11:09 PM
StarTreeFilterOperator
Jackie
12/30/2020, 11:11 PM
"dateTimeFieldSpecs": [
  {
    "name": "eventTimestamp",
    "dataType": "LONG",
    "format": "1:MILLISECONDS:EPOCH",
    "granularity": "1:HOUR"
  }
],
Jackie
12/30/2020, 11:12 PM