# troubleshooting
j
Are you saying you are ingesting with segment granularity of WEEK, and the segment start/end times don't match your Sun-Sat week boundary convention? I don't know of a way to change the start day-of-week for WEEK time periods ... but as far as ingestion and the way the Druid time chunks are built, that is a physical storage mechanism and should not affect your data and/or query results. If something else is going on, e.g. you think time values are changing in the data upon ingestion, then please provide details as that should not happen.
b
Yes, ingesting with segment granularity of WEEK and passing intervals as 2023-04-09/2023-04-15, with this timestampSpec:
"timestampSpec": {
  "column": "cal_wk_start_dt",
  "format": "iso"
}
so cal_wk_start_dt is being mapped to __time, but the values in __time are different from the source values coming in cal_wk_start_dt
I was using the __time column for filtering while querying; if the data is not as expected from the source, how are my filters going to work?
j
try running the ingestion without specifying intervals ... you may be forcing the __time value to fit within a Druid week boundary by doing that.
b
Tried that ... by default it takes Monday as the first day of the calendar week, but the source data has Sunday as the first day of the calendar week.
j
Can you share the granularitySpec you are using?
b
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "datasource_test",
      "timestampSpec": null,
      "dimensionsSpec": null,
      "metricsSpec": [],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "WEEK",
        "queryGranularity": "WEEK",
        "rollup": true,
        "intervals": [
          "#dataInterval#"
        ]
      },
      "transformSpec": {
        "filter": null,
        "transforms": []
      },
      "parser": {
        "type": "parquet",
        "parseSpec": {
          "format": "parquet",
          "columns": [
            "start_dt",
            "col1",
            "col12",
            "col3",
            "col4",
            "col5"
          ],
          "timestampSpec": {
            "column": "start_dt",
            "format": "iso"
          },
          "dimensionsSpec": {
            "dimensions": [
              {
                "type": "long",
                "name": "start_dt"
              },
              {
                "type": "string",
                "name": "col1"
              },
              {
                "type": "string",
                "name": "col2"
              },
              {
                "type": "string",
                "name": "col3"
              },
              {
                "type": "string",
                "name": "col4"
              },
              {
                "type": "string",
                "name": "col5"
              }
            ],
            "dimensionExclusions": []
          }
        }
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "granularity",
        "dataGranularity": "week",
        "filePattern": ".*",
        "inputFormat": "org.apache.druid.data.input.parquet.DruidParquetInputFormat",
        "pathFormat": "'wk_nbr='yyyyww/",
        "inputPath": "<gs://gcslocation/gcstable/>"
      },
      "metadataUpdateSpec": null,
      "segmentOutputPath": null
    },
    "tuningConfig": {
      "type": "hadoop",
      "workingPath": null,
      "partitionsSpec": {
        "type": "hashed",
        "numShards": 5,
        "partitionDimensions": [],
        "partitionFunction": "murmur3_32_abs",
        "maxRowsPerSegment": null
      },
      "shardSpecs": {},
      "indexSpec": {
        "bitmap": {
          "type": "concise"
        },
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
        "longEncoding": "longs",
        "segmentLoader": null
      },
      "indexSpecForIntermediatePersists": {
        "bitmap": {
          "type": "concise"
        },
        "dimensionCompression": "lz4",
        "metricCompression": "lz4",
        "longEncoding": "longs",
        "segmentLoader": null
      },
      "appendableIndexSpec": {
        "type": "onheap"
      },
      "maxRowsInMemory": 1000000,
      "maxBytesInMemory": 0,
      "leaveIntermediate": false,
      "cleanupOnFailure": true,
      "overwriteFiles": false,
      "ignoreInvalidRows": false,
      "jobProperties": {
        "mapreduce.job.classloader": "true",
        "mapreduce.job.user.classpath.first": "true",
        "mapreduce.input.fileinputformat.list-status.num-threads": "8",
        "mapreduce.map.memory.mb": "5461",
        "mapreduce.reduce.memory.mb": "5461",
        "mapreduce.map.output.compress": "true",
        "mapreduce.map.java.opts": "-Xmx4096m",
        "mapreduce.reduce.java.opts": "-Xmx4096m",
        "mapreduce.job.split.metainfo.maxsize": "-1",
        "mapreduce.task.io.sort.mb": "2047",
        "mapred.job.reuse.jvm.num.tasks": "20",
        "io.seqfile.sorter.recordlimit": "10000000",
        "mapred.output.compress": "true",
        "mapreduce.job.reduce.slowstart.completedmaps": "0.5",
        "mapreduce.reduce.shuffle.merge.percent": "0.8"
      },
      "combineText": false,
      "useCombiner": false,
      "buildV9Directly": true,
      "numBackgroundPersistThreads": 0,
      "forceExtendableShardSpecs": false,
      "useExplicitVersion": false,
      "allowedHadoopPrefix": [],
      "logParseExceptions": false,
      "maxParseExceptions": 0,
      "useYarnRMJobStatusFallback": true
    }
  },
  "hadoopDependencyCoordinates": null,
  "classpathPrefix": null,
  "context": {
    "forceTimeChunkLock": true,
    "useLineageBasedSegmentAllocation": true
  }
}
j
Okay ... your queryGranularity is set to WEEK ... that is rounding down the __time values to the week baseline (which is Druid's baseline, starting on Monday). If you need a date field to consistently be set to the start of your Sunday-based week, I'm not sure how to do that other than with a transform that uses some time functions to figure out the Sunday start-of-week date. You may be able to use the TIME_FLOOR() function with the 'origin' parameter for this ... or EXTRACT(DOW ...).
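For example, a minimal query-side sketch of the 'origin' approach (untested; the Sunday anchor 2023-04-09 00:00:00 and the datasource name are assumptions taken from earlier in this thread):
SELECT
  TIME_FLOOR(__time, 'P1W', TIMESTAMP '2023-04-09 00:00:00') AS sunday_week_start
FROM datasource_test
LIMIT 10
Because the origin falls on a Sunday, each 1-week bucket starts on a Sunday instead of Druid's default Monday.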
And just to confirm, you are doing rollups as well, so you want your data aggregated during ingestion based on the dimensions listed?
b
Actually not rolling up the data during ingestion.
Not able to figure out how to use the 'origin' parameter within the ingestion spec.
j
In your granularitySpec you have "rollup": true ... set that to 'false' if you are not rolling up your data during ingestion (note that the default is true, so simply removing the parameter would leave rollup on). Here is a simple expression to get the Sunday start-of-week date in a query:
select TIMESTAMPADD(DAY, -extract(dow from CURRENT_TIMESTAMP), CURRENT_TIMESTAMP)
For native ingestion the functions are timestamp_shift() and timestamp_extract() ... I can't get the ingestion expression to take in my demo dataset ... maybe you can get it working.
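An untested sketch of such an ingestion transform, for illustration only; it uses timestamp_floor() with an origin rather than timestamp_shift()/timestamp_extract(), and the Sunday anchor 2023-04-09 is an assumption:
"transformSpec": {
  "transforms": [
    {
      "type": "expression",
      "name": "__time",
      "expression": "timestamp_floor(__time, 'P1W', timestamp('2023-04-09'))"
    }
  ]
}
A transform named __time overwrites the row timestamp, so the stored value would land on the Sunday that starts each week; the origin date itself must be a Sunday for the buckets to line up.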
a
A quick side note: WEEK granularity can be tricky since it doesn't always align well with months or years. Consider using DAY or MONTH instead - see a recent change that advises against it: https://github.com/apache/druid/pull/14341/files?short_path=2b1d633#diff-2b1d6334204fbf5b1a3bbafb48a34b341caf65e368934447c516f15173226569
Also, if you can't move away from WEEK granularity, I think you could also do something like TIME_FLOOR(__time, 'P1W'), similar to John's suggestion above.
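For instance, a hypothetical query-time grouping (the datasource name and the Sunday origin are assumptions):
SELECT TIME_FLOOR(__time, 'P1W', TIMESTAMP '2023-04-09 00:00:00') AS wk_start, COUNT(*) AS row_count
FROM datasource_test
GROUP BY 1
This leaves the stored __time untouched and only re-buckets it per query.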
b
As per John's suggestion, the data in the __time column will still be incorrect, though it can be adjusted while querying. Sometimes that may create confusion for the end users.
j
Hi Basayya, my suggestion for timestamp_shift() and timestamp_extract() was to transform your __time value correctly to the Sunday start of the week during ingestion. Just don't use queryGranularity in the spec, because that also changes your __time value, and you don't want that (because of the Monday week base). Remember, the segment granularity can be anything you want ... that's a physical storage mechanism, it does not affect data values.
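Putting that together, a rough sketch of the adjusted granularitySpec (untested; values carried over from the spec shared above, with the Sunday alignment handled by the __time transform sketched earlier):
"granularitySpec": {
  "type": "uniform",
  "segmentGranularity": "WEEK",
  "queryGranularity": "NONE",
  "rollup": false
}
Omitting queryGranularity entirely has the same effect, since it defaults to NONE.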
b
Thanks John, let me try transforming __time value and test it