# general
s
same latitude? What does your datasource look like? what's the query? Also 0.17?! 😄 26.0 is pretty cool 😉
y
Hello, my data source comes from Hadoop. When I query by hour, it returns multiple records within the same hour. This happens when I sort by date, but when I sort by hour some of the results are normal. I don't understand how the underlying query is parsed and executed.
Yes, for various reasons we are unable to directly upgrade Druid to the new version, and I am not sure whether the new version would avoid this issue anyway. Attached is a result of my query.
s
That does seem like a bug... can you share the EXPLAIN for the query?
y
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "table",
    "name": "table_name"
  },
  "intervals": {
    "type": "intervals",
    "intervals": [
      "2023-06-01T00:00:00.000Z/2023-06-01T00:00:00.001Z"
    ]
  },
  "virtualColumns": [],
  "filter": {
    "type": "and",
    "fields": [
      {
        "type": "selector",
        "dimension": "action_day",
        "value": "2023-06-01",
        "extractionFn": null
      },
      {
        "type": "selector",
        "dimension": "adx_name",
        "value": "xxx",
        "extractionFn": null
      },
      {
        "type": "or",
        "fields": [
          {
            "type": "bound",
            "dimension": "impression",
            "lower": "0",
            "upper": null,
            "lowerStrict": true,
            "upperStrict": false,
            "extractionFn": null,
            "ordering": {
              "type": "numeric"
            }
          },
          {
            "type": "bound",
            "dimension": "bid",
            "lower": "0",
            "upper": null,
            "lowerStrict": true,
            "upperStrict": false,
            "extractionFn": null,
            "ordering": {
              "type": "numeric"
            }
          }
        ]
      }
    ]
  },
  "granularity": {
    "type": "all"
  },
  "dimensions": [
    {
      "type": "default",
      "dimension": "action_hour",
      "outputName": "d0",
      "outputType": "LONG"
    },
    {
      "type": "default",
      "dimension": "action_day",
      "outputName": "d1",
      "outputType": "STRING"
    }
  ],
  "aggregations": [
    {
      "type": "longSum",
      "name": "a0",
      "fieldName": "impression",
      "expression": null
    },
    {
      "type": "longSum",
      "name": "a1",
      "fieldName": "bid",
      "expression": null
    }
  ],
  "postAggregations": [
    {
      "type": "expression",
      "name": "p0",
      "expression": "(\"a0\" * 1)",
      "ordering": null
    },
    {
      "type": "expression",
      "name": "p1",
      "expression": "(\"a1\" * 1)",
      "ordering": null
    }
  ],
  "having": null,
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "d1",
        "direction": "ascending",
        "dimensionOrder": {
          "type": "lexicographic"
        }
      }
    ],
    "limit": 100
  },
  "context": {
    "sqlOuterLimit": 100,
    "sqlQueryId": "c48b6789-4ce7-4157-a2f2-f79e0a2bdfde"
  },
  "descending": false
}
I think it may be because the underlying segments were not merged? My current settings are as follows: `queryGranularity` is set to `hour` and `segmentGranularity` is set to `day`.
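For context, these two settings typically live in the `granularitySpec` section of the ingestion spec. A minimal sketch of what that section might look like (the `rollup` flag and exact layout are assumptions, not taken from this thread):

```json
{
  "granularitySpec": {
    "type": "uniform",
    "segmentGranularity": "DAY",
    "queryGranularity": "HOUR",
    "rollup": true
  }
}
```

With rollup enabled and `queryGranularity` of `HOUR`, rows sharing the same truncated hour and dimension values would normally be combined into one row at ingestion time, which is part of why duplicate per-hour results are surprising here.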
s
It shouldn't matter that they were not merged, the groupBy query should aggregate them into a single result per group by dimension values... I'll look for bug fixes that may already resolve this between 0.17 and 26.0 and let you know what I find. In the meantime, is this something you can consistently reproduce?
y
Sorry for not responding to the message in a timely manner. This error can be reproduced: it occurs when I query yesterday's data, but querying the day before yesterday was normal.
It's June 7th now, and when I query the data for June 6th I get many duplicate entries, like this
s
This bug says it affects 0.18 but it may also affect 0.17. Does it fit your query pattern? https://github.com/apache/druid/issues/9866
y
Thank you for your help. It seems different from mine: I don't have any joins or subqueries, just the query plan I replied with earlier. It looks like a regular `group by` query. Also, in Druid version 0.17, joins do not seem to be supported: https://druid.apache.org/docs/0.17.0/querying/joins.html
s
I’ve been searching for other issue reports that might fit… haven’t found anything specific. In the interest of figuring this out, can you try a few things separately on the date where you see the problem:
• Remove the CAST for action_hour
• Remove the limit
• Remove the order by
Also, can you share the segments view for the corresponding segments for the day that fails and for the day that doesn’t?
y
I have tried these methods separately. Removing the CAST works, while the others made no difference:
• Remove the limit
• Remove the order by
The segment distribution is like this. Regarding segments, I have noticed a phenomenon: all the other days have the same segment layout, but yesterday’s data always has five segments, and only yesterday’s data causes this problem. Does that seem like a coincidence? Meanwhile, how can I set the segments to merge, to verify that merging avoids this problem?
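On the merging question: one way to merge the segments for an interval is to submit a compaction task to the Overlord (`POST /druid/indexer/v1/task`). A minimal sketch, reusing the datasource name from the earlier query plan as a placeholder; the exact accepted fields vary by Druid version, so check the 0.17 docs before running this:

```json
{
  "type": "compact",
  "dataSource": "table_name",
  "interval": "2023-06-06/2023-06-07"
}
```

Newer versions can also enable auto-compaction on the Coordinator so this happens continuously, but for a one-off verification a manual task like the above should be enough.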