numerous-address-22061

05/25/2023, 5:23 PM
Hello, I am noticing buggy behavior with the browse paths of my ingested Kafka topics. Some are getting a nice, fully qualified browse path, and some are not. I am not explicitly defining the browse path in my ingestion. Here is an example:
Ingestion
pipeline_name: ${PIPELINE_NAME}
source:
  type: "kafka"
  config:
    platform_instance: ${CLUSTER_NAME}
    connection:
      bootstrap: ${BOOTSTRAP_BROKERS}
      consumer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "SCRAM-SHA-512"
        sasl.username: "${KAFKA_USERNAME}"
        sasl.password: "${KAFKA_PASSWORD}"
      schema_registry_url: ${SCHEMA_REGISTRY_URL}
sink:
  type: "datahub-rest"
  config:
    server: ${DATAHUB_GMS_ENDPOINT}
First topic (queried using GraphQL):
{
  "data": {
    "dataset": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name,PROD)",
      "platform": {
        "name": "kafka"
      },
      "browsePaths": [
        {
          "path": [
            "prod",
            "kafka",
            "platform-instance",
            "org",
            "db",
            "app"
          ]
        }
      ],
      "properties": {
        "name": "org.db.app.topic_name"
      }
    }
  }
}
Second topic (note: this is undesired, and I can't figure out why it is getting a different browse path than the topic above):
{
  "data": {
    "dataset": {
      "urn": "urn:li:dataset:(urn:li:dataPlatform:kafka,platform-instance.org.db.app.topic_name_2,PROD)",
      "platform": {
        "name": "kafka"
      },
      "browsePaths": [
        {
          "path": [
            "prod",
            "kafka",
            "platform-instance"
          ]
        }
      ],
      "properties": {
        "name": "org.db.app.topic_name_2"
      }
    }
  }
}
Why is the second browse path so short? It is very unfortunate for discovery in the UI.
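For anyone who wants to reproduce the two responses above, here is a minimal sketch of the request body. The field names (`urn`, `platform.name`, `browsePaths.path`, `properties.name`) are inferred from the JSON shown above; the operation name `getBrowsePaths` and the helper `build_payload` are hypothetical, and the GraphQL endpoint of your DataHub deployment is not shown in this thread, so the snippet only builds the payload and does not send it.

```python
import json

# Query shape inferred from the responses pasted above (an assumption,
# not copied from the DataHub GraphQL schema).
BROWSE_PATH_QUERY = """
query getBrowsePaths($urn: String!) {
  dataset(urn: $urn) {
    urn
    platform { name }
    browsePaths { path }
    properties { name }
  }
}
"""


def build_payload(urn: str) -> str:
    """Return the JSON body for a GraphQL POST asking for one dataset's browse paths."""
    return json.dumps({"query": BROWSE_PATH_QUERY, "variables": {"urn": urn}})


if __name__ == "__main__":
    urn = (
        "urn:li:dataset:(urn:li:dataPlatform:kafka,"
        "platform-instance.org.db.app.topic_name_2,PROD)"
    )
    print(build_payload(urn))
```

Running the same payload for both URNs makes it easy to diff the `browsePaths` arrays side by side.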

gray-shoe-75895

05/25/2023, 11:39 PM
When you don’t set a browse path manually, we autogenerate one for you. One hypothesis for what’s happening here: our browse path autogeneration system is running behind and hasn’t updated everything yet.

numerous-address-22061

05/25/2023, 11:45 PM
Hmm, that is interesting. I haven't seen anything change or catch up since I ran the ingestion.
@gray-shoe-75895 Is there any way to use only certain pieces of the DATASET_PARTS in this line?
- /ENV/PLATFORM/DATASET_PARTS
For example, for platform-instance.org.db.app.topic_name_2 I'd want the browse path to be
/ENV/PLATFORM/platform-instance/org/db/app
not
/ENV/PLATFORM/platform-instance/org/db/app/topic_name_2
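To make the desired mapping concrete, here is a minimal sketch of the path I am after, written as plain Python. It is not DataHub's autogeneration logic; the function name `browse_path_without_leaf` and its defaults are hypothetical. It assumes the dataset name is dot-separated and that neither the platform instance nor any other segment contains a dot of its own.

```python
def browse_path_without_leaf(
    dataset_name: str, env: str = "PROD", platform: str = "kafka"
) -> str:
    """Build /env/platform/<all name parts except the last> as a browse path.

    Assumption: every "." in dataset_name separates two folder levels, and the
    final segment is the topic name that should NOT appear in the path.
    """
    parts = dataset_name.split(".")
    folders = parts[:-1]  # drop the topic name itself
    return "/" + "/".join([env.lower(), platform, *folders])


if __name__ == "__main__":
    print(browse_path_without_leaf("platform-instance.org.db.app.topic_name_2"))
    # /prod/kafka/platform-instance/org/db/app
```

This yields the same folder structure as the first topic's browse path above, which is the behavior I would like for every topic.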