Hi All A contribution to extract metadata from Apache Pulsar DataHub #contribute-code

abundant-solstice-71438

04/22/2022, 9:13 AM

Hi all, I've opened a contribution to extract metadata from Apache Pulsar: https://github.com/datahub-project/datahub/pull/4721. As this is my first experience with both Python and contributing, please be gentle with me.
• I used the Kafka source as a starting point, so it supports state, domain, etc.
• I used the doc template @little-megabyte-1074 proposed.

One known issue is how the Pulsar topic naming convention interacts with the UI search: a topic name is, for example, "persistent://tenants/namespace/topicname", and the double slashes cause the search to fail. Any feedback at this point is highly appreciated.

mammoth-bear-12532

04/22/2022, 8:06 PM

Thanks for the contrib @abundant-solstice-71438!

mammoth-bear-12532

04/22/2022, 8:07 PM

Quick question on the topic naming: is "persistent://tenants/namespace/topicname" the "searchable" string by which the topic needs to be found?

mammoth-bear-12532

04/22/2022, 8:09 PM

Would it be sufficient to create a hash of this URI as the id of the dataset, use the topicname as the dataset name (in properties), and then use tenants, namespace, etc. as containers to hold references to the underlying datasets?
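
A minimal sketch of that idea, assuming plain Python and the standard library only. The helper name `sketch_dataset_fields`, the choice of SHA-256, and the shape of the returned dict are all hypothetical and are not DataHub APIs or anything taken from the PR:

```python
import hashlib


def sketch_dataset_fields(topic_uri: str) -> dict:
    """Hypothetical helper: split a Pulsar topic URI into id, name, containers."""
    # Drop the "persistent://" or "non-persistent://" prefix.
    _, _, path = topic_uri.partition("://")
    # Expect exactly tenant/namespace/topic after the prefix.
    tenant, namespace, topic = path.split("/", 2)
    return {
        # The hash contains no slashes, so it is safe to use as an identifier.
        # SHA-256 is an assumption here; any stable hash would do.
        "id": hashlib.sha256(topic_uri.encode("utf-8")).hexdigest(),
        # Human-readable name shown in (and searchable via) the UI.
        "name": topic,
        # Container hierarchy: the tenant, then the namespace within it.
        "containers": [tenant, f"{tenant}/{namespace}"],
    }


fields = sketch_dataset_fields("persistent://tenants/namespace/topicname")
print(fields["name"], fields["containers"])
```

The hash keeps the id free of "//" while the full URI could still be stored as a property if it is needed for lookups.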

abundant-solstice-71438

04/25/2022, 11:19 AM

The unique topic name within Pulsar is {persistent|non-persistent}://tenant/namespace/topic, where the first part indicates whether the messages are durably persisted on disk. I don't think the persistent or non-persistent prefix needs to be searchable; searching for tenant, namespace and topic name should be sufficient. I had a short chat with @big-carpet-38439 about this: https://datahubspace.slack.com/archives/C017W0NTZHR/p1648562230082739
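
To illustrate the naming format described above, here is a rough sketch of splitting a full topic name into its parts and keeping only the searchable ones. The names `PulsarTopic`, `parse_topic` and `search_terms` are made up for this example and are not taken from the PR or from any Pulsar client library:

```python
import re
from typing import List, NamedTuple


class PulsarTopic(NamedTuple):
    persistent: bool
    tenant: str
    namespace: str
    topic: str


# {persistent|non-persistent}://tenant/namespace/topic
_TOPIC_RE = re.compile(r"^(persistent|non-persistent)://([^/]+)/([^/]+)/(.+)$")


def parse_topic(full_name: str) -> PulsarTopic:
    """Split a fully qualified topic name into its four parts."""
    match = _TOPIC_RE.match(full_name)
    if match is None:
        raise ValueError(f"not a fully qualified Pulsar topic name: {full_name!r}")
    domain, tenant, namespace, topic = match.groups()
    return PulsarTopic(domain == "persistent", tenant, namespace, topic)


def search_terms(full_name: str) -> List[str]:
    """Return only the parts worth searching on, without the persistence prefix."""
    parsed = parse_topic(full_name)
    return [parsed.tenant, parsed.namespace, parsed.topic]


print(search_terms("persistent://tenants/namespace/topicname"))
```

None of the returned terms contains "//", so they avoid the characters that trip up the search.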
