# contribute-code
a
Hi all, here is a contribution to extract metadata from Apache Pulsar: https://github.com/datahub-project/datahub/pull/4721. As this is my first experience with Python and with contributing, please be gentle with me.
• Used the Kafka source as a starting point, so it supports state, domain, etc.
• Used the doc template @little-megabyte-1074 proposed.
One known issue is the Pulsar topic naming convention within the UI search. A topic name is, for example, "persistent://tenants/namespace/topicname", and the double slashes cause the search to fail. Any feedback at this point is highly appreciated.
m
Thanks for the contrib @abundant-solstice-71438!
Quick question on the topic naming: is `persistent://tenants/namespace/topicname` the "searchable" string by which the topic needs to be found? Would it be sufficient to create a hash of this URI as the `id` of the dataset, use the `topicname` as the dataset name (in properties), and then use `tenants`, `namespace`, etc. as containers to hold references to the underlying datasets?
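To make the suggestion concrete, here is a rough sketch of what I mean. It is not code from the PR or from DataHub: the function name `describe_topic`, the returned dict keys, and the choice of SHA-256 are all just assumptions for illustration.

```python
import hashlib


def describe_topic(uri: str) -> dict:
    """Split a Pulsar topic URI into a hashed id, a display name and containers.

    Hypothetical helper: the field names are illustrative, not DataHub's model.
    """
    scheme, separator, rest = uri.partition("://")
    if not separator:
        raise ValueError(f"Missing '://' in topic URI: {uri!r}")

    # Everything after the scheme is tenant/namespace/topic.
    pieces = rest.split("/", 2)
    if len(pieces) != 3 or not all(pieces):
        raise ValueError(f"Expected tenant/namespace/topic in: {uri!r}")
    tenant, namespace, topic = pieces

    return {
        # Hash of the full URI, so the id contains no slashes (SHA-256 is an assumption).
        "id": hashlib.sha256(uri.encode("utf-8")).hexdigest(),
        # Short, searchable dataset name.
        "name": topic,
        # Outermost container first.
        "containers": [tenant, namespace],
    }


if __name__ == "__main__":
    print(describe_topic("persistent://tenants/namespace/topicname"))
```

The id stays stable for a given URI, while search only ever sees the plain topic name and the container names.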
a
The unique topic naming within Pulsar is `{persistent|non-persistent}://tenant/namespace/topic`. The first part indicates whether the messages are durably persisted on disk. I don't think the persistent or non-persistent prefix needs to be searchable; searching for tenant, namespace, and topic name should be sufficient. Had a short chat with @big-carpet-38439 about this: https://datahubspace.slack.com/archives/C017W0NTZHR/p1648562230082739
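As a minimal sketch of that idea (not what the PR currently does), the full name could be parsed once and the persistence prefix dropped from the searchable part. The names `PulsarTopic`, `parse_topic`, and `search_name` are made up for this example, and the regex assumes tenant and namespace contain no slashes.

```python
import re
from typing import NamedTuple

# {persistent|non-persistent}://tenant/namespace/topic
TOPIC_PATTERN = re.compile(r"^(persistent|non-persistent)://([^/]+)/([^/]+)/(.+)$")


class PulsarTopic(NamedTuple):
    persistent: bool
    tenant: str
    namespace: str
    topic: str

    @property
    def search_name(self) -> str:
        # Searchable form without the scheme, so no double slashes remain.
        return f"{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(full_name: str) -> PulsarTopic:
    """Parse a fully qualified Pulsar topic name into its parts."""
    match = TOPIC_PATTERN.match(full_name)
    if match is None:
        raise ValueError(f"Not a fully qualified Pulsar topic name: {full_name!r}")
    scheme, tenant, namespace, topic = match.groups()
    return PulsarTopic(scheme == "persistent", tenant, namespace, topic)


if __name__ == "__main__":
    parsed = parse_topic("persistent://tenants/namespace/topicname")
    print(parsed.search_name, parsed.persistent)
```

The persistence flag is still kept on the parsed object, so it could be stored as a property even if it is not part of the search string.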