Hello all found inconsistent thing in `kafka-connec...
# ingestion
s
Hello all, I found an inconsistency in the `kafka-connect` ingestion lib. When processing a JDBC source connector set up for a Postgres source via the URL `jdbc:postgresql://host:port/db`, its datasets are ingested with `source_platform=postgresql` rather than `postgres` (as used by the PostgreSQL ingestor). This causes an entity mismatch. Is there a way to handle this?
AFAIK the driver name cannot be set to `jdbc:postgres` in the KC connector config.
b
Ah! Looks like we'd need some hardcoding in the source to map common JDBC platform names to standardized DataHub platforms. cc @helpful-optician-78938 @miniature-tiger-96062 @mammoth-bear-12532
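A minimal sketch of what such a mapping could look like, assuming the platform is derived from the subprotocol of the JDBC URL. The names `JDBC_TO_DATAHUB_PLATFORM` and `platform_from_jdbc_url` are hypothetical and not taken from the DataHub source; only the `postgresql` to `postgres` entry comes from this thread.

```python
from typing import Dict

# Hypothetical override table: JDBC subprotocol -> DataHub platform name.
# Only the postgresql -> postgres entry is taken from this thread.
JDBC_TO_DATAHUB_PLATFORM: Dict[str, str] = {
    "postgresql": "postgres",
}


def platform_from_jdbc_url(url: str) -> str:
    """Return a platform name for a JDBC URL like jdbc:postgresql://host:5432/db."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError(f"not a JDBC URL: {url!r}")
    # The subprotocol is the token between "jdbc:" and the next ":".
    subprotocol = url[len(prefix):].split(":", 1)[0].lower()
    # Fall back to the subprotocol itself when no override is known.
    return JDBC_TO_DATAHUB_PLATFORM.get(subprotocol, subprotocol)
```

With this sketch, a URL starting with `jdbc:postgresql:` resolves to `postgres`, while subprotocols without an override pass through unchanged.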
s
It seems, I believe, that postgres is the only one that differs :)
Additionally, I've observed that the source dataset schema is not ingested (in our case it is `public`).
Can this one be handled?
Here is a connector config
I suppose this should help
Do I need to create an issue, or will it be handled separately as a PR?
h
@nutritious-bird-77396 this is still an open issue. I will be fixing it in the near future. Let me know if you would like to contribute a fix.
n
@hundreds-photographer-13496 I am happy to contribute. In my recent analysis, this line gets the topic pattern based on the Debezium topic pattern, which is `{servername}.{schemaName}.{tableName}`. What I found in my case was that the CDC pipeline was omitting the schemaName from the topic name it created. Once that is added back to the topic name, I think this will not be an issue. So, to summarize: I don't think we need any fix here in DataHub; it's the topic naming convention in Debezium that needs to be fixed. Correct me if my understanding is wrong.
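To illustrate the point about the missing schema segment, here is a rough sketch that splits a topic name following the `{servername}.{schemaName}.{tableName}` pattern. `TopicParts` and `parse_debezium_topic` are hypothetical helpers, not DataHub code, and the sketch assumes none of the three names contains a dot.

```python
from typing import NamedTuple, Optional


class TopicParts(NamedTuple):
    server: str
    schema: Optional[str]  # None when the topic omits the schema segment
    table: str


def parse_debezium_topic(topic: str) -> TopicParts:
    """Split a topic shaped like {servername}.{schemaName}.{tableName}."""
    parts = topic.split(".")
    if len(parts) == 3:
        return TopicParts(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        # Schema segment is missing, as in the CDC pipeline described above.
        return TopicParts(parts[0], None, parts[1])
    raise ValueError(f"unexpected topic shape: {topic!r}")
```

A two-segment topic yields `schema=None`, so any dataset name built from it would lack the schema (for example `public`) and would not match the name produced by a database ingestor.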
s
When we try to map entities imported via the kafka-connect and postgres ingestors, we'll never get the same object in lineage. Moreover, there is no platform called `postgresql` in DataHub; `postgres` is used. So a driver-to-platform mapping should be provided, at least for kafka-connect and JDBC sources.
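The mismatch can be seen directly in the dataset URNs. The sketch below builds URNs in the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` shape; `dataset_urn` is a local, hypothetical helper written for illustration, and the dataset name `mydb.public.orders` is an assumed example.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN string for the given platform, name and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Same table, but the platform differs between the two ingestion paths.
from_kafka_connect = dataset_urn("postgresql", "mydb.public.orders")
from_postgres_source = dataset_urn("postgres", "mydb.public.orders")

# The URNs are different strings, so they refer to different entities
# and lineage edges cannot join them.
urns_match = from_kafka_connect == from_postgres_source
```

Here `urns_match` is `False`, which is why the two ingestors end up describing two separate entities for the same table.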
h
Hi @shy-parrot-64120 I have raised this PR for fixing kafka-connect lineage for postgres. https://github.com/linkedin/datahub/pull/4375 cc: @mammoth-bear-12532
s
superb