Hello, when ingesting from `mssql`, I'm getting some so...
# troubleshoot
d
Hello, when ingesting from `mssql`, I'm getting some internal functions ingested as containers. I don't remember this behaviour before version `0.8.28`. How can I prevent this?
g
That’s surprising. Can you share some more context here?
d
Look
g
Got it, let me take a look at the sql server code
interesting 🤔 there is no specific logic in our mssql code to extract containers
d
I only noticed this when I included platform_instance, but I don't know if there is any correlation
g
I see… ccing @dazzling-judge-80093 who added logic for ingesting containers
d
My recipe
recipe = {
        "source": {
            "type": "mssql",
            "config": {
                "username": user,
                "password": password,
                "database": database,
                "host_port": host_port,
                "use_odbc": "True",
                "uri_args": {
                    "driver": "ODBC Driver 17 for SQL Server",
                    "Encrypt": "yes",
                    "TrustServerCertificate": "Yes",
                    "ssl": "True",
                },
                "env": "PROD",
                "platform_instance": instance_name,
                "domain": {
                    "removed_": {
                        "allow": ['.*removed_.*']
                    },
                    "removed_": removed_,
                    "removed_": {
                        "allow": ['.*removed_.*']
                    }
                },
                "table_pattern": {
                    "deny": [".*MSchange_tracking_history"]
                },
                "profiling": {
                    "enabled": "false",
                    "include_field_sample_values": "false"
                }
            },
        },
        "transformers": [
            {
                "type": "simple_add_dataset_tags",
                "config": {
                    "tag_urns": [
                        f"urn:li:tag:{business_unit}",
                        f"urn:li:tag:{instance_name}",
                        f"urn:li:tag:{project_id}",
                        f"urn:li:tag:{database}",
                    ]
                }
            },
            {
                "type": "add_dataset_properties",
                "config": {
                    "add_properties_resolver_class": "datahub_custom_transformers.AddMssqlProperties",
                }
            }
        ],
        "sink": {
            "type": "datahub-kafka",
            "config": {
                "connection": {
                    "bootstrap": kafka_broker,
                    "schema_registry_url": schema_registry_url
                }
            },
        },
    }
I'll remove the platform_instance and try again
g
ok, let me know how that goes
d
But first I need to delete 60k datasets 😅
Same problem after removing all containers and datasets and ingesting again without platform_instance
Maybe the problem is in sql_common.py?
Example of aspect containerProperties
{"schema":"db_ddladmin","database":"one","instance":"PROD","platform":"mssql"}}
In this case, in the source database, the database is `one`, the schema is `dbo`, and `db_ddladmin` is a fixed database role in mssql
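A possible workaround, sketched under assumptions: SQL Server databases normally contain schemas named after the fixed database roles (`db_ddladmin`, `db_owner`, and so on), so a schema listing can return them next to `dbo`. If the `schema_pattern` option of the SQL sources is applied before containers are generated in this version (not confirmed in this thread), denying those names in the recipe might keep them out. The helper names below are hypothetical, and the matching only imitates a case-insensitive regex deny list.

```python
import re

# Fixed database roles in SQL Server. Each one also exists as a
# same-named schema in a database, which is why a schema listing
# can return "db_ddladmin" alongside "dbo".
FIXED_DATABASE_ROLES = [
    "db_owner",
    "db_securityadmin",
    "db_accessadmin",
    "db_backupoperator",
    "db_ddladmin",
    "db_datawriter",
    "db_datareader",
    "db_denydatawriter",
    "db_denydatareader",
]


def build_schema_deny_patterns(roles=FIXED_DATABASE_ROLES):
    """Return one anchored regex per role name (hypothetical helper)."""
    return [f"^{re.escape(role)}$" for role in roles]


def is_denied(schema_name, deny_patterns):
    """Imitate a deny list: True if any pattern matches the schema name."""
    return any(re.match(p, schema_name, re.IGNORECASE) for p in deny_patterns)


# Fragment to merge under recipe["source"]["config"]; assumes the
# source honours "schema_pattern" for containers as well as tables.
schema_pattern = {"deny": build_schema_deny_patterns()}
```

The patterns are anchored on the bare schema name. If the source matches against a database-qualified name instead, they would need to be loosened, so it is worth checking one ingestion run before deleting anything.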
d
Are you saying that DataHub captures the role name instead of the schema name for containers? That is odd, because we iterate over the schemas we get from sqlalchemy: https://github.com/linkedin/datahub/blob/fab9c23aa5fc9d27f20cd400280507261225c49d/[…]tadata-ingestion/src/datahub/ingestion/source/sql/sql_common.py
I need to check how the mssql sqlalchemy driver works or should work
Can you give an example of what a table name looks like in your case? I see that mssql supports multipart schema names, and maybe that is what causes this issue: https://docs.sqlalchemy.org/en/14/dialects/mssql.html#multipart-schema-names
it seems like the current schema is the object name
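To make the multipart-schema idea concrete, here is a simplified illustration of how a dotted schema string can be split into a "database" part and an "owner" part, following the examples on the linked SQLAlchemy page. It is not SQLAlchemy's actual implementation; the function name is hypothetical and the handling of anything beyond the documented examples is an assumption of this sketch.

```python
import re

# A token is either a [bracketed] name (dots inside are kept) or a
# run of characters containing no dots or brackets.
_TOKEN = re.compile(r"\[([^\]]*)\]|([^.\[\]]+)")


def split_multipart_schema(schema):
    """Split a schema string into (database, owner).

    Simplified sketch: the last token is treated as the owner and
    anything before it as the database.
    """
    tokens = [bracketed or plain for bracketed, plain in _TOKEN.findall(schema)]
    if not tokens:
        return (None, None)
    if len(tokens) == 1:
        return (None, tokens[0])
    return (".".join(tokens[:-1]), tokens[-1])


print(split_multipart_schema("dbo"))
print(split_multipart_schema("mydatabase.dbo"))
print(split_multipart_schema("[MyDataBase.dbo]"))
```

If schema names in the affected database contain dots, a split like this could explain a schema showing up under an unexpected name. The `db_ddladmin` example contains no dot, though, so this may not be the cause here.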
d
Hello @dazzling-judge-80093, do you want to know what a table name looks like in mssql or in datahub?
d
In mssql. Based on what you wrote, in datahub the schema name is an object id