Hello, when running the hive crawler is it normal ...
# ingestion
i
Hello, when running the hive crawler, is it normal to get the following warnings?
unable to map type DATE to metadata schema
unable to map type TIMESTAMP to metadata schema
unable to map type DECIMAL to metadata schema
Isn't the platformSchema (platform-specific schema) meant to be generated? Here is a sample of one of the crawled tables:
"com.linkedin.pegasus2avro.schema.SchemaMetadata": {
                        "schemaName": "dev.bi.active_time_by_crowdmember_01_days",
                        "platform": "urn:li:dataPlatform:hive",
                        "version": 0,
                        "created": {
                            "time": 1614685268000,
                            "actor": "urn:li:corpuser:etl",
                            "impersonator": null
                        },
                        "lastModified": {
                            "time": 1614685268000,
                            "actor": "urn:li:corpuser:etl",
                            "impersonator": null
                        },
                        "deleted": null,
                        "dataset": null,
                        "cluster": null,
                        "hash": "",
                        "platformSchema": {
                            "com.linkedin.pegasus2avro.schema.MySqlDDL": {
                                "tableSchema": ""
                            }
                        },
...}
Nothing appears in the field
m
@incalculable-ocean-74010 when we introduced the new ingestion scripts, we tried to be “bug compatible” with the previous scripts that existed. That’s probably why we didn’t fill out the platformSchema field.
i
So it is a bug that this field is not filled?
m
In my opinion, yes. The UI doesn’t break if you don’t fill it out, though.
You just see an empty JSON schema / raw schema tab.
Could you file an issue so we can take a look at fixing this?
With regard to your original question about type mapping, the script tries to canonicalize the platform schema to a common schema model. The warning tells us that it was unable to create an equivalent field (with the right type) in the canonical schema. This is the schema that you see when you view the dataset in the Schema tab in tabular format.
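To make the canonicalization step concrete, here is a minimal sketch of that kind of mapping. It is not the actual ingestion code: the table CANONICAL_TYPES, the function to_canonical_type, and the canonical type names are all hypothetical, and DATE, TIMESTAMP and DECIMAL are left out of the table on purpose to reproduce the warnings above. What the real script substitutes for an unmapped type is not covered here, so the sketch simply returns None.

```python
import logging
from typing import Optional

logger = logging.getLogger("type_mapping_sketch")

# Hypothetical lookup from native (platform) type names to canonical type names.
# DATE, TIMESTAMP and DECIMAL are deliberately missing to trigger the warning.
CANONICAL_TYPES = {
    "STRING": "string",
    "VARCHAR": "string",
    "INT": "number",
    "BIGINT": "number",
    "DOUBLE": "number",
    "BOOLEAN": "boolean",
}


def to_canonical_type(native_type: str) -> Optional[str]:
    """Return the canonical type name for a native type, or None if unmapped."""
    # Drop any length/precision arguments, e.g. "DECIMAL(10,2)" -> "DECIMAL".
    base = native_type.split("(", 1)[0].strip().upper()
    canonical = CANONICAL_TYPES.get(base)
    if canonical is None:
        # Assumption: the field is still emitted, only its canonical type is unknown.
        logger.warning("unable to map type %s to metadata schema", base)
    return canonical
```

In this sketch, to_canonical_type("varchar(255)") maps cleanly, while to_canonical_type("DECIMAL(10,2)") logs the warning and returns None.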
i
If it cannot map to the right type, will it be String?
I will file the issue related to native schemas in the meantime.