Did some rules got stricter yesterday? We have a b...
# ingestion
c
Did some rules got stricter yesterday? We have a bunch of our ingestion jobs failing now because of missing properties.
To explain a bit: I think a lot more empty methods are required now instead of None’s. (Like providing an explicit empty list [] or a “” instead of a None) This seems like an anti pattern to me because now I have to fake my optionals and am forced to provide everything even if I don’t need it? Additionally some constructors are now weird and don’t seem to have their hash method working correctly. So testing becomes more difficult.
This is mainly wrt. metadata-ingestion (and potentially the codegen part in it.)
m
Thanks for reporting @calm-sunset-28996, I think this came in with a recent PR from @gray-shoe-75895. We’ll take a look.
c
I fixed all our issues in the meantime, so it’s not urgent. But was just wondering if this is expected behaviour.
m
Yes in hindsight at least the constructor change was expected. We’ll take a look at all the things you reported though.
g
Yes @calm-sunset-28996, I made the codegen class constructors actually require the required fields. The optional fields should still be optional - can you give me an example of where you saw that issue?
I'll fix the hash issue, but also the
==
operator should now work between the generated classes
c
An example is
SchemaMetadata
There I needed to add the following.
Copy code
hash="",
        platformSchema=MySqlDDL(tableSchema=""),
    )
But I don’t get what’s the advantage of doing this, over just allowing these to be null. Because now. I need to provide ‘fake’ values for everything. Another example is:
Copy code
dataset_property_aspect = DatasetProperties(
            description=redshift_table.table_description,
            customProperties={},
            uri="",
            tags=[],
        )
This just feels a bit off to me. Because normally I try to use Optionals for everything, but now I need to add default values for everything (Like an empty list etc.).
This last Redshift example might not really make sense without the customProperties, but it’s more about the uri and tags. Or is this something which shouldn’t be supported anymore?
g
So it's partially an issue with our metadata model itself, and partially an issue with the codegen. For example, the hash and platformSchema probably shouldn't be required in our metadata model, but they are - so that needs to change
The tags piece is a fun edge case with python - this explains it best: https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments. I'm working on a way to preserve the "expected" behavior for list arguments in the codegen, but for now you'll have to specify it manually
btw -
uri
should already be optional
c
I played around a bit today with the platformSchema to get it to work, but haven’t done so sucessfully. I think it is the reason for the following bug. but I’m not 100% sure:
Copy code
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to execute search: entity type DATASET, query *, filters: [], start: 0, count: 20
Copy code
Caused by: com.linkedin.restli.client.RestLiResponseException: com.linkedin.restli.client.RestLiResponseException: Response status 500, serviceErrorMessage: com.linkedin.data.template.TemplateOutputCastException: Output test has type java.lang.String, but does not have a registered coercer and cannot be coerced to type java.net.URI
I have a feeling it is because it can’t map my types correctly anymore? I’m looking into it a bit further.
g
I haven't seen that one before. Can you provide the steps to reproduce it?