Hello, I was trying to extend the metadata model: ...
# advice-metadata-modeling
a
Hello, I was trying to extend the metadata model by adding a new entity. This is where I am getting stuck:
FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':metadata-models:generateAvroSchema'.
> Process 'command '/usr/lib/jvm/java-8-openjdk-amd64/bin/java'' finished with non-zero exit value 1

* Try:
Run with --info or --debug option to get more log output. Run with --scan to get full insights.
Can anyone kindly suggest why this is happening and how I can resolve it? Thanks.
n
This usually means that there is an error in your .pdl files, I believe. Try running:
./gradlew :metadata-models:build --debug
It might give you more information.
a
Thanks for the suggestion. I see that I have made an error:
15:08:43.856 [ERROR] [system.err] Exception in thread "main" java.lang.IllegalArgumentException: com.linkedin.testType.testTypeInfo has namespace that does not match file path '/home/ishtiaquemis1994/Desktop/datahub/metadata-models/src/main/pegasus/com/linkedin/rule/testTypeInfo.pdl'
I feel that I put the files in the right places. I am not exactly sure here what I am doing wrong. I took the DashboardInfo.pdl as my template.
I think I have solved the previous problem. There was a naming issue. But, now I am having a problem with this:
16:04:55.817 [ERROR] [system.err] "ruleInfo" or "com.linkedin.metadata.aspect.testTypeInfo" cannot be resolved.
16:04:55.817 [ERROR] [system.err] 18,2: Type not found: testTypeInfo
I think I have made a mistake with the aspect file. I named it testType.pdl. Is this a naming issue, or is there some other mistake that I have made?
o
Are you able to share the PDL file and the location where you put it? You need to match the namespace at the top of the PDL file to the file path where you put it. You also need to include imports for any custom types you use in the file. For example:
com/linkedin/dashboard/DashboardInfo.pdl
namespace com.linkedin.dashboard

import com.linkedin.common.Time
...
The error you're seeing suggests there is a mistake with the field name "ruleInfo" in your custom type.
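The namespace rule above can be checked before running Gradle. The following is a minimal sketch, not part of DataHub or Pegasus: the helper names are hypothetical, and it assumes the source root is the src/main/pegasus directory seen in the error path earlier in the thread.

```python
import re
from pathlib import Path
from typing import Optional

# Matches a `namespace a.b.c` declaration at the start of a line in a PDL file.
NAMESPACE_RE = re.compile(r"^\s*namespace\s+([A-Za-z_][\w.]*)", re.MULTILINE)


def expected_namespace(pdl_path: str, source_root: str) -> str:
    """Derive the namespace implied by the file's directory under the source root."""
    relative = Path(pdl_path).relative_to(source_root)
    return ".".join(relative.parent.parts)


def declared_namespace(pdl_text: str) -> Optional[str]:
    """Return the namespace declared in the PDL text, or None if there is none."""
    match = NAMESPACE_RE.search(pdl_text)
    return match.group(1) if match else None


def namespace_matches(pdl_path: str, source_root: str, pdl_text: str) -> bool:
    """True when the declared namespace equals the one implied by the path."""
    return declared_namespace(pdl_text) == expected_namespace(pdl_path, source_root)


if __name__ == "__main__":
    # Hypothetical values modeled on the error message in this thread.
    root = "metadata-models/src/main/pegasus"
    path = root + "/com/linkedin/rule/testTypeInfo.pdl"
    text = "namespace com.linkedin.testType\n\nrecord testTypeInfo {}\n"
    print(expected_namespace(path, root))       # namespace implied by the path
    print(declared_namespace(text))             # namespace written in the file
    print(namespace_matches(path, root, text))  # mismatch, as in the error
```

With the file under com/linkedin/rule but declaring com.linkedin.testType, the check fails, which mirrors the "has namespace that does not match file path" error. Either move the file or change the declaration so the two agree.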
a
Thanks. The build went through! But every time I run the build, it does a whole lot of other things too. Is there any way to build only the metadata type I defined, and not the other items?
o
As in, you don't want to generate any of the base types, just your custom type? Or do you just not want to build other modules? If you're okay with generating the base types and just want a faster build, you can run the specific Gradle task you need. If you don't want the base types to be processed either, there's no way to do that without removing them or customizing the Pegasus Gradle plugin, neither of which I would recommend. Some more specific Gradle tasks you can try:
./gradlew :metadata-models:build
./gradlew :metadata-models:generateDataTemplate
./gradlew :metadata-models:mainTranslateSchemas
or other tasks specific to the metadata-models module. If you open the project in IntelliJ, you can see a full list in the Gradle side panel. It just depends on what you actually want to execute.
a
Thank you so much. Unfortunately, I have another problem. While creating the snapshot pdl file, I see this statement in chartSnapshot.pdl:
import com.linkedin.common.ChartUrn
But when I go into the common directory, I don't see any file with the name chartUrn. The same goes for dashboard. Why is that? Where should I create, let's say, myMetadataUrn.pdl? And I don't have a template that I can follow.
To clarify: where are the URNs (i.e., chartUrn, dashboardUrn, etc.) defined?
o
Those files get set up in the li-utils module. Since that's a dependency of the metadata-models module, they get pulled in.
As a side note, we've made some updates to how entities get set up. We are moving away from the Snapshot models used in the legacy MetadataChangeEvent and MetadataAuditEvent classes in favor of the less static MetadataChangeProposal and MetadataChangeLog classes, and entities should use the generic base Urn rather than an entity-specific one. You can also set up an entity without writing code, using entity-registry.yml: https://datahubproject.io/docs/metadata-models-custom/ New aspects still have to be defined in PDL, however.
a
The link says "Currently, this project only supports adding new aspects to existing entities. You cannot add new entities to the metadata model yet." But I need a new entity. So, I think I need to stick with the Legacy approach, right? Will the legacy approach be discontinued?
Also, I cannot add a file in the li-utils directory. But ingestion using the API requires defining the snapshot, like:
curl 'http://localhost:8080/entities?action=ingest' -X POST --data '{
   "entity":{
      "value":{
         "com.linkedin.metadata.snapshot.ChartSnapshot":{
And, to create the snapshot, if I understood correctly, I will need to define my custom type URN. What can be a solution to this?
o
We need to clean up some of these docs, unfortunately 😞. You can add a new entity through entity-registry.yml without adding a snapshot; we have a few examples of this in the main entity-registry.yml file: https://github.com/linkedin/datahub/blob/master/metadata-models/src/main/resources/entity-registry.yml#L132 The legacy approach will be around at least for the immediate future, but in general we're not recommending that people take that approach for extending the model going forward.
We have a second set of endpoints, which are what we use internally, and we recommend not using the snapshot-based endpoints. I am actively working on getting OpenAPI-based documentation for the newer endpoints; right now they still use Rest.li, which makes traditional endpoint documentation difficult. See: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/cli/cli_utils.py#L406 which is used by the datahub put CLI.
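For illustration, a proposal-based request might look roughly like the sketch below. It is only a sketch under assumptions: the /aspects?action=ingestProposal path, the payload field names, and the X-RestLi-Protocol-Version header should all be verified against the linked cli_utils.py, and the entity type, URN, aspect name, and aspect fields are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Any, Dict


def build_proposal_payload(
    entity_type: str,
    entity_urn: str,
    aspect_name: str,
    aspect_value: Dict[str, Any],
) -> Dict[str, Any]:
    """Wrap one aspect in a MetadataChangeProposal-style request body.

    Assumption: the aspect is sent as a JSON-encoded string next to its
    content type; check the field names against cli_utils.py.
    """
    return {
        "proposal": {
            "entityType": entity_type,
            "entityUrn": entity_urn,
            "changeType": "UPSERT",
            "aspectName": aspect_name,
            "aspect": {
                "contentType": "application/json",
                "value": json.dumps(aspect_value),
            },
        }
    }


def post_proposal(gms_host: str, payload: Dict[str, Any]) -> int:
    """POST the payload to GMS and return the HTTP status code.

    Assumption: the endpoint path and the Rest.li protocol header below.
    """
    request = urllib.request.Request(
        url=gms_host + "/aspects?action=ingestProposal",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-RestLi-Protocol-Version": "2.0.0",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical custom entity and aspect; nothing is sent here.
    payload = build_proposal_payload(
        entity_type="testType",
        entity_urn="urn:li:testType:example",
        aspect_name="testTypeInfo",
        aspect_value={"name": "example"},
    )
    print(json.dumps(payload, indent=2))
    # To actually send it: post_proposal("http://localhost:8080", payload)
```

Unlike the snapshot-based call, nothing in this body names a Snapshot class or an entity-specific URN type; the entity and aspect are identified by plain strings that would need to match what is registered in entity-registry.yml.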
a
Thanks for the suggestions. These have been pretty helpful. I have another request: can you kindly provide an example of Assertion ingestion using the API? Maybe I will be able to repurpose Assertion and fit it to my needs.