• g

    gray-nest-42961

    3 weeks ago
    Hey team (sorry if this is not the right channel to ask 😃). One of the user stories in our company calls for a filter by ownership in Lineage, to improve the visibility of downstream customers. I am wondering if that's something that could be added to DataHub. 👀 cc @bitter-lizard-32293 @numerous-byte-87938 @brainy-megabyte-90473
    2 replies
  • p

    polite-actor-701

    2 weeks ago
    Hello. I have a question about adding an entity to the metadata model. I’m using DataHub v0.8.32. I added a new entity (DatabaseQuery) to DataHub. The DatabaseQuery entity has a ‘DatabaseQueryKey’ aspect and a ‘DatabaseQueryProperties’ aspect, and DatabaseQueryKey is composed of ‘databaseQueryId’. I edited or added the following files:
    • metadata-models/src/main/pegasus/com/linkedin/metadata/key/DatabaseQueryKey.pdl
    • metadata-models/src/main/pegasus/com/linkedin/databaseQuery/DatabaseQueryProperties.pdl
    • metadata-models/src/main/resources/entity-registry.yml
    • datahub-graphql-core/src/main/resources/entity.graphql
    • li-utils/src/main/javaPegasus/com/linkedin/common/urn/DatabaseQueryUrn.java
    • metadata-utils/src/main/java/com/linkedin/metadata/Constants.java
    • datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/databasequery/mappers/DatabaseQueryPropertiesMapper.java
    • datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/databasequery/mappers/DatabaseQueryMapper.java
    • datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/databasequery/DatabaseQueryType.java
    • datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/GmsGraphQLEngine.java
    I added metadata for a DatabaseQuery to DataHub and ran it, and there were no errors. However, when I query GraphQL on the page localhost:9002/api/graphiql, an error occurs.
    ------query 1
    {
      databaseQuery(urn:"urn:li:databaseQuery:test1") {
        databaseQueryId
      }
    }
    
    ------result 1
    {
      "errors": [
        {
          "message": "The field at path '/databaseQuery/databaseQueryId' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'String' within parent type 'DatabaseQuery'",
          "path": [
            "databaseQuery",
            "databaseQueryId"
          ],
          "extensions": {
            "classification": "NullValueInNonNullableField"
          }
        }
      ],
      "data": {
        "databaseQuery": null
      }
    }
    
    ------query 2
    query {
      search(input: {type: DATABASE_QUERY, query: "test1"}) {
        total
        searchResults {
          entity {
            urn
          }
        }
      }
    }
    
    ------result 2
    {
      "errors": [
        {
          "message": "The field at path '/search/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value.  The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'",
          "path": [
            "search",
            "searchResults",
            0,
            "entity"
          ],
          "extensions": {
            "classification": "NullValueInNonNullableField"
          }
        }
      ],
      "data": {
        "search": null
      }
    }
    But when I query like this, it works fine.
    ------query 3
    {
      databaseQuery(urn:"urn:li:databaseQuery:test1") {
        databaseQueryProperties {
          name
        }
      }
    }
    
    ------result 3
    {
      "data": {
        "databaseQuery": {
          "databaseQueryProperties": {
            "name": "TEST1"
          }
        }
      }
    }
    
    ------query 4
    query {
      search(input: {type: DATABASE_QUERY, query: "test1"}) {
        total
        searchResults {
          matchedFields {
            name
            value
          }
        }
      }
    }
    
    ------result 4
    {
      "data": {
        "search": {
          "total": 1,
          "searchResults": [
            {
              "matchedFields": [
                {
                  "name": "databaseQueryId",
                  "value": "test1"
                }
              ]
            }
          ]
        }
      }
    }
    The code below is part of what I modified.
    DatabaseQueryKey.pdl
    namespace com.linkedin.metadata.key
    
    /**
     * Key for Database Query
     */
    @Aspect = {
      "name": "databaseQueryKey"
    }
    record DatabaseQueryKey {
      /**
       * Database Query ID
       */
      @Searchable = {
        "boostScore": 10.0,
        "enableAutocomplete": true,
        "fieldName": "id",
        "fieldType": "TEXT_PARTIAL"
      }
      databaseQueryId: string
    }
    entity-registry.yml
    - name: databaseQuery
      doc: Database Query represents a query and related information for requesting data from a data source. ex) Sql, Sparql, Graphql etc.
      keyAspect: databaseQueryKey
      aspects:
        - databaseQueryProperties
    entity.graphql
    type DatabaseQuery implements EntityWithRelationships & Entity {
        """
        A primary key of the Database Query
        """
        urn: String!
    
        """
        A standard Entity Type
        """
        type: EntityType!
    
        """
        Unique guid for Database Query
        """
        databaseQueryId: String!
    
        """
        An additional set of read only properties
        """
        databaseQueryProperties: DatabaseQueryProperties
    }
    
    enum EntityType {
    	"""
    	The Database Query Entity
    	"""
    	DATABASE_QUERY
    	
    	etc…
    }
    DatabaseQueryUrn.java
    package com.linkedin.common.urn;
    
    import com.linkedin.data.template.Custom;
    import com.linkedin.data.template.DirectCoercer;
    import com.linkedin.data.template.TemplateOutputCastException;
    import java.net.URISyntaxException;
    
    public final class DatabaseQueryUrn extends Urn {
        public static final String ENTITY_TYPE = "databaseQuery";
    
        private final String _databaseQueryId;
    
        public DatabaseQueryUrn(String databaseQueryId) {
            super(ENTITY_TYPE, TupleKey.createWithOneKeyPart(databaseQueryId));
            this._databaseQueryId = databaseQueryId;
        }
    
        private DatabaseQueryUrn(TupleKey entityKey, String databaseQueryId) {
            super("li", "databaseQuery", entityKey);
            this._databaseQueryId = databaseQueryId;
        }
    
        public String getDatabaseQueryIdEntity() {
            return _databaseQueryId;
        }
    
        public static DatabaseQueryUrn createFromString(String rawUrn) throws URISyntaxException {
            return createFromUrn(Urn.createFromString(rawUrn));
        }
    
        private static DatabaseQueryUrn decodeUrn(String databaseQueryId) throws Exception {
            return new DatabaseQueryUrn(TupleKey.create(new Object[]{databaseQueryId}), databaseQueryId);
        }
    
        public static DatabaseQueryUrn createFromUrn(Urn urn) throws URISyntaxException {
            if (!"li".equals(urn.getNamespace())) {
                throw new URISyntaxException(urn.toString(), "Urn namespace type should be 'li'.");
            } else if (!ENTITY_TYPE.equals(urn.getEntityType())) {
                throw new URISyntaxException(urn.toString(), "Urn entity type should be 'databaseQuery'.");
            } else {
                TupleKey key = urn.getEntityKey();
                if (key.size() != 1) {
                    throw new URISyntaxException(urn.toString(), "Invalid number of keys.");
                } else {
                    try {
                        return decodeUrn((String)key.getAs(0, String.class));
                        // return new DatabaseQueryUrn((String)key.getAs(0, String.class));
                    } catch (Exception e) {
                        throw new URISyntaxException(urn.toString(), "Invalid URN parameter: '" + e.getMessage() + "'");
                    }
                }
            }
        }
    
        public static DatabaseQueryUrn deserialize(String rawUrn) throws URISyntaxException {
            return createFromString(rawUrn);
        }
    
        static {
            Custom.registerCoercer(new DirectCoercer<DatabaseQueryUrn>() {
                public Object coerceInput(DatabaseQueryUrn object) throws ClassCastException {
                    return object.toString();
                }
    
                public DatabaseQueryUrn coerceOutput(Object object) throws TemplateOutputCastException {
                    try {
                        return DatabaseQueryUrn.createFromString((String)object);
                    } catch (URISyntaxException e) {
                        throw new TemplateOutputCastException("Invalid URN syntax: " + e.getMessage(), e);
                    }
                }
            }, DatabaseQueryUrn.class);
        }
    }
    DatabaseQueryMapper.java
    public class DatabaseQueryMapper implements ModelMapper<EntityResponse, DatabaseQuery> {
        public static final DatabaseQueryMapper INSTANCE = new DatabaseQueryMapper();
    
        public static DatabaseQuery map(@Nonnull final EntityResponse entityResponse) {
            return INSTANCE.apply(entityResponse);
        }
    
        @Override
        public DatabaseQuery apply(EntityResponse entityResponse) {
            final DatabaseQuery result = new DatabaseQuery();
            result.setUrn(entityResponse.getUrn().toString());
            result.setType(EntityType.DATABASE_QUERY);
    
            EnvelopedAspectMap aspectMap = entityResponse.getAspects();
            MappingHelper<DatabaseQuery> mappingHelper = new MappingHelper<>(aspectMap, result);
            mappingHelper.mapToResult(DATABASE_QUERY_KEY_ASPECT_NAME, this::mapDatabaseQueryKey);
            mappingHelper.mapToResult(DATABASE_QUERY_PROPERTIES_ASPECT_NAME, (databaseQuery, dataMap) ->
                    databaseQuery.setDatabaseQueryProperties(DatabaseQueryPropertiesMapper.map(new DatabaseQueryProperties(dataMap))));
    
            return mappingHelper.getResult();
        }
    
        private void mapDatabaseQueryKey(@Nonnull DatabaseQuery databaseQuery, @Nonnull DataMap dataMap) {
            final DatabaseQueryKey databaseQueryKey = new DatabaseQueryKey(dataMap);
            databaseQueryKey.setDatabaseQueryId(databaseQueryKey.getDatabaseQueryId());
        }
    }
    DatabaseQueryType.java
    public class DatabaseQueryType implements SearchableEntityType<DatabaseQuery>,
                                              BrowsableEntityType<DatabaseQuery> {
        static final Set<String> ASPECTS_TO_RESOLVE = ImmutableSet.of(
                DATABASE_QUERY_KEY_ASPECT_NAME,
                DATABASE_QUERY_PROPERTIES_ASPECT_NAME
        );
        private static final Set<String> FACET_FIELDS = ImmutableSet.of("access");
        private final EntityClient _entityClient;
        public DatabaseQueryType(final EntityClient entityClient) {
            _entityClient = entityClient;
        }
    
        @Override
        public EntityType type() {
            return EntityType.DATABASE_QUERY;
        }
    
        @Override
        public Class<DatabaseQuery> objectClass() {
            return DatabaseQuery.class;
        }
    
        @Override
        public List<DataFetcherResult<DatabaseQuery>> batchLoad(@Nonnull List<String> urnStrs, @Nonnull QueryContext context) throws Exception {
            final List<Urn> urns = urnStrs.stream().map(UrnUtils::getUrn).collect(Collectors.toList());
            try {
                final Map<Urn, EntityResponse> databaseQueryMap = _entityClient.batchGetV2(
                        Constants.DATABASE_QUERY_ENTITY_NAME,
                        new HashSet<>(urns),
                        ASPECTS_TO_RESOLVE,
                        context.getAuthentication()
                );
                final List<EntityResponse> results = new ArrayList<>();
    
                for (Urn urn : urns) {
                    results.add(databaseQueryMap.getOrDefault(urn, null));
                }
    
                return results.stream()
                        .map(dbQuery -> dbQuery == null ? null : DataFetcherResult.<DatabaseQuery>newResult()
                        .data(DatabaseQueryMapper.map(dbQuery))
                        .build())
                        .collect(Collectors.toList());
            } catch (Exception e) {
                throw new RuntimeException("Failed to batch load DatabaseQuery", e);
            }
        }
    
        ...skip code
    
        @Override
        public List<BrowsePath> browsePaths(@Nonnull String urn, @Nonnull QueryContext context) throws Exception {
            final StringArray result = _entityClient.getBrowsePaths(getDatabaseQueryUrn(urn), context.getAuthentication());
            return BrowsePathsMapper.map(result);
        }
    
        private com.linkedin.common.urn.DatabaseQueryUrn getDatabaseQueryUrn(String urnStr) {
            try {
                return DatabaseQueryUrn.createFromString(urnStr);
            } catch (URISyntaxException e) {
                throw new RuntimeException(String.format("Failed to retrieve databaseQuery %s, invalid urn", urnStr));
            }
        }
    }
    What am I missing?? Please help me..
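
One thing that stands out in the DatabaseQueryMapper posted above: `mapDatabaseQueryKey` calls the setter on the key object itself (`databaseQueryKey.setDatabaseQueryId(...)`) and never touches the `DatabaseQuery` result. That would leave `databaseQueryId` null on the GraphQL object, which matches the non-null error in result 1, while `databaseQueryProperties` (set on the result) resolves fine in result 3. Below is a minimal, self-contained sketch of the difference. The `*Sketch` classes are hypothetical stand-ins for the generated `DatabaseQueryKey` and `DatabaseQuery` classes, not DataHub code, and a plain `Map` stands in for `DataMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the generated DatabaseQueryKey record class.
final class DatabaseQueryKeySketch {
    private final Map<String, Object> data;

    DatabaseQueryKeySketch(Map<String, Object> data) {
        this.data = data;
    }

    String getDatabaseQueryId() {
        return (String) data.get("databaseQueryId");
    }

    void setDatabaseQueryId(String id) {
        data.put("databaseQueryId", id);
    }
}

// Hypothetical stand-in for the generated GraphQL DatabaseQuery type.
final class DatabaseQuerySketch {
    private String databaseQueryId; // declared as String! in entity.graphql

    String getDatabaseQueryId() {
        return databaseQueryId;
    }

    void setDatabaseQueryId(String id) {
        this.databaseQueryId = id;
    }
}

final class MapperSketch {
    // As posted: the key writes its own value back to itself; the result stays untouched.
    static void mapKeyAsPosted(DatabaseQuerySketch result, Map<String, Object> dataMap) {
        DatabaseQueryKeySketch key = new DatabaseQueryKeySketch(dataMap);
        key.setDatabaseQueryId(key.getDatabaseQueryId());
    }

    // Suggested change: copy the id from the key aspect onto the result object.
    static void mapKeyFixed(DatabaseQuerySketch result, Map<String, Object> dataMap) {
        DatabaseQueryKeySketch key = new DatabaseQueryKeySketch(dataMap);
        result.setDatabaseQueryId(key.getDatabaseQueryId());
    }

    public static void main(String[] args) {
        Map<String, Object> dataMap = new HashMap<>();
        dataMap.put("databaseQueryId", "test1");

        DatabaseQuerySketch asPosted = new DatabaseQuerySketch();
        mapKeyAsPosted(asPosted, dataMap);
        System.out.println("as posted: " + asPosted.getDatabaseQueryId()); // as posted: null

        DatabaseQuerySketch fixed = new DatabaseQuerySketch();
        mapKeyFixed(fixed, dataMap);
        System.out.println("fixed: " + fixed.getDatabaseQueryId()); // fixed: test1
    }
}
```

In the real mapper the equivalent change would be calling `databaseQuery.setDatabaseQueryId(databaseQueryKey.getDatabaseQueryId())`. Query 2 fails on a different field (`entity` in the search result), so it may have a separate cause that this change does not address.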
  • k

    kind-dawn-17532

    2 weeks ago
    Hi all! I am working on a custom model in DataHub. Currently, the fields I define in the record of a custom aspect appear in random order in the DataHub UI. Is there a way to force a specific order? Also, is there a way to display only a subset of the aspect's fields in the UI?
    2 replies
  • d

    dry-zoo-35797

    3 months ago
    Hello, I would like to associate a domain with a dataset using the Python SDK. Domain is an aspect of a dataset, but I don’t see any domain-related aspect classes in DatasetAspect.pdl (https://github.com/datahub-project/datahub/blob/master/metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/DatasetAspect.pdl). I don’t want to use the GraphQL endpoint to do so. Any suggestions would be appreciated. Thanks, Mahbub
    3 replies
  • g

    great-toddler-2251

    1 week ago
    Currently a schema is just an aspect of a dataset. We’d like to have a schema be a top level standalone entity, independent of datasets, effectively having DataHub as a schema registry along with all the other great capabilities. And if a dataset did get created based on the same schema, we’d want to reference it. Is there a way to achieve this today? Is this a feature request I should submit? As a short term workaround we could just expose Confluent Schema Registry to those users who need the standalone schema support, but really we’d like to see this integrated into DataHub so we can get all the other nice stuff like lineage, glossary, tags, owners, etc.
    4 replies
  • f

    future-smartphone-53257

    3 days ago
    Is DataHub's metadata model (https://datahubproject.io/docs/metadata-modeling/metadata-model/#exploring-datahubs-metadata-model) available as "code" somewhere? For example, it would be great if it were available as RDFS or OWL (as FIBO is), because then I could query it, but really even just the Graphviz dot input for the diagram would be awesome.
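
This is not an official export, only a rough sketch of how Graphviz dot text could be produced if the entity-to-aspect mapping (for example, the entries in `entity-registry.yml` mentioned earlier in this channel) were already loaded into memory. `EntitySpecSketch` and `RegistryDotSketch` are hypothetical names, the sample entry is the `databaseQuery` one posted above, and relationships between entities are not covered.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical holder for one entity-registry.yml entry (not a DataHub class).
final class EntitySpecSketch {
    final String name;
    final String keyAspect;
    final List<String> aspects;

    EntitySpecSketch(String name, String keyAspect, List<String> aspects) {
        this.name = name;
        this.keyAspect = keyAspect;
        this.aspects = aspects;
    }
}

final class RegistryDotSketch {
    // Emits one box node per entity and one labeled edge per key aspect / aspect.
    static String toDot(List<EntitySpecSketch> entities) {
        StringBuilder dot = new StringBuilder("digraph metadata_model {\n");
        for (EntitySpecSketch entity : entities) {
            dot.append("  \"").append(entity.name).append("\" [shape=box];\n");
            appendEdge(dot, entity.name, entity.keyAspect, "key");
            for (String aspect : entity.aspects) {
                appendEdge(dot, entity.name, aspect, "aspect");
            }
        }
        return dot.append("}\n").toString();
    }

    private static void appendEdge(StringBuilder dot, String from, String to, String label) {
        dot.append("  \"").append(from).append("\" -> \"").append(to)
           .append("\" [label=\"").append(label).append("\"];\n");
    }

    public static void main(String[] args) {
        List<EntitySpecSketch> registry = Collections.singletonList(
            new EntitySpecSketch("databaseQuery", "databaseQueryKey",
                Collections.singletonList("databaseQueryProperties")));
        System.out.print(toDot(registry));
    }
}
```

The printed text can be fed to the `dot` command-line tool to render a diagram.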
    4 replies
  • f

    future-smartphone-53257

    3 days ago
    I'm trying to mock up a data product within DataHub and would like suggestions or feedback on the various options. Some thoughts about how to do this:
    • Modify the Data Model: Not an option for managed DataHub, so I'm not going to do this.
    • Glossary Entry as Data Product Type + DataProcess as Data Product Instance: Create a Business Glossary Entry for the Data Product type, create a DataProcess, and somehow indicate that it is an instance of the type denoted by the Glossary Entry. A possible variation would be to use DataPipeline or even DataSet in place of DataProcess.
    • Glossary Entry as Data Product Type + Glossary Entry as Data Product Instance: Create a Business Glossary Entry for the Data Product type, create a Business Glossary Entry for the Data Product instance, and somehow indicate that it is an instance of the type denoted by the first Glossary Entry.
    • ... other options?
    1 reply
  • s

    swift-nail-32514

    1 week ago
    Has anyone developed a way to catalog joins in DataHub? In our enterprise environment, we have specific joins that have been created by teams and they want to catalog these “approved” join queries in the catalog solution. These are not DBMS technologies that can use any kind of query analyzer technology, and although not an analyst myself, my understanding is that for certain database technologies, such as Teradata, having a catalog of optimized joins is important to ensure partitions are being utilized correctly and resources are being used efficiently. Has anyone modeled something like this out using either custom or OOTB features in DataHub?
    11 replies