# ingestion
  • c

    colossal-furniture-76714

    05/25/2021, 10:41 AM
    Hello, I want to ingest data from Hive into DataHub, but I keep getting this error:
    http.client.BadStatusLine: Invalid status 80
    I'm not sure where to look, since I don't know what causes the error. Do you have any hints? I've configured a yml for acryl-datahub...
    g
    • 2
    • 7
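    Not from the thread, but for comparison: a minimal sketch of a Hive recipe run programmatically with the ingestion framework. The host names and GMS address are placeholders; one thing worth double-checking with this kind of error is that the datahub-rest sink "server" value is a full URL including the scheme and port.

    # Hypothetical recipe, expressed as a Python dict and run via the Pipeline API.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(
        {
            "source": {
                "type": "hive",
                "config": {
                    "host_port": "hive-server:10000",  # placeholder Hive host
                    "database": "default",
                },
            },
            "sink": {
                "type": "datahub-rest",
                # Full GMS URL with scheme and port, not just a host name.
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()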
  • b

    brief-toothbrush-55766

    05/25/2021, 11:14 AM
    Hi all, if I ingest a PostGIS (Postgres) source, is it currently possible to also extract spatial metadata, e.g. the CRS or even the BBOX? If not, and I want to implement this via a transformer, what's the best way to approach that?
    g
    • 2
    • 3
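    One possible direction for the PostGIS question above (a sketch, not an official answer): spatial attributes such as the CRS or bounding box could be attached as customProperties on the dataset, which is roughly what a custom transformer would end up doing. The helper name and the crs/bbox values are made up.

    from datahub.metadata.schema_classes import DatasetPropertiesClass

    def add_spatial_properties(dataset_mce, crs: str, bbox: str):
        """Sketch: attach spatial metadata to a dataset MCE as custom properties."""
        props = DatasetPropertiesClass(customProperties={"crs": crs, "bbox": bbox})
        dataset_mce.proposedSnapshot.aspects.append(props)
        return dataset_mce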
  • m

    microscopic-book-98466

    05/25/2021, 5:18 PM
    Hi team, the build is failing for the PR https://github.com/linkedin/datahub/pull/2599/checks?check_run_id=2667447042. I checked the logs and it is failing on the Looker Python tests. @gray-shoe-75895, can you help here?
    h
    m
    g
    • 4
    • 3
  • g

    glamorous-kite-95510

    05/26/2021, 6:58 AM
    Hi, I have some questions: 1. Does the user have to manually specify upstream and downstream datasets and then ingest them into DataHub, or does DataHub have a mechanism to identify what is upstream and downstream? If I select all records from table A in order to insert into table B, can DataHub work out that table A is the upstream of table B, or do I have to write something like a script that parses the query to identify all the upstream/downstream relationships and then ingest them into DataHub? 2. How can I use Spark or Airflow to push metadata into DataHub? Can you give me an example, please? 3. How does the Metadata Audit Event (MAE) mechanism work? 4. How can I set an entity's Status to removed (true) in order to get rid of an undesirable entity?
    g
    • 2
    • 1
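    Regarding question 2 above: as far as I know, lineage is pushed to DataHub explicitly rather than inferred from queries, and a minimal sketch with the Python REST emitter looks roughly like this (platform, table names, and the GMS address are placeholders):

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        DatasetLineageTypeClass,
        DatasetSnapshotClass,
        MetadataChangeEventClass,
        UpstreamClass,
        UpstreamLineageClass,
    )

    # Hypothetical tables: declare table_a as an upstream of table_b.
    upstream = UpstreamClass(
        dataset=builder.make_dataset_urn("hive", "db.table_a"),
        type=DatasetLineageTypeClass.TRANSFORMED,
        auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:ingestion"),
    )
    mce = MetadataChangeEventClass(
        proposedSnapshot=DatasetSnapshotClass(
            urn=builder.make_dataset_urn("hive", "db.table_b"),
            aspects=[UpstreamLineageClass(upstreams=[upstream])],
        )
    )
    DatahubRestEmitter("http://localhost:8080").emit_mce(mce)  # placeholder GMS address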
  • b

    broad-flag-97458

    05/26/2021, 12:44 PM
    Hi all, I have some questions about LDAP ingestion. a. I've got a service account that has the sn attribute set but not a given_name. I saw the optional parameter drop_missing_first_last_name, but I don't think it's applicable because this account does have a last name. My question is: should I just ensure that each user in a group has a proper givenName and sn, or would there be a general use case for being able to exclude users that don't have a givenName? I had to capture it like this (diff):
    Copy code
    -        first_name = attrs["givenName"][0].decode()
    +        first_name = (attrs["givenName"][0]).decode() if "givenName" in attrs else None
    b. I guess our LDAP server (Active Directory) is structured a little differently than the assumptions. I had to add this block to make LDAP ingestion work (otherwise I'd get index errors, etc.). My question is: is this a typical scenario that should be handled more broadly, or is it just my screwy LDAP setup? Diff:
    Copy code
    +                if "objectClass" in attrs:
                     if (
    -                    b"inetOrgPerson" in attrs["objectClass"]
    +                        b"organizationalPerson" in attrs["objectClass"]
    +                        or b"inetOrgPerson" in attrs["objectClass"]
                         or b"posixAccount" in attrs["objectClass"]
    l
    g
    • 3
    • 6
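    The guarded-attribute pattern from the diffs above, pulled out as a small standalone helper for illustration (the function name is made up):

    def decode_first(attrs: dict, key: str):
        """Return the first decoded value of an LDAP attribute, or None if it is absent."""
        values = attrs.get(key)
        return values[0].decode() if values else None

    # e.g. first_name = decode_first(attrs, "givenName")
    #      last_name = decode_first(attrs, "sn")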
  • c

    colossal-furniture-76714

    05/26/2021, 3:42 PM
    Hi everyone, has someone successfully dealt with struct_type columns? The method here https://datahubproject.io/docs/metadata-ingestion#hive-hive returns only a string column for the highest level, but not the columns below. To illustrate further: I get columnA, but not columnA.subColumnB or columnA.subColumnC.subsubColumnD. I think sqlalchemy does not read the column as structured, but pyhive does. Has anyone had a similar use case in the past and can point me in the right direction? Thanks a lot
    g
    d
    +2
    • 5
    • 11
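    For the struct question above, a rough illustration (not the Hive source's actual behaviour) of the kind of recursive flattening that would be needed to turn a parsed struct description into dotted field paths like columnA.subColumnB:

    def flatten_struct(name, type_):
        """Sketch: yield (dotted_path, leaf_type) pairs for a nested struct description.

        `type_` is either a primitive type name (str) or a dict of field -> type,
        e.g. {"subColumnB": "string", "subColumnC": {"subsubColumnD": "int"}}.
        """
        if isinstance(type_, dict):
            for sub_name, sub_type in type_.items():
                yield from flatten_struct(f"{name}.{sub_name}", sub_type)
        else:
            yield name, type_

    # list(flatten_struct("columnA", {"subColumnB": "string",
    #                                 "subColumnC": {"subsubColumnD": "int"}}))
    # -> [("columnA.subColumnB", "string"), ("columnA.subColumnC.subsubColumnD", "int")]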
  • c

    cuddly-spoon-5445

    05/31/2021, 10:10 AM
    Hi everyone, I've run into a bug (or expected behavior?) when using the Airflow lineage integration with DataHub. I have one DAG executing an ETL whose inlets and outlets are the same dataset (i.e. it backfills some values into the same table), so I configured both inlets and outlets to the same dataset. Once I configured and ran it, I found that my dataset could no longer be found in the DataHub UI; it keeps showing a "URN not found" error 😮 When I changed my DAG's inlet and outlet to some temp/unrelated datasets, my original dataset became visible again (and its schema was kept as before). Is this behavior expected? Can't one DAG (or task) have the same dataset in both inlets and outlets? *I'm not a native English speaker, sorry if my presentation is unclear/confusing.
    h
    g
    • 3
    • 21
  • p

    powerful-telephone-71997

    06/01/2021, 6:18 AM
    I got a response that this might not be solvable with Kafka either. I have yet to look at a fix… thought someone who has seen this issue might be able to help…
    w
    l
    g
    • 4
    • 36
  • a

    acceptable-architect-70237

    06/01/2021, 7:08 PM
    Hi team, for the metadata-ingestion script, how can we customize the aspects of the MCE? For example, when I pull from PostgreSQL, I can only see the schemaMetadata aspect, but I also need ownership and institutionalMemory. I have been reading through the code; if you can point me in the right direction, it would save me some time. Thanks
    m
    g
    • 3
    • 3
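    Not from the thread: a rough sketch of emitting ownership and institutionalMemory aspects for a dataset yourself, alongside what the source produces. Owner, link, urn, and server values are placeholders; if I remember correctly there are also built-in transformers (e.g. simple_add_dataset_ownership) that can add ownership from a recipe.

    import time

    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        AuditStampClass,
        DatasetSnapshotClass,
        InstitutionalMemoryClass,
        InstitutionalMemoryMetadataClass,
        MetadataChangeEventClass,
        OwnerClass,
        OwnershipClass,
        OwnershipTypeClass,
    )

    now = AuditStampClass(time=int(time.time() * 1000), actor="urn:li:corpuser:ingestion")
    mce = MetadataChangeEventClass(
        proposedSnapshot=DatasetSnapshotClass(
            urn=builder.make_dataset_urn("postgres", "mydb.public.my_table"),  # placeholder
            aspects=[
                OwnershipClass(
                    owners=[OwnerClass(owner="urn:li:corpuser:jdoe", type=OwnershipTypeClass.DATAOWNER)],
                    lastModified=now,
                ),
                InstitutionalMemoryClass(
                    elements=[
                        InstitutionalMemoryMetadataClass(
                            url="https://wiki.example.com/my_table",  # placeholder link
                            description="Table runbook",
                            createStamp=now,
                        )
                    ]
                ),
            ],
        )
    )
    DatahubRestEmitter("http://localhost:8080").emit_mce(mce)  # placeholder GMS address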
  • h

    handsome-airplane-62628

    06/02/2021, 5:57 PM
    Is there a way to edit what was previously ingested? I.e. if we ran ingestion and then at a later point a table was deleted/deprecated and no longer exists in our data warehouse, what is the appropriate way to remove this entity?
    g
    g
    • 3
    • 11
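    One approach that has come up before (a sketch, also relevant to question 4 in the 05/26 thread above): set the entity's Status aspect to removed so it is soft-deleted. The urn and GMS address below are placeholders.

    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        DatasetSnapshotClass,
        MetadataChangeEventClass,
        StatusClass,
    )

    # Placeholder urn of the table that no longer exists in the warehouse.
    urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.old_table,PROD)"
    mce = MetadataChangeEventClass(
        proposedSnapshot=DatasetSnapshotClass(urn=urn, aspects=[StatusClass(removed=True)])
    )
    DatahubRestEmitter("http://localhost:8080").emit_mce(mce)  # placeholder GMS address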
  • h

    handsome-wolf-39306

    06/02/2021, 8:07 PM
    Do you know why kafkacat -L -b localhost:9092 returns topics with no leader (leader 0)?
    m
    e
    • 3
    • 7
  • w

    white-beach-27328

    06/02/2021, 9:55 PM
    I'm running into a problem using the ingestion recipe framework: the schema registry in our Kafka cluster is rejecting all the messages being produced to our MetadataChangeEvent_v4 topic with the following message:
    Copy code
    {'error': KafkaError{code=INVALID_RECORD,val=87,str="Broker: Broker failed to validate record"}, 'msg': <cimpl.Message object at 0x7f185b5d6a70>}
    This is using the acryl-datahub==0.3.4 package and the redshift source. I turned on debug but it doesn't seem to be giving much more information. Did something change in how these messages are produced? Looking at the older ingestion framework (0.6.1) compared to the new one, I think keys were basically dropped from the produced records. They used to include the urn as the key: https://github.com/linkedin/datahub/blob/v0.6.0/metadata-ingestion/sql-etl/common.py#L100. Now it's not included: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/emitter/kafka_emitter.py#L62-L66. This causes our topics to fail because topic compaction no longer works. Is compaction something we shouldn't be doing on these topics?
    e
    g
    m
    • 4
    • 19
  • a

    acoustic-midnight-64606

    06/03/2021, 9:23 AM
    Hi everyone! We just started looking into onboarding DataHub and had a quick question. What is the best way to change the database name when ingesting from various data sources (Glue, Redshift, any relational database via sqlalchemy, ...)? In some cases the ingested database name does not match the intended name; would a custom transformer that replaces the database name be the right approach?
    g
    w
    • 3
    • 6
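    A rough sketch of the transformer-style logic the question above describes: rewriting the database portion of each dataset name before it reaches the sink. This is plain string manipulation on the dataset URN, not a ready-made transformer, and it assumes names of the form db.schema.table.

    def rename_database(dataset_urn: str, new_db: str) -> str:
        """Sketch: replace the first (database) component of the dataset name in a dataset URN.

        Assumes URNs of the form
        urn:li:dataset:(urn:li:dataPlatform:<platform>,<db>.<schema>.<table>,<env>).
        """
        prefix, body = dataset_urn.split("(", 1)
        platform, name, env = body.rstrip(")").split(",")
        parts = name.split(".")
        parts[0] = new_db
        return f"{prefix}({platform},{'.'.join(parts)},{env})"

    # rename_database("urn:li:dataset:(urn:li:dataPlatform:redshift,dev_db.public.users,PROD)", "analytics")
    # -> "urn:li:dataset:(urn:li:dataPlatform:redshift,analytics.public.users,PROD)"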
  • p

    powerful-telephone-71997

    06/03/2021, 1:51 PM
    Copy code
    Sink report:
    {'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                            'message': 'java.lang.RuntimeException: java.lang.reflect.InvocationTargetException',
                            'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: '
                                          'java.lang.reflect.InvocationTargetException\n'
                                          '\tat com.linkedin.metadata.restli.RestliUtils.toTask(RestliUtils.java:39)\n'
                                          '\tat com.linkedin.metadata.restli.BaseEntityResource.ingestInternal(BaseEntityResource.java:182)\n'
                                          '\tat com.linkedin.metadata.restli.BaseEntityResource.ingest(BaseEntityResource.java:176)\n'
                                          '\tat com.linkedin.metadata.resources.dataset.Datasets.ingest(Datasets.java:310)\n'
                                          '\tat sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)\n'
                                          '\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n'
                                                       '\t... 88 more\n',
                            'status': 500}}],
     'records_written': 9555,
     'warnings': []}
    
    Pipeline finished with failures
    a
    g
    g
    • 4
    • 3
  • a

    astonishing-mechanic-42915

    06/03/2021, 4:40 PM
    I'm ingesting data from the dbt catalog into DataHub but it is failing with "ERROR {datahub.ingestion.run.pipeline:52} - failed to write record with workunit urnlidataset". I don't understand the issue here. Appreciate any help!
    g
    • 2
    • 62
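    For reference, a minimal dbt recipe sketch (paths, platform, and server are placeholders; double-check the config keys against the dbt source docs for your version), in case the failure is configuration related:

    from datahub.ingestion.run.pipeline import Pipeline

    Pipeline.create(
        {
            "source": {
                "type": "dbt",
                "config": {
                    "manifest_path": "./target/manifest.json",  # placeholder paths
                    "catalog_path": "./target/catalog.json",
                    "target_platform": "snowflake",  # placeholder warehouse platform
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    ).run()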
  • a

    average-autumn-35845

    06/03/2021, 6:33 PM
    I have a problem with datahub-gms failing with this log:
    Copy code
    org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dataProcessDAO' defined in com.linkedin.gms.factory.dataprocess.DataProcessDAOFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.dao.BaseLocalDAO]: Factory method 'createInstance' threw exception; nested exception is java.lang.NullPointerException
    Even though I have tried removing all images and volumes, I can't figure out why it doesn't start (it worked fine before). Please help me!
    e
    b
    • 3
    • 4
  • a

    astonishing-mechanic-42915

    06/04/2021, 10:44 AM
    {'error': 'Unable to emit metadata to DataHub GMS',
     'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
              'message': "No root resource defined for path '/entities'",
              'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '
                            "'/entities'\n"
    👀 1
    s
    d
    g
    • 4
    • 3
  • e

    early-hydrogen-59749

    06/04/2021, 12:42 PM
    Got an error while ingesting data in v0.8.0. Steps: 1. checked out v0.8.0, 2. ran nuke.sh, 3. executed quickstart.sh (validated that all containers are up), 4. validated the table datahub.metadata_aspect_v2, 5. executed ingestion.sh. While ingesting, I got this error:
    Copy code
    ERROR    {datahub.ingestion.run.pipeline:52} - failed to write record with workunit file:///bootstrap_mce.json:2 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect' doesn't exist Bind values:[com.linkedin.metadata.dao.EbeanMetadataAspect$PrimaryKey@64865b96, ]
    👀 1
    s
    m
    +4
    • 7
    • 123
  • s

    sticky-television-18623

    06/04/2021, 5:58 PM
    Using acryl-datahub[mssql] to ingest metadata from an MSSQL database, is it possible to enable encryption using the config attributes?
    m
    g
    • 3
    • 6
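    A sketch of what that might look like, assuming the mssql source's use_odbc / uri_args options are available in your version and that your ODBC driver honours the Encrypt parameter; I have not verified this exact combination:

    from datahub.ingestion.run.pipeline import Pipeline

    Pipeline.create(
        {
            "source": {
                "type": "mssql",
                "config": {
                    "host_port": "mssql-host:1433",  # placeholders throughout
                    "database": "mydb",
                    "username": "user",
                    "password": "pass",
                    "use_odbc": True,
                    "uri_args": {
                        "driver": "ODBC Driver 17 for SQL Server",
                        "Encrypt": "yes",
                        "TrustServerCertificate": "no",
                    },
                },
            },
            "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
        }
    ).run()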
  • h

    handsome-airplane-62628

    06/04/2021, 8:25 PM
    For Snowflake specifically (though other DBs relying on SQLAlchemy are likely the same), is it possible to get schema data for views?
    g
    • 2
    • 1
  • s

    stale-jewelry-2440

    06/07/2021, 2:42 PM
    Hi! Can you point me to which file(s) I need to modify so that views, and not only tables, are ingested from MSSQL?
    g
    • 2
    • 4
  • w

    white-beach-27328

    06/07/2021, 8:07 PM
    I’m noticing that the upgrade job to migrate data for the 0.7.1 to 0.8.0 changes is going realllly slowly. Is that expected or should I be increasing the pod’s resources?
    l
    b
    • 3
    • 25
  • g

    glamorous-kite-95510

    06/08/2021, 3:06 AM
    Hi, I have a question: can I emit lineage to DataHub from Airflow version 1.10.3? I emit it using the DatahubEmitterOperator, just as mentioned in your docs. If I can't, can you give me some suggestions? Updating the Airflow version in our system is extremely tough and would be the last resort.
    b
    g
    • 3
    • 2
  • b

    better-orange-49102

    06/08/2021, 8:15 AM
    Copy code
    {'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                            'message': "No root resource defined for path '/entities'",
                            'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '
                                          "'/entities'\n"
                                          '\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n'
                                          '\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n'
                                          '\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n'
                                          '\tat '
                                          'com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n'
                                          '\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n'
                                          '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
                                          '\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n'
                                          '\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n'
                                          '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
                                          '\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n'
                                          '\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n'
                                          '\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
                                          '\tat '
                                          'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
                                          '\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n'
                                          '\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n'
                                          '\tat '
                                          'com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n'
                                          '\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n'
                                          '\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n'
                                          '\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n'
                                          '\tat '
                                          'com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n'
                                          '\tat '
                                          'org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n'
                                          '\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n'
                                          '\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n'
                                          '\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n'
                                          '\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n'
                                          '\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n'
                                          '\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n'
                                          '\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n'
                                          '\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n'
                                          '\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n'
                                          '\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n'
                                          '\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n'
                                          '\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n'
                                          '\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n'
                                          '\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n'
                                          '\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n'
                                          '\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n'
                                          '\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n'
                                          '\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n'
                                          '\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n'
                                          '\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n'
                                          '\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n'
                                          '\tat '
                                          'org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n'
                                          '\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n'
                                          '\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n'
                                          '\tat java.lang.Thread.run(Thread.java:748)\n'
                                          "Caused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/entities'\n"
                                          '\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n'
                                          '\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n'
                                          '\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n'
                                          '\t... 62 more\n',
                            'status': 404}}],
     'records_written': 0,
     'warnings': []}
    g
    b
    • 3
    • 3
  • c

    crooked-leather-44416

    06/08/2021, 2:45 PM
    I am trying to ingest AVRO files from S3. Is there documentation or examples I can use as a starting point?
    c
    • 2
    • 2
  • w

    white-beach-27328

    06/08/2021, 6:29 PM
    I’m noticing an odd inconsistency between the redshift metadata ingestion code and the hive metadata ingestion code. For some reason the hive metadata ingestion doesn’t include the catalog name (i.e. database) in the dataset name but in redshift it includes the database name. Any ideas as to why or how to fix that? If the hive ingestion doesn’t include that it could lead to collisions between catalogs that have the same schema and table names.
    g
    • 2
    • 17
  • d

    dazzling-book-76108

    06/08/2021, 6:54 PM
    Hey guys! I'm trying to reproduce the "No Code Modeling" guide from the docs. I run ./gradlew clean build and then ./docker/dev.sh, and everything looks good. But when I try to create a new "Service" entity I get this error:
    Copy code
    {"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Parameters of method 'ingest' failed validation with error 'ERROR :: /entity/value :: \"com.linkedin.metadata.snapshot.ServiceSnapshot\" is not a member type of union [ { \"type\" : \"record\", \"name\" : \"ChartSnapshot\", \"namespace\" : \"com.linkedin.metadata.snapshot\", \"doc\" : \"A metadata snapshot for a specific Chart entity.\", \"fields\" : [ { \"name\" : \"urn\", \"type\" : { \"type\" : \"typeref\", \"name\" : \"ChartUrn\", \"namespace\" : \"com.linkedin.common\", \"doc\" : \"Standardized chart identifier\", \"ref\" : \"string\", \"java\" : { \"class\" : \"com.linkedin.common.urn.ChartUrn\" }, \"validate\" : { \"com.linkedin.common.validator.TypedUrnValidator\" : { \"accessible\" : true, \"constructable\" : true, \"doc\" : \"Standardized chart identifier\", \"entityType\" : \"chart\", \"fields\" : [ { \"doc\" : \"The name of the dashboard tool such as looker, redash etc.\", \"maxLength\" : 20, \"name\" : \"dashboardTool\", \"type\" : \"string\" }, { \"doc\" : \"Unique id for the chart. This id should be globally unique for a dashboarding tool even when there are multiple deployments of it. As an example, chart URL could be used here for Looker such as '<http://looker.linkedin.com/looks/1234|looker.linkedin.com/looks/1234>'\", \"maxLength\" : 200, \"name\" : \"chartId\", \"type\" : \"string\" } ], \"maxLength\" : 236, \"name\" : \"Chart\", \"namespace\" : \"li\", \"owners\" : [ \"urn:li:corpuser:fbar\", \"urn:li:corpuser:bfoo\" ], \"owningTeam\" : \"urn:li:internalTeam:datahub\" } } }, \"doc\" : \"URN for the entity the metadata snapshot is associated with.\" }, { \"name\" : \"aspects\", \"type\" : { \"type\" : \"array\", \"items\" : { \"type\" : \"typeref\", \"name\" : \"ChartAspect\", \"namespace\" : \"com.linkedin.metadata.aspect\", \"doc\" : \"A union of all supported metadata aspects for a Chart\", \"ref\" : [ { \"type\" : \"record\", \"name\" : \"ChartKey\", \"namespace\" : \"com.linkedin.metadata.key\", \"doc\" : \"Key for a Chart\", \"fields\" : [ { \"name\" : \"dashboardTool\", \"type\" : \"string\", \"doc\" : \"The name of the dashboard tool such as looker, redash etc.\", \"Searchable\" : { \"addToFilters\" : true, \"boostScore\" : 4.0, \"fieldName\" : \"tool\", \"fieldType\" : \"TEXT_PARTIAL\" } }, { \"name\" : \"chartId\", \"type\" : \"string\", \"doc\" : \"Unique id for the chart. This id should be globally unique for a dashboarding tool even when there are multiple deployments of it. As an example, chart URL could be used here for Looker such as '<http://looker.linkedin.com/looks/1234|looker.linkedin.com/looks/1234>'\" } ], \"Aspect\" : { \"name\" : \"chartKey\" } }, { \"type\" : \"record\", \"name\" : \"ChartInfo\", \"namespace\" : \"com.linkedin.chart\", \"doc\" : \"Information about a chart\", \"include\" : [ { \"type\" : \"record\", \"name\" : \"CustomProperties\", \"namespace\" : \"com.linkedin.common\", \"doc\" : \"Misc. properties about an entity.\",
    [...]
    I noticed that ServiceSnapshot.pdl (described in the docs) is not in master under /metadata/snapshot/ (link). Also, Snapshot.pdl is not up to date: it does not contain ServiceSnapshot.pdl inside its union (link). Any ideas? Am I forgetting something?
    e
    b
    +3
    • 6
    • 48
  • w

    white-beach-27328

    06/09/2021, 5:07 PM
    @early-lamp-41924 / @big-carpet-38439, could I get a review on this PR: https://github.com/linkedin/datahub/pull/2667? @gray-shoe-75895 already took a look, but it seems I might need another one. It might also be helpful to have a Slack group/channel for requesting these kinds of reviews; I don't want to @here 700+ people.
    b
    g
    • 3
    • 8
  • g

    gorgeous-glass-57878

    06/09/2021, 6:05 PM
    Hi. Is there any way to remove already-ingested datasets in DataHub, or is it an append-only ingestion pipeline? Is there a way to mark a particular dataset as 'Deleted'?
    g
    f
    b
    • 4
    • 8
  • c

    clean-art-94242

    06/10/2021, 12:04 PM
    Hi, I have set up DataHub on our AWS EKS cluster and exposed datahub-frontend and datahub-gms through an ingress. When I try to run an ingest of a sample MCE:
    Copy code
    datahub ingest -c ./examples/recipes/file_to_datahub_rest.yml
    I receive a 500 error: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table ‘datahub.metadata_aspect_v2’ doesn’t exist. If I create the table manually and run the ingest again it does seem to work:
    Copy code
    datahub ingest -c ./examples/recipes/file_to_datahub_rest.yml
    [2021-06-10 14:02:29,579] INFO     {datahub.entrypoints:68} - Using config: {'source': {'type': 'file', 'config': {'filename': './examples/mce_files/single_mce.json'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<https://datahub-gms-test.data.dpgmedia.cloud>'}}}
    [2021-06-10 14:02:29,876] INFO     {datahub.ingestion.run.pipeline:44} - sink wrote workunit file://./examples/mce_files/single_mce.json:0
    
    Source report:
    {'failures': {}, 'warnings': {}, 'workunit_ids': ['file://./examples/mce_files/single_mce.json:0'], 'workunits_produced': 1}
    Sink report:
    {'failures': [], 'records_written': 1, 'warnings': []}
    
    Pipeline finished successfully
    But I don't see any rows in the metadata_aspect_v2 table? Ingesting the same MCE into DataHub hosted in a local Docker environment does work: the entry is added to that table. Any ideas?
    b
    • 2
    • 2