colossal-furniture-76714
05/25/2021, 10:41 AM
http.client.BadStatusLine: Invalid status 80
I'm not sure where to look for the error as I do not know what causes it. Do you have any hints? I've configured a yml for acryl-datahub...
brief-toothbrush-55766
05/25/2021, 11:14 AM
microscopic-book-98466
05/25/2021, 5:18 PM
glamorous-kite-95510
05/26/2021, 6:58 AM
broad-flag-97458
05/26/2021, 12:44 PM
sn set but not a given_name. I saw the optional parameter drop_missing_first_last_name, but I don't think it's applicable because this account does have a last_name. My question is: should I just ensure that each user in a group has a proper givenName and sn? Or would there be a use case in general for being able to exclude users if they don't have a givenName? I had to capture it like this (diff):
- first_name = attrs["givenName"][0].decode()
+ first_name = (attrs["givenName"][0]).decode() if "givenName" in attrs else None
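A more defensive pattern for optional LDAP attributes can be sketched as follows (assuming the python-ldap result shape, where attrs maps attribute names to lists of byte strings; get_attr is a hypothetical helper, not part of the connector):

```python
def get_attr(attrs, name):
    # Return the first value of an LDAP attribute decoded to str,
    # or None when the attribute is absent or has no values.
    values = attrs.get(name) or []
    return values[0].decode() if values else None

# Hypothetical entry: has sn but no givenName, as described above.
attrs = {"sn": [b"Doe"], "objectClass": [b"organizationalPerson"]}
first_name = get_attr(attrs, "givenName")  # None instead of a KeyError
last_name = get_attr(attrs, "sn")          # "Doe"
```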
a. I guess our LDAP server (Active Directory) is structured a little differently than the assumptions. I had to add this block to make LDAP ingestion work (otherwise I'd get index errors, etc.). My question is: is this a typical scenario that should be handled more broadly, or is it just my screwy LDAP setup? Diff:
+ if "objectClass" in attrs:
if (
- b"inetOrgPerson" in attrs["objectClass"]
+ b"organizationalPerson" in attrs["objectClass"]
+ or b"inetOrgPerson" in attrs["objectClass"]
or b"posixAccount" in attrs["objectClass"]
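Applied in full, the guard above might look like this (a sketch reconstructing the truncated diff; is_person_entry is a hypothetical helper name):

```python
PERSON_CLASSES = (b"organizationalPerson", b"inetOrgPerson", b"posixAccount")

def is_person_entry(attrs):
    # True when the entry declares any person-like objectClass value;
    # entries without an objectClass attribute are skipped instead of raising.
    return any(cls in attrs.get("objectClass", []) for cls in PERSON_CLASSES)
```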
colossal-furniture-76714
05/26/2021, 3:42 PM
struct_type columns? The method here https://datahubproject.io/docs/metadata-ingestion#hive-hive returns only a string column for the highest level, but not the columns below. To illustrate further: I get columnA, but not columnA.subColumnB or columnA.subColumnC.subsubColumnD. I think sqlalchemy does not read the column as structured, but pyhive does. Has anyone had a similar use case in the past who can point me in the right direction? Thanks a lot
cuddly-spoon-5445
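One way to surface the nested fields yourself is to parse the Hive type string (e.g. struct&lt;a:string,b:struct&lt;c:int&gt;&gt;) and emit dotted column paths. A sketch, independent of sqlalchemy/pyhive; the function names are illustrative only:

```python
def flatten_struct(name, hive_type):
    # Yield (dotted_path, type) pairs for a Hive column,
    # recursing into struct<...> types.
    if hive_type.startswith("struct<") and hive_type.endswith(">"):
        inner = hive_type[len("struct<"):-1]
        for field_name, field_type in _split_fields(inner):
            yield from flatten_struct(f"{name}.{field_name}", field_type)
    else:
        yield name, hive_type

def _split_fields(inner):
    # Split 'a:int,b:struct<c:int>' on top-level commas only,
    # tracking angle-bracket nesting depth.
    depth, start = 0, 0
    for i, ch in enumerate(inner + ","):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            field_name, _, field_type = inner[start:i].partition(":")
            start = i + 1
            yield field_name, field_type

cols = dict(flatten_struct(
    "columnA", "struct<subColumnB:string,subColumnC:struct<subsubColumnD:int>>"))
# cols now maps each dotted path (columnA.subColumnB, ...) to its leaf type.
```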
05/31/2021, 10:10 AM
powerful-telephone-71997
06/01/2021, 6:18 AM
acceptable-architect-70237
06/01/2021, 7:08 PM
metadata-ingestion
script, how could we customize the aspects of the MCE? For example, when I pull PostgreSQL, I can only see the schemaMetadata
aspect, but I also need the ownership
and institutionalMemory
aspects. I have been reading through the code. If you can point me in the right direction, it would save me some time. Thanks
handsome-airplane-62628
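One approach to the question above is to post-process each MCE before emitting it, appending extra aspects to the snapshot's aspects list. A minimal sketch over the JSON shape used by MCE files (the aspect keys and owner urn follow the bootstrap examples; treat the exact names as assumptions to verify against your DataHub version):

```python
def add_ownership(mce, owner_urn):
    # Append an Ownership aspect to a dataset MCE represented as a plain dict.
    snapshot = mce["proposedSnapshot"][
        "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot"]
    snapshot["aspects"].append({
        "com.linkedin.pegasus2avro.common.Ownership": {
            "owners": [{"owner": owner_urn, "type": "DATAOWNER"}],
            "lastModified": {"time": 0, "actor": owner_urn},
        }
    })
    return mce

# Hypothetical minimal MCE for a postgres dataset.
mce = {"proposedSnapshot": {
    "com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot": {
        "urn": "urn:li:dataset:(urn:li:dataPlatform:postgres,mydb.public.users,PROD)",
        "aspects": [],
    }}}
add_ownership(mce, "urn:li:corpuser:datahub")
```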
06/02/2021, 5:57 PM
handsome-wolf-39306
06/02/2021, 8:07 PM
white-beach-27328
06/02/2021, 9:55 PM
{'error': KafkaError{code=INVALID_RECORD,val=87,str="Broker: Broker failed to validate record"}, 'msg': <cimpl.Message object at 0x7f185b5d6a70>}
This is using the acryl-datahub==0.3.4
package and the redshift source. I turned on debug, but it doesn't seem to give much more information. Did something change in how these messages are produced? Looking at the older ingestion framework (0.6.1) compared to the new one, I think keys were basically dropped from the produced records. They used to include the urn as the key: https://github.com/linkedin/datahub/blob/v0.6.0/metadata-ingestion/sql-etl/common.py#L100. Now it's not included: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub/emitter/kafka_emitter.py#L62-L66. This causes our topics to fail because topic compaction no longer works. Is compaction something we shouldn't be doing on these topics?
acoustic-midnight-64606
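For compaction to work, each record needs a stable per-entity key; one workaround for the behaviour described above is to extract the snapshot urn and pass it as the Kafka message key. A sketch over the MCE dict shape (producer wiring omitted; mce_key is a hypothetical helper, not part of the emitter):

```python
def mce_key(mce):
    # proposedSnapshot is a single-entry union:
    # {snapshot_type_name: {"urn": ..., "aspects": [...]}}
    (snapshot,) = mce["proposedSnapshot"].values()
    return snapshot["urn"].encode("utf-8")

# producer.produce(topic, key=mce_key(mce), value=serialized_mce)
# With the urn as key, log compaction retains the latest record per entity.
```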
06/03/2021, 9:23 AM
powerful-telephone-71997
06/03/2021, 1:51 PM
Sink report:
{'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'message': 'java.lang.RuntimeException: java.lang.reflect.InvocationTargetException',
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: '
'java.lang.reflect.InvocationTargetException\n'
'\tat com.linkedin.metadata.restli.RestliUtils.toTask(RestliUtils.java:39)\n'
'\tat com.linkedin.metadata.restli.BaseEntityResource.ingestInternal(BaseEntityResource.java:182)\n'
'\tat com.linkedin.metadata.restli.BaseEntityResource.ingest(BaseEntityResource.java:176)\n'
'\tat com.linkedin.metadata.resources.dataset.Datasets.ingest(Datasets.java:310)\n'
'\tat sun.reflect.GeneratedMethodAccessor21.invoke(Unknown Source)\n'
'\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n'
'\t... 88 more\n',
'status': 500}}],
'records_written': 9555,
'warnings': []}
Pipeline finished with failures
astonishing-mechanic-42915
06/03/2021, 4:40 PM
average-autumn-35845
06/03/2021, 6:33 PM
org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dataProcessDAO' defined in com.linkedin.gms.factory.dataprocess.DataProcessDAOFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.dao.BaseLocalDAO]: Factory method 'createInstance' threw exception; nested exception is java.lang.NullPointerException
Even though I have tried removing all images and volumes, I couldn't figure out why it doesn't start (it was normal before). Please help me!
astonishing-mechanic-42915
06/04/2021, 10:44 AM
early-hydrogen-59749
06/04/2021, 12:42 PM
ERROR {datahub.ingestion.run.pipeline:52} - failed to write record with workunit file:///bootstrap_mce.json:2 with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect' doesn't exist Bind values:[com.linkedin.metadata.dao.EbeanMetadataAspect$PrimaryKey@64865b96, ]
sticky-television-18623
06/04/2021, 5:58 PM
acryl-datahub[mssql]
to ingest metadata from an mssql database, is it possible to enable encryption using the config attributes?
handsome-airplane-62628
06/04/2021, 8:25 PM
stale-jewelry-2440
06/07/2021, 2:42 PM
white-beach-27328
06/07/2021, 8:07 PM
glamorous-kite-95510
06/08/2021, 3:06 AM
better-orange-49102
06/08/2021, 8:15 AM
{'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'message': "No root resource defined for path '/entities'",
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '
"'/entities'\n"
'\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n'
'\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:158)\n'
'\tat com.linkedin.restli.server.RestRestLiServer.buildPreRoutingRestException(RestRestLiServer.java:203)\n'
'\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:177)\n'
'\tat com.linkedin.restli.server.RestRestLiServer.doHandleRequest(RestRestLiServer.java:164)\n'
'\tat com.linkedin.restli.server.RestRestLiServer.handleRequest(RestRestLiServer.java:120)\n'
'\tat com.linkedin.restli.server.RestLiServer.handleRequest(RestLiServer.java:132)\n'
'\tat '
'com.linkedin.restli.server.DelegatingTransportDispatcher.handleRestRequest(DelegatingTransportDispatcher.java:70)\n'
'\tat com.linkedin.r2.filter.transport.DispatcherRequestFilter.onRestRequest(DispatcherRequestFilter.java:70)\n'
'\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
'\tat '
'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
'\tat '
'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
'\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
'\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n'
'\tat com.linkedin.r2.filter.transport.ServerQueryTunnelFilter.onRestRequest(ServerQueryTunnelFilter.java:58)\n'
'\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
'\tat '
'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
'\tat '
'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
'\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
'\tat com.linkedin.r2.filter.TimedNextFilter.onRequest(TimedNextFilter.java:55)\n'
'\tat com.linkedin.r2.filter.message.rest.RestFilter.onRestRequest(RestFilter.java:50)\n'
'\tat com.linkedin.r2.filter.TimedRestFilter.onRestRequest(TimedRestFilter.java:72)\n'
'\tat '
'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:146)\n'
'\tat '
'com.linkedin.r2.filter.FilterChainIterator$FilterChainRestIterator.doOnRequest(FilterChainIterator.java:132)\n'
'\tat com.linkedin.r2.filter.FilterChainIterator.onRequest(FilterChainIterator.java:62)\n'
'\tat com.linkedin.r2.filter.FilterChainImpl.onRestRequest(FilterChainImpl.java:96)\n'
'\tat com.linkedin.r2.filter.transport.FilterChainDispatcher.handleRestRequest(FilterChainDispatcher.java:75)\n'
'\tat '
'com.linkedin.r2.util.finalizer.RequestFinalizerDispatcher.handleRestRequest(RequestFinalizerDispatcher.java:61)\n'
'\tat com.linkedin.r2.transport.http.server.HttpDispatcher.handleRequest(HttpDispatcher.java:101)\n'
'\tat com.linkedin.r2.transport.http.server.AbstractR2Servlet.service(AbstractR2Servlet.java:105)\n'
'\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n'
'\tat '
'com.linkedin.restli.server.spring.ParallelRestliHttpRequestHandler.handleRequest(ParallelRestliHttpRequestHandler.java:61)\n'
'\tat '
'org.springframework.web.context.support.HttpRequestHandlerServlet.service(HttpRequestHandlerServlet.java:73)\n'
'\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:790)\n'
'\tat org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:852)\n'
'\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:544)\n'
'\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n'
'\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:536)\n'
'\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n'
'\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n'
'\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1581)\n'
'\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)\n'
'\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1307)\n'
'\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)\n'
'\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:482)\n'
'\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1549)\n'
'\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)\n'
'\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1204)\n'
'\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n'
'\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)\n'
'\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)\n'
'\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n'
'\tat org.eclipse.jetty.server.Server.handle(Server.java:494)\n'
'\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:374)\n'
'\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:268)\n'
'\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n'
'\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n'
'\tat org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)\n'
'\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)\n'
'\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)\n'
'\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)\n'
'\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)\n'
'\tat '
'org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:367)\n'
'\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:782)\n'
'\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:918)\n'
'\tat java.lang.Thread.run(Thread.java:748)\n'
"Caused by: com.linkedin.restli.server.RoutingException: No root resource defined for path '/entities'\n"
'\tat com.linkedin.restli.internal.server.RestLiRouter.process(RestLiRouter.java:139)\n'
'\tat com.linkedin.restli.server.BaseRestLiServer.getRoutingResult(BaseRestLiServer.java:139)\n'
'\tat com.linkedin.restli.server.RestRestLiServer.handleResourceRequest(RestRestLiServer.java:173)\n'
'\t... 62 more\n',
'status': 404}}],
'records_written': 0,
'warnings': []}
crooked-leather-44416
06/08/2021, 2:45 PM
white-beach-27328
06/08/2021, 6:29 PM
dazzling-book-76108
06/08/2021, 6:54 PM
./gradlew clean build
then ./docker/dev.sh
and everything looks good.
But when I try to create a new "Service entity" I get this error:
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Parameters of method 'ingest' failed validation with error 'ERROR :: /entity/value :: \"com.linkedin.metadata.snapshot.ServiceSnapshot\" is not a member type of union [ { \"type\" : \"record\", \"name\" : \"ChartSnapshot\", \"namespace\" : \"com.linkedin.metadata.snapshot\", \"doc\" : \"A metadata snapshot for a specific Chart entity.\", \"fields\" : [ { \"name\" : \"urn\", \"type\" : { \"type\" : \"typeref\", \"name\" : \"ChartUrn\", \"namespace\" : \"com.linkedin.common\", \"doc\" : \"Standardized chart identifier\", \"ref\" : \"string\", \"java\" : { \"class\" : \"com.linkedin.common.urn.ChartUrn\" }, \"validate\" : { \"com.linkedin.common.validator.TypedUrnValidator\" : { \"accessible\" : true, \"constructable\" : true, \"doc\" : \"Standardized chart identifier\", \"entityType\" : \"chart\", \"fields\" : [ { \"doc\" : \"The name of the dashboard tool such as looker, redash etc.\", \"maxLength\" : 20, \"name\" : \"dashboardTool\", \"type\" : \"string\" }, { \"doc\" : \"Unique id for the chart. This id should be globally unique for a dashboarding tool even when there are multiple deployments of it. 
As an example, chart URL could be used here for Looker such as '<http://looker.linkedin.com/looks/1234|looker.linkedin.com/looks/1234>'\", \"maxLength\" : 200, \"name\" : \"chartId\", \"type\" : \"string\" } ], \"maxLength\" : 236, \"name\" : \"Chart\", \"namespace\" : \"li\", \"owners\" : [ \"urn:li:corpuser:fbar\", \"urn:li:corpuser:bfoo\" ], \"owningTeam\" : \"urn:li:internalTeam:datahub\" } } }, \"doc\" : \"URN for the entity the metadata snapshot is associated with.\" }, { \"name\" : \"aspects\", \"type\" : { \"type\" : \"array\", \"items\" : { \"type\" : \"typeref\", \"name\" : \"ChartAspect\", \"namespace\" : \"com.linkedin.metadata.aspect\", \"doc\" : \"A union of all supported metadata aspects for a Chart\", \"ref\" : [ { \"type\" : \"record\", \"name\" : \"ChartKey\", \"namespace\" : \"com.linkedin.metadata.key\", \"doc\" : \"Key for a Chart\", \"fields\" : [ { \"name\" : \"dashboardTool\", \"type\" : \"string\", \"doc\" : \"The name of the dashboard tool such as looker, redash etc.\", \"Searchable\" : { \"addToFilters\" : true, \"boostScore\" : 4.0, \"fieldName\" : \"tool\", \"fieldType\" : \"TEXT_PARTIAL\" } }, { \"name\" : \"chartId\", \"type\" : \"string\", \"doc\" : \"Unique id for the chart. This id should be globally unique for a dashboarding tool even when there are multiple deployments of it. As an example, chart URL could be used here for Looker such as '<http://looker.linkedin.com/looks/1234|looker.linkedin.com/looks/1234>'\" } ], \"Aspect\" : { \"name\" : \"chartKey\" } }, { \"type\" : \"record\", \"name\" : \"ChartInfo\", \"namespace\" : \"com.linkedin.chart\", \"doc\" : \"Information about a chart\", \"include\" : [ { \"type\" : \"record\", \"name\" : \"CustomProperties\", \"namespace\" : \"com.linkedin.common\", \"doc\" : \"Misc. properties about an entity.\",
[...]
I noticed that ServiceSnapshot.pdl
(described in the docs) is not in master at /metadata/snapshot/
(link).
Also Snapshot.pdl
is not up to date: it does not contain ServiceSnapshot.pdl
inside the union
(link).
Any ideas? Am I forgetting something?
white-beach-27328
06/09/2021, 5:07 PM
@here
700+ people
gorgeous-glass-57878
06/09/2021, 6:05 PM
clean-art-94242
06/10/2021, 12:04 PM
datahub ingest -c ./examples/recipes/file_to_datahub_rest.yml
I receive a 500 error: Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table ‘datahub.metadata_aspect_v2’ doesn’t exist.
If I create the table manually and run the ingest again it does seem to work:
datahub ingest -c ./examples/recipes/file_to_datahub_rest.yml
[2021-06-10 14:02:29,579] INFO {datahub.entrypoints:68} - Using config: {'source': {'type': 'file', 'config': {'filename': './examples/mce_files/single_mce.json'}}, 'sink': {'type': 'datahub-rest', 'config': {'server': '<https://datahub-gms-test.data.dpgmedia.cloud>'}}}
[2021-06-10 14:02:29,876] INFO {datahub.ingestion.run.pipeline:44} - sink wrote workunit file://./examples/mce_files/single_mce.json:0
Source report:
{'failures': {}, 'warnings': {}, 'workunit_ids': ['file://./examples/mce_files/single_mce.json:0'], 'workunits_produced': 1}
Sink report:
{'failures': [], 'records_written': 1, 'warnings': []}
Pipeline finished successfully
But I don't see any rows in the metadata_aspect_v2
table?
Ingesting the same into DataHub hosted in a local docker environment does work: the entry is added to that table.
Any ideas?