Hi team! I got these WARN logs in datahub-gms. It ...
# troubleshoot
f
Hi team! I got these WARN logs in datahub-gms. It seems data could not be ingested? How can I fix it? (I deploy datahub using docker compose). Thanks in advance!
08:01:55.065 [qtp522764626-446] INFO  c.l.m.r.entity.EntityResource:157 - GET urn:li:corpuser:tri.tran5
08:01:55.069 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entities/urn%3Ali%3Acorpuser%3Atri.tran5 - get - 200 - 4ms
08:01:55.074 [qtp522764626-391] INFO  c.l.m.r.entity.AspectResource:143 - INGEST PROPOSAL proposal: {aspectName=corpUserStatus, entityUrn=urn:li:corpuser:tri.tran5, entityType=corpuser, changeType=UPSERT, aspect={contentType=application/json, value=ByteString(length=100,bytes=7b227374...37327d7d)}}
08:01:55.091 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - POST /aspects?action=ingestProposal - ingestProposal - 200 - 17ms
08:01:55.752 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: HomePageViewEvent
08:01:55.752 [ThreadPoolTaskExecutor-1] WARN  c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"HomePageViewEvent","actorUrn":"urn:li:corpuser:tri.tran5","timestamp":1669622515141,"date":"Mon Nov 28 2022 15:01:55 GMT+0700 (Indochina Time)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36","browserId":"1b20cc1b-5afe-4f60-b6f4-2876120b0463"}
08:01:55.776 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 3 Took time ms: -1
08:01:55.781 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 3ms
08:01:55.783 [ThreadPoolTaskExecutor-1] INFO  c.l.m.k.t.DataHubUsageEventTransformer:74 - Invalid event type: HomePageViewEvent
08:01:55.783 [ThreadPoolTaskExecutor-1] WARN  c.l.m.k.DataHubUsageEventsProcessor:56 - Failed to apply usage events transform to record: {"type":"HomePageViewEvent","actorUrn":"urn:li:corpuser:tri.tran5","timestamp":1669622515213,"date":"Mon Nov 28 2022 15:01:55 GMT+0700 (Indochina Time)","userAgent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36","browserId":"1b20cc1b-5afe-4f60-b6f4-2876120b0463"}
08:01:56.453 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 3ms
08:01:56.458 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 2ms
08:01:56.464 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 2ms
08:01:56.497 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
08:01:56.516 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 3ms
08:01:56.534 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 2ms
08:01:56.841 [pool-11-thread-1] INFO  c.l.m.filter.RestliLoggingFilter:55 - GET /entitiesV2?ids=List(urn%3Ali%3Acorpuser%3Atri.tran5) - batchGet - 200 - 2ms
b
Hey Hue, the DataHubUsageEvent is an internal event used to track usage of the product. It's not related to ingestion events. It's not great that GMS couldn't process the usage events, but it shouldn't affect your ingestion.
f
Thanks Peter! What's the impact if GMS can't process it? How can I fix it?
f
@few-sunset-43876 what kind of source did you ingest from? I just found this message in your log. Does it mean anything?
08:01:55.776 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 3 Took time ms: -1
f
Hi @famous-florist-7218, we ingest metadata from many sources: Airflow, Oracle, BigQuery, ... I have no idea about this line:
08:01:55.776 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 3 Took time ms: -1
Thanks for your support!
b
That logline means you ingested 3 events.
Can you see the ingested entities when you query them on the frontend?
f
Many entities were ingested successfully (I'm not sure 100% of them are OK), and I can see the changes on the DataHub frontend. I just don't understand those warning logs and what impact they have.
f
These logs come from DataHubUsageEvent, which is used to track user behavior and product usage. If you enable DATAHUB_ANALYTICS_ENABLED in the docker env, you should be able to see the Analytics page, as in the demo below. These usage logs are separate from your ingestion jobs. https://demo.datahubproject.io/analytics
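If it helps to check this on your own logs, here is a minimal Python sketch that separates the two kinds of lines discussed in this thread: the BulkListener "Successfully fed bulk request" lines and the DataHubUsageEventsProcessor warnings. The regular expressions are assumptions based only on the log lines pasted above (other GMS versions may format them differently), and `summarize_gms_log` is a hypothetical helper name, not part of DataHub.

```python
import re

# Assumption: patterns are derived only from the log lines quoted in this thread.
BULK_RE = re.compile(
    r"BulkListener:\d+ - Successfully fed bulk request\. Number of events: (\d+)"
)
USAGE_WARN_RE = re.compile(
    r"WARN\s+\S*DataHubUsageEventsProcessor:\d+ - Failed to apply usage events transform"
)


def summarize_gms_log(lines):
    """Return (total events in bulk requests, number of usage-event warnings)."""
    bulk_events = 0
    usage_warnings = 0
    for line in lines:
        match = BULK_RE.search(line)
        if match:
            # Sum the "Number of events" value reported by each bulk request line.
            bulk_events += int(match.group(1))
        elif USAGE_WARN_RE.search(line):
            # Usage-tracking warning; counted separately from bulk requests.
            usage_warnings += 1
    return bulk_events, usage_warnings


# Sample lines copied from the thread; the JSON record is shortened for readability.
sample = [
    '08:01:55.752 [ThreadPoolTaskExecutor-1] WARN  c.l.m.k.DataHubUsageEventsProcessor:56 - '
    'Failed to apply usage events transform to record: {"type":"HomePageViewEvent"}',
    "08:01:55.776 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - "
    "Successfully fed bulk request. Number of events: 3 Took time ms: -1",
    "08:01:56.497 [I/O dispatcher 1] INFO  c.l.m.s.e.update.BulkListener:47 - "
    "Successfully fed bulk request. Number of events: 1 Took time ms: -1",
]

print(summarize_gms_log(sample))  # → (4, 1)
```

To run it on a real log, pass an open file instead of `sample`, for example `summarize_gms_log(open("gms.log"))`, where `gms.log` is a placeholder path for wherever you saved the container output.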
f
Thanks Hieu for that information!! 👍