# ui
m
Hi, I was creating a new ingestion source through the web UI (Hive, to be specific). When I finished, I got the message "Successfully created ingestion source!", but despite that message, the source isn't shown. I have checked the logs of the three services and I don't see errors in any of them. So it is either a problem with the UI or with the configuration (the latter is quite unlikely, since I have already used this configuration before, it worked perfectly, and it is very similar to the one used in the quickstart). And it doesn't happen only with ingestion sources; it also happens with domains, users, and so on. I have tried different versions (v0.8.40, v0.8.36 and v0.8.34), but I had the same problem in all of them.
s
Did you execute the recipe after creating the source? It needs to be run for the ingestion to happen
m
That is the problem, I can't execute the recipe because it is not shown in the UI.
s
Can you please post a screenshot or a video recording of this? Can you also check the gms logs?
m
The logs of gms after initialization:
The log of actions:
2022/07/04 07:25:33 Waiting for: http://datahub-gms:8080/health
| 2022/07/04 07:25:33 Received 200 from http://datahub-gms:8080/health
| [2022-07-04 07:25:34,778] DEBUG    {datahub.telemetry.telemetry:187} - Sending init Telemetry
| [2022-07-04 07:25:35,005] DEBUG    {datahub.telemetry.telemetry:219} - Sending Telemetry
| [2022-07-04 07:25:35,151] INFO     {datahub.cli.ingest_cli:88} - DataHub CLI version: 0.8.32.1
| [2022-07-04 07:25:35,162] DEBUG    {datahub.cli.ingest_cli:94} - Using config: {'source': {'type': 'datahub-stream', 'config': {'auto_offset_reset': 'latest', 'connection': {'bootstrap': 'kafka-broker1:9092', 'schema_registry_url': 'http://schema-registry:8081', 'consumer_config': {'security.protocol': 'PLAINTEXT'}}, 'actions': [{'type': 'executor', 'config': {'local_executor_enabled': True, 'remote_executor_enabled': 'False', 'remote_executor_type': 'acryl.executor.sqs.producer.sqs_producer.SqsRemoteExecutor', 'remote_executor_config': {'id': 'remote', 'aws_access_key_id': '""', 'aws_secret_access_key': '""', 'aws_session_token': '""', 'aws_command_queue_url': '""', 'aws_region': '""'}}}], 'topic_routes': {'mae': 'MetadataAuditEvent_v4', 'mcl': 'MetadataChangeLog_Versioned_v1'}}}, 'sink': {'type': 'console'}, 'datahub_api': {'server': 'http://datahub-gms:8080', 'extra_headers': {'Authorization': 'Basic __datahub_system:JohnSnowKnowsNothing'}}}
| [2022-07-04 07:25:35,194] DEBUG    {datahub.ingestion.run.pipeline:128} - Sink type:console,<class 'datahub.ingestion.sink.console.ConsoleSink'> configured
| [2022-07-04 07:25:35,329] INFO     {acryl_action_fwk.source.datahub_streaming:168} - Action executor:ExecutionRequestAction: configured
| [2022-07-04 07:25:35,329] DEBUG    {datahub.ingestion.run.pipeline:135} - Source type:datahub-stream,<class 'acryl_action_fwk.source.datahub_streaming.DataHubStreamSource'> configured
| [2022-07-04 07:25:35,330] INFO     {datahub.cli.ingest_cli:104} - Starting metadata ingestion
| [2022-07-04 07:25:35,330] INFO     {acryl_action_fwk.source.datahub_streaming:188} - Will subscribe to MetadataAuditEvent_v4, MetadataChangeLog_Versioned_v1
| [2022-07-04 07:25:35,330] INFO     {acryl_action_fwk.source.datahub_streaming:191} - Action framework started
| [2022-07-04 07:26:18,826] INFO     {acryl_action_fwk.source.datahub_streaming:198} - Msg received: MetadataChangeLog_Versioned_v1, 0, 577
| [2022-07-04 07:26:18,827] INFO     {acryl_action_fwk.source.datahub_streaming:85} - Calling act of ExecutionRequestAction
| [2022-07-04 07:26:18,919] INFO     {acryl_action_fwk.source.datahub_streaming:198} - Msg received: MetadataChangeLog_Versioned_v1, 0, 578
| [2022-07-04 07:26:18,920] INFO     {acryl_action_fwk.source.datahub_streaming:85} - Calling act of ExecutionRequestAction
| [2022-07-04 07:30:52,730] INFO     {acryl_action_fwk.source.datahub_streaming:198} - Msg received: MetadataChangeLog_Versioned_v1, 0, 579
| [2022-07-04 07:30:52,731] INFO     {acryl_action_fwk.source.datahub_streaming:85} - Calling act of ExecutionRequestAction
| [2022-07-04 07:30:52,741] INFO     {acryl_action_fwk.source.datahub_streaming:198} - Msg received: MetadataChangeLog_Versioned_v1, 0, 580
| [2022-07-04 07:30:52,741] INFO     {acryl_action_fwk.source.datahub_streaming:85} - Calling act of ExecutionRequestAction
@square-activity-64562 I think it might be a problem with the communication between GMS and the storage, because when DataHub starts, no ingestion source is detected. This problem might also affect the frontend since, if I am not mistaken, the frontend uses GMS to fetch the data it displays.
@square-activity-64562 Could the source of the problem be that the ingestion sources are saved as created by urn:li:corpuser:UNKNOWN instead of urn:li:corpuser:__datahub_system?
createdby      | urn:li:corpuser:UNKNOWN
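For anyone wanting to reproduce this check, here is a hedged sketch of the kind of Postgres lookup that surfaces that createdby value. It only builds the SQL string; the metadata_aspect_v2 table and its column names match the default DataHub schema but should be verified against your own instance, and the urn prefix for ingestion sources is an assumption:

```python
# Sketch: build a SQL query that lists aspect rows for ingestion sources,
# including the actor recorded in createdby. Table/column names assume the
# default DataHub Postgres schema; verify them before running the query.

def build_aspect_lookup(urn_prefix: str) -> str:
    """Build a SQL query listing aspects whose urn starts with urn_prefix."""
    return (
        "SELECT urn, aspect, version, createdby "
        "FROM metadata_aspect_v2 "
        f"WHERE urn LIKE '{urn_prefix}%' "
        "ORDER BY createdon DESC;"
    )

# Assumed urn prefix for UI-created ingestion sources:
print(build_aspect_lookup("urn:li:dataHubIngestionSource:"))
```

Running the printed query with psql (or any Postgres client) should show whether the rows exist and which actor they were written with.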
s
The UI currently saves it as the UNKNOWN actor from many places in the UI. This isn't something we have heard from anyone else, so something must be wrong with your particular setup.
m
So who is supposed to be the actor of the creation? __datahub_system?
s
Ideally, in my opinion, it should be the person who is logged into the UI and performs the creation. But that does not explain the problem here.
m
And do you know under what circumstances this happens? Or is it the first time it has happened? If it is of any help, I am uploading the docker-compose I am currently using.
s
This is the first time I have heard this happening. Are you comfortable using the API to find out whether these sources are even being created in the backend or not? https://datahubproject.io/docs/graphql/queries
m
It is indeed created in the backend. I am currently using PostgreSQL, so I ran a query against the metadata_aspect_v2 table, and the result I got is the following:
s
Can you check the developer tools to see whether there is any problem with the API calls? Maybe make the API call manually first?
Have you forked datahub by any chance and made changes?
m
No, I use the official images published on Docker Hub. One thing I have noticed is that there are two repositories for the actions service. Are they both the same?
s
Please use the first one. But that should not affect these showing up in the UI.
Are you able to make graphql API calls and see if that returns anything?
you can use the interface at
/graphiql
e.g. https://demo.datahubproject.io/graphiql
m
I have executed a simple query
{
  appConfig {
    appStatus
  }
}
And got a successful response. I have also tried executing:
{
  corpUser(urn: "urn:li:corpuser:datahub") {
    status
  }
}
And the response is:
{
  "data": {
    "corpUser": {
      "status": null
    }
  }
}
s
Try
query listIngestionSources($input: ListIngestionSourcesInput!) {
  listIngestionSources(input: $input) {
    start
    count
    total
    ingestionSources {
      urn
      type
      name
    }
  }
}
with variables
{
  "input": {
    "start": 0,
    "count": 10,
    "query": "*"
  }
}
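If the UI is suspect, the same query can also be scripted directly against GMS, bypassing the frontend entirely. A minimal sketch using only the Python standard library; the /api/graphql path on the GMS host is the usual endpoint, but the host, port, and lack of auth headers here are assumptions to adjust for your deployment:

```python
import json
import urllib.request

LIST_SOURCES_QUERY = """
query listIngestionSources($input: ListIngestionSourcesInput!) {
  listIngestionSources(input: $input) {
    start
    count
    total
    ingestionSources { urn type name }
  }
}
"""

def build_graphql_payload(query: str, variables: dict) -> bytes:
    """Encode a GraphQL query plus variables as a JSON POST body."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")

def list_ingestion_sources(gms_url: str, start: int = 0, count: int = 10) -> dict:
    """POST the query to the GMS GraphQL endpoint and return the parsed response."""
    body = build_graphql_payload(
        LIST_SOURCES_QUERY, {"input": {"start": start, "count": count}}
    )
    req = urllib.request.Request(
        f"{gms_url}/api/graphql",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example against a quickstart-style deployment (hypothetical host):
# print(list_ingestion_sources("http://datahub-gms:8080"))
```

Comparing this response with what the UI's network tab shows can tell you whether the problem is in the frontend or in GMS itself.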
m
I had to remove the "query" variable because it gave an error with it. The result is:
{
  "data": {
    "listIngestionSources": {
      "start": 0,
      "count": 10,
      "total": 0,
      "ingestionSources": []
    }
  }
}
s
So it is present in Postgres but not being returned by graphql?
let me ask someone for help
m
That is correct: it can be seen in PostgreSQL, with the details written in the recipe, but not through GraphQL.
b
Hey Pablo! I'm looking into this as well to see if I can help. Have you had any other issues with creating things via the UI and not seeing them? It's very weird that you're seeing the item in Postgres but not able to get it through GraphQL. So you don't use MySQL at all, right? Just Postgres?
Also, when you're on the ingestion page in the UI, can you open up your developer tools so we can see the console and the network tab? Let's check whether there are any errors in the console relevant to this. And I'd love to see the GraphQL query in your network tab that tries to list these ingestion sources.
I tried replicating this locally with your recipe and everything else the same, but I can't reproduce it myself.
m
Hi Chris, yeah, I only use Postgres as the DB. The problem is not just with sources, which I cannot see through the UI once I create them; it also happens with users (I can't see the users after adding more information than the name), with glossary terms, and so on. In other words, none of the modifications made by users can be seen in the UI, due to the error pointed out previously: all these modifications are saved as created by
urn:li:corpuser:UNKNOWN
So you have used the same compose and it works fine in your local deployment?
The composes I am using to deploy both DataHub and its dependencies were used to deploy DataHub a few months ago and worked just fine. So yeah, it is quite strange that this is happening now, and without even printing errors.
Here is the execution of the GraphQL query with the network tab on the side
b
okay interesting, so anything you create through the UI is not showing up, not just ingestion sources?
There's some issue with connecting to your Postgres instance in this situation...
b
Yeah, so this appears to be Elastic-related. Hence the data is in the document store (Postgres) but is not being returned properly by search queries (which get their candidates from Elastic).
Can you confirm that the elasticsearch pod is up and running - and no error logs coming from it?
We've seen cases where if elastic isn't up (or something wrong with it), we will have this problem
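A quick way to check that is Elasticsearch's cluster-health endpoint. A minimal stdlib-only sketch; the host and port are assumptions matching a default docker-compose deployment:

```python
import json
import urllib.request

def fetch_cluster_health(es_url: str) -> dict:
    """GET /_cluster/health from Elasticsearch and return the parsed JSON."""
    with urllib.request.urlopen(f"{es_url}/_cluster/health") as resp:
        return json.load(resp)

def is_search_usable(health: dict) -> bool:
    """Green or yellow is generally workable for search; red means shards are down."""
    return health.get("status") in ("green", "yellow")

# Example against a default quickstart deployment (hypothetical host):
# health = fetch_cluster_health("http://elasticsearch:9200")
# print(health["status"], is_search_usable(health))
```

If the status is red, or the endpoint is unreachable, the GraphQL list queries will come back empty even though the rows exist in Postgres.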
m
So Elastic is up and its status is green, but since I am using version 7.16.3, I am getting a lot of warning messages due to the use of frozen indices. I sometimes get errors about something that can't be fetched because it doesn't exist in ES, but it gets created afterwards. I will try to reproduce the errors and send them to you.
After the error appears in the log, as can be seen at the end of the file, the missing index is created, so that may not be a problem at all. In this log you can also see the warnings I mentioned previously.
In fact, I have started over from scratch (redeploying all the dependencies) and now DataHub works perfectly. I still get the warnings, but I think that is normal. As you pointed out, I think it must have been some problem with ES, because even after modifying the createdby column in PostgreSQL where the UNKNOWN appeared, the ingestion sources still weren't shown.