# all-things-deployment
  • great-branch-515 (09/12/2022, 11:50 AM)
    @here I am trying to build the Grafana dashboard from https://github.com/datahub-project/datahub/tree/master/docker/monitoring/grafana/dashboards, but I am not finding some of the metrics (for example metrics_com_linkedin_metadata_resources_entity_EntityResource_search_Mean). Where can I find the list of DataHub metrics emitted by each service?
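    One likely explanation: the metrics_* series come from GMS's internal Dropwizard/JMX metrics, which are only exported when Prometheus support is enabled, so the dashboard can reference more metrics than a default deployment emits. A minimal values sketch for datahub-helm, assuming the chart exposes the toggle under global.datahub.monitoring (verify against the chart's values.yaml):
        # values.yaml sketch -- the key layout is an assumption, check your chart version
        global:
          datahub:
            monitoring:
              enablePrometheus: true   # run the JMX exporter so each service serves a /metrics endpoint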
  • full-chef-85630 (09/12/2022, 1:39 PM)
    Hi all, I want to build the PyPI package myself, but I found that the metadata module is missing after packaging. This module should be generated automatically; I don't know how to generate it. @dazzling-judge-80093 Running
    ../gradlew :metadata-ingestion:installDev
    fails with:
    Failed to build python-ldap orderedset sasl3
    ERROR: Could not build wheels for python-ldap, which is required to install pyproject.toml-based projects
    
    > Task :metadata-ingestion:installDev FAILED
    
    FAILURE: Build failed with an exception.
    
    * What went wrong:
    Execution failed for task ':metadata-ingestion:installDev'.
    > Process 'command 'bash'' finished with non-zero exit value 1
    
    * Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
    
    * Get more help at https://help.gradle.org
    
    Deprecated Gradle features were used in this build, making it incompatible with Gradle 7.0.
    Use '--warning-mode all' to show the individual deprecation warnings.
    See https://docs.gradle.org/6.9.2/userguide/command_line_interface.html#sec:command_line_warnings
    
    BUILD FAILED in 4m 19s
    26 actionable tasks: 9 executed, 17 up-to-date
  • cuddly-arm-8412 (09/13/2022, 11:48 AM)
    Hi team, I added an ONTEST symbol to the FabricType enum and the build now reports compatibility errors:
    [checker] [RS-I] /collection/actions/batchIngest/parameters/entities/type new enum added symbols ONTEST
    [checker] [MD-I] com.linkedin.entity.Entity/value/com.linkedin.metadata.snapshot.Snapshot/ref/union/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/array/items/com.linkedin.metadata.aspect.DatasetAspect/ref/union/com.linkedin.metadata.key.DatasetKey/origin/com.linkedin.common.FabricType/symbols new enum added symbols ONTEST, breaks old readers
    [checker] [MD-I] com.linkedin.entity.Entity/value/com.linkedin.metadata.snapshot.Snapshot/ref/union/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/array/items/com.linkedin.metadata.aspect.DatasetAspect/ref/union/com.linkedin.metadata.key.DatasetKey/origin/com.linkedin.common.FabricType/symbols new enum added symbols ONTEST, breaks old readers
    [checker] [MD-I] com.linkedin.metadata.aspect.VersionedAspect/aspect/com.linkedin.metadata.aspect.Aspect/ref/union/com.linkedin.metadata.key.MLModelKey/origin/com.linkedin.common.FabricType/symbols new enum added symbols ONTEST, breaks old readers
    [checker] [MD-I] com.linkedin.metadata.aspect.VersionedAspect/aspect/com.linkedin.metadata.aspect.Aspect/ref/union/com.linkedin.metadata.key.MLModelKey/origin/com.linkedin.common.FabricType/symbols new enum added symbols ONTEST, breaks old readers
  • shy-dog-84302 (09/14/2022, 7:04 PM)
    Hi! I am experiencing issues deploying the Elasticsearch cluster from the datahub-helm prerequisites: pod security constraints complain about root access for the configure-sysctl init container. Running the sysctl command requires root access, but the container constraints do not allow that. Attached is my StatefulSet that results in this error. How can I fix this?
    Event: elasticsearch-master   StatefulSet   elasticsearch-master   datahub     create Pod elasticsearch-master-0 in StatefulSet elasticsearch-master failed error: admission webhook "pod-security-webhook.kubernetes.io" denied the request: pods "elasticsearch-master-0" is forbidden: violates PodSecurity "restricted:latest": privileged (container "configure-sysctl" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "configure-sysctl" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "configure-sysctl" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "configure-sysctl" must set securityContext.runAsNonRoot=true), runAsUser=0 (container "configure-sysctl" must not set runAsUser=0), seccompProfile (pod or container "configure-sysctl" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")   FailedCreate
    statefulset.yaml
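    A common workaround, sketched under the assumption that the prerequisites chart passes values straight through to the upstream elastic/helm-charts elasticsearch chart: disable the privileged init container and ensure vm.max_map_count is set on the nodes by other means.
        # prerequisites values.yaml sketch -- sysctlInitContainer is the upstream chart's key
        elasticsearch:
          sysctlInitContainer:
            enabled: false   # skip the privileged configure-sysctl init container
          # vm.max_map_count >= 262144 must then be set on every node outside the chart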
  • bright-egg-51769 (09/14/2022, 10:48 PM)
    Hi Team, facing an issue: I am trying to set up open source DataHub and I am stuck on getting the front end deployment to work. I followed https://datahubproject.io/docs/deploy/aws. Here are the ingress rules for the datahub-frontend deployment that should allow connectivity internally:
    datahub-frontend:
      enabled: true
      image:
        repository: linkedin/datahub-frontend-react
        tag: "v0.8.44"
      # Set up ingress to expose react front-end
      ingress:
        enabled: true
        annotations:
          kubernetes.io/ingress.class: alb
          alb.ingress.kubernetes.io/scheme: internet-facing
          alb.ingress.kubernetes.io/target-type: instance
          alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:…:certificate/5764945f-601d
          alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
          alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
          alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
        hosts:
          - host: datahub.Company.biz
            redirectPaths:
              - path: /*
                name: ssl-redirect
                port: use-annotation
            paths:
              - /*
    We cannot connect to this ELB (k8s-default-datahubd-19d3dda69b). An A record named datahub.company.biz pointing at the ALB has been entered into Route 53, and port 443 has been opened to 0.0.0.0/0, but we still cannot reach the application.
  • thousands-solstice-2498 (09/15/2022, 3:11 AM)
    Hi Team, could you please confirm the schema registry use case for the DataHub custom helm charts?
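    For context, the charts point every component at the schema registry through global values; a sketch of the relevant block, assuming the key layout in datahub-helm's values.yaml (hostnames are the defaults from the prerequisites chart):
        global:
          kafka:
            bootstrap:
              server: "prerequisites-kafka:9092"
            schemaregistry:
              url: "http://prerequisites-cp-schema-registry:8081"   # used to (de)serialize the Kafka event schemas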
  • tall-butcher-30509 (09/15/2022, 6:03 AM)
    We want to create/alter the DB versioning (retention) policy for our deployment on Kubernetes (GKE). However, according to the docs: "We will support a standardized way to do this in Kubernetes setup in the near future." (https://datahubproject.io/docs/advanced/db-retention/) Is there any update on this?
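    For reference, the linked page describes retention as a plugin YAML mounted into GMS rather than a chart-level switch; roughly like this sketch (field names from memory of that doc, so verify before use):
        # retention.yaml sketch -- mounted into the GMS pod (e.g. under /etc/datahub/plugins/retention/)
        # retention must also be enabled via the environment variable named in the doc
        - entity: "*"          # apply to every entity type
          aspect: "*"          # and every aspect
          config:
            retention:
              version:
                maxVersions: 20   # keep at most 20 versions of each aspect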
  • cuddly-arm-8412 (09/15/2022, 9:12 AM)
    Hi team, in order to improve the search experience we want to use our own Elasticsearch IK analyzer (word tokenizer). How do we configure this? Do we need to specify the analyzer when creating an index?
  • bright-egg-51769 (09/15/2022, 3:41 PM)
    (Repost of the 09/14/2022 10:48 PM message above about the datahub-frontend ingress rules and the unreachable ALB.)
  • proud-table-38689 (09/15/2022, 9:02 PM)
    Another question about Airflow: how does authentication work? If I follow https://datahubproject.io/docs/lineage/airflow/#using-datahubs-airflow-lineage-plugin-new, do I need some sort of DataHub authentication token stored in Airflow?
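    From the linked doc: the plugin reaches GMS through an Airflow connection (conn id datahub_rest_default), and a DataHub personal access token is only required when metadata service authentication is enabled. A sketch of that connection in Airflow 2.x's connections-import YAML format (the doc itself uses airflow connections add):
        # connections.yaml sketch for `airflow connections import`
        datahub_rest_default:
          conn_type: datahub_rest
          host: http://datahub-gms:8080          # your GMS endpoint
          password: "<personal-access-token>"    # only needed if metadata service auth is on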
  • bumpy-journalist-41369 (09/16/2022, 7:39 AM)
    How do I change the credentials for the default datahub user that I log into the DataHub UI with? I have deployed DataHub in Kubernetes using the provided helm charts: https://github.com/acryldata/datahub-helm
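    The usual pattern: the default login lives in datahub-frontend's user.props file, so in Kubernetes you put a replacement file into a Secret and mount it over that path. A sketch following the credential-management pattern in the DataHub docs (secret and volume names here are illustrative):
        # after: kubectl create secret generic datahub-users-secret --from-file=user.props=./user.props
        datahub-frontend:
          extraVolumes:
            - name: datahub-users
              secret:
                defaultMode: 0444
                secretName: datahub-users-secret
          extraVolumeMounts:
            - name: datahub-users
              mountPath: /datahub-frontend/conf/user.props
              subPath: user.props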
  • cuddly-arm-8412 (09/16/2022, 11:08 AM)
    I want to search by dataset name. Can I make exact name matches rank with higher priority?
  • colossal-needle-73093 (09/17/2022, 4:20 AM)
    $ kubectl logs --tail 300 datahub-kafka-setup-job-97skn
    ./kafka-setup.sh: line 11: /tmp/connection.properties: Permission denied
    ./kafka-setup.sh: line 12: /tmp/connection.properties: Permission denied
    [main] ERROR io.confluent.admin.utils.cli.KafkaReadyCommand - Error while running kafka-ready.
    java.nio.file.NoSuchFileException: /tmp/connection.properties
    	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    	at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    	at java.nio.file.Files.newByteChannel(Files.java:361)
    	at java.nio.file.Files.newByteChannel(Files.java:407)
    	at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
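    The Permission denied on /tmp/connection.properties suggests the job cannot write to /tmp, e.g. because a policy forces a read-only root filesystem. One sketch of a fix is mounting a writable emptyDir at /tmp; the extraVolumes/extraVolumeMounts keys on the setup job are an assumption, so check whether your chart version supports them:
        kafkaSetupJob:
          extraVolumes:            # hypothetical keys -- verify against the chart
            - name: tmp
              emptyDir: {}
          extraVolumeMounts:
            - name: tmp
              mountPath: /tmp      # gives kafka-setup.sh a writable /tmp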
  • cuddly-arm-8412 (09/19/2022, 11:44 AM)
    Hi team, how can we uniformly set the log output path, file name, format, etc.?
  • microscopic-mechanic-13766 (09/19/2022, 11:45 AM)
    Hi, I have a question about user roles, because something doesn't make much sense to me. In my deployment I have 3 users: Datahub (the default root user of the app), Ut2, and Ut3. Logged in as Datahub, I configured the users as follows: Datahub has the Admin role, Ut2 has the Reader role, and Ut3 has the Editor role. To see how the different roles influence the actions a user can take, I logged in as Ut2. Once logged in, I headed to the "Manage Permissions" tab to check whether I would be able (as a Reader) to modify either the roles or the policies. To my surprise, I was able to modify both. Aren't those considered "administrative actions", and shouldn't it be impossible for everyone but admins to manage them? Or am I missing some key concepts? Another thing I am able to do with the Reader role is create glossary terms and domains, though I am not able to add them to datasets. Thanks in advance! 🙂
  • cuddly-arm-8412 (09/19/2022, 11:58 AM)
    Hi team, the Elasticsearch version we use is 7.4.x, and we found that the datahub_usage_event index cannot be created successfully, so I created the index manually. Now when I browse a specific dataset, the track request succeeds, but no data is stored in ES. Are there any suggestions for operating on this older ES version? Can this feature be used with it at all?
  • thousands-solstice-2498 (09/19/2022, 1:10 PM)
    Team, please look into this: I am getting an error during the Kafka setup job.
    org.apache.kafka.common.errors.UnknownTopicOrPartitionException:
    	at org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)
    	at org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)
    	at org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:104)
    	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:272)
    	at kafka.admin.ConfigCommand$.getResourceConfig(ConfigCommand.scala:552)
    	at kafka.admin.ConfigCommand$.alterConfig(ConfigCommand.scala:322)
    	at kafka.admin.ConfigCommand$.processCommand(ConfigCommand.scala:302)
    	at kafka.admin.ConfigCommand$.main(ConfigCommand.scala:97)
    	at kafka.admin.ConfigCommand.main(ConfigCommand.scala)
    Caused by: org.apache.kafka.common.errors.UnknownTopicOrPartitionException:
  • microscopic-mechanic-13766 (09/19/2022, 1:50 PM)
    Good afternoon, I am trying to configure Apache Ranger as a policy provider for DataHub, following the guide in the documentation. My problem is that I have been looking in Maven for that plugin but it doesn't seem to appear (the URL in the guide doesn't work either). Does anyone know anything about it?
  • victorious-xylophone-76105 (09/19/2022, 4:44 PM)
    Hello, the quickstart deployment recently started giving me issues: while trying to invoke datahub docker quickstart, I am getting:
    Fetching docker-compose file https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml from GitHub
    Pulling docker images...
    unknown shorthand flag: 'f' in -f
    ...
    I suspect that this line of code has something to do with it, because `docker compose` is called instead of `docker-compose`: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/cli/docker_cli.py#L636-L637 Please help me work around this. Thanks!
  • cuddly-arm-8412 (09/20/2022, 1:56 AM)
    Hi team, regarding lineage: we currently use a single table as the entry point for lineage queries. Do you plan to offer a panorama-style view, with the full lineage graph as the home page, which could then be expanded further?
  • rich-policeman-92383 (09/20/2022, 6:45 AM)
    Hello, how can we disable the HTTP TRACE/TRACK methods for the DataHub MAE and MCE consumers? This was reported by our infosec team as a vulnerability. DataHub version: v0.8.41
  • famous-florist-7218 (09/20/2022, 7:53 AM)
    Hi guys, I got oeja.AnnotationParser spamming the message "scanned from multiple locations" after upgrading and re-deploying the helm chart. This happens on datahub-gms. Any thoughts? Chart version: 0.2.100. DataHub version: 0.8.44
  • colossal-fish-54995 (09/20/2022, 8:32 AM)
    Hi team, I had this problem just now when I used the datahub docker quickstart command. It looks like someone mentioned this bug earlier, but it still worked fine when I ran it yesterday.
  • microscopic-mechanic-13766 (09/20/2022, 10:36 AM)
    Good morning, I have been looking through the new RBAC features and noticed that, although you can assign users different roles, they can all see all of the datasets ingested into DataHub. Is there a way to control who can see certain datasets? I am asking because I think it is a key feature and I don't know whether it is implemented or not. Thanks in advance!!
  • microscopic-mechanic-13766 (09/20/2022, 11:37 AM)
    Hello again, I am having a bit of trouble setting the correct role and defining the correct policies for some users. In my case I have a user, let's call him user A. User A initially has the Reader role (which only lets him see the datasets and the glossary; he isn't allowed to see the Ingestion, Users & Groups, and Permissions tabs). Later on I created some policies over a domain (with the root user, since user A didn't have enough privileges) to check whether I could restrict who is able to see it. After creating the policies, I logged in again as user A. To my surprise, I was able to see the Ingestion, Users & Groups, and Permissions tabs, which I wasn't able to see when the role changes first took effect. Can anyone explain why this strange behaviour might be happening?
  • bumpy-journalist-41369 (09/20/2022, 2:46 PM)
    Hello. I have DataHub deployed in a Kubernetes cluster. After changing the password for the datahub default user, I am no longer able to execute ingestion recipes from the UI, no matter the source. I get the following message:
    ~~~~ Execution Summary ~~~~
    
    RUN_INGEST - {'errors': [],
     'exec_id': '318cbd70-0a68-426a-aec1-c9c50c344b85',
     'infos': ['2022-09-20 13:17:11.219637 [exec_id=318cbd70-0a68-426a-aec1-c9c50c344b85] INFO: Starting execution for task with name=RUN_INGEST',
               '2022-09-20 13:17:11.220184 [exec_id=318cbd70-0a68-426a-aec1-c9c50c344b85] INFO: Caught exception EXECUTING '
               'task_id=318cbd70-0a68-426a-aec1-c9c50c344b85, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 121, in execute_task\n'
               '    self.event_loop.run_until_complete(task_future)\n'
               '  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete\n'
               '    return f.result()\n'
               '  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
               '    raise self._exception\n'
               '  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
               '    result = coro.send(None)\n'
               '  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 71, in execute\n'
               '    validated_args = SubProcessIngestionTaskArgs.parse_obj(args)\n'
               '  File "pydantic/main.py", line 521, in pydantic.main.BaseModel.parse_obj\n'
               '  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__\n'
               'pydantic.error_wrappers.ValidationError: 1 validation error for SubProcessIngestionTaskArgs\n'
               'debug_mode\n'
               '  extra fields not permitted (type=value_error.extra)\n']}
    Execution finished with errors.
    If, however, I log into the Kubernetes pod (datahub-acryl-datahub-action) and execute the same recipe.yaml, it executes successfully. Can someone help with my problem?
  • thousands-solstice-2498 (09/20/2022, 4:17 PM)
    Hi Team, please advise on this. The postgresql-setup job fails:
    $ kubectl logs datahub3-postgresql-setup-job-rxk74 -n p1978837828
    2022/09/20 15:37:20 Waiting for: tcp://10.240.154.202:5432
    2022/09/20 15:37:20 Connected to tcp://10.240.154.202:5432
    psql: error: connection to server at "10.240.154.202", port 5432 failed: FATAL: database "dcflow_rw" does not exist
    psql: error: connection to server at "10.240.154.202", port 5432 failed: FATAL: database "dcflow_rw" does not exist
    -- create metadata aspect table
    CREATE TABLE IF NOT EXISTS metadata_aspect_v2 (
      urn varchar(500) not null,
      aspect varchar(200) not null,
      version bigint not null,
      metadata text not null,
      systemmetadata text,
      createdon timestamp not null,
      createdby varchar(255) not null,
      createdfor varchar(255),
      CONSTRAINT pk_metadata_aspect_v2 PRIMARY KEY (urn, aspect, version)
    );
    -- create default records for datahub user if not exists
    CREATE TEMP TABLE temp_metadata_aspect_v2 AS TABLE metadata_aspect_v2;
    INSERT INTO temp_metadata_aspect_v2 (urn, aspect, version, metadata, createdon, createdby) VALUES
    (
      'urn:li:corpuser:datahub',
      'corpUserInfo',
      0,
      '{"displayName":"Data Hub","active":true,"fullName":"Data Hub","email":"datahub@linkedin.com"}',
      now(),
      'urn:li:corpuser:__datahub_system'
    ), (
      'urn:li:corpuser:datahub',
      'corpUserEditableInfo',
      0,
      '{"skills":[],"teams":[],"pictureLink":"https://raw.githubusercontent.com/datahub-project/datahub/master/datahub-web-react/src/images/default_avatar.png"}',
      now(),
      'urn:li:corpuser:__datahub_system'
    );
    -- only add default records if metadata_aspect is empty
    INSERT INTO metadata_aspect_v2
    SELECT * FROM temp_metadata_aspect_v2
    WHERE NOT EXISTS (SELECT * from metadata_aspect_v2);
    DROP TABLE temp_metadata_aspect_v2;
    psql: error: connection to server at "10.240.154.202", port 5432 failed: FATAL: database "datahub" does not exist
    2022/09/20 15:37:20 Command exited with error: exit status 2
  • careful-engine-38533 (09/21/2022, 10:07 AM)
    Hi, I use Helm to deploy DataHub. So far I have used the internal MySQL installed by the Helm prerequisites; now I want to switch to an external MySQL. How can I do this without data loss? Any help?
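    The commonly suggested path: dump the internal MySQL, restore the dump into the external instance, then repoint the chart and disable the embedded MySQL in the prerequisites. A values sketch using datahub-helm's global.sql.datasource layout (host, database, and secret names are placeholders):
        global:
          sql:
            datasource:
              host: "mysql.mycompany.internal:3306"          # placeholder external host
              hostForMysqlClient: "mysql.mycompany.internal"
              port: "3306"
              url: "jdbc:mysql://mysql.mycompany.internal:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8"
              driver: "com.mysql.cj.jdbc.Driver"
              username: "datahub"
              password:
                secretRef: mysql-secrets     # placeholder Secret holding the password
                secretKey: mysql-password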
  • shy-lion-56425 (09/21/2022, 4:52 PM)
    When deploying DataHub via Helm, is there an easy way to enable additional sources for datahub-actions, like, say, an S3 data lake?
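    If the goal is UI-based ingestion, the actions pod generally installs the required source plugin per recipe, so an S3 recipe can simply be submitted from the UI. A minimal recipe sketch for the S3 data lake source (bucket, path, and region are placeholders; config keys should be checked against the s3 source docs for your CLI version):
        source:
          type: s3
          config:
            path_specs:
              - include: "s3://my-bucket/data/*.*"   # placeholder bucket/path
            aws_config:
              aws_region: us-east-1                  # placeholder region
        sink:
          type: datahub-rest
          config:
            server: http://datahub-gms:8080          # your GMS endpoint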