# getting-started
  • silly-finland-62382
    08/24/2022, 12:33 PM
    getting an error
  • silly-finland-62382
    08/24/2022, 12:33 PM
    from pyspark.sql import SparkSession
    from pyspark import SparkConf

    # spark = SparkSession.builder \
    #     .master("local") \
    #     .appName("spark-lineage") \
    #     .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.43") \
    #     .config("spark.extraListeners", "datahub.spark.DatahubSparkListener") \
    #     .config("spark.datahub.rest.server", "https://datahub-frontend-sso.meeshotest.in:9002") \
    #     .enableHiveSupport() \
    #     .getOrCreate()
    #
    # df = spark.createDataFrame([(1, "value1"), (2, "value2")], ["id", "value"])
    # df.write.mode("overwrite").saveAsTable("spark_model")
    # spark.stop()
    #
    # conf = SparkConf().set("spark.jars", "/Users/nishchayagarwal/documents/datahub-spark-lineage-0.8.43.jar")

    spark = SparkSession.builder \
        .master("local[1]") \
        .appName("Main") \
        .config("spark.sql.warehouse.dir", "/tmp/data") \
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.43") \
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener") \
        .config("spark.datahub.rest.server", "http://172.31.18.133:8080") \
        .config("spark.datahub.metadata.dataset.platformInstance", "dataset") \
        .config("spark.datahub.rest.token", "eyJhbGciOiJIUzI1NiJ9.eyJhY3RvclR5cGUiOiJVU0VSIiwiYWN0b3JJZCI6Im1vaGl0LmdhcmciLCJ0eXBlIjoiUEVSU09OQUwiLCJ2ZXJzaW9uIjoiMiIsImV4cCI6MTY2MzkxOTkzOSwianRpIjoiMjk2Y2E3MGUtMjA2My00ODM0LTkwNmYtMGIzZjRjMTVlY2RhIiwic3ViIjoibW9oaXQuZ2FyZyIsImlzcyI6ImRhdGFodWItbWV0YWRhdGEtc2VydmljZSJ9.tr2mu_FueVfHKz9Ze2BWmN4dqhOrTwR1t_WrfxspOmY") \
        .enableHiveSupport() \
        .getOrCreate()

    spark.sparkContext.addPyFile("/Users/nishchayagarwal/documents/datahub-spark-lineage-0.8.43.jar")
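    A note on the snippet above: sparkContext.addPyFile is intended for Python dependencies (.py/.zip/.egg), and the listener jar is already being pulled in by spark.jars.packages, so the addPyFile call should not be needed (a local jar would instead go in spark.jars). A minimal sketch of the same wiring, assuming the GMS endpoint from the snippet and with the token replaced by a placeholder:

    from pyspark.sql import SparkSession

    # Sketch only: endpoint and token are placeholders taken from the
    # snippet above; the listener jar is fetched via spark.jars.packages.
    spark = SparkSession.builder \
        .master("local[1]") \
        .appName("spark-lineage-test") \
        .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.43") \
        .config("spark.extraListeners", "datahub.spark.DatahubSparkListener") \
        .config("spark.datahub.rest.server", "http://172.31.18.133:8080") \
        .config("spark.datahub.rest.token", "<personal-access-token>") \
        .enableHiveSupport() \
        .getOrCreate()

    # Perform a write so the listener has lineage to report, then stop the
    # session so the listener flushes its events.
    df = spark.createDataFrame([(1, "value1"), (2, "value2")], ["id", "value"])
    df.write.mode("overwrite").saveAsTable("spark_model")
    spark.stop()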
  • silly-finland-62382
    08/24/2022, 12:33 PM
    while running the Spark DataHub listener using Python code
  • silly-finland-62382
    08/24/2022, 12:33 PM
    Error:
  • silly-finland-62382
    08/24/2022, 12:33 PM
    /Users/nishchayagarwal/venv-test/bin/Python /Users/nishchayagarwal/IdeaProjects/prism-catalog/lineage/staging/datahub_spark_lineage.py
    Ivy Default Cache set to: /Users/nishchayagarwal/.ivy2/cache
    The jars for the packages stored in: /Users/nishchayagarwal/.ivy2/jars
    :: loading settings :: url = jar:file:/Users/nishchayagarwal/venv-test/lib/python3.8/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
    io.acryl#datahub-spark-lineage added as a dependency
    :: resolving dependencies :: org.apache.spark#spark-submit-parent-b55a9793-fa10-4aae-a4dc-9d489478e53f;1.0
        confs: [default]
        found io.acryl#datahub-spark-lineage;0.8.43 in central
    :: resolution report :: resolve 148ms :: artifacts dl 2ms
        :: modules in use:
        io.acryl#datahub-spark-lineage;0.8.43 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   1   |   0   |
        ---------------------------------------------------------------------
    :: retrieving :: org.apache.spark#spark-submit-parent-b55a9793-fa10-4aae-a4dc-9d489478e53f
        confs: [default]
        0 artifacts copied, 1 already retrieved (0kB/4ms)
    22/08/24 18:02:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    22/08/24 18:02:29 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    Process finished with exit code 0
  • silly-finland-62382
    08/24/2022, 12:33 PM
    @big-carpet-38439
  • silly-finland-62382
    08/24/2022, 12:33 PM
    @bulky-soccer-26729
  • silly-finland-62382
    08/24/2022, 12:33 PM
    Need your urgent help
  • bright-motherboard-35257
    08/24/2022, 1:00 PM
    When #getting-started with the Docker container, how do you add additional #ingestion plugins that are not showing on the add source screen (e.g. Salesforce, Elasticsearch)? There are 3 containers running:
    • datahub-frontend-react
    • datahub-gms
    • datahub_datahub-actions_1
    Do I go into one of these and install the plugins, then restart the container?
  • busy-analyst-8258
    08/24/2022, 2:57 PM
    Hello everyone, we are using the Stats option to populate data profiling. We are not doing data profiling directly in DataHub; we have the profiling results in a Greenplum database, and I am able to ingest the values for the stats. The question is: can we create more columns in the Stats section? In DataHub there are already 9 columns; can we customize those columns by adding a few more?
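    For context on what the Stats tab can show: it renders the datasetProfile aspect, whose per-column fields are fixed by the DatasetFieldProfile model (unique count, null count, min/max/mean, and so on), so extra columns would require a custom model change rather than configuration. A sketch of emitting profile values from existing results, with a hypothetical URN and server address:

    import time
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetFieldProfileClass,
        DatasetProfileClass,
    )

    # Hypothetical URN and GMS address for illustration.
    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:greenplum,schema.table,PROD)"
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    # The field set below is the full vocabulary the Stats tab understands.
    profile = DatasetProfileClass(
        timestampMillis=int(time.time() * 1000),
        rowCount=1000,
        columnCount=3,
        fieldProfiles=[
            DatasetFieldProfileClass(
                fieldPath="id",
                uniqueCount=1000,
                nullCount=0,
                min="1",
                max="1000",
                mean="500.5",
            )
        ],
    )
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="datasetProfile",
            aspect=profile,
        )
    )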
  • best-fireman-42901
    08/24/2022, 4:41 PM
    Hi all. We're deploying onto AWS using Terraform/kubectl & helm. I'm getting an error with the 'datahub-datahub-upgrade-job' when trying to install DataHub. We are providing our values files when running the install command. The error is below:
    client.go:607 [debug] datahub-datahub-upgrade-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
    Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
    helm.go:84 [debug] failed post-install: timed out waiting for the condition
    INSTALLATION FAILED
    main.newInstallCmd.func2
        helm.sh/helm/v3/cmd/helm/install.go:127
    github.com/spf13/cobra.(*Command).execute
        github.com/spf13/cobra@v1.4.0/command.go:856
    github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/cobra@v1.4.0/command.go:974
    github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/cobra@v1.4.0/command.go:902
    main.main
        helm.sh/helm/v3/cmd/helm/helm.go:83
    runtime.main
        runtime/proc.go:255
    runtime.goexit
        runtime/asm_amd64.s:1581
    How do I find out what the issue is? We also want to use AWS services rather than stand up the dependencies within the K8s stack. We have a values file which we reference when installing prerequisites; I assume we set the below as false?
    elasticsearch:
      enabled: false   # set this to false, if you want to provide your own ES instance.
      replicas: 3
    Do we need to set Kafka and SQL to false also if we want to provide our own instances for those services? When running the datahub install command, we have another values file which we reference; it has the below entries:
    elasticsearchSetupJob:
      enabled: false
    kafkaSetupJob:
      enabled: false
    mysqlSetupJob:
      enabled: false
    And then under 'global' we have the details for each service to use? I can't work out if this is correct from the documentation site.
  • silly-finland-62382
    08/24/2022, 6:37 PM
    Hey
  • silly-finland-62382
    08/24/2022, 6:38 PM
    Can someone tell me how to see Spark-ingested data using spark lineage? I am not able to see the data on DataHub. Is there any workaround? @dazzling-judge-80093 @big-carpet-38439 @bulky-soccer-26729 @little-megabyte-1074
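    One way to verify whether the listener emitted anything is to search for the written table through the GraphQL API, the same search endpoint used later in this thread. A sketch, assuming the frontend at localhost:9002 and a valid personal access token:

    import requests

    # Hypothetical endpoint and token; adjust to your deployment.
    url = "http://localhost:9002/api/graphql"
    token = "<personal-access-token>"

    query = """
    query {
      search(input: { type: DATASET, query: "spark_model", count: 10 }) {
        searchResults { entity { urn type } }
      }
    }
    """
    resp = requests.post(url, json={"query": query},
                         headers={"Authorization": f"Bearer {token}"})
    # An empty searchResults list means nothing was emitted (or indexed yet).
    print(resp.json())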
  • purple-monitor-41675
    08/24/2022, 11:45 PM
    Hi, anyone getting this error installing the accumulation-tree package when trying to install datahub? ENV:
    OS: macos
    arch: x86_64
    pip v22.2.2
    python version: Python 3.8.9
    running python -m pip install --upgrade acryl-datahub:
    note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for accumulation-tree
      Running setup.py clean for accumulation-tree
    Failed to build accumulation-tree
    Installing collected packages: accumulation-tree, tdigest, acryl-datahub
      Running setup.py install for accumulation-tree ... error
      error: subprocess-exited-with-error
    
      × Running setup.py install for accumulation-tree did not run successfully.
      │ exit code: 1
      ╰─> [24 lines of output]
          /Users/home/.virtualenvs/datahub/lib/python3.8/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
            warnings.warn(
          running install
          /Users/home/.virtualenvs/datahub/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
            warnings.warn(
          running build
          running build_py
          creating build
          creating build/lib.macosx-10.14-x86_64-cpython-38
          creating build/lib.macosx-10.14-x86_64-cpython-38/accumulation_tree
          copying accumulation_tree/treeslice.py -> build/lib.macosx-10.14-x86_64-cpython-38/accumulation_tree
          copying accumulation_tree/__init__.py -> build/lib.macosx-10.14-x86_64-cpython-38/accumulation_tree
          copying accumulation_tree/abctree.py -> build/lib.macosx-10.14-x86_64-cpython-38/accumulation_tree
          running build_ext
          skipping 'accumulation_tree/accumulation_tree.c' Cython extension (up-to-date)
          building 'accumulation_tree.accumulation_tree' extension
          creating build/temp.macosx-10.14-x86_64-cpython-38
          creating build/temp.macosx-10.14-x86_64-cpython-38/accumulation_tree
          clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -iwithsysroot/System/Library/Frameworks/System.framework/PrivateHeaders -iwithsysroot/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -Werror=implicit-function-declaration -arch x86_64 -I/Users/home/.virtualenvs/datahub/include -I/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/Headers -c accumulation_tree/accumulation_tree.c -o build/temp.macosx-10.14-x86_64-cpython-38/accumulation_tree/accumulation_tree.o
          accumulation_tree/accumulation_tree.c:6:10: fatal error: 'Python.h' file not found
          #include "Python.h"
                   ^~~~~~~~~~
          1 error generated.
          error: command '/usr/bin/clang' failed with exit code 1
          [end of output]
    
      note: This error originates from a subprocess, and is likely not a problem with pip.
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> accumulation-tree
    
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for output from the failure.
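    For what it's worth, the decisive line in that output is the clang failure: fatal error: 'Python.h' file not found, which means the C extension build can't see the interpreter's development headers (on macOS this is often resolved by installing the Xcode command line tools via xcode-select --install, or by using a Python build that ships its headers). A small check of where the headers are expected, as a sketch:

    import os
    import sysconfig

    # Locate the include directory where Python.h should live; if the file
    # is missing, this interpreter has no development headers to build
    # C extensions such as accumulation-tree against.
    include_dir = sysconfig.get_paths()["include"]
    print(include_dir)
    print(os.path.exists(os.path.join(include_dir, "Python.h")))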
  • lemon-cat-72045
    08/25/2022, 7:59 AM
    Hi, I'm following the instructions in the DataHub Developer's Guide. But when running the ./gradlew build command, it fails on metadata-io:test with
    java.lang.IllegalStateException: Could not find a valid Docker environment. Please see logs and check configuration
    ENV:
    OS: macos
    arch: aarch64(M1)
    java: 
    openjdk version "1.8.0_345"
    OpenJDK Runtime Environment (Zulu 8.64.0.19-CA-macos-aarch64) (build 1.8.0_345-b01)
    OpenJDK 64-Bit Server VM (Zulu 8.64.0.19-CA-macos-aarch64) (build 25.345-b01, mixed mode)
    docker:
    Docker version 20.10.17, build 100c701
    The full log looks like:
    > Task :metadata-io:test
    
    Gradle Test Executor 35 STANDARD_ERROR
        SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
        SLF4J: Defaulting to no-operation (NOP) logger implementation
        SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    
    Gradle suite > Gradle test > com.linkedin.metadata.graph.dgraph.DgraphGraphServiceTest > setup FAILED
        java.lang.IllegalStateException: Could not find a valid Docker environment. Please see logs and check configuration
            at org.testcontainers.dockerclient.DockerClientProviderStrategy.lambda$getFirstValidStrategy$7(DockerClientProviderStrategy.java:215)
            at java.util.Optional.orElseThrow(Optional.java:290)
            at org.testcontainers.dockerclient.DockerClientProviderStrategy.getFirstValidStrategy(DockerClientProviderStrategy.java:207)
            at org.testcontainers.DockerClientFactory.getOrInitializeStrategy(DockerClientFactory.java:136)
            at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:178)
            at org.testcontainers.LazyDockerClient.getDockerClient(LazyDockerClient.java:14)
            at org.testcontainers.LazyDockerClient.infoCmd(LazyDockerClient.java:12)
            at com.linkedin.metadata.DockerTestUtils.checkContainerEngine(DockerTestUtils.java:10)
            at com.linkedin.metadata.graph.dgraph.DgraphGraphServiceTest.setup(DgraphGraphServiceTest.java:62)
    
    Gradle suite > Gradle test > com.linkedin.metadata.systemmetadata.ElasticSearchSystemMetadataServiceTest > setup FAILED
        org.testcontainers.containers.ContainerFetchException: Can't get Docker image: RemoteDockerImage(imageName=docker.elastic.co/elasticsearch/elasticsearch:7.9.3, imagePullPolicy=DefaultPullPolicy())
            at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1286)
            at org.testcontainers.containers.GenericContainer.logger(GenericContainer.java:615)
            at org.testcontainers.elasticsearch.ElasticsearchContainer.<init>(ElasticsearchContainer.java:73)
            at com.linkedin.metadata.ElasticTestUtils.getNewElasticsearchContainer(ElasticTestUtils.java:31)
            at com.linkedin.metadata.systemmetadata.ElasticSearchSystemMetadataServiceTest.setup(ElasticSearchSystemMetadataServiceTest.java:37)
    
            Caused by:
            java.lang.IllegalStateException: Previous attempts to find a Docker environment failed. Will not retry. Please see logs and check configuration
                at org.testcontainers.dockerclient.DockerClientProviderStrategy.getFirstValidStrategy(DockerClientProviderStrategy.java:109)
                at org.testcontainers.DockerClientFactory.getOrInitializeStrategy(DockerClientFactory.java:136)
                at org.testcontainers.DockerClientFactory.client(DockerClientFactory.java:178)
                at org.testcontainers.LazyDockerClient.getDockerClient(LazyDockerClient.java:14)
            Caused by:
    
                at org.testcontainers.LazyDockerClient.listImagesCmd(LazyDockerClient.java:12)
                at org.testcontainers.images.LocalImagesCache.maybeInitCache(LocalImagesCache.java:68)
                at org.testcontainers.images.LocalImagesCache.get(LocalImagesCache.java:32)
                at org.testcontainers.images.AbstractImagePullPolicy.shouldPull(AbstractImagePullPolicy.java:18)
                at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:66)
                at org.testcontainers.images.RemoteDockerImage.resolve(RemoteDockerImage.java:27)
                at org.testcontainers.utility.LazyFuture.getResolvedValue(LazyFuture.java:17)
                at org.testcontainers.utility.LazyFuture.get(LazyFuture.java:39)
                at org.testcontainers.containers.GenericContainer.getDockerImageName(GenericContainer.java:1284)
                ... 4 more
    After some googling, I think this is because testcontainers is not compatible with the M1 chip, but I could not find a solution. Could someone help me with this issue? Much appreciated!
  • happy-twilight-44865
    08/25/2022, 12:58 PM
    Can I use the sqlalchemy source type to connect to Dremio? I tried to connect but I am getting the error: can't load plugin: sqlalchemy.dialects:pyodbc
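    That error means SQLAlchemy can't find a dialect for the connection URI, so a Dremio-capable dialect (for example the third-party sqlalchemy_dremio package) has to be installed alongside the sqlalchemy ingestion plugin. A sketch of running the generic source programmatically; the "dremio+flight" URI scheme comes from sqlalchemy_dremio and is an assumption here, so check the driver you actually install:

    from datahub.ingestion.run.pipeline import Pipeline

    # Sketch only: host, credentials, and the connect_uri dialect prefix
    # are hypothetical and depend on the Dremio SQLAlchemy driver in use.
    pipeline = Pipeline.create(
        {
            "source": {
                "type": "sqlalchemy",
                "config": {
                    "platform": "dremio",
                    "connect_uri": "dremio+flight://user:password@dremio-host:32010/dremio",
                },
            },
            "sink": {
                "type": "datahub-rest",
                "config": {"server": "http://localhost:8080"},
            },
        }
    )
    pipeline.run()
    pipeline.raise_from_status()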
  • silly-finland-62382
    08/25/2022, 5:45 PM
    Hey, can we run the Spark lineage DataHub integration on Databricks?
  • thousands-solstice-2498
    08/26/2022, 10:38 AM
    Please advise.
    kube: client.go:299 [debug] Starting delete for "sg-rcube-datahub-elasticsearch-setup-job" Job
    kube: client.go:128 [debug] creating 1 resource(s)
    kube: client.go:528 [debug] Watching for changes to Job sg-rcube-datahub-elasticsearch-setup-job with timeout of 1h23m20s
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: ADDED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 0, jobs failed: 0, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 0, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 1, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 2, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 3, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 4, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 5, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: client.go:595 [debug] sg-rcube-datahub-elasticsearch-setup-job: Jobs active: 1, jobs failed: 6, jobs succeeded: 0
    kube: client.go:556 [debug] Add/Modify event for sg-rcube-datahub-elasticsearch-setup-job: MODIFIED
    kube: Error: failed pre-install: job failed: BackoffLimitExceeded
    kube: helm.go:88 [debug] failed pre-install: job failed: BackoffLimitExceeded
  • steep-sandwich-72508
    08/26/2022, 12:05 PM
    Hi there, I'm consuming the GraphQL API and facing a refresh problem. I updated the "About" section of a dashboard using the DataHub interface, but when I call the GraphQL API I'm still getting the old value. Is there any cache configuration or something I should do to refresh this data? I'm using this query to get the data:
    query GetDashboards($textoBusca: String!) {
      search(input: { count: 1000, type: DASHBOARD, query: $textoBusca }) {
        searchResults {
          entity {
            urn
            type
            ... on Dashboard {
              tool
              info {
                name
                description
                externalUrl
              }
              status {
                removed
              }
              platform {
                name
              }
              domain {
                properties {
                  name
                }
              }
            }
          }
        }
      }
    }
    and the info -> description is the field holding the "about" info.
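    Two things worth checking here: search results are served from the Elasticsearch index, which is updated asynchronously, and edits made in the UI are typically stored on the editableProperties aspect rather than in info, so the query above may simply be reading the ingested value. A sketch that reads the entity directly by URN and fetches both description fields (hypothetical endpoint, token, and URN; whether Dashboard exposes editableProperties depends on your DataHub version):

    import requests

    # Hypothetical endpoint, token, and URN; adjust to your deployment.
    url = "http://localhost:9002/api/graphql"
    token = "<personal-access-token>"
    dashboard_urn = "urn:li:dashboard:(looker,dashboards.1)"

    query = """
    query GetDashboard($urn: String!) {
      dashboard(urn: $urn) {
        info { name description }
        editableProperties { description }
      }
    }
    """
    resp = requests.post(
        url,
        json={"query": query, "variables": {"urn": dashboard_urn}},
        headers={"Authorization": f"Bearer {token}"},
    )
    # Compare the two description fields to see which one holds the UI edit.
    print(resp.json())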
  • best-market-29539
    08/26/2022, 12:14 PM
    Hey, does datahub support argo workflows integration? Or are there any plans to support that?
  • lemon-engine-23512
    08/27/2022, 5:45 PM
    Hello team, how do I install the DataHub SDK in my project for creating a custom source or emitter?
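    The Python SDK ships as the acryl-datahub package on PyPI (python -m pip install acryl-datahub, as used earlier in this thread); emitters live under datahub.emitter, and custom sources build on datahub.ingestion. A minimal REST emitter sketch, assuming a GMS at localhost:8080 and a hypothetical dataset URN:

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        DatasetPropertiesClass,
    )

    # Hypothetical dataset URN for illustration.
    dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:hive,db.my_table,PROD)"

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    emitter.emit(
        MetadataChangeProposalWrapper(
            entityType="dataset",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=dataset_urn,
            aspectName="datasetProperties",
            aspect=DatasetPropertiesClass(description="Emitted from the Python SDK"),
        )
    )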
  • bulky-jordan-44775
    08/28/2022, 7:41 AM
    Hi all, I'm looking for a sample dataset I can load into my datahub installation to showcase it in my company. Can someone help me with that? Thanks!
  • breezy-shoe-41523
    08/28/2022, 2:36 PM
    Hello team, I have a question about cp-schema-registry. I'm deploying DataHub on our own cluster and we don't use cp-schema-registry, so my question is: is cp-schema-registry a required component? What does cp-schema-registry do in DataHub?
  • bumpy-activity-74405
    08/29/2022, 6:42 AM
    Hey, I want to get all datasets that have a certain tag via the API. I have never worked with GraphQL previously; can someone please set me on the right track? I did not find anything of the sort while reading through the documentation.
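    Tag membership is exposed as a search facet, so one approach is the same GraphQL search query shown earlier in this thread plus a filter on the tags field. A sketch with a hypothetical endpoint, token, and tag URN; the filter field name and the value/values shape vary across DataHub versions, so verify against your server's GraphQL schema (e.g. in GraphiQL):

    import requests

    # Hypothetical endpoint and token; the "tags" filter field is the facet
    # used by the UI and is an assumption here.
    url = "http://localhost:9002/api/graphql"
    token = "<personal-access-token>"

    query = """
    query {
      search(
        input: {
          type: DATASET
          query: "*"
          count: 100
          filters: [{ field: "tags", value: "urn:li:tag:Legacy" }]
        }
      ) {
        searchResults { entity { urn } }
      }
    }
    """
    resp = requests.post(url, json={"query": query},
                         headers={"Authorization": f"Bearer {token}"})
    print(resp.json())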
  • breezy-shoe-41523
    08/29/2022, 6:46 AM
    Hello team, I have another question. I need to open an ACL between my other MySQL instance and DataHub, so my question is: which component of DataHub makes queries to MySQL? I am deploying DataHub with k8s. I think it must be GMS, but an accurate answer is required.
  • silly-finland-62382
    08/29/2022, 12:48 PM
    https://datahubspace.slack.com/archives/CV2KB471C/p1661449550636869 Regarding this, can you please help with how to handle Databricks Spark lineage?
  • busy-computer-98970
    08/29/2022, 5:24 PM
    Hello everyone! Any tips for deploying DataHub on AWS with an Azure DevOps pipeline?
  • late-rocket-94535
    08/30/2022, 5:29 AM
    Hi team! How can I filter data in GraphQL by null? I tried a construction like
    {field: "typeNames", value: {ne: null} }}
    but it's not working. And I can't understand how to use the other operators, like regex or in. Please give an example.
  • enough-monitor-24292
    08/30/2022, 9:44 AM
    Hi, Can we upgrade datahub from 0.8.38 to 0.8.43 without any data loss or failure? Thanks
  • cool-kitchen-48091
    08/30/2022, 2:58 PM
    Does DataHub support LDAP bind? I’ve been browsing through the frontend code but couldn’t find it🤔