# troubleshoot
  • f

    fresh-napkin-5247

    04/21/2022, 1:07 PM
    Hello all 🙂. We are currently evaluating Datahub at my company, however I am having an error that I am not quite sure how to solve. The error happens when using the glue connector to write to a file sink:
    Copy code
    UnboundLocalError: local variable 'node_urn' referenced before assignment
    I am running the command
    datahub ingest -c glue.yml
    , and it runs and writes a lot of datasets to the sink file, but then this error appears and the process exits. Anyone had a similar issue? The recipe file is just a regular recipe file like in the demo on the website. I also had another error, where an exception would occur because Datahub was trying to read the 'StorageDescriptor' from a dictionary without this key (I assume this is from the boto3 API). I solved this error by ignoring some tables, however it's weird to me that datahub does not handle this exception and just stops altogether. Thank you!
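    For reference, a minimal sketch of a Glue-to-file recipe of the kind described above; the region, filename, and the commented-out deny pattern are placeholders rather than values from the original report.
    source:
      type: glue
      config:
        aws_region: "eu-west-1"
        # table_pattern:
        #   deny:
        #     - "some_db.some_broken_table"   # tables can be skipped while a source-side issue is debugged
    sink:
      type: file
      config:
        filename: "./glue_output.json"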
  • f

    full-dentist-68591

    04/21/2022, 2:45 PM
    Hi all, is there a way to set a domain for a dataset via Python MCPW? I couldn't find anything in the examples :)
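    A minimal sketch of doing this with MetadataChangeProposalWrapper, assuming the domain already exists; the GMS URL, platform, and names below are placeholders.
    from datahub.emitter.mce_builder import make_dataset_urn, make_domain_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DomainsClass

    # Emit a "domains" aspect that points the dataset at the domain urn.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    dataset_urn = make_dataset_urn(platform="snowflake", name="db.schema.table", env="PROD")
    domain_urn = make_domain_urn("marketing")  # -> urn:li:domain:marketing

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="domains",
        aspect=DomainsClass(domains=[domain_urn]),
    )
    emitter.emit_mcp(mcp)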
  • q

    quick-student-61408

    04/21/2022, 3:01 PM
    Hi everyone, today I connected my OpenLDAP server successfully, but I have a problem: every LDAP user is automatically dropped:
    Copy code
    'dropped_dns': ['cn=charlie,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=anne,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=antoine,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=charlieC,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=charlie C,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=charlie charlie,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=anneD,ou=datahubaccounts,dc=datahub,dc=com']}
    When I set the
    drop_missing_first_last_name
    option to false, I get an error (see attached). Can you help me? Thank you
    output ldap datahub.txt
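    For reference, a sketch of where that option sits in an LDAP recipe; the server, credentials, and base_dn below are placeholders, and the exact option set depends on the CLI version in use.
    source:
      type: ldap
      config:
        ldap_server: "ldap://localhost"
        ldap_user: "cn=admin,dc=datahub,dc=com"
        ldap_password: "admin"
        base_dn: "ou=datahubaccounts,dc=datahub,dc=com"
        # Keep users even when givenName/sn are missing in LDAP (a common reason for dropped_dns).
        drop_missing_first_last_name: false
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"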
  • a

    acoustic-quill-54426

    04/22/2022, 8:55 AM
    Ingesting from
    bigquery
    and
    bigquery-usage
    has been failing for us since yesterday due to 500 errors at
    logging.googleapis.com/v2/entries:list
    . Although Google claims the incident is resolved, I can reproduce the error from the Google Cloud console 😅
  • s

    square-solstice-69079

    04/22/2022, 9:53 AM
    Is it possible to delete a domain? I can't find anything in the UI, and datahub delete --urn isn't working.
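    A hedged sketch of the CLI syntax to try; the urn is a placeholder, and whether domain entities can be deleted this way depends on the DataHub/CLI version in use.
    # Soft-deletes by default; --hard also removes the entity from the search index.
    datahub delete --urn "urn:li:domain:<domain-id>" --hard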
  • b

    better-spoon-77762

    04/22/2022, 5:42 PM
    Hello, I am trying to use AWS MSK (Kafka) as a replacement for Kafka in my DataHub deployment. In this case, do I need to run a separate schema registry, or is AWS Glue Schema Registry supported?
  • s

    square-solstice-69079

    04/23/2022, 7:25 AM
    Is it possible to change the database name shown in DataHub, maybe with a custom transform? (For Redshift.) The problem we have is that we just used the default database name "dev", but now that we want to expose the data in DataHub, this can be confusing for end users. Unfortunately, the dev database is also protected from simply being renamed. For Oracle there is no database concept, it just shows the schemas, and that would be nice in our case for Redshift.
  • m

    many-pillow-9544

    04/24/2022, 7:52 AM
    Hi! I've been able to successfully deploy DataHub on my local network. However, as can be seen in the photo, when I try to ingest data from the UI Ingestion tab, I am facing some problems. Here is one of them: as I choose a source (an Oracle DB, for instance), at the "Configure Oracle Recipe" section the box below is stuck on "Loading" and I cannot progress. Any idea how I can fix it? Where should I begin troubleshooting?
  • m

    modern-zoo-97059

    04/25/2022, 2:39 AM
    Copy code
    play.api.UnexpectedException: Unexpected exception[CompletionException: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080]
            at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:247)
            at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:176)
            at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:363)
            at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:361)
            at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
            at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
            at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
            at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:92)
            at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
            at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
            at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
            at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
            at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080
            at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
            at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
            at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
            at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
            at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
            at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:21)
            at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:18)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
            at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
            at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
            at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
            at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
            at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
            at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
            at scala.concurrent.Promise$class.complete(Promise.scala:55)
            at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
            at scala.concurrent.Promise$class.failure(Promise.scala:104)
            at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:157)
            at play.libs.ws.ahc.StandaloneAhcWSClient$ResponseAsyncCompletionHandler.onThrowable(StandaloneAhcWSClient.java:227)
            at play.shaded.ahc.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:278)
            at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:181)
            at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)
            at play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)
            at play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
            at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
            at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
            at play.shaded.ahc.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
            at play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080
            at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)
            ... 17 common frames omitted
    Caused by: play.shaded.ahc.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: datahub-gms/172.18.0.5:8080
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
            at play.shaded.ahc.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
            at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
            ... 7 common frames omitted
    Caused by: java.net.ConnectException: Connection refused
            ... 11 common frames omitted
    Hi! I used the ingestion UI and it failed with a 500 exception. Then I refreshed the page and I'm facing this problem.
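    The stack trace says the frontend cannot reach GMS at datahub-gms:8080, so a few hedged checks (container names assume the docker quickstart):
    docker ps | grep datahub-gms            # is the GMS container up, or restarting?
    docker logs datahub-gms --tail 100      # look for a crash or a dependency it is waiting on
    curl -sS http://localhost:8080/health   # GMS health endpoint once the container is up
    datahub docker check                    # CLI sanity check of the quickstart containers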
  • f

    full-dentist-68591

    04/25/2022, 7:14 AM
    Hi all, I am looking for a way to find domain urns by their name. Using
    DataHubGraph
    doesn't seem suitable because it requires
    entity_urn
    . Any recommendations here?
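    A hedged sketch of one approach, assuming domain entities are searchable in your version: search via GraphQL for type DOMAIN and read the urn off the results (the query string is a placeholder).
    {
      search(input: { type: DOMAIN, query: "marketing", start: 0, count: 10 }) {
        searchResults {
          entity {
            urn
          }
        }
      }
    }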
  • i

    important-wire-73

    04/25/2022, 7:22 AM
    Hi, I did some BigQuery ingestion in bulk and then ingested Looker data. The jobs completed, but the GMS logs keep showing bulk requests (ES) being processed. I checked Elasticsearch and there is no data for dashboards, but when I put the URL of a Looker dataset into the DataHub UI it shows all the data.
  • k

    kind-psychiatrist-76973

    04/25/2022, 9:04 AM
    Is there a way to see the DataHub GMS connection pool limits? I see GMS can use only 50 connections; I would like to make sure there is not a limit.
  • s

    stale-jewelry-2440

    04/25/2022, 1:55 PM
    Hi folks, I am running validations on several CSV files via the Great Expectations operator. I set up the DataHubValidationAction, and all seems to work fine, but I don't see the results on the datasets in DataHub. For completeness, I set up the lineage in the tasks as
    outlets={"datasets": [Dataset("file", "AppleSchoolManager.courses_csv")]},
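    One thing worth checking (an assumption, not a confirmed fix): the outlet's platform, name, and env must build exactly the urn of the dataset that already exists in DataHub, otherwise the results attach to a different entity. A quick way to see the urn implied by that outlet:
    from datahub.emitter.mce_builder import make_dataset_urn

    # Compare this with the urn of the dataset you see in the DataHub UI.
    print(make_dataset_urn(platform="file", name="AppleSchoolManager.courses_csv", env="PROD"))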
  • c

    clever-air-4600

    04/25/2022, 6:12 PM
    Hi guys, I'm trying to send a request to GraphQL to search for all the datasets with a specific tag: { search( input: {start: 0, count: 10, query: "*", type: DATASET, filters: {field: "tags", value: "facundo_prueba"} } ) { searchResults { entity { urn type } matchedFields { name value } } } } Even though I have two datasets with that tag, the response is: {'data': {'search': {'searchResults': []}}}. I tried to query the backend directly with: { "input": "tags:facundo_prueba", "entity": "dataset", "start": 0, "count": 10 } and it works. Is the GraphQL query incorrect? I'm trying to use GraphQL instead of querying the backend directly.
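    A hedged variant of the query above: the search filter generally expects the full tag urn rather than the bare tag name (the column-tag question later in this channel filters on urn:li:tag:Phone and reports that it works for table tags), so this may be all that is missing.
    {
      search(
        input: {
          start: 0
          count: 10
          query: "*"
          type: DATASET
          filters: { field: "tags", value: "urn:li:tag:facundo_prueba" }
        }
      ) {
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }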
  • m

    microscopic-mechanic-13766

    04/26/2022, 8:17 AM
    Good morning, I am using v0.8.33 and ES 7.16.1. As I have ingested some datasets from different sources, I wanted to see what the "Analytics" tab would show. The problem is that I keep getting the following error:
  • j

    jolly-traffic-67085

    04/26/2022, 9:40 AM
    Hi all, I want to deploy one more datahub-frontend instance that is separate from, and not tied to the database of, the old datahub-frontend. Is that possible? I use Kubernetes.
  • a

    ambitious-cartoon-15344

    04/27/2022, 7:08 AM
    Hi, I enabled Metadata Service Authentication: https://datahubproject.io/docs/introducing-metadata-service-authentication/#if-i-enable-metadata-service-authentication-will-ingestion-stop-working . I have a question: I am setting up Airflow to use DataHub as the lineage backend, and I'm wondering whether it is necessary to set a token, but I didn't see a token being used in the DatahubLineageBackend code.
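    For reference, a hedged sketch of where the token usually goes: the lineage backend talks to GMS through an Airflow connection, and with Metadata Service Authentication enabled the personal access token is supplied as that connection's password (the host and connection id below are the conventional defaults, adjust as needed).
    airflow connections add 'datahub_rest_default' \
        --conn-type 'datahub_rest' \
        --conn-host 'http://datahub-gms:8080' \
        --conn-password '<personal-access-token>'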
  • k

    kind-psychiatrist-76973

    04/27/2022, 10:29 AM
    I can see, from the logs, many errors like this one:
    Copy code
    10:28:53.812 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter - POST /usageStats?action=queryRange - queryRange - 200 - 356ms
    10:29:17.224 [qtp544724190-11718] INFO  c.l.m.r.entity.EntityResource - LIST URNS for dataHubPolicy with start 0 and count 30
    10:29:27.224 [pool-17-thread-1] ERROR c.d.m.a.AuthorizationManager - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
    com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/entities>
    	at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
    	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
    	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
    	at com.linkedin.common.client.BaseClient.sendClientRequest(BaseClient.java:28)
    	at com.linkedin.entity.client.RestliEntityClient.listUrns(RestliEntityClient.java:390)
    	at com.datahub.metadata.authorization.AuthorizationManager$PolicyRefreshRunnable.run(AuthorizationManager.java:186)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/entities>
    	at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:67)
    	at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
    	... 3 common frames omitted
    Caused by: java.util.concurrent.TimeoutException: Exceeded request timeout of 10000ms
    	at com.linkedin.r2.transport.http.client.TimeoutTransportCallback$1.run(TimeoutTransportCallback.java:69)
    	at com.linkedin.r2.util.Timeout.lambda$new$0(Timeout.java:77)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    	... 3 common frames omitted
    10:31:17.225 [qtp544724190-7234] INFO  c.l.m.r.enti
    Does this affect the UI or any other functionality of DataHub?
  • b

    brainy-vegetable-68946

    04/27/2022, 5:01 PM
    Hi guys, can anyone please help me with this error? Creds: username: datahub, password: datahub
  • b

    better-orange-49102

    04/28/2022, 10:30 AM
    Attempting to run a Gradle build in a no-internet environment, I saw that metadata-integration:java:datahub-protobuf uses JDK 11 to build via Gradle toolchains. Since I have no internet, I installed both JDK 8 and JDK 11 on the machine and now I'm getting this error message:
    Copy code
    Task :datahub-graphql-core:compileJava 
    /datahub/datahub-graphql-core/src/mainGeneratedGraphQL/java/com/linkedin/datahub/graphql/generated/VisualConfiguration.java:7: error: cannot find symbol @javax.annotation.processing.Generated(
    symbol: class Generated
    location: package javax.annotation.processing
    
    <followed by all the other files in the same folder giving the same annotation error msg>
    which I think is due to the presence of JDK 11. Any suggestions for overcoming this? The command I used to build was:
    Copy code
    ./gradlew build -x :metadata-ingestion:build -x :metadata-ingestion:check -x docs-website:build -x datahub-web-react:yarnBuild -x datahub-frontend:unzipAssets
    ./gradlew build -x :metadata-ingestion:build -x :metadata-ingestion:check -x docs-website:build -x :metadata-integration:java:spark-lineage:test
  • b

    breezy-portugal-43538

    04/28/2022, 1:30 PM
    Hello, I have a question regarding property updates. I recently ingested datasets into DataHub using S3 as the origin, and I can see that my datasets were uploaded to DataHub correctly. Now I would like to update an urn by adding some custom properties to it. Unfortunately, the curl command gives me an error. I think I did everything correctly, yet the error with
    message:"No root resource defined for path '/datasets'","status":404}
    appears. Is it possible to update properties on datasets ingested from S3, and if yes, how? My curl command:
    curl --location --request POST 'http://localhost:8080/datasets?action=ingest' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "snapshot": {
    "aspects": [
    {
    "com.linkedin.dataset.DatasetProperties":  {
    "customProperties": {
    "SuperProperty": "over 9000"
    }
    }
    }
    ],
    "urn": "urn:li:dataset:(urn:li:dataset:(urn:li:dataPlatform:s3,origin_file_src%2Fdata%2Ftest%2Fother_timeZ%2Ftime%2other_folder%2Fsome_folder%2Fexample.csv,DEV)
    }
    }'
    The issue might be that my urn is incorrect - I copied it from the webpage URL. I tried to find the correct urn at http://localhost:9200/datasetindex_v2/_search?=pretty, but for some reason dataPlatform:s3 is not visible there. Do you know how I can get my S3 urn name, to be sure that I set it up correctly? Thanks in advance for the help! EDIT: changing the urn to use . instead of %2F did not help.
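    A hedged sketch of the same update against the /entities resource (the REST.li ingest action lives there rather than under /datasets); the urn below is illustrative only, and the real urn is easier to copy from the dataset page or the GraphQL API than to decode from the browser URL.
    curl 'http://localhost:8080/entities?action=ingest' \
      -X POST \
      --header 'X-RestLi-Protocol-Version: 2.0.0' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "entity": {
          "value": {
            "com.linkedin.metadata.snapshot.DatasetSnapshot": {
              "urn": "urn:li:dataset:(urn:li:dataPlatform:s3,some_folder/example.csv,DEV)",
              "aspects": [
                {
                  "com.linkedin.dataset.DatasetProperties": {
                    "customProperties": { "SuperProperty": "over 9000" }
                  }
                }
              ]
            }
          }
        }
      }'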
  • l

    limited-agent-54038

    04/29/2022, 3:10 AM
    Trying to test out an S3 data lake with a local Docker deployment, and I am getting the error:
    '[2022-04-29 02:44:40,288] ERROR    {logger:26} - Please set env variable SPARK_VERSION\n'
    I am just having trouble figuring out where this env variable is or how to change it. Thanks
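    For reference, a hedged sketch of setting the variable before the ingest run: the data-lake profiler goes through PySpark/pydeequ, which reads the Spark version from this environment variable (the value and recipe file name below are assumptions; the value should match the pyspark version pinned by the plugin).
    export SPARK_VERSION=3.0
    datahub ingest -c data_lake.yml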
  • s

    square-solstice-69079

    04/29/2022, 11:16 AM
    I guess a bulk metadata editor is something that is coming to the UI at some point; until that happens, what is the best way to add owners, tags, and domains to datasets? Taking an export from a search and then adding the metadata to the .csv is something that would work well for us. Has someone maybe already done that and got a script to "ingest" this metadata, based on the format of the default .csv, using curl or GraphQL?
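    In the meantime, a hedged sketch of a small script for this with the Python emitter; the CSV layout (dataset_urn, owner, tag columns) and file name are assumptions rather than the UI export format, and note that emitting these aspects replaces any existing owners/tags on the dataset.
    import csv

    from datahub.emitter.mce_builder import make_tag_urn, make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        GlobalTagsClass,
        OwnerClass,
        OwnershipClass,
        OwnershipTypeClass,
        TagAssociationClass,
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    with open("metadata.csv") as f:  # assumed columns: dataset_urn, owner, tag
        for row in csv.DictReader(f):
            tags = GlobalTagsClass(tags=[TagAssociationClass(tag=make_tag_urn(row["tag"]))])
            owners = OwnershipClass(
                owners=[OwnerClass(owner=make_user_urn(row["owner"]),
                                   type=OwnershipTypeClass.DATAOWNER)]
            )
            # One proposal per aspect; UPSERT overwrites the whole aspect on the dataset.
            for aspect_name, aspect in [("globalTags", tags), ("ownership", owners)]:
                emitter.emit_mcp(
                    MetadataChangeProposalWrapper(
                        entityType="dataset",
                        changeType=ChangeTypeClass.UPSERT,
                        entityUrn=row["dataset_urn"],
                        aspectName=aspect_name,
                        aspect=aspect,
                    )
                )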
  • k

    kind-psychiatrist-76973

    04/29/2022, 12:49 PM
    I have all containers on the v0.8.33 tag, but “linkedin/datahub-frontend-react” was
    v0.8.17
    and I updated it to
    v0.8.33
    . After the deployment the UI crashed, and this is the error I have from the logs:
    Copy code
    ! @7nf015ap6 - Internal server error, for (GET) [/callback/oidc?state=LqmnUiAvYgUGt98yM69UMRPG24DNJMAazoGGCH66Fkw&code=4/0AX4XfWg4uU9YpUKuVYjja_NgSZ0r7n4HTGM_Gpg87fxx4ODyQDVde1tIC0jPB7nEzaVjSw&scope=email%20profile%20<https://www.googleapis.com/auth/userinfo.profile%20openid%20https://www.googleapis.com/auth/userinfo.email&authuser=1&hd=sennder.com&prompt=none>] ->
     
    play.api.UnexpectedException: Unexpected exception[CompletionException: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_grant]
  • m

    mammoth-fall-12031

    05/02/2022, 8:10 AM
    I have been trying to set up the dev environment for DataHub locally and am getting stuck at the particular error below when running
    ./gradlew build
    Copy code
    * What went wrong:
    Execution failed for task ':metadata-service:restli-servlet-impl:generateRestModel'.
    > Process 'command '/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java'' finished with non-zero exit value 1
    Have tried doing
    ./gradlew clean
    and ran
    Copy code
    ./gradlew :metadata-service:restli-servlet-impl:build -Prest.model.compatibility=ignore
    but I am still getting the same error. System config: macOS Monterey 12.1, Java version:
    Copy code
    java version "1.8.0_331"
    Java(TM) SE Runtime Environment (build 1.8.0_331-b09)
    Java HotSpot(TM) 64-Bit Server VM (build 25.331-b09, mixed mode)
    Any ways to resolve this?
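    One hedged thing to try: the failing task is being run with the java binary under /Library/Internet Plug-Ins/JavaAppletPlugin.plugin, which is a legacy JRE rather than a full JDK. Pointing JAVA_HOME at an installed JDK 8 before building may help (the java_home call assumes a JDK 8 is installed on the Mac).
    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
    ./gradlew clean
    ./gradlew build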
  • f

    fresh-napkin-5247

    05/02/2022, 8:57 AM
    Hello all. I am trying to connect DataHub to Redshift using IAM auth. Basically this means that I am not going to supply a password for the user, but rather set up an endpoint using aws-vault. However, so far I have not been successful. Does anyone have a similar setup and could help me?
  • k

    kind-psychiatrist-76973

    05/03/2022, 3:57 PM
    I have this job definition
    Copy code
    # Snowflake to Datahub recipe configuration
    # To run an ingestion run: datahub ingest -c ./metadata-ingestion/recipes/snowflake_to_datahub_rest.yml
    # pipeline_name: "my_snowflake_pipeline_1"
    source:
      type: snowflake
      config:
        # Coordinates
        host_port: ${SNOWFLAKE_ACCOUNT}
        warehouse: 'AGGREGATION_COMPUTE'
    
        # Credentials
        username: ${SNOWFLAKE_USERNAME}
        password: ${SNOWFLAKE_PASSWORD}
        role: 'XADMIN'
    
        env: "PROD"
    
        profiling:
          enabled: False
    
        database_pattern:
          allow:
            - "DWXX"
            - "VISIBILITY"
            - "STRATEGY_AND_PLANNING"
            - "ABC_SHIPPER_STRATEGY_AND_PLANNING"
            - "XYZ"
            - "MARKETING"
            - "GLOBAL_OPERATIONS"
            - "CENTRAL_STRATEGY_AND_PLANNING"
            - "FINANCE"
          deny:
            - "DEV"
            - "ANALYST_DEV"
    
        table_pattern:
          ignoreCase: False
    
        include_tables: True
        include_views: True
        include_table_lineage: False
    
    
        stateful_ingestion:
          enabled: True
          remove_stale_metadata: True
    
    
    
    sink:
      type: "datahub-rest"
      config:
        server: ${DATAHUB_GMS_HOST}:8080
    I get this validation error:
    Copy code
    1 validation error for SnowflakeConfig
    stateful_ingestion
      extra fields not permitted (type=value_error.extra)
    which is really vague; I don't have any idea what I am doing wrong
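    Two hedged things to check: stateful ingestion generally needs a stable pipeline_name at the top level of the recipe (it is commented out above), and the installed acryl-datahub version has to be one where SnowflakeConfig actually knows the stateful_ingestion block; on a CLI that does not, pydantic rejects it as an extra field. A trimmed sketch:
    pipeline_name: "my_snowflake_pipeline_1"
    source:
      type: snowflake
      config:
        host_port: ${SNOWFLAKE_ACCOUNT}
        username: ${SNOWFLAKE_USERNAME}
        password: ${SNOWFLAKE_PASSWORD}
        stateful_ingestion:
          enabled: true
          remove_stale_metadata: true
    sink:
      type: "datahub-rest"
      config:
        server: ${DATAHUB_GMS_HOST}:8080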
  • c

    clever-air-4600

    05/03/2022, 7:02 PM
    Hi guys, is there a way to fetch datasets from GraphQL that have a specific COLUMN tag? For example:
    Copy code
    {
    search(
                input: {start: 0, count: 10, query: "*", type: DATASET, filters: {field: "tags", value: "urn:li:tag:Phone"} }
            ) {
                searchResults {
                    entity {
                        urn
                        type
                    }
                    matchedFields {
                        name
                        value
                    }
                }
            }
        }
    I'm trying something like this; it works with table tags but not with the column ones.
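    A hedged sketch of what to try, assuming column-level tags are indexed under a separate search field from the dataset-level ones; the field names "fieldTags" (tags ingested with the schema) and "editedFieldTags" (tags added via the UI) are assumptions to verify against your version.
    {
      search(
        input: {
          start: 0
          count: 10
          query: "*"
          type: DATASET
          filters: { field: "fieldTags", value: "urn:li:tag:Phone" }
        }
      ) {
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }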
  • l

    limited-agent-54038

    05/04/2022, 5:16 AM
    Hi All - I have not been able to get any integrations to work, so I am not sure what I am doing wrong. I have the following integration yaml:
    Copy code
    source:
      type: data-lake
      config:
        env: "PROD"
        platform: "local-data-lake"
        base_path: "~/.datahub/data_test2.json"
        profiling:
          enabled: true
    
    sink:
      type: console
    and am getting the following error:
    Copy code
    ---- (full traceback above) ----
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 82, in run
        pipeline = Pipeline.create(pipeline_config, dry_run, preview)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 175, in create
        return cls(config, dry_run=dry_run, preview_mode=preview_mode)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 127, in __init__
        self.source: Source = source_class.create(
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 248, in create
        return cls(config, ctx)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 176, in __init__
        self.init_spark()
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 242, in init_spark
        self.spark = SparkSession.builder.config(conf=conf).getOrCreate()
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/sql/session.py", line 186, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 378, in getOrCreate
        SparkContext(conf=conf or SparkConf())
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 133, in __init__
        SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 327, in _ensure_initialized
        SparkContext._gateway = gateway or launch_gateway(conf)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/java_gateway.py", line 105, in launch_gateway
        raise Exception("Java gateway process exited before sending its port number")
    
    Exception: Java gateway process exited before sending its port number
    [2022-05-03 22:15:55,416] INFO     {datahub.entrypoints:161} - DataHub CLI version: 0.8.30.0 at /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/__init__.py
    [2022-05-03 22:15:55,416] INFO     {datahub.entrypoints:164} - Python version: 3.10.0 (v3.10.0:b494f5935c, Oct  4 2021, 14:59:20) [Clang 12.0.5 (clang-1205.0.22.11)] at /Library/Frameworks/Python.framework/Versions/3.10/bin/python3 on macOS-11.6.5-x86_64-i386-64bit
    [2022-05-03 22:15:55,416] INFO     {datahub.entrypoints:167} - GMS config {}
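    The "Java gateway process exited" error comes from PySpark failing to launch a JVM, so two hedged things to rule out: a missing or incompatible Java runtime, and running a newer Python (the traceback shows 3.10) than the pyspark release pinned by the data-lake plugin supports. A quick sanity check (the recipe file name is a placeholder):
    java -version                     # a JDK 8 or 11 should be found on PATH
    echo "$JAVA_HOME"                 # and JAVA_HOME should point at it
    export SPARK_VERSION=3.0          # same variable as in the earlier data-lake thread
    datahub ingest -c data_lake.yml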
  • a

    astonishing-guitar-79208

    05/04/2022, 9:10 AM
    Hi all. I've been trying to set up
    datahub-frontend
    JaaS authentication with Kerberos. I'm providing a custom
    jaas.conf
    file via k8s configmap, volume mounted in the container at the path specified here - https://datahubproject.io/docs/how/auth/jaas#custom-jaas-configuration. But no matter what
    jaas.conf
    file I provide (even the default one with PropertyFileLoginModule), the app fails to boot up with an error that doesn't help much in debugging the issue. Full error in the thread.