# troubleshoot
  • f

    fresh-napkin-5247

    04/21/2022, 1:07 PM
    Hello all 🙂. We are currently evaluating Datahub at my company, however I am having an error that I am not quite sure how to solve. The error happens when using the glue connector to write to a file sink:
    Copy code
    UnboundLocalError: local variable 'node_urn' referenced before assignment
    I am running the command
    datahub ingest -c glue.yml
    , and it runs and writes a lot of datasets to the sink file, but then this error appears and the process exits. Anyone had a similar issue? The recipe file is just a regular recipe file like in the demo on the website. I also had another error, where an exception would occur because Datahub was trying to read the 'StorageDescriptor' from a dictionary without this key (I assume this is from the boto3 API). I solved this error by ignoring some tables, however it's weird to me that datahub does not handle this exception and just stops altogether. Thank you!
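    For reference, a minimal sketch of a Glue-to-file recipe of the kind described above; the region, filename, and the commented-out deny pattern are placeholders rather than values from the original report.
    source:
      type: glue
      config:
        aws_region: "eu-west-1"
        # table_pattern:
        #   deny:
        #     - "some_db.some_broken_table"   # tables can be skipped while a source-side issue is debugged
    sink:
      type: file
      config:
        filename: "./glue_output.json"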
  • f

    full-dentist-68591

    04/21/2022, 2:45 PM
    Hi all, is there a way to set a domain for a dataset via Python MCPW? I couldn't find anything in the examples :)
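    A minimal sketch of doing this with MetadataChangeProposalWrapper, assuming the domain already exists; the GMS URL, platform, and names below are placeholders.
    from datahub.emitter.mce_builder import make_dataset_urn, make_domain_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DomainsClass

    # Emit a "domains" aspect that points the dataset at the domain urn.
    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")
    dataset_urn = make_dataset_urn(platform="snowflake", name="db.schema.table", env="PROD")
    domain_urn = make_domain_urn("marketing")  # -> urn:li:domain:marketing

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn=dataset_urn,
        aspectName="domains",
        aspect=DomainsClass(domains=[domain_urn]),
    )
    emitter.emit_mcp(mcp)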
  • q

    quick-student-61408

    04/21/2022, 3:01 PM
    Hi everyone, today I connected my OpenLDAP server successfully, but I have a problem: every LDAP user is automatically dropped:
    Copy code
    'dropped_dns': ['cn=charlie,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=anne,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=antoine,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=charlieC,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=charlie C,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=charlie charlie,ou=datahubaccounts,dc=datahub,dc=com',
                    'cn=anneD,ou=datahubaccounts,dc=datahub,dc=com']}
    When I set the
    drop_missing_first_last_name
    option to false, I get an error (see attached). Can you help me? Thank you
    output ldap datahub.txt
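    For reference, a sketch of where that option sits in an LDAP recipe; the server, credentials, and base_dn below are placeholders, and the exact option set depends on the CLI version in use.
    source:
      type: ldap
      config:
        ldap_server: "ldap://localhost"
        ldap_user: "cn=admin,dc=datahub,dc=com"
        ldap_password: "admin"
        base_dn: "ou=datahubaccounts,dc=datahub,dc=com"
        # Keep users even when givenName/sn are missing in LDAP (a common reason for dropped_dns).
        drop_missing_first_last_name: false
    sink:
      type: datahub-rest
      config:
        server: "http://localhost:8080"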
  • a

    acoustic-quill-54426

    04/22/2022, 8:55 AM
    Ingesting from
    bigquery
    and
    bigquery-usage
    has been failing for us since yesterday due to 500 errors at
    logging.googleapis.com/v2/entries:list
    . Although Google claims the incident is resolved, I can reproduce the error from the Google Cloud console 😅
  • s

    square-solstice-69079

    04/22/2022, 9:53 AM
    Is it possible to delete a domain? I can't find anything in the UI, and datahub delete --urn isn't working.
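    A hedged sketch of the CLI syntax to try; the urn is a placeholder, and whether domain entities can be deleted this way depends on the DataHub/CLI version in use.
    # Soft-deletes by default; --hard also removes the entity from the search index.
    datahub delete --urn "urn:li:domain:<domain-id>" --hard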
  • b

    better-spoon-77762

    04/22/2022, 5:42 PM
    Hello, I am trying to use AWS MSK (Kafka) as a replacement for Kafka in my DataHub deployment. In this case, do I need to run a separate schema registry, or is AWS Glue Schema Registry supported?
  • s

    square-solstice-69079

    04/23/2022, 7:25 AM
    Is it possible to change the database name shown in DataHub, maybe with a custom transform? (For Redshift.) The problem we have is that we just used the default database name "dev", but now that we want to expose the data in DataHub, this can be confusing for end users. Unfortunately, the dev database is also protected from simply being renamed. For Oracle there is no database concept, it just shows the schemas, and that would be nice in our case for Redshift.
  • m

    many-pillow-9544

    04/24/2022, 7:52 AM
    Hi! I've been able to successfully deploy DataHub on my local network. However, as can be seen in the photo, when I try to ingest data from the UI Ingestion tab, I am facing some problems. Here is one of them: as I choose a source (an Oracle DB, for instance), at the "Configure Oracle Recipe" section the box below is stuck on "Loading" and I cannot progress. Any idea how I can fix it? Where should I begin troubleshooting?
  • m

    modern-zoo-97059

    04/25/2022, 2:39 AM
    Copy code
    play.api.UnexpectedException: Unexpected exception[CompletionException: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080]
            at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:247)
            at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:176)
            at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:363)
            at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:361)
            at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
            at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
            at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
            at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:92)
            at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
            at akka.dispatch.BatchingExecutor$BlockableBatch$$anonfun$run$1.apply(BatchingExecutor.scala:92)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
            at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:91)
            at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
            at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:49)
            at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
            at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
            at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
            at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
    Caused by: java.util.concurrent.CompletionException: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080
            at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
            at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
            at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
            at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
            at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
            at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
            at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:21)
            at scala.concurrent.java8.FuturesConvertersImpl$CF.apply(FutureConvertersImpl.scala:18)
            at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
            at scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
            at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
            at scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
            at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
            at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
            at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
            at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
            at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
            at scala.concurrent.Promise$class.complete(Promise.scala:55)
            at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
            at scala.concurrent.Promise$class.failure(Promise.scala:104)
            at scala.concurrent.impl.Promise$DefaultPromise.failure(Promise.scala:157)
            at play.libs.ws.ahc.StandaloneAhcWSClient$ResponseAsyncCompletionHandler.onThrowable(StandaloneAhcWSClient.java:227)
            at play.shaded.ahc.org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:278)
            at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:181)
            at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyChannelConnector$1.onFailure(NettyChannelConnector.java:108)
            at play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:28)
            at play.shaded.ahc.org.asynchttpclient.netty.SimpleChannelFutureListener.operationComplete(SimpleChannelFutureListener.java:20)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:511)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:504)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:483)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:424)
            at play.shaded.ahc.io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:121)
            at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:327)
            at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:343)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:632)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:579)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:496)
            at play.shaded.ahc.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:458)
            at play.shaded.ahc.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:897)
            at play.shaded.ahc.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
            at java.lang.Thread.run(Thread.java:748)
    Caused by: java.net.ConnectException: Connection refused: datahub-gms/172.18.0.5:8080
            at play.shaded.ahc.org.asynchttpclient.netty.channel.NettyConnectListener.onFailure(NettyConnectListener.java:179)
            ... 17 common frames omitted
    Caused by: play.shaded.ahc.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: datahub-gms/172.18.0.5:8080
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
            at play.shaded.ahc.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:327)
            at play.shaded.ahc.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:340)
            ... 7 common frames omitted
    Caused by: java.net.ConnectException: Connection refused
            ... 11 common frames omitted
    Hi! I used the ingestion UI and it failed with a 500 exception. Then I refreshed the page and I'm facing this problem.
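    The stack trace says the frontend cannot reach GMS at datahub-gms:8080, so a few hedged checks (container names assume the docker quickstart):
    docker ps | grep datahub-gms            # is the GMS container up, or restarting?
    docker logs datahub-gms --tail 100      # look for a crash or a dependency it is waiting on
    curl -sS http://localhost:8080/health   # GMS health endpoint once the container is up
    datahub docker check                    # CLI sanity check of the quickstart containers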
  • f

    full-dentist-68591

    04/25/2022, 7:14 AM
    Hi all, I am looking for a way to find domain urns by their name. Using
    DataHubGraph
    doesn't seem suitable because it requires
    entity_urn
    . Any recommendations here?
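    A hedged sketch of one approach, assuming domain entities are searchable in your version: search via GraphQL for type DOMAIN and read the urn off the results (the query string is a placeholder).
    {
      search(input: { type: DOMAIN, query: "marketing", start: 0, count: 10 }) {
        searchResults {
          entity {
            urn
          }
        }
      }
    }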
  • i

    important-wire-73

    04/25/2022, 7:22 AM
    Hi, I did some BigQuery ingestion in bulk and then ingested Looker data. The jobs completed, but the GMS logs keep showing bulk requests (ES) being processed. I checked Elasticsearch and there is no data for dashboards, but when I put the URL of a Looker dataset into the DataHub UI it shows all the data.
  • k

    kind-psychiatrist-76973

    04/25/2022, 9:04 AM
    Is there a way to see the DataHub GMS connection pool limits? I see GMS can use only 50 connections; I would like to make sure there is not a limit.
  • s

    stale-jewelry-2440

    04/25/2022, 1:55 PM
    Hi folks, I am running validations on several CSV files via the Great Expectations operator. I set up the DataHubValidationAction, and all seems to work fine, but I don't see the results on the datasets in DataHub. For completeness, I set up the lineage in the tasks as
    outlets={"datasets": [Dataset("file", "AppleSchoolManager.courses_csv")]},
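    One thing worth checking (an assumption, not a confirmed fix): the outlet's platform, name, and env must build exactly the urn of the dataset that already exists in DataHub, otherwise the results attach to a different entity. A quick way to see the urn implied by that outlet:
    from datahub.emitter.mce_builder import make_dataset_urn

    # Compare this with the urn of the dataset you see in the DataHub UI.
    print(make_dataset_urn(platform="file", name="AppleSchoolManager.courses_csv", env="PROD"))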
  • c

    clever-air-4600

    04/25/2022, 6:12 PM
    Hi guys, I'm trying to send a request to GraphQL to search for all the datasets with a specific tag: { search( input: {start: 0, count: 10, query: "*", type: DATASET, filters: {field: "tags", value: "facundo_prueba"} } ) { searchResults { entity { urn type } matchedFields { name value } } } } Even though I have two datasets with that tag, the response is: {'data': {'search': {'searchResults': []}}}. I tried to query the backend directly with: { "input": "tags:facundo_prueba", "entity": "dataset", "start": 0, "count": 10 } and it works. Is the GraphQL query incorrect? I'm trying to use GraphQL instead of querying the backend directly.
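    A hedged variant of the query above: the search filter generally expects the full tag urn rather than the bare tag name (the column-tag question later in this channel filters on urn:li:tag:Phone and reports that it works for table tags), so this may be all that is missing.
    {
      search(
        input: {
          start: 0
          count: 10
          query: "*"
          type: DATASET
          filters: { field: "tags", value: "urn:li:tag:facundo_prueba" }
        }
      ) {
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }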
  • m

    microscopic-mechanic-13766

    04/26/2022, 8:17 AM
    Good morning, I am using v0.8.33 and ES 7.16.1. As I have ingested some datasets from different sources, I wanted to see what the "Analytics" tab would show. The problem is that I keep getting the following error:
  • j

    jolly-traffic-67085

    04/26/2022, 9:40 AM
    Hi all, I want to deploy one more datahub-frontend instance that is separate from, and not tied to the database of, the old datahub-frontend. Is that possible? I use Kubernetes.
  • a

    ambitious-cartoon-15344

    04/27/2022, 7:08 AM
    Hi, I enabled Metadata Service Authentication: https://datahubproject.io/docs/introducing-metadata-service-authentication/#if-i-enable-metadata-service-authentication-will-ingestion-stop-working . I have a question: I am setting up Airflow to use DataHub as the lineage backend, and I'm wondering whether it is necessary to set a token, but I didn't see a token being used in the DatahubLineageBackend code.
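    For reference, a hedged sketch of where the token usually goes: the lineage backend talks to GMS through an Airflow connection, and with Metadata Service Authentication enabled the personal access token is supplied as that connection's password (the host and connection id below are the conventional defaults, adjust as needed).
    airflow connections add 'datahub_rest_default' \
        --conn-type 'datahub_rest' \
        --conn-host 'http://datahub-gms:8080' \
        --conn-password '<personal-access-token>'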
  • k

    kind-psychiatrist-76973

    04/27/2022, 10:29 AM
    I can see, from the logs, many errors like this one:
    Copy code
    10:28:53.812 [pool-9-thread-1] INFO  c.l.m.filter.RestliLoggingFilter - POST /usageStats?action=queryRange - queryRange - 200 - 356ms
    10:29:17.224 [qtp544724190-11718] INFO  c.l.m.r.entity.EntityResource - LIST URNS for dataHubPolicy with start 0 and count 30
    10:29:27.224 [pool-17-thread-1] ERROR c.d.m.a.AuthorizationManager - Failed to retrieve policy urns! Skipping updating policy cache until next refresh. start: 0, count: 30
    com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/entities>
    	at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
    	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponseImpl(ResponseFutureImpl.java:130)
    	at com.linkedin.restli.internal.client.ResponseFutureImpl.getResponse(ResponseFutureImpl.java:94)
    	at com.linkedin.common.client.BaseClient.sendClientRequest(BaseClient.java:28)
    	at com.linkedin.entity.client.RestliEntityClient.listUrns(RestliEntityClient.java:390)
    	at com.datahub.metadata.authorization.AuthorizationManager$PolicyRefreshRunnable.run(AuthorizationManager.java:186)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    Caused by: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <http://localhost:8080/entities>
    	at com.linkedin.r2.transport.http.common.HttpBridge$1.onResponse(HttpBridge.java:67)
    	at com.linkedin.r2.transport.http.client.rest.ExecutionCallback.lambda$onResponse$0(ExecutionCallback.java:64)
    	... 3 common frames omitted
    Caused by: java.util.concurrent.TimeoutException: Exceeded request timeout of 10000ms
    	at com.linkedin.r2.transport.http.client.TimeoutTransportCallback$1.run(TimeoutTransportCallback.java:69)
    	at com.linkedin.r2.util.Timeout.lambda$new$0(Timeout.java:77)
    	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    	... 3 common frames omitted
    10:31:17.225 [qtp544724190-7234] INFO  c.l.m.r.enti
    Does this affect the UI or any other functionality of DataHub?
  • b

    brainy-vegetable-68946

    04/27/2022, 5:01 PM
    Hi guys, can anyone please help me with this error? Creds: username: datahub, password: datahub
  • b

    better-orange-49102

    04/28/2022, 10:30 AM
    Attempting to run a Gradle build in a no-internet environment, I saw that metadata-integration:java:datahub-protobuf uses JDK 11 to build via Gradle toolchains. Since I have no internet, I installed both JDK 8 and JDK 11 on the machine and now I'm getting this error message:
    Copy code
    Task :datahub-graphql-core:compileJava 
    /datahub/datahub-graphql-core/src/mainGeneratedGraphQL/java/com/linkedin/datahub/graphql/generated/VisualConfiguration.java:7: error: cannot find symbol @javax.annotation.processing.Generated(
    symbol: class Generated
    location: package javax.annotation.processing
    
    <followed by all the other files in the same folder giving the same annotation error msg>
    which I think is due to the presence of JDK 11. Any suggestions for overcoming this? The command I used to build was:
    Copy code
    ./gradlew build -x :metadata-ingestion:build -x :metadata-ingestion:check -x docs-website:build -x datahub-web-react:yarnBuild -x datahub-frontend:unzipAssets
    ./gradlew build -x :metadata-ingestion:build -x :metadata-ingestion:check -x docs-website:build -x :metadata-integration:java:spark-lineage:test
  • b

    breezy-portugal-43538

    04/28/2022, 1:30 PM
    Hello, I have a question regarding property updates. I recently ingested datasets into DataHub using S3 as the origin, and I can see that my datasets were uploaded to DataHub correctly. Now I would like to update an urn by adding some custom properties to it. Unfortunately, the curl command gives me an error. I think I did everything correctly, yet the error with
    message:"No root resource defined for path '/datasets'","status":404}
    appears. Is it possible to update properties on datasets ingested from S3, and if yes, how? My curl command:
    curl --location --request POST 'http://localhost:8080/datasets?action=ingest' \
    --header 'X-RestLi-Protocol-Version: 2.0.0' \
    --header 'Content-Type: application/json' \
    --data-raw '{
    "snapshot": {
    "aspects": [
    {
    "com.linkedin.dataset.DatasetProperties":  {
    "customProperties": {
    "SuperProperty": "over 9000"
    }
    }
    }
    ],
    "urn": "urn:li:dataset:(urn:li:dataset:(urn:li:dataPlatform:s3,origin_file_src%2Fdata%2Ftest%2Fother_timeZ%2Ftime%2other_folder%2Fsome_folder%2Fexample.csv,DEV)
    }
    }'
    The issue might be that my urn is incorrect - I copied it from the webpage URL. I tried to find the correct urn at http://localhost:9200/datasetindex_v2/_search?=pretty, but for some reason dataPlatform:s3 is not visible there. Do you know how I can get my S3 urn name, to be sure that I set it up correctly? Thanks in advance for the help! EDIT: changing the urn to use . instead of %2F did not help.
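    A hedged sketch of the same update against the /entities resource (the REST.li ingest action lives there rather than under /datasets); the urn below is illustrative only, and the real urn is easier to copy from the dataset page or the GraphQL API than to decode from the browser URL.
    curl 'http://localhost:8080/entities?action=ingest' \
      -X POST \
      --header 'X-RestLi-Protocol-Version: 2.0.0' \
      --header 'Content-Type: application/json' \
      --data-raw '{
        "entity": {
          "value": {
            "com.linkedin.metadata.snapshot.DatasetSnapshot": {
              "urn": "urn:li:dataset:(urn:li:dataPlatform:s3,some_folder/example.csv,DEV)",
              "aspects": [
                {
                  "com.linkedin.dataset.DatasetProperties": {
                    "customProperties": { "SuperProperty": "over 9000" }
                  }
                }
              ]
            }
          }
        }
      }'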
  • l

    limited-agent-54038

    04/29/2022, 3:10 AM
    Trying to test out an S3 data lake with a local Docker deployment, and I am getting the error:
    '[2022-04-29 02:44:40,288] ERROR    {logger:26} - Please set env variable SPARK_VERSION\n'
    I am just having trouble figuring out where this env variable is or how to change it. Thanks
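    For reference, a hedged sketch of setting the variable before the ingest run: the data-lake profiler goes through PySpark/pydeequ, which reads the Spark version from this environment variable (the value and recipe file name below are assumptions; the value should match the pyspark version pinned by the plugin).
    export SPARK_VERSION=3.0
    datahub ingest -c data_lake.yml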
  • s

    square-solstice-69079

    04/29/2022, 11:16 AM
    I guess a bulk metadata editor is something that is coming to the UI at some point; until that happens, what is the best way to add owners, tags, and domains to datasets? Taking an export from a search and then adding the metadata to the .csv is something that would work well for us. Has someone maybe already done that and got a script to "ingest" this metadata, based on the format of the default .csv, using curl or GraphQL?
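    In the meantime, a hedged sketch of a small script for this with the Python emitter; the CSV layout (dataset_urn, owner, tag columns) and file name are assumptions rather than the UI export format, and note that emitting these aspects replaces any existing owners/tags on the dataset.
    import csv

    from datahub.emitter.mce_builder import make_tag_urn, make_user_urn
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import (
        ChangeTypeClass,
        GlobalTagsClass,
        OwnerClass,
        OwnershipClass,
        OwnershipTypeClass,
        TagAssociationClass,
    )

    emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

    with open("metadata.csv") as f:  # assumed columns: dataset_urn, owner, tag
        for row in csv.DictReader(f):
            tags = GlobalTagsClass(tags=[TagAssociationClass(tag=make_tag_urn(row["tag"]))])
            owners = OwnershipClass(
                owners=[OwnerClass(owner=make_user_urn(row["owner"]),
                                   type=OwnershipTypeClass.DATAOWNER)]
            )
            # One proposal per aspect; UPSERT overwrites the whole aspect on the dataset.
            for aspect_name, aspect in [("globalTags", tags), ("ownership", owners)]:
                emitter.emit_mcp(
                    MetadataChangeProposalWrapper(
                        entityType="dataset",
                        changeType=ChangeTypeClass.UPSERT,
                        entityUrn=row["dataset_urn"],
                        aspectName=aspect_name,
                        aspect=aspect,
                    )
                )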
  • k

    kind-psychiatrist-76973

    04/29/2022, 12:49 PM
    I have all containers on the v0.8.33 tag, but “linkedin/datahub-frontend-react” was
    v0.8.17
    and I updated it to
    v0.8.33
    . After the deployment the UI crashed, and this is the error I have from the logs:
    Copy code
    ! @7nf015ap6 - Internal server error, for (GET) [/callback/oidc?state=LqmnUiAvYgUGt98yM69UMRPG24DNJMAazoGGCH66Fkw&code=4/0AX4XfWg4uU9YpUKuVYjja_NgSZ0r7n4HTGM_Gpg87fxx4ODyQDVde1tIC0jPB7nEzaVjSw&scope=email%20profile%20<https://www.googleapis.com/auth/userinfo.profile%20openid%20https://www.googleapis.com/auth/userinfo.email&authuser=1&hd=sennder.com&prompt=none>] ->
     
    play.api.UnexpectedException: Unexpected exception[CompletionException: org.pac4j.core.exception.TechnicalException: Bad token response, error=invalid_grant]
  • m

    mammoth-fall-12031

    05/02/2022, 8:10 AM
    I have been trying to set up the dev environment for DataHub locally and am getting stuck at the particular error below when running
    ./gradlew build
    Copy code
    * What went wrong:
    Execution failed for task ':metadata-service:restli-servlet-impl:generateRestModel'.
    > Process 'command '/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java'' finished with non-zero exit value 1
    Have tried doing
    ./gradlew clean
    and ran
    Copy code
    ./gradlew :metadata-service:restli-servlet-impl:build -Prest.model.compatibility=ignore
    but I am still getting the same error. System config: macOS Monterey 12.1, Java version:
    Copy code
    java version "1.8.0_331"
    Java(TM) SE Runtime Environment (build 1.8.0_331-b09)
    Java HotSpot(TM) 64-Bit Server VM (build 25.331-b09, mixed mode)
    Any ways to resolve this?
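    One hedged thing to try: the failing task is being run with the java binary under /Library/Internet Plug-Ins/JavaAppletPlugin.plugin, which is a legacy JRE rather than a full JDK. Pointing JAVA_HOME at an installed JDK 8 before building may help (the java_home call assumes a JDK 8 is installed on the Mac).
    export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)
    ./gradlew clean
    ./gradlew build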
  • f

    fresh-napkin-5247

    05/02/2022, 8:57 AM
    Hello all. I am trying to connect DataHub to Redshift using IAM auth. Basically this means that I am not going to supply a password for the user, but rather set up an endpoint using aws-vault. However, so far I have not been successful. Does anyone have a similar setup and could help me?
  • k

    kind-psychiatrist-76973

    05/03/2022, 3:57 PM
    I have this job definition
    Copy code
    # Snowflake to Datahub recipe configuration
    # To run an ingestion run: datahub ingest -c ./metadata-ingestion/recipes/snowflake_to_datahub_rest.yml
    # pipeline_name: "my_snowflake_pipeline_1"
    source:
      type: snowflake
      config:
        # Coordinates
        host_port: ${SNOWFLAKE_ACCOUNT}
        warehouse: 'AGGREGATION_COMPUTE'
    
        # Credentials
        username: ${SNOWFLAKE_USERNAME}
        password: ${SNOWFLAKE_PASSWORD}
        role: 'XADMIN'
    
        env: "PROD"
    
        profiling:
          enabled: False
    
        database_pattern:
          allow:
            - "DWXX"
            - "VISIBILITY"
            - "STRATEGY_AND_PLANNING"
            - "ABC_SHIPPER_STRATEGY_AND_PLANNING"
            - "XYZ"
            - "MARKETING"
            - "GLOBAL_OPERATIONS"
            - "CENTRAL_STRATEGY_AND_PLANNING"
            - "FINANCE"
          deny:
            - "DEV"
            - "ANALYST_DEV"
    
        table_pattern:
          ignoreCase: False
    
        include_tables: True
        include_views: True
        include_table_lineage: False
    
    
        stateful_ingestion:
          enabled: True
          remove_stale_metadata: True
    
    
    
    sink:
      type: "datahub-rest"
      config:
        server: ${DATAHUB_GMS_HOST}:8080
    I get this validation error:
    Copy code
    1 validation error for SnowflakeConfig
    stateful_ingestion
      extra fields not permitted (type=value_error.extra)
    which is really vague; I don't have any idea what I am doing wrong
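    Two hedged things to check: stateful ingestion generally needs a stable pipeline_name at the top level of the recipe (it is commented out above), and the installed acryl-datahub version has to be one where SnowflakeConfig actually knows the stateful_ingestion block; on a CLI that does not, pydantic rejects it as an extra field. A trimmed sketch:
    pipeline_name: "my_snowflake_pipeline_1"
    source:
      type: snowflake
      config:
        host_port: ${SNOWFLAKE_ACCOUNT}
        username: ${SNOWFLAKE_USERNAME}
        password: ${SNOWFLAKE_PASSWORD}
        stateful_ingestion:
          enabled: true
          remove_stale_metadata: true
    sink:
      type: "datahub-rest"
      config:
        server: ${DATAHUB_GMS_HOST}:8080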
  • c

    clever-air-4600

    05/03/2022, 7:02 PM
    Hi guys, is there a way to fetch datasets from GraphQL that have a specific COLUMN tag? For example:
    Copy code
    {
    search(
                input: {start: 0, count: 10, query: "*", type: DATASET, filters: {field: "tags", value: "urn:li:tag:Phone"} }
            ) {
                searchResults {
                    entity {
                        urn
                        type
                    }
                    matchedFields {
                        name
                        value
                    }
                }
            }
        }
    I'm trying something like this; it works with table tags but not with the column ones.
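    A hedged sketch of what to try, assuming column-level tags are indexed under a separate search field from the dataset-level ones; the field names "fieldTags" (tags ingested with the schema) and "editedFieldTags" (tags added via the UI) are assumptions to verify against your version.
    {
      search(
        input: {
          start: 0
          count: 10
          query: "*"
          type: DATASET
          filters: { field: "fieldTags", value: "urn:li:tag:Phone" }
        }
      ) {
        searchResults {
          entity {
            urn
            type
          }
        }
      }
    }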
  • l

    limited-agent-54038

    05/04/2022, 5:16 AM
    Hi All - I have not been able to get any integrations to work, so I am not sure what I am doing wrong. I have the following integration yaml:
    Copy code
    source:
      type: data-lake
      config:
        env: "PROD"
        platform: "local-data-lake"
        base_path: "~/.datahub/data_test2.json"
        profiling:
          enabled: true
    
    sink:
      type: console
    and am getting the following error:
    Copy code
    ---- (full traceback above) ----
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 82, in run
        pipeline = Pipeline.create(pipeline_config, dry_run, preview)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 175, in create
        return cls(config, dry_run=dry_run, preview_mode=preview_mode)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 127, in __init__
        self.source: Source = source_class.create(
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 248, in create
        return cls(config, ctx)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 176, in __init__
        self.init_spark()
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/ingestion/source/data_lake/__init__.py", line 242, in init_spark
        self.spark = SparkSession.builder.config(conf=conf).getOrCreate()
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/sql/session.py", line 186, in getOrCreate
        sc = SparkContext.getOrCreate(sparkConf)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 378, in getOrCreate
        SparkContext(conf=conf or SparkConf())
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 133, in __init__
        SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/context.py", line 327, in _ensure_initialized
        SparkContext._gateway = gateway or launch_gateway(conf)
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pyspark/java_gateway.py", line 105, in launch_gateway
        raise Exception("Java gateway process exited before sending its port number")
    
    Exception: Java gateway process exited before sending its port number
    [2022-05-03 22:15:55,416] INFO     {datahub.entrypoints:161} - DataHub CLI version: 0.8.30.0 at /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/datahub/__init__.py
    [2022-05-03 22:15:55,416] INFO     {datahub.entrypoints:164} - Python version: 3.10.0 (v3.10.0:b494f5935c, Oct  4 2021, 14:59:20) [Clang 12.0.5 (clang-1205.0.22.11)] at /Library/Frameworks/Python.framework/Versions/3.10/bin/python3 on macOS-11.6.5-x86_64-i386-64bit
    [2022-05-03 22:15:55,416] INFO     {datahub.entrypoints:167} - GMS config {}
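    The "Java gateway process exited" error comes from PySpark failing to launch a JVM, so two hedged things to rule out: a missing or incompatible Java runtime, and running a newer Python (the traceback shows 3.10) than the pyspark release pinned by the data-lake plugin supports. A quick sanity check (the recipe file name is a placeholder):
    java -version                     # a JDK 8 or 11 should be found on PATH
    echo "$JAVA_HOME"                 # and JAVA_HOME should point at it
    export SPARK_VERSION=3.0          # same variable as in the earlier data-lake thread
    datahub ingest -c data_lake.yml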
  • a

    astonishing-guitar-79208

    05/04/2022, 9:10 AM
    Hi all. I've been trying to set up
    datahub-frontend
    JaaS authentication with Kerberos. I'm providing a custom
    jaas.conf
    file via k8s configmap, volume mounted in the container at the path specified here - https://datahubproject.io/docs/how/auth/jaas#custom-jaas-configuration. But no matter what
    jaas.conf
    file I provide (even the default one with PropertyFileLoginModule), the app fails to boot up with an error that doesn't help much in debugging the issue. Full error in the thread.