astonishing-dusk-99990
04/10/2023, 9:13 AM
# Set up ingress to expose react front-end
ingress:
  enabled: true
  podAnnotations:
    kubernetes.io/ingress.class: "gce-internal"
    kubernetes.io/ingress.regional-static-ip-name: "your-domain-name-internal-address"
  hosts:
    - host: your-domain-name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: datahub-frontend
                port:
                  name: http
      #path: /
      #redirectPaths: []
service:
  type: NodePort # ClusterIP or NodePort
  port: 9002
  targetPort: http
  protocol: TCP
  name: http
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
  # annotations:
  #   networking.gke.io/load-balancer-type: Internal
Since we can't use the loadBalancerIP argument in the service section, is there any way to give the DataHub front end a static IP instead of a dynamic one when deploying with the Helm chart?
Also, when I try to run helm upgrade I always get an error like this:
Error: UPGRADE FAILED: error validating "": error validating data: ValidationError(Ingress.spec.rules[0].http): missing required field "paths" in io.k8s.api.networking.v1.HTTPIngressRuleValue
Does anyone know the problem and how to fix it?
Notes:
• Image: datahub v0.10.0
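A minimal sketch of the values shape that may avoid the "missing paths" error, assuming the datahub-frontend subchart expects paths (and optionally redirectPaths) directly under each host entry, as the commented-out keys above hint, rather than the raw Kubernetes http.paths structure. Host and address names are placeholders; on GCE, referencing a pre-reserved regional address by name in the annotation is what pins the static internal IP across deploys:

```yaml
ingress:
  enabled: true
  annotations:
    kubernetes.io/ingress.class: "gce-internal"
    # Reserve the address first, e.g.:
    #   gcloud compute addresses create your-domain-name-internal-address \
    #     --region <region> --subnet <subnet>
    kubernetes.io/ingress.regional-static-ip-name: "your-domain-name-internal-address"
  hosts:
    - host: your-domain-name
      paths:
        - /
```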
best-umbrella-88325
04/10/2023, 12:03 PM
I built the actions image with
docker build -f docker/datahub-actions/Dockerfile . --no-cache
as mentioned in the documentation.
Once I use this in my helm chart, I get the following error from the actions pod:
2023/04/10 11:59:06 Waiting for: http://datahub-datahub-gms:8080/health
2023/04/10 11:59:06 Received 200 from http://datahub-datahub-gms:8080/health
2023/04/10 11:59:06 Error starting command: `/start_datahub_actions.sh` - fork/exec /start_datahub_actions.sh: no such file or directory
Can someone help me with this? Thanks in advance.
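One hedged way to narrow this down: "fork/exec ... no such file or directory" usually means the script is missing from the image, is not executable, or has Windows (CRLF) line endings that break the shebang. The image tag below is a placeholder for whatever you tagged your build as:

```sh
# Does the start script exist, is it executable, and what does its first
# line look like byte-by-byte (CRLF endings would show a trailing \r)?
docker run --rm --entrypoint /bin/sh <your-actions-image> \
  -c 'ls -l /start_datahub_actions.sh && head -n 1 /start_datahub_actions.sh | od -c'
```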
victorious-planet-2053
04/10/2023, 1:19 PM

handsome-football-66174
04/10/2023, 5:21 PM

proud-printer-88070
04/11/2023, 3:15 AM
I get an error when I run
python3 -m datahub ingest -c source.yml
The log is attached as cli-error-log.txt.
My .datahubenv looks something like this:
gms:
  server: https://<<<gms-host>>>.us-east-1.elb.amazonaws.com:8080
  token: <<<token>>>
And I can curl the following URL successfully:
curl http://<<<gms-host>>>.us-east-1.elb.amazonaws.com:8080/config
{
  "models" : { },
  "patchCapable" : true,
  "versions" : {
    "linkedin/datahub" : {
      "version" : "v0.10.0",
      "commit" : "cf1e627e55431fc69d72918b2bcc3c5f3a1d5002"
    }
  },
  "managedIngestion" : {
    "defaultCliVersion" : "0.10.0",
    "enabled" : true
  },
  "statefulIngestionCapable" : true,
  "supportsImpactAnalysis" : true,
  "telemetry" : {
    "enabledCli" : true,
    "enabledIngestion" : false
  },
  "datasetUrnNameCasing" : false,
  "retention" : "true",
  "datahub" : {
    "serverType" : "prod"
  },
  "noCode" : "true"
}
I looked at this post:
https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#your-proxy-appears-to-only-use-http-and-not-https
In my setup, no env vars are set for HTTP_PROXY or HTTPS_PROXY.
The error happens when trying to access the /config endpoint and says to try changing the proxy URL to be HTTP.
GMS is installed in a Kubernetes pod in a production environment, and we are on a VPN while running the above commands.
Thanks!
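A hedged observation: the curl that succeeds uses plain http:// on port 8080, while .datahubenv points at https:// on the same port, and urllib3 raises exactly this kind of "proxy appears to only use HTTP" error when it speaks TLS to a plain-HTTP endpoint. Assuming GMS is not terminating TLS on 8080, a sketch of the change (host and token remain the placeholders from above):

```yaml
# .datahubenv -- switch the scheme to http to match the working curl
gms:
  server: http://<<<gms-host>>>.us-east-1.elb.amazonaws.com:8080
  token: <<<token>>>
```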
mysterious-scooter-52411
04/11/2023, 7:27 AM

colossal-waitress-83487
04/11/2023, 10:51 AM

elegant-salesmen-99143
04/11/2023, 12:12 PM

eager-animal-48107
04/11/2023, 4:27 PM

eager-animal-48107
04/11/2023, 4:28 PM
ERROR: could not serialize access due to concurrent update Call getNextException to see other errors in the batch.
at org.postgresql.jdbc.BatchResultHandler.handleError(BatchResultHandler.java:165)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2366)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:559)
at org.postgresql.jdbc.PgStatement.internalExecuteBatch(PgStatement.java:887)
at org.postgresql.jdbc.PgStatement.executeBatch(PgStatement.java:910)
at org.postgresql.jdbc.PgPreparedStatement.executeBatch(PgPreparedStatement.java:1649)
at io.ebean.datasource.delegate.PreparedStatementDelegator.executeBatch(PreparedStatementDelegator.java:357)
at io.ebeaninternal.server.persist.BatchedPstmt.executeAndCheckRowCounts(BatchedPstmt.java:130)
at io.ebeaninternal.server.persist.BatchedPstmt.executeBatch(BatchedPstmt.java:97)
at io.ebeaninternal.server.persist.BatchedPstmtHolder.flush(BatchedPstmtHolder.java:124)
at io.ebeaninternal.server.persist.BatchControl.flushPstmtHolder(BatchControl.java:206)
at io.ebeaninternal.server.persist.BatchControl.executeNow(BatchControl.java:220)
at io.ebeaninternal.server.persist.BatchedBeanHolder.executeNow(BatchedBeanHolder.java:100)
at io.ebeaninternal.server.persist.BatchControl.flush(BatchControl.java:271)
at io.ebeaninternal.server.persist.BatchControl.flush(BatchControl.java:227)
at io.ebeaninternal.server.transaction.JdbcTransaction.batchFlush(JdbcTransaction.java:678)
... 101 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize access due to concurrent update
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2675)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2365)
... 115 common frames omitted
flat-engineer-75197
04/11/2023, 5:26 PM

cuddly-butcher-39945
04/11/2023, 7:10 PM

best-eve-12546
04/11/2023, 9:24 PM
I'm trying to use datahub delete to delete datasets with a specific schema. Looking at https://datahubproject.io/docs/how/delete-metadata/ it looks like it supports a query operator, but I couldn't figure out exactly how to use it.
i.e. I'm trying to do something like
datahub delete --entity_type dataset --env PROD --query "thisschema"
To delete
urn:li:dataset:(urn:li:dataPlatform:platform,thisschema.table1,PROD)
urn:li:dataset:(urn:li:dataPlatform:platform,thisschema.table2,PROD)
but NOT
urn:li:dataset:(urn:li:dataPlatform:platform,wrong_schema.thisschema,PROD)
The query operator seems to match all three, since the target string appears in the table name.
Is this possible?
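A cautious sketch: the delete docs mention a dry-run option, so previewing what the query matches is probably the quickest way to see whether it can be narrowed before anything is removed (flag spellings can differ between CLI versions; datahub delete --help will confirm yours):

```sh
# Preview the URNs the query would match without deleting anything yet.
datahub delete --entity_type dataset --env PROD --query "thisschema" --dry-run
```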
incalculable-zebra-69091
04/12/2023, 3:55 AM
I ran
datahub docker quickstart --version=v0.10.1
(datahub version 0.10.1), but when I sign in through the GUI I get errors on /track and /login. The datahub-frontend-react container log shows the error "[kafka-producer-network-thread | datahub-frontend] WARN o.apache.kafka.clients.NetworkClient - [Producer clientId=datahub-frontend] Connection to node -1 (broker/172.18.0.6:29092) could not be established. Broker may not be available", and datahub-gms has errors too. What do I need to do to be able to sign in?
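A hedged troubleshooting sketch, since the frontend warning says the Kafka broker container is unreachable; broker is the quickstart compose's default container name:

```sh
# Is the broker container running, and what do its recent logs say?
docker ps --filter name=broker
docker logs broker --tail 50

# If the stack is wedged, recreate it from scratch.
# WARNING: `datahub docker nuke` removes all DataHub containers and volumes.
datahub docker nuke
datahub docker quickstart --version=v0.10.1
```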
able-city-76673
04/12/2023, 6:01 AM

microscopic-room-90690
04/12/2023, 6:59 AM
[2023-03-31 14:47:57,830] INFO {datahub.cli.ingest_cli:170} - DataHub CLI version: 0.8.43
[2023-04-04 10:50:44,875] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion
Command exiting with ret '0'
few-carpenter-93837
04/12/2023, 9:54 AM

steep-fountain-54482
04/12/2023, 11:04 AM

steep-fountain-54482
04/12/2023, 11:04 AM
23/04/12 10:29:22 ERROR SplineAgent: Unexpected error occurred during lineage processing for application: launcher #00f9a8uvf3tjqt09
java.lang.IllegalStateException: WithField.dataType should not be called.
bland-orange-13353
04/12/2023, 12:16 PM

bland-orange-13353
04/12/2023, 12:23 PM

wide-afternoon-79955
04/12/2023, 4:25 PM
I set
datahub-gms:
  extraEnvs:
    - name: LOG_DIR
      value: /tmp/datahub-gms/log/
but the logback config does not seem to pick up the env var LOG_DIR.
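A hedged first check before concluding the variable is ignored: confirm it actually reaches the container environment, and find the logback config baked into the GMS image to see whether it references ${LOG_DIR} at all (the pod name is a placeholder):

```sh
# Is LOG_DIR set in the running pod, and where does the image keep its
# logback configuration?
kubectl exec <datahub-gms-pod> -- sh -c \
  'env | grep LOG_DIR; find / -name "logback*.xml" 2>/dev/null'
```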
hallowed-lizard-92381
04/12/2023, 6:20 PM

cuddly-butcher-39945
04/12/2023, 6:56 PM

elegant-salesmen-99143
04/12/2023, 8:06 PM
I added description under name in the query, but it returns null, even though documentation is not empty for this container. Is it called something different? A property like 'documentation' is not found:
{
  container(urn: "urn:li:container:XXX") {
    properties {
      name
      description
    }
    entities {
      total
      start
    }
  }
}
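A hedged guess at the missing field: documentation edited in the DataHub UI typically lands on editableProperties rather than properties, so requesting both may surface it (field availability on the container type is an assumption; the schema browser in GraphiQL can confirm):

```graphql
{
  container(urn: "urn:li:container:XXX") {
    properties {
      name
      description   # description ingested from the source system
    }
    editableProperties {
      description   # documentation edited via the DataHub UI
    }
    entities {
      total
      start
    }
  }
}
```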
microscopic-room-90690
04/13/2023, 3:37 AM
source:
  type: hive
  config:
    host_port: localhost:10000
    database_alias: hive
    schema_pattern:
      allow: ["^web_hudi$"]
sink:
  type: "datahub-rest"
  config:
    server: ${datahub_server}
    token: ${token}
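For reference, ${datahub_server} and ${token} in the sink are expanded from environment variables when the recipe is loaded, so a run might look like this sketch (the file name and values are placeholders):

```sh
# Export the variables the recipe references, then run the ingestion.
export datahub_server="http://localhost:8080"
export token="<personal-access-token>"
datahub ingest -c hive_recipe.yml
```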
busy-analyst-35820
04/13/2023, 3:57 AM

better-fireman-33387
04/13/2023, 8:41 AM
datahub_datahub_usage_event
(I set the datahub prefix for all indices)
Could anyone assist, please?
bland-orange-13353
04/13/2023, 10:28 AM

future-holiday-32084
04/13/2023, 10:30 AM
spark.sql("select * from <database>.<table_source>").write.mode("append").format("parquet").saveAsTable("<database>.<table_sink>")
The lineage, as shown in the image below, has been inferred perfectly for the sink table. However, the source table displays the location on my Hadoop Data Lake, even though I'm reading from a table, not a path.
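A hedged idea rather than a confirmed fix: referencing the source by table identifier via spark.table() instead of a spark.sql() scan hands Spark a table name up front, which may give the lineage agent a table rather than the resolved data-lake location; whether the plugin distinguishes the two is an assumption:

```python
# Same append job, but the source is referenced by table identifier.
# <database>, <table_source>, <table_sink> are placeholders from above.
spark.table("<database>.<table_source>") \
    .write.mode("append") \
    .format("parquet") \
    .saveAsTable("<database>.<table_sink>")
```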