brave-tomato-16287
08/10/2022, 8:04 AM

adamant-van-21355
08/10/2022, 8:54 AM
Regarding v0.8.42: we are using Helm charts for deployment, but I don't see the latest version 0.2.88 in the datahub helm chart releases here yet, and the application version shown there is still 0.8.41. Any updates on that?
Also FYI, the new version is not reflected in the components used in the values.yaml of the datahub-helm repo (they still show 0.8.41 instead of 0.8.42).

narrow-apple-60403
08/10/2022, 9:17 AM
kubectl get deployment -n kube-system aws-load-balancer-controller
NAME READY UP-TO-DATE AVAILABLE AGE
aws-load-balancer-controller 2/2 2 2 26h
kubectl get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
datahub-datahub-frontend <none> datahub-eks.example.com 80 39m
values.yaml
datahub-frontend:
  enabled: true
  image:
    repository: linkedin/datahub-frontend-react
    tag: "v0.8.41"
  # Set up ingress to expose react front-end
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: instance
      alb.ingress.kubernetes.io/certificate-arn: <<certificate-arn>>
      alb.ingress.kubernetes.io/inbound-cidrs: 0.0.0.0/0
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'
    hosts:
      - host: datahub-eks.example.com
        redirectPaths:
          - path: /*
            name: ssl-redirect
            port: use-annotation
        paths:
          - /*
careful-france-4125
08/10/2022, 9:19 AM

ambitious-cartoon-15344
08/10/2022, 10:08 AM

famous-florist-7218
08/10/2022, 10:52 AM
  File "/tmp/datahub/ingest/venv-d4f145d7-4f51-4053-990f-4c541a5eaa57/lib/python3.9/site-packages/bson/__init__.py", line 1164, in _decode_all_selective
    return decode_all(data, codec_options)
  File "/tmp/datahub/ingest/venv-d4f145d7-4f51-4053-990f-4c541a5eaa57/lib/python3.9/site-packages/bson/__init__.py", line 1108, in decode_all
    return _decode_all(data, opts)  # type: ignore[arg-type]

InvalidBSON: year 50577 is out of range
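For anyone hitting this: one document in the collection carries a BSON datetime far outside Python's supported range, so the whole decode_all batch fails. A minimal defensive sketch (my own, not the DataHub connector's actual handling; raw_batches is a hypothetical iterable of raw BSON byte strings):

from bson import decode_all
from bson.errors import InvalidBSON

def decode_batches_safely(raw_batches):
    # Decode each raw BSON batch, skipping batches whose dates fall
    # outside Python's datetime range (e.g. "year 50577 is out of range").
    for raw in raw_batches:
        try:
            yield from decode_all(raw)
        except InvalidBSON as exc:
            print(f"skipping undecodable batch: {exc}")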
famous-florist-7218
08/10/2022, 11:28 AM
In bigquery.py, I found that the base_query fails with Syntax error: Missing whitespace between literal and alias whenever profiling is enabled.
• It seems like {schema}.__TABLES__ should be wrapped in backticks (`{schema}.__TABLES__`) per BigQuery's syntax rules; see the sketch after the snippet below.
• Should I raise this point as a bug?
@staticmethod
def get_all_schema_tables_query(schema: str) -> str:
    base_query = (
        f"SELECT "
        f"table_id, "
        f"size_bytes, "
        f"last_modified_time, "
        f"row_count, "
        f"FROM {schema}.__TABLES__"
    )
    return base_query
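For comparison, a sketch of the fix suggested above (my rewrite, not an upstream patch): backticks make BigQuery parse the dataset reference as an identifier, and the stray trailing comma before FROM is dropped as well:

@staticmethod
def get_all_schema_tables_query(schema: str) -> str:
    # Quoting {schema}.__TABLES__ in backticks avoids
    # "Syntax error: Missing whitespace between literal and alias".
    return (
        "SELECT "
        "table_id, "
        "size_bytes, "
        "last_modified_time, "
        "row_count "
        f"FROM `{schema}.__TABLES__`"
    )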
P.S.: Please take a look at the attached screenshot for more detail.

brave-secretary-27487
08/10/2022, 11:48 AM

purple-analyst-83660
08/10/2022, 1:17 PM
I have enabled metadata service authentication in my values:
global:
  datahub:
    metadata_service_authentication:
      enabled: true
Still I get "Token based authentication is currently disabled. Contact your DataHub administrator to enable this feature."
DataHub GMS version 0.8.41.
Can anybody suggest what I might be doing wrong?
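For reference, once metadata service authentication is enabled, clients must send a personal access token. A minimal connectivity check (a sketch; the GMS URL and token are placeholders):

from datahub.emitter.rest_emitter import DatahubRestEmitter

# Placeholder endpoint and token for illustration only.
emitter = DatahubRestEmitter(
    gms_server="http://datahub-gms:8080",
    token="<personal-access-token>",
)
emitter.test_connection()  # fails if the token is rejected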
gentle-camera-33498
08/10/2022, 2:22 PM

jolly-traffic-67085
08/11/2022, 3:38 AM

limited-forest-73733
08/11/2022, 11:05 AM

fierce-garage-74290
08/11/2022, 2:11 PM
I'd like to showcase Queries and Stats, which are quite hard to produce in an appealing manner in an R&D environment. Is there any way I can ingest some dummy queries and stats on service startup via some script with API calls? Thanks for the tips!

handsome-football-66174
08/10/2022, 5:26 PM
else:
    # ingest data profile without partition
    table_stats = response["Table"]["Parameters"]
    column_stats = response["Table"]["StorageDescriptor"]["Columns"]
    return [self._create_profile_mcp(mce, table_stats, column_stats)]
little-breakfast-38102
08/12/2022, 5:52 AM

big-ocean-9800
08/12/2022, 5:53 AM

high-gigabyte-86638
08/12/2022, 8:44 AM

purple-analyst-83660
08/12/2022, 11:49 AM
failed to write record with workunit container-urn:li:container:2380f5d80f130b57eeeee5016d8c18b7-to-urn:li:chart:(tableau,885b1266-5bd9-4338-745e-c480bc7f0168) with ('Unable to emit metadata to DataHub GMS', {'message': '401 Client Error: Unauthorized for url:
great-motherboard-71467
08/12/2022, 1:31 PM
WHZ-Authentication {
  com.sun.security.auth.module.LdapLoginModule sufficient
  userProvider="ldaps://ldaps.some.server.eu/dc=some,dc=domain,dc=com"
  authIdentity="{USERNAME}@some.domain.com"
  userFilter="uid={USERNAME},cn=users,cn=accounts,dc=some,dc=domain,dc=com"
  java.naming.security.authentication="simple"
  debug="true"
  useSSL="true";
};
Whatever I change inside this config (for example, setting the port to :636), I end up with the following error:
datahub-frontend-react | [LdapLoginModule] authentication-first mode; SSL enabled
datahub-frontend-react | [LdapLoginModule] user provider: ldaps://ldaps.some.server.eu/cn=users,cn=accounts,dc=some,dc=domain,dc=com
datahub-frontend-react | 13:06:46 [application-akka.actor.default-dispatcher-2] ERROR application - The submitted callback is of type: class javax.security.auth.callback.NameCallback : javax.security.auth.callback.NameCallback@332d2227
datahub-frontend-react | 13:06:46 [application-akka.actor.default-dispatcher-2] ERROR application - The submitted callback is of type: class javax.security.auth.callback.PasswordCallback : javax.security.auth.callback.PasswordCallback@7dbea6b9
datahub-frontend-react | [LdapLoginModule] attempting to authenticate user: some_test_user
datahub-frontend-react | [LdapLoginModule] authentication failed
datahub-frontend-react | [LdapLoginModule] aborted authentication
No matter whether I change authIdentity to just {USERNAME}, provide it with the domain name, or use the DC form, it is not working.
And when I try to provide a technical user that connects via
java.naming.security.principal=
java.naming.security.credential=
then the Dummy Module authenticates everything in that case.
When I run the following ldapsearch on my CLI, I am able to get info from LDAP about the specific user:
ldapsearch -H ldaps://ldaps.some.server.eu -x -b dc=some,dc=domain,dc=com '(&(objectClass=person)(uid=some_test_user))'
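To isolate whether the bind itself is the problem, the same credentials can be tested outside JAAS with the third-party ldap3 package (a sketch; server, DN, and password are placeholders mirroring the config above):

from ldap3 import ALL, Connection, Server

server = Server("ldaps://ldaps.some.server.eu", use_ssl=True, get_info=ALL)
conn = Connection(
    server,
    user="uid=some_test_user,cn=users,cn=accounts,dc=some,dc=domain,dc=com",
    password="secret",  # placeholder
)
print(conn.bind())  # True means the simple bind succeeded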
Any hint at what could be wrong?

busy-petabyte-37287
08/12/2022, 2:11 PM
kubectl get deployment -n kube-system aws-load-balancer-controller
I get the following output:

busy-petabyte-37287
08/12/2022, 2:13 PM
helm upgrade --install datahub datahub/datahub --values values.yaml
I get no address:

busy-petabyte-37287
08/12/2022, 2:14 PM

ambitious-cartoon-15344
08/14/2022, 3:20 PM

bitter-insurance-49151
08/15/2022, 9:54 AM

echoing-alligator-70530
08/15/2022, 1:03 PM

brief-cat-57352
08/15/2022, 1:19 PM
None or 1:00:00. Other DAGs are working fine. Any idea on this? Thanks. Please see the error stacktrace below. cc: @hallowed-intern-9191
File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 830, in serialize_dag
serialize_dag["tasks"] = [cls._serialize(task) for _, task in dag.task_dict.items()]
File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 830, in <listcomp>
serialize_dag["tasks"] = [cls._serialize(task) for _, task in dag.task_dict.items()]
File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 308, in _serialize
return SerializedBaseOperator.serialize_operator(var)
File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 578, in serialize_operator
serialize_op['params'] = cls._serialize_params_dict(op.params)
File "/usr/local/lib/python3.7/site-packages/***/serialization/serialized_objects.py", line 451, in _serialize_params_dict
if f'{v.__module__}.{v.__class__.__name__}' == '***.models.param.Param':
AttributeError: 'int' object has no attribute '__module__'
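Judging from the last frame, serialization trips over a plain int in the DAG's params dict when it probes v.__module__. If that is the cause, one workaround (a sketch, not a confirmed fix) is to wrap param values in Airflow's Param:

from datetime import datetime

from airflow import DAG
from airflow.models.param import Param

# Param instances carry __module__, so the serializer check no longer
# blows up on a bare int value.
with DAG(
    dag_id="example_params",
    start_date=datetime(2022, 8, 1),
    schedule_interval=None,
    params={"retries": Param(3, type="integer")},  # instead of {"retries": 3}
):
    pass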
thankful-morning-85093
08/15/2022, 4:00 PM
22/08/13 20:54:48 ERROR DatahubSparkListener: java.lang.NullPointerException
Even though I get this error, the command completes execution without pushing data to DataHub. Using Spark 3.3.0.
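For context, this is roughly how the listener gets wired up in PySpark (a sketch; the jar version and GMS URL are placeholders, and this is not a verified fix for the NPE):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("datahub-lineage-test")
    .config("spark.jars.packages", "io.acryl:datahub-spark-lineage:0.8.41")  # placeholder version
    .config("spark.extraListeners", "datahub.spark.DatahubSparkListener")
    .config("spark.datahub.rest.server", "http://localhost:8080")  # placeholder URL
    .getOrCreate()
)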
Any help appreciated. Thanks.

bitter-lizard-32293
08/16/2022, 3:27 AM
Caused by: org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [datahub_usage_event]]
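A quick way to confirm the index really is missing (a sketch with the elasticsearch Python client; the host is a placeholder):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder host
# False means the index still has to be created, e.g. by re-running
# the elasticsearch setup job.
print(es.indices.exists(index="datahub_usage_event"))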
Details in 🧵

few-carpenter-93837
08/16/2022, 8:41 AM
'warnings': {'XSCHEMA.YTABLE': ["unable to get column information due to an error -> __init__() got an unexpected keyword argument 'precision'"]}
This issue is raised from the following code in source/sql/vertica.py:
elif attype in ("timestamptz", "timetz"):
kwargs["timezone"] = True
if charlen:
kwargs["precision"] = int(charlen) # type: ignore
args = () # type: ignore
elif attype in ("timestamp", "time"):
kwargs["timezone"] = False
if charlen:
kwargs["precision"] = int(charlen) # type: ignore
This points to the fact that TIMESTAMP (imported via from sqlalchemy.sql.sqltypes import TIME, TIMESTAMP, String) does not accept the parameter precision: https://docs.sqlalchemy.org/en/14/core/type_basics.html#sqlalchemy.types.DateTime
One of the solutions would be to add
self.precision = precision
into class TIMESTAMP(DateTime): in \Lib\site-packages\sqlalchemy\sql\sqltypes.py, which results in the correct output:
{
  "fieldPath": "DATE_CREATED",
  "jsonPath": null,
  "nullable": true,
  "description": null,
  "created": null,
  "lastModified": null,
  "type": {
    "type": {
      "com.linkedin.pegasus2avro.schema.TimeType": {}
    }
  },
  "nativeDataType": "TIMESTAMP(precision=0)",
  "recursive": false,
  "globalTags": null,
  "glossaryTerms": null,
  "isPartOfKey": false,
  "isPartitioningKey": null,
  "jsonProps": null
}
Since the fix is in the sqlalchemy files, how can this be implemented in the DataHub code instead?
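One way to keep the change on the DataHub side instead of patching sqlalchemy (a sketch of my own, not an upstream patch): subclass TIMESTAMP in the vertica source so it accepts and stores the precision, and construct that subclass where attype is "timestamp":

from sqlalchemy.sql.sqltypes import TIMESTAMP

class VerticaTIMESTAMP(TIMESTAMP):
    # TIMESTAMP variant that accepts the precision Vertica reports.
    def __init__(self, timezone=False, precision=None):
        super().__init__(timezone=timezone)
        self.precision = precision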
numerous-account-62719
08/16/2022, 9:41 AM