agreeable-table-54007
04/05/2023, 9:06 AM
wonderful-wall-76801
04/05/2023, 10:29 AM
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [<http://123.123.123.123:9200>], URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 400 Bad Request]
{"error":{"root_cause":[{"type":"parsing_exception","reason":"[term] query does not support [case_insensitive]","line":1,"col":942}],"type":"x_content_parse_exception","reason":"[1:942] [bool] failed to parse field [must]","caused_by":{"type":"x_content_parse_exception","reason":"[1:942] [bool] failed to parse field [should]","caused_by":{"type":"x_content_parse_exception","reason":"[1:942] [bool] failed to parse field [should]","caused_by":{"type":"parsing_exception","reason":"[term] query does not support [case_insensitive]","line":1,"col":942}}}},"status":400}
How can we fix that?
Thanks a lot!
PS ElasticSearch version: 7.9.3
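For context: the `case_insensitive` option on `term` queries was added in Elasticsearch 7.10, so a 7.9.3 cluster rejects it with the parsing error above. A minimal sketch of a version check follows. The helper name is hypothetical and it assumes a plain "major.minor.patch" version string.

```python
def supports_case_insensitive_term(version: str) -> bool:
    """Return True if this Elasticsearch version accepts `case_insensitive` on term queries.

    Assumes a plain "major.minor.patch" string; 7.10 is the release in which
    the option was added to the term query.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (7, 10)


print(supports_case_insensitive_term("7.9.3"))   # False
print(supports_case_insensitive_term("7.10.0"))  # True
```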
HelmChart version: datahub-0.2.161
bland-orange-13353
04/05/2023, 5:58 PM
shy-alarm-27631
04/05/2023, 6:47 PM
./gradlew quickstart
I get the following error:
ModuleNotFoundError: No module named 'confluent_kafka'
I'm not sure why this is happening since I try importing it in a python shell and it works fine.
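One thing worth checking, assuming the build runs a different Python interpreter or virtualenv than the interactive shell: a small sketch that reports which interpreter is active and where it finds the module. The helper name is hypothetical.

```python
import importlib.util
import sys


def describe_module(name: str) -> str:
    """Report the running interpreter and where (if anywhere) it finds `name`."""
    spec = importlib.util.find_spec(name)
    location = spec.origin if spec is not None else "not found"
    return f"{sys.executable}: {name} -> {location}"


# Run this both in the interactive shell and in the environment the build uses,
# then compare the interpreter paths.
print(describe_module("confluent_kafka"))
```

If the two runs print different interpreter paths, the package is probably installed in only one of the environments.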
Can I get some help with this?
cuddly-beach-83988
04/05/2023, 7:14 PM
acryldata/datahub-actions:head
I found the mssql-venv and simply installed pyodbc there, but I still need the driver file, and unfortunately the local datahub
user in the container does not have sudo permissions, so I can't just install it via apt-get.
Sigh. Can anyone please point me in the right direction?
bumpy-musician-39948
04/06/2023, 3:06 AM
bumpy-musician-39948
04/06/2023, 3:06 AM
bland-orange-13353
04/06/2023, 6:28 AM
bland-orange-13353
04/06/2023, 7:36 AM
icy-dentist-82336
04/06/2023, 8:13 AM
rich-dusk-60426
04/06/2023, 9:03 AM
wide-afternoon-79955
04/06/2023, 9:47 AM
fancy-crayon-39356
04/06/2023, 11:00 AM
{
  search(
    input: {
      type: DATASET,
      query: "*",
      start: 0,
      count: 100,
      orFilters: {
        and: [
          {
            field: "urn",
            values: ["urn:li:dataset:(...)", "urn:li:dataset:(...)"],
            condition: EQUAL,
          },
        ]
      }
    }
  ) {
    start
    count
    total
    searchResults {
      entity {
        ... on Dataset {
          urn
          name
          platform {
            urn
          }
          dataPlatformInstance {
            urn
          }
          domain {
            domain {
              urn
            }
          }
          properties {
            description
          }
          ownership {
            owners {
              associatedUrn
            }
          }
          tags {
            tags {
              associatedUrn
            }
          }
          glossaryTerms {
            terms {
              associatedUrn
            }
          }
          schemaMetadata {
            fields {
              description
            }
          }
        }
      }
    }
  }
}
The problem is that it always returns empty results. The same does not happen when I filter by tags, platform, etc. Everything works except filtering by URNs.
Am I doing something wrong here? My intention is to fetch information about ALL the datasets that we have. However, this is not possible due to ES limitations when a query returns more than 10k results (https://github.com/datahub-project/datahub/issues/4575), so my plan is to create batches of URNs and filter on them, one batch at a time.
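The batching plan could be sketched roughly as follows. The helper names and placeholder URNs are hypothetical; the filter shape simply mirrors the `orFilters` block in the query above, and this does not by itself explain the empty results.

```python
from typing import Iterator, List


def batch_urns(urns: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `urns`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(urns), batch_size):
        yield urns[start:start + batch_size]


def urn_filter(batch: List[str]) -> dict:
    """Build the `orFilters` value for one batch, mirroring the query above."""
    return {"and": [{"field": "urn", "values": batch, "condition": "EQUAL"}]}


# Placeholder URNs for illustration only.
urns = [f"urn:li:dataset:example-{i}" for i in range(250)]
for batch in batch_urns(urns, 100):
    # One set of GraphQL variables per request; sending it is left out here.
    variables = {"orFilters": urn_filter(batch), "count": len(batch)}
    print(len(variables["orFilters"]["and"][0]["values"]))
```

Each iteration produces the variables for a single request, so every request stays well under the 10k result limit.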
Appreciate your help!
astonishing-animal-7168
04/06/2023, 12:49 PM
bland-orange-13353
04/06/2023, 1:39 PM
numerous-eve-42142
04/06/2023, 6:32 PM
Failed to execute job 3031 for task pipeline_ingest (No module named 'great_expectations.datasource.sqlalchemy_datasource'; 2234)
Even using
pip install 'acryl-datahub[great-expectations]'==0.9.3
(this is the version of DataHub that I'm currently running)
PS: I'm not using DataHubValidationAction, but without
from datahub.integrations.great_expectations.action import DataHubValidationAction
it does not work. I'm just profiling tables.
Would just upgrading the platform and dependencies work?
cuddly-butcher-39945
04/06/2023, 7:57 PM
astonishing-australia-72492
04/06/2023, 8:45 PM
white-knife-12883
04/06/2023, 8:49 PM
helm --debug pull datahub --repo $'<https://helm.datahubproject.io>' --version 0.2.162 --destination __downloads --untar
Error: chart "datahub" version "0.2.162" not found in <https://helm.datahubproject.io> repository
But I see that version right here: https://github.com/acryldata/datahub-helm/releases/tag/datahub-0.2.162
numerous-byte-87938
04/06/2023, 10:08 PM
astonishing-knife-25309
04/07/2023, 3:23 AM
python3 -m datahub docker quickstart
This is the error that I keep getting in a loop:
C:\Windows\System32>python3 -m datahub docker quickstart
[2023-04-06 22:19:52,724] INFO {datahub.cli.quickstart_versioning:144} - Saved quickstart config to C:\Users\baelf/.datahub/quickstart/quickstart_version_mapping.yaml.
[2023-04-06 22:19:52,725] INFO {datahub.cli.docker_cli:638} - Using quickstart plan: composefile_git_ref='master' docker_tag='head'
[2023-04-06 22:19:52,735] INFO {datahub.cli.docker_cli:656} - compose file name C:\Users\baelf\.datahub\quickstart/docker-compose.yml
[2023-04-06 22:19:52,740] INFO {datahub.cli.docker_cli:840} - Fetching docker-compose file <https://raw.githubusercontent.com/datahub-project/datahub/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml> from GitHub
Pulling docker images...
This may take a while depending on your network bandwidth.
time="2023-04-06T22:19:53-05:00" level=warning msg="The \"HOME\" variable is not set. Defaulting to a blank string."
time="2023-04-06T22:19:53-05:00" level=warning msg="The \"HOME\" variable is not set. Defaulting to a blank string."
Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
Error while pulling images. Going to attempt to move on to docker compose up assuming the images have been built locally
Starting up DataHub...
.time="2023-04-06T22:19:55-05:00" level=warning msg="The \"HOME\" variable is not set. Defaulting to a blank string."
time="2023-04-06T22:19:55-05:00" level=warning msg="The \"HOME\" variable is not set. Defaulting to a blank string."
Error response from daemon: readlink /var/lib/docker/overlay2: invalid argument
microscopic-room-90690
04/07/2023, 6:27 AM
Failed to execute operation
java.lang.UnsupportedOperationException: Only upsert operation is supported
I found that others have hit this problem too but did not find a workaround. Can anyone help?
brave-room-48783
04/07/2023, 9:15 AM
yaml: unmarshal errors:
line 123: mapping key "labels" already defined at line 98
line 157: mapping key "labels" already defined at line 146
line 172: mapping key "labels" already defined at line 160
line 203: mapping key "labels" already defined at line 190
After running the command - python3 -m datahub docker quickstart
Versions I am running - DataHub CLI version: 0.10.1.1
Python version: 3.9.6 (default, Mar 10 2023, 20:16:38)
[Clang 14.0.3 (clang-1403.0.22.14.1)]
Deployment method - Docker
wide-ghost-47822
04/07/2023, 10:35 AM
action_list
contains DataHub configs, and I inspected the traffic with the nmap tool on port 8080, which is the GMS port. I can see that every request gets a response with HTTP status code 200. Here are some examples of the output:
{"proposal": {"entityType": "assertion", "entityUrn": "urn:li:assertion:81f38ff71a20153a0291b5f087e75f55", "changeType": "UPSERT", "aspectName": "assertionInfo", "aspect": {"value": "{\"customProperties\": {\"expectation_suite_name\": \"test_suite_3\"}, \"type\": \"DATASET\", \"datasetAssertion\": {\"dataset\": \"urn:li:dataset:(urn:li:dataPlatform:mysql,<my_table>)\", \"scope\": \"DATASET_COLUMN\", \"fields\": [\"urn:li:schemaField:(urn:li:dataset:(urn:li:dataPlatform:mysql,london.logging_bid,PROD),id)\"], \"aggregation\": \"IDENTITY\", \"operator\": \"NOT_NULL\", \"nativeType\": \"expect_column_values_to_not_be_null\", \"nativeParameters\": {\"column\": \"id\", \"mostly\": \"1.0\"}}}", "contentType": "application/json"}}}
HTTP/1.1 200 OK.
Date: Fri, 07 Apr 2023 07:39:55 GMT.
Content-Type: application/json.
X-RestLi-Protocol-Version: 2.0.0.
Content-Length: 61.
Server: Jetty(9.4.46.v20220331).
.
{"value":"urn:li:assertion:81f38ff71a20153a0291b5f087e75f55"}
{"proposal": {"entityType": "assertion", "entityUrn": "urn:li:assertion:81f38ff71a20153a0291b5f087e75f55", "changeType": "UPSERT", "aspectName": "dataPlatformInstance", "aspect": {"value": "{\"platform\": \"urn:li:dataPlatform:great-expectations\"}", "contentType": "application/json"}}}
POST /aspects?action=ingestProposal HTTP/1.1.
Host: <my-host>:8080.
User-Agent: python-requests/2.26.0.
Accept-Encoding: gzip, deflate.
Accept: */*.
Connection: keep-alive.
X-RestLi-Protocol-Version: 2.0.0.
Content-Type: application/json.
Content-Length: 287.
What I expect now is to see the expectation results in the DataHub UI, in the tab called Validation. Yet it looks disabled.
Could you point me to the likely reason for that?
average-alligator-6750
04/07/2023, 12:27 PM
astonishing-printer-13992
04/07/2023, 1:11 PM
important-night-50346
04/08/2023, 1:25 AM
2023-04-08 01:15:49.352 INFO 1 --- [ main] c.l.m.s.e.indexbuilder.ESIndexBuilder : Created index dataprocessinstanceindex_v2_1680916549224
2023-04-08 01:16:49.405 INFO 1 --- [ main] c.l.m.s.e.indexbuilder.ESIndexBuilder : Task: yLvZIFxaQ8Km_Dg8_0vdHg:464294584 - Reindexing from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1680916549224 in progress...
2023-04-08 01:17:49.441 WARN 1 --- [ main] c.l.m.s.e.indexbuilder.ESIndexBuilder : Task: yLvZIFxaQ8Km_Dg8_0vdHg:464294584 - Document counts do not match 3036994 != 79135. Complete: 2.6057014%
2023-04-08 01:17:50.441 INFO 1 --- [ main] c.l.m.s.e.indexbuilder.ESIndexBuilder : Task: yLvZIFxaQ8Km_Dg8_0vdHg:464294584 - Reindexing from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1680916549224 in progress...
2023-04-08 01:18:50.489 WARN 1 --- [ main] c.l.m.s.e.indexbuilder.ESIndexBuilder : Task: yLvZIFxaQ8Km_Dg8_0vdHg:464294584 - Document counts do not match 3037042 != 123981. Complete: 4.0822945%
2023-04-08 01:18:52.492 INFO 1 --- [ main] c.l.m.s.e.indexbuilder.ESIndexBuilder : Task: yLvZIFxaQ8Km_Dg8_0vdHg:464294584 - Reindexing from dataprocessinstanceindex_v2 to dataprocessinstanceindex_v2_1680916549224 in progress...
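The "Complete" figure in the WARN lines appears to be the reindexed document count divided by the source document count. A minimal sketch that recomputes it from a log line follows. It assumes the first number is the source count and the second is the count in the new index, which matches the index listing; the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches the WARN lines above, e.g.
# "Document counts do not match 3036994 != 79135. Complete: 2.6057014%"
COUNT_PATTERN = re.compile(r"Document counts do not match (\d+) != (\d+)")


def parse_counts(line: str) -> Optional[Tuple[int, int]]:
    """Return (source_docs, reindexed_docs) from a WARN line, or None if absent."""
    match = COUNT_PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def percent_complete(source_docs: int, reindexed_docs: int) -> float:
    """Share of source documents already copied to the new index, in percent."""
    return 100.0 * reindexed_docs / source_docs


line = "Task: yLvZIFxaQ8Km_Dg8_0vdHg:464294584 - Document counts do not match 3036994 != 79135. Complete: 2.6057014%"
counts = parse_counts(line)
if counts is not None:
    print(f"{percent_complete(*counts):.4f}%")
```

Comparing the counts from two WARN lines a known interval apart gives a rough copy rate, and from that an estimate of the remaining time.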
indices:
$ curl -XGET <https://redacted:443/_cat/indices?s=index:asc>
green open .kibana_1 _v7DnFkWTfiNG9bsgVBmbg 1 0 4 3 22.9kb 22.9kb
yellow open .opendistro-job-scheduler-lock gXdKjsz8TmyVxQOZO7JcVA 5 1 58 896 405kb 405kb
green open .opensearch-observability ne6d_vqlS7azmrr7rs8MmA 1 0 0 0 208b 208b
green open .tasks JmDYsKzQSByAOe9_4FnXdw 1 0 2 0 13.8kb 13.8kb
yellow open assertion_assertionruneventaspect_v1 E4Oy5K5LSMuw9sCoBdG3wg 1 1 0 0 208b 208b
yellow open assertionindex_v2_1680916341451 MeSGfdE2TDCT3P8NzLJVow 1 1 0 0 208b 208b
yellow open assertionindex_v2_clone_1680916209740 SZ5x1BmoRb-XQog-ldzVLg 1 1 0 0 208b 208b
yellow open chart_chartusagestatisticsaspect_v1 6sK_wp9qTf6YfbTNHU3cvA 1 1 0 0 208b 208b
yellow open chartindex_v2_1680916508743 FX-Q3t6VQgel_29jwYQwcA 1 1 0 0 208b 208b
yellow open chartindex_v2_clone_1680916215081 EPWNO6eORKmlyvvU4LdzpA 1 1 0 0 208b 208b
yellow open containerindex_v2_1680916217544 GdHa6mUWQJCnNchf8kc5pg 1 1 295 0 445.4kb 445.4kb
yellow open containerindex_v2_clone_1680916205327 aZ9Y6FpuTLq-C0nHUPQLgw 1 1 295 1 240.2kb 240.2kb
yellow open corpgroupindex_v2_1680916298952 c0k6PZlLThGxolcV2jcCuw 1 1 9039 0 33.5mb 33.5mb
yellow open corpgroupindex_v2_clone_1680916207464 lOwXCdCrQba0R52yqQ1UoQ 1 1 9039 3 7mb 7mb
yellow open corpuserindex_v2_1680916385514 J7Pm9vamT7ChRHZH58-szQ 1 1 54570 0 108.2mb 108.2mb
yellow open corpuserindex_v2_clone_1680916212508 sbBpUSPfTPOXGhlIgwSXbA 1 1 54570 1 30.1mb 30.1mb
yellow open dashboard_dashboardusagestatisticsaspect_v1 j4zayejpSwSxbwa_zT7umw 1 1 0 0 208b 208b
yellow open dashboardindex_v2_1680916383282 IUviBU6qSz2p-ixrJ3_a5Q 1 1 0 0 208b 208b
yellow open dashboardindex_v2_clone_1680916211075 6fPmGrUHQKmD3IVaGaNjOw 1 1 0 0 208b 208b
yellow open dataflowindex_v2_1680916447029 moxh5o89Ruugr0Jp4snkpA 1 1 351 0 880.8kb 880.8kb
yellow open dataflowindex_v2_clone_1680916213195 lU3789SoR5q6El2ILp6vew 1 1 351 8 469.6kb 469.6kb
yellow open datahubaccesstokenindex_v2_1680916258241 icRmUSt-TXqbUj1DS6dDEg 1 1 2 0 16.3kb 16.3kb
yellow open datahubaccesstokenindex_v2_clone_1680916206271 6j4VQAnMQ0CVwGoJJYWT1g 1 1 2 0 14.8kb 14.8kb
yellow open datahubexecutionrequestindex_v2_1680916320699 soLegwj1Q4Oi4YeDgcWfVQ 1 1 0 0 208b 208b
yellow open datahubexecutionrequestindex_v2_clone_1680916209016 TPcHVGCORfyxspzgJwlbzw 1 1 0 0 208b 208b
yellow open datahubingestionsourceindex_v2_1680916487702 VSCr-ptJQhOz8at3xv7Hyg 1 1 0 0 208b 208b
yellow open datahubingestionsourceindex_v2_clone_1680916213912 gweGm5jMTIqpVDCihNV3Yg 1 1 0 0 208b 208b
yellow open datahubpolicyindex_v2_1680916238023 BuJtg-s_Sjq-1fp5wAhxRA 1 1 15 0 43kb 43kb
yellow open datahubpolicyindex_v2_clone_1680916205715 8ms_cGW_R8KYrUrcaA1UkQ 1 1 15 2 50.3kb 50.3kb
yellow open datahubretentionindex_v2_1680916385221 -g4aYQilSOiRwLWnJOyhEA 1 1 0 0 208b 208b
yellow open datahubretentionindex_v2_clone_1680916212282 qkmMdoB_T8OojE-KEXwHwQ 1 1 0 0 208b 208b
yellow open datahubroleindex_v2_1680916278714 wPxRPENxQdS46Y-L_eW2AQ 1 1 3 0 11.6kb 11.6kb
yellow open datahubroleindex_v2_clone_1680916207180 3F9I5cySTtaJQcRaGbRdyA 1 1 3 0 7.7kb 7.7kb
yellow open datajob_datahubingestioncheckpointaspect_v1 CHuumlxhTqW2BJbiDk5xhQ 1 1 356 0 6.2mb 6.2mb
yellow open datajob_datahubingestionrunsummaryaspect_v1 pzAyIz42Rbif9dZOnOXmCQ 1 1 0 0 208b 208b
yellow open datajobindex_v2_1680916321205 cmO1uwNJR3O967kXcD_67Q 1 1 3547 26 6.3mb 6.3mb
yellow open datajobindex_v2_clone_1680916209481 j-V2kltTQw-a1Cbssg_Ztw 1 1 3547 469 3.5mb 3.5mb
yellow open dataprocessinstance_dataprocessinstanceruneventaspect_v1 iT2WND21QaWcaBm6DDOA5Q 1 1 6076340 0 1gb 1gb
yellow open dataprocessinstanceindex_v2 MuN9IKTLQgS-YvD2YiheoQ 1 1 3037068 18854 2.5gb 2.5gb
yellow open dataprocessinstanceindex_v2_1680916549224 6jd2Ykf5RzG3tjSGijOyAA 1 1 176529 0 296.5mb 296.5mb
yellow open dataprocessinstanceindex_v2_clone_1680916215556 ZAKuf8iwT7m8t8lysazAyQ 1 1 3036957 19447 2.5gb 2.5gb
yellow open dataset_datasetprofileaspect_v1 pzQ-y-U0TO-ZR70ZNIJ6Pg 1 1 0 0 208b 208b
yellow open dataset_datasetusagestatisticsaspect_v1 UWg2cP78T6uVPNr6ThRloQ 1 1 124827 7465 136.8mb 136.8mb
yellow open dataset_operationaspect_v1 S5Dw0GKlTbGECeGcKqvzlQ 1 1 6758731 1416 1018.3mb 1018.3mb
yellow open datasetindex_v2_1680916508969 S-F_HiBgTXaqQAi8mfK67Q 1 1 6560 0 22.4mb 22.4mb
yellow open datasetindex_v2_clone_1680916215296 nrzWuXqjSoeN0HCMZ5RFgw 1 1 6560 100 17.1mb 17.1mb
yellow open domainindex_v2_1680916487907 4izOUkg8TeiGOvp7ggb8Gg 1 1 20 0 30.8kb 30.8kb
yellow open domainindex_v2_clone_1680916214148 Fj5-lqmZRn62u4y879PAwQ 1 1 20 8 46.3kb 46.3kb
yellow open globalsettingsindex_v2_1680916319428 7svGEVJmT6-n5XvGP-UfOw 1 1 0 0 208b 208b
yellow open globalsettingsindex_v2_clone_1680916207901 -SFPy2AGRMyTBIFY7DI1wQ 1 1 0 0 208b 208b
yellow open glossarynodeindex_v2_1680916467488 O-4-C2RHRCCgz9EBYiIaDA 1 1 1 0 38.4kb 38.4kb
yellow open glossarynodeindex_v2_clone_1680916213672 zx8Yb_ZJSQGzPpW3uzwKLw 1 1 1 0 32.6kb 32.6kb
yellow open glossarytermindex_v2_1680916362427 D2bBNT6FT9-2IrSNssfw0w 1 1 1 0 40.4kb 40.4kb
yellow open glossarytermindex_v2_clone_1680916210607 VrgvdzxvTtmktHCTGq1_6A 1 1 1 0 34.7kb 34.7kb
yellow open graph_service_v1 W_Bcz6tuTwW6nH2NWSZ1uQ 1 1 3079242 15035 604.1mb 604.1mb
yellow open system_metadata_service_v1 -ljiiLMTSeidyP0yqTZ7Bw 1 1 9408521 73010 1.1gb 1.1gb
yellow open tagindex_v2_1680916341913 raqGV84UQAqJ0P3GdkLYtg 1 1 5 0 25.7kb 25.7kb
yellow open tagindex_v2_clone_1680916210179 LdLr2zOWQzy8LnlClzlqtw 1 1 5 0 12.3kb 12.3kb
yellow open telemetryindex_v2_1680916508365 6Nh947jPRLW4RqQcuk_JlA 1 1 0 0 208b 208b
yellow open telemetryindex_v2_clone_1680916214654 yjhPnNPNQuOFjUUBz9jbSA 1 1 0 0 208b 208b
yellow open testindex_v2_1680916445849 BP18OC6lTZezTEjNPQm68Q 1 1 0 0 208b 208b
yellow open testindex_v2_clone_1680916212733 UY6leBs9QDur6ENpOwg1Rg 1 1 0 0 208b 208b
most-room-32003
04/09/2023, 1:52 AM
nice-helmet-40615
04/09/2023, 5:43 PM
source:
  type: openapi
  config:
    name: petstore
    url: <https://petstore.swagger.io/>
    swagger_file: v2/swagger.json
Ingestion from my own DataHub OpenAPI endpoint reports no errors, but nothing is ingested:
https://datahubproject.io/docs/api/openapi/openapi-usage-guide/
source:
  type: openapi
  config:
    name: datahub
    url: <http://localhost:8080/>
    swagger_file: openapi/v3/api-docs
I see the same behavior with the OpenAPI endpoint examples from here:
https://developer.imis.com/docs/imis-rest-api-data-models-and-swagger-json-files
Did I do something wrong, or is it an ingestion bug that should be reported?
billions-journalist-13819
04/10/2023, 7:18 AM