# troubleshoot
t
Hi team! I installed an ES analyzer in the Docker container and restarted it. Then I updated a field's description and searched for part of the text I had just added, but the table whose field I updated did not appear in the search results. To locate the problem, I tried searching the ES indices directly, following the steps in this issue: https://github.com/datahub-project/datahub/issues/1772 But where is the datasetdocument index? Here is the command I ran:
curl -X GET --location "http://192.168.25.133:9200/_cat/indices?v=&pretty="
And here is the result.
health status index                                                    uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   dataset_datasetprofileaspect_v1                          hv4oWE6YSUSJdbcMBLU_DA   1   1          0            0       208b           208b
yellow open   datajobindex_v2                                          1S-T1Y5jQziyqQJe61DYGA   1   1          0            0       208b           208b
yellow open   datahubexecutionrequestindex_v2                          QIRguaTMRnCrR1FnjLt_pg   1   1          0            0       208b           208b
yellow open   datahubsecretindex_v2                                    7ifKdzCVRfSY5fVgDyGlwA   1   1          0            0       208b           208b
yellow open   mlmodelindex_v2                                          JQOJjEpzSUWka9Cz4HNdRA   1   1          0            0       208b           208b
yellow open   dataflowindex_v2                                         TsEzAXk8RNK4i4EaPwpXsw   1   1          0            0       208b           208b
yellow open   mlmodelgroupindex_v2                                     TsxHmhO6SDauKjuLAt5LkA   1   1          0            0       208b           208b
yellow open   datahubpolicyindex_v2                                    V4vqL__2RTuYWpZUeyjVxg   1   1          5            0     10.9kb         10.9kb
yellow open   assertionindex_v2                                        oe7-AhYMTZyYYCHAwzzLuQ   1   1          0            0       208b           208b
yellow open   corpuserindex_v2                                         ERzx5nyjSRq9rStuV8v4JA   1   1          0            0       208b           208b
yellow open   dataprocessindex_v2                                      jM3LMbXoTl-nCNeArvZyTA   1   1          0            0       208b           208b
yellow open   chartindex_v2                                            1NC6UgJBSBWzRG-K7MdpOA   1   1          0            0       208b           208b
yellow open   tagindex_v2                                              kCEw1rUhTN2tn-FQG4_7Ng   1   1          0            0       208b           208b
yellow open   mlmodeldeploymentindex_v2                                KaMP2khtSRKk3wCt2gxp-Q   1   1          0            0       208b           208b
yellow open   datajob_datahubingestioncheckpointaspect_v1              1majN12nTCqADYYm8L62wQ   1   1          0            0       208b           208b
yellow open   dataplatforminstanceindex_v2                             jhg10SLCSUa6YUaDRJLBow   1   1          0            0       208b           208b
yellow open   dashboardindex_v2                                        msE7SGasQgmmedPIb2ZBMw   1   1          0            0       208b           208b
yellow open   assertion_assertionruneventaspect_v1                     0LUTkuT8Rc6Cwcn1cqWPrw   1   1          0            0       208b           208b
yellow open   telemetryindex_v2                                        7ZPOU6smSvGNuj_Wk4c2NA   1   1          0            0       208b           208b
yellow open   datasetindex_v2                                          c68bjkNARQGah2SPO8qfXQ   1   1        109            2    205.9kb        205.9kb
yellow open   mlfeatureindex_v2                                        Bx-cGds6S--LP7iHv9lYRA   1   1          0            0       208b           208b
yellow open   datajob_datahubingestionrunsummaryaspect_v1              8EQMHCrCQTmw7Ez3qW9w_w   1   1          0            0       208b           208b
yellow open   dataplatformindex_v2                                     fvzlZxDATlK22B_1wQJBCw   1   1          0            0       208b           208b
yellow open   dataprocessinstanceindex_v2                              GWILAn0GSCCsiYAeXsn27w   1   1          0            0       208b           208b
yellow open   glossarynodeindex_v2                                     9ip6m8sjQtuZh38eFGZriQ   1   1          0            0       208b           208b
yellow open   datahubingestionsourceindex_v2                           ys5TcFZNQIeCmcAX8V0HLQ   1   1          0            0       208b           208b
yellow open   datahubretentionindex_v2                                 9g_c0oXaQ6CivDjVccoWdA   1   1          0            0       208b           208b
yellow open   graph_service_v1                                         sVrfS2KkQ5yRxnafSMPMzQ   1   1        112            0       28kb           28kb
yellow open   dataprocessinstance_dataprocessinstanceruneventaspect_v1 NuJ6YSr8Re22tq6NFF_dlQ   1   1          0            0       208b           208b
yellow open   system_metadata_service_v1                               jRmFnMePTnCgWYk6hthcIQ   1   1        908            5    102.4kb        102.4kb
yellow open   dataset_operationaspect_v1                               OVTWSBX6Q7eTwJhmZ-H1jA   1   1          0            0       208b           208b
yellow open   datahubaccesstokenindex_v2                               Tq_HSa-3QnePbcEa0YKmvQ   1   1          0            0       208b           208b
yellow open   containerindex_v2                                        gMq55jCbSmCsgxdcLfl8qQ   1   1          4            0     11.5kb         11.5kb
yellow open   schemafieldindex_v2                                      QN2r3l3xROO-1Cvr_pIG-w   1   1          0            0       208b           208b
yellow open   domainindex_v2                                           KMF7IFgeTvGZSAmymuOpmA   1   1          0            0       208b           208b
yellow open   testindex_v2                                             oosgqWP4SI6xg3mhzkLJcQ   1   1          0            0       208b           208b
yellow open   mlfeaturetableindex_v2                                   8CRhft62SyyqJlTHFEe0GA   1   1          0            0       208b           208b
yellow open   notebookindex_v2                                         4ixEGqMUQrKL3D9P6-Bx6w   1   1          0            0       208b           208b
yellow open   glossarytermindex_v2                                     0GdVW9OWTi2hujJ6yOI1Ow   1   1          0            0       208b           208b
yellow open   mlprimarykeyindex_v2                                     EsDswNx_TriyDGB5qDQkCg   1   1          0            0       208b           208b
yellow open   .ds-datahub_usage_event-2022.06.06-000001                9ESN2DuARruqm1ECipf3sQ   1   1        278            0    217.6kb        217.6kb
yellow open   corpgroupindex_v2                                        JLS6yZtRQPmIxQvyOhy7TA   1   1          0            0       208b           208b
yellow open   dataset_datasetusagestatisticsaspect_v1                  k1HmML_1SDiEivM5BiOHmA   1   1          0            0       208b           208b
I'm completely at a loss for what to do next. Could you kindly provide any suggestions?
b
Hi @thankful-magazine-50386 - this is the index you want to look at: datasetindex_v2
Please open up this index and see what's inside. We use this Chrome plugin to make this easy!
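For readers without the browser plugin, the same inspection can be done over the REST API. This is a minimal sketch, assuming Elasticsearch is reachable at the host and port from the earlier `curl` command. The helper name `build_match_all_request` is hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_match_all_request(base_url: str, index: str, size: int = 5) -> urllib.request.Request:
    """Build a _search request that returns the first `size` documents of `index`."""
    body = {"query": {"match_all": {}}, "size": size}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Host and port are taken from the curl command earlier in the thread (assumption:
# the same node serves datasetindex_v2); adjust for your deployment.
request = build_match_all_request("http://192.168.25.133:9200", "datasetindex_v2")
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send it; the `_source` of each entry under `hits.hits` in the response holds the indexed document.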
t
Thanks @big-carpet-38439, it's really a fantastic plugin. I'll give it a try.
Hi @big-carpet-38439, I installed a third-party analyzer and updated the mapping of datasetindex_v2. I am able to get a hit by searching the index directly. Here are the new mapping properties I used:
{
  "properties": {
    "editedFieldDescriptions": {
      "type": "keyword",
      "normalizer": "keyword_normalizer",
      "fields": {
        "chinese": {
          "type": "text",
          "analyzer": "ik_max_word"
        },
        "delimited": {
          "type": "text",
          "analyzer": "word_delimited"
        },
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
Below is the command I executed to query:
GET 192.168.21.6:9800/datasetindex_v2/_search
Content-Type: application/json

{
  "query": {
    "query_string": {
      "query": "风场"
    }
  },
  "size": 100,
  "from": 0,
  "sort": []
}
And here is the result:
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "datasetindex_v2",
        "_type": "_doc",
        "_id": "urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Atrino%2Cdelta.presto.ads_t_work_ticket%2CPROD%29",
        "_score": 0.5753642,
        "_source": {
          "urn": "urn:li:dataset:(urn:li:dataPlatform:trino,delta.presto.ads_t_work_ticket,PROD)",
          "hasContainer": true,
          "container": "urn:li:container:8c10da9c9f51714c5b4591f36ff2e709",
          "origin": "PROD",
          "id": "delta.presto.ads_t_work_ticket",
          "browsePaths": [
            "/prod/trino/delta/presto/ads_t_work_ticket"
          ],
          "platform": "urn:li:dataPlatform:trino",
          "removed": false,
          "customProperties": [],
          "name": "ads_t_work_ticket",
          "hasDescription": false,
          "fieldPaths": [
            "fcid",
            "bmmc",
            "added_ticket_count",
            "pending_ticket_count",
            "wind_field_code",
            "create_day",
            "data_year",
            "create_time",
            "effect_date",
            "expire_date"
          ],
          "fieldGlossaryTerms": [],
          "fieldDescriptions": [],
          "fieldTags": [],
          "typeNames": [
            "table"
          ],
          "editedFieldGlossaryTerms": [],
          "editedFieldDescriptions": [
            "风场代码"
          ],
          "editedFieldTags": []
        }
      }
    ]
  }
}
However, when I searched in DataHub, I got nothing. Could you kindly help me out? Is there any parameter I need to change?
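By default, a `query_string` query without `default_field` searches every eligible field, so the hit above does not show which field matched. A sketch for narrowing that down, assuming the subfield names from the mapping earlier in the thread. `build_subfield_queries` is a hypothetical helper, and whether DataHub's own search queries the `chinese` subfield is not something this thread confirms.

```python
import json

FIELD = "editedFieldDescriptions"
SUBFIELDS = ["chinese", "delimited", "keyword"]  # names from the mapping in the thread


def build_subfield_queries(text: str) -> dict:
    """Return one `match` query body per subfield, keyed by the full field path."""
    queries = {}
    for sub in SUBFIELDS:
        path = f"{FIELD}.{sub}"
        queries[path] = {"query": {"match": {path: text}}, "size": 10}
    return queries


queries = build_subfield_queries("风场")
for path, body in queries.items():
    # Each body would be sent to /datasetindex_v2/_search; compare the hit counts.
    print(path, json.dumps(body, ensure_ascii=False))
```

If only the `chinese` path returns the document, the direct search succeeded through the new subfield, and any search that does not query that subfield would miss it.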
@big-carpet-38439 I noticed the text analyzer was set to "word_delimited" in SearchQueryBuilder.java. Could that be a reason?
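One way to test that hypothesis is Elasticsearch's `_analyze` API, which returns the tokens an analyzer produces for a given text. The sketch below only builds the request bodies. It assumes both analyzers are defined on `datasetindex_v2`, as the mapping suggests, and `build_analyze_body` is a hypothetical helper.

```python
import json


def build_analyze_body(analyzer: str, text: str) -> str:
    """Build the JSON body for POST /datasetindex_v2/_analyze."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


# Analyzer names come from the mapping earlier in the thread.
bodies = {
    name: build_analyze_body(name, "风场代码")
    for name in ("ik_max_word", "word_delimited")
}
for name, body in bodies.items():
    print(name, body)
```

If `word_delimited` returns `风场代码` as a single token while `ik_max_word` also emits `风场`, a query for `风场` can only match through the `chinese` subfield.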
@little-megabyte-1074 Sorry to bother. But could you kindly help me out with this?
b
@thankful-magazine-50386 You are searching on description, right? For descriptions we only do word-delimited matching, to avoid blowing up the index completely.
If you search for the full word in the description, does it work?
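The difference between whole-word and partial matching can be illustrated with a toy model. This is not DataHub's actual `word_delimited` analyzer definition, only an assumed simplification: text is split on delimiters, and a query matches only when each of its tokens equals an indexed token. The names `toy_tokens` and `toy_match` are hypothetical.

```python
import re


def toy_tokens(text: str) -> set:
    """Lowercase, then split on any run of non-word characters or underscores."""
    return {t for t in re.split(r"[\W_]+", text.lower()) if t}


def toy_match(query: str, indexed: str) -> bool:
    """A query matches only when every query token equals an indexed token."""
    q = toy_tokens(query)
    return bool(q) and q <= toy_tokens(indexed)


print(toy_match("ticket", "ads_t_work_ticket"))  # True: "ticket" is a whole token
print(toy_match("风场", "风场代码"))              # False: no delimiter inside "风场代码"
print(toy_match("风场代码", "风场代码"))          # True: the full word is one token
```

Under this model, Chinese text with no delimiters is indexed as one token, so only the full word matches.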
t
Hi @big-carpet-38439, thanks for the reply. Yes, I'm searching for description. Searching for the full word description worked.