# getting-started
c
Hello team, after upgrading to the latest Docker images, I ingested some PowerBI objects through the GMS API, but browsing no longer works from the UI. Ingestion is successful, as I can reach these objects by search. I think I've tried ingestion both with and without BrowsePath. I don't see errors related to browsing in the logs of the UI, GMS, or Elasticsearch. Where should I look next to figure this out? 🤔
GMS logs:
17:12:07.872 [qtp544724190-3515] INFO  c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
17:12:07.875 [pool-9-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 3ms
17:12:07.882 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
17:12:08.359 [qtp544724190-3397] INFO  c.l.m.r.entity.EntityResource - BATCH GET [urn:li:corpuser:datahub]
17:12:08.363 [pool-9-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 4ms
l
@green-football-43791 ^
g
@curved-magazine-23582 you will want to ingest with BrowsePaths; without them, you won't be able to browse to your entity!
Can you share an example snapshot of what you are ingesting? What does your BrowsePath aspect look like?
c
Thanks for the info. Let me grab some example JSON.
Here is one:
{
  "com.linkedin.common.BrowsePaths": {
    "paths": [
      "PROD/powerbi/Samtec - PBI Test Environment/GCT Wafer Yield Report Dataflow Test"
    ]
  }
}
Here is another example, the complete JSON blob for a dataset snapshot:
{
  "snapshot": {
    "urn": "urn:li:dataset:(urn:li:dataPlatform:powerbi,CR - Aws Athena Test3,PROD)",
    "aspects": [
      {
        "com.linkedin.common.Ownership": {
          "owners": [
            {
              "owner": "urn:li:corpuser:dataservices",
              "type": "DATAOWNER"
            }
          ],
          "lastModified": {
            "actor": "urn:li:corpuser:dataservices",
            "time": 1606234836017
          }
        }
      },
      {
        "com.linkedin.dataset.DatasetProperties": {
          "description": "",
          "tags": [],
          "customProperties": {
            "Workspace": "Samtec - PBI Test Environment",
            "PowerBI Dataset Id": "e2eca2da-c365-47a9-b910-318a49936164",
            "Configured By": "tanmay.andalib@samtec.com",
            "Datasource1": "ODBC"
          }
        }
      },
      {
        "com.linkedin.common.Status": {
          "removed": false
        }
      },
      {
        "com.linkedin.schema.SchemaMetadata": {
          "schemaName": "CR - Aws Athena Test3-schema",
          "platform": "urn:li:dataPlatform:powerbi",
          "version": 0,
          "created": {
            "actor": "urn:li:corpuser:dataservices",
            "time": 1606234836017
          },
          "lastModified": {
            "actor": "urn:li:corpuser:dataservices",
            "time": 1606234836017
          },
          "hash": "",
          "platformSchema": {
            "com.linkedin.schema.KafkaSchema": {
              "documentSchema": "{\"type\":\"record\",\"name\":\"CR - Aws Athena Test3-schema\",\"namespace\":\"com.linkedin.dataset\",\"doc\":\"\",\"fields\":[]}"
            }
          },
          "fields": []
        }
      },
      {
        "com.linkedin.common.BrowsePaths": {
          "paths": [
            "PROD/powerbi/Samtec - PBI Test Environment/CR - Aws Athena Test3"
          ]
        }
      }
    ]
  }
}
g
Hmm, that looks promising.
Do you have access to your Elasticsearch index?
Could you pull up the document related to one of those URNs?
I'd be interested to see what is in the browsePaths property of the Elasticsearch document for those entities.
c
Thanks for checking on it! I'm not really familiar with ES. I assume I need to query it through the ES API? Do you have an example of querying a doc by URN?
g
You'll want to use the GET API:
GET <index>/_doc/<doc_id>
The doc_id should be the entity's URN.
To find your index,
GET /_cat/indices
will list your indices. Check the one for datasets (datasetindex_v2 in this thread).
c
Thanks! Will try all that.
curl "http://localhost:9200/datasetindex_v2/_doc/urn:li:dataset:(urn:li:dataPlatform:powerbi,CR - Aws Athena Test3,PROD)" -X GET
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "invalid version format: - AWS ATHENA TEST3,PROD) HTTP/1.1"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "invalid version format: - AWS ATHENA TEST3,PROD) HTTP/1.1"
  },
  "status": 400
}
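The 400 error above is consistent with the spaces in the dataset name reaching Elasticsearch unencoded. In an HTTP/1.1 request line, single spaces separate the method, the request target, and the protocol version, so everything after the first space in the URN gets read as the version. The Python sketch below uses a simplified model of that split (`split_request_line` is a hypothetical helper, not Elasticsearch's actual parser) to reproduce the rejected version string, which the error message shows uppercased, and to show that percent-encoding keeps the target in one piece.

```python
from urllib.parse import quote


def split_request_line(line: str) -> tuple[str, str, str]:
    """Split an HTTP/1.1 request line into (method, target, version).

    Simplified model: the method and target end at the first and second
    space, and everything after that is treated as the version.
    """
    method, target, version = line.split(" ", 2)
    return method, target, version


urn = "urn:li:dataset:(urn:li:dataPlatform:powerbi,CR - Aws Athena Test3,PROD)"

# Raw URN in the path: the spaces inside the dataset name break the request line.
raw_line = f"GET /datasetindex_v2/_doc/{urn} HTTP/1.1"
print(split_request_line(raw_line)[2])  # - Aws Athena Test3,PROD) HTTP/1.1

# Percent-encoding the URN (spaces become %20) keeps the target as one token.
encoded_line = f"GET /datasetindex_v2/_doc/{quote(urn, safe='')} HTTP/1.1"
print(split_request_line(encoded_line)[2])  # HTTP/1.1
```

Encoding only fixes the malformed request; whether the encoded id matches a stored document is a separate question.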
g
hmm
c
Are those spaces in dataset names problematic?
g
Potentially. What if you strip the whitespace?
c
OK, let me try that.
g
Does that curl work for entities that don't have whitespace in their URNs?
c
I found a dataset named 'DRLS', but I guess ES could not find it:
curl "http://localhost:9200/datasetindex_v2/_doc/urn:li:dataset:(urn:li:dataPlatform:powerbi,DRLS,PROD)" -X GET
{
  "_index": "datasetindex_v2",
  "_type": "_doc",
  "_id": "urn:li:dataset:(urn:li:dataPlatform:powerbi,DRLS,PROD)",
  "found": false
}
But searching for 'DRLS' in DataHub works and leads me to the dataset 🤔
g
@curved-magazine-23582 you might need to URL-encode the URN.
Alternatively, you could try the elasticvue Chrome extension.
c
Hmm, but for the 'DRLS' one, which has no space, browsing is still not working: urn:li:dataset:(urn:li:dataPlatform:powerbi,DRLS,PROD)
I assume something else is wrong in addition to the name/URN, no? 🤔
g
I don't think so. I wonder if you need to URL-encode the : and () just on the query side, when you are issuing the REST queries to Elasticsearch.
I agree, you shouldn't need to URL-encode on ingest.
But viewing the documents in Elasticsearch will help us understand what is going wrong with browsing.
c
Ah, gotcha, will try. Thanks!
Still no luck with ES; it doesn't work even for the user datahub. 😞 Does the curl command below look right to you for getting the doc of user 'datahub'?
curl "http://localhost:9200/corpuserindex_v2/_doc/urn%3Ali%3Acorpuser%3Adatahub" -X GET
{
  "_index": "corpuserindex_v2",
  "_type": "_doc",
  "_id": "urn:li:corpuser:datahub",
  "found": false
}
g
Ah OK, I see now.
We already URL-encode the doc id once to make it compatible with Elasticsearch,
so try the GET with the doc id double URL-encoded,
e.g.
corpuserindex_v2/_doc/urn%253Ali%253Acorpuser%253Ajdoe
this worked for me ^
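As explained above, DataHub stores each document under its URL-encoded URN, so a GET through the REST API has to encode that id a second time for the URL path. A minimal Python sketch of building such a URL, assuming `urllib.parse.quote` with `safe=''` matches DataHub's encoding; it does reproduce the `_id` values for 'datahub' and 'DRLS' shown later in this thread, but how spaces are stored was not confirmed here. `es_doc_id`, `es_doc_url`, and the localhost address are illustrative.

```python
from urllib.parse import quote


def es_doc_id(urn: str) -> str:
    """Document _id as DataHub appears to store it: the URN, URL-encoded once.

    Assumption: quote(..., safe="") matches DataHub's encoding for these URNs.
    """
    return quote(urn, safe="")


def es_doc_url(base: str, index: str, urn: str) -> str:
    """GET URL for a document; the stored _id is encoded again for the URL path."""
    return f"{base}/{index}/_doc/{quote(es_doc_id(urn), safe='')}"


base = "http://localhost:9200"  # assumed local Elasticsearch, as in this thread
print(es_doc_url(base, "corpuserindex_v2", "urn:li:corpuser:datahub"))
print(es_doc_url(base, "datasetindex_v2", "urn:li:dataset:(urn:li:dataPlatform:powerbi,DRLS,PROD)"))
```

Note that the parentheses end up as %28 and %29 in the stored `_id`, so they need encoding too; a request that leaves them literal asks for a different `_id`, which may explain a found: false result.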
c
That makes sense. Getting the doc for user datahub works now, but not for datasets. 😞
curl "http://localhost:9200/corpuserindex_v2/_doc/urn%253Ali%253Acorpuser%253Adatahub" -X GET
{
  "_index": "corpuserindex_v2",
  "_type": "_doc",
  "_id": "urn%3Ali%3Acorpuser%3Adatahub",
  "_version": 2,
  "_seq_no": 3,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "urn": "urn:li:corpuser:datahub",
    "skills": [],
    "teams": [],
    "ldap": "datahub",
    "active": true,
    "fullName": "Data Hub",
    "email": "datahub@linkedin.com"
  }
}
curl "http://localhost:9200/datasetindex_v2/_doc/urn%253Ali%253Adataset%253A(urn%253Ali%253AdataPlatform%253Apowerbi%252CDRLS%252CPROD)" -X GET
{
  "_index": "datasetindex_v2",
  "_type": "_doc",
  "_id": "urn%3Ali%3Adataset%3A(urn%3Ali%3AdataPlatform%3Apowerbi%2CDRLS%2CPROD)",
  "found": false
}
g
Hey @curved-magazine-23582, it might be helpful at this point to install the elasticvue Chrome extension.
It's quite a bit easier than navigating Elasticsearch via the REST API.
I'd recommend downloading it, opening the dataset index from the Indexes tab, and seeing what's going on in there.
From the dataset index's page, you can issue a query for *<your entity name>* and see if you can find the document that way.
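If you would rather stay on the REST API, a comparable lookup is a `_search` request against the dataset index. The Python sketch below only builds the request; it assumes that a `query_string` query with leading and trailing wildcards is close enough to what you would type into elasticvue, and `search_request` is a hypothetical helper. Whether it matches depends on how DataHub maps the `name` field, and leading wildcards can be slow on large indices.

```python
import json


def search_request(base: str, index: str, entity_name: str) -> tuple[str, str]:
    """Return (url, JSON body) for a wildcard search on an entity name.

    Hypothetical helper: uses Elasticsearch's query_string query with
    *...* wildcards, similar to typing *<your entity name>* in elasticvue.
    """
    url = f"{base}/{index}/_search"
    body = {"query": {"query_string": {"query": f"*{entity_name}*"}}}
    return url, json.dumps(body)


url, body = search_request("http://localhost:9200", "datasetindex_v2", "DRLS")
print(url)   # http://localhost:9200/datasetindex_v2/_search
print(body)  # {"query": {"query_string": {"query": "*DRLS*"}}}
```

Send the body with a `Content-Type: application/json` header; matching documents come back under `hits.hits`, each with its `_id` and `_source`.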
c
Thanks! Here is the doc for the dataset named 'DRLS':
{
  "_index": "datasetindex_v2",
  "_type": "_doc",
  "_id": "urn%3Ali%3Adataset%3A%28urn%3Ali%3AdataPlatform%3Apowerbi%2CDRLS%2CPROD%29",
  "_version": 8,
  "_seq_no": 6544,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "urn": "urn:li:dataset:(urn:li:dataPlatform:powerbi,DRLS,PROD)",
    "hasOwners": true,
    "owners": [
      "urn:li:corpuser:dataservices"
    ],
    "name": "DRLS",
    "origin": "PROD",
    "platform": "urn:li:dataPlatform:powerbi",
    "hasDescription": true,
    "removed": false,
    "fieldPaths": [],
    "fieldDescriptions": [],
    "fieldTags": [],
    "browsePaths": [
      "PROD/powerbi/Samtec - PBI Test Environment/DRLS"
    ]
  }
}
I also noticed that the datasetindex_v2 index in ES shows a total of 1583 docs, while DataHub only shows 1497 datasets, which would explain why all the datasets I ingested after the upgrade are missing.
Oh, another thing I noticed: in the Analytics section, the object counts seem to be correct and match the counts in ES.
g
Hey @curved-magazine-23582, I have something for you to try!
The issue is that the browse paths are not prefixed with '/'.
Try adding that as a prefix and let me know if it works!
c
Haha, such an easy fix! Thanks for the help!
Another issue with the browse paths I put in was that they were not lowercase. I guess DataHub lowercases every browse path after ingestion, so the difference in casing was also part of the problem. Everything seems to be working now. Thanks again!
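Putting the two fixes from this thread together, here is a minimal Python sketch of normalizing browse paths before building the BrowsePaths aspect: add the leading '/' that was reported missing, and lowercase the path, based on the observation above (a guess from the reporter, not confirmed by the DataHub team in this thread) that DataHub lowercases browse paths after ingestion. `normalize_browse_path` and `browse_paths_aspect` are hypothetical helpers; the aspect layout follows the snapshot JSON posted earlier.

```python
import json


def normalize_browse_path(path: str) -> str:
    """Ensure a leading '/' and lowercase the path.

    Assumption: lowercasing mirrors what DataHub does to browse paths
    after ingestion, per the observation in this thread.
    """
    if not path.startswith("/"):
        path = "/" + path
    return path.lower()


def browse_paths_aspect(paths: list[str]) -> dict:
    """Wrap normalized paths in the BrowsePaths aspect layout used in the snapshot."""
    return {"com.linkedin.common.BrowsePaths": {"paths": [normalize_browse_path(p) for p in paths]}}


aspect = browse_paths_aspect(["PROD/powerbi/Samtec - PBI Test Environment/DRLS"])
print(json.dumps(aspect, indent=2))
```

Normalizing at the point where the snapshot is built keeps every ingestion job consistent, instead of relying on each caller to remember the prefix and casing.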
g
Ahh, great to hear!
We will add something on the backend to make sure browse paths do not need the '/' prefix 🙂