# troubleshoot
b
After upgrading `0.8.23` -> `0.8.32.1` I’ve lost most of my Looker charts (from ±58k to ±1k). I don’t have a deep understanding of what happens when GMS is starting, but I suspect this happened because the container was killed in the middle of bootstrapping due to a misconfiguration of `initialDelaySeconds` on my part. I see around ±58k chart records in MySQL (`aspect = chartKey`) and in the `chartindex_v2` ES index. I’ve tried reingesting, but only 100-200 more charts show up after ingesting the ±58k elements. I suspect that the couple hundred more that appear in the UI are the ones that had any changes. How does one go about fixing something like this?
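One way to put hard numbers on "58k in MySQL vs ~1k in the UI" is to count both stores directly. The sketch below assumes DataHub's MySQL table is `metadata_aspect_v2` with `version = 0` marking the latest aspect, and uses Elasticsearch's `_count` API; the connection details are placeholders you would supply.

```python
import json
import urllib.request

# Assumption: DataHub keeps aspects in MySQL table `metadata_aspect_v2`,
# where version = 0 is the latest version of each aspect.
CHART_KEY_COUNT_SQL = (
    "SELECT COUNT(*) FROM metadata_aspect_v2 "
    "WHERE aspect = 'chartKey' AND version = 0"
)


def count_chart_keys(cursor) -> int:
    """Count latest chartKey aspects with any DB-API 2.0 cursor (e.g. pymysql)."""
    cursor.execute(CHART_KEY_COUNT_SQL)
    (count,) = cursor.fetchone()
    return int(count)


def parse_count_response(body: str) -> int:
    """Pull the document count out of an Elasticsearch `_count` response body."""
    return int(json.loads(body)["count"])


def count_index_docs(es_url: str, index: str) -> int:
    """Call Elasticsearch's `_count` API (GET, no body) on an index or alias."""
    with urllib.request.urlopen(f"{es_url}/{index}/_count") as resp:
        return parse_count_response(resp.read().decode("utf-8"))
```

For example, `count_index_docs("http://elasticsearch:9200", "chartindex_v2")` (hostname hypothetical). If the MySQL count stays near 58k while the index count is far lower, that would suggest the stored metadata survived and only the search index is incomplete.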
d
I think if you reingest the data, it should show up. Can you check in your ingestion log how many entities it ingested? Is stateful ingestion enabled for you?
b
What is stateful ingestion?
d
This is it, but I think it is not available for charts yet: https://datahubproject.io/docs/metadata-ingestion/source_docs/stateful_ingestion/
b
I can’t get the CLI to show me what I want: `datahub ingest list-runs 0 100` returns some random old runs. The `PAGE_SIZE` option seems not to be working, and as far as I can see there is no option to order by creation time. Anyhow, from the number of rows in my ingestion logs I can see that there are a lot more items than the couple of thousand I have in my UI. Besides, when I ingest in my sandbox environment I can see:
while in prod I have (up from 1k since yesterday, when I made the upgrade):
I’ve tried deleting via the CLI as well (`datahub delete --entity_type chart --platform looker`), but it only found 61 entries, which is also a mystery to me.
Regarding stateful ingestion, it doesn’t appear that I have it:
```json
{
  "models": {},
  "versions": {
    "linkedin/datahub": {
      "version": "v0.8.32",
      "commit": "ede6547eff1496a87048e5520f1d9e53e148f72c"
    }
  },
  "managedIngestion": {
    "defaultCliVersion": "0.8.26.6",
    "enabled": true
  },
  "statefulIngestionCapable": true,
  "supportsImpactAnalysis": true,
  "telemetry": {
    "enabledCli": true,
    "enabledIngestion": false
  },
  "retention": "true",
  "noCode": "true"
}
```
Oh sry, it’s actually there: `"statefulIngestionCapable": true`. Need to get my eyes checked out 😅
Found some numbers in the ingestion log too: `'charts_scanned': 66427`, `'records_written': 61087`.
Any ideas what else I can check? I am at my wits' end with this… I can actually access those charts if I type the URN into the URL, but I can’t search for them, and (I assume) they are not included in the count.
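Opening a chart by URN is a key lookup, while search goes through Elasticsearch, so being able to do the first but not the second hints that the index rather than the stored metadata is missing. A small sketch for building those key-based URLs follows; the `/chart/<encoded urn>` UI route, the GMS `/entities/<encoded urn>` resource, the hosts, and the example URN are all assumptions for illustration.

```python
from urllib.parse import quote

# Hypothetical example URN; use real Looker chart URNs from your own metadata.
EXAMPLE_URN = "urn:li:chart:(looker,dashboard_elements.123)"


def ui_chart_url(frontend_base: str, urn: str) -> str:
    """UI entity page for a chart (assumes a `/chart/<encoded urn>` route)."""
    # safe="" so ':', '(', ',' and ')' in the URN are all percent-encoded.
    return f"{frontend_base.rstrip('/')}/chart/{quote(urn, safe='')}"


def gms_entity_url(gms_base: str, urn: str) -> str:
    """GMS lookup by key (assumes a Rest.li `/entities/<encoded urn>` resource)."""
    return f"{gms_base.rstrip('/')}/entities/{quote(urn, safe='')}"
```

Spot-checking a handful of URNs from MySQL this way, and then searching for the same charts in the UI, would help confirm whether the gap is limited to search.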
d
if stateful ingestion is true then it will only ingest the changes since the last run
b
As far as I understand, it’s only `statefulIngestionCapable`. Stateful ingestion needs to be explicitly enabled via recipes and is only available for SQL-based sources: https://datahubproject.io/docs/metadata-ingestion/source_docs/stateful_ingestion/#supported-sources.
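To illustrate the difference between the server being capable and a recipe enabling it, here is a sketch of a recipe with stateful ingestion turned on, written as a Python dict for `datahub.ingestion.run.pipeline.Pipeline`. It uses a MySQL source only as an example of a supported SQL source; the pipeline name, hosts, and the exact `stateful_ingestion` keys are assumptions that may differ by CLI version, so check them against the docs linked above.

```python
def build_stateful_recipe(pipeline_name: str, host_port: str, gms_server: str) -> dict:
    """Recipe with stateful ingestion enabled (keys assumed from the docs; verify per version)."""
    return {
        # A pipeline_name is needed so runs can find the previous run's state.
        "pipeline_name": pipeline_name,
        "source": {
            "type": "mysql",  # example SQL-based source; credentials omitted
            "config": {
                "host_port": host_port,
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                },
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": gms_server}},
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe programmatically (requires the acryl-datahub package)."""
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

The same structure maps one-to-one onto a YAML recipe for `datahub ingest -c`.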
Another thing I found is that I somehow have two chart indices in ES:
```text
green open chartindex_v2_1649329490429                               oAil2RHdQYqvjNP4m-cnpA 1 1  58053     0    22mb  10.8mb
green open chartindex_v2                                             gZbQGeQFQ5eJ1b5vRv-PfQ 1 1   2277   116   1.1mb 525.2kb
```
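The two lines above say the ~58k documents live in the timestamped index, while `chartindex_v2` itself holds only 2277. A sketch that parses default `_cat/indices` output and decodes the suffix is below; treating the 13-digit suffix as epoch milliseconds is an assumption (decoded, it lands on 2022-04-07 UTC, worth comparing with the upgrade time). One possible explanation, not confirmed in this thread: the upgrade reindexed into the timestamped index, and because GMS was killed mid-bootstrap, `chartindex_v2` was left as a concrete index instead of an alias pointing at the new one. `GET _cat/aliases` would show which aliases actually exist.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple, Optional


class IndexStats(NamedTuple):
    name: str
    docs_count: int
    docs_deleted: int


def parse_cat_indices(text: str) -> List[IndexStats]:
    """Parse default `_cat/indices` output (no header row)."""
    rows = []
    for line in text.strip().splitlines():
        # Default columns: health status index uuid pri rep
        # docs.count docs.deleted store.size pri.store.size
        cols = line.split()
        rows.append(IndexStats(cols[2], int(cols[6]), int(cols[7])))
    return rows


def suffix_timestamp(name: str, base: str) -> Optional[datetime]:
    """UTC time from a `<base>_<13-digit epoch millis>` name (assumed format), else None."""
    match = re.fullmatch(re.escape(base) + r"_(\d{13})", name)
    if match is None:
        return None
    millis = int(match.group(1))
    # Integer split avoids float rounding in the millisecond part.
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc) + timedelta(
        milliseconds=millis % 1000
    )


def related_indices(rows: List[IndexStats], base: str) -> List[IndexStats]:
    """Indices named exactly `base` or `base` plus a timestamp suffix."""
    return [
        r for r in rows
        if r.name == base or suffix_timestamp(r.name, base) is not None
    ]


if __name__ == "__main__":
    cat_output = """
green open chartindex_v2_1649329490429 oAil2RHdQYqvjNP4m-cnpA 1 1 58053 0 22mb 10.8mb
green open chartindex_v2 gZbQGeQFQ5eJ1b5vRv-PfQ 1 1 2277 116 1.1mb 525.2kb
"""
    for row in related_indices(parse_cat_indices(cat_output), "chartindex_v2"):
        print(row.name, row.docs_count, suffix_timestamp(row.name, "chartindex_v2"))
```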