Hello! Does anyone here have BigQuery + Metabase ...
# ingestion
b
Hello! Does anyone here have BigQuery + Metabase lineage working? It would be awesome to hear from even just one person that it works for them. It half worked up until 3 months ago before completely breaking with the latest /ingestion/source/metabase.py change.
πŸ” 1
πŸ“– 1
l
Hey there πŸ‘‹ I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first: βœ… There's a lot of good information on our docs site: www.datahubproject.io/docs, Have you searched there for a solution? βœ… button βœ… It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues? βœ… button Did you find a solution to your issue? ❌ Sorry you weren't able to find a solution. I'm sending you some tips on info you can provide to help the community troubleshoot. Whenever you feel your issue is solved, please react βœ… to your original message to let us know!
a
CC: @gentle-hamburger-31302 for help here- May be a break in the recent update
b
Thanks. Happy to help debug or provide logs.
g
Hi @bland-lighter-26751 Could you please share the log. meanwhile I am setting up Metabase at my end to reproduce the issue.
b
Thanks @gentle-hamburger-31302. Which logs are you looking for? There actually is no errors relating to lineage failures. Let me know if you can reproduce!
g
@bland-lighter-26751 Could you please share DataHub Version where lineage was working ?
@bland-lighter-26751 I debug the metabase, It is urn issue. BigQuery connector is ingesting dataset URN in this format: urnlidataset:(urnlidataPlatform:bigquery,<project-id>.<database-name>.<table>,PROD) where as metabase connector is ingesting BigQuery lineage with format: urnlidataset:(urnlidataPlatform:bigquery,<database-name>.<table>,PROD). The <project-id> is missing in metabase URN generation logic. I am working on fix, will let you know once PR is raised cc: @astonishing-answer-96712
b
Glad to hear it! I am on 0.10.2 and Ingestion CLI 0.10.2.1
Misread your message. It was "working" before the file (metabase.py) I linked in my first message was last changed. I'm not sure the version. However, it has never completely worked. Metabase lineage would show a BQ table, but that BQ table was not properly linked. You can see an example in this thread: https://datahubspace.slack.com/archives/C029A3M079U/p1670873348300789
g
Hi @bland-lighter-26751 PR https://github.com/datahub-project/datahub/pull/8042 is raised
b
Thanks for getting that done!
Hey @gentle-hamburger-31302 can you confirm your fix made it into the latest 0.10.3 release? I just upgraded and ran my Metabase connector and still don't have lineage. Is there anything else I need to do?
a
Hi @bland-lighter-26751 It does have the fix. Could you please share the output of
datahub get urn --urn "<urn of dataset where lineage is missing>"
b
Sure, assuming you just want the first part of the response:
Copy code
{
  "browsePaths": {
    "paths": [
      "/prod/bigquery/study-xxxxx/hygieia"
    ]
  },
  "container": {
    "container": "urn:li:container:615ed490dab9f6ca68d77169976dab29"
  },
  "dataPlatformInstance": {
    "platform": "urn:li:dataPlatform:bigquery"
  },
  "datasetKey": {
    "name": "study-xxxxxx.Hygieia.masterMerchantTable",
    "origin": "PROD",
    "platform": "urn:li:dataPlatform:bigquery"
  },
  "datasetProperties": {
    "created": {
      "time": 1685072612042
    },
    "customProperties": {
      "billable_bytes_active": "11721891",
      "size_in_bytes": "11721891"
    },
    "externalUrl": "<https://console.cloud.google.com/bigquery?project=study-xxxxxx&ws=!1m5!1m4!4m3!1sstudy-xxxxx!2sHygieia!3smasterMerchantTable>",
    "lastModified": {
      "time": 1685072615283
    },
    "name": "masterMerchantTable",
    "qualifiedName": "study-xxxxxx.Hygieia.masterMerchantTable",
    "tags": []
  },
  "deprecation": {
    "actor": "urn:li:corpuser:cbeauchamp",
    "deprecated": false,
    "note": ""
  },
  "domains": {
    "domains": [
      "urn:li:domain:e437b0bf-e0ab-4ed5-8ecc-ee99caffa31c"
    ]
  },
a
@bland-lighter-26751 The lineage is missing from chart to this dataset! correct me if I am wrong ?
b
Yes! I do not have any lineage between BigQuery and Metabase charts still
and I confirmed that I am using CLI v0.10.3
I wonder if I need to do some database mappings in the Metabase recipe? There isn't too much in the docs for that though so not sure
g
@bland-lighter-26751 Could you please share output of
datahub get urn --urn "<chart urn>"
wanted to confirm if dataset urn in chart data is either missing or wrong
b
Copy code
{
  "browsePaths": {
    "paths": [
      "/metabase"
    ]
  },
  "chartInfo": {
    "chartUrl": "<https://metabase.domain.com/card/18>",
    "customProperties": {
      "Dimensions": "MerchantAccountId, BusinessId, MerchantName, DateApproved, LegalName, CurrencyId, Status, Activated, WeightClass, ActivityClass, HubMerchantAccountStatus, Msp, DateMerchantClosed, ClosureReason, Industry, IndustryCategory, IndustryMarket, IndustryId, Street1, Street2, City, Province, Country, PostalCode, Phone, EmailAddress, Unsubscribed, BusinessAge, Users, FavouriteTools, UsersAtSignUp, ApplicationMonthlyVolume, BusinessCategory, ApplicationTransactionVolume, PaymentFrequency, CanAcceptBankPayments, FirstTouchReferral, LastTouchReferral, WantToAcceptBankPayments, LogoUploaded, OrderedEquipment, Completed, OrderedEquipmentOnSignUp, HelcimCardReader, Skipped, HelcimCardReader2ndGen, HelcimStand, ZJ5809ThermalMobilePrinter, DateFirstBatch, DateLastBatch, Gpv, MedianMonthlyGPV, ReachedApplied, Transactions, MedianMonthlyTransactions, ActiveDays, MonthsToReachApplied, DaysToFirstBatch, DaysSinceLastBatch, DaysProcessing, ActiveRatio, Cohort, SuspectedFraud, MeanMonthlyGPV, Portfolio, Processed Cohort ID, Onboarded Cohort ID, DelayedDeliveryDays, TransactionLog β†’ Merchant Account ID, TransactionLog β†’ Transaction ID, TransactionLog β†’ Date Closed, TransactionLog β†’ Date Settled, TransactionLog β†’ Amount, TransactionLog β†’ Entry Point, TransactionLog β†’ Gross Profit, TransactionLog β†’ Entry Mode, TransactionLog β†’ Transaction Type",
      "Filters": "['and', ['=', ['field', 57, None], 'Addictive Ink Gear', 'Kinjo Express Millrise '], ['=', ['field', 93, {'join-alias': 'TransactionLog'}], 8], ['=', ['field', 57, None], 'Addictive Ink Gear']]",
      "Metrics": ""
    },
    "description": "",
    "inputs": [
      {
        "string": "urn:li:dataset:(urn:li:dataPlatform:Study,Hygieia.masterMerchantTable,PROD)"
      }
    ],
    "lastModified": {
      "created": {
        "actor": "urn:li:corpuser:rsekulic@domain.com",
        "time": 1654106342000
      },
      "lastModified": {
        "actor": "urn:li:corpuser:rsekulic@domain.com",
        "time": 1654106342000
      }
    },
    "title": "MasterMerchantTables, Filtered by MerchantName, , and MerchantName",
    "type": "LINE"
  },
  "chartKey": {
    "chartId": "18",
    "dashboardTool": "metabase"
  },
  "dataPlatformInstance": {
    "platform": "urn:li:dataPlatform:metabase"
  },
  "ownership": {
    "lastModified": {
      "actor": "urn:li:corpuser:unknown",
      "time": 0
    },
    "owners": [
      {
        "owner": "urn:li:corpuser:user@domain.com",
        "type": "DATAOWNER"
      }
    ]
  }
}
I think I see the problem
Copy code
(urn:li:dataPlatform:Study,Hygieia.masterMerchantTable,PROD)
should be study-xxxx, right?
g
Copy code
urn:li:dataPlatform:Study
?
What is Study ? not able to find this platform in DataHub doc. If the chart input is bigquery then it should be urnlidataPlatform:bigquery
^ @bland-lighter-26751
b
Sorry, that was from when I was playing around with mappings. It's just an internal name. "string": "urnlidataset:(urnlidataPlatform:bigquery,Hygieia.masterMerchantTable,PROD)" That is what it says now
g
The dataset urn in chart input should be as per above format
Copy code
"inputs": [
      {
        "string": "urn:li:dataset:(urn:li:dataPlatform:Study,Hygieia.masterMerchantTable,PROD)"
      }
Here it is wrong so not going to work
b
Ya Study does not exist anymore
I just re-ran the ingestion and they all say urnlidataset:(urnlidataPlatform:bigquery now
same issue though
g
@bland-lighter-26751 Could you please share the latest output of commands (chart and dataset ) and also share the command you have executed
b
sure
g
fyi... you need to ingest both the bigquery and metabase data for lineage to work
b
Yes, I did re-run both. Here is the metabase chart output
Copy code
datahub@actions:/$ datahub get urn --urn "urn:li:chart:(metabase,18)"
{
  "browsePaths": {
    "paths": [
      "/metabase"
    ]
  },
  "chartInfo": {
    "chartUrl": "<https://metabase.domain.com/card/18>",
    "customProperties": {
      "Dimensions": "MerchantAccountId, BusinessId, MerchantName, DateApproved, LegalName, CurrencyId, Status, Activated, WeightClass, ActivityClass, HubMerchantAccountStatus, Msp, DateMerchantClosed, ClosureReason, Industry, IndustryCategory, IndustryMarket, IndustryId, Street1, Street2, City, Province, Country, PostalCode, Phone, EmailAddress, Unsubscribed, BusinessAge, Users, FavouriteTools, UsersAtSignUp, ApplicationMonthlyVolume, BusinessCategory, ApplicationTransactionVolume, PaymentFrequency, CanAcceptBankPayments, FirstTouchReferral, LastTouchReferral, WantToAcceptBankPayments, LogoUploaded, OrderedEquipment, Completed, OrderedEquipmentOnSignUp, domainCardReader, Skipped, domainCardReader2ndGen, domainStand, ZJ5809ThermalMobilePrinter, DateFirstBatch, DateLastBatch, Gpv, MedianMonthlyGPV, ReachedApplied, Transactions, MedianMonthlyTransactions, ActiveDays, MonthsToReachApplied, DaysToFirstBatch, DaysSinceLastBatch, DaysProcessing, ActiveRatio, Cohort, SuspectedFraud, MeanMonthlyGPV, Portfolio, Processed Cohort ID, Onboarded Cohort ID, DelayedDeliveryDays, TransactionLog \u2192 Merchant Account ID, TransactionLog \u2192 Transaction ID, TransactionLog \u2192 Date Closed, TransactionLog \u2192 Date Settled, TransactionLog \u2192 Amount, TransactionLog \u2192 Entry Point, TransactionLog \u2192 Gross Profit, TransactionLog \u2192 Entry Mode, TransactionLog \u2192 Transaction Type",
      "Filters": "['and', ['=', ['field', 57, None], 'Addictive Ink Gear', 'Kinjo Express Millrise '], ['=', ['field', 93, {'join-alias': 'TransactionLog'}], 8], ['=', ['field', 57, None], 'Addictive Ink Gear']]",
      "Metrics": ""
    },
    "description": "",
    "inputs": [
      {
        "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,Hygieia.masterMerchantTable,PROD)"
      }
    ],
    "lastModified": {
      "created": {
        "actor": "urn:li:corpuser:rsekulic@domain.com",
        "time": 1654106342000
      },
      "lastModified": {
        "actor": "urn:li:corpuser:rsekulic@domain.com",
        "time": 1654106342000
      }
    },
    "title": "MasterMerchantTables, Filtered by MerchantName, , and MerchantName",
    "type": "LINE"
  },
  "chartKey": {
    "chartId": "18",
    "dashboardTool": "metabase"
  },
  "dataPlatformInstance": {
    "platform": "urn:li:dataPlatform:metabase"
  },
  "ownership": {
    "lastModified": {
      "actor": "urn:li:corpuser:unknown",
      "time": 0
    },
    "owners": [
      {
        "owner": "urn:li:corpuser:zvisram@domain.com",
        "type": "DATAOWNER"
      }
    ]
  }
}
datahub@actions:/$
Copy code
datahub get urn --urn "urn:li:dataset:(urn:li:dataPlatform:bigquery,study-11111.Hygieia.masterMerchantTable,PROD)"
{
  "browsePaths": {
    "paths": [
      "/prod/bigquery/study-11111/hygieia"
    ]
  },
  "container": {
    "container": "urn:li:container:615ed490dab9f6ca68d77169976dab29"
  },
  "dataPlatformInstance": {
    "platform": "urn:li:dataPlatform:bigquery"
  },
  "datasetKey": {
    "name": "study-11111.Hygieia.masterMerchantTable",
    "origin": "PROD",
    "platform": "urn:li:dataPlatform:bigquery"
  },
  "datasetProperties": {
    "created": {
      "time": 1685072612042
    },
    "customProperties": {
      "billable_bytes_active": "11721891",
      "size_in_bytes": "11721891"
    },
    "externalUrl": "<https://console.cloud.google.com/bigquery?project=study-11111&ws=!1m5!1m4!4m3!1sstudy-11111!2sHygieia!3smasterMerchantTable>",
    "lastModified": {
      "time": 1685072615283
    },
    "name": "masterMerchantTable",
    "qualifiedName": "study-11111.Hygieia.masterMerchantTable",
    "tags": []
  },
  "deprecation": {
    "actor": "urn:li:corpuser:cbeauchamp",
    "deprecated": false,
    "note": ""
  },
domainactionLog \u2192 Entry Point, TransactionLog \u2192 Gross Profit, TransactionLog \u2192 Entry Mode, TransactionLog \u2192 Transaction Type",
      "Filters": "['and', ['=', ['field', 57, None], 'Addictive Ink Gear', 'Kinjo Express Millrise '], ['=', ['field', 93, {'join-alias': 'TransactionLog'}], 8], ['=', ['field', 57, None], 'Addictive Ink Gear']]",
      "Metrics": ""
    },
    "description": "",
    "inputs": [
      {
        "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,Hygieia.masterMerchantTable,PROD)"
      }
    ],
    "lastModified": {
      "created": {
        "actor": "urn:li:corpuser:rsekulic@domain.com",
        "time": 1654106342000
      },
      "lastModified": {
        "actor": "urn:li:corpuser:rsekulic@domain.com",
        "time": 1654106342000
      }
    },
    "title": "MasterMerchantTables, Filtered by MerchantName, , and MerchantName",
    "type": "LINE"
  },
  "chartKey": {
    "chartId": "18",
    "dashboardTool": "metabase"
  },
  "dataPlatformInstance": {
    "platform": "urn:li:dataPlatform:metabase"
  },
  "ownership": {
    "lastModified": {
      "actor": "urn:li:corpuser:unknown",
      "time": 0
    },
    "owners": [
      {
        "owner": "urn:li:corpuser:zvisram@domain.com",
        "type": "DATAOWNER"
      }
    ]
  }
}
datahub@actions:/$
a
@bland-lighter-26751 both are chart output
b
fixed
a
again
study-11111
is missing from chart intpus urn. Is study-11111 project-id?
b
yes
Just for testing, I made a new chart in Metabase and re-ran ingestion. study-11111 is missing still
a
@bland-lighter-26751 let me verify at my end, just confirm
datahub --version
is pointing to correct version
b
Oh, in the docker container it says
Copy code
datahub@actions:/$ datahub --version
acryl-datahub, version 0.10.0.7
Ingestion report in the UI says this though
Copy code
~~~~ Ingestion Report ~~~~
{
  "cli": {
    "cli_version": "0.10.3",
I just manually updated and datahub --version now says 0.10.3. Let me try ingesting again
g
Ok
b
I still get
Copy code
"string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,Hygieia.masterMerchantTable,PROD)"
g
Ok, will update you soon
b
Thank you for digging into this with me. I'm going to bed now and will be back online in the morning
a
Hi @bland-lighter-26751 It is working at our end. Please find the image and recipe (nothing special in recipe as well)
Copy code
source:
  type: metabase
  config:
    # Coordinates
    connect_uri: <http://localhost:3000>

    # Credentials
    username: "xxxxxxxxxx"
    password: "xxxxxxxxx"
    
sink:
  type: "datahub-rest"
  config:
    server: "<http://localhost:8080>"
inputs of chart:
Copy code
"inputs": [
      {
        "string": "urn:li:dataset:(urn:li:dataPlatform:bigquery,acryl-data.ipl.matches-22,PROD)"
      }
 ],
Could you setup a docker quickstart (datahub docker quickstart ) and try the ingestion on that docker instance. if it doesn't work then We will take a slack call on Monday (fyi.. I am in IST timezone)
c
Uploaded by Mohdsiddique Bagwan
b
Thanks @gentle-hamburger-31302. That's good to see it works on your end. I will do some playing around today
Ok, I setup a dev environment using the latest quickstart images. I see two problems now when checking urns... In Metabase, if the question is generated through the UI (Question), my urn has the same problem as yesterday.
Copy code
datahub get urn --urn "urn:li:chart:(metabase,3558)"
urn:li:dataset:(urn:li:dataPlatform:bigquery,Hygieia.masterMerchantTable,PROD)
In Metabase, if the question is created with custom SQL, my urn DOES have the project-id, but lineage doesn't work. Could that be because everything is lowercase?
Copy code
datahub get urn --urn "urn:li:chart:(metabase,2148)"
urn:li:dataset:(urn:li:dataPlatform:bigquery,study-11111.hygieia.mastermerchanttable,PROD)
g
Could you please go to bigquery dataset and poste its urn
b
urn:li:dataset:(urn:li:dataPlatform:bigquery,study-11111.Hygieia.masterMerchantTable,PROD)
a
You need to use flag convert_urns_to_lowercase with bigquery
It is urn case issue
b
Oh ok let me give that a shot right now!
That worked in my dev environment!
So, now I have lineage on SQL questions in Metabase. I still do not have lineage on charts created with the UI because the ingestion creates those without the project-id
urn:li:dataset:(urn:li:dataPlatform:bigquery,Hygieia.masterMerchantTable,PROD)
g
Could you please share steps to create such chart so that I can reproduce it my end
b
Sure, you just click new question (clicking Native Query makes lineage work cause you can type the project-id in the query), then raw data, pick the data source (in my case BQ is called Study), then create a visualization and save
Important note for anyone else that comes across this. Setting convert_urns_to_lowercase will wipe the metadata associated with your BigQuery assets. I ended up having to do a restore on elastic and mysql
a
Hi @bland-lighter-26751 I identified the root cause of broken lineage while using raw sql query, if project id/schema/table has hyphen i.e - then sqllineage==1.3.6 doesn't work. Try with project name that doesn't have hyphen in name. I am discussing with team for ideal solution to this problem cc: @dazzling-judge-80093
b
I currently don't have a project without a hyphen. but regardless, if I have to change BigQuery to use
convert_urns_to_lowercase
then I can't do anything without having all the metadata wiped..
a
If stateful ingestion is enabled, it soft-deletes the records that it doesn’t see again in the latest ingestion. This is the same if you would run a delete command -> https://datahubproject.io/docs/how/delete-metadata/#soft-delete-an-entity-default If you run your ingestion from the UI then you can revert an ingestion by hitting the
Rollback
button next to the last ingestion run:
c
Uploaded by Tamas Nemeth (Acryl)
b
Hey Tamas, I'm not sure I'm understanding correctly. If I want lineage to work in some cases (different types of Metabase dashboards ingest differently), I will need to convert urns to lowercase forever. So while I could rollback, I would want them to be lowercase going forward so that some lineage appears. Are you saying I could ingest as lowercase, rollback, then continue to ingest as lowercase and my "old" asset metadata would still show up?
a
sorry, I was replying to your
Important note
b
Ahh, gotcha.
Is there a way to convert my urns to lowercase by just updating the existing assets? so I don't lose metadata
my plan would be to convert all existing BigQuery URNs to lowercase, then add
convert_urns_to_lowercase
to my existing BQ recipes.
a
@bland-lighter-26751 I believe you need to do urn migration using a shell script. @gray-shoe-75895 do we have such shell script for urn migration ?
b
That would be awesome to get a script. Did some playing around with
metabase.py
in my dev environment today and if I hard-code my project for
source_db
and have everything in lowercase it should all work... haha 🀞
Once the lowercase URN conversion is done, these are the changes I will implement to make it all work.
Copy code
source_tables.add(
    f"{source_db + '.' if source_db else 'hardcoded-hyphen-project.'}{source_schema.lower()}.{source_table.lower()}"
)
Because the Metabase connector returns different casing depending on the Metabase question type, I will also convert everything to lowercase. Edit - actually first I am going to find where the metabase connector does the lowercase conversion half the time and just remove it, then maybe I won't need to do a URN conversion Edit 2 - It's the LineageRunner in the
else
for
get_datasource_urn()
that is doing the conversion to lowercase
Made an issue here trying to explain the bug: https://github.com/datahub-project/datahub/issues/8814 Would still love the urn migration script if it exists. Thanks
a
Thanks for the detailed bug report! We'll look into it One thing I wanted to flag - we have a new SQL parser that automatically looks at the tables you have in datahub and tries to guess the correct casing accordingly. We haven't rolled it out to metabase yet, but I do wonder if we used that instead of LineageRunner these problems might just get resolved
b
Oh interesting. If you ever want me to test out the new parser let me know!
s
+1!