thankful-ghost-61888
09/28/2022, 9:02 AM${EXTENDED}
keyword (docs here). From the ingestion logs:
b"2022-09-25 10:05:57,890 ERROR SQL lineage analyzer error 'An Identifier is expected, got Token[value: EXTENDED] instead.' for query: 'SELECT\n"
[2022-09-25 10:05:57,891] {{pod_launcher.py:156}} INFO - b" date_trunc('week',purchase_date) as purchase_date,\n"
[2022-09-25 10:05:57,891] {{pod_launcher.py:156}} INFO - b' user_id,\n'
[2022-09-25 10:05:57,891] {{pod_launcher.py:156}} INFO - b' buyer_country,\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b' count(id) as items_bought_week,\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b' sum(GMV) as gmv_bought_week,\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b' min(user_order_sequence_number) as user_order_sequence_number_minweek,\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b' max(user_order_sequence_number) as user_order_sequence_number_maxweek,\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b" percent_rank() over (partition by date_trunc('week',purchase_date) order by items_bought_week, sum(GMV)) as rank_items_bought,\n"
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b" percent_rank() over (partition by date_trunc('week',purchase_date),buyer_country order by items_bought_week, sum(GMV)) as rank_items_bought_country\n"
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b' FROM (EXTENDED)\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b' GROUP BY 1,2,3\n'
[2022-09-25 10:05:57,892] {{pod_launcher.py:156}} INFO - b'2022-09-25 10:05:57,890 ERROR sql holder not present so cannot get tables\n'
[2022-09-25 10:05:57,894] {{pod_launcher.py:156}} INFO - b'2022-09-25 10:05:57,890 ERROR sql holder not present so cannot get columns\n'
The original lookML sql looks like this:
include: "dt_engine_purchasetransaction.view"
view: dt_buyer_transactions_weekly {
extends: [dt_engine_purchasetransaction]
derived_table: {
sql:
SELECT
date_trunc('week',purchase_date) as purchase_date,
user_id,
buyer_country,
count(id) as items_bought_week,
sum(GMV) as gmv_bought_week,
min(user_order_sequence_number) as user_order_sequence_number_minweek,
max(user_order_sequence_number) as user_order_sequence_number_maxweek,
percent_rank() over (partition by date_trunc('week',purchase_date) order by items_bought_week, sum(GMV)) as rank_items_bought,
percent_rank() over (partition by date_trunc('week',purchase_date),buyer_country order by items_bought_week, sum(GMV)) as rank_items_bought_country
FROM (${EXTENDED})
GROUP BY 1,2,3
;;
}
We’re also seeing an issue with parsing a new line after the FROM
statement:
b"2022-09-25 10:06:10,069 ERROR SQL lineage analyzer error 'An Identifier is expected, got Token[value: \n"
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b"] instead.' for query: 'SELECT\n"
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' __d_a_t_e\n'
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' , userid\n'
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' , query\n'
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' , sum(total_searches) AS total_searches\n'
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' FROM\n'
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' (\n'
[2022-09-25 10:06:10,070] {{pod_launcher.py:156}} INFO - b' (SELECT\n'...
The original syntax looks like this:
view: product_interaction_searches {
derived_table: {
sql_trigger_value: SELECT max(event_date::date) FROM datalake_processed.etl_tracking_searches;;
distribution: "date"
sortkeys: ["date"]
sql:
SELECT
date
, userid
, query
, sum(total_searches) AS total_searches
FROM
(
(SELECT
cast(date as date) AS date
, userid AS userid
, query
, count(*) AS total_searches
FROM datalake_compacted.mixpanel_tracking_search_results_query_action...
Any thoughts about the above?modern-artist-55754
09/28/2022, 9:41 AM(
, this is the open issue in sqllineage
https://github.com/reata/sqllineage/issues/278