Hi. I am trying to update all descriptions of a da...
# ingestion
s
Hi. I am trying to update all descriptions of a dataset by re-ingesting the table from Hive. But after re-ingestion, the modified descriptions don't change, even with a Status(removed=True) MCE event. How can I use the original table's description?
g
Hey @salmon-cricket-21860 - right now updated descriptions take precedence over source descriptions in the UI
if you want to find the table's source description, you can click the edit icon. in the modal that appears, the source description should appear
what would be your preference? would you prefer to see the original description?
s
I am considering operational cases. For instance:
• A user modified descriptions for table A
• But somehow, multiple original descriptions are incorrect for table A
• So we want to update all of them by re-ingesting
• The user could then modify columns based on the updated original descriptions
The case is unusual, but we are considering all the cases before adopting DataHub for production.
The reason I am asking: some URNs are updated by re-ingesting (e.g., Tags of a Dataset when using a transformation), so I wanted to know which objects are updated and which are not.
g
I see- there is also the case where table A has a bad description in the database
πŸ™†β€β™‚οΈ 1
s
According to your answer:
• Custom descriptions take precedence over original descriptions
• But original descriptions are updated by re-ingestion
I think it will be okay. The user can still check the updated original description.
g
exactly ^ original descriptions are updated by re-ingestion
❤️ 1
s
then the user modifies it in DataHub to be correct
Right. This is the cool part. Thanks!
g
here's an example:
in this case, "Corrected description" is whatever was most recently provided by the UI, and the text under "original:" shows the most recently ingested description from your source table
🙃 1
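The precedence rule described above can be sketched in a few lines of Python. This is only a toy model of the behaviour discussed in this thread, not DataHub's implementation: the function name `displayed_description` and the `state` dictionary are hypothetical, and treating any non-None edit (even a single space) as "present" is an assumption.

```python
from typing import Optional


def displayed_description(edited: Optional[str], original: Optional[str]) -> Optional[str]:
    """Pick the text to show: a UI edit wins over the ingested source value."""
    # Assumption: any edit that exists takes precedence, even a whitespace-only one.
    if edited is not None:
        return edited
    return original


# Hypothetical state for one dataset: the UI edit and the source description
# are kept as two separate values.
state = {"edited": "Corrected description", "original": "old source comment"}

# Simulated re-ingestion: only the source ("original") value is replaced.
state["original"] = "new source comment"

shown = displayed_description(state["edited"], state["original"])
```

Under this model, re-ingesting changes what appears under "original:" but never what is shown as the main description while an edit exists.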
s
Awesome 🙂 Thanks for the screenshot. Will try in our DataHub distribution.
g
great 🙂
s
@green-football-43791 I tested the case I described.
• The user modified a description to 'blank' (single space) (screenshot A)
• The admin re-ingested the table (screenshot B)
But the user-modified description still remains and is treated as the original. (screenshot A) Am I missing something?
g
hmm do you have access to your mysql?
it would be interesting to see whats in there
s
Yes. Please let me know what I need to check.
g
could you select all the rows with urn=<urn of your dataset>?
👀 1
and filter where aspect = "datasetProperties"?
👀 1
we also recently merged a fix to stop caching dataset information locally- if you see the new description & not in mysql you can try clearing your cache and refreshing
if that works, I'd make sure to update to the latest version- this may just be caching
s
mysql> SELECT * FROM metadata_aspect_v2 WHERE urn LIKE '%place_competitor_client_activity_aggr_1d_daily_r0%' AND aspect = 'datasetProperties' LIMIT 10\G
*************************** 1. row ***************************
           urn: urn:li:dataset:(urn:li:dataPlatform:hive,default.t3_seller.place_competitor_client_activity_aggr_1d_daily_r0,PROD)
        aspect: datasetProperties
       version: 0
      metadata: {"description":"업체별 경쟁업체 클라이언트 활동 요약 테이블 (1 일)","tags":[],"customProperties":{"Table Parameters: spark.sql.sources.schema.partCol.0":"p_ymd","Owner:":"hadoop","Table Parameters: spark.sql.sources.schema.part.0":"{\\\"type\\\":\\\"struct\\\",\\\"fields\\\":[{\\\"name\\\":\\\"batch_timestamp\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"배치 작업 시간 (epoch second)\\\"}},{\\\"name\\\":\\\"place_id\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"업체 아이디\\\"}},{\\\"name\\\":\\\"place_type\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"업체 종류 (1 = 모텔, 2 = 호텔, 3 = 펜션, 4 = 게스트하우스)\\\"}},{\\\"name\\\":\\\"filtered_by\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링 단계 (ACTIVITY = 클라이언트 로그 기준 미달, DISTANCE = 거리 기준 미달, NONE = 필터링 안됨)\\\"}},{\\\"name\\\":\\\"competitor_place_types\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링 후 계산에 사용된 (남은) 경쟁 업체의 숙소 타입들\\\"}},{\\\"name\\\":\\\"competitor_count\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링 후 계산에 사용된 (남은) 경쟁 업체 수\\\"}},{\\\"name\\\":\\\"distance\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 거리 통계\\\"}},{\\\"name\\\":\\\"click_count\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 숙소 클릭 수 통계 (업체 리스트에 서)\\\"}},{\\\"name\\\":\\\"view_count\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 숙소 상세 페이지 뷰 수 통계\\\"}},{\\\"name\\\":\\\"impression_count\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 숙소 노출 수 (업체 리스트에서)\\\"}},{\\\"name\\\":\\\"reservation_count\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 클라이언트 로그 기준 예약 수 통계\\\"}},{\\\"name\\\":\\\"reservation_price_sum\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 클라이언트 로그 기준 가격 합 통계\\\"}},{\\\"name\\\":\\\"calculated_ctr\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 클릭율  통계 (클릭 수 / 노출 수)\\\"}},{\\\"name\\\":\\\"calculated_cvrc\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 전환률 통계 (예약 수 / 숙소 클릭 수)\\\"}},{\\\"name\\\":\\\"calculated_cvrv\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 전환률 통계 (예약 수 / 숙소 상세 뷰 수)\\\"}},{\\\"name\\\":\\\"calculated_rpc\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁 업체 클릭당 수익 통계 (예약 가격의 합 / 클릭 수)\\\"}},{\\\"name\\\":\\\"filter_min_click_count\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링에 사용된 최소 클릭 수 (업체 리스트에서)\\\"}},{\\\"name\\\":\\\"filter_min_view_count\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링에 사용된 최소 숙소 상세 뷰 수\\\"}},{\\\"name\\\":\\\"filter_min_impression_count\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링에 사용된 최소 숙소 노출 수 (업체 리스트에서)\\\"}},{\\\"name\\\":\\\"filter_min_reservation_count\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링에 사용된 최소 업체 예약 수 (1 일 동안 발생한 - 숙박 기준 일이 아님.)\\\"}},{\\\"name\\\":\\\"filter_min_location_group_count\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링에 사용된 최소 근접 업체 수\\\"}},{\\\"name\\\":\\\"filter_max_distance_as_meters\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"경쟁업체 필터링에 사용된 최대 거리 (N 미터)\\\"}},{\\\"name\\\":\\\"competitors\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"계산에 사용된 경쟁 업체 place id 의 리스트 (JSON stringified)\\\"}},{\\\"name\\\":\\\"p_ymd\\\",\\\"type\\\":\\\"string\\\",\\\"nullable\\\":true,\\\"metadata\\\":{}}]}","OutputFormat:":"org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat","CreateTime:":"Mon Dec 31 15:36:43 KST 2018","Retention:":"0","Table Parameters: EXTERNAL":"TRUE","Location:":"<s3://data-yanolja-general/data/t3/seller/combined/place_competitor_client_activity_1d_aggr_daily/r0>","Database:":"t3_seller","Table Parameters: numFiles":"924","Table Parameters: numPartitions":"924","Table Parameters: spark.sql.create.version":"2.4.0","Table Parameters: spark.sql.sources.schema.numPartCols":"1","Table Parameters: spark.sql.sources.schema.numParts":"1","Table Parameters: transient_lastDdlTime":"1546238203","Sort Columns:":"[]","Table Parameters: comment":"업체별 경쟁업체 클라이언트 활동 요약 테이블 (1  일)","Num Buckets:":"-1","Bucket Columns:":"[]","Table Parameters: numRows":"0","InputFormat:":"org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat","Table Parameters: totalSize":"8451698341","SerDe Library:":"org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe","Storage Desc Params: serialization.format":"1","Table Type:":"EXTERNAL_TABLE","Compressed:":"No","LastAccessTime:":"UNKNOWN","Table Parameters: rawDataSize":"0"}}
systemmetadata: NULL
     createdon: 2021-07-15 01:04:54.863000
     createdby: urn:li:principal:UNKNOWN
    createdfor: NULL
1 row in set (0.01 sec)
g
hmm interesting-
it says there is just 1 row?
s
Sorry for the large output 😞 The column name is batch_timestamp, the one I modified above.
g
what happened to the row without a description?
s
{\\\"name\\\":\\\"batch_timestamp\\\",\\\"type\\\":\\\"long\\\",\\\"nullable\\\":true,\\\"metadata\\\":{\\\"comment\\\":\\\"배치 작업 시간 (epoch second)\\\"}},
I think the row has the original comment for the description: "배치 작업 시간" ("batch job time")
I will try clearing the cache in the browser as you suggested.
The DataHub UI still shows the updated text in the original description section after clearing the browser cache.
it says there is just 1 row?
Yes
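Rather than reading the dump by eye, the stored comment for one column can be pulled out with a short script. The following is a minimal sketch, assuming the `metadata` value has the shape shown in the output above: a JSON object whose `customProperties` holds the Spark schema as a JSON string under `Table Parameters: spark.sql.sources.schema.part.0`. The helper `column_comment` and the sample row are hypothetical, and the fallback for an extra layer of escaped quotes is a guess based on how the dump is printed. It only inspects this datasetProperties row and says nothing about where the UI reads column descriptions from.

```python
import json

SCHEMA_KEY = "Table Parameters: spark.sql.sources.schema.part.0"


def column_comment(metadata_json: str, column: str):
    """Return the comment stored for `column` in the embedded Spark schema, or None."""
    props = json.loads(metadata_json)["customProperties"]
    raw_schema = props[SCHEMA_KEY]
    try:
        schema = json.loads(raw_schema)
    except json.JSONDecodeError:
        # Assumption: the schema string carries one extra layer of \" escaping.
        schema = json.loads(raw_schema.replace('\\"', '"'))
    for field in schema["fields"]:
        if field["name"] == column:
            return field["metadata"].get("comment")
    return None


# Hypothetical, heavily shortened stand-in for the metadata column of the row above.
sample_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "batch_timestamp",
            "type": "long",
            "nullable": True,
            "metadata": {"comment": "배치 작업 시간 (epoch second)"},
        }
    ],
}
row_metadata = json.dumps(
    {
        "description": "table description",
        "customProperties": {SCHEMA_KEY: json.dumps(sample_schema)},
    }
)

comment = column_comment(row_metadata, "batch_timestamp")
```

With the real row, `row_metadata` would be the string read from the `metadata` column of `metadata_aspect_v2`.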
g
ok- it looks like this may be a bug then
I'll look into this and make sure it's fixed asap- sorry about that!
the original description should show what is in the db, not the updated description in the UI
πŸ™†β€β™‚οΈ 1
s
Thanks for the quick response and for guiding me through the debugging procedure. Please let me know if you need more testing or any information I can provide!
🙃 1
g
For sure! Happy to help.
I think we've got enough for now, but I'll let you know if I need more help.
🙌 1
s
Ah, I forgot to mention the ability to remove a custom description. With that feature, managing all column descriptions would be easier. Currently only "add custom description" is possible (correct me if I'm wrong). Or can we emit an MCE event such as Status(removed=True) for a description?
g
That makes sense! I just filed a feature request for that issue: https://github.com/linkedin/datahub/issues/2895
🙌 1
feel free to add more context on the github issue ^
👀 1