# ingestion
b
Hi, I’m trying to ingest dashboard data from Looker and it seems like GMS chokes on UTF-8 characters in dashboard titles:
Sink (datahub-rest) report:
{'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
               'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                        'message': "javax.persistence.PersistenceException: Error[(conn=2227952) Incorrect string value: '\\xC4\\x97l pi...' for "
                                   "column 'metadata' at row 1]",
Is this something that can be easily fixed, or are UTF-8 characters just not supported?
b
@early-lamp-41924 I remember we'd seen this before. We had to change a DB setting, maybe this didn't make it to OSS?
e
No 😞 Adding now
In the meantime, if you have a MySQL client of choice, run the following command:
ALTER TABLE metadata_aspect_v2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
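For context on why the conversion helps: the bytes quoted in the error (`\xC4\x97l pi...`) are ordinary UTF-8 text that a latin1 column cannot represent. The sketch below illustrates this in Python. The `fits` helper is hypothetical, and Python's `latin-1` and `cp1252` codecs are only stand-ins for MySQL's latin1 character set, so treat it as an approximation rather than a reproduction of what the database does.

```python
# Bytes quoted in the MySQL error message: '\xC4\x97l pi...'
raw = b"\xc4\x97l pi"

# Decoded as UTF-8, they are plain text starting with U+0117 (e with dot above).
text = raw.decode("utf-8")
print(text)


def fits(value: str, codec: str) -> bool:
    """Return True if every character of value can be encoded with codec."""
    try:
        value.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Stand-ins for MySQL's latin1 (assumption: neither codec has U+0117).
print("latin-1:", fits(text, "latin-1"))
print("cp1252: ", fits(text, "cp1252"))
print("utf-8:  ", fits(text, "utf-8"))

# Emoji need four bytes in UTF-8, which is why utf8mb4 is the target
# rather than MySQL's three-byte "utf8" character set.
print(len("\U0001F61E".encode("utf-8")))
```

The same character that fails under the single-byte codecs encodes cleanly as UTF-8, which matches the "Incorrect string value" error going away once the table is converted.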
b
Thanks Dexter 🙂
m
@bumpy-activity-74405: the latest release of DataHub (0.8.9) fixes this issue.
👍 1
g
I am running DataHub version 0.10.3 and am facing the same issue:
mysql> use datahub
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables
    -> ;
+--------------------+
| Tables_in_datahub  |
+--------------------+
| metadata_aspect_v2 |
| metadata_index     |
+--------------------+
2 rows in set (0.37 sec)

mysql> describe metadata_aspect_v2;
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| urn            | varchar(500) | NO   | PRI | NULL    |       |
| aspect         | varchar(200) | NO   | PRI | NULL    |       |
| version        | bigint(20)   | NO   | PRI | NULL    |       |
| metadata       | longtext     | NO   |     | NULL    |       |
| systemmetadata | longtext     | YES  |     | NULL    |       |
| createdon      | datetime(6)  | NO   |     | NULL    |       |
| createdby      | varchar(255) | NO   |     | NULL    |       |
| createdfor     | varchar(255) | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
8 rows in set (0.37 sec)

mysql> SELECT default_character_set_name FROM information_schema.SCHEMATA WHERE schema_name = "datahub"
    -> ;
+----------------------------+
| default_character_set_name |
+----------------------------+
| latin1                     |
+----------------------------+
1 row in set (0.34 sec)

mysql> SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "datahub" AND T.table_name = "metadata_aspect_v2";
+--------------------+
| character_set_name |
+--------------------+
| latin1             |
+--------------------+
1 row in set (0.27 sec)

mysql>
Running
ALTER TABLE metadata_aspect_v2 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
seems to help, but now I'm seeing another issue:
/usr/local/bin/ingestion_common.sh: line 56:   113 Killed                  pip install -r $req_file
It was fixed after increasing the pod memory limit.
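For anyone hitting the same error, the check-then-convert steps from this thread could be scripted roughly as follows. This is only a sketch: `alter_statements` is a hypothetical helper, and the input is assumed to be (table name, character set) pairs such as an `information_schema` query would return. The thread only confirms the conversion for `metadata_aspect_v2`, so review any generated statement before running it.

```python
TARGET_CHARSET = "utf8mb4"
TARGET_COLLATION = "utf8mb4_unicode_ci"


def alter_statements(rows):
    """Build ALTER TABLE statements for tables not yet on utf8mb4.

    rows: iterable of (table_name, character_set_name) pairs (assumed shape).
    """
    statements = []
    for table, charset in rows:
        if charset != TARGET_CHARSET:
            statements.append(
                f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {TARGET_CHARSET} "
                f"COLLATE {TARGET_COLLATION};"
            )
    return statements


# Example input mirroring the latin1 result shown in the session above.
for stmt in alter_statements([("metadata_aspect_v2", "latin1")]):
    print(stmt)
```

Tables already on utf8mb4 produce no statement, so the helper is safe to re-run after a conversion.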