# ingestion
m
Hi, how do I delete all data in production, including all entities like datasets and tags? Thanks
I noticed there is a `datahub delete` command, but can it only delete datasets? And does it have a limit of 10000?
o
If you just want a full wipe, you can drop the database and the indices.
m
@orange-night-91387 I deleted the content of the MySQL database:
```
mysql> drop table metadata_aspect_v2
    -> ;
Query OK, 0 rows affected (0.01 sec)

mysql> drop table metadata_index;
Query OK, 0 rows affected (0.00 sec)

mysql> -- create metadata aspect table
mysql> create table if not exists metadata_aspect_v2 (
    ->   urn                           varchar(500) not null,
    ->   aspect                        varchar(200) not null,
    ->   version                       bigint(20) not null,
    ->   metadata                      longtext not null,
    ->   systemmetadata                longtext,
    ->   createdon                     datetime(6) not null,
    ->   createdby                     varchar(255) not null,
    ->   createdfor                    varchar(255),
    ->   constraint pk_metadata_aspect_v2 primary key (urn,aspect,version)
    -> );
Query OK, 0 rows affected (0.01 sec)

mysql>
mysql> -- create default records for datahub user if not exists
mysql> CREATE TABLE temp_metadata_aspect_v2 LIKE metadata_aspect_v2;
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO temp_metadata_aspect_v2 (urn, aspect, version, metadata, createdon, createdby) VALUES(
    ->   'urn:li:corpuser:datahub',
    ->   'corpUserInfo',
    ->   0,
    ->   '{"displayName":"Data Hub","active":true,"fullName":"Data Hub","email":"datahub@linkedin.com"}',
    ->   now(),
    ->   'urn:li:corpuser:__datahub_system'
    -> ), (
    ->   'urn:li:corpuser:datahub',
    ->   'corpUserEditableInfo',
    ->   0,
    ->   '{"skills":[],"teams":[],"pictureLink":"https://raw.githubusercontent.com/linkedin/datahub/master/datahub-web-react/src/images/default_avatar.png"}',
    ->   now(),
    ->   'urn:li:corpuser:__datahub_system'
    -> );
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> -- only add default records if metadata_aspect is empty
mysql> INSERT INTO metadata_aspect_v2
    -> SELECT * FROM temp_metadata_aspect_v2
    -> WHERE NOT EXISTS (SELECT * from metadata_aspect_v2);
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> DROP TABLE temp_metadata_aspect_v2;
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql> -- create metadata index table
mysql> CREATE TABLE IF NOT EXISTS metadata_index (
    ->  `id` BIGINT NOT NULL AUTO_INCREMENT,
    ->  `urn` VARCHAR(200) NOT NULL,
    ->  `aspect` VARCHAR(150) NOT NULL,
    ->  `path` VARCHAR(150) NOT NULL,
    ->  `longVal` BIGINT,
    ->  `stringVal` VARCHAR(200),
    ->  `doubleVal` DOUBLE,
    ->  CONSTRAINT id_pk PRIMARY KEY (id),
    ->  INDEX longIndex (`urn`,`aspect`,`path`,`longVal`),
    ->  INDEX stringIndex (`urn`,`aspect`,`path`,`stringVal`),
    ->  INDEX doubleIndex (`urn`,`aspect`,`path`,`doubleVal`)
    -> );
Query OK, 0 rows affected (0.00 sec)
```
I can still see content in the UI, though. I guess some data is stored somewhere other than MySQL?
o
Yeah, you'll need to drop the Elasticsearch indices as well.
m
Do you know how to do that? 😅 Thanks
o
If you have access to the Elasticsearch REST API, you can get the names of all your indices with the cat indices API (https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-indices.html) and then delete them with the delete index API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html).
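A minimal Python sketch of those two steps, using only the standard library. It assumes an unauthenticated Elasticsearch cluster at `http://localhost:9200`; `ES_URL`, `list_indices`, `select_indices`, `delete_index`, `wipe_indices` and the `prefix` filter are illustrative names, not part of DataHub or Elasticsearch. It skips dot-prefixed names, which Elasticsearch conventionally uses for system and hidden indices. Keep in mind that deleting an index removes it entirely, so DataHub's indices have to be recreated (for example with elasticsearch-setup) before ingesting again.

```python
import json
import urllib.request

# Assumption: an unauthenticated Elasticsearch cluster at this address.
ES_URL = "http://localhost:9200"


def list_indices(es_url: str = ES_URL) -> list[str]:
    """Return all index names via the cat indices API (GET /_cat/indices?format=json)."""
    with urllib.request.urlopen(f"{es_url}/_cat/indices?format=json") as resp:
        return [row["index"] for row in json.load(resp)]


def select_indices(names: list[str], prefix: str = "") -> list[str]:
    """Keep names that do not start with '.' and do start with the (hypothetical) prefix filter."""
    return sorted(n for n in names if not n.startswith(".") and n.startswith(prefix))


def delete_index(name: str, es_url: str = ES_URL) -> int:
    """Delete one index via the delete index API (DELETE /<index>) and return the HTTP status."""
    req = urllib.request.Request(f"{es_url}/{name}", method="DELETE")
    with urllib.request.urlopen(req) as resp:
        return resp.status


def wipe_indices(es_url: str = ES_URL, prefix: str = "", dry_run: bool = True) -> list[str]:
    """List indices, filter them, and delete each one unless dry_run is True."""
    targets = select_indices(list_indices(es_url), prefix)
    for name in targets:
        if dry_run:
            print(f"would delete {name}")
        else:
            print(f"deleted {name} (HTTP {delete_index(name, es_url)})")
    return targets
```

Calling `wipe_indices()` only prints what it would delete; pass `dry_run=False` to actually send the `DELETE` requests.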
m
Hi @orange-night-91387, I deleted all the indices. Then I started ingesting data, and it failed with:
```
22:07:51.518 [generic-mae-consumer-job-client-0-C-1] ERROR c.l.m.k.MetadataChangeLogProcessor:78 - Failed to execute MCL hook with name com.linkedin.metadata.kafka.hook.UpdateIndicesHook
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=index_not_found_exception, reason=no such index [graph_service_v1]]
```
I think I deleted some required indices. Do you know which indices are required? Thanks
I used elasticsearch-setup to set up all the required indices. It feels difficult to clean up everything. 😅
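If the goal is only to empty DataHub's search and graph data, one alternative (a sketch, not an official DataHub procedure) is to delete the documents inside each index while keeping the indices and their mappings, so that indices like `graph_service_v1` still exist when ingestion resumes. This assumes DataHub only needs those indices to exist, which is what the `index_not_found_exception` above suggests. The snippet builds a request for Elasticsearch's delete by query API (`POST /<index>/_delete_by_query`) with a `match_all` query; `ES_URL`, `build_clear_request` and `clear_index` are hypothetical helper names, and it assumes an unauthenticated cluster at `http://localhost:9200`.

```python
import json
import urllib.request

# Assumption: an unauthenticated Elasticsearch cluster at this address.
ES_URL = "http://localhost:9200"

# Query that matches every document in the index.
MATCH_ALL = {"query": {"match_all": {}}}


def build_clear_request(index: str, es_url: str = ES_URL) -> urllib.request.Request:
    """Build POST /<index>/_delete_by_query, which removes all documents but keeps the index."""
    return urllib.request.Request(
        # conflicts=proceed keeps going instead of aborting on version conflicts.
        f"{es_url}/{index}/_delete_by_query?conflicts=proceed",
        data=json.dumps(MATCH_ALL).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def clear_index(index: str, es_url: str = ES_URL) -> int:
    """Send the request and return the number of documents Elasticsearch reports as deleted."""
    with urllib.request.urlopen(build_clear_request(index, es_url)) as resp:
        return json.load(resp)["deleted"]
```

Index names can come from the cat indices API; for example, `clear_index("graph_service_v1")` empties that index without dropping it. The MySQL tables still need to be emptied separately.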