# advice-metadata-modeling
Hi, we are going to ingest Thrift files. I think the current typing system is not enough to support Thrift types. For example, a Thrift union type has a name, while the DataHub union type does not. A Thrift enum type has a name as well as enum items, while the DataHub enum type does not. Here is my proposal:
1. We ingest Thrift structs and unions as Datasets. A Dataset is an entity and it has fields, so it is very similar to a Thrift struct or union.
2. We create a new entity for enums. The DataHub enum is not an entity, so it cannot be referenced, but other types need a URN to reference it.
3. We create `ThriftSchema` as a new kind of platformSchema. `ThriftSchema` has name, index, fields, and annotations, which store the information that the Dataset does not cover.
4. In `SchemaField.nativeDataType` we fill one of:
a. primitive types like `bool`, `i64`, `string`
b. type references like `urn:li:thrift:enum:my_namespace:my_enum`
c. composite types like `list<i64>`, `map<string, list<i64>>`, `list<urn:li:thrift:enum:my_namespace:my_enum>`
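To make the three kinds of `nativeDataType` value concrete, here is a rough Python sketch of how a consumer could tell them apart. Everything in it is hypothetical and only for illustration: `NativeType`, `parse_native_type`, and `split_top_level` are made-up names, not existing DataHub APIs, and the `urn:li:thrift:` prefix is just the format suggested in this proposal.

```python
from dataclasses import dataclass, field
from typing import List

# Thrift base types; container names are the three Thrift collection types.
THRIFT_PRIMITIVES = {"bool", "byte", "i8", "i16", "i32", "i64", "double", "string", "binary"}
THRIFT_CONTAINERS = {"list", "set", "map"}
URN_PREFIX = "urn:li:thrift:"  # assumption: URN format taken from this proposal


@dataclass
class NativeType:
    kind: str  # "primitive", "reference" or "composite"
    name: str  # e.g. "i64", "list", or the full URN
    args: List["NativeType"] = field(default_factory=list)


def split_top_level(text: str) -> List[str]:
    """Split on commas that are not nested inside <...>."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


def parse_native_type(text: str) -> NativeType:
    """Classify a nativeDataType string as primitive, reference or composite."""
    text = text.strip()
    if text in THRIFT_PRIMITIVES:
        return NativeType("primitive", text)
    if text.startswith(URN_PREFIX):
        return NativeType("reference", text)
    head, sep, rest = text.partition("<")
    if sep and head in THRIFT_CONTAINERS and rest.endswith(">"):
        args = [parse_native_type(arg) for arg in split_top_level(rest[:-1])]
        return NativeType("composite", head, args)
    raise ValueError(f"unrecognized nativeDataType: {text!r}")
```

For example, `parse_native_type("list<urn:li:thrift:enum:my_namespace:my_enum>")` gives a composite `list` whose single argument is a reference, while a string with unbalanced angle brackets raises `ValueError`.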
Any suggestions? Thanks cc @mammoth-bear-12532 @helpful-optician-78938 @orange-tailor-45265 @curved-librarian-24314
👍 1