# all-things-deployment
e
Hi, can the DataHub UI search find tables whose documentation or field descriptions contain the query?
o
Yes, these are searchable fields that are queried by default when doing a search.
e
@orange-night-91387 I tried, but it doesn't return the table whose field description contains my query. I wonder whether search is restricted to English, because I wrote my description in Chinese. Thanks.
a
What description did you write? Was it more than 3 characters and was your search query also more than 3 characters?
e
@orange-night-91387 One field description is 日频k线的开盘价格, and my query is 开盘. I also tried a query with more than 3 characters, e.g. 开盘价格, and it still returns nothing.
I also tested a query in English, and it works: it finds the table whose column descriptions contain the query.
Or is there an option to set in DataHub/ElasticSearch that makes them support UTF-8 characters?
a
Ah, so this is a limitation of how we're dividing "words." 日频k线的开盘价格 is treated as a single "word", and within a word we only do prefix-based fuzzy matching. So, for example, a query for 日频k线 should match your dataset, but 开盘价格 would not, because it's a partial match at the end of the "word".
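To make the prefix-only behaviour concrete, here is a minimal sketch in plain Python. It is a simplification for illustration, not DataHub's actual analyzer: the helper names `edge_ngrams` and `prefix_match` are hypothetical, and fuzzy matching is left out.

```python
def edge_ngrams(token: str, min_len: int = 1) -> set[str]:
    """Return every prefix of `token` that has at least `min_len` characters."""
    return {token[:i] for i in range(min_len, len(token) + 1)}


def prefix_match(query: str, token: str) -> bool:
    """True when `query` equals some prefix of `token` (simplified prefix matching)."""
    return query in edge_ngrams(token)


# Assumption from the thread: the whole description is treated as one "word".
description = "日频k线的开盘价格"

print(prefix_match("日频k线", description))   # True: the query is a prefix of the word
print(prefix_match("开盘价格", description))  # False: the query only matches the end
print(prefix_match("开盘", description))      # False: the query sits in the middle
```

Only queries that start at the first character of the "word" can match under this scheme, which is consistent with the results reported earlier in the thread.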
I can definitely understand why this isn't an ideal tokenization strategy for Chinese, where each character can be a full word or even more on its own, but fixing it would probably require custom, per-language tokenization strategies configured in ElasticSearch.
This will require in-depth knowledge of ElasticSearch tokenization & analyzers though to get working in a desirable way
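As one possible starting point for a language-specific setup, the sketch below builds an index body that assigns Elasticsearch's built-in `cjk` language analyzer to a text field. As far as I understand it, that analyzer indexes CJK text as overlapping two-character pairs. The field name `description` is a hypothetical placeholder, this is not DataHub's real index definition, and how such a mapping would be wired into DataHub's own index setup is not covered here.

```python
import json

# Hypothetical index body: the field name "description" is illustrative only
# and does not reflect DataHub's actual index mappings.
index_body = {
    "mappings": {
        "properties": {
            "description": {
                "type": "text",
                # Built-in Elasticsearch language analyzer for Chinese/Japanese/Korean text.
                "analyzer": "cjk",
            }
        }
    }
}

# Print the JSON that would be sent as the body of an index-creation request.
print(json.dumps(index_body, indent=2, ensure_ascii=False))
```

Running a few sample descriptions through Elasticsearch's `_analyze` API on a scratch index is a reasonable way to check what tokens an analyzer actually produces before changing anything in a real deployment.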
o
What you'll probably want here is full ngram matching instead of just prefix matching. We don't do this by default because it is very expensive in terms of both performance and ElasticSearch index size.
e
Thanks for your help. I've been experimenting with ElasticSearch tokenization these days, but I haven't gotten it working yet.