# cfml-general
t
I've got a data migration project from a Google sheet where the use of the single and double-quote characters is quite random. The service needs to standardize those characters so that dimensions can be parsed out.
The charsetDecode()/charsetEncode() functions seemed like they should assist with this, but I haven't found a combination that works yet.
charsetEncode( charsetDecode( unit.size, "utf-8" ), "us-ascii" )
What method should I be trying?
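(For context on why the charset round-trip above can't do this: curly quotes like U+2019 have no US-ASCII representation, so their bytes get replaced rather than mapped to the straight equivalent. A quick sketch, assuming a Lucee/ACF environment:)

```cfml
// Sketch: why decoding UTF-8 bytes as US-ASCII can't normalize quotes.
// chr(8217) is the right single curly quote (U+2019); its UTF-8 encoding
// is three bytes, none of which is valid US-ASCII, so the round-trip
// yields replacement characters, not a straight apostrophe.
s = "It#chr(8217)#s 5#chr(8242)# tall";
bytes = charsetDecode( s, "utf-8" );               // string -> UTF-8 byte array
writeOutput( charsetEncode( bytes, "us-ascii" ) ); // garbled, not "It's 5' tall"
```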
Based on searches on Stack Overflow, it seems like there's really no automated way to do this. The fun part is that Lucee doesn't seem to handle asc()/chr() conversions consistently either. What a pain.
I'm certainly open to ideas here. Having to manually catch/replace is silliness.
d
This is probably obvious, and/or I'm misunderstanding the situation, but single and double quote characters have different meanings in foot and inch measurements. Single quote is feet, double quote is inches.
t
Indeed.
But not all quote characters are the same... curly, angled, straight, and who knows what others...
I'm looking for some way to convert the string so that all the silliness gets converted to straight.
m
Oh man, this is bringing back some horror stories. The number of "quote" types is wild. I used to have to parse Word docs, brutal stuff. Honestly, at the end, I wrote the mother of all regex statements to basically strip left/right "curly" whatevers, replace them with their "normal" equiv, and then bulk replace them in the strings as &QUOTE;, etc. It was ugly and I'm not proud of it but for the use case I had, it worked. If you're searching for elegance, you might have a tough time finding it 😞
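A minimal sketch of that regex approach in CFML (the function name and the set of codepoints covered are assumptions; extend the character classes as new variants turn up in the data):

```cfml
// Hypothetical normalizeQuotes(): fold common curly/angled/prime variants
// down to straight quotes. The codepoint lists are illustrative, not exhaustive.
function normalizeQuotes( required string input ) {
    // ‘ ’ ‚ ‛ ′ ` ´  -> straight single quote
    var singles = chr(8216) & chr(8217) & chr(8218) & chr(8219) & chr(8242) & chr(96) & chr(180);
    // “ ” „ ‟ ″ « »  -> straight double quote
    var doubles = chr(8220) & chr(8221) & chr(8222) & chr(8223) & chr(8243) & chr(171) & chr(187);
    var out = reReplace( arguments.input, "[#singles#]", "'", "all" );
    return reReplace( out, "[#doubles#]", """", "all" );
}

writeOutput( normalizeQuotes( "5#chr(8242)# 6#chr(8243)#" ) );
```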
t
I was afraid of that.
m
I had a similar issue with carriage returns. shudder
t
For all of the standards that have made this world a pleasure to work in, this is a significant effect of over- and under-engineering the solution. So many calls for a way to map back to the "base character", and the problem is made worse by solutions like Unicode.
p
If you have a list of all the "known" variations that exist in your data, you could create a map and then run something like normalizeQuotes() and have it basically always return the same value for the output data... or again, do a bulk update of all the data.
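That map idea could look something like this (normalizeQuotes() and the entries shown are placeholders; populate the map from whatever variants actually appear in the sheet):

```cfml
// Hypothetical lookup-map version: every known variant maps to its straight
// equivalent, so unknown characters pass through untouched and can be
// spotted and added to the map later.
function normalizeQuotes( required string input ) {
    var quoteMap = {
        "#chr(8216)#": "'",   // ‘ left single
        "#chr(8217)#": "'",   // ’ right single
        "#chr(8242)#": "'",   // ′ prime (feet)
        "#chr(8220)#": '"',   // “ left double
        "#chr(8221)#": '"',   // ” right double
        "#chr(8243)#": '"'    // ″ double prime (inches)
    };
    var out = arguments.input;
    for ( var variant in quoteMap ) {
        out = replace( out, variant, quoteMap[ variant ], "all" );
    }
    return out;
}
```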
t
Yeah, that's the crazy part... somewhere, that list of known variations exists. Why it's elusive is beyond me.
t
That link isn't working
t
hmm
it worked for me when I clicked it from Stack Overflow. But when I click that one, it's not...
but that gives a list of all the quote-like things.
Now to find a way to make that usable
e
This might be a good use case for ChatGPT: give it that list and ask it to extract/format it in a usable way (you might have to give it some hints about what you want).