# cfml-general
t
I've got a data migration project from a Google sheet where the use of the single and double-quote characters is quite random. The service needs to standardize those characters so that dimensions can be parsed out.
The charsetDecode()/charsetEncode() functions seemed like they should assist with this, but I haven't found a combination that works yet.
charsetEncode( charsetDecode( unit.size, "utf-8" ), "us-ascii" )
What method should I be trying?
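(For context on why the charset round-trip above can't do this: curly quotes like U+2019 have no US-ASCII representation, so their bytes get replaced rather than mapped to the straight equivalent. A quick sketch, assuming a Lucee/ACF environment:)

```cfml
// Sketch: why decoding UTF-8 bytes as US-ASCII can't normalize quotes.
// chr(8217) is the right single curly quote (U+2019); its UTF-8 encoding
// is three bytes, none of which is valid US-ASCII, so the round-trip
// yields replacement characters, not a straight apostrophe.
s = "It#chr(8217)#s 5#chr(8242)# tall";
bytes = charsetDecode( s, "utf-8" );               // string -> UTF-8 byte array
writeOutput( charsetEncode( bytes, "us-ascii" ) ); // garbled, not "It's 5' tall"
```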
Based on searches on Stack Overflow, it seems like there's really no automated way to do this. The fun part is that Lucee doesn't seem to handle asc()/chr() conversions consistently either. What a pain.
I'm certainly open to ideas here. Having to manually catch/replace is silliness.
d
This is probably obvious, and/or I'm misunderstanding the situation, but single and double quote characters have different meanings in foot and inch measurements. Single quote is feet, double quote is inches.
t
Indeed.
But not all quote characters are the same... curly, angled, straight, and who knows what others...
I'm looking for some way to convert the string so that all the silliness gets converted to straight.
m
Oh man, this is bringing back some horror stories. The number of "quote" types is wild. I used to have to parse Word docs, brutal stuff. Honestly, at the end, I wrote the mother of all regex statements to basically strip left/right "curly" whatevers, replace them with their "normal" equiv, and then bulk replace them in the strings as &QUOTE;, etc. It was ugly and I'm not proud of it but for the use case I had, it worked. If you're searching for elegance, you might have a tough time finding it 😞
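A minimal sketch of that regex approach in CFML (the function name and the set of codepoints covered are assumptions; extend the character classes as new variants turn up in the data):

```cfml
// Hypothetical normalizeQuotes(): fold common curly/angled/prime variants
// down to straight quotes. The codepoint lists are illustrative, not exhaustive.
function normalizeQuotes( required string input ) {
    // ‘ ’ ‚ ‛ ′ ` ´  -> straight single quote
    var singles = chr(8216) & chr(8217) & chr(8218) & chr(8219) & chr(8242) & chr(96) & chr(180);
    // “ ” „ ‟ ″ « »  -> straight double quote
    var doubles = chr(8220) & chr(8221) & chr(8222) & chr(8223) & chr(8243) & chr(171) & chr(187);
    var out = reReplace( arguments.input, "[#singles#]", "'", "all" );
    return reReplace( out, "[#doubles#]", """", "all" );
}

writeOutput( normalizeQuotes( "5#chr(8242)# 6#chr(8243)#" ) );
```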
t
I was afraid of that.
m
I had a similar issue with carriage returns. shudder
t
For all of the standards that have made this world a pleasure to work in, this is a significant effect of over- and under-engineering the solution. So many calls for a way to map back to the "base character", and the problem is made worse by solutions like Unicode.
p
If you have a list of all the "known" variations that exist in your data, you could create a map and then run something like normalizeQuotes() and have it basically always return the same value for the output data... or again, do a bulk update of all the data.
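That map idea could look something like this (normalizeQuotes() and the entries shown are placeholders; populate the map from whatever variants actually appear in the sheet):

```cfml
// Hypothetical lookup-map version: every known variant maps to its straight
// equivalent, so unknown characters pass through untouched and can be
// spotted and added to the map later.
function normalizeQuotes( required string input ) {
    var quoteMap = {
        "#chr(8216)#": "'",   // ‘ left single
        "#chr(8217)#": "'",   // ’ right single
        "#chr(8242)#": "'",   // ′ prime (feet)
        "#chr(8220)#": '"',   // “ left double
        "#chr(8221)#": '"',   // ” right double
        "#chr(8243)#": '"'    // ″ double prime (inches)
    };
    var out = arguments.input;
    for ( var variant in quoteMap ) {
        out = replace( out, variant, quoteMap[ variant ], "all" );
    }
    return out;
}
```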
t
Yeah, that's the crazy part... somewhere, that list of known variations exists. Why it's elusive is beyond me.
t
That link isn't working
t
hmm
it worked for me when I clicked it from Stack Overflow. But when I click that one, it's not...
but that gives a list of all the quote-like things.
Now to find a way to make that usable
e
This might be a good use case for ChatGPT: give it that list and ask it to extract/format it in a usable way (you might have to give it some hints about what you want).