Today's fun exercise: We have a vendor to whom we ... (#cfml-general)

sknowlton

05/25/2022, 3:01 PM

Today's fun exercise: We have a vendor to whom we send off lots of user data. Turns out they can't accept any unicode characters. So somebody whose name is Tomáš breaks the universe. Anybody ever have to convert Tomáš to Tomas in CF before?

😲 1

bdw429s

05/25/2022, 3:08 PM

I've seen some libraries that basically map a huge list of unicode chars to the closest ascii equiv, but I don't recall where they were at the moment

sknowlton

05/25/2022, 3:08 PM

Ugh. Yeah I guess that's what we need. I'll check out Normalize too. Thanks guys

bdw429s

05/25/2022, 3:09 PM

You can probably find a list somewhere online and just put them in a struct with the unicode char as the key and the ascii replacement as the value, then do a lookup and replacement char by char.
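The lookup-table approach described above can be sketched in Java as follows. This version assumes a map keyed by Unicode code point in place of a CFML struct. The seven entries are illustrative only, and a real table would need thousands. Emitting `?` for any unmapped non-ASCII character is also an assumption and does not come from the thread.

```java
import java.util.Map;

public class AsciiLookup {
    // Hypothetical, deliberately tiny table: code point -> ASCII replacement.
    private static final Map<Integer, String> TABLE = Map.of(
        0x00E1, "a",  // LATIN SMALL LETTER A WITH ACUTE
        0x00E9, "e",  // LATIN SMALL LETTER E WITH ACUTE
        0x0161, "s",  // LATIN SMALL LETTER S WITH CARON
        0x010D, "c",  // LATIN SMALL LETTER C WITH CARON
        0x00DF, "ss", // LATIN SMALL LETTER SHARP S
        0x00F8, "o",  // LATIN SMALL LETTER O WITH STROKE
        0x0141, "L"   // LATIN CAPITAL LETTER L WITH STROKE
    );

    // Walk the string one code point at a time: keep 7-bit ASCII as-is,
    // otherwise substitute the table entry, or "?" when there is none.
    public static String transliterate(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(TABLE.getOrDefault(cp, "?"));
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(transliterate("Tom\u00e1\u0161")); // Tomas
        System.out.println(transliterate("Stra\u00dfe"));     // Strasse
    }
}
```

Iterating by code point rather than by `char` keeps characters outside the Basic Multilingual Plane (such as emoji) from being split into two surrogate halves, each of which would otherwise get its own `?`.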

sknowlton

05/25/2022, 3:09 PM

love it when the billion dollar corporations flunk data architecture 101

bdw429s

05/25/2022, 3:09 PM

lol, yeah it's strange you have to deal with this in 2022

bdw429s

05/25/2022, 3:09 PM

Probably a mainframe

bdw429s

05/25/2022, 3:09 PM

i've dealt with a couple places using AS/400's and their DB2 DB wouldn't do unicode at all

bdw429s

05/25/2022, 3:10 PM

Let me guess, do they store everything in upper case? 🙂

sknowlton

05/25/2022, 3:10 PM

not a mainframe. SOAP-based API built in 2014!

sknowlton

05/25/2022, 3:10 PM

who in 2014 was like 'yeah XML is the best'

Adam Cameron

05/25/2022, 3:14 PM

No excuse even for a SOAP API built in 2014. Supporting unicode was well-trod ground even then. It's just shit implementation.

sknowlton

05/25/2022, 3:15 PM

I've rewritten my email acknowledgment of 'no unicode characters allowed' several times, trying to make my reply something other than 'it's just shit implementation', but I'm not having much luck yet

sknowlton

05/25/2022, 3:15 PM

api docs are even better. it's a windows help file

🤢 1

Adam Cameron

05/25/2022, 3:15 PM

hahaha

Adam Cameron

05/25/2022, 3:15 PM

😞

bhartsfield

05/25/2022, 3:16 PM

We still have one of those floating around here somewhere too (predates me)

bhartsfield

05/25/2022, 3:19 PM

You could always do something like...

string.reReplace( "[^\x00-\x7F]+", "__ONLY SUPPORTED IN THE FUTURE__", "all" )

sknowlton

05/25/2022, 3:20 PM

people gonna be mad when they find out they just signed up their kid, ONLY SUPPORTED IN THE FUTURE, to play soccer

sknowlton

05/25/2022, 3:21 PM

plus what does 'the future' even mean when you're dealing with SOAP in 2022

bdw429s

05/25/2022, 3:21 PM

Little Bobby Tables is playing goalie this year, I hear

👍 2

bhartsfield

05/25/2022, 3:21 PM

lol

bhartsfield

05/25/2022, 3:21 PM

"Tomáš".reReplace( "[^\x00-\x7F]+", "[NOT 'MERICAN]", "all" )
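The `[^\x00-\x7F]+` character class used in these jokes also works seriously in Java. The sketch below uses it to validate a value before sending it and to replace non-ASCII runs with a placeholder. The class and method names are hypothetical. Because of the `+`, the adjacent `áš` in "Tomáš" collapses into a single placeholder.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonAsciiCheck {
    // One or more characters outside the 7-bit ASCII range 0x00-0x7F,
    // the same character class used in the reReplace() calls above.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\x00-\\x7F]+");

    // True when the string contains nothing outside 7-bit ASCII.
    public static boolean isAscii(String s) {
        return !NON_ASCII.matcher(s).find();
    }

    // Replace each run of non-ASCII characters with the placeholder;
    // quoteReplacement stops "$" or "\" in the placeholder being treated specially.
    public static String replaceNonAscii(String s, String placeholder) {
        return NON_ASCII.matcher(s).replaceAll(Matcher.quoteReplacement(placeholder));
    }

    public static void main(String[] args) {
        String name = "Tom\u00e1\u0161"; // "Tomas" with acute a and caron s
        System.out.println(isAscii("Tomas"));            // true
        System.out.println(isAscii(name));               // false
        System.out.println(replaceNonAscii(name, "?"));  // Tom?
    }
}
```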

sknowlton

05/25/2022, 3:24 PM

the best part is where they claimed this limitation was due to ...PCI-DSS compliance

🙄 1

sknowlton

05/25/2022, 3:24 PM

"deploy acronym buzzword defense shield"

sknowlton

05/25/2022, 3:26 PM

I kind of want to write back and say 'hey guys, I'm literally on a COLDFUSION SLACK CHANNEL and we're talking shit about your ancient, dumb ways, make of that what you will'

😆 2

sknowlton

05/25/2022, 3:28 PM

thanks to James Moberg and https://dev.to/gamesover/convert-unicode-strings-to-ascii-with-coldfusion-junidecode-lhf - figures the solution would be a Java port of a damn Perl module

😀 1

Matt Jones

05/25/2022, 3:58 PM

we have had ok luck using java.text.Normalizer

sknowlton

05/25/2022, 3:59 PM

that's the first step in James' solution above

Matt Jones

05/25/2022, 4:18 PM

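public string function stripAccents(string input="") {
	// Matches runs of combining marks (Unicode block U+0300-U+036F)
	var pattern = createObject("java","java.util.regex.Pattern").compile("\p{InCombiningDiacriticalMarks}+");
	var Normalizer = createObject("java","java.text.Normalizer");
	// NFD splits each accented letter into its base letter plus combining marks
	var decomposed = Normalizer.normalize(trim(input), createObject("java","java.text.Normalizer$Form").NFD);
	// Remove the marks, leaving the unaccented base letters
	return trim(pattern.matcher(decomposed).replaceAll(""));
}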
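For reference, here are the same steps written directly against the Java classes that the CFML function instantiates through `createObject()`. This is a sketch, and the class name is hypothetical. NFD decomposes each precomposed accented letter into a base letter followed by combining marks, and the `\p{InCombiningDiacriticalMarks}+` pattern then deletes those marks.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class StripAccents {
    // Runs of characters from the Combining Diacritical Marks block (U+0300-U+036F).
    private static final Pattern MARKS = Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    public static String stripAccents(String input) {
        // NFD: U+00E1 (a with acute) becomes "a" followed by U+0301 COMBINING ACUTE ACCENT.
        String decomposed = Normalizer.normalize(input.trim(), Normalizer.Form.NFD);
        // Drop the combining marks, keeping the base letters.
        return MARKS.matcher(decomposed).replaceAll("").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("Tom\u00e1\u0161"));  // Tomas
        System.out.println(stripAccents("  Jos\u00e9  "));    // Jose
    }
}
```

Letters that have no canonical decomposition, such as Ø, ł, or ß, pass through this step unchanged. This step alone therefore does not guarantee 7-bit output. Those letters still need a lookup table like the one described earlier in the thread, or a transliteration library.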

Jochem

05/25/2022, 4:45 PM

The first question would be "which charsets do they accept?". The example Tomáš would fit in windows-1250, I think (basically the East European default Latin charset).

sknowlton

05/25/2022, 4:47 PM

ASCII.

Jochem

05/25/2022, 4:51 PM

Which ascii? 6 bit? 7 bit? 8 bit?

sknowlton

05/25/2022, 4:52 PM

Looks like 7

sknowlton

05/25/2022, 4:52 PM

they don't exactly want to talk about it at length. Can't imagine why

Jochem

05/25/2022, 4:57 PM

My guess: because they have several different systems that each use their own 8-bit charset, and then they decided that using just the ascii half of it would work because that was common between all the systems.
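The charset question can be checked directly. In Java, `CharsetEncoder.canEncode()` reports whether every character of a string can be represented in a given charset. The sketch below is hedged in two ways: the class and method names are hypothetical, and the `windows-1250` check runs only when the JDK ships that charset. It shows that "Tomáš" fails both 7-bit US-ASCII and ISO-8859-1 (Latin-1 has á but not š).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetCheck {
    // True if every character in s can be encoded in the given charset.
    // A fresh encoder is created per call because encoders are stateful.
    public static boolean fits(String s, Charset cs) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String name = "Tom\u00e1\u0161"; // "Tomas" with acute a and caron s
        System.out.println(fits(name, StandardCharsets.US_ASCII));   // false: 7-bit only
        System.out.println(fits(name, StandardCharsets.ISO_8859_1)); // false: no s with caron
        // windows-1250 is not a guaranteed standard charset, so check availability first.
        if (Charset.isSupported("windows-1250")) {
            System.out.println(fits(name, Charset.forName("windows-1250")));
        }
    }
}
```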

Jochem

05/25/2022, 4:58 PM

But in that case I don't have any suggestions that haven't been mentioned yet.
