# cfml-general
t
I found something interesting about non-English characters and how they are searched and matched by CF functions: https://trycf.com/gist/f9444ec0e0237046e484a520e64bc717/lucee5?theme=monokai So in some versions of Lucee & ACF, character #383 (which is " ſ ", Unicode U+017F, so outside the ASCII range) is treated the same as the English letters "s" & "S" (yes, it matches both lower case & upper case).
Lucee 5.4.3+2 does not have this issue, and we discovered it after we upgraded to 5.4.6+9. In fact, browsers such as Chrome, and VSCode (I think VSCode is Electron underneath, so basically a browser too), behave like this too. I copied the character " ſ " and did a text search in Chrome & VSCode, and they found all the lower & upper case S's.
Screenshot 2024-08-09 at 2.27.19 pm.jpg
Nothing major, and not really an issue either, I just thought this was interesting. Maybe people who work with other languages know why Lucee/ACF & browsers behave like this.
t
(me again, posting from my personal account)
Testing chr(383) eq chr(83) - upper "S" ==> true
Testing chr(383) eq chr(115) - lower "s" ==> true
https://trycf.com/gist/449b2aa9fba7085493b0d63fd194a172/lucee5?theme=monokai
From tryCF.com:
CFML engines saying chr(383) eq "s" / "S"
• Lucee 5
• ACF 11, 2016, 2018, 2021, 2023
CFML engines saying chr(383) neq "s" / "S"
• Lucee 4.5
• ACF 10
Alright, this is probably the closest thing to a reason I have found for why some programming languages treat chr(383), which is called "Long S", as equal to the character "S": it is indeed the letter "S" in old English writing. https://www.livescience.com/65560-long-s-old-texts.html
I feel like I am studying some history through programming...
Did some simple PHP and JS tests online and they both say chr(383) != "S"
a
FYI BoxLang returns false for
echo( char(383) == "S" )
d
See what this line does instead: (note the case-sensitive nature):
result = Replace(test, "#chr(383)#", "_", "all");
https://trycf.com/gist/c3f61f154fafe0ccab4c7e1940d6e4c0/lucee5?theme=monokai
You can maybe try the same stuff in the IDE (as in make the search case-sensitive)
Some interesting related-ish history of why this is the way it is in Chromium (note Firefox has a far superior search 🧐😃) https://issues.chromium.org/issues/40518349
Just in general, I think anything outside the plain ASCII range (anything from 128 up) is going to do weird stuff depending on encoding/charsets/locale
t
Yes, I noticed that with a case-sensitive search VSCode won't match chr(383) to "S" & "s". And we did indeed change our replaceNoCase() to replace() when substituting these characters. Thanks for the link to that Chromium ticket. I didn't know so many characters behave the same way in Chrome. And yes, I agree Chrome needs a case-sensitive search like Firefox, so people using these languages & characters can control how they want to do text search
👍 1
💯 1
j
I would submit a bug report to Lucee for 5.4.6+9. That's a breaking change that would affect any kind of UTF-8 string comparison
d
It seems to behave the same in all of them
Which is actually what I was expecting, I never tested them before tho
t
I think there might be a reason that ReplaceNoCase(text, chr(383), "_", "all") replaces all "s" & "S" with "_". As per the article I posted earlier, chr(383) is the letter S, but it is neither the lower case "s" nor the upper case "S". That's why Replace(text, chr(383), "_", "all") won't replace any "s" or "S". But the fact that chr(383) == chr(83) and chr(383) == chr(115) I would consider a bug.
☝️ 1
And maybe this behaviour is from the Java level
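On the Java-level guess: below is a minimal sketch in plain Java (not Lucee or ACF source; the class name `LongSDemo` and its method names are made up for illustration) of how the JDK's own case-insensitive comparison treats U+017F. Whether either CFML engine actually goes through these calls is an assumption, not something confirmed in this thread.

```java
// Sketch only: shows JDK behaviour for U+017F (LATIN SMALL LETTER LONG S).
// LongSDemo is a hypothetical name; nothing here is taken from Lucee or ACF.
final class LongSDemo {
    static final String LONG_S = "\u017F"; // same character as chr(383) in CFML

    // Case-sensitive comparison: code points must match exactly.
    static boolean sameExact(String a, String b) {
        return a.equals(b);
    }

    // Case-insensitive comparison using the JDK's built-in case mapping.
    static boolean sameIgnoringCase(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // The simple uppercase mapping of U+017F is the plain letter "S".
        System.out.println(Character.toUpperCase('\u017F') == 'S'); // true
        System.out.println(sameIgnoringCase(LONG_S, "S")); // true
        System.out.println(sameIgnoringCase(LONG_S, "s")); // true
        System.out.println(sameExact(LONG_S, "s"));        // false
        System.out.println(sameExact(LONG_S, "S"));        // false
    }
}
```

That would line up with ReplaceNoCase() hitting every "s" & "S" while Replace() leaves them alone, if the engines do lean on the JDK's case mapping here.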
d
I think it has to do with how ascii works when you mix it with extended ascii kinda?
more capable text methods shouldn't have this problem afaik
Next you guys will say that
"S" == "s"
should be considered a bug 😜
To remind us how fun dynamically-typed languages are around strings (depending on how you do stuff in JS you will see similar behavior): https://trycf.com/gist/1c461125d6ddc8b255bb50f88c94656e/lucee5?theme=monokai
It is just one of the foot-guns that come with the territory. I agree it could be less foot-shooting-prone… but what does that do for compatibility? Maybe you tighten it up a bit and add an option to the engines for "relaxed" or "more wat" mode? 😃
t
asc(test_S) == asc(test_s) --> true
this is my bad, I was naming the variable
test_S
and
test_s
thinking CF supports case-sensitive variables. But
"s" == "S" --> true
is new to me. I have always thought the string equality test was case sensitive, so I have always done this
lcase(var1) eq lcase(var2)
😅 Guess I should always test and not assume, even for something as simple as this
d
LOL & OMG! I totally forgot that variables are case-insensitive as well! If I had been running a good linter on it, maybe I'd have been alerted to that common mistake! 😉
It is really common to uppercase or lowercase when comparing strings, which can result in funny stuff when some characters are converted, as was demonstrated with ſ (the history of text, or what we would think of as "simple strings", is an entire case study in computing, as the links I posted cover). And I think that is part of why there is a surprising result for
==
in CF (history!) as it came from
eq
, which wasn't exactly what other languages would think of as
==
, if that makes sense.
For Lucee at least, you can do
'S' === 's'
to get what you would expect. (for some definition of expect, lol)
👍 1
Oh that's interesting, Lucee says false for
'S' === 's'
but ACF (from 2018, when they added it (after Lucee), IIRC) says true
Maybe we need
====
🤪
🤣 1
a
There's always been
compare
and
compareNoCase
which have been the recommended way of comparing strings where casing matters. CFML has always been positioned as a case-insensitive language, so it should be no surprise that the equality operators behave that way. As for
===
behaviour, I believe Lucee's implementation is bugged in its design (as well as being incompatible with CF) in that it's doing an identity check and not following the CFML rule, which is "check type then check value", where in the case of strings "check value" is case-insensitive. Compare: https://trycf.com/gist/f66fdce2282956978e67f50edf9ac430/lucee5?theme=monokai When using string literals, the strings will be equal due to how Java stores string instances; but if the values are created dynamically,
===
is no help to you.
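The literal-versus-dynamic point can be seen in plain Java. This is a sketch of JVM string identity only, assuming (as described above) that Lucee's `===` ends up as a reference check; `IdentityDemo` and its method names are made up for illustration and are not part of either engine.

```java
// Sketch only: JVM reference identity vs value equality for strings.
// IdentityDemo is a hypothetical name used for illustration.
final class IdentityDemo {
    // Reference identity: true only when both variables point at the same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    // Builds "abc" at run time, so the result is a fresh object
    // rather than the pooled literal.
    static String buildAtRuntime() {
        return new StringBuilder("ab").append("c").toString();
    }

    public static void main(String[] args) {
        String literal1 = "abc";
        String literal2 = "abc";           // same pooled instance as literal1
        String dynamic = buildAtRuntime(); // equal value, different object

        System.out.println(sameObject(literal1, literal2)); // true
        System.out.println(sameObject(literal1, dynamic));  // false
        System.out.println(literal1.equals(dynamic));       // true
    }
}
```

So an identity check can say "equal" for two literals and "not equal" for the same text built at run time, which is why compare() is the safer tool when casing matters.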
d
Good point about Lucee's being an identity check! I think they took inspiration from JavaScript. Is there any way to check that the object is the same in Adobe's version? FWIW if I'm thinking "strict equality" then I'm thinking "S" != "s" and although Lucee's version has its own way of footgunning you, I like it better than the Adobe footgun. (Also, if a feature has been in Lucee for years, and then Adobe adds it, I personally think it would behoove Adobe to make their implementation compatible with Lucee's. But I digress.) I like the logic on display here! It's a valiant argument for Adobe's implementation, but I don't think it is a strong one, since it's supposed to be "strict", and if case-insensitive is "strict", it makes my joke of
====
not so funny (like what would be a shorthand for stricter-strict equality?) There's a reason JS added
===
, and I think it's the same reason CF needed it (dynamic types), so of the two implementations I'd take Lucee's, as it's closer to filling the actual need.
LOL, of course Ben covered it in his blog! So much for my JS logic 😃 https://www.bennadel.com/blog/3775-exploring-the-triple-equals-operator-in-lucee-cfml-5-3-4-77.htm