Is there some process inside the CF engine which i...
# cfml-general
a
Is there some process inside the CF engine which is automatically converting an apostrophe into &pos; without our input? Is CFQUERYPARAM doing this? Is the Global Script Protection doing this? Does a CF Function do this? OR is the SQL Server doing this? Legacy CF11 code, O'Malley in the string becomes O'Malley in the table. Can't find where. Sad.
b
You may have some sort of interceptor or tool which is pre-emptively looping over your form or url scopes and attempting to "sanitize" data
This had become a popular practice for a while to avoid XSS attacks, but it's quite a blunt approach which can cause issues as you're seeing
To answer your original question, if a form is submitted with a single quote and you're taking
form.name
and inserting it into the DB, there's nothing that would encode that out of the box in CF
a
Yep, that's what I thought. But I can't find any hints - in the CF part anyway. It's a function argument that has an apostrophe when I dump it at the top of the function, and then ' when it gets into the table.
b
I'd step through and see where it changes then
Is this a ColdBox app?
a
Nope, I dump it right before the cfquery insert. So the only suspect I have is the cfqueryparam
I suspect trolls.
🤣 1
b
How are you viewing the data? Are you looking at it directly in the DB or via some select that outputs it?
a
dBeaver
so a jdbc pull bascially
b
I assume dbeaver is showing you what's really there. Do you have SSMS installed?
a
I should note that it happens before the INSERT, because this whole thing started with a truncate error
b
You can run a trace on your DB while performing the insert to see how the data is actually coming across
a
the string grew too long due to the &apos, so the SQL choked
b
Can you show the cfquery that performs the insert?
a
Lemme see...
New Text Document.txt
b
Which column?
a
xml file gets parsed... I cfdump the node right before the insert, and it's got ' then the insert complains that "'" is too long
b
I also assume you removed all the
<
and
>
from that code?
a
UIDDesc
I'm looking at this CF11 file for the first time actually
b
Can you dump
Copy code
left(trim(xmlVendor.VENDOR_SITE_CODE.xmlText),20)
directly
a
That's the one. It dumps as ARIANA L. D'AGOSTINO
Then the SQL server tells me that ARIANA L. D&apos;AGOSTINO is too long
b
It's possible the error message is simply being encoded
ACF used to do that
What is the actual column of the DB?
a
When I shorten it to ARIANA L. D'AGO, then the DB successfully saves it as ARIANA L. D&apos;AGO
m
are you using encodeForHtml somewhere before it, or have it within a cfoutput with the attribute encodefor="html" ?
a
Hey, could the DBA have put a trigger on this thing?
b
Are there any INSERT triggers on the DB table?
jinx šŸ™‚
I've seen DBAs do that before
have it within a cfoutput with the attribute encode="html" ?
That's an interesting theory. Did CF11 support that? You could try outputting and aborting prior to the cfquery to test the cfoutput theory, though I don't know to what degree CF "inherits" cfoutput encoding from "higher up"
m
looks like 2016 for the attribute, so no
a
Don't see any html encoding here. This is pretty simple 10+ year old stuff I'm stuck with
b
yeah, just found that too, so shouldn't apply to CF11
a
The main takeaway is that CF is not "helping me" by doing this by some grand design
b
The two suggested steps left to explore are • run a trace on the DB to capture the exact text being passed in • look for triggers on the table
a
Yes, thanks. Similar jynx again. I'll go resurrect the DBA with some voodoo or see if they have a new one yet.
e
what if you write the field value to a text file before the DB insert, just to make sure its a ' before it gets to cfquery
a
This just in, I'm an idiot. As you were...
šŸ˜† 1
r
@alholden you are not going to tell us the mistake?
a
I was adding an element of suspense... It was only after obtaining an actual copy of the source XML from the 3rd party that I could look at it in plain text. And of course, as it's probably in the xml spec.. the value was indeed encoded (&pos;) all along at the source. I was just unable to see that by using any of my go-to tools, and assumed the parser was decoding it back again because that's the only version I saw displayed to me.
šŸ‘ 2
Therefore, any programming challenge that starts with "assumption" will ultimately end with "idiot" šŸ˜‰
b
@alholden You can encode an apostrophe as
&apos;
in the XML, but once parsed, it won't be there. The parsing will decode that back into a
'
Copy code
myXML_encoded = "<root>O&apos;Reilly</root>";
The only way the actual
XMLText
of a node would still have the
&apos;
is if the original source XML was double encoded like so:
Copy code
myXML_double_encoded = "<root>O&apos;Reilly</root>";
This example shows the raw contexts of the parsed xml text in each case https://trycf.com/gist/b30908dff1d96d5295a44465c9b12b52/acf11?theme=monokai
Are you saying the XML from the source looked like example 1 or example 2?
Only example #2 would give you encoded XML entities in your DB without an extra decoding step, but that would be a rather curious way for your 3rd party to build their XML (read: "wrong" šŸ™‚ )
a
The source from the 3rd party had this field encoded once - so example 1 I was going to haul out a library like Apache text to decode these things before any DB work, but then I realized this 1 problem field is a proper noun (making apostrophes the only likely candidate), and I realized I'm lazy. So this has pretty much fixed it so far.
<cfqueryparam _cfsqltype_="cf_sql_nvarchar" _value_="#*left*(*trim*( *replace*(xmlVendor.VENDOR_SITE_CODE.xmlText,"&apos;","'","all") ),20)#" />
(the target column is a nVarchar) The queryparam seems to handle the apostrophe as part of the INSERT for me.
b
Hmm, a single encoding should not be a problem
If you run that trycf code I linked to, it's using CF 11 and it properly decodes the
&apos;
back to a
'
as it should
You shouldn't need to manually decode anything once CF has parsed the XML
a
I thought that HTML based encoding was just all about %hex number%
b
For the same reason you don't have to manually decode things like
%25
in a query string. CF/the servlet automatically decode those for you when they parse it.
&...;
is an encoded XML/HTML entity
There are pre-made ones for common chars such as
&apos;
OR you can litearlly encode any possible ASCII char as
&#65;
which would be an
A
a
It's 2 encodings in 1! Now how much would you pay!!
b
But all of that is beside the point. The XML Parsing step is responsible for decoding all of that. By the time you receive the parsed XML, all that should be gone
Another example is JSON encoding. When you parse the JSON
"foo\"bar"
you don't need to manually replace the
\"
with
"
as the JSON parsing decodes that for you.
So for a single encoded XML entity, you should never be seeing it in the actual parsed object.
a
I don't know why it wasn't gone. It was a single
&apos;
when I opened it in Notepad, and later when it got inserted.
b
Only if you were dealing with the raw XML string would you need to deal with that
a
xmlVendor
in the snip above is the parsed object. It's a mystery.
b
I think you still have something funky going on if you ask me. The CF11 on tryCF doesn't exhibit the behavior you're seeing.
The XML is decoded properly
a
And in a weird way we've circled back to my OP
Here's the original dev's snip, after the file read
<cfset xmlDoc = *xmlparse*(vendorFileXML)>
<cfset arRows = xmlDoc.XMLRoot.XMLChildren> _<!--- each array element contains the XML for one UID --->_
<!--- loop over array of vendors rows (the array contents are xml)) --->
<cfloop _from_=1 _to_="#*arrayLen*(arRows)#" _index_="ii" >
<cfset xmlVendor = arRows[ii]>
and so on...
I don't know what the last update was to this CF11 marvel.
b
11.0.19+314546
is the latest patch level
You can dump out
server.coldfusion
to see where you're sitting
Doesn't seem to make any difference accessing the node text via the xml children in a loop https://trycf.com/gist/7739746651c7bd6923b20ca698765f1d/acf11?theme=monokai
Didn't you say that the
len()
of the string reported a length that would imply a
'
?
a
Sorry, gotta run. Another client is getting an "IoError 2032" when trying to generate any of their PDFs via cfreport... now that's wierd.