I am trying to create a webhook for an inbound API ...
# cfml-general
j
I am trying to create a webhook for an inbound API from a vendor. The vendor is posting a JSON string in the content body (getHttpRequestData().content), and one of the values in the JSON contains a UCS-2 encoded string (essentially UTF-16LE). Normally, if I am getting a UTF-16 string in a form scope, I would simply call setEncoding("form", "utf-8") and that's it. But I am having a heck of a time trying to convert the value that comes out of deserializeJSON on getHttpRequestData().content. I've tried UTF-16, UTF-16BE and UTF-16LE. The JSON string is:
{"msg_id": "f25ac62a-e0a2-4445-9359-06858fd1833c", "message": "桥�?�?"}
I've tried the following:
<cfset content = getHttpRequestData().content>

<cfset data = deserializeJSON(content)>

<cfset msg=  CharsetEncode(CharsetDecode(data.message, "UTF-16"), "UTF-8")>
<cffile action="append" file="#ExpandPath(".")#\callback.log"   output="#now()# UTF-16:#msg#"/>

<cfset msg=  CharsetEncode(CharsetDecode(data.message, "UTF-16BE"), "UTF-8")>
<cffile action="append" file="#ExpandPath(".")#\callback.log"   output="#now()# UTF-16BE:#msg#"/>

<cfset msg=  CharsetEncode(CharsetDecode(data.message, "UTF-16LE"), "UTF-8")>
<cffile action="append" file="#ExpandPath(".")#\callback.log"   output="#now()# UTF-16LE:#msg#"/>
Any ideas? The content of "message" (as UTF-8) is the word "hey".
c
Not sure if this would work or not, but can you decode the whole getHttpRequestData().content before deserializing the JSON?
j
I did try that, but it just messed up the rest of the JSON that was already in UTF-8. Only a part of the JSON is in UTF-16LE.
Just in case it was a JSON parsing issue, I tried extracting the content from the JSON variable directly. Same thing:
I found a solution:
1. Write the getHttpRequestData().content to a file with no charset set
2. Load the file as binary
3. Convert the binary to UTF-8
4. Deserialize the data
5. Then use CharsetDecode with UTF-16BE and re-encode to UTF-8
This seems to be the only way I can get it to work.
<cfset content = getHttpRequestData().content>
<cffile action="write" file="#ExpandPath(".")#\rawdata.bin" output="#content#"/>
<cfset binaryDataRead = fileReadBinary("#ExpandPath(".")#\rawdata.bin") >
<cfset content = CharsetEncode(binaryDataRead, "UTF-8")>
<cfset data = deserializeJSON(content)>
<cfset msgutf16 = data.message>
<cfset msg = CharsetEncode(CharsetDecode(msgutf16, "UTF-16BE"), "UTF-8")>
d
I was going to mention that it would be for all the content, not just a portion of it, which it looks like you have sorted out
j
Well, I jumped the gun. It worked for some decoding, but it's not working for all tests. Argh.
I was thinking of trying to read getPageContext().getRequest().getInputStream() next - which I've never done before.
d
I think it's just this:
<cfset content = getHttpRequestData().content>
<cfset data = deserializeJSON(content)>
where you are doing an implicit conversion to whatever the system default is (via deserializeJSON). Instead of that, you want to always specify a charset when you're switching from binary (getHttpRequestData().content) to text. So theoretically do what you were doing, but put this before the deserializeJSON call:
<cfset json = CharsetEncode(CharsetDecode(getHttpRequestData().content, "UTF-16"), "UTF-8")>
<cfset object = deserializeJSON(json)>
Also, do you have your server set to UTF-8 as the default? Before, on Windows, it would use the Windows CP-1252 code page, but I think everything is using UTF-8 now.
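A minimal sketch of the "always specify a charset" idea, assuming the request body is UTF-8 encoded JSON. The variable names `body` and `payload` are illustrative. The isBinary() guard is there because charsetEncode() takes binary input, and depending on the request's Content-Type the content may already arrive as text:

```cfml
<!--- Sketch only: assumes the request body is UTF-8 encoded JSON --->
<cfset body = getHttpRequestData().content>
<cfif isBinary(body)>
    <!--- binary -> text with an explicit charset instead of the platform default --->
    <cfset body = charsetEncode(body, "UTF-8")>
</cfif>
<cfset payload = deserializeJSON(body)>
```

If the content is already text, the engine has done the byte-to-text conversion itself, and this guard cannot undo a wrong choice of charset at that stage.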
j
I tried that. The issue is that the inbound request comes in with content-type "application/json" (no UTF-8 charset specified), and somewhere along the way (IIS or CF) the character set is being treated accordingly, which makes it a pain to convert.
d
If you do isBinary(getHttpRequestData().content), what does it say?
j
Something in the middle is manipulating the data so it won't decode properly. I feel confident that getting the raw data from getPageContext().getRequest().getInputStream() would work.
That returns false.
Because of the content-type directive, application/json normally isn't going to be binary, but I need to access it as binary to be able to convert it. getPageContext().getRequest().getInputStream() seems to be the only way.
d
That is a pretty common thing to have to do (using the inputStream), but I thought there was a way without dropping into Java.
Pretty sure you'd want to do any converting prior to the deserializeJSON call regardless.
j
Agreed. I've never used getInputStream, know if there are any examples of using it out there? ChatGPT is struggling at it. lol
d
I'm certain there are with how often I was messing with the pageContext in the past lol, probably Ben has a nice blog post on it. 😄
j
Thanks - I'll search around. I think I've done it once before, I'll have to dig around
d
This is such a common problem for CF I'm surprised nobody has chimed in with the definitive solution
The only thing to be careful about around the IO streams is closing and flushing. Otherwise they should work fine. But I'm almost certain there is a clean way of solving this with the encoding and charset functions available…
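For anyone looking for a starting point with the input stream, here is a rough, untested sketch. It assumes Java 9 or later (for readAllBytes()) and that the engine has not already consumed the request body, in which case the stream may come back empty. The variable names are illustrative. It does not close the stream, on the assumption that the servlet container cleans it up at the end of the request:

```cfml
<!--- Sketch only. Assumptions: Java 9+ (for readAllBytes) and a request body
      that the engine has not already read; otherwise rawBytes will be empty. --->
<cfset inputStream = getPageContext().getRequest().getInputStream()>
<cfset rawBytes = inputStream.readAllBytes()>
<!--- Decode the raw bytes with an explicit charset before parsing --->
<cfset jsonText = charsetEncode(rawBytes, "UTF-8")>
<cfif isJSON(jsonText)>
    <cfset data = deserializeJSON(jsonText)>
</cfif>
```

Working from `rawBytes` keeps the byte-to-text step under your control, so any UTF-16 handling of a single field can be tried against bytes that nothing upstream has re-decoded.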
b
Looking at the code that works, I wondered what would happen if you did away with the file-write and file-read operations:
<cfset content = getHttpRequestData().content>
<cfset content = charsetEncode(content, "UTF-8")>
<cfset data = deserializeJSON(content)>
<cfset msgutf16 = data.message>
<cfset msg = charsetEncode(charsetDecode(msgutf16, "UTF-16BE"), "UTF-8")>
d
Hmm, might need to do a content.getBytes() (since it's text vs. binary) for that charsetEncode, but that should be the same, neh?
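Since the direction of the two functions is easy to mix up, here is a small sketch using the word "hey" from the original question: charsetDecode() goes from string to binary, and charsetEncode() goes from binary to string. The variable names are illustrative:

```cfml
<!--- charsetDecode(): string -> binary in the given charset --->
<cfset bytes = charsetDecode("hey", "UTF-16LE")>
<!--- Hex dump of the bytes; "hey" in UTF-16LE is 68 00 65 00 79 00 --->
<cfoutput>#binaryEncode(bytes, "hex")#</cfoutput>
<!--- charsetEncode(): binary -> string, read back with the same charset --->
<cfset text = charsetEncode(bytes, "UTF-16LE")>
<cfoutput>#text#</cfoutput>
```

Dumping the hex of a suspect value this way is a quick check of which bytes you actually hold before trying another charset combination.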