I have an issue with cfhttp not working in CF11 when request cfml #adobe

ryan

01/27/2023, 9:02 PM

I have an issue with cfhttp not working in CF11 when requesting a website; it appears that Cloudflare is blocking the request. We tried the same request in CF18 and CF21: it throws an error in CF21 but works in CF18. Would anyone know why this website does or does not work depending on the version? I'm asking in the hope of finding a way to make it work on our current CF11 version, and I know CF11 is no longer supported. Please see the cffiddle below: https://cffiddle.org/app/file?filepath=79fca3da-74f0-4d11-82b0-7abfa15cd718/a56c60f[…]1b0-beab-bf8bd6ee6959/2e42e53c-6444-4c7c-b512-46872b76c187.cfm

websolete

01/27/2023, 9:14 PM

perhaps the default useragent string differs between the two versions

websolete

01/27/2023, 9:15 PM

try adding an explicit useragent attribute to the 2021 call

websolete

01/27/2023, 9:15 PM

www.useragentstring.com if you need one

websolete

01/27/2023, 9:15 PM

or your own custom one, but cloudflare may bitch about 'unknown' ones too

ryan

01/27/2023, 9:16 PM

ok, thanks, will try that

ryan

01/27/2023, 9:18 PM

It works on cf18 without a useragent

websolete

01/27/2023, 9:18 PM

oh duh, it says at the top to enable cookies

ryan

01/27/2023, 9:18 PM

yeah, i noticed that as well, but couldn't find anything on how to configure that within cfhttp

websolete

01/27/2023, 9:19 PM

make the 2018 call and grab the cookies it sets, add those to the 2021 call, i'd say

ryan

01/27/2023, 9:19 PM

ok, will do that now

ryan

01/27/2023, 9:19 PM

thanks

websolete

01/27/2023, 9:19 PM

cfhttpparam
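
The two suggestions so far (set an explicit user agent, then replay the cookies from a working call) both come down to controlling two request headers. Below is a minimal sketch of that idea, written in Python rather than CFML so it can be run anywhere; in CFML the same pieces would be the `useragent` attribute on cfhttp and a `cfhttpparam` of type `cookie`. The URL, user-agent string, and cookie name and value are placeholders, not values from this thread, and `build_request` is a hypothetical helper.

```python
from urllib.request import Request


def build_request(url, user_agent, cookies):
    """Build a GET request with an explicit User-Agent and a Cookie header.

    `cookies` is a dict of name -> value, e.g. copied from a call that worked.
    Nothing is sent here; the request object is only constructed.
    """
    headers = {"User-Agent": user_agent}
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    if cookie_header:
        headers["Cookie"] = cookie_header
    return Request(url, headers=headers, method="GET")


# Placeholder values for illustration only.
req = build_request(
    "https://example.com/",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    {"session_id": "abc123"},
)
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would actually send it. As the rest of the thread shows, matching these two headers may still not be enough to get past the protection.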

ryan

01/27/2023, 9:31 PM

I added all of the cookies but the same result is still present between the versions

ryan

01/27/2023, 9:32 PM

Here is my current file if you would like to double check https://cffiddle.org/app/file?filepath=808b2064-d70d-42c6-8fc4-4cb692ee3ce5/2732b1c[…]077-88ca-7f2c0e55f25c/563b7167-9841-4935-9b0b-1d0f985a1fcc.cfm

websolete

01/27/2023, 9:34 PM

weird

websolete

01/27/2023, 9:34 PM

assuming you're trying to scrape this site? it's what i imagine they're protecting against

ryan

01/27/2023, 9:35 PM

is there a default useragent used by cfhttp if there is not a useragent defined?

ryan

01/27/2023, 9:36 PM

yes, attempting to scrape

websolete

01/27/2023, 9:36 PM

yes, i think it's something like ColdFusion

ryan

01/27/2023, 9:36 PM

it's just interesting that it works in some CF versions and not others

ryan

01/27/2023, 9:36 PM

it also works in Lucee

websolete

01/27/2023, 9:38 PM

pfft, worked on trycf the second time after i tweaked one char in the useragent: https://trycf.com/gist/649fc637efaf82e77db8a2cb1699eb3e/acf2021?theme=monokai

ryan

01/27/2023, 9:39 PM

no kidding

websolete

01/27/2023, 9:39 PM

the 'protection' seems inconsistent

ryan

01/27/2023, 9:39 PM

what engine version did you use in trycf?

websolete

01/27/2023, 9:39 PM

2021

websolete

01/27/2023, 9:39 PM

let me make sure

websolete

01/27/2023, 9:40 PM

yep, worked for me with 2021

websolete

01/27/2023, 9:41 PM

fwiw, i added the trailing slash to the url.

ryan

01/27/2023, 9:41 PM

interesting

ryan

01/27/2023, 9:42 PM

I copied your cfhttp to cffiddle against 2021 and it still gets blocked

websolete

01/27/2023, 9:42 PM

you MAY want to look at semi-DIY scraping solutions like scrapy or the like. if scraping is mission critical to what you're doing then leveraging a service that can handle diffs and rerequests and whatnot might make sense

websolete

01/27/2023, 9:42 PM

may be an ip ban

websolete

01/27/2023, 9:42 PM

what if you run from local

ryan

01/27/2023, 9:42 PM

it's a different ip between cf versions on the same page in cffiddle?

websolete

01/27/2023, 9:43 PM

no idea, but certainly different from cffiddle to trycf

ryan

01/27/2023, 9:43 PM

hmm

websolete

01/27/2023, 9:43 PM

i've done some pet projects in the past using scrapy: https://scrapy.org/

websolete

01/27/2023, 9:44 PM

easy to get lost in there, but it's an option

ryan

01/27/2023, 9:44 PM

i'm not sure it is an ip block because i can easily make myself blocked and then if i change the config in cfhttp, it can be unblocked and work

websolete

01/27/2023, 9:45 PM

well, could be a combined footprint of multiple things, ip, useragent, method, url, params

ryan

01/27/2023, 9:45 PM

agree

websolete

01/27/2023, 9:45 PM

unclear how they have their cloudflare protection configured

ryan

01/27/2023, 9:46 PM

yeah, there are a lot of settings they can place for protection. I was reading through it earlier

ryan

01/27/2023, 9:46 PM

https://developers.cloudflare.com/firewall/recipes/challenge-bad-bots/

chris_hopkins

01/27/2023, 10:41 PM

They will do things like test for cookies, serve captchas, loads of stuff. The config on that sort of thing is vaaaaast.

ryan

01/27/2023, 10:42 PM

Yeah, it is just odd that some versions of CF work fine and others do not

ryan

01/27/2023, 10:43 PM

it's as if maybe each version of CF might be sending different headers and payloads
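
One way to check this hunch would be to point each CF version at an endpoint you control, log the request headers that arrive, and compare the two sets. The comparison step could be sketched as below, in Python. The header values are made-up placeholders, not captured from any CF version, and `diff_headers` is a hypothetical helper.

```python
def diff_headers(a, b):
    """Compare two header dicts, treating header names case-insensitively.

    Returns {name: (value_in_a, value_in_b)} for every header whose value
    differs or that is present on only one side (missing side is None).
    """
    a_norm = {name.lower(): value for name, value in a.items()}
    b_norm = {name.lower(): value for name, value in b.items()}
    return {
        name: (a_norm.get(name), b_norm.get(name))
        for name in sorted(a_norm.keys() | b_norm.keys())
        if a_norm.get(name) != b_norm.get(name)
    }


# Hypothetical captures from two engine versions hitting the same endpoint.
captured_old = {"User-Agent": "ExampleAgent/1", "Accept": "*/*"}
captured_new = {"user-agent": "ExampleAgent/2", "Accept": "*/*", "Connection": "close"}

for name, (old, new) in diff_headers(captured_old, captured_new).items():
    print(f"{name}: {old!r} -> {new!r}")
```

Any header that shows up in the diff is a candidate to set explicitly on the failing version.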

chris_hopkins

01/27/2023, 11:05 PM

They fingerprint like everything, they constantly have teams working on new models of scraping behaviour etc. Was seeing a new model based on audio devices installed (how they get that I don't have a clue) but they go way deep sometimes

ryan

01/28/2023, 3:35 PM

That's insane lol

ryan

02/02/2023, 1:32 PM

fwiw, if anyone has this sort of problem, I came up with a solution in a roundabout way that works wonderfully. Instead of spending time trying to figure out exactly how to make cfhttp work, which looks like a losing proposition against a team of screen-scraping police at Cloudflare, I took the approach of abiding by their rules.

1) I installed Google Chrome on the prod server and set Chrome to always run at startup and to reopen the last tabs on startup.

2) I installed two plugins: the Auto Refresh Plus plugin and the Auto-Save plugin. Every time the page refreshes, the Auto-Save plugin saves the source file to a local directory on the prod server.

3) At this point, you can have the CF code grab the file and process the source HTML directly, but I took a different approach so that the whole setup is more noticeable to other devs: I created a local IIS website that points only to the directory of the saved source HTML. Then I changed my cfhttp call, which previously pointed to the public website that was being blocked, to point to the local IIS website. I only had to change the URL in the code.

If anyone looks at this down the road for updating, they will see that there are two websites running, which will hopefully raise a flag in their mind (beyond the documentation regarding this) that there is more to this that needs to be set up. Hopefully, too, they will see that Chrome is up and running as well. This has been running flawlessly for about 5 days, and the next time this happens I will take this same approach without thinking twice, because even IF I did figure out how to make cfhttp work with CF11, the next upgrade may break it again, possibly for separate reasons. My approach should always work, since it is an actual web browser that only views the webpage. [wink wink]
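
The "grab the file directly" variant mentioned in step 3 could look roughly like the sketch below, in Python for illustration. The folder layout, file names, `*.html` pattern, and helper names are assumptions, not details from the setup described above. The staleness check is also an added assumption: it is there so that a stopped browser is noticed instead of silently serving an old page.

```python
import os
import tempfile
import time
from pathlib import Path


def newest_snapshot(directory, pattern="*.html"):
    """Return the most recently modified file matching pattern, or None."""
    candidates = [p for p in Path(directory).glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def read_if_fresh(path, max_age_seconds):
    """Read a snapshot, refusing files older than max_age_seconds."""
    age = time.time() - path.stat().st_mtime
    if age > max_age_seconds:
        raise RuntimeError(
            f"{path.name} is {age:.0f}s old; is the browser still refreshing?"
        )
    return path.read_text(encoding="utf-8")


# Demo against a throwaway directory standing in for the auto-save folder.
with tempfile.TemporaryDirectory() as folder:
    old = Path(folder, "page-old.html")
    new = Path(folder, "page-new.html")
    old.write_text("<html>old</html>", encoding="utf-8")
    new.write_text("<html>new</html>", encoding="utf-8")
    now = time.time()
    os.utime(old, (now - 3600, now - 3600))  # pretend this was saved an hour ago
    os.utime(new, (now, now))
    latest = newest_snapshot(folder)
    print(latest.name, read_if_fresh(latest, max_age_seconds=600))
```

Serving the folder through a local website, as described above, has the same effect without touching the parsing code, since only the URL changes.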
