# adobe
r
I have an issue with cfhttp not working in CF11 when requesting a website as it appears that cloudflare is blocking the request, so we attempted to see what happens in CF18, and CF21. It throws an error in CF21, works in CF18. Would anyone know why this website does and does not work based on the version? I'm asking in hopes to find a solution for making it work in the current CF11 version, and I know CF11 is not supported. Please see the cffiddle below: https://cffiddle.org/app/file?filepath=79fca3da-74f0-4d11-82b0-7abfa15cd718/a56c60f[…]1b0-beab-bf8bd6ee6959/2e42e53c-6444-4c7c-b512-46872b76c187.cfm
w
perhaps the default useragent string differs between the two versions
try adding an explicit useragent attribute to the 2021 call
www.useragentstring.com if you need one
or your own custom one, but cloudflare may bitch about 'unknown' ones too
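For anyone following along, a minimal sketch of the explicit-useragent suggestion. The target URL and the browser-style string are placeholders (assumptions, not from the fiddle), and the default noted in the comment is what the Adobe docs list, so treat it as unverified for any particular version.

```cfml
<!--- Hypothetical target URL and browser-style user agent; substitute your own. --->
<cfset targetUrl = "https://www.example.com/">
<cfset browserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36">

<!--- Without the useragent attribute, cfhttp is documented to identify itself as "ColdFusion". --->
<cfhttp url="#targetUrl#" method="get" useragent="#browserAgent#" result="httpResult" timeout="30" />

<!--- Show the status line so a block (e.g. 403) is easy to spot. --->
<cfoutput>#encodeForHTML(httpResult.statusCode)#</cfoutput>
```

Running the same call on each CF version and comparing the status code is a quick way to see whether the user agent alone changes the outcome.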
r
ok, thanks, will try that
It works on cf18 without a useragent
w
oh duh, it says at the top to enable cookies
r
yeah, i noticed that as well, but couldn't find anything on how to configure that within cfhttp
w
make the 2018 call and grab the cookies it sets, add those to the 2021 call, i'd say
r
ok, will do that now
thanks
w
cfhttpparam
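A rough sketch of passing cookies with cfhttpparam. The cookie name and value are made up for illustration; the real ones would be copied from the response of the call that works. As far as I know, type="cookie" URL-encodes the value, so sending a raw Cookie header is shown as a commented-out alternative in case the encoding alters the value.

```cfml
<!--- Hypothetical URL and cookie; copy the real name/value pairs from the working response. --->
<cfset targetUrl = "https://www.example.com/">

<cfhttp url="#targetUrl#" method="get" result="httpResult" timeout="30">
    <!--- One cfhttpparam per cookie; the value is URL-encoded by ColdFusion. --->
    <cfhttpparam type="cookie" name="exampleCookie" value="exampleValue">
    <!--- Alternative if the encoding is a problem: send the header yourself.
    <cfhttpparam type="header" name="Cookie" value="exampleCookie=exampleValue">
    --->
</cfhttp>

<!--- Dump the response headers to see which cookies the site sets in return. --->
<cfdump var="#httpResult.responseHeader#">
```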
r
I added all of the cookies but the same result is still present between the versions
w
weird
assuming you're trying to scrape this site? it's what i imagine they're protecting against
r
is there a default useragent used by cfhttp if there is not a useragent defined?
yes, attempting to scrape
w
yes, i think it's something like ColdFusion
r
it's just interesting that it works in some CF versions and not others
it also works in Lucee
w
pfft, worked on trycf the second time after i tweaked one char in the useragent: https://trycf.com/gist/649fc637efaf82e77db8a2cb1699eb3e/acf2021?theme=monokai
r
no kidding
w
the 'protection' seems inconsistent
r
what engine version did you use in trycf?
w
2021
r
ok
w
let me make sure
yep, worked for me with 2021
fwiw, i added the trailing slash to the url.
r
interesting
I copied your cfhttp to cffiddle against 2021 and it still gets blocked
w
you MAY want to look at semi-DIY scraping solutions like scrapy or the like. if scraping is mission critical to what you're doing then leveraging a service that can handle diffs and rerequests and whatnot might make sense
may be an ip ban
what if you run from local
r
it's a different ip between cf versions on the same page in cffiddle?
w
no idea, but certainly different from cffiddle to trycf
r
hmm
w
i've done some pet projects in the past using scrapy: https://scrapy.org/
easy to get lost in there, but it's an option
r
i'm not sure it is an ip block because i can easily make myself blocked and then if i change the config in cfhttp, it can be unblocked and work
w
well, could be a combined footprint of multiple things, ip, useragent, method, url, params
r
agree
w
unclear how they have their cloudflare protection configured
r
yeah, there are a lot of settings they can place for protection. I was reading through it earlier
c
They will do things like test for cookies, serve captchas, loads of stuff. The config on that sort of thing is vaaaaast.
r
Yeah, it is just odd that some versions of CF work fine and others do not
it's as if maybe each version of CF might be sending different headers and payloads
c
They fingerprint like everything, they constantly have teams working on new models of scraping behaviour etc. Was seeing a new model based on audio devices installed (how they get that I don't have a clue) but they go way deep sometimes
r
That's insane lol
fwiw, if anyone has this sort of problem, I came up with a roundabout solution that works wonderfully. Instead of spending time trying to figure out exactly how to make cfhttp work, which looks like a losing proposition against a team of screen-scraping police at Cloudflare, I took the approach of abiding by their rules.
1) I installed Google Chrome on the prod server and set Chrome up to always run at startup and to reopen the last tabs on startup.
2) I installed two plugins: Auto Refresh Plus and Auto-Save. Every time the page refreshes, the Auto-Save plugin saves the source file to a local directory on the prod server.
3) At this point, you can have the CF code grab the file and process the source HTML directly, but I took a different approach so that the whole setup is more noticeable to other devs: I created a local IIS website that points only to the directory of the saved source HTML. Then I changed the cfhttp call that previously pointed to the blocked public website so that it points to the local IIS website instead. I only had to change the URL in the code.
If anyone looks at this down the road for updating, they will see that there are two websites running, which will hopefully raise a flag in their mind (in addition to the documentation on this) that there is more that needs to be set up. Hopefully they will also see that Chrome is up and running as well. This has been running flawlessly for about 5 days, and the next time this happens I will take the same approach without thinking twice, because even IF I did figure out how to make cfhttp work with CF11, the next upgrade may break it again, possibly for different reasons. My approach should always work since it is an actual web browser that only views the webpage. [wink wink]
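For reference, the CF side of that setup might look roughly like the sketch below. The localhost port and file name are assumptions for illustration; the real values depend on how the IIS site and the Auto-Save plugin are configured.

```cfml
<!--- Hypothetical local IIS site serving the directory where Chrome saves the page source. --->
<cfset localUrl = "http://localhost:8080/saved-page.html">

<cfhttp url="#localUrl#" method="get" result="localResult" timeout="10" />

<cfif left(localResult.statusCode, 3) eq "200">
    <!--- Same HTML the browser received; hand it to the existing parsing code. --->
    <cfset pageHtml = localResult.fileContent>
<cfelse>
    <!--- Fail loudly so a stopped Chrome or IIS site gets noticed. --->
    <cfthrow message="Local snapshot not available: #localResult.statusCode#">
</cfif>
```

Checking the file's last-modified time before parsing could also help catch the case where Chrome has stopped refreshing and the snapshot has gone stale.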