Hey all, a client of mine says they have RAW TEXT ...
# cfml-general
f
Hey all, a client of mine says they have RAW TEXT on the page, and supposedly bots are hitting the page, scraping email info, and then spamming the users listed on it. Besides CAPTCHA on the page (if that would even help), any ideas about how to keep bots from scraping raw text on a CFML page? I have no ideas, so I'm reaching out to you all 🙂
d
You can use services like Cloudflare to block requests from certain places. But the bigger concern is why you don't have authentication in front of a page that has user data. Why is it just sitting out there for the world to read? Put it behind auth and use security to fix it.
f
@deactivateduser We said something about Cloudflare and they said no. When you say behind auth, do you mean for them to log in first? I don't know, but my guess is the raw text is on a generic page.
p
Not sure if you're using ColdBox, but it's prob got some code to help ya out
d
Well you probably shouldn't have customer information on a generic page unless you want that taken. While bot scripts have become sophisticated, they are still trying to hit a constantly moving and growing target.
f
@Patrick doubt it is using coldbox but can look at that code as well....will be great to see what other ideas come out
w
that bot session thing just sets bot session timeouts to be really low, it doesn't disallow them, so wouldn't prevent scraping
👍🏻 1
👍 1
f
@deactivateduser Yeah, I was just hit up with this issue, and I did not have any good ideas, so I'm trying to see if there are some good ideas...
My guess is this email information might be on a contact us page, which would be a page open to everyone...
w
you can disallow bots explicitly using rewrite rules, or barring that in Application.cfc onRequestStart, but i think the bigger issue, as someone pointed out, is having 'sensitive' data exposed on any page that isn't behind some type of login
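A minimal sketch of that onRequestStart idea, assuming a plain Application.cfc with no framework. The application name and the user-agent pattern are made up for illustration. It only catches bots that identify themselves; anything spoofing a browser user agent walks straight through, and it would turn away legit search crawlers too.

```cfml
// Application.cfc (sketch; app name and pattern list are hypothetical)
component {
    this.name = "botBlockSketch";

    public boolean function onRequestStart(required string targetPage) {
        // Substrings that self-identifying crawlers and tools commonly send.
        var botPattern = "bot|crawl|spider|slurp|curl|wget";
        var userAgent = trim(cgi.http_user_agent);

        // Treat a missing User-Agent as suspicious as well.
        if (len(userAgent) == 0 || reFindNoCase(botPattern, userAgent) > 0) {
            cfheader(statuscode = 403);
            writeOutput("Forbidden");
            abort;
        }
        return true;
    }
}
```

Doing the same match in a web server rewrite rule is cheaper, since the request never reaches the CFML engine.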
d
Then the emails on your "public" page should not be users' emails but addresses that act as buckets and have strict spam policies.
f
@deactivateduser I will have to check into this and see what they are....
Also everyone, is using robots.txt or .htaccess a way to block bots? I guess you block the bot after it hits your site, but that is a way to do it, right?
b
You're not going to 100% block all bots from getting to a publicly accessible page. The best solution is to just not have PII publicly available. The next best option is to hide it behind authentication, and the next best after that is Google reCAPTCHA (they have a hidden version that users don't have to see or interact with; you might get away with that one).
robots.txt is only good for telling legit bots where to go and where not to go. Malicious bots will just use it to find out where they shouldn't be going and go there.
👍 3
w
robots.txt is a suggestion, it's not a barrier. rewrite rules will take care of a lot of bots but not all. you could require the user to perform some activity, like logging in, and set a persistent cookie if successful. then you check that in onRequestStart and if it exists, they're not a bot, otherwise you set session low and prevent access to certain areas until that cookie has been set by some human-driven activity
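Roughly what that could look like, as a sketch only. The cookie name `humanVerified`, the `/members/` path, the `/login.cfm` target, and both timeouts are hypothetical. It relies on the Application.cfc pseudo-constructor running on every request, which lets the session timeout vary per visitor.

```cfml
// Application.cfc (sketch; cookie name, paths, and timeouts are hypothetical)
component {
    this.name = "humanCookieSketch";
    this.sessionManagement = true;

    // The pseudo-constructor runs on every request, so the session
    // timeout can be chosen per request based on the cookie.
    variables.isVerified = structKeyExists(cookie, "humanVerified");
    this.sessionTimeout = variables.isVerified
        ? createTimeSpan(0, 0, 30, 0)   // 30 minutes for verified visitors
        : createTimeSpan(0, 0, 0, 10);  // 10 seconds for everyone else

    public boolean function onRequestStart(required string targetPage) {
        // Keep unverified visitors out of the protected area.
        if (!variables.isVerified && findNoCase("/members/", arguments.targetPage) > 0) {
            location("/login.cfm", false);
        }
        return true;
    }
}
```

The login handler would set the cookie after a successful login. A bare flag like this can be forged by any client that sends the cookie itself, so a real version would want a value the server can verify.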
p
Well honestly someone asking for your advice for a solution to the problem then telling you NO is clearly not looking for your advice 🙂 (Cloudflare is so simple and would clear all this up for free)
👍🏻 1
👍 4
f
This is just great info for me, but I'm not sure what to do. We told them about Cloudflare and CAPTCHA, but they said no...
w
this is america, no means yes
p
I would ask for reasoning and not just take their hard-no response. So you can relieve their stress about cloudflare.
✅ 1
b
message has been deleted
😂 5
f
I am sending these ideas to my manager to find out why Cloudflare is a no....
👍🏻 1
👍 1
b
Good luck. Hopefully they come around.
w
you could just leave the emails in the source but change all their values to bhartsfield@aol.com
😂 2
b
please do... that person uses my actual email address all the time
f
@websolete leave in source but change values to something else...hmmm or is this a joke for bhartsfield
b
he was 100% joking. that won't solve anything
w
pick the person in this thread that you like the least and use their email address. solved.
f
unless we tell the client that all emails go to you and you forward them to the right people...LOL
w
emails shouldn't be plaintext on publicly available pages, full stop
🛑 2
b
well... the person you like 2nd least... it wouldn't be fair for websolete to be that guy since it was his suggestion.
p
the default value is always websolete, we all know that here.
w
and short of displaying email addresses as images, not much else you can do. javascript magic? nope. file it under 'dumb ideas we didn't know were dumb' and change it.
f
@websolete I will post here when I get back info about why they said no to the simple ideas....
w
send them that meme pic bhartsfield posted above, it sums it up succinctly
f
LOL, my manager wrote back and said their reasoning is simply that they know better than us, from what she can tell....shocking, right
w
ok well, enjoy your email spam i guess
👍 1
p
Yea, idiots....
f
She did say thanks for the ideas from everyone 🙂
w
there's always the confundus charm
b
I guess you can just html encode all characters and hope the bots giving them grief are too stupid to figure it out.
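For reference, a sketch of what that encoding could look like in cfscript. The function name `entityEncodeAll` is made up; it just turns each character into its decimal HTML entity, so the browser still renders the address normally while the page source no longer contains it as plain text.

```cfml
<cfscript>
// Hypothetical helper: replace every character with a decimal HTML entity.
function entityEncodeAll(required string text) {
    var encoded = "";
    for (var i = 1; i <= len(arguments.text); i++) {
        // "##" is an escaped pound sign inside a CFML string literal.
        encoded &= "&##" & asc(mid(arguments.text, i, 1)) & ";";
    }
    return encoded;
}

// The leading "n" comes out as &#110; and so on for each character.
writeOutput(entityEncodeAll("noreply@gmail.com"));
</cfscript>
```

Any scraper that decodes entities before matching will see straight through this, so it only filters out bots doing a naive pattern match on the raw source.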
w
except to decode is usually a one liner in most languages
b
yeah... barely better than nothing
websolete@myspace4ever.com
r
I am genuinely appreciating this thread. It’s good to know that my (work) life does not suck as much as some seem to. Close second, perhaps, but it’s good to be reminded that things are never so bad they can’t get worse.
🤣 1
b
Misery loves company
r
I had asked to have an emoji for my email address (think poop-smiley-sad@agency.org) in hopes of limiting work spam (and all email, in reality) but the request was declined fast enough that I don’t think they even really considered it on its true merits…
😂 1
m
Just make sure that the pages have .cfm extensions. No self respecting bots will hit those 😉
😂 1
f
@Mark Takata (Adobe) LOL you are right I will let them know 🙂
🤣 1
d
In addition to bhartsfield's suggestion, you could put parts of the address in separate html elements, e.g.
<span>noreply</span>@<span>gmail</span>.<span>com</span>
.
d
Be careful, that may break accessibility standards and will most likely break linking in browsers, making for a poor user experience.
d
How so?
I guess if you're comparing it to e.g.
<a href="mailto:noreply@gmail.com">Send Email</a>
, then yeah, it will be less machine-readable (that's kind of the point). Of course, it would be better to just not allow bots to access the page in the first place.
p
Yes, a BOT is going to scrape it with all that extra code included, so what it gets is unusable, but then your site is LESS usable from the user's standpoint, since they can't click a link, etc.
d
Also a smart bot will strip out extra dom elements and recognize you have an email there
✔️ 1
d
Yeah, obviously, if you will accept nothing less than a traditional email link, like the one in my previous comment, then there is no point using any method of obfuscation.
A really smart bot will use OCR, so again, no method of obfuscation will stop it.
But stopping the dumb bots is better than nothing.
p
Sure, having a good IT team and email service will eliminate the issues entirely. 🙂
d
Obscuring the address should be a last resort. The best option is obviously to not allow bots access to the page in the first place.
d
^^^ This. Don't make stuff public that you don't want public. This isn't a hard concept.
m
I had a form on my page when I ran my old consulting business. We used the most advanced CAPTCHA we could. After over 200 bot submissions I gave up and closed that feature. Several of the bot submissions actually advertised their "advanced bot features capable of defeating any CAPTCHA". The robots are winning y'all. Put it behind a gate.
b
We do what @bhartsfield suggested above, which is to encode the characters. You'd probably dissuade at least some of the more primitive bots from finding emails.
m
On some of my systems I use a contact form and don't expose the e-mail directly to end users, then use a CAPTCHA and other security practices (rate limiting, etc.) to secure that form. That way the user can never email the person directly, but the person can reply to the e-mail if they so desire. But in all honesty, the current addresses are out there and they will now be targeted until the end of time no matter what you do.
🎯 1