Robots.txt / indexing problem with site running as...
# workers-help
d
Running into a situation where Google Search is unable to index the site running as Pages. Robots.txt renders as index.html for some reason. I did create an explicit robots.txt in my repo and I'm pretty sure it's propagating to Pages - anyone know what might be happening?
j
Do you have a 404.html? If not, any page that 404s will render the index.html, for SPA behaviour. That won't prevent Pages from serving a file though. Where is your robots.txt? Can you provide a link to it and/or your repo for us to see?
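A minimal sketch of what that looks like, assuming your static files live in a `public/` folder: shipping a 404.html in the build output opts out of Pages' SPA fallback, so genuinely missing paths return a real 404 instead of index.html.

```shell
# Create a bare-bones 404 page in public/ (adjust the folder to your setup).
mkdir -p public
cat > public/404.html <<'EOF'
<!doctype html>
<title>Not found</title>
<h1>404 - page not found</h1>
EOF
```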
d
Thanks

https://cdn.discordapp.com/attachments/1109614324768592003/1109619292615475251/image.png

j
Your robots.txt should be in your output folder. I imagine that’s public or dist?
The same goes for your index.html.
i
From the looks of it you're running Vite (or a framework that uses Vite). The way Vite works is that it copies everything from public -> dist at build time, so your robots.txt needs to go in public/ as James says. Side note - I'd recommend against committing dist (add it to your .gitignore); those generated files change often and will slow down git in the long run.
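A quick sketch of the two fixes above, assuming a stock Vite layout where `public/` is copied verbatim into `dist/` at build time:

```shell
# robots.txt belongs in public/, not the repo root, so Vite copies it to dist/.
mkdir -p public
printf 'User-agent: *\nAllow: /\n' > public/robots.txt

# Keep the generated dist/ folder out of version control.
grep -qx 'dist' .gitignore 2>/dev/null || echo 'dist' >> .gitignore
```

After the next `vite build`, dist/robots.txt should exist and be served at /robots.txt.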
d
Thanks, you’re right - robots.txt doesn’t seem to make it into public or dist. If I remove dist from the GitHub repo, will it get regenerated by Cloudflare Pages?
i
Yes, it should.
As long as you have the build command set correctly (`npm run build` I assume, but it might be different for you).
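For reference, the build command Pages runs should match a script in your package.json. A typical Vite project (the exact script names may differ in your setup) looks something like:

```json
{
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview"
  }
}
```

With that in place, Pages runs `npm run build` and deploys the resulting dist/ folder, robots.txt included.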
d
Got it, I'll try - thanks!