Hi everyone! Hope you're all doing well. I have a small question about Crawlee.
My use case is a little simpler than a crawler; I just want to scrape a single URL every few seconds.
To do this, I create a RequestList with just one URL and start the crawler. Sometimes the crawler returns HTTP errors and fails. However, I don't mind, since I'm going to run the crawler again after a few seconds anyway, and I'd prefer the failed requests to be ignored rather than automatically reclaimed and retried.
Is there a way of doing this?
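(For context, a minimal sketch of the setup described above, assuming Crawlee's `BasicCrawler` with a one-URL `RequestList`; the URL and the `sendRequest`-based handler are placeholders, not from the thread:)

```javascript
import { BasicCrawler, RequestList } from 'crawlee';

// A RequestList containing just one URL, as described above.
const requestList = await RequestList.open('single-url', ['https://example.com']);

const crawler = new BasicCrawler({
    requestList,
    async requestHandler({ request, sendRequest, log }) {
        const { body } = await sendRequest(); // plain HTTP GET of request.url
        log.info(`Fetched ${request.url} (${body.length} bytes)`);
    },
});

await crawler.run(); // the whole script is then re-run every few seconds
```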
nathanist
09/09/2024, 7:24 PM
You can simply set the ``maxRequestRetries`` option to 0:
```javascript
import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    maxRequestRetries: 0, // fail immediately instead of retrying
    // ...
});
```
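If the failures should also be swallowed quietly rather than logged as errors, this can be paired with a ``failedRequestHandler``; a sketch assuming Crawlee 3's `(context, error)` signature, with an illustrative URL and handler:

```javascript
import { BasicCrawler } from 'crawlee';

const crawler = new BasicCrawler({
    maxRequestRetries: 0, // a failure goes straight to "failed", no retries
    async requestHandler({ request, sendRequest, log }) {
        const { body } = await sendRequest();
        log.info(`Fetched ${request.url} (${body.length} bytes)`);
    },
    // Called once a request has exhausted its retries (here: immediately).
    // Logging at debug level effectively ignores the failure.
    failedRequestHandler({ request, log }, error) {
        log.debug(`Ignoring failure for ${request.url}: ${error.message}`);
    },
});

await crawler.run(['https://example.com']);
```

With ``maxRequestRetries: 0`` the failed-request handler fires on the first error, so nothing is reclaimed or retried within that run.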
Led
09/09/2024, 8:11 PM
Maybe I misunderstood how the lib works, but wouldn't that just make the request reach failed status faster?
Correct me if I'm wrong, but this is what I understood:
- The URL is added to the request queue;
- If the request fails, it is retried up to ``maxRequestRetries`` times;
- If it still fails, it is marked as failed and can be reclaimed.
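For reference, a sketch of the full "scrape one URL every few seconds, ignore failures" flow in a single long-running process; the sleep loop and the time-based ``uniqueKey`` (added so the shared request queue doesn't deduplicate repeated runs) are assumptions, not something confirmed in the thread:

```javascript
import { BasicCrawler, Request } from 'crawlee';

const URL = 'https://example.com'; // illustrative target
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrapeOnce() {
    const crawler = new BasicCrawler({
        maxRequestRetries: 0, // a failure is final for this run, not reclaimed
        async requestHandler({ request, sendRequest, log }) {
            const { body } = await sendRequest();
            log.info(`Fetched ${request.url} (${body.length} bytes)`);
        },
        failedRequestHandler({ request, log }, error) {
            log.debug(`Ignoring failure for ${request.url}: ${error.message}`);
        },
    });

    // A fresh uniqueKey per run keeps the request queue from deduplicating the URL.
    await crawler.run([new Request({ url: URL, uniqueKey: `${URL}#${Date.now()}` })]);
}

// One scrape, then wait a few seconds; sequential, so runs never overlap.
while (true) {
    try {
        await scrapeOnce();
    } catch (err) {
        console.error('Run failed, will try again:', err);
    }
    await sleep(5000);
}
```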