Redirect Control
# crawlee-js
n
I'm trying to make a simple crawler. How do I properly control redirects? Some bad proxies sometimes redirect to an auth page; in that case I want to mark the request as failed if the redirect target URL contains something like /auth/login. What's the best way to handle these scenarios and abort the request early?
h
Someone will reply to you shortly. In the meantime, this might help:
a
n
So each request is a session? Say I send 3 URLs to crawl, would this mark them all as failed once the session is marked as bad? I think I might have explained myself incorrectly. This still lets the page navigate to the auth login page; my question was whether it's possible to prevent a redirect on the main document and retire the session if one happens.
a
Sessions are defined by the session pool, not by individual requests. When a request gets blocked, mark its session as "bad" so that other requests stop using that session.
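To make the request vs. session distinction concrete, here is a toy model, not Crawlee's actual SessionPool. The class names `ToySession` and `ToySessionPool`, the error threshold of 3, and the "first usable session" selection rule are all assumptions for illustration; the real pool chooses and rotates sessions differently. It only shows the idea that retiring one session does not fail other URLs, they just get a different session.

```javascript
// Toy model for illustration only; this is NOT Crawlee's SessionPool implementation.
class ToySession {
  constructor(id, maxErrorScore = 3) {
    this.id = id;
    this.errorScore = 0;
    this.maxErrorScore = maxErrorScore; // assumed threshold
  }

  // One strike: the session stays usable until it reaches the threshold.
  markBad() {
    this.errorScore += 1;
  }

  // Discard the session immediately.
  retire() {
    this.errorScore = this.maxErrorScore;
  }

  isUsable() {
    return this.errorScore < this.maxErrorScore;
  }
}

class ToySessionPool {
  constructor() {
    this.sessions = [];
    this.nextId = 1;
  }

  // Return the first usable session, or create a new one if none is left.
  getSession() {
    let session = this.sessions.find((s) => s.isUsable());
    if (!session) {
      session = new ToySession(`session_${this.nextId++}`);
      this.sessions.push(session);
    }
    return session;
  }
}

const pool = new ToySessionPool();
const first = pool.getSession(); // session used for URL 1
first.retire(); // URL 1 landed on the auth page
const second = pool.getSession(); // URL 2 gets a different session
```

In this model only the request that hit the auth page fails; the other URLs are still processed, just through another session (and typically another proxy).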
o
You can do something like this:
// Option 1: Use the failedRequestHandler
  failedRequestHandler: async ({ request, session, error }) => {
    if (error.message.includes('/auth/login') || request.url.includes('/auth/login')) {
      console.log(`Request redirected to auth page: ${request.url}`);
      // Mark the proxy as bad if you're using a session pool
      if (session) {
        session.markBad();
      }
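      // To retry with a different proxy, re-enqueue the URL under a new uniqueKey
      // (the original request is already marked as handled). This needs `crawler` from the handler context:
      // await crawler.addRequests([{ url: request.url, uniqueKey: `${request.url}#retry` }]);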
    }
  },
  
  // Option 2: Handle redirects in the request handler
  requestHandler: async ({ request, response, $, crawler, session }) => {
    // Check the final URL after redirects (request.url is the URL that was originally enqueued)
    const finalUrl = request.loadedUrl ?? response.url;
    if (finalUrl.includes('/auth/login')) {
      console.log(`Detected auth redirect: ${finalUrl}`);
      // Mark the session as bad
      if (session) {
        session.markBad();
      }
      // Throw an error to fail this request
      throw new Error('Redirected to auth page');
    }
    
    // Your normal processing code if not redirected
    // ...
  },
  
  // Option 3: Use preNavigationHooks (page.route is the Playwright API; Puppeteer uses request interception instead)
  preNavigationHooks: [
    async ({ request, page, session }) => {
      // Intercept requests before they are sent
      await page.route('**', async (route) => {
        const url = route.request().url();
        if (url.includes('/auth/login')) {
          console.log(`Intercepted auth redirect: ${url}`);
          // Mark the session as bad before aborting
          if (session) {
            session.markBad();
          }
          // Abort the request. If this is the main document, the navigation fails
          // and the crawler treats the request as failed.
          // Do not throw here: an error thrown inside the route callback
          // is not seen by the crawler's request handling.
          await route.abort();
        } else {
          await route.continue();
        }
      });
    }
  ],
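If you end up doing the same check in several places, it may help to pull it into a small helper. The sketch below is only that, a sketch: `AUTH_PATH_MARKERS`, `isAuthRedirect`, and `failOnAuthRedirect` are hypothetical names, not part of Crawlee. It assumes the request object exposes `url` and `loadedUrl` and that the session exposes `retire()`, and it matches on the URL path only, so a query string that merely mentions /auth/login does not trigger it.

```javascript
// Hypothetical helpers, not part of Crawlee.
const AUTH_PATH_MARKERS = ['/auth/login'];

// Returns true when the path of the final URL contains an auth marker.
function isAuthRedirect(finalUrl) {
  let pathname;
  try {
    pathname = new URL(finalUrl).pathname;
  } catch {
    return false; // not an absolute URL, nothing to check
  }
  return AUTH_PATH_MARKERS.some((marker) => pathname.includes(marker));
}

// Call this at the top of requestHandler (or in a post-navigation hook).
function failOnAuthRedirect({ request, session }) {
  // Prefer the URL that was actually loaded; fall back to the enqueued URL.
  const finalUrl = request.loadedUrl ?? request.url;
  if (!isAuthRedirect(finalUrl)) return;

  // Retire the session so it is not reused for other requests.
  if (session) session.retire();

  // Throwing marks this attempt as failed.
  throw new Error(`Redirected to auth page: ${finalUrl}`);
}
```

Inside the crawler this would be used as `requestHandler: async (context) => { failOnAuthRedirect(context); /* normal processing */ }`. Note that this still detects the redirect after the page has loaded; it does not prevent the navigation itself.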