remove uniqueKey from queue blacklist
# crawlee-js
c
Hi all, I'm scraping a weird website whose file attachment links rotate every 5 minutes or so, e.g. https://dca-global.org/serve-file/e1725459845/l1714045338/da/c1/Rvfa9Lo-AzHHX0NYJ3f-Tx3FrxSI8-N-Y5ytfS8Prak/1/37/file/1704801502dca_media_publications_2024_v10_lr%20read%20only.pdf Everything between `serve-file` and `file` changes regularly. My strategy to deal with this is to calculate the unique key from the 'stable' parts of the URL. Then, when I detect that the URL has changed, I can remove any queued requests with that unique key and replace them with the new URL. My question is: if a request has hit its retry limit and has been 'blacklisted' from the request queue, how can I remove it so the new URL can be processed? Thanks!
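For anyone reading along, here is a minimal sketch of the "stable parts" idea, assuming the rotating portion is exactly the path segments between `serve-file` and the last `file` segment. The helper name `stableUniqueKey` is hypothetical and not part of Crawlee.

```typescript
// Hypothetical helper: derive a uniqueKey that ignores the rotating path
// segments between "serve-file" and the last "file" segment.
// Assumption: the query string and hash are not needed to identify the file.
function stableUniqueKey(rawUrl: string): string {
    const url = new URL(rawUrl);
    const segments = url.pathname.split('/');
    const start = segments.indexOf('serve-file');
    const end = segments.lastIndexOf('file');
    // URL does not match the expected shape: fall back to the URL itself.
    if (start === -1 || end <= start) return rawUrl;
    // Keep everything up to and including "serve-file", then "file" and what follows.
    const stable = [...segments.slice(0, start + 1), ...segments.slice(end)];
    return `${url.origin}${stable.join('/')}`;
}
```

The result could then be passed as `uniqueKey` when calling `addRequest({ url, uniqueKey: stableUniqueKey(url) })`, so that two rotations of the same link map to the same key.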
p
Hi @Crafty, you can remove a request from the `RequestQueue` by its `id` through the API. If you only know the `uniqueKey`, I suggest adding a new `Request` to the `RequestQueue` with the same `uniqueKey`. The response will have the attribute `wasAlreadyPresent` set to `true`, but you should also get back the stored request data (including its `id`). Once you have the request `id`, you can send a DELETE HTTP request to delete it; see https://docs.apify.com/api/v2#tag/Request-queuesQueue/operation/requestQueue_request_delete
c
Hi @Pepa J, thanks for the help. I am actually working with only Crawlee, not Apify, but I found a method along the same lines. May I suggest a feature for either the request queue or the request queue client to more easily query a request by its `uniqueKey`?
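```typescript
// Assumes `crawler` is an existing Crawlee crawler and `log` is imported from 'crawlee'.
const requestQueue = await crawler.getRequestQueue();
// Re-adding a request with an existing uniqueKey still returns the stored request's id.
const result = await requestQueue.addRequest({ url: 'https://google.com', uniqueKey: 'aaa', label: 'secondary' });
log.info('result', result);
if (result.wasAlreadyPresent) {
    log.info('already present');
    // Look up the stored request by the id returned from addRequest.
    const request = await requestQueue.getRequest(result.requestId);
    log.info('request', request);
}
```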
p
@Crafty Ah, I am sorry for the API mention. I believe there are some architectural decisions behind this. What you can do is create a map in memory and store the `uniqueKey` -> `requestId` relation there as a key-value pair. 🤔
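A rough sketch of that in-memory map, for illustration only. `QueueLike`, `QueueOperationInfoLike`, `requestIdByUniqueKey` and `addAndTrack` are hypothetical names. The two interfaces are minimal stand-ins for the parts of the queue used in the snippet above (`addRequest` returning `requestId` and `wasAlreadyPresent`), so the example compiles without Crawlee installed.

```typescript
// Hypothetical structural types covering only what this sketch needs.
interface QueueOperationInfoLike {
    requestId: string;
    wasAlreadyPresent: boolean;
}
interface QueueLike {
    addRequest(request: { url: string; uniqueKey: string }): Promise<QueueOperationInfoLike>;
}

// In-memory index: uniqueKey -> requestId. It is lost when the process exits.
const requestIdByUniqueKey = new Map<string, string>();

// Add a request and remember which id the queue assigned to its uniqueKey.
async function addAndTrack(
    queue: QueueLike,
    url: string,
    uniqueKey: string,
): Promise<QueueOperationInfoLike> {
    const info = await queue.addRequest({ url, uniqueKey });
    requestIdByUniqueKey.set(uniqueKey, info.requestId);
    return info;
}
```

Later, `requestIdByUniqueKey.get(uniqueKey)` gives the id to look up or remove, without needing a second `addRequest` call just to discover it.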
c
Ah OK, sounds good. It would be interesting to know if there is a reason for it; surely `addRequest` must be doing it under the hood.