# workers-discussions
  • w

    Walshy | Pages

    04/25/2023, 2:27 PM
    what's your account ID?
  • s

    sathoro

    04/25/2023, 2:29 PM
    ac9902443335f46a9fc60a881a41da87 please don't hack me
  • w

    Walshy | Pages

    04/25/2023, 2:29 PM
    You are indeed over 30s
  • s

    sathoro

    04/25/2023, 2:30 PM
    but I think I found the issue. we need to parse HTML (I know, I know) and one of the pages we are scraping is massive. I think this lib is causing trouble:
  • s

    sathoro

    04/25/2023, 2:30 PM
    wait that is the average?
  • m

    mbackonja

    04/25/2023, 2:30 PM
    Hello guys, I have an issue with a simple Cloudflare Worker that acts as a proxy to a single file on a public S3 bucket. The worker was working a few months ago and now I need it again, but it no longer works. The S3 bucket is public, the file is also public, and I can access the HTML file directly in the browser (bucket name masked):
    https://xxxxxxx.s3.eu-central-1.amazonaws.com/error.html
    And my worker is like this
    addEventListener('fetch', event => {
      event.respondWith(handleRequest(event.request))
    })
    
    async function handleRequest(request) {
      return await fetch('https://xxxxxxx.s3.eu-central-1.amazonaws.com/error.html')
    }
    And once I attach the Cloudflare Worker to some routes, those routes return
    403 Forbidden
    cloudflare
    but if I open the worker directly through the xxxxx.workers.dev domain, it works as expected (the HTML from the S3 bucket is returned)...
  • w

    Walshy | Pages

    04/25/2023, 2:30 PM
    for the exceeded cpu rows that i have
  • w

    Walshy | Pages

    04/25/2023, 2:30 PM
    i'm looking only at failures
  • s

    sathoro

    04/25/2023, 2:30 PM
    oh for exceeded ones, okay that makes sense
  • j

    James

    04/25/2023, 2:30 PM
    If you can for your use-case, I'd recommend using HTMLRewriter. It'll be significantly more efficient as it's streamed; https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/
  • s

    sathoro

    04/25/2023, 2:31 PM
    we do use that where possible, but it isn't possible for what we are doing unfortunately. we have to load the full DOM
  • s

    sathoro

    04/25/2023, 2:31 PM
    need to pass the DOM into here:
  • s

    sathoro

    04/25/2023, 2:32 PM
    using linkedom was a workaround to make it work on Workers. the recommended lib, JSDom, wouldn't work and is even heavier
  • j

    James

    04/25/2023, 2:32 PM
    Ahh I see, okay yeah I can definitely see how that would consume a lot of time on large pages 😅
  • s

    sathoro

    04/25/2023, 2:32 PM
    I wouldn't be surprised if there is an infinite loop getting triggered or something
  • s

    sathoro

    04/25/2023, 2:32 PM
    within linkedom
  • s

    sathoro

    04/25/2023, 2:32 PM
    yeahhh I am wondering how we could limit the HTML before passing it to linkedom without destroying it
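
One way to approach the question above without cutting the HTML apart is to measure the page first and skip full DOM parsing when it exceeds a budget. This is only a minimal sketch: `MAX_HTML_BYTES`, `htmlByteLength`, and `isSafeToParse` are hypothetical names, and the 1,000,000-byte budget is an assumption, not a value from this conversation.

```javascript
// Hypothetical size guard: decide whether a page is small enough to
// hand to a full DOM parser. It never truncates the markup, so the
// HTML is either parsed whole or skipped.
const MAX_HTML_BYTES = 1_000_000; // assumed budget, tune per workload

// Returns the UTF-8 encoded size of the HTML string in bytes.
function htmlByteLength(html) {
  return new TextEncoder().encode(html).length;
}

// Returns true when the page fits within the byte budget.
function isSafeToParse(html, maxBytes = MAX_HTML_BYTES) {
  return htmlByteLength(html) <= maxBytes;
}
```

A caller could check `isSafeToParse(html)` before building the DOM and fall back to a cheaper path (or return an error) for oversized pages such as the 1.4MB one mentioned later.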
  • s

    sathoro

    04/25/2023, 2:33 PM
    I'd share the specific page but I don't want to crash anyone's browser. it is a cast page on IMDB
  • s

    sathoro

    04/25/2023, 2:34 PM
    it is 1.4MB
  • j

    James

    04/25/2023, 2:37 PM
    If you run those libraries to parse it locally, how fast does it run?
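
A rough way to answer that locally is to time each step separately. The sketch below assumes each step can be wrapped in a synchronous function; `timeIt` is a hypothetical helper, and local wall-clock time is only an approximation of the CPU time a Worker is billed for.

```javascript
// Hypothetical timing helper: runs a synchronous function once and
// reports how long it took in milliseconds.
function timeIt(label, fn) {
  const start = performance.now();
  const result = fn();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  return { result, elapsedMs };
}

// Example usage (the step functions are placeholders, not real APIs):
//   const { result: dom } = timeIt("parse", () => parseStep(html));
//   timeIt("strip", () => stripStep(dom));
```

Timing the parse, readability, and strip steps one at a time makes it easier to see which library is actually consuming the time.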
  • m

    mbackonja

    04/25/2023, 2:39 PM
    Does anyone have an idea? First pic: workers.dev (working). Second pic: real domain (not working). Third pic: source code of the worker.
  • w

    Walshy | Pages

    04/25/2023, 2:39 PM
    I'd recommend posting into #1052656806058528849
  • w

    Walshy | Pages

    04/25/2023, 2:39 PM
    as there's already ongoing support here 🙂
  • m

    mbackonja

    04/25/2023, 2:40 PM
    Oh, okay, thanks
  • s

    sathoro

    04/25/2023, 2:44 PM
    ohh you know what... I blamed the completely wrong library. the CPU time is being consumed after it is parsed to DOM and run through the readability library. it seems to be getting stuck when we strip the HTML using another lib...
  • j

    James

    04/25/2023, 2:45 PM
    Ah well you've narrowed it down now and can hopefully reproduce and report 😄
  • s

    sathoro

    04/25/2023, 2:47 PM
    it would probably be faster if I load it back into linkedom and use innerText 🤣