# cfml-general
a
Does anyone know of a way to list files in a folder only past a certain date without QoQ? Example... I have a folder with 100k files in it... I only want DirectoryList (or a similar function) to return files in that folder that are older than 60 days. I know this can be done using QoQ after the DirectoryList, but the DirectoryList takes 30 seconds due to volume of files and generally there are only going to be 3-4 files older than 60 days in this folder.
n
I don't suppose you have something in the filename to indicate the date? Just thinking there might be a way to do it via the return function
directoryList("/var/data", false, "path", function(path){ return arguments.path.hasSuffix(".log"); })
(or whether there's something in `arguments` which would help you filter)
a
No, they're user-uploaded documents, and all have the filename as it existed on the client machine.
a
You can use a filter to discard the newer ones, but is directoryList the slow bit, or the QoQ?
a
I couldn't find any examples or documentation showing how to filter by date using DirectoryList... but yes, the time I mentioned for taking 30 seconds is only the DirectoryList.
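For reference, a date check can live inside the filter callback itself. This is only a minimal sketch: it assumes the callback receives the full path of each entry (as in the example above), and `folderPath` and the 60-day cutoff are placeholders. It still visits every file and calls `getFileInfo()` on each one, so it may well be no faster, but only the matching paths come back.

```cfml
// Hypothetical folder; replace with the real upload directory.
folderPath = "/var/data";
// Anything last modified before this moment counts as "older than 60 days".
cutoff = dateAdd( "d", -60, now() );

oldFiles = directoryList( folderPath, false, "path", function( path ){
    // Assumption: path is absolute, so getFileInfo() can read it directly.
    var info = getFileInfo( arguments.path );
    // Skip sub-directories and keep files modified before the cutoff.
    return info.type == "file" && dateCompare( info.lastModified, cutoff ) < 0;
} );
```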
a
var x = files.filter( (el) => isDate(el.DateLastModified ) );
is how you'd filter it https://cfdocs.org/queryfilter
👍 1
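To make that query filter select by age rather than just check for a date, something like the following could work. It is a sketch, assuming an engine that supports arrow functions and the `dateLastModified` column that `directoryList` returns in "query" mode; `folderPath` is a placeholder. Note that it still loads the full listing first.

```cfml
// Hypothetical folder; replace with the real upload directory.
folderPath = "/var/data";
cutoff = dateAdd( "d", -60, now() );

files = directoryList( folderPath, false, "query" );
// Keep only rows whose last-modified date is earlier than the cutoff.
oldFiles = files.filter( ( row ) => dateCompare( row.dateLastModified, cutoff ) < 0 );
```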
I'm surprised directoryList is that slow though - it's not you doing a cfdump of all the files that's making it slow is it?
👍 1
j
Agreed, too much output can cause significant slowdowns (e.g. dumping too much) - personal experience
a
No, we're not outputting anything... just using GetTickCount() before and after the directory list and then writing the difference to a log file
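The timing described here amounts to roughly the following. This is a sketch only: `folderPath` and the log file name `dirlist-timing` are made up for illustration.

```cfml
// Hypothetical folder; replace with the real upload directory.
folderPath = "/var/data";

start = getTickCount();
files = directoryList( folderPath, false, "query" );
elapsedMs = getTickCount() - start;

// Nothing is dumped or output; only the elapsed time is logged.
writeLog(
    text = "directoryList took #elapsedMs# ms for #files.recordCount# files",
    type = "information",
    file = "dirlist-timing"
);
```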
a
Does it need to be fast? Could you run it as a nightly clean-up on a schedule?
a
it's already a scheduled task... I mean 30 seconds isn't a big deal since it's a task, but it seems unnecessary to pull 100k files into a query or array when I am only taking action on < 10 files.
a
Just trying to understand your challenges. I think the Apache Commons library I linked to above is worth a look then.
j
You could use `CFEXECUTE` to issue a command using your OS, such as `find` on Linux or `PowerShell` on Windows, to filter files by date before processing them in ColdFusion.
🐲 1
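On Linux, the `find` route might look roughly like this. It is a sketch under several assumptions: `find` is on the server's path, `folderPath` (a placeholder) contains no spaces, and no filename contains a newline. `find` counts age in whole 24-hour periods, so `-mtime +60` is only approximately "older than 60 days".

```cfml
// Hypothetical folder; replace with the real upload directory.
folderPath = "/var/data";

// -maxdepth 1: don't recurse; -type f: files only;
// -mtime +60: last modified more than 60 days ago.
cfexecute(
    name      = "find",
    arguments = "#folderPath# -maxdepth 1 -type f -mtime +60",
    variable  = "findOutput",
    timeout   = 60
);

// find prints one path per line, so split the output on newlines.
oldFiles = listToArray( trim( findOutput ), chr( 10 ) );
```

The appeal is that the date comparison happens in the OS, so ColdFusion only ever sees the handful of matching paths.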
a
I wonder what this means, in the docs for the filter param of `directoryList`: "Additionally, it can also accept the instances of Java FileFilter Objects." I'd poss dump out the arguments of the callback and see what you're provided.
(soz on my phone or I'd investigate further myself)
c
Was just checking that out, Adam.... seems that on ACF2023 at least (not Lucee apparently), you can pass an `AgeFileFilter` Java object (as mentioned by John and which implements `FileFilter`) to `DirectoryList()` as a filter function. Not sure if it will be faster but it should limit what's added to the query.
sixtyDaysAgo = Now().Add( "d", -60 )
ageFileFilter = CreateObject( "java", "org.apache.commons.io.filefilter.AgeFileFilter" ).init( sixtyDaysAgo )
result = DirectoryList( folderPath, false, "query", ageFileFilter )
3
a
Thanks. I stepped out for a bit but I’ll give it a try when I get back.
a
I'd also look at this: https://stackoverflow.com/questions/20786822/in-java-how-to-only-pick-or-filter-files-created-between-during-a-specific-time, or other search results for "java list files filter on date"
@saghosh can you pls improve the docs here? The example is awful (why the hell is it dicking around with forms??). It ought to show each variation of each param. Especially the FileFilter option as it's the most useful and the least simple variation.
🗳️ 5
d
I would probably maintain a cache, since disk IO is so slow. Separate directories (maybe by year/month) would also speed things up. Otherwise OS-level stuff (cfexecute or some JNI; apropos of nothing, Java 21 has some slick stuff replacing JNI) is probably going to be fastest, thanks to the OS having caches and whatnot.
a
@cfsimplicity, I tried what you suggested and it works perfectly. No additional time savings. In a folder that currently has 72k records, it gets the 300 that are older than 60 days in roughly the same time (+/- half a second) as without the filter. However, I like that I don't have an array with 72k indexes or an additional QoQ.
👍🏻 1
👍 2