# box-products
d
Can anyone think of a more efficient and performant way to do this? I have an array of arrays and I want to convert it to a CSV file. I loop over the array and convert the inner arrays to lists and save the lists to a variable with line endings, then write the content to the csv file.
var out = "";
for( var a in myArray ) {
  // &= is CFML's string-append operator (+= is numeric); CRLF is chr(13) followed by chr(10)
  out &= a.toList() & chr(13) & chr(10);
}
fileWrite( csvFullPath, out );
This is pretty minimal code.
the array of arrays originally comes from a csv file that is about 800 to 1500 rows (400kb to 1mb). This script will be run on a batch of files - ranging from 15 files to 100 files.
a
Might be worth having a look at OpenCSV if you want performance http://opencsv.sourceforge.net/#writing_from_an_array_of_strings
d
Probably need to define what more performant means to you: less memory, faster, etc. Repeated string concatenation performs extremely poorly, so using += (or &=) in a loop will be very heavy. Try using StringBuilder, or use arrayToList on an array, so you don't generate a ton of extra strings that need to be cleaned up.
Another thing you may want to do is write out to the file multiple times (appending) instead of holding everything in memory at the same time and doing one giant write.
d
It's so easy to write these types of scripts in CFML. I have them all over my monolith legacy app.
d
(just in case you weren't aware: https://cfdocs.org/fileappend)
d
and it is a micro-optimization, but putting the line ending (the two chr() calls) into a variable/spot in memory and appending that static string will also be more memory performant, because it won't have to do 2 string generations/cleanups every time you append them both.
d
Yes @domwatson, I am. But since I always create new files from data I have in memory, I have never used it. I suppose I would create an empty file first, then fileAppend(). I honestly thought that was too many operations.
d
if you have a lot of data, it is way more efficient to do that.
(in terms of memory usage)
But if the amount of data is negligible, then it's not so much of a problem, of course.
d
But I highly recommend going with a prewritten library, because it probably has most of the easy gotchas already taken care of, such as quoting values that contain commas so your CSV doesn't become unusable.
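To make the comma gotcha concrete, here is a minimal sketch in plain Java of the kind of quoting a CSV library handles for you. It assumes the common RFC 4180 convention (quote a field that contains a comma, a double quote, or a line break, and double any embedded quotes). The class and method names are made up for illustration and are not part of OpenCSV.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CsvQuoting {
    // Quote a field only when it contains a comma, a double quote, or a line break.
    // Embedded double quotes are doubled, following the common RFC 4180 convention.
    static String escapeField(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\r") || field.contains("\n");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Escape every field in the row, then join the fields with commas.
    static String toCsvLine(List<String> row) {
        return row.stream().map(CsvQuoting::escapeField).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(List.of("ABC SAS", "1,324.12", "say \"hi\"")));
    }
}
```

A plain toList() on the same row would emit the second value unquoted, which splits it into two columns when the file is read back.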
d
@deactivateduser what is stringbuilder?
but in the real world http://opencsv.sourceforge.net/
StringBuilder is a Java class that allows for concatenating a lot of strings together without the memory overhead of creating a new string every time.
In Java (and CF) strings are immutable, which means any small change to one requires creating a completely new string in memory and then letting the garbage collector clean up the old one. So building "String" + "String2" + "String3" one step at a time puts 5 strings in memory: the 3 originals to be concatenated, 1 intermediate string that is "StringString2", and 1 final string that is "StringString2String3".
So as your string gets larger, you end up using almost twice the amount of memory to do any concatenations. So imagine you have lines 100 characters long and 1 million lines and you want to append 1 more line. You end up requiring twice the amount of memory because it has to hold the original 1 million line string, and then also the 1 million and 1 line long string in memory. That can bring servers down when you get large enough
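As a rough illustration of the difference described above, here is a small plain-Java sketch (Java being what CFML strings sit on). Both methods produce the same text. The first allocates a new, ever-longer String on every pass through the loop, while the second appends into one growable buffer and builds a single String at the end. The class and method names are hypothetical.

```java
import java.util.List;

final class ConcatDemo {
    static final String CRLF = "\r\n";

    // Naive version: each += allocates a brand-new String holding everything so far.
    static String joinWithConcat(List<String> lines) {
        String out = "";
        for (String line : lines) {
            out += line + CRLF;
        }
        return out;
    }

    // StringBuilder version: appends into one buffer and creates one String at the end.
    static String joinWithBuilder(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(CRLF);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a,b,c", "d,e,f");
        // Same result either way; only the amount of garbage produced differs.
        System.out.println(joinWithConcat(lines).equals(joinWithBuilder(lines)));
    }
}
```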
d
@thisOldDave nice use of arrayReduce. At the core, it's not much different than my loop and string concat technique. Or is it?
d
You can think of stringbuilder as a slightly fancier arrayToList that allows builder syntax vs having to manage an external array.
t
arrayReduce should be faster, but you would need to test it; otherwise, no, it's the same.
d
Why "should" reduce be faster?
d
@deactivateduser thank you for the explanation of immutable strings. That's the sciency stuff I was missing in understanding how my script can be improved.
t
IF it is using the underlying stream api it should be around twice as fast
d
so if arrayReduce under the hood is using a "stringbuilder" method it would be faster?
that makes sense. I don't know if it is though.
d
No, arrayReduce couldn't use StringBuilder due to it not being generic.
d
where can we look under the hood on all these CFML functions?
d
It is definitely something now on my test list, because streams are not inherently faster to my knowledge, unless it is something that can be done in threads, and I don't think arrayReduce would thread by default. With Lucee you could look at GitHub; with Adobe you are SOL.
d
SOL?
d
Shit outta Luck
d
lol
wow. so Lucee's arrayReduce() can be totally different under the hood compared to Adobe's version.
d
It is most likely different.
a
That isn't surprising though. If I ask 5 devs to write a function that 'given x should return y' then it's possible to get 5 different implementations back, which all pass the specification.
a
@deactivateduser - I'd be curious about how streams compare too, as I remember seeing a thread about StringBuilder causing an OOM error too, because it doubles memory when you call append(). So if a string gets big enough it can have memory problems too.
d
I mean, you can still get an OOM error with any "too large" string.
But StringBuilder doesn't cause a ton of allocate-memory/deallocate-memory churn, whereas with plain concatenation 1000 appends make 1001 strings. StringBuilder can also be used with streams.
Still can't put 1 gallon of liquid in a 1 liter container.
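On the point that StringBuilder can be used with streams: in plain Java, one way to do it is the three-argument Stream.collect, which folds every element into a single mutable container. The sketch below is only an illustration of that idiom, with a hypothetical class name, and says nothing about how either CFML engine implements arrayReduce.

```java
import java.util.List;

final class StreamJoin {
    // Turns each row into "a,b,c\r\n" and collects all rows into one StringBuilder.
    static String rowsToCsv(List<List<String>> rows) {
        return rows.stream()
                .map(row -> String.join(",", row) + "\r\n")
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        System.out.print(rowsToCsv(List.of(List.of("a", "b"), List.of("c", "d"))));
    }
}
```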
a
StringBuilder stores its contents in a character array, which is different from a CF array. The default capacity is 16 chars, I think. Whenever an append can't fit in the existing array, it allocates a new array of roughly double the size and copies the contents over. IIRC, it also has to be contiguous space (not positive about that though). Anyway, it's not the same as string concatenation, but not exactly free either.
Though at least with StringBuilder you can set an initial capacity higher than the default 16 to eliminate some of the array resizing/reallocation. Just something to keep in mind if working with large-ish strings.
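The resizing behaviour can be observed from plain Java with StringBuilder.capacity(). This sketch (hypothetical class name) counts how often the capacity changes while appending, once with the default constructor and once with a builder pre-sized through the StringBuilder(int capacity) constructor. The exact growth steps are an implementation detail, so the code only counts them rather than assuming specific sizes.

```java
final class BuilderCapacity {
    // Capacity of a default-constructed StringBuilder (documented as 16 characters).
    static int defaultCapacity() {
        return new StringBuilder().capacity();
    }

    // Appends `chars` characters one at a time and counts how often the capacity changes,
    // i.e. how many times the internal array had to be reallocated and copied.
    static int countResizes(StringBuilder sb, int chars) {
        int resizes = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < chars; i++) {
            sb.append('x');
            if (sb.capacity() != lastCapacity) {
                resizes++;
                lastCapacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println("default capacity: " + defaultCapacity());
        System.out.println("resizes, default builder:   " + countResizes(new StringBuilder(), 100_000));
        System.out.println("resizes, pre-sized builder: " + countResizes(new StringBuilder(100_000), 100_000));
    }
}
```

From CFML the pre-sized variant would presumably be created by passing a number to init(), though an explicit javaCast to int may be needed for the right constructor to be picked.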
d
Sure
d
@deactivateduser really appreciate the tip on using Stringbuilder. Here is my new code, which is still minimal code and I really like that.
component {
   function init(){ return this; } 
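   // Converts an array of arrays into CRLF-delimited lines of comma-separated values.
   // Note: toList() does not quote values that themselves contain commas.
   string function arrayToString(required array data){
        
        // Sample input used while testing:
        // [
        //     ["asdf","132414","1324.12","1341kkk","ABC SAS ADEWK ADFADADA"],
        //     ["poui","0987","1824.12","poiu987","OIU SAS ADEWK ADFADADA"],
        //     ["wert","3546","1824.12","poiu987","OIU SAS ADEWK ADFADADA"]
        // ]
        
        var sb = createObject("java", "java.lang.StringBuilder").init();
        var newline = chr(13) & chr(10);
        for( a in data ) { 
            sb.append( a.toList() ).append( newline );
        }
        return sb.toString(); // return the built string, not the StringBuilder object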
    }
}
I think this built-in function should be added to the core by Adobe and Lucee.
d
Missing a var on the a variable @Daniel Mejia (code review eyes activated)
for( var a in data ) {...
d
@domwatson hmm... I do recall seeing examples like that. Also, I don't know how long I've been skipping it, and I don't think it's required since I never get an error for it.
d
it won't error. BUT, if you have a single instance of your CFC and two different threads both call the arrayToString() method at the same time, you can get trouble, because the a variable is now scoped to the entire instance of the CFC (by default). So its value is shared across all threads that are using the instance.
d
Thank you @domwatson. Rest assured, until now this is the only way I've used variables without a scope. I also know that it's better to define a scope so that the lang parser is not iterating through the scopes until it finds the variable.
@domwatson @deactivateduser What happens in memory/heap when you call this StringBuilder function as part of a loop? Here's what I assume is happening: the data is being copied and saved into different variables. fArray, pcode and fileString get overwritten on each loop; I'm not sure whether the sb variable gets overwritten.
fArray (from fileRead()) = 1st copy of the data
pcode = 2nd copy of the data
sb = 3rd copy of the data
fileString = 4th copy of the data
function processFiles(required array filelist){
    for( var f in filelist ){
         var fArray = fileRead(f.path).listToArray(variables.newline);
         
         var pcode = convertToPcodeArray( fArray ); // massages the data and adds other info

         var fileString = arrayToString( pcode ); // Does this create a new scope on every loop?

         fileWrite( outputPath, fileString );
    }
}

function arrayToString(required array data){ 
        var sb = createObject("java", "java.lang.StringBuilder").init();

        var newline = chr(13) & chr(10);

        for( var a in data ) { 
            sb.append( a.toList() ).append( newline );
        }

        return sb.toString(); // return a plain string rather than the StringBuilder object
}
d
I don't know exactly. BUT, looking now at your code - can see you are reading from a file and processing each of its lines. I'd highly recommend this approach (pseudo code):
CREATE EMPTY OUTPUT FILE
DO
   READ NEXT LINE from input file INTO srcLine
   CONVERT srcLine to targetFormat
   APPEND targetFormat to OUTPUT FILE
WHILE not end of input file
^^ then you don't really have to care about memory at all. CFML functions:
var sourceFile = FileOpen( path, "read" );

try {
  while( !FileIsEoF( sourceFile ) ) {
    var ln = FileReadLine( sourceFile );
    // convert ln to new format
    // use FileAppend to output to new file
  }
} catch( any e ) {
  rethrow;
} finally {
  FileClose( sourceFile );
}
d
Thank you so much for this. There is some magic in FileIsEoF() and FileReadLine() that I don't understand, but somehow it knows how to read the next line.
So this is the way to use the file system instead of RAM.
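The "magic" is most likely just that the open file handle remembers how far it has read, so each read continues from where the last one stopped. The plain-Java sketch below shows the same pattern with BufferedReader and BufferedWriter. It is an analogy rather than a description of how FileReadLine() is implemented, and convert() is a hypothetical stand-in for the real per-line conversion.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class LineByLine {
    // Hypothetical stand-in for the real per-line conversion step.
    static String convert(String line) {
        return line.toUpperCase();
    }

    // Reads one line at a time and writes it straight back out, so only the current
    // line (plus the reader's and writer's small buffers) is held in memory.
    static int convertFile(Path source, Path target) throws IOException {
        int count = 0;
        try (BufferedReader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            String line;
            // readLine() returns null at end of file; the reader tracks its own position.
            while ((line = in.readLine()) != null) {
                out.write(convert(line));
                out.write("\r\n");
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempFile("in", ".csv");
        Path target = Files.createTempFile("out", ".csv");
        Files.write(source, "a,b\r\nc,d\r\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(convertFile(source, target) + " lines written to " + target);
    }
}
```

The try-with-resources block plays the role of the finally { FileClose(...) } in the CFML version: both files are closed even if the conversion throws.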