Does anyone have an opinion on the fastest/most el...
# cfml-general
j
Does anyone have an opinion on the fastest/most elegant way to turn a query into a CSV file while maintaining column order?
j
love that library but need to return directly from an api at scale
really meant csv data/string
s
well you could use queryEach/queryMap using async=true and just comma separate the values based on the columnList
j
that is kind of where I was headed. Something like this
Copy code
var snort = qb
	.setReturnFormat( "query" )
	.from( "accounts a" )
	.join( "accounts_holdings h", "a.id", "h.accountID" )
	.when( rc.keyExists( "accountID" ), ( q ) => {
		q.andWhere( "a.id", rc.accountID );
	} )
	.when( rc.keyExists( "bTaxable" ), ( q ) => {
		q.andWhere( "a.bTaxable", rc.bTaxable );
	} )
	.selectRaw( "a.portfolioCode,ticker,format(purchaseDate,'yyyy-MM-dd'),costbasis,lotID,'TRUE','TRUE',0" )
	.get()
var cols = snort.columnArray()
snort    = snort.reduce( ( o, row ) => {
	return o & cols
		.map( ( col, i ) => {
			return row[ col ]
		} )
		.toList() & chr( 10 );
}, "" )
s
sort of, thought the more performant would be map the rows into strings in parallel. and then use ArrayToList to join them all together at the end
j
that's smart
a
but need to return directly from an api at scale
What does that mean?
In the context of why you can't use that UDF that's already been written, I mean
j
that isn't a udf - it's a spreadsheet manipulator. Not fast enough. I have dozens of containers pulling a ton of financial data and need to keep the data access part as fast as possible.
s
the beauty of the arrayToList in this example is that the string building is much more performant
Copy code
var columns = ["id","title"];
    csvData = sortedNews.map((row) =>{
        var rowData = [];
        columns.each((colName) => {
            rowData.append(row[colName]);
        })
        return rowData.toList(",")
    },true,10);
    dump(arrayToList(csvData,"#chr(10)##chr(13)#"))
j
yeah interesting. why not a nested map instead of the for loop?
s
no reason actually, that would be fine
though i will say i omitted the async on the columns
each
because there is a breaking point (depending on how many columns you have) of speed vs. spinning up threads. so on 10-15 columns probably not going to be faster with async
j
Almost instant.
Copy code
var cols = [
      "portfolioCode",
      "ticker",
      "purchaseDate",
      "costbasis",
      "lotID",
      "isHeldShort",
      "isForceHold",
      "penalty"
    ]
    snort = snort
      .map( ( row ) => {
        return cols
          .map( ( col, i ) => {
            return row[ col ]
          } )
          .toList()
      }, true )
      .toList( chr( 10 ) );
s
but the rows it will definitely help, and the
10
is a placeholder for the number of parallel processes so set as necessary for your use case
j
yep
I use parallel a lot. Thanks for the nudge
a
Have you considered that data can have commas, quotes, newlines etc? I'd perhaps look to use the Apache Commons CSV lib rather than trying to roll yer own. No point reinventing the wheel. https://www.baeldung.com/apache-commons-csv#creating-a-csv-file
j
In this case, it won't but I hear you. also love a challenge πŸ˜‰ 25ms to convert 830 rows including 3ms for the query
βœ… 1
a
Might we worth ditching the braces and returns? All of those callbacks look like they're single expressions? Not sure if it would make it more or less readable though.
j
you're right... habit
a
I find it's 50/50 if it makes it more readable πŸ˜‰ You seem to have a stray
i
yer not using in that innermost
map
, too.
j
also a good point hehe
6 seconds for 147K rows
βœ… 1
a
I wonder... given you are converting a query to a string... whether
reduce
is more idiomatic. Although if performance is a consideration, all the accumulated string building might be slow.
j
it is. 19 seconds for reduce, 6 seconds for threaded map
βœ… 1
a
The `toList`calls are fine & clear though. And probably deal with the delimiting better anyhow.
j
Copy code
snort.map( ( row ) => cols.map( ( col, i ) => row[ col ] ).toList(), true ).toList( "<br>" );
vs
Copy code
snort.reduce( ( o, row ) => o & cols.map( ( col ) => row[ col ], true ).toList() & "<br>", "" )
oddly removing the brackets shaved off another 400ms
a
that is odd
I wonder how much faster it would be if you made a wee Java wrapper for the commons lib. Beyond a point the returns would diminish though.
j
It might be worth a try, but I'm not so good with java πŸ˜•
a
Heh "you like a challenge" I believe someone said
m
you might surprise yourself...
j
lol tru dat
a
I reckon the Lucee parallelisation would probably be helping here though.
I'm no Java dev, but it looked pretty easy. Simpler than the code you have here
But there's all the faffing about making a jar, deploying it etc.
j
yeah, but you're Adam Cameron πŸ˜‰
a
No need to call me a twat mate
πŸ˜›
j
hehe
a
It's piqued my interest now anyhow. Something to look at another day perhaps.
j
well, if you end up trying it, let me know how it works out
πŸ’― 1
a
But for now... 23:48 says "zzzzzzzzz" to me. Good luck.
s
does the 6s include the query time?
j
yep
5600ms now
s
can you cache it and see what the actual loop time is?
a
Don't forget to load test it properly if performance matters
s
or time it separately to see what actual performance is
@Adam Cameron is talking in his sleep
😜 1
j
phew - 3919ms
a
Single request testing won't really be telling you much about real-world performance here.
j
agreed
just a starting point
a
@Adam Cameron is talking in his sleep
FINE. Ciao.
s
lol
im curious of getting the data as a query vs an array is any faster
there is some talk of streaming csv to the browser which may speed up time, ive never tried it before
j
now that would be interesting
if you remember where you saw that I'd like to explore
the longest part is streaming the data
s
perfect
c
Just a late footnote: https://github.com/cfsimplicity/spreadsheet-cfml/wiki/queryToCsv uses the Apache Commons CSV java library under the hood.
βœ… 1
s
@gpickin
It makes me curious which would be faster. I do see however that there is a little more going on (in a good way) in the spreadsheet-cfml version. There is at least some data checking and automatic formatting for dates/integer’s being done. Otherwise it just seems like its using a java string builder over using the
.toList()
method.
@bdw429s do you know if there is a performance hit based on how a query is returned (i.e. query vs. array of structs)?
c
You're right, Scott, I'm having to convert the cfml query into an array and sanitize it before I can pass it to the commons-csv lib. As an experiment I've just tried changing the for-loops to parallel and it does speed up quite a bit (about 4x on a 1K row query). But I think that's Lucee only for now and I have to support ACF 2016.
s
nice! i was just about to mention that would be the only suggestion for a speed improvement.
seeing as how
.toList()
just creates a string builder