Does anyone have an opinion on the fastest most elegant way cfml #cfml-general

John Wilson

10/20/2022, 9:56 PM

Does anyone have an opinion on the fastest/most elegant way to turn a query into a CSV file while maintaining column order?

Scott Steinbeck

10/20/2022, 9:59 PM

https://github.com/cfsimplicity/spreadsheet-cfml/wiki/queryToCsv

John Wilson

10/20/2022, 10:00 PM

love that library but need to return directly from an api at scale

John Wilson

10/20/2022, 10:00 PM

really meant csv data/string

Scott Steinbeck

10/20/2022, 10:02 PM

well you could use queryEach/queryMap using async=true and just comma separate the values based on the columnList

John Wilson

10/20/2022, 10:03 PM

that is kind of where I was headed. Something like this

var snort = qb
	.setReturnFormat( "query" )
	.from( "accounts a" )
	.join( "accounts_holdings h", "a.id", "h.accountID" )
	.when( rc.keyExists( "accountID" ), ( q ) => {
		q.andWhere( "a.id", rc.accountID );
	} )
	.when( rc.keyExists( "bTaxable" ), ( q ) => {
		q.andWhere( "a.bTaxable", rc.bTaxable );
	} )
	.selectRaw( "a.portfolioCode,ticker,format(purchaseDate,'yyyy-MM-dd'),costbasis,lotID,'TRUE','TRUE',0" )
	.get()
var cols = snort.columnArray()
snort    = snort.reduce( ( o, row ) => {
	return o & cols
		.map( ( col, i ) => {
			return row[ col ]
		} )
		.toList() & chr( 10 );
}, "" )

Scott Steinbeck

10/20/2022, 10:05 PM

sort of, though the more performant approach would be to map the rows into strings in parallel, and then use ArrayToList to join them all together at the end

John Wilson

10/20/2022, 10:06 PM

that's smart

Adam Cameron

10/20/2022, 10:14 PM

"but need to return directly from an api at scale"

What does that mean?

Adam Cameron

10/20/2022, 10:14 PM

In the context of why you can't use that UDF that's already been written, I mean

Scott Steinbeck

10/20/2022, 10:15 PM

https://trycf.com/gist/364a7aecdce820c5d82cdfc162b0196c/lucee5?theme=monokai

John Wilson

10/20/2022, 10:15 PM

that isn't a udf - it's a spreadsheet manipulator. Not fast enough. I have dozens of containers pulling a ton of financial data and need to keep the data access part as fast as possible.

Scott Steinbeck

10/20/2022, 10:16 PM

the beauty of the arrayToList in this example is that the string building is much more performant

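var columns = ["id","title"];
// Build one CSV line per row in parallel (max 10 threads), then join once at the end
csvData = sortedNews.map((row) => {
	var rowData = [];
	columns.each((colName) => {
		rowData.append(row[colName]);
	});
	return rowData.toList(",");
}, true, 10);
// CRLF is chr(13) followed by chr(10)
dump(arrayToList(csvData, "#chr(13)##chr(10)#"));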

John Wilson

10/20/2022, 10:17 PM

yeah interesting. why not a nested map instead of the for loop?

Scott Steinbeck

10/20/2022, 10:17 PM

no reason actually, that would be fine

Scott Steinbeck

10/20/2022, 10:21 PM

though i will say i omitted the async on the columns `each` because there is a breaking point (depending on how many columns you have) of speed vs. spinning up threads. so on 10-15 columns probably not going to be faster with async

John Wilson

10/20/2022, 10:22 PM

Almost instant.

var cols = [
	"portfolioCode",
	"ticker",
	"purchaseDate",
	"costbasis",
	"lotID",
	"isHeldShort",
	"isForceHold",
	"penalty"
];
snort = snort
	.map( ( row ) => {
		return cols
			.map( ( col, i ) => {
				return row[ col ];
			} )
			.toList();
	}, true )
	.toList( chr( 10 ) );

Scott Steinbeck

10/20/2022, 10:22 PM

but on the rows it will definitely help, and the `10` is a placeholder for the number of parallel processes, so set it as necessary for your use case

John Wilson

10/20/2022, 10:22 PM

yep

John Wilson

10/20/2022, 10:23 PM

I use parallel a lot. Thanks for the nudge

Adam Cameron

10/20/2022, 10:26 PM

Have you considered that data can have commas, quotes, newlines etc? I'd perhaps look to use the Apache Commons CSV lib rather than trying to roll yer own. No point reinventing the wheel. https://www.baeldung.com/apache-commons-csv#creating-a-csv-file
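
To make the concern about commas, quotes and newlines concrete, here is a minimal sketch of quoting a field by hand, following the usual CSV convention (RFC 4180): wrap the field in double quotes and double any embedded quotes. The helper names `csvField` and `csvLine` and the sample row are hypothetical, not from the thread or from any library. A maintained library such as Apache Commons CSV handles more cases than this does.

```cfml
<cfscript>
// Hypothetical helper (not from the thread): quote a single field if it needs it.
function csvField( required any value ) {
	var s = toString( arguments.value );
	// A field needs quoting if it contains a comma, a double quote, LF or CR.
	var needsQuotes = find( ",", s ) || find( '"', s ) || find( chr( 10 ), s ) || find( chr( 13 ), s );
	if ( needsQuotes ) {
		// Double any embedded quotes, then wrap the whole field in quotes.
		return '"' & replace( s, '"', '""', "all" ) & '"';
	}
	return s;
}

// Hypothetical helper: build one CSV line from a row struct, in the given column order.
function csvLine( required struct row, required array cols ) {
	return arguments.cols.map( ( col ) => csvField( row[ col ] ) ).toList( "," );
}

// Made-up sample data containing a comma and embedded quotes.
sample = { ticker: "BRK,B", note: 'say "hi"', lotID: "A-42" };
writeOutput( csvLine( sample, [ "ticker", "note", "lotID" ] ) );
// prints "BRK,B","say ""hi""",A-42
</cfscript>
```

Swapping `row[ col ]` for `csvField( row[ col ] )` in the inner `map` would be the only change needed in the snippets above, at the cost of a few extra string checks per field.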

John Wilson

10/20/2022, 10:28 PM

In this case, it won't but I hear you. also love a challenge 😉 25ms to convert 830 rows including 3ms for the query

✅ 1

Adam Cameron

10/20/2022, 10:33 PM

Might be worth ditching the braces and returns? All of those callbacks look like they're single expressions? Not sure if it would make it more or less readable though.

John Wilson

10/20/2022, 10:36 PM

you're right... habit

Adam Cameron

10/20/2022, 10:38 PM

I find it's 50/50 if it makes it more readable 😉 You seem to have a stray `i` yer not using in that innermost `map`, too.

John Wilson

10/20/2022, 10:39 PM

also a good point hehe

John Wilson

10/20/2022, 10:40 PM

6 seconds for 147K rows

✅ 1

Adam Cameron

10/20/2022, 10:42 PM

I wonder... given you are converting a query to a string... whether `reduce` is more idiomatic. Although if performance is a consideration, all the accumulated string building might be slow.

John Wilson

10/20/2022, 10:43 PM

it is. 19 seconds for reduce, 6 seconds for threaded map

✅ 1
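
The gap between the two timings fits the usual explanation: a `reduce` that concatenates with `&` builds a new, longer string on every iteration, while mapping rows to lines and joining once does not. Below is a minimal sketch of three ways to build the same output. The rows are synthetic stand-ins (an array of structs is an assumption), and the sketch makes no timing claims of its own. The last line only checks that the three variants agree.

```cfml
<cfscript>
// Synthetic rows standing in for the real result set (assumption: array of structs).
cols = [ "ticker", "lotID" ];
rows = [];
for ( i = 1; i <= 1000; i++ ) {
	rows.append( { ticker: "T#i#", lotID: "L#i#" } );
}
nl = chr( 10 );

// 1. reduce: each iteration concatenates onto the ever-growing accumulator.
viaReduce = rows.reduce( ( acc, row ) => acc & cols.map( ( col ) => row[ col ] ).toList() & nl, "" );

// 2. map then join: build each line independently, then join once at the end.
viaJoin = rows.map( ( row ) => cols.map( ( col ) => row[ col ] ).toList() ).toList( nl ) & nl;

// 3. java.lang.StringBuilder: append in place, convert to a string once.
sb = createObject( "java", "java.lang.StringBuilder" ).init();
for ( row in rows ) {
	line = cols.map( ( col ) => row[ col ] ).toList();
	sb.append( javaCast( "string", line ) ).append( javaCast( "string", nl ) );
}
viaBuilder = sb.toString();

// Sanity check: all three approaches should produce identical text.
same = compare( viaReduce, viaJoin ) == 0 && compare( viaJoin, viaBuilder ) == 0;
writeOutput( same ? "same output" : "outputs differ" );
</cfscript>
```

Only the second variant lends itself to the parallel flag on the outer `map`, since each line is computed independently of the others.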

Adam Cameron

10/20/2022, 10:43 PM

The `toList` calls are fine & clear though. And probably deal with the delimiting better anyhow.

John Wilson

10/20/2022, 10:43 PM

snort.map( ( row ) => cols.map( ( col, i ) => row[ col ] ).toList(), true ).toList( "<br>" );

snort.reduce( ( o, row ) => o & cols.map( ( col ) => row[ col ], true ).toList() & "<br>", "" )

John Wilson

10/20/2022, 10:43 PM

oddly removing the brackets shaved off another 400ms

Adam Cameron

10/20/2022, 10:44 PM

that is odd

Adam Cameron

10/20/2022, 10:45 PM

I wonder how much faster it would be if you made a wee Java wrapper for the commons lib. Beyond a point the returns would diminish though.

John Wilson

10/20/2022, 10:46 PM

It might be worth a try, but I'm not so good with java 😕

Adam Cameron

10/20/2022, 10:46 PM

Heh "you like a challenge" I believe someone said

Michael Schmidt

10/20/2022, 10:46 PM

you might surprise yourself...

John Wilson

10/20/2022, 10:46 PM

lol tru dat

Adam Cameron

10/20/2022, 10:47 PM

I reckon the Lucee parallelisation would probably be helping here though.

Adam Cameron

10/20/2022, 10:47 PM

I'm no Java dev, but it looked pretty easy. Simpler than the code you have here

Adam Cameron

10/20/2022, 10:47 PM

But there's all the faffing about making a jar, deploying it etc.

John Wilson

10/20/2022, 10:48 PM

yeah, but you're Adam Cameron 😉

Adam Cameron

10/20/2022, 10:48 PM

No need to call me a twat mate

Adam Cameron

10/20/2022, 10:48 PM

😛

John Wilson

10/20/2022, 10:48 PM

hehe

Adam Cameron

10/20/2022, 10:48 PM

It's piqued my interest now anyhow. Something to look at another day perhaps.

John Wilson

10/20/2022, 10:49 PM

well, if you end up trying it, let me know how it works out

💯 1

Adam Cameron

10/20/2022, 10:49 PM

But for now... 23:48 says "zzzzzzzzz" to me. Good luck.

Scott Steinbeck

10/20/2022, 10:49 PM

does the 6s include the query time?

John Wilson

10/20/2022, 10:49 PM

yep

John Wilson

10/20/2022, 10:50 PM

5600ms now

Scott Steinbeck

10/20/2022, 10:50 PM

can you cache it and see what the actual loop time is?

Adam Cameron

10/20/2022, 10:50 PM

Don't forget to load test it properly if performance matters

Scott Steinbeck

10/20/2022, 10:50 PM

or time it separately to see what actual performance is
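
One way to time the two phases separately is to take `getTickCount()` readings around each of them. This is a sketch only: `fetchRows` is a hypothetical stand-in for the real qb call and returns made-up data, so the numbers it prints say nothing about the real workload.

```cfml
<cfscript>
// Hypothetical stand-in for the real data access; swap in the actual query.
function fetchRows() {
	var out = [];
	for ( var i = 1; i <= 5000; i++ ) {
		out.append( { ticker: "T#i#", lotID: "L#i#" } );
	}
	return out;
}

cols = [ "ticker", "lotID" ];

t0   = getTickCount(); // milliseconds
rows = fetchRows();
t1   = getTickCount();
csv  = rows.map( ( row ) => cols.map( ( col ) => row[ col ] ).toList() ).toList( chr( 10 ) );
t2   = getTickCount();

// Report the fetch and the conversion separately.
writeOutput( "fetch: #t1 - t0# ms, convert: #t2 - t1# ms, rows: #rows.len()#" );
</cfscript>
```

A single run like this is only a starting point; repeated runs under concurrent load give a better picture.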

Scott Steinbeck

10/20/2022, 10:50 PM

@Adam Cameron is talking in his sleep

😜 1

John Wilson

10/20/2022, 10:51 PM

phew - 3919ms

Adam Cameron

10/20/2022, 10:51 PM

Single request testing won't really be telling you much about real-world performance here.

John Wilson

10/20/2022, 10:51 PM

agreed

John Wilson

10/20/2022, 10:51 PM

just a starting point

Adam Cameron

10/20/2022, 10:52 PM

"@Adam Cameron is talking in his sleep"

FINE. Ciao.

Scott Steinbeck

10/20/2022, 10:52 PM

lol

Scott Steinbeck

10/20/2022, 10:54 PM

I'm curious if getting the data as a query vs an array is any faster

Scott Steinbeck

10/20/2022, 11:00 PM

there is some talk of streaming csv to the browser which may speed up time, ive never tried it before

John Wilson

10/20/2022, 11:00 PM

now that would be interesting

John Wilson

10/20/2022, 11:01 PM

if you remember where you saw that I'd like to explore

John Wilson

10/20/2022, 11:02 PM

the longest part is streaming the data

John Wilson

10/20/2022, 11:24 PM

https://www.bennadel.com/blog/4034-experimenting-with-lazy-queries-and-streaming-csv-comma-separated-value-data-in-lucee-cfml-5-3-7-47.htm

Scott Steinbeck

10/20/2022, 11:26 PM

perfect

cfsimplicity

10/24/2022, 9:45 AM

Just a late footnote: https://github.com/cfsimplicity/spreadsheet-cfml/wiki/queryToCsv uses the Apache Commons CSV java library under the hood.

✅ 1

Scott Steinbeck

10/24/2022, 7:01 PM

@gpickin

Scott Steinbeck

10/24/2022, 7:31 PM

It makes me curious which would be faster. I do see, however, that there is a little more going on (in a good way) in the spreadsheet-cfml version. There is at least some data checking and automatic formatting for dates/integers being done. Otherwise it just seems like it's using a Java string builder over using the `.toList()` method.

Scott Steinbeck

10/24/2022, 7:32 PM

@bdw429s do you know if there is a performance hit based on how a query is returned (i.e. query vs. array of structs)?

cfsimplicity

10/24/2022, 7:41 PM

You're right, Scott, I'm having to convert the cfml query into an array and sanitize it before I can pass it to the commons-csv lib. As an experiment I've just tried changing the for-loops to parallel and it does speed up quite a bit (about 4x on a 1K row query). But I think that's Lucee only for now and I have to support ACF 2016.

Scott Steinbeck

10/24/2022, 7:42 PM

nice! i was just about to mention that would be the only suggestion for a speed improvement.

Scott Steinbeck

10/24/2022, 7:44 PM

seeing as how `.toList()` just creates a string builder
