Friday puzzle 🙂 This isn’t CFML related (cfml #sql)

danmurphy

04/29/2022, 4:05 PM

Friday puzzle 🙂. This isn’t CFML related, just a bit of a SQL puzzle, so I thought I’d throw it out here for some ideas. There is an Oracle database table that an application writes to, setting new records with a processed field set to 'N' and record_created_date set to the current date/time. SAP PI (middleware software) polls the table for unprocessed records, creates SAP IDOCs from the data, then updates the processed flag to 'Y'. Here is the kicker…the way the SAP PI JDBC sender works, there are 2 separate SQL statements that make this happen that have no knowledge of each other. The 1st is a SELECT to get the records to create the IDOCs. Then a totally separate UPDATE statement to update the records that have been processed. The polling interval can be set to whatever timeframe. So you might just say…

1. SELECT * FROM table WHERE processed = 'N'
2. Process the records and create IDOCs
3. UPDATE table SET processed = 'Y' WHERE processed = 'N'

The problem is, if processing the records takes 3 seconds (or 10ms, or 30s, etc), the records that are inserted in between steps 1 and 3 don’t get IDOCs created but are updated in the table as processed. Is there any way to make sure all records are processed?
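The lost-record race described above can be reproduced in a few lines. This is only a sketch: it uses Python's built-in `sqlite3` as a stand-in for the Oracle table, and the table name `orders` and column `payload` are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the Oracle table; "orders" and "payload" are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, payload TEXT, processed TEXT DEFAULT 'N')"
)
conn.executemany("INSERT INTO orders (payload) VALUES (?)", [("a",), ("b",)])

# Step 1: the poller's SELECT sees the two unprocessed rows.
selected_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM orders WHERE processed = 'N' ORDER BY id")
]

# Step 2: while IDOCs are being created, the application inserts another row.
conn.execute("INSERT INTO orders (payload) VALUES (?)", ("c",))

# Step 3: the independent UPDATE flags everything that is still 'N'.
conn.execute("UPDATE orders SET processed = 'Y' WHERE processed = 'N'")

flagged_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM orders WHERE processed = 'Y' ORDER BY id")
]
# Rows that were flagged as processed without ever being selected.
lost_ids = [i for i in flagged_ids if i not in selected_ids]

print("selected:", selected_ids)
print("flagged: ", flagged_ids)
print("lost:    ", lost_ids)
```

The row inserted during step 2 ends up flagged 'Y' even though it was never part of the selected batch, which is exactly the gap the rest of the thread tries to close.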

danmurphy

04/29/2022, 4:20 PM

I’ve thought about having them change it to only process records that are older than the previous 5 minute bucket of time. So if the table is polled at 4:01pm, it will process anything from 3:55pm or earlier. If it polls at 4:02pm, 4:03:45pm, 4:04:59pm, etc., it will still only do 3:55pm or earlier. The problem still remains that if it polls at 4:04:59pm it will do 3:55pm or earlier, but then if the update statement runs at 4:05:01pm, it would update anything from 4:00pm or earlier. So the records from 3:55pm to 4:00pm were lost.

Myka Forrest

04/29/2022, 4:57 PM

When/after you select the records where processed = 'N', can you set processed to 'P' (pending), and then only update where processed = 'P'?

Myka Forrest

04/29/2022, 4:58 PM

Or record the IDs of what was selected and only update those.

danmurphy

04/29/2022, 4:59 PM

There are literally only 2 fields in the JDBC Sender in SAP XI - one for a SELECT statement (which is where the data is available to be processed) and one for an UPDATE statement. They are two totally separate fields that you plop SQL into and no data can be passed between them.

danmurphy

04/29/2022, 5:00 PM

(my mind was a bit blown when I realized this from the SAP PI developer as we tracked down this bug)

Myka Forrest

04/29/2022, 5:15 PM

Is there any way to tie a SAP IDOC to a particular record? Can you read those to update the table?

danmurphy

04/29/2022, 5:26 PM

The Update doesn’t have any knowledge of the data that was processed. It is just a separate update statement.

Myka Forrest

04/29/2022, 5:51 PM

and there's no way to make it aware of those updates? 🤔

danmurphy

04/29/2022, 5:54 PM

I pulled this screenshot from the internet, but this is what they have to work with…

websolete

04/29/2022, 5:55 PM

can the select statement be any valid sql, like a cte?

danmurphy

04/29/2022, 5:56 PM

Yes. Including calling a function since that is allowed in a SELECT statement in Oracle. But functions in oracle can’t manipulate data.

Myka Forrest

04/29/2022, 5:58 PM

Are you able to modify these tables or add a helper table?

websolete

04/29/2022, 5:59 PM

yeah, i was going to say, what about a third table to stage records to be processed, then the update sql will do an UPDATE / INNER JOIN to that third table from the main one

danmurphy

04/29/2022, 5:59 PM

Yes, to some extent, we could do that. I added the record_created_date field to try and help, for example.

danmurphy

04/29/2022, 6:00 PM

I’m intrigued, keep talking. How would records get in the staging table? The initial SELECT is just that…only a SELECT.

Myka Forrest

04/29/2022, 6:01 PM

Can you write a query to go with the select to update a helper table? If you could add a record_created_date, can you add an id column?

danmurphy

04/29/2022, 6:02 PM

I could probably add an ID column. How would it be used? Tell me more about what you mean with “write a query to go with the select to update a helper table”.

Myka Forrest

04/29/2022, 6:08 PM

use the helper table as a holding cell. (I'm a Sql Server user, so this may require some translation to oracle) select ID (from origTable) into helperTable where processed = 'N'. Then Select * from helperTable for processing IDOCs. Update origTable SET processed = 'Y' WHERE id in (select id from helperTable). Truncate helperTable so the next batch starts from scratch.

websolete

04/29/2022, 6:09 PM

INSERT INTO helpertable ( col1, col2 ) 
SELECT col1, col2 
FROM tablename 
WHERE processed = 'N'; 

SELECT col1, col2
FROM tablename 
WHERE processed = 'N';

-------------------------

UPDATE t 
SET t.processed = 'Y' 
FROM tablename t 
	INNER JOIN helpertable h ON t.col1 = h.col1
WHERE t.processed = 'N';

you'd have to update the update sql to match how oracle does update/inner joins, of course. you'd also want to have something that cleans up helpertable although if the above works you could forego that (but have lots of leftover useless rows)

👍 1
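The helper-table idea from the two messages above can be sketched end to end. This is an illustration only: it uses Python's `sqlite3` in place of Oracle, borrows the names `origTable` and `helperTable` from the thread, and assumes the "select" step is allowed to run more than one statement, which the thread goes on to question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE origTable (id INTEGER PRIMARY KEY, col1 TEXT, processed TEXT DEFAULT 'N');
    CREATE TABLE helperTable (id INTEGER PRIMARY KEY);
    INSERT INTO origTable (col1) VALUES ('a'), ('b');
""")

# "Select" step: stage the ids first, then read the batch through the staging table.
conn.execute("INSERT INTO helperTable (id) SELECT id FROM origTable WHERE processed = 'N'")
batch = conn.execute(
    "SELECT o.id, o.col1 FROM origTable o JOIN helperTable h ON h.id = o.id ORDER BY o.id"
).fetchall()

# A new row arrives while the batch is being processed.
conn.execute("INSERT INTO origTable (col1) VALUES ('c')")

# "Update" step: flag only the staged ids, then empty the staging table.
conn.execute("UPDATE origTable SET processed = 'Y' WHERE id IN (SELECT id FROM helperTable)")
conn.execute("DELETE FROM helperTable")

still_pending = [
    row[0] for row in conn.execute("SELECT id FROM origTable WHERE processed = 'N'")
]
print("batch:", batch)
print("still pending:", still_pending)
```

Reading the batch through the staging table, rather than re-running the `processed = 'N'` filter, keeps the selected set and the updated set identical. The `UPDATE ... WHERE id IN (subquery)` form also sidesteps the SQL Server specific `UPDATE ... FROM ... INNER JOIN` syntax.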

websolete

04/29/2022, 6:09 PM

we're saying the same thing pretty much

👍 1

websolete

04/29/2022, 6:10 PM

unsaid: oracle. blech.

😂 2

danmurphy

04/29/2022, 6:12 PM

But how does the insert into the helpertable happen? It’s kinda like wanting an “in process” flag (e.g. P instead of just Y or N). Where does that event take place?

Myka Forrest

04/29/2022, 6:13 PM

Where does select * from table where processed = 'n' take place?

websolete

04/29/2022, 6:13 PM

it assumes you can have multiple queries in the select item

Myka Forrest

04/29/2022, 6:13 PM

Why can't it go just before that?

Myka Forrest

04/29/2022, 6:13 PM

right, assuming "any valid sql"

danmurphy

04/29/2022, 6:13 PM

Ok, I’ll ask about multiple statements in that little box. Not sure. If that is possible, then all sorts of stuff opens up. I’m not sure if it is.

Myka Forrest

04/29/2022, 6:14 PM

or can you call some sort of oracle-equivalent to a stored procedure?

websolete

04/29/2022, 6:14 PM

perhaps you can wrap all this up in a stored proc and invoke that

websolete

04/29/2022, 6:14 PM

jinx

😂 1

websolete

04/29/2022, 6:14 PM

that would give you the most flexibility i'd say

danmurphy

04/29/2022, 6:14 PM

I don’t know if it has the “I saw a certain character and will continue to run the next statement” ability.

Myka Forrest

04/29/2022, 6:15 PM

@websolete, it only encourages me that I might be on the right track. 😁

danmurphy

04/29/2022, 6:15 PM

Yeah, I tried that route first. Evidently it has to be valid SQL, not PL/SQL.

websolete

04/29/2022, 6:15 PM

which is why capturing the complex select/flag/update in a stored proc and 'simply' calling the stored proc makes the most sense. the update sql should be easy

👍 1

websolete

04/29/2022, 6:16 PM

CALL myproc; ?

websolete

04/29/2022, 6:16 PM

or EXEC or whatever oracle overcomplicates for that

danmurphy

04/29/2022, 6:17 PM

Right. In Oracle I think it is EXECUTE proc;, or something like that. Maybe I need to have them show me that so I can see it not working with my own eyes, ha.

danmurphy

04/29/2022, 6:17 PM

And also, the procedure doesn’t return data, right? Which is needed for that “Query SQL Statement” field, because it needs the data to process the IDOCs.

websolete

04/29/2022, 6:17 PM

it can, at least in normal rdbms's

websolete

04/29/2022, 6:17 PM

doesn't have to, but can

danmurphy

04/29/2022, 6:18 PM

Maybe I just never do that and it can in Oracle too? 🤔

websolete

04/29/2022, 6:18 PM

anything's possible. i hear they have auto-incrementing numeric identifiers in oracle now, so there's hope

🤯 1

danmurphy

04/29/2022, 6:19 PM

Oh, the triggers and sequences I’ve created in my day. Whew.

Myka Forrest

04/29/2022, 6:22 PM

Regardless...this is an interesting challenge!

danmurphy

04/29/2022, 6:22 PM

Yeah, I don’t think you can return data with an EXECUTE sproc; statement, right? Sorry if I’m too Oracle-ized and this sounds ridiculous. But Returning data = SELECT statements = SQL. Stored procedures are in the PL/SQL realm. Right?

danmurphy

04/29/2022, 6:23 PM

furiously googling…

websolete

04/29/2022, 6:23 PM

fwiw, you can return a resultset from a stored proc in sql server in a 'flat' exec proc statement, but only a single resultset, not multiple like if you actually invoke the stored proc

websolete

04/29/2022, 6:24 PM

which is what you want anyway, but not sure if oracle behaves the same

danmurphy

04/29/2022, 6:25 PM

Single result set, meaning could be the results of one query?

websolete

04/29/2022, 6:25 PM

yes

danmurphy

04/29/2022, 6:27 PM

Hmm. Looks like I’m on 11g for this table and resultsets hadn’t been dreamt up yet. Still googling though…

danmurphy

04/29/2022, 6:28 PM

In other rdbms’s, can functions manipulate data?

websolete

04/29/2022, 6:29 PM

i don't believe so, if you mean altering data in tables. you can always massage what's returned from the function, but i don't think you can affect data changes at the storage level

danmurphy

04/29/2022, 6:30 PM

Yeah, that’s what I mean. Ok, that’s consistent with Oracle too.

websolete

04/29/2022, 6:30 PM

does oracle have computed columns?

danmurphy

04/29/2022, 6:31 PM

I might need a translation or example of what that means.

danmurphy

04/29/2022, 6:32 PM

googling…

websolete

04/29/2022, 6:32 PM

https://www.oracle.com/technetwork/database/database-technologies/rdb/automatic-columns-132042.pdf

websolete

04/29/2022, 6:32 PM

information so important it could only be captured in a pdf, apparently

😂 1

websolete

04/29/2022, 6:33 PM

my point in asking is that's KIND of like a 'function that affects data'

websolete

04/29/2022, 6:33 PM

but it's tied to the column definition

danmurphy

04/29/2022, 6:35 PM

Ah, gotcha. So yeah, it looks like it exists, but I’ve usually seen that type of thing solved by an insert/update trigger that manipulates the data before committing.

websolete

04/29/2022, 6:37 PM

similar yes, a bit more lightweight than triggers i'd say, again at least in sql server

👌 1

websolete

04/29/2022, 6:42 PM

ok, so i have an idea:

websolete

04/29/2022, 6:45 PM

let's say you add a new column to the table called processAfter, that is a computed column or trigger-driven that will add 10 minutes to the current_timestamp/now(), so newly inserted records will be flagged for processing 'in the future'. then your select statement will get records only where processed = 'N' and processAfter < now(). then you go to that five minute schedule you mentioned, and you'll know any new records' processAfter will be outside your five minute window and therefore should be insulated. your update query would need to accommodate that in its where clause as well

danmurphy

04/29/2022, 6:47 PM

Ok, I’ve reread that a couple of times. I think it will be susceptible to the same issue I explained here… https://cfml.slack.com/archives/C082RPGTZ/p1651249243704979?thread_ts=1651248315.509289&cid=C082RPGTZ

websolete

04/29/2022, 6:49 PM

ok, so you set the processAfter time to add 20 mins, and you still poll on five min schedule. assuming the batch finishes quickly, your update sql will simply be

WHERE processed = 'N' AND processAfter < now()

any newly inserted records that have occurred will be flagged for 20 mins in the future

websolete

04/29/2022, 6:51 PM

ideally you'd set the processAfter value in 15 minute increments, to round up and create 'batches', avoiding the problem of 'new records inserted too close to the cutoff which causes them to be inadvertently included in the update'

websolete

04/29/2022, 6:52 PM

is there a requirement that these records be processed as soon as inserted as possible or could there be a lag time that is acceptable?

websolete

04/29/2022, 6:53 PM

cause hell, you could just do a DATEDIFF() in the two statements and target records where they're at least an hour (or 30 mins, or whatever) old

danmurphy

04/29/2022, 6:54 PM

Ok, that’s interesting (the batch idea). Ideally, within minutes. Right now it polls every 20 seconds. But it also potentially leaves records hanging every 20 seconds, so that’s not exactly ideal either. 😄

danmurphy

04/29/2022, 6:54 PM

Well, that’s what I tried with this statement…

WHERE record_created_date <= trunc(sysdate, 'mi') - mod(EXTRACT(minute FROM cast(sysdate as timestamp)), 10) / (24 * 60)
    AND processed = 'N';

danmurphy

04/29/2022, 6:55 PM

But like I said, if the SELECT uses that and starts at 4:04:59pm and then finishes at 4:05:01pm, and then if the UPDATE runs the same where statement, you’re going to lose a batch. Right?

websolete

04/29/2022, 6:56 PM

if you base it on the insert timestamp rather than a padded timestamp 'in the future' that is beyond your schedule window

websolete

04/29/2022, 6:56 PM

or so i think

websolete

04/29/2022, 6:58 PM

every new row is 'don't do anything with this one for at least five mins', and if your batch completes in a couple mins, you should be safe, since your select WHERE clause checks processed = 'N' as well as 'must be at least five mins old'

danmurphy

04/29/2022, 6:59 PM

Trying to think through that to figure out if you still have the problem at the polling interval…

websolete

04/29/2022, 6:59 PM

it's not clear to me if that truly avoids the problem you're describing about getting close to the threshold of the schedule time, but i think it dos

websolete

04/29/2022, 6:59 PM

does

websolete

04/29/2022, 7:00 PM

what i'm advocating is only updating rows that are at a minimum 'one polling period ago'

websolete

04/29/2022, 7:00 PM

not the previous period, but previous period + 1

danmurphy

04/29/2022, 7:01 PM

Does the processAfter become the same thing as record_created_date, but just 10 minutes later?

websolete

04/29/2022, 7:01 PM

yes

websolete

04/29/2022, 7:03 PM

still feels like rounding to a quarter or half hour for processAfter to create groups of records would be better, but maybe that won't matter

danmurphy

04/29/2022, 7:05 PM

Just trying to rubber duck this a bit. It might not be clicking yet, but I think my problem still exists. The statement below would give me 1:00pm if run right now (1:04pm). It will give me 1:00pm anytime from 1:00pm until 1:09:59pm.

trunc(sysdate, 'mi') - mod(EXTRACT(minute FROM cast(sysdate as timestamp)), 5) / (24 * 60)
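To make the rollover concrete, here is a rough Python rendering of the round-down expression above. The helper name `bucket_floor` is made up, and the times are fixed values rather than the database clock.

```python
from datetime import datetime, timedelta


def bucket_floor(moment: datetime, minutes: int) -> datetime:
    """Round down to the start of the current `minutes`-wide bucket."""
    truncated = moment.replace(second=0, microsecond=0)
    return truncated - timedelta(minutes=truncated.minute % minutes)


day = datetime(2022, 4, 29)

# Run at 1:04pm with a 5-minute bucket, as in the message above.
at_1304 = bucket_floor(day.replace(hour=13, minute=4), 5)

# A SELECT just before the bucket boundary and its UPDATE just after it.
select_cutoff = bucket_floor(day.replace(hour=13, minute=4, second=59), 5)
update_cutoff = bucket_floor(day.replace(hour=13, minute=5, second=1), 5)

print(at_1304.time(), select_cutoff.time(), update_cutoff.time())
```

With the 5 in the expression the cutoff stays at 1:00pm through 1:04:59pm (the 10 used in the earlier statement would hold it through 1:09:59pm). Either way, a SELECT and UPDATE that land on opposite sides of a boundary compute different cutoffs, so rows created between the two cutoffs are flagged without having been selected.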

websolete

04/29/2022, 7:05 PM

you're rounding down, i think you need to round up

danmurphy

04/29/2022, 7:06 PM

But anytime a SELECT then UPDATE combo rolls over the time when the period that returns changes, I’m losing records.

danmurphy

04/29/2022, 7:06 PM

Ok, let me think through that (forward vs backward). It feels the same, just delayed. 🤔

websolete

04/29/2022, 7:07 PM

perhaps

danmurphy

04/29/2022, 7:13 PM

Yeah, I think it has the same problem. If one batch of updates is set for 4:15pm in the processAfter field and the next is set at 4:30pm, if the SELECT runs at 4:14:59pm, it wouldn’t pick up the 4:15pm batch. But if the correlated UPDATE then runs at 4:15:01pm, it would update the 4:15pm batch as processed. Right?

websolete

04/29/2022, 7:24 PM

what about this: let's say you're ok with an hour lag time for processing. when inserting records in to the table, you could have a column called hourBatch or batchId or whatever whose value will be the hour part of a 24 hour clock + 1.

id  col1  col2   processed  hourBatch  timestamp
1   92    hello  N          12         11:01:33 am
2   93    blah   N          12         11:11:43 am
3   94    fred   N          13         12:32:00 pm
4   95    mary   N          15         02:01:33 pm

then, your SELECT will target only records from the current hour:

SELECT col1, col2
FROM tablename
WHERE processed = 'N' AND hourBatch = hour(now())  -- let's say it's 11:02am right now, so this would be 11

you process those records (which were inserted an hour ago) and then your UPDATE statement targets the same set

UPDATE tablename
SET processed = 'Y'
WHERE processed = 'N' AND hourBatch = hour(now()) -- 11

the trick would be finding the smallest time increment (quarter hour, half hour, hour, whatever) you can get away with and not risk any records slipping into the current batch. in this exaggerated scenario of one hour lag, you're fine as long as any one batch doesn't take more than an hour to complete. it's the disconnect between the select and update and the forced simplicity of the sql statements you can use that are going to make any solution that does not include a third agent getting involved to try and bridge the gap really difficult
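A small simulation of the hourBatch idea, again with `sqlite3` standing in for Oracle and explicit timestamps standing in for `now()`. The helper functions are hypothetical, and wrapping the batch number at midnight is an assumption the thread does not discuss.

```python
import sqlite3
from datetime import datetime


def hour_batch(inserted_at: datetime) -> int:
    # Hour of insertion plus one; the wrap at midnight is an assumption.
    return (inserted_at.hour + 1) % 24


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, col1 INTEGER, "
    "processed TEXT DEFAULT 'N', hourBatch INTEGER)"
)


def insert(col1: int, at: datetime) -> None:
    conn.execute(
        "INSERT INTO tablename (col1, hourBatch) VALUES (?, ?)", (col1, hour_batch(at))
    )


def poll_select(now: datetime) -> list:
    rows = conn.execute(
        "SELECT col1 FROM tablename WHERE processed = 'N' AND hourBatch = ? ORDER BY col1",
        (now.hour,),
    )
    return [row[0] for row in rows]


def poll_update(now: datetime) -> None:
    conn.execute(
        "UPDATE tablename SET processed = 'Y' WHERE processed = 'N' AND hourBatch = ?",
        (now.hour,),
    )


day = datetime(2022, 4, 29)
insert(92, day.replace(hour=10, minute=30))           # lands in batch 11
picked = poll_select(day.replace(hour=11, minute=2))  # SELECT at 11:02
insert(93, day.replace(hour=11, minute=3))            # lands in batch 12, mid-processing
poll_update(day.replace(hour=11, minute=4))           # UPDATE at 11:04

pending = [row[0] for row in conn.execute("SELECT col1 FROM tablename WHERE processed = 'N'")]
print("picked:", picked, "pending:", pending)
```

The row inserted mid-processing belongs to the next batch, so the UPDATE leaves it alone. A poll whose SELECT runs at 11:59:59 and whose UPDATE runs at 12:00:01 would still target two different batches, which is the same rollover raised earlier in the thread.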

danmurphy

04/29/2022, 7:37 PM

Ok - thinking aloud based on what you said. There’s something there about the polling interval and the batches, I think. Thinking about it at an hour length is maybe sparking something. Really, the problem is if the processing takes longer than the polling duration. Said another way, the problem of a separate SELECT/UPDATE is only problematic if a new batch is picked up in the UPDATE but wasn’t in the SELECT. If the SELECT runs at 4:14:59pm, it wouldn’t pick up the 4:15pm batch. But if the correlated UPDATE then runs at 4:15:01pm, it would update the 4:15pm batch as processed. But what if you poll every 5 minutes to update the batches marked at every 15 minutes? So you will have had two tries at updating that batch already by the time the 3rd poll comes around. Lessening your chance that 3rd and final one is problematic.

danmurphy

04/29/2022, 7:37 PM

Is that making sense?

websolete

04/29/2022, 7:46 PM

is the update on a separate schedule too or is it called at the completion of the processing, however long that takes?

danmurphy

04/29/2022, 7:49 PM

It is called on completion.

websolete

04/29/2022, 7:51 PM

does oracle have global temp variables? e.g., in sql server that would be a @@varname rather than a @varname

danmurphy

04/29/2022, 7:53 PM

Not sure about variables. Yes on temp tables.

websolete

04/29/2022, 7:54 PM

if so, could you do something like

SELECT col1, col2, @@lastprocessed = now() FROM tablename WHERE processed = 'N'

do the processing and then

UPDATE tablename SET processed = 'Y' WHERE processed = 'N' AND record_date < @@lastprocessed

websolete

04/29/2022, 7:54 PM

obviously using oracle-correct syntax

danmurphy

04/29/2022, 7:58 PM

How would

@@lastprocessed

get set?

websolete

04/29/2022, 7:59 PM

inline in the sql. the last row that is collected in the select would have the latest timestamp at that time, so any new records inserted would be explicitly after that time

websolete

04/29/2022, 7:59 PM

i mean, it would have to be declared before its use or whatever oracle does

danmurphy

04/29/2022, 8:01 PM

Ah, ok, I missed that in the SELECT. Yeah, I haven’t seen anything like that in Oracle. I don’t think that is available. That would certainly work, it seems.

danmurphy

04/29/2022, 8:01 PM

(if in SQL Server)
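For illustration, this is what the "remember the cutoff" idea would achieve if state could be carried from the SELECT to the UPDATE. The sketch holds the cutoff in an ordinary Python variable, which is precisely what the two isolated SAP PI fields do not allow, and it uses `sqlite3` with integer timestamps as stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tablename (id INTEGER PRIMARY KEY, record_date INTEGER, "
    "processed TEXT DEFAULT 'N')"
)
conn.executemany("INSERT INTO tablename (record_date) VALUES (?)", [(100,), (105,)])

# SELECT step at "time" 110: read the batch and remember when it was read.
last_processed = 110
batch = [
    row[0]
    for row in conn.execute("SELECT id FROM tablename WHERE processed = 'N' ORDER BY id")
]

# A row arrives at "time" 112, while the batch is being processed.
conn.execute("INSERT INTO tablename (record_date) VALUES (?)", (112,))

# UPDATE step: only flag rows that already existed when the SELECT ran.
conn.execute(
    "UPDATE tablename SET processed = 'Y' WHERE processed = 'N' AND record_date < ?",
    (last_processed,),
)

pending = [row[0] for row in conn.execute("SELECT id FROM tablename WHERE processed = 'N'")]
print("batch:", batch, "pending:", pending)
```

The late row stays 'N' and is picked up by the next poll. The whole approach depends on the shared cutoff value surviving between the two statements.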

websolete

04/29/2022, 8:02 PM

12. ~~Dislike Oracle more than disliked previously~~ DONE.

😂 2

danmurphy

04/29/2022, 8:06 PM

You would do something like SELECT bleh INTO myVariable with Oracle and you can use it within the same function/procedure, but that would only be in PL/SQL, not normal ’ol SQL.

websolete

04/29/2022, 8:10 PM

i can't think of another way other than pre-flagging rows to be processed. can that interface/process you posted a screenshot of have three steps?

Myka Forrest

04/29/2022, 8:11 PM

At this point, changing DBs might actually be the easiest way to accomplish this. 😂

😂 1

danmurphy

04/29/2022, 8:12 PM

It cannot have three steps. I just had them send me the error when they tried to have two statements in the first select field.

danmurphy

04/29/2022, 8:16 PM

I think if you batch them in 1 minute increments and poll every 20 seconds, that is the best solution I can think of with the given limitations. Assuming the processing doesn't take more than a minute.

✔️ 1
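With 1-minute batches and 20-second polling, the remaining exposure is any poll whose SELECT and UPDATE fall in different clock minutes. The sketch below lists such polls for a given start time and processing duration. The function names are made up, and whether the SAP PI schedule can be aligned to the minute is an assumption the thread does not confirm.

```python
from datetime import datetime, timedelta


def straddles_minute(select_at: datetime, processing: timedelta) -> bool:
    """True when the UPDATE would run in a later clock minute than the SELECT."""
    update_at = select_at + processing
    return update_at.replace(second=0, microsecond=0) != select_at.replace(
        second=0, microsecond=0
    )


def exposed_polls(first_poll: datetime, count: int, processing: timedelta) -> list:
    """Polls (20 seconds apart) whose SELECT/UPDATE pair crosses a minute boundary."""
    interval = timedelta(seconds=20)
    polls = [first_poll + i * interval for i in range(count)]
    return [p for p in polls if straddles_minute(p, processing)]


start = datetime(2022, 4, 29, 16, 0, 0)
three_seconds = timedelta(seconds=3)

# Polls at :00, :20, :40 never cross a boundary with 3 seconds of processing.
aligned = exposed_polls(start, 6, three_seconds)

# Polls at :18, :38, :58 cross a boundary once per minute.
offset = exposed_polls(start + timedelta(seconds=18), 6, three_seconds)

print("aligned:", aligned)
print("offset: ", [p.time().isoformat() for p in offset])
```

Under these assumptions the risk comes down to where the polls sit relative to the minute boundary and how long processing takes, which matches the caveat in the last message.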
