Friday puzzle :slightly_smiling_face:. This isn’t ...
# sql
d
Friday puzzle 🙂. This isn’t CFML related, just a bit of a SQL puzzle, so I thought I’d throw it out here for some ideas. There is an Oracle database table that an application writes to, setting new records with a
processed
field set to
N
and
record_created_date
set to the current date/time. SAP PI (middleware software) polls the table for unprocessed records, creates SAP IDOCs from the data, then updates the
processed
flag to
Y
. Here is the kicker…the way the SAP PI JDBC sender works, there are 2 separate SQL statements that make this happen that have no knowledge of each other. The 1st is a SELECT to get the records to create the IDOCS. Then a totally separate UPDATE statement to update the records that have been processed. The polling interval can be set to whatever timeframe. So you might just say… 1.
SELECT * FROM table WHERE processed = 'N'
2. Process the records and create IDOCs 3.
UPDATE table SET processed = 'Y' WHERE processed = 'N'
The problem is, if processing the records takes 3 seconds (or 10ms, or 30s, etc), the records that are inserted in-between step 1 and 3 don’t get IDOCs created but are updated in the table as processed. Is there any way to make sure all records are processed?
I’ve thought about having them change it to only process records that are older than the previous 5 minute bucket of time. So if the table is polled at 4:01pm, it will process anything from 3:55pm or earlier. If it polls at 4:02pm, 40345pm, 40459pm, etc., it will still only do 3:55pm or earlier. The problem still remains that if it polls at 40459pm it will do 3:55pm or earlier, but then if the update statement runs at 40501pm, it would update anything from 4:00pm or earlier. So the records from 355pm 400pm were lost.
m
When/after you select the records
where processed = 'N'
, can you set
processed
to
'P'
(pending), and then only update
where processed = 'P'
?
Or record the IDs of what was selected and only update those.
d
There is literally only 2 fields in the JDBC Sender in SAP XI - one for a SELECT statement (which is where the data is available to be processed) and one for an UPDATE statement. They are two totally separate fields that you plop SQL into and no data can be passed between.
(my mind was a bit blown when I realized this from the SAP PI developer as we tracked down this bug)
m
Is there any way to tie a SAP IDOC to a particular record? Can you read those to update the table?
d
The Update doesn’t have any knowledge of the data that was processed. It is just a separate update statement.
m
and there's no way to make it aware of those updates? 🤔
d
I pulled this screenshot from the internet, but this is what they have to work with…
w
can the select statement be any valid sql, like a cte?
d
Yes. Including calling a function since that is allowed in a SELECT statement in Oracle. But functions in oracle can’t manipulate data.
m
Are you able to modify these tables or add a helper table?
w
yeah, i was going to say, what about a third table to stage records to be processed, then the update sql will do an UPDATE / INNER JOIN to that third table from the main one
d
Yes, to some extent, we could do that. I added the
record_created_date
field to try and help, for example.
I’m intrigued, keep talking. How would records get in the staging table? The initial SELECT is just that…only a SELECT.
m
Can you write a query to go with the select to update a helper table? If you could add a record_created_date, can you add an id column?
d
I could probably add an ID column. How would it be used? Tell me more about what you mean with “write a query to go with the select to update a helper table”.
m
use the helper table as a holding cell. (I'm a Sql Server user, so this may require some translation to oracle)
select ID
(from origTable)
into helperTable where processed = 'N'
. Then
Select * from helperTable
for processing IDOCs.
Update origTable SET processed = 'Y' WHERE id in (select id from helperTable).
Truncate helperTable
so the next batch starts from scratch.
w
Copy code
INSERT INTO helpertable ( col1, col2 ) 
SELECT col1, col2 
FROM tablename 
WHERE processed = 'N'; 

SELECT col1, col2
FROM tablename 
WHERE processed = 'N';

-------------------------

UPDATE t 
SET t.processed = 'Y' 
FROM tablename t 
	INNER JOIN helpertable h ON t.col1 = h.col1
WHERE t.processed = 'N';
you'd have to update the update sql to match how oracle does update/inner joins, of course. you'd also want to have something that cleans up helpertable although if the above works you could forego that (but have lots of leftover useless rows)
👍 1
we're saying the same thing pretty much
👍 1
unsaid: oracle. blech.
😂 2
d
But how does the insert into the helpertable happen? It’s kinda like wanting an “in process” flag (e.g. P instead of just Y or N). Where does that event take place?
m
Where does
select * from table where processed = 'n'
take place?
w
it assumes you can have multiple queries in the select item
m
Why can't it go just before that?
right, assuming "any valid sql"
d
Ok, I’ll ask about multiple statements in that little box. Not sure. If that is possible, then all sorts of stuff opens up. I’m not sure if it is.
m
or can you call some sort of oracle-equivalent to a stored procedure?
w
perhaps you can wrap all this up in a stored proc and invoke that
jinx
😂 1
that would give you the most flexibility i'd say
d
I don’t know if it has the “I saw a certain character and will continue to run the next statement” ability.
m
@websolete, it only encourages me that I might be on the right track. 😁
d
Yeah, I tried that route first. Evidently it has to be valid SQL, not PL/SQL.
w
which is why capturing the complex select/flag/update in a stored proc and 'simply' calling the stored proc makes the most sense. the update sql should be easy
👍 1
CALL myproc; ?
or EXEC or whatever oracle overcomplicates for that
d
Right. In Oracle I think it is EXECUTE proc;, or something like that. Maybe I need to have them show me that so I can see it not working with my own eyes, ha.
And also, the procedure doesn’t return data, right? Which is needed for that “Query SQL Statement” field, because it needs the data to process the IDOCs.
w
it can, at least in normal rdbms's
doesn't have to, but can
d
Maybe I just never do that and it can in Oracle too? 🤔
w
anything's possible. i hear they have auto-incrementing numeric identifiers in oracle now, so there's hope
🤯 1
d
Oh, the triggers and sequences I’ve created in my day. Whew.
m
Regardless...this is an interesting challenge!
d
Yeah, I don’t think you can return data with an
EXECUTE sproc;
statement, right? Sorry if I’m too Oracle-ized and this sounds ridiculous. But Returning data = SELECT statements = SQL. Stored procedures are in the PL/SQL realm. Right?
furiously googling…
w
fwiw, you can return a resultset from a stored proc in sql server in a 'flat' exec proc statement, but only a single resultset, not multiple like if you actually invoke the stored proc
which is what you want anyway, but not sure if oracle behaves the same
d
Single result set, meaning could be the results of one query?
w
yes
d
Hmm. Looks like I’m on 11g for this table and resultsets hadn’t been dreamt up yet. Still googling though…
In other rdbms’s, can functions manipulate data?
w
i don't believe so, if you mean altering data in tables. you can always massage what's returned from the function, but i don't think you can affect data changes at the storage level
d
Yeah, that’s what I mean. Ok, that’s consistent with Oracle too.
w
does oracle have computed columns?
d
I might need a translation or example of what that means.
googling…
information so important it could only be captured in a pdf, apparently
😂 1
my point in asking is that's KIND of like a 'function that affects data'
but it's tied to the column definition
d
Ah, gotcha. So yeah, it looks like it exists, but I’ve usually seen that type of thing solved by an insert/update trigger that manipulates the data before committing.
w
similar yes, a bit more lightweight than triggers i'd say, again at least in sql server
👌 1
ok, so i have an idea:
let's say you add a new column to the table called processAfter, that is a computed column or trigger-driven that will add 10 minutes to the current_timestamp/now(), so newly inserted records will be flagged for processing 'in the future'. then your select statement will get records only where processed = 'N' and processAfter < now(). then you go to that five minute schedule you mentioned, and you'll know any new records' processAfter will be outside your five minute window and therefore should be insulated. your update query would need to accommodate that in its where clause as well
d
Ok, I’ve reread that a couple of times. I think it will be susceptible to the same issue I explained here… https://cfml.slack.com/archives/C082RPGTZ/p1651249243704979?thread_ts=1651248315.509289&amp;cid=C082RPGTZ
w
ok, so you set the processAfter time to add 20 mins, and you still poll on five min schedule. assuming the batch finishes quickly, your update sql will simply be
WHERE processed = 'N' AND processAfter < now()
any newly inserted records that have occurred will be flagged for 20 mins in the future
ideally you'd set the processAfter value in 15 minute increments, to round up and create 'batches', avoiding the problem if 'new records inserted too close to the cutoff which causes them to be inadvertently included in the update'
is there a requirement that these records be processed as soon as inserted as possible or could there be a lag time that is acceptable?
cause hell, you could just do a DATEDIFF() in the two statements and target records where they're at least an hour (or 30 mins, or whatever) old
d
Ok, that’s interesting (the batch idea). Ideally, within minutes. Right now it polls every 20 seconds. But it also potentially leaves records hanging every 20 seconds, so that’s not exactly ideal either. 😄
Well, that’s what I tried with this statement…
Copy code
WHERE record_created_date <= trunc(sysdate, 'mi') - mod(EXTRACT(minute FROM cast(sysdate as timestamp)), 10) / (24 * 60)
    AND processed = 'N';
But like I said, if the SELECT uses that and starts at 40459pm and then finishes at 40501pm, and then if the UPDATE runs the same where statement, your going to lose a batch. Right?
w
if you base it on the insert timestamp rather than a padded timestamp 'in the future' that is beyond your schedule window
or so i think
every new row is 'don't do anything with this one for at least five mins', and if your batch completes in a couple mins, you should be safe, since your select WHERE clause checks processed = 'N' as well as 'must be at least five mins old'
d
Trying to think through that to figure out if you still have the problem at the polling interval…
w
it's not clear to me if that truly avoids the problem you're describing about getting close to the threshold of the schedule time, but i think it dos
does
what i'm advocating is only updating rows that are at a minimum 'one polling period ago'
not the previous period, but previous period + 1
d
Does the
processAfter
become the same thing as
record_create_date
, but just 10 minutes later?
w
yes
still feels like rounding to a quarter or half hour for processAfter to create groups of records would be better, but maybe that won't matter
d
Just trying to rubber duck this a bit. It might not be clicking yet, but I think my problem still exists. The statement below would give me would give me 1:00pm if run right now (1:04pm). It will give me 1:00pm anytime from 1:00pm until 10959pm.
trunc(sysdate, 'mi') - mod(EXTRACT(minute FROM cast(sysdate as timestamp)), 5) / (24 * 60)
w
you're rounding down, i think you need to round up
d
But anytime a SELECT then UPDATE combo rolls over the time when the period that returns changes, I’m losing records.
Ok, let me think through that (forward vs backward). It feels the same, just delayed. 🤔
w
perhaps
d
Yeah, I think it has the same problem. If one batch of updates is set for 4:15pm in the processAfter field and the next is set at 4:30pm, if the SELECT runs at 41459, it wouldn’t pick up the 4:15pm batch. But if the correlated UPDATE then runs at 41501pm, it would update the 4:15pm batch as processed. Right?
w
what about this: let's say you're ok with an hour lag time for processing. when inserting records in to the table, you could have a column called
hourBatch
or batchId or whatever whose value will be the hour part of a 24 hour clock + 1.
Copy code
id	col1	col2	processed 	hourBatch	timestamp
1	92	hello	N		12		11:01:33 am
1	93	blah	N 		12		11:11:43 am
1	94	fred	N 		13		12:32:00 pm
1	95	mary	N 		15		02:01:33 pm
then, your SELECT will target only records from the current hour:
Copy code
SELECT col1, col1
FROM tablename
WHERE processed = 'N' AND hourBatch = hour(now())  -- let's say it's 11:02am right now, so this would be 11
you process those records (which were inserted an hour ago) and then your UPDATE statement targets the same set
Copy code
UPDATE tablename
SET processed = 'Y'
WHERE processed = 'N' AND hourBatch = hour(now()) -- 11
the trick would be finding the smallest time increment (quarter hour, half hour, hour, whatever) you can get away with and not risk any records slipping into the current batch. in this exaggerated scenario of one hour lag, you're fine as long as any one batch doesn't take more than an hour to complete. it's the disconnect between the select and update and the forced simplicity of the sql statements you can use that are going to make any solution that does not include a third agent getting involved to try and bridge the gap really difficult
d
Ok - thinking aloud based on what you said. There’s something there about the polling interval and the batches, I think. Thinking about it at an hour length is maybe sparking something. Really, the problem is if the processing takes longer than the polling duration. Said another way, the problem of a separate SELECT/UPDATE is only problematic if a new batch is picked up in the UPDATE but wasn’t in the SELECT. If the SELECT runs at 41459, it wouldn’t pick up the 4:15pm batch. But if the correlated UPDATE then runs at 41501pm, it would update the 4:15pm batch as processed. But what if you poll every 5 minutes to update the batches marked at every 15 minutes? So you will have had two tries at updating that batch already by the time the 3rd poll comes around. Lessening your chance that 3rd and final one is problematic.
Is that making sense?
w
is the update on a separate schedule too or is it called at the completion of the processing, however long that takes?
d
It is called on completion.
w
does oracle have global temp variables? e.g., in sql server that would be a @@varname rather than a @varname
d
Not sure about variables. Yes on temp tables.
w
if so, could you do something like
SELECT col1, col2, @@lastprocessed = now() FROM tablename WHERE processed = 'N'
do the processing and then
UPDATE tablename SET processed = 'Y' WHERE processed = 'N' AND record_date < @@lastprocessed
?
obviously using oracle-correct syntax
d
How would
@@lastprocessed
get set?
w
inline in the sql. the last row that is collected in the select would have the latest timestamp at that time, so any new records inserted would be explicitly after that time
i mean, it would have to be declared before its use or whatever oracle does
d
Ah, ok, I missed that in the SELECT. Yeah, I haven’t seen anything like that in Oracle. I don’t think that is available. That would certainly work, it seems.
(if in SQL Server)
w
12. Dislike Oracle more than disliked previously DONE.
😂 2
d
You would do something like
SELECT bleh INTO myVariable
with Oracle and you can use it within the same function/procedure, but that would only be in PL/SQL, not normal ’ol SQL.
w
i can't think of another way other than pre-flagging rows to be processed. can that interface/process you posted a screenshot of have three steps?
m
At this point, changing DBs might actually be the easiest way to accomplish this. 😂
😂 1
d
It cannot have three steps. I just had them send me the error when they tried to have two statements in the first select field.
I think if you batch them in 1 minute increments and poll every 20 seconds, that is the best solution I can think of with the given limitations. Assuming the processing doesn't take more than a minute.
✔️ 1