Nov
28
Written by:
StevenFeuersteinTW
Monday, November 28, 2011 2:12 PM
A function is considered to be deterministic if it returns the same result value whenever it is called with the same values for its IN and IN OUT arguments. Another way to think about deterministic programs is that they have no side-effects. Everything the program depends on and/or affects is reflected in the parameter list.
The following function (a simple encapsulation on top of SUBSTR) is a deterministic function:
FUNCTION betwnstr (
string_in IN VARCHAR2, start_in IN PLS_INTEGER, end_in IN PLS_INTEGER)
RETURN VARCHAR2 IS
BEGIN
RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END betwnstr;
As long as I pass in, for example, "abcdef" for the string, 3 for the start, and 5 for the end, betwnstr will always return "cde."
Well, if that is the case, why not have the database save the results associated with a set of arguments? Then when I next call the function with those arguments, it can return the result without executing the function!
What a great idea, right? Of course, if I can think of it, then so can Oracle.
Suppose I alter the above function by adding the DETERMINISTIC keyword:
FUNCTION betwnStr (
string_in IN VARCHAR2, start_in IN PLS_INTEGER, end_in IN PLS_INTEGER)
RETURN VARCHAR2 DETERMINISTIC
By taking this step, I can now use this function in a function-based index. Even better, Oracle will, under very specific circumstances, cache the IN argument values and the RETURN values, and then avoid executing the function body if the same inputs are provided.
Let's take a look at an example of the caching nature of deterministic functions. Suppose I define the following encapsulation on top of SUBSTR (return the string between start and end locations) as a deterministic function (note: this code comes from the deterministic.sql file in the demo.zip download on PL/SQL Obsession):
FUNCTION betwnstr (
string_in IN VARCHAR2, start_in IN PLS_INTEGER, end_in IN PLS_INTEGER)
RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END betwnstr;
I can then call this function inside a query (it does not modify any database tables, which would otherwise preclude using it in this way), such as:
SELECT betwnstr (last_name, 1, 5) first_five
FROM employees
And when betwnstr is called in this way, the database will build a cache of inputs and the return value. So if I call the function with the same inputs, the database will return the value without calling the function. To demonstrate this optimization, I will change betwnstr to the following:
FUNCTION betwnstr (
string_in IN VARCHAR2, start_in IN PLS_INTEGER, end_in IN PLS_INTEGER)
RETURN VARCHAR2 DETERMINISTIC
IS
BEGIN
DBMS_LOCK.sleep (.01);
RETURN (SUBSTR (string_in, start_in, end_in - start_in + 1));
END betwnstr;
In other words, I will use the sleep subprogram of DBMS_LOCK to pause betwnstr 1/100th of a second.
If I call this function in a PL/SQL block of code (not from within a query), the database will not cache the function values, and so when I query the 107 rows of the employees table, it takes more than one second:
DECLARE
l_string employees.last_name%TYPE;
BEGIN
sf_timer.start_timer;
FOR rec IN (SELECT * FROM employees)
LOOP
l_string := betwnstr ('FEUERSTEIN', 1, 5);
END LOOP;
sf_timer.show_elapsed_time ('Deterministic function in block');
END;
/
Deterministic function in block Elapsed: 1.67 seconds.
If I now execute the same logic, but move the call to betwnstr inside the query, the performance is quite different:
BEGIN
sf_timer.start_timer;
FOR rec IN (SELECT betwnstr ('FEUERSTEIN', 1, 5) FROM employees)
LOOP
NULL;
END LOOP;
sf_timer.show_elapsed_time ('Deterministic function in query');
END;
/
Deterministic function in query Elapsed: .05 seconds.
As you can see, caching with a deterministic function is a very effective path to optimization.
But you are also wondering how much this can help you. How often do you even call a PL/SQL function inside an SQL statement? And that's the reason that very few PL/SQL developers are aware of and have taken advantage of this feature.
If and when you upgrade to Oracle Database 11g, however, you will find the DETERMINISTIC keyword to be much more helpful. The reason? The PL/SQL optimizer is now able to perform a similar optimization as described above, but in native PL/SQL code itself. In other words, I can call a function like betwnstr outside of a SQL statement, in my block, and for the duration of that server call avoid execution of the function if it is called with the same input value used with a previous call to that same function.
I discovered this fact when I met earlier in November with members of the PL/SQL development team, including the person responsible for the optimizer. You can verify this behavior yourself with the following script:
/* First a package to help calculate elapsed CPU time. */
CREATE OR REPLACE PACKAGE plch_timer
IS
PROCEDURE start_timer;
PROCEDURE show_elapsed_time (
message_in IN VARCHAR2 := NULL);
END plch_timer;
/
CREATE OR REPLACE PACKAGE BODY plch_timer
IS
last_timing NUMBER := NULL;
PROCEDURE start_timer
IS
BEGIN
last_timing := DBMS_UTILITY.get_cpu_time;
END;
PROCEDURE show_elapsed_time (
message_in IN VARCHAR2 := NULL)
IS
BEGIN
DBMS_OUTPUT.put_line (
'"'
|| message_in
|| '" completed in: '
|| (DBMS_UTILITY.get_cpu_time - last_timing) / 100
|| ' seconds');
END;
END plch_timer;
/
CREATE OR REPLACE FUNCTION plch_getdata_determ (n NUMBER)
RETURN NUMBER
DETERMINISTIC
IS
BEGIN
RETURN n;
END;
/
CREATE OR REPLACE FUNCTION plch_getdata (n NUMBER)
RETURN NUMBER
IS
BEGIN
RETURN n;
END;
/
/* Deterministic helps with execution of function in PL/SQL
block - on 11g. */
DECLARE
n NUMBER;
BEGIN
plch_timer.start_timer;
FOR indx IN 1 .. 10000000
LOOP
n := plch_getdata (1);
END LOOP;
plch_timer.show_elapsed_time (
'In PL/SQL Non-DETERMINISTIC');
--
plch_timer.start_timer;
FOR indx IN 1 .. 10000000
LOOP
n := plch_getdata_determ (1);
END LOOP;
plch_timer.show_elapsed_time ('In PL/SQL DETERMINISTIC');
END;
/
You should see noticeably faster performance for the deterministic version.
After I learned about this new feature, I also discovered that I wasn't the first person (outside of Oracle HQ) to make the discovery.
Rob van Wijk, a PL/SQL Challenge player and experienced Oracle technologist, wrote about this same topic on his blog.
Other players are also researching this feature in more depth, experimentally determining when the optimizer may not apply the caching optimization. I encourage you to check out the PL/SQL quiz on this topic for the latest details.
3 comment(s) so far...
Re: The Power of Determinism in Oracle Database 11g
Hi Steven,
This may be a repeat comment. I apologise if it is. I tried to submit a comment yesterday and it was not clear from the feedback whether it was successful or not. As the comment has not appeared, I assume it was not successful.
Whilst it's true that the wider performance benefits of deterministic functions have improved in later versions, I still believe that, currently at least, there is still only one genuine use case for them and that is function-based indexes.
I'm hesitant to link to an unfinished and inconclusive series of articles but I started to look deeper into deterministic functions about 1 year ago (oh dear - still unfinished). I'm reluctant to link to these incomplete findings but I think they add to the discussion: orastory.wordpress.com/2010/12/16/determining-determinism/
One of the reasons these articles remain unfinished is that I knew my summary from the start and my findings were only reinforcing the message - deterministic is best suited to FBIs only.
My findings seemed to show that deterministic functions are subject to a wider variety of factors compared to subquery caching. For example, fetch size (i.e. the arraysize setting in sql*plus) seemed to affect the performance benefits in a way that subquery caching (i.e. SELECT function(..) FROM DUAL) was not.
It may well be that deterministic functions evolve to be the method of choice when it comes to reducing the overhead of functions called from sql (obviously having no function is the ideal from a pure performance perspective) - they would be slightly more intuitive and obvious a solution than subquery caching, in my opinon.
However, for me, the fact that the benefits of subquery caching extend beyond just the latest release and their benefits seem to be more predictable than deterministic functions means that subquery caching remains my preferred method (if, really if, the logic of the function can't be brought inline).
Finally, as you know of course, if these functions themselves contain SQL statements then these can completely bypass the read consistency of the main query.
Cheers, Dominic
By dombrooks on
Wednesday, November 30, 2011 9:42 AM
|
Re: The Power of Determinism in Oracle Database 11g
You know what - I take back what I said about FBIs being the only use case. I'm talking nonsense.
I sort of went off on a tangent due to the cases that I see of functions called from SQL and subsequently lost sight of what you were saying.
If you know your function is DETERMINISTIC, you should say so, right? Just make sure that it really is DETERMINISTIC.
That way you get benefit if you call it from PLSQL in later versions and you might also get some benefit if you really need to call it from SQL.
By dombrooks on
Thursday, December 01, 2011 10:03 AM
|
Re: The Power of Determinism in Oracle Database 11g
Over at the PL/SQL Challenge (www.plsqlchallenge.com), we've been exploring this issue further and it looks like the in-PL/SQL optimization of a deterministic function will only happen under a very narrow set of circumstances. Check out the details at plsql-challenge.blogspot.com/2011/11/nuances-of-deterministic-functions-in.html.
But I come to same conclusion as you: if you know it is deterministic, why not say so? At a VERY minimum, you are telling everyone who comes along later that they should "indeterminize" it.
By StevenFeuersteinTW on
Thursday, December 01, 2011 10:05 AM
|