Jan
10
Written by:
StevenFeuersteinTW
Thursday, January 10, 2013 9:30 AM
| We at Toad World are very pleased to offer excerpts from Steven Feuerstein's Oracle PL/SQL Best Practices Second Edition in this blog. Many thanks to O'Reilly Media for giving permission to publish the excerpts. Here's a description of the Oracle PL/SQL Best Practices: "In this compact book, Steven Feuerstein, widely recognized as one of the world's leading experts on the Oracle PL/SQL language, distills his many years of programming, teaching, and writing about PL/SQL into a set of best practices-recommendations for developing successful applications. Covering the latest Oracle release, Oracle Database 11g Release 2, Feuerstein has rewritten this new edition in the style of his best-selling Oracle PL/SQL Programming. The text is organized in a problem/solution format, and chronicles the programming exploits of developers at a mythical company called My Flimsy Excuse, Inc., as they write code, make mistakes, and learn from those mistakes-and each other." |
Problem: Jasper queries the same, unchanging data over and over and over again.
Like most applications, the MFE (My Flimsy Excuse) application contains lots of tables. Most of them contain data that is constantly changing. New customers are added almost every minute. Excuses are created, used by customers, edited, and so on.
Other tables change relatively infrequently, especially in comparison to the frequency of querying. An example of such a table is mfe_employees. Sure, people come and go, and their personal data changes, but the HR application on which it is based queries the table thousands of times a day, while the data changes only once or twice a day.
And still other tables never change at all when users are running the application. One such table is the mfe_ref_codes dataset, a general "types" table that holds reference codes—for example, "O" is for "OPEN", "C" is for "CLOSED", etc.
Jasper wrote a function to retrieve a description for a given code:
FUNCTION ref_code_description (
category_in IN mfe_ref_codes.category%TYPE
,code_in IN mfe_ref_codes.code%TYPE
)
RETURN mfe_ref_codes.description%TYPE
IS
l_description mfe_ref_codes.description%TYPE;
BEGIN
SELECT description INTO l_description
FROM mfe_ref_codes WHERE category=category_in AND code=code_in;
RETURN l_description;
EXCEPTION
WHEN NO_DATA_FOUND THEN RETURN NULL;
END ref_code_description;
And this function is called all over the application. In fact, it is called so often that Jasper receives a call one day from the production DBA:
Jasper, I just did some performance analysis on the application. I discovered that the ref_code_description function is called more than 25,000 times in the span on a few minutes' time. Sure, each call is pretty fast, but the total elapsed time still amounts to 33 seconds.
Jasper is stunned. That's an awful lot of time to be spent retrieving this data. But he doesn't really see what he can do about it. "Doesn't the database automatically optimize such queries?" he asks the DBA.
"Sure," replies the DBA. "That is the optimized performance. The query is preparsed and cached. The data is already sitting in the SGA; in fact, I moved the mfe_ref_codes table into the KEEP pool. I've even pinned the function into the shared pool. There's nothing more I can do. Are you sure you need to call the function so frequently?"
Note: The DBA could also move the mfe_ref_codes table into the SGA KEEP pool – rather than the default, and CACHE the table. All of these are techniques to try to keep frequently accessed data in the SGA longer than normal. These methods are superior since the database engine handles them and any concurrency/consistency issues.
Well, in fact, the application does need to call the function that often. Clean out of ideas, Jasper turns to Delaware for advice. And Delaware declares; "This calls for caching—the programmer way!"
Solution: If the normal SGA cache doesn't the trick, look for other ways to cache.
The whole point of the SGA is to store information in memory to avoid reading and writing to disk – the slowest step in the whole database round-trip. The MFE DBA is correct; the SGA is doing its job and is caching the information requested in this query.
But we can often apply the concept behind the SGA caching to our PL/SQL code and achieve impressive decreases in elapsed time – but in the process we consume more memory.
Let's look at how we can employ the following caching options in the ref_code_description function:
· Per-session cache through PL/SQL package persistent data: Through this cache we can store data in a package-level variable. The data persists for the entire session, but each session has its own copy.
· Oracle Database 11g PL/SQL result cache: A brand-new feature in the Oracle database, this new SGA memory cache avoids the overhead of the caching described in the Problem section and is shared across all sessions in the database instance.
The following sections describe these caches.
Per-session cache in a PL/SQL package. When you declare data inside a package but not within any individual procedure or function in the package, that data persists for your entire session.[1] A package-level collection, for example, retains its values (say, 1000 rows of data) until you delete one or more rows from the collection, close your connection, or recompile the package. This data persistence means that you can use package data as a local cache—local to that single session.
Here are some guidelines to help you decide how best to apply this technique:
· The data must be static for the duration of the session. It’s possible to come up with ways to update the cache, but such efforts are likely to cancel out performance gains – and can be very complicated.
· Always remember that the memory consumed by package data comes out of the PGA (Process Global Area) and not the SGA (System Global Area). If this cache consumes 1 MB of data and you have 1000 sessions connected to the Oracle database, your application will require an additional 1GB of memory in addition to the SGA memory requirements.
· If your application uses connection pooling, you will need to make sure that each connection in a pool contains the same cached data. Otherwise, as your frontend application moves between sessions in the pool, it will work with different sets of data.
· You should declare the data structures inside the package body (make them private to the package use only) so that you can manage their contents and guarantee their integrity.
So how would Jasper apply this to the ref_code_function? First, he moves the function inside a package:
PACKAGE ref_code_cache
IS
FUNCTION ref_code_description (
category_in IN mfe_ref_codes.category%TYPE
,code_in IN mfe_ref_codes.code%TYPE
)
RETURN mfe_ref_codes.description%TYPE;
END ref_code_cache;
Note that this means that he will either need to change all the existing calls to the function so that they include the package name, or need to change the implementation of the original function to be a pass-through, as in:
FUNCTION ref_code_description (
category_in IN mfe_ref_codes.category%TYPE
, code_in IN mfe_ref_codes.code%TYPE
)
RETURN mfe_ref_codes.description%TYPE
IS
BEGIN
RETURN ref_code_cache.ref_code_description (category_in, code_in);
END ref_code_description;
Next, Jasper implements the package body. Here is the code, followed by an explanation (you will also find this code in the ref_code_cache.pkg file on the book’s web site):
1 PACKAGE BODY ref_code_cache
2 IS
3 TYPE descriptions_for_code_aat IS TABLE OF mfe_ref_codes.description%TYPE
4 INDEX BY mfe_ref_codes.code%TYPE;
5
6 TYPE codes_for_category_aat IS TABLE OF descriptions_for_code_aat
7 INDEX BY mfe_ref_codes.category%TYPE;
8
9 g_cache codes_for_category_aat;
10
11 FUNCTION ref_code_description (
12 category_in IN mfe_ref_codes.CATEGORY%TYPE
13 , code_in IN mfe_ref_codes.code%TYPE
14 )
15 RETURN mfe_ref_codes.description%TYPE
16 IS
17 BEGIN
18 RETURN g_cache (category_in) (code_in);
19 EXCEPTION
20 WHEN NO_DATA_FOUND THEN RETURN NULL;
21 END ref_code_description;
22 BEGIN
23 FOR ref_code_rec IN (SELECT * FROM mfe_ref_codes)
24 LOOP
25 g_cache (ref_code_rec.category) (ref_code_rec.code) :=
26 ref_code_rec.description;
27 END LOOP;
28* END ref_code_cache;
Here is the explanation:
|
Line(s)
|
Significance
|
|
3-9
|
Declare two collection types: one to hold the description for a given code, the other to hold all the codes for a given category. Notice that I use the string columns as the collection indexes. I then declare my cache variable as one of these "collections of collections."
Note that string-indexed and multi-level collections are supported only in Oracle9i Database Release 2 and above.
|
|
23-27
|
In the initialization section of my package, I query all the rows in the table and store the description in the cache, using the related category and code values to find the right place in the two-level collection
|
|
18
|
Inside the description lookup function, I now replace the query with a simple (trivial, really, once you get used to the syntax of these multilevel collections) retrieval from the collection.
|
As you can see, the data is retrieved just once (per session) from the database. From that point on, all requests for descriptions are satisfied from the collection alone.
So what kinds of increases in performance are possible with this approach? To find out, I built another caching package that runs against the standard Oracle HR employees table, along with a procedure to compare elapsed time of a number of different approaches to retrieving a row of data from the table. Here are the relevant files:
· emplu.pkg: File that contains three different packages, showcasing different approaches to caching.
· emplu.tst: Procedure that compares the performance and runs the test
Here are the results for the standard query-driven function and a package-based cache (just two of the five approaches you will see compared in the emplu.tst script):
|
Type of retrieval
|
Performance in seconds
|
|
Run query for each request for data
|
5.07
|
|
Cache entire contents of table into the associative array cache on initialization
|
.22
|
As you can see, the improvement in performance can be quite dramatic!
So be on the lookout for ways to apply this technique, but remember that your system must be able to accommodate the required per-session memory.
This caching technique is also very handy when you execute long-running batch processes that must perform multiple passes through large result sets. In this case, you can usually afford to cache large amounts of data, because there is just a single process, as opposed to hundreds or thousands of sessions doing the same thing.
Oracle Database 11g PL/SQL result cache. The package-based cache described in the previous section is a great technique ,but it has two major limitations:
· The memory used by the cache is duplicated in each session.
· The cache only makes sense for completely static datasets.
These limitations greatly reduce the range of circumstances under which the caching can be applied. So in Oracle Database 11g, Oracle implemented a new kind of caching that combines the speed of the package-based PGA cache with the memory optimization and automatic cache management of the SGA.
Here is a rewrite of the original ref_code_description function using the PL/SQL result cache:
1 FUNCTION ref_code_description (
2 category_in IN mfe_ref_codes.CATEGORY%TYPE
3 , code_in IN mfe_ref_codes.code%TYPE
4 )
5 RETURN mfe_ref_codes.description%TYPE
6 RESULT_CACHE RELIES_ON (mfe_ref_codes)
7 IS
8 l_description mfe_ref_codes.description%TYPE;
9 BEGIN
10 SELECT description INTO l_description
11 FROM mfe_ref_codes WHERE CATEGORY = category_in AND code = code_in;
12
13 RETURN l_description;
14 EXCEPTION
15 WHEN NO_DATA_FOUND THEN RETURN NULL;
16 END ref_code_description;
In other words, the only change made to the function was to add a new clause to the header (line 6):
RESULT_CACHE RELIES_ON (mfe_ref_codes)
This clause tells the Oracle database that it should remember (store in a special in-memory result cache) each record retrieved for a specific category-code combination. And when a session executes this function and passes in a combination that was previously stored, the PL/SQL runtime engine will not execute the function body, which includes that query. Instead, it will simply retrieve the record from the cache and return that data immediately. The result is much faster retrieval time. (Note: this caching does differ slightly from the package implementation in that the manual caching pre-loaded all the rows when the session started, while the result cache loads the required data as needed.)
In addition, by specifying "RELIES_ON (mfe_ref_codes)", we inform the Oracle database that if any session commits changes to that table, any data in the result cache drawn from the table must be invalidated. The next call to the ref_code_description function would then have to execute the query and retrieve the data fresh from the table.
Finally, this cache is not per-session, but is available to all sessions connected to the instance.
Well, that is certainly is a whole lot easier to write than the package-based implementation! But how does it compare in performance? I modified my original "emplu" files to include the result cache technique. You can find the code for these files in the following:
· 11g_emplu.pkg: Source code for the various packages containing the different approaches.
· 11g_emplu_compare.sp: Procedure that compares the performance of each approach, using PLVtmr.
· 11g_emplu.tst: Script that runs the above two files, compiling all the necessary code, and then runs the test.
|
Type of retrieval
|
Performance in seconds
|
PGA memory consumed
|
|
Run query for each request for data
|
4.5
|
None (or very little)
|
|
Package-based cache
|
.11
|
~130,000 bytes
|
|
PL/SQL result cache
|
.27
|
None (or very little)
|
What’s the conclusion of all this testing? The per-session PGA-based cache is still the fastest (it is simply more efficient to read from PGA memory, than SGA memory), but the PL/SQL result cache is dramatically faster than repetitive querying. Plus, the PL/SQL function result cache minimizes the amount of memory needed to cache and share this data across all sessions. This low memory profile, plus the automatic purge of cached results whenever changes are committed, makes this feature of Oracle Database 11g a very practical method for optimizing performance in our applications.
[1] You can insert the PRAGMA SERIALLY_REUSABLE statement into your package if you don't want package-level data to be persistent.