SAS Administration: Process Data Faster in RDBMS by Buffering the Data in Memory
By default, accessing third-party relational databases from SAS can be very slow if the connection is not configured properly. I recently started using PostgreSQL 9.1, an open-source database, to store high volumes of data for my SAS Global Forum 2013 paper. In this example I am using an ODBC connection to a database I defined in Postgres; native SAS/ACCESS engines for databases such as Oracle or MS SQL Server have different options and capabilities. At first it took forever to load data because SAS was inserting one row at a time into the database table. After adding a simple option, my data processing was off to the races!
The SAS INSERTBUFF and READBUFF options can dramatically improve performance for ODBC and OLE DB libraries. For ODBC connections these default to 1 row and 250 rows, respectively. Setting these buffer sizes tells SAS how many rows to hold in memory before sending them to (or fetching them from) the database in a single operation.
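If you want to confirm how many rows SAS is actually moving per round trip, the SASTRACE system option will echo the DBMS calls to the SAS log. A minimal sketch (these are the standard trace settings, not anything specific to this project):

/* Turn on DBMS call tracing so the log shows how many rows  */
/* each insert or fetch actually moves across the connection. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

After running a load with tracing on, look for the insert messages in the log to verify that your buffer settings are being honored.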
Using the LIBNAME Option
These options can be added to the LIBNAME statement to set the buffer sizes for all processing done on tables within the library. Ideally, if you have the SAS Metadata Server running, your SAS Administrator should set these options through the Data Library Manager in SAS Management Console.
If you are using Base SAS or writing code in SAS Enterprise Guide, you can also write the LIBNAME statement manually, like this:
LIBNAME pgsgf13 ODBC
   DBCOMMIT=10000
   READBUFF=30000
   INSERTBUFF=30000
   DATASRC=sasgf13
   SCHEMA=public ;
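Once the library is assigned with those buffer sizes, any step that reads or writes through it inherits them. A hypothetical load might look like this (work.big_fact and fact_sales are made-up names for illustration):

/* Every insert into pgsgf13 now moves up to 30,000 rows per call */
/* instead of one row at a time.                                  */
data pgsgf13.fact_sales;
   set work.big_fact;
run;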
Using a SAS Data Set Option
You can also explicitly define these buffer options on an individual table reference in your code if you want. This may come in handy depending on the type, size, and width of the data you plan on inserting.
LIBNAME pgsgf13 ODBC DATASRC=sasgf13 SCHEMA=public ;

data pgsgf13.test(dbcommit=10000 insertbuff=10000 readbuff=10000);
   *** DATA STEP STUFF ****;
run;
These options require careful consideration. The optimal setting depends on your SAS compute server resources and network capacity. The number of rows to buffer should be much smaller for very wide tables with lots of character data, because of the physical byte sizes of character columns and the overall width of the table. In my project I am using very skinny fact tables of numeric data, which requires only 8 bytes per numeric column. Assuming I have 10 numeric columns, that's only about 80 bytes of data per row. For my data step, which creates a huge volume of data, I could theoretically set INSERTBUFF much larger, but SAS does have a hard limit of approximately 32,000 rows it can buffer in memory.
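To put that arithmetic into a quick sanity check, a DATA _NULL_ step can estimate how much memory a buffer will consume (the column count and buffer size below are the figures from this example, not universal values):

data _null_;
   bytes_per_row = 10 * 8;    /* 10 numeric columns, 8 bytes each */
   buffer_rows   = 30000;     /* the INSERTBUFF= setting          */
   buffer_bytes  = bytes_per_row * buffer_rows;
   put 'Approximate buffer memory: ' buffer_bytes comma12. ' bytes';
run;

At roughly 2.4 MB per buffer, even a generous setting is cheap for a skinny numeric table; a wide character table could easily need an order of magnitude more memory per row, which is why its buffer should be set much lower.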