A macro calls R in SAS for paneled 3D plotting
SAS and R can complement each other. SAS is a versatile ETL (extraction, transformation and loading) machine, and its statistical procedures based on the generalized linear model are impeccable. R brings cutting-edge data-mining and data-visualization technologies at low (or no) cost. Although the two packages live in distinct ecosystems (for example, different OS/ETL/database/reporting layers) [Ref. 1], mixing them in one program would make an analytics shop invincible. Some SAS programmers like to use SAS/IML to call R’s functions [Ref. 2]. However, SAS/IML does not seem to work with versions of R from 2.12 onward ...
SAS makes spreadsheet for reporting
Excel is the last stop in the pipeline of my daily work. Most of my clients prefer colorful multi-sheet spreadsheets to plain CSV files (I guess they all use Windows). I am not an Excel power user, and honestly I am a little afraid of it: sometimes things went wrong when I accidentally dragged cells and created duplicates. As a result, I tend to have everything ready in SAS and treat Excel only as the final container. At first I used the high-tech ODBC engine to exchange data, but I eventually gave up when I found ...
Macros communicate SQLite and SAS without ODBC
SQLite is an open-source relational database management system with full functionality [Ref. 1]. Its light weight (300K+ in size) and zero-configuration design distinguish it from 800-pound counterparts such as Oracle or MySQL. Thanks to the rise of mobile devices (plus Firefox, which embeds SQLite), SQLite will probably be everywhere pretty soon. I just love SQLite, since it helped me learn not only to write SQL code on Windows and Linux but also to manage complicated databases. Both Python and R have nice support for SQLite, and I have long wanted to use SQLite as a frontend or backup for SAS. The shortcut is to apply some ...
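The post's actual macros are truncated above, so the following is only a hypothetical illustration of one ODBC-free route: exchanging data through flat files. It assumes SAS has written a CSV (for example with PROC EXPORT) and that the query result will be read back into SAS as a CSV (for example with PROC IMPORT). The Python side uses only the standard-library `csv` and `sqlite3` modules; the helper names `csv_to_sqlite` and `sqlite_to_csv`, the file names and the four-row table are made up for the sketch.

```python
import csv
import os
import sqlite3
import tempfile


def csv_to_sqlite(csv_path, db_path, table):
    """Load a headed CSV (e.g. one exported from SAS) into a fresh SQLite table.

    Every column is declared TEXT to keep the sketch simple; cast in SQL as needed.
    """
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    marks = ", ".join("?" for _ in header)
    con = sqlite3.connect(db_path)
    try:
        with con:  # commits on success, rolls back on error
            con.execute(f'DROP TABLE IF EXISTS "{table}"')
            con.execute(f'CREATE TABLE "{table}" ({columns})')
            con.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)
    finally:
        con.close()


def sqlite_to_csv(db_path, query, csv_path):
    """Run a query and write the result with a header row, ready for SAS to import."""
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(query)
        header = [d[0] for d in cur.description]
        with open(csv_path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(cur)
    finally:
        con.close()


# Demo: a tiny made-up file stands in for a SAS export.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "cars.csv")
db = os.path.join(workdir, "demo.db")
out = os.path.join(workdir, "origin_counts.csv")

with open(src, "w", newline="") as f:
    csv.writer(f).writerows([["Make", "Origin"], ["Acura", "Asia"],
                             ["Audi", "Europe"], ["BMW", "Europe"], ["Buick", "USA"]])

csv_to_sqlite(src, db, "cars")
sqlite_to_csv(db, "SELECT Origin, COUNT(*) AS n FROM cars GROUP BY Origin ORDER BY Origin", out)

with open(out, newline="") as f:
    result_rows = list(csv.reader(f))
print(result_rows)  # prints [['Origin', 'n'], ['Asia', '1'], ['Europe', '2'], ['USA', '1']]
```

The flat-file design trades speed for portability: no driver has to be installed or configured on either side, which matches SQLite's zero-configuration spirit.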
A macro calls random forest in SAS
SASHELP.CARS, with 428 observations and 15 variables, is a free dataset shipped with SAS that I use to practice classification methods. I have always fantasized about predicting which region a random car comes from, such as the US, Japan or Europe. After trying many methods in SAS, including decision trees, logistic regression, k-NN and SVM, I eventually found that random forest, an ensemble classifier of many decision trees [Ref. 1], can cut the overall misclassification rate to around 25%. The SAS code is powered by the R package ‘randomForest’. In my tiny experiment, it seems that the ensemble of 100 trees ...
Using Proc IML for credit risk validation
Validation is a crucial step for a scorecard in the credit risk industry. Gunter and Peter mention in their fantastic book [Ref. 1] that the cumulative accuracy profile (CAP) and the receiver operating characteristic (ROC) are two popular methods. Thus the accuracy ratio (AR) from the CAP (which I also refer to as the Gini coefficient) and the area under the curve (AUC) from the ROC are important metrics for evaluating the discriminatory power of a scorecard. In fact, each can be derived from the other through the linear relationship AR = 2 × AUC − 1. In his latest blog post, Rick Wicklin showed how to implement the trapezoidal rule, that is, how to calculate trapezoid areas ...
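The linear relationship between AR and AUC can be checked numerically with the trapezoidal rule. The post itself works in PROC IML; the Python sketch below is not that code, only an illustration under stated assumptions: label 1 marks a defaulter, a higher score means higher risk, tied scores move together along the curve, and the six scores are made-up toy data. The function names are hypothetical. It builds both the ROC and the CAP curve, integrates each with trapezoids, and compares AR from the CAP with 2 × AUC − 1.

```python
def cumulative_counts(scores, labels):
    """Yield cumulative (defaults, non-defaults) after each distinct score.

    Sweeps from the riskiest (highest) score downward; tied scores are
    consumed together so the curve moves diagonally across a tie.
    Assumption: label 1 = default (bad), 0 = non-default (good).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    bad = good = 0
    i = 0
    while i < len(pairs):
        current = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == current:
            if pairs[i][1] == 1:
                bad += 1
            else:
                good += 1
            i += 1
        yield bad, good


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as x-sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_from_roc(scores, labels):
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    # ROC: x = share of goods flagged (false positive rate),
    #      y = share of bads caught (true positive rate)
    points = [(0.0, 0.0)] + [(g / n_good, b / n_bad)
                             for b, g in cumulative_counts(scores, labels)]
    return trapezoid_area(points)


def ar_from_cap(scores, labels):
    n = len(labels)
    n_bad = sum(labels)
    # CAP: x = share of all obligors, y = share of defaulters captured
    points = [(0.0, 0.0)] + [((b + g) / n, b / n_bad)
                             for b, g in cumulative_counts(scores, labels)]
    default_rate = n_bad / n
    # AR = (model CAP area - 0.5) / (perfect CAP area - 0.5),
    # where the perfect CAP has area 1 - default_rate / 2.
    return (trapezoid_area(points) - 0.5) / (0.5 - default_rate / 2)


# Toy data (made up for illustration): higher score = riskier.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = [1, 1, 0, 1, 0, 0]

auc = auc_from_roc(scores, labels)
ar = ar_from_cap(scores, labels)
print(f"AUC = {auc:.4f}, AR from CAP = {ar:.4f}, 2*AUC - 1 = {2 * auc - 1:.4f}")
# prints: AUC = 0.8889, AR from CAP = 0.7778, 2*AUC - 1 = 0.7778
```

Here 8 of the 9 defaulter/non-defaulter pairs are ranked correctly, so the AUC is 8/9 and the AR is 7/9. Normalizing the CAP area by the perfect model's area is what makes the two metrics line up exactly.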