The efficiency of five SAS methods in multi-dataset merging Introduction: Merging two or more datasets is essential work for many ‘data people’. Yes, it is a dirty and routine job, and everyone wants to get it done quickly and accurately. SAS offers many ways to tackle this job[3]. In two competing papers from SAS Global Conference 2009, Qinfeng Liang[1] described five ways to merge a base table and a lookup table in the healthcare industry, while David Franklin[2] presented eight methods to combine patient and effect datasets in a typical pharmaceutical scenario. Here I would like to extend the discussion further: one base table and two lookup tables. I would ...
Proc Arboretum: a secret weapon in decision tree Introduction: Decision trees, such as CHAID and CART, are powerful predictive tools in statistical learning and business intelligence. Starting from SAS® 9.1, the ARBORETUM procedure has provided facilities to interactively build and deploy decision trees. Even though it is still an experimental procedure, the ARBORETUM procedure has comprehensive features for classification and prediction. The ARBORETUM procedure is also the foundation of the decision tree node in SAS Enterprise Miner. Method: A common SAS dataset ‘sashelp.cars’ was divided into three parts of equal size: training, validation and scoring. Two methods were applied: the target variable ‘origin’ as nominal level and the target variable ...
Macro embedded function finds AUC As a routine practice to reuse code, SAS programmers tend to wrap procedures in a SAS macro and pass arguments to them as macro variables. The result could be anything produced by data steps and SAS procedures: a figure, a dataset, a SAS listing, etc. Thus, a macro in SAS is like a module or class in other languages. However, if repeated calls of a macro need to accumulate some key values based on different input variables, the design of such a macro can be tricky. My first thought is to use a nested macro (a child macro within a parent macro) to capture the invisible macro ...
Visualize decision tree by coding Proc Arboretum Decision trees (tree-based partitioning or recursive partitioning) dominate the top positions of recent data mining competitions. They are as easy to implement and explain as logistic regression, but usually bring more power (AUC). Unlike SVM, neural networks or random forests, a decision tree is quick and resource-efficient. It is really a blessing for big data. No wonder regression trees and classification trees are widely used in industry: thanks to Google’s application of them in Gmail, I am seldom harassed by spam. Documentation about Proc Arboretum is still scarce. From my experience, Proc Arboretum is pretty robust and powerful. It divides input ...
Multi-study research on Bovine respiratory disease Situation: The purpose of this research was (1) to explore a recent multi-study approach (Arends, et al. 2008) for combining observational survival data instead of traditional meta-analysis, and (2) to develop multivariate random-effects models with or without covariates to aggregate three studies on Bovine Respiratory Disease (BRD). Models were constructed, assessed and presented by programming in SAS®. Task: The multivariate random-effects models built in this report demonstrated improved efficiency, generalizability and precision. Action: First, the modeling is simple and easy to explain. Second, the aggregation of the three studies was accomplished and survival proportions with CIs were updated. Third, the estimated survival proportions ...