A SAS data miner without Enterprise Miner SAS Enterprise Miner (EM) is indeed a fancy tool for a SAS programmer who wants to switch to the field of data mining. It is like the point-and-click camera: you drag several nodes onto the diagram, run it and everything is settled. And I was quite impressed by the resourceful reports and figures automatically generated by SAS EM. Then I spent several months to learn it, passed a SAS certified exam, and finished some pet projects on it. You don’t even need to know anything about statistics to be a data miner. Does that sound too cool to be true?Some ...
Labeling variables by a macro in SAS To rename the variables of a dataset in SAS is a daily routine. SAS or the programmer s would give an arbitrary name for any variable at the initial stage of data integration. Those names have to be modified afterward. Wensui [Ref.1] developed a macro to add prefixes to the variables . Vincent et al. [Ref. 2] extended his idea and added some parameters into the macros. However, giving a name to a variable in SAS has many restrictions regarding the length and the format. For better understanding and recognition, labeling variables instead of renaming them would be useful. In ...
Self-matching and its applications Programming is all about data structure and algorithm. For example, value comparison needs to find right data structure and iteration method. To fulfill this purpose, the first thing is to load the variable with a key-value like data structure, followed by key-driven iterations. In SAS, the typical key-value data types are array and hash table. Proc Format powered format can be another option. For data merging or equivalent values searching, those data types are pretty adequate. If exact numerical difference is desired, those SAS data structures may have some obstacles. First of all, array and hash table have to be ...
How to generate HCPCS 2009 long description data two;infile 'https://www.cms.gov/HCPCSReleaseCodeSets/Downloads/INDEX2009.pdf' truncover;input @1 code [email protected] description $200.;if code='Page ' then delete;if code=' ' then delete;run;proc sort data=two; by code;run;proc transpose data=two out=four;by code;var description;run;data five (keep=description code);set four;sp=' ';description=trim(left(col1)) || sp || trim(left(col2)) || sp || trim(left(col3))|| sp || trim(left(col4))|| sp || trim(left(col5)) ;run;
Proc Fcmp(3): brute force for a distribution's pdf Proc Fcmp can produce arbitrary distribution formulas. In this example, suppose that I don’t know too much statistics, what if I want to evaluate the pdf of the absolute value of the subtraction between two independent random variable from the uniform distributions? In this example, probably we can follow such procedures: (1) use Proc Kde to find the kernel distribution and make a guess about what this distribution is; (2) use Proc Fcmp to make the pdf of the distribution; (3) use Proc Model to fit the test dataset and find the parameters; (4) use the simulation to validate the ...