Labeling variables by a macro in SAS
Renaming the variables of a dataset is a daily routine in SAS. At the initial stage of data integration, SAS or the programmers assign arbitrary names to the variables, and those names have to be modified afterward. Wensui [Ref. 1] developed a macro to add prefixes to the variables. Vincent et al. [Ref. 2] extended his idea and added some parameters to the macro. However, variable names in SAS are subject to many restrictions on length and format. For better understanding and recognition, labeling variables instead of renaming them would be useful. In ...
Self-matching and its applications
Programming is all about data structures and algorithms. For example, value comparison requires finding the right data structure and iteration method. To this end, the first step is to load the variable into a key-value-like data structure, followed by key-driven iterations. In SAS, the typical key-value data types are the array and the hash table. A format built with Proc Format can be another option. For data merging or searching for equal values, those data types are quite adequate. If the exact numerical difference is desired, those SAS data structures may run into some obstacles. First of all, array and hash table have to be ...
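How to generate HCPCS 2009 long description
/* Read code and description from the HCPCS 2009 index file */
data two;
   infile 'https://www.cms.gov/HCPCSReleaseCodeSets/Downloads/INDEX2009.pdf' truncover;
   input @1 code [email protected] description $200.;
   /* Drop page-header lines and lines without a code */
   if code='Page ' then delete;
   if code=' ' then delete;
run;
proc sort data=two;
   by code;
run;
/* One row per code, with its description lines spread across col1, col2, ... */
proc transpose data=two out=four;
   by code;
   var description;
run;
/* Join the description pieces into a single long description */
data five (keep=description code);
   set four;
   sp=' ';
   description=trim(left(col1)) || sp || trim(left(col2)) || sp || trim(left(col3)) || sp || trim(left(col4)) || sp || trim(left(col5));
run;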
Proc Fcmp(3): brute force for a distribution's pdf
Proc Fcmp can produce arbitrary distribution formulas. In this example, suppose that I don’t know much statistics: what if I want to evaluate the pdf of the absolute value of the difference between two independent random variables from uniform distributions? We can probably follow this procedure: (1) use Proc Kde to find the kernel density estimate and make a guess about what the distribution is; (2) use Proc Fcmp to build the pdf of the distribution; (3) use Proc Model to fit the test dataset and find the parameters; (4) use the simulation to validate the ...
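For reference, the closed form that such a brute-force procedure would be expected to recover can be worked out by hand. This is only a sketch under an assumption the excerpt does not state: both variables are taken to be uniform on (0, 1), and the symbols X, Y, Z and theta are introduced here purely for illustration.

```latex
% Assumption (not from the post): X, Y independent, each Uniform(0,1); Z = |X - Y|.
\begin{align*}
P(Z > z) &= P(X - Y > z) + P(Y - X > z)
          = 2 \cdot \tfrac{1}{2}(1 - z)^2 = (1 - z)^2, \qquad 0 \le z \le 1, \\
F_Z(z)   &= 1 - (1 - z)^2, \\
f_Z(z)   &= \frac{d}{dz} F_Z(z) = 2(1 - z), \qquad 0 \le z \le 1.
\end{align*}
% If both variables are instead Uniform(0, \theta), rescaling gives
% f_Z(z) = 2(\theta - z) / \theta^2 for 0 \le z \le \theta.
```

Under that assumption the density is a straight line falling from 2 at z = 0 to 0 at z = 1, which is the shape a kernel density plot of simulated differences should suggest. In the rescaled form, theta is the single parameter left for a fitting step to estimate.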
Music social network on DNA microarray
The upcoming 2011 KDD Cup data mining competition [1] by Yahoo! Labs poses an interesting challenge: predict users' ratings of individual songs from the company’s huge music database. Unlike previous KDD Cup projects, which were filled with so many variables that dimension reduction was a serious concern, Yahoo! Labs provides few variables: artist/genre/album. No demographic or geographic information is disclosed. It is interesting to forecast the behavior of a web user from limited web records. Digging out valuable clues for potential follow-up direct marketing is also rewarding. Especially since the competition datasets contain up to 1 million users, 600 ...