SAS vs. R in data mining
The past three years have witnessed the rise of R, an open-source statistical software package. Search Amazon for R-related books, and many recent titles show up, ranging from graphics to scientific computing. Thanks to graduates who received R training as statistics majors, R is starting to appear in serious business settings. The basic difference is that the SAS license is sold by SAS Institute, a company with 20k employees, while R is free. In their book ‘SAS and R’, Ken Kleinman and Nicholas Horton systematically compared the two packages. Even though they carefully avoided the sensitive ...
Remove tabs from SAS code files
By default, SAS stores each press of the Tab key as a tab character, which causes many problems when the code files are used in a different environment. There are two ways to eliminate the tab characters in SAS and replace them with spaces.
Regular expression: Press Ctrl + H → the Replace window pops up → choose Regular expression search → in the Find text box, enter \t → in the Replace box, enter several \s, say four.
Editor option: Click Tools → Options → Enhanced Editor… → choose Insert spaces for tabs → choose Replace tabs with spaces on file open.
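For files that have already left the SAS editor, the same replacement can be scripted. The sketch below is not part of SAS; it is a minimal Python illustration that assumes each tab should become four spaces, as in the regular expression approach above. The function names `replace_tabs` and `clean_sas_file` are hypothetical.

```python
from pathlib import Path


def replace_tabs(text: str, spaces: int = 4) -> str:
    """Replace every tab character with a fixed number of spaces."""
    return text.replace("\t", " " * spaces)


def clean_sas_file(path: str, spaces: int = 4) -> None:
    """Rewrite a code file in place with its tabs replaced by spaces.

    Assumes the file fits in memory and uses the platform's default encoding.
    """
    source = Path(path)
    source.write_text(replace_tabs(source.read_text(), spaces))


if __name__ == "__main__":
    sample = "data a;\n\tx = 1;\nrun;\n"
    print(replace_tabs(sample))
```

Note that this replaces each tab with the same number of spaces rather than aligning to tab stops, so columns that relied on tab stops may shift.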
Count large chunks of data in Python
Because Python can read a file line by line, it can count data that sits on the hard disk and is too large to load at once. The most frequently used data structures in Python are the list and the dictionary. In many cases the dictionary has the advantage, since it is basically a hash table in which many operations run in O(1) time. However, for the task of counting values, the two options make little difference, and we can choose either of them for convenience. I list two examples below.
Use a dictionary as a counter
There is a question about counting strings in Excel.
Count the unique values in one column in Excel 2010. The worksheet has 1 million rows and 10 ...
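As a rough illustration of the dictionary approach, the sketch below counts values one line at a time, so only the counts (not the whole file) stay in memory. It is not the post's original example: the function name `count_values` and the sample data are hypothetical, and an in-memory `io.StringIO` stands in for a large file on disk.

```python
import io


def count_values(lines):
    """Count how often each non-empty value appears, reading one line at a time."""
    counts = {}
    for line in lines:
        key = line.strip()
        if key:  # skip blank lines
            counts[key] = counts.get(key, 0) + 1
    return counts


# Stand-in for a large file; with real data, pass an open file object instead,
# e.g. `with open(path) as f: counts = count_values(f)`.
sample = io.StringIO("apple\nbanana\napple\ncherry\napple\n")
counts = count_values(sample)
print(counts)
```

Each update is a hash lookup and an assignment, so the whole pass is linear in the number of lines.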
Use recursion and gradient ascent to solve logistic regression in Python
In his book Machine Learning in Action, Peter Harrington provides a solution for estimating the parameters of logistic regression. I use pandas and ggplot to implement a recursive alternative. Compared with the iterative method, recursion uses more space but may improve performance.
# -*- coding: utf-8 -*-
"""Use recursion and gradient ascent to solve logistic regression in Python"""
import pandas as pd
from ggplot import *

def sigmoid(inX):
    return 1.0/(1+exp(-inX))

def grad_ascent(dataMatrix, labelMat, cycle):
    """ A function to use gradient ascent to calculate the coefficients """
    if not isinstance(cycle, int) or cycle < 0:
        raise ValueError("Must be a valid value for the number of iterations")
    m, n = ...
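Since the listing above is cut off, here is a separate, self-contained sketch of what a recursive gradient ascent might look like. It is not the post's code: it uses plain Python lists instead of pandas, and the step size `alpha`, the initial weights of 1.0, the extra `weights` parameter, and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def grad_ascent(rows, labels, cycle, weights=None, alpha=0.1):
    """Run `cycle` gradient-ascent steps recursively and return the weights."""
    if not isinstance(cycle, int) or cycle < 0:
        raise ValueError("Must be a valid value for the number of iterations")
    if weights is None:
        weights = [1.0] * len(rows[0])  # assumed starting point
    if cycle == 0:  # base case: no steps left
        return weights
    # Error of each observation: label minus predicted probability.
    errors = [
        y - sigmoid(sum(w * x for w, x in zip(weights, row)))
        for row, y in zip(rows, labels)
    ]
    # Gradient of the log-likelihood for weight j is sum_i error_i * x_ij.
    updated = [
        w + alpha * sum(e * row[j] for e, row in zip(errors, rows))
        for j, w in enumerate(weights)
    ]
    return grad_ascent(rows, labels, cycle - 1, updated, alpha)


# Toy data: first column is the intercept term, second is the feature.
rows = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]
weights = grad_ascent(rows, labels, 200)
print(weights)
```

Each recursive call adds a stack frame, which is where the extra space goes; with Python's default recursion limit (about 1000), `cycle` has to stay below that unless the limit is raised.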
SAS vs. Python for data analysis
To perform data analysis efficiently, I need a full-stack programming language rather than switching frequently from one language to another. That means the language can hold large quantities of data, manipulate data promptly and easily (e.g. if-then-else; iteration), connect to various data sources such as relational databases and Hadoop, apply statistical models, and report results as graphs, tables or web pages. SAS is famous for its capacity to cover such a data cycle, as long as you are willing to pay the annual license fee. SAS’s long-standing competitor, R, still keeps growing. However, in the past years, the Python ...