When Google Analytics meets SAS
Thanks to Tricia’s introduction, I recently realized that Google Analytics is a powerful tool for web analytics and business intelligence. Using SAS to analyze the well-structured user data accumulated in Google Analytics would meet some specialized needs. The challenge is that the Google Analytics API and SAS hardly meet each other: Google Analytics often serves web/Linux, while SAS dwells in the ecosystems of Windows/UNIX/Mainframe. On a Windows computer, I tried three methods to pull this blog’s data from Google Analytics into SAS; each has its own pros and cons. Method 1: ClientLogin + HTTP protocol The ...
SAS vs. R in data mining
The past three years have witnessed the rise of R, an open source statistical software package. Search for R-related books on Amazon, and many recent titles show up, ranging from graphics to scientific computation. Thanks to graduates who received R training in their statistics majors, R is starting to appear in some serious business settings. The basic difference is that SAS is licensed and sold by SAS Institute, a company with 20k employees, while R is free. In their book ‘SAS and R’, Ken and Nicholas systematically compared the two packages. Even though they carefully avoided the sensitive ...
Remove tabs from SAS code files
By default, the SAS editor records an indent made with the Tab key as a tab character, which causes many problems when the code files are used in a different environment. There are two ways to eliminate the tab characters in SAS and replace them with spaces.
Regular expression: Press Ctrl + H → the Replace window pops up → choose Regular expression search → in the Find text box, enter \t → in the Replace box, enter multiple \s, say four.
Editor option: Click Tools → Options → Enhanced Editor… → choose Insert spaces for tabs → choose Replace tabs with spaces on file open.
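For code files that have already been saved with tabs, the same substitution can also be done outside the SAS editor. The following is a minimal Python sketch, not part of the SAS workflow above; the function name `replace_tabs` and the default of four spaces are assumptions chosen to mirror the regular expression method.

```python
def replace_tabs(text, spaces=4):
    """Replace every tab character with a fixed number of spaces.

    Hypothetical helper: mirrors the find-and-replace method,
    where each \t becomes `spaces` space characters.
    """
    return text.replace("\t", " " * spaces)


# A small SAS snippet indented with a tab
sas_code = "data a;\n\tset b;\nrun;"
cleaned = replace_tabs(sas_code)
```

To process a file, read its contents into a string, pass it through `replace_tabs`, and write the result back.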
Count a large chunk of data in Python
Python’s ability to read a file line by line lets it count data that resides on the hard disk. The most frequently used data structures in Python are the list and the dictionary. In many cases the dictionary has the advantage, since it is basically a hash table that supports many O(1) operations. However, for the task of counting values, the two options make little difference, and we can choose either for convenience. I list two examples below.
Use a dictionary as a counter
There is a question about counting the strings in Excel: Count the unique values in one column in EXCEL 2010. The worksheet has 1 million rows and 10 ...
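The dictionary-as-counter idea in this excerpt can be sketched as follows. This is a minimal illustration rather than the post's own code: it assumes the worksheet has been exported as comma-separated text, and the function name `count_column` and the sample data are hypothetical. Rows are consumed one at a time, so only the counts are held in memory.

```python
import csv
import io


def count_column(lines, column):
    """Count how often each value appears in one column.

    `lines` is any iterable of text lines (for example an open file),
    so the data is processed row by row instead of loaded at once.
    """
    counts = {}
    for row in csv.reader(lines):
        if len(row) <= column:
            continue  # skip rows that are too short to have this column
        value = row[column]
        counts[value] = counts.get(value, 0) + 1
    return counts


# A tiny in-memory stand-in for a large exported worksheet
sample = io.StringIO("a,1\nb,2\na,3\n")
counts = count_column(sample, 0)
```

For a real file, pass the file object from `open(path, newline="")` in place of `sample`.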
Use recursion and gradient ascent to solve logistic regression in Python
In his book Machine Learning in Action, Peter Harrington provides a solution for parameter estimation in logistic regression. I use pandas and ggplot to implement a recursive alternative. Compared with the iterative method, recursion costs more space but may improve performance.

# -*- coding: utf-8 -*-
"""
Use recursion and gradient ascent to solve logistic regression in Python
"""
import pandas as pd
from ggplot import *
from numpy import exp  # assumption: the excerpt does not show where exp comes from

def sigmoid(inX):
    # Logistic function that maps any real input to (0, 1)
    return 1.0/(1+exp(-inX))

def grad_ascent(dataMatrix, labelMat, cycle):
    """
    A function to use gradient ascent to calculate the coefficients
    """
    # Reject a non-integer or negative number of iterations
    if not isinstance(cycle, int) or cycle < 0:
        raise ValueError("Must be a valid value for the number of iterations")
    m, n = ...
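Since the excerpt is cut off before the recursive step, here is a minimal, self-contained sketch of what recursive gradient ascent for logistic regression could look like. It is not the post's implementation: it uses plain Python lists instead of pandas, and the function name `grad_ascent_recursive`, the step size `alpha`, the starting weights of 1.0, and the toy data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Logistic function for a single real value."""
    return 1.0 / (1.0 + math.exp(-z))


def grad_ascent_recursive(rows, labels, cycle, alpha=0.1, weights=None):
    """Apply `cycle` gradient ascent steps, one per recursive call."""
    if not isinstance(cycle, int) or cycle < 0:
        raise ValueError("Must be a valid value for the number of iterations")
    if weights is None:
        weights = [1.0] * len(rows[0])  # assumed starting point
    if cycle == 0:
        return weights  # base case: no steps left
    # Error of the current predictions: label minus predicted probability
    errors = [
        y - sigmoid(sum(w * x for w, x in zip(weights, row)))
        for row, y in zip(rows, labels)
    ]
    # Move each weight along the gradient of the log-likelihood
    updated = [
        w + alpha * sum(e * row[j] for e, row in zip(errors, rows))
        for j, w in enumerate(weights)
    ]
    return grad_ascent_recursive(rows, labels, cycle - 1, alpha, updated)


# Toy data: an intercept column of ones plus a single feature
rows = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
labels = [0, 0, 1, 1]
weights = grad_ascent_recursive(rows, labels, 200)
predicted = [
    int(sigmoid(sum(w * x for w, x in zip(weights, row))) > 0.5)
    for row in rows
]
```

Each call adds a stack frame, which is the extra space cost mentioned above. CPython's default recursion limit is 1000, so `cycle` has to stay below that unless the limit is raised with `sys.setrecursionlimit`.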