Cumulative sums can be obtained in SAS’s DATA step by the RETAIN statement. As the codes below, a new variable of the cumulative values will be returned by DATA step’s implicit DO loop.

data one;

do i = 1 to 1e6;

z = ranuni(0);

output;

end;

drop i;

run;

data two;

set one;

retain y 0;

y + z;

run;

The same logic can be realized by an explicit DO loop in SAS/IML. However, SAS/IML has a function CUSUM which is specially made for cumulative sums. To compare the efficiency of the two methods, I tried an experiment: calculating the cumulative sums for a random vector with incremental sizes from 1 million to 20 million by 1 million. The result shows that the CUSUM function is always 100 times faster than a raw DO loop. With the increase of the vector size, the gap gets wider. When the number of elements in the vector reaches 20 million, a Do loop would ask almost 40 seconds, while the CUSUM function requires only 0.35 seconds. Therefore, the vectorized CUSUM function is going to be a good candidate to do such jobs in SAS/IML.

proc iml;

a = t(do(1e6, 2e7, 1e6));

timer = j(nrow(a), 2);

do p = 1 to nrow(a);

n = a[p];

z = t(ranuni(1:n));

t0 = time();

x = cusum(z);

timer[p, 1] = time() - t0;

y = j(n, 1, .);

t0 = time();

do i = 1 to n;

if i = 1 then y[i] = z[i];

else y[i] = y[i-1] + z[i];

end;

timer[p, 2] = time() - t0;

end;

t = a||timer;

create _1 from t;

append from t;

close _1;

quit;

data _2;

set _1;

length test $100.;

label col1 = "Number of observations"

time = "Time by seconds for cumulative summation";

test = "The CUSUM function"; time = col2; output;

test = "DO loop"; time = col3; output;

keep test time col1;

run;

proc sgplot data = _2;

series x = col1 y = time / curvelabel group = test;

yaxis grid;

run;

P.S. Thanks to Rick’s correction, I included his improved DO loop. Comparing with my previous DO loop with a conditional statement, this new DO loop is obviously more efficient. The conclusion is still the same: the CUSUM function outperforms the DO loops in this competition. And with more data involved, the contrast gets even more striking.

proc iml;

a = t(do(1e6, 2e7, 1e6));

timer = j(nrow(a), 3);

do p = 1 to nrow(a);

n = a[p];

z = t(ranuni(1:n));

/* 1 -- The CUSUM function */

t0 = time();

x = cusum(z);

timer[p, 1] = time() - t0;

/* 2 -- Bad DO loop*/

y = j(n, 1, .);

t0 = time();

do i = 1 to n;

if i = 1 then y[i] = z[i];

else y[i] = y[i-1] + z[i];

end;

timer[p, 2] = time() - t0;

/* 3 -- Good DO loop */

w = z;

t0 = time();

do i = 2 to n;

w[i] = w[i-1] + z[i];

end;

timer[p, 3] = time() - t0;

end;

t = a||timer;

create _1 from t;

append from t;

close _1;

quit;

data _2;

set _1;

length test $100.;

label col1 = "Number of observations"

time = "Time by seconds for cumulative summation";

test = "The CUSUM function"; time = col2; output;

test = "bad DO loop"; time = col3; output;

test = "good DO loop"; time = col4; output;

keep test time col1;

run;

proc sgplot data = _2;

series x = col1 y = time / curvelabel group = test;

yaxis grid;

run;