Spark practice (2): query text using SQL

In a class of a few children, use SQL to find those who are male and weigh over 100.

class.txt (columns: Name Sex Age Height Weight)

Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83.0
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84.0
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90.0
Louise F 12 56.3 77.0
Mary F 15 66.5 112.0
Philip M 16 72.0 150.0
Robert ...
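The query itself can be sketched without a cluster. The block below is a minimal sketch that swaps in Python's built-in sqlite3 module as a stand-in for Spark SQL, so it is not the post's Spark code. The table name `class`, the column names, and the helper `heavy_males` are assumptions made for illustration, and only the rows that appear complete in the excerpt are loaded.

```python
import sqlite3

# Rows copied from the complete lines of class.txt in the excerpt
# (Name, Sex, Age, Height, Weight). Truncated rows are left out.
ROWS = [
    ("Alfred", "M", 14, 69.0, 112.5),
    ("Alice", "F", 13, 56.5, 84.0),
    ("Barbara", "F", 13, 65.3, 98.0),
    ("Carol", "F", 14, 62.8, 102.5),
    ("Henry", "M", 14, 63.5, 102.5),
    ("James", "M", 12, 57.3, 83.0),
    ("Jane", "F", 12, 59.8, 84.5),
    ("Janet", "F", 15, 62.5, 112.5),
    ("Jeffrey", "M", 13, 62.5, 84.0),
    ("John", "M", 12, 59.0, 99.5),
    ("Joyce", "F", 11, 51.3, 50.5),
    ("Judy", "F", 14, 64.3, 90.0),
    ("Louise", "F", 12, 56.3, 77.0),
    ("Mary", "F", 15, 66.5, 112.0),
    ("Philip", "M", 16, 72.0, 150.0),
]


def heavy_males(rows, min_weight=100):
    """Return the names of male students whose weight exceeds min_weight."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical schema: table and column names are illustrative.
        conn.execute(
            "CREATE TABLE class "
            "(name TEXT, sex TEXT, age INTEGER, height REAL, weight REAL)"
        )
        conn.executemany("INSERT INTO class VALUES (?, ?, ?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT name FROM class "
            "WHERE sex = 'M' AND weight > ? ORDER BY name",
            (min_weight,),
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()


if __name__ == "__main__":
    print(heavy_males(ROWS))  # ['Alfred', 'Henry', 'Philip']
```

The same WHERE clause should carry over to Spark SQL once the text file is loaded into a table with these columns.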
Spark practice (1): find the stranger that shares the most friends with me

Given the friend pairs in the sample text below (each line contains two people who are friends), find the stranger that shares the most friends with me.

sample.txt

me Alice
Henry me
Henry Alice
me Jane
Alice John
Jane John
Judy Alice
me Mary
Mary Joyce
Joyce Henry
Judy me
Judy Jane
John Carol
Carol me
Mary Henry
Louise Ronald
Ronald Thomas
William Thomas

Thoughts

The scenario is commonly seen for a social network user. Spark has three methods to query such data:

MapReduce
GraphX
Spark SQL

If I start with the simplest MapReduce approach, then I would like to use two hash tables in Python. First I scan all friend pairs and store the friends for each person in a hash table. Second ...
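The two-hash-table idea can be sketched in plain Python. The excerpt cuts off before describing the second step, so the counting pass below (tallying, for each stranger, how many of my friends they also know) is an assumption about how it might continue, not the author's code. The function name `most_shared_stranger` and the alphabetical tie-break are hypothetical.

```python
from collections import defaultdict

# Friend pairs from sample.txt, one pair per line.
SAMPLE = """me Alice
Henry me
Henry Alice
me Jane
Alice John
Jane John
Judy Alice
me Mary
Mary Joyce
Joyce Henry
Judy me
Judy Jane
John Carol
Carol me
Mary Henry
Louise Ronald
Ronald Thomas
William Thomas"""

PAIRS = [tuple(line.split()) for line in SAMPLE.splitlines()]


def most_shared_stranger(pairs, me="me"):
    """Return (stranger, shared_friend_count), or None if there is none."""
    # First hash table: person -> set of that person's friends.
    friends = defaultdict(set)
    for a, b in pairs:
        friends[a].add(b)
        friends[b].add(a)

    # Second hash table (assumed step): stranger -> number of shared friends.
    shared = defaultdict(int)
    my_friends = friends[me]
    for friend in my_friends:
        for candidate in friends[friend]:
            if candidate != me and candidate not in my_friends:
                shared[candidate] += 1

    if not shared:
        return None
    # Sorting first makes ties resolve alphabetically.
    best = max(sorted(shared), key=shared.get)
    return best, shared[best]


if __name__ == "__main__":
    print(most_shared_stranger(PAIRS))  # ('John', 3)
```

On the sample data, John shares Alice, Jane and Carol with me, while Joyce shares only Mary and Henry.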
Use a vector to print Pascal's triangle in SAS

Yesterday Rick Wicklin showed a cool SAS/IML function that uses a matrix to print Pascal’s triangle. I came up with an alternative solution that uses a vector in SAS/IML.

Method

Two functions are used: a main function PascalRule and a helper function _PascalRule. The helper function recycles the vector every time and fills in the updated values; the main function increases the length of the vector from 1 to n.

Pro

Get the nth row directly, for example, return the 10th row by PascalRule(10);
no need to use a matrix or matrix-related operators;
use less memory to fit a possibly bigger n.

Con

More ...
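The vector-based method can be illustrated outside SAS/IML. The block below is a Python analog and not the post's IML code: the names `pascal_rule` and `_pascal_rule` mirror PascalRule and _PascalRule, and it assumes rows are numbered from 1, so the nth row has n entries.

```python
def _pascal_rule(row):
    """Helper: turn one row of Pascal's triangle into the next, in place."""
    row.append(0)  # grow the vector by one slot
    # Walk right to left so each update still sees the old left neighbour.
    for i in range(len(row) - 1, 0, -1):
        row[i] += row[i - 1]
    return row


def pascal_rule(n):
    """Main: return the nth row (1-indexed) using a single reused vector."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    row = [1]
    for _ in range(n - 1):
        _pascal_rule(row)
    return row


if __name__ == "__main__":
    print(pascal_rule(10))  # [1, 9, 36, 84, 126, 126, 84, 36, 9, 1]
```

Only one list is ever held in memory, which is the memory advantage the post attributes to the vector approach.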
Support vector machine in SAS by R

I just recently discovered the endless fun of synchronizing SAS and R to do something meaningful. Yep, I am a SAS programmer: during the daytime, I use SAS for my work; in the evening, I use R for entertainment. It is always exciting to hook them up together. How about a SAS/R module, like SAS/STAT or SAS/BASE, in the future? Some SAS programmers or SAS ‘developers’ have already used code to make SAS and R communicate [Ref. 1 and 2] (thanks to Rick Wicklin for mentioning it). Since R can write a dataset in SAS code (the ‘foreign’ package) and SAS can use call R ...
Minimize complexity by Spark

There is always a trade-off between time complexity and space complexity for computer programs. Decreasing the time cost will increase the space cost, and vice versa. The ideal solution is to parallelize the program across multiple cores if a multi-core computer is available, or even to scale it out to multiple machines across a cluster, which would eventually reduce both time complexity and space complexity.

Spark is currently the hottest platform for cluster computing on top of Hadoop, and its Python interface provides map, reduce and many other methods, which allow a MapReduce job in a straightforward way, and therefore easily migrate an algorithm ...
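The map-then-reduce shape mentioned above can be shown with Python's own `map` and `functools.reduce`. This is a single-machine sketch, not Spark code, and the function `sum_of_squares` is a hypothetical example chosen for illustration.

```python
from functools import reduce
from operator import add


def sum_of_squares(numbers):
    """Map each number to its square, then reduce the squares to one sum."""
    squared = map(lambda x: x * x, numbers)  # map step: element-wise transform
    return reduce(add, squared, 0)           # reduce step: pairwise combine


if __name__ == "__main__":
    print(sum_of_squares(range(1, 5)))  # 30

# With an existing SparkContext `sc` (assumed, not created here), the same
# shape in PySpark would be:
#   sc.parallelize(numbers).map(lambda x: x * x).reduce(add)
```

Because the map step treats each element independently and the reduce step only needs an associative combine, the same logic can be split across cores or machines.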