I am trying to solve an issue where I have to loop through all the values in a pipe. To illustrate my problem, let me explain through a sample problem.
Input file:
number
1
2
3
4
Output should be
number sumOfSmaller
1 0
2 1
3 3
4 6
So for each value I have to read all of the records in the pipe and apply the function sumOfSmaller.
I have no idea how to loop through the values in a Scalding pipe.
Using map I can apply a function to each element of a list, but I want to avoid that approach.
You can get the contents of the whole pipe with
val wholePipe = pipe.groupAll.toList
and then join it with itself and apply your function:
pipe.groupAll.join(wholePipe).values.map { case (x, list) => sumOfSmaller(x, list) }
This is not a very good idea though, especially if your pipe is of any decent size. Knowing more details about what you are really trying to do should almost certainly allow for a better approach.
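For reference, here is a plain-Scala sketch of what the sumOfSmaller(x, list) call in the map above is assumed to compute, matching the sample input/output from the question (this is just the in-memory logic, not Scalding code):
// hypothetical in-memory version of sumOfSmaller: for each value,
// the sum of every value in the dataset that is strictly smaller
def sumOfSmaller(x: Int, all: List[Int]): Int =
  all.filter(_ < x).sum

val numbers = List(1, 2, 3, 4)
numbers.map(n => (n, sumOfSmaller(n, numbers)))
// List((1,0), (2,1), (3,3), (4,6)) -- matches the expected output table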
I would like to create a list in q/kdb of variable length x which contains the same element e repeated. For example:
x:4;
e:`this;
expected_result:`this`this`this`this
As mentioned by all, # (take) is the best solution in the singular case. If you want to duplicate multiple items into a larger single list, then where can achieve this nicely:
q)`this`that where 4 2
`this`this`this`this`that`that
Take is what you're looking for:
https://code.kx.com/v2/ref/take/
q)x:4
q)e:`this
q)x#e
`this`this`this`this
You can do this using # https://code.kx.com/v2/ref/take/
q)n:4
q)vals:`this
q)n#vals
`this`this`this`this
Use the '#' (take) function:
q) x:4
q) e:`this
q) x#e
I work on a project with MaxMSP where I have multiple colls. I want to combine all the lists in there into one single coll. Is there a way to do that directly, without unpacking and repacking everything?
In order to be more clear, let’s say I have two colls, with the first one being:
0, 2
1, 4
2, 4
….
99, 9
while the second one is:
100, 8
101, 4
…
199, 7
I would like the final coll to be one list from 0-199.
Please keep in mind I don't want to unpack everything (with uzi, for instance) because my lists are very long, and I find that it is problematic for the CPU to use colls with such long lists. That's why I broke my huge list into sublists/subcolls in the first place.
Hope that’s clear enough.
If the two colls do not have overlapping indices, then you can just dump one into the other, like this:
----------begin_max5_patcher----------
524.3ocyU0tSiCCD72IOEQV7ybnZmFJ28pfPUNI6AlKwIxeTZEh28ydsCDNB
hzdGbTolTOd20yXOd6CoIjp98flj8irqxRRdHMIAg7.IwwIjN995VtFCizAZ
M+FfjGly.6MHdisaXDTZ6DxVvfYvhfCbS8sB4MaUPsIrhWxNeUdFsf5esFex
bPYW+bc5slwBQinhFbA6qt6aaFWwPXlCCPnxDxSEQaNzhnDhG3wzT+i7+R4p
AS1YziUvTV44W3+r1ozxUnrKNdYW9gKaIbuagdkpGTv.HalU1z26bl8cTpkk
GufK9eI35911LMT2ephtnbs+0l2ybu90hl81hNex241.hHd1usga3QgGUteB
qDoYQdDYLpqv3dJR2L+BNLQodjc7VajJzrqivgs5YSkMaprkjZwroVLI03Oc
0HtKv2AMac6etChsbiQIprlPKto6.PWEfa0zX5+i8L+TnzlS7dBEaLPC8GNN
OC8qkm4MLMKx0Pm21PWjugNuwg9A6bv8URqP9m+mJdX6weocR2aU0imPwyO+
cpHiZ.sQH4FQubRLtt+YOaItUzz.3zqFyRn4UsANtZVa8RYyKWo4YSwmFane
oXSwBXC6SiMaV.anmHaBlZ9vvNPoikDIhqa3c8J+vM43PgLLDqHQA6Diwisp
Hbkqimwc8xpBMc1e4EjPp8MfRZEw6UtU9wzeCz5RFED
-----------end_max5_patcher-----------
mzed's answer works, as stated, if the lists have no overlapping indices, which they shouldn't based on the design you specify.
If you are treating your 'huge list' as multiple lists, or vice versa, that might help come up with an answer. One question some may ask is "why are you merging it again?" In other words:
- you consider your program to have one large list;
- that large list is really an interface that handles how you interact with several sub-lists, for efficiency's sake;
- the interface to your data persistence (the lists) for storing and retrieval then acts like one large list but works with several under the hood;
- an insertion and retrieval mechanism for handling the multiple lists as one list should then exist for your interface;
- the sublists can still be saved and reloaded individually.
If you wrap this into a poly~, the voice acts as the sublist, so when I say voice I basically mean sublist:
You could use a universal send/receive in and out of a poly~ abstraction that contains your sublist's unique coll; the voice # from poly~ can be appended uniquely to the sublist filename that each voice's [coll] reads from and saves to.
With that set up, you could specify the number of sublists (voices) and master list length you want in the poly~ arguments like:
[poly~ sublist_manager.maxpat 10 1000] // 10 sublists emulating a 1000-length list
The math for index lookup is:
//main variables for master list creation/usage
master_list_length = 1000
sublist_count = 10
sublist_length = master_list_length/sublist_count;
//variables created when inserting/looking up an index
sublist_number = (desired_index/sublist_length); //integer divide to get the base sublist you'll be performing the lookup in
sublist_index = (desired_index%sublist_length); //actual index within your sublist to access
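To make that arithmetic concrete, here is the same lookup as a tiny worked example (written in Scala purely for illustration; the names mirror the pseudocode above):
// hypothetical worked example of the sublist lookup arithmetic above
val masterListLength = 1000
val sublistCount     = 10
val sublistLength    = masterListLength / sublistCount   // 100 entries per sublist

def locate(desiredIndex: Int): (Int, Int) = {
  val sublistNumber = desiredIndex / sublistLength  // which sublist (poly~ voice) holds the index
  val sublistIndex  = desiredIndex % sublistLength  // position inside that sublist
  (sublistNumber, sublistIndex)
}

locate(250)  // (2, 50): master index 250 is entry 50 of sublist 2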
If the above is closer to what you're looking for, I can work on a patch for that. Cheers.
I am using scopt to parse command-line arguments in Scala. I want it to be able to parse options with more than one value. For instance, the range option, if specified, should take exactly two values:
--range 25 45
Coming from a Python background, I am basically looking for a way to do the following with scopt instead of Python's argparse:
parser.add_argument("--range", default=None, nargs=2, type=float,
metavar=('start', 'end'),
help=(" Foo bar start and stop "))
I don't think minOccurs and maxOccurs solve my problem exactly, nor does the key:value example in its help.
Looking at the source code, this is not possible. The Read type class used has a member tuplesToRead, but it doesn't seem to work when you force it to 2 instead of 1. You will have to make a feature request, I guess, or work around this by using --min 25 --max 45, or --range '25 45' with a custom Read instance that splits this string into two parts. As @roterl noted, this is not a standard way of parsing.
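For illustration, here is a minimal sketch of that --range '25 45' workaround with a custom Read (assuming scopt 3.x; the Config case class and its range field are made up for the example):
// hedged sketch: give the tuple type a custom scopt.Read that splits one
// quoted argument like "25 45" on whitespace, instead of scopt's default
// key=value tuple parsing
case class Config(range: Option[(Double, Double)] = None)

implicit val rangeRead: scopt.Read[(Double, Double)] =
  scopt.Read.reads { s =>
    val Array(start, end) = s.trim.split("\\s+", 2)
    (start.toDouble, end.toDouble)
  }

val parser = new scopt.OptionParser[Config]("myapp") {
  opt[(Double, Double)]("range")
    .valueName("'<start> <end>'")
    .action((r, c) => c.copy(range = Some(r)))
    .text("start and end of the range, passed as one quoted value")
}

parser.parse(Seq("--range", "25 45"), Config()).foreach(c => println(c.range))
// Some((25.0,45.0))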
It should be OK as long as your values are delimited by something other than a space...
--range 25-45
... although you need to split them manually. Parse it with something like:
opt[String]('r', "range").action { (x, c) =>
  val rx = "([0-9]+)\\-([0-9]+)".r
  val rx(from, to) = x
  c.copy(from = from.toInt, to = to.toInt)
}
// ...
println(s" Got range ${parsedArgs.from}..${parsedArgs.to}")
I'm new to pandas and trying to learn how to work with it. I'm having a problem when trying to use an example I saw in one of Wes's videos and notebooks on my own data. I have a CSV file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I load it into a data frame and then group it by "filePath" and "vp"; the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now I'm trying to access the index like a dict, as I saw in the examples, but when I do
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed in getting a result when I put in the filePath, but as I understand it and saw in previous examples, I should be able to use the vp keys as well, shouldn't I?
Sorry if it's a trivial one, I just can't understand why it works in one example but not in the other.
Rutger, you are not correct. It is possible to "partially" index a MultiIndex series. I simply did it the wrong way.
The index's first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name I have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav']
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You groupby two columns and therefore get a MultiIndex in return. This means you also have to slice using those two columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it into a DataFrame, you can use the .xs method to slice on a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexible; you could test for multiple values at once, for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]
I'm fairly new to MATLAB and wrote the following code:
datadir=('/.../prod/balanceSheet/DB/');
seriesnames = {'a.m','b.m','c.m','d.m','f.m','g.m','h.m','i.m'};
for proj=1:5;
    database='';
    switch proj
        case 1
            database=strcat(datadir,'scenario1');
        case 2
            database=strcat(datadir,'scenario2');
        case 3
            database=strcat(datadir,'scenario3');
        case 4
            database=strcat(datadir,'scenario4');
        case 5
            database=strcat(datadir,'scenario5');
    end;
    database;
    gooddatanames={};
    a=length(seriesnames);
    for i=1:a
        gooddatanames={gooddatanames,database,seriesnames(i)};
    end
end
This is my first time using a switch. Basically, what I'm trying to do is take series from databases (1, 2, 3, ...) such that all series are subject to all scenarios. I'm still missing the function that would pull the data, but is the above code doing what I intend?
Change:
gooddatanames={gooddatanames,database,seriesnames(i)};
to
gooddatanames={gooddatanames{:},database,seriesnames{i}};
and move gooddatanames = {} outside of the loop, and then it does what I think you expect, which is to produce a 1x80 cell array with alternating folders and file names.
More likely, make a few more changes, like this:
datadir=('/.../prod/balanceSheet/DB/');
seriesnames = {'a.m','b.m','c.m','d.m','f.m','g.m','h.m','i.m'};
gooddatanames={};
for proj=1:5;
    database='';
    switch proj
        case 1
            database=fullfile(datadir,'scenario1');
        case 2
            database=fullfile(datadir,'scenario2');
        case 3
            database=fullfile(datadir,'scenario3');
        case 4
            database=fullfile(datadir,'scenario4');
        case 5
            database=fullfile(datadir,'scenario5');
    end;
    for i=1:length(seriesnames);
        gooddatanames{end+1} = fullfile(database,seriesnames{i});
    end
end
which results in a 1x40 array of full paths to the individual files.
I agree with what Pursuit has written, though I would like to add that your for/switch structure is a little silly. If you effectively have to enumerate all of them, as you do with the 'switch' as you've implemented it, there's no reason not to take the for/switch loops out entirely and just leave yourself with the commands. One possible alternative would be to replace the entire unnecessary "switch" with:
database = fullfile(datadir, ['scenario', num2str(proj)]);