How can I find a match in 2 separate python lists? [duplicate] - python-3.7

I tried using cmp(list1, list2) to learn it's no longer supported in Python 3.3. I've tried many other complex approaches, but none have worked.
I have two lists of which both contain just words and I want it to check to see how many words feature in both and return the number for how many.

You can find the length of the set intersection using & like this:
len(set(list1) & set(list2))
Example:
>>>len(set(['cat','dog','pup']) & set(['rat','cat','wolf']))
1
>>>set(['cat','dog','pup']) & set(['rat','cat','wolf'])
{'cat'}
Alternatively, if you don't want to use sets for some reason, you can always use collections.Counter, which supports most multiset operations:
>>> from collections import Counter
>>> print(list((Counter(['cat','dog','wolf']) & Counter(['pig','fish','cat'])).elements()))
['cat']

If you just want the count of how many words are common
common = sum(1 for i in list1 if i in list2)
If you actually want to get a list of the shared words
common_words = set(list1).intersection(list2)

Related

Eliminate numbers from a mix of numbers and letters from a list of strings

I'm pretty new to programming and I need help.
list = ["defew21", "dldapl43, "vdvc3232"]
I need to eliminate all the numbers from the above list to return a list like this
list1 = ["defew", "dldapl", "vdvc"]
I have tried several ideas but of no avail.

Check for list of substrings inside string column in PySpark

For checking if a single string is contained in rows of one column. (for example, "abc" is contained in "abcdef"), the following code is useful:
df_filtered = df.filter(df.columnName.contains('abc'))
The result would be for example "_wordabc","thisabce","2abc1".
How can I check for multiple strings (for example ['ab1','cd2','ef3']) at the same time?
I'm ideally searching for something like this:
df_filtered = df.filter(df.columnName.contains(['word1','word2','word3']))
The result would be for example "x_ab1","_cd2_","abef3".
Please, post scalable solutions (no for loops, for example) because the aim is to check a big list around 1000 elements.
All you need is isin
df_filtered = df.filter(df['columnName'].isin('word1','word2','word3')
Edit
You need rlike function to achieve your result
words="(aaa|bbb|ccc)"
df.filter(df['columnName'].rlike(words))

How should list be represented in ASP (Answer Set Programming)?

A processor 'a' takes care the header 'a' of a message 'a_b_c_d' and passes the payload 'b_c_d' to the another processor in the next level as following:
msg(a, b_c_d).
pro(a;b;c;d).
msg(b, c_d) :- pro(X), msg(X, b_c_d).
msg(c, d) :- pro(X), msg(X, c_d).
msg(d) :- pro(X), msg(X, d).
#hide. #show msg/2. #show msg/1.
How should I represent list 'a_b_c_d' in ASP, and change the above to general cases?
No, official way, but I think most people don't realize you can construct cons-cells in ASP.
For instance, here's how you can get items for all lists of length 5 from elements 1..6
element(1..6).
listLen(empty, 0).
listLen(cons(E, L), K + 1) :- element(E); listLen(L, K); K < 5.
is5List(L) :- listLen(L, 5).
#show is5List/1.
resulting in
is5List(cons(1,cons(1,cons(1,cons(1,cons(1,empty))))))
is5List(cons(1,cons(1,cons(1,cons(1,cons(2,empty))))))
is5List(cons(1,cons(1,cons(1,cons(1,cons(3,empty))))))
...
There is no 'official' way to handle lists in ASP as far as I know. But, DLV has built-in list handling similar to Prolog's.
The way you implement a list, the list itself cannot be used as a term and thus what if you want to bind between variables in the list and other elements of a rule? Perhaps you would like something such as p(t, [q(X), q(Y)]) :- X != Y.
You can try implementing a list as (a, b, c) and an append predicate but the problem is ASP requires grounding before computing answer-sets. Consequently a list defined in this way whilst more like lists in Prolog would mean the ground-program contains all ground-instances of all possible lists (explosion) regardless of whether they are used or not.
I therefore come back to my first point, try using DLV instead of Clingo if possible (for this task, at least).
By using index, I do have a way to walk a list, however, I do not know this is the official way to handle a list in ASP. Could someone has more experience in ASP give us a hand? Thanks.
index(3,a). index(2,b). index(1,c). index(0,d).
pro(a;b;c;d). msg(3,a).
msg(I-1,N) :- pro(P), msg(I,P), index(I,P), I>0, index(I-1,N).
#hide. #show msg/2.
You can use s(ASP) or s(CASP) ASP systems. Both of them support list operations like prolog. You might need to define the list built-in in ASP .

combing colls together - MaxMSP

I work on a project with MaxMSP where I have multiple colls. I want to combine all the lists in there in one single coll. Is there a way to do that directly without unpacking and repacking everything?
In order to be more clear, let’s say I have two colls, with the first one being:
0, 2
1, 4
2, 4
….
99, 9
while the second one is:
100, 8
101, 4
…
199, 7
I would like the final coll to be one list from 0-199.
Please keep in mind I don’t want to unpack everything ( with uzi for instance) cause my lists are very long and I find that it is problematic for the cpu to use colls with such long lists.That’s why I broke my huge list into sublists/subcolls in the first place
Hope that’s clear enough.
If the two colls do not have overlapping indices, then you can just dump one into the other, like this:
----------begin_max5_patcher----------
524.3ocyU0tSiCCD72IOEQV7ybnZmFJ28pfPUNI6AlKwIxeTZEh28ydsCDNB
hzdGbTolTOd20yXOd6CoIjp98flj8irqxRRdHMIAg7.IwwIjN995VtFCizAZ
M+FfjGly.6MHdisaXDTZ6DxVvfYvhfCbS8sB4MaUPsIrhWxNeUdFsf5esFex
bPYW+bc5slwBQinhFbA6qt6aaFWwPXlCCPnxDxSEQaNzhnDhG3wzT+i7+R4p
AS1YziUvTV44W3+r1ozxUnrKNdYW9gKaIbuagdkpGTv.HalU1z26bl8cTpkk
GufK9eI35911LMT2ephtnbs+0l2ybu90hl81hNex241.hHd1usga3QgGUteB
qDoYQdDYLpqv3dJR2L+BNLQodjc7VajJzrqivgs5YSkMaprkjZwroVLI03Oc
0HtKv2AMac6etChsbiQIprlPKto6.PWEfa0zX5+i8L+TnzlS7dBEaLPC8GNN
OC8qkm4MLMKx0Pm21PWjugNuwg9A6bv8URqP9m+mJdX6weocR2aU0imPwyO+
cpHiZ.sQH4FQubRLtt+YOaItUzz.3zqFyRn4UsANtZVa8RYyKWo4YSwmFane
oXSwBXC6SiMaV.anmHaBlZ9vvNPoikDIhqa3c8J+vM43PgLLDqHQA6Diwisp
Hbkqimwc8xpBMc1e4EjPp8MfRZEw6UtU9wzeCz5RFED
-----------end_max5_patcher-----------
mzed's answer works, as stated if the lists have no overlapping indices which they shouldn't based on the design you specify.
If you are treating your 'huge list' as multiple lists, or vice versa, that might help come up with an answer. One question some may ask is "why are you merging it again?"
you consider your program to have one large list
that large list is really an interface that handles how you interact with several sub-lists for efficiency sake
the interface to your data persistence (the lists) for storing and retrieval then acts like one large list but works with several under-the-hood
an insertion and retrieval mechanism for handling the multiple lists as one list should exist for your interface then
save and reload the sublists individually as well
If you wrap this into a poly~, the voice acts as the sublist, so when I say voice I basically mean sublist:
You could use a universal send/receive in and out of a poly~ abstraction that contains your sublist's unique coll, the voice# from poly~ can append uniquely to your sublist filename that is reading/saving to for that voice's [coll].
With that set up, you could specify the number of sublists (voices) and master list length you want in the poly~ arguments like:
[poly~ sublist_manager.maxpat 10 1000] // 10 sublists emulating a 1000-length list
The math for index lookup is:
//main variables for master list creation/usage
master_list_length = 1000
sublist_count = 10
sublist_length = master_list_length/sublist_count;
//variables created when inserting/looking up an index
sublist_number = (desired_index/sublist_count); //integer divide to get the base sublist you'll be performing the lookup in
sublist_index = (desired_index%sublist_length); //actual index within your sublist to access
If the above ^ is closer to what you're looking for I can work on a patch for that. cheers

dataFrame keying using pandas groupby method

I new to pandas and trying to learn how to work with it. Im having a problem when trying to use an example I saw in one of wes videos and notebooks on my data. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I loading it to a data frame and the group it by "filePath" and "vp", the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now Im trying to approach the index like a dict, as i saw in examples, but when im doing
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed to get a result when im putting the filepath, but as i understand and saw in previouse examples i should be able to use the vp keys as well, isnt is so?
Sorry if its a trivial one, i just cant understand why it is working in one example but not in the other.
Rutger you are not correct. It is possible to "partial" index a multiIndex series. I simply did it the wrong way.
The index first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name i have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav]
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You groupby two columns and therefore get a MultiIndex in return. This means you also have to slice using those to columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it in a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexibile, you could test for multiple values at once for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]