To extract numbers after a particular string - python-3.7

I want to extract every number that follows a specific string, ':', and write code that adds those numbers up. I thought I could split the string on spaces and extract the numbers from the pieces, but that doesn't work.
1.(12321 6,80.0:3 210.1:3!!!73 540.2:1++ 96.3:3!<<<<%% 689.4:3 24.5:4)
I want to extract the numbers 3 3 1 3 3 4 that follow ":" in this string and find that their sum is 17.
import re
var1 = '1.(12321 6,80.0:3 210.1:3!!!73 540.2:1++ 96.3:3!<<<<%% 689.4:3 24.5:4)'
item = var1.split(" ")

sum([int(i) for i in re.findall('(?<=:)\\d+',var1)])
17
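For comparison, a capture group gives the same result without a lookbehind. This is only a minimal sketch; the names after_colon and total are introduced here for illustration.

```python
import re

var1 = '1.(12321 6,80.0:3 210.1:3!!!73 540.2:1++ 96.3:3!<<<<%% 689.4:3 24.5:4)'

# Capture the run of digits that immediately follows each ':'.
after_colon = [int(n) for n in re.findall(r':(\d+)', var1)]
total = sum(after_colon)

print(after_colon)  # [3, 3, 1, 3, 3, 4]
print(total)        # 17
```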

kdb - how to sum a list of dynamic columns using functional select

I want to be able to construct (+; (+; `a; `b); `c) given a list of `a`b`c.
Similarly, if I have a list of `a`b`c`d, I want to be able to construct another level of nesting, and so on and so forth.
I've been trying to use scan but I can't get it right.
q)fsum:(+;;)/
enlist[+;;]/
q)fsum `a`b`c`d
+
(+;(+;`a;`b);`c)
`d
If you only want the raw parse tree output, one way is to form the equivalent string and use parse. This isn't recommended for more complex examples, but in this case it is clear.
{parse "+" sv string x}[`a`b`c`d]
+
`d
(+;`c;(+;`b;`a))
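The left-nested shape asked for in the question is a left fold over the column list. As a rough illustration outside q, here is a Python sketch in which tuples stand in for parse-tree lists; nest_sum is a hypothetical helper, not a kdb function.

```python
from functools import reduce

def nest_sum(cols):
    """Fold column names into a left-nested ('+', left, right) tree."""
    return reduce(lambda acc, col: ('+', acc, col), cols)

print(nest_sum(['a', 'b', 'c']))       # ('+', ('+', 'a', 'b'), 'c')
print(nest_sum(['a', 'b', 'c', 'd']))  # ('+', ('+', ('+', 'a', 'b'), 'c'), 'd')
```

Each additional column wraps the tree built so far in one more level, which is the pattern the question describes.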
If you are looking to use this in a functional select, you can use +/ instead of adding each column individually as in your example.
q)parse"+/[(a;b;c;d)]"
(/;+)
(enlist;`a;`b;`c;`d)
q)f:{[t;c] ?[t;();0b;enlist[`res]!enlist (+/;(enlist,c))]};
q)t:([]a:1 2 3;b:4 5 6;c:7 8 9;d:10 11 12)
q)f[t;`a`b`c]
res
---
12
15
18
q)f[t;`a`b]
res
---
5
7
9
q)f[t;`a`b`c]~?[t;();0b;enlist[`res]!enlist (+;(+;`a;`b);`c)]
1b
You can also get the sum by indexing into the table directly, which returns a list of the values of each column, and summing over these. We use (), to turn any input into a list; otherwise, given a single column name, it would sum the values in that one column and return only a single value.
q)f:{[t;c] sum t (),c}
q)f[t;`a`b`c]
12 15 18

Finding the three longest substrings in a string using SPARQL on the Wikidata Query Service, and ranking them across strings

I'm trying to identify the longest three substrings from a string using SPARQL and the Wikidata Query Service and then rank
(1) the substrings within a string by length, and
(2) the strings by the lengths of any of those longest substrings.
I managed to identify the first and second substring from a string and could of course just create similar additional lines to tackle the problem, but this seems ugly and inefficient, so I am wondering if anyone here knows of a better way to get there.
This is a simplified version of the code, though I have left some auxiliary variables in that I am using for tracking progress on the way. You can try it here.
Clarification in response to this comment: if it is necessary to treat this query as a subquery and to feed it with results from another subquery, that's fine with me. To get an idea of the kinds of use I have in mind, see this demo.
SELECT * WHERE {
{
VALUES (?title) {
("What are the longest three words in this string?")
("A really complicated title")
("OneWordTitleInCamelCase")
("Thanks for your help!")
}
}
BIND(STRLEN(REPLACE(?title, " ", "")) AS ?titlelength)
BIND(STRBEFORE(?title, " ") AS ?substring1)
BIND(STRLEN(REPLACE(?substring1, " ", "")) AS ?substring1length)
BIND(STRAFTER(?title, " ") AS ?postfix)
BIND(STRLEN(REPLACE(?postfix, " ", "")) AS ?postfixlength)
BIND(STRBEFORE(?postfix, " ") AS ?substring2)
BIND(STRLEN(REPLACE(?substring2, " ", "")) AS ?substring2length)
}
ORDER BY DESC(?substring1length)
Expected results:
longsubstring substringlength
OneWordTitleInCamelCase 23
complicated 11
longest 7
really 6
string 6
Thanks 6
title 5
three 5
your 4
help 4
Actual results:
title | titlelength | substring1 | substring1length | postfix | postfixlength | substring2 | substring2length
Thanks for your help! | 18 | Thanks | 6 | for your help! | 12 | for | 3
What are the longest three words in this string? | 40 | What | 4 | are the longest three words in this string? | 36 | are | 3
A really complicated title | 23 | A | 1 | really complicated title | 22 | really | 6
OneWordTitleInCamelCase | 23 | (empty) | 0 | (empty) | 0 | (empty) | 0
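To pin down what the expected results mean, here is a rough sketch of the intended computation in Python rather than SPARQL. It assumes that a "substring" is a run of letters (so punctuation such as "?" and "!" is dropped) and that ties in length may come out in any order; the helper longest_words is hypothetical.

```python
import re

titles = [
    "What are the longest three words in this string?",
    "A really complicated title",
    "OneWordTitleInCamelCase",
    "Thanks for your help!",
]

def longest_words(title, n=3):
    """Return the n longest runs of letters in title, longest first."""
    words = re.findall(r"[A-Za-z]+", title)
    return sorted(words, key=len, reverse=True)[:n]

# Collect the top three words of every title, then rank them all by length.
ranked = sorted(
    ((word, len(word)) for title in titles for word in longest_words(title)),
    key=lambda pair: pair[1],
    reverse=True,
)

for word, length in ranked:
    print(word, length)
```

This yields the same words and lengths as the expected results listed in the question, though words of equal length may appear in a different order.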

Table of integers right aligned

I have a table of numbers held in arrays produced by map, and I'm trying to present them right-aligned. For example, I have this:
[1,2,3,4,5,6]
[1,2,44,5,66,77]
But want this:
1 2 3 4 5 6
1 2 44 5 66 77
In case it isn't coming through: I don't want the brackets (or quotes, if the values were strings), and I want the values right-aligned rather than left-aligned. I figured out left alignment and am just trying to see if there is an easy way to do this.
var arr= [0,1,2,3]
for i in 0...3 {
let table = arr.map { $0 * i }
print (table)
}
You are simply printing the array, and the description of an Array shows the list of values separated by commas and enclosed in brackets.
If you want any other output you need to generate it yourself.
Replace your current print with the following:
let line = table.map { String(format: "%4d", $0)}.joined()
print(line)
This maps the array of Int into an array of String and then joins those strings into a single string with no separator between them. Each Int is formatted into a String that takes up four characters, with the number right-aligned within them. Adjust the width as needed. Note that String(format:) comes from Foundation, so the file needs import Foundation.
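The same fixed-width idea, sketched in Python rather than Swift for comparison. The width of 4 mirrors the "%4d" above; the variable lines is introduced only for this illustration.

```python
arr = [0, 1, 2, 3]

lines = []
for i in range(4):  # same range as Swift's 0...3
    table = [n * i for n in arr]
    # Right-align each value in a 4-character field, then concatenate.
    lines.append("".join(f"{n:4d}" for n in table))

print("\n".join(lines))
```

Because every value occupies the same width, the columns line up regardless of how many digits each number has, as long as no number needs more than four characters.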

OpenRefine: Fill down with increasing counter

Is it possible in OpenRefine to fill down blank cells with a counter instead of copying the top non-blank value?
In this example image:
Or here is the same example as typed text (imagine this as a column from top to bottom):
1
1
blank
1
blank
blank
blank
blank
blank
1
I would like to see the column filled as follows (again, imagine top to bottom):
1
1
2
1
2
3
4
5
6
1
Thanks, help is very much appreciated.
It's not really simple. You have to:
1 Replace the blanks with something else, such as an "x"
2 Create a unique record for the entire dataset
3 Use this Jython script:
import itertools
data = row['record']['cells']['YOUR COLUMN NAME']['value']
x = itertools.count(2)
liste = []
for i, el in enumerate(data):
    if data[i] == "x":
        liste.append(x.next())
    else:
        x = itertools.count(2)
        liste.append(el)
return ",".join([str(x) for x in liste])
4 Use Blank down to clear duplicates
5 Split the first multivalued cell.
Here is a screencast of the operations described above.
If you know a little Python, you can also transform your file using pandas. I do not know what is the most elegant way to do it, but this script should work.
import itertools
import pandas as pd
x = itertools.count(2)
def set_x():
    global x
    x = itertools.count(2)
set_x()
def increase(value):
    if not value:
        return next(x)
    else:
        set_x()
        return value
data = pd.read_csv("your_file.csv", na_values=['nan'], keep_default_na=False)
data['column 1'] = data['column 1'].apply(lambda row: increase(row))
print(data)
data.to_csv("final_file.csv")
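The counter logic that both scripts share can also be sketched in plain Python, without OpenRefine or pandas. This assumes blanks are represented as None or an empty string and that the counter restarts at 2 after every non-blank value; fill_with_counter is a hypothetical helper name.

```python
def fill_with_counter(values):
    """Replace each blank with a counter that restarts after every non-blank value."""
    filled = []
    counter = 1
    for value in values:
        if value is None or value == "":
            counter += 1
            filled.append(counter)
        else:
            counter = 1  # restart, so the next blank becomes 2
            filled.append(value)
    return filled

column = [1, 1, None, 1, None, None, None, None, None, 1]
print(fill_with_counter(column))  # [1, 1, 2, 1, 2, 3, 4, 5, 6, 1]
```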
Here are two simple solutions using GREL.
Use records
You could move the column to the beginning, telling OpenRefine to use the numbers as records. You might need to transform the column to text to really convince OpenRefine to use it as records.
Then either add a new column or transform the existing one with the following expression.
1 + row.index - row.record.fromRowIndex
Use record markers
In case you don't want to use records or don't have a static number, you can create a similar setup. Imagine you have an incomplete counter like in the following table and want to fill it.
Origin | Desired
1 | 1
(blank) | 2
1 | 1
2 | 2
(blank) | 3
1 | 1
To fill the missing cells, first add a new column based on your original column using the following expression and name it record_row_index.
if(isNonBlank(value), row.index, "")
After that fill down the original column and the new column record_row_index.
Then create a new column based on the original filled column using the following expression.
value + row.index - cells["record_row_index"].value
Hint: the expression is expecting both columns to be of type number.
If one of them is of type text, you can either transform the column beforehand or use toNumber() in the expression.
The following table shows how these operations are working together.
Origin | Origin filled | row.index | record_row_index | Desired
1 | 1 | 0 | 0 | 1 + 0 - 0 = 1
(blank) | 1 | 1 | 0 | 1 + 1 - 0 = 2
1 | 1 | 2 | 2 | 1 + 2 - 2 = 1
2 | 2 | 3 | 3 | 2 + 3 - 3 = 2
(blank) | 2 | 4 | 3 | 2 + 4 - 3 = 3
1 | 1 | 5 | 5 | 1 + 5 - 5 = 1
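The arithmetic of the record-marker approach can be checked with a short Python sketch. It mimics the fill-down of both columns and then applies value + row.index - record_row_index to each row. It assumes blanks are None and that the first cell is not blank; fill_from_markers is a hypothetical name.

```python
def fill_from_markers(origin):
    """Apply value + row.index - record_row_index to a column containing blanks."""
    desired = []
    last_value = None   # fill-down of the original column
    last_marker = None  # fill-down of record_row_index
    for index, value in enumerate(origin):
        if value is not None:
            last_value, last_marker = value, index
        desired.append(last_value + index - last_marker)
    return desired

print(fill_from_markers([1, None, 1, 2, None, 1]))  # [1, 2, 1, 2, 3, 1]
```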

Applescript: return specific index positions from a date string

I have already used text delimiters and item numbers to extract a date from a file name, so I'm clear about how to use these. Unfortunately, the date on these particular files is formatted as "yyyyMMdd" and I need to convert it into the format "yyyy-MM-dd". I have been trying to use the offset function to get particular index positions, and I have found several examples of how to return the offset of particular digits in the string, for example:
set theposition to offset of 10 in theString -- this works
(which could return 5 or 7) but I have not found examples of how to call the digits at a specific index:
set _day to offset 7 of file_date_raw -- error
"Finder got an error: Some parameter is missing for offset." number -1701
How would you do this, or is there a totally better way I'm unaware of?
To "call the digits at a specific index", you use:
text 1 thru 4 of myString
If you know that each string has 8 characters in the yyyymmdd format, then you don't need to use 'offset' or any parsing, just add in the -'s, using text x thru y to dissect the string.
set d to "20011018"
set newString to (text 1 thru 4 of d) & "-" & (text 5 thru 6 of d) & "-" & (text 7 thru 8 of d)
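For comparison, the same fixed-position slicing can be sketched in Python. Note that Python slices are 0-based and exclude the end index, whereas AppleScript's text x thru y is 1-based and inclusive. The function add_dashes and its length check are assumptions added for illustration.

```python
def add_dashes(raw):
    """Turn 'yyyyMMdd' into 'yyyy-MM-dd' by slicing at fixed positions."""
    if len(raw) != 8 or not raw.isdigit():
        raise ValueError(f"expected 8 digits, got {raw!r}")
    return f"{raw[0:4]}-{raw[4:6]}-{raw[6:8]}"

print(add_dashes("20011018"))  # 2001-10-18
```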