I have a table named res_schedule. In that table I have a field named sq_name_new1.
The values in that field look like this:
sq_name_new1
Gawade & Sushil & Arvind(Ramchandra Ankush Gawade,Arvind Harishchandra More,Sushil Kailas Shinde)
Arvind / Krishna / Somnath(Somnath Gopinath Londhe,Arvind Harishchandra More,Krishna Shesha Devadiga)
Deshmukh/Arvind BBS(new)(Arvind Harishchandra More,Sanjay Dnyaneshwar Deshmukh)
After splitting, I want the result to be:
Ramchandra Ankush Gawade
Arvind Harishchandra More
Sushil Kailas Shinde
Somnath Gopinath Londhe
Arvind Harishchandra More
Krishna Shesha Devadiga
Arvind Harishchandra More
Sanjay Dnyaneshwar Deshmukh
That is, it should remove the leading part of each record and split the rest, storing each value in its own row.
For this I used the following expression:
unnest(string_to_array(substring(res_scheduledjobs.sq_name_new1,'\((().*)\)'),','))
But it does not split the value properly. For the record:
Deshmukh/Arvind BBS(new)(Arvind Harishchandra More,Sanjay Dnyaneshwar Deshmukh)
it returns:
sq_name_new1
new)(Arvind Harishchandra More
Sanjay Dnyaneshwar Deshmukh
In other words, it does not drop the 'new)(' part.
What can I do so that 'new)(' is dropped as well, the same way the leading part is dropped for the other records?
Please suggest a solution.
Try using
'\(([^()]*)\)[^)]*$'
instead of
'\((().*)\)'
as regexp pattern.
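As a rough illustration of which text the suggested pattern captures, here is the same idea sketched in Python instead of PostgreSQL. Python's re module is used only for demonstration, and the helper name split_members is made up for this example.

```python
import re

# Pattern from the answer: capture a parenthesised group that contains no
# nested parentheses and is followed by no further ")" up to the end of
# the string. That trailing condition is what skips "(new)".
LAST_GROUP = re.compile(r'\(([^()]*)\)[^)]*$')

def split_members(value):
    """Return the comma-separated names from the final (...) group."""
    match = LAST_GROUP.search(value)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(',')]

rows = [
    "Gawade & Sushil & Arvind(Ramchandra Ankush Gawade,Arvind Harishchandra More,Sushil Kailas Shinde)",
    "Deshmukh/Arvind BBS(new)(Arvind Harishchandra More,Sanjay Dnyaneshwar Deshmukh)",
]
for row in rows:
    for name in split_members(row):
        print(name)  # one name per output line
```

In PostgreSQL the pattern would simply replace the one passed to substring(); the unnest(string_to_array(...)) part stays as it is.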
I have a DataFrame with 6 string columns named 'Spclty1'...'Spclty6' and another 6 named 'StartDt1'...'StartDt6'. I want to zip them and collapse them into a single column that looks like this:
[[Spclty1, StartDt1]...[Spclty6, StartDt6]]
I first tried collapsing just the 'Spclty' columns into a list like this:
DF = DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6')))
This worked the first time I executed it, giving me a new column called 'Spclty' containing rows such as ['014', '124', '547', '000', '000', '000'], as expected.
Then, I added a line to my script to do the same thing on a different set of 6 string columns, named 'StartDt1'...'StartDt6':
DF = DF.withColumn('StartDt', list(DF.select('StartDt1', 'StartDt2', 'StartDt3', 'StartDt4', 'StartDt5', 'StartDt6')))
This caused AssertionError: col should be Column.
After I ran out of things to try, I tried the original operation again (as a sanity check):
DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6'))).collect()
and got the assertion error as above.
It would be good to understand why it worked only the first time, but the main question is: what is the correct way to zip columns into a collection of dict-like elements in Spark?
.withColumn() expects a column object as second parameter and you are supplying a list.
Thanks. After reading a number of SO posts I figured out the syntax for passing a set of columns to the col parameter, using struct to create an output column that holds a list of values:
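from pyspark.sql.functions import array, col, struct

# One struct per (SpcltyN, StartDtN) pair, collected into a single array column
DF_tmp = DF_tmp.withColumn('specialties', array([
    struct(
        *(col("Spclty{}".format(i)).alias("spclty_code"),
          col("StartDt{}".format(i)).alias("start_date"))
    )
    for i in range(1, 7)
]))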
So, the col() and *col() constructs are what I was looking for, while the array([struct(...)]) approach lets me combine the 'Spclty' and 'StartDt' entries into a list of dict-like elements.
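To make the target shape concrete, the sketch below builds the same list of dict-like pairs for a single row in plain Python, without Spark. The sample row, its date values, and the helper name zip_specialties are invented for illustration; only the key names mirror the aliases used above.

```python
def zip_specialties(row, count=6):
    """Pair SpcltyN with StartDtN for N = 1..count."""
    return [
        {"spclty_code": row["Spclty{}".format(i)],
         "start_date": row["StartDt{}".format(i)]}
        for i in range(1, count + 1)
    ]

# Hypothetical row: all slots start as placeholders, then two are filled in.
row = {}
for i in range(1, 7):
    row["Spclty{}".format(i)] = "000"
    row["StartDt{}".format(i)] = ""
row["Spclty1"], row["StartDt1"] = "014", "2017-01-01"
row["Spclty2"], row["StartDt2"] = "124", "2017-02-01"

specialties = zip_specialties(row)
print(specialties[0])
```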
I have a column of XML strings:
<Options TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"/>
<Options TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"/>
<Options TE="2017/09/04, 16:45:00.000" ST="2017/09/04, 09:00:00.000" TT="2017/09/04, 16:45:00.000"/>
that I am trying to split into the columns
TE, ST, TT
The type of the data is C.
Not being very familiar with kdb/q, I tried to go the very manual way. First I removed the start and end tags:
x:update `$ssr[;"<Options";""] each tags from x
x:update `$ssr[;"/>";""] each string tags from x
leaving me with rows like
TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"
Then, splitting the string
select `$"\"" vs' string tags from x
gives me a list where the odd entries are my times. I just can't figure out how to take that list and split it into separate columns. Any ideas?
I've taken a slightly different approach but the following should do what you want:
//Clean the tags up for separation
//(get rid of open/close tags, change ", " to "," for ease of parsing and remove quote marks)
x:update tags:{ssr/[x;("<Options ";"/>";", ";"\"");("";"";",";"")]} each tags from x
//Parse the various tags using 0:, put the result into a dictionary,
//exec out to table form and add to x
x:x,'exec (!) ./: ("S= " 0:/: tags) from x
For reference here's the table I used:
x:([] tags:("<Options TE=\"2017/09/01, 16:45:00.000\" ST=\"2017/09/01, 09:00:00.000\" TT=\"2017/09/01, 16:45:00.000\"/>";
"<Options TE=\"2017/09/01, 16:45:00.000\" ST=\"2017/09/01, 09:00:00.000\" TT=\"2017/09/01, 16:45:00.000\"/>";
"<Options TE=\"2017/09/04, 16:45:00.000\" ST=\"2017/09/04, 09:00:00.000\" TT=\"2017/09/04, 16:45:00.000\"/>"))
Crazy thought: is your XML data so regular that one can select "columns" via indexing? If so, and supposing the data above is in a 3-element list of strings, could you not apply some function foo to:
foo xmllist[;ind]
where ind selects the data required? The function foo would do the necessary conversion to the timestamp datatype, for example by using (types;delimiter) 0: ...
See if you can export the XML file to a JSON file.
kdb+/q has a JSON parser which does all the dirty work for you:
.j.k and .j.j.
Reference: http://code.kx.com/q/cookbook/websockets/#json
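For comparison with the q approaches above, here is a minimal Python sketch of the same split: pull each NAME="VALUE" attribute out of the tag and turn it into one column per name. It assumes the tags are as regular as the samples in the question; the helper name parse_options is hypothetical, and the values are left as strings rather than converted to timestamps.

```python
import re

# Matches NAME="VALUE" pairs inside the tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def parse_options(tag):
    """Return a dict mapping each attribute name to its string value."""
    return dict(ATTR.findall(tag))

tags = [
    '<Options TE="2017/09/01, 16:45:00.000" ST="2017/09/01, 09:00:00.000" TT="2017/09/01, 16:45:00.000"/>',
    '<Options TE="2017/09/04, 16:45:00.000" ST="2017/09/04, 09:00:00.000" TT="2017/09/04, 16:45:00.000"/>',
]

# One dict per row; the keys TE, ST and TT play the role of the columns.
table = [parse_options(t) for t in tags]
print(table[0]["ST"])
```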
I want to use fscanf for reading a text file containing 4 rows with an unknown number of columns. The newline is represented by two consecutive spaces.
It was suggested that I pass : as the sizeA parameter but it doesn't work.
How can I read in my data?
update: The file format is
String1 String2 String3
10 20 30
a b c
1 2 3
I have to fill 4 arrays, one for each row.
See if this will work for your application.
fid1=fopen('test.txt');
i=1;
check=0;
while check~=1
str=fscanf(fid1,'%s',1);
if strcmp(str,'')~=1;
string(i)={str};
end
i=i+1;
check=strcmp(str,'');
end
fclose(fid1);
X=reshape(string,[],4);
ar1=X(:,1)
ar2=X(:,2)
ar3=X(:,3)
ar4=X(:,4)
Once you have 'ar1','ar2','ar3','ar4' you can parse them however you want.
I have found a solution. I don't know if it is the only one, but it works fine:
A=fscanf(fid,'%[^\n] *\n')
B=sscanf(A,'%c ')
Z=fscanf(fid,'%[^\n] *\n')
C=sscanf(Z,'%d')
....
You could use
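rawText = fgetl(fid);                      % read one line of raw text
lines = regexp(rawText, ' {2}', 'split');  % rows: assumed to be separated by two consecutive spaces, as in the question
tokens = {};
for ix = 1:numel(lines)
    tokens{end+1} = regexp(lines{ix}, ' ', 'split');  % columns: separated by a single space
end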
This will give you a cell array of strings having the row and column shape of your original data.
It reads an arbitrary line of text, then breaks it up according to the formatting information you have available. My example uses a single space character.
This uses regular expressions to define the separator. Regular expressions are powerful but too complex to describe here. See the MATLAB help for regexp and regular expressions.
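The two-level split described above can be sketched in Python as follows. This assumes, as stated in the question, that two consecutive spaces end a row and a single space separates columns; the names ROW_SEP, COL_SEP and split_rows are made up for the example, and the sample text is built in code rather than read from a file.

```python
ROW_SEP = " " * 2  # two consecutive spaces mark the end of a row
COL_SEP = " "      # a single space separates columns within a row

def split_rows(raw_text):
    """Split the text into rows first, then each row into its columns."""
    return [row.split(COL_SEP) for row in raw_text.split(ROW_SEP)]

# Sample data in the format given in the question's update.
raw_text = ROW_SEP.join(["String1 String2 String3", "10 20 30", "a b c", "1 2 3"])
tokens = split_rows(raw_text)
for row in tokens:
    print(row)
```

Each inner list corresponds to one of the four arrays the question asks for; numeric rows would still need converting from strings.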
Let's say I have a full name like: Wan Ahmad Wan Dollah Karmat.
And I want to display it as: Wan Ahmad W.D.K
I tried this code:
preg_replace('/(.)[^\s]+\s?/', '${1}.', strtoupper($_GET['fullname']), 2)
But the output is: W.A.Wan Dollah Karmat
I want to keep the first two words and abbreviate the remaining words. Please help.
Problem solved, thanks to Casimir et Hippolyte. The final code is:
preg_replace('~^(?:\s*\S+){1,2}(*SKIP)(*FAIL)|(\S)\S+~', '${1}.', strtoupper($_GET['fullname']))
It's a matter of the pattern.
You can use the backtracking control verbs (*SKIP) and (*FAIL) to avoid the two first words.
$pattern = '~^(?:\s*\S+){1,2}(*SKIP)(*FAIL)|(\S)\S+~';
$result = preg_replace_callback($pattern,
function ($m) { return strtoupper($m[1]) . '.'; },
$_GET['fullname'] );
In short:
(*SKIP) means that if the pattern fails later, the text matched by the preceding subpattern is not retried; the next match attempt starts at the position where (*SKIP) was reached.
(*FAIL) forces the pattern to fail.
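For readers outside PHP: Python's built-in re module does not support (*SKIP) or (*FAIL), so the sketch below swaps in plain word splitting to get the same kind of result. It is only an illustration of the intended output; the function name abbreviate and the keep parameter are hypothetical.

```python
def abbreviate(fullname, keep=2):
    """Keep the first `keep` words and reduce the rest to dotted initials."""
    words = fullname.split()
    head = " ".join(words[:keep])
    initials = ".".join(word[0].upper() for word in words[keep:])
    # Skip empty parts so short names do not gain a trailing space.
    return " ".join(part for part in (head, initials) if part)

print(abbreviate("Wan Ahmad Wan Dollah Karmat"))  # Wan Ahmad W.D.K
```

Unlike the PHP version, this does not upper-case the kept words and does not append a dot after the last initial, which matches the format requested in the question.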
This may be a very simple task for many but I could not find anything appropriate for me.
I have a file name: filenm_A006.2011.269.10.47.G25_2010
I want to separate all its parts (separated by . and _) to use them separately. How can I do it with simple matlab commands?
Kind Regards,
Mushi
I recommend regexp:
fname = 'filenm_A006.2011.269.10.47.G25_2010';
parts = regexp(fname, '[^_.]+', 'match');
parts =
'filenm' 'A006' '2011' '269' '10' '47' 'G25' '2010'
You can now refer to parts{1} through parts{8} for the pieces. Explanation: the regexp pattern [^_.] matches any character other than _ or ., and the + means one or more such characters in a row. The 'match' option then asks regexp to return a cell array containing every match of that pattern. There are other regexp options; for example, you can get the indices of each piece of the file name.
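The same pattern carries over to other regex engines. As a small cross-check, here it is in Python, where re.findall plays the role of the 'match' option; this is an illustration only, not MATLAB code.

```python
import re

fname = "filenm_A006.2011.269.10.47.G25_2010"

# [^_.]+ : one or more characters that are neither "_" nor "."
parts = re.findall(r"[^_.]+", fname)
print(parts)
```

Note that Python lists are indexed from 0, so parts[0] here corresponds to parts{1} in MATLAB.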
Use the command
strsplit.
cellArrayOfParts = strsplit(fileName,{'.' '_'});
You can use strsplit to split it:
strsplit('filenm_A006.2011.269.10.47.G25_2010',{'_','.'})
ans =
'filenm' 'A006' '2011' '269' '10' '47' 'G25' '2010'
Another option is to use regexp, like Peter suggested.