How to delete substring from string? - sed

I have a string:
09/May/2012:05:14:58 +0100
How can I delete the substring 58 +0100 from this string? I tried:
sed 's/\:[0-9][0-9] \+0100//'
but it does not work.

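It does work with some sed implementations (note that GNU sed treats \+ in a basic regular expression as "one or more of the preceding character", so there a plain + is needed to match the literal plus sign):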
echo "09/May/2012:05:14:58 +0100"|sed 's/\:[0-9][0-9] \+0100//'
Output:
09/May/2012:05:14

If they're always in that format, you can just do:
s/:[^:]*$//
This basically gets rid of everything beyond (and including) the final : character (colon, followed by any number of characters that aren't a colon, to the end of the line).
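For comparison, here is a minimal sketch of the same idea in Python (not part of the original answers). It assumes the timestamp always ends with a colon-separated seconds field followed by the offset; the function name strip_after_last_colon is made up for illustration.

```python
import re

def strip_after_last_colon(timestamp: str) -> str:
    # Remove the final ':' and everything after it; [^:]* guarantees
    # that no further colon follows the one that is matched.
    return re.sub(r":[^:]*$", "", timestamp)

print(strip_after_last_colon("09/May/2012:05:14:58 +0100"))  # 09/May/2012:05:14
```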


How to remove spaces and underscores from a string in kdb?

How do I remove spaces and underscores from a string?
Input String:
s:"Monday comes_after Sunday";
Expected Output:
"MondaycomesafterSunday"
You may want to look at the special characters section of https://code.kx.com/v2/kb/regex/
q)s:"Monday comes_after Sunday";
q)ssr[s;"[ _]";""]
"MondaycomesafterSunday"
Alternatively, you could use except, which is generally going to be faster if you are only removing characters:
q)s except " _"
"MondaycomesafterSunday"
q)\ts:100000 s except " _"
90 816
q)\ts:100000 ssr[s;"[ _]";""]
691 1072
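The same trade-off (a regex replacement versus a plain character filter) can be sketched in Python. This is only an illustration of the idea, not kdb code; the helper names are invented, and no timing claim is made for Python.

```python
import re

s = "Monday comes_after Sunday"

def remove_with_regex(text: str) -> str:
    # Analogue of ssr[s;"[ _]";""]: delete every space or underscore.
    return re.sub(r"[ _]", "", text)

def remove_with_filter(text: str) -> str:
    # Analogue of `s except " _"`: drop the listed characters without a regex.
    return text.translate(str.maketrans("", "", " _"))

print(remove_with_regex(s))   # MondaycomesafterSunday
print(remove_with_filter(s))  # MondaycomesafterSunday
```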

Replace emdash with double dash

I want to replace ― with --.
I tried using the UTF-8 encodings, but that doesn't work.
string = "blablabla -- blablabla ―"
I want to replace the long dash (if there is one) with double hyphens. I tried it the simple way but that didn't work:
string= string.replace ("―", "--")
I also tried to encode it with utf8 and use the codes of the special characters
stringutf8= string.encode("utf-8")
emdash= u"\u2014"
hyphen= u"\u002D"
if emdash in stringutf8:
stringutf8.replace(emdash, 2*hyphen)
Any suggestions?
I am working with text files in which sometimes apparently the two hyphens are replaced automatically with a long dash...
thanks a lot!
You are dealing with strings here. Strings are sequences of characters. Replace the character and leave the encoding out of the equation. Note that replace returns a new string, so the result has to be assigned:
string = 'blablabla -- blablabla \u2014'
emdash = '\u2014'
hyphen = '\u002D'
string2 = string.replace(emdash, 2*hyphen)
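One caveat worth checking: the long dash pasted in the question may be U+2015 (HORIZONTAL BAR) rather than U+2014 (EM DASH), in which case replacing only \u2014 would leave it untouched. A minimal sketch that handles both code points follows; the name normalize_dashes and the choice of which code points to cover are assumptions.

```python
# Long-dash variants that should become a double hyphen (assumed list).
DASHES = {
    "\u2014": "--",  # EM DASH
    "\u2015": "--",  # HORIZONTAL BAR
}

def normalize_dashes(text: str) -> str:
    # str.maketrans accepts a dict that maps single characters to
    # replacement strings of any length.
    return text.translate(str.maketrans(DASHES))

print(normalize_dashes("blablabla -- blablabla \u2015"))  # blablabla -- blablabla --
```

To see which character a file really contains, printing hex(ord(ch)) for the suspicious character is a quick check.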

Extracting values from a single file

I have a file with multiple lines; but a specific line contains tons of information, with several repeated expressions. I'm trying to extract some specific values. I first tried some commands with sed, for instance, but with no success. So, I was wondering if you could give me some insights.
Here is a fragment of the single line of the document that I mentioned:
[...]6[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01},DLOOP.rate_median=0.04131395026396427,length=
[...]
10[&length_range={0.19
[... a lot of more information here in between ...]
0.01},habitat.set.prob={0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61},DLOOP.rate_median=0.04131395026396427,length=
[...]
My aim is first to extract all the values that are between the braces after "habitat.set.prob={" and put them on a single line in a text file.
It would also be important to extract the numbers that appear just before the expression "[&length_range=", which in this case are "6" and "10". They are the labels of the sets of numbers after "prob={".
So the set of numbers I want to extract always appears between "habitat.set.prob={" and "},DLOOP.rate_median", while the other number (the label) is always right before "[&length_range="; what comes before the label is not a fixed expression, though; it is actually a random number.
The goal is to end up with a file with the following characteristics:
6 0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10 0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
and so on …
What do you think? Is this possible?
I started with this very basic command at least to try to extract the set of numbers, but it didn't work
sed -n "/habitat.set.prob={/,/},DLOOP.rate_median=/ p"
Well... I made some progress.
I was able to get the values at least:
awk '{gsub("habitat.set.prob={","\n");printf"%s",$0}' filename | awk -F'},' '{print $1"}"}' | grep -iv "TREE" > stats.txt
Many thanks in advance.
Cheers,
Luiz
Something like this:
sed -rn '/.*[0-9]+\[&length_range=\{/,/habitat.set.prob=\{/{s/.*\b([0-9]+)\[&length_range.*/\1/p; s/.*habitat.set.prob=\{([^D]+)\},DLOOP.rate.*/\1/p}' habitat
6
0.01,0.03,0.56,0.01,0.01,0.34,0.01,0.01,0.01
10
0.21,0.33,0.56,0.01,0.01,0.33,0.01,0.01,0.61
The first part, '/.a./,/.b./', selects the range of lines from the one matching pattern a to the one matching pattern b, so it can span multiple lines. The -n option tells sed not to print lines by default.
In '/.a./,/.b./{s/.c./.d./p; s/.e./.f./p}'
there are two substitution commands with p=print in curly braces.
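If a scripting language is an option, the label and the values can be paired in one pass. The following is a minimal Python sketch, not the answerer's method. It assumes each label is preceded by a non-digit character and that every record contains both "[&length_range=" and "habitat.set.prob={...},DLOOP.rate_median"; the sample text and the function name extract_records are invented for illustration.

```python
import re

# label, then the shortest stretch up to the next habitat.set.prob={...} block
RECORD = re.compile(
    r"(\d+)\[&length_range=.*?habitat\.set\.prob=\{([^}]*)\},DLOOP\.rate_median",
    re.DOTALL,
)

def extract_records(text: str) -> list[str]:
    # One "label values" string per record, in order of appearance.
    return [f"{label} {values}" for label, values in RECORD.findall(text)]

# Hypothetical, shortened sample in the shape described in the question.
sample = (
    "(6[&length_range={0.19,0.01},habitat.set.prob={0.01,0.03,0.56},"
    "DLOOP.rate_median=0.04,length=0.1]:0.2,"
    "10[&length_range={0.19,0.01},habitat.set.prob={0.21,0.33,0.61},"
    "DLOOP.rate_median=0.04,length=0.3]:0.1)"
)

for line in extract_records(sample):
    print(line)
```

If a label can directly follow another number with no separator, the pattern cannot tell where the label starts, and more context from the real file would be needed.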
I am not sure how much you have dug into this yourself, so I am not providing the complete answer, but hopefully this helps:
For the first part, getting the number (which you call the label): you didn't mention whether there is any specific pattern, so try this (data is the file which contains the actual input). You will need to work out how to get the whole number and tweak the RE a bit:
sed -n 's/.*\([0-9][0-9]*\).*length_range.*/\1/p' data
For the other part, which gives the numbers between habitat and DLOOP:
sed -n 's/.*habitat.set.prob=\(.*\),DLOOP.*/\1/pg' data | tr '{' ' ' | tr '}' ' '
Now, take this as a starting point and work on the output to get your desired result!
To explain a bit:
In the first command, I am trying to capture the digits between anything (.*) and (.*)length_range (you can match a literal [ by putting \ in front of it).
In the second command, I am capturing the pattern between habitat.set.prob and DLOOP and then using tr to remove the braces.
#include <iostream>
#include <string>
using namespace std;
int main()
{
    string p = "1:2:3:4"; // input string: single digits separated by ':'
    int arr[4] = {};      // zero-initialized array to hold the extracted integers
    for (size_t i = 0, j = 0; i < p.length() && j < 4; i++) { // scan the string
        if (p[i] == ':') { continue; } // skip the separators
        arr[j] = p[i] - '0';           // convert the digit character to its integer value
        j++;
    }
    cout << "String={" << arr[0] << " " << arr[1] << " " << arr[2] << " " << arr[3] << "}"; // print the array
    return 0;
}

Date formatting a character variable in SAS

I have a character variable with a value like this:
Aug 1, 2015
I want to have this date value in a date9. and a DDMMYYD10. format.
This is what I have tried:
month=upcase(substr(startdato,1,3));
day=0!!substr(startdato,5,1);
year=substr(startdato, 8,4);
startdato_a=trim(day)!!trim(month)!!trim(year);
FORMAT Startdato2 date9.;
format startdato3 DDMMYYD10.;
Startdato2 = INPUT(startdato_a,date9.);
Startdato3 = INPUT(startdato_a,date9.);
I get this output:
month=AUG
day=01
year=2015
startdato_a=01AUG2015
startdato2=.
startdato3=.
Why don't I get values in startdato2 and startdato3?
You are getting missing values due to leading / trailing whitespace in startdato_a, which prevents the informat from working properly. If you do input(strip(startdato_a),date9.) instead, it works as expected.
However, there is a much simpler way of doing this:
data want;
textdate = 'Aug 1, 2015';
date = input(textdate,anydtdte11.);
format date date9.;
run;
Writing out your character variables with the $QUOTE. format is a handy way to see leading blanks. Doing this, you can see what is causing the trouble with the INPUT() function.
startdato="Aug 1, 2015"
month="AUG"
day="           01"
year="2015"
startdato_a="           01AUG2015"
The cause of this is including a numeric constant when generating that character string.
day=0!!substr(startdato,5,1);
SAS had to convert the 0 into a string, so it used the best12. format, which is why there are 11 leading blanks in the value of DAY. You could have used a string literal instead:
day='0'!!substr(startdato,5,1);
This yields the expected results:
startdato="Aug 1, 2015"
month="AUG"
day="01"
year="2015"
startdato_a="01AUG2015"
Startdato2=01AUG2015
startdato3=01-08-2015
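For readers more familiar with Python, the same parse-then-format round trip can be sketched as follows. This is an analogy, not SAS code; it assumes an English locale (the %b month abbreviation is locale-dependent), and the function name is invented.

```python
from datetime import datetime

def reformat_text_date(textdate: str) -> tuple[str, str]:
    # Parse "Aug 1, 2015"; strip() guards against stray leading/trailing blanks.
    parsed = datetime.strptime(textdate.strip(), "%b %d, %Y")
    date9_like = parsed.strftime("%d%b%Y").upper()  # similar to SAS date9.
    ddmmyyd10_like = parsed.strftime("%d-%m-%Y")    # similar to SAS DDMMYYD10.
    return date9_like, ddmmyyd10_like

print(reformat_text_date("Aug 1, 2015"))
```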

Sed replacing Special Characters in a string

I am having difficulties replacing a string containing special characters using sed. My old and new string are shown below
oldStr = "# td=(nstates=20) cam-b3lyp/6-31g geom=connectivity"
newStr = "# opt b3lyp/6-31g geom=connectivity"
My sed command is the following
sed -i 's/\# td\=\(nstates\=20\) cam\-b3lyp\/6\-31g geom\=connectivity/\# opt b3lyp\/6\-31g geom\=connectivity/g' myfile.txt
I don't get any errors, but there is no match. Any ideas on how to fix my patterns?
Thanks
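Try: sed -i 's|# td=(nstates=20) cam-b3lyp/6-31g geom=connectivity|# opt b3lyp/6-31g geom=connectivity|g' myfile.txt
You can use almost any character after s as the delimiter instead of /; since your expression contains slashes, I used | instead. The characters -, = and # don't have to be escaped (a minus sign only matters inside character sets [...]). In sed's default (basic) regular expressions, escaped parentheses denote a group, while unescaped parentheses are literals, which is why your original pattern did not match.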
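When the text to replace is a fixed string rather than a pattern, another way to sidestep escaping questions is a literal replacement. Here is a minimal Python sketch of that idea, offered as an alternative rather than a fix to the sed command; the function names are invented.

```python
import re

old = "# td=(nstates=20) cam-b3lyp/6-31g geom=connectivity"
new = "# opt b3lyp/6-31g geom=connectivity"

def replace_literal(line: str) -> str:
    # Plain substring replacement: no character is special.
    return line.replace(old, new)

def replace_with_regex(line: str) -> str:
    # If a regex is required, re.escape makes every character literal;
    # the lambda keeps the replacement text from being interpreted.
    return re.sub(re.escape(old), lambda _match: new, line)

print(replace_literal(old))     # # opt b3lyp/6-31g geom=connectivity
print(replace_with_regex(old))  # # opt b3lyp/6-31g geom=connectivity
```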