I have a csv file with 500 lines and 8 columns. One column is for the office phone number. About 200 of them are in the form -1234 but need to be 123-456-7890.
Each of the 200 instances have a different last four as they are desk phone numbers. The area code and exchange part are the same for all. So as an example, we have three data points:
-2329
-5679
-8891
These three items need to be converted to the full phone number so will need to look like:
212-598-2329
212-598-5679
212-598-8891
Anybody have the sed wizardry to pull this off ?
An example line of data looks like:
Bill,Smith,Mr.,bsmith#mydomain.org,bsmith,-5315,800-878-5554,\N,\N,\N,\N
I would use awk:
awk -F, '{$6="212-598"$6}1' OFS=, input.file
Using , as the input and output field delimiter it is simple to access the sixth field and prepend the static value.
alternative with sed
$ sed 's/,-/,212-598-/' missing_phone
Bill,Smith,Mr.,bsmith#mydomain.org,bsmith,212-598-5315,800-878-5554,\N,\N,\N,\N
Related
I have a plain text file:
line1_text
line2_text
I need to add a number of whitespaces between the two lines.
Adding 10 whitespaces is easy.
But say I need to add 10000 whitespaces, how would I achieve that using sed?
P.S. This is for experimental purposes
There undoubtedly is a sed method to do this but, since sed does not have any natural understanding of arithmetic, it is not a natural choice for this problem. By contrast, awk understands arithmetic and can readily, for example, print an empty line n times for any integer value of n.
As an example, consider this input file:
$ cat infile
line1_text
line2_text
This code will add as many blank lines as you like before any line that contains the string line2_text:
$ awk -v n=5 '/line2_text/{for (i=1;i<=n;i++)print""} 1' infile
line1_text
line2_text
If you want 10,000 blank lines instead of 5, then replace n=5 with n=10000.
How it works
-v n=5
This defines an awk variable n with value 5.
/line2_text/{for (i=1;i<=n;i++)print""}
Every time that a line matches the regex line2_text, then a for loop is performed with prints an empty line n times.
1
This is awk's shorthand for print-the-line and it causes every line from input to be printed to the output.
This might work for you (GNU sed):
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /&\n/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;x;G}' file
This appends the hold space to the first line. The hold space is manipulated to hold the required number of spaces by a looping mechanism based on powers of 2. This may produce more than necessary and the remainder are chopped off using a linefeed as a delimiter.
To change spaces to newlines, use:
sed -r '/line1_text/{x;s/.*/ /;:a;ta;s/ /\n&/10000;tb;s/^[^\n]*/&&/;ta;:b;s/\n.*//;s/ /\n/g;x;G}' file
In essence the same can be achieved using this (however it is very slow for large numbers):
sed -r '/line1_text/{x;:a;/ {20}/bb;s/^/ /;ta;:b;x;G}' file
I'm running Windows and have the GnuWin32 toolkit, which includes sed. Specifically:
C:\TEMP>sed --version
GNU sed version 4.2.1
I have a text file with two sections: A fixed part I want to preserve, and a part that's appended after running a job.
In the file is a unique string that identifies the start of the part that's added, and I'd like to use Gnu sed to isolate only the part of the file that's before the unique string - i.e., so I can append different data to the fixed part each time the job is run.
I know I could keep the fixed portion in a separate file, but that adds complexity and it would be more elegant if I could just reuse the data at the start of the same file.
A long time ago I knew how to set up sed scripts, and I'm sure this can be done with sed, but I've slept since then. :)
Can you please describe how to use sed to display the lines of text in a file up to and not including a specific string?
Example:
line 1 of fixed portion
line 2 of fixed portion
unique string
line 1 of appended portion
line 2 of appended portion
line 3 of appended portion
What I'd like is to see as output:
line 1 of fixed portion
line 2 of fixed portion
I've gotten as far as:
sed -r -n -e "0,/unique string/p"
but that prints the unique string as well.
Thanks in advance.
-Noel
This should work for you:
sed -n '/unique string/q;p' file
It quits processing at unique string. Other lines get printed.
An alternative might be to use a range address like this:
sed -n '1,/unique string/{/unique string/!p}' file
Note that sed includes the range border. We need to exclude unique string from printing.
Furthermore I'm using the -n option which makes sed suppress the output of input lines by default.
One thing, if unique string can contain characters which are also syntax characters in the regex like ...
test*
... sed might not be the right tool for the job any more since it can only match regular expressions but not fixed strings.
In that case awk might be the tool of choice:
awk 'index("*unique string*"){exit}1' file
index("string") returns a non zero value (the position) if the string has been found. We cancel further processing of input lines in that case and don't print that line as well.
The trailing 1 always evaluates to true and makes awk print all the lines until the previous condition applies.
I have a CSV file that is causing me serious headaches going into Tableau. Some of the rows in the CSV are wrapped in a " " and some not. I would like them all to be imported without this (i.e. ignore it on rows that have it).
Some data:
"1;2;Red;3"
1;2;Green;3
1;2;Blue;3
"1;2;Hello;3"
Do you have any suggestions?
If you have a bash prompt hanging around...
You can use cat to output the file contents so you can make sure you're working with the right data:
cat filename.csv
Then, pipe it through sed so you can visually check that the quotes were delted:
cat filename.csv | sed 's/"// g'
If the output looks good, use the -i flag to edit the file in place:
sed -i 's/"// g' filename.csv
All quotes should now be missing from filename.csv
If your data has quotes in it, and you want to only strip the quotes that appear at the beginning and end of each line, you can use this instead:
sed -i 's/^"\(.*\)"$/\1/' filename.csv
It's not the most elegant way to do it in Tableau but if you cannot remove it in the source file, you could create a calculated field for the first and last column that strips the quotation marks.
right click on the field for the first column choose Create/Calculated Field
Use this formula: INT(REPLACE([FirstColumn],'"',''))
Name the column accordingly
Do the same for the last column
Assuming the data you provided fits the data you work on. The assumption is that these fields are integer field (thus the INT() usage). In case they are string fields you would want to make sure that you don't remove quotation marks that belong to the field value.
I have a large file (~4,000,000 lines) consisting of multiple blocks of data, each with an introductory ID tag, and a list of selected ID tags in a second file.
For example:
Data.txt
>ID:1000
data about this
more data
data
>ID:1001
blah blah
data
>ID:1002
foo
...
And ID_Tags.txt
>ID:1000
>ID:1002
>ID:1085
>ID:3062
...
I need a way to grab the ID tag and corresponding data from Data.txt for the data specified in ID_Tags.txt so that I wind up with a file looking like:
Select_Data.txt
>ID:1000
data about this
more data
data
>ID:1002
foo
...
I can get one block of data at a time with
sed -n '/ID:1000/,/>/p' Data.txt | head -n -1 >> Select_Data.txt
But this only does a single ID tag at a time, and I have hundreds of select ID tags. Is there a way to avoid doing this manually?
$ awk 'NR==FNR{tags[$0];next} /^>/{f=($0 in tags)} f' ID_Tags.txt Data.txt
>ID:1000
data about this
more data
data
>ID:1002
foo
This might work for you (GNU sed):
sed $'1i:a\ns#.*#/^&$/bb#;$ad;:b;n;/^>/ba;bb' ids_file | sed -f - data_file
This builds a sed script from the ids file and runs the script against the data file. The sed script looks for those ids in the ids file and prints the id line and those lines following until the next id where it loops back check the id. All other lines are deleted.
You can use the following awk script:
awk 'NR==FNR{i[$1];next} NF>1 && $1 in i{print ">"$0}' RS='>' ids.txt data.txt
Output:
>ID:1000
data about this
more data
data
>ID:1002
etc
The key to my solution is to replace the default record separator \n by > using RS='>'. Using this trick it is pretty simple to access the individual fields of the data.
Explanation
We are passing both files to awk, ids.txt and data.txt and awk will process them in order.
NR==FNR{i[$1];next} runs unless awk is parsing the first file, ids.txt. NR represents the current record number, and FNR the number of the record in the current file. They are equal only when parsing the first file. i[$1] adds the value of the id (without the leading > since it is the field separator) as key to the array i. next stops further processing of the line.
$1 in i {print ">"$0} will check if the first column of the data record - the id - is a key in our array i and prints the record while adding the > back to the front of it.
Note that we are additionally checking if NF>1 (meaning the record is not empty) because awk will return an empty first record because the data file starts with the record delimiter >. <none> in array will result into true in awk and would print and additional >.
I have a list of usernames and i would like add possible combinations to it.
Example. Lets say this is the list I have
johna
maryb
charlesc
Is there is a way to use sed to edit it the way it looks like
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone assist me into getting the commands to do so, any explanation will be awesome as well. I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean... anyway thanks in advance.
EDIT ... I change the username
sed s/\(user\)\(.\)/\2\1/
Breakdown:
sed s/string/replacement/ will replace all instances of string with replacement.
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two
parts: \(user\) and \(.\). Each of these is a capture group - bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to use any string you want - ie. \2_\1 will give a_user, b_user, c_user.
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: It is finding the first instance of "user" on each line and replacing it with "user_"
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."
Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username before hand.
Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
By setting FS to nothing, every letter is a field in awk. You can then easy manipulate it.
And no need to using capturing groups etc, just plain field swapping.
This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any charactes other than underscores (in the first back reference (\1)), a possible underscore and the last character (in the second back reference (\2)) and swaps them around.