I have lines in a text file that look like this example:
"2009217",2015,3,"N","N","2","UPPER DARBY FIREFIGHTERS "PAC"","","","","7235 WEST CHESTER PIKE","","UPPER DARBY","PA","19082","","6106220269",4245.0100,650.0000,.0000
I want to remove the extra double quotes embedded inside quoted fields such as "UPPER DARBY FIREFIGHTERS "PAC"" across the whole file.
So the result for each such field should look like this:
"2009217",2015,3,"N","N","2","UPPER DARBY FIREFIGHTERS PAC","","","","7235 WEST CHESTER PIKE","","UPPER DARBY","PA","19082","","6106220269",4245.0100,650.0000,.0000
I came to this sed line:
cat file.txt | sed "s/\([^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,\)\([^,]*\),\(.*\)/\1\2\3/"
But now I don't know how to replace the double quote within \2.
Is that possible with sed?
I would personally use awk for that because it is more readable:
#!/usr/bin/awk -f
BEGIN {
    # Use ',' as the input and output field delimiter
    FS = OFS = ","
}
{
    # Iterate through all fields. (NF is the number of fields.)
    for (i = 1; i <= NF; i++) {
        # If the field starts and ends with a '"'
        if ($i ~ /^".*"$/) {
            # Remove every '"' in the field
            gsub(/"/, "", $i)
            # Wrap in '"' again
            $i = "\"" $i "\""
        }
    }
    # Print the (possibly modified) record
    print
}
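To try the awk approach without saving a script file, a minimal sketch is shown here. The function name strip_inner_quotes is hypothetical, and it assumes that no field contains a comma, because awk splits the line on every comma.

```shell
# Hypothetical wrapper around the awk logic above; reads CSV lines on stdin.
# Assumption: no field contains a comma, since awk splits on every comma.
strip_inner_quotes() {
    awk 'BEGIN { FS = OFS = "," }
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^".*"$/) {      # quoted field
                gsub(/"/, "", $i)     # drop every quote, inner ones included
                $i = "\"" $i "\""     # restore the outer pair
            }
        }
        print
    }'
}

printf '%s\n' '"2","UPPER DARBY FIREFIGHTERS "PAC"","",4245.0100' | strip_inner_quotes
# prints: "2","UPPER DARBY FIREFIGHTERS PAC","",4245.0100
```

Empty fields ("") and unquoted numbers pass through unchanged.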
This might work for you (GNU sed):
sed -r ':a;s/^((([^",]*,)*("[^",]*",([^",]*,)*)*)"[^",]*)"([^,])/\1\6/;ta' file
This removes the extra double quotes from double-quoted strings in a comma-delimited line.
It does this by skipping over properly constructed double-quoted strings and unquoted strings (in this example, numbers), and then removing any double quote inside a quoted string that is not followed by a comma:
[^",]*, # non double quoted strings
"[^",]*", # properly quoted strings
(([^",]*,)*("[^",]*",([^",]*,)*)*) # eliminate all properly constructed strings
"[^",]*"([^,]) # improper double quotes: the second " (the one not followed by ,) is removed
I am trying to use sed to replace a word in a two-line pattern with another word. When one line matches the pattern 'MACRO "something"', 'BLOCK' should be replaced with 'CORE' on the next line. The "something" should be captured in a backreference and printed out as well.
My input data:
MACRO ABCD
CLASS BLOCK ;
SYMMETRY X Y ;
Desired outcome:
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
My attempt in sed so far:
sed 's/MACRO \([A-Za-z0-9]*\)/,/ CLASS BLOCK ;/MACRO \1\n CLASS CORE ;/g' input.txt
The above did not work and gave this message:
sed: -e expression #1, char 30: unknown option to `s'
What am I missing?
I'm open to one-liner solutions in perl as well.
Thanks,
Gert
Using a perl one-liner in slurp mode:
perl -0777 -pe 's/MACRO \w+\n CLASS \KBLOCK ;/CORE ;/g' input.txt
Or using a streaming example:
perl -pe '
s/^\s*\bCLASS \KBLOCK ;/CORE ;/ if $prev;
$prev = $_ =~ /^MACRO \w+$/
' input.txt
Explanation:
Switches:
-0777: Slurp files whole
-p: Creates a while(<>){...; print} loop for each line in your input file.
-e: Tells perl to execute the code on command line.
When in one line the pattern 'MACRO "something"' is found then in the
next line replace 'BLOCK' with 'CORE'.
sed works on lines of input. If you want to perform substitution on the next line of a specified pattern, then you need to add that to the pattern space before being able to do so.
The following might work for you:
sed '/MACRO/{N;s/\(CLASS \)BLOCK/\1CORE/;}' filename
Quoting from the documentation:
`N'
Add a newline to the pattern space, then append the next line of
input to the pattern space. If there is no more input then sed
exits without processing any more commands.
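As a rough illustration of the N approach, here is a slightly stricter variant. It assumes MACRO starts the line and that only the line directly after it should change. The function name macro_block_to_core is hypothetical.

```shell
# Hypothetical helper; reads from the files given as arguments, or stdin.
macro_block_to_core() {
    sed '/^MACRO /{
        N
        s/\(\n.*CLASS \)BLOCK/\1CORE/
    }' "$@"
}

printf 'MACRO ABCD\n  CLASS BLOCK ;\nOTHER\n  CLASS BLOCK ;\n' | macro_block_to_core
# prints:
# MACRO ABCD
#   CLASS CORE ;
# OTHER
#   CLASS BLOCK ;
```

Capturing the newline inside the group restricts the substitution to the appended line and avoids writing \n in the replacement, which not every sed treats as a newline.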
If you want to make use of address range as in your attempt, then you need:
sed '/MACRO/,/CLASS BLOCK/{s/\(CLASS\) BLOCK/\1 CORE/}' filename
I'm not sure why you need a backreference for the macro name.
You could try this awk command also,
awk '{print}/MACRO/ {getline; sub (/BLOCK/,"CORE");{print}}' file
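It prints every line as is, and on seeing a line containing MACRO it reads the next line and replaces BLOCK with CORE there.
Since getline has so many pitfalls, I try not to use it. Instead, set a flag on the MACRO line and act on it when the next line arrives:
awk 'f {sub(/BLOCK/,"CORE"); f=0} /MACRO/ {f=1} 1' file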
MACRO ABCD
CLASS CORE ;
SYMMETRY X Y ;
This could do it
#!/usr/bin/awk -f
BEGIN {
RS = ";"
}
/MACRO/ {
sub("BLOCK", "CORE")
}
{
printf "%s%s", (s++ ? ";" : ""), $0
}
"line" ends with ;
sub BLOCK for CORE in "lines" with MACRO
print ; followed by "line" unless first line
I need to search for a specific word in a file starting from specific line and return the line numbers only for the matched lines.
Let's say I want to search a file called myfile for the word my_word and then store the returned line numbers.
In a shell script, the command
sed -n '10,$ { /$my_word /= }' $myfile
works fine, but how do I write that command in the Tcl shell?
% exec sed -n '10,$ { /$my_word/= }' $file
extra characters after close-brace.
I want to add that the following command works fine in the Tcl shell, but it searches from the beginning of the file:
% exec sed -n "/$my_word/=" $file
447431
447445
448434
448696
448711
448759
450979
451006
451119
451209
451245
452936
454408
I have solved the problem as follows
set lineno 10
if { ! [catch {exec sed -n "/$new_token/=" $file} lineFound] && [string length $lineFound] > 0 } {
set lineNumbers [split $lineFound "\n"]
foreach num $lineNumbers {
if {$num >= $lineno} {
lappend col $num
}
}
}
I still can't find a single command that solves the problem.
Any suggestions?
One thing is unclear to me: is the text you are looking for stored in a variable called my_word, or is it the literal value my_word?
In your line
% exec sed -n '10,$ { /$my_word/= }' $file
I'd say it's the first case. So you have before it something like
% set my_word wordtosearch
% set file filetosearchin
Your mistake is to use the single quote character ' to enclose the sed expression. That character is an enclosing operator in sh, but has no meaning in Tcl.
You use it in sh to group many words in a single argument that is passed to sed, so you have to do the same, but using Tcl syntax:
% set my_word wordtosearch
% set file filetosearchin
% exec sed -n "10,$ { /$my_word/= }" $file
Here, you use the "..." to group.
You don't escape the $ in $my_word because you want $my_word to be substituted with the string wordtosearch.
I hope this helps.
After some trial and error I came up with:
set output [exec sed -n "10,\$ \{ /$myword/= \}" $myfile]
# Do something with the output
puts $output
The key is to escape characters that are special to Tcl, such as the dollar sign and curly braces.
Update
Per Donal Fellows, we do not need to escape the dollar sign:
set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]
I have tried the new revision and found it works. Thank you, Donal.
Update 2
I finally gained access to a Windows 7 machine, installed Cygwin (which includes sed and tclsh). I tried out the above script and it works just fine. I don't know what your problem is. Interestingly, the same script failed on my Mac OS X system with the following error:
sed: 1: "10,$ { /ipsum/= }": extra characters at the end of = command
while executing
"exec sed -n "10,$ \{ /$myword/= \}" $myfile"
invoked from within
"set output [exec sed -n "10,$ \{ /$myword/= \}" $myfile]"
(file "sed.tcl" line 6)
I guess this is a difference between GNU sed (Linux, Cygwin) and BSD sed (Mac OS X).
Update 3
I have tried the same script under Linux/Tcl 8.4 and it works. That might mean Tcl 8.4 has nothing to do with it. Here is something else that might help: Tcl comes with a package called fileutil, which is part of the tcllib. The fileutil package contains a useful tool for this case: fileutil::grep. Here is a sample on how to use it in your case:
package require fileutil
proc grep_demo {myword myfile} {
foreach line [fileutil::grep $myword $myfile] {
# Each line is in the format:
# filename:linenumber:text
set lineNumber [lindex [split $line :] 1]
if {$lineNumber >= 10} { puts $lineNumber}
}
}
grep_demo $myword $myfile
Here is how to do it with awk
awk 'NR>=10 && $0~f {print NR}' f="$my_word" "$myfile"
This prints the line number of every line from line 10 onward that matches the word in the variable $my_word, in the file whose name is stored in the variable myfile.
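If the starting line should be a parameter too, the same idea can be wrapped in a small function. This is only a sketch: the function name match_lines_from and its argument order are assumptions made for illustration.

```shell
# Hypothetical helper: print the numbers of all lines at or after line $1
# that match the regular expression $2 in file $3.
# Note: awk -v interprets backslash escapes in the assigned value.
match_lines_from() {
    awk -v start="$1" -v pat="$2" 'NR >= start && $0 ~ pat { print NR }' "$3"
}

# Usage: line_numbers=$(match_lines_from 10 "$my_word" "$myfile")
```

From Tcl, such a command can be called through exec without quoting trouble as long as each argument is passed as a separate word.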
I have a results.txt file that is structured in this format:
Uncharted 3: Javithaxx l Rampant l Graveyard l Team Deathmatch HD (D1VpWBaxR8c)
Matt Darey feat. Kate Louise Smith - See The Sun (Toby Hedges Remix) (EQHdC_gGnA0)
The Matrix State (SXP06Oax70o)
Above & Beyond - Group Therapy Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)
I want to create a new file by extracting the YouTube video ID given in the last parentheses of each line, like "8aOdRACuXiU".
I'm trying to build a URL like this in a new file:
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Note, I appended the &hd=1 to the string that I am trying to be replaced. I have tried using Linux reverse and cut but reverse or rev munges my data. The hard part here is that each line in my text file will have entries with parentheses and I only care about getting the data between the last set of parentheses. Each line has a variable length so that isn't helpful either. What about using grep and .$ for the end of the line?
In summary, I want to extract the youtube ID from results.txt and export it to a new file in the following format: http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Using awk:
awk '{
v = substr( $NF, 2, length( $NF ) - 2 )
printf "%s%s%s\n", "http://www.youtube.com/watch?v=", v, "&hd=1"
}' infile
It yields:
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
$ sed 's!.*(\(.*\))!http://www.youtube.com/watch?v=\1\&hd=1!' results.txt
http://www.youtube.com/watch?v=D1VpWBaxR8c&hd=1
http://www.youtube.com/watch?v=EQHdC_gGnA0&hd=1
http://www.youtube.com/watch?v=SXP06Oax70o&hd=1
http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
Here, .*(\(.*\)) looks for the last occurrence of a pair of parentheses, and captures the characters inside those parentheses. The captured group is then inserted into the URL using \1.
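A slightly stricter sketch of the same idea follows. It assumes the ID is the final parenthesised group, sits at the very end of the line, and contains no parentheses itself. The function name youtube_urls is hypothetical.

```shell
# Hypothetical helper; reads from the files given as arguments, or stdin.
# The greedy .* consumes everything up to the last "(" on the line.
youtube_urls() {
    sed 's!.*(\([^()]*\))$!http://www.youtube.com/watch?v=\1\&hd=1!' "$@"
}

printf '%s\n' 'Radio 014 (guest Lange) (2013-02-08) (8aOdRACuXiU)' | youtube_urls
# prints: http://www.youtube.com/watch?v=8aOdRACuXiU&hd=1
```

Lines that do not end in a parenthesised group pass through unchanged. To drop them instead, run sed with -n and add the p flag to the substitution.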
Using a perl one-liner :
perl -lne 'printf "http://www.youtube.com/watch?v=%s&hd=1\n", $& if /[^\(]+(?=\)$)/' file.txt
Or multi-line version :
perl -lne '
printf(
"http://www.youtube.com/watch?v=%s&hd=1\n",
$&
) if /[^\(]+(?=\)$)/
' file.txt
I have a list of strings and I want to pass those strings as arguments in a single Bash command line call. For simple alphanumeric strings it suffices to just pass them verbatim:
> script.pl foo bar baz yes no
foo
bar
baz
yes
no
I understand that if an argument contains spaces or backslashes or double-quotes, I need to backslash-escape the double-quotes and backslashes, and then double-quote the argument.
> script.pl foo bar baz "\"yes\"\\\"no\""
foo
bar
baz
"yes"\"no"
But when an argument contains an exclamation mark, this happens:
> script.pl !foo
-bash: !foo: event not found
Double quoting doesn't work:
> script.pl "!foo"
-bash: !foo: event not found
Nor does backslash-escaping (notice how the literal backslash is present in the output):
> script.pl "\!foo"
\!foo
I don't know much about Bash yet but I know that there are other special characters which do similar things. What is the general procedure for safely escaping an arbitrary string for use as a command line argument in Bash? Let's assume the string can be of arbitrary length and contain arbitrary combinations of special characters. I would like an escape() subroutine that I can use as below (Perl example):
$cmd = join " ", map { escape($_); } @args;
Here are some more example strings which should be safely escaped by this function (I know some of these look Windows-like, that's deliberate):
yes
no
Hello, world [string with a comma and space in it]
C:\Program Files\ [path with backslashes and a space in it]
" [i.e. a double-quote]
\ [backslash]
\\ [two backslashes]
\\\ [three backslashes]
\\\\ [four backslashes]
\\\\\ [five backslashes]
"\ [double-quote, backslash]
"\T [double-quote, backslash, T]
"\\T [double-quote, backslash, backslash, T]
!1
!A
"!\/'" [double-quote, exclamation, backslash, forward slash, apostrophe, double quote]
"Jeff's!" [double-quote, J, e, f, f, apostrophe, s, exclamation, double quote]
$PATH
%PATH%
&
<>|&^
*#$$A$##?-_
EDIT:
Would this do the trick? Escape every unusual character with a backslash, and omit single or double quotes. (Example is in Perl but any language can do this)
sub escape {
$_[0] =~ s/([^a-zA-Z0-9_])/\\$1/g;
return $_[0];
}
If you want to securely quote anything for Bash, you can use its built-in printf %q formatting:
cat strings.txt:
yes
no
Hello, world
C:\Program Files\
"
\
\\
\\\
\\\\
\\\\\
"\
"\T
"\\T
!1
!A
"!\/'"
"Jeff's!"
$PATH
%PATH%
&
<>|&^
*#$$A$##?-_
cat quote.sh:
#!/bin/bash
while IFS= read -r string
do
printf '%q\n' "$string"
done < strings.txt
./quote.sh:
yes
no
Hello\,\ world
C:\\Program\ Files\\
\"
\\
\\\\
\\\\\\
\\\\\\\\
\\\\\\\\\\
\"\\
\"\\T
\"\\\\T
\!1
\!A
\"\!\\/\'\"
\"Jeff\'s\!\"
\$PATH
%PATH%
\&
\<\>\|\&\^
\*#\$\$A\$##\?-_
These strings can be pasted verbatim as arguments to, for example, echo, which then outputs the original strings in strings.txt.
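One way to check that claim is a round trip through eval. This is a minimal sketch that needs bash, since %q is a feature of bash's printf builtin. The variable names and the sample string are made up for illustration.

```shell
# Requires bash: %q is provided by the bash printf builtin.
original='"Jeff'\''s!" $PATH & <>|'

# Quote the string so that the shell will read it back as one literal word.
quoted=$(printf '%q' "$original")

# Let the shell parse the quoted form again and compare with the original.
eval "restored=$quoted"
[ "$restored" = "$original" ] && echo "round trip ok"
```

In a script (non-interactive bash) the exclamation mark needs no special care, because history expansion is off there.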
What is the general procedure for safely escaping an arbitrary string for use as a command line argument in Bash?
Replace every occurrence of ' with '\'', then put ' at the beginning and end.
Every character except for a single quote can be used verbatim in a single-quote-delimited string. There's no way to put a single quote inside a single-quote-delimited string, but that's easy enough to work around: end the string ('), then add a single quote by using a backslash to escape it (\'), then begin a new string (').
As far as I know, this will always work, with no exceptions.
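The rule above can be sketched as a small shell function. The name shell_quote is hypothetical, and the sed call is just one possible way to do the replacement.

```shell
# Hypothetical helper implementing the single-quote rule in POSIX sh.
# The body runs in a subshell ( ... ) so the variable "quoted" stays local.
shell_quote() (
    # The trailing x keeps $(...) from stripping newlines that end the argument.
    quoted=$(printf '%sx' "$1" | sed "s/'/'\\\\''/g")
    printf "'%s'\n" "${quoted%x}"
)

shell_quote "Jeff's!"
# prints: 'Jeff'\''s!'
```

The output can be pasted into a command line or passed to eval, and the shell reads it back as the original single argument.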
You can use single quotes to escape strings for Bash. Note however this does not expand variables within quotes as double quotes do. In your example, the following should work:
script.pl '!foo'
From Perl, this depends on the function you are using to spawn the external process. For example, if you use the system function, you can pass arguments as parameters so there's no need to escape them. Of course you'd still need to escape quotes for Perl:
system("/usr/bin/rm", "-fr", "/tmp/CGI_test", "/var/tmp/CGI");
sub text_to_shell_lit(_) {
return $_[0] if $_[0] =~ /^[a-zA-Z0-9_\-]+\z/;
my $s = $_[0];
$s =~ s/'/'\\''/g;
return "'$s'";
}
See this earlier post for an example.
Whenever you see you don't get the desired output, use the following method:
"""\special character"""
where special character may include ! " * ^ % $ # @ ....
For instance, if you want to create a bash generating another bash file in which there is a string and you want to assign a value to that, you can have the following sample scenario:
Area="(1250,600),(1400,750)"
printf "SubArea="""\""""${Area}"""\""""\n" > test.sh
printf "echo """\$"""{SubArea}" >> test.sh
Then test.sh file will have the following code:
SubArea="(1250,600),(1400,750)"
echo ${SubArea}
Note that printf is used here so that \n produces a newline.
Bash interprets exclamation marks only in interactive mode.
You can prevent this by doing:
set +o histexpand
Inside double quotes you must escape dollar signs, backticks, double quotes and backslashes, and I would say that's all.
This is not a complete answer, but I find it useful sometimes to combine two types of quote for a single string by concatenating them, for example echo "$HOME"'/foo!?.*' .
FWIW, I wrote this function that invokes a set of arguments using different credentials. The su command required serializing all the arguments, which required escaping them all, which I did with the printf idiom suggested above.
$ escape_args_then_call_as myname whoami
escape_args_then_call_as() {
local user=$1
shift
local -a args
for i in "$@"; do
args+=( "$(printf %q "${i}")" )
done
sudo su "${user}" -c "${args[*]}"
}