Replace the HTML tags - sed

My HTML code has the following line.
<TH>column1</TH><TH>column2</TH><TH>column3</TH>
Can I use the sed tool to replace column1 with "Name", column2 with "Surname", and so on, to get:
<TH>Name</TH><TH>Surname</TH><TH>City</TH>
I have the list of the columns in an echo statement in my shell script:
echo 'Name, Surname, City'
These 3 values need to be replaced in the respective columns of the HTML code. The number of columns might change.

Can you change the input format of the new column names, or are you stuck with the echo? And does the table header line appear once per HTML file, or multiple times?
For your current situation, this would work:
echo 'Name, Surname, City' |
awk -F'<TH>|</TH><TH>|</TH>' 'NR==1{n=split($0,a,", *");OFS="";next}/<TH>/{for(i=1; i<=n;i++)$(i+1)="<TH>"a[i]"</TH>"}1' - file.html
Output:
<TH>Name</TH><TH>Surname</TH><TH>City</TH>
Note that things will go horribly wrong when your input HTML has a different form (additional or missing newlines). If you want to do anything more advanced, you should use a proper SGML parser instead of awk or sed.
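For readability, here is the same awk program spelled out with comments (same logic as the one-liner above):
echo 'Name, Surname, City' |
awk -F'<TH>|</TH><TH>|</TH>' '
  NR==1 {                        # first input line comes from the echo on stdin
    n = split($0, a, ", *")      # collect the new column names into array a
    OFS = ""                     # rebuild matched lines with no separator
    next
  }
  /<TH>/ {                       # a header line from file.html
    for (i = 1; i <= n; i++)     # $1 is the empty field before the first <TH>
      $(i+1) = "<TH>" a[i] "</TH>"
  }
  1                              # print every line, modified or not
' - file.html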

Put your replacements into variables instead of using echo, then simply:
sed 's|<TH>column1</TH>|<TH>Name</TH>|;s|<TH>column2</TH>|<TH>Surname</TH>|;s|<TH>column3</TH>|<TH>City</TH>|' file
Note, this is not foolproof if your patterns span multiple lines. But if all the things you need replaced are on one line, then it should be all right.
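If the number of columns varies, you can build the sed script in a loop instead of hard-coding it. A minimal sketch, assuming the existing headers follow the column1, column2, ... pattern:
names=(Name Surname City)          # the replacement headers
script=""
for i in "${!names[@]}"; do        # append one substitution per column
  script+="s|<TH>column$((i+1))</TH>|<TH>${names[i]}</TH>|;"
done
sed "$script" file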

Using variable in exiftool -if condition

I am trying to reorganise images based on keywords that are found in the IPTC metadata. More specifically, I need to sort images into directories based on the species name in the subject pseudo tag of exiftool.
To do this, I have compiled the keywords in a .txt file (species_ls.txt), with each keyword on a new line, as such:
Asian Tortoise
Banded Civet
Banded Linsang
...
To sort the images I have created the following for loop, which iterates through each line of the document, with sed pulling out the keyword. Here, $line_no is the number of lines in the species_ls.txt file, and image_raw is the directory containing the images.
for i in `seq 1 $line_no`; do
  sp_name=$(sed -n "${i}p" < species_ls.txt)
  exiftool -r -if '$subject=~/${sp_name}/i' \
    '-Filename=./${sp_dir}/%f%+c%E' image_raw
done
Although the for loop runs, no conditions are being met in the -if flag in exiftool. I am assuming this is because the variable sp_name is not being passed into the condition properly.
Any suggestions, or a better way of doing this, would be appreciated.
For the line with the condition, rather than using single quotes (' '), it would be better to use double quotes (" ").
Single quotes mean that the content is passed literally, so your variable won't get expanded.
To keep the $subject part from being expanded as well (which I presume you don't want), put a \ in front of the $ so the shell doesn't read it as a variable.
This line should now look like:
exiftool -r -if "\$subject=~/${sp_name}/i"
Hope this helps you!
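Putting it together, the whole loop might look like this (a sketch; sp_dir was never defined in the question, so deriving it from sp_name here is an assumption):
line_no=$(wc -l < species_ls.txt)
for i in $(seq 1 "$line_no"); do
  sp_name=$(sed -n "${i}p" < species_ls.txt)
  sp_dir=${sp_name// /_}                      # hypothetical: species name with underscores
  exiftool -r -if "\$subject=~/${sp_name}/i" \
    "-Filename=./${sp_dir}/%f%+c%E" image_raw
done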

Use processed output from stdin as a replacement string in Sed

The following command gives me the output I want:
$ sed '/^<template.*>/,/<\/template>/!d;//d' src/components/**/*.vue | html2jade
in that it converts each template's HTML into its Pug equivalent.
Would it now be possible to somehow replace the originally found HTML in all those files with this processed output? There is also some other content outside the template tags, which should stay as it is, namely some script and style tags.
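One possible approach (a sketch, not from the original thread; assumes GNU sed and that html2jade reads stdin and writes stdout): process each file on its own, convert its template body, then delete the old body and read the converted one back in after the opening tag with sed's r command:
shopt -s globstar                                # bash: let ** recurse
for f in src/components/**/*.vue; do
  sed '/^<template.*>/,/<\/template>/!d;//d' "$f" | html2jade > /tmp/body.pug
  sed -i -e '/^<template.*>/,/<\/template>/{//!d}' \
         -e '/^<template.*>/r /tmp/body.pug' "$f"
done
rm -f /tmp/body.pug
You would likely also want to change the opening tag to <template lang="pug"> so the build tooling knows the body is no longer HTML.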

Remove some rows with " in front

I have a CSV file that is causing me serious headaches going into Tableau. Some of the rows in the CSV are wrapped in quotes (" ") and some are not. I would like them all to be imported without the quotes (i.e. ignore them on rows that have them).
Some data:
"1;2;Red;3"
1;2;Green;3
1;2;Blue;3
"1;2;Hello;3"
Do you have any suggestions?
If you have a bash prompt hanging around...
You can use cat to output the file contents so you can make sure you're working with the right data:
cat filename.csv
Then, pipe it through sed so you can visually check that the quotes get deleted:
cat filename.csv | sed 's/"//g'
If the output looks good, use the -i flag to edit the file in place:
sed -i 's/"//g' filename.csv
All quotes should now be gone from filename.csv.
If your data has quotes in it, and you want to only strip the quotes that appear at the beginning and end of each line, you can use this instead:
sed -i 's/^"\(.*\)"$/\1/' filename.csv
It's not the most elegant way to do it in Tableau but if you cannot remove it in the source file, you could create a calculated field for the first and last column that strips the quotation marks.
Right-click on the field for the first column and choose Create/Calculated Field
Use this formula: INT(REPLACE([FirstColumn],'"',''))
Name the column accordingly
Do the same for the last column
This assumes the data you provided matches the data you work on, i.e. that these fields are integer fields (thus the INT() usage). In case they are string fields, you would want to make sure that you don't remove quotation marks that belong to the field value.

Manipulate characters with sed

I have a list of usernames and I would like to add possible combinations to it.
Example: let's say this is the list I have:
johna
maryb
charlesc
Is there a way to use sed to edit it so it looks like
ajohn
bmary
ccharles
And also
john_a
mary_b
charles_c
etc...
Can anyone assist me in getting the commands to do so? Any explanation would be awesome as well; I would like to understand how it works if possible. I usually get confused when I see things like 's/\.(.*.... without knowing what some of those mean. Anyway, thanks in advance.
EDIT ... I changed the usernames
sed "s/\(user\)\(.\)/\2\1/"
Breakdown:
sed "s/string/replacement/" will replace the first instance of string on each line with replacement (add the g flag to replace every instance).
Then, string in that sed expression is \(user\)\(.\). This can be broken down into two
parts: \(user\) and \(.\). Each of these is a capture group - bracketed by \( \). That means that once we've matched something with them, we can reuse it in the replacement string.
\(user\) matches, surprisingly enough, the user part of the string. \(.\) matches any single character - that's what the . means. Then, you have two captured groups - user and a (or b or c).
The replacement part just uses these to recreate the pattern a little differently. \2\1 says "print the second capture group, then the first capture group". Which in this case, will print out auser - since we matched user and a with each group.
ex:
$ echo "usera
> userb
> userc" | sed "s/\(user\)\(.\)/\2\1/"
auser
buser
cuser
You can change the \2\1 to use any string you want - ie. \2_\1 will give a_user, b_user, c_user.
Also, in order to match any preceding string (not just "user"), just replace the \(user\) with \(.*\). Ex:
$ echo "marya
> johnb
> alfredc" | sed "s/\(.*\)\(.\)/\2\1/"
amary
bjohn
calfred
here's a partial answer to what is probably the easy part. To use sed to change usera to user_a you could use:
sed 's/user/user_/' temp
where temp is the name of the file that contains your initial list of usernames. How this works: It is finding the first instance of "user" on each line and replacing it with "user_"
Similarly for your dot example:
sed 's/user/user./' temp
will replace the first instance of "user" on each line with "user."
Sed does not offer non-greedy regex, so I suggest perl:
perl -pe 's/(.*?)(.)$/$2$1/g' file
ajohn
bmary
ccharles
perl -pe 's/(.*?)(.)$/$1_$2/g' file
john_a
mary_b
charles_c
That way you don't need to know the username before hand.
Simple solution using awk
awk '{a=$NF;$NF="";$0=a$0}1' FS="" OFS="" file
ajohn
bmary
ccharles
and
awk '{a=$NF;$NF="";$0=$0"_" a}1' FS="" OFS="" file
john_a
mary_b
charles_c
By setting FS to the empty string, every character becomes its own field in awk (not strictly POSIX, but GNU awk supports it). You can then easily manipulate it.
And there is no need to use capturing groups etc., just plain field swapping.
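A quick way to see the per-character splitting:
$ echo john_a | awk '{print NF, $1, $NF}' FS=""
6 j a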
This might work for you (GNU sed):
sed -r 's/^([^_]*)_?(.)$/\2\1/' file
This matches any characters other than underscores (in the first back reference (\1)), an optional underscore, and the last character (in the second back reference (\2)), and swaps them around.
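For example, it handles inputs both with and without the underscore:
$ printf 'johna\njohn_a\n' | sed -r 's/^([^_]*)_?(.)$/\2\1/'
ajohn
ajohn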

How to export postgres data containing newlines to CSV without breaking records on several lines

I am trying to export data from PostgreSQL to CSV files, but when I have newlines in text in the database, the exported data is broken across several lines, which makes the CSV file much harder to read, not to mention that most applications will fail to load it properly.
Here is how I export the data now:
PRESQL="\pset format unaligned
\pset fieldsep \",\"
\pset footer off
\o 'out.csv'
"
cat <(echo $PRESQL) $QUERYFILE | psql …
So far, so good, unless you have newlines in the text fields. Any hack that would allow me to generate a very simple to parse CSV file (one record per line)?
It was a mistake to assume that a CSV file can be forced to have one line per record. The RFC states clearly that fields containing newlines are to be enclosed in double quotes.
You can try the replace() or regexp_replace() functions.
The answer to the following SO question should give you an idea: How to remove carriage returns and new lines in Postgresql?
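For example, a sketch of cleaning at export time (mytable, id, and notes are hypothetical names); regexp_replace() collapses the embedded newlines so each record stays on one line:
psql -c "\copy (SELECT id, regexp_replace(notes, E'[\\n\\r]+', ' ', 'g') AS notes FROM mytable) TO 'out.csv' WITH CSV HEADER"
Alternatively, psql's \copy ... WITH CSV produces RFC-style output, quoting fields that contain newlines, which most applications can load properly.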