Replace matches of one regex expression with matches from another, across two files - sed

I am currently helping a friend reorganise several hundred images on a database-driven website. I have generated a list of the new, reorganised image paths offline and would like to replace each matching image reference in the SQL export of the database with the new path.
EDIT: Here is an example of what I am trying to achieve
The new_paths_list.txt is a file that I generated using a batch script after I had organised all of the existing images into folders. Prior to this all of the images were in just a few folders. A sample of this generated list might be:
image/data/product_photos/telephones/snom/snom_xyz.jpg
image/data/product_photos/telephones/gigaset/giga_xyz.jpg
A sample of my_exported_db.sql (the database exported from the website) might be:
...
,(110,32,'data/phones/snom_xyz.jpg',3),(213,50,'data/telephones/giga_xyz.jpg',0),
...
The result I want is for my_exported_db.sql to become:
...
,(110,32,'data/product_photos/telephones/snom/snom_xyz.jpg',3),(213,50,'data/product_photos/telephones/gigaset/giga_xyz.jpg',0),
...
Some pseudo code to illustrate:
1/ Find the first image name in my_exported_db.sql, such as 'snom_xyz.jpg'.
2/ Find the same image name in new_paths_list.txt
3/ If it is present, copy the whole line (the path and filename)
4/ Replace the whole path of this image in my_exported_db.sql with the copied line
5/ Repeat for all other image names in my_exported_db.sql
A regular expression that appears to match image names is:
([^)''"/])+\.(?:jpg|jpeg|gif|png)
and one to match image names, complete with path (for relative or absolute) is:
\bdata[^)''"\s]+\.(?:jpg|jpeg|gif|png)
I have looked around and have seen that Sed or Awk may be capable of doing this, but some pointers would be greatly appreciated. I understand that this will only work accurately if there are no duplicated filenames.

You can use sed to convert new_paths_list.txt into a set of sed replacement commands:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt > rules.sed
The file rules.sed will look like this:
s#data/snom_xyz.jpg#image/data/product_photos/telephones/snom/snom_xyz.jpg#
s#data/giga_xyz.jpg#image/data/product_photos/telephones/gigaset/giga_xyz.jpg#
Then use sed again to translate my_exported_db.sql:
sed -i -f rules.sed my_exported_db.sql
I think in some shells it's possible to combine these steps and do without rules.sed:
sed 's|\(.*\(/[^/]*$\)\)|s#data\2#\1#|' new_paths_list.txt | sed -i -f - my_exported_db.sql
but I'm not certain about that.
EDIT:
If the images are in several directories under data/, make this change:
sed "s|image/\(.*\(/[^/]*$\)\)|s#[^']*\2#\1#|" new_paths_list.txt > rules.sed
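Putting the two steps together, here is a minimal sketch of the whole flow. The function names make_rules and rewrite_paths are made up for illustration. It assumes every file name is unique, that the paths in the SQL dump sit inside single quotes, and that no path contains #, & or a backslash. Unlike the commands above it adds the g flag, so an image referenced twice on one line is handled, and it writes the result to standard output instead of editing in place.

```shell
# Turn each line of the path list into one sed rule. The leading "image/"
# is dropped, and the rule matches any quote-free path that ends in
# "/<filename>". Dots in the file name are left unescaped, so they match
# any character; that is normally harmless here.
make_rules() {
  sed -n -e 's|^image/||' \
         -e "s|^\(.*\)/\([^/]*\)\$|s#[^']*/\2#\1/\2#g|p" "$1"
}

# Apply the generated rules to the SQL dump ($2) and print the result.
rewrite_paths() {
  rules=$(mktemp) || return 1
  make_rules "$1" > "$rules"
  sed -f "$rules" "$2"
  rm -f "$rules"
}

# Example (file names taken from the question):
#   rewrite_paths new_paths_list.txt my_exported_db.sql > my_updated_db.sql
```

Writing to a new file keeps the original export untouched until the result has been checked, for example with diff.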

Related

sed search and replace with 2 conditions/patterns

I have a file:
http://www.gnu.org/software/coreutils/
glob
lxc-ls
I need to only replace below:
lxc-ls
with
lxc-ls
lxc-ls can be any word as I have multiple such links in several files which I need to replace.
I do not want to make any changes to the other 2 links. i.e.
http://www.gnu.org/software/coreutils/
glob
What I have tried so far is:
$ sed '/html/ i\
..' file
But this adds the text at the start of the line, and the other condition, excluding the 2 URLs, is not fulfilled either.
Here is a more realistic example from one of the files.
<b>echoping</b>(1),
<b>getaddrinfo</b>(3),
<b>getaddrinfo_a</b>(3),
<b>getpeername</b>(2),
<b>getsockname</b>(2),
<b>ping_setopt</b>(3),
<b>proc</b>(5),
<b>rds</b>(7),
<b>recv</b>(2),
<b>rtnetlink</b>(7),
<b>sctp</b>(7),
<b>sctp_connectx</b>(3),
<b>send</b>(2),
<b>udplite</b>(7)
http://gnu.org/licenses/gpl.html
http://translationproject.org/team/
Here I only need to replace:
<b>rds</b>(7),
<b>rtnetlink</b>(7),
<b>sctp</b>(7),
<b>udplite</b>(7)
with:
<b>rds</b>(7),
<b>rtnetlink</b>(7),
<b>sctp</b>(7),
<b>udplite</b>(7)
Using sed
$ sed 's|"\([[:alpha:]].*\)|"../\1|' file
<b>echoping</b>(1),
<b>getaddrinfo</b>(3),
<b>getaddrinfo_a</b>(3),
<b>getpeername</b>(2),
<b>getsockname</b>(2),
<b>ping_setopt</b>(3),
<b>proc</b>(5),
<b>rds</b>(7),
<b>recv</b>(2),
<b>rtnetlink</b>(7),
<b>sctp</b>(7),
<b>sctp_connectx</b>(3),
<b>send</b>(2),
<b>udplite</b>(7)
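The link markup did not survive in the samples above, so the following is only a sketch under a stated assumption: that each link is written as href="name.html". The function name prefix_relative_links is made up. It prefixes ../ only to targets that start with a letter, digit or underscore and contain no colon or slash, which leaves http:// URLs and links that already start with ../ alone.

```shell
# Add "../" to bare relative link targets.
# ASSUMPTION: links look like href="rds.7.html"; adjust the attribute name
# and quoting if the real files differ.
prefix_relative_links() {
  sed 's|href="\([[:alnum:]_][^":/]*\)"|href="../\1"|g' "$@"
}

# Example: prefix_relative_links page.html > page.fixed.html
```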

creating a per sample table from a vcf using bcftools

I have a multi-sample VCF file and I want to get a table with sample IDs in the left column, followed by the variants for which each sample carries an alternate allele. It should look like this:
ID1 chr2:87432:A:T_0/1 chr10:43234:C:G_1/1
ID2 chr2:87432_A:T_1/1
ID3 chr11:432434:T:G chr14:34234234:C:G chr20:34324234:T:C
The table will then be read into R.
I have tried combinations of:
bcftools query -f '[%SAMPLE\t] %CHROM:%POS:%REF:%ALT[%GT]\n'
but I keep getting sample IDs overlapping on the same line and I can't quite figure out the syntax.
Your help would be much appreciated.
You cannot achieve what you want with a single BCFtools command. BCFtools parses one VCF variant at a time. However, you can use a command like this to extract what you want:
bcftools +split -i 'GT="0/1" | GT="1/1"' -Ob -o DIR input.vcf
This will create one small .bcf file for each sample, and you can then run multiple instances of bcftools query to get what you want.
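Once the per-sample queries produce one line per sample and variant, the last step (one row per ID) can be done outside bcftools. This is a rough sketch, assuming the query output has three tab-separated columns: sample ID, variant, genotype. The function name group_by_sample and the query format shown in the comment are illustrative assumptions, not something tested against your data.

```shell
# Collapse "sample<TAB>variant<TAB>genotype" lines into one row per sample,
# keeping samples in order of first appearance.
group_by_sample() {
  awk -F '\t' '
    !($1 in row) { order[++n] = $1; row[$1] = $1 }   # first time we see this ID
    { row[$1] = row[$1] " " $2 "_" $3 }              # append variant_genotype
    END { for (i = 1; i <= n; i++) print row[order[i]] }
  '
}

# Hypothetical usage, with a format string that is an assumption:
#   bcftools query -f '[%SAMPLE\t%CHROM:%POS:%REF:%ALT\t%GT\n]' sample.bcf | group_by_sample
```

The resulting rows have a variable number of fields, so in R something like readLines plus strsplit is likely easier than read.table.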

Change one word in lots of HTML pages with another word from a list of words

I have about 2000 HTML pages (all pages have the same content except for the city name). I have the list of city names, and I need each page to have one city name.
How can I change the city name in each page?
city name list: birmingham
montgomery
mobile
huntsville
tuscaloosa
hoover.. etc...
and I need to make each page like this:
title: birmingham,
next page;
title: montgomery,
and so on.
I need the change to happen in the title, e.g. title: Example (City Name),
and in 2 h2 tags as well.
Thank you very much for your attention!
Update:
This script is for the existing files. It recursively finds all the index.html files under the current directory and replaces the "string_to_replace" string with the name of that file's parent directory, which is the city name in your case. It also capitalizes the first letter of that name before the replacement.
Feel free to update the template_string variable in the script so that it matches the string used in your index.html files in place of the city name.
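#!/bin/bash
# Placeholder text that stands in for the city name in each index.html
template_string="string_to_replace"
current_dir=$(pwd)
# Visit every index.html below the current directory
find "$current_dir" -name 'index.html' | while IFS= read -r file; do
  # The parent directory's name is the city name
  dir=$(basename "$(dirname "$file")")
  # Capitalize the first letter
  city="$(tr '[:lower:]' '[:upper:]' <<< "${dir:0:1}")${dir:1}"
  sed -i -e "s/$template_string/$city/g" "$file"
done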
Initial answer:
My initial suggestion is to use a bash script (e.g. script.sh) similar to this:
#!/bin/bash
file="./cities.txt"
template="./template.html"
template_string="string_to_replace"
# Read one city name per line and create <city>.html from the template
while IFS= read -r line
do
  cp "$template" "$line.html"
  sed -i -e "s/$template_string/$line/g" "$line.html"
  echo "$line"
done < "$file"
and run it from bash terminal:
$ source script.sh
What you need to have:
cities.txt with a list of city names, one per line, e.g.
London
Yerevan
Berlin
template.html with the html template you need to have in each file. Make sure the city name is set as "string_to_replace" in it, e.g. title: string_to_replace
Since you did not mention anything about the file names, the files will be named London.html, Yerevan.html, and so on.
Let me know if you don't need to create new files and instead need to make the replacement in existing ones. In that case we'll need to update the script a bit, once you tell me how to determine which city name belongs in which file.

Find Duplicate Function names in different files

I have been merging all of source-code files used by various developers/CAD drafters for the past 15 or so years. It appears that everyone worked off the same code base until about 7 years ago, when everyone seems to have made a local copy of all the files and used/edited them locally.
I have successfully/painfully merged all of their files with the same names back together. However, I am finding that sometimes, files with different names contain functions with the same names and parameters. Tools that are expecting one implementation of a function may end up calling a different one depending on which files were loaded when.
Is there a simple way to search all of the files for repeated function names?
For Example, a function looks like this:
(defun MyInStr (SearchIn SearchFor)
...
)
How could I search all files for (defun MyInStr (SearchIn SearchFor)
I would suggest using ctags to generate the TAGS file, then searching it for duplicate lines:
$ ctags -R
$ sort TAGS -o - | uniq -c | grep -v '^ *1 '
The above will produce output like this:
...
3 defun MyInStr (SearchIn SearchFor)
...
which will tell you that MyInStr is re-defined 3 times in the codebase with the identical signature.
You can also extract just the function name using sed, or do more complicated processing of the TAGS file with Perl, Lisp, Python or any other scripting tool.
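If ctags is not available, the name extraction can also be done directly on the source files. The following is a rough sketch, not a Lisp parser: it assumes each definition starts with (defun NAME on a single line, compares names case-sensitively, and only compares names (not parameter lists). The function name find_duplicate_defuns is made up for this example.

```shell
# List every function name that is defined more than once, together with
# the files that define it.
find_duplicate_defuns() {
  awk '
    match($0, /\(defun[ \t]+[^ \t()]+/) {
      name = substr($0, RSTART, RLENGTH)
      sub(/^\(defun[ \t]+/, "", name)          # keep only the function name
      count[name]++
      files[name] = files[name] " " FILENAME
    }
    END {
      for (n in count)
        if (count[n] > 1)
          print count[n], n ":" files[n]
    }
  ' "$@" | sort
}

# Example: find_duplicate_defuns *.lsp
```

Each output line gives the number of definitions, the name, and the files involved, so the conflicting implementations can be compared directly.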

How to extract strings from plist files for translation (localization)?

I need to prepare list of strings for translation of my iPhone application.
I have extracted strings from the *.m files using genstrings and from the XIB files using the ibtool command.
But I have also lots of texts to translate in plist files (String field types enclosed in string tag).
Is there a nice bash script / command to extract those strings into a flat txt file?
I could review and filter it so my translators can work with nice list but not with alien looking XML file.
I made a custom shell script which tries to figure out the values needed. You can then use the localize.py script in a modified way (see below) to automatically create the translation files. (The line breaks were somehow very important.) If there are more entities to be translated, the shell script can be modified accordingly.
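#!/bin/bash
# $1 is the input plist, $2 is the output file
rm -f "$2"
# Emit an entry for the <string> that follows each <key>Title</key>
sed -n 'N;/<key>Title<\/key>/{N;/<string>.*<\/string>/{s/.*<string>\(.*\)<\/string>.*/\/* \1 *\/\
"\1" = "\1";\
/p;};}' "$1" >> "$2"
# Same for <key>FooterText</key>
sed -n 'N;/<key>FooterText<\/key>/{N;/<string>.*<\/string>/{s/.*<string>\(.*\)<\/string>.*/\/* \1 *\/\
\"\1" = "\1";\
/p;}
;}' "$1" >> "$2"
# Same for each <string> inside the <array> that follows <key>Titles</key>
sed -n 'N;/<key>Titles<\/key>/{N;/<array>/{:a
N;/<\/array>/!{
/<string>.*<\/string>/{s/.*<string>\(.*\)<\/string>.*/\/* \1 *\/\
\"\1" = "\1";\
/p;}
ba
;};};}' "$1" >> "$2"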
The localize.py script needed some modification. Therefore I created a small package containing the localizer for the source code and for the plist files. The new script even handles duplicates (meaning it removes them).
We recently made a small online application to do that, please take a look on: http://www.icapps.be/plist-translator/
I can't think of any command off the top of my head. However, plists are glorified xml files and there are various parsers available for them.
It shouldn't be too difficult to create a simple python script to get all the strings from the file.
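For a quick flat list without writing a parser, something along these lines might be enough. It is only a sketch: it assumes an XML (not binary) plist with each <string>...</string> element on its own line, and the function name extract_plist_strings is made up. It prints every non-empty string value once, so you would still need to filter out identifiers that should not be translated.

```shell
# Print the unique, non-empty <string> values of one or more XML plists,
# with the common XML entities decoded.
extract_plist_strings() {
  sed -n 's|^[[:space:]]*<string>\(..*\)</string>[[:space:]]*$|\1|p' "$@" |
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' -e 's/&amp;/\&/g' |
    LC_ALL=C sort -u
}

# Example: extract_plist_strings Root.plist > strings_to_translate.txt
```

A binary plist would have to be converted to XML first; a real XML parser is the safer choice if strings span several lines.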
Does this help?
http://www.icanlocalize.com/site/tutorials/how-to-translate-plist-files/
We much prefer paying clients who use our translation system with our translators, but you can translate yourself in our GUI at no charge.