replace capture match with capture group in bash GNU sed - sed

I've looked for a solution in the posts listed below, but my regex seems to be quite different and needs special care:
How to output only captured groups with sed
Replace one capture group with another with GNU sed (macOS) 4.4
sed replace line with capture groups
I'm trying to replace a regex capture group in a big JSON file.
The file contains objects exported from MongoDB, and I want to replace each ObjectId wrapper with the plain string:
{"_id":{"$oid":"56cad2ce0481320c111d2313"},"recordId":{"$oid":"56cad2ce0481320c111d2313"}}
So the output in the original file should look like this:
{"_id":"56cad2ce0481320c111d2313","recordId":"56cad2ce0481320c111d2313"}
That's the command I run in the shell:
sed -i 's/(?:{"\$oid":)("\w+")}/\$1/g' data.json
I get no error, but the file remains the same.
What exactly am I doing wrong?

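I finally managed to make it work. The regex syntax sed uses is different from the one in the regexr.com tester tool: by default sed uses POSIX basic regular expressions, where capture groups are written \( \), there is no (?:...) non-capturing group, and the back-reference in the replacement is \1 rather than $1.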
echo '{"$oid":"56cad2ce0481320c111d2313"}' | sed 's/{"$oid":\("\w*"\)}/\1/g'
gives the correct output:
"56cad2ce0481320c111d2313"
I found it even better to read from stdin and write straight to the output file, instead of first writing the JSON file, then reading it, replacing, and writing it again.
Since I use mongoexport to export the collection, replace the ObjectId, and write the output to a JSON file, my final solution looks like this:
mongoexport --host localhost --db myDB --collection my_collection | sed 's/{"$oid":\("\w*"\)}/\1/g' >> data.json
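As a rough illustration of the syntax difference, here is the same substitution written once in basic syntax (the sed default) and once in extended syntax (-E). The sample line is the one from the question; [0-9a-f] is used instead of \w only to avoid relying on a GNU extension, which is my own choice and not part of the original answer.

```shell
# Sample line in the shape shown in the question.
line='{"_id":{"$oid":"56cad2ce0481320c111d2313"},"recordId":{"$oid":"56cad2ce0481320c111d2313"}}'

# Basic syntax (default): groups are \( \), braces and a mid-pattern $ are literal.
bre=$(printf '%s\n' "$line" | sed 's/{"$oid":\("[0-9a-f]*"\)}/\1/g')

# Extended syntax (-E): groups are plain ( ), so literal { } and $ are escaped.
ere=$(printf '%s\n' "$line" | sed -E 's/\{"\$oid":("[0-9a-f]*")\}/\1/g')

printf '%s\n%s\n' "$bre" "$ere"
```

Both variants use \1 in the replacement; $1 is Perl/JavaScript notation and is treated by sed as literal text.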


What does "*#" mean after executing a command in PostgreSQL 10 on Windows 7?

I'm using PostgreSQL on Windows 7 through the command line. I want to import the content of different CSV files into a newly created table.
The prompt used to look like this:
database=#
Now it looks like this:
database*# after executing:
type directory/*.csv | psql -c 'COPY sch.trips(value1, value2) from stdin CSV HEADER';
What does *# mean?
Thanks
This answer is for Linux and as such doesn't answer OP's question for Windows. I'll leave it up anyway for anyone that comes across this in the future.
You accidentally started a block comment with your type directory/*.csv. type doesn't do what you think it does. From the bash built-ins:
With no options, indicate how each name would be interpreted if used as a command name.
Try doing cat instead:
cat directory/*.csv | psql -c 'COPY sch.trips(value1, value2) from stdin CSV HEADER';
If this gives you issues because each CSV has its own header, you can also do:
for file in directory/*.csv; do cat "$file" | psql -c 'COPY sch.trips(value1, value2) from stdin CSV HEADER'; done
Type Command
The type built-in command in Bash is a way of viewing command interpreter results. For example, using it with ssh:
$ type ssh
ssh is /usr/bin/ssh
This indicates how ssh would be interpreted when you run ssh as a command in the current Bash environment. This is useful for things like aliases. As an example for this, ll is usually an alias to ls -l. Here's what my Bash environment had for ll:
$ type ll
ll is aliased to `ls -l --color=auto'
For you, when you pipe the result of this command to psql, it encounters the /* in the input and assumes it's a block comment, which is what the database*# prompt means (the * indicates it's waiting for the comment close pattern, */).
Cat Command
cat is for concatenating multiple files together. By default, it writes to standard out, so cat directory/*.csv will write each CSV file to standard out one after another. However, piping this means that each CSV's header will also be piped mid-stream of the copy. This may not be desirable, so:
For Loop
We can use for to loop over each file and import it individually. The version above, for file in directory/*.csv, handles file names with spaces correctly because "$file" is quoted. Properly formatted:
for file in directory/*.csv; do
    cat "$file" | psql -c 'COPY sch.trips(value1, value2) from stdin CSV HEADER'
done
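If a single COPY is preferred over one psql call per file, the repeated headers can be dropped while concatenating. The following is only a sketch: the temporary directory and the two sample files are made up for illustration, and awk is my substitution for the loop above, not something the original answer used.

```shell
# Hypothetical sample data: two CSV files that each carry their own header.
dir=$(mktemp -d)
printf 'value1,value2\n1,2\n' > "$dir/a.csv"
printf 'value1,value2\n3,4\n' > "$dir/b.csv"

# FNR restarts at 1 for every file, NR keeps counting across files,
# so this skips the first line of every file except the very first one.
merged=$(awk 'FNR == 1 && NR != 1 { next } { print }' "$dir"/*.csv)

printf '%s\n' "$merged"
rm -rf "$dir"
```

The merged stream has exactly one header line, so it could be piped to the same psql -c 'COPY ... CSV HEADER' command shown above.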
References
PostgreSQL 10 Comments Documentation (postgresql.org)
type built-in Manual page (mankier.com)
cat Manual page (mankier.com)
Bash looping tutorial (tldp.org)

Wrong copy command output with postgres

I'm trying to use the copy command to copy the content of a file into a database.
One of the lines contains this:
CCc1ccc(cc1)C(=O)/N=c\1/n(ccs1)C
When I insert this into the database with a normal INSERT, there are no errors.
But when I use the following command, this line is not inserted correctly:
cat smile_test.txt | psql -c "copy testzincsmile(smile) from stdout" teste
This is what I get (it is wrong):
CCc1ccc(cc1)C(=O)/N=c/n(ccs1)C
What's wrong here?
Thank you :)
copy expects a specific input format and cannot just be used to read random text from a file into a field.
See the manual.
The specific issue you're hitting is probably a backslash being interpreted as an escape by the default copy in/out format.
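I figured out how to do this. The sed step deletes the first line of the file (1d) and doubles every backslash, so that copy reads each one as a literal backslash instead of an escape: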
cat smile_test.txt | sed '1d; s/\\/\\\\/g' | psql -c "copy testzincsmile(smile) from stdout" teste
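To see what the backslash-doubling part does on its own, here is a small sketch that runs it on the line from the question, without psql and without the 1d step.

```shell
# The problematic line from the question; single quotes keep the backslash as is.
smile='CCc1ccc(cc1)C(=O)/N=c\1/n(ccs1)C'

# In the sed script, \\ matches one literal backslash and \\\\ emits two.
escaped=$(printf '%s\n' "$smile" | sed 's/\\/\\\\/g')

printf '%s\n' "$escaped"
```

With the backslash doubled, the text format of copy decodes \\ back to a single \ when loading the row.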

Inserting numbers with sed in Linux?

I run the following on the command line:
sed -e '1s/^/\\documentstyle\[11pt\]\{article\}\n/' -e 's/[0-9]//g' test.txt
My desired output is something like this
\documentstyle[11pt]{article}
rest of the file
However I only get this
\documentstyle[pt]{article}
rest of the file
I can't seem to find a way to insert numbers. I tried backslashing. Solution might be simple, but I'm a newbie with sed.
Note that sed has more commands than just s///. To insert a line at the top of a file:
sed -e '1i\
\\\documentstyle[11pt]{article}' -e 's/[0-9]//g' file
(frustratingly, the number of backslashes to achieve a backslash in the output was found by trial and error)
The bonus is that the inserted line is not affected by the second command, so your goal of removing numbers from the rest of the file still works.
My second command removes the digits, which works as intended; I was just trying to do everything in one invocation. Credits to Jonathan Leffler.
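An alternative that sidesteps the backslash counting entirely is to print the header outside of sed and let sed handle only the file body. This is a different technique from the 1i approach above, shown here as a sketch; the two-line body is made-up sample input standing in for test.txt.

```shell
# Hypothetical stand-in for the contents of test.txt.
body=$(printf 'a1b\nc22d\n')

# printf '%s\n' prints its argument verbatim, so the backslash and the
# digits in the header never pass through the digit-removing sed command.
out=$( {
  printf '%s\n' '\documentstyle[11pt]{article}'
  printf '%s\n' "$body" | sed 's/[0-9]//g'
} )

printf '%s\n' "$out"
```

With a real file, the inner pipeline would simply be sed 's/[0-9]//g' test.txt.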

Extracting the contents between two different strings using bash or perl

I have tried to scan through the other posts in stack overflow for this, but couldn't get my code work, hence I am posting a new question.
Below is the content of file temp.
<?xml version="1.0" encoding="UTF-8"?>
<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/<env:Body><dp:response xmlns:dp="http://www.datapower.com/schemas/management"><dp:timestamp>2015-01-
22T13:38:04Z</dp:timestamp><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response></env:Body></env:Envelope>
This file contains the base64-encoded contents of two files named test.txt and test1.txt. I want to extract the base64-encoded content of each file into separate files, test.txt and test1.txt respectively.
To achieve this, I have to remove the XML tags around the base64 contents. I am trying the commands below, but they are not working as expected.
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g' > test.txt
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g' > test1.txt
Below command:
sed -n '/test.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test.txt">##g'|perl -p -e 's#</dp:file>##g'
produces output:
XJzLXJlc3VsdHMtYWN0aW9uX18i
<dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:response> </env:Body></env:Envelope>
However, in the output I am expecting only the first line, XJzLXJlc3VsdHMtYWN0aW9uX18i. Where am I making a mistake?
When I run the command below, I get the expected output:
sed -n '/test1.txt"\>/,/\<\/dp:file\>/p' temp | perl -p -e 's#<dp:file name="temporary://test1.txt">##g'|perl -p -e 's#</dp:file></dp:response></env:Body></env:Envelope>##g'
It produces below string
lc3VsdHMtYWN0aW9uX18i
I can then easily route this to test1.txt file.
UPDATE
I have edited the question by updating the source file content. The source file doesn't contain any newline character. The current solution does not work in that case; I have tried it and it failed. wc -l temp must output 1.
OS: solaris 10
Shell: bash
sed -n 's_<dp:file name="\([^"]*\)">\([^<]*\).*_\1 -> \2_p' temp
I added \1 -> to show the link from file name to content; for the content only, just remove this part.
This is a POSIX-compatible version (with GNU sed you can pass --posix).
It assumes that the base64-encoded content is on the same line as the surrounding tag; if it is spread over several lines, the command needs some modification.
Thanks to JID for the full explanation below.
How it works
sed -n
The -n means no printing so unless explicitly told to print, then there will be no output from sed
's_
This is to substitute the following regex using _ to separate regex from the replacement.
<dp:file name=
Regular text
"\([^"]*\)"
The escaped parentheses form a capture group; they must be escaped unless the -r option is used (-r is not available in POSIX sed). Everything inside them is captured. [^"]* means 0 or more occurrences of any character that is not a quote. So this just captures everything between the two quotes.
>\([^<]*\)
Again a capture group, this time capturing everything after the > up to the next <
.*
Everything else on the line
_\1 -> \2
This is the replacement: everything the regex matched is replaced with the first capture group, then a ->, and then the second capture group.
_p
Means print the line
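When all elements sit on one physical line, one possible approach is to isolate each dp:file element first and then apply the same capture-group idea per element. The sketch below assumes a grep that supports -o (GNU or BSD grep; the stock Solaris 10 grep may not), and the shortened XML string is a stand-in for the real file.

```shell
# Shortened, hypothetical single-line input in the shape shown in the question.
xml='<dp:response><dp:file name="temporary://test.txt">XJzLXJlc3VsdHMtYWN0aW9uX18i</dp:file><dp:file name="temporary://test1.txt">lc3VsdHMtYWN0aW9uX18i</dp:file></dp:response>'

# grep -o prints every match on its own line: the opening tag plus its content.
# sed then keeps the file name (group 1) and the base64 text (group 2).
pairs=$(printf '%s\n' "$xml" \
  | grep -o '<dp:file name="[^"]*">[^<]*' \
  | sed 's_<dp:file name="temporary://\([^"]*\)">\(.*\)_\1 \2_')

printf '%s\n' "$pairs"
```

Each output line then holds a file name and its content, which a while read name content loop could write to separate files.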
Resources
http://unixhelp.ed.ac.uk/CGI/man-cgi?sed
http://www.grymoire.com/Unix/Sed.html
/usr/xpg4/bin/sed works well here.
/usr/bin/sed does not work as expected when the file contains just one line.
The command below works for a file containing only a single line.
/usr/xpg4/bin/sed -n 's_<env:Envelope\(.*\)<dp:file name="temporary://BackUpDir/backupmanifest.xml">\([^>]*\)</dp:file>\(.*\)_\2_p' securebackup.xml 2>/dev/null
Without 2>/dev/null this sed command outputs the warning sed: Missing newline at end of file.
This is because of the following:
Solaris default sed ignores the last line not to break existing scripts because a line was required to be terminated by a new line in the original Unix implementation.
GNU sed has a more relaxed behavior, and the POSIX implementation accepts the input but outputs a warning.
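One way to avoid both the ignored last line and the warning is to make sure the file ends with a newline before handing it to sed. This is a sketch of my own, not part of the original answer; the temporary file stands in for securebackup.xml.

```shell
# Hypothetical one-line file without a trailing newline.
f=$(mktemp)
printf '<a>no newline</a>' > "$f"

# tail -c 1 emits the last byte; wc -l counts newlines in it.
# A count of 0 means the final newline is missing, so append one.
if [ "$(tail -c 1 "$f" | wc -l)" -eq 0 ]; then
  printf '\n' >> "$f"
fi

lines=$(wc -l < "$f")
printf '%s\n' "$lines"
rm -f "$f"
```

Running the check a second time leaves the file unchanged, so it is safe to apply unconditionally before sed.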

Sed command to fetch particular string from full string

I've got a file which contains a lot of strings like the input below.
I need to extract the output shown below and process it further.
Input:
History={ExecAt=[2013-05-03 03:00:20,2013-05-03 03:00:23,2013-05-03 03:00:26],MId=["msgId3","msgId4","msgId5"]};
Output should be:
MId=["msgId3","msgId4","msgId5"]
Using the command sed 's/^.*,MId=/MId=/' I got output like MId=["msgId3","msgId4","msgId5"]};
but I still want the exact output (I need to remove the last two characters, }; here).
This works for me:
sed 's/.*\(MId=.*\)\}.*/\1/'
If your grep supports the -o option, you can use it rather than sed:
grep -o 'MId=\[[^]]\+\]'
Using the same regex in sed works fine, just remove anything before and after:
sed -e 's/.*\(MId=\[[^]]\+\]\).*/\1/'
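As a quick check, here is the sed variant run against the sample input from the question. It uses * instead of \+ so that it does not depend on a GNU extension; that change is mine, not the answerer's.

```shell
# Sample input from the question.
input='History={ExecAt=[2013-05-03 03:00:20,2013-05-03 03:00:23,2013-05-03 03:00:26],MId=["msgId3","msgId4","msgId5"]};'

# \[ and \] match literal brackets; [^]]* matches everything up to the closing ].
mid=$(printf '%s\n' "$input" | sed 's/.*\(MId=\[[^]]*\]\).*/\1/')

printf '%s\n' "$mid"
```

The leading and trailing .* consume everything outside the group, so only the MId=[...] part survives.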