Removing newlines in Postgres dump

I'm trying to format a Postgres dump (pg_dump) so that I can import it over a JDBC connection. pg_dump exports text fields that contain newlines as exactly that, text with newlines, so when I later try to import using JDBC, I reach the end of the line and the statement fails.
What I want to do is take the dump, pass it through sed, and escape all newlines so that I end up with one INSERT statement per line. The problem is that I cannot just remove all newlines, but I can remove all newlines that do not match this: );\nINSERT INTO. Is there a simple way to do just this?
Update:
A sample would look like this:
INSERT INTO sometable (123, 'And here goes some text
with
newlines
in
it', 'some more fields');
and the result I'm looking for is something like this:
INSERT INTO sometable (123, 'And here goes some text\nwith\nnewlines\nin\nit', 'some more fields');
So that each INSERT statement is on a single line, with the string's newlines escaped.

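Not a sed solution, but might the following work? It reads the whole file at once and escapes every newline except those that sit between ); and the next INSERT INTO (or at the very end of the file):
perl -0777 -pe 's/(?<!\);)\n|\n(?!INSERT INTO|\z)/\\n/g' test_dump.txt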

You can do it in vim.
vim my_dump.sql
:%s/\();\)\@<!\n\(INSERT\)\@!//c
% .. apply to all lines
s .. substitute
\n .. newline (Unix style; note that Windows uses \r\n and classic Mac OS uses \r for line breaks)
\();\)\@<! .. negative lookbehind: the newline must not be preceded by );
\(INSERT\)\@! .. negative lookahead: the newline must not be followed by INSERT
flags:
c .. confirm each substitution (useful for testing first)
For details on negative lookahead and lookbehind, see
:help /\@!
:help /\@<!
sed normally operates on individual lines, so it has to go out of its way to replace line breaks.
Search for "sed multi-line replace" and you will find plenty of examples.
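For comparison with the line-oriented tools above, here is a minimal Python sketch of the transformation the question describes. It is not taken from any of the answers: the function name escape_embedded_newlines is hypothetical, and it assumes that a statement boundary is always a newline preceded by ); and followed by INSERT INTO (or the end of the input). A text value that happens to contain that exact sequence would be split incorrectly.

```python
import re

# A newline is treated as a statement boundary only when it follows ");"
# AND is followed by "INSERT INTO" or the end of the input. Every other
# newline is assumed to sit inside a text value (assumption, see above).
EMBEDDED_NEWLINE = re.compile(r"(?<!\);)\n|\n(?!INSERT INTO|\Z)")


def escape_embedded_newlines(dump: str) -> str:
    """Replace embedded newlines with the two characters backslash + n."""
    return EMBEDDED_NEWLINE.sub(lambda m: "\\n", dump)


if __name__ == "__main__":
    sample = (
        "INSERT INTO sometable (123, 'And here goes some text\n"
        "with\nnewlines\nin\nit', 'some more fields');\n"
        "INSERT INTO sometable (124, 'single line', 'x');\n"
    )
    print(escape_embedded_newlines(sample), end="")
```

Whether the importing side turns the escaped \n back into a real newline depends on how the string literal is parsed; in PostgreSQL that typically requires an E'...' escape string rather than a plain quoted literal.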

Related

Use sed for Mixed Case Tags

Trying to reformat tags in an XML file with GNU sed v4.7 on Windows 10 (shoot me). sed is in the path and is run from the Command Prompt, so some Windows command-line characters need to be escaped with ^.
sourcefile
BEGIN
...
<trn:description>V7906 03/11 ALFREDOCAMEL HATSWOOD 74564500125</trn:description>
...
END
(There are three spaces at the start of the line.)
Expected output:
BEGIN
...
<trn:description>V7906 03/11 Alfredocamel Hatswood 74564500125</trn:description>
...
END
I want Title Case, but this command converts the text to lower case in place:
sed -i 's/^<trn:description^>\(.*\)^<\/trn:description^>$/^<trn:description^>\L\1^<\/trn:description^>/g' sourcefile
This command changes to Title Case:
sed 's/.*/\L^&/; s/\w*/\u^&/g' sourcefile
Can this be brought together as a one-liner to edit the original sourcefile in-place?
I want to use sed because it is available on the system and the code is consistently structured. I'm aware I should use a tool like xmlstarlet as explained:
sed ... code can't distinguish a comment that talks about sessionId tags from a real sessionId tag; can't recognize element encodings; can't deal with unexpected attributes being present on your tag; etc.

Thanks to Whirlpool Forum members for the answer and discussion.
It was too hard to achieve pattern matching "within the tags" in sed and the file was well formed so the required lines were changed:
sed -i.bak '/^<trn:description^>/s/\w\+/\L\u^&/g; s/^&.*;\^|Trn:Description/\L^&/g' filename
Explanation
in-place edit saving original file with .bak extension
select lines containing <trn:description>
for one or more words
replace first character with uppercase and rest with lowercase
select strings starting with & and ending with ; or Trn:Description
restore codes by replacing characters with lowercase
source/target filename
Note: ^ is the Windows command-line escape character and is not required in other environments.
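The same idea (title-case only what sits between the tags, and leave entity codes alone) can be sketched in Python, where matching "within the tags" is easier to express. This is an illustrative sketch, not the forum answer: the function name title_case_descriptions is hypothetical, it assumes each element opens and closes on one line with no attributes, and it shares the regex-on-XML limitations quoted in the question.

```python
import re

# Assumes the element opens and closes on the same line, with no attributes.
DESCRIPTION = re.compile(r"(<trn:description>)(.*?)(</trn:description>)")
# Either an entity such as &amp; (left untouched) or a run of letters.
TOKEN = re.compile(r"&#?\w+;|[A-Za-z]+")


def title_case_descriptions(text: str) -> str:
    """Title-case only the text between <trn:description> tags."""

    def fix_token(m):
        token = m.group(0)
        # Entities start with "&" and must keep their original spelling.
        return token if token.startswith("&") else token.capitalize()

    def fix_element(m):
        return m.group(1) + TOKEN.sub(fix_token, m.group(2)) + m.group(3)

    return DESCRIPTION.sub(fix_element, text)


if __name__ == "__main__":
    line = "<trn:description>V7906 03/11 ALFREDOCAMEL HATSWOOD 74564500125</trn:description>"
    print(title_case_descriptions(line))
```

Because the tags themselves are never passed through the case conversion, there is no need for a second pass that restores Trn:Description to lower case.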

Why won't the tab be inserted on the first added line?

I am trying to add multiple lines to a file, all with a leading tab. The lines should be inserted on the first line after matching a string.
Assume a file with only one line, called "my-file.txt" as follows:
foo
I have tried the following sed command:
sed "/^foo\$/a \tinsert1\n\tinsert2" my-file.txt
This produces the following output:
foo
tinsert1
insert2
Notice how the tab that should be on the first (inserted) line is omitted. Instead it prints an extra leading 't'.
Why? And how can I change my command to print the tab on the first line, as expected?

With GNU sed (note the single quotes around the script):
sed '/^foo$/a \\tinsert1\n\tinsert2' file
Produces:
foo
insert1
insert2
From the manual:
a \
text Append text, which has each embedded newline preceded by a backslash.
Since the text to be appended must itself be preceded by a backslash, it needs to be \\t at the beginning.
PS: If you need to use double quotes around the sed command because you want to inject shell variables, you need to escape the \ which precedes the text to be appended:
ins1="foo"
ins2="bar"
sed "/^foo\$/a \\\t${ins1}\n\t${ins2}" file

sed is for doing s/old/new on individual strings, that is all. Just use awk:
$ awk '{print} $0=="foo"{print "\tinsert1\n\tinsert2"}' file
foo
insert1
insert2
The above will work using any awk in any shell on every UNIX box and is trivial to modify to do anything else you might want to do in future.
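The awk one-liner's logic (print every line, and after an exact match print the extra lines) can also be written as a short Python sketch, which avoids the shell and sed quoting layers entirely. The function name append_after_match is hypothetical and the input is assumed to be a list of lines without trailing newlines.

```python
def append_after_match(lines, target, additions):
    """Return a new list with `additions` inserted after each line equal to `target`."""
    result = []
    for line in lines:
        result.append(line)
        if line == target:  # exact match, like $0=="foo" in the awk version
            result.extend(additions)
    return result


if __name__ == "__main__":
    original = ["foo"]
    for line in append_after_match(original, "foo", ["\tinsert1", "\tinsert2"]):
        print(line)
```

Here "\t" is an ordinary Python string escape, so the tab reaches the output without any extra backslashes.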

sed command over multiple lines not working

I am using sed to replace 14 different abbreviations like CA_23456, CB_scaffold34532,... with 'proper' names in a file and it works putting it all on one line.
acc=$1
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/;s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/;s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/;s/CS_[A-Z]*[a-z]*[0-9]*/Cupressus_sempervirens/;s/CT_[A-Z]*[a-z]*[0-9]*/Cupressus_torulosa/;s/JD_[A-Z]*[a-z]*[0-9]*/Juniperus_drupacea/;s/JF_[A-Z]*[a-z]*[0-9]*/Juniperus_flaccida/;s/JI_[A-Z]*[a-z]*[0-9]*/Juniperus_indica/;s/JP_[A-Z]*[a-z]*[0-9]*/Juniperus_phoenicea/;s/JX_[A-Z]*[a-z]*[0-9]*/Juniperus_procera/;s/JS_[A-Z]*[a-z]*[0-9]*/Juniperus_scopulorum/;s/MD_[A-Z]*[a-z]*[0-9]*/Microbiota_decussata/;s/XN_[A-Z]*[a-z]*[0-9]*/Xanthocyparis_nootkatensis/;s/XV_[A-Z]*[a-z]*[0-9]*/Xanthocyparis_vietnamensis/' ${acc}.nex > ${acc}_replaced.nex
To make it more readable, I'd like to split the command over multiple lines using '\' (not all the replacements are shown, for brevity):
acc=$1
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/;\
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/;\
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/'\
${acc}.nex > ${acc}_replaced.nex
However, I get an error message: sed: -e expression #1, char 168: unterminated address regex. I have looked at the answers to similar problems on various webforums and tried various things (using 's/.../.../' on every line, leaving ';' out,....) but I can't get it to work. What am I doing wrong?

Drop the \ characters at the ends of the lines. (Inside single quotes they do not actually escape the newlines; sed sees them and treats them as invalid syntax.) However, I would suggest putting the commands into a file and running it like this:
sed -f script.sed input
where script.sed looks like this:
s/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/

Remove the backslashes from the sed code.
Inside single-quoted shell strings, backslashes are not needed to escape newlines, and the shell does not remove them because it does not parse them as escape characters there. As a result, sed sees them as part of its script and expects an address regex with a delimiter other than / (similar to \,/home/, !d) before the command ends at the next newline. No such address regex appears (nor an associated command), so sed complains about invalid code.
Apart from that: The semicolons in the sed code are no longer necessary when you terminate commands with newlines, and anything involving shell variables should be quoted to avoid splitting in case of whitespace.
In sum:
sed -e 's/CA_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_arizonica/
s/CB_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_bakeri/
s/CM_[A-Z]*[a-z]*[0-9]*/Hesperocyparis_macrocarpa/' \
"${acc}.nex" > "${acc}_replaced.nex"
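When the list of replacements grows to fourteen entries, a lookup table can be easier to maintain than fourteen near-identical s commands. The following Python sketch illustrates that alternative; it is not part of the answers above, the names NAMES, ABBREVIATION and expand_names are hypothetical, and only the three pairs shown in the question are filled in.

```python
import re

# Prefix -> replacement name; only three of the fourteen pairs are shown.
NAMES = {
    "CA": "Hesperocyparis_arizonica",
    "CB": "Hesperocyparis_bakeri",
    "CM": "Hesperocyparis_macrocarpa",
}

# Same shape as the sed patterns: PREFIX_[A-Z]*[a-z]*[0-9]*
ABBREVIATION = re.compile(r"(" + "|".join(NAMES) + r")_[A-Z]*[a-z]*[0-9]*")


def expand_names(line: str) -> str:
    """Replace every known abbreviation in `line` with its full name."""
    return ABBREVIATION.sub(lambda m: NAMES[m.group(1)], line)


if __name__ == "__main__":
    print(expand_names("CA_23456 CB_scaffold34532"))
```

One difference from the sed script: this sketch replaces every occurrence on a line, whereas each s command without the g flag replaces only the first match of its pattern per line.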

Export text content to text file without \n mark

When I try to export the text content of a field, and that content contains carriage return characters, those characters are output as the string \N.
For example:
create table foo ( txt text );
insert into foo ( txt ) values ( 'first line
second line
...
and other lines');
copy foo ( txt ) to '/tmp/foo.txt';
I want to return the following (a):
first line
second line
...
and other lines
But, output is (b):
first line\Nsecond line\N...\Nand other lines
Does anybody know how to get the (a) output?

The \N comes from the fact that one line must correspond to one database row.
This rule is relaxed for the CSV format where multi-line text is possible but then a quote character (by default: ") would enclose the text.
If you want multi-line output and no enclosing character around it, you shouldn't use COPY but SELECT.
Assuming a unix shell as the execution environment of the caller, you could do:
psql -A -t -d dbname -c 'select txt from foo' >/tmp/file.txt

Have you tried: \r\n?
Here's another solution that might work:
E'This is the first part \\n And this is the second'
via https://stackoverflow.com/a/938/1085891
Also, rather than copy the other responses, see here: String literals and escape characters in postgresql

What's the clearest way to replace trailing backslash \ with \n?

I want multi-line strings in Java, so I am looking for a simple preprocessor to convert C-style multi-lines into single lines with a literal '\n'.
Before:
System.out.println("convert trailing backslashes\
this is on another line\
\
\
above are two blank lines\
But don't convert non-trailing backslashes, like: \"\t\" and \'\\\'");
After:
System.out.println("convert trailing backslashes\nthis is on another line\n\n\nabove are two blank lines\nBut don't convert non-trailing backslashes, like: \"\t\" and \'\\\'");
I thought sed would do it well, but sed is line-based, so replacing the '\' and the newline that follows it (effectively joining the two lines) is not very natural in sed. I adapted sredden79's oneliner to the following - it works, it's clever, but it's not clear:
sed ':a { $!N; s/\\\n/\\n/; ta }'
The substitute is of escaped literal backslash, newline with escaped literal backslash, n. :a is a label and ta is goto label if the substitute found a match; $ means the last line, and $! is the opposite (i.e. all lines but the last). N means to append the next line to the pattern space (thus making the \n character visible.)
EDIT here's a variation to keep compiler error line numbers etc accurate: it turns each extended line into "..."+\n (and handles the first and last lines of the String correctly):
sed ':a { $!N; s/\\\n/\\n"+\n"/; ta }'
giving:
System.out.println("convert trailing backslashes\n"+
"this is on another line\n"+
"\n"+
"\n"+
"above are two blank lines\n"+
"But don't convert non-trailing backslashes, like: \"\t\" and \'\\\'");
EDIT Actually, it would be better to have Perl/Python-style multi-line strings, which start and end with a special code on one line (""" for Python, I think).
Is there a simpler, saner, clearer way (maybe not using sed)?

"Is there a simpler, saner, clearer way?"
Forget the pre-processor, live with the limitation, complain about it (so that it will maybe be fixed in Java 7 or 8), and use an IDE to ease the pain.
Other alternatives (too troublesome I suppose, but still better than messing with the compilation process):
use a JVM-based language that does support here-docs
externalize the string into a resource file

A perl one-liner:
perl -0777 -pe 's/\\\n/\\n/g'
This will read either stdin or the file(s) named after it on the command line and write the output to stdout.
If you're using an editor that supports filtering, like vi or emacs, just filter your text through the above command and you're done.
If you're using Windows and have to worry about \r :
C:\> perl -0777 -pe "s/\\\r?\n/\\n/g"
although I think win32 Perl handles \r itself so this may be unnecessary.
The -0777 option is a special case of the -0 (that's a zero) option that defines the line or record separator. In this case, it means that we don't want any separator so read the entire file in as a single string.
The -pe option is a combination of -p (process line-by-line and print the result) and -e (next argument is (a line of) the program to execute)
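The slurp-then-substitute approach carries over directly to other languages. As a minimal sketch, assuming the whole source file fits in memory as one string, the Python function below does the same replacement; the name join_continuations and its keep_line_count parameter are hypothetical. The optional flag mirrors the question's second variant, which emits "..."+ line breaks so compiler line numbers stay accurate.

```python
def join_continuations(source: str, keep_line_count: bool = False) -> str:
    """Turn each trailing backslash + line break into a literal backslash-n.

    With keep_line_count=True the line break is kept by closing and
    reopening the string literal, as in the question's second sed variant.
    """
    replacement = '\\n"+\n"' if keep_line_count else "\\n"
    # Handle Windows line endings first, then plain Unix ones.
    return source.replace("\\\r\n", replacement).replace("\\\n", replacement)


if __name__ == "__main__":
    java = 'System.out.println("first line\\\nsecond line\\\n\\\nafter a blank line");'
    print(join_continuations(java))
    print(join_continuations(java, keep_line_count=True))
```

Backslashes that are not immediately followed by a line break, such as \" or \t inside the string, are left untouched because only the exact backslash-plus-newline pair is replaced.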

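A Perl script to do what you asked for:
while (<>) {
    chomp;                # strip the trailing newline
    print $_;
    if (/\\$/) {
        print "n";        # line ended with a backslash: complete it to a literal \n
    } else {
        print "\n";       # ordinary line: restore the newline
    }
}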

sed 's/\x5c\x5c$/\x22\x5c\x5cn\x22/'
The hex codes for backslash and double quote are \x5c and \x22 respectively. The backslash needs to be escaped, so \x5c is doubled, and the $ anchors the match to the end of the line.
Updated again per OP comment:
sed "{:a;N;\$!b a};s/\x5c\x5c\n/\x5c\x5cn/g"
The :a creates a label and the N appends a line to the pattern space; b a branches back to the label :a except on the last line ($!).
Once everything is loaded, a single substitution replaces every backslash followed by a newline with a literal '\n', using the hex ASCII code \x5c for the backslash.
