Correct syntax to run this as a sed script

The sed below works on the command line and does the following
sed -n '/CREATE EXTERNAL/,/^[)]/{/^[()]/d;p}' ddl.txt
Print all lines from the file ddl.txt between a line containing CREATE EXTERNAL and a line starting with ), except those that start with ( or ).
As you may note, the exception for lines starting with ( or ) is handled as an embedded address; lines which match it are deleted when encountered.
All attempts I have made to put this into a file that can be run with sed -f have failed. I would be grateful if someone can show me the right syntax.
Here is some test data
CREATE EXTERNAL TABLE foo.bar
(
mykey INT,
mydata VARCHAR(20),
lastfield VARCHAR(20)
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\137'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
which should yield
CREATE EXTERNAL TABLE foo.bar
mykey INT,
mydata VARCHAR(20),
lastfield VARCHAR(20)
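For the record, the form that usually works as a script file is the same program with the shell quotes dropped and one command per line (script.sed is a hypothetical file name; a first line of #n replaces the -n flag):
#n
/CREATE EXTERNAL/,/^[)]/{
/^[()]/d
p
}
Run it with sed -f script.sed ddl.txt. Equivalently, leave out the #n line and keep the flag on the command line: sed -n -f script.sed ddl.txt.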

Related

Exporting new line characters to text in Postgres

There is a text field in a Postgres database containing new lines. I would like to export the content of that field to a text file, preserving those new lines. However, the COPY TO command explicitly transforms those characters into the \n string. For example:
$ psql -d postgres -c "COPY (SELECT CHR(10)) TO '/tmp/out.txt';"
COPY 1
$ cat /tmp/out.txt
\n
This behaviour seems to match the description in the documentation:
Presently, COPY TO will never emit an octal or hex-digits backslash sequence, but it does use the other sequences listed above for those control characters.
Is there any workaround to get the new line in the output? E.g. so that a command like:
$ psql -d postgres -c "COPY (SELECT 'A line' || CHR(10) || 'Another line') TO '/tmp/out.txt';"
Results in something like:
A line
Another line
Update: I do not wish to obtain a CSV file. The output must not have headers, column separators or column decorators such as quotes (exactly as exemplified in the output above). The answers provided in a different question with COPY AS CSV do not fulfil this requirement.
Per my comment:
psql -d postgres -U postgres -c "COPY (SELECT CHR(10)) TO '/tmp/out.txt' WITH CSV;"
Null display is "NULL".
COPY 1
cat /tmp/out.txt
"
"
psql -d postgres -U postgres -c "COPY (SELECT 'A line' || CHR(10) || 'Another line') TO '/tmp/out.txt' WITH CSV;"
Null display is "NULL".
COPY 1
cat /tmp/out.txt
"A line
Another line"
Using the CSV format will preserve the embedded line breaks in the output. This is explained in the COPY documentation, under CSV Format:
The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character. You can also use FORCE_QUOTE to force quotes when outputting non-NULL values in specific columns.
...
Note
CSV format will both recognize and produce CSV files with quoted values containing embedded carriage returns and line feeds. Thus the files are not strictly one line per table row like text-format files.
UPDATE
Alternate method that does not involve quoting, using psql.
create table line_wrap(id integer, fld_1 varchar);
insert into line_wrap values (1, 'line1
line2');
insert into line_wrap values (2, 'line3
line4');
select fld_1 from line_wrap
\g (format=unaligned tuples_only=on) out.txt
cat out.txt
line1
line2
line3
line4
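As an aside, the parenthesized option list to \g is a newer psql feature (added around psql 12, to the best of my recollection). A sketch of the same extraction as a plain one-liner, assuming the line_wrap table above:
psql -At -d postgres -c 'select fld_1 from line_wrap' > out.txt
-A turns off aligned formatting and -t suppresses headers and footers, so the file contains only the raw field values, embedded newlines included.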

sed search and replace number with another on specific line

I would like to replace in multiple dump files (*.sql):
`this_pattern` varchar(255)
by
`this_pattern` varchar(100)
i.e. only when "this_pattern" is present on the line.
I tried simply with:
echo "varchar(255)" | sed -e 's/\(255\)/100/g'
but I don't know how to express the this_pattern condition.
sed "/^`this_pattern`/ s/\(255\)/100/g"
did nothing.
Thanks for your help.
Try this, using a capturing group ( ):
$ cat file
`this_pattern` varchar(255)
varchar(255)
$ sed -E '/^`this_pattern`/ s/(varchar)\(255\)/\1(100)/g' file
`this_pattern` varchar(100)
varchar(255)
echo 'this_pattern varchar(255)' | sed '/^this_pattern.*/ s/varchar(255)/varchar(100)/g'
Without the -r (or -E) switch, sed treats round parens just as literal characters.
If this_pattern is wrapped in backticks, as in the original post, it's a bit more involved, though the backticks may well not be in the actual files. However, here we go:
echo '`this_pattern` varchar(255)' | sed '/^`this_pattern` .*/ s/varchar(255)/varchar(100)/g'
`this_pattern` varchar(100)
Note: There might be variable whitespace between 'this_pattern' and 'varchar', between 'varchar' and '(', and so on, so you have to be careful about this. Sometimes it's a one-time job over dozens of lines; sometimes it's a repeated job. Sometimes the input is machine-generated and can be expected to always look the same; sometimes it's hand-written and may differ from time to time. You have to take that into account when doing massive updates.
Keeping a copy of the source is often a good idea. Many editors also allow regular expressions for mass renaming, and they mark the matches so you can review the changes before they are made.
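Tying those notes together, here is a sketch that tolerates variable whitespace and keeps a backup of each file (assuming GNU sed; the .bak suffix and the *.sql glob are illustrative):
sed -E -i.bak 's/^(`this_pattern`[[:space:]]+varchar[[:space:]]*)\(255\)/\1(100)/' *.sql
[[:space:]]+ and [[:space:]]* absorb whatever spacing sits between the identifier, varchar, and the opening paren, and -i.bak leaves the original next to the edited file.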

How to safely unload/copy a table in RedShift?

In RedShift, it is convenient to use unload/copy to move data to S3 and load it back into Redshift, but I find it hard to choose the delimiter each time. The right delimiter depends on the content of the table! I had to change the delimiter each time I hit load errors.
For example, when I use the following command to unload/copy a table:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes;
I will get a load error if the table happens to have a field containing "||". Then I have to change the delimiter '|' to another one like ',' and try again; if I'm unlucky, it may take multiple tries to succeed.
I'm wondering if there's a way to unload/copy a Redshift table that is independent of the content of the table, one that will always succeed no matter what weird strings are stored in it.
Finally I figured out the right approach: add escape to both the unload and copy commands:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' addquotes escape allowoverwrite;
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter '|' removequotes escape;
With escape in the unload command, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are specified in the UNLOAD command).
And with escape in the copy command, the backslash character (\) in input data is treated as an escape character. The character that immediately follows the backslash character is loaded into the table as part of the current column value, even if it is a character that normally serves a special purpose. For example, you can use this option to escape the delimiter character, a quote, an embedded newline, or the escape character itself when any of these characters is a legitimate part of a column value.
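As a purely hypothetical illustration (this example is mine, not from the docs): with delimiter '|' plus addquotes and escape, a column value of a|b"c would come out in the unload file along the lines of:
"a\|b\"c"
and copy ... removequotes escape undoes both transformations on the way back in, which is why the round trip no longer depends on what the data contains.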
Try an unload like the one below:
unload ('select * from tbl_example') to 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter as ',' addquotes escape
To load it back, use the following:
copy tbl_example2 from 's3://s3bucket/tbl_example' CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=xxx' delimiter ',' removequotes escape;
This will work even if your data contains commas.
Since this topic comes up in many places, we decided to package up the UNLOAD/extract process into a Docker service. All the code is on GitHub so you can use it as-is or grab the underlying Python code to create your own version: https://github.com/openbridge/ob_redshift_unload
You can set the delimiter, dates, and ad hoc SQL via run-time configuration. It will also export a header row, something that is a little more complicated to do with UNLOAD alone.
Here are a few of the runtime options:
-t: The table you wish to UNLOAD
-f: The S3 key at which the file will be placed
-s (Optional): The file you wish to read a custom valid SQL WHERE clause from. This will be sanitized then inserted into the UNLOAD command.
-r (Optional): The range column you wish to use to constrain the results. Any type supported by Redshift's BETWEEN function is accepted here (date, integer, etc.)
-r1 (Optional): The desired start range to constrain the result set
-r2 (Optional): The desired end range to constrain the result set
Note: -s and -d are mutually exclusive and cannot be used together. If neither is used, the script will default to not specifying a WHERE clause and will output the entire table.
Then you can run it like this to UNLOAD:
docker run -it -v /local/path/to/my/config.json:/config.json openbridge/ob_redshift_unload python /unload.py -t mytable -f s3://dest-bucket/foo/bar/output_file.csv -r datecol -r1 2017-01-01 -r2 2017-06-01
The goal was to enhance the default UNLOAD process and wrap it into something that can help ensure consistency in generating outputs.
Here is a write-up that details the features/capabilities:
https://blog.openbridge.com/how-to-easily-extract-data-from-amazon-redshift-4e55435f7003

PostgreSQL. Import data skip n lines

I would like to know how I can import my data into a table. I know the COPY command and its HEADER option. But the file I have to import has the following format:
Line 1: header1, header2, header3,...
Line 2: vartype, vartype, vartype,...
Line 3: data1, data2,...
As you can see, I need to skip the second line too. For example:
"phonenumber","countrycode","firstname","lastname"
INTEGER,INTEGER,VARCHAR(50),VARCHAR(50)
123456789,44,"James","Bond"
5551234567,1,"Angelina","Jolie"
912345678,34,"Antonio","Banderas"
The first line holds the exact names of the table's columns. I have tried the INSERT INTO command, but I have not got good results.
I am using these two strategies for this type of problem:
1) Import all
import all rows into temporary table where columns have varchar type
delete rows you do not want
insert data into the final table, casting varchar to the desired types (see the sketch after the sed examples below)
2) Pre-process
delete rows from imported file
import
For your case, you can delete the 2nd line using sed, for example:
sed -i '2d' importfile.txt
This will remove the 2nd line from the file named importfile.txt. Note that the -i flag overwrites the file in place, so use it with care.
You can use this to delete range of lines:
sed -i '2,4d' importfile.txt
This will remove lines 2, 3, and 4 from the file.
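For completeness, a minimal sketch of strategy 1 for the sample file above (the staging and people table names and the final column types are assumptions; bigint is used because the sample phone numbers overflow integer):
-- stage everything as varchar; HEADER consumes line 1 (the column names)
create temp table staging (phonenumber varchar, countrycode varchar, firstname varchar, lastname varchar);
\copy staging from 'importfile.txt' with (format csv, header true)
-- line 2 of the file (the type row) is now an ordinary row; delete it
delete from staging where phonenumber = 'INTEGER';
-- cast and move the remaining rows into the final table
insert into people (phonenumber, countrycode, firstname, lastname)
select phonenumber::bigint, countrycode::integer, firstname, lastname from staging;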
If you are working in a Linux shell you could always just stream in the records you want, e.g.
tail -[number of lines minus header] <file> | psql <db> -c "COPY <table> FROM STDIN CSV;"
or if your header is marked by say "#"
grep -v "^#" <file> | psql <db> -c "COPY <table> FROM STDIN CSV;"
You'll have to pre-process the file I'm afraid. There are far too many strange formats (like this one) around for COPY to understand - it just concentrates on handling the basics. You can trim the second line out with a simple bit of sed or perl.
perl -ne 'print unless ($.==2)' source_file.txt
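The same idea in awk, for completeness: print every line except line 2.
awk 'NR != 2' source_file.txt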

Export text content to text file without \n mark

When I try to export the text content of a field, and that content has newline characters, those characters are output as the \n string.
For example:
create table foo ( txt text );
insert into foo ( txt ) values ( 'first line
second line
...
and other lines');
copy foo ( txt ) to '/tmp/foo.txt';
I want to return the following (a):
first line
second line
...
and other lines
But the output is (b):
first line\nsecond line\n...\nand other lines
Does anybody know how to get output (a)?
The \n comes from the fact that, in COPY's text format, one output line must correspond to one database row.
This rule is relaxed for the CSV format where multi-line text is possible but then a quote character (by default: ") would enclose the text.
If you want multi-line output and no enclosing character around it, you shouldn't use COPY but SELECT.
Assuming a unix shell as the execution environment of the caller, you could do:
psql -A -t -d dbname -c 'select txt from foo' >/tmp/file.txt
Have you tried: \r\n?
Here's another solution that might work:
E'This is the first part \\n And this is the second'
via https://stackoverflow.com/a/938/1085891
Also, rather than copy the other responses, see here: String literals and escape characters in postgresql