combine one column from two files into a third file - sed

I have two files, a.txt and b.txt, which contain the following data.
$ cat a.txt
0x5212cb03caa111e0
0x5212cb03caa113c0
0x5212cb03caa115c0
0x5212cb03caa117c0
0x5212cb03caa119e0
0x5212cb03caa11bc0
0x5212cb03caa11dc0
0x5212cb03caa11fc0
0x5212cb03caa121c0
$ cat b.txt
36 65 fb 60 7a 5e
36 65 fb 60 7a 64
36 65 fb 60 7a 6a
36 65 fb 60 7a 70
36 65 fb 60 7a 76
36 65 fb 60 7a 7c
36 65 fb 60 7a 82
36 65 fb 60 7a 88
36 65 fb 60 7a 8e
I want to generate a third file c.txt that contains
0x5212cb03caa111e0 36 65 fb 60 7a 5e
0x5212cb03caa113c0 36 65 fb 60 7a 64
0x5212cb03caa115c0 36 65 fb 60 7a 6a
Can I achieve this using awk? How do I do this?

Use the paste command:
paste a.txt b.txt

paste is really the shortest solution; however, if you're looking for an awk solution as stated in the question, then:
awk 'FNR==NR{a[++i]=$0;next} {print a[FNR] "\t" $0}' a.txt b.txt

Here is an awk solution that only stores two lines in memory at a time:
awk '{ getline b < "b.txt"; print $0, b }' OFS='\t' a.txt
Lines from a.txt are implicitly stored in $0, and for each line of a.txt a matching line is read from b.txt by getline.
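If Perl is also acceptable, the same pairing can be done while holding only the current pair of lines in memory; a minimal sketch, assuming a.txt and b.txt (as in the question) have the same number of lines, writing tab-separated output to c.txt like the awk answer does:
#!/usr/bin/perl
use strict;
use warnings;

open my $fh_a, '<', 'a.txt' or die "a.txt: $!";
open my $fh_b, '<', 'b.txt' or die "b.txt: $!";
open my $fh_c, '>', 'c.txt' or die "c.txt: $!";

while (defined(my $left = <$fh_a>)) {
    my $right = <$fh_b>;             # read the matching line from b.txt
    last unless defined $right;      # stop if b.txt runs out first
    chomp($left, $right);
    print {$fh_c} "$left\t$right\n"; # tab-separated, like the awk answer
}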

Related

How to read currency symbol in a perl script

I have a Perl script that reads data from a .csv file containing some different currency symbols. When we read that file and write the content out, I can see it is printing
Get <A3>50 or <80>50 daily
The actual value is
Get £50 or €50 daily
With the dollar sign it works fine; with any other currency symbol it does not.
I tried
open my $in, '<:encoding(UTF-8)', 'input-file-name' or die $!;
open my $out, '>:encoding(latin1)', 'output-file-name' or die $!;
while ( <$in> ) {
    print $out $_;
}
$ od -t x1 input-file-name
0000000 47 65 74 20 c2 a3 35 30 20 6f 72 20 e2 82 ac 35
0000020 30 20 64 61 69 6c 79 0a
0000030
$ od -t x1 output-file-name
0000000 47 65 74 20 a3 35 30 20 6f 72 20 5c 78 7b 32 30
0000020 61 63 7d 35 30 20 64 61 69 6c 79 0a
0000034
but that is not helping either. The output I am getting is
Get \xA350 or \x8050 daily
Unicode Code Point   Glyph   UTF-8      Input File   ISO-8859-1   Output File
U+00A3 POUND SIGN    £       C2 A3      C2 A3        A3           A3
U+20AC EURO SIGN     €       E2 82 AC   E2 82 AC     N/A          5C 78 7B 32 30 61 63 7D
("LATIN1" is an alias for "ISO-8859-1".)
There are no problems with the input file.
£ is correctly encoded in your input file.
€ is correctly encoded in your input file.
As for the output file,
£ is correctly encoded in your output file.
€ isn't found in the latin1 charset, so \x{20ac} is used instead.
Your program is working as expected.
You say you see <A3> instead of £. That's probably because the program you are using is expecting a file encoded using UTF-8, but you provided a file encoded using ISO-8859-1.
You also say you see <80> instead of €. But there's no way you'd see that for the file you provided.
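If the downstream program actually expects UTF-8 (as the answer suggests is likely), the simplest fix is to keep the output layer as UTF-8 as well instead of transcoding to latin1; a minimal sketch of that variant, reusing the question's placeholder file names:
#!/usr/bin/perl
use strict;
use warnings;

# Read and write UTF-8; both £ (U+00A3) and € (U+20AC) pass through unchanged.
open my $in,  '<:encoding(UTF-8)', 'input-file-name'  or die $!;
open my $out, '>:encoding(UTF-8)', 'output-file-name' or die $!;
print {$out} $_ while <$in>;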

Copying a CSV file from stdin throws "missing data for column"

I have some data that was exported from Postgres, reworked a bit using a spreadsheet, and now I want the data back in a table, but I keep failing on the import:
cat extract.csv | psql -h 10.135.0.44 myapp myapp -f copy-user.sql
psql:copy-user.sql:7: ERROR: missing data for column "email"
CONTEXT: COPY to_update, line 1: ""
The actual data is supplied below. I first converted the CSV file from DOS to Unix style line endings. It didn't seem to matter much.
copy-user.sql
COPY "to_update"
FROM STDIN
WITH DELIMITER ';' CSV;
extract.csv
bfb92e29-1d2c-45c4-b9ab-357a3ac7ad13;test@test90239023783457843.com;x
aeccc3ea-cc1f-43ef-99ff-e389d5d63b22;tester@testerkjnaefgjnwerg.no;x
9cec13ae-c880-4371-9b1c-dd201f5cf233;bloblo@gmail.com;x
aeada2bc-a362-4f3e-80f2-06a717206802;vet@gmail.com;x
fb85ddd8-7d17-4d41-8bc3-213b1e469506;navnnavnesen@ptflow.com;x
528e1f2e-1baa-483b-bc8c-85f993014696;kklk@hotmail.com;x
dbc8a9c1-56cf-4589-8b2c-cf1a2e0832ed;ghiiii@hotmail.com;x
fbf23553-baa2-410a-8f96-32b5c4deb0c7;lala@lala.no;x
e22ec0de-06f9-428a-aa3e-171c38f9a1f7;x2@gmail.com;x
8e8d0f73-8eb7-43b4-8019-b79042731b97;mail@mail.com;x
table definition for to_update
create table to_update(id text, email text, text char);
-- also tried this variant, but same error
-- create table to_update(id uuid, email text, text char);
EDIT: Additional info
It seems this exact same thing doesn't throw on my local machine:
$ cat extract.csv | psql postgres -f copy-user.sql
Timing is on.
Line style is unicode.
Border style is 2.
Null display is "[NULL]".
Expanded display is used automatically.
COPY 0
Time: 0.430 ms
It still doesn't work (as it just copies 0 rows), but at least it doesn't throw an error. That points to it being related to the environment (versions, locale settings, etc).
Local machine (which doesn't throw error)
$ psql --version
psql (PostgreSQL) 10.6
$ psql postgres -c "SHOW server_version;"
Timing is on.
Line style is unicode.
Border style is 2.
Null display is "[NULL]".
Expanded display is used automatically.
┌────────────────┐
│ server_version │
├────────────────┤
│ 10.6 │
└────────────────┘
(1 row)
Time: 40.960 ms
$ printenv | grep LC
LC_CTYPE=UTF-8
Remote server(s) (which throws error)
$ psql --version # this is the client, not the same physical server as the db
psql (PostgreSQL) 9.5.12
$ psql -h 10.135.0.44 myapp myapp -c "SHOW server_version;"
Password for user pete:
server_version
----------------
9.5.12
(1 row)
$ printenv | grep LC
LC_ALL=C.UTF-8
LC_CTYPE=UTF-8
LANG=C.UTF-8
Hex dump of extract.csv (all 10 lines)
$ wc -l extract.csv
10 extract.csv
$ hexdump -C extract.csv
00000000 62 66 62 39 32 65 32 39 2d 31 64 32 63 2d 34 35 |bfb92e29-1d2c-45|
00000010 63 34 2d 62 39 61 62 2d 33 35 37 61 33 61 63 37 |c4-b9ab-357a3ac7|
00000020 61 64 31 33 3b 74 65 73 74 40 74 65 73 74 39 30 |ad13;test@test90|
00000030 32 33 39 30 32 33 37 38 33 34 35 37 38 34 33 2e |239023783457843.|
00000040 63 6f 6d 3b 78 0a 61 65 63 63 63 33 65 61 2d 63 |com;x.aeccc3ea-c|
00000050 63 31 66 2d 34 33 65 66 2d 39 39 66 66 2d 65 33 |c1f-43ef-99ff-e3|
00000060 38 39 64 35 64 36 33 62 32 32 3b 74 65 73 74 65 |89d5d63b22;teste|
00000070 72 40 74 65 73 74 65 72 6b 6a 6e 61 65 66 67 6a |r@testerkjnaefgj|
00000080 6e 77 65 72 67 2e 6e 6f 3b 78 0a 39 63 65 63 31 |nwerg.no;x.9cec1|
00000090 33 61 65 2d 63 38 38 30 2d 34 33 37 31 2d 39 62 |3ae-c880-4371-9b|
000000a0 31 63 2d 64 64 32 30 31 66 35 63 66 32 33 33 3b |1c-dd201f5cf233;|
000000b0 62 6c 6f 62 6c 6f 40 67 6d 61 69 6c 2e 63 6f 6d |bloblo@gmail.com|
000000c0 3b 78 0a 61 65 61 64 61 32 62 63 2d 61 33 36 32 |;x.aeada2bc-a362|
000000d0 2d 34 66 33 65 2d 38 30 66 32 2d 30 36 61 37 31 |-4f3e-80f2-06a71|
000000e0 37 32 30 36 38 30 32 3b 76 65 74 40 67 6d 61 69 |7206802;vet@gmai|
000000f0 6c 2e 63 6f 6d 3b 78 0a 66 62 38 35 64 64 64 38 |l.com;x.fb85ddd8|
00000100 2d 37 64 31 37 2d 34 64 34 31 2d 38 62 63 33 2d |-7d17-4d41-8bc3-|
00000110 32 31 33 62 31 65 34 36 39 35 30 36 3b 6e 61 76 |213b1e469506;nav|
00000120 6e 6e 61 76 6e 65 73 65 6e 40 70 74 66 6c 6f 77 |nnavnesen@ptflow|
00000130 2e 63 6f 6d 3b 78 0a 35 32 38 65 31 66 32 65 2d |.com;x.528e1f2e-|
00000140 31 62 61 61 2d 34 38 33 62 2d 62 63 38 63 2d 38 |1baa-483b-bc8c-8|
00000150 35 66 39 39 33 30 31 34 36 39 36 3b 6b 6b 6c 6b |5f993014696;kklk|
00000160 40 68 6f 74 6d 61 69 6c 2e 63 6f 6d 3b 78 0a 64 |@hotmail.com;x.d|
00000170 62 63 38 61 39 63 31 2d 35 36 63 66 2d 34 35 38 |bc8a9c1-56cf-458|
00000180 39 2d 38 62 32 63 2d 63 66 31 61 32 65 30 38 33 |9-8b2c-cf1a2e083|
00000190 32 65 64 3b 67 68 69 69 69 69 40 68 6f 74 6d 61 |2ed;ghiiii@hotma|
000001a0 69 6c 2e 63 6f 6d 3b 78 0a 66 62 66 32 33 35 35 |il.com;x.fbf2355|
000001b0 33 2d 62 61 61 32 2d 34 31 30 61 2d 38 66 39 36 |3-baa2-410a-8f96|
000001c0 2d 33 32 62 35 63 34 64 65 62 30 63 37 3b 6c 61 |-32b5c4deb0c7;la|
000001d0 6c 61 40 6c 61 6c 61 2e 6e 6f 3b 78 0a 65 32 32 |la#lala.no;x.e22|
000001e0 65 63 30 64 65 2d 30 36 66 39 2d 34 32 38 61 2d |ec0de-06f9-428a-|
000001f0 61 61 33 65 2d 31 37 31 63 33 38 66 39 61 31 66 |aa3e-171c38f9a1f|
00000200 37 3b 78 32 40 67 6d 61 69 6c 2e 63 6f 6d 3b 78 |7;x2@gmail.com;x|
00000210 0a 38 65 38 64 30 66 37 33 2d 38 65 62 37 2d 34 |.8e8d0f73-8eb7-4|
00000220 33 62 34 2d 38 30 31 39 2d 62 37 39 30 34 32 37 |3b4-8019-b790427|
00000230 33 31 62 39 37 3b 6d 61 69 6c 40 6d 61 69 6c 2e |31b97;mail@mail.|
00000240 63 6f 6d 3b 78 0a |com;x.|
00000246
I think you want \copy ... from pstdin... on a single line. Both the starting backslash and pstdin instead of stdin are on purpose.
This mailing-list thread: psql -f COPY from STDIN explains the problem and the solution.
COPY FROM STDIN expects data inline after the COPY command, as in a dump file, not from the standard input of the psql process.
Relevant snippet from the mailing list, summing up the alternatives:
I'd like to store the COPY command in a separate file without specifying an input file name. I want to feed it the data from the shell script that calls psql.
"STDIN: All rows are read from the same source that issued the command" - As I understand now, this applies to both COPY and \COPY. In other words, the input file must contain both the command and the data.
I have found a few solutions to achieve my objective:
1) using COPY FROM STDIN: cat event.csv | psql -c "$(cat event.sql)"
2) using COPY FROM STDIN: psql -f <(cat event.sql event.csv)
3) using \COPY FROM PSTDIN: cat event.csv | psql -f event.sql
4) using \COPY FROM STDIN: psql -f <(cat event.sql event.csv <(echo "."))
What I don't like about \COPY is that it has to be on one line. Indeed, it can't be split over multiple lines.
The following works in my setup:
cat extract.csv | psql -d db_name -U user_name -c "copy to_update from stdin with delimiter ';' csv"
or
psql -d db_name -U user_name -c "\copy public.to_update(id, email, text) from '/path_to/extract.csv' with delimiter ';' csv"
With regard to the actual error thrown: after some debugging, I found that it only happens with Postgres 9.5.12, not with my local database running 10.6, using the exact same script in the SQL file.
Postgres 9.5.12 doesn't handle multi-line COPY FROM STDIN statements! Deleting the newlines so that the entire expression was on a single line made it run. It still didn't work, though, as it still showed 0 rows being copied, but that is really a different question ... Krishna was onto something though ... I'll post a separate question for that and link it up.
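For completeness, the same load can also be driven from Perl with DBD::Pg, which sidesteps the question of where psql's COPY reads its data from; a rough sketch, assuming DBD::Pg is installed, with placeholder connection credentials:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder DSN and credentials; adjust to the real server.
my $dbh = DBI->connect('dbi:Pg:dbname=myapp;host=10.135.0.44', 'myapp', 'password',
                       { RaiseError => 1 });

# Start the COPY, then stream the CSV rows to the server.
$dbh->do(q{COPY to_update FROM STDIN WITH DELIMITER ';' CSV});
open my $csv, '<', 'extract.csv' or die $!;
$dbh->pg_putcopydata($_) while <$csv>;
$dbh->pg_putcopyend();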

Why does s/$/,/g replace first character?

OS Ubuntu 12.04
Shell bash
Why does this sed command
sed -e 's/$/,/g' test.in
replace the 5 in test.in?
Here are the contents of test.in
52147398,9480,12/31/2011 23:22,101049000,LNAM,FNAM,80512725,43,0,75,1/1/2012 6:45,101049000
Here are the results of running sed
,2147398,9480,12/31/2011 23:22,101049000,LNAM,FNAM,80512725,43,0,75,1/1/2012 6:45,101049000
I want to put a comma after the final 101049000.
After replacing the date-time format so that it's in yyyy-mm-dd hh:mm:ss format, now
s/$/,/g
works as expected. Is this because sed got hung up on the mm/dd/yyyy hh:mm format?
Probably your file is in DOS format. The comma is appended after the carriage return that precedes the newline, so when the line is displayed, the carriage return moves the cursor back to the start of the line and the comma appears to overwrite the first character (in this case the 5).
Convert to Unix format first:
tr -d '\r' < file > newfile
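If Perl is handier than tr, the same cleanup works as a one-liner (the -i.bak switch keeps a backup of the original):
perl -p -i.bak -e 's/\r$//' file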
$ cat test.txt | sed 's/$/,/g'
52147398,9480,12/31/2011 23:22,101049000,LNAM,FNAM,80512725,43,0,75,1/1/2012 6:45,101049000,
Maybe the file is corrupt? Try copy/pasting into a new file...
Also try,
$ hexdump test.txt
0000000 35 32 31 34 37 33 39 38 2c 39 34 38 30 2c 31 32
0000010 2f 33 31 2f 32 30 31 31 20 32 33 3a 32 32 2c 31
0000020 30 31 30 34 39 30 30 30 2c 4c 4e 41 4d 2c 46 4e
0000030 41 4d 2c 38 30 35 31 32 37 32 35 2c 34 33 2c 30
0000040 2c 37 35 2c 31 2f 31 2f 32 30 31 32 20 36 3a 34
0000050 35 2c 31 30 31 30 34 39 30 30 30 0a
000005c
The first character should be 0x35.

Hex dump parsing in perl

I have a hex dump of a message in a file which I want to get into an array so I can perform the decoding logic on it.
I was wondering if there is an easier way to parse a message which looks like this:
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
Note that the data can be at most 16 bytes on any row, but a row can also contain fewer bytes (minimum: 1).
Is there a nice and elegant way to do this, rather than reading 2 chars at a time in Perl?
Perl has a hex operator that performs the decoding logic for you.
hex EXPR
hex
Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.
print hex '0xAf'; # prints '175'
print hex 'aF'; # same
Remember that the default behavior of split chops up a string at whitespace separators, so for example
$ perl -le '$_ = "a b c"; print for split'
a
b
c
For every line of the input, separate it into hex values, convert the values to numbers, and push them onto an array for later processing.
#! /usr/bin/perl
use warnings;
use strict;

my @values;
while (<>) {
    push @values => map hex($_), split;
}

# for example
my $sum = 0;
$sum += $_ for @values;
print $sum, "\n";
Sample run:
$ ./sumhex mtanish-input
4196
I would read a line at a time, strip the whitespace, and use pack 'H*' to convert it. It's hard to be more specific without knowing what kind of "decoding logic" you're trying to apply. For example, here's a version that converts each byte to decimal:
while (<>) {
    s/\s+//g;
    my @bytes = unpack('C*', pack('H*', $_));
    print "@bytes\n";
}
Output from your sample file:
55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
116 101 108 99 111 114 100 105
I think reading in two characters at a time is the appropriate way to parse a stream whose logical tokens are two-character units.
Is there some reason you think that's ugly?
If you're trying to extract a particular sequence, you could do that with whitespace-insensitive regular expressions.
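As a concrete illustration of that last suggestion, here is a small sketch that looks for the byte sequence 40 00 00 15 (which occurs in the sample dump above), no matter how the bytes are split across lines:
#!/usr/bin/perl
use strict;
use warnings;

local $/;                 # slurp the whole dump at once
my $dump = <>;

# Build a pattern that allows any whitespace, including newlines, between bytes.
my @wanted = qw(40 00 00 15);
my $re = join '\s+', map quotemeta, @wanted;

print "sequence found\n" if $dump =~ /\b$re\b/i;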

How can I convert a 48-character hex string to bytes using Perl?

I have a hex string (length 48 chars) that I want to convert to raw bytes with the pack function in order to put it in a Win32 vector of bytes.
How can I do this with Perl?
my $bytes = pack "H*", $hex;
See perlpacktut for more information.
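A quick sanity check is to round-trip the result with unpack, which should give back the original hex digits in lower case; a small sketch with a made-up 48-character string:
use strict;
use warnings;

my $hex   = '0123456789ABCDEF' x 3;   # made-up 48-character hex string
my $bytes = pack 'H*', $hex;
print length($bytes), "\n";           # 24 raw bytes
print unpack('H*', $bytes), "\n";     # 0123456789abcdef... (round trip)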
The steps are:
Extract pairs of hexadecimal characters from the string.
Convert each pair to a decimal number.
Pack the number as a byte.
For example:
use strict;
use warnings;
my $string = 'AA55FF0102040810204080';
my @hex   = ($string =~ /(..)/g);
my @dec   = map { hex($_) } @hex;
my @bytes = map { pack('C', $_) } @dec;
Or, expressed more compactly:
use strict;
use warnings;
my $string = 'AA55FF0102040810204080';
my @bytes = map { pack('C', hex($_)) } ($string =~ /(..)/g);
I have the string:
"61 62 63 64 65 67 69 69 6a"
which I want to interpret as hex values, and display those as ASCII chars (those values should reproduce the character string "abcdefghij").
Typically, I try to write something quick like this:
$ echo "61 62 63 64 65 67 69 69 6a" | perl -ne 'print "$_"; print pack("H2 "x10, $_)."\n";'
61 62 63 64 65 67 69 69 6a
a
... and then I wonder, why do I get only one character back :)
First, let me note that the string I have can also be represented as the hex values of the bytes it takes up in memory:
$ echo -n "61 62 63 64 65 67 68 69 6a" | hexdump -C
00000000 36 31 20 36 32 20 36 33 20 36 34 20 36 35 20 36 |61 62 63 64 65 6|
00000010 37 20 36 38 20 36 39 20 36 61 |7 68 69 6a|
0000001a
(NB: Essentially, I want to "convert" the above byte values in memory, as input, into the ones below, as viewed by hexdump:
$ echo -n "abcdefghij" | hexdump -C
00000000 61 62 63 64 65 66 67 68 69 6a |abcdefghij|
0000000a
... which is how the original values for the input hex string were obtained.
)
Well, this Pack/Unpack Tutorial (AKA How the System Stores Data) turns out to be the most helpful for me, as it mentions:
The pack function accepts a template string and a list of values [...]
$rec = pack( "l i Z32 s2", time, $emp_id, $item, $quan, $urgent);
It returns a scalar containing the list of values stored according to the formats specified in the template [...]
$rec would contain the following (first line in decimal, second in hex, third as characters where applicable). Pipe characters indicate field boundaries.
Offset Contents (increasing addresses left to right)
0 160 44 19 62| 41 82 3 0| 98 111 120 101 115 32 111 102
A0 2C 13 3E| 29 52 03 00| 62 6f 78 65 73 20 6f 66
| b o x e s o f
That is, in my case $_ is a single string variable, whereas pack expects as input a list of several such 'single' variables (in addition to a formatting template string), and outputs a 'single' variable again (which could, however, be a sizeable chunk of memory!). If that output variable holds the right ASCII code in each of its bytes, then I'm all set (I could then simply print the output variable directly).
Thus, in order to get a list of values from the $_ string, I can simply split it at the spaces; however, note:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2", split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
a
... that the number of elements to be packed must be accounted for in the template (otherwise, again, we get only one character back); either of these alternatives works:
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("H2"x10, split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
abcdeghij
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print "$_"; print pack("(H2)*", split(/ /, $_))."\n";'
61 62 63 64 65 67 68 69 6a
abcdeghij
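For comparison, the same conversion can skip pack entirely by mapping each whitespace-separated pair through hex and chr (same input as above):
$ echo "61 62 63 64 65 67 68 69 6a" | perl -ne 'print map { chr hex } split; print "\n";'
abcdeghij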