OS Ubuntu 12.04
Shell bash
Why does this sed command
sed -e 's/$/,/g' test.in
replace the 5 in test.in?
Here are the contents of test.in
52147398,9480,12/31/2011 23:22,101049000,LNAM,FNAM,80512725,43,0,75,1/1/2012 6:45,101049000
Here are the results of running sed
,2147398,9480,12/31/2011 23:22,101049000,LNAM,FNAM,80512725,43,0,75,1/1/2012 6:45,101049000
I want to put a comma at the end of the line, after the final 101049000.
After replacing the date-time format so that it's in yyyy-mm-dd hh:mm:ss format, now
s/$/,/g
works as expected. Is this because sed got hung up on the mm/dd/yyyy hh:mm format?
Your file is probably in DOS format. The carriage return just before the newline means the appended comma lands after the \r; when the line is displayed, the terminal returns the cursor to the start of the line and the comma is drawn over the first character (in this case the 5).
Convert to Unix format first:
tr -d '\r' < file > newfile
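To see the effect in isolation, here is a minimal reproduction (not the original file): sed appends the comma after the carriage return, and a terminal then draws the comma over the first column.

```shell
# "hello" with a DOS line ending: sed appends "," after the \r, so the raw
# output bytes are "hello\r," and a terminal renders them as ",ello".
printf 'hello\r\n' | sed 's/$/,/'

# Deleting the \r first gives the intended result: hello,
printf 'hello\r\n' | tr -d '\r' | sed 's/$/,/'
```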
$ cat test.txt | sed 's/$/,/g'
52147398,9480,12/31/2011 23:22,101049000,LNAM,FNAM,80512725,43,0,75,1/1/2012 6:45,101049000,
Maybe the file is corrupt? Try copy/pasting into a new file...
Also try,
$ hexdump test.txt
0000000 35 32 31 34 37 33 39 38 2c 39 34 38 30 2c 31 32
0000010 2f 33 31 2f 32 30 31 31 20 32 33 3a 32 32 2c 31
0000020 30 31 30 34 39 30 30 30 2c 4c 4e 41 4d 2c 46 4e
0000030 41 4d 2c 38 30 35 31 32 37 32 35 2c 34 33 2c 30
0000040 2c 37 35 2c 31 2f 31 2f 32 30 31 32 20 36 3a 34
0000050 35 2c 31 30 31 30 34 39 30 30 30 0a
000005c
The first character should be 0x35 (ASCII '5').
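When a terminal rendering looks wrong, a quick diagnostic is to dump the suspect file with od -c and look for \r before each \n (shown here on a throwaway line rather than the original test.in):

```shell
# A DOS-format line shows "\r \n" at the end of the od -c output;
# a clean Unix line shows only "\n".
printf '52147398,9480\r\n' | od -c
```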
Related
I have a Perl script that reads data from a .csv file containing some currency symbols. When we read that file and write the content back out, I can see it printing
Get <A3>50 or <80>50 daily
Actual value is
Get £50 or €50 daily
With the dollar sign it works fine; with any other currency symbol it does not.
I tried
open my $in, '<:encoding(UTF-8)', 'input-file-name' or die $!;
open my $out, '>:encoding(latin1)', 'output-file-name' or die $!;
while ( <$in> ) {
    print $out $_;
}
$ od -t x1 input-file-name
0000000 47 65 74 20 c2 a3 35 30 20 6f 72 20 e2 82 ac 35
0000020 30 20 64 61 69 6c 79 0a
0000030
$ od -t x1 output-file-name
0000000 47 65 74 20 a3 35 30 20 6f 72 20 5c 78 7b 32 30
0000020 61 63 7d 35 30 20 64 61 69 6c 79 0a
0000034
but that is not helping either. The output I am getting is
Get \xA350 or \x8050 daily
Unicode Code Point   Glyph   UTF-8      Input File   ISO-8859-1   Output File
U+00A3 POUND SIGN    £       C2 A3      C2 A3        A3           A3
U+20AC EURO SIGN     €       E2 82 AC   E2 82 AC     N/A          5C 78 7B 32 30 61 63 7D
("LATIN1" is an alias for "ISO-8859-1".)
There are no problems with the input file.
£ is correctly encoded in your input file.
€ is correctly encoded in your input file.
As for the output file,
£ is correctly encoded in your output file.
€ isn't found in the latin1 charset, so \x{20ac} is used instead.
Your program is working as expected.
You say you see <A3> instead of £. That's probably because the program you are using is expecting a file encoded using UTF-8, but you provided a file encoded using ISO-8859-1.
You also say you see <80> instead of €. But there's no way you'd see that for the file you provided.
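If the program reading the output actually expects Windows-1252 rather than strict latin1 (the <80> you saw suggests it might, since CP1252 maps the euro sign to byte 0x80), transcoding to CP1252 preserves both symbols. A sketch using iconv, with the placeholder file names from the question:

```shell
# UTF-8 -> Windows-1252: the pound sign becomes A3 and the euro sign 80,
# instead of latin1's \x{20ac} fallback text.
iconv -f UTF-8 -t CP1252 input-file-name > output-file-name
od -t x1 output-file-name
```

The same could be done inside the Perl script by opening the output handle with '>:encoding(cp1252)' instead of '>:encoding(latin1)'.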
I have some data that was exported from Postgres and reworked a bit in a spreadsheet, and I now want the data back into a table, but I keep failing on the import:
cat extract.csv | psql -h 10.135.0.44 myapp myapp -f copy-user.sql
psql:copy-user.sql:7: ERROR: missing data for column "email"
CONTEXT: COPY to_update, line 1: ""
The actual data is supplied below. I first converted the CSV file from DOS to Unix style line endings. It didn't seem to matter much.
copy-user.sql
COPY "to_update"
FROM STDIN
WITH DELIMITER ';' CSV;
extract.csv
bfb92e29-1d2c-45c4-b9ab-357a3ac7ad13;test@test90239023783457843.com;x
aeccc3ea-cc1f-43ef-99ff-e389d5d63b22;tester@testerkjnaefgjnwerg.no;x
9cec13ae-c880-4371-9b1c-dd201f5cf233;bloblo@gmail.com;x
aeada2bc-a362-4f3e-80f2-06a717206802;vet@gmail.com;x
fb85ddd8-7d17-4d41-8bc3-213b1e469506;navnnavnesen@ptflow.com;x
528e1f2e-1baa-483b-bc8c-85f993014696;kklk@hotmail.com;x
dbc8a9c1-56cf-4589-8b2c-cf1a2e0832ed;ghiiii@hotmail.com;x
fbf23553-baa2-410a-8f96-32b5c4deb0c7;lala@lala.no;x
e22ec0de-06f9-428a-aa3e-171c38f9a1f7;x2@gmail.com;x
8e8d0f73-8eb7-43b4-8019-b79042731b97;mail@mail.com;x
table definition for to_update
create table to_update(id text, email text, text char);
-- also tried this variant, but same error
-- create table to_update(id uuid, email text, text char);
EDIT: Additional info
It seems this exact same thing doesn't throw on my local machine:
$ cat extract.csv | psql postgres -f copy-user.sql
Timing is on.
Line style is unicode.
Border style is 2.
Null display is "[NULL]".
Expanded display is used automatically.
COPY 0
Time: 0.430 ms
It still doesn't work (as it just copies 0 rows), but at least it doesn't throw an error. That points to it being related to the environment (versions, locale settings, etc).
Local machine (which doesn't throw error)
$ psql --version
psql (PostgreSQL) 10.6
$ psql postgres -c "SHOW server_version;"
Timing is on.
Line style is unicode.
Border style is 2.
Null display is "[NULL]".
Expanded display is used automatically.
┌────────────────┐
│ server_version │
├────────────────┤
│ 10.6 │
└────────────────┘
(1 row)
Time: 40.960 ms
$ printenv | grep LC
LC_CTYPE=UTF-8
Remote server(s) (which throws error)
$ psql --version # this is the client, not the same physical server as the db
psql (PostgreSQL) 9.5.12
$ psql -h 10.135.0.44 myapp myapp -c "SHOW server_version;"
Password for user pete:
server_version
----------------
9.5.12
(1 row)
$ printenv | grep LC
LC_ALL=C.UTF-8
LC_CTYPE=UTF-8
LANG=C.UTF-8
Hex dump of extract.csv (all 10 lines)
$ wc -l extract.csv
10 extract.csv
$ hexdump -C extract.csv
00000000 62 66 62 39 32 65 32 39 2d 31 64 32 63 2d 34 35 |bfb92e29-1d2c-45|
00000010 63 34 2d 62 39 61 62 2d 33 35 37 61 33 61 63 37 |c4-b9ab-357a3ac7|
00000020 61 64 31 33 3b 74 65 73 74 40 74 65 73 74 39 30 |ad13;test@test90|
00000030 32 33 39 30 32 33 37 38 33 34 35 37 38 34 33 2e |239023783457843.|
00000040 63 6f 6d 3b 78 0a 61 65 63 63 63 33 65 61 2d 63 |com;x.aeccc3ea-c|
00000050 63 31 66 2d 34 33 65 66 2d 39 39 66 66 2d 65 33 |c1f-43ef-99ff-e3|
00000060 38 39 64 35 64 36 33 62 32 32 3b 74 65 73 74 65 |89d5d63b22;teste|
00000070 72 40 74 65 73 74 65 72 6b 6a 6e 61 65 66 67 6a |r@testerkjnaefgj|
00000080 6e 77 65 72 67 2e 6e 6f 3b 78 0a 39 63 65 63 31 |nwerg.no;x.9cec1|
00000090 33 61 65 2d 63 38 38 30 2d 34 33 37 31 2d 39 62 |3ae-c880-4371-9b|
000000a0 31 63 2d 64 64 32 30 31 66 35 63 66 32 33 33 3b |1c-dd201f5cf233;|
000000b0 62 6c 6f 62 6c 6f 40 67 6d 61 69 6c 2e 63 6f 6d |bloblo@gmail.com|
000000c0 3b 78 0a 61 65 61 64 61 32 62 63 2d 61 33 36 32 |;x.aeada2bc-a362|
000000d0 2d 34 66 33 65 2d 38 30 66 32 2d 30 36 61 37 31 |-4f3e-80f2-06a71|
000000e0 37 32 30 36 38 30 32 3b 76 65 74 40 67 6d 61 69 |7206802;vet@gmai|
000000f0 6c 2e 63 6f 6d 3b 78 0a 66 62 38 35 64 64 64 38 |l.com;x.fb85ddd8|
00000100 2d 37 64 31 37 2d 34 64 34 31 2d 38 62 63 33 2d |-7d17-4d41-8bc3-|
00000110 32 31 33 62 31 65 34 36 39 35 30 36 3b 6e 61 76 |213b1e469506;nav|
00000120 6e 6e 61 76 6e 65 73 65 6e 40 70 74 66 6c 6f 77 |nnavnesen@ptflow|
00000130 2e 63 6f 6d 3b 78 0a 35 32 38 65 31 66 32 65 2d |.com;x.528e1f2e-|
00000140 31 62 61 61 2d 34 38 33 62 2d 62 63 38 63 2d 38 |1baa-483b-bc8c-8|
00000150 35 66 39 39 33 30 31 34 36 39 36 3b 6b 6b 6c 6b |5f993014696;kklk|
00000160 40 68 6f 74 6d 61 69 6c 2e 63 6f 6d 3b 78 0a 64 |@hotmail.com;x.d|
00000170 62 63 38 61 39 63 31 2d 35 36 63 66 2d 34 35 38 |bc8a9c1-56cf-458|
00000180 39 2d 38 62 32 63 2d 63 66 31 61 32 65 30 38 33 |9-8b2c-cf1a2e083|
00000190 32 65 64 3b 67 68 69 69 69 69 40 68 6f 74 6d 61 |2ed;ghiiii@hotma|
000001a0 69 6c 2e 63 6f 6d 3b 78 0a 66 62 66 32 33 35 35 |il.com;x.fbf2355|
000001b0 33 2d 62 61 61 32 2d 34 31 30 61 2d 38 66 39 36 |3-baa2-410a-8f96|
000001c0 2d 33 32 62 35 63 34 64 65 62 30 63 37 3b 6c 61 |-32b5c4deb0c7;la|
000001d0 6c 61 40 6c 61 6c 61 2e 6e 6f 3b 78 0a 65 32 32 |la@lala.no;x.e22|
000001e0 65 63 30 64 65 2d 30 36 66 39 2d 34 32 38 61 2d |ec0de-06f9-428a-|
000001f0 61 61 33 65 2d 31 37 31 63 33 38 66 39 61 31 66 |aa3e-171c38f9a1f|
00000200 37 3b 78 32 40 67 6d 61 69 6c 2e 63 6f 6d 3b 78 |7;x2@gmail.com;x|
00000210 0a 38 65 38 64 30 66 37 33 2d 38 65 62 37 2d 34 |.8e8d0f73-8eb7-4|
00000220 33 62 34 2d 38 30 31 39 2d 62 37 39 30 34 32 37 |3b4-8019-b790427|
00000230 33 31 62 39 37 3b 6d 61 69 6c 40 6d 61 69 6c 2e |31b97;mail@mail.|
00000240 63 6f 6d 3b 78 0a |com;x.|
00000246
I think you want \copy ... from pstdin... on a single line. Both the starting backslash and pstdin instead of stdin are on purpose.
This mailing-list thread: psql -f COPY from STDIN explains the problem and the solution.
COPY FROM STDIN expects data inline after the COPY command, as in a dump file, not from the standard input of the psql process.
Relevant snippet from the mailing list summing up the alternatives
I'd like the store the COPY command in a separate file without
specifying an input file name. I want to feed it the data from the
shell script that calls psql
"STDIN: All rows are read from the same source that issued the
command"
- As I understand now, this applies to both COPY and \COPY. In other words the input file must contain command and data.
I have found a few solutions to achieve my objective:
1) using COPY FROM STDIN: cat event.csv | psql -c "$(cat event.sql)"
2) using COPY FROM STDIN: psql -f <(cat event.sql event.csv)
3) using \COPY FROM PSTDIN: cat event.csv | psql -f event.sql
4) using \COPY FROM STDIN: psql -f <(cat event.sql event.csv <(echo "\."))
What I don't like about \COPY is that it has to be on one line. Indeed
it can't be split over multiple lines
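Concretely, the "command and data in the same file" form that options 2) and 4) rely on looks like this (a sketch; the row is made up, and \. marks the end of the inline data):

```shell
# Build a dump-style file: COPY statement, then data rows, terminated by
# the \. end-of-data marker -- psql -f can then execute it directly.
cat > inline.sql <<'EOF'
COPY to_update FROM STDIN WITH DELIMITER ';' CSV;
00000000-0000-0000-0000-000000000000;someone@example.com;x
\.
EOF
# then: psql -d db_name -f inline.sql
```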
The following works in my setup:
cat extract.csv | psql -d db_name -U user_name -c "copy to_update from stdin with delimiter ';' csv"
or
psql -d db_name -U user_name -c "\copy public.to_update(id, email, text) from '/path_to/extract.csv' with delimiter ';' csv"
As for the actual error thrown: after some debugging, I found that it only happens with Postgres 9.5.12, not with my local database running 10.6, using the exact same script in the sql file.
Postgres 9.5.12 doesn't handle multi-line COPY FROM STDIN statements! Deleting the newlines so that the entire expression was on a single line made it run. It still didn't work, though, as it still copied 0 rows, but that is really a different question. Krishna was onto something, though; I'll post a separate question for that and link it up.
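If you need to stay on the older client, one workaround consistent with the above is to collapse the statement to a single line before running it (file names as in the question):

```shell
# Flatten the multi-line COPY statement; the 9.5-era psql accepted the
# single-line form where the multi-line one failed.
tr '\n' ' ' < copy-user.sql > copy-user-oneline.sql
cat extract.csv | psql -h 10.135.0.44 myapp myapp -f copy-user-oneline.sql
```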
This is workfile.txt
NC_000893
NC_001778
NC_005252
NC_004744
NC_003096
NC_005803
I want to read it into an array and have only the strings, without spaces or line endings.
This code does what I want on my laptop, but it's not working on the Linux desktop!
@nodes = <nodefile>;
chomp @nodes;
foreach my $el (@nodes) {
    chop($el);
}
print Dumper @nodes;
Output:
bash-4.2$ perl main.pl
';AR1 = 'NC_000893
';AR2 = 'NC_001778
';AR3 = 'NC_005252
';AR4 = 'NC_004744
';AR5 = 'NC_003096
';AR6 = 'NC_005803
$ hexdump -C workfile.txt | head -20
00000000 4e 43 5f 30 30 30 38 39 33 0d 0d 0a 4e 43 5f 30 |NC_000893...NC_0|
00000010 30 31 37 37 38 0d 0d 0a 4e 43 5f 30 30 35 32 35 |01778...NC_00525|
00000020 32 0d 0d 0a 4e 43 5f 30 30 34 37 34 34 0d 0d 0a |2...NC_004744...|
00000030 4e 43 5f 30 30 33 30 39 36 0d 0d 0a 4e 43 5f 30 |NC_003096...NC_0|
00000040 30 35 38 30 33 0d 0d 0a 4e 43 5f 30 30 36 35 33 |05803...NC_00653|
00000050 31 0d 0d 0a 4e 43 5f 30 30 34 34 31 37 0d 0d 0a |1...NC_004417...|
00000060 4e 43 5f 30 31 33 36 33 33 0d 0d 0a 4e 43 5f 30 |NC_013633...NC_0|
00000070 31 33 36 31 38 0d 0d 0a 4e 43 5f 30 30 32 37 36 |13618...NC_00276|
00000080 31 0d 0d 0a 4e 43 5f 30 31 33 36 32 38 0d 0d 0a |1...NC_013628...|
00000090 4e 43 5f 30 30 35 32 39 39 0d 0d 0a 4e 43 5f 30 |NC_005299...NC_0|
000000a0 31 33 36 30 39 0d 0d 0a 4e 43 5f 30 31 33 36 31 |13609...NC_01361|
000000b0 32 0d 0d 0a 4e 43 5f 30 30 32 36 34 36 0d 0d 0a |2...NC_002646...|
000000c0 4e 43 5f 30 30 34 35 39 35 0d 0d 0a 4e 43 5f 30 |NC_004595...NC_0|
000000d0 30 32 37 33 34 0d 0d 0a 4e 43 5f 30 30 34 35 39 |02734...NC_00459|
000000e0 38 0d 0d 0a 4e 43 5f 30 30 34 35 39 34 0d 0d 0a |8...NC_004594...|
000000f0 4e 43 5f 30 30 38 34 34 38 0d 0d 0a 4e 43 5f 30 |NC_008448...NC_0|
00000100 30 34 35 39 33 0d 0d 0a 4e 43 5f 30 30 32 36 34 |04593...NC_00264|
00000110 37 0d 0d 0a 4e 43 5f 30 30 32 36 37 34 0d 0d 0a |7...NC_002674...|
00000120 4e 43 5f 30 30 33 31 36 33 0d 0d 0a 4e 43 5f 30 |NC_003163...NC_0|
00000130 30 33 31 36 34 0d 0d 0a 4e 43 5f 30 32 30 31 35 |03164...NC_02015|
Any suggestions? Thanks in advance.
The problem is that you have Windows-style line endings in this file, which is why your chomp is not removing them properly when you run on Linux. Note also that the hex dump shows two carriage returns before each newline (0d 0d 0a), so after chomp removes the \n and chop removes one \r, one \r is still left at the end of each string.
Your output
';AR6 = 'NC_005803
indicates that the last character in the string is in fact \r. This is not a problem with the string itself, just with its visual representation: the \r moves the cursor back to the start of the line, so the closing '; overwrites the first two characters of $VAR6. If you want to see this character written out literally, you can use the option
$Data::Dumper::Useqq = 1;
which will then produce the output
$VAR6 = "NC_005803\r";
How to fix it?
A simple fix is to use the dos2unix utility in linux to fix the file. To fix it in Perl, you can do something like
s/[\r\n]*\z// for @nodes; # remove all \r and \n from end of string
s/\s*\z// for @nodes; # remove all whitespace from end of string
s/\r//g for @nodes; # remove all \r from string
tr/\r//d for @nodes; # same
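Since the hex dump above shows two carriage returns before each newline (0d 0d 0a), a single dos2unix pass that only converts \r\n pairs may leave a stray \r behind; deleting every carriage return with tr is the safer one-liner:

```shell
# Delete every \r byte, however many there are per line, leaving plain \n
# line endings.
tr -d '\r' < workfile.txt > workfile.unix.txt
```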
I have two files a.txt and b.txt which contains the following data.
$ cat a.txt
0x5212cb03caa111e0
0x5212cb03caa113c0
0x5212cb03caa115c0
0x5212cb03caa117c0
0x5212cb03caa119e0
0x5212cb03caa11bc0
0x5212cb03caa11dc0
0x5212cb03caa11fc0
0x5212cb03caa121c0
$ cat b.txt
36 65 fb 60 7a 5e
36 65 fb 60 7a 64
36 65 fb 60 7a 6a
36 65 fb 60 7a 70
36 65 fb 60 7a 76
36 65 fb 60 7a 7c
36 65 fb 60 7a 82
36 65 fb 60 7a 88
36 65 fb 60 7a 8e
I want to generate a third file c.txt that contains
0x5212cb03caa111e0 36 65 fb 60 7a 5e
0x5212cb03caa113c0 36 65 fb 60 7a 64
0x5212cb03caa115c0 36 65 fb 60 7a 6a
Can I achieve this using awk? How do I do this?
Use the paste command:
paste a.txt b.txt
paste is really the shortest solution; however, if you're looking for an awk solution as stated in the question:
awk 'FNR==NR{a[++i]=$0;next} {print a[FNR] "\t" $0}' a.txt b.txt
Here is an awk solution that only stores two lines in memory at a time:
awk '{ getline b < "b.txt"; print $0, b }' OFS='\t' a.txt
Lines from a.txt are implicitly stored in $0 and for each line in a.txt a line is read from b.txt by getline.
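A quick self-contained check of the pairing, with throwaway files standing in for a.txt and b.txt:

```shell
printf '0x01\n0x02\n' > a.txt
printf '36 65\n36 66\n' > b.txt

# Tab-joined pairing of corresponding lines; paste a.txt b.txt produces
# the same output.
awk 'FNR==NR{a[++i]=$0;next} {print a[FNR] "\t" $0}' a.txt b.txt
```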
I have a hex dump of a message in a file which I want to get into an array
so I can perform the decoding logic on it.
I was wondering if there is an easier way to parse a message which looks like this.
37 39 30 35 32 34 35 34 3B 32 31 36 39 33 34 35
3B 32 31 36 39 33 34 36 00 00 01 08 40 00 00 15
6C 71 34 34 73 69 6D 31 5F 33 30 33 31 00 00 00
00 00 01 28 40 00 00 15 74 65 6C 63 6F 72 64 69
74 65 6C 63 6F 72 64 69
Note that the data can be at most 16 bytes on any row, but any row can contain fewer bytes too (minimum: 1).
Is there a nice and elegant way, rather than reading 2 chars at a time in Perl?
Perl has a hex operator that performs the decoding logic for you.
hex EXPR
hex
Interprets EXPR as a hex string and returns the corresponding value. (To convert strings that might start with either 0, 0x, or 0b, see oct.) If EXPR is omitted, uses $_.
print hex '0xAf'; # prints '175'
print hex 'aF'; # same
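(As an aside, the shell's printf does the same conversion, which is handy for spot-checking a value from a dump: %d with a 0x-prefixed operand prints the decimal value.)

```shell
printf '%d\n' 0xAf   # prints 175
```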
Remember that the default behavior of split chops up a string at whitespace separators, so for example
$ perl -le '$_ = "a b c"; print for split'
a
b
c
For every line of the input, separate it into hex values, convert the values to numbers, and push them onto an array for later processing.
#! /usr/bin/perl
use warnings;
use strict;
my @values;
while (<>) {
push @values => map hex($_), split;
}
# for example
my $sum = 0;
$sum += $_ for @values;
print $sum, "\n";
Sample run:
$ ./sumhex mtanish-input
4196
I would read a line at a time, strip the whitespace, and use pack 'H*' to convert it. It's hard to be more specific without knowing what kind of "decoding logic" you're trying to apply. For example, here's a version that converts each byte to decimal:
while (<>) {
s/\s+//g;
my @bytes = unpack('C*', pack('H*', $_));
print "@bytes\n";
}
Output from your sample file:
55 57 48 53 50 52 53 52 59 50 49 54 57 51 52 53
59 50 49 54 57 51 52 54 0 0 1 8 64 0 0 21
108 113 52 52 115 105 109 49 95 51 48 51 49 0 0 0
0 0 1 40 64 0 0 21 116 101 108 99 111 114 100 105
116 101 108 99 111 114 100 105
I think reading in two characters at a time is the appropriate way to parse a stream whose logical tokens are two-character units.
Is there some reason you think that's ugly?
If you're trying to extract a particular sequence, you could do that with whitespace-insensitive regular expressions.
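For example, to find a particular byte sequence no matter where the dump wraps its lines, you can delete the whitespace and search the joined string; a sketch (dump.txt is assumed to hold the hex text above). Note that a naive substring search can match across byte boundaries, so anchor on even offsets if that matters.

```shell
# Join the dump into one unbroken hex string, then count occurrences of
# the byte sequence 00 00 01 08 (which appears in the sample above).
tr -d ' \n' < dump.txt | grep -c '00000108'
```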