Why don't I get output when I use __DATA__ in Perl? - perl

Does anybody know if the Perl __DATA__ syntax is deprecated on macOS Catalina? I have perl v5.18.4 running, and even a simple program like this gives no output (and no error either):
use strict;
use warnings;
while(<DATA>){
    print $_;
}
__DATA__
line1
line2
line3
Edit:
This is weird. I said earlier that I have 2 Mac systems, both having the same problem. That was not quite right: on one system the program works; on the other, the same program doesn't.
Hexdump on both systems is the same:
Mac Mini:
Mac-mini-van-Theo:Programming theo$ hexdump -C test.pl
00000000 75 73 65 20 73 74 72 69 63 74 3b 0d 75 73 65 20 |use strict;.use |
00000010 77 61 72 6e 69 6e 67 73 3b 0d 77 68 69 6c 65 28 |warnings;.while(|
00000020 3c 44 41 54 41 3e 29 20 7b 0d 20 20 20 20 70 72 |<DATA>) {. pr|
00000030 69 6e 74 20 24 5f 3b 0d 7d 0d 5f 5f 44 41 54 41 |int $_;.}.__DATA|
00000040 5f 5f 0d 6c 69 6e 65 31 0d 6c 69 6e 65 32 0d 6c |__.line1.line2.l|
00000050 69 6e 65 33 0d |ine3.|
00000055
iMac:
Theo@iMac-van-Theo Programming % hexdump -C test.pl
00000000 75 73 65 20 73 74 72 69 63 74 3b 0a 75 73 65 20 |use strict;.use |
00000010 77 61 72 6e 69 6e 67 73 3b 0a 77 68 69 6c 65 28 |warnings;.while(|
00000020 3c 44 41 54 41 3e 29 7b 0a 20 20 20 20 70 72 69 |<DATA>){. pri|
00000030 6e 74 20 24 5f 3b 0a 7d 0a 0a 5f 5f 44 41 54 41 |nt $_;.}..__DATA|
00000040 5f 5f 0a 6c 69 6e 65 31 0a 6c 69 6e 65 32 0a 6c |__.line1.line2.l|
00000050 69 6e 65 33 0a |ine3.|
00000055
However, a 'cat' or a 'more' shows differences:
Mac Mini:
Mac-mini-van-Theo:Programming theo$ more test.pl
use strict;^Muse warnings;^Mwhile(<DATA>) {^M print $_;^M}^M__DATA__^Mline1^Mline2^Mline3
iMac:
Theo@iMac-van-Theo Programming % more test.pl
use strict;
use warnings;
while(<DATA>){
    print $_;
}
__DATA__
line1
line2
line3
The difference? The Mac Mini uses ‘bash’ as shell (where the program fails), the iMac uses ‘zsh’. So the problem is not really perl related but perl/shell related. With Catalina, Zsh is used as the default shell but the old Bash shell is still included with macOS and you can still switch to it. It seems to be related to how the shell handles line-endings, although I do not understand why this happens and moreover how to solve it.

__DATA__ is just fine, and there are no platform-specific issues with it (and there are lots of stupid tricks you can do with it).
However, if you want to know the state of any particular Perl thing, there's the perldeprecation docs. Sometimes perlexperiment is handy too.
How are you running your program?
Supply a hexdump of your program: hexdump -C program.pl. Maybe there are funny characters.
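The two dumps in the question are not byte-identical: the Mac mini file separates lines with 0d (carriage return only), while the iMac file uses 0a (line feed). With CR-only endings, Perl likely sees the whole script as a single line, so nothing follows the __DATA__ line and the DATA handle is empty. Below is a minimal Python sketch for counting line-ending styles. The function name count_line_endings is illustrative, and the sample byte strings are retyped by hand from the dumps.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare CR, and bare LF line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }

# Sample contents retyped from the two hexdumps (assumption: typed by hand).
mac_mini = (b"use strict;\ruse warnings;\rwhile(<DATA>) {\r    print $_;\r}\r"
            b"__DATA__\rline1\rline2\rline3\r")
imac = (b"use strict;\nuse warnings;\nwhile(<DATA>){\n    print $_;\n}\n\n"
        b"__DATA__\nline1\nline2\nline3\n")

print(count_line_endings(mac_mini))  # {'CRLF': 0, 'CR': 9, 'LF': 0}
print(count_line_endings(imac))      # {'CRLF': 0, 'CR': 0, 'LF': 10}
```

To check a file on disk, pass open("test.pl", "rb").read() instead. If the count shows bare CRs, re-saving the script with Unix (LF) line endings in the editor should make the __DATA__ section readable again.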

Related

Can someone tell me why I am getting this error, is it because of the spacing (I know quotations matter)? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
create table product(
productid int,
description varchar(20)
);

insert into product (
productid,
description )
Values ( 42 , ' tv');
ERROR: column "description" of relation "product" does not exist
As several people pointed out in comments, there are invisible characters (sometimes called "gremlins") in your SQL that make it invalid. Here's a hex dump of the contents (after copying the code from the question, using macOS commands):
$ pbpaste | xxd -g1
00000000: 63 72 65 61 74 65 20 74 61 62 6c 65 20 70 72 6f create table pro
00000010: 64 75 63 74 28 0a 70 72 6f 64 75 63 74 69 64 20 duct(.productid
00000020: 69 6e 74 2c e2 80 a8 0a 64 65 73 63 72 69 70 74 int,....descript
                      ^^ ^^ ^^                                ^^^
00000030: 69 6f 6e 20 76 61 72 63 68 61 72 28 32 30 29 0a ion varchar(20).
00000040: 29 3b 0a e2 80 a8 69 6e 73 65 72 74 20 69 6e 74 );....insert int
00000050: 6f 20 70 72 6f 64 75 63 74 20 28 e2 80 a8 70 72 o product (...pr
00000060: 6f 64 75 63 74 69 64 2c e2 80 a8 64 65 73 63 72 oductid,...descr
                                  ^^ ^^ ^^                        ^^^
00000070: 69 70 74 69 6f 6e 20 29 e2 80 a8 56 61 6c 75 65 iption )...Value
                                  ^^ ^^ ^^                        ^^^
00000080: 73 20 28 20 34 32 20 2c 20 27 20 74 76 27 29 3b s ( 42 , ' tv');
00000090: 0a 45 52 52 4f 52 3a 20 20 63 6f 6c 75 6d 6e 20 .ERROR: column
000000a0: 22 64 65 73 63 72 69 70 74 69 6f 6e 22 20 6f 66 "description" of
000000b0: 20 72 65 6c 61 74 69 6f 6e 20 22 70 72 6f 64 75 relation "produ
000000c0: 63 74 22 20 64 6f 65 73 20 6e 6f 74 20 65 78 69 ct" does not exi
000000d0: 73 74 st
(Note that xxd represents bytes that don't correspond to printable ASCII characters as "." in the text display on the right. The "."s that correspond to 0a in hex are newline characters.)
The hex codes e2 80 a8 correspond to the UTF-8 encoding of the Unicode "line separator" character (U+2028). I don't know how that character got in there; you'd have to trace back the origin of that code snippet to figure out where it was added.
I'd avoid using TextEdit for source code (and config files, etc.). Instead, I'd recommend using BBEdit or some other code-oriented editor. I think even in BBEdit's free-demo mode it can show (and let you remove) normally-invisible characters by choosing View menu -> Text Display -> Show Invisibles.
You can also remove non-plain-ASCII characters from a text file from the macOS Terminal with:
LC_ALL=C tr -cd '\n\t -~' <infile.txt >cleanfile.txt
(Replace infile.txt and cleanfile.txt with the paths/names of the input file and of where you want to store the output.) Warning: do not try to write the cleaned contents back to the original file; that won't work, because the shell truncates the output file before tr has read it. Also, don't use this to clean anything except plain text files (if the file has any sections that aren't supposed to be text, this may mangle those sections). Keep the original file as a backup until you've verified that the "clean" version works right.
You can also "clean" the paste buffer with:
pbpaste | LC_ALL=C tr -cd '\n\t -~' | pbcopy
...so just copy the relevant code from your text editor, run that in Terminal, then paste the cleaned version back into the editor.
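The same filter can be sketched in Python for cases where tr is not at hand. This is a minimal sketch, not the method used above: the function name keep_plain_ascii and the sample string are illustrative, and it keeps exactly newline, tab, and the printable ASCII range from space to tilde.

```python
def keep_plain_ascii(text: str) -> str:
    """Keep newline, tab, and printable ASCII (space through tilde); drop the rest."""
    return "".join(ch for ch in text if ch in "\n\t" or " " <= ch <= "~")

# Sample containing the U+2028 LINE SEPARATOR seen in the hex dump above.
dirty = "productid int,\u2028\ndescription varchar(20)\n"
clean = keep_plain_ascii(dirty)

print(repr(clean))  # 'productid int,\ndescription varchar(20)\n'
```

As with the tr command, this also drops legitimate non-ASCII text such as accented letters, so it only suits input that is meant to be plain ASCII.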

Can I tell GitHub (or eq.) to use ASCII to make my binary files readable?

I want to host a binary file on a web-based hosting service for git (e.g. GitHub) so I can easily see any changes made to it.
The binary file in question uses the common ASCII character encoding so that this binary
73 63 6F 70 65 20 68 75 72 72 72 20 69 6E 69 74 69 61 6C 69 7A 65 72 20 64 65 72 70 0D 0A 20 20 20 20 66 75 6E 63 74 69 6F 6E 20 64 65 72 70 20 74 61 6B 65 73 20 6E 6F 74 68 69 6E 67 20 72 65 74 75 72 6E 73 20 6E 6F 74 68 69 6E 67 0D 0A 20 20 20 20 20 20 20 20 63 61 6C 6C 20 53 65 74 53 74 61 72 74 4C 6F 63 50 72 69 6F 28 24 42 2C 24 41 2C 24 41 2C 4D 41 50 5F 4C 4F 43 5F 50 52 49 4F 5F 48 49 47 48 29 0D 0A 20 20 20 20 65 6E 64 66 75 6E 63 74 69 6F 6E 0D 0A 65 6E 64 73 63 6F 70 65
becomes this readable text (†)
scope hurrr initializer derp
    function derp takes nothing returns nothing
        call SetStartLocPrio($B,$A,$A,MAP_LOC_PRIO_HIGH)
    endfunction
endscope
The problem is that services like GitHub will only show me the raw binary when I want to view the file in-browser (or have me download and open it in a text editor).
Right now, to have any changes made, I have to download the changed binary file, convert it to readable text, then use diff to see what changes have been made. This is tedious and loses the beautiful web interface that GitHub has.
So my question is this: Can I tell GitHub (or any equivalent service) to translate a binary file to readable text?
--
(†) For anyone interested in trivia, this is indeed vJass syntax for WarCraft III.
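The manual conversion step described above can be sketched in Python. This only illustrates the decoding and is not a GitHub feature; the variable names are made up, and the hex string is the first line of the dump in the question.

```python
# First line of the hex dump from the question, including its CR LF terminator.
hex_dump = ("73 63 6F 70 65 20 68 75 72 72 72 20 69 6E 69 74 69 61 6C 69 "
            "7A 65 72 20 64 65 72 70 0D 0A")

raw = bytes.fromhex(hex_dump)   # fromhex ignores the spaces between byte pairs
text = raw.decode("ascii")      # raises UnicodeDecodeError on non-ASCII bytes

print(repr(text))  # 'scope hurrr initializer derp\r\n'
```

Every byte in the dump above is printable ASCII apart from the 0D 0A (CR LF) line endings, which is why the decode succeeds without any special handling.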

Base64 encode - decode issue

I read that Base64 is a deterministic algorithm and produces unique results. Consider these two Base64-encoded values:
KFNjcmlwdCBYU1MpIFVSTDogaHR0cDovL2xvY2FsaG9zdDo4MDgwL2h0bWxQYXJzZXIvLiB8IElEOiBUaGlzIGlzIElEMC4gfCBDbGFzczogVW5hdmFpbGFibGUuLiB8IFRleHQgQ29udGVudDogCgogICAgRGF0YTogPGlucHV0IG1heGxlbmd0aD0iOTk5OTkiIG5hbWU9ImRhdGEiIHR5cGU9InRleHQiPiA8YnI+CiAgICA8aW5wdXQgdmFsdWU9InN1Ym1pdCIgdHlwZT0ic3VibWl0Ij4KICAgIAogIDxpbnB1dCBvbmNsaWNrPSJqYXZhc2NyaXB0OmFsZXJ0KCdJbnB1dCBJRCBYU1MgLSBtb3VzZWNsaWNrJykiIGF1dG9mb2N1cz0iIj4gIAogICAgIAogICA8c2NyaXB0IGNsYXNzPSJmZmYiPmphdmFzY3JpcHQ6YWxlcnQoJ1NjcmlwdCBGT1JNIFhTUycpPC9zY3JpcHQ+
&
KFNjcmlwdCBYU1MpIFVSTDogaHR0cDovL2xvY2FsaG9zdDo4MDgwL2h0bWxQYXJzZXIvLiB8IElEOiBUaGlzIGlzIElEMC4gfCBDbGFzczogVW5hdmFpbGFibGUuLiB8IFRleHQgQ29udGVudDogDQoNCiAgICBEYXRhOiA8aW5wdXQgbWF4bGVuZ3RoPSI5OTk5OSIgbmFtZT0iZGF0YSIgdHlwZT0idGV4dCI+IDxicj4NCiAgICA8aW5wdXQgdmFsdWU9InN1Ym1pdCIgdHlwZT0ic3VibWl0Ij4NCiAgICANCiAgPGlucHV0IG9uY2xpY2s9ImphdmFzY3JpcHQ6YWxlcnQoJ0lucHV0IElEIFhTUyAtIG1vdXNlY2xpY2snKSIgYXV0b2ZvY3VzPSIiPiAgDQogICAgIA0KICAgPHNjcmlwdCBjbGFzcz0iZmZmIj5qYXZhc2NyaXB0OmFsZXJ0KCdTY3JpcHQgRk9STSBYU1MnKTwvc2NyaXB0Pg==
These both seem to give me the same decoded output. How is this possible? I couldn't find any visible difference between the decoded forms of these two values.
Is it related to encoding and decoding schemes like UTF-8 and ASCII?
The decoded strings are different: they have different line endings, \n\n (0a0a) vs \r\n\r\n (0d0a0d0a):
00000060 20 54 65 78 74 20 43 6f 6e 74 65 6e 74 3a 20 0a | Text Content: .|
00000070 0a 20 20 20 20 44 61 74 61 3a 20 3c 69 6e 70 75 |. Data: <inpu|
00000060 20 54 65 78 74 20 43 6f 6e 74 65 6e 74 3a 20 0d | Text Content: .|
00000070 0a 0d 0a 20 20 20 20 44 61 74 61 3a 20 3c 69 6e |... Data: <in|
Hint: use hexdump -C <file> to get such output.
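The difference can also be checked in code. Below is a minimal Python sketch using two short excerpts cut (at 4-character boundaries) from the encoded values in the question, at the point where they first diverge. The variable names are illustrative.

```python
import base64

# Excerpts from the first and second encoded values, starting at "Content: ".
first_excerpt = "Q29udGVudDogCgogICAg"
second_excerpt = "Q29udGVudDogDQoNCiAgICBE"

first = base64.b64decode(first_excerpt)
second = base64.b64decode(second_excerpt)

# repr() makes the otherwise invisible line-ending bytes visible.
print(repr(first))   # b'Content: \n\n    '
print(repr(second))  # b'Content: \r\n\r\n    D'
```

Printing the decoded text directly hides the difference, because a terminal or browser renders \n and \r\n the same way.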

MSMQ How best to handle classes when using binary encoding

I'm new here, so please be gentle.
This question revolves around VB.net / VS2010 / MSMQ 4.0
I'm developing an application that has MSMQ at its heart. There are (currently) 3 separate VB solutions, each of which sends and receives messages via a queue.
I tried using the XMLMessageFormatter and ran into problems with it. This is also a high-performance, time-critical app, and I understand that XMLMessageFormatter has a high overhead, so I've switched over to using BinaryMessageFormatter for the messages.
I've established a class (clsTMessage) which provides the structure for the message data and resides in its own .vb file attached to the solution. I realize that the downside of using Binaryformatter is that the exact same class (down to version and all) has to encode and decode the messages and indeed I'm seeing that problem.
So I figured, no problem, I'd just copy clsTmessage.vb to each solution. That doesn't quite do the trick, though: the message is encoded with the namespace of the host assembly, so the next solution to pick up the message is technically looking for a different class to decode it.
In this example, for instance, you can see that TelemanusWorkbench Version 1.0.0.0 encoded the message using TelemanusWorkbench.clsTMessage.
00 01 00 00 00 FF FF FF .....ÿÿÿ
FF 01 00 00 00 00 00 00 ÿ.......
00 0C 02 00 00 00 49 54 ......IT
65 6C 65 6D 61 6E 75 73 elemanus
57 6F 72 6B 62 65 6E 63 Workbenc
68 2C 20 56 65 72 73 69 h, Versi
6F 6E 3D 31 2E 30 2E 30 on=1.0.0
2E 30 2C 20 43 75 6C 74 .0, Cult
75 72 65 3D 6E 65 75 74 ure=neut
72 61 6C 2C 20 50 75 62 ral, Pub
6C 69 63 4B 65 79 54 6F licKeyTo
6B 65 6E 3D 6E 75 6C 6C ken=null
05 01 00 00 00 1E 54 65 ......Te
6C 65 6D 61 6E 75 73 57 lemanusW
6F 72 6B 62 65 6E 63 68 orkbench
2E 63 6C 73 54 4D 65 73 .clsTMes
73 61 67 65 09 00 00 00 sage....
0E 6E 65 77 4D 65 73 73 .newMess
61 67 65 54 79 70 65 12 ageType.
6E 65 77 50 72 6F 74 6F newProto
63 6F 6C 56 65 72 73 69 colVersi
6F 6E 0D 6E 65 77 49 64 on.newId
65 6E 74 69 66 69 65 72 entifier
0B 6E 65 77 53 6F 75 72 .newSour
63 65 49 50 0D 6E 65 77 ceIP.new
53 6F 75 72 63 65 50 6F SourcePo
72 74 10 6E 65 77 44 65 rt.newDe
73 74 69 6E 61 74 69 6F stinatio
6E 49 50 12 6E 65 77 44 nIP.newD
65 73 74 69 6E 61 74 69 estinati
6F 6E 50 6F 72 74 0C 6E onPort.n
65 77 54 69 6D 65 73 74 ewTimest
61 6D 70 0E 6E 65 77 4D amp.newM
65 73 73 61 67 65 42 6F essageBo
64 79 01 01 01 01 01 01 dy......
01 00 01 0D 02 00 00 00 ........
06 03 00 00 00 03 44 46 ......DF
58 06 04 00 00 00 01 30 X......0
06 05 00 00 00 0C 30 30 ......00
30 30 30 30 30 30 30 30 00000000
30 30 06 06 00 00 00 07 00......
30 2E 30 2E 30 2E 30 06 0.0.0.0.
07 00 00 00 01 30 06 08 .....0..
00 00 00 0B 31 39 32 2E ....192.
31 36 38 2E 31 2E 31 06 168.1.1.
09 00 00 00 04 35 30 30 .....500
30 20 46 FE 12 F9 32 CF 0 Fþ.ù2Ï
88 06 0A 00 00 00 49 70 .....Ip
2C 31 2C 31 32 33 34 35 ,1,12345
36 37 38 39 30 31 32 33 67890123
34 35 36 37 38 39 2C 31 456789,1
32 33 34 35 36 37 38 39 23456789
30 31 32 33 34 35 2C 31 012345,1
2C 69 6E 74 65 72 6E 65 ,interne
74 2C 75 73 65 72 6E 61 t,userna
6D 65 2C 70 61 73 73 77 me,passw
6F 72 64 2C 30 2C 33 30 ord,0,30
0B .
When I pick up the message from another solution/project within the app, it fails to parse the message even though it has an identical copy of clsTMessage; there the class is TelemanusListener.clsTMessage.
Given that it's generally a bad idea to have multiple copies of the class in different parts of the app anyway, what's the recommended way to do this? I've read what MSDN has to say about this, but it's very thin on how to actually implement it.
Hope I've explained that well enough; if not, please ask for more info.
Duncan
Yes. One class library with a public message type needs to be referenced from the two projects.
A word of warning about automatic properties: don't use them in classes that need to be serialised/deserialised. For each automatic property the compiler generates a hidden backing field, and the name of that field is a compiler implementation detail that is not guaranteed to stay the same between builds. This can cause serialisation problems when you deploy the same class library compiled at different times with different projects.

using sed, how does one match square brackets in a character class?

Here's a chunk of the raw data:
00000000 54 6f 70 69 63 20 46 6f 72 75 6d 20 52 65 70 6c |Topic Forum Repl|
00000010 69 65 73 20 4c 61 73 74 20 70 6f 73 74 20 31 20 |ies Last post 1 |
00000020 4c 69 6e 75 78 20 54 6f 64 61 79 20 31 34 3a 34 |Linux Today 14:4|
00000030 36 3a 35 37 20 62 79 20 4c 69 6e 75 78 20 4f 75 |6:57 by Linux Ou|
00000040 74 6c 61 77 73 20 32 36 39 20 e2 80 93 20 53 6f |tlaws 269 ... So|
00000050 6d 65 6f 6e 65 20 4b 6c 6f 73 65 20 54 68 61 74 |meone Klose That|
00000060 20 4f 75 74 6c 61 77 73 20 32 38 20 73 79 73 79 | Outlaws 28 sysy|
00000070 70 68 75 73 2e 6a 6f 6e 65 73 20 48 6f 6c 65 20 |phus.jones Hole |
00000080 62 79 20 59 4f 42 41 20 5b 20 31 20 32 20 5d 20 |by YOBA [ 1 2 ] |
00000090 32 20 4c 69 6e 75 78 20 26 20 54 6f 64 61 79 20 |2 Linux & Today |
000000a0 31 31 3a 34 34 3a 35 31 20 62 79 20 4c 6f 6f 6b |11:44:51 by Look|
000000b0 73 20 6c 69 6b 65 20 43 61 6e 6f 6e 69 63 61 6c |s like Canonical|
000000c0 20 69 73 20 61 6e 6e 6f 75 63 69 6e 67 20 70 6c | is annoucing pl|
000000d0 61 6e 73 20 46 72 65 65 64 6f 6d 20 31 20 6b 72 |ans Freedom 1 kr|
It's a hex dump and I'm interested in isolating the text part.
Here's a sed expression that almost works:
$ sed 's/.* |\([a-zA-Z0-9:& \.]*\)|$/\1/g' hex.dat
Topic Forum Repl
ies Last post 1
Linux Today 14:4
6:57 by Linux Ou
tlaws 269 ... So
meone Klose That
Outlaws 28 sysy
phus.jones Hole
00000080 62 79 20 59 4f 42 41 20 5b 20 31 20 32 20 5d 20 |by YOBA [ 1 2 ] |
2 Linux & Today
11:44:51 by Look
s like Canonical
is annoucing pl
ans Freedom 1 kr
Almost. But how do I match the one remaining line, which contains square brackets?
$ sed 's/.* |\([a-zA-Z0-9:&\[\] \.]*\)|$/\1/g' hex.dat
And:
$ sed 's/.* |\([a-zA-Z0-9:&\\[\\] \.]*\)|$/\1/g' hex.dat
Don't work at all (they fail to translate anything).
And:
$ sed 's/.* |\([a-zA-Z0-9:&[] \.]*\)|$/\1/g' hex.dat
obviously can't work.
Thanks for any help.
You almost had it.
Look at this section of a Unix regular expressions tutorial.
The way that yours could be done is by placing ][ immediately after you begin your character class.
So, try sed 's/.* |\([][a-zA-Z0-9:& \.]*\)|$/\1/g' hex.dat
For clarification, it does not matter where in the character class the [ is, so long as the closing bracket you intend to include in your character class (]) immediately follows the opening of your character class.
Also, as a further edit, try typing man cut and using what Tomasz said in a comment.
cut -d'|' -f2 hex.dat will cut your file, delimiting on a pipe, and take the second field.
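For comparison, the same extraction can be sketched in Python. This assumes, as in the data above, that the text column is the part between the first | and a closing | at the end of the line. The names TEXT_COLUMN and text_column are illustrative.

```python
import re

# Capture everything between the first "|" and the "|" that ends the line.
TEXT_COLUMN = re.compile(r"\|(.*)\|$")

def text_column(line: str):
    """Return the text column of a `hexdump -C` style line, or None."""
    match = TEXT_COLUMN.search(line.rstrip("\n"))
    return match.group(1) if match else None

line = "00000080 62 79 20 59 4f 42 41 20 5b 20 31 20 32 20 5d 20 |by YOBA [ 1 2 ] |"
print(repr(text_column(line)))  # 'by YOBA [ 1 2 ] '
```

Matching on the delimiters instead of listing the allowed characters sidesteps the bracket-escaping problem entirely, at the cost of accepting any character inside the pipes.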