SYS_open to SYS_write in x86-64 assembly

So I have two functions. One opens a file and returns the file descriptor by reference. The other takes that file descriptor and writes to the file.
mov rax, sys_open     ; syscall number for open
mov rdi, filename     ; pointer to the file name
mov rsi, read_only    ; access flags (O_RDONLY)
syscall               ; system call
(returns a valid file descriptor in rax)
In the next function:
mov rax, sys_write               ; syscall number for write
mov rdi, qword[file_descriptor]  ; file descriptor from the first function
mov rsi, string                  ; pointer to the data to write
mov rdx, qword[length]           ; number of bytes to write
syscall
This code gives me an error: -9 (which is -EBADF, "bad file descriptor") is returned in rax. Is there anything fundamentally wrong with this block of code?


Perl warns about invalid encoding even if I don't read the problematic data from the file

I'm trying to read lines from the first part of a file that contains a text header encoded as cp1252, followed by binary data after a specific keyword.
Problem
Perl warns about invalid encoding in parts of the file I never read. I've created an example in two files to demonstrate the problem.
Contents of linebug.pl:
#!/usr/bin/perl
use 5.028;
use strict;
use warnings;
open( my $fh, "<:encoding(cp1252)", "testfile" );
while( <$fh> ) {
    print;
    last if /Last/;
}
Hexdump of testfile, where the byte 0x81 right after the text Wrong was added on purpose because it does not map to any character in cp1252:
46 69 72 73 74 0a |First.|
4c 61 73 74 0a |Last.|
42 75 66 66 65 72 0a |Buffer.|
57 72 6f 6e 67 81 0a |Wrong..|
The third line Buffer is just there to make it clear that I do not read too far. It is a valid line between the last line I read and the "binary" data.
Here is the output showing that I only ever read two lines, but perl still emits a warning:
user@host$ perl linebug.pl
cp1252 "\x81" does not map to Unicode at ./linebug.pl line 6.
First
Last
user@host$
As can be seen, my program reads and prints the first two lines, and then exits. It should never try to read and interpret anything else, but I still get the warning about \x81 not mapping to Unicode.
Questions
Why does it warn? I'm not reading the line. A hunch tells me it's trying to read ahead, but why would it try to decode?
Is there a workaround, or a better way to handle files where the encoding changes from one section to another?
I still want the warning when reading the initial lines, in case the file is damaged.
Files don't have a concept of lines; they are just streams of bytes. To return a line to the program, Perl must request a number of bytes from the OS and figure out where the line ends.
Perl could request a single byte at a time from the OS until it has a full line, but that would be very inefficient. There's a lot of overhead involved in making system calls. As such, Perl requests 8 KiB at a time.
Then, the raw data must be decoded before Perl can determine where the line ends, because a raw 0A doesn't necessarily indicate the end of the line.
Similarly to why one doesn't read from a file one byte at a time, asking the decoder to decode just the next character would be inefficient. There is overhead involved every time you start and stop decoding. As such, Perl decodes all the data it reads as it reads it.
So that means that Perl both reads and decodes more than it returns to the program.
The solution is to treat the file as binary (because it's not really a text file if the encoding changes by section) and do the decoding yourself.
If you're dealing with a single-byte encoding like cp1252, you can continue using readline (aka <$fh>). However, instead of telling Perl to search for the Code Point of Line Feed (0A), you need to set $/ to the encoded form of that Code Point. As it happens, that's also 0A in cp1252, so no change is needed.
use Encode qw( decode );

my $qfn = "testfile";    # the file from the question
open( my $fh, "<:raw", $qfn )
    or die( "Can't open \"$qfn\": $!\n" );

while( <$fh> ) {
    $_ = decode( 'cp1252', $_ );      # what :encoding(cp1252) would have done
    s/\r\n\z/\n/ if $^O eq 'Win32';   # what :crlf would have done
    print;
    last if /Last/;
}
If you weren't using a single-byte encoding, you might have to switch to using read. (You could keep using readline for UTF-8 because of the way it's designed.) When using read, the exact solution depends on a few specifics (that pertain to determining how much to read and how much to decode).
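For illustration, here is a rough sketch of that read-based approach, using UTF-16le purely as an example of a multi-byte encoding and reusing $qfn from the snippet above; the 8 KiB chunk size and the malformed-data check are my assumptions, not part of the answer:

use Encode qw( decode );

open( my $fh, "<:raw", $qfn )
    or die( "Can't open \"$qfn\": $!\n" );

my $buf = '';
while ( read( $fh, $buf, 8192, length($buf) ) ) {
    # FB_QUIET decodes as much as it can and leaves the unprocessed
    # tail (e.g. half of a surrogate pair) in $buf for the next pass.
    my $decoded = decode( 'UTF-16le', $buf, Encode::FB_QUIET );

    # If nothing could be decoded from a sizable buffer, the data is
    # malformed rather than merely incomplete.
    die( "Malformed data\n" ) if !length($decoded) && length($buf) >= 4;

    # ...split $decoded on the line terminator and process the lines...
}
# Any bytes still left in $buf here are a truncated final character.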
Perl reads from the file in 8 KiB chunks, so way more than a line is read at a time. Data is decoded right as it is read (since the stream must be decoded to find the line endings), so an unexpected encoding is noticed and warned about.
One way to deal with this: use non-buffered reads, via sysread, and read smaller chunks at a time.
Count characters as you read, and once you run into that spot you can back up and continue reading a character at a time, again counting them, so as to detect the exact place. See this post for a working example of identifying the spot where a warning is fired.
In order to be able to stop there you'll likely want to throw a die out of a $SIG{__WARN__} handler, and have all that code in eval. This will allow you to stop at the place where the warning comes from and have control back.
As you've read right up to that spot, you can then re-open the file in the encoding suitable for the rest of the file and seek to that spot and read the rest.
I can't write and test all that right now; hopefully this helps.
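An untested sketch of just the warn-to-die part described above; the 16-byte chunk size and the variable names are illustrative assumptions:

use Encode qw( decode );

open( my $raw, '<:raw', 'testfile' ) or die( "Can't open testfile: $!\n" );

my $good_bytes = 0;    # bytes known to decode cleanly
my $ok = eval {
    local $SIG{__WARN__} = sub { die @_ };    # promote the warning to an exception
    while ( my $n = sysread( $raw, my $chunk, 16 ) ) {
        decode( 'cp1252', $chunk, Encode::FB_WARN );    # warns on \x81
        $good_bytes += $n;    # only reached if decode() didn't warn
    }
    1;
};
if ( !$ok ) {
    # $@ now holds the "does not map to Unicode" message, and the bad
    # byte lies within the last 16-byte chunk, so it can be pinned down
    # by re-reading that chunk a byte at a time; then seek to the exact
    # spot and read the rest of the file as binary.
}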

Redirect printf() to STDIN of another process

I have been trying to write some code that redirects writes to STDOUT_FILENO, via write(1, line, strlen(line)) or printf(), to the STDIN_FILENO of another process. The other process will be /usr/bin/less. Nothing seems to work, despite trying quite a few suggestions from this site, a lot of man page reading, and trying every combination of close() and dup2().
Appreciate your help, thanks.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/stat.h>
#include <termios.h>
#include <unistd.h>
#define INPUT_END 1
#define OUTPUT_END 0
int main(int argc, char* argv[]) {
    pid_t pid1;
    pid_t pid2;
    int fd1[2], fd2[2];
    int x;

    pipe(fd1);
    pipe(fd2);

    pid1 = fork();
    if (pid1 == -1) {
        fprintf(stderr, "pid error\n");
        exit(1);
    }
    if (pid1 == 0) {
        close(fd1[INPUT_END]);
        dup2(fd1[OUTPUT_END], STDIN_FILENO);
        close(fd1[OUTPUT_END]);
        close(fd2[OUTPUT_END]);
        dup2(fd2[INPUT_END], STDOUT_FILENO);
        close(fd2[INPUT_END]);
        execlp("less", "-r", (char*) NULL);
    } else {
        close(fd1[OUTPUT_END]);
        dup2(fd1[INPUT_END], STDOUT_FILENO);
        close(fd1[INPUT_END]);
        close(fd2[INPUT_END]);
        dup2(fd2[OUTPUT_END], STDIN_FILENO);
        close(fd2[OUTPUT_END]);
        for (x = 0; x < 100; x++) {
            write(STDOUT_FILENO, "AAA\n", 4);
            printf("AAA\n");
            fflush(stdout);
        }
        close(fd1[OUTPUT_END]);
        close(fd1[INPUT_END]);
        close(fd2[OUTPUT_END]);
        close(fd2[INPUT_END]);
        waitpid(-1, NULL, 0);
    }
}
Your program does exactly what you programmed it to do: It creates two pipes and a child running less, using the pipes to redirect stdout of the program to stdin of less AND stdout of less to stdin of the program. It then writes a bunch of data (which less will copy to its stdout -- stdin of the program) and then waits for less to exit. But that will never happen, as less is waiting for more input.
If you want things to stop, you'll need to close(STDOUT_FILENO) before the call to waitpid -- if you add this, less will exit, and the waitpid will return and your program will then proceed (and exit). You still won't see anything on your terminal, as nothing is ever being written to the terminal -- less is writing to stdin of your program (which you are never reading).
If you increase the amount of data being written, you will see a deadlock -- as you are never reading stdin, once the fd2 pipe fills up, less will block writing, and then once fd1 also fills up, your program will block. The default pipe sizes on Linux are quite large, however, so this will take a lot of data.
Note that you could just as well use cat as less -- less essentially devolves to cat when stdout is not a terminal.
If what you want is to output data to your terminal, using less to mediate that (stopping after a page, etc), then you need to leave stdout of less pointing at your terminal. Simply delete fd2 and every line referring to fd2 from your above code, and it should work -- though it will still hang at the end if you don't close(STDOUT_FILENO).

how to get rid of `Wide character in print at`?

I have the file /tmp/xxx with the following content:
00000000 D0 BA D0 B8 │ D1 80 D0 B8 │ D0 BB D0 B8 │ D0 BA к и р и л и к
When I read content of file and print it I get the error:
Wide character in print at ...
The source is:
use utf8;
open my $fh, '<:encoding(UTF-8)', '/tmp/xxx';
print scalar <$fh>
The output from print is:
кирилик
The use utf8 means Perl expects your source code to be UTF-8.
The open pragma can change the encoding of the standard filehandles:
use open qw( :std :encoding(UTF-8) );
And, whatever is going to deal with your output needs to expect UTF-8 too. If you want to see it correctly in your terminal, then you need to set up that correctly (but that's nothing to do with Perl).
You're printing to STDOUT, which isn't expecting UTF-8.
Add
binmode(STDOUT, ":encoding(UTF-8)");
to change that on the already opened handle.
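For instance, putting the two answers together, a minimal corrected version of the script from the question:

use utf8;
use open qw( :std :encoding(UTF-8) );    # STDIN/STDOUT/STDERR now expect UTF-8

open my $fh, '<:encoding(UTF-8)', '/tmp/xxx'
    or die "Can't open /tmp/xxx: $!";
print scalar <$fh>;    # no more "Wide character in print" warning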

how to check in perl if a file is written as little endian or big endian?

Actually I have to parse some files which can be in either form of endianness (big or little). The Perl interpreter dies if I use one encoding and parse the other.
open (my $fh, "<:raw:encoding(UTF-16LE):crlf", $ARGV[0]) or die cannot open file for reading : $! \n";
or
open (my $fh, "<:raw:encoding(UTF-16BE):crlf", $ARGV[0]) or die cannot open file for reading : $! \n";
Output (for a file in LE with perl's encoding set to BE):
UTF-16BE:Malformed HI surrogate dc00 at toASCII.pl line 123.
Most UTF-16le files are valid UTF-16be files, and vice-versa. For example, there's no way to tell if 0A 00 indicates U+000A (UTF-16le) or U+0A00 (UTF-16be). So, assuming there's no BOM, you have to guess.
Possible heuristics (in descending order of reliability):

1. U+FFFE is not a character (guaranteed).
   If the file starts with FF FE, then it must be UTF-16le.
   If the file starts with FE FF, then it must be UTF-16be.
2. If the file isn't valid UTF-16be, then it must be UTF-16le.
   If the file isn't valid UTF-16le, then it must be UTF-16be.
3. If the file contains non-characters when decoded using UTF-16be, then it must be UTF-16le.
   If the file contains non-characters when decoded using UTF-16le, then it must be UTF-16be.
4. U+0A00 isn't currently assigned, but U+000A (LINE FEED) is quite common. U+0D00 isn't currently assigned, but U+000D (CARRIAGE RETURN) is quite common.
   If the file contains 0A 00 or 0D 00, then it's probably UTF-16le.
   If the file contains 00 0A or 00 0D, then it's probably UTF-16be.
5. If the file contains unassigned characters when decoded using UTF-16be, then it's probably UTF-16le.
   If the file contains unassigned characters when decoded using UTF-16le, then it's probably UTF-16be.
6. Heuristics based on knowledge of the file format. (Example)
7. A file is likely to contain more ASCII characters than characters numbered U+xx00.
   If the file contains many xx 00 and few 00 xx, then it's probably UTF-16le.
   If the file contains many 00 xx and few xx 00, then it's probably UTF-16be.
Notes:
#4 and #5 say "it's probably" instead of "it must be" because what's unassigned today could be assigned tomorrow.
#3 includes #1, but #1 is a cheap test.
#5 includes #4, but #4 is almost as reliable as #5 without maintaining a long list of unassigned characters that changes over time.
You could slurp in the file using :raw, perform some or all of the above tests on it to determine the encoding, then use decode and s/\r\n/\n/g.
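A minimal sketch of that slurp-and-guess approach, using only heuristics #1 and #4 from the list above; the subroutine name and the majority-vote tie-breaking are my own choices:

use Encode qw( decode );

# Guess the encoding of a UTF-16 byte string (heuristics #1 and #4).
sub guess_utf16 {
    my ($bytes) = @_;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;    # BOM (heuristic #1)
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;    # BOM (heuristic #1)
    my $le = () = $bytes =~ /[\x0A\x0D]\x00/g;      # 0A 00 / 0D 00 (heuristic #4)
    my $be = () = $bytes =~ /\x00[\x0A\x0D]/g;      # 00 0A / 00 0D (heuristic #4)
    return $le >= $be ? 'UTF-16LE' : 'UTF-16BE';
}

open( my $fh, '<:raw', $ARGV[0] )
    or die( "Can't open \"$ARGV[0]\": $!\n" );
my $raw = do { local $/; <$fh> };    # slurp the raw bytes

my $text = decode( guess_utf16($raw), $raw );
$text =~ s/\r\n/\n/g;                # what :crlf would have done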
You don't show any code, but in general it's impossible to tell what endianness a file is unless you know what values you should be reading from it. Many file formats, for instance, reserve a few bytes at the beginning to indicate what the format is; if this applies to the data you are dealing with, then you can just read those bytes and change the open mode if you don't get what you're expecting.
Alternatively, since your program dies if the wrong format is chosen, then you can use that to test whether the chosen format is correct. Something like this should suit
my $file = $ARGV[0];

open my $fh, '<:raw:encoding(UTF-16LE):crlf', $file or die $!;

eval { do_stuff_that_may_crash() };
if ( $@ ) {
    if ( $@ =~ /Malformed HI surrogate/ ) {
        open $fh, '<:raw:encoding(UTF-16BE):crlf', $file or die $!;
        do_stuff_that_may_crash();
    }
    else {
        die $@;
    }
}
But since it sounds like do_stuff_that_may_crash() is pretty much all of your program, you should probably find a better criterion.

How do I determine or set the working directory of QtSpim?

I just want to run any kind of SPIM program that uses a syscall to open, read and/or write a file, but that doesn't work out. I am aware that my program and the file are probably not in the working directory of QtSpim, but I have no idea how to change it or set a new directory. After the first syscall, $v0 is -1, which indicates an error. I tried using the whole pathname for the file to read (example below), and tried to write/create a file to see where QtSpim would save it. If I have a fundamental flaw, do not hesitate to let me know. I am using QtSpim under Windows.
.data
filename: .asciiz "C:\Users\...\test.txt"   # name of the file to read
buffer:   .space 1024

.text
main:
    # open the file (to get the file descriptor)
    li $v0, 13           # system call for open file
    la $a0, filename     # file name to open
    li $a1, 0            # open for reading
    li $a2, 0            # mode
    syscall              # open a file (file descriptor returned in $v0)
    move $s1, $v0        # save the file descriptor

    # read from file
    li $v0, 14           # system call for read from file
    move $a0, $s1        # file descriptor
    la $a1, buffer       # address of buffer to which to read
    li $a2, 1024         # hardcoded buffer length
    syscall              # read from file

    # close the file
    li $v0, 16           # system call for close file
    move $a0, $s1        # file descriptor to close
I got exactly the same problem, and my code looks nearly the same. Most documentation about the syscall functions of QtSpim also tells me that the file descriptor gets returned in $a0, even though we actually get the descriptor in $v0.
Burning for the answer :)