I already know how to do this, but since the Rebol documentation is offline today, I figured I would go ahead and ask so that others will have an easier time finding out how in the future.
In a nutshell, use read for reading a text file (a string will be returned) and write for writing a string to a file. For example:
write %hello.txt "Hello World!"
print read %hello.txt
Such text-mode I/O relies on UTF-8 for reading and writing; other encodings will be supported in the future.
Additionally, you can use the /binary refinement with both functions to switch to binary mode. You can also use the higher-level load and save counterparts, which will try to encode/decode the data using one of the available codecs (UTF-8 <=> Red values, JPG/PNG/GIF/BMP <=> image! value).
Use help followed by a function name for more info.
Related
I'm wondering how/if one converts a .dat file into something modifiable and readable. Any info would vastly help. For example a code snippet, program, or just general information.
I generally use hexdump, but it's not available for all operating systems, and probably wouldn't help you more than what you're looking at now.
As already pointed out in comments, DAT is a generic extension used to signify a (usually proprietary) binary format -- other than not being text, there's no commonality. You need to know what format it's in before you can have any hope of translating it into something legible by humans.
All we "know" from the extension is that the file is NOT line-based text -- and that could be wrong.
The default printer can be confusing with respect to lists: empty lists produce no output, three different notations are mixed (, vs (x;y;z) vs 1 2 3), and the indentation/columnization is not obvious (it is apparently optimized for table data). I am currently using -3! but it is still not ideal.
Is there a ready-made pretty-printer with a consistent, uniform output format (basically what I am used to in any other language, where lists are not special)?
I've recently started using .j.j for string output in error messages in preference to -3!. Mainly I think it is easier to parse in a text log, but it also doesn't truncate in the same way.
It still renders atoms and lists differently, so it might not exactly meet your needs; if you really want that, you could compose it with the old "ensure this is a list" trick:
myPrinter:('[.j.j;(),])
You might need to supply some examples to better explain your issues and your use-case for pretty-printing.
In general, -3! is the clearest visual representation of the data. It is the stringified equivalent of another popular display method, 0N!.
The parse function is useful for understanding how the interpreter reads/executes commands, but I don't think that will be useful in your case.
What is the effect of this at the top of a perl script?
use Encode qw/encode decode from_to/;
I found this on code I have taken over, but I don't know what it does.
Short story, for an experienced Perl coder who knows what modules are:
The Encode module is for converting Perl strings to "some other" format (for which there are many sub-modules that define different formats). Typically, it's used for converting to and from Unicode formats, e.g.:
... to convert a string from Perl's internal format into ISO-8859-1, also known as Latin1:
$octets = encode("iso-8859-1", $string);
decode is for going the other way, and from_to converts a string from one format to another in place:
from_to($octets, "iso-8859-1", "cp1250");
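To see all three functions side by side, here is a minimal, self-contained sketch; the sample string and the choice of encodings are just for illustration:

#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# A character string in Perl's internal (decoded) form.
my $string = "caf\x{e9}";                    # "café"

# encode: characters -> bytes in the named encoding.
my $octets = encode("iso-8859-1", $string);

# decode: bytes in the named encoding -> characters.
my $back = decode("iso-8859-1", $octets);
print "round trip ok\n" if $back eq $string;

# from_to: re-encode a byte string in place, here Latin-1 -> Windows-1250.
from_to($octets, "iso-8859-1", "cp1250");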
Long story: for someone who doesn't know what a module is/does:
This is the classic way one uses code from elsewhere. "Elsewhere" usually means one of two possibilities:
Code written "in-house", i.e. a part of your private application that a past developer decided to factor out, presumably because it's applicable in several locations/applications; or
Code written outside the organisation and made available publicly, typically from the Comprehensive Perl Archive Network - CPAN
Now, it's possible - but unlikely - that someone within your organization has created in-house code and coincidentally used the same name as a module on CPAN. So if you check CPAN by searching for "Encode", you can see that there is a module of that name, and that will almost certainly be what you are using. You can read about it here.
The qw/.../ stands for "quote words" and is a simple shorthand for creating a list of strings; in this case it translates to ("encode", "decode", "from_to"), which in turn is a specification of which parts of the Encode module you (or the original author) want.
You can read about those parts under the heading "Basic methods" on the documentation (or "POD") page I referred to earlier. Don't be put off by the reference to "methods" - many modules (this one included, it appears) are written in such a way that they support both an object-oriented and a functional interface. As a result, you will probably see direct calls to the three functions mentioned earlier as if they were written directly in the program itself.
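For illustration, here is a hedged sketch showing the same conversion done through both interfaces; the variable names are mine, not from your codebase:

use Encode qw(encode decode);

my $string = "na\x{ef}ve";   # "naïve", just an illustrative value

# Functional interface, as used in the code you inherited:
my $octets = encode("iso-8859-1", $string);

# Object-oriented interface: look the encoding up once, then call
# methods on the resulting encoding object.
my $latin1  = Encode::find_encoding("iso-8859-1");
my $octets2 = $latin1->encode($string);
my $chars   = $latin1->decode($octets2);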
I'm currently working on internationalizing a very large Perl/Mason web application, as a team of one (does that make this a death march??). The application is nearing 20 years old, and is written in a relatively old-school Perl style; it doesn't use Moose or another OO module. I'm currently planning on using Locale::Maketext::Gettext to do message lookups, and using GNU Gettext catalog files.
I've been trying to develop some tools to aid in string extraction from our bigass codebase. Currently, all I have is a relatively simple Perl script that parses through source looking for string literals, prompts the user with some context and asks whether the string should be marked for translation, and marks it if so.
There's way too much noise in terms of strings I need to mark versus strings I can ignore. A lot of strings in the source aren't user-facing, such as hash keys, or type comparisons like
if (ref($db_obj) eq 'A::Type::Of::Db::Module')
I do apply some heuristics to each proposed string to see whether I can ignore it off the bat (e.g. I ignore strings that are used for hash lookups, since 99% of the time in our codebase these aren't user-facing). However, despite all that, around 90% of the strings my program shows me are ones I don't care about.
Is there a better way I could help automate my task of string extraction (i.e. something more intelligent than grabbing every string literal from the source)? Are there any commercial programs that do this that could handle both Perl and Mason source?
ALSO, I had a (rather silly) idea for a superior tool, whose workflow I put below. Would it be worth the effort implementing something like this (which would probably take care of 80% of the work very quickly), or should I just submit to an arduous, annoying, manual string extraction process?
Start by extracting EVERY string literal from the source, and putting it into a Gettext PO file.
Then, write a Mason plugin to parse the HTML for each page being served by the application, with the goal of noting strings that the user is seeing.
Use the hell out of the application and try to cover all use cases, building up a store of user facing strings.
Given this store of strings the user saw, do fuzzy matches against strings in the catalog file, and keep track of catalog entries that have a match from the UI (a rough sketch of this matching step follows the list).
At the end, anything in the catalog file that didn't get matched would likely not be user facing, so delete those from the catalog.
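For what it's worth, the fuzzy-matching step is easy to prototype. Here is a rough, self-contained sketch; the sample strings and the 20% distance threshold are made up, and loading the actual PO file is left out:

#!/usr/bin/env perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical inputs: strings observed in rendered pages, and candidate
# msgids pulled from the catalog (PO file parsing is not shown here).
my @seen_in_ui = ("Welcome back, %s!", "Your order has shipped");
my @catalog    = ("Welcome back, %s!", "Your order has shipped.", "db_handle");

# Plain Levenshtein edit distance; nothing beyond core List::Util.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1, $cur[$j - 1] + 1, $prev[$j - 1] + $cost);
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Keep a catalog entry if some UI string is within 20% edit distance of it.
my %matched;
for my $msgid (@catalog) {
    for my $ui (@seen_in_ui) {
        my $longer = length($msgid) > length($ui) ? length($msgid) : length($ui);
        if (edit_distance($msgid, $ui) <= 0.2 * $longer) {
            $matched{$msgid} = 1;
            last;
        }
    }
}
print "likely user-facing: $_\n" for sort keys %matched;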
There are no Perl tools I know of which will intelligently extract strings that might need internationalization versus ones that will not. You're supposed to mark them in the code as you write them, but as you said, that wasn't done.
You can use PPI to do the string extraction intelligently.
#!/usr/bin/env perl

use strict;
use warnings;

use Carp;
use PPI;

my $doc = PPI::Document->new(shift);

# See PPI::Node for docs on find
my $strings = $doc->find(sub {
    my($top, $element) = @_;
    print ref $element, "\n";

    # Look for any quoted string or here doc.
    # Does not pick up unquoted hash keys.
    return $element->isa("PPI::Token::Quote") ||
           $element->isa("PPI::Token::HereDoc");
});

# Display the content and location.
for my $string (@$strings) {
    my($line, $row, $col) = @{ $string->location };
    print "Found string at line $line starting at character $col.\n";
    printf "String content: '%s'\n", string_content($string);
}

# *sigh* PPI::Token::HereDoc doesn't have a string method,
# so join its lines back together ourselves.
sub string_content {
    my $string = shift;
    return $string->isa("PPI::Token::Quote")   ? $string->string            :
           $string->isa("PPI::Token::HereDoc") ? join("", $string->heredoc) :
           croak "$string is neither a here-doc nor a quote";
}
You can do more sophisticated examination of the tokens surrounding the strings to determine whether they're significant. See PPI::Element and PPI::Node for more details. Or you can examine the content of the string itself to decide whether it's significant.
I can't go much further because "significant" is up to you.
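For example, one cheap check is whether a quoted string sits inside a hash subscript like $h{"key"}; here is a sketch under that assumption (the helper name is mine, and it also skips strings nested deeper inside the subscript, which may be what you want anyway):

# Walk up the parse tree looking for a subscript structure.
sub probably_hash_key {
    my $string = shift;
    for (my $node = $string->parent; $node; $node = $node->parent) {
        return 1 if $node->isa("PPI::Structure::Subscript");
    }
    return 0;
}

Inside the display loop above you would then add something like: next if probably_hash_key($string);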
Our Source Code Search Engine is normally used to efficiently search large code bases, using indexes constructed from the lexemes of the languages it knows. That list of languages is pretty broad, including Java, C#, COBOL and ... Perl. The lexeme extractors are language precise (because they are "stolen" from our DMS Software Reengineering Toolkit, a language-agnostic program transformation system, where precision is fundamental).
Given an indexed code base, one can then enter queries to find arbitrary sequences of lexemes in spite of language-specific white space; one can log the hits of such queries and their locations.
The extremely short query:
S
to the Search Engine finds all lexical elements which are classified as strings (keywords, variable names, comments are all ignored; just strings!). (Normally people write more complex queries with regular-expression constraints, such as S=*Hello to find strings that end with "Hello".)
The relevance here is that the Source Code Search Engine has precise knowledge of lexical syntax of strings in Perl (including specifically elements of interpolated strings and all the wacky escape sequences). So the query above will find all strings in Perl; with logging on, you get all the strings and their locations logged.
This stunt actually works for any language the Search Engine understands, so it is a rather general way to extract strings for such internationalization tasks.
Something I keep doing is removing comments from a file as I process it. I was wondering if there is a module to do this.
The sort of code I keep writing time and again is:
while (<>) {
    s/#.*//;
    next if /^ \s+ $/x;
    # ... do something useful here ...
}
Edit: Just to clarify, the input is not Perl. It is a text file of my own making that might have data I want to process in some way or other. I want to be able to place comments that are ignored by my programs.
Unless this is a learning experience, I suggest you use Regexp::Common::comment instead of writing your own regular expressions.
It supports quite a few languages.
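Assuming your home-grown format's comments run from a "#" to the end of the line (which is why borrowing the Perl comment pattern fits), usage might look like this sketch:

#!/usr/bin/env perl
use strict;
use warnings;
use Regexp::Common qw(comment);

# Slurp the input, then strip every "#"-style comment in one pass.
my $text = do { local $/; <> };
$text =~ s/$RE{comment}{Perl}/\n/g;   # keep the line breaks themselves

for my $line (split /\n/, $text) {
    next if $line =~ /^\s*$/;         # skip lines that are now blank
    # ... do something useful with $line here ...
    print "$line\n";
}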
The question does not make clear what type of file it is. Are we dealing with Perl source files? If so, your approach is not entirely correct - see gbacon's comment. Perl source files are notoriously difficult (impossible?) to parse with regexes. In that case, or if you need to deal with several types of files, use Regexp::Common::comment as suggested by Niffle. Otherwise, if you think your regex logic is correct for your scenario, then I personally prefer to write it explicitly; it's just a pair of straightforward lines, and there is little to be gained by using a module (which also introduces a dependency).