I have lots of parametrs which I have to add between text: e.g. I would like to write:
Anne has 3 apples.
with
name="Anne"
numFruit=3
fruit="apples"
with print I could do the following:
print '{myName} has {myNumFruit} {myFruit}.'.format(
... myName=name, myNumFruit=numFruit, myFruit=fruit)
In this it does not matter in which order I give the variables.
Is there somethins similar for write (write to file), or do I have to do it via %s, %d, etc and keep track of the order in which I call the variables in the text?
Thanks
Related
I have a table with about 500 variables and 2000 cases. The type of these variables varies. My supervisor has asked me to produce a table listing all the numeric variables, along with their maximums and minimums. I am supposed to use SPSS because R apparently messes up the value labels.
I've only done very basic things in SPSS before this, like finding statistics for single variables, and I'm not sure how to do this. I think I should probably do something like:
*Create new table*
DATASET DECLARE maxAndMin.
*Loop through all variables: Use conditional statement to identify numeric variables*
DO REPEAT R=var1 TO varN.
FREQUENCIES VARIABLES /STATISTICS=MINIMUM
END REPEAT
*Find max and minimum*
I'm not sure how to go about this though. Any suggestions would be appreciated.
The following code will first make a list of all numeric variables in the dataset (and store it in a macro called !nums) and then it will run an analysis of those variables to tell you the mean, maximum and minimum of each:
SPSSINC SELECT VARIABLES MACRONAME="!nums" /PROPERTIES TYPE= NUMERIC.
DESCRIPTIVES !nums /STATISTICS=MEAN MIN MAX.
You can use the following code to create a tiny dataset to test the above code on:
data list list/n1 (f1) t1(a1) n2(f1) t2(a1).
begin data
1 "a" 34 "b"
2 "a" 23 "b"
3 "a" 52 "b"
4 "a" 71 "b"
end data.
If SUMMARIZE produces a nice enough table for you, here is a "non-extension" way of doing it.
file handle mydata /name="<whatever/wherever>".
data list free /x (f1) y (a5) z (F4.2).
begin data.
1 yes 45.67
2 no 32.00
3 maybe .
4 yes 22.02
5 no 12.79
end data.
oms select tables
/destination format=sav outfile=mydata
/if subtypes="Descriptive Statistics" /tag="x".
des var all.
omsend tag="x".
get file mydata.
summarize Var1 Mean Minimum Maximum /format list nocasenum nototal
/cells none /statistics none /title "Numeric Variables Only".
or use a DATASET command instead of file handle if you don't need the file on disk.
I'm using RRDs::Simple function and it needs bunch of parameters.
I have placed these parameters in a special variable (parsing, sorting and calculating data from a file) with all quotes, commas and other stuff.
Of course
RRDs::create ($variable);
doesn't work.
I've glanced through all perl special variables and have found nothing.
How to substitute name of variable for the data that contained in that variable?
At least could you tell me with what kind of tools(maybe another special variables) it can be done?
Assuming I'm understanding what you're asking:
You've build the 'create' data in $variable, and are now trying to use RRDs::create to actually do it?
First step is:
print $variable,"\n"; - to see what is actually there. You should be able to use this from the command line, with rrdtool create. (Which needs a filename, timestep, and some valid DS and RRA parameters)
usually, I'll use an array to pass into RRDs::create:
RRDs::create ( "test.rrd", "-s 300",
"DS:name:GAUGE:600:U:U", )
etc.
If $variable contains this information already, then that should be ok. The way to tell what went wrong is:
if ( RRDs::error ) { print RRDs::error,"\n"; }
It's possible that creating the file is the problem, or that your RRD definitions are invalid for some reason. rrdtool create on command line will tell you, as will RRDs::error;
Let's say I am trying to read in data line by line from a file called input.txt. There's about 20 lines and each line consists of 3 different data types. If I use this code:
while(!file.eof){ ..... }
Does this function look at only one data type from each line per iteration, or does it look at the all the data types at once for each line per iteration--so the next iteration would look at the next line instead of the next data type?
Many thanks.
.eof() looks at the end of file flag. The flag is set after you run over the end of the file. This is not desirable.
A great blog post on how this works and best practice can be found here.
Basically, use
std::string line;
while(getline(file, line)) { ... }
or
while (file >> some_data) { ... }
as it will notice errors and the end of the file at the correct time and act accordingly.
For those that are experts in lexing and parsing... I am attempting to write a series of programs in perl that would parse out IBM mainframe z/OS JCL for a variety of purposes, but am hitting a roadblock in methodology. I am mostly following the lexing/parsing ideology put forth in "Higher Order Perl" by Mark Jason Dominus, but there are some things that I can't quite figure out how to do.
JCL has what's called inline data, which is very similar to "here" documents. I am not quite sure how to lex these into tokens.
The layout for inline data is as follows:
//DDNAME DD *
this is the inline data
this is some more inline data
/*
...
Conventionally, the "*" after the "DD" signifies that following lines are the inline data itself, terminated by either "/*" or the next valid JCL record (starting with "//" in the first 2 columns).
More advanced, the inline data could appear as such:
//DDNAME DD *,DLM=ZZ
//THIS LOOKS LIKE JCL BUT IT'S ACTUALLY DATA
//MORE DATA MASQUERADING AS JCL
ZZ
...
Sometimes the inline data is itself JCL (perhaps to be pumped to a program or the internal reader, whatever).
But here's the rub. In JCL, the records are 80 bytes, fixed in length. Everything past column 72 (cols 73-80) is a "comment". As well, everything following a blank that follows valid JCL is likewise a comment. Since I am looking to manipulate JCL in my programs and spit it back out, I'd like to capture comments so that I can preserve them.
So, here's an example of inline comments in the case of inline data:
//DDNAME DD *,DLM=ZZ THIS IS A COMMENT COL73DAT
data
...
ZZ
...more JCL
I originally thought that I could have my top-most lexer pull in a line of JCL and immediately create a non-token for cols 1-72 and then a token (['COL73COMMENT',$1]) for the column 73 comment, if any. This would then pass downstream to the next iterator/tokenizer a string of the cols 1-72 text followed by the col73 token.
But how would I, downstream from there, grab the inline data? I'd originally figured that the top-most tokenizer could look for a "DD \*(,DLM=(\S*))" (or the like) and then just keep pulling records from the feeding iterator until it hit the delimiter or a valid JCL starter ("//").
But you may see the issue here... I can't have 2 topmost tokenizers... either the tokenizer that looks for COL73 comments must be the top or the tokenizer that gets inline data must be at the top.
I imagine that perl parsers have the same challenge, since seeing
<<DELIM
isn't necessarily the end of the line, followed by the here document data. After all, you could see perl like:
my $this=$obj->ingest(<<DELIM)->reformat();
inline here document data
more data
DELIM
How would the tokenizer/parser know to tokenize the ")->reformat();" and then still grab the following records as-is? In the case of the inline JCL data, those lines are passed as-is, cols 73-80 are NOT comments in that case...
So, any takers on this? I know there will be tons of questions clarifying my needs and I'm happy to clarify as much as is needed.
Thanks in advance for any help...
In this answer I will concentrate on heredocs, because the lessons can be easily transferred to the JCL.
Any language that supports heredocs is not context-free, and thus cannot be parsed with common techniques like recursive descent. We need a way to guide the lexer along more twisted paths, but in doing so, we can maintain the appearance of a context-free language. All we need is another stack.
For the parser, we treat introductions to heredocs <<END as string literals. But the lexer has to be extended to do the following:
When a heredoc introduction is encountered, it adds the terminator to the stack.
When a newline is encountered, the body of the heredoc is lexed, until the stack is empty. After that, normal parsing is resumed.
Take care to update the line number appropriately.
In a hand-written combined parser/lexer, this could be implemented like so:
use strict; use warnings; use 5.010;
my $s = <<'INPUT-END'; pos($s) = 0;
<<A <<B
body 1
A
body 2
B
<<C
body 3
C
INPUT-END
my #strs;
push #strs, parse_line() while pos($s) < length($s);
for my $i (0 .. $#strs) {
say "STRING $i:";
say $strs[$i];
}
sub parse_line {
my #strings;
my #heredocs;
$s =~ /\G\s+/gc;
# get the markers
while ($s =~ /\G<<(\w+)/gc) {
push #strings, '';
push #heredocs, [ \$strings[-1], $1 ];
$s =~ /\G[^\S\n]+/gc; # spaces that are no newlines
}
# lex the EOL
$s =~ /\G\n/gc or die "Newline expected";
# process the deferred heredocs:
while (my $heredoc = shift #heredocs) {
my ($placeholder, $marker) = #$heredoc;
$s =~ /\G(.*\n)$marker\n/sgc or die "Heredoc <<$marker expected";
$$placeholder = $1;
}
return #strings;
}
Output:
STRING 0:
body 1
STRING 1:
body 2
STRING 2:
body 3
The Marpa parser simplifies this a bit by allowing events to be triggered once a certain token is parsed. These are called pauses, because the built-in lexing pauses a moment for you to take over. Here is a high-level overview and a short blogpost describing this technique with the demo code on Github.
In case anyone was wondering how I decided to resolve this, here is what I did.
My main lexing routine accepts an iterator that pumps full lines of text (which can take it from a file, a string, whatever I want). The routine uses that to create another iterator, which examines the line for "comments" after column 72, which it will then return as a "mainline" token followed by a "col72" token. This iterator is then used to create yet another iterator, which passes the col72 tokens through unchanged, but takes the mainline tokens and lexes them into atomic tokens (things like STRING, NUMBER, COMMA, NEWLINE, etc).
But here's the crux... the lexing routine has the ORIGINAL ITERATOR still... so when it receives a token that indicates there is a "here" document, it continues processing tokens until it hits a NEWLINE token (meaning end of the actual line of text) and then uses the original iterator to pull off the here document data. Since that iterator feeds the atomic tokens iterator, pulling from it then prevents those lines from being atomized.
To illustrate, think of iterators like hoses. The first hose is the main iterator. To that I attach the col72 iterator hose, and to that I attach the atomic tokenizer hose. As streams of characters go in the first hose, atomized tokens come out the end of the third hose. But I can attach a 2-way nozzle to the first hose that will allow its output to come out the alternate nozzle, preventing that data from going into the second hose (and hence the third hose). When I'm done diverting the data through the alternate nozzle, I can turn that off and then data begins flowing through the second and third hoses again.
Easy-peasey.
So me being the 'noob' that I am, being introduced to programming via Perl just recently, I'm still getting used to all of this. I have a .fasta file which I have to use, although I'm unsure if I'm able to open it, or if I have to work with it 'blindly', so to speak.
Anyway, the file that I have contains DNA sequences for three genes, written in this .fasta format.
Apparently it's something like this:
>label
sequence
>label
sequence
>label
sequence
My goal is to write a script to open and read the file, which I have gotten the hang of now, but I have to read each sequence, compute relative amounts of 'G' and 'C' within each sequence, and then I'm to write it to a TAB-delimited file the names of the genes, and their respective 'G' and 'C' content.
Would anyone be able to provide some guidance? I'm unsure what a TAB-delimited file is, and I'm still trying to figure out how to open a .fasta file to actually see the content. So far I've worked with .txt files which I can easily open, but not .fasta.
I apologise for sounding completely bewildered. I'd appreciate your patience. I'm not like you pros out there!!
I get that it's confusing, but you really should try to limit your question to one concrete problem, see https://stackoverflow.com/faq#questions
I have no idea what a ".fasta" file or 'G' and 'C' is.. but it probably doesn't matter.
Generally:
Open input file
Read and parse data. If it's in some strange format that you can't parse, go hunting on http://metacpan.org for a module to read it. If you're lucky someone has already done the hard part for you.
Compute whatever you're trying to compute
Print to screen (standard out) or another file.
A "TAB-delimite" file is a file with columns (think Excel) where each column is separated by the tab ("\t") character. As quick google or stackoverflow search would tell you..
Here is an approach using 'awk' utility which can be used from the command line. The following program is executed by specifying its path and using awk -f <path> <sequence file>
#NR>1 means only look at lines above 1 because you said the sequence starts on line 2
NR>1{
#this for-loop goes through all bases in the line and then performs operations below:
for (i=1;i<=length;i++)
#for each position encountered, the variable "total" is increased by 1 for total bases
total++
}
{
for (i=1;i<=length;i++)
#if the "substring" i.e. position in a line == c or g upper or lower (some bases are
#lowercase in some fasta files), it will carry out the following instructions:
if(substr($0,i,1)=="c" || substr($0,i,1)=="C")
#this increments the c count by one for every c or C encountered, the next if statement does
#the same thing for g and G:
c++; else
if(substr($0,i,1)=="g" || substr($0,i,1)=="G")
g++
}
END{
#this "END-block" prints the gene name and C, G content in percentage, separated by tabs
print "Gene name\tG content:\t"(100*g/total)"%\tC content:\t"(100*c/total)"%"
}