Regarding Capture in Stata - append

I have code that I mostly took from here (bottom of the page): http://www.ats.ucla.edu/stat/stata/faq/append_many_files.htm
clear
file open myfile9 using C:\Users\RNCZF01\Documents\Cameron-Fen\Economics-Projects\Neighborhood-Project\list.csv, read
file read myfile9 line
insheet using `line', comma
save `line'.dta, replace
save master_data.dta, replace
drop _all
file read myfile9 line
while r(eof)==0 {
    capture insheet using `line', comma
    if _rc!=0 {
        insheet using `line', comma
        save `line'.dta, replace
        append using master_data.dta, force
        save master_data.dta, replace
    }
    drop _all
    file read myfile9 line
}
Originally I had just insheet using `line', comma. But the problem was that some of the sheets I was attempting to read were blank, so Stata would quit with an error. Thus I changed that line to this:
capture insheet using `line', comma
if _rc!=0 {
insheet using `line', comma
However, this now quits after reading only the first document (it exits before the first iteration of the while loop, which handles the second document, is done). My thought was that macros may disappear after they are used, but I have no idea.

The reason your loop is closing is that you want
if _rc==0
for the inner condition. I am guessing that your first file is not found, insheet throws an error, and you trigger the if _rc!=0 branch. The loop then tries to run insheet without the capture and errors out. Another way of diagnosing this would be to run
set trace on
which I have found helpful in this sort of situation.
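For concreteness, here is a sketch of the corrected inner part of the loop with that one change applied (file names and handles taken from your code):
capture insheet using `line', comma
if _rc==0 {
    save `line'.dta, replace
    append using master_data.dta, force
    save master_data.dta, replace
}
drop _all
file read myfile9 line
Note that the second insheet inside the braces is no longer needed: when _rc is 0, the captured insheet has already loaded the file into memory.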
P.S. Not sure of the etiquette, but credit for this answer goes to lmo. I just thought it was worth writing up as an answer rather than a comment.

Related

Perl Command Understanding

I have been working on a production codebase to resolve an issue, but am stuck on a line of code.
Can anyone help me understand what exactly this command does?
perl -MText::CSV -lne 'BEGIN{$p = Text::CSV->new()} print join "|", $p->fields() if $p->parse($_)' /home/daily/${FULL_FILENAME} > /home/output.txt
I think it's meant to copy the file to my home location with some transformations, but I'm not sure exactly.
This is a slightly broken program that translates a comma-separated values (CSV) file to a pipe-separated values file.
The particular command-line switches are documented in perlrun. This is a "one-liner", so you can read about those to see what's going on there.
The Text::CSV module deals with CSV files, and the program is parsing a line from the file and re-outputting as a pipe-separated file.
But, this program deals with each line as a complete record. That might be fine for you, but at some point you might end up with a literal value that has a newline in it, like a,"b\nc",d. Reading line by line then breaks the program, since the quotes appear to be unclosed within the first line. Not only that, it blindly joins the parsed fields without considering whether any of them should be quoted. It might be unlikely that a pipe character would show up in the data, but the problem isn't its rarity; it's the consequences and cost when it does show up.
The rewrite.pl example script in the related module Text::CSV_XS is a tool that could replace this one-liner. It properly reads the input and knows how to properly translate it.
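For reference, a self-contained sketch in that spirit might look like the following (fix_csv.pl is a hypothetical name; this is not the rewrite.pl script itself). It reads whole CSV records from standard input, so quoted fields with embedded newlines survive, and it re-emits them pipe-separated with proper quoting:
#!/usr/bin/perl
# fix_csv.pl (sketch): CSV in on STDIN, pipe-separated values out on STDOUT.
use strict;
use warnings;
use Text::CSV;

my $in  = Text::CSV->new({ binary => 1, auto_diag => 1 });
my $out = Text::CSV->new({ binary => 1, sep_char => '|', eol => "\n" });

# getline() consumes complete records, so a field like "b\nc" stays intact.
while ( my $row = $in->getline( \*STDIN ) ) {
    $out->print( \*STDOUT, $row );    # re-quotes fields where needed
}
It would be invoked along the lines of perl fix_csv.pl < /home/daily/input.csv > /home/output.txt, where input.csv stands in for your ${FULL_FILENAME}.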

Specman how to read a specific line from a file, with no loop

I have a long file, and I want to read a specific line that is not among the first few lines.
Is there a way to do it without looping over the whole file and counting the lines?
For example, something like files.read that takes an index of which line to read?
Thanks
You can use the predefined method files.get_text_lines().
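A minimal sketch, assuming a (file name, from-line, to-line) argument order and a placeholder file name; check the files package documentation for the exact signature in your Specman version:
extend sys {
    run() is also {
        // Fetch just line 42 of the file ("data.txt" is a placeholder).
        var lines: list of string;
        lines = files.get_text_lines("data.txt", 42, 42);
        if lines.size() > 0 {
            out(lines[0]);
        };
    };
};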

Error when making .bat file to open programs?

So this is what the .bat file looks like and the error code I am getting (open the Dropbox link / copy and paste it into a browser):
https://www.dropbox.com/sh/6o4h666m0bwav69/AACTAbDe4jhyWdApEQOoAl7Na?dl=0
The only program that comes up is HitmanPro; every other one gets an error.
First of all, paste your code and error directly into your questions in the future.
Now then, the issue comes from the incorrect quotes being used. See how the first three pairs are kind of curly and the last ones are straight? Batch can't recognize curly quotes, probably because it's a Unicode thing. Always use straight quotes in batch. It's best if you write your code in Notepad or a similar program; some text editors automatically turn quotes curly and call them "smart quotes."
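For example, these two start commands differ only in the quote characters, and only the first one will work (the path is a placeholder, since your actual code is only in the Dropbox link):
rem Straight quotes: cmd.exe parses the quoted path correctly.
start "" "C:\Program Files\Example\example.exe"
rem Curly quotes: cmd.exe treats them as literal characters and fails.
start “” “C:\Program Files\Example\example.exe”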

Store row numbers which are causing "error"

I have to retrieve certain information from URLs. For this I have to enter text into fields of the URL. I am using the GET operation for this. I have to modify the text to replace spaces with "%20". Sometimes the text (which is taken from the database) is badly formed. I would like to know the row numbers so I can manually change the text for those rows in the database and run it again. I have tried to use the logs and errors section, but with little luck. Does anybody have an idea of how to do this?
First shot: Output bad URLs on the console
So far, I came up with the following job design for your problem (shown as a screenshot in the original answer):
The trick is to catch the exceptions of the tHttpRequest component and print the necessary details on the console. For this example, I included the line number, the exception message, and the URL that produced the exception.
Output (shown as a screenshot; I couldn't reproduce your "Illegal character" error, so I took a different one):
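Conceptually, the console part boils down to something like this (a sketch; row1 with the columns lineNumber and url, and the caught exception e, are hypothetical names standing in for whatever your schema and error handling provide):
// Print one diagnostic line per failing request (all names are placeholders).
System.err.println("line " + row1.lineNumber + ": " + e.getMessage()
        + " (URL: " + row1.url + ")");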
Second shot: Output to a file
If you really need to output the line numbers to a file, things get a little more complicated.
Instead of printing the info straight onto the console, we collect all line numbers into a context variable of type (Java) List inside the tJavaFlex. After the usual URL processing (which I have left out of the job design to keep the example small), we iterate over the Java List and save it into a tHashOutput, so that we can finally write it to a file.
We cannot directly write to the file in the tLoop section, since the Iterate flow would lead to the situation that the tFileOutputDelimited would be opened several times. If "Append" were disabled, only the last bad URL line number would finally appear in the output file. If "Append" were enabled, you would get the full list of line numbers after the very first job run - but you would append every time you run the job, making the list longer and longer. Workarounds would be to use a runtime-dependent file name (e.g. a timestamp) or to delete the file at the beginning of the job run. I chose a third option that overwrites the file every time we run the job. Feel free to choose whichever of those options suits your use case best.
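The collection step might look roughly like this (again a sketch; the context variable badLines and the column lineNumber are assumptions, not names from your job):
// INIT (sketch): create the list once at the start of the job, e.g. in a tJava.
context.badLines = new java.util.ArrayList<Integer>();

// tJavaFlex "catch errors", end code (sketch): record the failing row's number.
((java.util.List<Integer>) context.badLines).add(row1.lineNumber);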
Details
The tHashOutput/tHashInput components are not visible by default, but must be enabled first to show up: https://www.talendforge.org/forum/viewtopic.php?pid=107249#p107249
The remaining component settings were shown as screenshots in the original answer and are not reproduced here: the context variable definition, the INIT subjob, the tJavaFlex "catch errors" end code, the tLoop, the tFixedFlowInput "badURL", and the tHashOutput, which needs to have "Append" enabled.

Processing text inside variable before writing it into file

I'm using the Perl WWW::Mechanize package in order to fetch and process data from some websites. My usual workflow is as follows:
Fetch a webpage
$mech->get("$url");
Save the webpage contents in a variable (BTW, I'm not sure if it's right to save this amount of text inside a scalar, which, as far as I know, is supposed to be used for a single value)
my $list = $mech->content();
Use a subroutine that I've created to write the contents of the variable to a text file. (The writeToFile subroutine includes a few more features, like path and existing-file validations.)
writeToFile("$filename.tmp","$path",$list);
Process the text in the file created in the previous step by creating an additional file and saving the processed content there (then deleting the initial temporary file).
What I wonder about is whether it is possible to perform the processing before storing the text in a file, directly on the $list variable. The whole process works as expected, but I don't really like the logic behind it, and it seems a bit inefficient as well, since I have to rewrite the same file multiple times.
EDIT:
Just to give a bit more information about what I'm actually after when I process the variable contents: the data I fetch from the website in this case is actually a list of items separated by blank lines, and the first line is irrelevant to me. So what I'm doing while processing this data is two things:
Remove the empty (CRLF) lines
Remove the first line if it includes a particular text.
Ideally I want to save the processed list (no blank lines and first line removed) to a file without creating any additional files along the way. To save the file I would like to use the writeToFile sub I wrote, since it also validates whether such a file already exists (if a file were saved before the final processing, writeToFile would always overwrite the existing file).
Hope it makes sense.
You're looking for split. The pattern depends on what you need: use (?<=\n) to split at a newline character and keep it. If keeping it doesn't matter, use \R to match all sorts of line breaks.
foreach my $line (split qr/\R/, $mech->content) {
…
}
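Applied to your EDIT, the whole processing step could then happen in memory before anything is written (a sketch; writeToFile is your own sub, /particular text/ is a placeholder for the marker you look for, and the final file name is up to you since no temporary file is needed anymore):
# Split into lines, drop the blank (CRLF-only) ones, drop a matching first line.
my @lines = grep { /\S/ } split /\R/, $mech->content;
shift @lines if @lines and $lines[0] =~ /particular text/;

# Write the cleaned list once, with no temporary file in between.
writeToFile( "$filename", $path, join( "\n", @lines ) . "\n" );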
Now the obligatory HTML-parsing-with-regex admonishment: if you get HTML source with Mechanize, parsing it line-by-line does not make much sense. You probably want to process the HTML-stripped text version of the document instead, or pass the HTML source to a parser such as Web::Query to declaratively get at the pieces you need.