I am trying to find a tool that takes these inputs:
1. User input (mostly numeric values)
2. Template files that have keywords in a specific format
The tool should then replace the keywords in the template files with values based on the user input.
This could be a standalone tool, a script, or a library usable from any language.
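For what it's worth, here is the kind of substitution I have in mind, as a rough Perl sketch; the {{KEYWORD}} placeholder syntax and the key=value command-line input are just made-up examples, not a requirement:
#!/usr/bin/perl
use strict;
use warnings;
# hypothetical usage: perl fill_template.pl template.txt WIDTH=12.5 HEIGHT=3 > out.txt
my ($template, @pairs) = @ARGV;
my %values = map { split /=/, $_, 2 } @pairs;   # user input as key=value pairs
open my $in, '<', $template or die "cannot open $template: $!";
while (my $line = <$in>) {
    # replace {{KEYWORD}} with the user's value; leave unknown keywords untouched
    $line =~ s/\{\{(\w+)\}\}/exists $values{$1} ? $values{$1} : "{{$1}}"/ge;
    print $line;
}
close $in;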
While working with my Doxygen output, I have a requirement to extract all the functions into a spreadsheet. Additionally, each function has a requirement mapped to it using ALIASES defined in the configuration file. A sample function is below:
#requirement{req-id}
void Myfunc()
I am able to see all the requirements documented on a separate page in my HTML output, but I need to fetch the list of functions with their respective requirement IDs into a .csv file for further processing. Could anyone please help me out?
Thanks, Badri
Doxygen has no direct CSV output.
You would need to enable the XML output (GENERATE_XML = YES) and process the resulting files into the format you want, or process them directly without going through a CSV file.
When you have an alias like
ALIASES += req{3}="\xrefitem req \"Requirement\" \"SW Requirements\" ID: \1 Requirement: \2 Verification Criteria: \3"
you will get a file req.xml that you can process further.
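If it helps, a small script can walk the XML output and write the CSV directly. Below is a rough Perl sketch, assuming the XML output sits in a directory named xml, that documented functions appear as <memberdef kind="function"> nodes, and that each \xrefitem shows up as an <xrefsect> inside the member's description; check the element names against your actual generated files, since they can vary between Doxygen versions.
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

my $dir = 'xml';    # assumed XML_OUTPUT directory

open my $csv, '>', 'functions_requirements.csv' or die $!;
print {$csv} "function,requirement\n";

for my $file (glob "$dir/*.xml") {
    my $doc = eval { XML::LibXML->load_xml(location => $file) } or next;
    # documented functions show up as <memberdef kind="function"> nodes
    for my $member ($doc->findnodes('//memberdef[@kind="function"]')) {
        my $name = $member->findvalue('./name');
        # each \xrefitem produces an <xrefsect> inside the description
        for my $xref ($member->findnodes('.//xrefsect')) {
            my $req = $xref->findvalue('.//xrefdescription');
            $req =~ s/\s+/ /g;                  # collapse whitespace for the CSV cell
            $req =~ s/^\s+|\s+$//g;
            print {$csv} qq{"$name","$req"\n};
        }
    }
}
close $csv;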
I use a loop to append each regression table for various dependent variables into one file:
global all_var var1 var2 var3
foreach var of global all_var {
    capture noisily : eststo mod0: reg `var' i.female
    capture noisily : eststo mod1: reg `var' i.female
    capture noisily : eststo mod2: reg `var' i.female
    esttab mod0 mod1 mod2 using "file_name.rtf", append
}
However, in the final RTF file some tables stretch over two pages, which does not look good.
Is there any way to avoid that, e.g. by introducing some sort of page break?
The community-contributed package rtfutil provides a solution:
net describe rtfutil, from(http://fmwww.bc.edu/RePEc/bocode/r)
TITLE
'RTFUTIL': module to provide utilities for writing Rich Text Format (RTF) files
DESCRIPTION/AUTHOR(S)
The rtfutil package is a suite of file handling utilities for
producing Rich Text Format (RTF) files in Stata, possibly
containing plots and tables. These RTF files can then be opened
by Microsoft Word, and possibly by alternative free word
processors. The plots can be included by inserting, as linked
objects, graphics files that might be produced by the graph
export command in Stata. The tables can be included by using the
listtex command, downloadable from SSC, with the handle() option.
The exact syntax will depend on your specific use case, for which you do not provide any example data.
After installing rtfutil, you may use rtfappend. Suppose you want a page break between mod1 and mod2:
esttab mod0 mod1 using "file_name.rtf", replace
tempname handle
rtfappend `handle' using "file_name.rtf", replace
file write `handle' "\page" _n
rtfclose `handle'
esttab mod2 using "file_name.rtf", append
If you want a line break, just replace \page with \line.
I have a large number of PDF files in my local filesystem that I use as a documentation base, and I would like to create an index of these files.
I would like to:
Parse the contents of the PDF files to get keywords.
Select the most relevant keywords to make a summary.
Create static HTML pages for some keywords with entries linked to the appropriate files.
My questions are:
Is there an existing tool to perform the whole job?
What is the most appropriate tool to parse the content of the PDF files, filter words (by size), and count them?
I am considering using Perl, swish-e, and pdfgrep to make a script. Do you know of other tools that could be useful?
Given that points 2 and 3 seem custom, I'd recommend writing your own script: call a tool from it to parse the PDF, process its output as you please, and write the HTML (perhaps using another tool).
Perl is well suited for that, since it excels at the text processing you'll need and also provides support for working with all kinds of file formats via modules.
As for reading the PDF, here are some options if your needs aren't too elaborate:
Use the CAM::PDF (and CAM::PDF::PageText) or PDF::API2 modules
Use pdftotext from the poppler library (probably in the poppler-utils package)
Use pdftohtml with the -xml option and read the generated simple XML file with XML::LibXML or XML::Twig
The last two are external tools, which you run from Perl via builtins like system.
The text processing that follows, to build your summary and design the output, is precisely what languages like Perl are for. The couple of tasks mentioned take a few lines of code.
Then write out the HTML, either directly if it is simple or using a suitable module. Given your purpose, you may want to look into HTML::Template. Also see this post, for example.
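To make that concrete, here is a rough sketch of the whole pipeline in Perl, under some assumptions of my own: pdftotext is on the PATH, any word of five or more characters counts as a keyword, and a single static index.html page is acceptable. Stop words, relevance ranking and the summary are left as an exercise.
#!/usr/bin/perl
use strict;
use warnings;

my $dir = shift // '.';   # directory holding the PDF files
my %index;                # keyword => { file => count }

for my $pdf (glob "$dir/*.pdf") {
    my $text = qx{pdftotext "$pdf" -};        # '-' writes the extracted text to stdout
    for my $word ($text =~ /(\w{5,})/g) {     # crude filter: words of 5+ characters
        $index{ lc $word }{$pdf}++;
    }
}

# write one plain static HTML page linking each keyword to its files
open my $out, '>', 'index.html' or die $!;
print {$out} "<html><body>\n";
for my $kw (sort keys %index) {
    print {$out} "<h2>$kw</h2>\n<ul>\n";
    for my $file (sort keys %{ $index{$kw} }) {
        print {$out} qq{<li><a href="$file">$file</a> ($index{$kw}{$file} hits)</li>\n};
    }
    print {$out} "</ul>\n";
}
print {$out} "</body></html>\n";
close $out;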
Full parsing of PDF may be infeasible, but if the files aren't too complex it should work.
If your process for selecting keywords and building statistics is fairly common, there are integrated tools for document management (search for bibliography managers). However, I think most of them resort to external tools to parse the PDF, so you may still be better off with your own script.
I have some SAS DIS jobs which create "kickout" data when run. By this I mean that if things run smoothly, none of the "kickout" data is generated, but it is known that there will be exceptions, and I would like to have those exceptions put into a table and automatically emailed to me so that I am notified when something is behaving in a non-ideal manner.
I can create a transformation which will send an email containing the data I'm looking for, but the data is formatted as HTML and thus not in a form conducive to analysis. I'd like the transformation to email a .csv file, which is more easily manipulated.
There is the option to send a .spk file but I'm having issues getting that to work and in any case am not sure it really suits my needs.
Is what I want possible, with or without the standard Publish to Email transformation provided by SAS DIS? Looking at the SAS DIS user guide, I'm guessing that there is no pre-built transformation which does what I want, but can base SAS code accommodate this requirement?
Thanks much!
The "Publish to Email transformation" uses ODS HTML to generate the output so you'll get a HTML output. If you want an XLS output then there is a way. You could change the extension of the output file to xls to generate xls file from the ODS HTML. This is an old way of generating xls from ODS HTML.
Now coming to the SPK file. This is something you should look into. Since you are looking into getting an xls/csv attachement which you can open and do some manipulation etc. SPK file is like a ZIP file. You can right click and unzip spk file. Basically you can put in all your files within a archive/spk file and get that emailed as attachement using the "Publish to Email Transformation"
To get this done, go to the properties of the "Publish to Email" transformation and, under the Publishing options:
select Send report in an archive (.spk) file as an email attachment in the Select viewer file/attachment option field
provide the folder/path where the spk file will be stored under Select path of where to store archive file containing report
provide the name of the spk file under Specify filename of archive file containing report
provide a name=value pair for the package under Specify one or more desired package name/value pairs for package. For example, if this transformation is generating a PROC PRINT of an input data set and the output file is c:\sushil\test.html, then enter myname=(test.html). The myname is for labeling purposes; when you unzip the spk you should get test.html.
Now, under the REPORT SPECIFICATION option in the "Publish to Email" transformation, select "Generate PROC PRINT from input table" and then enter the path and filename of the generated report, which based on our previous entry should be c:\sushil\test.html.
Also, to select "Generate PROC PRINT from input table" you would need to right-click the "Publish to Email" transformation and select Ports -> Add Input Port. This is how you connect a table to the transformation. These are the minimum settings required to generate an spk package from the transformation. Let me know if it helps!
Note: This information is as per SAS DI Studio 4.6. I don't know whether the transformation is updated in newer versions of DI Studio.
I am trying to write a script that will allow me to download numerous (1000s of) data files from a data server (e.g., http://hydro1.sci.gsfc.nasa.gov/thredds/catalog/GLDAS_NOAH10SUBP_3H/2011/345/). Unfortunately, the names of the files in each directory are not formatted in a similar way (the time they were created was appended to the end of the file name). I need to be able to specify the file name to subset the data (I have a special tool for these data types) and download it. I cannot find a function in MATLAB that will extract the file names.
I have looked at URLREAD, but it downloads everything, including the HTML code.
Thanks for your help!
You can easily parse the page for links:
x = urlread(url);                                     % read the raw HTML of the directory listing
links = regexp(x, '<a href=''([^>]+)''>', 'tokens');  % grab the target of every <a href='...'> tag
This reads every link; you then have to filter out the ones you don't want.
For example, this gets all .grb files:
a = regexp(x, '<a href=''([^>]+\.grb)''>', 'tokens'); % escape the dot so it matches a literal '.'