What is the best file parsing solution for converting files? - powershell

I am looking for the best solution for custom file parsing for our enterprise import routines. I basically want to convert each incoming file format into a standard format, and have one routine that imports that standard data into the database. I need to be able to create custom scripts for each client, since it's difficult to get customers to comply with a standard or template format. I have looked at PowerShell and IronPython to do this so far, but I am not sure this is the route I want to go. I have also looked at tools such as Talend, a drag-and-drop style tool, which may or may not give me the flexibility I want. We are a .NET shop and have written custom code for this in the past, but I need something that is quicker to create than coding custom parsing functions each time a new file format comes in.

Depending on the complexity and variability of your work, you should consider an ETL tool like SSIS (SQL Server Integration Services).

Python is wonderful for this kind of thing. That's why we use it. Each new customer transfer is a new adventure, and Python gives us the flexibility to respond quickly.
Edit: All Python scripts that read files are "custom file parsers". Without an actual sample of your input, it's hard to provide a detailed example.
with open( "some file", "r" ) as source:
    for line in source:
        process( line )
That's about all there is to a "custom file parser". If you're parsing .csv or .xml files, then Python has modules for that. If you're parsing fixed-format files, you'd use string slicing operations. If you're parsing other files (X12? JSON? YAML?) you'll need appropriate parsers.
Tab-Delim.
from collections import namedtuple
RecordLayout = namedtuple('RecordLayout',['field1','field2','field3',...])
def process( aLine ):
    record = RecordLayout( *aLine.split('\t') )   # unpack the tab-separated fields into the named tuple
    ...
Fixed Layout.
from collections import namedtuple
RecordLayout = namedtuple('RecordLayout',['field1','field2','field3',...])
def process( aLine ):
    fields = ( aLine[:10], aLine[10:20], aLine[20:30], ... )
    record = RecordLayout( *fields )              # unpack the fixed-width slices into the named tuple
    ...

Related

How can I decrypt the Triplestore files of an RDF4J database?

I am currently trying to read the files of an RDF4J triplestore from the universAAL platform and put them into an InfluxDB to merge the data from different smart living systems.
However, I have noticed that the individual index files of the Native repository are encrypted/unreadable (See image below).
Is there any experience from the community on how to get human readable content out of the RDF4J files (namespace, triples.prop, triples-cosp, triples-posc, triples-spoc, values.hash, values.dat, values.id) and merge them into another database?
The documentation of RDF4J did not help me here, so I could not create a decent export.
[Image: encrypted-looking file from the triplestore]
The files are not encrypted; they're simply a binary format, optimized for efficient storage and retrieval, used by RDF4J's Native Store database implementation. They're not meant for direct manipulation.
The easiest way to convert them to readable RDF is to spin up a Native Store on top of them and then use the RDF4J API to query/export its data. Assuming you have a complete set of data files, it should be as simple as something like this:
Repository rep = new SailRepository(new NativeStore(new File("/path/to/datafiles/")));
try (RepositoryConnection conn = rep.getConnection()) {
    conn.export(Rio.createWriter(RDFFormat.TURTLE, System.out));
}
finally {
    rep.shutDown();
}
Obviously, replace System.out with a FileOutputStream if you want to write the data to a file rather than the console. And change RDFFormat.TURTLE to something else if you want a different syntax format.
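If the next step is to get the exported triples into something else (the original question mentions InfluxDB), one option, sketched here in Python with the rdflib library purely as an illustration (the file name export.ttl is an assumption, not part of the answer above), is to parse the Turtle export and walk the triples:
from rdflib import Graph

g = Graph()
g.parse("export.ttl", format="turtle")    # the Turtle file produced by the RDF4J export above
for subject, predicate, obj in g:         # each triple is now available for loading elsewhere
    print(subject, predicate, obj)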

Draw.io import diagrams from CSV using an API

In draw.io there is a very nice option to create a diagram using the CSV import utility (Arrange->Insert->Advanced->CSV). It is very simple and straightforward.
I was trying to find a way to do it using an API (REST, for example). Is there a way to do it?
One more question:
Does anybody know if there's a way to create a draw.io file with multiple pages using the CSV import utility?
Thanks
Danny
Absolutely possible. Working example here: https://github.com/GanizaniSitara/drawio/
pyMX.py is the file you want to look at first.
It creates the diagram as XML, then encodes it and packs it into the drawio format.
It needs input data as CSV in this format:
Level0,Level1,Level2,AppName,TC,StatusRAG,Status,HostingPercent,HostingPattern1,HostingPattern2,Arrow1,Arrow2,Link
Cool Division,Some Department,Some Department2,SomeString,Zero,25,red,green,0,Azure,Linux,up,up,http://www.gooogle.com
Rinse and repeat for anything else you need to create. It's rough code, ping me here or on GitHub if anything needs clarification.
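To give a rough idea of the approach (this is a hedged sketch, not the actual pyMX.py code; the geometry, style string, and file names are made up, and only the AppName column from the sample above is used), you can emit the draw.io XML directly with the standard library. draw.io also opens uncompressed XML, so the encode-and-pack step can be skipped for a first pass:
import csv
import xml.etree.ElementTree as ET

def csv_to_drawio(csv_path, out_path):
    mxfile = ET.Element('mxfile')
    diagram = ET.SubElement(mxfile, 'diagram', name='Page-1')
    model = ET.SubElement(diagram, 'mxGraphModel')
    root = ET.SubElement(model, 'root')
    ET.SubElement(root, 'mxCell', id='0')                 # the two root cells draw.io expects
    ET.SubElement(root, 'mxCell', id='1', parent='0')
    with open(csv_path, newline='') as f:
        for i, row in enumerate(csv.DictReader(f), start=2):
            cell = ET.SubElement(root, 'mxCell', id=str(i), value=row['AppName'],
                                 style='rounded=1', vertex='1', parent='1')
            geom = ET.SubElement(cell, 'mxGeometry', x=str(40 + 160 * (i - 2)),
                                 y='40', width='120', height='60')
            geom.set('as', 'geometry')                    # 'as' is a Python keyword, so set it explicitly
    ET.ElementTree(mxfile).write(out_path, encoding='utf-8', xml_declaration=True)

csv_to_drawio('apps.csv', 'apps.drawio')
As for the multiple-pages question: an mxfile can hold more than one diagram element (one per page), so that's the angle I'd try there as well.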

Spark - split csv file using scala

I have the following schema for my CSV file:
(Id, OwnerUserId, CreationDate, ClosedDate, Score, Title, Body)
And I would like to split the data using:
val splitComma = file.map(x => x.split (",")
val splitComma = file.map(x => x.split (",(?![^<>]*</>)(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"))
Neither of them worked. Below is a sample of my CSV file:
90,58,2008-08-01T14:41:24Z,2012-12-26T03:45:49Z,144,Good branching and merging tutorials for TortoiseSVN?,"<p>Are there any really good tutorials explaining branching and merging with Apache Subversion? </p>
<p>All the better if it's specific to TortoiseSVN client.</p>
"
120,83,2008-08-01T15:50:08Z,NA,21,ASP.NET Site Maps,"<p>Has anyone got experience creating <strong>SQL-based ASP.NET</strong> site-map providers?</p>
<p>I've got the default XML file <code>web.sitemap</code> working properly with my Menu and <strong>SiteMapPath</strong> controls, but I'll need a way for the users of my site to create and modify pages dynamically.</p>
<p>I need to tie page viewing permissions into the standard <code>ASP.NET</code> membership system as well.</p>
"
180,2089740,2008-08-01T18:42:19Z,NA,53,Function for creating color wheels,"<p>This is something I've pseudo-solved many times and never quite found a solution. That's stuck with me. The problem is to come up with a way to generate <code>N</code> colors, that are as distinguishable as possible where <code>N</code> is a parameter.</p>
"
What's the best way to work with this?
You can't load CSVs with multi-line values (i.e. newlines within cells) using Spark: the underlying Hadoop InputFormat splits the file on newlines, disregarding the CSV's encapsulating double-quotes, so there isn't much Spark can do about it (see discussion here).
Unfortunately, that means you'll have to find some way of "cleaning" your data (e.g. replacing the embedded newlines with some placeholder) before writing it to disk or loading it with Spark.
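To illustrate the cleaning step (a hedged pre-processing sketch in Python rather than Spark/Scala; the file names and the placeholder token are made up), the standard csv module does understand the quoted multi-line cells, so it can rewrite the file with embedded newlines replaced before Spark reads it:
import csv

PLACEHOLDER = '\\n'   # stand-in token for newlines inside quoted cells

with open('questions.csv', newline='') as src, \
     open('questions_clean.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):                  # csv.reader respects quotes, so multi-line cells arrive intact
        writer.writerow(cell.replace('\r', '').replace('\n', PLACEHOLDER) for cell in row)
After that, each record occupies exactly one physical line, and the placeholder can be turned back into real newlines downstream once the fields have been parsed.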

Combine two TCPDF documents

I'm using TCPDF to create two separate reports in different parts of my website. I would like the second report to be appended at the end of the first one.
It's different from importing a PDF file, because the second report is also generated by TCPDF. Is there a way to do this?
I assume from your question that what you ultimately want to provide is one PDF file that consists of the first PDF concatenated with the second PDF.
One quick and dirty solution is to utilize the pdftk command line PDF processor and call it from within your PHP code using the exec() function. The pdftk command has many features and concatenating files is only one of them, but it does an awesome job. Depending on your hosting situation, this may or may not be an option for you.
The other option would be to use FPDI to import the two PDF files and concatenate them within your PHP code and then send the concatenated version to the user.
More information on using FPDI here:
Merge existing PDF with dynamically generated PDF using TCPDF
Given that you're already using TCPDF, importing the pre-existing file that you want to concatenate with the one you've just created shouldn't be too difficult.
Just add FPDI to your project (e.g. via Composer) from:
https://www.setasign.com/products/fpdi/downloads/
You can still use TCPDF.
FPDI supports all of TCPDF's methods; just use new FPDI() instead of new tcpdf(), and the result in your report will be the same. After you create your report, merge the files with the code from this page:
https://www.setasign.com/products/fpdi/about/
In a loop, set the first file, and after that set the second...
If you need help, I'm here for you.

How can I create RRD files in Perl?

I have a separate application printing logs every 10 seconds. I need to create RRD files from the log files. I need some Perl code to read the log files and create only the RRD, without the graphs.
I have also gone through the available Perl modules on CPAN, i.e. RRD::Simple and RRD::Simple::Examples, but I still need help.
I'd start with RRD::Simple. There's some example code in the documentation. Since you don't need to create a graph, simply skip that section of the example.
Some of the examples read a single sample of data, call the update function once, and then exit. Those scripts are meant to be run periodically to collect data in real time. The example that's probably more pertinent to your needs is ApacheAccessLogActivity.pl, which reads an Apache log file, parses each line with a regular expression, does a bit of analysis to figure out what it just read, and then calls update, all in a loop. Note that that example uses the standalone functions rather than the object-oriented versions.
If you've already read the documentation for that module and need more information about how to use it, or if you've tried it and found that it has shortcomings that prevent you from using it, then please be more specific about what you need to do.
RRDTool::OO also looks promising.
I'd recommend RRDTool::OO.
Excerpt from the perldoc:
$rrd->create( ... )
Creates a new round robin database (RRD). A RRD consists of one or more data sources and one or more archives:
    $rrd->create(
        step        => 60,
        data_source => { name => "mydatasource",
                         type => "GAUGE" },
        archive     => { rows => 5 });