Using Weka on command line to generate cluster assignment arff file - cluster-analysis

In the Weka Explorer you can cluster your data and then use the visualisation to save a new ARFF file with the cluster assignments as an attribute.
Is there a way to do this automatically by calling executables on the command line?

If anyone else is having this problem, you can try using weka.filters.unsupervised.attribute.AddCluster. The -W argument should contain your clustering algorithm (together with its options), and you can use -i and -o as you would with any other filter to read the input and save your new ARFF file.
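As a rough sketch of what such a call could look like, the Python snippet below only assembles the argument list for the filter; it does not run Weka. The jar location (weka.jar), the choice of SimpleKMeans with -N 3, and the file names are assumptions for illustration; only AddCluster, -W, -i and -o come from the answer above.

```python
import shlex


def add_cluster_command(input_arff, output_arff,
                        clusterer="weka.clusterers.SimpleKMeans -N 3",
                        weka_jar="weka.jar"):
    """Build the argv for running the AddCluster filter from the command line.

    weka_jar and the default clusterer (with its -N option) are assumptions;
    adjust them to your installation and algorithm.
    """
    return [
        "java", "-cp", weka_jar,
        "weka.filters.unsupervised.attribute.AddCluster",
        "-W", clusterer,    # clustering algorithm plus its options, as ONE argument
        "-i", input_arff,   # input ARFF file
        "-o", output_arff,  # output ARFF file with the cluster attribute added
    ]


cmd = add_cluster_command("data.arff", "clustered.arff")
print(shlex.join(cmd))  # shell-quoted form you could paste into a terminal
```

To actually run it, the list can be handed to subprocess.run(cmd, check=True). Keeping the clusterer and its options in a single argument matters, because -W expects the whole specification as one string.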

Related

How do I set up configuration variables in Tesseract to better recognize code?

I want to use Tesseract to recognize code. Their website says I can disable the dictionaries by setting both of the configuration variables load_system_dawg and load_freq_dawg to false.
However, I haven't been able to do it correctly.
$ tesseract img.jpg output.txt --oem 0 -c load_system_dawg=0 load_freq_dawg=0
read_params_file: Can't open load_freq_dawg=0
Error: Tesseract (legacy) engine requested, but components are not present in /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata!!
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
Any ideas on best ways to handle it?
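First of all, get an eng.traineddata file that includes the legacy engine, or choose a different OCR engine mode (OEM).
Next, read the output of tesseract --help-extra carefully:
-c VAR=VALUE Set value for config variables.
Multiple -c arguments are allowed.
In other words, each variable needs its own -c flag. In your command load_freq_dawg=0 has no -c in front of it, so Tesseract tries to open it as a config file, which is what the read_params_file error is telling you.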
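To make the "one -c per variable" rule concrete, here is a small Python sketch that builds the command line from a dictionary of config variables. It only constructs the arguments and does not call Tesseract; the function name and its parameters are made up for illustration, while the file names, --oem 0 and the two variables are taken from the question.

```python
import shlex


def tesseract_command(image, output_base, config_vars, oem=None):
    """Build a tesseract argv in which every config variable gets its own -c flag."""
    cmd = ["tesseract", image, output_base]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    for name, value in config_vars.items():
        # One "-c NAME=VALUE" pair per variable; a bare NAME=VALUE would be
        # treated as a config file name instead.
        cmd += ["-c", f"{name}={value}"]
    return cmd


cmd = tesseract_command(
    "img.jpg", "output.txt",
    {"load_system_dawg": 0, "load_freq_dawg": 0},
    oem=0,  # legacy engine, as in the question; needs traineddata that contains it
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once a suitable traineddata file is installed.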

Import CSV with many columns to pgAdmin v4.1

I'm new to pgAdmin and GIS databases in general. I want to upload a CSV file to pgAdmin v4.1 and I'm trying to understand the logic of doing so. I am able to do this by creating a new table under the desired DB and then manually defining each column (name, type etc.); only then am I able to load the CSV into pgAdmin using the GUI. This seems a cumbersome way to import a CSV file: if I have a CSV file with 200 columns, it is not practical to define them all manually. There must be a way to tell pgAdmin: this is the CSV file, now get the columns yourself, get (or at least assume) the column types, and create a new table, much like how pandas reads a CSV in Python. As I'm new to this topic, please elaborate your answer/comment as much as possible.
NO: Unfortunately, we can only import CSV after the table is created.
YES: There is no GUI method, but:
There is a utility called pgFutter which will do exactly what you want. This is a command line utility. Here are the binaries.
You can write a function that does that. Here is an example.
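One way such a function could look, as a minimal Python sketch rather than the linked example (which is not reproduced here): read the header and the rows, guess a type for each column, and emit a CREATE TABLE statement. The type rules (integer, then double precision, otherwise text), the handling of empty fields as missing values, and the sample table and column names are all assumptions.

```python
import csv
import io


def guess_type(values):
    """Guess a PostgreSQL type from string values; empty strings are ignored."""
    present = [v for v in values if v != ""]
    if not present:
        return "text"
    for pg_type, cast in (("integer", int), ("double precision", float)):
        try:
            for v in present:
                cast(v)
            return pg_type
        except ValueError:
            continue  # at least one value does not fit; try the next type
    return "text"


def quote_ident(name):
    """Double-quote an identifier so unusual column names stay valid SQL."""
    return '"' + name.replace('"', '""') + '"'


def create_table_sql(csv_file, table):
    """Return a CREATE TABLE statement inferred from an open CSV file object."""
    rows = list(csv.reader(csv_file))
    header, data = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        col_type = guess_type([row[i] for row in data if i < len(row)])
        columns.append(f"    {quote_ident(name)} {col_type}")
    return f"CREATE TABLE {quote_ident(table)} (\n" + ",\n".join(columns) + "\n);"


# Tiny in-memory CSV standing in for a real file opened with open(path, newline="").
sample = io.StringIO("id,height,name\n1,1.5,oak\n2,,elm\n")
print(create_table_sql(sample, "trees"))
```

The generated statement can be run in pgAdmin's Query Tool, after which the usual import dialog loads the rows. The inference is deliberately crude: it reads the whole file into memory, and very large whole numbers would need bigint rather than integer.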
I would look into using GDAL to upload your CSV file into postgis.
I used this recently to do a similar job.
ogr2ogr -f "PostgreSQL" -lco GEOMETRY_NAME=geometry -lco FID=gid PG:"host=127.0.0.1 user=username dbname=dbname password=********" postgres.vrt -nln th_new_data_2019 -t_srs EPSG:27700
Code used to upload a csv to postgis and transform the coordinate system.
-f format_name:
Output file format name. Some possible values are:
-f "ESRI Shapefile"
-f "TIGER"
-f "MapInfo File"
-f "GML"
-f "PostgreSQL"
-lco NAME=VALUE:
Layer creation option (format specific)
-nln name:
Assign an alternate name to the new layer
-t_srs srs_def:
Target spatial reference system. The coordinate systems that can be passed are anything supported by the OGRSpatialReference.SetFromUserInput() call, which includes EPSG PCS and GCSes (e.g. EPSG:4296), PROJ.4 declarations, or the name of a .prj file containing well known text.
The best and simplest guide for installing GDAL that I have used is:
https://sandbox.idre.ucla.edu/sandbox/tutorials/installing-gdal-for-windows

Can we give two files as input while using JasperStarter

I am using JasperStarter to create a PDF from several jrprint files and then print it using JasperStarter functions.
I want to create one single PDF file from all the .jrprint files.
If I give a command like:
jasperstarter pr a.jprint b.jprint -f pdf -o rep
It does not recognise the files after the first input file.
Can we create one single output file with many input jasper/jrprint files?
Please help.
Thanks,
Oshin
Looking at the documentation, this is not possible:
The command process (pr)
The command process is for processing a report.
In direct comparison to the command for compiling:
The command compile (cp)
The command compile is for compiling one report or all reports in a directory.

Export and Append Data to CSV

I am trying to export data to an existing CSV file.
I have been using these methods to export data:
Microsoft.Jet.OLEDB.4.0
SQLCMD
Data Export Wizard
However, I cannot find any parameter or option to append the exported data to an existing file. Is there any way? Thanks.
Note: this answer is biased towards *nix operating systems; I'm not too familiar with Windows.
If you can run your SQL query from the command line, either
using a scripting language with a library that creates an MSSQL connection (an example of this is a node.js program I authored, https://github.com/skilbjo/aqtl, but any tool will do), or
using a Windows binary such as sqlcmd,
you can just redirect the output and append it to the CSV file with >>. For example:
$ node runquery.js myquery.sql >> existing_csv_file.csv
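The same append behaviour is available inside a script by opening the file in append mode instead of relying on shell redirection. A minimal Python sketch, where the rows are placeholders for whatever your query returns and the function name is made up for illustration:

```python
import csv


def append_rows(path, rows):
    """Append rows to a CSV file without touching what is already in it."""
    # Mode "a" appends (and creates the file if it is missing);
    # newline="" lets the csv module control the line endings itself.
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


# Placeholder data standing in for the result of a SQL query.
append_rows("existing_csv_file.csv", [(3, "carol"), (4, "dave")])
```

No header row is written here, on the assumption that the existing file already has one.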

Export DataStage Job Designs to .dsx file

I am trying to export DataStage job designs with executables. Below are screenshots of the options I use to export from the GUI.
These are the two commands I use:
dsexport.exe /h=XX /U=XX /p=XX projectXXX /job=XXX jobname.dsx
dsexport.exe /h=XX /U=XX /p=XX projectXXX /job=XXX /EXEC /APPEND jobname.dsx
The file generated by the commands is bigger than the one from the GUI. Does anyone know how to use the dsexport command to export jobs with the same options as in the GUI screenshots? Much appreciated. I am using Designer V8.5.
C:\IBM\InformationServer\Clients\Classic>dsexport /d={ip address of server} /u={user id} /p={password} /job={job to export} {Project where job is located in} {FileName.dsx}
Try this; it will export a single .dsx file with all the information.
P.S. I am using version 11.3.
As you can see, the GUI excludes some read-only items that the command line does not exclude, which is why the file sizes differ.
You have "Include Dependent Items" unchecked in the GUI. The command line will include dependent items by default (i.e. shared containers or routines). You can disable this behaviour on the command line by using the /NODEPENDENTS command switch.