Build dictionary for Apache ConceptMapper - UIMA

I have an Excel file with two columns, "Symbols" and "Synonyms", each having more than a million entries. I would like to convert the Excel data to a dictionary that is supported by Apache UIMA ConceptMapper. Is there an automated tool that can do this task?
I have attached a sample of excel data and format of Apache UIMA ConceptMapper.
https://i.stack.imgur.com/QC9BU.png contains the Excel sample
https://i.stack.imgur.com/PMItP.png contains the ConceptMapper dictionary format

I think you would need to convert the Excel document into a dictionary file, where the Excel cells become the dictionary entries. I would suggest looking at Apache POI for reading the Excel file and writing the dictionary XML file; for context on what I am talking about, have a look
at this tutorial.
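Since POI is Java, here is the same idea as a minimal sketch in Python using openpyxl instead (the logic would be identical in POI). It assumes one symbol/synonym pair per row with a header in row 1, and the usual ConceptMapper layout of a <synonym> root containing <token canonical="..."> elements with <variant base="..."/> children; verify the element and attribute names against your format screenshot, and group rows first if one symbol spans several synonym rows.

```python
# Rough sketch: stream a large sheet with openpyxl and emit one <token>
# per row. Column A = canonical symbol, column B = one synonym (assumed).
from xml.sax.saxutils import quoteattr
from openpyxl import load_workbook

wb = load_workbook("symbols.xlsx", read_only=True)  # read_only streams rows
ws = wb.active

with open("dictionary.xml", "w", encoding="utf-8") as out:
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<synonym>\n')
    for row in ws.iter_rows(min_row=2, values_only=True):  # skip header row
        symbol, synonym = row[0], row[1]
        if symbol is None:
            continue
        out.write(f"  <token canonical={quoteattr(str(symbol))}>\n")
        if synonym is not None:
            out.write(f"    <variant base={quoteattr(str(synonym))}/>\n")
        out.write("  </token>\n")
    out.write("</synonym>\n")
```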

Related

Convert OpenStreetMap POI Data to CSV

I am looking to extract some Point of Interest (POI) data from OpenStreetMap in a tabular format. I used this link, navigated to the relevant country, and downloaded the file,
http://download.geofabrik.de/
What I get is a file with the .osm.pbf extension. However, I think it is possible to download the files in other formats like .shp.zip and .osm.bz2. Is there some way that I can convert this data into a tabular format like a CSV file?
I came across a tool called Osmosis which can be used to manipulate data in these formats, but I am not sure if it can be used for this purpose,
https://wiki.openstreetmap.org/wiki/Osmosis
I was able to successfully install it on my Windows machine, though.
To be frank, I am not even sure if this gives me what I want.
In essence, what I am looking for is Sri Lankan POI data that contains the following attributes,
https://data.humdata.org/dataset/hotosm_lka_points_of_interest
If the conversion of this file does not give me data in this format, then I am open to other approaches as well. What is the best way to go about acquiring this data?
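One possible route, offered only as a sketch: the pyosmium library (pip install osmium) can walk a .osm.pbf extract directly and write matching elements to CSV. The tag filter (amenity), the column set, and the file names below are assumptions; adjust them to the attributes in the HDX dataset you linked.

```python
# Pull amenity-tagged nodes out of a Geofabrik .osm.pbf extract into a CSV.
import csv
import osmium

class POIHandler(osmium.SimpleHandler):
    def __init__(self, writer):
        super().__init__()
        self.writer = writer

    def node(self, n):
        # Keep only nodes that look like points of interest.
        if "amenity" in n.tags:
            self.writer.writerow([
                n.id,
                n.location.lat,
                n.location.lon,
                n.tags.get("name", ""),
                n.tags["amenity"],
            ])

with open("sri_lanka_pois.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["osm_id", "lat", "lon", "name", "amenity"])
    POIHandler(writer).apply_file("sri-lanka-latest.osm.pbf")
```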

Can I use a SQL query or script to create Format Description Files for multiple tables in an IBM DB2 for System i database?

I have an AS400 with an IBM DB2 database and I need to create a Format Description File (FDF) for each table in the DB. I can create the FDF file using the IBM Export tool but it will only create one file at a time which will take several days to complete. I have not found a way to create the files systematically using a tool or query. Is this possible or should this be done using scripting?
First of all, to correct a misunderstanding...
A Format Description File has nothing at all to do with the format of a Db2 table. It actually describes the format of the data in a stream file that you are uploading into the Db2 table. Sure you can turn on an option during the download from Db2 to create the FDF file, but it's still actually describing the data in the stream file you've just downloaded the data into. You can use the resulting FDF file to upload a modified version of the downloaded data or as the starting point for creating an FDF file that matches the actual data you want to upload.
Which explains why there's no built-in way to create an appropriate FDF file for every table on the system.
I question why you think you actually need to generate an FDF file for every table.
As I recall, the format of the FDF (or its newer variant, FDFX) is pretty simple; it shouldn't be all that difficult to generate if you really wanted to. But I don't have one handy at the moment, and my Google-fu has failed me.
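For the enumeration half of the problem, you can at least list every table with a catalog query and drive a script from that, generating one FDF per table once you have pinned down the format. A hedged sketch over ODBC, assuming the IBM i Access ODBC driver and that the QSYS2.SYSTABLES catalog view is available on your release (DSN, credentials, and library name are placeholders):

```python
# Enumerate tables in a library; emit one FDF per table in the loop body
# once you know the FDF layout you need.
import pyodbc

conn = pyodbc.connect("DSN=MYAS400;UID=myuser;PWD=mypassword")
cur = conn.cursor()
cur.execute(
    "SELECT TABLE_SCHEMA, TABLE_NAME "
    "FROM QSYS2.SYSTABLES "
    "WHERE TABLE_SCHEMA = ? AND TABLE_TYPE = 'T'",
    "MYLIB",
)
for schema, table in cur.fetchall():
    print(schema, table)  # generate an FDF for this table here
```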

Foundry: convert uploaded CSV to dataframe

How can I convert an uploaded CSV to a dataframe in Foundry using a Code Workbook? Should I use the @transform decorator with spark.read... (not sure of the exact syntax)?
Thanks!!
CSV is a "special format" where Foundry can infer the schema of the CSV and automatically convert it to a dataset. In this example, I uploaded a CSV with hail data from NOAA.
If you hit Edit Schema on the main page, you can use a front end tool to set certain configurations about the CSV (delimiter, column name row, column type etc).
Once it's a dataset, you can import the dataset in Code Workbooks and it should work out of the box.
If you wanted to parse the raw file, I would recommend you use Code Authoring, not Code Workbooks, as it's better suited for production level code, has less overhead and is more efficient for this type of workflow (parsing raw files). If you really wanted to use code workbooks, change the type of your input using the input helper bar, or in the inputs tab.
Once you finish iterating, please move this to a Code Authoring repository and repartition your data. File reads in a Code Workbook can substantially slow down your whole pipeline. Code Authoring offers a preview of raw files now, so it's just as fast to develop with as Code Workbooks.
Note: only imported datasets and persisted datasets can be read in as Python transform inputs; transforms that are not saved as a dataset cannot. Datasets with no schema should be read in as a transform input automatically.
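To make the Code Authoring route concrete, here is a minimal sketch of a raw-file-parsing transform using the transforms API. The dataset paths are placeholders, and the read options (header row, schema inference) are assumptions about your CSV:

```python
# Parse raw CSV files backing an input dataset into a proper Spark dataframe.
from transforms.api import transform, Input, Output


@transform(
    parsed=Output("/MyProject/datasets/hail_parsed"),   # hypothetical path
    raw=Input("/MyProject/raw/hail_csv_upload"),        # hypothetical path
)
def parse_raw_csv(ctx, parsed, raw):
    fs = raw.filesystem()
    # hadoop_path lets Spark read the files backing the input directly.
    df = (
        ctx.spark_session.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(fs.hadoop_path + "/*")
    )
    parsed.write_dataframe(df)
```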

How to parse EDIFACT file data using Apache Spark?

Can someone suggest how to parse EDIFACT-format data using Apache Spark?
I have a requirement where EDIFACT data is written to an AWS S3 bucket every day. I am trying to find the best way to convert this data to a structured format using Apache Spark.
If you have your invoices in EDIFACT format, you can read each of them as one String per invoice using RDDs. You will then have an RDD[String] that represents the distributed invoice collection. Take a look at https://github.com/CenPC434/java-tools; with it you can convert the EDIFACT strings to XML. The repo https://github.com/databricks/spark-xml shows how to use the XML format as an input source to create DataFrames and perform multiple queries, aggregations, etc.
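A PySpark sketch of that flow; edifact_to_xml() is a placeholder for whichever converter you wire in (e.g. CenPC434/java-tools via a JVM call or a CLI), and the bucket paths and rowTag are assumptions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("edifact-to-xml").getOrCreate()

def edifact_to_xml(edifact_text):
    """Placeholder: convert one EDIFACT invoice to an XML string."""
    raise NotImplementedError  # plug in your converter here

# wholeTextFiles yields one (path, content) pair per file: one invoice each.
invoices = spark.sparkContext.wholeTextFiles("s3a://my-bucket/edifact/")
xml_invoices = invoices.mapValues(edifact_to_xml)

# Write the XML out, then read it back with spark-xml to get a DataFrame.
xml_invoices.values().saveAsTextFile("s3a://my-bucket/xml/")
df = (spark.read.format("xml")
      .option("rowTag", "invoice")  # assumed root tag; depends on converter
      .load("s3a://my-bucket/xml/"))
```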

Import data from Excel File to Core Data

I have thousands of student records in an Excel sheet. I want to import all of that data into Core Data [from the Excel sheet to Core Data] and build my iPhone application on top of that Core Data store.
I don't have any idea how to import Excel file data into Core Data.
You're thinking too broadly. You need to decompose this problem further. Here are your real problems:
How do I read an Excel file into memory?
How do I create Core Data objects?
"Excel" has nothing to do with "Core Data". They are entirely disjoint topics.
For the first question, there are several options. You could try to find a library that reads .xls or .xlsx files directly, or you could require that the file be in a different format (like CSV).
For the second question, that's easily answered by reading the Core Data documentation.
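If you take the CSV route, the conversion can happen off-device before you bundle the file with the app. A quick sketch in Python using openpyxl, with placeholder file names:

```python
# Convert the spreadsheet to CSV on your desktop; ship the CSV with the app
# and create the Core Data objects while parsing it on first launch.
import csv
from openpyxl import load_workbook

wb = load_workbook("students.xlsx", read_only=True)  # streams large sheets
ws = wb.active

with open("students.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    for row in ws.iter_rows(values_only=True):
        writer.writerow(row)
```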
I would convert the file to XML.
There are plenty of code samples showing how to parse XML.