SAS Enterprise Guide - importing date data that contains gibberish as well

Regarding SAS Enterprise Guide and importing data. I have a column of data, some of which is in a date format and some of which is gibberish. Is there a way to force convert the column to date format (MMDDYY10.) and convert all the gibberish (non date format) information to blanks?

Your question lacks some information, but I assume that you are using proc import to read the file. That procedure doesn't provide any control over how to read each column, as it internally determines the type and structure of the data. In your case, if the file you are importing is a text file such as a CSV and the internal scan cannot read the column as numeric, then it reads it as character to preserve the information, which is the right thing to do. The main purpose of proc import is to create a SAS data set from an external file format. You can post-process the data after the initial import.
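For example, if proc import has already read the column as character, a short post-processing step can convert the valid values and turn the gibberish into blanks. This is only a sketch: the data set name WORK.IMPORTED and the variable name RAWDATE are placeholders for whatever proc import actually created.

    data work.clean;
        set work.imported;                          /* output of PROC IMPORT */
        /* The ?? modifier suppresses the invalid-data notes and keeps      */
        /* _ERROR_ at 0, so anything that is not a valid MMDDYY10. value    */
        /* simply becomes a missing (blank) date.                           */
        date = input(strip(rawdate), ?? mmddyy10.);
        format date mmddyy10.;
        drop rawdate;
    run;

The key piece is the ?? modifier on the INPUT function, which quietly maps anything that is not a valid date to a blank.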
Another option for you is to use the Import Wizard, which steps you through a series of dialogues to gather information about the file you are importing. One of the steps allows you to specify the type and format of the data in each column. For the date column in question, you can specify the appropriate date informat and get the desired result in your output data set. Additionally, a final step in the wizard gives you an option to save the import code to a file so that you can use it in the future to import more data from a similarly structured file. The code is a data step with the appropriate infile, informat, format and input statements built from the information gathered from you in the wizard.
Of course, the ultimate in control is to write the data step yourself. I prefer to do this even with simple, well-structured files, because I can precisely control the type and length of variables and have the full power of the data step at my disposal. Once I develop a working version of the code, I find it's easier to modify in the future to adapt to changes in the input file.
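As a rough sketch of that approach (the file name, delimiter options, and variable names below are assumptions, not details from your post), the data step might look something like this:

    data work.dates;
        infile 'have.csv' dsd firstobs=2 truncover;   /* adjust to match your file          */
        input id :$10. rawdate :$20.;                 /* read the messy column as text      */
        date = input(rawdate, ?? mmddyy10.);          /* valid dates convert; gibberish     */
        format date mmddyy10.;                        /* becomes a blank date               */
        drop rawdate;
    run;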
Hopefully, this gives you options to solve your date problem.

Related

Convert OpenStreetMap POI Data to CSV

I am looking to extract some Point of Interest (POI) data from OpenStreetMap in a tabular format. I have used this link, navigated to the relevant country, and downloaded the file:
http://download.geofabrik.de/
What I get is a file with the .osm.pbf extension. However, I think it is possible to download the files in other formats like .shp.zip and .osm.bz2. Is there some way that I can convert this data into a tabular format like a CSV file?
I came across a tool called Osmosis which can be used to manipulate data in these formats, but I am not sure if it can be used for this purpose,
https://wiki.openstreetmap.org/wiki/Osmosis
I was able to successfully install it on my Windows machine, though.
To be frank, I am not even sure if this gives me what I want.
In essence, what I am looking for is Sri Lankan POI data that contains the following attributes,
https://data.humdata.org/dataset/hotosm_lka_points_of_interest
If the conversion of this file does not give me data in this format, then I am open to other approaches as well. What is the best way to go about acquiring this data?

Can I use a SQL query or script to create format description files for multiple tables in an IBM DB2 for System i database?

I have an AS400 with an IBM DB2 database and I need to create a Format Description File (FDF) for each table in the DB. I can create the FDF file using the IBM Export tool but it will only create one file at a time which will take several days to complete. I have not found a way to create the files systematically using a tool or query. Is this possible or should this be done using scripting?
First of all, to correct a misunderstanding...
A Format Description File has nothing at all to do with the format of a Db2 table. It actually describes the format of the data in a stream file that you are uploading into the Db2 table. Sure you can turn on an option during the download from Db2 to create the FDF file, but it's still actually describing the data in the stream file you've just downloaded the data into. You can use the resulting FDF file to upload a modified version of the downloaded data or as the starting point for creating an FDF file that matches the actual data you want to upload.
Which explains why there's no built-in way to create an appropriate FDF file for every table on the system.
I question why you think you actually need to generate an FDF file for every table.
As I recall, the format of the FDF (or its newer variant, FDFX) is pretty simple; it shouldn't be all that difficult to generate if you really wanted to. But I don't have one handy at the moment, and my Google-fu has failed me.

Foundry convert uploaded csv to dataframe

How can I convert an uploaded CSV to a DataFrame in Foundry using a Code Workbook? Should I use the @transform decorator with spark.read... (not sure of the exact syntax)?
Thx!!
CSV is a "special format" where Foundry can infer the schema of the CSV and automatically convert it to a dataset. In this example, I uploaded a CSV with hail data from NOAA:
If you hit Edit Schema on the main page, you can use a front end tool to set certain configurations about the CSV (delimiter, column name row, column type etc).
Once it's a dataset, you can import the dataset in Code Workbooks and it should work out of the box.
If you want to parse the raw file, I would recommend you use Code Authoring, not Code Workbooks, as it is better suited for production-level code, has less overhead, and is more efficient for this type of workflow (parsing raw files). If you really want to use Code Workbooks, change the type of your input using the input helper bar, or in the Inputs tab.
Once you finish iterating, please move this to a Code Authoring repository and repartition your data. File reads in a Code Workbook can substantially slow down your whole pipeline. Code Authoring now offers a preview of raw files, so it's just as fast to develop in as Code Workbooks.
Note: Only imported datasets and persisted datasets can be read in as Python transform inputs. Transforms that are not saved as a dataset cannot be read in as Python transform inputs. Datasets with no schema should be read in as a transform input automatically.
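For reference, here is a minimal sketch of what the Code Authoring (transforms API) route could look like when the upload is a schemaless dataset containing the raw CSV file. The dataset paths are hypothetical, and the FileSystem helpers used here (filesystem(), ls(), hadoop_path) reflect my understanding of the API, so check them against the documentation for your Foundry version.

    from transforms.api import transform, Input, Output

    @transform(
        parsed=Output("/Project/datasets/hail_parsed"),    # hypothetical output dataset
        raw=Input("/Project/datasets/hail_csv_upload"),    # hypothetical raw upload dataset
    )
    def parse_csv(ctx, parsed, raw):
        fs = raw.filesystem()
        # Build Hadoop paths for the files attached to the raw dataset
        # and let Spark parse them as CSV.
        paths = [fs.hadoop_path + "/" + f.path for f in fs.ls()]
        df = (
            ctx.spark_session.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv(paths)
        )
        parsed.write_dataframe(df)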

Store .sav file into RDBMS including meta data

I want to know the best approach to store the data from a .sav file into an RDBMS database without losing any of the metadata model or the actual response data.
Note first that you can save all the metadata in a sav file where you have deleted all the data and then reapply the metadata to a new, similar sav file using APPLY DICTIONARY.
Otherwise, you would need to create tables in the database for the various attributes. That's easy for variable labels, formats, measurement level, and missing value codes. For value labels it would take a bit more work.
One possible approach would be to use OMS to capture the output from CODEBOOK (without any statistics) as data files and then export those files to the database.
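If you are open to doing the extraction outside SPSS syntax, another hedged option is to read both the responses and the dictionary with the pyreadstat Python package and write them to the database as separate tables. The file, database, and table names below are made up for the sketch; SQLite is used only as an example target, and other databases can be reached through an SQLAlchemy engine.

    import sqlite3

    import pandas as pd
    import pyreadstat

    # Read the response data and the dictionary from the .sav file.
    df, meta = pyreadstat.read_sav("survey.sav")          # placeholder file name

    con = sqlite3.connect("survey.db")                    # example RDBMS target

    # 1. The actual responses.
    df.to_sql("responses", con, if_exists="replace", index=False)

    # 2. Variable-level metadata: labels, formats, measurement level.
    var_meta = pd.DataFrame({
        "variable": meta.column_names,
        "label": meta.column_labels,
        "format": [meta.original_variable_types.get(v) for v in meta.column_names],
        "measure": [meta.variable_measure.get(v) for v in meta.column_names],
    })
    var_meta.to_sql("variable_metadata", con, if_exists="replace", index=False)

    # 3. Value labels: one row per (variable, value, label).
    value_rows = [
        {"variable": var, "value": val, "label": lab}
        for var, labels in meta.variable_value_labels.items()
        for val, lab in labels.items()
    ]
    if value_rows:
        pd.DataFrame(value_rows).to_sql("value_labels", con, if_exists="replace", index=False)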

Use of mongoimport and an appropriate schema design for time series csv files

I have time series data in many csv files. The format is essentially this:
timestamp | type
yyyy-MM-dd hh:mm:ss | value
I can import this easily enough using the mongoimport function. The options I've used create one document for every entry in the CSV file. Instead, I would like to import the data (using mongoimport) and transform it at the same time to a better or more appropriate schema for time-series data as described in this blog post
http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
The recommended approach is a document-oriented design with multiple readings in one document. Is it possible to do this with mongoimport itself? Or should I import it first and then transform it?
mongoimport is simply an "importer" for a flat data format such as CSV.
Since your source data is in that flat format, you have to import it first and then transform it into the schema you expect.
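As a hedged sketch of that transform step with pymongo: the field names timestamp and type come from your sample header, the database and collection names are placeholders, and the timestamp field is assumed to already be a real date (for example, imported with mongoimport's --columnsHaveTypes option) rather than a string.

    from pymongo import MongoClient

    client = MongoClient()                         # adjust the connection string as needed
    raw = client["metrics"]["readings_raw"]        # collection created by mongoimport

    pipeline = [
        # Bucket the one-document-per-row data into one document per hour,
        # along the lines of the schema described in the blog post.
        {"$group": {
            "_id": {"hour": {"$dateToString": {"format": "%Y-%m-%d %H:00",
                                               "date": "$timestamp"}}},
            "values": {"$push": {"ts": "$timestamp", "type": "$type"}},
            "count": {"$sum": 1},
        }},
        # Write the bucketed documents to a new collection.
        {"$out": "readings_hourly"},
    ]
    raw.aggregate(pipeline, allowDiskUse=True)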