SAS PROC IMPORT GROUPED VARIABLES

How do I keep variables in separate columns when using PROC IMPORT with a tab-delimited .txt file? Only one variable is created, called Name__Gender___Age. Is this only possible with the DATA step?
This is the code:
proc import datafile= '/folders/myfolders/practice data/IMPORT DATA/class.txt'
out=new
dbms=tab
replace;
delimiter='09'x;
run;

You told PROC IMPORT that your text file has tabs between the fields. Judging from the name of the variable it created, your file most likely has spaces between the fields instead, and multiple spaces at that, so that the lines look neatly aligned when viewed in a fixed-width font.
Just write your own data step to read the file (something you should do anyway for text files).
data new;
infile '/folders/myfolders/practice data/IMPORT DATA/class.txt' firstobs=2 truncover;
length Name $30 Gender $6 Age 8 ;
input name gender age;
run;
If there are missing values for either NAME or GENDER that are not entered as a period, then you will probably want to read the file using formatted or column-mode input instead of the simple list-mode input style above.
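For example, if the fields always begin in the same character positions, a column-input version might look like this (the column positions below are assumptions for illustration, not taken from your file):
data new;
infile '/folders/myfolders/practice data/IMPORT DATA/class.txt' firstobs=2 truncover;
input name $ 1-10 gender $ 12-17 age 19-20; /* read each field from fixed column positions */
run;
With column input, an all-blank field is read as missing, so a blank NAME or GENDER no longer shifts the remaining fields.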

The data file appears to have space delimiters instead of tabs, contrary to your expectations.
Because you specified tab delimiting, the spaces in the header row are treated as part of a single column named Name Gender Age. Because spaces are not allowed in SAS variable names (with the default setting), they were converted to underscores. That is why you ended up with Name__Gender___Age.
Change the delimiter to space and you should be able to import.
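A minimal sketch of that change, assuming the file really is space delimited (DBMS=DLM with an explicit space delimiter replaces DBMS=TAB):
proc import datafile='/folders/myfolders/practice data/IMPORT DATA/class.txt'
out=new
dbms=dlm
replace;
delimiter=' ';
getnames=yes;
run;
Note that runs of multiple spaces between fields can still trip up PROC IMPORT, which is another reason to prefer the DATA step above.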
If the data file has a mix of space and tab delimiting, you will want to edit the data file to be consistent.

Related

Matlab dataimport

My MATLAB code for data import is giving me different results for what appear to be similar text files as input. Input1 gives me a normal cell array with all lines from the text file as entries, which I can reference using {i}.
Input2 gives me a scalar data structure where all numeric entries in my text file are converted to the input.data structure. I want all files to be converted to regular cell entries, and I do not understand why some files are converted to scalar data structures.
Code: input = importdata(strcat(direct,'\',filename));
Input1 example: correctly working data import, with the text file on the right.
File link: https://drive.google.com/open?id=1aHK4xivqEqJEmaA8Dv8Y0uW5giG-Bbip
Input2 example: incorrectly working data import, with the text file on the right.
File link: https://drive.google.com/open?id=1nzUj_wR1bNXFcGaSLGva6uVsxrk-R5vA
UTSL!
I'm guessing you are using GNU Octave, although you wrote "Matlab" as the topic of your question.
In importdata.m around line 178, the code tries to automatically detect the delimiter for your data:
delim = regexpi (row, '[-+\d.e*ij ]+([^-+\de.ij])[-+\de*.ij ]','tokens', 'once');
If you run this against W40A0060; you get A as the delimiter, because there is essentially a number before and after it.
If you run this against W39E0016; you get {} (empty) as the delimiter, because the E could be part of a number in scientific notation and is therefore excluded.
Solution:
You really should pass the correct delimiter to the importdata call rather than trusting that it will be magically detected.
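For example (a sketch; the semicolon delimiter and file name are assumptions based on the sample lines above):
data = importdata ("W39E0016_Input2.txt", ";");
With the delimiter given explicitly, the automatic detection shown above is skipped entirely.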
And if you just want the lines in a cell, use
strsplit (fileread ("W39E0016_Input2.txt"), "\n")
Analysis
This does indeed look strange!
EDIT: The cause of this strange-looking behaviour has been deciphered by @Andy (see his solution above).
When you request all outputs of the importdata() function, you can see what happens when reading the data:
[dat1,del1,headerrows1]=importdata('Input1.txt')
[dat2,del2,headerrows2]=importdata('Input2.txt')
For your first file it recognizes 69 header rows and no delimiter:
del1 = []
headerrows1 = 69
while in your second file only two header rows are recognized, along with a comma (,) delimiter:
del2 = ','
headerrows2 = 2
I cannot find an obvious reason in your files for this different interpretation of the data.
Suggestion
Your data format is rather complex. It is not a simple table like one produced by Excel; it has multiple lines with a different number of fields per line and varying data types. importdata() is not designed for this type of data. I suggest writing a specific import function for this kind of file. Have a look at textread() for a first guess. You can use it to read the lines of the files as text and later interpret them with sscanf(), or use strsplit() to split the line contents into fields, as sketched below.
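A minimal sketch of that approach (the file name and the semicolon field separator are assumptions; adapt them to your actual layout):
% read the whole file, split it into lines, then split each line into fields
lines = strsplit (fileread ("Input2.txt"), "\n");
for i = 1:numel (lines)
  fields = strsplit (strtrim (lines{i}), ";");
  % interpret fields{k} here, e.g. with str2double () or sscanf ()
end
This keeps every line intact as text, so lines with different numbers of fields or mixed data types cannot confuse an automatic detector.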

Import dot in xlsx as missing value in Stata

I have an Excel sheet as input for Stata. In the Excel file, a dot in a cell marks a missing value, e.g.:
Column1  Column2
1        10
2        .
.        13
.        15
3        .
However, when importing the Excel file into Stata, both columns above are identified as strings.
How can I tell Stata during the import that all dots should be recognized as missing values, so that my numeric columns remain numeric even though they include some dots/missing values?
Presuming you are importing from either Excel or a CSV.
Excel
From the import excel guidance:
If the column contains at least one cell with nonnumerical text, the entire column is imported as a string variable.
So the easiest solution is:
destring the variables. You can destring a whole list in one go via:
destring var_1 var_2 var_3, replace
That will overwrite the variables as numeric variables and the . will be coded as missing.
Importing a CSV
As with Excel, if there are non-numeric characters, I believe Stata will treat the column as a string. You could use the numericcols option when importing:
import delimited, numericcols()
Then whatever columns you specify in the numericcols option are forced to be numeric, and the . should be interpreted as missing.
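For example, to force the first two columns to be numeric (the file name and column numbers here are hypothetical):
import delimited using "mydata.csv", numericcols(1 2) clear
Cells in those columns that contain only a . then come in as Stata's numeric missing value.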
Equally easy would be to simply destring as outlined above.

I am trying to read the time and message value field data as shown below and write it to an Excel file

Sample data and required Excel output (shown as images in the original post).
Also, read the Time section as shown in the file and populate the Excel file with that data in a column with the header name Time, as shown above. Likewise, read the message values as shown in the .asc file and populate the Excel file, converting the numbers from hexadecimal to decimal, in columns named Data1, Data2, Data3, …
If your '.asc' file consists of tab-delimited ASCII text, then Excel will let you import it into a worksheet.
The following explainer comes from Microsoft's Office support site:
There are two ways to import data from a text file by using Microsoft Excel: you can open the text file in Excel, or you can import the text file as an external data range. To export data from Excel to a text file, use the Save As command.
There are two commonly used text file formats:
Delimited text files (.txt), in which the TAB character (ASCII character code 009) typically separates each field of text.
Comma separated values text files (.csv), in which the comma character (,) typically separates each field of text.
You can change the separator character that is used in both delimited and .csv text files. This may be necessary to make sure that the import or export operation works the way that you want it to.
If neither of those methods works for you and your '.asc' file was generated by MATLAB, then you may be able to use MATLAB to export directly to an Excel worksheet. MATLAB has a function xlswrite that you can use to write directly to a Microsoft Excel spreadsheet.
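A minimal sketch of that route (the variable names, values, and output file name below are made up for illustration, since the exact .asc layout is not shown):
% hypothetical parsed values: time stamps plus hex message bytes
times = [0.10; 0.25; 0.40];
hexdata = {'1A' '2B'; '3C' '4D'; '5E' '6F'};
decdata = cellfun (@hex2dec, hexdata);  % convert hex strings to decimal numbers
header = {'Time', 'Data1', 'Data2'};
xlswrite ('output.xlsx', [header; num2cell([times decdata])]);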
Another option, staying in MATLAB rather than moving to VBA in Excel, is to use the textscan function to parse your '.asc' file yourself.
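A sketch of that approach (the format string and file name are hypothetical, since the exact .asc layout is not shown):
fid = fopen ('input.asc', 'r');
% hypothetical layout: a numeric time stamp followed by two hex byte fields
rows = textscan (fid, '%f %s %s');
fclose (fid);
The cell arrays in rows can then be converted with hex2dec and written out with xlswrite as above.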

Importing text file to database - unwanted space characters

I have a problem with importing data from a text file (comma-delimited, with " as the text qualifier). That's the only type of export we can do from an almost-30-year-old system.
The problem comes from someone in the old system having used a space character in fields. During import, SQL Server sees that there is something in the field and displays the cell as NULL. When you open this text file in Excel it shows an empty cell (which looks correct), but the cell behaves differently from a genuinely empty cell.
Example (screenshot from Notepad++, not reproduced here): orange arrows mark TAB characters (lined up to be readable), orange dots mark spaces.
Some Column1 data has extra spaces (the "N " and "B " rows), but those don't cause a problem.
Column2: the first 8 rows are good, with "" (nothing) between the text qualifiers.
Rows 9-13 have a space between the text qualifiers. They load into Excel as empty cells and look good, but loading them into SQL Server gives errors, and if I load from the Excel file instead, SQL Server shows NULL in those cells. I tried to "wash" the data with Access: it loads fine, but after saving the table and loading it into SQL Server it still shows NULL.
Column3 is the same as Column2: row 1 is good, rows 2 and 3 have the problem, rows 4-8 are good showing X, and rows 9 to 13 show NULL.
Any ideas how to load this into SQL Server? Can I change some setting on the column to ignore the space in the data?
Assuming you want spaces to be converted to empty strings in the database after importing the data, you could run SQL like
UPDATE [yourTableName]
SET [columnName] = ''
WHERE [columnName] = ' ';
Copy and paste this for however many columns need to be sanitised, filling in the correct table and column names.
If you wanted to remove spaces from the start and end of strings at the same time as changing spaces to empty strings, you could use
UPDATE [yourTableName]
SET [columnName] = LTRIM(RTRIM([columnName]));
which would tidy up the "B " and "N " entries too.

long text file to SAS Dataset

I am trying to load a large text file (a report) as a single cell in a SAS dataset, but because of multiple spaces and formatting the data is getting split into multiple cells.
Data l1.MD;
infile 'E:\Sasfile\f1.txt' truncover;
input text $char50. #;
run;
I have 50 such files to upload so keeping each file as a single cell is most important. What am I missing here?
Data l1.MD;
infile 'E:\Sasfile\f1.txt' recfm=f lrecl=32767 pad;
input text $char32767.;
run;
That would do it. RECFM=F tells SAS to read fixed-length records (ignoring line feeds), and the other options set the record length to the maximum for a single variable (records can be longer, but one character variable is limited to 32767 bytes) and pad the record with blanks if it is too short.
You'd only end up with more than one cell if your text file is longer than that. Note that the line-feed and/or carriage-return characters will be kept in this variable, which may be good or bad. You can identify them as '0A'x and/or '0D'x (depending on your OS you may have one or both), and you can strip them with the COMPRESS function's 'c' modifier or translate them to a line separator of your preference.
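For example, to clean up afterwards (a sketch; the '|' separator in the commented alternative is just one possible choice):
data l1.md_clean;
set l1.md;
text = compress(text, , 'c'); /* the 'c' modifier drops control characters, including CR and LF */
/* or keep the line structure visible: text = translate(text, '||', '0D0A'x); */
run;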