I want to import a delimited text file into Stata. Some of the fields are numeric, with the numbers formatted with commas (e.g. 2,144.20). When I specify a numeric data type in the infix command for these columns, the values are imported as missing.
infix 2 first str id 2-15 double amount 16-25 using "{datasetname}"
Is there a way to specify the numeric format (e.g. %20.2fc) so that Stata does not treat them as non-numeric? Another way is to import them as strings and convert to numeric later, but I want to see if there is a way to specify the format in the infix command itself.
There is no such syntax. It would not even make sense from a Stata point of view, as a format such as %20.2fc is a display format: it controls what is shown (output), not what is read in (input).
Use destring, ignore(",") replace to fix such variables after reading them in.
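For reference, the transformation that destring with ignore(",") performs is essentially the following (a minimal Python sketch of the idea, not Stata code):

```python
def parse_amount(text):
    """Strip thousands separators, then convert to float;
    anything still non-numeric becomes None (Stata's missing)."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None

print(parse_amount("2,144.20"))  # 2144.2
print(parse_amount("N/A"))       # None
```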
Related
I'm creating an OCR line for our remits that our scanner will read. The scanner doesn't allow the '.' in the field; it assumes the last 2 digits are the decimal place values. I'm converting the field to text but I'm not sure how to remove the '.' while keeping the decimal place values.
The simplest solution would be to create a Formula Field and use the Replace() function. The formula for your Formula Field would look like this:
StringVar myVariable;
myVariable := Replace({table.column}, ".", "");
myVariable;
This will search {table.column} for the first occurrence of a decimal and replace it with an empty string.
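For comparison, the same operation in Python terms (the sample value here is hypothetical, standing in for {table.column}):

```python
value = "1234.56"                   # stands in for {table.column}
ocr_value = value.replace(".", "")  # drop the decimal point, keep the digits
print(ocr_value)  # 123456
```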
However, if your intent is to barcode the value, there may be a UFL available that could also do this for you. When creating barcodes, User Function Libraries are usually preferred because they have functions specifically designed to encode your barcode values. They aren't required, though, and you can always choose to encode barcode values manually with Formula Fields.
I have an Excel sheet as input for Stata. In the Excel, a dot in a cell marks a missing value, e.g.:
Column1 Column2
1 10
2 .
. 13
. 15
3 .
However, when importing the Excel to Stata, both columns above are identified as a String.
How can I tell Stata during the import that all dots should be recognized as missing values and thus my numeric columns remain numeric, although they include some dots/missing values?
Presuming you are importing from Excel or a CSV:
Excel
From the import excel guidance:
If the column contains at least one cell with nonnumerical text, the entire column is imported as a string variable.
So the easiest solution is:
destring the variables. You can destring a whole list in one go via:
destring var_1 var_2 var_3, replace
That will overwrite the variables as numeric variables and the . will be coded as missing.
Importing a CSV
As with Excel, if there are non-numeric characters, I believe Stata will treat the column as a string. You could use the numericcols option when importing:
import delimited, numericcols()
Then whatever columns you specify in the numericcols option are forced to be numeric and the . should be interpreted as missing.
Equally easy would be simply to destring as outlined above.
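What destring does with the dot cells can be sketched outside Stata like this (a hedged Python illustration of the idea, not Stata syntax):

```python
def to_numeric(cell):
    """Treat a lone '.' as missing (None); otherwise convert to float."""
    if cell.strip() == ".":
        return None
    return float(cell)

# Column2 from the example above, as imported strings
column2 = ["10", ".", "13", "15", "."]
print([to_numeric(c) for c in column2])  # [10.0, None, 13.0, 15.0, None]
```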
I'm trying to format decimals to round to the nearest hundredth with a temp-table declaration similar to this:
DEFINE TEMP-TABLE foo
FIELD random-decimal AS DECIMAL FORMAT "->>>,>>>,>>>.99".
The end result is displayed on a report, which I output using the following:
EXPORT STREAM sStream DELIMITER ',' foo.
This does not seem to work as I'm intending it to. I'm still receiving values like this: 0.000073.
Does anyone have any insight into what I'm doing wrong? I was unable to find anything online for this specific case.
FORMAT has no impact on storage. It is only a "hint" for default display and input purposes.
What you want is the "decimals" field attribute:
DEFINE TEMP-TABLE foo
FIELD random-decimal AS DECIMAL decimals 2 FORMAT "->>>,>>>,>>>.99".
create foo.
random-decimal = 1.12345.
display random-decimal format ">.9999".
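Outside ABL, the round-on-assignment behavior of DECIMALS 2 can be illustrated with Python's decimal module (an analogy only; the exact rounding mode Progress applies may differ):

```python
from decimal import Decimal, ROUND_HALF_UP

def to_hundredths(value):
    """Round to 2 decimal places at 'assignment' time, like DECIMALS 2."""
    return Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(to_hundredths("1.12345"))   # 1.12
print(to_hundredths("0.000073"))  # 0.00
```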
I'm trying to do a simple mail merge in Word 2010, but when I insert an Excel field that's supposed to represent a zip code from Connecticut (e.g. 06880) I am having 2 problems:
the leading zero gets suppressed, such as 06880 becoming 6880 instead. I know that I can at least toggle the field code so it works as {MERGEFIELD ZipCode \# 00000}, and that at least works.
but here's the real problem I can't seem to figure out:
A zip+4 field such as 06470-5530 gets treated like an arithmetic expression: 6470 - 5530 = 940, so with the above switch it becomes 00940 instead, which is wrong.
Is there perhaps something in my Excel spreadsheet or an option in Word that I need to set to make this work properly? Please advise, thanks.
See macropod's post in this conversation
As long as the ZIP codes are reaching Word (with or without "-" signs in the 5+4 format ZIPs), his field code should sort things out. However, if you are mixing text and numeric formats in your Excel column, there is a danger that the OLE DB provider or ODBC driver - if that is what you are using to get the data - will treat the column as numeric and return all the text values as 0.
Yes, Word sometimes treats text strings as numeric expressions as you have noticed. It will do that when you try to apply a numeric format, or when you try to do a calculation in an { = } field, when you sum table cell contents in an { = } field, or when Word decides to do a numeric comparison in (say) an { IF } field - in the latter case you can get Word to treat the expression as a string by surrounding the comparands by double-quotes.
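The failure mode described in the question can be reproduced outside Word (a Python illustration of the arithmetic Word ends up performing):

```python
# When a numeric picture switch like \# 00000 is applied,
# Word evaluates "06470-5530" as a subtraction:
result = 6470 - 5530
print(f"{result:05d}")  # 00940  (the wrong output)

# Treated as a string, the value survives intact:
zip_plus_4 = "06470-5530"
print(zip_plus_4)       # 06470-5530
```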
In Excel, to force the string data type when entering data that looks like a number, a date, a fraction, etc. but is not numeric (zip codes, phone numbers, etc.), simply type an apostrophe before the data.
=06470 will be interpreted as the number 6470, but ='06470 will be the string "06470".
The simplest fix I've found is to save the Excel file as CSV. Word takes it all at face value then.
I'm using this:
COPY( select field1, field2, field3 from table ) TO 'C://Program Files/PostgreSql//8.4//data//output.dat' WITH BINARY
To export some fields to a file, one of them is a ByteA field. Now, I need to read the file with a custom made program.
How can I parse this file?
The general format of a file generated by COPY...BINARY is explained in the documentation, and it's non-trivial.
bytea contents are the easiest to deal with, since they're not encoded.
Every other datatype has its own encoding rules, which are described not in the documentation but in the source code. From the docs:
To determine the appropriate binary format for the actual tuple data you should consult the PostgreSQL source, in particular the *send and *recv functions for each column's data type (typically these functions are found in the src/backend/utils/adt/ directory of the source distribution).
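For orientation, the file-level framing (as documented for COPY BINARY: an 11-byte signature, 32-bit flags and header-extension length, then per-row a 16-bit field count and per-field 32-bit lengths, all in network byte order, ending with a field count of -1) can be walked with a short Python sketch. The per-field bytes remain datatype-specific and are left raw here:

```python
import struct

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"  # fixed 11-byte file signature

def read_copy_binary(data):
    """Yield rows as lists of raw field byte-strings (None for NULL)."""
    assert data[:11] == SIGNATURE, "not a COPY BINARY file"
    flags, ext_len = struct.unpack_from("!II", data, 11)
    pos = 19 + ext_len                      # skip the header extension
    while True:
        (nfields,) = struct.unpack_from("!h", data, pos)
        pos += 2
        if nfields == -1:                   # file trailer reached
            break
        row = []
        for _ in range(nfields):
            (flen,) = struct.unpack_from("!i", data, pos)
            pos += 4
            if flen == -1:
                row.append(None)            # NULL field
            else:
                row.append(data[pos:pos + flen])
                pos += flen
        yield row
```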
It might be easier to use the text format rather than binary (just remove WITH BINARY). The text format has better documentation and is designed for interoperability; the binary format is intended more for moving data between Postgres installations, and even then there are version incompatibilities.
The text format will write the bytea field as if it were text, encoding any non-printable characters in \nnn octal representation (except for a few special cases that it encodes with C-style escapes such as \n and \t). These are all listed in the COPY documentation.
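A minimal decoder for those escape sequences might look like this (a sketch based on the escapes listed in the COPY documentation; it does not cover the \x hex form a bytea column can also produce):

```python
import re

# Single-character escapes used by COPY's text format
_SIMPLE = {"b": "\b", "f": "\f", "n": "\n", "r": "\r",
           "t": "\t", "v": "\v", "\\": "\\"}

def unescape_copy_text(field):
    """Decode the backslash escapes COPY's text format emits."""
    def sub(m):
        esc = m.group(1)
        if esc[0].isdigit():        # \nnn: octal byte value
            return chr(int(esc, 8))
        return _SIMPLE[esc]
    return re.sub(r"\\([0-7]{1,3}|[bfnrtv\\])", sub, field)

print(unescape_copy_text(r"\101\102\103"))  # ABC
```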
The only caveat is that you need to be absolutely sure the character encoding used when saving the file is the same one used when reading it, so that the printable characters map to the same numbers. I'd stick to SQL_ASCII as it keeps things simpler.