Importing text file to database - unwanted space characters - tsql

I have a problem with importing data from a text file (comma-delimited, with " as the text qualifier). That's the only type of export we can do from an almost-30-year-old system.
The problem comes from someone in the old system entering a space in some fields. During import, SQL Server sees that there is something in those cells and displays them as NULL. When you open the text file in Excel, the cell looks empty (which is correct), but it behaves differently from a genuinely empty cell.
Example (that's from Notepad++):
-> Orange arrows show TABs (I lined them up to be readable)
. Orange dots show spaces
Some Column1 data has extra spaces (the "N " and "B " rows), but these don't cause a problem.
Column2: the first 8 rows are good, with "" (nothing) between the text qualifiers.
Rows 9-13 have a space between the text qualifiers. When loaded into Excel the cell is empty and looks fine. When loading into SQL Server it throws errors, and if I load it from an Excel file instead, SQL shows NULL in those cells. I tried to "wash" it through Access: it loads fine there, but saving it as a table (dbo) and loading that table into SQL Server still shows NULL.
Column3 is the same as Column2: row 1 is good, rows 2 and 3 have the problem, rows 4-8 are good and show X, and rows 9-13 show NULL.
Any ideas how to load this into SQL Server? Is there some setting on the column (to ignore the space)...?

Assuming you want spaces to be converted to empty strings in the database after importing the data, you could run SQL like
UPDATE [yourTableName]
SET [columnName] = ''
WHERE [columnName] = ' ';
Copy and paste this for however many columns need to be sanitised, filling in the correct table and column names.
If you wanted to remove spaces from the start and end of strings at the same time as changing spaces to empty strings, you could use
UPDATE [yourTableName]
SET [columnName] = LTRIM(RTRIM([columnName]))
which would tidy up the "B " and "N " entries too.
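If several columns are affected, you could also do it in one statement. Here is a minimal sketch, assuming the problem columns are called Column2 and Column3 (substitute your real table and column names):
UPDATE [yourTableName]
SET [Column2] = LTRIM(RTRIM([Column2])),
    [Column3] = LTRIM(RTRIM([Column3]));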

Related

Why are formulas not propagating in null fields?

I'm trying to change the value of nulls to something else that can be used to filter. This data comes from a QVD file. The field that contains nulls contains them because no action has been taken on those items (they will eventually change to something else once an action has been taken). I found this link, which was very informative, but I tried multiple solutions from the document to no avail.
What I don't quite understand is that whenever I make a new field (in the script or as an expression), the formula does not propagate into the records that are null; it shows " - ". For instance, the expression isNull(ActionTaken) will return false in a field that is not null, but only " - " in fields that are null. If I export the table to Excel, the " - " is exported; if I copy that cell into a text analyzer, the UTF-8 encoding is \x2D\x0A\x0A. I'm not sure whether that's an artifact of the export process.
I also tried using the NullAsValue statement, but no luck. Using a combination of Len & Trim = 0 returns the same result as above. This is only one table; no other tables are involved.
Thanks in advance.
I had a similar case a few years ago where the field looked empty but was actually filled with a character that just looked empty. Trimming the field also didn't work as expected in this case, because the character code was different.
What I can suggest is to check whether the character returned for the "empty" value is actually an empty string. You can use the Ord() function to check the character code of the empty values. Once you have the number, you can use it to replace that character with whatever you want (for example, an empty string).
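Here is a minimal load-script sketch of that idea; the table name MyTable, the field name ActionTaken, and the character code 160 (a non-breaking space) are only assumptions for illustration:
// Step 1: inspect which character code the "empty" values actually contain
CharCheck:
LOAD DISTINCT
    ActionTaken,
    Ord(ActionTaken) AS ActionTakenCharCode
RESIDENT MyTable;

// Step 2: once you know the code, replace that character with a real empty string
Cleaned:
LOAD
    If(Ord(ActionTaken) = 160, '', ActionTaken) AS ActionTakenClean
RESIDENT MyTable;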

SAS PROC IMPORT GROUPED VARIABLES

How do I keep the variables in separate columns when using proc import with a tab-delimited txt file? Only one variable is created, called Name__Gender___Age. Is it only possible with the data step?
This is the code
proc import datafile= '/folders/myfolders/practice data/IMPORT DATA/class.txt'
out=new
dbms=tab
replace;
delimiter='09'x;
run;
You told PROC IMPORT that your text file had tabs between the fields. From the name of the variable it created, it is most likely that your file actually just has spaces between the fields, and multiple spaces at that, so the lines look neatly aligned when viewed with a fixed-width font.
Just write your own data step to read the file (something you should do anyway for text files).
data new;
infile '/folders/myfolders/practice data/IMPORT DATA/class.txt' firstobs=2 truncover;
length Name $30 Gender $6 Age 8 ;
input name gender age;
run;
If there are missing values for either NAME or GENDER that are not entered as a period, then you will probably want to read the file using formatted or column-mode input instead of the simple list-mode input style above.
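A rough sketch of formatted input is below; the column positions are guesses (the actual layout of class.txt isn't shown) and would need to be adjusted to your file:
data new;
  infile '/folders/myfolders/practice data/IMPORT DATA/class.txt' firstobs=2 truncover;
  input @1  Name   $char10.   /* start columns and widths are assumptions */
        @12 Gender $char6.
        @19 Age    2. ;
run;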
The data file appears to have space delimiters instead of tab, contrary to your expectations.
Because you specified tab delimiting, the spaces in the header row are considered part of a single column named Name Gender Age. Because spaces are not allowed in SAS column names (with the default settings), the spaces were converted to underscores. That is why you ended up with Name__Gender___Age.
Change the delimiter to space and you should be able to import.
If the data file has a mix of space and tab delimiting, you will want to edit the data file to be consistent.
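For example, something along these lines (untested against your file; getnames=yes is the default and shown only for clarity):
proc import datafile='/folders/myfolders/practice data/IMPORT DATA/class.txt'
    out=new
    dbms=dlm
    replace;
    delimiter=' ';
    getnames=yes;
run;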

Finding the format of arbitrary delimited text file in MATLAB

I have a file that looks like this in notepad++
I can easily see the spaces (the orange dots) and the tabs (the orange arrows). I can also right-click this in MATLAB and import it in a variety of ways. The problem is, firstly, that the delimiters are not consistent: it seems to go TAB and then some spaces so that the total field width equals 6 characters...
The only way I know of reading a file in is if you already know how it is delimited. But in this case I would like to parse each line so MATLAB has some 'token' record of what goes where, e.g.:
Line1: Text Space Text Space Text Tab Space Space Text NEWLINE
(Notepad++ seems to know just fine so surely MATLAB can get this info too?).
Is this possible? Then it would be nice to use this information to save the imported data back out to a file with exactly the same formatting.
The data is below. For some reason copying this into Notepad++ does not preserve its delimiting; you will need to add the tabs yourself so it looks like the file in the screenshot.
Average Counts : 56.2
Time : 120
Thanks
If you use textscan, the default behaviour should probably suit your needs:
Within each row of data, the default field delimiter is white space. White space can be any combination of space (' '), backspace ('\b'), or tab ('\t') characters. If you do not specify a delimiter, textscan interprets repeated white-space characters as a single delimiter.
The output is a cell array, where each column is saved as a cell. So C{1} would contain the strings, C{2} the colons, and C{3} the values.
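As a rough sketch for the two data lines quoted in the question (the file name is made up, and using ':' as the delimiter is just one convenient way to split them):
fid = fopen('mydata.txt');                     % file name assumed
C = textscan(fid, '%s %f', 'Delimiter', ':');  % label before the colon, numeric value after it
fclose(fid);
labels = strtrim(C{1});   % e.g. 'Average Counts', 'Time'
values = C{2};            % e.g. 56.2, 120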

What am I doing wrong when I use the database preview via dexplore?

I have some strange errors when I try to access my MS Access DB via MATLAB. Some hints:
The data preview to the tutorial.mdb (shipped with MATLAB) works very well via dexplore - I can see all the tables and I can import the data.
The data preview of my own DB works on another system where, e.g., the language settings differ (see screenshot 1). That is how the DB preview looks on the other system; note that table and column names are framed with the ` character. At first the preview resulted in an error there too for one of the tables (0433_Slices); when I renamed it by framing the name with the ' character inside MS Access, I could force the table name to be recognized as a string in MATLAB, and now that table is importable too.
On my system, the preview does not work and gives error messages like "Syntax error in query", "Syntax error in FROM clause", or "Incomplete query clause" (see screenshot 2). You can see one of the possible error messages there, and that neither table nor column names are framed with the ` character. I tried to force the recognition of table names by using " or ' characters, but that did not work!
Comparing my system to the other system, the most prominent difference is that table names and column names are not recognized as strings (i.e., they are not framed with the ` character), and I think this results in the mentioned error messages. Can I fix this somehow, or am I totally on the wrong track?
Screenshot 1: http://i.imgur.com/b0Ja4aR.png
Screenshot 2: http://i.imgur.com/dDyvjfM.png
To anybody who looks for a solution to this freak problem in the distant future and finds this thread: neither table names nor column names are allowed to contain a "-" or spaces; "_" is fine though.
Just rename everything using only Roman alphabet characters, numbers, and "_" and it will work!
Also, don't start names with numbers like "032_xyz". It will work if you rename it to "xyz".

Way to preserve formatting for lists when copy / pasting from table cell?

My Word interop application needs to get content out of a cell of a table in a Word document. The problem is that the formatting for some items seems broken. For example, the last item of a list does not have the list style applied, headings come through as normal text, and so on.
The same happens if you create a table, create a list in the table and try to copy / paste the list to somewhere else.
Has anyone else had this problem and maybe found a solution? Is there any way to trick word into giving the correct formatting?
Thanks in advance
Example code
Range range = cell.Range;                  // the full cell range, which includes the end-of-cell marker
range.MoveEnd(WdUnits.wdCharacter, -1);    // shrink the range to exclude the end-of-cell marker
...
range.FormattedText.Copy();                // copy the formatted content to the clipboard
The range includes the end-of-cell marker, which should not be exported. I just noticed that when I do not alter the range, lists are correctly formatted, but then the whole cell is exported as a table, which is bad because I want to import the content into another document (where this would nest tables infinitely).
Word2010 v14.06.6112.5000