Is it possible to view data as it is being imported in Teradata?

I'm trying to import data from a txt file and keep getting a 'Wrong number of data values in row xxx' error. The text file looks fine to me, but I can't tell what or how Teradata is interpreting it.
So is there a way to view or preview the data from Teradata's perspective? I tried running a SELECT statement, but since the import doesn't finish, nothing is imported at all. Which brings me to my next question: is there a way to limit an external-file import to a certain number of rows? For example, import just the first 50 rows of the text file?

May I suggest you obtain a copy of Notepad++ or Sublime Text, both free to download, to view the text file. Either will let you open the file and identify what in the records is causing the load to fail: you can display non-printable characters and use advanced search techniques to traverse the file looking for problems with the data.
It is possible there is an embedded carriage return, line feed, or other non-printable character that is being interpreted during the import and generating this error.
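As for importing only the first 50 rows, one low-tech workaround is to cut a sample file before importing; a minimal sketch, assuming a Unix-like shell (or Git Bash on Windows) and placeholder file names:
# Copy just the first 50 rows into a separate file to import as a test
head -n 50 source_data.txt > sample_50.txt
Once the sample loads cleanly, a SELECT against the target table will show you exactly how Teradata parsed each field.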

Related

Dataprep import dataset does not detect headers in first row automatically

I am importing a dataset from Google Cloud Storage (parameterized) into Dataprep. So far this has worked perfectly, and one of the features I liked is that it auto-detects that the first row of my (application/octet-stream) .csv file contains my headers.
However, today I tried to import a new dataset and it did not detect the headers; it auto-assigned column1, column2, and so on.
What has changed, and why is this the case? I have checked the auto-detect box and use UTF-8.
While the auto-detect option is usually pretty good, there are times it fails for numerous reasons. I've specifically noticed this when the field names contain certain characters (e.g. commas, or invisible characters like zero-width non-joiners and null bytes), or when multiple different styles of newline delimiters are used within the same file.
Another case where I've seen this is when there were more columns of data than there were headers.
As you already hit on, you can use the following snippet to do mostly the same thing:
rename type: header method: filter sanitize: true
. . . or make separate recipe steps to convert the first row to header and then bulk-rename to your own liking.
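For illustration, those two separate steps might look something like this in the underlying Wrangle language (the target column name is a placeholder, and the exact syntax can vary by Dataprep version):
header sourcerownumber: 1
rename type: manual mapping: [column1,'order_date']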
More often than not, however, I've found that when auto-detect fails on a previously working file, it tends to be a sign of some issue with the source file. I would look for mismatched data and misplaced commas in the output, and compare the header and a few data rows against the original source in a plaintext editor.
When all else fails, you can try a CSV validator . . . but in my experience they tend to be incredibly opinionated about a file's formatting options, so depending on the system generating the CSV, they could either miss real errors or report false positives. I have had two experiences where auto-detect failed for no apparent reason on perfectly clean files, so it is possible the detection process was simply skipped.
It should also be noted that if you have a structured file that was correctly detected but want to revert it, you can go to the dataset details, select the "..." (More) button, and choose "Remove structure..." (I'm hoping that one day they'll let you do the opposite when you want to add structure to a raw dataset or work around bugs like this!)
Best of luck!
This can be resolved as a transformation within a Flow:
rename type: header method: filter sanitize: true

Liquid XML Editor line numbers for text files

I am able to open large files and it works great; however, I do not get line numbers even though that option is on by default. Line numbers do appear for XML files, but not for a plain-text file with an .xml extension.
Any ideas on how to get the line numbers, or is the software just not meant to do that?
The Large File Editor does not display line numbers.
It does have the concept of lines, so you can move to a specific line using menu Edit->Go to... (Ctrl+G).
Depending on your PC's specification, you may be able to open larger files without invoking the Large File Editor; please see:
Opening Large XML Documents in Liquid XML Studio

pgadmin importing csv file errors

I'm using pgadmin 1.18.
I have a copy of a table that I truncated. I simply want to load an import csv file which essentially looks like this:
20151228,12/28/2015,53,12,December,4,2015,1,Monday
20140828,08/28/2014,35,8,August,3,2014,4,Thursday
20150208,02/08/2015,6,2,February,1,2015,7,Sunday
I'm getting an error:
extra data after last expected column CONTEXT: COPY tblname, line 1:
"20151228,12/28/2015,53,12,December,4,2015,1,Monday"
This is the first line it's trying to import. Any suggestions on how to fix this?
From the comments it appears you were using the wrong function in pgadmin.
If you have an existing table, which you have truncated and wish to load from a CSV file, select the table and then use Tools => Import, select the file and choose format 'CSV'.
There are other options in the import dialog to allow you to skip specified columns, use different quoting options, and specify how to deal with NULL values.
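For reference, the dialog is essentially a front end for PostgreSQL's COPY command, so the same load can be written in SQL; a sketch with a placeholder path (for COPY, the file must be readable by the server process):
COPY tblname FROM '/path/to/data.csv' WITH (FORMAT csv);
If the file lives on your client machine instead, psql's \copy tblname FROM 'data.csv' CSV does the same thing client-side.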
One tip that always trips me up: make sure there is no blank line at the end of the file.

Paginate a big text file

I have a big text file. Each line in the file is a record. I need to parse the text file and show only 20 records in an HTML table at a time, and I will have to support sorting as well.
What I am currently doing is reading the file line by line based on the parameters start, stop, and page_size, which are provided in the query string. That seems to work fine until I have to sort the records, because in order to sort I need to process every line in the text file.
So is there a Unix command with which I can extract a range of lines and sort them? I tried grep, but I don't know it well enough to solve this problem.
Take a look at the pr command. This is what we used to use all the time to paginate big files. You can set the page length, headers, footers, turn on line numbers, etc.
There's probably even a way to munge the output into HTML.
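A sketch of both ideas, assuming comma-delimited records in a placeholder file records.txt and a page size of 20 lines:
# pr: paginate with numbered lines and a custom page header
pr -l 20 -n -h "Records" records.txt
# or sort by the second comma-separated field, then print page 2 (lines 21-40)
sort -t',' -k2 records.txt | sed -n '21,40p'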
How big is the file?
man sort

MATLAB: How to import multiple CSV files with mixed data types

I have just started learning MATLAB and am having difficulty importing CSV files into a 2-D array.
Here is a sample CSV for my needs (all the CSV files have the same format, with fixed columns):
Date, Code, Number....
2012/1/1, 00020.x1, 10
2012/1/2, 00203.x1, 0300
...
As csvread() only works with numeric data, should I import the numeric and text data separately, or is there a quick way to import multiple CSV files with mixed data types?
Thanks a lot!!
What you're looking for may be the function xlsread.
It opens any file recognized by Excel, and automatically separates text data from numerical data.
The problem is that the default list separator, at least on my computer, is ; and not , (because of my locale here in Brazil). So xlsread will try to separate the fields in the file with a ;, and not a comma as you'd like.
To change that, you have to change your system locale to set the comma as the list separator. To do it in Windows Vista, click Start, Control Panel, Regional and Language Options, Customize this format, and change the List Separator from ';' to ','. On other Windows versions the process should be almost the same.
After doing that, typing:
[num, txt, all] = xlsread('your_file.csv');
will return something like:
num =
10
300
txt =
'01/01/2012' ' 00020.x1'
'02/01/2012' ' 00203.x1'
all =
'01/01/2012' ' 00020.x1' [ 10]
'02/01/2012' ' 00203.x1' [300]
Notice that if your locale already has the list separator set to ',', you won't have to change anything on your system to make this work.
If you don't want to change your system just to use the xlsread function, then you could use the textscan function described here: http://www.mathworks.com/help/techdoc/ref/textscan.html
The problem is that it is not as simple as a single call: you have to open the file yourself, tell MATLAB explicitly the format of each column, and close the file when you're done.
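For example, here is a minimal textscan sketch for the three sample columns shown in the question (the file name, delimiter, and header line are assumptions; adjust the format string to your real columns):
fid = fopen('your_file.csv');
C = textscan(fid, '%s %s %d', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
dates = C{1};  % cell array of date strings
codes = C{2};  % cell array of code strings
nums  = C{3};  % numeric (int32) column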
Best regards
I recently wrote a function that solves exactly this problem. See delimread.
It's worth noting that xlsread on CSV files only works on Windows. On Linux or Mac, xlsread runs in 'basic' mode, which cannot read CSV files. It might not be a great idea in the long run to use xlsread in case you need to migrate across platforms or automate code runs on Linux servers.
xlsread is also much slower than other text parsing functions since it opens an Excel session to read the file.