I am loading a 10 GB CSV file into an AWS Aurora PostgreSQL database. The file has a few fields whose values are decimals within +/- 0.1 of a whole number, but they are really supposed to be integers. When I loaded this data into Oracle using SQL*Loader (SQLLDR) I was able to round those fields from decimal to integer during the load. I would like to do the same in the PostgreSQL database using the \copy command, but I can't find any option which allows this.
Is there a way to import this data and round the values during a \copy without going through a multistep process like creating a temporary table?
There doesn't seem to be a built-in way to do this like I have seen in other database tools.
I didn't use an external program as suggested in the comments, but I did preprocess the data with an awk script that reads each line and reformats the offending field with awk's printf function, using the "%.0f" format specifier to round the value.
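As a side note, the awk pass doesn't have to be a separate rewrite of the file: psql's \copy can read from a program, so the reformatting and the load can happen in one step. A minimal sketch, where round_field.awk is a hypothetical name for the rounding script described above and the target table and columns are placeholders:
-- hypothetical table, columns and script name; psql runs the command client-side
-- and streams its output straight into COPY, so no intermediate file is written
\copy target_table (id, qty, label) from program 'awk -f round_field.awk /path/to/data.csv' with (format csv)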
I use HeidiSQL to manage my database.
When I export a grid row to a CSV file, the large number 89610185002145111111 becomes 8.96102E+19.
How can I keep the number without the scientific-notation conversion?
HeidiSQL does not do such a conversion. I tried to reproduce this and got the unformatted number:
id;name
89610185002145111111;hey
That was viewed in a text editor, by the way. If you open the file in Excel, you may have to apply a different cell format to see the full number.
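If the file does end up in a tool that reinterprets big numbers, one query-side workaround (my own suggestion, not something HeidiSQL requires) is to export the column as text so nothing downstream treats it as a float; mytable is a placeholder name here:
-- sketch only: casting the id column to a character type keeps consumers from
-- rendering it in scientific notation
SELECT CAST(id AS CHAR(25)) AS id, name
FROM mytable;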
I am having difficulty with my decimal columns. I have defined a view in which I convert my decimal values, e.g. like this:
SELECT CONVERT(decimal(8,2), [ps_index]) AS PriceSensitivityIndex
When I query my view, the numbers appear correctly in the results window, e.g. 0,50 and 0,35.
However, when I export my view to a file using the Tasks > Export Data... feature of SSMS, the decimals below one appear as ,5 and ,35 (the leading zero is dropped).
How can I get the same output as in the results window?
Change your query to this:
SELECT CAST( CONVERT(decimal(8,2), [ps_index]) AS VARCHAR( 20 ) ) AS PriceSensitivityIndex
Not sure why, but bcp is dropping the leading zero. My guess is that it's a side effect of the transition from SQL Server storage to a text file, similar to how empty strings and NULLs get exchanged on BCP in or out, or that there is some deeper configuration (Windows, SQL Server?) where a SQL Server setting differs from an OS setting. Not sure yet. But since you are going to text/character data anyway when you BCP to a text file, it's safe (and likely better in most cases) to first cast/convert your data to a character data type.
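A quick way to see the difference is to run both expressions side by side; the varchar version is the one whose text survives the trip to the file:
-- illustration only: the plain decimal keeps its leading zero in the SSMS grid,
-- and the varchar cast carries that same text through bcp / Export Data
SELECT CONVERT(decimal(8,2), 0.5)                      AS raw_decimal,
       CAST(CONVERT(decimal(8,2), 0.5) AS varchar(20)) AS as_text;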
I am using the Import and Export Wizard and imported a large CSV file. I get the following error:
Error 0xc02020a1: Data Flow Task 1: Data conversion failed. The data
conversion for column "firms" returned status value 2 and status text "The
value could not be converted because of a potential loss of data.".
(SQL Server Import and Export Wizard)
Upon importing, I use the Advanced tab and make all of the adjustments. For the field in question, I set it to numeric(8,0). I have since gone through this process multiple times and tried precisions of 7, 8, 9, 10, and 11 to no avail. I imported the CSV into Excel and looked at the column in question, firms; it shows no entry with more than 5 characters. I thought about making it DT_STR (string), but I will eventually need to manipulate that column by averaging it. I have also searched for spaces and strange characters and found none.
Any other ideas?
1) Try changing the numeric precision to numeric(30,20) in both the source and the destination table.
2) Change the data type to DT_STR/DT_WSTR and adjust the output column width while importing; it will run fine. The same thing happened to me while loading a large CSV file of approximately 5 GB. After the load, use the TRY_CONVERT function to convert the column back to numeric and check which values came back NULL during the conversion; that will show you the root cause (see the sketch after this list).
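A minimal sketch of that check, with a hypothetical staging table name (only the firms column comes from the question):
-- load the column as a string first, then let TRY_CONVERT flag the rows that
-- cannot become numeric(8,0); those are the values behind the wizard error
SELECT firms
FROM dbo.staging_import
WHERE firms IS NOT NULL
  AND TRY_CONVERT(numeric(8,0), firms) IS NULL;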
I am using SLT to load tables into our HANA DB. SLT uses the ABAP dictionary and sends timestamps as DECIMAL(15,0) to the HANA DB. Once they are in HANA, I am trying to convert the decimals to timestamps or seconddates via a calculated column in a calculation view. The timestamp fields in question are columns 27-30 of the table.
I run a small SLT transformation to populate columns 27-30; the ABAP layer in SLT fills them based on the database transactions.
The problem comes when I try to convert columns 28-30 to timestamps or seconddates, using syntax like this:
SELECT to_timestamp(DELETE_TIME) FROM SLT_REP.AUSP;
SELECT to_seconddate(DELETE_TIME) FROM SLT_REP.AUSP;
I get conversion errors when I do this; the confusing part is that it works some of the time as well. The same conversion in the calculated column of the calculation view fails with a similar error.
Has anyone found a good way to convert ABAP timestamps (Decimal (15,0)) to Timestamp or Seconddate in HANA?
There are conversion functions available that you can use here (unfortunately they are not very well documented). For example:
select tstmp_to_seconddate(TO_DECIMAL(20110518082403, 15, 0)) from dummy;
TSTMP_TO_SECONDDATE(TO_DECIMAL(20110518082403,15,0))
2011-05-18 08:24:03.0
The problem was with the ABAP data type. I was declaring the target variable as DEC(15,0). The ABAP code extracting the data was, in some instances, rounding the timestamp up to the 60th second. Once in the target HANA system, to_timestamp(target_field) would fail whenever the value looked like "20150101121060", with the last two digits being the 60th second, which is not a valid time. The base HANA layer did not care, since it was merely putting a length-14 value into a field. I changed the source variable to DEC(21,0); this eliminated the ABAP rounding and fixed my problem.
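A quick way to find the affected rows before (or after) changing the ABAP side is to look for the impossible seconds value directly; a sketch against the table from the question, assuming the raw DECIMAL(15,0) column is still queryable:
-- rows whose 14-digit timestamp text ends in '60' are the ones produced by the
-- rounding described above and will fail to_timestamp / to_seconddate
SELECT DELETE_TIME
FROM SLT_REP.AUSP
WHERE RIGHT(TO_VARCHAR(DELETE_TIME), 2) = '60';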
First of all, sorry about my English...
I would like to know a better way to load and handle a big TXT file (around 32 GB, an 83,000,000 x 66 matrix). I have already experimented with TEXTSCAN, IMPORT (out of memory), fgets, fgetl, and so on. Except for the import approach, all of these methods work but take far too much time (much more than a week).
I intend to use this data set to run my sampling process and, after that, to train a neural network to learn the behaviour.
Does anyone know how to import this type of data faster? I am thinking of dumping the data in another format (instead of TXT), for example into SQL Server, and then working with it by querying the database.
Another question: after loading all the data, can I save it in .MAT format and work with that format in my experiments? Or is there a better idea?
Thanks in advance.
It's impossible to hold such a big matrix (5,478,000,000 values) in your workspace/memory (unless you've got tons of RAM), so the file format (.mat or .csv) doesn't matter!
You definitely have to use a database (or split the file into several smaller ones and calculate step by step, which takes very long too).
Personally, I only have experience with sqlite3, and I did something similar with a 1.47 million x 23 matrix/CSV file.
http://git.osuv.de/markus/sqlite-demo (remember that my csv2sqlite.m was only designed to run with GNU Octave; it took around 19k seconds overnight... well, it was badly scripted too :) ).
After everything was imported into the sqlite3 database, I can access just the data I need within 8-12 seconds (take a look at the comment header of leistung.m; see the query sketch after the sqlite3 example below).
If your CSV file is well-formed, you can simply import it with sqlite3 itself.
For example:
┌─[markus#x121e]─[/tmp]
└──╼ cat file.csv
0.9736834199195674,0.7239387515366997,0.3382008456696883
0.6963824911102146,0.8328410999877027,0.5863203843393815
0.2291736458336333,0.1427739134201017,0.8062332551565472
┌─[markus#x121e]─[/tmp]
└──╼ sqlite3 csv.db
SQLite version 3.8.4.3 2014-04-03 16:53:12
Enter ".help" for usage hints.
sqlite> CREATE TABLE csvtest (col1 TEXT NOT NULL, col2 TEXT NOT NULL, col3 TEXT NOT NULL);
sqlite> .separator ","
sqlite> .import file.csv csvtest
sqlite> select * from csvtest;
0.9736834199195674,0.7239387515366997,0.3382008456696883
0.6963824911102146,0.8328410999877027,0.5863203843393815
0.2291736458336333,0.1427739134201017,0.8062332551565472
sqlite> select col1 from csvtest;
0.9736834199195674
0.6963824911102146
0.2291736458336333
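Once the data is in SQLite like this, the slice-wise access mentioned above comes down to queries of roughly this shape (csvtest is the table from the example; the chunk size is arbitrary):
-- pull one block of rows per step instead of holding the whole matrix in memory;
-- advance the OFFSET (or, better, filter on a rowid range) on each iteration
SELECT col1, col2, col3
FROM csvtest
LIMIT 100000 OFFSET 0;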
All of this is done with https://github.com/markuman/go-sqlite (MATLAB and Octave compatible, though I guess no one but me has ever used it!).
However, I recommend the version 2 beta in branch 2 (git checkout -b 2 origin/2) running in coop mode (you'll hit sqlite3's maximum string length in ego mode). There is HTML documentation for version 2 too: http://go-sqlite.osuv.de/doc/