SSIS, load file into table with '' (nothing) to get NULL - tsql

I have a file with 150 columns, and most of the values are nothing, represented as two consecutive delimiters (100,,,,,,200). In this case the delimiter is a comma, so Column0=100, Column1='', Column2='', etc...
What is the fastest (afraid to say mass) way to put this into the target table? The target table has custom DDL that I cannot change, and I have to load the values correctly, i.e. NULL for an empty date, not 1900-01-01.
I have a series of these files. I can go with SSIS and put isNothing ? NULL : Column1 for each of the 150 columns, but maybe there is a better way?
I tried to load one of these files into a NEW table and got an intermediate table with the same empty strings, so now I could also compose SQL with CASE expressions against that staging table.
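For the staging-table route, here is a minimal sketch of the kind of INSERT ... SELECT I have in mind, using NULLIF to turn empty strings into NULLs (the table and column names are just placeholders):
-- staging table holds everything as text, with '' for missing values;
-- NULLIF(x, '') returns NULL when x is an empty string
INSERT INTO dbo.TargetTable (Column0, Column1, SomeDate)
SELECT
    NULLIF(Column0, ''),
    NULLIF(Column1, ''),
    CAST(NULLIF(SomeDate, '') AS date)  -- NULL instead of 1900-01-01
FROM dbo.StagingTable;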
Thanks for your help and for sharing your knowledge.
M

Did you try checking the "Retain null values from the source as null values in the data flow" box in the Flat File Source?
Hope this helps,

Related

Only saving files without null values in NiFi

An absolute newbie here, trying out NiFi and PostgreSQL on Docker Compose.
I have a sample CSV file with 4 columns.
I want to split this CSV file into two
based on whether a row contains a null value or not.
Grade ,BreedNm ,Gender ,Price
C++ ,beef_cattle ,Female ,10094
C++ ,milk_cow ,Female ,null
null ,beef_cattle ,Male ,12704
B++ ,milk_cow ,Female ,16942
For example, the table above should be split into two tables, one containing rows 1 and 4 and the other rows 2 and 3,
and each of them saved into a PostgreSQL table.
Below is what I have tried so far. I was trying to:
1. Split the flowfile into two, keeping rows without null values on one side and rows with null values on the other.
2. Write each of them into a table, named 'valid' and 'invalid' respectively.
But I do not know how to split the CSV file and save the results as PostgreSQL tables through NiFi.
Can anyone help?
What you could do is use RouteOnContent with the "content must contain match" requirement, with the match being null. Anything that matches null would be routed that way, and anything not matching null would be routed a different way. Not sure if it's possible the way you're doing it, but that is one possibility. The match could be something like (.*?)null
I used the QueryRecord processor with two SQL statements, one selecting the rows with null values and the other the rows without, and it worked as intended!
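For anyone curious, a sketch of what those two QueryRecord properties could look like (the column names come from the sample above; this assumes the record reader turns the literal text null into an actual null, otherwise compare against the string 'null'):
-- property "valid": rows with no nulls
SELECT * FROM FLOWFILE
WHERE Grade IS NOT NULL AND BreedNm IS NOT NULL
  AND Gender IS NOT NULL AND Price IS NOT NULL
-- property "invalid": rows with at least one null
SELECT * FROM FLOWFILE
WHERE Grade IS NULL OR BreedNm IS NULL
  OR Gender IS NULL OR Price IS NULL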

Getting Redshift error 1214 during copy

I have the following table in redshift:
Column | Type
-------+--------------
id     | integer
value  | varchar(255)
I'm trying to copy in (using Data Pipeline's RedshiftCopyActivity), and the data has the line 1,maybe as the entry being added, but I get back the error 1214: Delimiter not found, and the raw_field_data value is maybe. Is there something I'm missing in the copy parameters?
The entire CSV is three lines:
1,maybe
2,no
3,yes
You may want to take a look at the similar question Redshift COPY command delimiter not found.
Make sure your RedshiftCopyActivity configuration includes FORMAT AS CSV from https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv.
Be sure your input data has your configured delimiter between every field, even in the case of nulls.
Be sure you do not have any trailing blank lines.
You can run the following SQL (from the linked question) to see more specific details of what row is causing the problem.
SELECT le.starttime,
d.query,
d.line_number,
d.colname,
d.value,
le.raw_line,
le.err_reason
FROM stl_loaderror_detail d
JOIN stl_load_errors le
ON d.query = le.query
ORDER BY le.starttime DESC;
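For reference, a plain COPY with CSV formatting looks something like this (the table, bucket, and role here are placeholders); with RedshiftCopyActivity the equivalent option goes into the activity's command options:
COPY my_table (id, value)
FROM 's3://my-bucket/my-data.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV;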

Postgresql - selecting observations and putting in new table

Sorry if this has already been asked. I couldn't see it in previously asked questions.
I have a table - 'eightks'.
This table contains 1,000,000 text documents.
I only need those that mention the phrase 'other events', so I am trying to do some text matching and then output the matching rows into a new table.
My current code is;
SELECT * FROM eightks
WHERE to_tsvector(text) @@ to_tsquery('other_events');
When I run this I get the following error
string is too long for tsvector (2368732 bytes, max 1048575 bytes)
Also How do I output the matching rows into a new table?
Any help is appreciated.
That's a documented limitation.
The length of a tsvector (lexemes + positions) must be less than 1 megabyte
It might be possible to change the source code and recompile. See ts_type.h. I suspect it won't be simple, though.
You might need to break the documents up into smaller pieces for searching, then combine the pieces for presentation to the user.
As for inserting the matching rows into another table, you can just use INSERT with a SELECT statement. Basically . . .
insert into table_name
select ...
You might need to supply column names.
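Applied to your case, that would look something like this (the new table name and its column list are assumptions):
-- create other_events_docs first with a matching column,
-- or use CREATE TABLE ... AS SELECT instead
insert into other_events_docs (text)
select text
from eightks
where to_tsvector(text) @@ to_tsquery('other_events');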

Redshift - Adding a column, do we have to change our previous CSVs to include it?

I currently have a redshift table in our database that has 10 columns, and I want to add another. It's trivial to do an alter table to do this.
My question - When I do this, will all my old CSV files fail to insert into redshift (via COPY from S3) given they won't have this new column?
I was hoping the columns would just be NULL vs. it failing on import, but I haven't seen any documentation on this.
Ideally I'd like to be able to specify the actual column names in a header row of the CSV, but I haven't seen anywhere whether that is possible.
FILLRECORD in the COPY command does that: 'Allows data files to be loaded when contiguous columns are missing at the end of some of the records'.
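A sketch of what that looks like (table, bucket, and role names are placeholders); the missing trailing columns are filled in as NULLs or empty strings, depending on the data type:
COPY my_table
FROM 's3://my-bucket/old-data.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
FORMAT AS CSV
FILLRECORD;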

How to import file into sqlite?

On a Mac, I have a txt file with two columns, the first corresponding to an autoincrement column in an SQLite table:
, "mytext1"
, "mytext2"
, "mytext3"
When I try to import this file, I get a datatype mismatch error:
.separator ","
.import mytextfile.txt mytable
How should the txt file be structured so that it uses the autoincrement?
Also, how do I enter in text that will have line breaks? For example:
"this is a description of the code below.
The text might have some line breaks and indents. Here's
the related code sample:
foreach (int i = 0; i < 5; i++){
//do some stuff here
}
this is a little more follow up text."
I need the above inserted into one row. Is there anything special I need to do to the formatting?
For one particular table, I want each of my rows as a file and import them that way. I'm guessing it is a matter of creating some sort of batch file that runs multiple imports.
Edit
That's exactly the syntax I posted, minus a tab since I'm using a comma. The missing line break in my post didn't make it as apparent. Anyways, that gives the mismatch error.
I was looking at the same problem. Looks like I've found an answer to the first part of your question, about importing a file into a table with an ID field.
So yes, create a temporary table without ID, import your file into it, then do insert..select to copy its data into your target table. (Remove leading commas from mytextfile.txt).
-- assuming your table is called Strings and
-- was created like this:
-- create table Strings( ID integer primary key, Code text )
create table StringsImport( Code text );
.import mytextfile.txt StringsImport
insert into Strings ( Code ) select * from StringsImport;
drop table StringsImport;
I don't know what to do with newlines. I've read some mentions that importing in CSV mode (.mode csv) will do the trick, but when I tried it, it did not seem to work.
In case anyone is still having issues with this, you can download an SQLite manager.
There are several that allow importing from a CSV file.
Here is one but a google search should reveal a few: http://sqlitemanager.en.softonic.com/
I'm in the process of moving data containing long text fields with various punctuation marks (they are actually articles on coding) into SQLite and I've been experimenting with various text imports.
I created a database in SQLite with a table:
CREATE TABLE test (id INTEGER PRIMARY KEY AUTOINCREMENT, textfield TEXT);
then do a backup with .dump.
I then add the text below the "CREATE TABLE" line manually in the resulting .dump file as such:
INSERT INTO test (id, textfield) VALUES (1, 'Isn''t it great to have
really long text with various punctuation marks and
newlines');
Change any single quotes to two single quotes (change ' to ''). Note that the index number needs to be added manually (I'm sure there is an AWK/SED command to do it automatically). Change the autoincrement number in the "sequence" line of the dump file (the sqlite_sequence entry) to one above the last index number you added (I don't have SQLite in front of me to give you the exact line, but it should be obvious).
With the new file, I can then do a restore onto the database.
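A rough sketch of that round trip using the sqlite3 shell (file names are just examples; .open needs a reasonably recent sqlite3):
.open original.db
.output dump.sql
.dump
.output stdout
-- edit dump.sql by hand here: add the INSERT statements, double any single
-- quotes, and bump the number in the sqlite_sequence line
.open new.db
.read dump.sql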