PutDatabaseRecord failed with Index header CSVRecord - postgresql

We are trying to insert data into a PostgreSQL database.
We use the PutDatabaseRecord processor with the following configuration:
But we get a warning and the records are not inserted into the database.
Is this an Apache Commons CSV related issue?
How can I solve this issue?
Edit:
After @matt's initial answer, I found an interesting thing in the data: the address field contains:
"No 60, Marine Drive,"
The CSVReader in PutDatabaseRecord uses , as the value separator, so the address is presumably being read as three different column values.

The error seems to indicate you have more columns in the header than in (some lines of) data. If that's not the case, I suspect there's either a bug when handling empty columns, or Infer Schema doesn't work as expected with an empty column in the first row (how would it be able to guess the type of "nothing"?).
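For illustration (a sketch; the id and name columns are made up), this is the difference quoting makes when the value separator is , and the reader treats " as the quote character:
id,name,address
1,Alice,No 60, Marine Drive,
1,Alice,"No 60, Marine Drive,"
The first data row is parsed as five values against a three-column header, which is exactly the kind of mismatch the error describes; the second is parsed as three, with the whole address kept in one column.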


How to fix an upsert script of a PutDatabaseRecord processor?

I'm working on an ETL that extracts data from Progress and stores it into PostgreSQL.
However, when I have duplicate keys, I'm having problems with my upsert.
The problem is that the component is creating an invalid script. As you can see, it is missing the double quotes. It needs to be: ON CONFLICT ("cdConstrucao")
My settings are:
Displayed error:
Sample table:
Is there a way to fix it?
Kind regards
Juliano
I am not sure if you are still facing this issue.
Try the following two settings:
Statement Type: INSERT_IGNORE
Translate Field Names: false
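For reference, a hand-written sketch of the statement the processor needs to generate (the table name construcao and the descricao column are made up; only "cdConstrucao" comes from the question):
INSERT INTO construcao ("cdConstrucao", descricao)
VALUES (1, 'example')
ON CONFLICT ("cdConstrucao")
DO UPDATE SET descricao = EXCLUDED.descricao;
Without the double quotes PostgreSQL folds the identifier to lowercase cdconstrucao, which no longer matches the mixed-case column, so the generated script fails.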

What could be causing this 'invalid host' error on kdb query?

I get an odd error when trying to query too many dates from a date-partitioned historical database:
q)eod: h"select from eod where date within 2018.01.01 2018.04.22"
'/tablepath/2018.04.04/eod/somecolumn: invalid host
q)eod: h"select from eod where date within 2018.01.17 2018.04.20"
'/tablepath/2018.04.20/eod/othercolumn: invalid host
q)eod: h"select from eod where date within 2018.01.18 2018.04.20"
q)
Note that both dates mentioned in the error messages are within the date range that we manage to extract in the end, and that it fails on a different column each time. This seems to indicate it's something to do with the size of the table being pulled, but when we check the size of the largest table we managed to get:
q)(-22!eod) % 1024 * 1024
646.9043
q)count eod
2872546
we find that it's not particularly large by either memory size or number of rows.
Googling for "invalid host" errors doesn't seem to turn up anything relevant, and I'm not seeing anything in the kdb docs about size limits that would be relevant. Anyone got any ideas?
Edit:
When loading the table in a session and making the queries directly, we get what appears to be the same error, but with a different message. For instance:
q)jj: select from eod where date within 2018.01.01 2018.04.22
Too many compressed files open
k){0!(?).#[x;0;p1[;y;z]]}
'./2018.04.04/eod/settlecab: No such file or directory
.
?
(+`exch`date`class..
q.Q))
Note that the file ./2018.04.04/eod/settlecab does in fact exist and contains data.
I have no problem loading the data for just the date mentioned in the error, and the column mentioned has meaningful values:
q)jj: select from eod where date=2018.04.04
q)select count i by settlecab from jj
settlecab| x
---------| -----
0        | 41573
1        | 2269
The key point seems to be the Too many compressed files open message, but what can I do about this?
Edit for Summary/Solutions:
The table in question had many columns, all stored in a compressed format. When issuing a query against too many dates at once, kdb would try to mmap all of those columns at once, running into a limit on how many compressed files could be open at once.
Once I understood the problem, several solutions were available (the first two are sketched in q after this list):
I could pull only certain columns from the database, reducing the number of files that kdb needed to keep open,
I could force kdb to pull all the data into memory by adding a dummy where clause to the query, such as (null column) | not null column (hacky, but it works),
I could upgrade the kdb version and lift OS limits (not practical in my case).
I still have no idea why this resulted in an invalid host error when querying the database remotely.
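For example, the first two workarounds look roughly like this (settlecab and exch are just illustrative column names taken from the output above):
q)eod: h"select date,exch,settlecab from eod where date within 2018.01.01 2018.04.22"
q)eod: h"select from eod where date within 2018.01.01 2018.04.22, (null settlecab) | not null settlecab"
The first keeps down the number of column files kdb has to map at once; the second is the dummy where clause that forces the data to be pulled into memory.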
First off, can we just clarify the database structure you're working with? It seems from the filepaths returned in your errors that you've got a date-partitioned database. Did you mean non-segmented database when you said non-partitioned in your original query?
In terms of a fix for your issue, have you tried loading your database into a session and making those queries directly? If so, do you get the same issues?
If that seems to be working alright, the problem might lie with how you're defining your database handle. How is h defined in your original example?
It might also be worth trying to select individual dates from your database, to try and isolate the problem, and to determine if it lies with your on-disk data. Try specifically querying the dates that are mentioned in your errors.
You could also try performing your original queries with a subset of columns, again to try and pinpoint where your issue is coming from.
Let us know if you get any further with this.
Joseph

Talend Data Integration: Avoid nulls coming out of tExtractXMLField?

I have this simple flow in Talend DI 6 (simplified for posting on SO):
The last step crashes with a NullPointerException, because missing XML attributes are returned as null.
Is there a way to get empty string values instead of nulls?
For now I'm using a tReplace step to remove nulls as a work-around, but it's tedious and adds to the cost of maintenance by creating one more place where the list of attributes needs to be maintained.
In Talend DI 5.6.2 it is possible to add default data values to the schema. The column in the schema is called "Default". If you expect strings, you can set an empty string, which is set if the column value is null:
Talend schema view with Default column
This also works for other data types. Talend DI 6 should still be able to do this, although the field might be renamed.
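If the Default column is not available, a tMap or tJavaRow expression can do the same substitution; a sketch, assuming a String column named attr coming in on row1:
row1.attr == null ? "" : row1.attr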

Talend tMap Set Default Value for Rejected Inner Joins and connect them with the main data flow

I've got the following problem.
I have several tMaps, each has a lookup, and at the end all the data is written to a database. The following mockup shall illustrate it:
There can be values in the main data stream which are not found in the lookup tables. For these values there is a reject path which catches them from the specific tMap.
Requirements:
In case of a rejected inner join the looked-up value shall be set to a default value (for example 0, which could be done in the schema of the tMap), and after that these "corrected" records should be added back to the "normal" main data flow and go through the next lookup.
The tUnite component is not able to handle these cases because it cannot exist in a data flow loop.
Does anybody got an idea how to solve this problem?
Cheers.
The answer was so easy that I didn't see it at first. I just have to change the join model from inner join to left join, so all the formerly rejected values will have a null value in them. Afterwards I can check the columns in the tMap and set them to a default value if they are null:
row1.id == null ? 0 : row1.id
Cheers.
If I understand correctly what you are trying to accomplish, you will have to have staging files or staging tables in the database. Once you get the rejected rows, write them to a file or table. The accepted rows will also go to a staging table (different from the rejected one). Then you can union both tables or files by reading them. The key point is having a staging structure. I attach a picture of how it would look. In the picture the staging structure is a MySQL table.
Let me know if it helps!
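To sketch the union step on the database side (the staging table names are made up for illustration), reading both staging tables back into a single flow could be done with a query like:
SELECT * FROM staging_accepted
UNION ALL
SELECT * FROM staging_rejected;
The combined result set can then feed the next lookup in the job.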

'xx' property on 'yyy' could not be set to a 'String' value. You must set this property to a non-null value of type 'Int32'

I am facing this problem for an unknown reason. I have tried every forum and blog but could not find a satisfactory answer.
Let me describe the scenario.
I have a view in the database which consists of columns from two tables. Neither table has any column with data type "int", hence the resulting view (let's name it "MyRecord") also does not have any column with an "int" data type. All the columns in the view have varchar as their data type.
Now, in my .edmx I add this view and the model (named "MyRecord") is created fine, with all the properties created with data type "String". I am using Silverlight with RIA Services, so after building the application the related proxies are also created fine without any conflict.
The problem starts when I try to query "MyRecord" using my domain context; I get the following error.
Load operation failed for query 'GetMyRecords'. The 'CenterCode' property on 'MyRecord' could not be set to a 'String' value. You must set this property to a non-null value of type 'Int32'.
As seen in the error, it is clearly forcing me to convert the data type of the "string" column "CenterCode" to "Int32", which is totally useless and unnecessary for me. The "String" or "varchar" columns are there because they have business importance, and changing them to "Int32" or "int" might break the application in the future. It's true that the "CenterCode" column currently contains only numeric data, but there may be character data in the future; that's why it was created with the 'varchar' data type.
I cannot change the type of my data just because EF does not support it.
I used SQL Server Profiler; the query is executed correctly and I can run the same query in SSMS without any error. The error occurs in the application only when EF builds objects from the data returned by the query.
I fail to understand why Entity Framework is throwing this error; it is simply not converting "varchar" to "String" and is unnecessarily bringing "Int32" into the picture, making life difficult. I have been struggling with this issue for the last 4 hours and have tried every possible way to resolve it, but everything is in vain.
Please provide some information or a solution if anyone has one.
EF team, you must have some answer to this question or a workaround for this problem.
I had the same problem with a double datatype.
Solution:
Change your view/procedure and cast the column to the type the model expects, e.g. CAST(columnname AS int).
Not sure if you solved this problem or not, but I just ran into something like this while working with multiple result sets in EF. In my case, I had a reader.NextResult() that was causing a problem because I hadn't read all the records from the previous result, and I think EF was failing because it was trying to map data from the second result set into the first object.
CAST(columnName AS Type) solved my problem in a stored procedure.
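As a sketch of that approach (the base table dbo.Centers is made up; MyRecord and CenterCode come from the question), the view casts the varchar column so its type matches the Int32 property EF expects:
CREATE VIEW dbo.MyRecord AS
SELECT CAST(c.CenterCode AS int) AS CenterCode
FROM dbo.Centers AS c;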