Insert multiple records into fact table based on fields in single record - PostgreSQL

I'm working in Pentaho 4.4.1-GA (Kettle / PDI). The database is Postgres.
I need to be able to insert multiple records into a fact table based on the fields that come from a single record. The single record contains fields:
productcode1, price1
productcode2, price2
productcode3, price3
...
productcode10, price10
So if there was a value for each of the 10 productcode / prices then I'd need to insert a total of 10 records into the fact table. If there were values for 4 of the combinations, then I'd need to insert 4 records into the fact table, etcetera. All field values for the fact records would be identical except for the PK (generated by sequence), product codes, and prices.
I figure that I need some type of looping construct which would let me check whether or not a value is present for each productcodeN field and, if so, do an insert/update step on the fact table with the desired field values. I'm just not sure how to do this in Pentaho.
Any ideas? All suggestions are welcome :)
Thank You,
Rakesh

Could you give a sample input and output for your scenario?
From your example data I can infer that if there are 10 different product codes and only 4 product prices you want to have 4 records inserted into your table. Is that so?
Well, for a start, you can add a constant value of 1 to those records by filtering for NOT NULL and then use a Group By step to count the number of 1s; that would give you the count. By the way, it would be helpful if you could provide more details on which columns you would be loading, as there are ways to make a PDI transformation execute multiple times.
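If it helps, here is a minimal sketch of the same idea pushed down to PostgreSQL (the staging and fact table names, the sequence name, and the column list are all assumptions); it explodes one wide staging row into up to 10 fact rows and skips the empty slots. Note that LATERAL needs PostgreSQL 9.3 or later. Inside PDI itself, the Row Normaliser step can do the equivalent column-to-row flattening before an ordinary Table output step.

-- Hedged sketch: fact_sales, staging, and fact_sales_id_seq are assumed names
INSERT INTO fact_sales (fact_id, product_code, price)
SELECT nextval('fact_sales_id_seq'), t.product_code, t.price
FROM staging s
CROSS JOIN LATERAL (
    VALUES (s.productcode1,  s.price1),
           (s.productcode2,  s.price2),
           (s.productcode3,  s.price3),
           (s.productcode4,  s.price4),
           (s.productcode5,  s.price5),
           (s.productcode6,  s.price6),
           (s.productcode7,  s.price7),
           (s.productcode8,  s.price8),
           (s.productcode9,  s.price9),
           (s.productcode10, s.price10)
) AS t(product_code, price)
WHERE t.product_code IS NOT NULL;  -- keep only the productcode/price slots that hold a value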

Related

Isolating same data in Alteryx workflow

I have a large set of data; however, the tables below show only 4 entries. I want to isolate the rows that have the same entries. For instance, in the first table you can see that the first two rows have the same values in the Number, ID, Brand, and Partner columns. I want to get only the rows with these matching entries, so my final result would be Data Table 2.
Data Table
Data Table 2
This is the quickest way I can think of: https://drive.google.com/file/d/1Ufbl6J-gwCi5OpqaGHmC93aHuA6g-gSi/view?usp=sharing
Actually, I've just realised the 'RecordID' is redundant so you can leave that tool out, I think?
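For comparison, here is the same filtering expressed in SQL rather than Alteryx (a hedged sketch; data_table and the lower-case column names are assumptions): keep only the rows whose Number/ID/Brand/Partner combination appears more than once.

SELECT number, id, brand, partner
FROM (
    SELECT d.*,
           COUNT(*) OVER (PARTITION BY number, id, brand, partner) AS cnt  -- size of each duplicate group
    FROM data_table d
) t
WHERE t.cnt > 1;  -- rows whose combination occurs at least twice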

DAX Query to get a specific range of rows

How can I create a DAX query that retrieves rows from a given range, in order? Let's say I want the rows from row 1000 to row 2000. There is no unique ID in my database. Should I add one, or is it possible without it?
If you can't construct a filter that creates the subset of rows you are targeting, then I would use a unique ID. I have not come across anything in DAX that lets you select rows by position in your PowerPivot data set, so if there isn't anything unique about the data you are targeting, then I imagine you would need a unique ID.
That is, I normally have column values I can filter on to target or create the subset of data I want to use.
I hope I am wrong and there is a way and look forward to someone posting a way.
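To make the unique-ID route concrete, here is a minimal DAX query sketch, assuming you add an index column (MyTable[RowID] here is an assumption, e.g. added in Power Query or at the source) that numbers the rows in the order you want:

EVALUATE
FILTER (
    MyTable,
    MyTable[RowID] >= 1000 && MyTable[RowID] <= 2000  -- keep only rows 1000 through 2000
)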

How do you insert part of an existing column record into a new table in PostgreSQL?

I'm in the process of formatting a database and have found a column I'd like to format. It has 3 types of information in every record. For example, a record in my history column shows as (American, born Estonia. 19011974). What I want to do is put this data into new individual columns to make them atomic: extract 'American' into a country column, 'Estonia' into a birthplace column, '1901' into a born column, and '1974' into a death column. UPDATE: However, some of the records hold nulls; for example, another record in the same column might be (German, 19242004), so a single rigid regular expression wouldn't work for all the data, would it? Any help is appreciated!
What PostgreSQL statements would I use to obtain specific parts of this data from the individual records? I understand it would be an INSERT, and I have already come up with:
INSERT INTO historian (id,url)
SELECT object_id, url FROM maintable;
That statement allowed me to get those values into new columns, but those were already atomic, so I could easily transition them. Thanks for any help! :)
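For the history column itself, something along these lines might work; it is only a sketch, the target column names are assumptions, and the patterns are guessed from your two examples. regexp_match() (PostgreSQL 10+) returns NULL when a pattern doesn't match, which handles records like (German, 19242004) that have no 'born ...' part.

INSERT INTO historian (id, url, country, birthplace, born, died)
SELECT
    object_id,
    url,
    (regexp_match(history, '\(([^,]+),'))[1],            -- 'American' / 'German'
    (regexp_match(history, 'born ([^.]+)\.'))[1],        -- 'Estonia', or NULL when absent
    (regexp_match(history, '(\d{4})\d{4}\)'))[1]::int,   -- birth year: 1901 / 1924
    (regexp_match(history, '\d{4}(\d{4})\)'))[1]::int    -- death year: 1974 / 2004
FROM maintable;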

Batch update table with 2 million rows

Hi all, I've got an interesting task: updating a single column in a table that has roughly 2 million rows. I've tried doing this using MVC Entity Framework, but I'm encountering "Out of memory" exceptions, and I'm wondering if there's another way.
The interesting part is that it's not just a simple update. The procedure needs to read the TelephoneNumber column already in the table, which could hold 014812001, for example. Then it needs to calculate a score for this number based on the digits with more than one occurrence. So, using the above number, this would score 6, as we have three 1s and three 0s, giving a total of 6.
Once this score has been calculated, it needs to be inserted into a column in the row currently being processed, so in our case the row with TelephoneNumber = 014812001.
Is this possible using T-SQL, or is it better to carry on with my Entity Framework approach?
For such a bulk update, I would always recommend doing this on the server itself - there's really no point in dragging down 2 million rows, updating a single column, and then pushing them all back to the server again.
I think, based on your description, it should be fairly simple to create a little T-SQL user-defined function that would calculate this score. Once you have that, you can issue a single T-SQL statement:
UPDATE dbo.YourTable
SET Score = dbo.fnCalculateScore(TelephoneNumber)
WHERE ... (whatever condition you might have)
That should be several orders of magnitude faster than your Entity Framework approach.
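The answer leaves the function body open, so here is one hedged way dbo.fnCalculateScore could be written, going purely by the scoring rule described in the question (sum the occurrence counts of every digit that appears more than once):

CREATE FUNCTION dbo.fnCalculateScore (@TelephoneNumber varchar(20))
RETURNS int
AS
BEGIN
    DECLARE @Score int;

    -- Break the number into single digits, count each digit's occurrences,
    -- then sum the counts of the digits that appear more than once.
    WITH Positions AS (
        SELECT TOP (LEN(@TelephoneNumber))
               ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
        FROM sys.all_objects
    ),
    Digits AS (
        SELECT SUBSTRING(@TelephoneNumber, n, 1) AS Digit
        FROM Positions
    )
    SELECT @Score = ISNULL(SUM(Cnt), 0)
    FROM (SELECT COUNT(*) AS Cnt
          FROM Digits
          GROUP BY Digit
          HAVING COUNT(*) > 1) AS Grouped;

    RETURN @Score;
END

For 014812001 this returns 6 (three 1s plus three 0s), matching the example in the question.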

SQL Server 2008: Pivot column with no aggregate function workaround

Yes, I know, this question has been asked MANY times, but after reading all the posts I found that there wasn't an answer that fits my need. So, here's my question: I would like to take a column of values and pivot them into rows of 6 columns.
I want to take this:

G
081278
12
00123535
John Doe
123456

And turn it into this:

Letter  Date    Code  Ammount   Name      Account
G       081278  12    00123535  John Doe  123456
I have 110,000 values in this one column in one table called TempTable. I need all the values displayed, because each row is an entity of its own. For instance, there is one unique entry for each of the Letter, Date, Code, Ammount, Name, and Account columns. I understand that an aggregate function is required, but is there a workaround that will let me get this desired result?
Just use a MAX aggregate
If one row = one column (per group of 6 rows) then MAX of a single value = that row value.
However, the data you've posted is insufficient. I don't see anything to:
associate the 6 rows per group
distinguish whether a row is "Letter" or "Name"
There is no implicit row order or number to rely upon to generate the groups.
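That said, if you can supply those two missing pieces, the MAX trick looks like this (GroupID, ColName, and Value are assumed columns, since they are not in the posted data; GroupID ties each set of 6 rows together and ColName labels which of the 6 fields a row holds):

SELECT GroupID,
       MAX(CASE WHEN ColName = 'Letter'  THEN Value END) AS Letter,
       MAX(CASE WHEN ColName = 'Date'    THEN Value END) AS [Date],
       MAX(CASE WHEN ColName = 'Code'    THEN Value END) AS Code,
       MAX(CASE WHEN ColName = 'Ammount' THEN Value END) AS Ammount,
       MAX(CASE WHEN ColName = 'Name'    THEN Value END) AS [Name],
       MAX(CASE WHEN ColName = 'Account' THEN Value END) AS Account
FROM TempTable
GROUP BY GroupID;  -- one output row per group of 6 input rows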
Unfortunately, the maximum number of columns in a SQL Server 2008 SELECT statement is 4,096, as per the MSDN maximum capacity specifications.
Instead of using a pivot, you might consider dynamic SQL to get what you want to do.
DECLARE @SQLColumns nvarchar(max), @SQL nvarchar(max)

-- Build a comma-separated list of quoted values, one per row in the column
SELECT @SQLColumns = (SELECT '''' + ColName + ''',' FROM TableName FOR XML PATH(''))

-- Strip the trailing comma
SET @SQLColumns = LEFT(@SQLColumns, LEN(@SQLColumns) - 1)

SET @SQL = 'SELECT ' + @SQLColumns
EXEC sp_executesql @SQL