Filling additional columns of an internal table with additional data by a SELECT statement: can this be done?

SELECT matnr ersda ernam laeda
FROM mara
INTO CORRESPONDING FIELDS OF TABLE gt_mara
UP TO 100 ROWS.
At this point I have 100 entries in the itab gt_mara.
SELECT aenam vpsta pstat lvorm mtart
FROM mara
INTO CORRESPONDING FIELDS OF TABLE gt_mara
FOR ALL ENTRIES IN gt_mara
WHERE matnr = gt_mara-matnr AND
ersda = gt_mara-ersda AND
ernam = gt_mara-ernam AND
laeda = gt_mara-laeda.
At this point I have 59 entries, which makes sense: this code is buggy, because it modifies its own selection criteria (gt_mara) at run time.
Anyway, what I intended was this: select the first 4 fields of the table at one point, and then select the other 5 at another.
Of course, this is just an example. Perhaps the second select would be done on a different table with the same key or with a different number of fields.
So can this even be done?
Are there more efficient methods to achieve this than what comes to my mind by default (redoing the complete select)?

OK, I think the essence of your question is more about whether you can update certain unfilled fields in an internal table directly through a second select statement.
The answer is no. Your second select statement would replace the contents in table gt_mara, so you would be left with an internal table where the first 4 fields are blank, and the last 5 are filled.
The best you could do is something like this:
SELECT matnr ersda ernam laeda
FROM mara
INTO CORRESPONDING FIELDS OF TABLE gt_mara
UP TO 100 ROWS.
SELECT matnr aenam vpsta pstat lvorm mtart
FROM mara
INTO CORRESPONDING FIELDS OF TABLE gt_mara2
FOR ALL ENTRIES IN gt_mara
WHERE matnr = gt_mara-matnr AND
ersda = gt_mara-ersda AND
ernam = gt_mara-ernam AND
laeda = gt_mara-laeda.
DATA ls_mara LIKE LINE OF gt_mara2.

LOOP AT gt_mara2 INTO ls_mara.
  MODIFY gt_mara FROM ls_mara TRANSPORTING aenam vpsta pstat lvorm mtart
    WHERE matnr = ls_mara-matnr.
ENDLOOP.
This is obviously quite inefficient, which is why you would always try to make the database do as much of the work as possible before you bring the data back to the application server. If the data is coming from the same table, selecting it all in one go is going to be your best option. In most cases, even if the data is in different tables, you would be better off creating a view or using a join.
In rare cases it is necessary to loop at your internal table to fill in some fields that were not available to you when you did the original select.
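For the join route, a sketch of what that could look like (purely illustrative: it assumes the extra field maktx comes from the texts table makt and that gt_mara has a matching column):
" one SELECT fills all columns in a single round trip to the database
SELECT m~matnr m~ersda m~ernam m~laeda t~maktx
  FROM mara AS m
  INNER JOIN makt AS t
    ON t~matnr = m~matnr
  INTO CORRESPONDING FIELDS OF TABLE gt_mara
  UP TO 100 ROWS
  WHERE t~spras = sy-langu.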

Either SELECT everything you need right away (which is the preferred solution if the data comes from the same table) or SELECT the additional stuff later (which is a good idea if the stuff comes from a different table that is not used for the first selection). For assembling the result set, the database usually needs to access the entire dataset anyway, so it doesn't really hurt to select some additional fields - in contrast to hitting the database again with a massive SELECT statement (if the FOR ALL ENTRIES table gets large). Also bear in mind that - depending on the kind of processing you're doing - the contents of the table might have changed in the meantime. If the database transaction (LUW) ends (which is always the case between dialog steps), you lose the database-level transaction isolation.

Related

Power Query - Appending two tables but the other table might be empty depending on the situation - throws an error in that case

I am working on a solution that involves merging two queries in Power Query to retrieve a single data table back to Excel. The first query is always populated but the other query comes from an ERP and might be empty (empty table) from time to time.
Appending the two queries involves making the header names the same in the two queries before the appending takes place. As the second query sometimes results in an empty table, the error arises in the steps when Power Query is modifying the header names in the second table (it cannot modify the header names as there are no headers).
"Error message: Expression.Error: The column 'PartMtl_Company' of the table wasn't found.
Details: PartMtl_Company" where the PartMtl_Company is the leftmost column in my table.
I am kind of thinking that I would need to evaluate whether the second table is empty and skip the renaming steps if that is the case. I assume merging the populated first table with an empty table would cause no problem and would only result in the first table. I have tried to look around for suitable M code but have not come across any.
I'm thinking you might be able to use Table.RowCount to solve this. Something along the lines of:
= if Table.RowCount(Table2) > 0 then...
You would modify the headers only if there is data in the second table. Same goes for the appending of the tables: you would only append if there is data in the second table, since you won't have renamed any headers otherwise.
Thank you Marc! That did the trick.
In the end, I wrote something along the lines of
= if Table.RowCount(Table2) > 0 then... (code that works on a non-empty table) ...else Table2
, which returns the empty table if it is empty to begin with. Appending the second table into the first table did not throw an error but returned only the first table like planned.
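For reference, a minimal M sketch of that guard (the query names Table1/Table2 and the rename list are placeholders, not the actual workbook's):
let
    // Rename headers only when the ERP query returned rows;
    // otherwise pass the empty table through untouched.
    Renamed = if Table.RowCount(Table2) > 0
        then Table.RenameColumns(Table2, {{"PartMtl_Company", "Company"}})
        else Table2,
    // Appending an empty table is harmless: the result is just Table1.
    Result = Table.Combine({Table1, Renamed})
in
    Result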

Why does `FOR ALL ENTRIES` lower performance of CDS view on DB6?

I'm reading data from an SAP Core Data Service (CDS view, SAP R/3, ABAP 7.50) using a WHERE clause on its primary (and only) key column. There is a massive performance decrease when using FOR ALL ENTRIES (about a factor of 5):
Reading data using a normal WHERE clause takes about 10 seconds in my case:
SELECT DISTINCT *
FROM ZMY_CDS_VIEW
WHERE prim_key_col eq 'mykey'
INTO TABLE @DATA(lt_table1).
Reading data using FOR ALL ENTRIES with the same WHERE takes about 50 seconds in my case:
"" boilerplate code that creates a table with one entry holding the same key value as above
TYPES: BEGIN OF t_kv,
key_value TYPE ZMY_CDS_VIEW-prim_key_col,
END OF t_kv.
DATA lt_key_values TYPE TABLE OF t_kv.
DATA ls_key_value TYPE t_kv.
ls_key_value-key_value = 'mykey'.
APPEND ls_key_value TO lt_key_values.
SELECT *
FROM ZMY_CDS_VIEW
FOR ALL ENTRIES IN @lt_key_values
WHERE prim_key_col eq @lt_key_values-key_value
INTO TABLE @DATA(lt_table2).
I do not understand why the same selection takes five times as long when utilising FOR ALL ENTRIES. Since the table lt_key_values has only 1 entry, I'd expect the database (sy-dbsys is 'DB6' in my case) to do exactly the same operations, plus maybe some negligible overhead ≪ 40 s.
Selecting from the underlying SQL view instead of the CDS (with its Access Control and so on) makes no difference at all, and neither does adding or removing the DISTINCT keyword (because FOR ALL ENTRIES implies DISTINCT).
A colleague guessed that the FOR ALL ENTRIES is actually selecting the entire content of the CDS and comparing it with the internal table lt_key_values at runtime. This seems about right.
Using transaction ST05, I recorded an SQL trace that looks like the following in the FOR ALL ENTRIES case:
SELECT
DISTINCT "ZMY_UNDERLYING_SQL_VIEW".*
FROM
"ZMY_UNDERLYING_SQL_VIEW",
TABLE( SAPTOOLS.MEMORY_TABLE( CAST( ? AS BLOB( 2G )) ) CARDINALITY 1 ) AS "t_00" ( "C_0" VARCHAR(30) )
WHERE
"ZMY_UNDERLYING_SQL_VIEW"."MANDT" = ?
AND "ZMY_UNDERLYING_SQL_VIEW"."PRIM_KEY_COL" = "t_00"."C_0"
[...]
Variables
A0(IT,13) = ITAB[1x1(20)]
A1(CH,10) = 'mykey'
A2(CH,3) = '100'
So what actually happens is: ABAP selects the entire CDS content and puts the value from the internal table in something like an additional column. Then it only keeps those rows where the internal table entry and the SQL result entry match. ==> No optimization on database level => bad performance.
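A possible workaround (my sketch, not from the trace above): hand the keys to the database inside the WHERE clause itself, e.g. via a ranges table, so the optimizer can use the primary key index:
" build a ranges table from the key values and select with IN
DATA lt_range TYPE RANGE OF ZMY_CDS_VIEW-prim_key_col.
lt_range = VALUE #( FOR ls_kv IN lt_key_values
                    ( sign = 'I' option = 'EQ' low = ls_kv-key_value ) ).
SELECT *
  FROM ZMY_CDS_VIEW
  WHERE prim_key_col IN @lt_range
  INTO TABLE @DATA(lt_table3).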

Range table and IN not behaving properly

The program just selects everything if the carrid is OK, even if it does not match lt_spfli. And if there aren't any entries with that carrid, it gets a runtime error. If I try with FOR ALL ENTRIES, it just selects absolutely the entire SFLIGHT.
PARAMETERS: pa_airp TYPE S_FROMAIRP,
pa_carid TYPE S_CARR_ID.
DATA: lt_spfli TYPE RANGE OF SPFLI,
lt_sflight TYPE TABLE OF SFLIGHT.
SELECT CONNID FROM SPFLI
INTO TABLE lt_spfli
WHERE AIRPFROM = pa_airp.
SELECT * FROM SFLIGHT
INTO TABLE lt_sflight
WHERE CARRID = pa_carid AND CONNID in lt_spfli.
I just suppose that you want every flight connection from a given airport...
Notice that a RANGE structure has two more fields in front of the actual "compare value", so selecting directly into it will result in a gibberish table.
Possible Solutions:
Selecting with RANGE
If you really want to use this temporary table, you can have a look at my answer here where I describe the way to fill RANGEs without any overhead. After this step, your current snippet will work the way you wanted it to. Just make sure that it really has been filled, or everything will be selected.
Selecting with FOR ALL ENTRIES
Before you use this variant, you should make absolutely sure that your specified data object is filled. Otherwise it will result in the same mess as the first solution. To do that, you could write:
* select connid
IF lt_spfli[] IS NOT INITIAL.
* select on SFLIGHT
ELSE.
* no result
ENDIF.
Selecting with JOIN
The "correct" approach in this case would be a JOIN like:
SELECT t~*
  FROM spfli AS i
  JOIN sflight AS t
    ON t~carrid = @pa_carid
   AND t~connid = i~connid
  INTO TABLE @DATA(li_conns)
  WHERE i~airpfrom = @pa_airp.
Use FOR ALL ENTRIES instead of CONNID IN lt_spfli.
As so:
SELECT *
FROM sflight
FOR ALL ENTRIES IN lt_spfli
WHERE carrid = pa_carid
AND connid = lt_spfli-connid.
You are misunderstanding what a "Ranges Table" is. You fill it incorrectly.
This part of your code demonstrates the misunderstanding (with a little debugging, you would see the erroneous contents immediately):
DATA: lt_spfli TYPE RANGE OF SPFLI.
SELECT CONNID FROM SPFLI INTO TABLE lt_spfli ...
A "Ranges Table" is an internal table with 4 components (SIGN, OPTION, LOW, HIGH), used in Open SQL to do complex selections on one database column (NB: it can also be used in several ABAP statements to test the value of an ABAP variable).
So, with your SQL statement, you only initialize the first component of the Ranges table, while you should transfer CONNID into the third component.
In "modern" Open SQL, you'd better do:
SELECT 'I' as SIGN, 'EQ' as OPTION, CONNID as LOW FROM SPFLI INTO TABLE @lt_spfli ...
For more information about Ranges Tables, you may refer to the answer here: What actually high and low means in a ranges table
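On releases without this inline-literal SELECT, the same fill can be done with a short loop; a sketch (assuming the ranges table is re-declared for the single column CONNID):
DATA lt_connid TYPE RANGE OF spfli-connid.
DATA ls_range LIKE LINE OF lt_connid.
DATA lt_connids TYPE TABLE OF spfli-connid.

SELECT connid FROM spfli INTO TABLE lt_connids
  WHERE airpfrom = pa_airp.

" every row means 'include exactly this CONNID'
ls_range-sign   = 'I'.
ls_range-option = 'EQ'.
LOOP AT lt_connids INTO ls_range-low.
  APPEND ls_range TO lt_connid.
ENDLOOP.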

How to tell if record has changed in Postgres

I have a bit of an "upsert" type of question... but I want to throw it out there because it's a little bit different from any that I've read on stackoverflow.
Basic problem.
I'm working on moving from MySQL to PostgreSQL 9.1.5 (hosted on Heroku). As part of that, I need to import multiple CSV files every day. Some of the data is sales information and is almost guaranteed to be new and need to be inserted. But other parts of the data are almost guaranteed to be the same. For example, the csv files (note plural) will have POS (point of sale) information in them. This rarely changes (and is most likely only via additions). Then there is product information. There are about 10,000 products (the vast majority will be unchanged, but it's possible to have both additions and updates).
The final item (but an important one) is that I have a requirement to be able to provide an audit trail/information for any given item. For example, if I add a new POS record, I need to be able to trace that back to the file it was found in. If I change a UPC code or description of a product, then I need to be able to trace it back to the import (and file) where the change came from.
Solution that I'm contemplating.
Since the data is provided to me via CSV, I'm working around the idea that COPY will be the best/fastest way. The structure of the data in the files is not exactly what I have in the database (i.e. the final destination). So, I'm copying them into tables in a staging schema that match the CSV (note: one schema per data source). The tables in the staging schemas will have BEFORE INSERT row triggers. These triggers can decide what to do with the data (insert, update or ignore).
For the tables that are most likely to contain new data, the trigger will try to insert first. If the record is already there, it will return NULL (and stop the insert into the staging table). For tables that rarely change, it will query the table and see if the record is found. If it is, then I need a way to see if any of the fields have changed (because remember, I need to show that the record was modified by import x from file y). I obviously can just boilerplate out the code and test each column, but I was looking for something a little more "eloquent" and more maintainable than that.
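(For illustration only, a minimal sketch of such a trigger in PostgreSQL 9.1 syntax; the staging table staging.pos_csv, the target public.pos and their columns are hypothetical names:)
CREATE OR REPLACE FUNCTION staging.route_pos_row() RETURNS trigger AS $$
BEGIN
    -- insert into the real table only if the record is not already there
    INSERT INTO public.pos (pos_id, name)
    SELECT NEW.pos_id, NEW.name
    WHERE NOT EXISTS (SELECT 1 FROM public.pos p WHERE p.pos_id = NEW.pos_id);
    RETURN NULL;  -- returning NULL cancels the insert into the staging table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER route_pos
BEFORE INSERT ON staging.pos_csv
FOR EACH ROW EXECUTE PROCEDURE staging.route_pos_row();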
In a way, what I'm kind of doing is combining an importing system with an audit trail system. So, in researching audit trails, I reviewed the following wiki.postgresql.org article. It seems like hstore might be a nice way of getting changes (and being able to easily ignore some columns in the table that aren't important - e.g. "last_modified").
I'm about 90% sure it will all work... I've created some testing tables etc and played around with it.
My question?
Is there a better, more maintainable way of accomplishing this task of finding the maybe 3 records out of 10K that require a change to the database? I could certainly write a python script (or something else) that reads the file and tries to figure out what to do with each record, but that feels horribly inefficient and will lead to lots of round trips.
A few final things:
I don't have control over the input files. I would love it if they only sent me the deltas, but they don't and it's completely outside of my control or influence.
The system is growing and new data sources are likely to be added, which will greatly increase the amount of data being processed (so I'm trying to keep things efficient).
I know this is not a nice, simple SO question (like "how to sort a list in python"), but I believe one of the great things about SO is that you can ask hard questions and people will share their thoughts on the best way to solve them.
I have lots of similar operations. What I do is COPY to temporary staging tables:
CREATE TEMP TABLE target_tmp AS
SELECT * FROM target_tbl LIMIT 0; -- only copy structure, no data
COPY target_tmp FROM '/path/to/target.csv';
For performance, run ANALYZE - temp. tables are not analyzed by autovacuum!
ANALYZE target_tmp;
Also for performance, maybe even create an index or two on the temp table, or add a primary key if the data allows for that.
ALTER TABLE target_tmp ADD CONSTRAINT target_tmp_pkey PRIMARY KEY (target_id);
You don't need the performance stuff for small imports.
Then use the full scope of SQL commands to digest the new data.
For instance, if the primary key of the target table is target_id ...
Maybe DELETE what isn't there any more?
DELETE FROM target_tbl t
WHERE NOT EXISTS (
SELECT 1 FROM target_tmp t1
WHERE t1.target_id = t.target_id
);
Then UPDATE what's already there:
UPDATE target_tbl t
SET col1 = t1.col1
FROM target_tmp t1
WHERE t.target_id = t1.target_id
To avoid empty UPDATEs, simply add:
...
AND col1 IS DISTINCT FROM t1.col1; -- repeat for relevant columns
Or, if the whole row is relevant:
...
AND t IS DISTINCT FROM t1; -- check the whole row
Then INSERT what's new:
INSERT INTO target_tbl(target_id, col1)
SELECT t1.target_id, t1.col1
FROM target_tmp t1
LEFT JOIN target_tbl t USING (target_id)
WHERE t.target_id IS NULL;
Clean up if your session goes on (temp tables are dropped at end of session automatically):
DROP TABLE target_tmp;
Or use ON COMMIT DROP or similar with CREATE TEMP TABLE.
Code untested, but should work in any modern version of PostgreSQL except for typos.
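To cover the audit requirement from the question, the hstore idea can slot in right here; a sketch (assuming the hstore extension is installed and the two tables have identical columns, as they do after the CREATE TEMP TABLE above):
-- list the columns of each staged row that differ from the stored row:
-- hstore(t1) - hstore(t) keeps only the key/value pairs unique to t1
SELECT t.target_id, hstore(t1) - hstore(t) AS changed_fields
FROM   target_tbl t
JOIN   target_tmp t1 USING (target_id)
WHERE  t IS DISTINCT FROM t1;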

How to do a SQL query using columns from a related table?

I've got three related SQL tables, simplified they look like this:
ShopTable
[ShopID]
ShelfTable
[ShelfID]
[ShopID]
InventoryTable
[ShelfID]
[Value]
[ShopID] and [ShelfID] are relations. Now what I want to do is get the SUM of [Value] for one [ShopID], but this obviously won't work since [ShopID] ain't part of InventoryTable:
SELECT SUM([Value]) WHERE [ShopID] = '1'
How do I have to write the query to filter the InventoryTable using the ShopID?
SELECT SUM(i.value)
FROM shelfTable s
JOIN inventoryTable i
ON i.shelfId = s.shelfId
WHERE s.shopId = 1
This is a fundamental question about relations between tables, so I'll provide some detail, hoping that you can use some of these ideas when writing SQL queries in the future.
Let's start with one basic thing first. [ShopID] could refer to two different but related columns, one in [ShopTable] and one in [ShelfTable]. The same thing applies to [ShelfID]. It's useful to always specify the table.
You describe [ShopID] and [ShelfID] as "relations." As Damien_The_Unbeliever has commented, those columns are, in fact, two pairs of primary and foreign keys. That is, [ShelfTable].[ShelfID] identifies a "shelf" record, and [InventoryTable].[ShelfID] relates an "inventory item" (whatever that is) to a "shelf." (It's not always possible to interpret rows in a database this naively, but I'm willing to guess I'm not too far off from reality.)
Likewise, each "shelf" belongs to one "shop," and [ShelfTable].[ShopID] refers to that specific "shop." Notice that because we have the value of [ShopID] already (I'll call it "#MyShopID"), we don't even need the [ShopTable] here. We can just use [ShelfTable].[ShopID] to filter for the "shelves" we're interested in.
You're asking to get the sum total of [InventoryTable].[Value] for one [ShopID] value, but [ShopID] doesn't show up in [InventoryTable]. That's where your (inner) join comes into play. You know that you'll be adding up values from [InventoryTable], but you've got to specify the particular "shop." You specify #MyShopID for [ShelfTable].[ShopID], which will do your filtering in [InventoryTable] for you.
One final thing before composing the query. I'm assuming that you haven't oversimplified your tables too much, and that [Value] is the total value of each "inventory item," and not just a unit value. If it wasn't, we'd have to multiply values by quantities, etc., but I'll let you check your own work here.
So, here's what we do:
We select FROM the [InventoryTable]
but we INNER JOIN to the [ShelfTable] on [ShelfID] from both tables
and we only want "shelves" from one "shop," i.e. WHERE [ShelfTable].[ShopID] = #MyShopID
and then we SELECT the SUM([InventoryTable].[Value])
and we're done. In SQL, let's remove the brackets, provide some table aliases, and we'll get a query that looks like this:
SELECT SUM(inv.Value)
FROM InventoryTable AS inv
INNER JOIN ShelfTable AS shf ON shf.ShelfID = inv.ShelfID
WHERE shf.ShopID = #MyShopID
;
Here are a few take-away points to consider. Notice we handled the FROM clause first. You'll always want to do that.
You'll also want a "driving table" to start with, in this case, [InventoryTable]. The other tables in your join add extra information and provide you a means to filter, but don't otherwise interfere with your summing up. More complex queries don't offer such an obvious luxury, but we're not getting too fancy here.
You'll also note, just briefly, that because [ShelfID] is a primary key in [ShelfTable], those [ShelfID]'s are unique values in [ShelfTable], and so each "inventory" thing belongs to a single "shelf." So the join won't cause us to double-count values. That's a good thing to remember when you're not dealing with primary and foreign keys, like we're doing here.
Hope that helps. And I hope I didn't come across as too pedantic.