If profile comes with N flag need to insert to profile_id, if comes U flag update to profile_temp. Some times user comes with N flag as well for update which needs to be updated to profile_ temp only. when it already loaded to profile I'd and later came after some profile has been assigned with N flag but this time it came for update. How do I do it in mapping in infa?
Pls use expression transformation and then update strategies.
First of all use a expression to create a flag.
flag_insert_update =
IIF ( user_flag = 'N', 'INS_PROFILE',
IIF ( user_flag = 'UPDATE', 'UPD_PROFILE_TEMP',
IIF (user_flag = 'n','UPD_PROFILE_TEMP'
))) -- You can calculate as per your logic.
Use a router to split data between profile and profile_temp.
group 1 = flag_insert_update = 'INS_PROFILE'
group 2 = flag_insert_update <> 'INS_PROFILE'
Then use multiple update strategies. one linked to PROFILE table, another linked to PROFILE_TEMP table.
update strategy logic for PROFILE table -
IIF(flag_insert_update ='INS_PROFILE', 'DD_INSERT' ) -- pls note else condition is null.
update strategy logic for PROFILE_TEMP table -
IIF(flag_insert_update ='UPD_PROFILE_TEMP', 'DD_UPDATE' ) -- pls note else condition is null.
Then link update strategies to their corresponding target.
Whole mapping should look like this.
|-UPD_PROFILE --> TGT_PROFILE
EXP_FLAG_INS_UPD -RTR_SPLIT ->|-UPD_PROFILE_TEMP --> TGT_PROFILE_TEMP
Related
Pseudocode:
Get all projects
Use a table function to get all related parts, which uses project id as input and returns 0..* part ids
Copy a value from project to all found part ids
Datamodel:
Table projects consists of fields pj_id and pj_desc
Table parts consists of fields pj_desc_copy and prt_id
There's a function LookupRelationShips(string) that outputs multiple columns (rel_type and rel_id, where if rel_type = 2, rel_id would be a prt_id
My best attempt is this, but it won't let me use the output of the subselect:
UPDATE parts
SET pj_desc_copy = rel.pj_desc
from parts prt
INNER JOIN
(select (select rel_type, rel_id, pj.pj_desc
from LookupRelationShips(pj.pj_id)
where rel_type = 2)
from projects pj) as rel
ON rel.rel_id = prt.prt_id
Use case/restrictions:
This is a one-time statement to update all current parts. From this point onwards project CRUD will result in syncing parts, but using the application to bulk update previous projects is less than ideal (built-in timeouts, lots of overhead, large dataset).
I think your query should be as follow. You can use CROSS APPLY() on the function
UPDATE prt
SET pj_desc_copy = rel.pj_desc
FROM parts prt
INNER JOIN projects pj ON pj.rel_id = prt.prt_id
CROSS APPLY LookupRelationShips(pj.pj_id) rel
WHERE rel.rel_type = 2
UPDATE parts
SET pj_desc_copy = pj.pj_desc
FROM projects pj
CROSS APPLY LookupRelationShips(pj.pj_id) rel
RIGHT JOIN parts prt on rel.rel_id = prt.prt_id
WHERE rel.rel_type = 2
Env: Rails 4.2.4, Postgres 9.4.1.0
Is there a guarantee that ActiveRecord#first method will always return a record with minimal ID and ActiveRecord#last - with maximum ID?
I can see from Rails console that for these 2 methods appropriate ORDER ASC/DESC is added to generated SQL. But an author of another SO thread Rails with Postgres data is returned out of order tells that first method returned NOT first record...
ActiveRecord first:
2.2.3 :001 > Account.first
Account Load (1.3ms) SELECT "accounts".* FROM "accounts" ORDER BY "accounts"."id" ASC LIMIT 1
ActiveRecord last:
2.2.3 :002 > Account.last
Account Load (0.8ms) SELECT "accounts".* FROM "accounts" ORDER BY "accounts"."id" DESC LIMIT 1
==========
ADDED LATER:
So, I did my own investigation (based on D-side answer) and the Answer is NO. Generally speaking the only guarantee is that first method will return first record from a collection. It may as a side effect add ORDER BY PRIMARY_KEY condition to SQL, but it depends on either records were already loaded into cache/memory or not.
Here's methods extraction from Rails 4.2.4:
/activerecord/lib/active_record/relation/finder_methods.rb
# Find the first record (or first N records if a parameter is supplied).
# If no order is defined it will order by primary key.
# ---> NO, IT IS NOT. <--- This comment is WRONG.
def first(limit = nil)
if limit
find_nth_with_limit(offset_index, limit)
else
find_nth(0, offset_index) # <---- When we get there - `find_nth_with_limit` method will be triggered (and will add `ORDER BY`) only when its `loaded?` is false
end
end
def find_nth(index, offset)
if loaded?
#records[index] # <--- Here's the `problem` where record is just returned by index, no `ORDER BY` is applied to SQL
else
offset += index
#offsets[offset] ||= find_nth_with_limit(offset, 1).first
end
end
Here's a few examples to be clear:
Account.first # True, records are ordered by ID
a = Account.where('free_days > 1') # False, No ordering
a.first # False, no ordering, record simply returned by #records[index]
Account.where('free_days > 1').first # True, Ordered by ID
a = Account.all # False, No ordering
a.first # False, no ordering, record simply returned by #records[index]
Account.all.first # True, Ordered by ID
Now examples with has-many relationship:
Account has_many AccountStatuses, AccountStatus belongs_to Account
a = Account.first
a.account_statuses # No ordering
a.account_statuses.first
# Here is a tricky part: sometimes it returns #record[index] entry, sometimes it may add ORDER BY ID (if records were not loaded before)
Here is my conclusion:
Treat method first as returning a first record from already loaded collection (which may be loaded in any order, i.e. unordered). And if I want to be sure that first method will return record with minimal ID - then a collection upon which I apply first method should be appropriately ordered before.
And Rails documentation about first method is just wrong and need to be rewritten.
http://guides.rubyonrails.org/active_record_querying.html
1.1.3 first
The first method finds the first record ordered by the primary key. <--- No, it is not!
If sorting is not chosen, the rows will be returned in an unspecified
order. The actual order in that case will depend on the scan and join
plan types and the order on disk, but it must not be relied on. A
particular output ordering can only be guaranteed if the sort step is
explicitly chosen.
http://www.postgresql.org/docs/9.4/static/queries-order.html (emphasis mine)
So ActiveRecord actually adds ordering by primary key, whichever that is, to keep the result deterministic. Relevant source code is easy to find using pry, but here are extracts from Rails 4.2.4:
# show-source Thing.all.first
def first(limit = nil)
if limit
find_nth_with_limit(offset_index, limit)
else
find_nth(0, offset_index)
end
end
# show-source Thing.all.find_nth
def find_nth(index, offset)
if loaded?
#records[index]
else
offset += index
#offsets[offset] ||= find_nth_with_limit(offset, 1).first
end
end
# show-source Thing.all.find_nth_with_limit
def find_nth_with_limit(offset, limit)
relation = if order_values.empty? && primary_key
order(arel_table[primary_key].asc) # <-- ATTENTION
else
self
end
relation = relation.offset(offset) unless offset.zero?
relation.limit(limit).to_a
end
it may change depending of your Database engine, it returns always the minimal ID in mysql with first method but it does not works the same for postgresql, I had several issues with this when I was a nobai, my app was working as expected in local with mysql, but everything was messed up when deployed to heroku with postgresql, so for avoid issues with postgresql always order your records by id before the query:
Account.order(:id).first
The above ensures minimal ID for mysql, postgresql and any other database engine as you can see in the query:
SELECT `accounts`.* FROM `accounts` ORDER BY `accounts`.`id` ASC LIMIT 1
I don't think that answer you reference is relevant (even to the question it is on), as it refers to non-ordered querying, whereas first and last do apply an order based on id.
In some cases, where you are applying your own group on the query, you cannot use first or last because an order by cannot be applied if the grouping does not include id, but you can use take instead to just get the first row returned.
There have been versions where first and/or last did not apply the order (one of the late Rails 3 on PostgreSQL as I recall), but they were errors.
I have two parameters X and Y
Rules for these are only one of them can be null. They both can be existing, it's ok but they both can't be null.
I'm using this to check if they exist in database so I can assign one and rest of the SP can continue inserting.
SELECT #Id=id FROM Table WHERE (No = #x) OR (No = #y)
What I want to add is if they are both existing I want the Id to be the Id of #x.
I can't get the Case Statement right in my mind. Normally this is a no brainer but somehow I managed to get stuck.
ISNULL() will take the first non null value it finds.
SELECT #Id=id FROM Table WHERE No = ISNULL(#x, #y)
I have a large index definition that takes too long to index. I suspect the main problem is caused by the many LEFT OUTER JOINs generated.
I saw this question, but can't find documentation about using source: :query, which seems to be part of the solution.
My index definition and the resulting query can be found here: https://gist.github.com/jonsgold/fdd7660bf8bc98897612
How can I optimize the generated query to run faster during indexing?
The 'standard' sphinx solution to this would be to use ranged queries.
http://sphinxsearch.com/docs/current.html#ex-ranged-queries
... splitting up the query into lots of small parts, so the database server has a better chance of being able to run the query (rather than one huge query)
But I have no idea how to actully enable that in Thinking Sphinx. Can't see anything in the documentation. Could help you edit the sphinx.conf, but also not sure how TS will cope with you manually editing the config file.
This is the solution that worked best (from the linked question). Basically, you can remove a piece of the main query sql_query and define it separately as a sql_joined_field in the sphinx.conf file.
It's important to add all relevant sql conditions to each sql_joined_field (such as sharding indexes by modulo on the ID). Here's the new definition:
ThinkingSphinx::Index.define(
:incident,
with: :active_record,
delta?: false,
delta_processor: ThinkingSphinx::Deltas.processor_for(ThinkingSphinx::Deltas::ResqueDelta)
) do
indexes "SELECT incidents.id * 51 + 7 AS id, sites.name AS site FROM incidents LEFT OUTER JOIN sites ON sites.id = site_id WHERE incidents.deleted = 0 AND EXISTS (SELECT id FROM accounts WHERE accounts.status = 'enabled' AND incidents.account_id = id) ORDER BY id", as: :site, source: :query
...
has
...
end
ThinkingSphinx::Index.define(
:incident,
with: :active_record,
delta?: true,
delta_processor: ThinkingSphinx::Deltas.processor_for(ThinkingSphinx::Deltas::ResqueDelta)
) do
indexes "SELECT incidents.id * 51 + 7 AS id, sites.name AS site FROM incidents LEFT OUTER JOIN sites ON sites.id = site_id WHERE incidents.deleted = 0 AND incidents.delta = 1 AND EXISTS (SELECT id FROM accounts WHERE accounts.status = 'enabled' AND incidents.account_id = id) ORDER BY id", as: :site, source: :query
...
has
...
end
The magic that defines the field site as a separate query is the option source: :query at the end of the line.
Notice the core index definition has the parameter delta?: false, while the delta index definition has the parameter delta?: true. That's so I could use the condition WHERE incidents.delta = 1 in the delta index and filter out irrelevant records.
I found sharding didn't perform any better, so I reverted to one unified index.
See the whole index definition here: https://gist.github.com/jonsgold/05e2aea640320ee9d8b2.
Important to remember!
The Sphinx document ID offset must be handled manually. That is, whenever an index for another model is added or removed, my calculated document ID will change. This must be updated.
So, in my example, if I added an index for a different model (not :incident), I would have to run rake ts:configure to find out my new offset and change incidents.id * 51 + 7 accordingly.
I've been running into this same issue repeatedly when trying to execute Postgres updates. First I'll run a SELECT query, like so:
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list;
This selects the stock numbers of all products that indicate that they're available in the current database, but no longer appear in the new list of inventory that's just been downloaded. This command runs very quickly. However, virtually any method I use to update this list seems to take at least ten minutes to run, slowing the server down in the process. For instance:
UPDATE products
SET available = false
WHERE stock_number IN (
SELECT stock_number
FROM products
WHERE available = true
AND stock_number IS NOT NULL
EXCEPT
SELECT stock_number
FROM new_inventory_list
);
There are usually at least 10,000 rows that need to be updated, and often a lot more if a supplier pushes a lot of new inventory at once. Additionally, we need to check for price updates. It's relatively fast and easy to get a list of stock numbers for products that have been changed in price:
WITH overlap AS (
SELECT stock_number
FROM products
INTERSECT
SELECT stock_number
FROM new_inventory_list
)
unchanged AS (
SELECT stock_number, price
FROM products
INTERSECT
SELECT stock_number, price
FROM new_inventory_list
)
SELECT * FROM overlap EXCEPT SELECT stock FROM unchanged;
For this query, I don't even try to use SQL commands to do it, instead I pull the list out into a script, then run UPDATE on each modified value individually. It's slow, but still seems to be faster than any command I've tried that was strictly in SQL. Plus, with an external script, I can output the progress periodically, so I approximate how long it will run for. Stock numbers are unique, although they're occasionally NULL. (Those should be ignored)
I feel like there has to be a much faster way of doing this, but so far I haven't had any luck figuring it out. Any thoughts?
edit:
I think I found a better solution to this problem than any that I've tried so far:
WITH removed AS (
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
I hadn't considered the idea of using UPDATE and WITH together, and didn't even know it was possible until I read the UPDATE documentation for Postgres. Even though it's considerably faster, it still takes a few minutes to run, so to monitor it, I just run the above command in a loop, with LIMIT 1000 at the end of the SELECT clause, printing a message to the console every time it successfully updates another block.
This query:
WITH removed AS (
SELECT stock_number
FROM products
WHERE available = true
EXCEPT
SELECT stock_number
FROM new_inventory_list
)
UPDATE products AS p
SET available = false
FROM removed
WHERE removed.stock_number = p.stock_number;
… will, I trust, do a superfluous join on the entire table with itself. And probably a poorly performing one, at that, because of the except clause in the with statement.
Think of it this way: suppose a products table with a million rows, around 250k marked as available, and 50k of those that don't appear in a 200k-item strong inventory list. The with query runs like this: 1) find the 50k rows in products that need to be updated; 2) then, for each row in products, check if the id is in those 50k rows in order to re-select those same 50k rows; 3) and update the row.
For improved performance, the update query should select the candidate rows from products that need to be updated directly, and use an anti-join to eliminate unwanted rows. The query #wildplasser posted earlier seems fine:
UPDATE products dst
SET available = false
WHERE available
AND NOT EXISTS (
SELECT 1
FROM new_inventory_list nx
WHERE nx.stock_number = dst.stock_number
);
Another point is the "about 50 columns, 20 of which are indexed" you mentioned in the comments: That will slow down updates considerable. Imagine: each row that gets updated needs to be written into not just that table, but in an additional 20 tables. Are you sure this shouldn't be normalized a bit more and that you actually need each of those indexes?
Have you tried
WITH removed AS (
SELECT stock_number
FROM products p1
LEFT JOIN new_inventory_list n1
ON p1.stock_number=n1.stock_number
WHERE p1.available AND n1.stock_number IS NULL
)
I don't know how the EXCEPT is being done; perhaps this will retain some indexing for use in the UPDATE. Also, if available is usually false, I would add a partial index
CREATE INDEX product_available ON product(stock_number) WHERE available;