Single table inheritance insertion order - persistence

I have the following situation when using OpenJPA:
These 3 entities, SelectQueryElement, SingleSelectQueryElement and CompositeQueryElement, are implemented through SINGLE_TABLE inheritance.
I also have a ColumnClass entity which holds the top-level CompositeQueryElement. Each of these classes uses EAGER loading and the PERSIST/MERGE cascade types.
Now, if I want to mimic a structure like this:
ColumnClass
| CompositeSelectQueryElement // top-level query element
| | SingleSelectQueryElement
| | CompositeSelectQueryElement
| | | SingleSelectQueryElement
| | SingleSelectQueryElement
| | CompositeSelectQueryElement
| | | SingleSelectQueryElement
... and I try to merge the ColumnClass object, I expect the insertion order to be "preorder". But what I get is not even close to that.
I expected the following insertion order:
ColumnClass
1 | CompositeSelectQueryElement // TOP-LEVEL
2 | | SingleSelectQueryElement
3 | | CompositeSelectQueryElement
4 | | | SingleSelectQueryElement
5 | | SingleSelectQueryElement
6 | | CompositeSelectQueryElement
7 | | | SingleSelectQueryElement
OR at least:
ColumnClass
1 | CompositeSelectQueryElement // TOP-LEVEL
2 | | SingleSelectQueryElement
3 | | CompositeSelectQueryElement
6 | | | SingleSelectQueryElement
4 | | SingleSelectQueryElement
5 | | CompositeSelectQueryElement
7 | | | SingleSelectQueryElement
But what I got was this (note that all Composites come first, followed by all Singles):
ColumnClass
1 | CompositeSelectQueryElement // TOP-LEVEL
4 | | SingleSelectQueryElement
2 | | CompositeSelectQueryElement
5 | | | SingleSelectQueryElement
6 | | SingleSelectQueryElement
3 | | CompositeSelectQueryElement
7 | | | SingleSelectQueryElement
The order of the Singles is not consistent either. Sometimes it is nothing like this and looks completely random.
The question: is there any way to work around this ordering issue and "suggest" to OpenJPA what I want to achieve?
I use OpenJPA v2.2.

This message from the OpenJPA mailing list seems to suggest that the solution is to include <property name="openjpa.jdbc.UpdateManager" value="operation-order"/> in your configuration. Hope that helps.

Facet a Multi-value (MVA) type field in Sphinx

I executed the query below in Sphinx:
select MVA_FIELD from mySphinxIndex facet MVA_FIELD order by count(*) desc;
What I got looks like this:
+----------------------------+----------+
| MVA_FIELD | count(*) |
+----------------------------+----------+
| | 664 |
| 0 | 536 |
| 13 | 439 |
| 4,13 | 8 |
| 19,13 | 8 |
| 18,13,20 | 8 |
| 8,17,18 | 8 |
| 8,18,13 | 8 |
| 8,15,18 | 8 |
| 8,13,20 | 7 |
| 17,13 | 7 |
| 18,19,20 | 7 |
| 8,17 | 7 |
| 13,17,19 | 7 |
| 11,6 | 7 |
| 6,11,13 | 7 |
| 15,18 | 7 |
| 11,13,20 | 7 |
| 11,13,17 | 7 |
| 6,18,19 | 6 |
| 7,20 | 6 |
| 8,11,13 | 6 |
| 13,17,20 | 6 |
I want to get the count of each id in MVA_FIELD. For example, I want the counts of 0, 4, 13, ... for each id separately. How can I achieve this?
Honestly, I don't know how to do it with the FACET sugar, but with a normal GROUP BY query you would just use the GROUPBY() function when grouping by an MVA attribute:
SELECT GROUPBY() AS value,COUNT(*) FROM mySphinxIndex GROUP BY MVA_FIELD ORDER BY COUNT(*) DESC;
From the docs:
A special GROUPBY() function is also supported. It returns the GROUP BY key. That is particularly useful when grouping by an MVA value, in order to pick the specific value that was used to create the current group.
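To illustrate what grouping by an MVA attribute produces, here is a minimal Python sketch. It is not Sphinx code: the function name count_mva_values and the sample rows are made up for illustration, and it assumes each id counts once per document. Every document contributes to the group of each id in its MVA list, which gives the per-id counts the question asks for.

```python
from collections import Counter


def count_mva_values(rows):
    """Count how many rows contain each id (hypothetical helper)."""
    counts = Counter()
    for mva in rows:
        # set() so that an id repeated inside one row is counted once per row
        counts.update(set(mva))
    return counts


# Made-up sample data: each inner list is one document's MVA_FIELD.
rows = [[0], [13], [4, 13], [19, 13], [18, 13, 20], []]
counts = count_mva_values(rows)

# Print ids ordered by descending count, then by id.
for value, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(value, n)
```

A row with an empty MVA list contributes to no group in this sketch.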

Is there a V-lookup effect in Microsoft Access?

I am a novice self-teaching Microsoft Access.
I have an MS Access database with a table of students (Table1).
Table1
+----+-----------+----------+------------+------------+
| id | firstname | lastname | Year_Group | Form_Group |
+----+-----------+----------+------------+------------+
| 2 | mnb | nbgfv | 7 | 1 |
| 3 | jhg | uhgf | 8 | 2 |
| 4 | poi | ijuy | 9 | 2 |
| 5 | tgf | tgfd | 10 | 2 |
| 6 | wer | qwes | 11 | 2 |
+----+-----------+----------+------------+------------+
Every day, each student's day is recorded, roughly as in Table2.
Table2
+----------+----+-----------+----------+------------+--------+-----------+----------+
| Date | id | firstname | lastname | Year_Group | Effort | Behaviour | Homework |
+----------+----+-----------+----------+------------+--------+-----------+----------+
| 28/02/19 | 2 | mnb | nbgfv | 7 | Good | Good | Y |
| 28/02/19 | 3 | jhg | uhgf | 8 | OK | OK | Y |
| 28/02/19 | 4 | poi | ijuy | 9 | Bad | Bad | N |
| 01/03/19 | 5 | tgf | tgfd | 10 | Good | OK | Y |
| 01/03/19 | 6 | wer | qwes | 11 | Good | Good | Y |
+----------+----+-----------+----------+------------+--------+-----------+----------+
Is there a way (when using a list box or combo box) to select a student from Table1 so that their information is used for the corresponding columns in Table2?
Or is there a more efficient way to do this?
Firstly, you should normalise your data.
Currently, you are repeating the firstname, lastname, and Year_Group data in two separate tables, which not only bloats your database, but also means that such data must be maintained in two separate places, potentially leading to inconsistencies and then uncertainty as to which is the master.
Instead, I would suggest that your Students table should contain all information pertaining to the characteristics of a student:
Students
+----+-----------+----------+------------+------------+
| id | firstname | lastname | Year_Group | Form_Group |
+----+-----------+----------+------------+------------+
| 2 | mnb | nbgfv | 7 | 1 |
| 3 | jhg | uhgf | 8 | 2 |
| 4 | poi | ijuy | 9 | 2 |
| 5 | tgf | tgfd | 10 | 2 |
| 6 | wer | qwes | 11 | 2 |
+----+-----------+----------+------------+------------+
And the information pertaining to each school day should only reference the student ID in the Students table:
SchoolDays
+----------+----+--------+-----------+----------+
| Date | id | Effort | Behaviour | Homework |
+----------+----+--------+-----------+----------+
| 28/02/19 | 2 | Good | Good | Y |
| 28/02/19 | 3 | OK | OK | Y |
| 28/02/19 | 4 | Bad | Bad | N |
| 01/03/19 | 5 | Good | OK | Y |
| 01/03/19 | 6 | Good | Good | Y |
+----------+----+--------+-----------+----------+
Then, if you want to display the data in its entirety, you would use a query which joins the two tables, e.g.:
select
    t2.date,
    t1.firstname,
    t1.lastname,
    t1.year_group,
    t2.effort,
    t2.behaviour,
    t2.homework
from
    students t1 inner join schooldays t2 on t1.id = t2.id
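As a rough way to try the normalised layout outside Access, the sketch below builds the two tables in an in-memory SQLite database from Python and runs the same join. SQLite stands in for Access here, so the column types and the sqlite3 calls are assumptions of this example rather than Access syntax. Only two of the sample students are loaded.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for Access in this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        firstname TEXT, lastname TEXT,
        year_group INTEGER, form_group INTEGER
    );
    CREATE TABLE schooldays (
        date TEXT,
        id INTEGER REFERENCES students(id),
        effort TEXT, behaviour TEXT, homework TEXT
    );
""")

# Student details are stored once; daily records carry only the student id.
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?, ?)",
    [(2, "mnb", "nbgfv", 7, 1), (3, "jhg", "uhgf", 8, 2)],
)
conn.executemany(
    "INSERT INTO schooldays VALUES (?, ?, ?, ?, ?)",
    [("28/02/19", 2, "Good", "Good", "Y"), ("28/02/19", 3, "OK", "OK", "Y")],
)

# The join recombines both tables for display.
rows = conn.execute("""
    SELECT t2.date, t1.firstname, t1.lastname, t1.year_group,
           t2.effort, t2.behaviour, t2.homework
    FROM students t1 INNER JOIN schooldays t2 ON t1.id = t2.id
    ORDER BY t1.id
""").fetchall()

for row in rows:
    print(row)
```

In Access itself, a combo box bound to the id column of the Students table would play the role of the student picker, with a query like the one above supplying the display columns.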

Redshift Distribution By Child Columns

My Situation
I have some tables in my Redshift cluster that all break down into an order_id, shipment_id, or shipment_item_id, depending on how granular the table is. order_id has a one-to-many relationship with shipment_id, and shipment_id has a one-to-many relationship with shipment_item_id.
My Question
I distribute on order_id, so all shipment_id and shipment_item_id records should be on the same nodes across the tables, since they are grouped by order_id. My question is: when I have to join on shipment_id or shipment_item_id, will Redshift know that the records are on the same nodes, or will it still broadcast the tables since they aren't joined on order_id?
Example Tables
unified_order
+----------+-------------+------------------+
| order_id | shipment_id | shipment_item_id |
+----------+-------------+------------------+
| 1 | 1 | 1 |
| 1 | 1 | 2 |
| 1 | 1 | 3 |
| 1 | 2 | 4 |
| 1 | 2 | 5 |
| 1 | 3 | 6 |
| 2 | 4 | 7 |
| 2 | 4 | 8 |
| 3 | 5 | 9 |
| 3 | 5 | 10 |
| 4 | 6 | 11 |
| 5 | 7 | 12 |
| 5 | 7 | 13 |
+----------+-------------+------------------+
shipment_details
+-------------+-----------+--------------+
| shipment_id | ship_day | ship_details |
+-------------+-----------+--------------+
| 1 | 1/1/2017 | stuff |
| 2 | 5/1/2017 | other stuff |
| 3 | 6/14/2017 | more stuff |
| 4 | 5/13/2017 | less stuff |
| 5 | 6/19/2017 | that stuff |
| 6 | 7/31/2017 | what stuff |
| 7 | 2/5/2017 | things |
+-------------+-----------+--------------+
Distribution
distribution_by_node
+------+----------+-------------+------------------+
| node | order_id | shipment_id | shipment_item_id |
+------+----------+-------------+------------------+
| 1 | 1 | 1 | 1 |
| 1 | 1 | 1 | 2 |
| 1 | 1 | 1 | 3 |
| 1 | 1 | 2 | 4 |
| 1 | 1 | 2 | 5 |
| 1 | 1 | 3 | 6 |
| 1 | 5 | 7 | 12 |
| 1 | 5 | 7 | 13 |
| 2 | 2 | 4 | 7 |
| 2 | 2 | 4 | 8 |
| 3 | 3 | 5 | 9 |
| 3 | 3 | 5 | 10 |
| 4 | 4 | 6 | 11 |
+------+----------+-------------+------------------+
The Amazon Redshift documentation does not go into detail about how information is shared between nodes, but it is doubtful that it "broadcasts the tables".
Rather, information is probably sent between nodes based on need -- only the relevant columns would be shared, and possibly only sub-ranges of the data.
Rather than worrying too much about the internal implementation, you should test various DISTKEY and SORTKEY strategies against real queries to determine performance.
Follow the recommendations from Choose the Best Distribution Style to minimize the amount of data that needs to be sent between nodes and consult Amazon Redshift Best Practices for Designing Queries to improve queries.
You can EXPLAIN your query to see how data will be distributed (or not) during the execution. In this doc you'll see how to read the query plan:
Evaluating the Query Plan

Are pg_stat_database and pg_stat_activity really listing the same stuff aka how do I get a list of all backends

In this answer to the question Right query to get the current number of connections in a PostgreSQL DB the poster implies that
SELECT sum(numbackends) FROM pg_stat_database;
and
SELECT count(*) FROM pg_stat_activity;
give the same results.
However, if I do this on my db the first one says 119 and the second one 30.
This is the difference as shown by summing numbackends and counting:
+------+-------------+-------+
| | numbackends | count |
+------+-------------+-------+
| db1 | 1 | 1 |
| db2 | 1 | 1 |
| db3 | 1 | 1 |
| db4 | 1 | 1 |
| db5 | 2 | 2 |
| db6 | 2 | 2 |
| db7 | 12 | 3 | <--
| db8 | 4 | 4 |
| db9 | 5 | 5 |
| db10 | 78 | 35 | <--
+------+-------------+-------+
Why does this difference exist?
How can I list each of the 119-30=89 backends not shown in pg_stat_activity?

How to compute the dot product of two columns (think of each full column as a vector)?

Given this table:
| a | b | c |
|---+---+----|
| 3 | 4 | |
| 1 | 2 | |
| 1 | 3 | |
| 2 | 2 | |
I want to get the dot product of the two columns a and b. The result should be equal to (3*4)+(1*2)+(1*3)+(2*2), which is 21.
I don't want to use the clumsy formula (B1*B2+C1*C2+D1*D2+E1*E2) because I actually have a large table to calculate.
I know Emacs's Calc tool has a "vprod" function which can do this sort of thing, but I don't know how to turn a full column into a vector.
Can anybody tell me how to achieve this? I would appreciate it!
In emacs-calc, the simple product of 2 vectors calculates the dot product.
This works (I put the result in @6$3; the parentheses can also be omitted):
| a | b | c |
|---+---+----|
| 3 | 4 | |
| 1 | 2 | |
| 1 | 3 | |
| 2 | 2 | |
|---+---+----|
| | | 21 |
#+TBLFM: @6$3=(@I$1..@II$1)*(@I$2..@II$2)
@I and @II span from the first hline to the second.
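As an independent cross-check of the expected value, the same sum of pairwise products can be sketched in plain Python. The function name dot is made up for this example; the two lists are columns a and b from the question's table.

```python
def dot(xs, ys):
    """Return the dot product of two equal-length sequences of numbers."""
    if len(xs) != len(ys):
        raise ValueError("columns must have the same length")
    # Multiply the columns row by row and add up the products.
    return sum(x * y for x, y in zip(xs, ys))


a = [3, 1, 1, 2]  # column a from the question
b = [4, 2, 3, 2]  # column b from the question
result = dot(a, b)
print(result)  # 21
```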
This can be solved using babel and R in org-mode:
#+name: mytable
| a | b | c |
|---+---+----|
| 3 | 4 | |
| 1 | 2 | |
| 1 | 3 | |
| 3 | 2 | |
#+begin_src R :var mytable=mytable
sum(mytable$a * mytable$b)
#+end_src
#+RESULTS:
: 23