@BatchFetch type JOIN - jpa

I'm confused about this annotation on an entity field whose type is another entity:
@BatchFetch(value = BatchFetchType.JOIN)
The EclipseLink docs for BatchFetch explain it as follows:
For example, consider an object with an EMPLOYEE and PHONE table in
which PHONE has a foreign key to EMPLOYEE. By default, reading a list
of employees' addresses by default requires n queries, for each
employee's address. With batch fetching, you use one query for all the
addresses.
but I'm confused about the meaning of specifying BatchFetchType.JOIN. I mean, doesn't BatchFetch do a join at the moment it retrieves the list of records associated with the employee? The address/phone records are retrieved using the foreign key, so that is a join in itself, right?
The BatchFetch type is an optional parameter, and for join it is said:
JOIN – The original query's selection criteria is joined with the
batch query
What does this mean? Isn't the batch query a join itself?

Joining the relationship and returning the referenced data together with the main data is a fetch join. A query that brings in 1 Employee with 5 phones therefore returns 5 rows, with the Employee data duplicated in each row. When that is less than ideal, say for a query over 1000 employees, you resort to a separate batch query for the phone numbers. The main query runs once to return the 1000 employee rows, and a second query then returns all the phones needed to build the employees that were read in.
The three batch query types listed here then determine how this second batch query gets built. These will perform differently based on the data and database tuning.
JOIN - Works much the same way a fetch join would, except that it only returns the Phone data.
EXISTS - Causes the DB to execute the initial Employee query again, but inside an EXISTS subquery that is used to fetch the Phones.
IN - EclipseLink aggregates all the Employee IDs (or whatever values are used to reference Phones) and uses them to filter Phones directly.
The best way to find out is always to try it with SQL logging turned on and see what is generated for your mapping and query. Since these are performance options, you should test them and record the metrics to determine which works best for your application as its dataset grows.
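To make the difference concrete, here is a rough sketch of the three shapes the second (batch) query can take, run against an in-memory SQLite database from Python. The table and column names are assumptions, and the SQL only approximates what EclipseLink generates, so check your own SQL log for the real statements. All three variants return the same Phone rows and differ only in how the restriction to "phones of the selected employees" is expressed.

```python
import sqlite3

def make_employee_db():
    """In-memory stand-in for the EMPLOYEE/PHONE tables (column names are assumed)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
        CREATE TABLE phone (id INTEGER PRIMARY KEY, emp_id INTEGER, number TEXT);
        INSERT INTO employee VALUES (1, 'Ann', 'IT'), (2, 'Bob', 'IT'), (3, 'Cy', 'HR');
        INSERT INTO phone VALUES (10, 1, '555-0001'), (11, 1, '555-0002'),
                                 (12, 2, '555-0003'), (13, 3, '555-0004');
    """)
    return db

# First query: one row per employee, no phone data, no duplication.
MAIN_QUERY = "SELECT id, name FROM employee WHERE dept = ? ORDER BY id"

# Second query, JOIN and EXISTS flavours: both repeat the original criteria.
BATCH_QUERIES = {
    "JOIN": """SELECT p.id, p.emp_id, p.number
               FROM phone p JOIN employee e ON p.emp_id = e.id
               WHERE e.dept = ? ORDER BY p.id""",
    "EXISTS": """SELECT p.id, p.emp_id, p.number FROM phone p
                 WHERE EXISTS (SELECT 1 FROM employee e
                               WHERE e.id = p.emp_id AND e.dept = ?)
                 ORDER BY p.id""",
}

def fetch_with_batch(db, dept, batch_type):
    """Run the main query, then one batch query for all phones."""
    employees = db.execute(MAIN_QUERY, (dept,)).fetchall()
    if batch_type == "IN":
        # IN flavour: reuse the ids already read instead of the original criteria.
        ids = [row[0] for row in employees]
        if not ids:
            return employees, []
        marks = ", ".join("?" for _ in ids)
        sql = f"SELECT id, emp_id, number FROM phone WHERE emp_id IN ({marks}) ORDER BY id"
        phones = db.execute(sql, ids).fetchall()
    else:
        phones = db.execute(BATCH_QUERIES[batch_type], (dept,)).fetchall()
    return employees, phones
```

Calling fetch_with_batch(make_employee_db(), "IT", "JOIN") issues two statements in total, regardless of how many employees match.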

How to execute graphql query for a specific schema in hasura?

As can be seen in the following screenshot, the current project database (PostgreSQL), named default, has these 4 schemas: public, appcompany1, appcompany2 and appcompany3.
They share some common tables. Right now, when I want to fetch data for customers, I write a query like this:
query getCustomerList {
  customer {
    customer_id
    ...
    ...
  }
}
And it fetches the required data from public schema.
But according to the requirements, depending on user interactions in front-end, that query will be executed for appcompanyN (N=1,2,3,..., any positive integer). How do I achieve this goal?
NOTE: Whenever the user creates a new company, a new schema is created for that company. So the total number of schemas is not limited to 4.
I suspect that you see a problem where none actually exists.
Everything is much simpler than it may seem.
A. Where are all those tables?
There are a lot of schemas with identical (or almost identical) objects inside them.
All tables are registered in hasura.
Hasura can't register different tables with the same name, so by default names will be [schema_name]_[table_name] (except for public)
So table customer will be registered as:
customer (from public)
appcompany1_customer
appcompany2_customer
appcompany3_customer
It's possible to customize entity name in GraphQL-schema with "Custom GraphQL Root Fields".
B. The problem
But according to the requirements, depending on user interactions in front-end, that query will be executed for appcompanyN (N=1,2,3,..., any positive integer). How do I achieve this goal?
There are identical objects that differ only in the schema-name prefix.
So the solutions are trivial.
1. Dynamic GraphQL query
Application stores templates of GraphQL-queries and replaces prefix with real schema name before request.
E.g.
query getCustomerList {
  [schema]_customer {
  }
}
Substitute [schema] with appcompany1, appcompany2, appcompanyZ and execute.
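The substitution step could look roughly like the sketch below. The function name, the template constant and the validation rule are all hypothetical, and customer_id is simply the field from the question. The name check is there because the schema name ends up inside the query text, so it should never be taken from user input unchecked.

```python
import re

# Template using the [schema] placeholder from above.
CUSTOMER_LIST_TEMPLATE = """query getCustomerList {
  [schema]_customer {
    customer_id
  }
}"""

# Assumed rule for acceptable schema names; adjust to your own naming scheme.
_SCHEMA_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def render_query(template, schema):
    """Return the GraphQL text for one schema, e.g. appcompany1_customer."""
    if not _SCHEMA_NAME.fullmatch(schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    # Tables from public are registered without a prefix.
    prefix = "" if schema == "public" else schema + "_"
    return template.replace("[schema]_", prefix)
```

The rendered string is then sent to the GraphQL endpoint like any other query.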
2. SQL view for all data
If the tables are 100% identical, then it's possible to create an SQL view such as:
CREATE VIEW ALL_CUSTOMERS
AS
SELECT 'public' as schema,* FROM public.customer
UNION ALL
SELECT 'appcompany1' as schema,* FROM appcompany1.customer
UNION ALL
SELECT 'appcompany2' as schema,* FROM appcompany2.customer
UNION ALL
....
SELECT 'appcompanyZ' as schema,* FROM appcompanyZ.customer
This way: no need for dynamic query, no need to register all objects in all schemas.
You need only to register view with combined data and use one query
query getCustomerList($schema: String) {
  all_customers(where: {schema: {_eq: $schema}}) {
    customer_id
  }
}
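Because a new schema appears with every new company, the view has to be recreated whenever the list of schemas changes. A minimal sketch of generating the statement, assuming the schema names come from a list you already trust (the function name and the name check are hypothetical):

```python
import re

# Assumed rule for acceptable schema names.
_IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")

def build_all_customers_view(schemas):
    """Build the CREATE VIEW statement with one UNION ALL branch per schema."""
    if not schemas:
        raise ValueError("at least one schema is required")
    selects = []
    for name in schemas:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unexpected schema name: {name!r}")
        selects.append(f"SELECT '{name}' AS schema, * FROM {name}.customer")
    return "CREATE OR REPLACE VIEW all_customers AS\n" + "\nUNION ALL\n".join(selects)
```

Running the generated statement after each company creation keeps the single registered view in step with the schemas.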
About both solutions: it's hard to call them elegant.
I myself dislike them both ;)
So decide yourself which is more suitable in your case.

Postgres: remember current ordering of entities, which is produced by doing SELECT without ORDER BY

It is known that Postgres doesn't guarantee any particular order for a query without ORDER BY. But in most cases it will be the same order in which the entities were written into the DB.
It so happens that the project I'm now taking care of has a lot of data (Job entities) written without any field denoting their correct order. Jobs are related to Persons, and there is a need to know which Job was created first for a particular Person and which was second.
So I decided to add an order field, or just a created_at field, to Job and ORDER BY it. I can write a simple script to fill created_at or order. So far so good. But I am curious: is there a way to fill this order field automatically by issuing one UPDATE query?
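One possible shape for such a single UPDATE, offered as a sketch rather than a guarantee: number the rows per person by their physical position. In Postgres that position is the ctid system column, and it matches insertion order only if the rows were never updated or deleted, which is an assumption you would have to verify for your data. The column name job_order is hypothetical. The Postgres statement is shown as a string only; the runnable part uses SQLite, with rowid standing in for ctid, to demonstrate the idea.

```python
import sqlite3

# Postgres form (not executed here); relies on ctid reflecting insertion order.
POSTGRES_BACKFILL = """
UPDATE job AS j
SET job_order = s.rn
FROM (
    SELECT ctid, row_number() OVER (PARTITION BY person_id ORDER BY ctid) AS rn
    FROM job
) AS s
WHERE j.ctid = s.ctid
"""

# SQLite stand-in: count the rows of the same person stored at or before this one.
SQLITE_BACKFILL = """
UPDATE job SET job_order = (
    SELECT COUNT(*) FROM job AS earlier
    WHERE earlier.person_id = job.person_id AND earlier.rowid <= job.rowid
)
"""

def make_job_db():
    """Jobs inserted in creation order, with no ordering column filled in yet."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE job (person_id INTEGER, title TEXT, job_order INTEGER)")
    db.executemany(
        "INSERT INTO job (person_id, title) VALUES (?, ?)",
        [(1, "a"), (2, "x"), (1, "b"), (2, "y"), (1, "c")],
    )
    return db

def backfill_order(db):
    """Fill job_order with 1, 2, 3... per person in a single UPDATE."""
    db.execute(SQLITE_BACKFILL)
    return db.execute(
        "SELECT person_id, title, job_order FROM job ORDER BY person_id, job_order"
    ).fetchall()
```

Whatever variant is used, it is worth spot-checking a few Persons whose Job order is known before trusting the result.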

Feedback about my database design (multi tenancy)

The idea of the SaaS tool is to have dynamic tables with dynamic custom fields and values of different types. We were thinking of following the "force.com/salesforce.com" example, but it seems too complicated to maintain going forward, and its high level of abstraction would also make some reports hard to create. So we came up with a simpler idea, but we want to be sure that it is a reasonably good approach.
This is the architecture we have today (in few steps).
Each tenant has its own separate database on the cluster (Postgres 12).
The TABLE table is used to keep a reference to all of those tables; this entity has a ManyToOne relation to the META table and a OneToMany relation with the DATA table.
The META table is used for metadata configuration and has a OneToMany relation with FIELDS (which holds the name of each field, its type, e.g. TEXT/INTEGER/BOOLEAN/DATETIME, and the attribute value as a string, only as a reference).
The DATA table has a ManyToOne relation to TABLES and 50 nullable character varying columns named attribute1...50.
Example flow today:
When a user wants to open the DATA of a TABLE, e.g. "CARS", we load the META table with all its FIELDS (to get the fields for this query). Say the user specified that he wants to query the Brand, Class, Year and Price columns.
Our logic looks up the references for Brand, Class, Year and Price in the META>FIELDS table, so we know that Brand = attribute2, Class = attribute5, Year = attribute6 and Price = attribute7.
We parse the request into a query, e.g. SELECT [attr...2,5,6,7] FROM DATA, and show the results to the user. If the user then decides to filter this data, e.g. Year > 2017 AND Class = 'A', we use SQL's CAST(), for example SELECT CAST(attribute6 AS int), attribute5 FROM DATA WHERE CAST(attribute6 AS int) > 2017 AND attribute5 = 'A';, so we can actually support most of the principles of SQL.
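To illustrate the flow just described, here is a minimal sketch of translating field names into attributeN columns with CAST. The FIELDS dictionary is a hypothetical in-memory copy of the META>FIELDS rows, the types given for Brand, Class and Price are assumptions (only Year being an int comes from the example), and the %s placeholder style assumes a driver such as psycopg.

```python
# Hypothetical copy of META > FIELDS for the "CARS" table: name -> (column, type).
FIELDS = {
    "Brand": ("attribute2", "TEXT"),
    "Class": ("attribute5", "TEXT"),
    "Year": ("attribute6", "INTEGER"),
    "Price": ("attribute7", "INTEGER"),
}

# Field type -> SQL type to cast to (None means the varchar is used as is).
SQL_TYPES = {"TEXT": None, "INTEGER": "int", "BOOLEAN": "boolean", "DATETIME": "timestamp"}
OPERATORS = {"=", "<>", "<", "<=", ">", ">="}

def column_expr(field):
    """Map a user-facing field name to its (possibly cast) attribute column."""
    column, field_type = FIELDS[field]
    sql_type = SQL_TYPES[field_type]
    return column if sql_type is None else f"CAST({column} AS {sql_type})"

def build_data_query(select_fields, filters):
    """Return (sql, params) for a list of fields and (field, op, value) filters."""
    columns = ", ".join(f'{column_expr(f)} AS "{f}"' for f in select_fields)
    clauses, params = [], []
    for field, op, value in filters:
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op!r}")
        clauses.append(f"{column_expr(field)} {op} %s")
        params.append(value)  # values travel as parameters, never inside the SQL text
    sql = f"SELECT {columns} FROM DATA"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For the example above, build_data_query(["Year", "Class"], [("Year", ">", 2017), ("Class", "=", "A")]) produces the same CAST-based statement with the two values passed separately.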
However moving forward we are scared a bit:
Managing such an environment for more tenants while the number of tables grows (e.g. 50 per customer with roughly 1-5 million rows per TABLE; 5 million is the maximum we allow, and for bigger data we have BigQuery), which gives us 50-250 million rows in a single DATA_X table. This might hurt query performance, especially since we let users build simple WHERE conditions (less, equal, null etc.) through an abstraction language, e.g. GET CARS [BRAND,CLASS,PRICE...] FILTER [EQ(CLASS,A),MT(YEAR,2017)], developed to be similar to JQL (Jira Query Language).
Transaction locks: we allow batch uploads of CSV into DATA_X, so when someone loads e.g. 1GB of data, it more or less locks the DATA table for other systems.
Keeping multiple NULL columns, which can affect space a bit. For now we are not that worried, because at TABLE creation the customer decides how many columns he wants, and based on that we assign the TABLE to one of the hardcoded entities DATA_5, DATA_10, DATA_15, DATA_20, DATA_30, DATA_50, where the number is the limit on attribute columns. Those entities are distinct, and we also support a migration option if they decide to switch from 5 to 10 attributes etc.
We are at a super early stage, so we can and should make these changes before we scale. We knew this was most likely not the best approach, but we kept it to get the project running for small customers, for whom it is working just fine for now.
We were also thinking about JSONB objects, but that is not an option, as we want to keep getting the data simple.
What do you think about this solution? (FYI, DATA has a PRIMARY key made of 2 columns, (ID,TABLEID), and a built-in CreatedAt column which is used for most of the queries, so there will be at most 3 indexes.)
If it seems bad, what would you recommend as an alternative, based on the details I shared (basically a schema-less RDBMS)?
IMHO, you will run into issues when you want to join tables, and also with all the casting.
We followed the approach below, which may be of help to you.
We have a table called Cars and a couple of companion tables, CarsMeta and CarsExtension. The underlying Cars table has all the fields common to all tenants. The CarsMeta table describes which types of columns you can have for extending the Cars entity. The CarsExtension table has columns like StringCol1...5, IntCol1....5, LongCol1...10.
This way you can also filter the data easily:
If the filter is on the base table, perform the search and, if results are found, match the ids against the CarsExtension table to get the extended rows for those entities.
If the filter is on the extended fields, search the extension table and match the results against the base entity ids.
The extension table is organized like this:
id - UniqueId
entityid - uniqueid (points to the primary key of the entity)
StringCol1 - string,
...
IntCol1 - int,
...
With this layout it is easy to join the entity and get the data along with the extension fields.
If the table metadata and the data have to be inferred from separate tables, it will be difficult to maintain over a long period of time and with a huge volume of data.
HTH
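A small sketch of that join, using an in-memory SQLite database from Python. The column names on Cars, the sample rows, and the idea that CarsMeta maps StringCol1 to "class" and IntCol1 to "year" are all assumptions made for the illustration.

```python
import sqlite3

def make_cars_db():
    """Base table with common fields plus a typed extension table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Cars (id INTEGER PRIMARY KEY, tenant_id INTEGER, brand TEXT);
        CREATE TABLE CarsExtension (
            id INTEGER PRIMARY KEY,
            entityid INTEGER REFERENCES Cars(id),
            StringCol1 TEXT,     -- assumed to hold "class" according to CarsMeta
            IntCol1 INTEGER      -- assumed to hold "year" according to CarsMeta
        );
        INSERT INTO Cars VALUES (1, 1, 'Audi'), (2, 1, 'BMW'), (3, 1, 'Fiat');
        INSERT INTO CarsExtension VALUES (1, 1, 'A', 2019), (2, 2, 'B', 2016), (3, 3, 'A', 2020);
    """)
    return db

def cars_newer_than(db, year):
    """Filter on an extended field and join back to the base entity."""
    return db.execute(
        """SELECT c.id, c.brand, x.StringCol1 AS class, x.IntCol1 AS year
           FROM Cars c JOIN CarsExtension x ON x.entityid = c.id
           WHERE x.IntCol1 > ?
           ORDER BY c.id""",
        (year,),
    ).fetchall()
```

Because IntCol1 is a real integer column, the comparison needs no CAST and can use an ordinary index.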

Restrict list of employees in NMBRS to just a few companies

I am creating a report on sick leave on nmbrs.nl using Invantive SQL.
By default this query retrieves data across all companies:
select *
from employees emp
join employeeabsence(emp.id)
This takes an enormous amount of time since for each company a SOAP request is done, plus one SOAP request per employee to retrieve the absence.
Is there an efficient way to restrict it to just a few companies instead of thousands?
You can use the 'use' statement or select a partition which is actually a company.
With use you can use a query like:
use select code from systempartitions#datadictionary where lower(name) like '%companyname%' limit 10
to retrieve the first 10 companies with a specific name.
Also see answer on use with alias on how to also specify the data container alias when running distributed queries.

Search Logic removing records with no association from results when ordering by that association

I'm using search logic to filter and order my results, but it removes records from the results when I order by an association that is not present for all records.
For example, say I have a user model which can have one vehicle model but does not have to. If I have a results table that can be ordered by the make of the user's vehicle, I would hope that all users without a vehicle record would be treated as having an empty string and therefore sorted to the beginning, followed by the users who have vehicles, ordered by make.
Unfortunately, all the user records which do not have a vehicle are removed from the results.
Is there any way around this while still using search logic, as I find it extremely useful?
I think you'll have to explicitly assign a default vehicle that has an empty name.
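The missing rows are what you would see if the ordering is implemented with an inner join on vehicles. That is an assumption about the generated SQL, so check the query log to confirm it. The sketch below reproduces the effect with SQLite from Python, using made-up table and column names, and shows for comparison how a left outer join with an empty-string default keeps every user.

```python
import sqlite3

def make_users_db():
    """Three users, of whom only two have a vehicle."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE vehicles (id INTEGER PRIMARY KEY, user_id INTEGER, make TEXT);
        INSERT INTO users VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO vehicles VALUES (1, 1, 'Volvo'), (2, 3, 'Audi');
    """)
    return db

def names_ordered_by_make(db, keep_users_without_vehicle):
    """Order users by vehicle make, with or without the vehicle-less users."""
    if keep_users_without_vehicle:
        # Left join keeps every user; a missing make sorts as an empty string.
        sql = """SELECT u.name FROM users u
                 LEFT JOIN vehicles v ON v.user_id = u.id
                 ORDER BY COALESCE(v.make, ''), u.id"""
    else:
        # Inner join silently drops users that have no vehicle row.
        sql = """SELECT u.name FROM users u
                 JOIN vehicles v ON v.user_id = u.id
                 ORDER BY v.make, u.id"""
    return [row[0] for row in db.execute(sql)]
```

Assigning a default vehicle with an empty name has the same effect as the left join here: every user then has a row to join against.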