I want to generate a report chart, and for this, data from two tables is required. So I have a query like this (I am using OrientDB):
select col1,col2 from (select col11,col22 from t1 where col11 = $P{col11}) where col1 = $P{col1} and col2 = $P{col2}
When I run this report I get the following exception:
Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.rangeCheck(ArrayList.java:635)
at java.util.ArrayList.get(ArrayList.java:411)
at com.orientechnologies.orient.core.sql.filter.OSQLPredicate.bindParameters(OSQLPredicate.java:366)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLResultsetAbstract.assignTarget(OCommandExecutorSQLResultsetAbstract.java:182)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.assignTarget(OCommandExecutorSQLSelect.java:435)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.executeSearch(OCommandExecutorSQLSelect.java:417)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLSelect.execute(OCommandExecutorSQLSelect.java:388)
at com.orientechnologies.orient.core.sql.OCommandExecutorSQLDelegate.execute(OCommandExecutorSQLDelegate.java:64)
at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.executeCommand(OAbstractPaginatedStorage.java:1163)
... 8 more
From my observation: if there is a single WHERE condition, i.e. in either the subquery or the outer query, it works; if there is a WHERE clause in both, this exception is thrown.
I don't know if it solves your problem, but using aliases might help. With proper SQL aliases on the columns and the table:
select col1,col2
from ( select col11 as col1, col22 as col2
from t1 where col11 = $P{col11}) as t
where col1 = $P{col1} and col2 = $P{col2}
Related
I have a PostgreSQL database that is partitioned into multiple schemas, one for each tenant and workspace (the exact meaning of these terms doesn't matter, they're just dimensions of the partition scheme):
reports/tenant1/workspace1
reports/tenant1/workspace2
reports/tenant2/workspace1
reports/tenant3/workspace1
reports/tenant3/workspace2
reports/tenant3/workspace3
Each workspace schema has the same set of tables with identical definitions, and each table includes "_tenant" and "_workspace" columns with the values of its enclosing schema, e.g., tenant1 and workspace1.
In the public schema, there is one view per table definition that unions the tables with that definition across all workspace schemas. For example, the view for "example_table" would be:
SELECT _tenant, _workspace, column1, column2, column3
FROM "reports/tenant1/workspace1".example_table
WHERE _tenant = 'tenant1' AND _workspace = 'workspace1'
UNION ALL
SELECT _tenant, _workspace, column1, column2, column3
FROM "reports/tenant1/workspace2".example_table
WHERE _tenant = 'tenant1' AND _workspace = 'workspace2'
UNION ALL
SELECT _tenant, _workspace, column1, column2, column3
FROM "reports/tenant2/workspace1".example_table
WHERE _tenant = 'tenant2' AND _workspace = 'workspace1'
UNION ALL
...
Note the "redundant" partition predicates in each SELECT. I added these because it seems to provide a hint to PostgreSQL not to execute queries on tables in unrelated partitions when querying the view with the same predicate. Indeed, EXPLAIN ANALYZE shows "(never executed)" for those queries.
Queries are made from a BI tool to the views, and the BI tool automatically adds predicates on the "_tenant" and "_workspace" columns based on attributes of the logged-in user.
Now that there are 50+ workspaces, I've noticed that queries on the views can have non-optimal plans when compared to equivalent queries on the underlying tables. For example, the following query on the views might use a nested loop join that takes 1 minute:
SELECT * FROM
(
SELECT column1, column2, column3
FROM example_view1
WHERE _tenant = 'tenant1' AND _workspace = 'workspace1'
) v1
JOIN
(
SELECT column4, column5, column6
FROM example_view2
WHERE _tenant = 'tenant1' AND _workspace = 'workspace1'
) v2
ON v1.column1 = v2.column4
Whereas the equivalent query on the underlying tables would use a hash join and complete in under a second:
SELECT * FROM
(
SELECT column1, column2, column3
FROM "reports/tenant1/workspace1".example_table1
WHERE _tenant = 'tenant1' AND _workspace = 'workspace1'
) v1
JOIN
(
SELECT column4, column5, column6
FROM "reports/tenant1/workspace1".example_table2
WHERE _tenant = 'tenant1' AND _workspace = 'workspace1'
) v2
ON v1.column1 = v2.column4
I know the subqueries are pointless, but it's how the BI tool's query builder generates the SQL for the join.
Is there a way to let the query planner know that all tables outside the selected partition won't return results and can be ignored? As I said before, EXPLAIN ANALYZE shows queries are never executed on these tables due to the "redundant" partition predicates in the view definition, but that doesn't seem to be used at planning time.
I have two queries:
Queries simplified, excluding joins:
Query 1: select ProductName, NumberofProducts (in inventory) from Table1.....;
Query 2: select ProductName, NumberofProductssold from Table2......;
I would like to know how I can get an output like:
ProductName NumberofProducts(in inventory) ProductName NumberofProductsSold
The relationships used for getting the outputs for each query are different.
I need the output this way for my SSRS report.
(I tried a UNION statement but it doesn't work for the output I want to see.)
Here is an example that does a union between two completely unrelated tables: the Student and the Products table. It generates an output with 4 columns:
select
FirstName as Column1,
LastName as Column2,
email as Column3,
null as Column4
from
Student
union
select
ProductName as Column1,
QuantityPerUnit as Column2,
null as Column3,
UnitsInStock as Column4
from
Products
Obviously you'll tweak this for your own environment...
I think you are after something like this, using row_number() with CTEs and performing a FULL OUTER JOIN:
;with t1 as (
select col1,col2, row_number() over (order by col1) rn
from table1
),
t2 as (
select col3,col4, row_number() over (order by col3) rn
from table2
)
select col1,col2,col3,col4
from t1 full outer join t2 on t1.rn = t2.rn
Tables and data:
create table table1 (col1 int, col2 int)
create table table2 (col3 int, col4 int)
insert into table1 values
(1,2),(3,4)
insert into table2 values
(10,11),(30,40),(50,60)
Results:
| COL1   | COL2   | COL3 | COL4 |
----------------------------------
| 1      | 2      | 10   | 11   |
| 3      | 4      | 30   | 40   |
| (null) | (null) | 50   | 60   |
How about,
select
col1,
col2,
null col3,
null col4
from Table1
union all
select
null col1,
null col2,
col4 col3,
col5 col4
from Table2;
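For instance, with the sample tables from the FULL OUTER JOIN answer above (reading table2's col3 and col4 as this query's col4 and col5), this would return:
| COL1   | COL2   | COL3   | COL4   |
-------------------------------------
| 1      | 2      | (null) | (null) |
| 3      | 4      | (null) | (null) |
| (null) | (null) | 10     | 11     |
| (null) | (null) | 30     | 40     |
| (null) | (null) | 50     | 60     |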
The problem is that unless your tables are related you can't determine how to join them, so you'd have to arbitrarily join them, resulting in a cartesian product:
select Table1.col1, Table1.col2, Table2.col3, Table2.col4
from Table1
cross join Table2
If you had, for example, the following data:
col1 col2
a 1
b 2
col3 col4
y 98
z 99
You would end up with the following:
col1 col2 col3 col4
a 1 y 98
a 1 z 99
b 2 y 98
b 2 z 99
Is this what you're looking for? If not, and you have some means of relating the tables, then you'd need to include that in joining the two tables together, e.g.:
select Table1.col1, Table1.col2, Table2.col3, Table2.col4
from Table1
inner join Table2
on Table1.JoiningField = Table2.JoiningField
That would pull things together for you into however the data is related, giving you your result.
If you mean that both ProductName fields are to have the same value, then:
SELECT a.ProductName, a.NumberofProducts, b.ProductName, b.NumberofProductsSold
FROM Table1 a, Table2 b
WHERE a.ProductName = b.ProductName;
Or, if you want the ProductName column to be displayed only once,
SELECT a.ProductName, a.NumberofProducts, b.NumberofProductsSold
FROM Table1 a, Table2 b
WHERE a.ProductName = b.ProductName;
Otherwise, if any row of Table1 can be associated with any row from Table2 (even though I really wonder why anyone would want to do that), you could take a look at the CROSS JOIN described above.
Old question, but where others use a JOIN to combine unrelated queries into rows of one table, this is my solution to combine unrelated queries into one row, e.g.:
select
(select count(*) c from v$session where program = 'w3wp.exe') w3wp,
(select count(*) c from v$session) total,
sysdate
from dual;
which gives the following one-row output:
W3WP TOTAL SYSDATE
----- ----- -------------------
14 290 2020/02/18 10:45:07
(which tells me that our web server currently uses 14 Oracle sessions out of the total of 290 sessions; I log this output without headers in an sqlplus script that runs every so many minutes)
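A minimal sketch of the SQL*Plus settings for that kind of header-less logging (standard SQL*Plus options; the spool file name is hypothetical, not from the original post):
SET HEADING OFF
SET PAGESIZE 0
SET FEEDBACK OFF
SPOOL sessions.log APPEND
-- run the query above; only the bare data row is written to the log
SPOOL OFF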
Load each query into a DataTable:
http://www.dotnetcurry.com/ShowArticle.aspx?ID=143
Load both DataTables into a DataSet:
http://msdn.microsoft.com/en-us/library/aeskbwf7%28v=vs.80%29.aspx
This is what you can do, assuming that your ProductName columns have common values:
SELECT
Table1.ProductName,
Table1.NumberofProducts,
Table2.ProductName,
Table2.NumberofProductssold
FROM Table1
INNER JOIN Table2
ON Table1.ProductName= Table2.ProductName
Try this:
SELECT table1.ProductName, NumberofProducts, NumberofProductssold
FROM table1
JOIN table2
ON table1.ProductName = table2.ProductName
Try this:
// Get the records for the current month, last month, and all time,
// then merge the per-month send counts into the all-time array.
$analyticsData  = $this->user->getMemberInfoCurrentMonth($userId);
$analyticsData1 = $this->user->getMemberInfoLastMonth($userId);
$analyticsData2 = $this->user->getMemberInfAllTime($userId);
foreach ($analyticsData2 as $arr) {
    // Attach last month's send count, defaulting to 0 when there is no match.
    foreach ($analyticsData1 as $arr1) {
        if ($arr->fullname == $arr1->fullname) {
            $arr->last_send_count = $arr1->last_send_count;
            break;
        } else {
            $arr->last_send_count = 0;
        }
    }
    // Attach the current month's send count, defaulting to 0 when there is no match.
    foreach ($analyticsData as $arr2) {
        if ($arr->fullname == $arr2->fullname) {
            $arr->current_send_count = $arr2->current_send_count;
            break;
        } else {
            $arr->current_send_count = 0;
        }
    }
}
echo "<pre>";
print_r($analyticsData2);die;
I have a postgresql table with 2 columns:
code
pharm
The code column does not contain unique values; there are duplicates in it. What I want is to count these values like this:
SELECT code, COUNT(code) FROM TABLE GROUP BY code ORDER BY 1
Then I want to use the COUNT result from the query to assign it to the pharm column. So the final table should look like this:
CODE PHARM
AB 3
AB 3
AB 3
CD 2
CD 2
...
I tried to experiment with the UPDATE query as:
UPDATE TABLE SET (pharm) = (SELECT COUNT(code) FROM TABLE GROUP BY code)
However, this doesn't work, and I am quite sure it is not the right way to do it. I guess I need to build some function to do this type of update?
You can do it with a join of the table to your query:
update tablename t
set pharm = g.counter
from (
select code, count(*) counter
from tablename
group by code
) g
where g.code = t.code;
or:
update tablename t
set pharm = (select count(*) from tablename where code = t.code);
or:
update tablename t
set pharm = (select count(*) filter (where code = t.code) from tablename);
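With the sample data from the question (three AB rows and two CD rows), any of these forms sets pharm to 3 for the AB rows and 2 for the CD rows. A quick check, using a hypothetical table name:
create table pharmacy (code text, pharm int);
insert into pharmacy (code) values ('AB'), ('AB'), ('AB'), ('CD'), ('CD');

update pharmacy t
set pharm = g.counter
from (
    select code, count(*) counter
    from pharmacy
    group by code
) g
where g.code = t.code;

select code, pharm from pharmacy order by code;
-- AB | 3  (three rows)
-- CD | 2  (two rows)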
I have a query like this:
SELECT
table1.*,
sum(table2.amount) as totalamount
FROM table1
join table2 on table1.key = table2.key
GROUP BY table1.*;
I got the error: column "table1.key" must appear in the GROUP BY clause or be used in an aggregate function.
Is there any way to group by "all" fields?
There is no shortcut syntax for grouping by all columns, but it's probably not necessary in the described case. If the key column is a primary key, it's enough to use only it:
GROUP BY table1.key;
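Applied to the query from the question, that gives the following; PostgreSQL allows selecting table1.* here because, with key as the primary key, all other columns of table1 are functionally dependent on it:
SELECT
    table1.*,
    sum(table2.amount) as totalamount
FROM table1
join table2 on table1.key = table2.key
GROUP BY table1.key;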
You have to specify in GROUP BY all the column names that are selected and are not part of an aggregate function (SUM/COUNT, etc.):
select c1,c2,c4,sum(c3) FROM t
group by c1,c2,c4;
A shortcut to avoid writing the columns again in group by is to refer to them by their positions in the select list:
select c1,c2,c4,sum(c3) FROM t
group by 1,2,3;
I found another way to solve this; it's not perfect, but maybe it's useful:
SELECT string_agg(column_name::character varying, ',') as columns
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table';
Then apply the result of this select to the main query, like this:
$columns = $result[0]["columns"];
SELECT
table1.*,
sum(table2.amount) as totalamount
FROM table1
join table2 on table1.key = table2.key
GROUP BY $columns;
I need to upload multiple Excel files to a PostgreSQL table, but they can overlap each other in several records, so I need to watch out for IntegrityErrors. I'm following two approaches:
cursor.copy_from: the fastest approach, but I don't know how to catch and handle all the IntegrityErrors due to duplicate records:
streamCSV = StringIO()
streamCSV.write(invoicing_info.to_csv(index=None, header=None, sep=';'))
streamCSV.seek(0)
with conn.cursor() as c:
    c.copy_from(streamCSV, "staging.table_name", columns=invoicing_info.columns, sep=';')
    conn.commit()
cursor.execute: I can count and handle each exception, but it is very slow:
data = invoicing_info.to_dict(orient='records')
with cursor as c:
    for entry in data:
        try:
            c.execute(DLL_INSERT, entry)
            successful_inserts += 1
            connection.commit()
            print('Successful insert. Operation number {}'.format(successful_inserts))
        except psycopg2.IntegrityError as duplicate:
            duplicate_registers += 1
            connection.rollback()
            print('Duplicate entry. Operation number {}'.format(duplicate_registers))
At the end of the routine, I need to determine the following info:
print("Initial shape: {}".format(invoicing_info.shape))
print("Successful inserts: {}".format(successful_inserts))
print("Duplicate entries: {}".format(duplicate_registers))
How can I modify the first approach to control all exceptions? How can I optimize the second approach?
Since you have duplicate IDs across different Excel sheets, you first have to decide for yourself which Excel sheet's data to trust.
If you use multiple tables and take the approach of keeping at least one row from each conflicting set, you can always do the following:
create a temporary table for each Excel sheet
upload the data of each Excel sheet to its table (in bulk, like you do now)
make an insert from a select picking DISTINCT ON (id), in this manner:
INSERT INTO staging.table_name(id, col1, col2 ...)
SELECT DISTINCT ON(id)
id, col1, col2
FROM
(
SELECT id, col1, col2 ...
FROM staging.temp_table_for_excel_sheet1
UNION
SELECT id, col1, col2 ...
FROM staging.temp_table_for_excel_sheet2
UNION
SELECT id, col1, col2 ...
FROM staging.temp_table_for_excel_sheet3
) as data
With such an insert, PostgreSQL will keep an arbitrary row out of each set of rows sharing a non-unique id.
In case you would like to trust the first record, you can add some ordering:
INSERT INTO staging.table_name(id, col1, col2 ...)
SELECT DISTINCT ON(id)
id, col1, col2
FROM
(
SELECT id, 1 as ordering_column, col1, col2 ...
FROM staging.temp_table_for_excel_sheet1
UNION
SELECT id, 2 as ordering_column, col1, col2 ...
FROM staging.temp_table_for_excel_sheet2
UNION
SELECT id, 3 as ordering_column, col1, col2 ...
FROM staging.temp_table_for_excel_sheet3
) as data
ORDER BY id, ordering_column
For the initial count of objects:
SELECT sum(count)
FROM
(
SELECT count(*) as count FROM temp_table_for_excel_sheet1
UNION ALL
SELECT count(*) as count FROM temp_table_for_excel_sheet2
UNION ALL
SELECT count(*) as count FROM temp_table_for_excel_sheet3
) as data
(UNION ALL is needed here so that equal counts from different sheets are not collapsed.)
After finishing these bulk inserts, you can run select count(*) FROM staging.table_name to get the total number of inserted records.
For the duplicate count you can run:
SELECT sum(count)
FROM
(
SELECT count(*) as count
FROM temp_table_for_excel_sheet2 WHERE id in (select id FROM temp_table_for_excel_sheet1)
UNION ALL
SELECT count(*) as count
FROM temp_table_for_excel_sheet3 WHERE id in (select id FROM temp_table_for_excel_sheet1)
UNION ALL
SELECT count(*) as count
FROM temp_table_for_excel_sheet3 WHERE id in (select id FROM temp_table_for_excel_sheet2)
) as data
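As a side note, not part of the approach above: if staging.table_name has a primary key or unique constraint on id, a similar keep-one-row-per-id effect can be achieved with ON CONFLICT, e.g.:
INSERT INTO staging.table_name(id, col1, col2)
SELECT id, col1, col2 FROM staging.temp_table_for_excel_sheet1
ON CONFLICT (id) DO NOTHING;
-- repeat per temp table; rows whose id already exists in the target are skipped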
If the Excel sheets contain duplicate records, Pandas seems a likely choice for identifying and eliminating dupes: https://33sticks.com/python-for-business-identifying-duplicate-data/. Or is the issue that different records in different sheets have the same id/index? If so, a similar approach could work where you use Pandas to isolate the ids used multiple times and then correct them with unique identifiers before attempting to upload to the SQL db.
For a bulk upload, I'd use an ORM. SQLAlchemy has some great info on bulk uploads: http://docs.sqlalchemy.org/en/rel_1_0/orm/persistence_techniques.html#bulk-operations, and there's a related discussion here: Bulk insert with SQLAlchemy ORM