DB2 to Netezza Migration

I have a query in DB2, shown below.
What would the equivalent syntax be in Netezza?
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);

First, I don't think your statement is valid:
select distinct acct_num from GTD_demo_dim where ACCT_NUM fetch first 1 rows only);
The where clause is incomplete, and there is a closing parenthesis without an opening one.
fetch first ... rows only is standard SQL syntax, so it may well work as-is. However, the usual way to do this in Netezza is with limit. All that said, this is how I'd write the query to get the intended result (omitting your where clause, since I can't infer its intent):
select distinct acct_num from gtd_demo_dim limit 1;

Related

Pivot function without manually typing values in `for in`?

Documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?

I don't think this can be done in a single query. The query compiler would have to plan the query without knowing how many output columns it will produce, and I don't think it can do that.
You can do this with multiple queries: use one query to produce the list of partnames, then use that list to generate a second query with the IN list filled in. Something has to issue the first query and generate the second. That can be code external to Redshift (lots of options) or a stored procedure in Redshift. Wherever this code lives, it should account for Redshift's limit on the number of columns, which is 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL in stored procedures. In a stored procedure, the EXECUTE statement is what fires off the second query. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
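As a rough sketch of the "external code" option, the second query can be generated from the list of part names with a few lines of Python. The function name build_pivot_query and the MAX_COLUMNS constant are made up for illustration, and fetching the names (for example with select distinct partname from part) is assumed to happen elsewhere with whatever client you use.

```python
MAX_COLUMNS = 1600  # Redshift column limit mentioned above


def build_pivot_query(part_names):
    """Return a PIVOT statement whose IN list contains every given part name.

    Hypothetical helper: it only builds the SQL text, it does not run it.
    """
    if not part_names:
        raise ValueError("PIVOT needs at least one value in the IN list")
    if len(part_names) > MAX_COLUMNS:
        raise ValueError("too many distinct values for one result set")
    # Quote each value as a SQL string literal, doubling embedded single quotes.
    quoted = ", ".join("'" + name.replace("'", "''") + "'" for name in part_names)
    return (
        "SELECT *\n"
        "FROM (SELECT partname, price FROM part) PIVOT (\n"
        f"    AVG(price) FOR partname IN ({quoted})\n"
        ");"
    )


# The names would normally come from the first query; they are hard-coded here.
print(build_pivot_query(["prop", "rudder", "wing"]))
```

The generated text is then sent to Redshift as an ordinary query. A stored procedure would do the same string building in PL/pgSQL and pass the result to EXECUTE.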

PostgreSQL - must appear in the GROUP BY clause or be used in an aggregate function

I am getting this error with PostgreSQL in production mode, but it's working fine with sqlite3 in development mode.
ActiveRecord::StatementInvalid in ManagementController#index
PG::Error: ERROR: column "estates.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = ...
^
: SELECT "estates".* FROM "estates" WHERE "estates"."Mgmt" = 'Mazzey' GROUP BY user_id
@myestate = Estate.where(:Mgmt => current_user.Company).group(:user_id).all

If user_id is the PRIMARY KEY then you need to upgrade PostgreSQL; newer versions will correctly handle grouping by the primary key.
If user_id is neither unique nor the primary key for the 'estates' relation in question, then this query doesn't make much sense, since PostgreSQL has no way to know which value to return for each column of estates when multiple rows share the same user_id. You must either use an aggregate function that expresses what you want, like min, max, avg, string_agg, array_agg, etc., or add the column(s) of interest to the GROUP BY.
Alternately you can rephrase the query to use DISTINCT ON and an ORDER BY if you really do want to pick a somewhat arbitrary row, though I really doubt it's possible to express that via ActiveRecord.
Some databases - including SQLite and MySQL - will just pick an arbitrary row. This is considered incorrect and unsafe by the PostgreSQL team, so PostgreSQL follows the SQL standard and considers such queries to be errors.
If you have:
col1 col2
fred 42
bob 9
fred 44
fred 99
and you do:
SELECT col1, col2 FROM mytable GROUP BY col1;
then it's obvious that you should get the row:
bob 9
but what about the result for fred? There is no single correct answer to pick, so the database will refuse to execute such unsafe queries. If you wanted the greatest col2 for any col1 you'd use the max aggregate:
SELECT col1, max(col2) AS max_col2 FROM mytable GROUP BY col1;
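To see the aggregate form in action, here is a small sketch that loads the sample rows above into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a convenient stand-in here; the aggregate query itself is plain SQL, and the extra count and min columns are additions for illustration.

```python
import sqlite3

# In-memory database holding the four sample rows from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("fred", 42), ("bob", 9), ("fred", 44), ("fred", 99)],
)

# Every selected column is either in GROUP BY or wrapped in an aggregate,
# so there is exactly one well-defined output row per col1 value.
rows = conn.execute(
    """
    SELECT col1, count(*) AS n, min(col2) AS min_col2, max(col2) AS max_col2
    FROM mytable
    GROUP BY col1
    ORDER BY col1
    """
).fetchall()

for row in rows:
    print(row)
```

Each aggregate states explicitly which value should represent the group, which is the choice the database refuses to make on its own.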

I recently moved from MySQL to PostgreSQL and encountered the same issue. Just for reference, the best approach I've found is to use DISTINCT ON as suggested in this SO answer:
Elegant PostgreSQL Group by for Ruby on Rails / ActiveRecord
This will let you get one record for each unique value in your chosen column that matches the other query conditions:
MyModel.where(:some_col => value).select("DISTINCT ON (unique_col) *")
I prefer DISTINCT ON because I can still get all the other column values in the row. DISTINCT alone will only return the value of that specific column.

After often receiving the error myself I realised that Rails (I am using rails 4) automatically adds an 'order by id' at the end of your grouping query. This often results in the error above. So make sure you append your own .order(:group_by_column) at the end of your Rails query. Hence you will have something like this:
@problems = Problem.select('problems.username, sum(problems.weight) as weight_sum').group('problems.username').order('problems.username')

@myestate1 = Estate.where(:Mgmt => current_user.Company)
@myestate = @myestate1.select("DISTINCT(user_id)")
This is what I did.

Outputting results from multiple sql queries in postgresql

I have postgresql-9.2 installed on my local machine (running windows 7) and I am also the administrator. I am using the Query Tool of pgAdmin III to query my database. My problem is as follows:
Say I have two tables Table_A and Table_B with different number of columns. Also, say I have following two very simple queries:
select * from Table_A;
select * from Table_B;
I want to run both of these queries and see the output from both together. I don't mind whether I see the output in the GUI or in a file.
I also tried the copy command, outputting to a CSV file. But instead of appending to the file, it overwrites it, so I always end up with the results of the second query only. The same thing happens in the GUI.
It is really annoying to comment out one query, run the other, output to two different files, and then merge those two files.

This is not currently supported by PostgreSQL - from the docs
(http://www.postgresql.org/docs/9.4/interactive/libpq-exec.html):
The command string can include multiple SQL commands (separated by semicolons). Multiple queries sent in a single PQexec call are processed in a single transaction, unless there are explicit BEGIN/COMMIT commands included in the query string to divide it into multiple transactions. Note however that the returned PGresult structure describes only the result of the last command executed from the string. Should one of the commands fail, processing of the string stops with it and the returned PGresult describes the error condition.

Your problem does not depend on the client.
Assuming all columns to be of type text, try this query:
SELECT col_a AS col_ac, col_b AS col_bd
,NULL::text AS col_e, NULL::text AS col_f
FROM table_a
UNION ALL
SELECT col_c, col_d, col_e, col_f
FROM table_b;
Column names and data types are defined by the first branch of a UNION SELECT. The rest has to fall in line.
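A minimal sketch of the padding idea, run against an in-memory SQLite database from Python so it is easy to try. The sample rows are invented, and CAST(NULL AS TEXT) replaces the PostgreSQL-specific NULL::text because SQLite does not support the :: cast syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (col_a TEXT, col_b TEXT);
    CREATE TABLE table_b (col_c TEXT, col_d TEXT, col_e TEXT, col_f TEXT);
    INSERT INTO table_a VALUES ('a1', 'b1');
    INSERT INTO table_b VALUES ('c1', 'd1', 'e1', 'f1');
    """
)

# The narrower table is padded with NULL columns so both branches
# return the same number of columns.
cursor = conn.execute(
    """
    SELECT col_a AS col_ac, col_b AS col_bd,
           CAST(NULL AS TEXT) AS col_e, CAST(NULL AS TEXT) AS col_f
    FROM table_a
    UNION ALL
    SELECT col_c, col_d, col_e, col_f
    FROM table_b
    """
)
column_names = [description[0] for description in cursor.description]
rows = cursor.fetchall()

print(column_names)
for row in rows:
    print(row)
```

The result set takes its column names from the first branch, and the padded columns come back as NULL (None in Python) for rows from table_a.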

The PSQL tool in the top menu under TOOLS (pgadmin4) gives results of multiple queries, unlike the query tool. In the PSQL command line tool, you can enter two or more queries separated by a semicolon and you'll get the results of each query displayed. The downside is that this is a command line tool so the results are not ideal if you have a lot of data. I use this when I have a lot of updates to string together and I want to see the number of rows updated in each. This would work well for select queries with small results.

You can use UNION ALL, but you need to make sure each sub query has the same number of columns.
SELECT 'a', 'b'
UNION ALL
SELECT 'c' ;
won't work.
SELECT 'a', 'b'
UNION ALL
SELECT 'c', 'd'
will work

Proper GROUP BY syntax

I'm fairly proficient in MySQL and MSSQL, but I'm just getting started with Postgres. I'm sure this is a simple issue, so to be brief:
SQL error:
ERROR: column "incidents.open_date" must appear in the GROUP BY clause or be used in an aggregate function
In statement:
SELECT date(open_date), COUNT(*)
FROM incidents
GROUP BY 1
ORDER BY open_date
The type for open_date is timestamp with time zone, and I get the same results if I use GROUP BY date(open_date).
I've tried going over the postgres docs and some examples online, but everything seems to indicate that this should be valid.

The problem is with the unadorned open_date in the ORDER BY clause.
This should do it:
SELECT date(open_date), COUNT(*)
FROM incidents
GROUP BY date(open_date)
ORDER BY date(open_date);
This would also work (though I prefer not to use integers to refer to columns for maintenance reasons):
SELECT date(open_date), COUNT(*)
FROM incidents
GROUP BY 1
ORDER BY 1;
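The corrected query can be tried quickly with Python's sqlite3 module, as in the sketch below. The sample timestamps and the column aliases are invented for illustration. Note that SQLite is more lenient than PostgreSQL and would not raise the original error, so this only demonstrates the corrected form, where GROUP BY and ORDER BY use the same expression as the select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE incidents (open_date TEXT)")
conn.executemany(
    "INSERT INTO incidents VALUES (?)",
    [
        ("2024-01-01 08:00:00",),
        ("2024-01-01 17:30:00",),
        ("2024-01-02 09:15:00",),
    ],
)

# Group and order by the same expression that appears in the select list,
# rather than by the raw open_date column.
rows = conn.execute(
    """
    SELECT date(open_date) AS open_day, COUNT(*) AS incident_count
    FROM incidents
    GROUP BY date(open_date)
    ORDER BY date(open_date)
    """
).fetchall()

for row in rows:
    print(row)
```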

"open_date" is not in your select list, "date(open_date)" is.
Either of these will work:
order by date(open_date)
order by 1
You can also name your columns in the select statement, and then refer to that alias:
select date(open_date) "alias" ... order by alias
Some databases require the keyword, AS, before the alias in your select.

Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094

SELECT DISTINCT tblJobReq.JobReqId
, tblJobReq.JobStatusId
, tblJobClass.JobClassId
, tblJobClass.Title
, tblJobReq.JobClassSubTitle
, tblJobAnnouncement.JobClassDesc
, tblJobAnnouncement.EndDate
, tblJobAnnouncement.AgencyMktgVerbage
, tblJobAnnouncement.SpecInfo
, tblJobAnnouncement.Benefits
, tblSalary.MinRateSal
, tblSalary.MaxRateSal
, tblSalary.MinRateHour
, tblSalary.MaxRateHour
, tblJobClass.StatementEval
, tblJobReq.ApprovalDate
, tblJobReq.RecruiterId
, tblJobReq.AgencyId
FROM ((tblJobReq
LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId = tblJobAnnouncement.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode
WHERE (tblJobReq.JobClassId in (SELECT JobClassId
from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
When I try to execute the query, it results in the following error:
Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094
I checked and didn't find any solution. The only way I found is to truncate (substring()) tblJobAnnouncement.JobClassDesc in the query, a column whose size is around 8000.
Is there a workaround so that I don't need to truncate the values? Or can this query be optimised? Is there a relevant setting in SQL Server 2000?

The [non obvious] reason why SQL needs to SORT is the DISTINCT keyword.
Depending on the data and underlying table structures, you may be able to do away with this DISTINCT, and hence not trigger this error.
You readily found the alternative solution which is to truncate some of the fields in the SELECT list.
Edit: Answering "Can you please explain how DISTINCT would be the reason here?"
Generally, the way in which the DISTINCT requirement is satisfied varies with:
the data context (expected number of rows, presence/absence of an index, size of a row...);
the version/make of the SQL implementation (the query optimizer in particular receives new or modified heuristics with each new version, sometimes resulting in alternate query plans for various constructs in various contexts).
Yet, all the possible plans associated with a "DISTINCT query" involve *some form* of sorting of the qualifying records. In its simplest form, the plan first produces the list of qualifying rows (the records which satisfy the WHERE/JOINs/etc. parts of the query) and then sorts this list (which possibly includes some duplicates), retaining only the very first occurrence of each distinct row. In other cases, for example when only a few columns are selected and some index(es) covering these columns is (are) available, no explicit sorting step is used in the query plan, but the reliance on an index implies the "sortability" of the underlying columns. In yet other cases, the query optimizer selects steps involving various forms of merging or hashing, and these too, eventually, imply the ability to compare two rows.
Bottom line: DISTINCT implies some sorting.
In the specific case of the question, the error reported by SQL Server, which prevents the query from completing, is that sorting is not possible on rows bigger than a given size, and the DISTINCT keyword is the only apparent reason for the query to require any sorting (many other SQL constructs also imply sorting, UNION for example). Hence the idea of removing the DISTINCT, if that is logically possible.
In fact you should remove it, for test purposes, to confirm that the query completes OK without DISTINCT (even if the result includes some duplicates). Once this is confirmed, and if the query can indeed produce duplicate rows, look into ways of producing a duplicate-free query without the DISTINCT keyword; constructs involving subqueries can sometimes be used for this purpose.
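As an illustration of the subquery idea, the sketch below uses an in-memory SQLite database from Python with two made-up tables, job_req and job_tag. A plain join returns one row per matching tag, so a job can appear twice, while an EXISTS filter returns each job at most once without any DISTINCT. This is a stand-in, not SQL Server 2000, and it has not been checked whether the rewritten form avoids the sort limit there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_req (job_req_id INTEGER PRIMARY KEY, long_desc TEXT);
    CREATE TABLE job_tag (job_req_id INTEGER, tag TEXT);
    INSERT INTO job_req VALUES (1, 'first long description');
    INSERT INTO job_req VALUES (2, 'second long description');
    INSERT INTO job_tag VALUES (1, 'therapy');
    INSERT INTO job_tag VALUES (1, 'family');
    INSERT INTO job_tag VALUES (2, 'admin');
    """
)

# Joining to the many-side table repeats job 1 once per matching tag.
joined = conn.execute(
    """
    SELECT r.job_req_id, r.long_desc
    FROM job_req AS r
    JOIN job_tag AS t ON t.job_req_id = r.job_req_id
    WHERE t.tag IN ('therapy', 'family')
    ORDER BY r.job_req_id
    """
).fetchall()

# EXISTS only tests whether a match exists, so each job appears at most once.
filtered = conn.execute(
    """
    SELECT r.job_req_id, r.long_desc
    FROM job_req AS r
    WHERE EXISTS (
        SELECT 1 FROM job_tag AS t
        WHERE t.job_req_id = r.job_req_id
          AND t.tag IN ('therapy', 'family')
    )
    ORDER BY r.job_req_id
    """
).fetchall()

print(len(joined), len(filtered))
```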
An unrelated hint is to use table aliases, short strings that save you from repeating these long table names. For example (I only did a few tables, but you get the idea...):
SELECT DISTINCT JR.JobReqId, JR.JobStatusId,
tblJobClass.JobClassId, tblJobClass.Title,
JR.JobClassSubTitle, JA.JobClassDesc, JA.EndDate, JA.AgencyMktgVerbage,
JA.SpecInfo, JA.Benefits,
S.MinRateSal, S.MaxRateSal, S.MinRateHour, S.MaxRateHour,
tblJobClass.StatementEval,
JR.ApprovalDate, JR.RecruiterId, JR.AgencyId
FROM (
(tblJobReq AS JR
LEFT JOIN tblJobAnnouncement AS JA ON JR.JobReqId = JA.JobReqId)
INNER JOIN tblJobClass ON JR.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary AS S ON tblJobClass.SalaryCode = S.SalaryCode
WHERE (JR.JobClassId in
(SELECT JobClassId from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))

FYI, running this SQL command on your DB can fix the problem if it is caused by space that needs to be reclaimed after dropping variable length columns:
DBCC CLEANTABLE (0,[dbo.TableName])
See: http://msdn.microsoft.com/en-us/library/ms174418.aspx

This is a limitation of SQL Server 2000. You can:
Split it into two queries and combine elsewhere
SELECT ID, ColumnA, ColumnB FROM TableA JOIN TableB
SELECT ID, ColumnC, ColumnD FROM TableA JOIN TableB
Truncate the columns appropriately
SELECT LEFT(LongColumn,2000)...
Remove any redundant columns from the SELECT
SELECT ColumnA, ColumnB --, IDColumnNotUsedInOutput
FROM TableA
Migrate off of SQL Server 2000
