I try to merge SDC IPO data to Compustat on cusip codes. SDC offers both cusip9 (with some missings) and cusip6 (no missings), and Compustat only cusip9. I substring Compustat to get "my own" cusip6 variable in Compustat. However, I would rather first merge on the given cusip9, and only in case there is a missing in cusip9, then merge on cusip6.
So far, my code is the standard merging code on a proc sql step:
proc sql;
create table sdc_comp
as select a.*, b.*
from compustat as a left join sdc as b
on a.cusip9 = b.cusip9
order by ipodate, gvkey, fyear;
quit;
Any suggestions? If a data step would do this better, no problem with that. Thanks in advance.
Try to use coalsece function. It will return first nonmissing value (it will work in your case with numeric values). Try to get cusip6 value then try to get cusip9 afterwards.
It will look like: SELECT COALESCE(cusip6, cusip9);
Related
I'm trying to use MERGE, in HANA, to insert, or update a table without a second table as a source. This must be done in one command, no store procedures. Also, UPSERT won't work in this case.
I found this answer for SQL, but HANA doesn't seem to like VALUES in the USING clause.
SQL Server MERGE without a source table
Here is the answer for SQL from the post above:
MERGE TARGET_TABLE AS I
USING (VALUES ('VALUE1','VALUE2')) as s(COL1,COL2)
ON I.COL1 = s.COL1
WHEN MATCHED THEN
...
WHEN NOT MATCHED THEN
...
Thanks.
The MERGE INTO command is specifically designed for ETL-type use cases where data from one table should be merged into another table and its data in there. A single tuple insert is possible via a subquery like so:
select * from t; -- single integer column 'C'
insert into t values (2);
c
-
2
merge command, inserting 4 or updating matches to 100
merge into t
using (select 4 c from dummy) s
on t.c = s.c
when matched then
update set t.c = 100
when not matched then
insert values (s.c);
c
--
2
4
run the merge command again
c
--
2
100
So, that works just fine.
As for the UPSERT/REPLACE command using VALUES is perfectly possible and even explained in the command examples in the reference documentation.
In my table results from column work_time (interval type) display as 200:00:00. Is it possible to cut the seconds part, so it will be displayed as 200:00? Or, even better: 200h00min (I've seen it accepts h unit in insert so why not load it like this?).
Preferably, by altering work_time column, not by changing the select query.
This is not something you should do by altering a column but by changing the select query in some way. If you change the column you are changing storage and functional uses, and that's not good. To change it on output, you need to modify how it is retrieved.
You have two basic options. The first is to modify your select queries directly, using to_char(myintervalcol, 'HH24:MI')
However if your issue is that you have a common format you want to have universal access to in your select query, PostgreSQL has a neat trick I call "table methods." You can attach a function to a table in such a way that you can call it in a similar (but not quite identical) syntax to a new column. In this case you would do something like:
CREATE OR REPLACE FUNCTION myinterval_nosecs(mytable) RETURNS text LANGUAGE SQL
IMMUTABLE AS
$$
SELECT to_char($1.myintervalcol, 'HH24:MI');
$$;
This works on the row input, not on the underlying table. As it always returns the same information for the same input, you can mark it immutable and even index the output (meaning it can be run at plan time and indexed used).
To call this, you'd do something like:
SELECT myinterval_nosecs(m) FROM mytable m;
But you can then use the special syntax above to rewrite that as:
SELECT m.myinterval_nosecs FROM mytable m;
Note that since myinterval_nosecs is a function you cannot omit the m. at the beginning. This is because the query planner will rewrite the query in the former syntax and will not guess as to which relation you mean to run it against.
I need to execute statements conditionally in DB2. I searched for DB2 documentation and though if..then..elseif will serves the purpose. But can't i use if without a procedure.?
My DB2 verion is 9.7.6.
My requirement is I have a table say Group(name,gp_id). And I have another table Group_attr(gp_id,value,elem_id). We can ignore ant the elem_id for the requirement now.
-> I need to check the Group if it have a specific name.
-> If it has then nothing to be done.
-> If it doesn't have I need to add it to the Group. Then I need to insert corresponding rows in the Group_attr. Assume the value and elem_id are static.
You can use an anonymous block for PL/SQL or a compound statement for SQL PL code.
BEGIN ATOMIC
FOR ROW AS
SELECT PK, C1, DISCRETIZE(C1) AS D FROM SOURCE
DO
IF ROW.D IS NULL THEN
INSERT INTO EXCEPT VALUES(ROW.PK, ROW.C1);
ELSE
INSERT INTO TARGET VALUES(ROW.PK, ROW.D);
END IF;
END FOR;
END
Compound statements :
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0004240.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.apdv.sqlpl.doc/doc/c0053781.html
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0004239.html
Anonymous block :
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/topic/com.ibm.db2.luw.apdv.plsql.doc/doc/c0053848.html
http://www.ibm.com/developerworks/data/library/techarticle/dm-0908anonymousblocks/
Many ot this features come since version 9.7
I got a solution for the conditional insert. For the scenario that I mentioned, the solution can be like this.
Insert into Group(name) select 'Name1' from sysibm.sysdummy1 where (select count(*) from Group where name='Name1')=0
Insert into Group_attr(gp_id,value,elem_id) select g.gp_id,'value1','elem1' Group g,group_attr ga where ga.gp_id=g.gp_id and (select count(*) from Group_attr Ga1 where Ga.gp_id=g.gp_id)=0
-- In my case Group_attr will contain some data for sure if the group has exists already
Is it possible to declare a variable within a View? For example:
Declare #SomeVar varchar(8) = 'something'
gives me the syntax error:
Incorrect syntax near the keyword 'Declare'.
You are correct. Local variables are not allowed in a VIEW.
You can set a local variable in a table valued function, which returns a result set (like a view does.)
http://msdn.microsoft.com/en-us/library/ms191165.aspx
e.g.
CREATE FUNCTION dbo.udf_foo()
RETURNS #ret TABLE (col INT)
AS
BEGIN
DECLARE #myvar INT;
SELECT #myvar = 1;
INSERT INTO #ret SELECT #myvar;
RETURN;
END;
GO
SELECT * FROM dbo.udf_foo();
GO
You could use WITH to define your expressions. Then do a simple Sub-SELECT to access those definitions.
CREATE VIEW MyView
AS
WITH MyVars (SomeVar, Var2)
AS (
SELECT
'something' AS 'SomeVar',
123 AS 'Var2'
)
SELECT *
FROM MyTable
WHERE x = (SELECT SomeVar FROM MyVars)
EDIT: I tried using a CTE on my previous answer which was incorrect, as pointed out by #bummi. This option should work instead:
Here's one option using a CROSS APPLY, to kind of work around this problem:
SELECT st.Value, Constants.CONSTANT_ONE, Constants.CONSTANT_TWO
FROM SomeTable st
CROSS APPLY (
SELECT 'Value1' AS CONSTANT_ONE,
'Value2' AS CONSTANT_TWO
) Constants
#datenstation had the correct concept. Here is a working example that uses CTE to cache variable's names:
CREATE VIEW vwImportant_Users AS
WITH params AS (
SELECT
varType='%Admin%',
varMinStatus=1)
SELECT status, name
FROM sys.sysusers, params
WHERE status > varMinStatus OR name LIKE varType
SELECT * FROM vwImportant_Users
also via JOIN
WITH params AS ( SELECT varType='%Admin%', varMinStatus=1)
SELECT status, name
FROM sys.sysusers INNER JOIN params ON 1=1
WHERE status > varMinStatus OR name LIKE varType
also via CROSS APPLY
WITH params AS ( SELECT varType='%Admin%', varMinStatus=1)
SELECT status, name
FROM sys.sysusers CROSS APPLY params
WHERE status > varMinStatus OR name LIKE varType
Yes this is correct, you can't have variables in views
(there are other restrictions too).
Views can be used for cases where the result can be replaced with a select statement.
Using functions as spencer7593 mentioned is a correct approach for dynamic data. For static data, a more performant approach which is consistent with SQL data design (versus the anti-pattern of writting massive procedural code in sprocs) is to create a separate table with the static values and join to it. This is extremely beneficial from a performace perspective since the SQL Engine can build effective execution plans around a JOIN, and you have the potential to add indexes as well if needed.
The disadvantage of using functions (or any inline calculated values) is the callout happens for every potential row returned, which is costly. Why? Because SQL has to first create a full dataset with the calculated values and then apply the WHERE clause to that dataset.
Nine times out of ten you should not need dynamically calculated cell values in your queries. Its much better to figure out what you will need, then design a data model that supports it, and populate that data model with semi-dynamic data (via batch jobs for instance) and use the SQL Engine to do the heavy lifting via standard SQL.
What I do is create a view that performs the same select as the table variable and link that view into the second view. So a view can select from another view. This achieves the same result
How often do you need to refresh the view? I have a similar case where the new data comes once a month; then I have to load it, and during the loading processes I have to create new tables. At that moment I alter my view to consider the changes.
I used as base the information in this other question:
Create View Dynamically & synonyms
In there, it is proposed to do it 2 ways:
using synonyms.
Using dynamic SQL to create view (this is what helped me achieve my result).
I have a complex query that requires a rank in it. I've learned that the standard way of doing that is by using the technique found on this page: http://thinkdiff.net/mysql/how-to-get-rank-using-mysql-query/. I'm using Infobright as the back end and it doesn't work quite as expected. That is, while a standard MySQL engine would show the rank as 1, 2, 3, 4, etc... Brighthouse (Infobright's engine) would return 1, 1, 1, 1, etc.... So I came up with a strategy of setting a variable, a function, and then execute it in the query. Here's a proof of concept query that does just that:
SET #rank = 0;
DROP FUNCTION IF EXISTS __GetRank;
DELIMITER $$
CREATE FUNCTION __GetRank() RETURNS INT
BEGIN
SET #rank = #rank + 1;
return #rank;
END$$
DELIMITER ;
select __GetRank() AS rank, id from accounts;
I then copied and pasted the function into Jasper Report's iReport and then compiled my report. After executing it, I get syntax errors. So I thought that perhaps the ; was throwing it off. So at the top of the query, I put in DELIMITER ;. This did not work either.
Is what I'm wanting to do even possible? If so, how? And if there's an Infobright way of getting a rank without writing a function, I'd be open to that too.
Infobright does not support functions.
From the site: http://www.infobright.org/forums/viewthread/1871/#7485
Indeed, IB supports stored procedures, but does not support stored functions nor user defined functions.
select if(#rank is null,#rank:= 0,#rank:= #rank +1) as rank, id from accounts
Does not work, because you cannot write to #vars in queries.
This:
SELECT
(SELECT COUNT(*)
FROM mytable t1
WHERE t1.rankedcolumn > t2.rankedcolumn) AS rank,
t2.rankedcolumn
FROM mytable t2 WHERE ...;
will work, but is very slow of course.
Disclaimer, not my code, but Jakub Wroblewski's (Infobright founder)
Hope this helps...
Here's how I solved this. I had my server side program execute a mysql script. I then took the output and converted it to a CSV. I then used this as the input data for my report. A little convoluted, but it works.