If you query a table's columns from dbc.columns, you get full metadata for every column: data type, length, nullability, and so on. When querying views, you only get the database, table, and column names; all the other fields are null.
If I have a view that does nothing but select * from a table, it seems that the underlying table's metadata should propagate to the view when it's compiled into database objects. This should hold even for calculated columns, since my experiments have shown that Teradata analyzes all possible logic paths to determine a calculated column's type. Here's an example:
replace view mydb.testview as
select case when 1 = 1 then 'a' else 'aaaa' end a;
create table mydb.testviewtotable as (select * from mydb.testview) with data;
show table mydb.testviewtotable;
In that case statement, only the first condition will ever return true, so the result will always be 'a'. However, when you look at the table DDL, you can see that it calculates the column as VARCHAR(4) which proves that it analyzes all cases:
a VARCHAR(4) CHARACTER SET UNICODE NOT CASESPECIFIC
Therefore, it seems reasonable to assume that this view metadata exists somewhere even though querying that view through DBC results in nulls for all but the aforementioned columns.
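For reference, a sketch of the kind of dictionary query being discussed, assuming the standard DBC views (dbc.ColumnsV and its ColumnName/ColumnType/ColumnLength/Nullable columns exist in current Teradata releases, but verify against your version):

```sql
-- Against the view, ColumnType/ColumnLength come back NULL
SELECT ColumnName, ColumnType, ColumnLength, Nullable
FROM dbc.ColumnsV
WHERE DatabaseName = 'mydb'
  AND TableName = 'testview';

-- HELP COLUMN, by contrast, resolves view column types on request
HELP COLUMN mydb.testview.*;
```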
I have two tables (a simplified example, of course) that I loaded into the app from different sources by script.
Table 1:
ID | Attribute T1
1  | 100
3  | 200
Table 2:
ID | Attribute T2
1  | Value 1
2  | Value 2
On a list I create a table with these columns:
ID | Attribute T1 | Attribute T2
Finally I have this table:
ID | Attribute T1 | Attribute T2
1  | 100          | Value 1
2  | -            | Value 2
3  | 200          | -
As you can see, this limits me in filtering and analyzing the data; for example, I can't show all data that isn't represented in Table 1, or all data where Attribute T1 is not equal to 100.
I tried to use NullAsValue, but it didn't help. I would appreciate any ideas on how to handle my case.
To achieve what you're attempting, you'll need to Join or Concatenate your tables. The reason is that Null means something different depending on how the data is loaded.
There are basically two "types" of Null:
"Implied" Null
When you associate several tables in your data model, as you've done in your example, Qlik is essentially treating that as a natural outer join between the tables. But since it's not an actual join that happens when the script executes, the Nulls that arise from data incongruencies (like in your example) are basically implied, since there really is an absence of data there. There's nothing in the data or script that actually says "there are no Attribute T1 values for ID of 2." Because of that, you can't use a function like NullAsValue() or Coalesce() to replace Nulls with another value because those Nulls aren't even there -- there's nothing to actually replace.
The above tables don't have any actual Nulls -- just implied ones from their association and the fact that the ID fields in either table don't have all the same values.
"Realized" Null
If, instead of just using associations, you actually combine the tables using the Join or Concatenate prefixes, then Qlik is forced to actually generate a Null value in the absence of data. Instead of Null being implied, it's actually there in the data model -- it's been realized. In this case, we can actually use functions like NullAsValue() or Coalesce() or Alt() to replace Nulls with another value since we actually have something in our table to replace.
The above joined table has actual Nulls that are realized in the data model, so they can be replaced.
To replace Nulls at that point, you can use the NullAsValue() or Coalesce() functions like this in the Data Load Editor:
table1:
load * inline [
ID , Attribute T1
1 , 100
3 , 200
];
table2:
join load * inline [
ID , Attribute T2
1 , Value 1
2 , Value 2
];
NullAsValue [Attribute T1];
Set NullValue = '-NULL-';
new_table:
NoConcatenate load
ID
, [Attribute T1]
, Coalesce([Attribute T2], '-AlsoNULL-') as [Attribute T2]
Resident table1;
Drop Table table1;
That will result in a table like this:
ID | Attribute T1 | Attribute T2
1  | 100          | Value 1
2  | -NULL-       | Value 2
3  | 200          | -AlsoNULL-
The Coalesce() and Alt() functions are also available in chart expressions.
Here are some quick links to the things discussed here:
Qlik Null interpretation
Qlik table associations
NullAsValue() function
Coalesce() function
Alt() function
Background
I have a table with raster data (grib_data) created by using raster2pgsql.
I have created a second table (turb_mod) with a subset of the points in grib_data that has a value above a certain threshold.
This subset table (turb_mod) has been created with the following query
WITH turb AS (SELECT rid, rast, (ST_PixelAsPoints(rast)).val AS val
FROM grib_data
)
SELECT rid, rast INTO turb_mod
FROM turb WHERE val > 0.5;
The response when creating the table is "SELECT 53", indicating that turb_mod now holds 53 rows.
Problem
If I now try to return the raster data from turb_mod using the below query it returns all records from the original table, not the 53 that I am expecting
SELECT (ST_PixelAsPoints(rast)).x AS x FROM turb_mod;
Questions
Why does my query not return only the 53 records?
Is there a better way to create a table with a selection of raster points from the original table? I want to use the subset to apply further geospatial functions like spatial clustering.
In your final SELECT, you're calling the function ST_PixelAsPoints, which is a set-returning function. This results in an output row [being] generated for each element of the function's result set (reference), and can thus result in a different row count to that of your source table, turb_mod.
Your query is functionally equivalent to this (preferred) syntax:
SELECT points.x
FROM
turb_mod
JOIN LATERAL ST_PixelAsPoints(rast) points ON TRUE;
This syntax better shows what's happening, and also shows how you might choose to include more columns from the function's output, which may help to answer your second point.
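Building on that, one way to keep only the qualifying pixels is to expand the rasters to points once, at table-creation time, rather than storing whole rasters. A sketch under the question's schema (grib_data(rid, rast); the table name turb_points is made up here):

```sql
-- Store one row per pixel above the threshold, as point geometries.
-- ST_PixelAsPoints returns a set of records (geom, val, x, y) per raster.
CREATE TABLE turb_points AS
SELECT rid, p.x, p.y, p.val, p.geom
FROM grib_data
CROSS JOIN LATERAL ST_PixelAsPoints(rast) AS p
WHERE p.val > 0.5;
```

With points rather than duplicated rasters in the table, later geospatial functions such as spatial clustering can operate on turb_points.geom directly.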
I am fairly new to DB2 (and SQL in general), and I am having trouble finding an efficient method to DECODE columns.
Currently, the database has a number of tables, most of which store a significant number of their columns as numbers; these numbers correspond to a lookup table with the real values. We are talking 9,500 different values (e.g. '502 = yes' or '1413 = Graduate Student').
In any other situation, I would just use a WHERE clause to match them up, but since there are 20-30 columns that need to be decoded per table, I can't really do this (that I know of).
Is there a way to effectively just display the corresponding value from the other table?
Example:
SELECT TEST_ID, DECODE(TEST_STATUS, 5111, 'Approved', 5112, 'In Progress') TEST_STATUS
FROM TEST_TABLE
The above works fine.......but I manually look up the numbers and review them to build the statements. As I mentioned, some tables have 20-30 columns that would need this AND some need DECODE statements that would be 12-15 conditions.
Is there anything that would allow me to do something simpler like:
SELECT TEST_ID, DECODE(TEST_STATUS = *TableWithCodeValues*) TEST_STATUS
FROM TEST_TABLE
EDIT: Also, to be more clear, I know I can do a ton of INNER JOINS, but I wasn't sure if there was a more efficient way than that.
From a logical point of view, I would consider splitting the lookup table into several domain/dimension tables. Not sure if that is possible to do for you, so I'll leave that part.
As mentioned in my comment I would stay away from using DECODE as described in your post. I would start by doing it as usual joins:
SELECT a.TEST_STATUS
, b.TEST_STATUS_DESCRIPTION
, a.ANOTHER_STATUS
, c.ANOTHER_STATUS_DESCRIPTION
, ...
FROM TEST_TABLE as a
JOIN TEST_STATUS_TABLE as b
ON a.TEST_STATUS = b.TEST_STATUS
JOIN ANOTHER_STATUS_TABLE as c
ON a.ANOTHER_STATUS = c.ANOTHER_STATUS
JOIN ...
If things are too slow there are a couple of things you can try:
Create a statistical view that can help determine cardinalities from the joins (may help the optimizer creating a better plan):
https://www.ibm.com/support/knowledgecenter/sl/SSEPGG_9.7.0/com.ibm.db2.luw.admin.perf.doc/doc/c0021713.html
If your license permits, you can experiment with Materialized Query Tables (MQTs). Note that there is a penalty for modifications of the base tables, so if you have more of an OLTP workload, this is probably not a good idea:
https://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/index.html
A third option if your lookup table is fairly static is to cache the lookup table in the application. Read the TEST_TABLE from the database, and lookup descriptions in the application. Further improvements may be to add triggers that invalidate the cache when lookup table is modified.
If you don't want to do all these joins you could create yourself an own LOOKUP function.
create or replace function lookup(IN_ID INTEGER)
returns varchar(32)
deterministic reads sql data
begin atomic
declare OUT_TEXT varchar(32);--
set OUT_TEXT=(select text from test.lookup where id=IN_ID);--
return OUT_TEXT;--
end;
With a table TEST.LOOKUP like
create table test.lookup(id integer, text varchar(32))
containing some id/text pairs, this will return the text value corresponding to an id, or NULL if not found.
With your mentioned ~10k id/text pairs and an index on the ID field, this shouldn't be a performance issue, as that amount of data should easily be cached in the corresponding bufferpool.
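For illustration, the function could then stand in for the DECODE from the question (assuming the TEST.LOOKUP table above holds the code/description pairs):

```sql
-- One function call per coded column instead of a hand-built DECODE list
SELECT TEST_ID,
       lookup(TEST_STATUS) AS TEST_STATUS
FROM TEST_TABLE;
```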
As of some quirks in our DB model I am faced with a table that optionally links to itself. I want to write a query that selects each row in a way that either the original row is returned or - if present - the linked row.
SELECT
COALESCE(r2.*, r1.*)
FROM mytable r1
LEFT JOIN mytable r2
ON r1.sub_id = r2.id
While this works, all data is returned as tuples in a single column named 'coalesce' instead of the actual table columns.
How can I unpack those tuples to get the actual table rows or 'fix' the query to avoid it altogether?
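One PostgreSQL idiom that may help here (a sketch, not tested against your schema): coalesce the whole-row values r2 and r1 instead of r2.* and r1.*, then expand the resulting composite with .*, which unpacks it back into the table's columns:

```sql
SELECT (COALESCE(r2, r1)).*
FROM mytable r1
LEFT JOIN mytable r2
    ON r1.sub_id = r2.id;
```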
Suppose I have a table with a column that has repeats, e.g.
Column1
---------
a
a
a
a
b
a
c
d
e
... so on
Maybe it has hundreds of thousands of rows. Then say, I need to pull the distinct values from this column. I can do so easily in a SELECT with DISTINCT, but I'm wondering on performance?
I could also give each item in Column1 an id, and then create a new table referenced by Column1 (to normalize this more appropriately). Though, this adds extra complexity to making an insert, and adds in joins for other possible queries.
Is there some way to index just the distinct values in a column, or is the normalization thing the only way to go?
An index on column1 will considerably speed up processing of DISTINCT, but if you are willing to trade some space and some (short) extra time during insert/update/delete, you can resort to an indexed (materialized) view. You can think of it as a dynamic table, produced and maintained to follow the view definition.
create view view1
with schemabinding
as
select column1,
count_big(*) cnt
from theTable
group by column1
-- create unique clustered index ix_view1 on view1(column1)
(Do not forget to execute commented create index command. I usually do it this way so that view definition contains index definition, reminding me to apply it if I need to change the view.)
When you want to use it, be sure to add the NOEXPAND hint to force use of the materialized data (this part will remain a mystery to me: something created as a performance enhancement is not turned on by default, but rather has to be activated on the spot).
select *
from view1 with (noexpand)