Fill empty rows of set with non empty value - amazon-redshift

Note: I've already gone over related questions like the following, which don't address my query:
SQL: how to pick one row for each set of rows with duplicate value in one column?
Fill missing values with first non-null following value in Redshift
I have a sparse, unclean dataset like this:
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | | | |
| abc | Start | recovery | | Link |
| abc | Start | recovery | SMS | |
| abc | Set | | Email | |
| abc | Verify | | Email | |
| pqr | Start | | | OTP |
| pqr | Verfiy | sign_in | Push | |
| pqr | Verify | | | |
| xyz | Start | sign_up | | Link |
and I need to fill the empty fields of each id with the non-empty data available in its other rows:
| id | operation | title | channel_type | mode |
|-----|-----------|----------|--------------|------|
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Start | recovery | SMS | Link |
| abc | Set | recovery | Email | Link |
| abc | Verify | recovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
Notes:
- Some ids can have a certain field empty in all rows.
- While most ids will have the same non-empty value for each field across rows, edge cases could have different values. For such groups, filling all rows with any one of the non-empty values is acceptable. [This is too rare in my dataset and can be ignored.]
- Another bit of pattern: certain fields are mostly present only in rows of certain operations; e.g. mode is only present in operation='Start' rows.
I've tried grouping rows by id while performing listagg over the title, channel_type and mode columns, followed by coalesce, something along these lines:
WITH my_data AS (
    SELECT
        id,
        operation,
        title,
        channel_type,
        mode
    FROM
        my_db.my_table
),
list_aggregated_data AS (
    SELECT
        id,
        listagg(title) AS titles,
        listagg(channel_type) AS channel_types,
        listagg(mode) AS modes
    FROM
        my_data
    GROUP BY
        id
),
coalesced_data AS (
    SELECT DISTINCT
        id,
        coalesce(titles) AS title,
        coalesce(channel_types) AS channel_type,
        coalesce(modes) AS mode
    FROM
        list_aggregated_data
),
joined_data AS (
    SELECT
        md.id,
        md.operation,
        cd.title,
        cd.channel_type,
        cd.mode
    FROM
        my_data AS md
    LEFT JOIN
        coalesced_data AS cd ON cd.id = md.id
)
SELECT
    *
FROM
    joined_data
ORDER BY
    id,
    operation
But for some reason this results in concatenated values (presumably from the coalesce step), and I get:
| id | operation | title | channel_type | mode |
|-----|-----------|------------------|--------------|------|
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Start | recoveryrecovery | SMS | Link |
| abc | Set | recoveryrecovery | Email | Link |
| abc | Verify | recoveryrecovery | Email | Link |
| pqr | Start | sign_in | Push | OTP |
| pqr | Verfiy | sign_in | Push | OTP |
| pqr | Verify | sign_in | Push | OTP |
| xyz | Start | sign_up | | Link |
What's the correct way to approach this problem?

I'd start with the first_value() window function with the ignore nulls option. Partition by the first two columns, and work out the edge cases with some data massaging, likely in the order by clause of the window function.
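A minimal sketch of that idea (not necessarily the exact partitioning described above): it assumes the source is my_db.my_table as in the question and that the empty cells are NULL; if they are stored as empty strings, wrap each column in NULLIF(col, '') first. It partitions by id alone, keeps each row's own value where one exists, and otherwise falls back to the group's first non-null value; the ORDER BY decides which value wins when a group has conflicting values:

SELECT
    id,
    operation,
    -- keep the row's own value if present, otherwise fall back to the group's
    -- first non-null value; the explicit frame spans the whole partition, so
    -- rows that come before the first non-null value get filled too
    coalesce(title, first_value(title IGNORE NULLS) OVER (
        PARTITION BY id
        ORDER BY operation
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )) AS title,
    coalesce(channel_type, first_value(channel_type IGNORE NULLS) OVER (
        PARTITION BY id
        ORDER BY operation
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )) AS channel_type,
    coalesce(mode, first_value(mode IGNORE NULLS) OVER (
        PARTITION BY id
        ORDER BY operation
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    )) AS mode
FROM my_db.my_table
ORDER BY id, operation;

For the pattern where certain fields only appear on certain operations (e.g. mode on operation = 'Start'), adjusting the ORDER BY, or partitioning those columns by (id, operation), is where the data massaging mentioned above would come in.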

Related

T-SQL : Pivot table without aggregate

I am trying to understand how to pivot data within T-SQL but can't seem to get it working. I have the following table structure
+-------------------+-----------------------+
| Name | Value |
+-------------------+-----------------------+
| TaskId | 12417 |
| TaskUid | XX00044497 |
| TaskDefId | 23 |
| TaskStatusId | 4 |
| Notes | |
| TaskActivityIndex | 0 |
| ModifiedBy | Orange |
| Modified | /Date(1554540200000)/ |
| CreatedBy | Apple |
| Created | /Date(2121212100000)/ |
| TaskPriorityId | 40 |
| OId | 2 |
+-------------------+-----------------------+
I want to pivot the Name column so its values become columns. Expected output:
+--------+------------------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
| TASKID | TASKUID | TASKDEFID | TASKSTATUSID | NOTES | TASKACTIVITYINDEX | MODIFIEDBY | MODIFIED | CREATEDBY | CREATED | TASKPRIORITYID | OID |
+--------+------------------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
| | | | | | | | | | | | |
| 12417 | XX00044497 | 23 | 4 | | 0 | Orange | /Date(1554540200000)/ | Apple | /Date(2121212100000)/ | 40 | 2 |
+--------+------------------------+-----------+--------------+-------+-------------------+------------+-----------------------+-----------+-----------------------+----------------+-----+
Is there an easy way of doing it? The columns are fixed (not dynamic).
Any help appreciated
Try this:
select * from yourtable
pivot
(
min(value)
for Name in ([TaskID],[TaskUID],[TaskDefID]......)
) as pivotable
You can also use CASE expressions (conditional aggregation); there is a sketch of that at the end of this answer.
Either way, you must use an aggregate function inside the PIVOT clause.
If you want to learn more, here is the reference:
https://learn.microsoft.com/en-us/sql/t-sql/queries/from-using-pivot-and-unpivot?view=sql-server-2017
Output (I only tried three columns):
DB<>Fiddle
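For completeness, the CASE-expression (conditional aggregation) alternative mentioned above might look like the sketch below. It runs against the same hypothetical yourtable and only spells out the first three columns; the remaining fixed columns follow the same pattern:

SELECT
    -- one conditional aggregate per fixed column
    MIN(CASE WHEN Name = 'TaskId'    THEN Value END) AS TaskId,
    MIN(CASE WHEN Name = 'TaskUid'   THEN Value END) AS TaskUid,
    MIN(CASE WHEN Name = 'TaskDefId' THEN Value END) AS TaskDefId
FROM yourtable;
-- if the table holds name/value pairs for more than one task, add a task key
-- column to the SELECT list and GROUP BY it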

in postgresql how to get the last 4 numbers from a field and copy it to a new field

I'm trying to get the last four digits of the field "SERIAL8" and put them in a new field called "SS4". Here is the query I'm trying to use, but it isn't working. I'm new at this, so any help would be appreciated.
SELECT * FROM CUSTOMER_TABLE
SUBSTRING (SERIAL,4,4) as 'SS4'
CUSTOMER_TABLE
+-----------------------+------------+----------+--+
| "Complaint Full Date" | Source | SERIAL | |
+-----------------------+------------+----------+--+
| 02/04/16 | DAPIS_CAIR | DG540732 | |
| 04/18/16 | DAPIS_CAIR | DG553384 | |
| 03/23/17 | RO | DG559515 | |
| 03/29/16 | CAIR | DG559781 | |
| 12/10/14 | DAPIS_CAIR | DG561621 | |
+-----------------------+------------+----------+--+
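For what it's worth, a hedged sketch of what this might look like in PostgreSQL, assuming the column is named SERIAL as in the sample and that the last four characters are what's wanted (adjust table/column names and quoting to the actual schema):

-- expose the last four characters of SERIAL as a derived column SS4
SELECT
    *,
    RIGHT(serial, 4) AS ss4
FROM customer_table;

-- or, to store it in a real column:
ALTER TABLE customer_table ADD COLUMN ss4 text;
UPDATE customer_table SET ss4 = RIGHT(serial, 4);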

Weird ghost records in PostgreSQL - what are they?

I have a very weird issue on our postgresql DB. I have a table called "statement" which has some strange records in it.
Using the command-line console psql, I run select * from customer.statement where type in ('QUOTE'); and appear to get 12 rows back. 7 rows look normal; 5 are missing all data except a single column, which is nullable but seems to hold real values entered by the user. psql tells me that 7 rows were returned even though I can count 12. Most of the other columns are not nullable. The weird records look like this:
select * from customer.statement where type = 'QUOTE';
id | issuer_id | recipient_id | recipient_name | recipient_reference | source_statement_id | catalogue_id | reference | issue_date | due_date | description | total | currency | type | tax_level | rounding_mode | status | recall_requested | time_created | time_updated | time_paid
------------------+------------------+------------------+----------------+---------------------+---------------------+--------------+-----------+------------+------------+------------------------------------------------------------------+-----------+----------+-------+-----------+---------------+-----------+------------------+----------------------------+----------------------------+-----------
... 7 valid records removed ...
| | | | | | | | | | Build bulkheads and sheet with plasterboard. +| | | | | | | | | |
| | | | | | | | | | Patch all patches. +| | | | | | | | | |
| | | | | | | | | | Set and sand all joints ready for painting. +| | | | | | | | | |
| | | | | | | | | | Use wall angle on bulkhead in main bedroom. +| | | | | | | | | |
| | | | | | | | | | Build nib and sheet and set in entrance | | | | | | | | | |
(7 rows)
If I run the same query using pgAdmin, I don't see those weird records.
Anyone know what these are?
The plus sign before the column separator (+|) is how psql marks a newline within a displayed string value. So there are no additional rows; the same rows simply continue across multiple output lines. The final line of your output confirms as much: (7 rows).
In pgAdmin you don't see the extra lines as long as you don't increase the height of the field (or copy/paste the content somewhere), but the multiple lines are there as well.
Try in psql and in pgAdmin:
test=# SELECT E'This\nis\na\ntest.' AS multi_line, 'foo' AS single_line;
multi_line | single_line
--------------+-------------
This +| foo
is +|
a +|
test. |
(1 row)
The manual about psql:
linestyle
Sets the border line drawing style to one of ascii, old-ascii, or unicode. [...] The default setting is ascii. [...]
ascii style uses plain ASCII characters. Newlines in data are shown using a + symbol in the right-hand margin. [...]
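If you want the embedded newlines spelled out on a single output line in either client, one option (a sketch against the statement table from the question) is to replace them in the query itself:

-- flatten the multi-line description so each record prints on one line
SELECT id, replace(description, E'\n', ' / ') AS description_flat
FROM customer.statement
WHERE type = 'QUOTE';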

How to list Rackspace servers filtered by metadata using REST API?

I can see that it is possible to add metadata to a Rackspace virtual machine instance.
I want to get a list of running instances, filtered by a particular metatag value.
I can't see how to do so in the documentation, however.
Is it possible?
You should be able to do so using the openstack client... but it depends on which metatag you're interested in.
You can get a list of all servers:
openstack server list
which will spit out something like:
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
| ID | Name | Status | Networks |
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
| 97606ae9-7f18-4a3c-903a-1583d446119b | trysmallwin | ERROR | |
| cb78b8d5-2f03-4a3f-ab26-f389acbd0b76 | Win-try again | ERROR | public=2607:f298:5:101d:f816:3eff:fe9e:5cd4, 208.113.133.90, 2607:f298:5:101d:f816:3eff:fe36:da45, |
| | | | 208.113.133.93, 2607:f298:5:101d:f816:3eff:fe40:57d5, 208.113.133.95 |
| 040751d1-c4c5-47aa-8dec-1d69a468be1c | hnxhdkwskrvwvdwr | ACTIVE | public=2607:f298:5:101d:f816:3eff:fe60:324, 208.113.130.52 |
+--------------------------------------+------------------+--------+-----------------------------------------------------------------------------------------------------------+
Note the ID of the server and investigate deeper:
openstack server show 040751d1-c4c5-47aa-8dec-1d69a468be1c
+--------------------------------------+------------------------------------------------------------+
| Field | Value |
+--------------------------------------+------------------------------------------------------------+
| OS-DCF:diskConfig | MANUAL |
| OS-EXT-AZ:availability_zone | iad-2 |
| OS-EXT-STS:power_state | Running |
| OS-EXT-STS:task_state | None |
| OS-EXT-STS:vm_state | active |
| OS-SRV-USG:launched_at | 2016-07-26T17:32:01.000000 |
| OS-SRV-USG:terminated_at | None |
| accessIPv4 | |
| accessIPv6 | |
| addresses | public=2607:f298:5:101d:f816:3eff:fe60:324, 208.113.130.52 |
| config_drive | True |
| created | 2016-07-26T17:31:51Z |
| flavor | gp1.semisonic (50) |
| hostId | e1efd75d1e8f6a7f5bb228a35db13647281996087d39c65af8ce83d9 |
| id | 040751d1-c4c5-47aa-8dec-1d69a468be1c |
| image | Ubuntu-14.04 (03f89ff2-d66e-49f5-ae61-656a006bbbe9) |
| key_name | stef |
| name | hnxhdkwskrvwvdwr |
| os-extended-volumes:volumes_attached | [] |
| progress | 0 |
| project_id | d2fb6996496044158cf977c2129c8660 |
| properties | |
| security_groups | [{u'name': u'default'}] |
| status | ACTIVE |
| updated | 2016-07-26T17:32:01Z |
| user_id | 5b2ca246f39a425f9a833460bf322603 |
+--------------------------------------+------------------------------------------------------------+
Adding -f json (or --format json) to these commands will output the same information in JSON format, which you can more easily manipulate programmatically.
HTH

Using named fields to determine ranges with vsum in Emacs org-table-mode, impossible?

I have been trying to simplify a semi-complex table of mine by adding named fields, which worked without a problem until I got to the vsum operator. I had the formula set to $M=vsum($3..#-4), which works; however, I am continuously having to add and remove items from those fields, which changes the column numbering. This results in me having to change the field specifications of the vsum range after every update/change. I therefore tried naming the top and bottom fields, with the thought of supplying the named variables to vsum, giving me a table similar to the following:
| / | <> | <> |
|---+--------+---------|
| | Title1 | Title 2 |
|---+--------+---------|
| _ | | START |
| | name | 1000 |
| | name | 3456 |
| | name | 123 |
| ^ | | END |
|---+--------+---------|
| _ | | MT |
| # | Total | #ERROR |
| # | | |
|---+--------+---------|
#+TBLFM: $MT=vsum($START..$END)
This is the debug formula output from the above table:
Substitution history of formula
Orig: vsum($START..$END)
$xyz-> vsum((1000)..(123))
#r$c-> vsum((1000)..(123))
$1-> vsum((1000)..(123))
-----------^
Error: Expected `)'
I have tried enclosing the named field variables in parentheses, and several other ways, but have thus far not been able to get this to work. I am hoping I am just missing something and being blind, but perhaps this is not possible to do?
I have also tried the sum-up function with no success. Thank you in advance for your assistance.
The following solution works by using #II and #III to refer to all entries between the second and third hline.
| / | <> | <> |
|---+--------+---------|
| | Title1 | Title 2 |
|---+--------+---------|
| | name | 1000 |
| | name | 3456 |
| | name | 123 |
|---+--------+---------|
| _ | | MT |
| # | Total | 4579 |
| # | | |
|---+--------+---------|
#+TBLFM: $MT=vsum(#II..#III)
Documentation: http://orgmode.org/manual/References.html#References