create an empty table which gets the column names from another table - kdb

We have two tables 't' and 's'.
These tables may or may not have data but the schema of both t and s will alwaya be same.
Tables:
q)t:([] id:("ab";"cd";"ef";"gh";"ij"); refid:("";"ab";"";"ef";""); typ:`BUY`SELL`BUY`SELL`BUY)
q)s:t / For example purpose
Now, in my function. I want to concatenate the output of these two tables and return it, for which I'm using variable named res.
The problem is initially res is empty and not of type 98h, hence if I try to join t or s to res then it fails(which is obvious).
q){$[not ((count res) ~ 0); res: res,t ; res:t ]; $[not ((count res) ~ 0); res: res,s ; res:s ]; :res}[]
'res
One solution to this is create an empty schema for res(same as t and s table) and it works perfectly.
q){res:([] id:(); refid:(); typ:`$());$[not ((count res) ~ 0); res: res,t ; res:t ]; $[not ((count res) ~ 0); res: res,s ; res:s ]; :res}[]
But, is there a way that we don't have to create empty schema for res with all columns before hand, rather assign res as null(empty) table which can get the schema same as t or s when t or s is joined with res.

Your example isn't entirely clear - you mention res already exists in a comment, but then state that "initially res is empty and not of type 98h".
If you only want to assign res to be an empty table if it doesn't already exist, you can use a system command to check if res has already been defined in the root namespace, like the below:
f:{
if[not res in system"a";res:()]
$[count res;res,:t;res:t];
$[count res;res,:s;res:s];
:res;
};

Assign res to be 0 take the schema in question.
q)t:([] id:("ab";"cd";"ef";"gh";"ij"); refid:("";"ab";"";"ef";""); typ:`BUY`SELL`BUY`SELL`BUY)
q)res:0#t
q)meta res
c | t f a
-----| -----
id |
refid|
typ | s
So in this case you can do the following
q){[]res:0#t;if[count res;res,:t];$[count res;res,:s;res:s]}[]

Related

Postgres array of Golang structs

I have the following Go struct:
type Bar struct {
Stuff string `db:"stuff"`
Other string `db:"other"`
}
type Foo struct {
ID int `db:"id"`
Bars []*Bar `db:"bars"`
}
So Foo contains a slice of Bar pointers. I also have the following tables in Postgres:
CREATE TABLE foo (
id INT
)
CREATE TABLE bar (
id INT,
stuff VARCHAR,
other VARCHAR,
trash VARCHAR
)
I want to LEFT JOIN on table bar and aggregate it as an array to be stored in the struct Foo. I've tried:
SELECT f.*,
ARRAY_AGG(b.stuff, b.other) AS bars
FROM foo f
LEFT JOIN bar b
ON f.id = b.id
WHERE f.id = $1
GROUP BY f.id
But it looks like the ARRAY_AGG function signature is incorrect (function array_agg(character varying, character varying) does not exist). Is there a way to do this without making a separate query to bar?
It looks like what you want is for bars to be an array of bar objects to match your Go types. To do this, you should use JSON_AGG rather than ARRAY_AGG since ARRAY_AGG only works on single columns and would produce in this case an array of type text (TEXT[]). JSON_AGG, on the other hand, creates an array of json objects. You can combine this with JSON_BUILD_OBJECT to select only the columns you want.
Here's an example:
SELECT f.*,
JSON_AGG(JSON_BUILD_OBJECT('stuff', b.stuff, 'other', b.other)) AS bars
FROM foo f
LEFT JOIN bar b
ON f.id = b.id
WHERE f.id = $1
GROUP BY f.id
Then you'll have to handle unmarshaling the json in Go, but other than that you should be good to go.
Note also that Go will ignore unused keys for you when unmarshaling json to a struct, so you can simplify the query by just selecting all fields on the bar table if you want. Like so:
SELECT f.*,
JSON_AGG(TO_JSON(b.*)) AS bars -- or JSON_AGG(b.*)
FROM foo f
LEFT JOIN bar b
ON f.id = b.id
WHERE f.id = $1
GROUP BY f.id
If you want to also handle cases where there are no entries in bar for a record in foo, you can use:
SELECT f.*,
COALESCE(
JSON_AGG(TO_JSON(b.*)) FILTER (WHERE b.id IS NOT NULL),
'[]'::JSON
) AS bars
FROM foo f
LEFT JOIN bar b
ON f.id = b.id
WHERE f.id = $1
GROUP BY f.id
Without the FILTER, you'll get [NULL] for rows in foo that have no corresponding rows in bar, and the FILTER gives you just NULL instead, then just use COALESCE to convert to an empty json array.
As you already know array_agg takes a single argument and returns an array of the type of the argument. So, if you want all of a row's columns to be included in the array's elements you can just pass in the row reference directly, e.g.:
SELECT array_agg(b) FROM b
If, however, you only want to include specific columns in the array's elements you can use the ROW constructor, e.g.:
SELECT array_agg(ROW(b.stuff, b.other)) FROM b
Go's standard library provides out-of-the-box support for scanning only scalar values. For scanning more complex values like arbitrary objects and arrays one has to either look for 3rd party solutions, or implement their own sql.Scanner.
To be able to implement your own sql.Scanner and properly parse a postgres array of rows you first need to know what format postgres uses to output the value, you can find this out by using psql and some queries directly:
-- simple values
SELECT ARRAY[ROW(123,'foo'),ROW(456,'bar')];
-- output: {"(123,foo)","(456,bar)"}
-- not so simple values
SELECT ARRAY[ROW(1,'a b'),ROW(2,'a,b'),ROW(3,'a",b'),ROW(4,'(a,b)'),ROW(5,'"','""')];
-- output: {"(1,\"a b\")","(2,\"a,b\")","(3,\"a\"\",b\")","(4,\"(a,b)\")","(5,\"\"\"\",\"\"\"\"\"\")"}
As you can see this can get pretty hairy but nevertheless it's parseable, the syntax looks to be something like this:
{"(column_value[, ...])"[, ...]}
where column_value is either an unquoted value, or a quoted value with escaped double quotes, and such a quoted value itself can contain escaped double quotes but only in twos, i.e. a single escaped double quote will not occur inside the column_value. So a rough and incomplete implementation of the parser might look something like this:
NOTE: there may be other syntax rules, that I do not know of, that need to be taken into consideration during parsing. In addition to that the code below doesn't handle NULLs properly.
func parseRowArray(a []byte) (out [][]string) {
a = a[1 : len(a)-1] // drop surrounding curlies
for i := 0; i < len(a); i++ {
if a[i] == '"' { // start of row element
row := []string{}
i += 2 // skip over current '"' and the following '('
for j := i; j < len(a); j++ {
if a[j] == '\\' && a[j+1] == '"' { // start of quoted column value
var col string // column value
j += 2 // skip over current '\' and following '"'
for k := j; k < len(a); k++ {
if a[k] == '\\' && a[k+1] == '"' { // end of quoted column, maybe
if a[k+2] == '\\' && a[k+3] == '"' { // nope, just escaped quote
col += string(a[j:k]) + `"`
k += 3 // skip over `\"\` (the k++ in the for statement will skip over the `"`)
j = k + 1 // skip over `\"\"`
continue // go to k loop
} else { // yes, end of quoted column
col += string(a[j:k])
row = append(row, col)
j = k + 2 // skip over `\"`
break // go back to j loop
}
}
}
if a[j] == ')' { // row end
out = append(out, row)
i = j + 1 // advance i to j's position and skip the potential ','
break // go to back i loop
}
} else { // assume non quoted column value
for k := j; k < len(a); k++ {
if a[k] == ',' || a[k] == ')' { // column value end
col := string(a[j:k])
row = append(row, col)
j = k // advance j to k's position
break // go back to j loop
}
}
if a[j] == ')' { // row end
out = append(out, row)
i = j + 1 // advance i to j's position and skip the potential ','
break // go to back i loop
}
}
}
}
}
return out
}
Try it on playground.
With something like that you can then implement an sql.Scanner for your Go slice of bars.
type BarList []*Bar
func (ls *BarList) Scan(src interface{}) error {
switch data := src.(type) {
case []byte:
a := praseRowArray(data)
res := make(BarList, len(a))
for i := 0; i < len(a); i++ {
bar := new(Bar)
// Here i'm assuming the parser produced a slice of at least two
// strings, if there are cases where this may not be the true you
// should add proper length checks to avoid unnecessary panics.
bar.Stuff = a[i][0]
bar.Other = a[i][1]
res[i] = bar
}
*ls = res
}
return nil
}
Now if you change the type of the Bars field in the Foo type from []*Bar to BarList you'll be able to directly pass in a pointer of the field to a (*sql.Row|*sql.Rows).Scan call:
rows.Scan(&f.Bars)
If you don't want to change the field's type you can still make it work by converting the pointer just when it's being passed to the Scan method:
rows.Scan((*BarList)(&f.Bars))
JSON
An sql.Scanner implementation for the json solution suggested by Henry Woody would look something like this:
type BarList []*Bar
func (ls *BarList) Scan(src interface{}) error {
if b, ok := src.([]byte); ok {
return json.Unmarshal(b, ls)
}
return nil
}

Dynamically framing column names to pass it for select while creating Spark Data frame

I'm working in Spark and Scala for the past 2 months and I'm new to this technology. I framed the select columns(with regexp_replace) as List [String] () and passed for Spark Data frame creation and its throwing me error as "Cannot resolve". Please find below the steps, I have followed and tried.
Defining the val:
Defining the column which I would like to identify in the src data frame
val col_name = "region_id"
Defining the column which will be used to replace the src data frame column from ref data frame
val surr_key_col_name = "surrogate_key"
I have created two Data frames as shown below
src_df
region id | region_name | region_code
10001189 | Spain | SP09 8545
10001765 | Africa | AF97 6754
ref_df
region id | surrogate_key
1189 | 2345
1765 | 8978
val src_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("s3://bucket/src_details.csv")
val ref_df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("s3://bucket/ref_details.csv")
I'm iterating through to identify the column I need to use reg match and replace with another Data Frame column value and storing it in List to pass it to Data Frames select
val src_header_rec = src_df.columns.toList
//Loop through source file header to identify the region_id and replace it with surrogate_id by doing a pattern match( I don't want to replace the
for (src_header_cols <- src_header_rec) {
if (col_name == src_header_cols) {
src_column_names :+="regexp_replace("+"$"+s""""src.$src_header_cols""""+","+"$"+s""""ref.$src_header_cols""""+","+"$"+s""""ref.$surr_key_col_name""""+")"+".as("+s""""$src_header_cols""""+")"
}
else {
src_column_names :+= "src."+src_header_cols
}
}
After building the select column in the List [String] () using the for loop above, I'm passing it to the select columns for final_df creation
val final_df = src_df.alias("src").join(ref_df.alias("ref"), src_df(col_name)=== ref_df(col_name),"left_outer").select(src_column_names.head,src_column_names.tail:_*)
If I directly pass the columns without using the List [String] () in the select of the data frame my regexp_replace substitution works
val final_df = src_df.alias("src").join(ref_df.alias("ref"), src_df(col_name)=== ref_df(col_name),"left_outer").select(regexp_replace($"src.region_id",$"ref.region_id",$"ref.surrogate_key").as("region_id"))
I'm not sure why its not working when I'm passing it as a List [String] ()
When I remove the regexp_replace substitution in the for loop and pass it as List [String] () for Data Frame select it works properly as shown below:
This code works very well with Data Frame select:
for (src_header_cols <- src_header_rec) {
if (col_name == src_header_cols) {
src_column_names :+= "ref."+surr_key_col_name
}
else {
src_column_names :+= "src."+src_header_cols
}
}
val final_df = src_df.alias("src").join(ref_df.alias("ref"), src_df(col_name)===ref_df(col_name),"left_outer").select(src_column_names.head,src_column_names.tail:_*)
The result/output Data Frame I'm trying to derive is
final_df
region id | region_name | region_code
1000**2345** | Spain | SP09 8545
1000**8978** | Africa | AF97 6754
So, when I'm trying to build the Spark Data Frame select in the for loop with regexp_replace as a List and use it its throwing me "Cannot resolve" error.
I have tried creating Temporary view of the Data Frame and used the same regexp in the select statement of the Temporary view. It worked. Please find below the code I have tried and it worked.
//This for loop will scan through my header list and whichever column matches it frames regexp for those columns.So, the region_id from the Data Frame header matches the variable value that I have defined.
for (src_header_cols <- src_header_rec) {
if (col_name == src_header_cols) {
src_column_names :+= "regexp_replace(src."+s"$src_header_cols"+",ref."+s"$ref_col_name"+",ref."+s"$surr_key_col_name"+")"+s" $src_header_cols"
}
else {
src_column_names :+= "src."+src_header_cols
}
}
//Creating Temporary view to apply SQL queries on it
src_df.createOrReplaceTempView("src")
ref_df.createOrReplaceTempView("ref")
//Framing SQL statements to be passed while querying
val selectExpr_1 = "select "+src_column_names.mkString(",")
val selectExpr_2 = selectExpr_1+" from src left outer join ref on(src."+s"$col_name"+" = ref."+s"$ref_col_name"+")"
// Creating a final Data Frame using the SQL statement created
val src_policy_masked_df = spark.sql(s"$selectExpr_2")

Psycopg2 insert python dictionary in postgres database

In python 3+, I want to insert values from a dictionary (or pandas dataframe) into a database. I have opted for psycopg2 with a postgres database.
The problems is that I cannot figure out the proper way to do this. I can easily concatenate a SQL string to execute, but the psycopg2 documentation explicitly warns against this. Ideally I wanted to do something like this:
cur.execute("INSERT INTO table VALUES (%s);", dict_data)
and hoped that the execute could figure out that the keys of the dict matches the columns in the table. This did not work. From the examples of the psycopg2 documentation I got to this approach
cur.execute("INSERT INTO table (" + ", ".join(dict_data.keys()) + ") VALUES (" + ", ".join(["%s" for pair in dict_data]) + ");", dict_data)
from which I get a
TypeError: 'dict' object does not support indexing
What is the most phytonic way of inserting a dictionary into a table with matching column names?
Two solutions:
d = {'k1': 'v1', 'k2': 'v2'}
insert = 'insert into table (%s) values %s'
l = [(c, v) for c, v in d.items()]
columns = ','.join([t[0] for t in l])
values = tuple([t[1] for t in l])
cursor = conn.cursor()
print cursor.mogrify(insert, ([AsIs(columns)] + [values]))
keys = d.keys()
columns = ','.join(keys)
values = ','.join(['%({})s'.format(k) for k in keys])
insert = 'insert into table ({0}) values ({1})'.format(columns, values)
print cursor.mogrify(insert, d)
Output:
insert into table (k2,k1) values ('v2', 'v1')
insert into table (k2,k1) values ('v2','v1')
I sometimes run into this issue, especially with respect to JSON data, which I naturally want to deal with as a dict. Very similar. . .But maybe a little more readable?
def do_insert(rec: dict):
cols = rec.keys()
cols_str = ','.join(cols)
vals = [ rec[k] for k in cols ]
vals_str = ','.join( ['%s' for i in range(len(vals))] )
sql_str = """INSERT INTO some_table ({}) VALUES ({})""".format(cols_str, vals_str)
cur.execute(sql_str, vals)
I typically call this type of thing from inside an iterator, and usually wrapped in a try/except. Either the cursor (cur) is already defined in an outer scope or one can amend the function signature and pass a cursor instance in. I rarely insert just a single row. . .And like the other solutions, this allows for missing cols/values provided the underlying schema allows for it too. As long as the dict underlying the keys view is not modified as the insert is taking place, there's no need to specify keys by name as the values will be ordered as they are in the keys view.
[Suggested answer/workaround - better answers are appreciated!]
After some trial/error I got the following to work:
sql = "INSERT INTO table (" + ", ".join(dict_data.keys()) + ") VALUES (" + ", ".join(["%("+k+")s" for k in dict_data]) + ");"
This gives the sql string
"INSERT INTO table (k1, k2, ... , kn) VALUES (%(k1)s, %(k2)s, ... , %(kn)s);"
which may be executed by
with psycopg2.connect(database='deepenergy') as con:
with con.cursor() as cur:
cur.execute(sql, dict_data)
Post/cons?
using %(name)s placeholders may solve the problem:
dict_data = {'key1':val1, 'key2':val2}
cur.execute("""INSERT INTO table (field1, field2)
VALUES (%(key1)s, %(key2)s);""",
dict_data)
you can find the usage in psycopg2 doc Passing parameters to SQL queries
Here is another solution inserting a dictionary directly
Product Model (has the following database columns)
name
description
price
image
digital - (defaults to False)
quantity
created_at - (defaults to current date)
Solution:
data = {
"name": "product_name",
"description": "product_description",
"price": 1,
"image": "https",
"quantity": 2,
}
cur = conn.cursor()
cur.execute(
"INSERT INTO products (name,description,price,image,quantity) "
"VALUES(%(name)s, %(description)s, %(price)s, %(image)s, %(quantity)s)", data
)
conn.commit()
conn.close()
Note: The columns to be inserted is specified on the execute statement .. INTO products (column names to be filled) VALUES ..., data <- the dictionary (should be the same **ORDER** of keys)

Querying by Keys with LogicBlox

If I have two predicates (not functional):
addblock 'city(city_dim_id) -> int(city_dim_id).'
addblock 'city_name[city_dim_id] = name -> int(city_dim_id), string(name).'
I can add facts:
exec '+city(1).'
exec '+city_name[0] = "N/A".'
exec '+city_name[1] = "Chicago".'
These are then queries of facts in the predicates:
query '_(city_name) <- city_name(city_name, _).'
query '_(city_name) <- city_name(_, city_name).'
query '_(city_dim_id, city_name) <- city_name(city_dim_id, city_name).'
My question is how do I make a query to show
1. what are the city_dim_id in both tables,
2. return city_dim_id and city_name, but only where city_dim_id present in both tables?
Thanks in advance.
Sorry I'm struggling to understand the question.
The following will return the city_dim_id's that have the same city_name.
_(c1, c2) <-
city(c1),
city(c2),
city_name[c1] = city_name[c2],
c1 != c2.
If by ' city_dim_id in both tables ' you mean 'city_dim_id which are in both tables' then you want
_(id) <-city(id), city_name[id] = _.
if on the other hand you want the id who are in either table, you need to replace the conjunction by a disjunction.
_(id) <- city(id); city_name[id] = _.
I think you want
_(id,name) <- city(id), city_name[id] = name.
note: if you use the square bracket syntax city_name[id] = name then the predicate WILL be functional

F# Navigate object graph to return specific Nodes

I'm trying to build a list of DataTables based on DataRelations in a DataSet, where the tables returned are only included by their relationships with each other, knowing each end of the chain in advance. Such that my DataSet has 7 Tables. The relationships look like this:
Table1 -> Table2 -> Table3 -> Table4 -> Table5
-> Table6 -> Table7
So given Table1 and Table7, I want to return Tables 1, 2, 3, 6, 7
My code so far traverses all the relations and returns all Tables so in the example it returns Table4 and Table5 as well. I've passed in the first, last as arguments, and know that I'm not yet using the last one yet, I am still trying to think how to go about it, and is where I need the help.
type DataItem =
| T of DataTable
| R of DataRelation list
let GetRelatedTables (first, last) =
let rec flat_rec dt acc =
match dt with
| T(dt) ->
let rels = [ for r in dt.ParentRelations do yield r ]
dt :: flat_rec(R(rels)) acc
| R(h::t) ->
flat_rec(R(t)) acc # flat_rec(T(h.ParentTable)) acc
| R([]) -> []
flat_rec first []
I think something like this would do it (although I haven't tested it). It returns DataTable list option because, in theory, a path between two tables might not exist.
let findPathBetweenTables (table1 : DataTable) (table2 : DataTable) =
let visited = System.Collections.Generic.HashSet() //check for circular references
let rec search path =
let table = List.head path
if not (visited.Add(table)) then None
elif table = table2 then Some(List.rev path)
else
table.ChildRelations
|> Seq.cast<DataRelation>
|> Seq.tryPick (fun rel -> search (rel.ChildTable::path))
search [table1]