Creating a table from literal values in Ibis - ibis

I'd like to use Ibis to create a table from literal values instead of from an existing table.
In BigQuery SQL, I might do this with a combination of the array and struct data types. See this example from the BigQuery docs.
WITH races AS (
SELECT "800M" AS race,
[STRUCT("Rudisha" as name, [23.4, 26.3, 26.4, 26.1] as laps),
STRUCT("Makhloufi" as name, [24.5, 25.4, 26.6, 26.1] as laps),
STRUCT("Murphy" as name, [23.9, 26.0, 27.0, 26.0] as laps),
STRUCT("Bosse" as name, [23.6, 26.2, 26.5, 27.1] as laps),
STRUCT("Rotich" as name, [24.7, 25.6, 26.9, 26.4] as laps),
STRUCT("Lewandowski" as name, [25.0, 25.7, 26.3, 27.2] as laps),
STRUCT("Kipketer" as name, [23.2, 26.1, 27.3, 29.4] as laps),
STRUCT("Berian" as name, [23.7, 26.1, 27.0, 29.3] as laps)]
AS participants)
SELECT
race,
participant
FROM races r
CROSS JOIN UNNEST(r.participants) as participant;
The ibis.table() method only constructs an empty table with a given schema, so I'm not sure how one might go from such a table to one with literal values. Also, the fact that the table is unbound makes it difficult to use in many backends.

This is now available via ibis.memtable -- there's a brief introduction in the tutorial available here: https://ibis-project.org/docs/4.1.0/tutorial/05-IO-Create-Insert-External-Data/#creating-new-tables-from-in-memory-pandas-dataframes

Related

TSQL: Special (National) char detection (SQL Server2016)

I have found many techniques for detecting special characters ($%##) that are available on an English keyboard. However, some national characters such as á behave differently: my LIKE condition should catch them, since it should select anything not matched by a-zA-Z0-9 or a space. What is the trick here? In the sample below, the row containing á is missing from the result. I'm on SQL Server 2016 with default settings (US).
;WITH cte AS (SELECT 'Euro a€' St UNION SELECT 'adgkjb$' St UNION SELECT 'Bravo Endá' St)
SELECT * FROM cte WHERE St LIKE '%[^a-zA-Z0-9 ]%'
St
adgkjb$
Euro a€
SELECT CAST(N'€' AS VARBINARY(8)) --0xAC20
SELECT CAST(N'á' AS VARBINARY(8)) --0xE100
SQL Server appears to be "helping" with ranges of characters because of the default collation. If you explicitly list all of the valid characters, it will work as desired. Alternatively, you can force a collation on the pattern match that won't interpret the pattern as containing non-ASCII characters.
-- Explicit pattern for "bad" characters.
declare @InvalidCharactersPattern as VarChar(100) = '%[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ]%';
-- Query some sample data with the explicit pattern and an explicitly specified collation.
select Sample,
case when Sample like @InvalidCharactersPattern then 'Bad Character' else 'Okay' end as ExplicitStatus,
case when Sample like '%[^a-zA-Z0-9 ]%' collate Latin1_General_100_BIN
then 'Bad Character' else 'Okay' end as CollationStatus
from ( values ( 'a' ), ( 'A' ), ( 'á' ), ( 'Foo' ), ( 'F&o' ), ( '$%^&' ) ) as Samples( Sample );
-- Server collation.
select ServerProperty( 'collation' ) as ServerCollation;
-- Available collations.
select name, description
from sys.fn_helpcollations()
order by name;
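As a rough analogy outside SQL Server, here is a small Python sketch (not T-SQL). Python's re module compares character ranges by code point, which is similar in spirit to forcing a binary collation, so a-z covers only the ASCII letters and á is flagged. The helper names are made up for illustration; the sample strings come from the question and the answer above.

```python
import re
import string

# Ranges in Python's re module are compared by code point, so a-z means
# exactly the 26 ASCII lowercase letters (similar to a binary collation).
RANGE_PATTERN = re.compile(r"[^a-zA-Z0-9 ]")

# Explicit list of allowed characters, mirroring the explicit T-SQL pattern.
ALLOWED = set(string.ascii_letters + string.digits + " ")


def has_bad_char_range(text):
    """True if text contains a character outside a-z, A-Z, 0-9 and space."""
    return RANGE_PATTERN.search(text) is not None


def has_bad_char_explicit(text):
    """Same check, but against an explicitly enumerated set of characters."""
    return any(ch not in ALLOWED for ch in text)


samples = ["a", "A", "á", "Foo", "F&o", "$%^&", "Bravo Endá", "Euro a€"]
flagged = [s for s in samples if has_bad_char_range(s)]
print(flagged)
```

Both helpers agree on every sample here, which is the behaviour the explicit pattern and the binary collation are meant to give in the T-SQL version.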

nextval(seq_name) not fetching correct value from DB

I have a Flask app with SQLAlchemy tied to a Postgres db. All components are working, and reads are fully functional. I have a simple model:
class School(db.Model):
    __tablename__ = 'schools'
    id = db.Column(db.Integer, db.Sequence('schools_id_seq'), primary_key=True)
    name = db.Column(db.String(80))
    active = db.Column(db.Boolean)
    created = db.Column(db.DateTime)
    updated = db.Column(db.DateTime)

    def __init__(self, name, active, created, updated):
        self.name = name
        self.active = active
        self.created = created
        self.updated = updated
which is working on a postgres table:
CREATE SEQUENCE schools_id_seq;
CREATE TABLE schools(
id int PRIMARY KEY NOT NULL DEFAULT nextval('schools_id_seq'),
name varchar(80) NOT NULL,
active boolean DEFAULT TRUE,
created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
updated timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
);
ALTER SEQUENCE schools_id_seq OWNED BY schools.id;
when I work with an insert on this table from psql, all is well:
cake=# select nextval('schools_id_seq');
nextval
---------
65
(1 row)
cake=# INSERT INTO schools (id, name, active, created, updated) VALUES (nextval('schools_id_seq'),'Test', True, current_timestamp, current_timestamp);
INSERT 0 1
resulting in:
66 | Test | 0 | t | 2016-08-25 14:12:24.928456 | 2016-08-25 14:12:24.928456
but when I try the same insert from flask, stack trace complains about a duplicate id, but it is using nextval to get that value:
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "schools_pkey"
DETAIL: Key (id)=(7) already exists.
[SQL: "INSERT INTO schools (id, name, active, created, updated) VALUES (nextval('schools_id_seq'), %(name)s, %(active)s, %(created)s, %(updated)s) RETURNING schools.id"] [parameters: {'active': True, 'name': 'Testomg', 'updated': datetime.datetime(2016, 8, 25, 14, 10, 5, 703471), 'created': datetime.datetime(2016, 8, 25, 14, 10, 5, 703458)}]
Why would the sqlalchemy call to nextval not return the same next val that the same call within the postgres db yields?
UPDATE: @RazerM told me about the echo=True param that I didn't know about. With
app.config['SQLALCHEMY_ECHO']=True
I got the following from a new insert (note that on this try it fetched 10, but it should be 67):
2016-08-25 14:47:40,127 INFO sqlalchemy.engine.base.Engine select version()
2016-08-25 14:47:40,128 INFO sqlalchemy.engine.base.Engine {}
2016-08-25 14:47:40,314 INFO sqlalchemy.engine.base.Engine select current_schema()
2016-08-25 14:47:40,315 INFO sqlalchemy.engine.base.Engine {}
2016-08-25 14:47:40,499 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2016-08-25 14:47:40,499 INFO sqlalchemy.engine.base.Engine {}
2016-08-25 14:47:40,594 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2016-08-25 14:47:40,594 INFO sqlalchemy.engine.base.Engine {}
2016-08-25 14:47:40,780 INFO sqlalchemy.engine.base.Engine show standard_conforming_strings
2016-08-25 14:47:40,780 INFO sqlalchemy.engine.base.Engine {}
2016-08-25 14:47:40,969 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2016-08-25 14:47:40,971 INFO sqlalchemy.engine.base.Engine INSERT INTO schools (id, name, active, created, updated) VALUES (nextval('schools_id_seq'), %(name)s, %(active)s, %(created)s, %(updated)s) RETURNING schools.id
2016-08-25 14:47:40,971 INFO sqlalchemy.engine.base.Engine {'name': 'Testing', 'created': datetime.datetime(2016, 8, 25, 14, 47, 38, 785031), 'active': True, 'updated': datetime.datetime(2016, 8, 25, 14, 47, 38, 785050)}
2016-08-25 14:47:41,064 INFO sqlalchemy.engine.base.Engine ROLLBACK
sqlalchemy.exc.IntegrityError: (psycopg2.IntegrityError) duplicate key value violates unique constraint "schools_pkey"
DETAIL: Key (id)=(10) already exists.
[SQL: "INSERT INTO schools (id, name, active, created, updated) VALUES (nextval('schools_id_seq'), %(name)s, %(active)s, %(created)s, %(updated)s) RETURNING schools.id"] [parameters: {'updated': datetime.datetime(2016, 8, 25, 14, 54, 18, 262873), 'created': datetime.datetime(2016, 8, 25, 14, 54, 18, 262864), 'active': True, 'name': 'Testing'}]
The solution is simple in this case, although it doesn't explain why this happens; for that we would need to look at your entire environment, which you cannot show us or which would take too long. Try inserting records until the sequence reaches 67; the next inserts should then succeed without any error, because the sequence will have caught up with the existing ids. You can also try adding the server_default option to the id column first:
server_default=db.Sequence('schools_id_seq').next_value()
So
seq = db.Sequence('schools_id_seq')
And in a class:
id = db.Column(db.Integer, seq, server_default=seq.next_value(), primary_key=True)
The SQLAlchemy documentation puts it this way:
Sequence was originally intended to be a Python-side directive first and foremost so it’s probably a good idea to specify it in this way as well.
Sequences are always incremented, so both your select statement and SQLAlchemy incremented the value.
As stated in Sequence Manipulation Functions:
Advance the sequence object to its next value and return that value. This is done atomically: even if multiple sessions execute nextval concurrently, each will safely receive a distinct sequence value.
If a sequence object has been created with default parameters, successive nextval calls will return successive values beginning with 1. Other behaviors can be obtained by using special parameters in the CREATE SEQUENCE command; see its command reference page for more information.
Important: To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an INSERT with an ON CONFLICT clause will compute the to-be-inserted tuple, including doing any required nextval calls, before detecting any conflict that would cause it to follow the ON CONFLICT rule instead. Such cases will leave unused "holes" in the sequence of assigned values. Thus, PostgreSQL sequence objects cannot be used to obtain "gapless" sequences.
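To make the "never rolled back" behaviour concrete, here is a toy pure-Python model. It is not PostgreSQL or SQLAlchemy; the ToySequence class, the insert helper and the starting state are all hypothetical. It shows that every failed insert still consumes a sequence value, so a sequence that lags behind the existing ids eventually catches up.

```python
class ToySequence:
    """Pure-Python stand-in for a sequence: nextval only ever moves forward."""

    def __init__(self, start=1):
        self._next = start

    def nextval(self):
        value = self._next
        self._next += 1
        return value


def insert(table, seq, name):
    """Try to insert a row keyed by seq.nextval(); report duplicate keys."""
    new_id = seq.nextval()  # consumed even if the insert fails below
    if new_id in table:
        return new_id, False  # duplicate key: the value is not given back
    table[new_id] = name
    return new_id, True


# Hypothetical state: rows 1..5 already exist, but the sequence is still at 4.
table = {i: f"school {i}" for i in range(1, 6)}
seq = ToySequence(start=4)

attempts = [insert(table, seq, "Test") for _ in range(3)]
print(attempts)
```

In this model the first two attempts collide with existing ids and the third succeeds, which is the "keep inserting until the sequence catches up" effect described in the first answer.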

Inserting values into multiple columns by splitting a string in PostgreSQL

I have the following heap of text:
"BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,
URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,
16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,
1.0,Version,1.0,".
What I'd like to do is extract data from this in the following manner:
BundleSize:155648
DynamicSize:204800
Identifier:com.URLConnectionSample
Name:URLConnectionSample
ShortVersion:1.0
Version:1.0
BundleSize:155648
DynamicSize:16384
Identifier:com.IdentifierForVendor3
Name:IdentifierForVendor3
ShortVersion:1.0
Version:1.0
All tips and suggestions are welcome.
It isn't quite clear what you need to do with this data. If you really need to process it entirely in the database (it looks more like a task for your favorite scripting language), one option is to use hstore.
Converting records one by one is easy:
Assuming
%s =
BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0
SELECT * FROM each(hstore(string_to_array(%s, ',')));
Output:
key | value
--------------+-------------------------
Name | URLConnectionSample
Version | 1.0
BundleSize | 155648
Identifier | com.URLConnectionSample
DynamicSize | 204800
ShortVersion | 1.0
If you have a table with columns exactly matching the field names (note the quotes; populate_record is case-sensitive with respect to key names):
CREATE TABLE data (
"BundleSize" integer, "DynamicSize" integer, "Identifier" text,
"Name" text, "ShortVersion" text, "Version" text);
You can insert hstore records into it like this:
INSERT INTO data SELECT * FROM
populate_record(NULL::data, hstore(string_to_array(%s, ',')));
Things get more complicated if you have comma-separated values for more than one record.
%s = BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,
You need to break up an array into chunks of number_of_fields * 2 = 12 elements first.
SELECT hstore(row) FROM (
SELECT array_agg(str) AS row FROM (
SELECT str, row_number() OVER () AS i FROM
unnest(string_to_array(%s, ',')) AS str
) AS str_sub
GROUP BY (i - 1) / 12) AS row_sub
WHERE array_length(row, 1) = 12;
Output:
"Name"=>"URLConnectionSample", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.URLConnectionSample", "DynamicSize"=>"204800", "ShortVersion"=>"1.0"
"Name"=>"IdentifierForVendor3", "Version"=>"1.0", "BundleSize"=>"155648", "Identifier"=>"com.IdentifierForVendor3", "DynamicSize"=>"16384", "ShortVersion"=>"1.0"
And inserting this into the aforementioned table:
INSERT INTO data SELECT (populate_record(NULL::data, hstore(row))).* FROM ...
the rest of the query is the same.
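For comparison, the same chunking can be done outside the database. This is a minimal Python sketch of the scripting-language route mentioned at the start of the answer; the function name split_records is made up, and it assumes every record has the same six fields in key,value order, as in the sample.

```python
def split_records(text, fields_per_record=6):
    """Split a flat 'key,value,key,value,...' string into a list of dicts."""
    items = [item.strip() for item in text.split(",")]
    chunk = fields_per_record * 2  # each field contributes a key and a value
    records = []
    for start in range(0, len(items), chunk):
        part = items[start:start + chunk]
        # Skip an incomplete trailing chunk, e.g. the empty item that a
        # trailing comma produces (same idea as the array_length filter).
        if len(part) == chunk:
            records.append(dict(zip(part[0::2], part[1::2])))
    return records


raw = (
    "BundleSize,155648,DynamicSize,204800,Identifier,com.URLConnectionSample,"
    "Name,URLConnectionSample,ShortVersion,1.0,Version,1.0,"
    "BundleSize,155648,DynamicSize,16384,Identifier,com.IdentifierForVendor3,"
    "Name,IdentifierForVendor3,ShortVersion,1.0,Version,1.0,"
)

records = split_records(raw)
for record in records:
    for key, value in record.items():
        print(f"{key}:{value}")
```

Each dict can then be passed to whatever insert mechanism you use, with the keys serving as column names.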

Postgres control flow--multiple return values

I have a bunch of queries on a business database with addresses and I often want to reclassify those as either inside or outside a given (known) area.
The SELECT CASE construct is great for this purpose, but I have often been in a situation where I want more than one return value based on the same condition tested for. For example, if the business is in a certain area, I classify it as "inside", but I may also by the same token, preferably in the same CASE block, set another value or flag, differently weigh the observation, and so on conditional on the CASE criteria being true.
What is the best/easiest way to leverage long condition statements and get multiple return values at the same time? Is that the domain of plpgsql only?
EDIT: added mock data, below. This does the categorization, but if I wanted to weight employment for each establishment, I would need a separate CASE block with the same criteria. That is what I am trying to get around.
SELECT
City, CASE WHEN City =ANY (ARRAY['San Francisco', 'San Mateo','Oakland','Marin','Santa Clara'])
THEN City ELSE 'outside'::text END as area,EstabEmployees
FROM (VALUES
('San Francisco', 14),
('San Mateo', 23),
('San Mateo', 3),
('San Francisco', 34),
('Visalia', 65),
('Juneau', 23),
('Mendocino', 5),
('Santa Clara', 1),
('Los Angeles', 56),
('San Mateo', 11),
('Los Angeles', 30),
('Marin', 33),
('Oakland', 14),
('Oakland', 2)
) AS t (City, EstabEmployees)
;
The easiest way to do this is to leverage another table or two defining the relationship.
See: http://sqlfiddle.com/#!1/ef0bb/6
To preserve it for future use, here's the "Schema" for sqlfiddle (for which I'm using a combination of DDL and DML):
create table metros(
id serial primary key,
name varchar(100),
data varchar (100)
);
create table metromappings (
id serial primary key,
metroid int references metros(id),
cityname varchar(100) not null
);
insert into metros(name) values ('San Francisco Area');
insert into metromappings(metroid,cityname)
select currval(pg_get_serial_sequence('metros', 'id')), name
from (values ('San Francisco'),
('San Mateo'),
('Oakland'),
('Marin'),
('Santa Clara')) as t(name);
And here's an example of how to use it:
select
case when m.id IS NULL THEN 'outside::' ELSE t.City END AS area,EstabEmployees
from
(metros m inner join metromappings mm
on m.id = mm.metroid and m.name =ANY (ARRAY['San Francisco Area1'])) -- add more stuff here
full outer join
(VALUES
('San Francisco', 14),
('San Mateo', 23),
('San Mateo', 3),
('San Francisco', 34),
('Visalia', 65),
('Juneau', 23),
('Mendocino', 5),
('Santa Clara', 1),
('Los Angeles', 56),
('San Mateo', 11),
('Los Angeles', 30),
('Marin', 33),
('Oakland', 14),
('Oakland', 2)
) AS t (City, EstabEmployees) on t.City = mm.cityname
order by area, EstabEmployees;
Please note, you might want to do some clustering/unique indexing on metroid,cityname, if only to remove the possibility of adding the same city to the same area twice (or just define the pair as the key and set up the id as some unique index; I'm not sure which is best).
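The same idea can be tried out quickly with Python's built-in sqlite3 module. This is a simplified sketch rather than the PostgreSQL schema above: it uses SQLite, a single mapping table, a made-up establishments table, and a hypothetical weight column to stand in for the "second return value" the question asks about. One join then yields both the area and the weight, with no repeated CASE conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metromappings (
        cityname TEXT PRIMARY KEY,
        area     TEXT NOT NULL,
        weight   REAL NOT NULL   -- hypothetical second value per city
    );
    INSERT INTO metromappings VALUES
        ('San Francisco', 'San Francisco', 1.0),
        ('San Mateo',     'San Mateo',     0.5),
        ('Oakland',       'Oakland',       0.75);

    CREATE TABLE establishments (city TEXT, employees INTEGER);
    INSERT INTO establishments VALUES
        ('San Francisco', 14), ('San Mateo', 23), ('Visalia', 65), ('Oakland', 2);
""")

# Cities without a mapping row fall through to the COALESCE defaults.
rows = conn.execute("""
    SELECT e.city,
           COALESCE(m.area, 'outside')           AS area,
           e.employees * COALESCE(m.weight, 0.0) AS weighted_employees
    FROM establishments AS e
    LEFT JOIN metromappings AS m ON m.cityname = e.city
    ORDER BY e.city, e.employees
""").fetchall()

for row in rows:
    print(row)
```

Adding another derived value means adding a column to the mapping table, not another CASE block.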

Nested SELECT statement in a CASE expression

Greetings,
Here is my problem.
I need to get data from multiple rows and return them as a single result in a larger query.
I already posted a similar question here.
Return multiple values in one column within a main query but I suspect my lack of SQL knowledge made the question too vague because the answers did not work.
I am using Microsoft SQL 2005.
Here is what I have.
Multiple tables with CaseID as the PK, CaseID is unique.
One table (tblKIN) with CaseID and ItemNum(AutoInc) as the combined PK.
Because each person in the database will likely have more than one relative.
If I run the following, in a SQL query window, it works.
DECLARE @KINList varchar(1000)
SELECT @KINList = coalesce(@KINList + ', ','') + KINRel from tblKIN
WHERE CaseID = 'xxx' and Address = 'yyy'
ORDER BY KINRel
SELECT @KINList
This will return the relation of all people who live at the same address. The results look like this:
Father, Niece, Sister, Son
Now, the problem for me is how do I add that to my main query?
Shortened to relevant information, the main query looks like this.
SELECT DISTINCT
c.CaseID,
c.Name,
c.Address,
Relatives=CASE WHEN exists(select k.CaseID from tblKIN k where c.CaseID = k.CaseID)
THEN DECLARE @KINList varchar(1000)
SELECT @KINList = coalesce(@KINList + ', ','') + KINRel from tblKIN
WHERE CaseID = 'xxx' and Address = 'yyy'
ORDER BY KINRel
SELECT @KINList
ELSE ''
END
FROM tblCase c
ORDER BY c.CaseID
The errors I receive are.
Server: Msg 156, Level 15, State 1, Line 13
Incorrect syntax near the keyword 'DECLARE'.
Server: Msg 156, Level 15, State 1, Line 18
Incorrect syntax near the keyword 'ELSE'.
I tried nesting inside parentheses from the DECLARE to the end of the SELECT @KINList.
I tried adding a BEGIN and END to the THEN section of the CASE statement.
Neither worked.
The source table data looks something like this. (periods added for readability)
tblCase
CaseID Name Address
10-001 Jim......100 Main St.
10-002 Tom....150 Elm St.
10-003 Abe.....200 1st St.
tblKIN
CaseID ItemNum Name Relation Address
10-001 00001 Steve...Son........100 Main St.
10-002 00002 James..Father....150 Elm St.
10-002 00003 Betty....Niece......150 Elm St.
10-002 00004 Greta...Sister.....150 Elm St.
10-002 00005 Davey..Son........150 Elm St.
10-003 00006 Edgar...Brother...200 1st St.
If I run the query for CaseID = 10-002, it needs to return the following.
CaseID Name Address.......Relatives
10-002 Tom...150 Elm St. ..Father, Niece, Sister, Son
I am sure this is probably a simple fix, but I just don't know how to do it.
Thank you for your time, and I apologize for the length of the question, but I wanted to be clear.
Thanks !!!
When I did something similar I had to create a scalar function to do the coalesce that returns the varchar result. Then just call it in the select.
CREATE FUNCTION GetRelatives
(
@CaseID varchar(10)
)
RETURNS varchar(1000)
AS
BEGIN
DECLARE @KINList varchar(1000)
SELECT @KINList = coalesce(@KINList + ', ','') + KINRel from tblKIN
WHERE CaseID = @CaseID
ORDER BY KINRel
RETURN @KINList
END
Then your select
SELECT DISTINCT
c.CaseID,
c.Name,
c.Address,
database.dbo.GetRelatives(c.CaseID) AS Relatives
FROM tblCase c
ORDER BY c.CaseID
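To show what the scalar function is expected to produce per case, here is a small Python sketch of the same logic. It is only an illustration of the aggregation, not T-SQL; the rows mirror the sample tblCase and tblKIN data from the question, and get_relatives is a stand-in name for the SQL function.

```python
from collections import defaultdict

# (CaseID, Name, Address) rows mirroring the sample tblCase data.
cases = [
    ("10-001", "Jim", "100 Main St."),
    ("10-002", "Tom", "150 Elm St."),
    ("10-003", "Abe", "200 1st St."),
]

# (CaseID, Relation) rows mirroring the sample tblKIN data.
kin_rows = [
    ("10-001", "Son"),
    ("10-002", "Father"),
    ("10-002", "Niece"),
    ("10-002", "Sister"),
    ("10-002", "Son"),
    ("10-003", "Brother"),
]

# Group the relations by case once, up front.
relations_by_case = defaultdict(list)
for case_id, relation in kin_rows:
    relations_by_case[case_id].append(relation)


def get_relatives(case_id):
    """Return a sorted, comma-separated list of relations, or '' if none."""
    return ", ".join(sorted(relations_by_case.get(case_id, [])))


report = [(cid, name, addr, get_relatives(cid)) for cid, name, addr in cases]
for line in report:
    print(line)
```

A case with no rows in kin_rows gets an empty string, which corresponds to the ELSE '' branch in the original query.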
You can create a FUNCTION which takes the caseID as the argument and returns true or false.
Since you are calling the nested query multiple times, it's definitely a performance hit. A better solution is to execute the query and store the results in a temporary table.
Then pass this temporary table and the caseID to the FUNCTION and check for containment.