I am new to postgresql-simple, so please forgive me if this is a dumb question.
I am going through this tutorial: Postgresql Data Access with Haskell
I got the first demo program to run.
Now I am taking this function:
retrieveClient :: Connection -> Int -> IO [Only String]
retrieveClient conn cid = query conn "SELECT ticker FROM spot_daily WHERE id = ?" $ (Only cid)
and wish to modify it to return IO [(String, Integer, Float)].
So I wrote:
retrieveClient2 :: Connection -> Float -> IO [(String, Integer, Float)]
retrieveClient2 conn cid = query conn "SELECT (ticker, timestamp, some_val) FROM spot_daily WHERE p_open > ?" $ (Only cid)
main :: IO ()
main = do
  conn <- connect localPG
  mapM_ print =<< retrieveClient2 conn 50.0
and I get this:
MyApp-exe.EXE: Incompatible {errSQLType = "record", errSQLTableOid = Nothing, errSQLField = "row", errHaskellType = "Text", errMessage = "types incompatible"}
It's common in the Haskell world to say, "Search for the type signature you want on Hackage!" but it's not clear to me from the error message what type signature would make GHC happy.
Is there a conversion function for this sort of thing? I tried doing this:
data MyStruct = MyStruct { field1 :: String, field2 :: Integer, field3 :: Float } deriving (Eq, Show)
retrieveClient3 :: Connection -> Int -> IO [Only MyStruct]
retrieveClient3 conn cid = MyStruct (query conn "SELECT ticker FROM spot_daily WHERE id = ?" $ (Only cid))
but that results in a different error.
In response to a comment, here is the schema for spot_daily:
Table "public.spot_daily"
Column | Type | Collation | Nullable | Default
-----------+-----------------------+-----------+----------+----------------------------------------
id | integer | | not null | nextval('spot_daily_id_seq'::regclass)
ticker | character varying(20) | | not null |
epoch | bigint | | not null |
p_open | double precision | | |
p_close | double precision | | |
p_high | double precision | | |
p_low | double precision | | |
synthetic | boolean | | |
Indexes:
"spot_daily_pkey" PRIMARY KEY, btree (id)
PostgreSQL makes a distinction between a "row" and a "record". As written (with the parentheses), your SQL query returns a single record column, which isn't handled by the FromRow instance for tuples:
SELECT (ticker, timestamp, some_val) FROM spot_daily WHERE p_open > ?
Changing the query by removing the parentheses makes it return a row, which postgresql-simple should be able to handle:
SELECT ticker, timestamp, some_val FROM spot_daily WHERE p_open > ?
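As a minimal sketch against the posted schema (timestamp and some_val don't actually appear in spot_daily, so this assumes ticker, epoch and p_open instead; Double is assumed for double precision, and localPG is the connection info from the tutorial):

{-# LANGUAGE OverloadedStrings #-}

import Database.PostgreSQL.Simple
import Database.PostgreSQL.Simple.FromRow

-- Tuple version: the stock FromField instances cover
-- varchar -> String, bigint -> Integer, double precision -> Double.
retrieveClient2 :: Connection -> Double -> IO [(String, Integer, Double)]
retrieveClient2 conn p =
  query conn "SELECT ticker, epoch, p_open FROM spot_daily WHERE p_open > ?" (Only p)

-- Record version: a FromRow instance consumes one field per column,
-- so the result needs no Only wrapper.
data MyStruct = MyStruct
  { field1 :: String
  , field2 :: Integer
  , field3 :: Double
  } deriving (Eq, Show)

instance FromRow MyStruct where
  fromRow = MyStruct <$> field <*> field <*> field

retrieveClient3 :: Connection -> Double -> IO [MyStruct]
retrieveClient3 conn p =
  query conn "SELECT ticker, epoch, p_open FROM spot_daily WHERE p_open > ?" (Only p)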
I have a table with ~300 columns filled with characters (stored as String):
valuesDF:
| FavouriteBeer | FavouriteCheese | ...
|---------------|-----------------|--------
| U | C | ...
| U | E | ...
| I | B | ...
| C | U | ...
| ... | ... | ...
I have a Data Summary, which maps the characters onto their actual meaning. It is in this form:
summaryDF:
| Field | Value | ValueDesc |
|------------------|-------|---------------|
| FavouriteBeer | U | Unknown |
| FavouriteBeer | C | Carlsberg |
| FavouriteBeer | I | InnisAndGunn |
| FavouriteBeer | D | DoomBar |
| FavouriteCheese | C | Cheddar |
| FavouriteCheese | E | Emmental |
| FavouriteCheese | B | Brie |
| FavouriteCheese | U | Unknown |
| ... | ... | ... |
I want to programmatically replace the character values of each column in valuesDF with the Value Descriptions from summaryDF. This is the result I'm looking for:
finalDF:
| FavouriteBeer | FavouriteCheese | ...
|---------------|-----------------|--------
| Unknown | Cheddar | ...
| Unknown | Emmental | ...
| InnisAndGunn | Brie | ...
| Carlsberg | Unknown | ...
| ... | ... | ...
As there are ~300 columns, I'm not keen to type out a withColumn call for each one.
Unfortunately I'm a bit of a novice when it comes to programming for Spark, although I've picked up enough to get by over the last 2 months.
What I'm pretty sure I need to do is something along these lines:
- Iterate over each column with valuesDF.columns.foreach { col => ... }
- Filter summaryDF on Field using the current column name
- Left join summaryDF onto valuesDF based on the current column
- Use withColumn to replace the original character-code column in valuesDF with the new description column
- Assign the new DF to a var
- Continue the loop
However, trying this gave me a Cartesian product error (even though I made sure to define the join as "left").
I tried and failed to pivot summaryDF (as there are no aggregations to do?) and then join the two dataframes together.
This is the sort of thing I've tried, and I always get a NullPointerException. I know this is really not the right way to do it, and I can see why I'm getting the NullPointerException, but I'm really stuck and reverting to old, silly, bad Python habits in desperation.
var valuesDF = sourceDF
// I converted summaryDF to a broadcast variable
// because it's small and a "constant" lookup table
summaryBroadcast
  .value
  .foreach { x =>
    val field = x(0).toString        // Field, e.g. `FavouriteBeer`
    val searchValue = x(1).toString  // Value, e.g. `U`
    val replaceValue = x(2).toString // ValueDesc, e.g. `Unknown`
    // error catching, as the summary data does not map exactly onto the field names
    // (the joys of business people working in Excel...)
    try {
      // I'm using regexp_replace because I'm lazy
      valuesDF = valuesDF
        .withColumn(field, regexp_replace(col(field), searchValue, replaceValue))
    } catch {
      case _: Exception => null
    }
  }
Any ideas? Advice? Thanks.
First, we'll need a function that joins valuesDF with summaryDf on Value and on the respective pair of Favourite* column and Field:
private def joinByColumn(colName: String, sourceDf: DataFrame): DataFrame = {
  sourceDf.as("src") // alias it to help select the appropriate columns in the result
    // the join
    .join(summaryDf, $"Value" === col(colName) && $"Field" === colName, "left")
    // we do not need the original `Favourite*` column, so drop it
    .drop(colName)
    // select all previous columns, plus the one that contains the match
    .select("src.*", "ValueDesc")
    // rename the resulting column to have the name of the source one
    .withColumnRenamed("ValueDesc", colName)
}
Now, to produce the target result, we can fold over the names of the columns to match:
val result = Seq("FavouriteBeer",
"FavouriteCheese").foldLeft(valuesDF) {
case(df, colName) => joinByColumn(colName, df)
}
result.show()
+-------------+---------------+
|FavouriteBeer|FavouriteCheese|
+-------------+---------------+
| Unknown| Cheddar|
| Unknown| Emmental|
| InnisAndGunn| Brie|
| Carlsberg| Unknown|
+-------------+---------------+
In case a value from valuesDF does not match anything in summaryDf, the resulting cell in this solution will contain null. If you want to replace it with the value Unknown instead, use the following in place of the .select and .withColumnRenamed lines above:
    .withColumn(colName, when($"ValueDesc".isNotNull, $"ValueDesc").otherwise(lit("Unknown")))
    .select("src.*", colName)
This is the dummy function I wrote to update the counter.
def updateTable(tableName, visitorId, dtWithZone):
    db_uri = app.config["SQLALCHEMY_DATABASE_URI"]
    engine = create_engine(
        db_uri,
        connect_args={"options": "-c timezone={}".format(dtWithZone.timetz().tzinfo.zone)})
    # create session
    Session = sessionmaker()
    Session.configure(bind=engine)
    session = Session()
    meta = MetaData(engine, reflect=True)
    table = meta.tables[tableName]
    print dir(table)
    # update row in the database
    row = session.query(table).filter(
        table.c.visitorId == visitorId).first()
    print 'original:', row.count
    row.count = row.count + 1
    print "updated {}".format(row.count)
    session.commit()
    session.close()
But when it reaches the line row.count = row.count + 1 it throws an error:
AttributeError: can't set attribute
This is the table:
\d visitorinfo;
Table "public.visitorinfo"
Column | Type | Modifiers
--------------+--------------------------+-----------
platform | character varying(15) |
browser | character varying(10) |
visitorId | character varying(10) | not null
language | character varying(10) |
version | character varying(20) |
cl_lat | double precision |
cl_lng | double precision |
count | integer |
ip | character varying(20) |
visitor_time | timestamp with time zone |
Indexes:
"visitorinfo_pkey" PRIMARY KEY, btree ("visitorId")
What am I doing wrong? Why is it saying it can't set the attribute?
Part of the updated code:
# update row in the database
row = session.query(table).filter(
    table.c.visitorId == visitorId).first()
print 'original:', row.count
val = row.count
row.count = val + 1
print "updated {}".format(row.count)
Use an update query instead. session.query() on a plain (non-mapped) Table returns immutable row tuples, which is why assigning to row.count fails:
UPDATE public.visitorinfo SET count = count + 1 WHERE "visitorId" = 'VisitorID'
(The column is count per the schema above, and "visitorId" needs double quotes because it was created with mixed case.) Make sure the 'VisitorID' literal is filled in from your application.
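A minimal sketch of issuing that update through the reflected table object, reusing session and table from the question's setup, so no row attribute ever needs to be assigned:

# build an UPDATE ... SET count = count + 1 WHERE "visitorId" = :visitorId
stmt = (
    table.update()
    .where(table.c.visitorId == visitorId)
    .values(count=table.c.count + 1)
)
session.execute(stmt)
session.commit()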
I need to return only a portion of the value in a given field.
Example:
A given field returns something like 'AB-1X3.4567', but the desired value is only the '1X3.4567' portion. So for this example I need to remove anything that precedes the pattern
[0-9A-Z][0-9A-Z][0-9A-Z][.][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z].
What query could I write to do this?
using stuff() and patindex():
create table t (val varchar(32))
insert into t values
('AB-1X3.4567') -- given example
,('1X3.4567AB-1X3.4567') -- extra junk on the end
,('1X3.4567') -- goldilocks
,('X3.4567') -- too short
,('AB-1X#.4567') -- # is not [0-9A-Z]
select
val
, str = stuff(val,1,patindex('%[0-9A-Z][0-9A-Z][0-9A-Z][.][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',val)-1,'')
from t
rextester demo: http://rextester.com/ITUJ68634
returns:
+---------------------+---------------------+
| val | str |
+---------------------+---------------------+
| AB-1X3.4567 | 1X3.4567 |
| 1X3.4567AB-1X3.4567 | 1X3.4567AB-1X3.4567 |
| 1X3.4567 | 1X3.4567 |
| X3.4567 | NULL |
| AB-1X#.4567 | NULL |
+---------------------+---------------------+
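The NULLs come from stuff() itself: when patindex() finds no match it returns 0, so stuff() is called with a length of -1 and returns NULL. A sketch of a guard, if unmatched values should instead pass through unchanged (an assumption):

select
    val
  , str = case when patindex('%[0-9A-Z][0-9A-Z][0-9A-Z][.][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',val) > 0
               then stuff(val,1,patindex('%[0-9A-Z][0-9A-Z][0-9A-Z][.][0-9A-Z][0-9A-Z][0-9A-Z][0-9A-Z]%',val)-1,'')
               else val
          end
from t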
Your pattern amounts to anything of the form XXX.XXXX, where X is any single digit or letter. If the prefix to strip is always three characters, RIGHT() and LEN() suffice:
DECLARE @value VARCHAR(4000)='AB-1X3.4567'
SELECT RIGHT(@value,LEN(@value) - 3)
I have this table:
CREATE TABLE x(
id BIGSERIAL PRIMARY KEY,
data JSONB
);
INSERT INTO x(data)
VALUES( '{"a":"test", "b":123, "c":null, "d":true}' ),
( '{"a":"test", "b":123, "c":null, "d":"yay", "e":"foo", "f":[1,2,3]}' );
How do I query the types of each key in that table, so it would give an output something like this:
a | string:2
b | number:2
c | null:2
d | boolean:1 string:1
e | string:1
f | jsonb:1 -- or anything
I only know how to get the keys and their counts, not how to get the type of each key:
SELECT jsonb_object_keys(data), COUNT(id) FROM x GROUP BY 1 ORDER BY 1
that would give something like:
a | 2
b | 2
c | 2
d | 2
e | 1
f | 1
EDIT:
As pozs points out, there are two typeof functions: jsonb_typeof for JSON values and pg_typeof for SQL values. This query is the one you're looking for:
SELECT
    json_data.key,
    jsonb_typeof(json_data.value),
    count(*)
FROM x, jsonb_each(x.data) AS json_data
group by key, jsonb_typeof
order by key, jsonb_typeof;
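With the two sample rows above, that should produce something like:
 key | jsonb_typeof | count
-----+--------------+-------
 a   | string       |     2
 b   | number       |     2
 c   | null         |     2
 d   | boolean      |     1
 d   | string       |     1
 e   | string       |     1
 f   | array        |     1
(7 rows)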
Old Answer: (Hey, it works...)
This query will return the type of the keys:
SELECT
    json_data.key,
    pg_typeof(json_data.value),
    json_data.value
FROM x, jsonb_each(x.data) AS json_data;
... unfortunately, you'll notice that Postgres doesn't differentiate between the different JSON types here. It regards them all as jsonb, so the results are:
key1 | value1 | value
------+--------+-----------
a | jsonb | "test"
b | jsonb | 123
c | jsonb | null
d | jsonb | true
a | jsonb | "test"
b | jsonb | 123
c | jsonb | null
d | jsonb | "yay"
e | jsonb | "foo"
f | jsonb | [1, 2, 3]
(10 rows)
However, there aren't that many JSON primitive types, and the output seems to be unambiguous. So this query will do what you want:
with jsontypes as (
    SELECT
        json_data.key AS key1,
        CASE WHEN left(json_data.value::text,1) = '"' THEN 'String'
             WHEN json_data.value::text ~ '^-?\d' THEN
                 CASE WHEN json_data.value::text ~ '\.' THEN 'Number'
                      ELSE 'Integer'
                 END
             WHEN left(json_data.value::text,1) = '[' THEN 'Array'
             WHEN left(json_data.value::text,1) = '{' THEN 'Object'
             WHEN json_data.value::text in ('true', 'false') THEN 'Boolean'
             WHEN json_data.value::text = 'null' THEN 'Null'
             ELSE 'Beats Me'
        END as jsontype
    -- Note that it won't work if we use jsonb_each_text here,
    -- because the strings won't have quotes around them, etc.
    FROM x, jsonb_each(x.data) AS json_data
)
select *, count(*) from jsontypes
group by key1, jsontype
order by key1, jsontype;
Output:
key1 | jsontype | count
------+----------+-------
a | String | 2
b | Integer | 2
c | Null | 2
d | Boolean | 1
d | String | 1
e | String | 1
f | Array | 1
(7 rows)
You can improve your last query with jsonb_typeof:
with jsontypes as (
    SELECT
        json_data.key AS key1,
        jsonb_typeof(json_data.value) as jsontype
    FROM x, jsonb_each(x.data) AS json_data
)
select *, count(*)
from jsontypes
group by 1, 2
order by 1, 2;
I have this query (PostgreSQL 9.1):
=> update tbp set super_answer = null where packet_id = 18;
ERROR: syntax error at or near "="
I don't get it. I'm really at a loss here.
Table "public.tbp"
Column | Type | Modifiers
--------------+------------------------+-----------
id | bigint | not null
super_answer | bigint |
packet_id | bigint |
It turned out I had copied some invisible unicode whitespace character (a non-breaking space), and Postgres didn't like it.
In a Python console:
>>> u'update "tbp" setĀ "super_answer"=null where "packet_id" = 18'
u'update "tbp" set\xa0"super_answer"=null where "packet_id" = 18'
Life can be strange sometimes.
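A quick sketch of how such characters can be spotted and scrubbed before a query is sent (the u'\xa0' replace is specific to this non-breaking-space case):

# -*- coding: utf-8 -*-
query = u'update "tbp" set\xa0"super_answer"=null where "packet_id" = 18'

# repr() makes invisible characters such as the non-breaking space visible
print repr(query)

# scrub the non-breaking space before handing the query to Postgres
clean = query.replace(u'\xa0', u' ')
print repr(clean)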