Pandas groupby: SyntaxError: invalid syntax

My data is intraday data for 2000+ stocks, with timestamps in hours, minutes and seconds. I want per-minute data for each stock.
My sample data
I tried the following code:
df1 = df1.set_index("LastTradeTime")
t = df1.groupby(['Symbol'])pd.Grouper(freq='1Min').agg({"OpenPrice": "first",
                                                        "LastTradePrice": "last",
                                                        "LowPrice": "min",
                                                        "HighPrice": "max"})
t.columns = ["open", "close", "low", "high"]
print(t)
I am expecting the result as:
Expected Result
But I am facing a syntax error:
SyntaxError: invalid syntax
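A minimal sketch of what the intended call likely looks like, assuming LastTradeTime is already a datetime column: pd.Grouper has to be passed inside the groupby list, not written after the closing parenthesis, which is what raises the SyntaxError.
import pandas as pd

df1 = df1.set_index("LastTradeTime")
# group per symbol and per 1-minute bucket of the DatetimeIndex
t = df1.groupby(["Symbol", pd.Grouper(freq="1Min")]).agg({"OpenPrice": "first",
                                                          "LastTradePrice": "last",
                                                          "LowPrice": "min",
                                                          "HighPrice": "max"})
t.columns = ["open", "close", "low", "high"]
print(t)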

Related

PySpark: how to get the count of records that do not match a given date format

I have a CSV file that contains (FileName, ColumnName, Rule, RuleDetails) as headers.
As per the rule details, I need to get the count of values in the column (INSTALLDATE) that do not match the RuleDetails date format.
I have to pass ColumnName and RuleDetails dynamically.
I tried the code below:
from pyspark.sql.functions import *
DateFields = []
for rec in df_tabledef.collect():
    if rec["Rule"] == "DATEFORMAT":
        DateFields.append(rec["Columnname"])
        DateFormatValidvalues = [str(x) for x in rec["Ruledetails"].split(",") if x]
        DateFormatString = ",".join([str(elem) for elem in DateFormatValidvalues])
DateColsString = ",".join([str(elem) for elem in DateFields])
output = (
    df_tabledata.select(DateColsString)
    .where(
        DateColsString
        not in (datetime.strptime(DateColsString, DateFormatString), "DateFormatString")
    )
    .count()
)
display(output)
The expected output is the count of records that do not match the given date format.
For example, if 4 out of 10 records are not in (YYYY-MM-DD) format, then the count should be 4.
I get the below error message if I run the above code.
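For what it's worth, a hedged sketch of one way to get that count, assuming a single rule for the column INSTALLDATE and a Spark-style pattern such as "yyyy-MM-dd" (the pattern is an assumption, not taken from the question): to_date returns NULL for values that cannot be parsed with the given pattern, so counting those NULLs gives the mismatches.
from pyspark.sql import functions as F

date_col = "INSTALLDATE"   # from the rule definition
date_fmt = "yyyy-MM-dd"    # assumed Spark date pattern for (YYYY-MM-DD)

# values that cannot be parsed with the pattern come back as NULL from to_date
mismatch_count = (
    df_tabledata
    .where(F.col(date_col).isNotNull() & F.to_date(F.col(date_col), date_fmt).isNull())
    .count()
)
print(mismatch_count)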

Debugging "String length exceeds DDL Length" error AWS Glue

I'm writing a dynamic frame to Redshift as a table and I'm getting the following error:
An error occurred while calling o3225.pyWriteDynamicFrame. Error (code 1204) while loading data into Redshift: "String length exceeds DDL length"
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("transactionId", "string", "transaction_id", "string"), ("basicChannelGroupingPath", "string", "channel_grouping", "string")], transformation_ctx = "applymapping1")
datasink2 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = applymapping1, catalog_connection = "redshift_test", connection_options = {"preactions": "truncate table dw.table;", "dbtable": "dw.table", "database": "test", "postactions": post_query}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink2")
The error you are getting is not happening at the AWS Glue level; it is being passed through from Amazon Redshift. Error code 1204 from Redshift means:
Input data exceeded the acceptable range for the data type.
In other words, some string data you are trying to write to the Redshift table exceeds the byte-size limit of the corresponding string column. To resolve this, I would recommend first checking the system table STL_LOAD_ERRORS, in particular the raw_field_value column, to see the pre-parsing value and the string that causes the issue. After that, you can do additional pre-processing for such cases if needed (see the sketch below) and resolve your issue.
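If truncation is an acceptable fix, here is a rough sketch of such a pre-processing step, assuming channel_grouping is the offending column and the target Redshift column is VARCHAR(256); both are assumptions, not something the error message confirms.
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import functions as F

# convert to a Spark DataFrame, cut the value down to the assumed column width,
# then convert back to a DynamicFrame before writing to Redshift
df = applymapping1.toDF()
df = df.withColumn("channel_grouping", F.substring(F.col("channel_grouping"), 1, 256))
applymapping1 = DynamicFrame.fromDF(df, glueContext, "applymapping1")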

ClickHouse client syntax error with Kafka integration

I'm on ClickHouse client version 18.16.1 and I'm following this blog post: https://altinity.com/blog/2020/5/21/clickhouse-kafka-engine-tutorial
When creating a table, I'm using this syntax:
CREATE TABLE readings (
readings_id Int32 Codec(DoubleDelta, LZ4),
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, LZ4)
) Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (readings_id, time);
and I'm getting an error that says
"""
Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 76 (line 2, col 23): Codec(DoubleDelta, LZ4),
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, LZ4)
)
ENGINE = MergeTr. Expected one of: token, ClosingRoundBracket, Comma, DEFAULT, MATERIALIZED, ALIAS, COMMENT, e.what() = DB::Exception
"""
Let me know what I'm doing wrong. Thanks.

Store random data in Postgres database from Python

I have the data in this form:
data={'[{"info": "No", "uid": null, "links": ["";, ""], "task_id": 1, "created": "2017-02-15T09:07:09.068145", "finish_time": "2017-02-15T09:07:14.620174", "calibration": null, "user_ip": null, "timeout": null, "project_id": 1, "id": 1}]', 'uuid': u'abc:def:ghi'}
I want to store this data in the Postgres DB. I have this query:
quer1='UPDATE table_1 SET data = "%s" WHERE id = "%s" '%(data1,id)
db_session.execute(quer1)
db_session.commit()
This query executes but doesn't store anything in the DB. The datatype of data is 'text'. I am not able to figure out where I am wrong. Please help.
Edit:
I updated my query to this:
quer1='UPDATE table_1 SET data = "%s" WHERE hitid = %s '%(data1,id)
First, never use % or str.format to insert values into your queries!!!
Assuming you are using psycopg2, your query should use the following format:
db_session.execute('UPDATE table_1 SET data = %s WHERE id = %s', (data1, id))
As @groteworld mentions, data = {1,2,'3',[4],(5),{6}} is not valid Python.
I will assume you are using a proper value for data in your actual code.
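For completeness, a minimal sketch of the full flow with plain psycopg2 (the DSN and the record_id variable are hypothetical): the driver handles the quoting, and leaving the connection context manager commits the transaction.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(
        "UPDATE table_1 SET data = %s WHERE id = %s",
        (str(data), record_id),  # the text column receives the serialized payload
    )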

SQLAlchemy: Problems Migrating to PostgreSQL from SQLite (e.g. sqlalchemy.exc.ProgrammingError:)

I am having difficulties migrating a working script from SQLite to PostgreSQL. I am using SQLAlchemy. When I run the script, it raises the following error:
raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) can't adapt 'INSERT INTO cnn_hot_stocks (datetime, list, ticker, price, change, "pctChange") VALUES (%(datetime)s, %(list)s, %(ticker)s, %(price)s, %(change)s, %(pctChange)s)' {'price': Decimal('7.94'), 'list': 'active', 'datetime': datetime.datetime(2012, 6, 23, 11, 45, 1, 544361), 'pctChange': u'+1.53%', 'ticker': u'BAC', 'change': Decimal('0.12')}
The insert call works well when using sqlite engine, but I want to use pgsql to utilize the native Decimal type for keeping financial data correct. I copied the script and just changed the db engine to my postgresql server. Any advice on how to troubleshoot this error would be greatly appreciated for this SQLalchemy newbie... I think I am up a creek on this one! Thanks in advance!
Here are my relevant code segments and table descriptions:
dbstring = "postgresql://postgres:postgres@localhost:5432/algo"
db = create_engine(dbstring)
db.echo = True # Try changing this to True and see what happens
metadata = MetaData(db)
cnn_hot_stocks = Table('cnn_hot_stocks', metadata, autoload=True)
i = cnn_hot_stocks.insert() # running log from cnn hot stocks web-site
def scrape_data():
    try:
        html = urllib2.urlopen('http://money.cnn.com/data/hotstocks/').read()
        markup, errors = tidy_document(html)
        soup = BeautifulSoup(markup,)
    except Exception as e:
        pass
    list_map = { 2 : 'active',
                 3 : 'gainer',
                 4 : 'loser'
               }
    # Iterate over 3 tables on CNN hot stock web-site
    for x in range(2, 5):
        table = soup('table')[x]
        for row in table.findAll('tr')[1:]:
            timestamp = datetime.now()
            col = row.findAll('td')
            ticker = col[0].a.string
            price = Decimal(col[1].span.string)
            change = Decimal(col[2].span.span.string)
            pctChange = col[3].span.span.string
            log_data = {'datetime' : timestamp,
                        'list' : list_map[x],
                        'ticker' : ticker,
                        'price' : price,
                        'change' : change,
                        'pctChange' : pctChange
                       }
            print log_data
            # Commit to DB
            i.execute(log_data)
TABLE:
cnn_hot_stocks = Table('cnn_hot_stocks', metadata,  # log of stocks data on cnn hot stocks lists
    Column('datetime', DateTime, primary_key=True),
    Column('list', String),  # loser/gainer/active
    Column('ticker', String),
    Column('price', Numeric),
    Column('change', Numeric),
    Column('pctChange', String),
)
My reading of the documentation is that you have to use numeric instead of decimal.
PostgreSQL has no type named decimal (it's an alias for numeric but not a very full-featured one), and SQL Alchemy seems to expect numeric as the type it can use for abstraction purposes.
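As a hedged illustration of that point, this is roughly what an explicit Numeric column declaration looks like in the table definition (the precision/scale values here are only examples, not from the question); SQLAlchemy's Numeric type is what emits PostgreSQL's native numeric column.
from sqlalchemy import Column, DateTime, MetaData, Numeric, String, Table

metadata = MetaData()
cnn_hot_stocks = Table(
    "cnn_hot_stocks", metadata,
    Column("datetime", DateTime, primary_key=True),
    Column("list", String),
    Column("ticker", String),
    Column("price", Numeric(12, 4)),   # NUMERIC(12, 4) in PostgreSQL
    Column("change", Numeric(12, 4)),
    Column("pctChange", String),
)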