Pandas groupby: SyntaxError: invalid syntax

My data is intraday data for 2000+ stocks, with timestamps in hours, minutes and seconds. I want per-minute data for each stock.
My sample data
I tried the following code:
df1 = df1.set_index("LastTradeTime")
t = df1.groupby(['Symbol'])pd.Grouper(freq='1Min').agg({"OpenPrice": "first",
                                                        "LastTradePrice": "last",
                                                        "LowPrice": "min",
                                                        "HighPrice": "max"})
t.columns = ["open", "close", "low", "high"]
print(t)
I am expecting the result as:
Expected Result
But I am facing a syntax error:
SyntaxError: invalid syntax
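A minimal sketch of what the intended call likely looks like, assuming LastTradeTime is already a datetime column: pd.Grouper has to be passed inside the groupby list, not written after the closing parenthesis, which is what raises the SyntaxError.
import pandas as pd

df1 = df1.set_index("LastTradeTime")
# group per symbol and per 1-minute bucket of the DatetimeIndex
t = df1.groupby(["Symbol", pd.Grouper(freq="1Min")]).agg({"OpenPrice": "first",
                                                          "LastTradePrice": "last",
                                                          "LowPrice": "min",
                                                          "HighPrice": "max"})
t.columns = ["open", "close", "low", "high"]
print(t)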

Related

PySpark: how to get the count of records that do not match a given date format

I have a CSV file that contains (FileName, ColumnName, Rule, RuleDetails) as headers.
As per the rule details, I need to get the count of values in the column (INSTALLDATE) that do not match the RuleDetails date format.
I have to pass ColumnName and RuleDetails dynamically.
I tried the code below:
from pyspark.sql.functions import *
DateFields = []
for rec in df_tabledef.collect():
    if rec["Rule"] == "DATEFORMAT":
        DateFields.append(rec["Columnname"])
        DateFormatValidvalues = [str(x) for x in rec["Ruledetails"].split(",") if x]
        DateFormatString = ",".join([str(elem) for elem in DateFormatValidvalues])
DateColsString = ",".join([str(elem) for elem in DateFields])
output = (
    df_tabledata.select(DateColsString)
    .where(
        DateColsString
        not in (datetime.strptime(DateColsString, DateFormatString), "DateFormatString")
    )
    .count()
)
display(output)
The expected output is the count of records that do not match the given date format.
For example, if 4 out of 10 records are not in (YYYY-MM-DD) format, then the count should be 4.
I get the below error message if I run the above code.
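For what it's worth, a hedged sketch of one way to get that count, assuming a single rule for the column INSTALLDATE and a Spark-style pattern such as "yyyy-MM-dd" (the pattern is an assumption, not taken from the question): to_date returns NULL for values that cannot be parsed with the given pattern, so counting those NULLs gives the mismatches.
from pyspark.sql import functions as F

date_col = "INSTALLDATE"   # from the rule definition
date_fmt = "yyyy-MM-dd"    # assumed Spark date pattern for (YYYY-MM-DD)

# values that cannot be parsed with the pattern come back as NULL from to_date
mismatch_count = (
    df_tabledata
    .where(F.col(date_col).isNotNull() & F.to_date(F.col(date_col), date_fmt).isNull())
    .count()
)
print(mismatch_count)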

Debugging "String length exceeds DDL Length" error AWS Glue

I'm writing a dynamic frame to Redshift as a table and I'm getting the following error:
An error occurred while calling o3225.pyWriteDynamicFrame. Error (code 1204) while loading data into Redshift: "String length exceeds DDL length"
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("transactionId", "string", "transaction_id", "string"), ("basicChannelGroupingPath", "string", "channel_grouping", "string")], transformation_ctx = "applymapping1")
datasink2 = glueContext.write_dynamic_frame.from_jdbc_conf(frame = applymapping1, catalog_connection = "redshift_test", connection_options = {"preactions": "truncate table dw.table;", "dbtable": "dw.table", "database": "test", "postactions": post_query}, redshift_tmp_dir = args["TempDir"], transformation_ctx = "datasink2")
The error you are getting is not happening at the AWS Glue level; it is being passed through from Amazon Redshift. Error code 1204 from Redshift means:
Input data exceeded the acceptable range for the data type.
In other words, some string data you are trying to write to the Redshift table exceeds the byte-size limit of the corresponding string column. To resolve this, I would recommend first checking the system table STL_LOAD_ERRORS, in particular the raw_field_value column, to see the pre-parsing value and the string that causes the issue. After that, you can do additional pre-processing for such cases if needed (see the sketch below) and resolve your issue.
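If truncation is an acceptable fix, here is a rough sketch of such a pre-processing step, assuming channel_grouping is the offending column and the target Redshift column is VARCHAR(256); both are assumptions, not something the error message confirms.
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql import functions as F

# convert to a Spark DataFrame, cut the value down to the assumed column width,
# then convert back to a DynamicFrame before writing to Redshift
df = applymapping1.toDF()
df = df.withColumn("channel_grouping", F.substring(F.col("channel_grouping"), 1, 256))
applymapping1 = DynamicFrame.fromDF(df, glueContext, "applymapping1")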

ClickHouse client syntax error with Kafka integration

I'm on ClickHouse client version 18.16.1 and I'm following this blog post: https://altinity.com/blog/2020/5/21/clickhouse-kafka-engine-tutorial
When creating a table, I'm using this syntax:
CREATE TABLE readings (
readings_id Int32 Codec(DoubleDelta, LZ4),
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, LZ4)
) Engine = MergeTree
PARTITION BY toYYYYMM(time)
ORDER BY (readings_id, time);
and I'm getting an error that says
"""
Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 76 (line 2, col 23): Codec(DoubleDelta, LZ4),
time DateTime Codec(DoubleDelta, LZ4),
date ALIAS toDate(time),
temperature Decimal(5,2) Codec(T64, LZ4)
)
ENGINE = MergeTr. Expected one of: token, ClosingRoundBracket, Comma, DEFAULT, MATERIALIZED, ALIAS, COMMENT, e.what() = DB::Exception
"""
Let me know what I'm doing wrong. Thanks.

Store random data in Postgres database from Python

I have the data in this form:
data={'[{"info": "No", "uid": null, "links": ["";, ""], "task_id": 1, "created": "2017-02-15T09:07:09.068145", "finish_time": "2017-02-15T09:07:14.620174", "calibration": null, "user_ip": null, "timeout": null, "project_id": 1, "id": 1}]', 'uuid': u'abc:def:ghi'}
I want to store this data in the Postgres DB. I have this query:
quer1='UPDATE table_1 SET data = "%s" WHERE id = "%s" '%(data1,id)
db_session.execute(quer1)
db_session.commit()
This query executes but doesn't store anything in the DB. The datatype of data is 'text'. I am not able to figure out where I am wrong. Please help.
Edit:
I updated my query to this:
quer1='UPDATE table_1 SET data = "%s" WHERE hitid = %s '%(data1,id)
First, never use % or str.format to insert values into your queries!!!
Assuming you are using psycopg2, your query should use the following format:
db_session.execute('UPDATE table_1 SET data = %s WHERE id = %s', (data1, id))
As @groteworld mentions, data = {1,2,'3',[4],(5),{6}} is not valid Python.
I will assume you are using a proper value for data in your actual code.
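For completeness, a minimal sketch of the full flow with plain psycopg2 (the DSN and the record_id variable are hypothetical): the driver handles the quoting, and leaving the connection context manager commits the transaction.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(
        "UPDATE table_1 SET data = %s WHERE id = %s",
        (str(data), record_id),  # the text column receives the serialized payload
    )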

SQLAlchemy: Problems Migrating to PostgreSQL from SQLite (e.g. sqlalchemy.exc.ProgrammingError:)

I am having difficulties migrating a working script from SQLite to PostgreSQL. I am using SQLAlchemy. When I run the script, it raises the following error:
raise exc.DBAPIError.instance(statement, parameters, e, connection_invalidated=is_disconnect)
sqlalchemy.exc.ProgrammingError: (ProgrammingError) can't adapt 'INSERT INTO cnn_hot_stocks (datetime, list, ticker, price, change, "pctChange") VALUES (%(datetime)s, %(list)s, %(ticker)s, %(price)s, %(change)s, %(pctChange)s)' {'price': Decimal('7.94'), 'list': 'active', 'datetime': datetime.datetime(2012, 6, 23, 11, 45, 1, 544361), 'pctChange': u'+1.53%', 'ticker': u'BAC', 'change': Decimal('0.12')}
The insert call works well when using sqlite engine, but I want to use pgsql to utilize the native Decimal type for keeping financial data correct. I copied the script and just changed the db engine to my postgresql server. Any advice on how to troubleshoot this error would be greatly appreciated for this SQLalchemy newbie... I think I am up a creek on this one! Thanks in advance!
Here are my relevant code segments and table descriptions:
dbstring = "postgresql://postgres:postgres@localhost:5432/algo"
db = create_engine(dbstring)
db.echo = True # Try changing this to True and see what happens
metadata = MetaData(db)
cnn_hot_stocks = Table('cnn_hot_stocks', metadata, autoload=True)
i = cnn_hot_stocks.insert() # running log from cnn hot stocks web-site
def scrape_data():
    try:
        html = urllib2.urlopen('http://money.cnn.com/data/hotstocks/').read()
        markup, errors = tidy_document(html)
        soup = BeautifulSoup(markup,)
    except Exception as e:
        pass
    list_map = { 2 : 'active',
                 3 : 'gainer',
                 4 : 'loser'
               }
    # Iterate over 3 tables on CNN hot stock web-site
    for x in range(2, 5):
        table = soup('table')[x]
        for row in table.findAll('tr')[1:]:
            timestamp = datetime.now()
            col = row.findAll('td')
            ticker = col[0].a.string
            price = Decimal(col[1].span.string)
            change = Decimal(col[2].span.span.string)
            pctChange = col[3].span.span.string
            log_data = {'datetime' : timestamp,
                        'list' : list_map[x],
                        'ticker' : ticker,
                        'price' : price,
                        'change' : change,
                        'pctChange' : pctChange
                       }
            print log_data
            # Commit to DB
            i.execute(log_data)
TABLE:
cnn_hot_stocks = Table('cnn_hot_stocks', metadata,  # log of stocks data on cnn hot stocks lists
    Column('datetime', DateTime, primary_key=True),
    Column('list', String),  # loser/gainer/active
    Column('ticker', String),
    Column('price', Numeric),
    Column('change', Numeric),
    Column('pctChange', String),
)
My reading of the documentation is that you have to use numeric instead of decimal.
PostgreSQL has no type named decimal (it's an alias for numeric but not a very full-featured one), and SQL Alchemy seems to expect numeric as the type it can use for abstraction purposes.
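As a hedged illustration of that point, this is roughly what an explicit Numeric column declaration looks like in the table definition (the precision/scale values here are only examples, not from the question); SQLAlchemy's Numeric type is what emits PostgreSQL's native numeric column.
from sqlalchemy import Column, DateTime, MetaData, Numeric, String, Table

metadata = MetaData()
cnn_hot_stocks = Table(
    "cnn_hot_stocks", metadata,
    Column("datetime", DateTime, primary_key=True),
    Column("list", String),
    Column("ticker", String),
    Column("price", Numeric(12, 4)),   # NUMERIC(12, 4) in PostgreSQL
    Column("change", Numeric(12, 4)),
    Column("pctChange", String),
)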