I am looking for a way to update a date field in a database from client input.
Client app --> Flask --> Database
I am unsure about the best way to do the processing that needs to happen within Flask.
Data from the client app comes in JSON and may look like this:
data = [{"id": 1, "date": "2022-01-01"}, {"id": 2, "date": null}]
Note that date can also be null.
Currently I am processing the date input in the following way. Is there a better way to do this instead of manually checking if the provided data is empty?
import datetime

from flask import request

@app.route('/update_date', methods=['POST'])
def update_date():
    data = request.json['data']
    for item in data:
        # query the db object by id
        example = db.session.query(Example).filter_by(id=item['id']).first()
        # checking if input exists --> looking for a better way to do this
        if item['date'] == '' or item['date'] is None:
            example.date = None
        else:
            # convert the string date to an actual date object
            example.date = datetime.datetime.strptime(item['date'], '%Y-%m-%d')
    db.session.commit()
One drawback of this method is that the date input needs to be in the exact format given to strptime; if the data comes in a different format it will raise an error. But there has to be a better way than just adding another elif statement, right?
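If you want to accept more than one incoming format without stacking elif branches, one common option (an assumption on my part, not something from the question) is python-dateutil, whose parser guesses the format:
# Sketch only: assumes python-dateutil is installed (pip install python-dateutil)
from dateutil import parser

def parse_date(value):
    # empty string or None -> None, otherwise a datetime.date
    if not value:
        return None
    return parser.parse(value).date()

parse_date("2022-01-01")   # datetime.date(2022, 1, 1)
parse_date("01 Feb 2022")  # datetime.date(2022, 2, 1)
parse_date(None)           # None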
database model:
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import Date

db = SQLAlchemy()

class Example(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    date = db.Column(db.Date)
As mentioned in my previous comment, you can validate and process the input coming from the front end.
An example using voluptuous and defining a custom validator could look like:
from datetime import datetime

from voluptuous import Coerce, Maybe, Schema

def Date(fmt):
    return lambda v: datetime.strptime(v, fmt).date()

# Maybe: validate that the value matches the given validator or is None.
validate = Schema([{"id": int, "date": Maybe(Coerce(Date("%Y-%m-%d")))}])
with_date = [{"id": 1, "date": "2022-08-17"}]
without_date = [{"id": 2, "date": None}]
with_bad_date = [{"id": 2, "date": "2022-99-99"}]
validate(with_date) # [{'id': 1, 'date': datetime.date(2022, 8, 17)}]
validate(without_date) # [{'id': 2, 'date': None}]
validate(with_bad_date) # MultipleInvalid: not a valid value for dictionary value # data[0]['date']
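To tie it back to the route, a minimal sketch (Flask 1.1+ is assumed so a dict return is serialized to JSON; the other names come from the question above):
from voluptuous import MultipleInvalid

@app.route('/update_date', methods=['POST'])
def update_date():
    try:
        data = validate(request.json['data'])
    except MultipleInvalid as exc:
        # reject malformed ids/dates before touching the database
        return {"error": str(exc)}, 400
    for item in data:
        example = db.session.query(Example).filter_by(id=item['id']).first()
        example.date = item['date']  # already a date object or None after validation
    db.session.commit()
    return {"status": "ok"}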
I am trying to write a JSON file using Spark. Some keys have null values; these show up just fine in the Dataset, but when I write the file the keys get dropped. How do I ensure they are retained?
code to write the file:
ddp.coalesce(20).write().mode("overwrite").json("hdfs://localhost:9000/user/dedupe_employee");
part of JSON data from source:
"event_header": {
"accept_language": null,
"app_id": "App_ID",
"app_name": null,
"client_ip_address": "IP",
"event_id": "ID",
"event_timestamp": null,
"offering_id": "Offering",
"server_ip_address": "IP",
"server_timestamp": 1492565987565,
"topic_name": "Topic",
"version": "1.0"
}
Output:
"event_header": {
"app_id": "App_ID",
"client_ip_address": "IP",
"event_id": "ID",
"offering_id": "Offering",
"server_ip_address": "IP",
"server_timestamp": 1492565987565,
"topic_name": "Topic",
"version": "1.0"
}
In the above example the keys accept_language, app_name and event_timestamp have been dropped.
Apparently, Spark does not provide an option to keep the nulls, so the following custom solution should work.
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import com.fasterxml.jackson.module.scala.experimental.ScalaObjectMapper

case class EventHeader(accept_language: String, app_id: String, app_name: String, client_ip_address: String, event_id: String, event_timestamp: String, offering_id: String, server_ip_address: String, server_timestamp: Long, topic_name: String, version: String)

val ds = Seq(EventHeader(null, "App_ID", null, "IP", "ID", null, "Offering", "IP", 1492565987565L, "Topic", "1.0")).toDS()

val ds1 = ds.mapPartitions(records => {
  // one Jackson mapper per partition; writeValueAsString keeps null fields
  val mapper = new ObjectMapper with ScalaObjectMapper
  mapper.registerModule(DefaultScalaModule)
  records.map(mapper.writeValueAsString(_))
})

ds1.coalesce(1).write.text("hdfs://localhost:9000/user/dedupe_employee")
This will produce output like:
{"accept_language":null,"app_id":"App_ID","app_name":null,"client_ip_address":"IP","event_id":"ID","event_timestamp":null,"offering_id":"Offering","server_ip_address":"IP","server_timestamp":1492565987565,"topic_name":"Topic","version":"1.0"}
If you are on Spark 3, you can set
spark.sql.jsonGenerator.ignoreNullFields false
ignoreNullFields is an option introduced in Spark 3 that controls whether null fields are dropped when a DataFrame is written out as JSON.
If you are on Spark 2 (specifically PySpark 2.4.6), you can try converting the DataFrame to an RDD of Python dicts and then calling saveAsTextFile to write the JSON file to HDFS. The following example may help.
cols = ddp.columns
ddp_ = ddp.rdd
ddp_ = ddp_.map(lambda row: dict([(c, row[c]) for c in cols]))
ddp_.repartition(1).saveAsTextFile(your_hdfs_file_path)
This should produce an output file like:
{"accept_language": None, "app_id":"123", ...}
{"accept_language": None, "app_id":"456", ...}
Also, if you want Python's None to come out as JSON null, you will need to dump every dict to JSON before saving:
import json

ddp_ = ddp_.map(lambda row: json.dumps(row, ensure_ascii=False))
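Put together, a minimal end-to-end sketch of this Spark 2 approach (ddp and the HDFS path are taken from the question; everything else is illustrative):
# Sketch only: assumes ddp is an existing DataFrame on PySpark 2.4.x
import json

cols = ddp.columns
(ddp.rdd
    .map(lambda row: {c: row[c] for c in cols})         # Row -> plain dict, nulls kept as None
    .map(lambda d: json.dumps(d, ensure_ascii=False))   # None -> null in the JSON text
    .repartition(1)
    .saveAsTextFile("hdfs://localhost:9000/user/dedupe_employee"))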
Since Spark 3, if you are using the DataFrameWriter class
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/DataFrameWriter.html#json-java.lang.String-
(the same applies to PySpark)
https://spark.apache.org/docs/3.0.0-preview/api/python/_modules/pyspark/sql/readwriter.html
its json method has an ignoreNullFields option that defaults to None, which effectively means true.
So just set this option to false:
ddp.coalesce(20).write().mode("overwrite").option("ignoreNullFields", "false").json("hdfs://localhost:9000/user/dedupe_employee")
To retain null values when converting to JSON, set this config option:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master("local[1]")
    .config("spark.sql.jsonGenerator.ignoreNullFields", "false")
    .getOrCreate()
)
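For completeness, a PySpark equivalent of the writer-level option above (a sketch; ddp and the path come from the question, and Spark 3+ is assumed):
# Sketch only: DataFrameWriter.json accepts ignoreNullFields since Spark 3.0
ddp.coalesce(20).write.mode("overwrite").json(
    "hdfs://localhost:9000/user/dedupe_employee",
    ignoreNullFields=False,
)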
I am trying to update an existing elasticsearch data pipeline and would like to use elasticsearch-dsl more fully. In the current process we create a document as a json object and then use requests to PUT the object to the relevant elasticsearch index.
I would now like to use the elasticsearch-dsl save method but am left struggling to understand how I might do that when my object or document is constructed as json.
Current Process:
# import_script.py
index = 'objects'
doc = {"title": "A title", "Description": "Description", "uniqueID": "1234"}
doc_id = doc["uniqueID"]
elastic_url = 'http://elastic:changeme@localhost:9200/' + index + '/_doc/' + doc_id
api = ObjectsHandler()
api.put(elastic_url, doc)
# objects_handler.py
import requests

class ObjectsHandler():
    def put(self, url, object):
        result = requests.put(url, json=object)
        if result.status_code != requests.codes.ok:
            print(result.text)
            result.raise_for_status()
Rather than using this PUT method, I would like to tap into the Document.save functionality available in the DSL but I can't translate the examples in the api documentation for my use case.
I have amended my ObjectsHandler so that it can create the objects index:
# objects_handler.py
from elasticsearch import Elasticsearch
from elasticsearch_dsl import Document, Text, connections

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   http_auth='elastic:changeme')
connections.create_connection(es)

class Object(Document):
    physicalDescription = Text()
    title = Text()
    uniqueID = Text()

    class Index:
        name = 'objects'
        using = es

class ObjectsHandler():
    def init_mapping(self, index):
        Object.init(using=es, index=index)
This successfully creates an index when I call api.init_mapping(index) from the importer script.
The documentation has this as an example for persisting the individual documents, where Article is the equivalent to my Object class:
# create and save an article
article = Article(meta={'id': 42}, title='Hello world!', tags=['test'])
article.body = ''' looong text '''
article.published_from = datetime.now()
article.save()
Is it possible for me to use this methodology but to persist my pre-constructed json object doc, rather than specifying individual attributes? I also need to be able to specify that the document id is the doc uniqueID.
I've extended my ObjectsHandler to include a save_doc method:
def save_doc(self, document, doc_id, index):
    new_obj = Object(meta={'id': doc_id},
                     title="hello", uniqueID=doc_id,
                     physicalDescription="blah")
    new_obj.save()
which does successfully save the object with uniqueID as the id, but I am unable to make use of the json object that is passed into the method as document.
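For what it's worth, one way this could work (a sketch, not something from the original code; it assumes the keys in document are acceptable field names for Object) is to unpack the dict straight into the constructor:
# Sketch only: assumes document's keys match fields Object will accept
def save_doc(self, document, doc_id, index):
    new_obj = Object(meta={'id': doc_id}, **document)
    new_obj.save(using=es, index=index)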
I've had some success at this by using elasticsearch.py bulk helpers rather than elasticsearch-dsl.
The following resources were super helpful:
Blog - Bulk insert from json objects
SO Answer, showing different ways to add keywords in a bulk action
Elastic documentation on bulk imports
In my question I was referring to a:
doc = {"title": "A title", "Description": "Description", "uniqueID": "1234"}
I actually have an array (a list) of one or more docs, e.g.:
documents = [{"title": "A title", "Description": "Description", "uniqueID": "1234"}, {"title": "Another title", "Description": "Another description", "uniqueID": "1235"}]
I build up a body for the bulk import and append the id:
bulk_body = []
for document in documents:
    bulk_body.append({'index': {'_id': document["uniqueID"]}})
    bulk_body.append(document)
then call my new save_docs method:
api_handler.save_docs(bulk_body, 'objects')
with my objects_handler.py file looking like:
# objects_handler.py
import json

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from elasticsearch_dsl import Document, Text, connections

es = Elasticsearch([{'host': 'localhost', 'port': 9200}],
                   http_auth='elastic:changeme')
connections.create_connection(es)

class Object(Document):
    physicalDescription = Text()
    title = Text()
    uniqueID = Text()

    class Index:
        name = 'objects'
        using = es

class ObjectsHandler():
    def init_mapping(self, index):
        Object.init(using=es, index=index)

    def save_docs(self, docs, index):
        print("Attempting to index the list of docs")
        resp = es.bulk(index='objects', body=docs)
        print("bulk RESPONSE:", resp)
        print("bulk RESPONSE:", json.dumps(resp, indent=4))
This works for a single doc in json format as well as for multiple docs.
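Since elasticsearch.helpers is imported anyway, here is a variant sketch that goes through helpers.bulk with one action dict per raw document (the action shape is my assumption, built from the uniqueID field used in the question):
# Sketch only: docs is the plain list of documents, not the interleaved bulk_body
def save_docs_with_helpers(docs, index='objects'):
    actions = [
        {"_index": index, "_id": d["uniqueID"], "_source": d}
        for d in docs
    ]
    success, errors = bulk(es, actions)
    print("indexed:", success, "errors:", errors)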
In Angular 2 projects (version 4.x), I make heavy use of Reactive Forms, both to create data and edit it.
When fetching an object from the database, I get all of its keys (via UnderscoreJS), then iterate over the object for each key and use patchValue to load the data into the form.
I'm using both Underscore.js and Moment.js, so they're imported into the component:
import * as _ from 'underscore';
import * as moment from 'moment';
The iterator and patch functions are as follows:
In the iterator, I'm using a regular expression to find values that match the date format coming from the database. These come as strings, not dates, and always in the format 'yyyy-mm-dd'. In the component, I'm using ng-bootstrap calendar pickers, so I need to catch the dates and convert them into the NgbDateStruct object format before patching them into the form.
updatedFormWithJSONData(inputObject: any) {
  var _keys = _.keys(inputObject);
  var _re = /\d{4}[-]\d{1,2}[-]\d{1,2}/g; // RegEx to match 2017-09-05 (or 2017-9-5) - yyyy-mm-dd format
  for (let key of _keys) {
    console.log("item.key", key);
    console.log("item.value", inputObject[key]);
    if (_re.test(inputObject[key])) {
      console.log("Is Date", inputObject[key]);
      var dateValue = moment(inputObject[key]);
      var NgbDateStruct = { day: dateValue.date(), month: dateValue.month() + 1, year: dateValue.year() };
      console.log("Date Value", NgbDateStruct);
      this.updateSelected(key, NgbDateStruct);
    } else {
      this.updateSelected(key, inputObject[key]);
    }
  }
}
The update function tests to make sure the key and value both have values, and that the form contains the key, then patches the form.
updateSelected(key, value) {
  if (key && value && this.form.contains(key)) {
    this.form.controls[key].patchValue(value);
  }
}
This all works, so I can test this by assigning a JSON block to a variable, and then calling the function, passing in the variable and the form loads.
var fetchedData = [{
  "purchased_date": "2017-09-16",
  "purchase_description": "Some description",
  "purchase_reason": "Some Reason",
  "pickup_date": "2017-09-14"
}]
And I can call it like this:
this.updatedFormWithJSONData(fetchedData[0]);
Note: The objects from the database always come in an array, which is why I'm plucking the first object off the array.
Now - the problem is, as long as there is a non-date key/value pair in the data between two date key/value pairs, everything works fine, but if there are two date key/value pairs beside each other, the second one doesn't match the regex, and doesn't get patched into the form.
So the above works fine, but the following skips the pickup_date:
var fetchedData = [{
  "purchased_date": "2017-09-16",
  "pickup_date": "2017-09-14",
  "purchase_description": "Some description",
  "purchase_reason": "Some Reason"
}]
I have no idea why the second date when adjacent in the data fails, but I would like to be sure that regardless of the ordering of the fetched data, all the dates will be handled properly.
Update: Here are some snippets from the console.log:
The first, working example logs out like this:
item.key purchased_date
item.value 2017-09-16
Is Date 2017-09-16
Date Value {day: 16, month: 9, year: 2017}
item.key purchase_description
item.value Some description
item.key purchase_reason
item.value Some Reason
item.key pickup_date
item.value 2017-09-14
Is Date 2017-09-14
Date Value {day: 14, month: 9, year: 2017}
The second, non-working example logs out like this:
item.key purchased_date
item.value 2017-09-16
Is Date 2017-09-16
Date Value {day: 16, month: 9, year: 2017}
item.key pickup_date
item.value 2017-09-14
item.key purchase_description
item.value Some description
item.key purchase_reason
item.value Some Reason
Followup: while ultimately the problem was in the flags of the regular expression, the problem the question aimed to solve was iterating over JSON data in order to load the values into a reactive form in an Angular 2 application, where the code provided produced incorrect output in one very specific case. It wasn't clear that the regex was the problem, so solutions along the lines of the actual regex fix weren't even on my radar.
Check out the answer to a similar question at https://stackoverflow.com/a/2851339/235648
Basically, a regular expression created with the /g flag saves its state between calls: it remembers the index into the string at which it should try to find the next match. You can either remove the flag or recreate the regular expression before each use (removing /g seems the most straightforward solution).
I'm trying to construct middleware code in D that reads an array of struct DATA from a connected POSIX named pipe, identifies changes by comparing against existing data, and, if something changed, inserts the changed struct data into the db.
The struct is defined like this, and all fields are likely to change:
struct DATA
{
    char[20] name;
    short sect;   // 16-bit integer
    short ptyp;   // 16-bit integer
    ulong mode;
    ulong cpmap;
}

DATA[NO_OF_STRUCTS] flowDATA;
Input from the pipe works, but when it comes to fetching the latest version of each entry from the database I'm outside my knowledge.
I.e. my problem is how to construct the query (in code, not for the terminal) that retrieves the latest inserted data for each array index position from the db, so I can compare it, write/insert into the db if it changed, and also report what changed back via the pipe.
It feels like I need to add extra fields, such as an index_no, to filter on.
The query would be something like the following pseudo code (I can't get this right in my D code):
db.lab.aggregate([{$match {dateadded{$in:["dateadded"]}}.{ $group:{_id:"$name"}}}]);
Select DATA from lab Where LATEST(dateAdded) and index_no == arrayIndex
Any ideas?
My test code initialises the db like this:
db.lab.insert([
{"name": "News 1",
"sektion":2,
"ptyp":1,
"mode":1024,
"cpmap":886,
dateAdded: new Date()
},{
"name": "Base 2",
"sektion":2,
"ptyp":1,
"mode":1024,
"cpmap":886,
dateAdded: new Date()
},{
"name": "Name 3",
"sektion":1,
"ptyp":3,
"mode":24,
"cpmap":886,
dateAdded: new Date()
},{
"name": "Name 4",
"sektion":1,
"ptyp":1,
"mode":0,
"cpmap":1024,
dateAdded: new Date()
}]);
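The question is about a D driver, but the aggregation pipeline itself is driver-agnostic. A sketch of a "latest document per index_no" query in Python/pymongo terms (index_no is the extra field suggested above and is not yet in the test data, so this is an assumption):
# Sketch only: assumes each inserted document also carries an index_no field
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["test"]

latest_per_index = db.lab.aggregate([
    {"$sort": {"dateAdded": -1}},             # newest first
    {"$group": {
        "_id": "$index_no",                   # one bucket per array position
        "doc": {"$first": "$$ROOT"},          # keep only the most recent document
    }},
])

for entry in latest_per_index:
    print(entry["_id"], entry["doc"]["name"], entry["doc"]["dateAdded"])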
Looking at the example code
var accounts = new Backbone.Collection;
accounts.url = '/accounts';
accounts.fetch();
this works if the route returns an array
[{id:1, name:'bob'}, {id:2, name:'joe'}]
but the REST service I'm using returns an object like this
{
items: [{id:1, name:'bob'}, {id:2, name:'joe'}],
page: 1,
href: '/acounts'
}
How do I go about telling Backbone.Collection that the collection is in items?
The parse function seems appropriate; in your case it would return response.items.
From the documentation:
http://backbonejs.org/
"When fetching raw JSON data from an API, a Collection will automatically populate itself with data formatted as an array, while a Model will automatically populate itself with data formatted as an object:
[{"id": 1}] ..... populates a Collection with one model.
{"id": 1} ....... populates a Model with one attribute.
However, it's fairly common to encounter APIs that return data in a different format than what Backbone expects. For example, consider fetching a Collection from an API that returns the real data array wrapped in metadata:
{
"page": 1,
"limit": 10,
"total": 2,
"books": [
{"id": 1, "title": "Pride and Prejudice"},
{"id": 4, "title": "The Great Gatsby"}
]
}
In the above example data, a Collection should populate using the "books" array rather than the root object structure. This difference is easily reconciled using a parse method that returns (or transforms) the desired portion of API data:
var Books = Backbone.Collection.extend({
  url: '/books',
  parse: function(data) {
    return data.books;
  }
});
"
Hope it helps.