DynamoDB / Scanamo : The provided key element does not match the schema - scala

I've been trying to use DynamoDB through the Scanamo library. My scala code looks like this:
package my.package
import com.amazonaws.ClientConfiguration
import com.amazonaws.regions.{Region, Regions}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient
import com.gu.scanamo._
import com.gu.scanamo.syntax._
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.services.dynamodbv2.datamodeling._
object MusicService {
  def main(args: Array[String]): Unit = {
    val musicService = new MusicService
    musicService.getAlbums()
  }
}
class MusicService {
  def getAlbums() {
    val awsCreds = new BasicAWSCredentials("my", "creds")
    val client = AmazonDynamoDBClient
      .builder()
      .withRegion(Regions.EU_WEST_2)
      .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
      .build();

    case class Music(@DynamoDBIndexRangeKey(attributeName = "Artist") artist: String,
                     @DynamoDBIndexHashKey(attributeName = "SongTitle") songTitle: String)

    val table = Table[Music]("Music")
    val putOp = table.putAll(Set(
      Music("The Killers", "Sam's Town"),
      Music("The Killers", "Spaceman")
    ))
    Scanamo.exec(client)(putOp)
  }
}
I am getting this error when executing the putOp:
Exception in thread "main" com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: The provided key element does not match the schema (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: ValidationException; Request ID: 0KAFH90JO39COO143LC5H6RPPNVV4KQNSO5AEMVJF66Q9ASUAAJG)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1638)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1303)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2186)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2162)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.executeBatchWriteItem(AmazonDynamoDBClient.java:575)
at com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient.batchWriteItem(AmazonDynamoDBClient.java:551)
at com.gu.scanamo.ops.ScanamoInterpreters$$anon$1.apply(ScanamoInterpreters.scala:51)
at com.gu.scanamo.ops.ScanamoInterpreters$$anon$1.apply(ScanamoInterpreters.scala:30)
at cats.free.Free.$anonfun$foldMap$1(Free.scala:126)
at cats.package$$anon$1.tailRecM(package.scala:41)
at cats.free.Free.foldMap(Free.scala:124)
at cats.free.Free.$anonfun$foldMap$1(Free.scala:127)
at cats.package$$anon$1.tailRecM(package.scala:41)
at cats.free.Free.foldMap(Free.scala:124)
at com.gu.scanamo.Scanamo$.exec(Scanamo.scala:17)
at my.package.MusicService.getAlbums(MusicService.scala:39)
at my.package.MusicService$.main(MusicService.scala:14)
at my.package.MusicService.main(MusicService.scala)
My table structure on DynamoDB is incredibly simple and looks like this:
Table name: Music
Partition key: Artist
Sort key: SongTitle
That's all there is.
Can you please give me some guidance on why this is failing and what I can do to fix it?

First of all, you need to swap @DynamoDBIndexHashKey and @DynamoDBIndexRangeKey, since @DynamoDBIndexHashKey should be on the hash key (artist) and @DynamoDBIndexRangeKey on the sort key (songTitle).
Also, you mentioned that Artist is a partition key and SongTitle is a sort key, so why use @DynamoDBIndexHashKey and @DynamoDBIndexRangeKey at all? Those annotations are for secondary indexes; you most likely need @DynamoDBHashKey and @DynamoDBRangeKey instead (assuming artist and songTitle are not part of an index).
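For what it's worth, Scanamo does not read the DynamoDBMapper annotations at all; it derives the DynamoDB attribute names from the case-class field names. A minimal sketch under that assumption, with the field names matching the table's Artist/SongTitle keys exactly:

```scala
// Scanamo serializes each case-class field under its own name, so the
// field names must match the table's attribute names ("Artist", "SongTitle").
case class Music(Artist: String, SongTitle: String)

val table = Table[Music]("Music")
val putOp = table.putAll(Set(
  Music("The Killers", "Sam's Town"),
  Music("The Killers", "Spaceman")
))
Scanamo.exec(client)(putOp) // client built as in the question
```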

Related

Writing to multiple tables using sqlalchemy, fastapi, pydantic postgres

This is the first API I've built, so bear with me. I currently have a FastAPI app that is supposed to save a record of an event and when it happened, as well as a list of the people who assisted with each event. Currently, my crud.py "post" function only posts to 'test', but I also need it to post the names of those who helped to 'whohelped'. I've tried to make 'whohelped.event_token' the foreign key of 'Save_Info.token'. A check on whether my models and schema are correctly made would be greatly appreciated. The main issue is that I'm totally lost on how to make "post" write to both tables at once.
models.py
from .database import Base
from sqlalchemy import Column, String, Integer, Date, ForeignKey
from sqlalchemy.orm import relationship

class Save_Info(Base):
    __tablename__ = 'test'
    token = Column(Integer, primary_key=True, autoincrement=True)
    how = Column(String)
    date = Column(Date)
    children = relationship("Who_Helped", back_populates="test")

class Who_Helped(Base):
    __tablename__ = 'whohelped'
    id = Column(Integer, primary_key=True, autoincrement=True)
    event_token = Column(Integer, ForeignKey('test.token'))
    who_helped = Column(String)
schema.py
from pydantic import BaseModel
from typing import Optional, List
from sqlalchemy.orm import relationship
from sqlalchemy import DateTime

class Who_Helped(BaseModel):
    id: int
    event_token: int
    who_helped: Optional[str]

class Save_Info(BaseModel):
    token: int
    how: str
    date: str

    class Config:
        orm_mode = True
crud.py
from sqlalchemy.orm import Session
from . import schema, models

def post_info(db: Session, info: schema.Save_Info):
    device_info_model = models.Save_Info(**info.dict())
    db.add(device_info_model)
    db.commit()
    db.refresh(device_info_model)
    return device_info_model

def get_info(db: Session, token: int = None):
    if token is None:
        return db.query(models.Save_Info).all()
    else:
        return db.query(models.Save_Info).filter(models.Save_Info.token == token).first()

def error_message(message):
    return {
        'error': message
    }
main.py
from fastapi import FastAPI, Depends, HTTPException
from .database import SessionLocal, engine
from sqlalchemy.orm import Session
from .schema import Save_Info
from . import crud, models

models.Base.metadata.create_all(bind=engine)
app = FastAPI()

def db():
    try:
        db = SessionLocal()
        yield db
    finally:
        db.close()

@app.post('/device/info')
def post_info(info: Save_Info, db=Depends(db)):
    object_in_db = crud.get_info(db, info.token)
    if object_in_db:
        raise HTTPException(400, detail=crud.error_message('This account of saving the world already exists'))
    return crud.post_info(db, info)

@app.get('/device/info/{token}')
def get_info(token: int, db=Depends(db)):
    info = crud.get_info(db, token)
    if info:
        return info
    else:
        raise HTTPException(404, crud.error_message('No info found for this account of saving the world {}'.format(token)))

@app.get('/device/info')
def get_all_info(db=Depends(db)):
    return crud.get_info(db)
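As for writing to both tables in one request: the usual pattern is to insert the parent row, obtain its generated key, then insert the child rows, all inside a single transaction. A minimal sketch of that pattern, using the stdlib sqlite3 with a hypothetical schema mirroring test/whohelped rather than SQLAlchemy, purely to illustrate the idea:

```python
import sqlite3

# Hypothetical in-memory schema mirroring the 'test' and 'whohelped' tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE test (token INTEGER PRIMARY KEY AUTOINCREMENT,"
             " how TEXT, date TEXT)")
conn.execute("CREATE TABLE whohelped (id INTEGER PRIMARY KEY AUTOINCREMENT,"
             " event_token INTEGER REFERENCES test(token), who_helped TEXT)")

with conn:  # one transaction: both inserts commit (or roll back) together
    cur = conn.execute("INSERT INTO test (how, date) VALUES (?, ?)",
                       ("planted trees", "2023-01-01"))
    token = cur.lastrowid  # the parent's generated key feeds the child rows
    conn.executemany("INSERT INTO whohelped (event_token, who_helped) VALUES (?, ?)",
                     [(token, "Alice"), (token, "Bob")])

print(conn.execute("SELECT who_helped FROM whohelped WHERE event_token = ?",
                   (token,)).fetchall())  # -> [('Alice',), ('Bob',)]
```

With SQLAlchemy the shape is the same: db.add(parent), db.flush() to populate parent.token, add the Who_Helped rows with that token, then a single db.commit().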

Using class as a phonebook dictionary

class PhoneBook:
    def __init__(self):
        self.contacts = {}

    def __str__(self):
        return str(self.contacts)

    def add(self, name, mobile=None, office=None, email=None):
        self.contacts["Name"] = name
        self.contacts["Mobile"] = mobile
        self.contacts["Office"] = office
        self.contacts["Email"] = email

obj = PhoneBook()
obj.add("Kim", office="1234567", email="kim@company.com")
obj.add("Park", office="2345678", email="park@company.com")
print(obj)
I tried to make the PhoneBook class build up the dictionary as I call the .add method, but each call overwrites the previous entry, so only the last data remains in the dictionary (I suppose :S).
Is there any way to solve this problem? Thank you.
The issue is that you're using the same dictionary keys ("Name", "Mobile", ...) for every contact, so each add overwrites the previous one. Instead, use the real name as the dictionary key, and have that key hold another dictionary. For example:
import pprint

class PhoneBook:
    def __init__(self):
        self.contacts = {}

    def __str__(self):
        return pprint.pformat(self.contacts, width=30)

    def add(self, name, mobile=None, office=None, email=None):
        self.contacts[name] = {
            "Mobile": mobile,
            "Office": office,
            "Email": email,
        }

obj = PhoneBook()
obj.add("Kim", office="1234567", email="kim@company.com")
obj.add("Park", office="2345678", email="park@company.com")
print(obj)
Prints:
{'Kim': {'Email': 'kim@company.com',
         'Mobile': None,
         'Office': '1234567'},
 'Park': {'Email': 'park@company.com',
          'Mobile': None,
          'Office': '2345678'}}
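With the names as keys, lookups and in-place updates fall out naturally. The lookup and update helpers below are hypothetical additions, not part of the class above:

```python
class PhoneBook:
    def __init__(self):
        self.contacts = {}

    def add(self, name, mobile=None, office=None, email=None):
        self.contacts[name] = {"Mobile": mobile, "Office": office, "Email": email}

    # hypothetical helper: fetch one contact's details by name
    def lookup(self, name):
        return self.contacts.get(name)

    # hypothetical helper: change only the fields passed in
    def update(self, name, **fields):
        self.contacts[name].update({k.capitalize(): v for k, v in fields.items()})

book = PhoneBook()
book.add("Kim", office="1234567", email="kim@company.com")
book.update("Kim", mobile="010-555-0123")
print(book.lookup("Kim")["Mobile"])  # -> 010-555-0123
```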

Only update values if they are not null in Kotlin

So I'm trying to only update values if they are not null in the response body of my request. This is how it looks at the moment, and if I don't send all the values with it, they just get nulled in the database. I'm using Kotlin with JpaRepositories.
@PutMapping(value = ["/patient"], produces = ["application/json"])
fun updateClient(@RequestBody client: Client): ResponseEntity<Client> {
    val old = repository.findById(client.id).orElseThrow { NotFoundException("no patient found for id " + client.id) }
    val new = old.copy(lastName = client.lastName, firstName = client.firstName,
        birthDate = client.birthDate, insuranceNumber = client.insuranceNumber)
    return ResponseEntity(repository.save(new), HttpStatus.OK)
}
This is what the model class looks like:
@Entity
data class Client(
    @Id
    val id: String,
    val lastName: String?,
    val firstName: String?,
    val birthDate: Date?,
    val insuranceNumber: Int?
)
Is there an easier way than writing copy once for every value and checking that it's not null first?
The only thing that comes to mind that might make the process easier, without modifying the current model or creating helper models/functions, would be to use the elvis operator.
val new = old.copy(
    lastName = client.lastName ?: old.lastName,
    firstName = client.firstName ?: old.firstName,
    birthDate = client.birthDate ?: old.birthDate,
    insuranceNumber = client.insuranceNumber ?: old.insuranceNumber
)
Other ways of doing this would be to create our own copy function that ignores input nulls, or a custom constructor that does the same. But that would require more work, and whether it makes sense depends on the model; for the example model it would be overkill, in my opinion.
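For completeness, the null-ignoring copy mentioned above could be sketched with kotlin-reflect; patchWith is a hypothetical helper, untested against a real JPA entity:

```kotlin
import kotlin.reflect.full.memberProperties
import kotlin.reflect.full.primaryConstructor

// Builds a copy of the receiver, taking each value from `patch` unless
// that value is null. Requires the kotlin-reflect dependency.
inline fun <reified T : Any> T.patchWith(patch: T): T {
    val ctor = T::class.primaryConstructor
        ?: error("patchWith requires a primary constructor")
    val props = T::class.memberProperties.associateBy { it.name }
    val args = ctor.parameters.associateWith { param ->
        val prop = props.getValue(param.name!!)
        prop.get(patch) ?: prop.get(this)
    }
    return ctor.callBy(args)
}

// usage: val new = old.patchWith(client)
```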

Beam sql udf to split one column into multiple columns

How can I implement a Beam SQL UDF to split one column into multiple columns?
I have already implemented this as a BigQuery UDF:
CREATE TEMP FUNCTION parseDescription(description STRING)
RETURNS STRUCT<msg STRING, ip STRING, source_region STRING, user_name STRING>
LANGUAGE js AS """
var arr = description.substring(0, description.length - 1).split(",");
var firstIndex = arr[0].indexOf(".");
this.msg = arr[0].substring(0, firstIndex);
this.ip = arr[0].substring(firstIndex + 2).split(": ")[1];
this.source_region = arr[1].split(": ")[1];
this.user_name = arr[2].split(": ")[1];
return this;
""";
INSERT INTO `table1` SELECT parseDescription(event_description).* FROM `table2`;
Does the Beam SQL UDF support this kind of operation?
I tried to return an object from a Beam UDF, but it seems that Beam SQL doesn't support the object.* syntax. I also tried returning a map or an array, but still got an error.
Is there any way to implement the same UDF in Beam?
I tried to use the MapElements method but got an error; it seems the output row is expected to have the same schema as the input row. Example:
import org.apache.beam.runners.direct.DirectOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.extensions.sql.SqlTransform;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.schemas.Schema;
import org.apache.beam.sdk.transforms.*;
import org.apache.beam.sdk.values.PBegin;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.Row;
public class BeamMain2 {
    public static void main(String[] args) {
        DirectOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
            .as(DirectOptions.class);
        Pipeline p = Pipeline.create(options);

        // Define the schema for the records.
        Schema appSchema = Schema.builder().addStringField("string1").addInt32Field("int1").build();
        Row row1 = Row.withSchema(appSchema).addValues("aaa,bbb", 1).build();
        Row row2 = Row.withSchema(appSchema).addValues("ccc,ddd", 2).build();
        Row row3 = Row.withSchema(appSchema).addValues("ddd,eee", 3).build();

        PCollection<Row> inputTable =
            PBegin.in(p).apply(Create.of(row1, row2, row3).withRowSchema(appSchema));

        Schema newSchema =
            Schema.builder()
                .addNullableField("string2", Schema.FieldType.STRING)
                .addInt32Field("int1")
                .addNullableField("string3", Schema.FieldType.STRING)
                .build();

        PCollection<Row> outputStream = inputTable.apply(
            SqlTransform.query(
                "SELECT * "
                    + "FROM PCOLLECTION where int1 > 1"))
            .apply(MapElements.via(
                new SimpleFunction<Row, Row>() {
                    @Override
                    public Row apply(Row line) {
                        return Row.withSchema(newSchema).addValues("a", 1, "b").build();
                    }
                }));

        p.run().waitUntilFinish();
    }
}
Reference: https://beam.apache.org/documentation/dsls/sql/overview/
You can emit 'Row' elements from a transform, which can later be used as a table.
The pipeline would look something like
Schema
Schema schema =
    Schema.of(Schema.Field.of("f0", FieldType.INT64), Schema.Field.of("f1", FieldType.INT64));
Transform
private static MapElements<Row, Row> rowsToStrings() {
    return MapElements.into(TypeDescriptor.of(Row.class))
        .via(row -> Row.withSchema(schema).addValue(1L).addValue(2L).build());
}
Pipeline:
pipeline
    .apply(
        "SQL Query 1",
        SqlTransform.query(<Query string 1>))
    .apply("Transform column", rowsToStrings())
    .apply(
        "SQL Query 2",
        SqlTransform.query(<Query string 2>))
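Inside the SimpleFunction, the actual splitting can be ordinary Java. Ported from the BigQuery JS UDF above (the input string used in main is an assumed example of the format):

```java
public class DescriptionParser {
    // Returns {msg, ip, source_region, user_name}, mirroring the JS UDF.
    public static String[] parse(String description) {
        // Drop the trailing "." and split the comma-separated segments.
        String[] arr = description.substring(0, description.length() - 1).split(",");
        int firstIndex = arr[0].indexOf(".");
        String msg = arr[0].substring(0, firstIndex);
        String ip = arr[0].substring(firstIndex + 2).split(": ")[1];
        String sourceRegion = arr[1].split(": ")[1];
        String userName = arr[2].split(": ")[1];
        return new String[] {msg, ip, sourceRegion, userName};
    }

    public static void main(String[] args) {
        String[] fields = parse("Login failed. ip: 1.2.3.4,region: us-east-1,user: bob.");
        System.out.println(String.join("|", fields));
    }
}
```

Each element of the returned array would then populate one field of the output Row built against a four-field output schema (hypothetical wiring).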

mongoengine connection and multiple databases

I have two databases I want to query from, but I only get results from one. I'm using mongoengine with Python and graphene (it's my first time). I've exhausted my search, and I don't understand how I can resolve this issue. Here is my code:
import graphene
from mongoengine import Document, connect
from mongoengine.context_managers import switch_collection
from mongoengine.fields import (
    StringField,
    UUIDField,
    IntField,
    FloatField,
    BooleanField,
)
from graphene_mongo import MongoengineObjectType
from mongoengine.connection import disconnect

class UserModel(Document):
    meta = {"collection": "users"}
    userID = UUIDField()
    first_name = StringField()
    last_name = StringField()

class Users(MongoengineObjectType):
    class Meta:
        model = UserModel

class UsersQuery(graphene.ObjectType):
    users = graphene.List(Users)
    user = graphene.Field(Users, userID=graphene.UUID())

    def resolve_users(self, info):
        db = connect("users")
        users = list(UserModel.objects.all())
        db.close()
        return users

    def resolve_user(self, info, userID):
        return UserModel.objects(userID=userID).first()

users_schema = graphene.Schema(query=UsersQuery)
import graphene
from mongoengine import Document, connect
from mongoengine.fields import StringField, UUIDField
from graphene_mongo import MongoengineObjectType
from mongoengine.connection import disconnect

class Workout(Document):
    meta = {"collection": "workouts"}
    workoutID = UUIDField()
    workout_label = StringField()

class Workouts(MongoengineObjectType):
    class Meta:
        model = Workout

class Query(graphene.ObjectType):
    workouts = graphene.List(Workouts)
    workout = graphene.Field(Workouts, workoutID=graphene.UUID())

    def resolve_workouts(self, info):
        db = connect("workouts")
        wks = list(Workout.objects.all())
        db.close()
        return wks

    def resolve_workout(self, info, workoutID):
        return Workout.objects(workoutID=workoutID).first()

workouts_schema = graphene.Schema(query=Query)
Now, when I have my Python server up and mongod running, I can hit /workouts and it will return the array I need, but /users will not return the results.
I get no errors, and nothing is wrong with my graphene query.
I can only get one of the queries to work at a time.
I have tried using an alias, not closing the connections, and declaring the connect at the top level, even before class UserModel or Workout.
If each of your models is bound to a different database, you should use something like this (cf. the docs):
connect('workouts', alias='dbworkouts')  # init a connection to the database named "workouts" and register it under the alias "dbworkouts"
connect('users', alias='dbusers')

class Workout(Document):
    meta = {"db_alias": "dbworkouts"}
    workoutID = UUIDField()
    ...

class UserModel(Document):
    meta = {"db_alias": "dbusers"}
    userID = UUIDField()
    ...