DynamoDB Error: "Query key condition not supported" - nosql

I am querying data from a table in AWS DynamoDB and getting an error on the KeyConditionExpression.
I am querying on "dominant_temporality" and "dt". These make up my composite primary key: dominant_temporality is the partition key and dt is the sort key (unique for each row).
The code I'm running:
var params = {
  TableName : "deardiary",
  KeyConditionExpression: "#d = :dominant_temporality and dt between :minDate and :maxDate",
  ExpressionAttributeNames: {
    "#d" : "temporality"
  },
  ExpressionAttributeValues: { // the query values
    ":dominant_temporality": {S: "present"},
    ":minDate": {N: new Date("October 8, 2018").valueOf().toString()},
    ":maxDate": {N: new Date("October 9, 2018").valueOf().toString()}
  }
};

Check whether you are using BETWEEN on the HASH (partition) key, which is not allowed - on the partition key you can only use EQ; BETWEEN and begins_with only work on the RANGE (sort) key.
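If dominant_temporality really is the partition key and dt the sort key (a Number holding a timestamp), a corrected request might look like the sketch below; note that #d must map to the actual partition key name. This is a sketch against those assumptions, not something tested against your table.

var params = {
  TableName: "deardiary",
  // EQ on the partition key; BETWEEN is only valid on the sort key
  KeyConditionExpression: "#d = :dominant_temporality and dt between :minDate and :maxDate",
  ExpressionAttributeNames: {
    "#d": "dominant_temporality"
  },
  ExpressionAttributeValues: {
    ":dominant_temporality": { S: "present" },
    ":minDate": { N: new Date("October 8, 2018").valueOf().toString() },
    ":maxDate": { N: new Date("October 9, 2018").valueOf().toString() }
  }
};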


Avoiding Hasura sequential scan for query on large array relation

How do I get Hasura to generate SQL that uses the index on review.user_id?
Here is the seemingly obvious way of getting a user's reviews. I've included a simplified version of the SQL that Hasura generates, which takes a subquery approach rather than a JOIN of the review table on user:
# SELECT * FROM review
# WHERE user_id = (
#   SELECT id FROM "user" WHERE username = 'admin'
# )
# LIMIT 50
# This uses a sequential scan on `review` because Postgres
# can't know exactly what the subquery returns.
query ReviewsForUserSlow {
  user(where: { username: { _eq: "admin" } }) {
    reviews(limit: 50) {
      text
    }
  }
}
Here is a hack to get Hasura to generate SQL that does indeed use the review.user_id index. However, the caveat is that we need to make a round trip to Hasura to get the user ID in order to build this query:
# Simplification of the SQL Hasura generates:
# SELECT * FROM review
# WHERE user_id = '8f3547e4-c8a9-480f-991f-0798c02f2ba2'
# LIMIT 50
query ReviewsForUserFast {
  review(
    limit: 50,
    where: {
      user_id: {
        _eq: "8f3547e4-c8a9-480f-991f-0798c02f2ba2"
      }
    }
  ) {
    text
  }
}
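A rough sketch of what those two round trips can look like from a JavaScript client is below. The endpoint URL, the fetch-based transport, and the uuid type of user.id are assumptions; adapt them to however you already call Hasura.

// Hedged sketch of the two round trips described above.
const HASURA_URL = "https://example.hasura.app/v1/graphql"; // illustrative endpoint

async function gql(query, variables) {
  const res = await fetch(HASURA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query, variables }),
  });
  const { data, errors } = await res.json();
  if (errors) throw new Error(JSON.stringify(errors));
  return data;
}

async function reviewsForUser(username) {
  // Round trip 1: resolve the username to its id
  const userData = await gql(
    `query UserId($username: String!) {
       user(where: { username: { _eq: $username } }) { id }
     }`,
    { username }
  );
  const userId = userData.user[0].id;

  // Round trip 2: filter review directly on user_id so the index can be used
  const reviewData = await gql(
    `query ReviewsForUserFast($userId: uuid!) {
       review(limit: 50, where: { user_id: { _eq: $userId } }) { text }
     }`,
    { userId }
  );
  return reviewData.review;
}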

How to avoid adding duplicate data in Scrapy using MongoDB?

I want to avoid adding duplicate data and instead either 1) update a single field (the number of views) or 2) update all of the fields that have changed on the website. To do so I'm using an ID (origin_id) that I found on the website I'm scraping.
Pipelines
import pymongo
from scrapy.conf import settings
from scrapy.exceptions import DropItem


class MongoDBPipeline(object):
    def __init__(self):
        connection = pymongo.MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT']
        )
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        valid = True
        for data in item:
            if not data:
                valid = False
                raise DropItem("Missing {0}!".format(data))
        if valid:
            # Update the item if it is already in the database, insert it otherwise.
            self.collection.update({'origin_id': item['origin_id']}, dict(item), upsert=True)
        return item
MongoDB record
{
  "_id" : ObjectId("59725e919a1a6b7f0350027a"),
  "origin_id" : "12256699",
  "views" : "556",
  "url" : "...",
  "title" : "...",
}
Please let me know if you want more details ...
You can increment the views field by 1 when a document with that origin_id already exists in the collection.
Note that you can only $set the other fields, as they hold non-numeric values.
Doing it as a single upsert also skips the extra query that would otherwise be needed to check whether a document with that origin_id already exists.
self.collection.update(
    {'origin_id': item['origin_id']},
    {
        '$set': {'url': item['url'], 'title': item['title']},
        '$inc': {'views': 1}
    },
    upsert=True
)
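To try the update interactively before wiring it into the pipeline, the same upsert can be run in the mongo shell with updateOne; the collection name and values below are illustrative.

db.items.updateOne(
  { origin_id: "12256699" },
  {
    $set: { url: "...", title: "..." },
    $inc: { views: 1 }
  },
  { upsert: true }
);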

Mongo DB - map relational data to document structure

I have a dataset containing 30 million rows in a mongo collection. An example set of records would be:
{"_id" : ObjectId("568bc0f2f7cd2653e163a9e4"),
"EmailAddress" : "1234#ab.com",
"FlightNumber" : 1043,
"FlightTime" : "10:00"},
{"_id" : ObjectId("568bc0f2f7cd2653e163a9e5"),
"EmailAddress" : "1234#ab.com",
"FlightNumber" : 1045,
"FlightTime" : "12:00"},
{"_id" : ObjectId("568bc0f2f7cd2653e163a9e6"),
"EmailAddress" : "5678#ab.com",
"FlightNumber" : 1045,
"FlightTime" : "12:00"},
This has been imported directly from SQL Server, hence the relational-style nature of the data.
How can I best map this data to another collection so that all the data is then grouped by EmailAddress with the FlightNumbers nested? An example of the output would then be:
{"_id" : ObjectId("can be new id"),
"EmailAddress" : "1234#ab.com",
"Flights" : [{"Number":1043, "Time":"10:00"},{"Number":1045, "Time":"12:00"}]},
{"_id" : ObjectId("can be new id"),
"EmailAddress" : "5678#ab.com",
"Flights" : [{"Number":1045, "Time":"12:00"}]},
I've been working on an import routine that iterates through each record in the source collection and then bulk inserts into the second collection. This works fine, but it doesn't allow me to group the data unless I go back over the records afterwards, which adds a huge time overhead to the import.
The code for this would be:
var sourceDb = db.getSiblingDB("collectionSource");
var destinationDb = db.getSiblingDB("collectionDestination");

var externalUsers = sourceDb.CRM.find();
var startDate = new Date();
var index = 0;
var contactArray = new Array();
var identifierArray = new Array();

externalUsers.forEach(function(doc) {
  //library code for NewGuid omitted
  var guid = NewGuid();
  //buildContact and buildIdentifier simply create 2 js objects based on the parameters
  contactArray.push(buildContact(guid, doc.EmailAddress, doc.FlightNumber));
  identifierArray.push(buildIdentifier(guid, doc.EmailAddress));
  index++;

  if (index % 1000 == 0) {
    var now = new Date();
    var dif = now.getTime() - startDate.getTime();
    var Seconds_from_T1_to_T2 = dif / 1000;
    var Seconds_Between_Dates = Math.abs(Seconds_from_T1_to_T2);
    print("Written " + index + " items (" + Seconds_Between_Dates + "s from start)");
  }

  //bulk insert in batches
  if (index % 5000 == 0) {
    destinationDb.Contacts.insert(contactArray);
    destinationDb.Identifiers.insert(identifierArray);
    contactArray = new Array();
    identifierArray = new Array();
  }
});

//flush any remaining documents that didn't fill a full 5000-item batch
if (contactArray.length > 0) {
  destinationDb.Contacts.insert(contactArray);
  destinationDb.Identifiers.insert(identifierArray);
}
Many thanks in advance
Hey there and welcome to MongoDB. In this situation you may want to consider using two different Collections -- one for users and one for flights.
User:
{
  _id:
  email:
}

Flight:
{
  _id:
  userId:
  number: // if number is unique, you can actually specify _id as number
  time:
}
In your forEach loop, you would first check to see if a user document with that specific email address already exists. If it doesn't, create it. Then use the User document's unique identifier to insert a new document into the Flights collection, storing the identifier under the field userId (or maybe passengerId?).
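A minimal shell sketch of that loop follows, assuming the source collection is CRM as in the question and the two target collections are Users and Flights. It uses an upsert (findAndModify with $setOnInsert) instead of a separate exists-then-insert check, which keeps the "create the user if missing" step atomic; collection and field names are illustrative.

var sourceDb = db.getSiblingDB("collectionSource");
var destinationDb = db.getSiblingDB("collectionDestination");

sourceDb.CRM.find().forEach(function (doc) {
  // Reuse the user document if this email has been seen before,
  // otherwise create it on the fly and read back its _id.
  var user = destinationDb.Users.findAndModify({
    query: { email: doc.EmailAddress },
    update: { $setOnInsert: { email: doc.EmailAddress } },
    upsert: true,
    new: true
  });

  // One flight document per source row, linked back via userId
  destinationDb.Flights.insert({
    userId: user._id,
    number: doc.FlightNumber,
    time: doc.FlightTime
  });
});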

Slow session: fastest way to check if a document exists

I am trying to write a simple session store (using the Haskell driver, if that matters) with a MongoDB backend. I may be wrong, but it seems a bit slow compared to when I run the benchmark without the session.
With the session it gives me 25 connections a second - 10596 without.
Once the session is set on the initial load, all it does is compare the SID from the cookie to the SID stored in the session document in MongoDB. So on every request it makes a single trip to the database server: I get the SID from the cookie and check whether a document with that SID exists in MongoDB. That's all. I am learning, so my session logic could be off too.
At the moment I use count to check whether the document exists: I count documents with the relevant SID and test whether the result == 1. Is this a fast enough way to check whether a document exists?
I found in this document ("test if document exists") that testing with find and limit is faster. But it only compares it to findOne - not to count.
So my question is: what is the fastest way to check whether a document exists?
Thanks.
As to your question, have a look at the source code of find/findOne/count
rs0:PRIMARY> db.geo.count
function ( x ){
    return this.find( x ).count();
}
rs0:PRIMARY> db.geo.findOne
function ( query , fields, options ){
    var cursor = this.find(query, fields, -1 /* limit */, 0 /* skip*/,
        0 /* batchSize */, options);
    if ( ! cursor.hasNext() )
        return null;
    var ret = cursor.next();
    if ( cursor.hasNext() ) throw "findOne has more than 1 result!";
    if ( ret.$err )
        throw "error " + tojson( ret );
    return ret;
}
rs0:PRIMARY> db.geo.find
function ( query , fields , limit , skip, batchSize, options ){
    var cursor = new DBQuery( this._mongo , this._db , this , this._fullName ,
        this._massageObject( query ) , fields , limit , skip , batchSize ,
        options || this.getQueryOptions() );

    var connObj = this.getMongo();
    var readPrefMode = connObj.getReadPrefMode();
    if (readPrefMode != null) {
        cursor.readPref(readPrefMode, connObj.getReadPrefTagSet());
    }
    return cursor;
}
The difference is that findOne and count both build on this.find and then actually consume the cursor, while find just constructs a DBQuery cursor and returns it.
So I did a benchmark on the 3 ways:
function benchMark1() {
var date = new Date();
for (var i = 0; i < 100000; i++) {
db.zips.find({
"_id": "35004"
}, {
_id: 1
});
}
print(new Date() - date);
}
function benchMark2() {
var date = new Date();
for (var i = 0; i < 100000; i++) {
db.zips.findOne({
"_id": "35004"
}, {
_id: 1
});
}
print(new Date() - date);
}
function benchMark3() {
var date = new Date();
for (var i = 0; i < 100000; i++) {
db.zips.count({
"_id": "35004"
}, {
_id: 1
});
}
print(new Date() - date);
}
It turns out benchMark1 takes 1046ms, benchMark2 takes 37611ms, and benchMark3 takes 63306ms.
It seems you are using the slowest one.
EDIT: The reason why it's slow is described here: https://dba.stackexchange.com/questions/7573/difference-between-mongodbs-find-and-findone-calls
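For the existence check itself (the find-plus-limit approach the question links to), something along these lines works in the shell. Note that find only hits the server once the cursor is iterated - which is also why benchMark1 above looks so cheap - so it is hasNext() that does the actual work here. The collection name session matches the index example below and is an assumption.

// true if at least one session document has this SID
function sessionExists(sid) {
  return db.session.find({ SID: sid }, { _id: 1 }).limit(1).hasNext();
}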
What's more, make sure you have a unique index on the SID field:
rs0:PRIMARY> db.system.indexes.find()
If no index exists on SID,
rs0:PRIMARY> db.session.ensureIndex({SID: 1}, {unique: true}) // change "session" to your collection name
Note that although _id is usually an ObjectId, it doesn't have to be, so you can use the SID as _id. There's already an index on _id, so you save one index and make insertion faster. To do this, just set the _id field to the SID when you insert a record:
{
  _id: [value of SID]
  ... // rest of record
}
And if this still doesn't meet your requirements, you need to analyse where the bottleneck is. That's another topic we can talk about if necessary.

Auto increment in MongoDB to store sequence of Unique User ID

I am making an analytics system. The API call provides a Unique User ID, but it's not sequential and is too sparse.
I need to give each Unique User ID an auto-increment id to mark an analytics datapoint in a bitarray/bitset. So the first user encountered would correspond to the first bit of the bit array, the second user to the second bit, and so on.
So is there a solid and fast way to generate incremental Unique User IDs in MongoDB?
As the selected answer says, you can use findAndModify to generate sequential IDs.
But I strongly disagree with the opinion that you should not do that. It all depends on your business needs. Having a 12-byte ID may be very resource-consuming and cause significant scalability issues in the future.
I have a detailed answer here.
You can, but you should not
https://web.archive.org/web/20151009224806/http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
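For reference, the pattern that archived tutorial describes is a separate counters collection that is bumped atomically with findAndModify; a minimal shell sketch of it is below (the counter name "userid" and the users collection are illustrative).

// Seed one counter document per sequence you need
db.counters.insert({ _id: "userid", seq: 0 });

// Atomically increment the counter and return the new value
function getNextSequence(name) {
  var ret = db.counters.findAndModify({
    query: { _id: name },
    update: { $inc: { seq: 1 } },
    new: true
  });
  return ret.seq;
}

// Use the sequence value as the _id of the new document
db.users.insert({ _id: getNextSequence("userid"), name: "Sarah C." });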
Each object in MongoDB already has an _id, and _ids are sortable in insertion order. What is wrong with getting the collection of user objects, iterating over it, and using the position in insertion order as the incremented ID? Or go for a map-reduce job entirely.
I know this is an old question, but I shall post my answer for posterity...
It depends on the system that you are building and the particular business rules in place.
I am building a moderate to large scale CRM in MongoDB, C# (backend API), and Angular (frontend web app) and found ObjectId utterly terrible for use in Angular routing for selecting particular entities. Same with API controller routing.
The suggestion above worked perfectly for my project.
db.contacts.insert({
  "id": db.contacts.find().count() + 1,
  "name": "John Doe",
  "emails": [
    "john@doe.com",
    "john.doe@business.com"
  ],
  "phone": "555111322",
  "status": "Active"
});
The reason it is perfect for my case, but not all cases, is that, as the above comment states, if you delete 3 records from the collection, you can get collisions.
My business rules state that, due to our in-house SLAs, we are not allowed to delete correspondence data or client records for longer than the potential lifespan of the application I'm writing, and therefore I simply mark records with an enum "Status" which is either "Active" or "Deleted". You can delete something from the UI, and it will say "Contact has been deleted", but all the application has done is change the status of the contact to "Deleted"; when the app calls the repository for a list of contacts, I filter out deleted records before pushing the data to the client app.
Therefore, db.collection.find().count() + 1 is a perfect solution for me.
It won't work for everyone, but if you will not be deleting data, it works fine.
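In shell terms that soft-delete rule amounts to something like the sketch below; the collection and field names are illustrative.

// "Delete" a contact by flipping its status instead of removing the document
db.contacts.update({ "id": 3 }, { $set: { "status": "Deleted" } });

// The repository only ever hands back active records
db.contacts.find({ "status": "Active" });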
Edit
latest versions of pymongo:
db.contacts.count() + 1
The first record should be added with "_id" = 1 in your db.
$database = "demo";
$collections = "democollaction";
echo getnextid($database, $collections);

function getnextid($database, $collections) {
    $m = new MongoClient();
    $db = $m->selectDB($database);
    $collection = $db->selectCollection($collections); // select the collection from the database
    $cursor = $collection->find()->sort(array("_id" => -1))->limit(1);
    $array = iterator_to_array($cursor);

    foreach ($array as $value) {
        return $value["_id"] + 1;
    }
}
I had a similar issue: I was interested in generating unique numbers which can be used as identifiers, but don't have to be. I came up with the following solution. First, initialize the collection:
fun create(mongo: MongoTemplate) {
    mongo.db.getCollection("sequence")
        .insertOne(Document(mapOf("_id" to "globalCounter", "sequenceValue" to 0L)))
}
And then a service that returns unique (and ascending) numbers:
@Service
class IdCounter(val mongoTemplate: MongoTemplate) {

    companion object {
        const val collection = "sequence"
    }

    private val idField = "_id"
    private val idValue = "globalCounter"
    private val sequence = "sequenceValue"

    fun nextValue(): Long {
        val filter = Document(mapOf(idField to idValue))
        val update = Document("\$inc", Document(mapOf(sequence to 1)))
        val updated: Document = mongoTemplate.db.getCollection(collection).findOneAndUpdate(filter, update)!!
        return updated[sequence] as Long
    }
}
I believe this approach doesn't have the weaknesses related to concurrent environments that some of the other solutions may suffer from.
// await collection.insertOne({ autoIncrementId: 1 });
const { value: { autoIncrementId } } = await collection.findOneAndUpdate(
  { autoIncrementId: { $exists: true } },
  {
    $inc: { autoIncrementId: 1 },
  },
);

return collection.insertOne({ id: autoIncrementId, ...data });
I used something like nested queries in MySQL to simulate auto-increment, and it worked for me. To get the latest id and add one to it you can use:
lastContact = db.contacts.find().sort({ $natural: -1 }).limit(1)[0];
db.contacts.insert({
  "id": lastContact ? lastContact["id"] + 1 : 1,
  "name": "John Doe",
  "emails": ["john@doe.com", "john.doe@business.com"],
  "phone": "555111322",
  "status": "Active"
})
It solves the removal issue in Alex's answer, so no duplicate id will appear if any record is removed.
More explanation: I just get the id of the latest inserted document, add one to it, and then set it as the id of the new record. The ternary is for the cases where we don't have any records yet or all of the records have been removed.
This could be another approach:
const mongoose = require("mongoose");

const contractSchema = mongoose.Schema(
  {
    account: {
      type: mongoose.Schema.Types.ObjectId,
      required: true,
    },
    idContract: {
      type: Number,
      default: 0,
    },
  },
  { timestamps: true }
);

contractSchema.pre("save", function (next) {
  var docs = this;
  mongoose
    .model("contract", contractSchema)
    .countDocuments({ account: docs.account }, function (error, counter) {
      if (error) return next(error);
      docs.idContract = counter + 1;
      next();
    });
});

module.exports = mongoose.model("contract", contractSchema);
// First check the table length
const data = await table.find()
if (data.length === 0) {
  const id = 1
  // then post your query along with your id
} else {
  // find the last item and then its id
  const length = data.length
  const lastItem = data[length - 1]
  const lastItemId = lastItem.id // or { id } = lastItem
  const id = lastItemId + 1
  // now apply the new id to your new item
  // even if you delete an item from the middle, this still works
}