Best way to visualize clusters

Best way to visualize clusters - visualization

Not sure if this is the right exchange to post this question but please feel free to redirect me.
I have a bunch of clusters each with a title and a bunch of items associated with it. What is the best/easiet way to visualize it?
[{
title: "Title1",
items: [ "a", "b", "c"]
},
{
title: "Title2",
items: [ "c", "d", "e"]
},
...]

My go-to way to visualize data is GraphViz. It's pretty simple to learn, and can generate some nice graphs. They aren't pretty, but they are very functional.
For example, see here:
http://www.ibm.com/developerworks/library/l-graphvis/

Related

How to filter documents by having a list of words which should not be inside of them in MongoDB Atlas Search?

For example I have the array ["java", "perl", "scrum"] and the following documents:
{
id: 1,
title: "Java Software Developer"
},
{
id: 2,
title: "Senior Software Engineer"
},
{
id: 3,
title: "Perl developer familiar with SCRUM methodology"
}
What I'd like to do in an atlas search aggregation pipeline is that to filter out the documents in which the title contains one of the words in the array. I've tried negating a $phrase, and also using Regex, but neither worked. Is there an elegant way to handle this situation, and if yes what would it be?
EDIT: After the aggregation only document with id 2 would be returned.

You could use the $nin (not in) operator with some regex inside. However, this type of query will not be very efficient.
Example
db.collection.find({
title: { "$nin": [/java/i, /perl/i, /scrum/i]}
})

Adding refs to another mongodb collection by matching multiple criteria

I'm totally new to MongoDB and can't sort this out from the docs. I have two collections, TREES and PLOTS. Every tree is in a plot; every plot has, say, 4-150 trees. So far the collections look something like this:
// TREES
[{"_id": objId(), "Tree": "1", "Project": "Alpha", "Plot": "A", "Year": 1979, "Size": 20},
{"_id": objId(), "Tree": "1", "Project": "Alpha", "Plot": "A", "Year": 1986, "Size": 21},
...
{"_id": objId(), "Tree": "54", "Project": "Omega", "Plot": "Z", "Year": 2016, "Size": 17}]
// PLOTS
[{"_id": objId(), "Plot": "A", "Project": "Alpha", "Year": 1979},
{"_id": objId(), "Plot": "A", "Project": "Alpha", "Year": 1986},
...
{"_id": objId(), "Plot": "Z", "Project": "Omega", "Year": 2016}]
I want to add a reference field to all the Trees with the objId of the appropriate Plot document, matching on Project, Plot and Year. I'd also like to add a refs array to all the Plots to contain the objIds with all of each one's Trees [Edit: Although maybe that's really not necessary?]. The real schemas both have 30-40 fields so embedding would be mad. The application development will most likely be done in pymongo, if there's anything relevant there.
To Clarify:
My problem is in matching trees to plots on the three criteria -- it seems like $lookup is no use here and I've tried $unionWith but can't figure it out. The docs and tutorials are full of toy problems where you add inter-collection references by matching on one field... and I can't figure out how to generalize that. Best result has been from doing
db.TREES.aggregate([
{
'$lookup': {
'from': 'PLOTS',
'localField': 'Plot',
'foreignField': 'Plot',
'as': 'TooManyPlots'
}
}
])
That gives me an array of all Plots with the right name -- but from all Years and all Projects, and I can't figure out how to weed that out and arrive at an updated Tree document with just the one correct Plot.
I haven't yet developed the Mongo-vision to see the proper flow for this.
Could be I'm also having some XY trouble -- plus it could be that MongoDB isn't the best fit for our project anyway. It seems worth a try though.

Okay -- no great mystery to it except for understanding how to understand it. In hopes of giving someone else a hand here's how I got it to work, closely matching the doc reference #D.SM noted in the comments, and my thanks to them.
db.TREES.aggregate(
[{$lookup: {
from: 'Plots',
let: { tPlot: '$Plot', tProj: '$Project', tYear: '$Year'},
pipeline: [
{ $match:
{ $expr:
{ $and:
[{$eq: ['$Plot', '$$tPlot']},
{$eq: ['$Project', '$$tProj']},
{$eq: ['$Year', '$$tYear']}]
}
}
},
{ $project: { _id:1} }
],
as: 'Plot_id_A'
} // That does what I wanted; from here it's just tidying things up
}, {$unwind: {
path: '$Plot_id_A',
}}, {$set: {
Plot_id: '$Plot_id_A._id'
}}, {$project: {
Plot_id_A : 0
}}]
Note that in the let: line the right-hand names are all coming from the TREES collection we're working in, while in the $and: the same-named left-hand names are all coming from the PLOTS collection we're joining, and so for the loss of clarity there I acknowlege D.SM's beef with my example field names. Anyway it works.
If anyone has an improvement to suggest though... lemme know!

Flatten array of array of dictionaries, taking only values

I have this array structure:
[
[
keyA: value1,
keyB: value2
],
[
keyA: value3,
keyB: value4
],
[
keyA: value5,
keyB: value6
]
]
And what I want to achieve is flattening the array into a single dictionary like:
[value1: value2, value3: value4]
Probably to achieve this is using merge twice?
I have tried with:
arrayToFlatten.reduce([:]) { $0.merging($1) { (current, _) in current } }
but I did not get the expected result:
[
[keyA: value1],
[keyB: value2]
]
This structure is sourced from a plist file:

If I understand this question correctly, I think this is what you're looking for
let source = [
[
"code": "DZ",
"name": "ALGERIA",
],
[
"code": "AS",
"name": "AMERICAN SAMOA",
],
[
"code": "AO",
"name": "ANGOLA",
],
]
let result = Dictionary(uniqueKeysWithValues: source.lazy.map { dict in
return (key: dict["code"]!, value: dict["name"]!)
})
print(result) // => ["AO": "ANGOLA", "AS": "AMERICAN SAMOA", "DZ": "ALGERIA"]
Dictionary merging wasn't the right tool for the job. It would take the dict, merge it with the second, which has the same keys. According to the closure you gave it, when two keys clash, it should take the old value and not the one that's attempting to be merged in. So merging the second dict had no effect. Similarly, the third didn't have any effect, either. You should read the documentation.
I hope today you've learned the importance of a minimal, reproducible example and a clear question. If you had just said "here's my source data, here's the expected output," then a question like this could have been answered in seconds.

Multi-language attributes in MongoDB

I'm trying to design a schema paradigm in MongoDB which would support multilingual values for variable attributes in documents.
For example, I would have a product catalog where each product may require storing its name, title or any other attribute in various languages.
This same paradigm should probably hold for other locale-specific properties, such as price/currency variations
I've been considering a key-value approach where key is the language code and value is the corresponding value:
{
sku: "1011",
name: { "en": "cheese", "de": "Käse", "es": "queso", etc... },
price: { "usd": 30.95, "eur": 20, "aud": 40, etc... }
}
The problem is I believe this would deny me of using indices on multilingual fields.
Eventually, I'd like a generic, yet intuitive, index-able design.
Any suggestion would be appreciated, thanks.

Wholesale recommendations over your schema design may be a bit broad a topic for discussion here. I can however suggest that you consider putting the elements you are showing into an Array of sub-documents, rather than the singular sub-document with fields for each item.
{
sku: "1011",
name: [{ "en": "cheese" }, {"de": "Käse"}, {"es": "queso"}, etc... ],
price: [{ "usd": 30.95 }, { "eur": 20 }, { "aud": 40 }, etc... ]
}
The main reason for this is consideration for access paths to your elements which should make things easier to query. This I went through in some detail here which may be worth your reading.
It could also be a possibility to expand on this for something like your name field:
name: [
{ "lang": "en", "value": "cheese" },
{ "lang": "de", "value: "Käse" },
{ "lang": "es", "value": "queso" },
etc...
]
All would depend on your indexing and access requirements. It all really depends on what exactly your application needs, and the beauty of MongoDB is that it allows you to structure your documents to your needs.
P.S As to anything where you are storing Money values, I suggest you do some reading and start maybe with this post here:
MongoDB - What about Decimal type of value?

MongoDB/NoSQL: Keeping Document Change History

A fairly common requirement in database applications is to track changes to one or more specific entities in a database. I've heard this called row versioning, a log table or a history table (I'm sure there are other names for it). There are a number of ways to approach it in an RDBMS--you can write all changes from all source tables to a single table (more of a log) or have a separate history table for each source table. You also have the option to either manage the logging in application code or via database triggers.
I'm trying to think through what a solution to the same problem would look like in a NoSQL/document database (specifically MongoDB), and how it would be solved in a uniform way. Would it be as simple as creating version numbers for documents, and never overwriting them? Creating separate collections for "real" vs. "logged" documents? How would this affect querying and performance?
Anyway, is this a common scenario with NoSQL databases, and if so, is there a common solution?

Good question, I was looking into this myself as well.
Create a new version on each change
I came across the Versioning module of the Mongoid driver for Ruby. I haven't used it myself, but from what I could find, it adds a version number to each document. Older versions are embedded in the document itself. The major drawback is that the entire document is duplicated on each change, which will result in a lot of duplicate content being stored when you're dealing with large documents. This approach is fine though when you're dealing with small-sized documents and/or don't update documents very often.
Only store changes in a new version
Another approach would be to store only the changed fields in a new version. Then you can 'flatten' your history to reconstruct any version of the document. This is rather complex though, as you need to track changes in your model and store updates and deletes in a way that your application can reconstruct the up-to-date document. This might be tricky, as you're dealing with structured documents rather than flat SQL tables.
Store changes within the document
Each field can also have an individual history. Reconstructing documents to a given version is much easier this way. In your application you don't have to explicitly track changes, but just create a new version of the property when you change its value. A document could look something like this:
{
_id: "4c6b9456f61f000000007ba6"
title: [
{ version: 1, value: "Hello world" },
{ version: 6, value: "Foo" }
],
body: [
{ version: 1, value: "Is this thing on?" },
{ version: 2, value: "What should I write?" },
{ version: 6, value: "This is the new body" }
],
tags: [
{ version: 1, value: [ "test", "trivial" ] },
{ version: 6, value: [ "foo", "test" ] }
],
comments: [
{
author: "joe", // Unversioned field
body: [
{ version: 3, value: "Something cool" }
]
},
{
author: "xxx",
body: [
{ version: 4, value: "Spam" },
{ version: 5, deleted: true }
]
},
{
author: "jim",
body: [
{ version: 7, value: "Not bad" },
{ version: 8, value: "Not bad at all" }
]
}
]
}
Marking part of the document as deleted in a version is still somewhat awkward though. You could introduce a state field for parts that can be deleted/restored from your application:
{
author: "xxx",
body: [
{ version: 4, value: "Spam" }
],
state: [
{ version: 4, deleted: false },
{ version: 5, deleted: true }
]
}
With each of these approaches you can store an up-to-date and flattened version in one collection and the history data in a separate collection. This should improve query times if you're only interested in the latest version of a document. But when you need both the latest version and historical data, you'll need to perform two queries, rather than one. So the choice of using a single collection vs. two separate collections should depend on how often your application needs the historical versions.
Most of this answer is just a brain dump of my thoughts, I haven't actually tried any of this yet. Looking back on it, the first option is probably the easiest and best solution, unless the overhead of duplicate data is very significant for your application. The second option is quite complex and probably isn't worth the effort. The third option is basically an optimization of option two and should be easier to implement, but probably isn't worth the implementation effort unless you really can't go with option one.
Looking forward to feedback on this, and other people's solutions to the problem :)

Why not a variation on Store changes within the document ?
Instead of storing versions against each key pair, the current key pairs in the document always represents the most recent state and a 'log' of changes is stored within a history array. Only those keys which have changed since creation will have an entry in the log.
{
_id: "4c6b9456f61f000000007ba6"
title: "Bar",
body: "Is this thing on?",
tags: [ "test", "trivial" ],
comments: [
{ key: 1, author: "joe", body: "Something cool" },
{ key: 2, author: "xxx", body: "Spam", deleted: true },
{ key: 3, author: "jim", body: "Not bad at all" }
],
history: [
{
who: "joe",
when: 20160101,
what: { title: "Foo", body: "What should I write?" }
},
{
who: "jim",
when: 20160105,
what: { tags: ["test", "test2"], comments: { key: 3, body: "Not baaad at all" }
}
]
}

We have partially implemented this on our site and we use the 'Store Revisions in a separate document" (and separate database). We wrote a custom function to return the diffs and we store that. Not so hard and can allow for automated recovery.

One can have a current NoSQL database and a historical NoSQL database. There will be a an nightly ETL ran everyday. This ETL will record every value with a timestamp, so instead of values it will always be tuples (versioned fields). It will only record a new value if there was a change made on the current value, saving space in the process. For example, this historical NoSQL database json file can look like so:
{
_id: "4c6b9456f61f000000007ba6"
title: [
{ date: 20160101, value: "Hello world" },
{ date: 20160202, value: "Foo" }
],
body: [
{ date: 20160101, value: "Is this thing on?" },
{ date: 20160102, value: "What should I write?" },
{ date: 20160202, value: "This is the new body" }
],
tags: [
{ date: 20160101, value: [ "test", "trivial" ] },
{ date: 20160102, value: [ "foo", "test" ] }
],
comments: [
{
author: "joe", // Unversioned field
body: [
{ date: 20160301, value: "Something cool" }
]
},
{
author: "xxx",
body: [
{ date: 20160101, value: "Spam" },
{ date: 20160102, deleted: true }
]
},
{
author: "jim",
body: [
{ date: 20160101, value: "Not bad" },
{ date: 20160102, value: "Not bad at all" }
]
}
]
}

For users of Python (python 3+, and up of course) , there's HistoricalCollection that's an extension of pymongo's Collection object.
Example from the docs:
from historical_collection.historical import HistoricalCollection
from pymongo import MongoClient
class Users(HistoricalCollection):
PK_FIELDS = ['username', ] # <<= This is the only requirement
# ...
users = Users(database=db)
users.patch_one({"username": "darth_later", "email": "darthlater#example.com"})
users.patch_one({"username": "darth_later", "email": "darthlater#example.com", "laser_sword_color": "red"})
list(users.revisions({"username": "darth_later"}))
# [{'_id': ObjectId('5d98c3385d8edadaf0bb845b'),
# 'username': 'darth_later',
# 'email': 'darthlater#example.com',
# '_revision_metadata': None},
# {'_id': ObjectId('5d98c3385d8edadaf0bb845b'),
# 'username': 'darth_later',
# 'email': 'darthlater#example.com',
# '_revision_metadata': None,
# 'laser_sword_color': 'red'}]
Full disclosure, I am the package author. :)

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

Best way to visualize clusters - visualization

My go-to way to visualize data is GraphViz. It's pretty simple to learn, and can generate some nice graphs. They aren't pretty, but they are very functional. For example, see here: http://www.ibm.com/developerworks/library/l-graphvis/

Related

How to filter documents by having a list of words which should not be inside of them in MongoDB Atlas Search?

Adding refs to another mongodb collection by matching multiple criteria

Flatten array of array of dictionaries, taking only values

Multi-language attributes in MongoDB

MongoDB/NoSQL: Keeping Document Change History

Categories

Resources