SpringBoot MongoDB: how to find the CLOSEST POSITIVE DIFF document? - mongodb

Considering a collection with the following documents:
[{
"_id" : ObjectId("5f984d04093a1a0be1237db2"),
"postalCode" : "",
"latitude" : -33.4939994812012,
"longitude" : 143.210403442383,
"accuracyRadius" : 1000,
"network" : "1.0.0.0/24",
"geoNameID" : 2077456,
"registeredCountryGeoNameID" : 2077456,
"anonymousProxy" : false,
"satelliteProvider" : false,
"_class" : "project.domain.maxmind.block.GeoLite2CityBlockIPv4"
},
{
"_id" : ObjectId("5f984d04093a1a0be1237db3"),
"postalCode" : "",
"latitude" : 34.7724990844727,
"longitude" : 113.726600646973,
"accuracyRadius" : 50,
"network" : "1.0.1.0/24",
"geoNameID" : 1814991,
"registeredCountryGeoNameID" : 1814991,
"anonymousProxy" : false,
"satelliteProvider" : false,
"_class" : "project.domain.maxmind.block.GeoLite2CityBlockIPv4"
},
{
"_id" : ObjectId("5f984d04093a1a0be1237db4"),
"postalCode" : "",
"latitude" : 34.7724990844727,
"longitude" : 113.726600646973,
"accuracyRadius" : 50,
"network" : "1.0.2.0/23",
"geoNameID" : 1814991,
"registeredCountryGeoNameID" : 1814991,
"anonymousProxy" : false,
"satelliteProvider" : false,
"_class" : "project.domain.maxmind.block.GeoLite2CityBlockIPv4"
}]
How can I write an efficient query for the closest network, starting from an IP address (string) as the input?
E.g. considering the network 1.0.1.0/24
and the IPv4 1.0.1.200
I should convert the IPv4 to its corresponding integer value
1.0.1.200 --> 16777672
and do the same for each network
network 1.0.0.0/24 --> 16777216
host min 1.0.0.1 --> 16777217
host max 1.0.0.254 --> 16777470
network 1.0.1.0/24 --> 16777472
host min 1.0.1.1 --> 16777473
host max 1.0.1.254 --> 16777726
network 1.0.2.0/23 --> 16777728
host min 1.0.2.1 --> 16777729
host max 1.0.3.254 --> 16778238
Then I would need to compute the difference between the IP and each network address and pick the smallest positive one:
16777672 - 16777216 = 456
16777672 - 16777472 = 200
16777672 - 16777728 = -56
How could I write such a query, considering that both the IPs and networks are Strings?
I could convert just the IP to its integer value before passing it to the @Repository, but then I would need to convert the network addresses inside the @Query.
@Repository
public interface GeoLite2CityBlockIPv4Repository extends ReactiveMongoRepository<GeoLite2CityBlockIPv4, BigInteger>, ReactiveQuerydslPredicateExecutor<GeoLite2CityBlockIPv4> {
Mono<GeoLite2CityBlockIPv4> findOneByNetwork(String network);
@Query("{?}")
Mono<GeoLite2CityBlockIPv4> findOneByIP(Integer ip);
}
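The arithmetic described in the question can be sketched in plain JavaScript, outside of any query. This is only an illustration of the conversion and of the "closest positive difference" selection; the function names ipToInt and closestNetwork are made up for this sketch, and treating a difference of 0 as a match is an assumption.

```javascript
// Convert a dotted IPv4 string to its unsigned integer value.
// Multiplication is used instead of bit shifts, which are signed 32-bit in JavaScript.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Return the network whose base address is closest to, but not above, the IP.
// `networks` is an array of CIDR strings such as "1.0.1.0/24".
function closestNetwork(ip, networks) {
  const target = ipToInt(ip);
  let best = null;
  for (const network of networks) {
    const base = ipToInt(network.split("/")[0]);
    const diff = target - base;
    // Assumption: a diff of 0 (the IP equals the network address) also counts.
    if (diff >= 0 && (best === null || diff < best.diff)) {
      best = { network, diff };
    }
  }
  return best;
}

const networks = ["1.0.0.0/24", "1.0.1.0/24", "1.0.2.0/23"];
console.log(ipToInt("1.0.1.200"));                  // 16777672
console.log(closestNetwork("1.0.1.200", networks)); // { network: '1.0.1.0/24', diff: 200 }
```

If the integer value of each network address were stored in its own numeric field (a hypothetical addition to the documents shown above), the same selection could probably be expressed as "field less than or equal to the IP, sorted descending, limit 1", which avoids converting strings inside the query.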

Related

Mongo greater than comparison returns 0

I am trying to count all car dealers whose total fleet units exceed 1000. This is the query I wrote, but it returns 0, and I know for a fact that quite a few records in this data set are over 1000.
db.Car_Dealership.find({Totalfleetunits : {$gte: 1000} }).count()
This is a sample of what's in my database; both records have total fleets over 1000. Any ideas why it returns 0?
{
"_id" : ObjectId("5a203ab0b9574375830354d4"),
"2016rank" : 6,
"Dealershipgroupname" : "Hendrick Automotive Group",
"Address" : "6000 Monroe Road",
"City/State/Zip" : "Charlotte, NC 28212",
"Phone" : "(704) 568-5550",
"Companywebsite" : "www.hendrickauto.com",
"Topexecutive" : "Rick Hendrick",
"Topexecutivetitle" : "chairman",
"Totalnewretailunits" : "117,946",
"Totalusedunits" : "88,458",
"Totalfleetunits" : "4,646",
"Totalwholesaleunits" : "56,569",
"Total_units" : "267,619",
"Total_number_of _dealerships" : 103,
"Grouprevenuealldepartments*" : "$8,551,253,132",
"2015rank" : 6
}
{
"_id" : ObjectId("5a203ab0b9574375830354d5"),
"2016rank" : 5,
"Dealershipgroupname" : "Sonic Automotive Inc.?",
"Address" : "4401 Colwick Road",
"City/State/Zip" : "Charlotte, NC 28211",
"Phone" : "(704) 566-2400",
"Companywebsite" : "www.sonicautomotive.com",
"Topexecutive" : "B. Scott Smith",
"Topexecutivetitle" : "CEO",
"Totalnewretailunits" : "134,288",
"Totalusedunits" : "119,174",
"Totalfleetunits" : "1,715",
"Totalwholesaleunits" : "35,098",
"Total_units" : "290,275",
"Total_number_of _dealerships" : 112,
"Grouprevenuealldepartments*" : "$9,731,778,000",
"2015rank" : 4
}
That happens because the value of Totalfleetunits is stored as a string, so a numeric $gte comparison never matches it.
To solve your problem you have two options.
Option 1:
Change your schema so that Totalfleetunits is a Number, and convert the Totalfleetunits value in every document from a string to a number. For example,
"Totalfleetunits" : "4,646" needs to be changed to "Totalfleetunits" : 4646
Option 2:
Use JavaScript in your query to strip every comma from the value first, then check whether Totalfleetunits is greater than or equal to ( >= ) 1000. Only a single line needs to change, as shown below. Note that $where cannot use an index, so it scans every document.
db.Car_Dealership.find({$where: "this.Totalfleetunits.replace(/,/g,'') >= 1000"}).count()
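The comma-stripping step can be checked in isolation with plain JavaScript. This is a minimal sketch: parseUnits is a made-up helper name, and the third record is invented to show a non-matching value.

```javascript
// Parse a formatted count such as "4,646" into a number.
function parseUnits(value) {
  // The /g flag removes every comma, not just the first one.
  return Number(String(value).replace(/,/g, ""));
}

// Sample records reduced to the fields that matter here.
const dealers = [
  { Dealershipgroupname: "Hendrick Automotive Group", Totalfleetunits: "4,646" },
  { Dealershipgroupname: "Sonic Automotive Inc.", Totalfleetunits: "1,715" },
  { Dealershipgroupname: "Small Dealer (made up)", Totalfleetunits: "950" },
];

const bigFleets = dealers.filter((d) => parseUnits(d.Totalfleetunits) >= 1000);
console.log(bigFleets.length); // 2
```

The same helper could be used in a one-off migration script that rewrites each document with a numeric value, which is what option 1 amounts to.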

MongoDB Aggregate count based on multiple collections

How can I get a total per tag (from iptag) based on the hosts in services? For example, if a host matches the tagAAA filter condition, then total += 1. The filter condition is:
hostmin <= ip2int(host) <= hostmax
collection services:
db.getCollection('services').insert({"host" : "172.21.4.205", "port" : 80, "service" : "http"})
db.getCollection('services').insert({"host" : "172.21.4.97", "port" : 445, "service" : "smb"})
collection iptag:
db.getCollection('iptag').insert({
"tag" : [
"tagAAA"
],
"hostmin" : 2886991873.0,
"hostmax" : 2887057406.0,
"cidr" : "172.20.0.0/16"
})
db.getCollection('iptag').insert({
"tag" : [
"tagBBB"
],
"hostmin" : 2887057409.0,
"hostmax" : 2887122942.0,
"cidr" : "172.21.0.0/16"
})
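The counting rule hostmin <= ip2int(host) <= hostmax can be sketched in plain JavaScript over in-memory copies of the two collections. This is an application-side illustration, not an aggregation pipeline; ip2int and countHostsPerTag are names chosen for the sketch.

```javascript
// Convert a dotted IPv4 string to an unsigned integer (no bit shifts, to stay positive).
function ip2int(ip) {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Count, per tag, how many service records have a host inside the tag's [hostmin, hostmax] range.
function countHostsPerTag(services, iptags) {
  const totals = {};
  for (const range of iptags) {
    for (const tag of range.tag) {
      if (!(tag in totals)) totals[tag] = 0;
    }
    for (const service of services) {
      const n = ip2int(service.host);
      if (range.hostmin <= n && n <= range.hostmax) {
        for (const tag of range.tag) totals[tag] += 1;
      }
    }
  }
  return totals;
}

const services = [
  { host: "172.21.4.205", port: 80, service: "http" },
  { host: "172.21.4.97", port: 445, service: "smb" },
];
const iptags = [
  { tag: ["tagAAA"], hostmin: 2886991873, hostmax: 2887057406, cidr: "172.20.0.0/16" },
  { tag: ["tagBBB"], hostmin: 2887057409, hostmax: 2887122942, cidr: "172.21.0.0/16" },
];

console.log(countHostsPerTag(services, iptags)); // { tagAAA: 0, tagBBB: 2 }
```

Each service record is counted once, so a host that appears with two different ports would add 2 to its tag; deduplicating by host first would change that.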

How to find something from an array in mongo

{
"_id" : ObjectId("586aac4c8231ee0b98458045"),
"store_code" : NumberInt(10800),
"counter_name" : "R.N.Electric",
"address" : "314 khatipura road",
"locality" : "Khatipura Road (Jhotwara)",
"pincode" : NumberInt(302012),
"town" : "JAIPUR",
"gtm_city" : "JAIPUR",
"sales_office" : "URAJ",
"owner_name" : "Rajeev",
"owner_mobile" : "9828024073",
"division_mapping" : [//this contains only 1 element in every doc
{
"dvcode" : "cfc",
"dc" : "trade",
"beatcode" : "govindpura",
"fos" : {
"_id" : ObjectId("586ab8318231ee0b98458843"),
"loginid" : "9928483483",
"name" : "Arpit Gupta",
"division" : [
"cfc",
"iron"
],
"sales_office" : "URAJ", //office
"gtm_city" : "JAIPUR" //city
},
"beat" : {
"_id" : ObjectId("586d372b39f64316b9c3cbd7"),
"division" : {
"_id" : ObjectId("5869f8b639f6430fe4edee2a"),
"clientdvcode" : NumberInt(40),
"code" : "cfc",
"name" : "Cooking & Fabric Care",
"project_code" : "usha-fos",
"client_code" : "usha",
"agent_code" : "v5global"
},
"beatcode" : "govindpura",
"sales_office" : "URAJ",
"gtm_city" : "JAIPUR",
"active" : true,
"agency_code" : "v5global",
"client_code" : "USHA_FOS",
"proj_code" : "usha-fos",
"fos" : {
"_id" : ObjectId("586ab8318231ee0b98458843"),
"loginid" : "9928483483",
"name" : "Arpit Gupta",
"division" : [
"cfc",
"iron"
],
"sales_office" : "URAJ",
"gtm_city" : "JAIPUR"
}
}
}
],
"distributor_mail" : "sunil.todi@yahoo.in",
"project_code" : "usha-fos",
"client_code" : "usha",
"agent_code" : "v5global",
"distributor_name" : "Sundeep Electrical"
}
Every document has only 1 element in the division_mapping array, and I want to find the documents whose dc in division_mapping is trade.
I have tried the following:
"division_mapping":{$elemMatch:{$eq:{"dc":"trade"}}}})
I don't know what I am doing wrong.
Maybe I have to unwind the array, but is there any other way?
According to the MongoDB documentation:
The $elemMatch operator matches documents that contain an array
field with at least one element that matches all the specified query
criteria.
In your attempt, $eq compares each whole array element against {"dc":"trade"}, and the elements contain other fields as well, so nothing matches. To retrieve only the documents whose dc in division_mapping is trade, put the field condition directly inside $elemMatch:
db.collection.find({division_mapping:{$elemMatch:{dc:'trade'}}})
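To make the difference between the two queries concrete, here is a plain JavaScript stand-in for what each one checks. It only imitates the matching semantics on in-memory objects; the function names are invented, and the second store is made up for contrast.

```javascript
// Stand-in for {division_mapping: {$elemMatch: {dc: "trade"}}}:
// keep documents where at least one array element has dc === "trade".
function hasTradeMapping(doc) {
  return Array.isArray(doc.division_mapping) &&
    doc.division_mapping.some((element) => element.dc === "trade");
}

// Stand-in for the original attempt: the whole element must equal {dc: "trade"}.
function elementEqualsExactly(doc) {
  return doc.division_mapping.some(
    (element) => Object.keys(element).length === 1 && element.dc === "trade"
  );
}

const stores = [
  { store_code: 10800, division_mapping: [{ dvcode: "cfc", dc: "trade", beatcode: "govindpura" }] },
  { store_code: 10801, division_mapping: [{ dvcode: "iron", dc: "retail", beatcode: "other" }] },
];

console.log(stores.filter(hasTradeMapping).map((d) => d.store_code));      // [ 10800 ]
console.log(stores.filter(elementEqualsExactly).map((d) => d.store_code)); // []
```

Because only one field is tested, dot notation such as {"division_mapping.dc": "trade"} matches the same documents without $elemMatch.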

using 2 different result sets in mongodb

I'm using groovy with mongodb. I have a result set but need a value from a different grouping of documents. How do I pull that value into the result set I need?
MAIN:Network data
"resource_metadata" : {
"name" : "tapd2e75adf-71",
"parameters" : { },
"fref" : null,
"instance_id" : "9f170531-79d0-48ee-b0f7-9bd2788b1cc5"}
I need the display_name for the network data result set which is contained in the compute data.
CPU data
"resource_id" : "9f170531-79d0-48ee-b0f7-9bd2788b1cc5",
"resource_metadata" : {
"ramdisk_id" : "",
"display_name" : "testinstance0001"}
You can see that the resource_id and the instance_id hold the same value. I know there is no relationship I can define between them, but I am reaching out to see if anyone has come across this. I'm using the table model to retrieve data for reporting. A Hashtable has been suggested to me, but I don't see how to make that work. Somehow, inside the hasNext loop, I need to include the display_name value in the networking data, so that the valid name from the compute data is shown instead of only the GUID.
def docs = meter.find(query).sort(sort).limit(50)
while (docs.hasNext()) {
def doc = docs.next()
model.addRow([doc.get("counter_name"), doc.get("counter_volume"), doc.get("timestamp"),
doc.get("resource_metadata").getString("mac"),
doc.get("resource_metadata").getString("instance_id"),
doc.get("counter_unit")] as Object[])
}
Full document:
1st set where I need the network data measure with no name only id {resource_metadata.instance_id}
{
"_id" : ObjectId("528812f8be09a32281e137d0"),
"counter_name" : "network.outgoing.packets",
"user_id" : "4d4e43ec79c5497491b23b13644c2a3b",
"timestamp" : ISODate("2013-11-17T00:51:00Z"),
"resource_metadata" : {
"name" : "tap6baab24e-8f",
"parameters" : { },
"fref" : null,
"instance_id" : "a8727a1d-4661-4565-9c0a-511279024a97",
"instance_type" : "50",
"mac" : "fa:16:3e:a3:bf:fc"
},
"source" : "openstack",
"counter_unit" : "packet",
"counter_volume" : 4611911,
"project_id" : "97dc4ca962b040608e7e707dd03f2574",
"message_id" : "54039238-4f22-11e3-8e68-e4115b99a59d",
"counter_type" : "cumulative"
}
2nd set where I want to grab the name as I get the values {resource_id}:
"_id" : ObjectId("5287bc3ebe09a32281dd2594"),
"counter_name" : "cpu",
"user_id" : "4d4e43ec79c5497491b23b13644c2a3b",
"message_signature" :
"timestamp" : ISODate("2013-11-16T18:40:58Z"),
"resource_id" : "a8727a1d-4661-4565-9c0a-511279024a97",
"resource_metadata" : {
"ramdisk_id" : "",
"display_name" : "vmsapng01",
"name" : "instance-000014d4",
"disk_gb" : "",
"availability_zone" : "",
"kernel_id" : "",
"ephemeral_gb" : "",
"host" : "3746d148a76f4e1a8203d7e2378ef48ccad8a714a47e7481ab37bcb6",
"memory_mb" : "",
"instance_type" : "50",
"vcpus" : "",
"root_gb" : "",
"image_ref" : "869be2c0-9480-4239-97ad-df383c6d09bf",
"architecture" : "",
"os_type" : "",
"reservation_id" : ""
},
"source" : "openstack",
"counter_unit" : "ns",
"counter_volume" : NumberLong("724574640000000"),
"project_id" : "97dc4ca962b040608e7e707dd03f2574",
"message_id" : "a240fa5a-4eee-11e3-8e68-e4115b99a59d",
"counter_type" : "cumulative"
}
This is another collection that contains the same value but just thought it would be easier to grab from same collection:
"_id" : "a8727a1d-4661-4565-9c0a-511279024a97",
"metadata" : {
"ramdisk_id" : "",
"display_name" : "vmsapng01",
"name" : "instance-000014d4",
"disk_gb" : "",
"availability_zone" : "",
"kernel_id" : "",
"ephemeral_gb" : "",
"host" : "3746d148a76f4e1a8203d7e2378ef48ccad8a714a47e7481ab37bcb6",
"memory_mb" : "",
"instance_type" : "50",
"vcpus" : "",
"root_gb" : "",
"image_ref" : "869be2c0-9480-4239-97ad-df383c6d09bf",
"architecture" : "",
"os_type" : "",
"reservation_id" : "",
}
Mike
It looks like these data are in 2 different collections, is this correct?
Would you be able to query the CPU data for each "instance_id" ("resource_id")?
Or, if this would cause too many queries to the database (it looks like you limit to 50...), you could use $in with the list of all "instance_id" values:
http://docs.mongodb.org/manual/reference/operator/query/in/
Either way, you will need to query each collection separately.
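The two-step lookup suggested above can be sketched as follows. This is an in-memory illustration in JavaScript: the arrays stand in for the two result sets, a filter stands in for the $in query, and falling back to the id when no name is found is an assumption.

```javascript
// Network samples (first result set) and CPU samples (second result set),
// trimmed to the relevant fields.
const networkDocs = [
  { counter_name: "network.outgoing.packets", counter_volume: 4611911,
    resource_metadata: { instance_id: "a8727a1d-4661-4565-9c0a-511279024a97", mac: "fa:16:3e:a3:bf:fc" } },
];
const cpuDocs = [
  { counter_name: "cpu", resource_id: "a8727a1d-4661-4565-9c0a-511279024a97",
    resource_metadata: { display_name: "vmsapng01" } },
];

// Step 1: collect the distinct instance ids from the network result set.
const instanceIds = [...new Set(networkDocs.map((d) => d.resource_metadata.instance_id))];

// Step 2: one lookup for all ids. Against MongoDB this would be a query with
// {resource_id: {$in: instanceIds}}; here an in-memory filter stands in for it.
const matches = cpuDocs.filter((d) => instanceIds.includes(d.resource_id));

// Step 3: build an id -> display_name map.
const nameById = new Map(matches.map((d) => [d.resource_id, d.resource_metadata.display_name]));

// Step 4: build the report rows, falling back to the id when no name is known.
const rows = networkDocs.map((d) => {
  const id = d.resource_metadata.instance_id;
  return [d.counter_name, d.counter_volume, d.resource_metadata.mac, nameById.get(id) || id];
});

console.log(rows[0][3]); // vmsapng01
```

The map plays the role of the Hashtable mentioned in the question: it is filled once from the second query and then consulted for every row inside the loop.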

Migrating from MongoDB to HBase

Hi, I am very new to HBase. I downloaded some Twitter data and stored it in MongoDB. Now I need to move that data into HBase to speed up Hadoop processing, but I am not able to create its schema. Here is the Twitter data in JSON format:
{
"_id" : ObjectId("512b71e6e4b02a4322d1c0b0"),
"id" : NumberLong("306044618179506176"),
"source" : "Facebook",
"user" : {
"name" : "Dada Bhagwan",
"location" : "India",
"url" : "http://www.dadabhagwan.org",
"id" : 191724440,
"protected" : false,
"timeZone" : null,
"description" : "Founder of Akram Vignan - Practical Spiritual Science of Self Realization",
"screenName" : "dadabhagwan",
"geoEnabled" : false,
"profileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"biggerProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_bigger.jpg",
"profileImageUrlHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"profileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"biggerProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_bigger.jpg",
"miniProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_mini.jpg",
"originalProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034.jpg",
"followersCount" : 499,
"profileBackgroundColor" : "EEE4C1",
"profileTextColor" : "333333",
"profileLinkColor" : "990000",
"lang" : "en",
"profileSidebarFillColor" : "FCF9EC",
"profileSidebarBorderColor" : "CBC09A",
"profileUseBackgroundImage" : true,
"showAllInlineMedia" : false,
"friendsCount" : 1,
"favouritesCount" : 0,
"profileBackgroundImageUrl" : "http://a0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBackgroundImageURL" : "http://a0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBackgroundImageUrlHttps" : "https://si0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBannerURL" : null,
"profileBannerRetinaURL" : null,
"profileBannerIPadURL" : null,
"profileBannerIPadRetinaURL" : null,
"miniProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_mini.jpg",
"originalProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034.jpg",
"utcOffset" : -1,
"contributorsEnabled" : false,
"status" : null,
"createdAt" : NumberLong("1284700143000"),
"profileBannerMobileURL" : null,
"profileBannerMobileRetinaURL" : null,
"profileBackgroundTiled" : false,
"statusesCount" : 1713,
"verified" : false,
"translator" : false,
"listedCount" : 6,
"followRequestSent" : false,
"descriptionURLEntities" : [ ],
"urlentity" : {
"url" : "http://www.dadabhagwan.org",
"start" : 0,
"end" : 26,
"expandedURL" : "http://www.dadabhagwan.org",
"displayURL" : "http://www.dadabhagwan.org"
},
"rateLimitStatus" : null,
"accessLevel" : 0
},
"contributors" : [ ],
"geoLocation" : null,
"place" : null,
"favorited" : false,
"retweet" : false,
"retweetedStatus" : null,
"retweetCount" : 0,
"userMentionEntities" : [ ],
"retweetedByMe" : false,
"currentUserRetweetId" : -1,
"possiblySensitive" : false,
"urlentities" : [
{
"url" : "http://t.co/gR1GohGjaj",
"start" : 113,
"end" : 135,
"expandedURL" : "http://fb.me/2j2HKHJrM",
"displayURL" : "fb.me/2j2HKHJrM"
}
],
"hashtagEntities" : [ ],
"mediaEntities" : [ ],
"truncated" : false,
"inReplyToStatusId" : -1,
"text" : "Spiritual Quote of the Day :\n\n‘I am Chandubhai’ is an illusion itself and from that are \nkarmas charged. When... http://t.co/gR1GohGjaj",
"inReplyToUserId" : -1,
"inReplyToScreenName" : null,
"createdAt" : NumberLong("1361801697000"),
"rateLimitStatus" : null,
"accessLevel" : 0
}
How should I divide this data into columns and column families? I thought of making one "twitter" column family that contains source, geoLocation, place, retweet, etc., and another "user" column family that contains name, location, etc. (the user's data), i.e. a new column family for each inner sub-document.
Is this approach correct? And how would I differentiate urlentity in the "user" column family from the one in the "twitter" column family?
And how should I handle keys that contain a list of sub-documents (e.g. urlentities)?
There are many ways to model this in HBase, ranging from storing everything in a single column to having a different table for each sub-entity, with several other tables for "indexing".
Generally speaking, you model the data in HBase based on your read and write access patterns. For example, column families are stored in different files on disk, so a reason to divide data into two column families is that there are many cases where you need data from one and not the other.
There's a good presentation about HBase schema design by Ian Varley from HBaseCon 2012; both the slides and the video are available online.
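As one possible illustration of the two-family layout proposed in the question, the sketch below flattens a tweet into "family:qualifier" cell names in plain JavaScript. It does not talk to HBase; the qualifier naming scheme (dotted paths, array indexes as path segments) and the function names are assumptions made for this sketch, not an established convention.

```javascript
// Flatten a nested value into { "path.to.field": value } entries.
function flatten(value, prefix, out) {
  if (value !== null && typeof value === "object") {
    // Arrays and objects both iterate by key; array indexes become path segments.
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Map a tweet to "family:qualifier" cells: the user sub-document goes to the
// "user" family, everything else to the "twitter" family.
function toCells(tweet) {
  const { user, ...rest } = tweet;
  const cells = {};
  for (const [q, v] of Object.entries(flatten(user || {}, "", {}))) cells[`user:${q}`] = v;
  for (const [q, v] of Object.entries(flatten(rest, "", {}))) cells[`twitter:${q}`] = v;
  return cells;
}

// Trimmed copy of the document above; the id is kept as a string to avoid
// losing precision on the 64-bit value.
const tweet = {
  id: "306044618179506176",
  source: "Facebook",
  user: { name: "Dada Bhagwan", urlentity: { url: "http://www.dadabhagwan.org" } },
  urlentities: [{ url: "http://t.co/gR1GohGjaj", displayURL: "fb.me/2j2HKHJrM" }],
};

console.log(toCells(tweet));
```

With this naming, the user's urlentity ends up under user:urlentity.url and the tweet's list under twitter:urlentities.0.url, so the family prefix is what tells them apart. Whether two families are worth it still depends on the access patterns described in the answer.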