How to write a script to read data from a `MongoDB` database and produce a `JSON` file in a given format - mongodb

I have four collections.
CollegeLocation -- Json format
{
"MDTC": {
"collegeName": "College of Medicine - Tucson",
"1002": 10,
"1001": 234
},
"SCNC": {
"collegeName": "College of Science",
"1002": 24,
"1003": 21,
"1001": 16
},
"LAWC": {
"collegeName": "James E Rogers College of Law",
"1002": 234,
"1003": 213
}
}
LocationEmployeeDetails -- Json format
{
"1002": {
"loc_id": "1002",
"kmapids": [
"palmerjo",
"cwesterl",
"lumbee"
]
},
"1001": {
"loc_id": "1001",
"kmapids": [
"skaib",
"dcorso",
"ghuck",
"macmccallum"
]
},
"1003": {
"loc_id": "1003",
"kmapids": [
"ghuck",
"cwesterl",
"dcorso",
"witte",
"lumbee"
]
}
}
LocationDetails - json
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
32.0,
49.0
]
},
"properties": {
"country": "Ukraine",
"city": "",
"name": "Ukraine",
"state": "",
"loc_id": "1001"
}
},
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-4.0,
40.0
]
},
"properties": {
"country": "Spain",
"city": "",
"name": "Spain",
"state": "",
"loc_id": "1002"
}
},
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
-73.087749,
41.6032207
]
},
"properties": {
"country": "United States",
"city": "",
"name": "Connecticut",
"state": "Connecticut",
"loc_id": "1003"
}
}
]
}
EmployeeDetails - csv
name,kmapId,college,fullname,collegeid
J Palmer,palmerjo,College of Medicine - Tucson,John Palmer,MDTC
C Westerland,cwesterl,College of Social and Behav Sci,Chad Westerland,SBSC
R Williams,lumbee,James E Rogers College of Law,Robert Williams,LAWC
S Kaib,skaib,College of Medicine - Phoenix,Susan Kaib,MDPX
M Witte,witte,College of Medicine - Tucson,Marlys Witte,MDTC
S Dougherty,shonad,College of Medicine - Tucson,Shona Dougherty,MDTC
D Corso,dcorso,College of Fine Arts,Dawn Corso,FNRT
G Huckleberry,ghuck,College of Science,Gary Huckleberry,SCNC
D Mccallum,macmccallum,James E Rogers College of Law,David Mccallum,LAWC
location_details.json: contains locations details with location id loc_id
location_employee_details.json: contains a list of the employees (kmapids) associated with each location.
employee_details.csv: contains employees kmapid, name, and college
colleges_location_values.json: contains college values for each location.
sample_output.json: example of output format
Question 1 - How do I write a script to import location_details.json, location_employee_details.json, employee_details.csv, and colleges_location_values.json into a MongoDB database?
Question 2 - How do I write a script to read the data from the MongoDB database and produce a JSON file in the format given in sample_output.json? The code should work for data of variable length.
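A minimal Python sketch of both steps, under loud assumptions: the collection names, the `collegeid` field added to each college document, and above all the output shape of `build_report` are hypothetical, because sample_output.json is not shown in the question. The helpers work on plain lists of dicts, so the same code can be fed documents read back from MongoDB with `find()`.

```python
import csv
import io


def keyed_object_to_docs(obj, key_field):
    """Turn {"MDTC": {...}, ...} into [{key_field: "MDTC", ...}, ...]."""
    return [{key_field: key, **value} for key, value in obj.items()]


def csv_text_to_docs(text, delimiter=","):
    """Parse CSV text with a header row into one dict per employee."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def insert_docs(db, name, docs):
    """Insert docs into db[name]; db is expected to be a pymongo Database."""
    if docs:  # pymongo's insert_many rejects an empty list
        db[name].insert_many(docs)


def build_report(colleges, location_employees, employees):
    """Join the documents in memory, whatever their number.

    ASSUMED output shape (sample_output.json is not available):
    {collegeid: {"collegeName": ..., "locations": {loc_id: {"value", "kmapids"}}}}
    """
    college_of = {e["kmapId"]: e["collegeid"] for e in employees}
    staff_at = {d["loc_id"]: d["kmapids"] for d in location_employees}
    report = {}
    for college in colleges:
        cid = college["collegeid"]
        locations = {}
        for key, value in college.items():
            if key in ("_id", "collegeid", "collegeName"):
                continue  # every remaining key is a location id
            members = [k for k in staff_at.get(key, []) if college_of.get(k) == cid]
            locations[key] = {"value": value, "kmapids": members}
        report[cid] = {"collegeName": college["collegeName"], "locations": locations}
    return report
```

With pymongo installed, the documents would be loaded with `json.load` / `csv_text_to_docs`, inserted with `insert_docs(MongoClient()["mydb"], ...)` (database name hypothetical), read back with `list(db[name].find())`, and the result of `build_report` written with `json.dump`.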

Related

MongoDB - MongoImport of JSON (jsonl) - Rename, change types and add fields

I'm new to MongoDB and have four different problems importing a big (16 GB) JSONL file into my MongoDB deployment (a simple PSA cluster).
Below you will find a sample entry from the mentioned JSON dump.
With this file, which I get from an external provider, I have four problems:
"hotel_id" is the key and should be renamed to "_id".
"hotel_id" should be treated as a Number rather than as a string.
"location" is not properly formatted as GeoJSON (if I understood the MongoDB manual correctly); it should look like
"location": {
"type": "Point",
"coordinates": [-93.26838,37.15845]
}
instead of
"location": {
"coordinates": {
"latitude": 37.15845,
"longitude": -93.26838
}
}
"dates": can this be used to efficiently update only the records that need to be updated?
So my challenge is to transform the data according to my needs before the import or at import time, in both cases as quickly as possible.
I searched a lot for hints and best practices but have not been able to find a solution yet, maybe because I'm a beginner with MongoDB.
I played around with "jq" to adjust the data, for example to add the type that seems to be necessary for the location (point 3), but wasn't really successful.
cat dump.jsonl | ./bin/jq --arg typeOfField Point '.location + {type: $typeOfField}'
Besides that, I imported a sample dump of roughly 500 MB, which took 1.5 minutes the first time (empty database). If I run it in "upsert" mode it takes roughly 12 hours. So I was also wondering what the best practice is for importing such a big JSON dump.
Any help is appreciated!! :-)
Kind regards,
Lumpy
{
"hotel_id": "12345",
"name": "Test Hotel",
"address": {
"line_1": "123 Test St",
"line_2": "Apt A",
"city": "Test City"
},
"ratings": {
"property": {
"rating": "3.5",
"type": "Star"
},
"guest": {
"count": 48382,
"average": "3.1"
}
},
"location": {
"coordinates": {
"latitude": 22.54845,
"longitude": -90.11838
}
},
"phone": "555-0153",
"fax": "555-7249",
"category": {
"id": 1,
"name": "Hotel"
},
"rank": 42,
"dates": {
"added": "1998-07-19T05:00:00.000Z",
"updated": "2018-03-22T07:23:14.000Z"
},
"statistics": {
"11": {
"id": 11,
"name": "Total number of rooms - 220",
"value": "220"
},
"12": {
"id": 12,
"name": "Number of floors - 7",
"value": "7"
}
},
"chain": {
"id": -2,
"name": "Test Hotels"
},
"brand": {
"id": 2,
"name": "Test Brand"
}
}
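One possible way to handle points 1 to 3 is a streaming pre-processing step that rewrites each JSONL line before it is imported. The sketch below is only an illustration in Python (the function names are made up here); it assumes every record has a "hotel_id" that parses as an integer and a "location.coordinates" object with "latitude" and "longitude", as in the sample entry above.

```python
import json


def transform_hotel(record):
    """Return a copy with hotel_id moved to a numeric _id and a GeoJSON Point location."""
    doc = dict(record)
    doc["_id"] = int(doc.pop("hotel_id"))  # points 1 and 2: rename and convert to a number
    coords = doc.get("location", {}).get("coordinates")
    if isinstance(coords, dict):
        # point 3: GeoJSON stores coordinates as [longitude, latitude]
        doc["location"] = {
            "type": "Point",
            "coordinates": [coords["longitude"], coords["latitude"]],
        }
    return doc


def transform_lines(lines):
    """Lazily transform an iterable of JSONL lines, so a 16 GB file never sits in memory."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.dumps(transform_hotel(json.loads(line)))
```

Iterating over an open file object and writing each yielded line to a new file keeps memory use flat; the transformed file can then be imported as before.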

How to use embedsMany in laravel to get all attributes? Only foreign key is returning

Hello Good Developers,
I am trying to implement the embedsMany relationship of jenssegers/laravel-mongodb.
I have two collections:
ProfileSection - {
"_id": "5c865ea4257db43fe4007331",
"general_name": "MY_PROFILE",
"type": "public",
"points": 100,
"status": 1,
"translated": [
{
"con_lang": "US-EN",
"country_code": "US",
"language": "EN",
"text": "My Profile",
"description": "My Profile"
},
...
],
"updated_at": "2019-03-11T13:12:04.000Z",
"created_at": "2019-03-11T13:12:04.000Z"
}
Profile Questions - {
"_id": "5c865ea3257db43fe40072b2",
"id": "STANDARD_EDUCATION",
"general_name": "STANDARD_EDUCATION",
"country_code": "US",
"order": 1,
"profile_section_id": "5c865ea4257db43fe4007331",
"profile_section": "My Profile",
"translated": [
{
"con_lang": "US-EN",
"text": "What is the highest level of education you have completed?",
"hint": null,
"mapping": {},
"answers": [
{
"precode": "1",
"text": "3rd Grade or less",
"mapping": {}
}
]
},
{...}
],
"updated_at": "2019-03-11T13:12:03.000Z",
"created_at": "2019-03-11T13:12:03.000Z"
}
In ProfileSection I have added
public function questions()
{
return $this->embedsMany(ProfilerQuestions::class, '_id', 'profile_section_id');
}
If I execute ProfileSection::find('5c865ea4257db43fe4007331')->questions,
it returns a Profile Questions object with only one attribute: 5c865ea4257db43fe4007331, i.e. the ObjectId of the Profile Section.
I tried using ->with('questions') before accessing the questions object,
but it's not working.
I don't understand what the issue is and would appreciate some help.

How to parse JSON data containing an array in an Ionic hybrid application

I am a beginner in hybrid Ionic app development. I want to use a RESTful web service in my project.
My JSON data is:
{
"records": [
{
"Name": "Alfreds Futterkiste",
"City": "Berlin",
"Country": "Germany"
},
{
"Name": "Ana Trujillo Emparedados y helados",
"City": "México D.F.",
"Country": "Mexico"
},
{
"Name": "Antonio Moreno Taquería",
"City": "México D.F.",
"Country": "Mexico"
}]
}
I want to display this data in a list view in Ionic. I don't know how to parse data that contains an array. Please suggest a solution or tutorials. I want to show all names in the list view.
I am using this api link: http://api.geonames.org/earthquakesJSON?north=44.1&south=-9.9&east=-22.4&west=55.2&username=bertt
Thanks in advance.
You don't need to parse the JSON data here; it is already in object format.
You can use it like this:
var myData = {
"records": [
{
"Name": "Alfreds Futterkiste",
"City": "Berlin",
"Country": "Germany"
},
{
"Name": "Ana Trujillo Emparedados y helados",
"City": "México D.F.",
"Country": "Mexico"
},
{
"Name": "Antonio Moreno Taquería",
"City": "México D.F.",
"Country": "Mexico"
}]
}
// Loop over the array so the code works for any number of records.
myData.records.forEach(function (record, index) {
  console.log(record);
  console.log("Record " + (index + 1) + " Data:");
  console.log('\t' + "Name: " + record.Name);
  console.log('\t' + "City: " + record.City);
  console.log('\t' + "Country: " + record.Country);
});

Can I retrieve house number on mapbox reverse geocoder?

Currently, when I want to retrieve the address for a pair of coordinates, I make a request like the following:
GET http://api.tiles.mapbox.com/v3/examples.map-zr0njcqy/geocode/-114.0701,51.0495.json
I get address information down to the street level but no house number. Is there a way to retrieve it as well? It seems like such an obvious need, and I cannot think of any problem with extracting this data when the rest has already been extracted.
{
"attribution": {
"mapbox-places": "<a href='https://www.mapbox.com/about/maps/' target='_blank'>© Mapbox © OpenStreetMap</a> <a class='mapbox-improve-map' href='https://www.mapbox.com/map-feedback/' target='_blank'>Improve this map</a>"
},
"query": [
-114.0701,
51.0495
],
"results": [
[
{
"id": "street.31973701",
"lat": 51.0476559,
"lon": -114.0703042,
"name": "3 St SW",
"type": "street"
},
{
"bounds": [
-114.36183200000002,
50.84361600000001,
-113.87432100000002,
51.217528999999985
],
"id": "mapbox-places.10008775",
"lat": 51.03095,
"lon": -114.108491,
"name": "Calgary",
"type": "city"
},
{
"bounds": [
-120.00138351899996,
48.99667665000002,
-110.004763853,
60.00042158400004
],
"id": "province.2553712403",
"lat": 54.872006,
"lon": -115.003552,
"name": "Alberta",
"type": "province"
},
{
"bounds": [
-141.00275000000013,
40.043430830999895,
-47.69751888999983,
86.45371111000011
],
"id": "country.1833980151",
"lat": 76.304456,
"lon": -105.801333,
"name": "Canada",
"type": "country"
}
]
]
}
@rbrundritt is correct.
Most mapping applications (Google, Bing, etc.) merely interpolate the location when given a street address. They know the starting and ending address on a given block and make an educated guess as to where the address you are searching for is located on that block. They don't actually store the outlines and addresses of each property.
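To illustrate what such interpolation means, here is a rough Python sketch of plain linear interpolation along one block. It is not how any particular provider implements it; the function name and the straight-line assumption between the two block ends are mine.

```python
def interpolate_address(number, start_number, end_number, start_point, end_point):
    """Estimate (lat, lon) of a house number between the two ends of a block.

    start_point and end_point are (lat, lon) tuples for the first and last
    known house numbers on the block; the block is treated as a straight line.
    """
    if start_number == end_number:
        return start_point
    # Fraction of the way along the block, clamped to the block itself.
    t = (number - start_number) / (end_number - start_number)
    t = min(1.0, max(0.0, t))
    lat = start_point[0] + t * (end_point[0] - start_point[0])
    lon = start_point[1] + t * (end_point[1] - start_point[1])
    return (lat, lon)
```

Reverse geocoding would have to run this guess backwards, which is why a house number obtained this way would only be an estimate.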

MongoDB Database Structure and Best Practices Help

I'm in the process of developing Route Tracking/Optimization software for my refuse collection company and would like some feedback on my current data structure/situation.
Here is a simplified version of my MongoDB structure:
Database: data
Collections:
“customers” - data collection containing all customer data.
[
{
"cust_id": "1001",
"name": "Customer 1",
"address": "123 Fake St",
"city": "Boston"
},
{
"cust_id": "1002",
"name": "Customer 2",
"address": "123 Real St",
"city": "Boston"
},
{
"cust_id": "1003",
"name": "Customer 3",
"address": "12 Elm St",
"city": "Boston"
},
{
"cust_id": "1004",
"name": "Customer 4",
"address": "16 Union St",
"city": "Boston"
},
{
"cust_id": "1005",
"name": "Customer 5",
"address": "13 Massachusetts Ave",
"city": "Boston"
}, { ... }, { ... }, ...
]
“trucks” - data collection containing all truck data.
[
{
"truckid": "21",
"type": "Refuse",
"year": "2011",
"make": "Mack",
"model": "TerraPro Cabover",
"body": "Mcneilus Rear Loader XC",
"capacity": "25 cubic yards"
},
{
"truckid": "22",
"type": "Refuse",
"year": "2009",
"make": "Mack",
"model": "TerraPro Cabover",
"body": "Mcneilus Rear Loader XC",
"capacity": "25 cubic yards"
},
{
"truckid": "12",
"type": "Dump",
"year": "2006",
"make": "Chevrolet",
"model": "C3500 HD",
"body": "Rugby Hydraulic Dump",
"capacity": "15 cubic yards"
}
]
“drivers” - data collection containing all driver data.
[
{
"driverid": "1234",
"name": "John Doe"
},
{
"driverid": "4321",
"name": "Jack Smith"
},
{
"driverid": "3421",
"name": "Don Johnson"
}
]
“route-lists” - data collection containing all predetermined route lists.
[
{
"route_name": "monday_1",
"day": "monday",
"truck": "21",
"stops": [
{
"cust_id": "1001"
},
{
"cust_id": "1010"
},
{
"cust_id": "1002"
}
]
},
{
"route_name": "friday_1",
"day": "friday",
"truck": "12",
"stops": [
{
"cust_id": "1003"
},
{
"cust_id": "1004"
},
{
"cust_id": "1012"
}
]
}
]
“routes” - data collection containing data for all active and completed routes.
[
{
"routeid": "1",
"route_name": "monday1",
"start_time": "04:31 AM",
"status": "active",
"stops": [
{
"customerid": "1001",
"status": "complete",
"start_time": "04:45 AM",
"finish_time": "04:48 AM",
"elapsed_time": "3"
},
{
"customerid": "1010",
"status": "complete",
"start_time": "04:50 AM",
"finish_time": "04:52 AM",
"elapsed_time": "2"
},
{
"customerid": "1002",
"status": "incomplete",
"start_time": "",
"finish_time": "",
"elapsed_time": ""
},
{
"customerid": "1005",
"status": "incomplete",
"start_time": "",
"finish_time": "",
"elapsed_time": ""
}
]
}
]
Here is the process thus far:
Each day drivers begin by Starting a New Route. Before starting a new route drivers must first input data:
driverid
date
truck
Once all data is entered correctly, the Start a New Route process will begin:
Create a new object in collection “routes”
Query collection “route-lists” for a “day” + “truck” match and return "stops"
Insert the “route-lists” data into the “routes” collection
As the driver proceeds with his daily stops/tasks, the “routes” collection will update accordingly.
On completion of all tasks, the driver can then complete the route by simply changing the “status” field from “active” to “complete” in the "routes" collection.
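The process described above could be sketched in Python roughly as follows. This works on plain dicts shaped like the sample documents; storing "driverid", "date" and "truck" on the route document is an assumption (the sample "routes" document does not show them), and the function names are hypothetical.

```python
def start_route(route_lists, day, truck, driverid, date, routeid):
    """Build a new "routes" document from the matching "route-lists" entry."""
    template = next(
        (r for r in route_lists if r["day"] == day and r["truck"] == truck), None
    )
    if template is None:
        raise LookupError("no route list for day=%s truck=%s" % (day, truck))
    return {
        "routeid": routeid,
        "route_name": template["route_name"],
        "driverid": driverid,  # assumed field
        "date": date,          # assumed field
        "truck": truck,        # assumed field
        "start_time": "",
        "status": "active",
        "stops": [
            {
                "customerid": stop["cust_id"],
                "status": "incomplete",
                "start_time": "",
                "finish_time": "",
                "elapsed_time": "",
            }
            for stop in template["stops"]
        ],
    }


def complete_route(route):
    """Return a copy marked complete; refuse while any stop is still incomplete."""
    if any(stop["status"] != "complete" for stop in route["stops"]):
        raise ValueError("route still has incomplete stops")
    return {**route, "status": "complete"}
```

The returned document would then be inserted into "routes", and the completion step would become a single update of the "status" field.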
That about sums it up. Any feedback, opinions, comments, links, optimization tactics are greatly appreciated.
Thanks in advance for your time.
Your database schema looks to me like a 'classic' relational database schema. MongoDB is a good fit for data denormalization. I guess that when you display routes you load all related customers, the driver, and the truck.
If you want to make your system really fast, you can embed everything in the routes collection.
So I suggest the following modifications to your schema:
customers - as-is
trucks - as-is
drivers - as-is
route-list:
Embed the customer data inside stops instead of a reference. Also embed the truck. In this case the schema will be:
{
"route_name": "monday_1",
"day": "monday",
"truck": {
"_id": 1
// all truck data goes here
},
"stops": [{
"customer": {
"_id": 1
// all customer data goes here
}
}, {
"customer": {
"_id": 2
// all customer data goes here
}
}]
}
routes:
When a driver starts a new route, copy the route from route-list and additionally embed the driver information:
{
// copy all route-list data (create a new id for the current route and keep a reference to the route-list, so you can sync the route with its route-list)
"_id": "1",
"route_list_id": 1,
"start_time": "04:31 AM",
"status": "active",
"driver": {
// embed all driver data here
},
"stops": [{
"customer": {
//all customer data
},
"status": "complete",
"start_time": "04:45 AM",
"finish_time": "04:48 AM",
"elapsed_time": "3"
}]
}
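As a rough illustration, such an embedded route document could be assembled like this in Python. The sketch starts from the reference-based documents in the question; using "route_name" as the route-list reference and falling back to a bare {"cust_id": ...} for customers that are not found are assumptions made for the example.

```python
def build_embedded_route(route_list, customers, trucks, driver, route_id):
    """Copy a route-list entry into a self-contained route document."""
    customer_by_id = {c["cust_id"]: c for c in customers}
    truck_by_id = {t["truckid"]: t for t in trucks}
    return {
        "_id": route_id,
        "route_list_id": route_list["route_name"],  # assumed reference key
        "status": "active",
        "truck": dict(truck_by_id[route_list["truck"]]),  # embedded copy
        "driver": dict(driver),                           # embedded copy
        "stops": [
            {
                # Embed the whole customer; keep only the id if it is unknown.
                "customer": dict(
                    customer_by_id.get(stop["cust_id"], {"cust_id": stop["cust_id"]})
                ),
                "status": "incomplete",
                "start_time": "",
                "finish_time": "",
                "elapsed_time": "",
            }
            for stop in route_list["stops"]
        ],
    }
```

Because every embedded object is a copy, displaying the route needs no further lookups, at the cost of having to refresh the copies when the source documents change.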
I guess you are asking yourself what to do if a driver, customer, or other denormalized data changes in the main collection. Yes, you need to update all the denormalized copies in the other collections. You may need to update a very large number of documents, possibly billions depending on the size of your system, and that's okay. You can do it asynchronously if it takes a long time.
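For instance, propagating a changed customer into every route that embeds it might look like the Python sketch below. It assumes the embedded customer keeps the question's "cust_id" field and that routes live in a "routes" collection; the first function only builds the arguments, which would be passed to pymongo as `db.routes.update_many(**customer_update_args(customer))` on a server that supports `arrayFilters` (MongoDB 3.6 or later).

```python
def customer_update_args(customer):
    """Keyword arguments for update_many that refresh every embedded copy."""
    cid = customer["cust_id"]
    return {
        "filter": {"stops.customer.cust_id": cid},
        # $[stop] touches only the array elements matched by array_filters
        "update": {"$set": {"stops.$[stop].customer": customer}},
        "array_filters": [{"stop.customer.cust_id": cid}],
    }


def refresh_customer_in_route(route, customer):
    """In-memory equivalent of the update, convenient for unit tests."""
    for stop in route["stops"]:
        if stop["customer"].get("cust_id") == customer["cust_id"]:
            stop["customer"] = dict(customer)
    return route
```

Running such an update from a background job is one way to do the work asynchronously.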
What are the benefits of the data structure above?
Each document contains all the data you need to display in your application. For instance, you don't need to load the related customers, driver, and truck when you display routes.
You can run complex queries directly against the database. For example, with the embedded schema you can build a single query that returns all routes containing a stop for the customer with name = "Bill" (in your current schema you first need to load the customer by name, get the id, and then search by customer id).
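A small Python sketch of that "Bill" query, assuming the embedded customer has a "name" field as in the question's customers collection. The first function returns a filter that could be passed to pymongo's `find()` on the routes collection; the second does the same match on plain dicts so the logic can be checked without a database.

```python
def routes_with_customer_filter(name):
    """MongoDB filter: dot notation reaches into each element of the stops array."""
    return {"stops.customer.name": name}


def routes_with_customer(routes, name):
    """Same match on in-memory route documents."""
    return [
        route
        for route in routes
        if any(
            stop.get("customer", {}).get("name") == name
            for stop in route.get("stops", [])
        )
    ]
```

With the reference-based schema the same result needs two round trips: one to resolve the name to a customer id, and one to search the routes by that id.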
You are probably worried that your data can get out of sync in some cases; to guard against this you just need to build a few unit tests that ensure you update your denormalized data correctly.
I hope the above helps you see the world from the non-relational side, from a document database point of view.