MongoDB model for cross vendor time series data - mongodb

I know my problem seems better suited to an RDBMS model, but I really want to deploy it with MongoDB, because I may have irregular fields to add to each record in the future, and I also want to practice my NoSQL database skills.
PE ratio and PB ratio data provided by one vendor:
| Vendor5_ID| PE| PB|date |
|----------:|----:|-----:|:----------|
| 210| 3.90| 2.620|2017-08-22 |
| 210| 3.90| 2.875|2017-08-22 |
| 228| 3.85| 2.320|2017-08-22 |
| 214| 3.08| 3.215|2017-08-22 |
| 187| 3.15| 3.440|2017-08-22 |
| 181| 2.76| 3.460|2017-08-22 |
Price data and analyst coverage provided by another vendor:
|Symbol | Price| Analyst|date |
|:------|-----:|-------:|:----------|
|AAPL | 160| 6|2017-08-22 |
|MSFT | 160| 6|2017-08-22 |
|GOOG | 108| 4|2017-08-22 |
And I have key conversion data:
| uniqueID|Symbol |from |to |
|--------:|:------|:----------|:----------|
| 1|AAPL |2016-01-10 |2017-08-22 |
| 2|MSFT |2016-01-10 |2017-08-22 |
| 3|GOOG |2016-01-10 |2017-08-22 |
| uniqueID| Vendor5_ID|from |to |
|--------:|----------:|:----------|:----------|
| 1| 210|2016-01-10 |2017-08-22 |
| 2| 228|2016-01-10 |2017-08-22 |
| 3| 214|2016-01-10 |2017-08-22 |
I want time range queries to execute fast. My idea is to store each column as its own collection:
db.PE:
{
_id,
uniqueID,
Vendor5_ID,
value,
date
}
db.PB:
{
_id,
uniqueID,
Vendor5_ID,
value,
date
}
db.Price:
{
_id,
uniqueID,
Symbol,
value,
date
}
db.Analyst:
{
_id,
uniqueID,
Symbol,
value,
date
}
Is this a good solution? What model would you consider best if far more data from different vendors is added later?

I would consider a nested table or child table approach. I am not sure to what extent MongoDB supports this. I would consider Oracle NoSQL Database for this use case, since it supports nested tables with TTL and offers higher throughput (because of BDB as the storage engine). With nested tables you could store PE and PB with timestamps in the child/nested table while the parent table continues to hold the symbol/vendor_id and any other details. This ensures that your queries stay on the same shard; putting the data in different collections will not guarantee the same shard.
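An alternative that stays within MongoDB is a single "observations" collection with one document per (security, metric, date) measurement, indexed on `{uniqueID: 1, metric: 1, date: 1}` so range scans walk one index. The sketch below simulates that layout in plain Python (no MongoDB server needed here); the field names are hypothetical, modeled on the tables above:

```python
from datetime import date

# Hypothetical single-collection layout: one document per observation,
# instead of one collection per metric. In MongoDB you would index
# {uniqueID: 1, metric: 1, date: 1} so time-range queries stay on one index.
observations = [
    {"uniqueID": 1, "metric": "PE",    "vendor": "Vendor5", "value": 3.90,  "date": date(2017, 8, 22)},
    {"uniqueID": 1, "metric": "PB",    "vendor": "Vendor5", "value": 2.620, "date": date(2017, 8, 22)},
    {"uniqueID": 1, "metric": "Price", "vendor": "VendorX", "value": 160,   "date": date(2017, 8, 22)},
    {"uniqueID": 1, "metric": "PE",    "vendor": "Vendor5", "value": 4.10,  "date": date(2017, 8, 23)},
]

def time_range(uid, metric, start, end):
    """Equivalent of find({uniqueID: uid, metric: metric,
                           date: {$gte: start, $lte: end}})."""
    return [o for o in observations
            if o["uniqueID"] == uid and o["metric"] == metric
            and start <= o["date"] <= end]

hits = time_range(1, "PE", date(2017, 8, 22), date(2017, 8, 23))
```

New vendors or new metrics become new `metric`/`vendor` values rather than new collections, so irregular fields can still be added per document.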

Related

What exactly is a wide column store?

Googling for a definition either returns results for column-oriented DBs or gives very vague definitions.
My understanding is that wide column stores consist of column families which consist of rows and columns. Each row within said family is stored together on disk. This sounds like how row oriented databases store their data. Which brings me to my first question:
How are wide column stores different from a regular relational DB table? This is the way I see it:
* column family -> table
* column family column -> table column
* column family row -> table row
This image from Database Internals simply looks like two regular tables:
The guess I have as to what is different comes from the fact that "multi-dimensional map" is mentioned along side wide column stores. So here is my second question:
Are wide column stores sorted from left to right? Meaning, in the above example, are the rows sorted first by Row Key, then by Timestamp, and finally by Qualifier?
Let's start with a definition of a wide column database:
Its architecture uses a persistent, sparse, multi-dimensional map (row key, column key, and timestamp) in a tabular format meant for massive scalability (over and above the petabyte scale).
A relational database is designed to maintain the relationship between the entity and the columns that describe the entity. A good example is a Customer table. The columns hold values describing the Customer's name, address, and contact information. All of this information is the same for each and every customer.
A wide column database is one type of NoSQL database.
Maybe this is a better image of four wide column databases.
My understanding is that the first image at the top, the Column model, is what we called an entity/attribute/value table. It's an attribute/value table within a particular entity (column).
For Customer information, the first wide column example might look like this.
Customer ID Attribute Value
----------- --------- ---------------
100001 name John Smith
100001 address 1 10 Victory Lane
100001 address 2 Pittsburgh, PA 15120
Yes, we could have modeled this for a relational database. The power of the attribute/value table comes with the more unusual attributes.
Customer ID Attribute Value
----------- --------- ---------------
100001 fav color blue
100001 fav shirt golf shirt
Any attribute that a marketer can dream up can be captured and stored in an attribute/value table. Different customers can have different attributes.
The Super Column model keeps the same information in a different format.
Customer ID: 100001
Attribute Value
--------- --------------
fav color blue
fav shirt golf shirt
You can have as many Super Column models as you have entities. They can be in separate NoSQL tables or put together as a Super Column family.
The Column Family and Super Column Family models simply add a row id to the first two models in the picture for quicker retrieval of information.
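The Column and Super Column models above amount to a two-level map: entity id, then attribute, then value. A minimal Python sketch (customer ids and attributes are the hypothetical ones from the example):

```python
# Super Column model as a nested map: entity id -> attribute -> value.
customers = {
    100001: {
        "name": "John Smith",
        "fav color": "blue",
        "fav shirt": "golf shirt",
    },
}

# Different customers can carry entirely different attribute sets;
# nothing forces a shared schema across rows.
customers[100002] = {"name": "Jane Doe", "fav team": "Pirates"}

def get_attr(cid, attr, default=None):
    # Missing attributes are simply absent, not NULL columns.
    return customers.get(cid, {}).get(attr, default)
```

The sparseness is the point: an attribute a customer lacks costs no storage, unlike a NULL-filled relational column.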
Most (if not all) wide-column stores are indeed row-oriented stores in the sense that all parts of a record are stored together. You can view one as a two-dimensional key-value store: the first part of the key distributes the data across servers, and the second part lets you quickly find the data on the target server.
Wide-column stores will have different features and behaviors. However, Apache Cassandra, for example, allows you to define how the data will be sorted. Take this table for example:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | US | 2020-10-01 | "a..." |
| 1 | JP | 2020-11-01 | "b..." |
| 1 | US | 2020-09-01 | "c..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
If your partitioning key is (id) and your clustering key is (country, timestamp), the data will be stored like this:
[Key 1]
1:JP,2020-11-01,"b..." | 1:US,2020-09-01,"c..." | 1:US,2020-10-01,"a..."
[Key2]
2:CA,2019-10-01,"e..." | 2:CA,2020-10-01,"d..." | 2:CA,2020-11-01,"f..."
[Key3]
3:GB,2020-09-01,"g..." | 3:GB,2020-09-02,"h..."
Or in table form:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | JP | 2020-11-01 | "b..." |
| 1 | US | 2020-09-01 | "c..." |
| 1 | US | 2020-10-01 | "a..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
If you change the primary key (the composite of partitioning and clustering keys) to (id, timestamp), with id as the partitioning key and timestamp as the clustering key in its default ascending order, the result would be:
[Key 1]
1:US,2020-09-01,"c..." | 1:US,2020-10-01,"a..." | 1:JP,2020-11-01,"b..."
[Key2]
2:CA,2019-10-01,"e..." | 2:CA,2020-10-01,"d..." | 2:CA,2020-11-01,"f..."
[Key3]
3:GB,2020-09-01,"g..." | 3:GB,2020-09-02,"h..."
Or in table form:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | US | 2020-09-01 | "c..." |
| 1 | US | 2020-10-01 | "a..." |
| 1 | JP | 2020-11-01 | "b..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
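The two layouts above can be reproduced by sorting rows on the partition key first and the clustering columns second; this little Python sketch uses the same sample rows (ISO date strings sort correctly as plain strings):

```python
# The rows from the example table, as (id, country, timestamp, message).
rows = [
    (1, "US", "2020-10-01", "a..."),
    (1, "JP", "2020-11-01", "b..."),
    (1, "US", "2020-09-01", "c..."),
    (2, "CA", "2020-10-01", "d..."),
    (2, "CA", "2019-10-01", "e..."),
    (2, "CA", "2020-11-01", "f..."),
    (3, "GB", "2020-09-01", "g..."),
    (3, "GB", "2020-09-02", "h..."),
]

# PRIMARY KEY (id, country, timestamp): partition on id,
# then cluster on country, then timestamp.
clustered_by_country = sorted(rows, key=lambda r: (r[0], r[1], r[2]))

# PRIMARY KEY (id, timestamp): partition on id, cluster on timestamp only.
clustered_by_timestamp = sorted(rows, key=lambda r: (r[0], r[2]))
```

In Cassandra itself the sort is maintained on write inside each partition, not computed at read time; the `sorted` calls only illustrate the resulting on-disk order.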

SQL parameter table

I suspect this question is already well-answered but perhaps due to limited SQL vocabulary I have not managed to find what I need. I have a database with many code:description mappings in a single 'parameter' table. I would like to define a query or procedure to return the descriptions for all (or an arbitrary list of) coded values in a given 'content' table with their descriptions from the parameter table. I don't want to alter the original data, I just want to display friendly results.
Is there a standard way to do this?
Can it be accomplished with SELECT or are other statements required?
Here is a sample query for a single coded field:
SELECT TOP (5)
    newid() AS id,
    B.BRIDGE_STATUS,
    P.SHORTDESC
FROM BRIDGE B
LEFT JOIN PARAMTRS P
    ON P.TABLE_NAME = 'BRIDGE'
   AND P.FIELD_NAME = 'BRIDGE_STATUS'
   AND P.PARMVALUE = B.BRIDGE_STATUS
ORDER BY id
I want to produce 'decoded' results like:
| id | BRIDGE_STATUS |
|--------------------------------------|------------ |
| BABCEC1E-5FE2-46FA-9763-000131F2F688 | Active |
| 758F5201-4742-43C6-8550-000571875265 | Active |
| 5E51634C-4DD9-4B0A-BBF5-00087DF71C8B | Active |
| 0A4EA521-DE70-4D04-93B8-000CD12B7F55 | Inactive |
| 815C6C66-8995-4893-9A1B-000F00F839A4 | Proposed |
Rather than original, coded data like:
| id | BRIDGE_STATUS |
|--------------------------------------|---------------|
| F50214D7-F726-4996-9C0C-00021BD681A4 | 3 |
| 4F173E40-54DC-495E-9B84-000B446F09C3 | 3 |
| F9C216CD-0453-434B-AFA0-000C39EFA0FB | 3 |
| 5D09554E-201D-4208-A786-000C537759A1 | 1 |
| F0BDB9A4-E796-4786-8781-000FC60E200C | 4 |
but for an arbitrary number of columns.

DB2 add column, insert data and new id

Each month, I want to record meter readings in order to see trends over time, and also want to add any new meters to my history table. I would like to add a new column name each month based on date.
I know how to concatenate data in a query, but have not found a way to do the same thing when adding a column. If today is 06/14/2018, I want the column name to be Y18M06, as I plan to run this monthly.
Something like this to add the column (this doesn't work)
ALTER TABLE METER.HIST
ADD COLUMN ('Y' CONCAT VARCHAR_FORMAT(CURRENT TIMESTAMP, 'YY') CONCAT 'M' CONCAT VARCHAR_FORMAT(CURRENT TIMESTAMP, 'MM'))
DECIMAL(12,5) NOT NULL DEFAULT 0
Then, I want to insert data into that new column from another table. In this case, a list of meter id's, and the new column contains a meter reading. If a new id exists, then it also needs to be added.
Source: CURRENT

+----+---------+
| id | reading |
+----+---------+
| 1  | 321.234 |
| 2  | 422.634 |
| 3  | 121.456 |
+----+---------+

Destination: HISTORY (current)

+----+---------+
| id | Y18M05  |
+----+---------+
| 1  | 121.102 |
| 2  | 121.102 |
+----+---------+

Destination: HISTORY (desired)

+----+---------+---------+
| id | Y18M05  | Y18M06  |
+----+---------+---------+
| 1  | 121.102 | 321.234 |
| 2  | 121.102 | 422.634 |
| 3  |         | 121.456 |
+----+---------+---------+
Any help would be much appreciated!
Don't physically add columns. Rather, pivot the data on the fly:
https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/pivoting_tables56?lang=en
Adding columns is not a good idea. From a conceptual and modelling point of view, think about adding a row for each month instead. You have a limited number of columns but a more or less unlimited number of rows, and this gives you a permanent model / table structure.
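The row-per-month model plus an on-the-fly pivot can be sketched with conditional aggregation. A minimal sqlite3 version (table and column names are hypothetical, modeled on the question; the Db2 SQL is analogous):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per (meter, month) instead of one column per month.
CREATE TABLE METER_HIST (id INTEGER, ym TEXT, reading DECIMAL);
INSERT INTO METER_HIST VALUES
  (1, 'Y18M05', 121.102), (1, 'Y18M06', 321.234),
  (2, 'Y18M05', 121.102), (2, 'Y18M06', 422.634),
  (3, 'Y18M06', 121.456);   -- meter 3 is new this month
""")

# Pivot on the fly: one conditional aggregate per month you want to show.
rows = conn.execute("""
SELECT id,
       MAX(CASE WHEN ym = 'Y18M05' THEN reading END) AS Y18M05,
       MAX(CASE WHEN ym = 'Y18M06' THEN reading END) AS Y18M06
FROM METER_HIST
GROUP BY id
ORDER BY id
""").fetchall()
```

The monthly load then becomes a plain INSERT of (id, 'YyyMmm', reading) rows from the CURRENT table, with no ALTER TABLE needed, and new meter ids appear automatically.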

Mongodb dynamic column count

Hi, I am making a straw poll website. I am using Node.js for the server and MongoDB for the database. My database structure is like this:
objectId | userId | createdDate | endDate | dynamic columns
"dynamic columns" means, for example, that one record has 3 choices and another record has 6 choices. With dynamic columns, the database should look like this:
randomId | admin | 07.03.2017 | 10.03.2017 | ch1-47 | ch2-20 | ch3-200
randomId | user2 | 07.03.2017 | 15.03.2017 | ch1-21 | ch2-7
and so on. chN means the Nth choice, and the number after chN is how many times it was rated. When someone rates choice 1 in the first poll, the database should look like this:
randomId | admin | 07.03.2017 | 10.03.2017 | ch1-48 | ch2-20 | ch3-200
randomId | user2 | 07.03.2017 | 15.03.2017 | ch1-21 | ch2-7
Does MongoDB have dynamic columns, and if it does, how can I make something like this?
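MongoDB documents are schemaless, so each poll can simply embed a `choices` subdocument with however many keys it needs, and a vote becomes an atomic update like `update_one({"_id": poll_id}, {"$inc": {"choices.ch1": 1}})`. A sketch simulating that with plain Python dicts (poll ids and field names are hypothetical):

```python
# Each poll document embeds its own choices map; no fixed column set.
polls = [
    {"_id": "p1", "userId": "admin", "createdDate": "07.03.2017",
     "endDate": "10.03.2017", "choices": {"ch1": 47, "ch2": 20, "ch3": 200}},
    {"_id": "p2", "userId": "user2", "createdDate": "07.03.2017",
     "endDate": "15.03.2017", "choices": {"ch1": 21, "ch2": 7}},
]

def vote(poll_id, choice):
    # In MongoDB this would be a single atomic $inc:
    # db.polls.update_one({"_id": poll_id}, {"$inc": {f"choices.{choice}": 1}})
    for p in polls:
        if p["_id"] == poll_id:
            p["choices"][choice] = p["choices"].get(choice, 0) + 1

vote("p1", "ch1")
```

Using `$inc` on the server side avoids read-modify-write races when two users vote at once.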

One SQL query over three tables

I would like to query over three tables. Right now I have managed to join two tables, but this is my first database project and I'm really stuck. Here are my tables.
Drivers
|DRIVER_ID|FIRST_NAME|LAST_NAME|AGE|
| 1|John |Smith |19 |
| 2|Steve |Oak |33 |
| 3|Mary |Sanchez |22 |
Drivers_in_Teams
|DRIVERS_IN_TEAMS_ID|DRIVER_ID|TEAM_ID|BEG_DATE |END_DATE |CAR |
| 1| 1| 1|18-NOV-05| |Toyota |
| 2| 3| 2|10-APR-12| |Ford |
| 3| 2| 3|19-JUL-01|02-AUG-04|Volkswagen |
Team
|TEAM_ID |NAME |COUNTRY |
| 1|Turbo |Sweden |
| 2|Rally |UK |
| 3|Baguette |France |
BEG_DATEs are done with "sysdate-number"
My goal is to find the drivers who drive a Ford and still have a valid contract (END_DATE is not set).
I would like to make a query over three tables, so the result should display a driver's FIRST_NAME, LAST_NAME and the COUNTRY of the team.
I tried some examples I found on Stack Overflow and edited them, but I got stuck adding the third table, TEAM, to the query.
Here's the one I used
SELECT FIRST_NAME, LAST_NAME
FROM DRIVERS
JOIN DRIVERS_IN_TEAMS ON DRIVERS.DRIVER_ID = DRIVERS_IN_TEAMS.DRIVER_ID
WHERE DRIVERS_IN_TEAMS.CAR = 'Ford' AND DRIVERS_IN_TEAMS.END_DATE IS NOT NULL
I think this should work (join all the tables on the corresponding ids, then apply your conditions; note it is END_DATE IS NULL, since a valid contract has no end date):
SELECT d.FIRST_NAME, d.LAST_NAME, t.COUNTRY
FROM DRIVERS d
JOIN DRIVERS_IN_TEAMS dit ON dit.DRIVER_ID = d.DRIVER_ID
JOIN TEAM t ON dit.TEAM_ID = t.TEAM_ID
WHERE dit.END_DATE IS NULL
  AND dit.CAR = 'Ford'
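To verify, here is the same query run against an in-memory SQLite copy of the sample tables (SQLite's join syntax matches what Oracle accepts here; dates are kept as plain strings since the query never compares them):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DRIVERS (DRIVER_ID INTEGER, FIRST_NAME TEXT,
                      LAST_NAME TEXT, AGE INTEGER);
CREATE TABLE DRIVERS_IN_TEAMS (DRIVERS_IN_TEAMS_ID INTEGER, DRIVER_ID INTEGER,
                               TEAM_ID INTEGER, BEG_DATE TEXT,
                               END_DATE TEXT, CAR TEXT);
CREATE TABLE TEAM (TEAM_ID INTEGER, NAME TEXT, COUNTRY TEXT);

INSERT INTO DRIVERS VALUES
  (1, 'John', 'Smith', 19), (2, 'Steve', 'Oak', 33), (3, 'Mary', 'Sanchez', 22);
INSERT INTO DRIVERS_IN_TEAMS VALUES
  (1, 1, 1, '18-NOV-05', NULL, 'Toyota'),
  (2, 3, 2, '10-APR-12', NULL, 'Ford'),
  (3, 2, 3, '19-JUL-01', '02-AUG-04', 'Volkswagen');
INSERT INTO TEAM VALUES
  (1, 'Turbo', 'Sweden'), (2, 'Rally', 'UK'), (3, 'Baguette', 'France');
""")

rows = conn.execute("""
SELECT d.FIRST_NAME, d.LAST_NAME, t.COUNTRY
FROM DRIVERS d
JOIN DRIVERS_IN_TEAMS dit ON dit.DRIVER_ID = d.DRIVER_ID
JOIN TEAM t ON dit.TEAM_ID = t.TEAM_ID
WHERE dit.END_DATE IS NULL AND dit.CAR = 'Ford'
""").fetchall()
```

Only Mary Sanchez drives a Ford with no END_DATE set, so she is the single row returned, together with her team's country.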