I'm creating a virtual stamp card program for the iPhone and have run into an issue with implementing my database. The program essentially has a main points system that can be used across all merchants (sort of like Air Miles), but I also want to keep track of how many times you've been to EACH merchant.
So far, I have created 3 main tables for users, merchants, and transactions.
1) Users table contains basic info like user_id and total points collected.
2) Merchants table contains info like merchant_id, location, total points given.
3) Transactions table simply creates a new row for every time someone checks into each merchant, and records date-stamp, user name, merchant name, and points awarded.
So the most basic way to find out how many times you've been to each merchant is to query the entire transactions table for both user and merchant. This gives me a transaction history showing how many times you've been to that specific merchant (which is perfect), but in the long run I feel this will be horrible for performance.
The other straightforward, yet "dumb" method for implementing this, would be to create a column in the users table for EACH merchant, and keep the running totals there. This seems inappropriate, as I will be adding new merchants on a regular basis, and there would need to be new columns added to every user for every time this happens.
I've looked into one-to-many and many-to-many relationships for MySQL databases, but can't seem to come up with something very concrete, as I'm extremely new to web/PHP/MySQL development, but I'm guessing this is what I'm looking for...
I've also thought of creating a special transaction table for each user, which will have a column for merchant and another for the # of times visited. Again, not sure if this is the most efficient implementation.
Can someone point me in the right direction?
You're doing the right thing in the sense of thinking up the different options, and weighing up the good and bad for each.
Personally, I'd go with a MerchantCounter table which joins on your Merchant table by id_merchant (for example) and which you keep up-to-date explicitly.
Over time it does not get slower (unlike an activity-search), and does not take up lots of space.
Edit: based on your comment, Janan: no, I would use a single MerchantCounter table. So you've got your Merchant table:
id_merchant  nm_merchant
12           Jim
15           Tom
17           Wilbur
You would add a single additional table, MerchantCounter (edited to show how to tally totals for individual users):
id_merchant  id_user  num_visits
12           101      3
12           102      8
15           101      6007
17           102      88
17           104      19
17           105      1
You can see how id_merchant links the table to the Merchant table, and id_user links to a further User table.
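As a rough illustration of keeping such a counter up to date explicitly, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL; the helper names record_visit and visits are hypothetical, and the composite primary key on (id_merchant, id_user) is an assumption rather than something stated above.

```python
import sqlite3

# In-memory SQLite stands in for the real MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MerchantCounter (
        id_merchant INTEGER NOT NULL,
        id_user     INTEGER NOT NULL,
        num_visits  INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (id_merchant, id_user)  -- one row per merchant/user pair
    )
""")

def record_visit(conn, id_merchant, id_user):
    """Hypothetical helper: add one visit, creating the row on the first visit."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE MerchantCounter SET num_visits = num_visits + 1 "
            "WHERE id_merchant = ? AND id_user = ?",
            (id_merchant, id_user),
        )
        if cur.rowcount == 0:  # no existing row was updated
            conn.execute(
                "INSERT INTO MerchantCounter (id_merchant, id_user, num_visits) "
                "VALUES (?, ?, 1)",
                (id_merchant, id_user),
            )

def visits(conn, id_merchant, id_user):
    """Hypothetical helper: look up the running total with a single-row read."""
    row = conn.execute(
        "SELECT num_visits FROM MerchantCounter "
        "WHERE id_merchant = ? AND id_user = ?",
        (id_merchant, id_user),
    ).fetchone()
    return row[0] if row else 0

for _ in range(3):
    record_visit(conn, 12, 101)
record_visit(conn, 17, 105)
print(visits(conn, 12, 101), visits(conn, 17, 105))
```

In a real check-in you would probably call something like record_visit in the same transaction that inserts the row into the transactions table, so the counter and the history cannot drift apart.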
I tried doing the sanitization manually, but I am getting a type mismatch error when performing the calculations.
I also need help in sanitizing the data and getting the insight as per the below instructions:
The column sellerproductcount gives you the count of products in the form '1-16 of over 100,000 results', and you can parse out the product count 100,000.
sellerratings - this column gives you the % and count of positive ratings (e.g. '88% positive in the last 12 months (118 ratings)') if parsed correctly.
sellerdetails - you can use this text to parse out phone numbers and email IDs of merchants, where available, so our team can reach out to them.
businessaddress - this will give you the business locations of the sellers. You can parse them to identify if a seller is registered in the US, Germany (DE), or China (CN).
Hero Product 1 #ratings and Hero Product 2 #ratings - these 2 columns give you the number of ratings of the 2 'hero products' or bestselling products of this seller.
I have attached the dataset for the same.
https://docs.google.com/spreadsheets/d/1PSqRCnmFgq7v7RzZaCXXoV0Edp_vM7QO/edit?usp=sharing&ouid=115547990006782902200&rtpof=true&sd=true
Most of this type of data prep can be done with string and regex functions like REGEXP_EXTRACT(). Here are a few examples based on the data you shared:
Seller Product Count
INT(REGEXP_EXTRACT([Sellerproductcount], '(\d*,?\d*) results'))
1-16 of over 6,000 results >> 6000
Seller Rating (Percentage)
INT(REGEXP_EXTRACT([Sellerratings], '(\d*)% positive'))
92% positive in the last 12 months (181 ratings) >> 92
Seller Rating (Count)
INT(REGEXP_EXTRACT([Sellerratings], '(\d*) (?:total )?ratings'))
92% positive in the last 12 months (181 ratings) >> 181
Business Country Code
RIGHT([Businessaddress],2)
AM Treptower Park28-30Berlin12435DE >> DE
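If you want to prototype or sanity-check these patterns outside the tool, the same four extractions can be sketched in Python with the standard re module. The function names are made up for illustration, and the number patterns are slightly widened variants of the ones above (they accept any number of comma groups), so treat this as a sketch rather than a drop-in equivalent.

```python
import re

def product_count(text):
    """Pull the count out of e.g. '1-16 of over 6,000 results'."""
    m = re.search(r"(\d[\d,]*) results", text)
    return int(m.group(1).replace(",", "")) if m else None

def rating_percent(text):
    """Pull the percentage out of e.g. '92% positive in the last 12 months'."""
    m = re.search(r"(\d+)% positive", text)
    return int(m.group(1)) if m else None

def rating_count(text):
    """Pull the number of ratings out of e.g. '(181 ratings)'."""
    m = re.search(r"(\d[\d,]*) (?:total )?ratings", text)
    return int(m.group(1).replace(",", "")) if m else None

def country_code(address):
    """Take the last two characters of the address, like RIGHT(..., 2)."""
    return address.strip()[-2:]

ratings = "92% positive in the last 12 months (181 ratings)"
print(product_count("1-16 of over 6,000 results"))
print(rating_percent(ratings), rating_count(ratings))
print(country_code("AM Treptower Park28-30Berlin12435DE"))
```

Each helper returns None when the pattern is not found, which makes rows that do not follow the expected format easy to spot.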
These examples all have very straightforward patterns that are present in all rows so they can be done pretty easily with one simple calculation. However, something like sellerdetails which is unstructured, inconsistent, and sometimes incomplete will be a bit more of a challenge. You will need to use a couple of different calculations and techniques combined together to find what you are looking for, as well as some manual data prep. Here's an example of how you can pull out email but it won't work for everything:
Email
REGEXP_EXTRACT([Sellerdetails], '([a-zA-Z0-9.!#$%&’*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*)')
Good luck with your data cleaning. I suggest using sites like https://regex101.com/ and https://regexr.com/ to learn more about regular expressions and to help test them.
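For the unstructured sellerdetails text, a quick way to see what an email pattern actually catches is to run it over a few sample strings. Below is a minimal Python sketch; the find_emails helper and the sample text are invented for illustration, and like the expression above it will miss unusual or obfuscated addresses.

```python
import re

# Local part, '@', then a dotted domain. Deliberately simple: it will not
# cover every valid address, and it can still match things that are not
# deliverable addresses.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*"
)

def find_emails(text):
    """Hypothetical helper: return every email-like substring in the text."""
    return EMAIL_RE.findall(text)

# Made-up sample text in the spirit of a free-form seller details field.
sample = "Contact: Jane Doe, sales@example.com, Tel 030 1234. Or write to a@b.de."
print(find_emails(sample))
```

Returning all matches instead of only the first makes it easier to review rows where a seller lists several addresses.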
I want to design a database for a billing system. In one bill a customer might have purchased multiple different items; for example, for bill ID 1 the customer purchased 2 apples, 3 bananas and 1 watermelon. I want to know how I can normalize this database.
This is a pretty standard, basic normalization exercise with a pretty standard solution. The usual approach is to have an orders table containing order ID, customer ID, order date &c., and an order_items table with a record for each line item on the order.
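To make that concrete, here is a minimal sketch of such a layout using Python and an in-memory SQLite database. The customers and items tables, all column names, and the prices are assumptions made up for the example; only the orders and order_items split comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE items (
        item_id          INTEGER PRIMARY KEY,
        name             TEXT NOT NULL,
        unit_price_cents INTEGER NOT NULL      -- made-up prices, in cents
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT NOT NULL
    );
    -- One row per line item on a bill.
    CREATE TABLE order_items (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        item_id  INTEGER NOT NULL REFERENCES items(item_id),
        quantity INTEGER NOT NULL,
        PRIMARY KEY (order_id, item_id)
    );

    INSERT INTO customers VALUES (1, 'Example customer');
    INSERT INTO items VALUES (1, 'apple', 50), (2, 'banana', 30), (3, 'watermelon', 400);
    INSERT INTO orders VALUES (1, 1, '2013-01-15');
    -- Bill 1: 2 apples, 3 bananas and 1 watermelon.
    INSERT INTO order_items VALUES (1, 1, 2), (1, 2, 3), (1, 3, 1);
""")

# Line items for bill 1.
lines = conn.execute("""
    SELECT i.name, oi.quantity
    FROM order_items AS oi
    JOIN items AS i ON i.item_id = oi.item_id
    WHERE oi.order_id = ?
    ORDER BY i.name
""", (1,)).fetchall()

# Bill total, computed from the line items instead of being stored.
total_cents = conn.execute("""
    SELECT SUM(oi.quantity * i.unit_price_cents)
    FROM order_items AS oi
    JOIN items AS i ON i.item_id = oi.item_id
    WHERE oi.order_id = ?
""", (1,)).fetchone()[0]

print(lines, total_cents)
```

A real billing system would usually also copy the unit price onto each order_items row at the time of sale, so later price changes do not alter old bills.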
Working on a financial application that tracks sales. However, I'm running into problems trying to create a schema for properly tracking the data for reports (the main point of the app).
A purchase is the foundation of the app. It has several associations (listed below). Each purchase is tracked via a year and month field. A year is the smallest unit a user may filter a report by, so I will only have to show data for each month in that year.
# purchase.rb model
class Purchase < ActiveRecord::Base
  # Associations:
  #   belongs_to :partner
  #   belongs_to :purchase_type
  #   belongs_to :purchase_category

  # Attributes:
  #   partner_id           => association
  #   purchase_type_id     => association
  #   purchase_category_id => association
  #   year                 => year in integer (2013, 2014, etc...)
  #   month                => month in integer ("January" => 1, etc...)
  #   amount               => amount a product sold for in cents ($10.00 => 1000)
  #   fee                  => fee for associated partner (if there is one) in cents ($2.00 => 200)
end
The problem is that I need to show an overview for a given year, which breaks things down by how many purchases were completed, which partners completed them, and what the fee amounts were. I solved that by having YearMetric and MonthMetric tables that are updated every time a purchase is added/updated/removed. So you add a new purchase for a given year/month, and the corresponding YearMetric and MonthMetric rows are found and updated with +/- the appropriate amounts/fees.
This solution works well for the overview page. However, I also need to be able to view purchases in the context of partners, purchase_types, and purchase_categories. If I followed the same strategy as my overview report, I would have to add the following tables:
PartnerYearMetric, PartnerMonthMetric
PurchaseCategoryYearMetric, PurchaseCategoryMonthMetric
PurchaseTypeYearMetric, PurchaseTypeMonthMetric
So every time I add a purchase, I would be doing up to 8 additional DB updates (8 finds and then 8 updates).
The items I'm reporting on are total purchases made, average purchases (historical comparison), total amounts/fees for the period, top partners by number of purchases and by most fee amounts, etc...
There has to be a better solution than this. "Live calculation" by updating 8 records for every 1 purchase seems a bit overkill.
What you're doing is maintaining materialized views of the data in the application. It's a form of denormalization. That can be OK as an optimization but should not be your first choice. It can be very error prone, especially in the presence of concurrency, and must be done quite carefully.
Instead, when you wish to generate a summary report, run an aggregate query over the purchases to SUM them, COUNT them, etc. as appropriate. See aggregate functions in the Pg docs, Rails Calculations, Rails aggregates.
You may find it convenient to create a VIEW over the query you use, and then access the view from the application.
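As a rough sketch of what "aggregate at report time, optionally behind a view" could look like, here is a small Python example. It uses in-memory SQLite instead of PostgreSQL so that it runs on its own; the purchases table mirrors the attributes in the question, while the view name monthly_partner_summary, the index, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (
        id         INTEGER PRIMARY KEY,
        partner_id INTEGER,
        year       INTEGER NOT NULL,
        month      INTEGER NOT NULL,
        amount     INTEGER NOT NULL,           -- cents
        fee        INTEGER NOT NULL DEFAULT 0  -- cents
    );
    -- Reports filter by year, so index the columns used for filtering/grouping.
    CREATE INDEX idx_purchases_year_month ON purchases (year, month);

    -- Hypothetical view: the summary is computed on read, never stored.
    CREATE VIEW monthly_partner_summary AS
        SELECT year, month, partner_id,
               COUNT(*)    AS purchase_count,
               SUM(amount) AS total_amount,
               SUM(fee)    AS total_fee
        FROM purchases
        GROUP BY year, month, partner_id;
""")

# Made-up sample purchases.
conn.executemany(
    "INSERT INTO purchases (partner_id, year, month, amount, fee) VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2013, 1, 1000, 200),
        (1, 2013, 1, 2500, 300),
        (2, 2013, 1, 4000, 0),
        (1, 2013, 2, 1500, 200),
    ],
)

rows = conn.execute("""
    SELECT month, partner_id, purchase_count, total_amount, total_fee
    FROM monthly_partner_summary
    WHERE year = ?
    ORDER BY month, partner_id
""", (2013,)).fetchall()

for row in rows:
    print(row)
```

Grouping by purchase_type_id or purchase_category_id instead of partner_id would give the other breakdowns from the same table, with no extra writes when a purchase is added.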
If you find performance of calculating the aggregates in real time for the summary to be a problem, and you cannot solve it with proper indexing and tuning, then you should think about denormalizing. Rather than maintaining your materialized views in the app, though, consider using triggers in the database; they're much easier to write in a concurrency-safe way.
You may also want to look up PostgreSQL 9.4's enhanced materialized views support.
I have my SQL database views available to my report, but sometimes they return multiple values. For example, I have one that shows me the Total Credits for a range of years.
When I click "Browse Data.." it lets me see what bits of data are available
Eg:
Credits
-------
31
45
460
But I want to select 45 (based on a customer ID)... is it possible to do this?
EDIT: An alternative is if I can link the Customer ID from two views, but only if it's not null (as sometimes there are no records in the Credits)
To avoid the problem of unintentionally "deleting" customers from the report results, first do a left outer join between the CONTRACT_VIEW and the year views, such as TOTAL_2013. In your selection formula, instead of just doing something like {TOTAL_2013.Customer_ID}=MyCustomerID, add all the nulls to it as well, so: isnull({TOTAL_2013.Customer_ID}) or {TOTAL_2013.Customer_ID}=MyCustomerID. This will prevent customers who don't have any entries in the by-year views from being removed completely from the report.
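The effect of adding the null check can be reproduced outside the report designer. The sketch below uses Python with in-memory SQLite and plain tables standing in for CONTRACT_VIEW and TOTAL_2013; the column names, the customer names and the sample rows are assumptions, and the WHERE clauses play the role of the selection formula.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Plain tables standing in for the two views (assumed columns).
    CREATE TABLE contract_view (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE total_2013 (customer_id INTEGER, credits INTEGER);
    INSERT INTO contract_view VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cole');
    INSERT INTO total_2013 VALUES (1, 31), (2, 45);   -- customer 3 has no credits
""")

JOINED = """
    SELECT c.customer_id, t.credits
    FROM contract_view AS c
    LEFT OUTER JOIN total_2013 AS t ON t.customer_id = c.customer_id
"""

def naive(customer_id):
    """Filter only on the by-year column: rows with no credits disappear."""
    return conn.execute(
        JOINED + " WHERE t.customer_id = ? ORDER BY c.customer_id",
        (customer_id,),
    ).fetchall()

def null_tolerant(customer_id):
    """Same filter, but rows where the by-year side is NULL are kept."""
    return conn.execute(
        JOINED + " WHERE t.customer_id IS NULL OR t.customer_id = ?"
                 " ORDER BY c.customer_id",
        (customer_id,),
    ).fetchall()

print(naive(3))           # customer 3 has no credits row
print(null_tolerant(3))
print(null_tolerant(2))
```

Note that the null-tolerant filter keeps every customer without credits, not only the one you asked for, which matches the stated goal of not removing them from the report.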
There are a few answers here already that have partly answered my challenge in Access, but not fully.
I have 2 tables that form the basis of my database: customers and items
I have a further 2 tables; one for order quantities against customers and items (orders_a), and one for forecast quantities against customers and items (forecast_a).
forecast_a and orders_a also have a date for each customer and item combination (basically there will be 12 dates only for the 12 months of the year - 01/01/12,01/02/12,01/03/12 etc.)
Because a user will want to manually forecast quantities for a full year for each customer and each item, if there were 2 customers and 2 items, the forecast_a table would contain 48 rows. 2 items x 2 customers = 4, 4 x 12 dates = 48. The same goes for the orders_a.
I know this is a slightly unusual set up but the user requires visibility of a full year.
My main challenge based on this is as follows:
A user will want to see a form with customers in the first column, items in the second and then (like a crosstab): Jan Forecast Qty, Jan Order Qty, Feb Forecast Qty, Feb Order Qty etc.
Therefore how would I create a crosstab to pull both these tables together, and how would I go about creating a form for data entry off the back of it?
I may well be constructing my database the wrong way but the fact that the user needs a 'grid' where every entry is manual means I can't just have a form that creates a record one at a time for orders or forecasts.
Thanks in advance!
Nick
The problem you have is that this is, in essence, a spreadsheet task. Accordingly, it may be best handled in Excel. To achieve this, create an Excel object, create a blank worksheet, populate it with the data, then have a button to pull it back into the database when the user has finished.
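Whichever tool ends up displaying the grid, the reshaping step is the same: turn the one-row-per-month records from forecast_a and orders_a into one row per customer and item. Here is a minimal sketch of that pivot in plain Python; the function names, the (customer, item, month, qty) tuple layout and the sample data are assumptions, and writing the rows into a worksheet is left out.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def build_grid(forecast_rows, order_rows):
    """Pivot (customer, item, month, qty) rows into one entry per customer/item."""
    grid = defaultdict(
        lambda: {m: {"forecast": 0, "orders": 0} for m in range(1, 13)}
    )
    for customer, item, month, qty in forecast_rows:
        grid[(customer, item)][month]["forecast"] = qty
    for customer, item, month, qty in order_rows:
        grid[(customer, item)][month]["orders"] = qty
    return dict(grid)

def to_sheet_rows(grid):
    """Flatten the grid into a header row plus one row per customer/item."""
    header = ["Customer", "Item"]
    for name in MONTHS:
        header += [f"{name} Forecast Qty", f"{name} Order Qty"]
    rows = [header]
    for (customer, item), months in sorted(grid.items()):
        row = [customer, item]
        for m in range(1, 13):
            row += [months[m]["forecast"], months[m]["orders"]]
        rows.append(row)
    return rows

# Made-up sample data: month is 1..12.
forecast = [("Acme", "Widget", 1, 10), ("Acme", "Widget", 2, 12)]
orders = [("Acme", "Widget", 1, 7)]
sheet = to_sheet_rows(build_grid(forecast, orders))
print(sheet[0][:4])
print(sheet[1][:6])
```

Reading the edited grid back is the reverse walk: for each row and each month column pair, update or insert the matching forecast_a and orders_a record.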