Is a Hash table needed here? - hash

I am trying to decide if I should use a hash table of some sort. What I will have is a large amount of data in string format.
I will have many strings that will fall into categories, but have the same key value to access them.
An example would be if some strings fall into the category animal, I would use the string animal as a key but I would have many like this below as an example:
animal dog
animal cat
and so on.
And then maybe another called person
person tom
person joe
and so on
So I would want to search for animal or person and then list each value, so a search on person would return tom and joe.
Can you have multiple keys of the same value? It's been a long time since I've had to think of a hash.
Is a hash good for this? If so is Boost or STL better?
Thanks
I can add more detail if what I am asking makes no sense.

Yes, a hash map can hold many keys, but within a plain map each key is unique and maps to a single value. Keys are hashed internally to indexes; ideally every key would get a different index, but this is seldom achievable unless the set of keys is fixed in advance.
In your case, is there a reason you did not choose "animal" and "person" to be the keys?
That way you can have a list of animals (dog, cat, camel, bird...) identified by a single key, animal, and the same goes for person.

Try implementing a hash table in which each key maps to a list of the values that belong to it. Then, when you search for a key, just display the list associated with that key. Hope it helps!
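A minimal sketch of that key-to-list idea, written in Python for brevity (the names index and add are made up for illustration). In C++ the same shape would be a std::unordered_map<std::string, std::vector<std::string>>, or a std::unordered_multimap<std::string, std::string> if you would rather store one entry per pair.

```python
from collections import defaultdict

# Each category key maps to the list of strings filed under it.
index = defaultdict(list)

def add(category, value):
    """File value under category; the list is created on first use."""
    index[category].append(value)

# Input lines in the "category value" form from the question.
for line in ["animal dog", "animal cat", "person tom", "person joe"]:
    category, value = line.split(maxsplit=1)
    add(category, value)

print(index["person"])  # ['tom', 'joe']
```

Looking up a category is a single hash lookup, and listing its values is just iterating over the stored list.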

Related

Inheriting Parent Table with identifier (Postgres)

Sorry if this is a relatively easy problem to solve; I read the docs on inheritance and I'm still confused on how I would do this.
Let's say I have the parent table being car_model, which has the name of the car and some of its features as the columns (e.g. car_name, car_description, car_year, etc). Basically a list of cars.
I have the child table being car_user, which has the column user_id.
Basically, I want to link a car to the car_user, so when I call
SELECT car_name FROM car_user WHERE user_id = 'name', I could retrieve the car_name. I would need a linking component that links car_user to the car.
How would I do this?
I was thinking of doing something like having car_name column in car_user, so when I create a new data row in car_user, it could link the 2 together.
What's the best way to solve this problem?
Inheritance is something completely different. You should read about foreign keys and joins.
If each user drives only one car, but many users can drive the same car, you need a one-to-many relation. Add car_name to your user table and JOIN on that field.
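Here is a rough sketch of that foreign key plus JOIN. It assumes Python's built-in sqlite3 module as a stand-in for Postgres so it runs anywhere; the column list is trimmed, and the sample rows and the car_for helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE car_model (
        car_name        TEXT PRIMARY KEY,
        car_description TEXT,
        car_year        INTEGER
    );
    CREATE TABLE car_user (
        user_id  TEXT PRIMARY KEY,
        -- many users may point at the same car (one-to-many)
        car_name TEXT REFERENCES car_model(car_name)
    );
""")
conn.execute("INSERT INTO car_model VALUES ('civic', 'compact sedan', 2012)")
conn.execute("INSERT INTO car_user VALUES ('alice', 'civic')")
conn.execute("INSERT INTO car_user VALUES ('bob', 'civic')")

def car_for(user_id):
    """Return (car_name, car_year) for a user, or None if unknown."""
    return conn.execute(
        """SELECT m.car_name, m.car_year
           FROM car_user AS u
           JOIN car_model AS m ON m.car_name = u.car_name
           WHERE u.user_id = ?""",
        (user_id,),
    ).fetchone()
```

The JOIN is what lets you read any car_model column through a user row; the car_user table itself only stores the linking value.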

Self References

For an assessment task I'm doing, an entity album has the attribute also_bought, which is a self-referential attribute. However, this one attribute has multiple entries for any one album - as the also_bought recommendations are rarely only one recommendation - and thus, is a bit of a question mark when it comes to normalisation. I'm not sure whether it passes 1NF or not.
To be clear, the entire entity's set is
Album(album_id, title, playtime, genre, release_date, price, also_bought)
"Also bought" items should be stored in a separate table, something like:
AlsoBought (table)
album_id
also_bought_album_id
Then configure foreign keys from both columns to reference Album.album_id.
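A small sketch of that layout, assuming Python's built-in sqlite3 module just to have something runnable; the Album columns are cut down to two, and the sample titles and the also_bought helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE Album (
        album_id INTEGER PRIMARY KEY,
        title    TEXT NOT NULL
    );
    -- One row per recommendation; both columns reference Album.
    CREATE TABLE AlsoBought (
        album_id             INTEGER NOT NULL REFERENCES Album(album_id),
        also_bought_album_id INTEGER NOT NULL REFERENCES Album(album_id),
        PRIMARY KEY (album_id, also_bought_album_id)
    );
""")
conn.executemany("INSERT INTO Album VALUES (?, ?)",
                 [(1, "First"), (2, "Second"), (3, "Third")])
conn.executemany("INSERT INTO AlsoBought VALUES (?, ?)", [(1, 2), (1, 3)])

def also_bought(album_id):
    """Titles recommended alongside the given album."""
    rows = conn.execute(
        """SELECT a.title
           FROM AlsoBought AS ab
           JOIN Album AS a ON a.album_id = ab.also_bought_album_id
           WHERE ab.album_id = ?
           ORDER BY a.album_id""",
        (album_id,),
    ).fetchall()
    return [title for (title,) in rows]
```

Each cell now holds a single album_id, so the multi-valued also_bought attribute disappears from Album entirely.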
You mean that Album is a "self-referencing table" because it has a FK (foreign key) from one column list to another in the same table? (A FK constraint holds when subrow values for a column list must appear elsewhere.) If you mean that the type of also_bought is a list of album_ids, there is no FK from the former to the latter, because values for the former (lists of ids) are not values for the latter (ids). There's a constraint that is reminding you of a FK.
Anyway, normalization is done to one table, and doesn't depend on FKs.
But any time you are "normalizing to 1NF" eliminating "non-atomic columns" you have to start by deciding what your "table" "columns" contain. If you decide a cell for a column in a row contains "many values" then you don't have a relational table and you have to come up with one. The easiest way is to assume a set-valued column to get a relation and then follow the standard rules for elimination of too-complex column types.

Attribute creation from one single table

I have one table.
I have to make attributes only from the fields on that table.
I have to use these attributes on one report.
All the attributes I have made are keys. Is this fine? If not, how do I resolve this issue?
The keys are like primary and foreign keys in an RDBMS. They define the joins.
So long as you do not have other tables involved in the design, this is fine.
Ideally, attributes are made only for dimensions.
For example, you could make an attribute called Issue with forms (Issue id, Issue desc, Issue date), with Issue id as the ID form that drives the join with the other tables.
Not all attributes should be keys. Every key means the tool is interpreting that attribute as a primary key. Set a proper parent-child relationship between the attributes and you will see keys only for the child attribute(s).

Sort order in Core Data with a multi-multi relationship

Say I'm modeling a school, so I have 2 Entities: Student and Class. For whatever reason, I want each class roster to have a custom sort order. In a simple relationship, this would mean giving Student a sortOrder attribute and just sorting the list by this number. Issue is, a Student might be order 3 in one Class and order 6 in another. How would I store these orderings in Core Data in a way that I can easily access them and sort my lists properly?
Student                  Class
classes <<--------->> students
   ^                     ^
   |                     |
unordered             ordered
This diagram might help explain what I'm trying to do. The students "roster" I would want to be fetched in a specific order stored somewhere, which could be any ordering. Storing this ordering is what I'm not sure how to do in a way that's the most efficient. Creating a bunch of Order objects and trying to manage the links sounds like a lot of overhead, and it feels like there must be a better way.
If the ordering of students can be described by one or more NSSortDescriptors, you could create a fetched property on the Class entity that fetches the students and applies the sort descriptor. Alternatively, it may be easier (depending on your use case) to apply the sort descriptor(s) to the NSFetchedResultsController that you're using to deal with the class' students collection.
If you can't use an NSSortDescriptor, then you'll need an index attribute (or a name of your choice) on the Student entity if there's only one ordering, or a collection of Order entities describing the index in each ordering for each Student. You'll be responsible for maintaining these index values. Unfortunately, there's no easy way to do this in Core Data. It's just a lot of work.
Student <<->> StudentClass <<->> Class
StudentClass
----
studentID
order
classID
Then you can select as necessary.
For example, you have a student. Fetch all StudentClass where StudentID is student.studentID. You then have the order, as well as access to the Class.
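To show the shape of the lookup, here is a language-neutral sketch in plain Python. It is not Core Data API; the StudentClass record mirrors the join entity above, and the sample data plus the roster and classes_for helpers are made up.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StudentClass:
    """Join record: one student's position within one class roster."""
    student_id: str
    class_id: str
    order: int

links = [
    StudentClass("ann", "math", 3),
    StudentClass("bob", "math", 1),
    StudentClass("ann", "art", 6),
    StudentClass("cy", "math", 2),
]

def roster(class_id):
    """Student ids for a class, sorted by that class's own ordering."""
    rows = [link for link in links if link.class_id == class_id]
    return [link.student_id for link in sorted(rows, key=lambda link: link.order)]

def classes_for(student_id):
    """Map each of a student's classes to the student's order in it."""
    return {link.class_id: link.order
            for link in links if link.student_id == student_id}
```

Because the order lives on the join record, the same student can be third in one class and sixth in another without any conflict.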
You'll likely want to add some business logic to make your life easier. Also, if you're not already using it, take a peek at MOGenerator: https://github.com/rentzsch/mogenerator
EDIT: I'd really like to know why this is getting voted down. Comments would be much appreciated.

No-sql relations question

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what is the best way to get some things done with non relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database I'd make a middle table to link each car shop with the car types it sells via IDs. What is the NoSQL approach? If every car shop sells 50 car types, it means replicating a huge amount of data if I have to store all the features of every car type within each car shop that sells it!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map/reduce seems a little scary at first, but you don't need to understand all the complicated stuff, or even B-trees. All you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
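To get a feel for the composite-key idea, here is a toy simulation in plain Python. It is not CouchDB code: map_car stands in for a map function, the sample documents are invented, and Python's tuple ordering only approximates CouchDB's collation rules.

```python
cars = [
    {"shop": "north", "car_type": "suv", "name": "car-3"},
    {"shop": "east", "car_type": "sedan", "name": "car-1"},
    {"shop": "north", "car_type": "sedan", "name": "car-2"},
]

def map_car(doc):
    """Stand-in for a map function: emit one (key, value) row per car."""
    yield (doc["shop"], doc["car_type"]), doc["name"]

# The "view": every emitted row, kept sorted by its composite key.
view = sorted(row for doc in cars for row in map_car(doc))

def rows_for_shop(shop):
    """All cars for one shop, already ordered by car type."""
    return [value for (key_shop, _), value in view if key_shop == shop]
```

Since rows sort by shop first and car type second, everything for one shop sits in one contiguous, ordered run of the index.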
In MongoDB, a common approach is to store a list of _ids of car types in each car shop document. So there is no separate join table, but you are still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
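A rough sketch of that client-side join, using plain Python dicts in place of real collections (no MongoDB driver involved). The field name car_type_ids and all sample documents are assumptions made for illustration.

```python
# "Collections" keyed by _id; each car type is stored exactly once.
car_types = {
    1: {"_id": 1, "name": "hatchback", "features": ["abs", "airbag"]},
    2: {"_id": 2, "name": "pickup", "features": ["tow bar"]},
    3: {"_id": 3, "name": "coupe", "features": ["sunroof"]},
}
shops = {
    100: {"_id": 100, "name": "some_shop", "car_type_ids": [1, 3]},
    105: {"_id": 105, "name": "my_shop", "car_type_ids": [1, 2]},
}

def car_types_sold_by(shop_id):
    """Resolve a shop's list of ids into full car type documents."""
    shop = shops[shop_id]
    return [car_types[type_id] for type_id in shop["car_type_ids"]]
```

Each shop stores only the ids, so the 100 features of a car type are not copied into every shop that sells it.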
Coming from an HBase/BigTable point of view, typically you would completely denormalize your data and use a "list" field or a multidimensional map column (see this link for a better description).
The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience.
Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will.
For your example of a many-to-many relationship, you can still create two tables, and use your multidimensional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q: [Michael Dagaev] How would you design an Hbase table for many-to-many association between two entities, for example Student and Course?
I would define two tables:
Student: student id, student data (name, address, ...), courses (use course ids as column qualifiers here)
Course: course id, course data (name, syllabus, ...), students (use student ids as column qualifiers here)
Does it make sense?
A: [Jonathan Gray] Your design does make sense. As you said, you'd probably have two column-families in each of the Student and Course tables. One for the data, another with a column per student or course. For example, a student row might look like:
Student: id/row/key = 1001
data:name = Student Name
data:address = 123 ABC St
courses:2001 = (If you need more information about this association, for example, if they are on the waiting list)
courses:2002 = ...
This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).
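The "map of maps" view of that schema can be sketched with nested Python dicts. This is only a mental model, not HBase client code; the second student, the course names, and the helper functions are invented for illustration.

```python
# row key -> column family -> column qualifier -> value
student_table = {
    "1001": {
        "data": {"name": "Student Name", "address": "123 ABC St"},
        "courses": {"2001": "waiting list", "2002": ""},
    },
    "1002": {
        "data": {"name": "Another Student", "address": "456 DEF St"},
        "courses": {"2001": ""},
    },
}
course_table = {
    "2001": {"data": {"name": "Course A"},
             "students": {"1001": "waiting list", "1002": ""}},
    "2002": {"data": {"name": "Course B"},
             "students": {"1001": ""}},
}

def courses_for(student_id):
    """Course ids are the column qualifiers of the student's courses family."""
    return sorted(student_table[student_id]["courses"])

def students_for(course_id):
    """Student ids are the column qualifiers of the course's students family."""
    return sorted(course_table[course_id]["students"])
```

The relationship is stored twice, once per direction, which is the price of answering both queries with a single row read.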
In a relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count". The "car_id" column links the two tables together for data operations. All the columns must be defined when the database is created.
NoSQL database systems do not require you to predefine these columns and tables. You just construct your records in some format, say JSON, like:
{"car": {"id": 1, "type": "auto", "name": "ford"}, "shop": {"id": 100, "name": "some_shop"}}
{"car": {"id": 2, "type": "auto", "name": "benz"}, "shop": {"id": 105, "name": "my_shop"}}
.....
After your system has gone live, you may find flaws in your database design; say you want to add an "employee" field to "shop" for future records. Your new records would then look like:
{"car": {"id": 3, "type": "auto", "name": "RR"}, "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}}
NoSQL systems let you do this without touching existing records, whereas a relational database would need a schema change (for example, adding a column) first.