I'm trying to query multiple tables at once. Say I have a table named PRESCHOOLERS and another called FAVORITE_GOOEY_TREATS, with a foreign key column in the PRESCHOOLERS table referencing the id field of FAVORITE_GOOEY_TREATS. What would I do if I wanted to get a list of preschoolers with their first names alongside their favorite treats? I mean something like:
first_name | treat
john | fudge
sally | ice-cream
Here's what I'm trying, but I've got a syntax error on the where part.
SELECT PRESCHOOLERS.first_name, FAVORITE_GOOEY_TREATS.name as treat
FROM PRESCHOOLERS, FAVORITE_GOOEY_TREATS
WHERE PRESCHOOLERS.favorite_treat = FAVORITE_GOOEY_TREATS.id and PRESCHOOLERS.age>15;
As far as I know this kind of thing is fine by SQL standards, but sqlite3 doesn't much like it. Can someone point me at some examples of similar queries that work?
Try
SELECT PRESCHOOLERS.first_name, FAVORITE_GOOEY_TREATS.name as treat
FROM PRESCHOOLERS
JOIN FAVORITE_GOOEY_TREATS ON PRESCHOOLERS.favorite_treat = FAVORITE_GOOEY_TREATS.id
WHERE PRESCHOOLERS.age > 15;
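Since the question asks for a working example, here is a minimal sketch that runs the JOIN form against an in-memory SQLite database through Python's sqlite3 module. The column types, the sample rows (including the ages and the extra preschooler "tim") and the ORDER BY are assumptions made for illustration; they are not from the question.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema and sample rows; only the table and column names
# used in the query come from the question.
conn.executescript("""
    CREATE TABLE FAVORITE_GOOEY_TREATS (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE PRESCHOOLERS (
        id             INTEGER PRIMARY KEY,
        first_name     TEXT NOT NULL,
        age            INTEGER,
        favorite_treat INTEGER REFERENCES FAVORITE_GOOEY_TREATS(id)
    );
    INSERT INTO FAVORITE_GOOEY_TREATS (id, name) VALUES (1, 'fudge');
    INSERT INTO FAVORITE_GOOEY_TREATS (id, name) VALUES (2, 'ice-cream');
    INSERT INTO PRESCHOOLERS (first_name, age, favorite_treat) VALUES ('john', 16, 1);
    INSERT INTO PRESCHOOLERS (first_name, age, favorite_treat) VALUES ('sally', 17, 2);
    INSERT INTO PRESCHOOLERS (first_name, age, favorite_treat) VALUES ('tim', 4, 1);
""")

# Explicit JOIN ... ON, with the row filter kept in WHERE.
rows = conn.execute("""
    SELECT p.first_name, t.name AS treat
    FROM PRESCHOOLERS AS p
    JOIN FAVORITE_GOOEY_TREATS AS t ON p.favorite_treat = t.id
    WHERE p.age > 15
    ORDER BY p.first_name
""").fetchall()

for first_name, treat in rows:
    print(f"{first_name} | {treat}")
```

With this sample data only john and sally pass the age filter, so tim is left out of the result.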
Good morning! I am currently working on creating a PostgreSQL database with some client information, but I ran into an issue that I wasn't able to solve with my basic knowledge of SQL. Searching for a method also returned no results that I found useful or applicable.
I have two tables: 'mskMobile' and 'emailData'. Both of those tables contain a column named 'email' and some of those emails overlap. I figured out that I can view those intersecting emails by requesting
SELECT "mailData".email
FROM "mailData"
JOIN "mskMobile"
ON "mailData".email="mskMobile".email;
Now I want to write the data of two other columns of those common rows in 'mskMobile' named 'name' and 'surname' to the corresponding columns in 'emailData' (named identically), however I cannot find any answer on how to do so. Any suggestions on how to execute this action?
UPDATE "mailData" SET name = "mskMobile".name, surname = "mskMobile".surname
FROM "mskMobile"
WHERE "mailData".email = "mskMobile".email;
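For engines that lack UPDATE ... FROM, the same copy can be sketched with correlated subqueries. The example below is only an illustration: it runs on an in-memory SQLite database rather than PostgreSQL, the sample rows are made up, and it assumes each email appears at most once in "mskMobile". It copies name and surname from "mskMobile" into "mailData" for matching emails, which is the direction the question describes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data; table and column names follow the queries above.
conn.executescript("""
    CREATE TABLE "mskMobile" (email TEXT, name TEXT, surname TEXT);
    CREATE TABLE "mailData"  (email TEXT, name TEXT, surname TEXT);
    INSERT INTO "mskMobile" VALUES ('a@example.com', 'Ann', 'Lee');
    INSERT INTO "mailData"  VALUES ('a@example.com', NULL, NULL);
    INSERT INTO "mailData"  VALUES ('b@example.com', NULL, NULL);
""")

# Correlated subqueries fetch the values for each matching row;
# the EXISTS guard leaves rows without a match untouched.
conn.execute("""
    UPDATE "mailData"
    SET name = (SELECT m.name FROM "mskMobile" AS m
                WHERE m.email = "mailData".email),
        surname = (SELECT m.surname FROM "mskMobile" AS m
                   WHERE m.email = "mailData".email)
    WHERE EXISTS (SELECT 1 FROM "mskMobile" AS m
                  WHERE m.email = "mailData".email)
""")

updated = conn.execute(
    'SELECT email, name, surname FROM "mailData" ORDER BY email'
).fetchall()
print(updated)
```

Only the row whose email exists in both tables is changed; the other row keeps its NULL values.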
After a bit more research I came up with the following way of writing it:
SELECT "mailData".email, "mskMobile".num, "mskMobile".name
FROM "mailData"
INNER JOIN "mskMobile"
ON "mailData".email="mskMobile".email;
This allowed me to build a new table with the data combined.
I am trying to figure out the best way to locate duplicates in 6-column CSV data. The real data has more than a million rows in it.
The 6 columns contain the following:
Name, address, city, post-code, phone number, machine number
The data does not have a fixed length, and values in certain columns might be missing in some rows.
I am thinking of using perl to first normalize all the short forms used in names, city and address. Fellow perl enthusiasts from stackoverflow have helped me a lot.
But there would still be a lot of data which would be difficult to match.
So I am wondering whether it is possible to match content based on "LIKELINESS / SIMILARITY" (e.g. google being similar to gugl). The similarity matching would be needed to overcome errors that crept in while collecting the data.
I have 2 tasks in hand w.r.t. the data.
Flag duplicate rows with certain identifier
Mention the percentage match between similar rows.
I would really appreciate suggestions as to which methods could be employed and which would probably be best on their merits.
You could write a Perl program to do this, but it will be easier and faster to put it into a SQL database and use that.
Most SQL databases have a way to import CSV. For this answer, I suggest PostgreSQL because it has very powerful string functions which you will need to find your fuzzy duplicates. Create your table with an auto incremented ID column if your CSV data doesn't already have unique IDs.
Once the import is done, add indexes on the columns you want to check for duplicates.
CREATE INDEX name ON whatever (name);
You can do a self-join to look for duplicates in whatever way you like. Here's an example that finds duplicate names.
SELECT t1.id, t2.id
FROM whatever t1
JOIN whatever t2 ON t1.id < t2.id
WHERE t1.name = t2.name;
PostgreSQL has powerful string functions including regexes to do the comparisons.
Indexes will have a hard time working on things like lower(t1.name). Depending on the sorts of duplicates you want to work with, you can add indexes for these transforms (this is a feature of PostgreSQL). For example, if you wanted to search case insensitively you can add an index on the lower-case name. (Thanks #asjo for pointing that out)
CREATE INDEX ON whatever ((lower(name)));
-- This will be much faster
SELECT t1.id, t2.id
FROM whatever t1
JOIN whatever t2 ON t1.id < t2.id
WHERE lower(t1.name) = lower(t2.name);
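To see the self-join in action without setting up a server, here is a small sketch using Python's sqlite3 module. It is an illustration only: SQLite stands in for PostgreSQL, the index name whatever_lower_name and the sample names are made up, and expression indexes need a reasonably recent SQLite (3.9 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical sample data: ids 1, 2 and 4 differ only by letter case.
conn.executescript("""
    CREATE TABLE whatever (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO whatever (id, name) VALUES (1, 'Acme Ltd');
    INSERT INTO whatever (id, name) VALUES (2, 'ACME LTD');
    INSERT INTO whatever (id, name) VALUES (3, 'Bolt Inc');
    INSERT INTO whatever (id, name) VALUES (4, 'acme ltd');
    -- Index on the transformed value, mirroring the PostgreSQL example.
    CREATE INDEX whatever_lower_name ON whatever (lower(name));
""")

# t1.id < t2.id reports each duplicate pair once and never pairs a row with itself.
pairs = conn.execute("""
    SELECT t1.id, t2.id
    FROM whatever AS t1
    JOIN whatever AS t2 ON t1.id < t2.id
    WHERE lower(t1.name) = lower(t2.name)
    ORDER BY t1.id, t2.id
""").fetchall()
print(pairs)
```

Selecting both t1.id and t2.id makes it clear which rows form each pair, which helps when flagging duplicates with an identifier.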
A "likeness" match can be achieved in several ways. A simple one is to use the fuzzystrmatch functions such as metaphone(). Same trick as before: add a column with the transformed value and index it.
Other simple things like data normalization are better done on the data itself before adding indexes and looking for duplicates. For example, trim out and squish extra whitespace.
UPDATE whatever SET name = trim(both from name);
-- The 'g' flag replaces every run of whitespace, not just the first one.
UPDATE whatever SET name = regexp_replace(name, '[[:space:]]+', ' ', 'g');
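If you would rather clean the values before importing them, the same trim-and-squish step can be sketched in Python. The function name normalize_whitespace is made up for this example.

```python
import re


def normalize_whitespace(value: str) -> str:
    """Collapse every run of whitespace to one space and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


print(normalize_whitespace("  John \t  Smith\n"))
```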
Finally, you can use the Postgres Trigram module to add fuzzy indexing to your table (thanks again to #asjo).
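To get a feel for what a trigram-based "percentage match" looks like, here is a rough Python sketch of the general idea: split each string into three-character pieces and compare the two sets. This is a simplified approximation written for illustration, not the trigram module's actual implementation, and the padding scheme and function names are assumptions.

```python
def trigrams(text: str) -> set:
    """Return the set of 3-character substrings of the padded, lower-cased text."""
    # Assumed padding: two spaces in front and one behind, so that
    # the start and end of the string produce trigrams of their own.
    padded = "  " + text.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


score = similarity("google", "gugl")
print(f"{score:.0%} match")
```

Scores near 1.0 mean the strings share most of their trigrams. Short strings with several changed letters, like the google/gugl pair, score low under this measure, so a phonetic function may suit that kind of error better.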
I'm having trouble with the 'Ambiguous column name' issue in Transact-SQL, using the Microsoft SQL 2012 Server Management Studio.
I've been looking through some of the answers already posted on Stack Overflow, but they don't seem to work for me, and parts of them I simply don't understand or lose track of.
Executing the following script :
USE CDD
SELECT Artist, Album_title, track_title, track_number, Release_Year, EAN_code
FROM Artists AS a INNER JOIN CD_Albumtitles AS c
ON a.artist_id = c.artist_id
INNER JOIN Track_lists AS t
ON c.title_id = t.title_id
WHERE track_title = 'bohemian rhapsody'
triggers the following error message :
Msg 209, Level 16, State 1, Line 3
Ambiguous column name 'EAN_code'.
Note that this is a CD database with artist names, album titles and track lists. Both the 'CD_Albumtitles' and 'Track_lists' tables have a column with identical EAN codes. The EAN code is an important international code used to uniquely identify CD albums, which is why I would like to keep using it.
You need to put the alias in front of all the columns in your select list and your where clause. You're getting that error because one of the columns you have currently is coming from multiple tables in your join. If you alias the columns, it will essentially pick one or the other of the tables.
SELECT a.Artist,c.Album_title,t.track_title,t.track_number,c.Release_Year,t.EAN_code
FROM Artists AS a INNER JOIN CD_Albumtitles AS c
ON a.artist_id = c.artist_id
INNER JOIN Track_lists AS t
ON c.title_id = t.title_id
WHERE t.track_title = 'bohemian rhapsody'
So choose one of the source tables and prefix the field with its alias (or table name):
SELECT Artist,Album_title,track_title,track_number,Release_Year,
c.EAN_code -- or t.EAN_code, which should retrieve the same value
By the way, try to prefix all the fields (in the select, the join, the group by, etc.); it makes maintenance easier.
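The effect is easy to reproduce outside SQL Server. The sketch below uses Python's sqlite3 module, so the error text differs from Msg 209, but the cause is the same. The table layout is trimmed down to two tables and the EAN value is a placeholder; both are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down, hypothetical versions of two of the tables from the question.
conn.executescript("""
    CREATE TABLE CD_Albumtitles (
        title_id INTEGER PRIMARY KEY, Album_title TEXT, EAN_code TEXT
    );
    CREATE TABLE Track_lists (
        track_id INTEGER PRIMARY KEY, title_id INTEGER,
        track_title TEXT, EAN_code TEXT
    );
    INSERT INTO CD_Albumtitles VALUES (1, 'A Night at the Opera', '0000000000000');
    INSERT INTO Track_lists VALUES (1, 1, 'bohemian rhapsody', '0000000000000');
""")

# Unqualified EAN_code exists in both joined tables, so the engine refuses to guess.
try:
    conn.execute("""
        SELECT Album_title, EAN_code
        FROM CD_Albumtitles AS c
        JOIN Track_lists AS t ON c.title_id = t.title_id
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print("unqualified:", error)

# Prefixing the column with an alias removes the ambiguity.
rows = conn.execute("""
    SELECT c.Album_title, t.track_title, c.EAN_code
    FROM CD_Albumtitles AS c
    JOIN Track_lists AS t ON c.title_id = t.title_id
    WHERE t.track_title = 'bohemian rhapsody'
""").fetchall()
print("qualified:", rows)
```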
I have a table in Cassandra containing name, item.
Using the following data types: name is text, item is set<text>.
f.e. I have these entries:
name | item
a | {item1, item3}
b | {item2, item3}
c | {item1, item2}
Now my question: Is there any way to get all names having item1?
I tried this, but didn't work:
SELECT name
FROM table
WHERE item = 'item1';
I get an error that 'item1' is a string, but item is a set<text>.
I guess there is a way to do this, but I can't think of how.
Thanks in advance.
Unfortunately, this is not yet supported in Cassandra. Maybe in some upcoming version we will be able to index collection items as well.
I hope you can help find an answer to a problem that will become a recurring theme at work. This involves denormalising data from RDBMS tables to flat file formats with repeating groups (sharing domain and meaning) across columns. Unfortunately this is unavoidable.
Here's a very simplified example of the transformation I'd require:
TABLE A                                TABLE B
---------------------- 1 -> MANY ---------------------------------
A_KEY     FIELD_A                      B_KEY     A_KEY     FIELD_B
A_KEY_01  A_VALUE_01                   B_KEY_01  A_KEY_01  B_VALUE_01
A_KEY_02  A_VALUE_02                   B_KEY_02  A_KEY_01  B_VALUE_02
                                       B_KEY_03  A_KEY_02  B_VALUE_03
This will become:
A_KEY     FIELD_A     B_KEY1    FIELD_B1    B_KEY2    FIELD_B2
A_KEY_01  A_VALUE_01  B_KEY_01  B_VALUE_01  B_KEY_02  B_VALUE_02
A_KEY_02  A_VALUE_02  B_KEY_03  B_VALUE_03
Each entry from TABLE A will have one row in the output flat file with one column per related field from TABLE B. Columns in the output file can have empty values for fields obtained from TABLE B.
I realise this will create an extremely wide file, but this is a requirement. I've had a look at MapForce and Apatar, but I think this problem is too bizarre or I can't use them correctly.
My question: is there already a tool that will accomplish this or should I develop one from scratch (I don't want to reinvent the wheel)?
I'm pretty sure you can't solve this in plain SQL, but depending on your RDBMS, it may be possible to create a stored procedure or some such thing. Otherwise it's a fairly easy thing to do in a scripting language. Which technology are you using?
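As an illustration of the scripting route, here is a minimal Python sketch that reads the joined rows in key order and folds each group of TABLE B rows into one wide row. It uses an in-memory SQLite database loaded with the sample data from the question; the real RDBMS, the connection details and the padding with empty strings are assumptions.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")

# Sample data copied from the question's example tables.
conn.executescript("""
    CREATE TABLE A (A_KEY TEXT PRIMARY KEY, FIELD_A TEXT);
    CREATE TABLE B (B_KEY TEXT PRIMARY KEY, A_KEY TEXT, FIELD_B TEXT);
    INSERT INTO A VALUES ('A_KEY_01', 'A_VALUE_01');
    INSERT INTO A VALUES ('A_KEY_02', 'A_VALUE_02');
    INSERT INTO B VALUES ('B_KEY_01', 'A_KEY_01', 'B_VALUE_01');
    INSERT INTO B VALUES ('B_KEY_02', 'A_KEY_01', 'B_VALUE_02');
    INSERT INTO B VALUES ('B_KEY_03', 'A_KEY_02', 'B_VALUE_03');
""")

# One row per (A, B) pair, sorted so that groupby sees each A_KEY as one run.
joined = conn.execute("""
    SELECT A.A_KEY, A.FIELD_A, B.B_KEY, B.FIELD_B
    FROM A LEFT JOIN B ON A.A_KEY = B.A_KEY
    ORDER BY A.A_KEY, B.B_KEY
""").fetchall()

flat_rows = []
for (a_key, field_a), group in groupby(joined, key=lambda r: (r[0], r[1])):
    row = [a_key, field_a]
    for _, _, b_key, field_b in group:
        if b_key is not None:  # LEFT JOIN yields NULLs when A has no B rows
            row.extend([b_key, field_b])
    flat_rows.append(row)

# Pad shorter rows with empty values so every row has the same width.
width = max(len(row) for row in flat_rows)
flat_rows = [row + [""] * (width - len(row)) for row in flat_rows]

for row in flat_rows:
    print(",".join(row))
```

Because the grouping happens in the script, this handles any number of TABLE B rows per key without changing the query.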
Does this help?
using-pivot-in-sql-server-2008
Thanks for all your help. As it turns out the relationship is ONE -> MAX of 3 and this constraint will not change as the data is now static so the following run-of-the-mill SQL works:
select A.A_KEY, A.FIELD_A, B.B_KEY, B.FIELD_B, B2.B_KEY, B2.FIELD_B, B3.B_KEY,
B3.FIELD_B
from
A left join B on (A.A_KEY = B.A_KEY)
left join B B2 on (A.A_KEY = B2.A_KEY and B2.B_KEY != B.B_KEY)
left join B B3 on (A.A_KEY = B3.A_KEY and B3.B_KEY != B.B_KEY
and B3.B_KEY != B2.B_KEY)
group by A.A_KEY
order by A.A_KEY