Automating a data feed into a PostgreSQL table when the number of columns could change and there are duplicate names


My company uses a third-party vendor to get all of our NPS information. I'm trying to set up a data feed from this vendor into our data warehouse, which runs PostgreSQL.
The feed is in the form of 2 tab-separated text files: a "question mapping" file and the responses. The question map has one row per question, with columns for question id, question text, question label, question type, etc., which is straightforward. The responses have one row per survey response, with a column for each question plus fields such as user id. Here are the 2 biggest problems:
The survey questions sometimes use the same question ID for different questions, resulting in multiple columns in the response data having the same name but not being the same question.
The number of questions could change, resulting in a different number of columns in the data.
Both of these things make it a real headache to automate a data feed into a single table.
I'm afraid I don't quite know how to phrase my real question other than, "Does anyone have any ideas how I can accomplish this?" If I think of something better than that, I'll come and update this, so for now:
Does anyone have any ideas at all about how I can efficiently set up my automated data feed without having to always drop and recreate everything?

If your data is messy and doesn't have well-defined columns, you can use the entity-attribute-value (EAV) pattern: turn each record into a set of rows with 4 columns. These are a unique row id; an entity id, which is the same for every row extracted from one record; an attribute column, holding what would have been the column name; and a value column, holding the value found under that name. It's not as neat as a wide table, but you can still query it, and you won't have to drop it when you receive a record with a new column.
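As a rough sketch of that pattern in Python: the function below unpivots a tab-separated response file into one row per (response, column) pair. The `user_id` column name and the sample data are assumptions made up for illustration, not the vendor's real layout. The column position is stored next to the column name so that two columns sharing a question ID stay distinguishable, and `csv.reader` is used instead of `csv.DictReader` because a dictionary would silently collapse duplicate header names.

```python
import csv
import io

def to_eav(tsv_text, entity_column="user_id"):
    """Turn wide survey rows into (entity_id, position, attribute, value) tuples.

    The column position is kept alongside the column name so that two
    columns sharing a question ID remain distinguishable.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    entity_idx = header.index(entity_column)
    rows = []
    for record in reader:
        entity_id = record[entity_idx]
        for position, (name, value) in enumerate(zip(header, record)):
            if position == entity_idx:
                continue  # the entity id is not itself an attribute
            rows.append((entity_id, position, name, value))
    return rows

# Hypothetical sample feed: note the duplicated "q1" column.
sample = "user_id\tq1\tq1\tq2\n42\t9\tyes\tgreat\n43\t7\tno\tok\n"
eav_rows = to_eav(sample)
```

The resulting tuples fit a table with a fixed set of columns (plus a serial row id), so a feed with more or fewer questions only changes the number of rows loaded, not the table definition.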

Related

what is the correct terminology for data that has no meaning?

A database organises/structures bits of data into tables, records and fields. The name of each table and field gives semantic meaning to the data they contain. So, for example, a datatable called 'individuals' may well have a field called 'firstName', and as a result we know that the string "john" found in that field on one record in that table is the first name of an individual. As a result, the string "john" has a meaning and as a result could be referred to as a piece of data.
That datatable could be represented on an excel or other spreadsheet with the records as rows and the fields given as column headers at the top of the sheet, so that we know that all the bits of info under the header "firstName" are first names.
Now, imagine that same sheet, but without the field names in the first row. We have a structured list of values, organised into rows and columns. We can see that in one column there are a series of values "john", "jim", "sue", etc. We might guess that they are first names but we don't KNOW that they are. My question is: what is the correct terminology for this kind of data which has no ostensible meaning.
In other fields of human endeavour one might be faced with an undifferentiated mass of meaningless or unimportant static or "noise", and with skill identify some meaningful intelligence or "signal". I am looking for the equivalent data management term for that undifferentiated mass containing both signal and noise.
Essentially, I am looking for a synonym for data, but a name that implies "non-information" in that the items are essentially collection of strings, numbers, whatever that have no obvious meaning (...yet!). Words such as values, info, content all seem to convey too great a sense of importance or meaning, whereas dross, rubbish, noise are also too pejorative. Any suggestions?

FileMaker - Getting Data From Another Table with Multiple Field Restrictions

I can't think of a better title, so feel free to make a suggestion once you understand the issue.
I was given a table to work with that I need to call from another table:
Name
Month
Type
Value
For each record in the main table I need to pull one "Value" that corresponds to it. What it is will be determined by all three of the other fields. So for example, if a record in the main table is:
Name: Google
Date: 3\17\2016
Type: M
Then I need to pull the value for the record in the other table where the Name is "Google", the month is "3", and the type is "M".
I was able to do this successfully (if slowly) using an ExecuteSQL command in a calculation field, with a ton of nested If statements for the names (I have yet to figure out how to input the record's data directly into the ExecuteSQL statement, it breaks when I try). I would prefer to just grab the data directly. I can't switch over to the other layout because I need to see all of the records at once. I can't do a simple relationship because there isn't a real relationship, it's like there are three foreign keys working in tandem and I only know how to use one to call the data.
Any idea on how to do this more simply?
Some ideas I've had but not sure if it will work:
Using a calculation field as a related field to dynamically point to the row by code (concatenate the three relevant fields into a type of code). Not sure if you can connect two tables by a calculation field.
Doing that same thing when calling the data into the table in the first place, adding a code to create a single primary key.
Here are my relationships:

"I can't do a simple relationship because there isn't a real relationship, it's like there are three foreign keys working in tandem and I only know how to use one to call the data."
Simply define a relationship with three predicates - i.e. three pairs of match fields.
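To illustrate the logic of a three-predicate match outside FileMaker, here is a minimal Python sketch in which the lookup is keyed on all three fields at once. The sample rows and values are invented, and it assumes the month is derived from the main record's date, so on the FileMaker side the main table would likewise need a field holding the month of its date to pair with Month.

```python
from datetime import date

# Hypothetical rows from the lookup table: (name, month, type, value).
lookup_rows = [
    ("Google", 3, "M", 120),
    ("Google", 3, "Q", 300),
    ("Google", 4, "M", 135),
]
# Index the rows by the composite key (name, month, type).
value_by_key = {(name, month, kind): value for name, month, kind, value in lookup_rows}

def related_value(name, record_date, kind):
    """Return the value matching all three predicates, or None if there is no match."""
    return value_by_key.get((name, record_date.month, kind))

result = related_value("Google", date(2016, 3, 17), "M")
```

Each record finds its single related value only when name, month and type all agree, which is what the three pairs of match fields express in the relationship.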

Creating long forms in FileMaker Pro

I am creating long forms in FileMaker Pro with many unique questions in each form.
Each unique question is comprised of: a radio button, two fields of support data, 4 container fields, and a field for comments. There is also a map feature that collects the device location when using an iPad.
Because each question is unique, I have been creating up to 8 fields for each question. The forms I am creating contain up to 40 questions.
Example fields:
Question1
Question1_Comments
Question1_Value1
Question1_Value2
Question1_Image[1], Question1_Image[2], Question1_Image[3], Question1_Image[4]
Is there a simpler way of approaching this?

Yes. I can offer some general suggestions, but it sounds like you need to normalize your data. Whenever you start creating fields of the form Field1, Field2, etc., that's a hint that you should probably create a separate table. In your case it sounds like you need at least three tables:
Forms
Questions
Files
This is going from the information you've provided that each form has many questions and each question has many files (container fields). Assuming that your form table already has a primary key field (a field that is unique for every record, often an auto-enter serial number), the Questions table would have the following fields:
id (primary key)
form_id
question
comments
value1
value2
Then the Files table would have three fields:
id
question_id
file
Then you'd create a relationship from Forms to Questions with Forms::id=Questions::form_id and from Questions to Files with Questions::id=Files::question_id. If both of the value fields will always have data, I'd leave them in the Questions table, but if one of them could be blank, or if you think you may someday want more than two, I'd break that into its own table as well.
Check the FileMaker documentation for more information on creating relationships.
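The same three-table layout can be sketched outside FileMaker to see how the relationships behave. The example below uses Python's built-in sqlite3 module purely as a stand-in; the `name` column on forms and all sample rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forms (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY,
        form_id INTEGER NOT NULL REFERENCES forms(id),
        question TEXT, comments TEXT, value1 TEXT, value2 TEXT
    );
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        question_id INTEGER NOT NULL REFERENCES questions(id),
        file BLOB
    );
""")
conn.execute("INSERT INTO forms (id, name) VALUES (1, 'Site inspection')")
conn.execute("INSERT INTO questions (id, form_id, question) VALUES (10, 1, 'Is the exit clear?')")
conn.executemany(
    "INSERT INTO files (question_id, file) VALUES (?, ?)",
    [(10, b"image-1"), (10, b"image-2")],
)
# Count the files attached to each question of form 1.
files_per_question = conn.execute("""
    SELECT q.id, COUNT(f.id)
    FROM questions q LEFT JOIN files f ON f.question_id = q.id
    WHERE q.form_id = 1
    GROUP BY q.id
""").fetchall()
```

Adding a fifth image to a question, or a forty-first question to a form, then means adding a record rather than adding fields.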

FullText Index - Searching values from another table

Is it possible, in SQL Server 2008, using the full text index syntax, to run a query such as this one?
SELECT *
FROM TABLE_TO_SEARCH S,
TABLE_WITH_STRINGS_TO_SEARCH SS
WHERE
CONTAINS(S.WHOLE_NAME,SS.FIRST_NAME)
OR CONTAINS(S.WHOLE_NAME,SS.LAST_NAME)
I need to search for the FIRST_NAME in table TABLE_TO_SEARCH, column WHOLE_NAME, which has a full-text index on it. It doesn't seem to be a valid query though... Is there any workaround for it that still uses the full-text index?
LATER EDIT:
Here is the business case: each night I download information about "blacklisted" individuals from several websites and insert it into a table in this format: WholeName, LastName, FirstName, MiddleName. But the data is chaotic: WholeName does not necessarily contain the last, first or middle name, or WholeName is null while the other 3 fields have values, or all 4 of these fields are null, and so on. Also, the data may repeat itself, as one blacklisted individual may come from 2+ of these websites. What I need to do is compare this data, as chaotic as it is, against our customer data, based on our customers' first and last names, and give each match a score (rank) against the files we download from these websites.
First I tried with charindex or like operators, but I couldn't create a scoring algorithm based on them, and it also took 6 hours to compare just our customers' first and last names with only the WholeName column from the TABLE_TO_SEARCH table. I thought that implementing the full-text index would perhaps make it easier and faster but ... apparently I was wrong.
Has anyone dealt with a task like this? And if so, what was the best approach?

After skimming http://technet.microsoft.com/en-us/library/ms187787.aspx and http://technet.microsoft.com/en-us/library/ms142571.aspx I don't think it is possible to do your search in this way. Not only that, but it seems this type of index wouldn't work well with names anyway.
If you only care about checking one name at a time, then all you have to do is assign those values to variables and search on the variables. This method would allow you to use the full-text index.
Otherwise, I would suggest splitting the WHOLE_NAME column (if there is a space or unique character between the first and last name) and comparing each part to those other columns. If you are working with a huge data set, you may want to experiment with doing this at a temp table level and creating an index.
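A minimal sketch of that split-and-compare idea, written in Python rather than T-SQL just to show the logic. The scoring rule (the fraction of the customer's name tokens found in the whole name) is an assumption chosen for illustration, not an established matching algorithm, and the tokenizer only handles unaccented Latin letters.

```python
import re

def name_tokens(text):
    """Lower-case a name and split it into tokens of a-z letters; None yields an empty set."""
    return set(re.findall(r"[a-z]+", (text or "").lower()))

def match_score(whole_name, first_name, last_name):
    """Fraction of the customer's name tokens found in whole_name (0.0 to 1.0)."""
    wanted = name_tokens(first_name) | name_tokens(last_name)
    if not wanted:
        return 0.0
    return len(wanted & name_tokens(whole_name)) / len(wanted)

full = match_score("SMITH, John A.", "John", "Smith")
half = match_score("Johnathan Smith", "John", "Smith")
missing = match_score(None, "John", "Smith")
```

Here `full` is 1.0, `half` is 0.5 and `missing` is 0.0. Because whole tokens are compared, "John" does not match "Johnathan", and a null WholeName simply scores zero instead of breaking the comparison.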
Good luck!

Storing multiple test answers in Access

I'm fairly new to Access and have gotten stuck at a point despite hours of on-line research. In short, I'm trying to write a database that will store the answers that people give on several different tests. Some people take 1 test, some take 2, 3, etc. I need to store for each student what test(s) they took and what their answers were for each question. I feel like my current approach (make a separate field for each question on my MainRecord table along with a yes/no field for each test that can be taken) is cumbersome and leading to my ultimate problem: when I populate a continuous form with all of the test questions and an adjacent combo box to input their answer, I can't transcribe the combo box value back into my MainRecord. The data for the continuous form comes from a separate table (Test1) which has a field for question number and a lookup field that allows me to select the person's answer (i.e. A,B,C,D,E).
Is there a better way to construct my tables? If not, how can I get the combo box values on a continuous form into different fields on a table? Thanks, sorry if I sound like a moron.

You're going to need to look into a more generalized structure.
Here's a really basic structure that should work.
I can't help too much with all the continuous form stuff.
Test
    test_id
TestQuestion
    test_id
    question_id
    question_order (used for sorting)
    question_text
QuestionPossibleAnswers
    question_id
    possible_answer_value
    possible_answer_prompt
Student
    student_id
    student_name
    // etc...
StudentTest
    test_id
    student_id
    date_taken
    // whatever
    (assuming a student can only take a test once)
StudentAnswers
    student_id
    question_id
    student_answer (would be the possible_answer_value from "QuestionPossibleAnswers")
Anyway, when a student takes a test, your top form would be 'bound' to the 'StudentTest' table, I guess. The continuous form would probably be based on StudentAnswers. The student_answer drop-down would need to be bound to the possible answers for the current StudentAnswers.question_id (through the query builder).
It's been years since I've done Access so I can't give step by step, I apologize, but the structure above is pretty sound (if overly simple).
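To show how the continuous form's rows could be produced from this structure, here is a small sketch using Python's built-in sqlite3 module as a stand-in for Access. Only two of the tables are created, the snake_case table names and the sample data are made up for illustration, and the composite primary key reflects the assumption that a student answers each question once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_question (
        question_id INTEGER PRIMARY KEY,
        test_id INTEGER NOT NULL,
        question_order INTEGER,
        question_text TEXT
    );
    CREATE TABLE student_answers (
        student_id INTEGER NOT NULL,
        question_id INTEGER NOT NULL REFERENCES test_question(question_id),
        student_answer TEXT,
        PRIMARY KEY (student_id, question_id)
    );
    INSERT INTO test_question VALUES (1, 100, 1, 'First question');
    INSERT INTO test_question VALUES (2, 100, 2, 'Second question');
    INSERT INTO student_answers VALUES (7, 1, 'B');
""")
# One row per question on test 100 for student 7; unanswered questions show None.
form_rows = conn.execute("""
    SELECT q.question_order, q.question_text, a.student_answer
    FROM test_question q
    LEFT JOIN student_answers a
           ON a.question_id = q.question_id AND a.student_id = ?
    WHERE q.test_id = ?
    ORDER BY q.question_order
""", (7, 100)).fetchall()
```

Each answer is its own row, so saving a combo box value means writing one student_answers record instead of finding the right field on a wide MainRecord table.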