What is the correct terminology for data that has no meaning? - data-management

A database organises bits of data into tables, records and fields. The name of each table and field gives semantic meaning to the data it contains. So, for example, a table called 'individuals' may well have a field called 'firstName', and as a result we know that the string "john" found in that field on one record in that table is the first name of an individual. The string "john" therefore has a meaning and could be referred to as a piece of data.
That table could be represented in Excel or another spreadsheet, with the records as rows and the field names as column headers at the top of the sheet, so that we know that all the bits of info under the header "firstName" are first names.
Now, imagine that same sheet, but without the field names in the first row. We have a structured list of values, organised into rows and columns. We can see that one column contains a series of values "john", "jim", "sue", etc. We might guess that they are first names, but we don't KNOW that they are. My question is: what is the correct terminology for this kind of data, which has no ostensible meaning?
In other fields of human endeavour one might be faced with an undifferentiated mass of meaningless or unimportant static or "noise", and with skill identify some meaningful intelligence or "signal". I am looking for the equivalent data management term for that undifferentiated mass containing both signal and noise.
Essentially, I am looking for a synonym for data, but a name that implies "non-information": the items are essentially a collection of strings, numbers, whatever, that have no obvious meaning (...yet!). Words such as values, info and content all seem to convey too great a sense of importance or meaning, whereas dross, rubbish and noise are too pejorative. Any suggestions?

Related

ABAP Domain and Data Types Understanding

My company wants me to learn ABAP for SAP, and I have started down that road. My background is mainly VB.NET and SQL Server with T-SQL, but I also have experience in C#.
With ABAP, though, I need some clarification or confirmation of my understanding of data types and domains, if anyone can help.
My current understanding is that we have a table, the table has fields, and the fields have data types and lengths where needed. Example: we have a table Customer, and I could have a CustomerNumber field with the data type char(10). To me this means that in the table Customer we have a field called CustomerNumber that holds 10 characters.
With ABAP, however, we have domains, then data elements, then the field. Does this mean we can name a field whatever we want? Since the field name alone could mean anything, we assign a data element, which carries the description of the sort of data stored in the field. To store the format and data type, however, we assign a domain to the data element.
For example, I call a field ZCUSNO; currently this means nothing. But if I assign the data element ZCTNMR (with the description 'customer number'), this tells us that the field ZCUSNO is a ZCTNMR, so ZCUSNO is a customer number field.
Now, each data element has a domain, and for our example the ZCTNMR data element (the customer number) could be assigned ZCTDOM as its domain, which would hold what I recognise as the data type: char 20, char 100, integer, etc.
Is my understanding correct? And could someone give me a clear indication of the difference between a domain and a data element, compared with what I know as data types in SQL Server?
Thanks
I don't know if it's 100% correct, but this is the way I use them, much as you describe.
You can reuse a domain. If you don't plan to reuse it, you can point the data element directly at a built-in type.
A data element defines the semantics of the field: label, translation, etc.
A domain defines the technical info of the field: type, conversions, predefined values, etc.
E.g.:
Domain:
DOM_VALUE - defined as 10 positions with 2 decimals
Data elements:
UNIT_VAL - refers to DOM_VALUE, with the label "Unit Value"
TOTAL_VAL - refers to DOM_VALUE, with the label "Total Value"
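To make that concrete, here is a minimal sketch of how those two data elements would be used in ABAP code. UNIT_VAL, TOTAL_VAL and DOM_VALUE are the invented names from the example above, not real repository objects:

" Both variables share the domain's technical type (10 positions, 2 decimals),
" but each carries its own label and business meaning via its data element.
DATA lv_unit_value  TYPE unit_val.   " data element UNIT_VAL  -> domain DOM_VALUE
DATA lv_total_value TYPE total_val.  " data element TOTAL_VAL -> domain DOM_VALUE

lv_unit_value  = '12.50'.
lv_total_value = lv_unit_value * 3.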
Your understanding is pretty much correct; not much can be added here.
The main thing to grasp is this:
Domains store technical data (decimal places, length, type, predefined values and so on).
Data elements store semantic data (labels, texts, search help binding, etc.).
Not every table field has a data element (a field can have a built-in type), but every field has a type, either primitive or wrapped in a data element.
If you want to use your field in screens (dynpros), ALV grids or other reports, create data elements that carry the business meaning of the field.
If you use the field just for calculations or other internal utility tasks, don't bother.
As a rule, a table data field (or variable type) uses a data element, which in turn uses a domain.
When you create table fields with predefined types instead of data elements, you will have problems later, when you need to display the data in an ALV grid.
Actually, you will run into problems even before that (when you try to build a maintenance view, the header will show something like a "+" symbol instead of a label).
And of course, we usually try to create one domain for two or more data elements.
The domain is where the main technical logic lives.
In the data element I always handle the field label settings (how the field will be displayed, and a few other things).
Finally: I think it is good practice to create a domain for every data element; it may help you in the future.
I hope that it helps you. Good luck!

FileMaker - Getting Data From Another Table with Multiple Field Restrictions

I can't think of a better title, so feel free to make a suggestion once you understand the issue.
I was given a table to work with that I need to call from another table:
Name
Month
Type
Value
For each record in the main table I need to pull the one "Value" that corresponds to it. Which value that is depends on all three of the other fields. So, for example, if a record in the main table is:
Name: Google
Date: 3/17/2016
Type: M
Then I need to pull the value for the record in the other table where the Name is "Google", the month is "3", and the type is "M".
I was able to do this successfully (if slowly) using an ExecuteSQL command in a calculation field, with a ton of nested If statements for the names (I have yet to figure out how to pass the record's own data directly into the ExecuteSQL statement; it breaks when I try). I would prefer to just grab the data directly. I can't switch over to the other layout, because I need to see all of the records at once. And I can't do a simple relationship because there isn't a real relationship; it's like there are three foreign keys working in tandem, and I only know how to use one to call the data.
Any idea on how to do this more simply?
Some ideas I've had, though I'm not sure they will work:
Using a calculation field as a match field to point to the row dynamically (concatenating the three relevant fields into a kind of code). I'm not sure whether you can connect two tables by a calculation field.
Doing that same thing when the data is first pulled into the table, adding a code that serves as a single primary key.
"I can't do a simple relationship because there isn't a real relationship; it's like there are three foreign keys working in tandem, and I only know how to use one to call the data."
Simply define a relationship with three predicates - i.e. three pairs of match fields.
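Alternatively, if you stay with ExecuteSQL, you can avoid the nested If statements by passing the current record's fields as query arguments instead of building them into the SQL text. A sketch, using hypothetical table occurrence names Main and Other (substitute your real table and field names):

ExecuteSQL (
  "SELECT \"Value\" FROM Other WHERE Name = ? AND Month = ? AND Type = ?" ;
  "" ; "" ;
  Main::Name ; Month ( Main::Date ) ; Main::Type
)

Each ? is replaced by the corresponding argument supplied after the two separator parameters, so no quoting gymnastics or nested Ifs are needed. As for your concatenated-key idea: a stored calculation field can indeed be used as a match field in a relationship, so that would work too, though the three-predicate relationship above is simpler.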

How to best structure csv data for tableau that has "multiple categories"?

I have a set of 100 "student records". I want checkboxes for each "favorite_food_type" and "favorite_food"; whichever boxes are checked would filter a bar graph counting the records that contain that specific "favorite_food_type" and "favorite_food". The schema could be:
name
favorite_food_type (e.g. vegetable)
favorite_food (e.g. banana)
In the dashboard I would like to be able to select, via checkboxes, "give me the COUNT OF DISTINCT students with a favorite_food of banana, apple or pear", and have that filter the graphs across all records. My issue is that a single student might like both banana and apple. How do I best capture that? Should I have:
CASE A: Duplicate records (this captures the two different "favorite_food" values, but now I have to figure out how many students there are, which here is one student):
NAME, FAVORITE_FOOD_TYPE, FRUIT
Charlie, Fruit, Apple
Charlie, Fruit, Pear
CASE B: Single records (this captures the two different "favorite_food" values, but is there a way to pick them out from the delimiters?):
NAME, FAVORITE_FOOD_TYPE, FRUITS
Charlie, Fruit, Apple#Pear
CASE C: A column for each fruit (this captures one record per student, but needs a lot of columns, one per fruit, and many values would be FALSE):
NAME, FAVORITE_FOOD_TYPE, APPLE, BANANA, PINEAPPLE, PEAR
Charlie, Fruit, TRUE, FALSE, FALSE, TRUE
I want to do this as easily as possible.
Avoid Case B if at all possible. Repeating information is almost always best handled by repeating rows -- not by cramming multiple values into a single table cell, nor by creating multiple columns such as Favorite_1 and Favorite_2.
If you are provided data with multiple values in a field, Tableau does have functions and data connection features that can split a single field into its constituent parts to form multiple fields. That works well with a fixed number of different kinds of information -- say, splitting a City, State field into separate City and State fields.
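For instance, Tableau's built-in SPLIT function can pick tokens out of a delimited field. A hypothetical pair of calculated fields for the Case B layout, assuming the "#" delimiter from the example:

// First favorite food, e.g. "Apple"
SPLIT ( [FRUITS], "#", 1 )
// Second favorite food, e.g. "Pear"
SPLIT ( [FRUITS], "#", 2 )

This only works cleanly when the number of values per cell is small and bounded, which is part of why Case B is best avoided.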
Avoid Case C if at all possible. That crosstab structure makes it hard to analyze the data and build useful visualizations, because each value is treated as a separate field.
If you are provided data in crosstab format, Tableau allows you to pivot the data in the data connection pane to reshape into a form with fewer columns and many rows.
Case A is usually the best approach. You can simplify it further by factoring out repeating information into separate tables -- a process known as normalization. Then you can use a join to recombine the tables and see the repeating information when desired.
A normalized approach to your example would have two tables (or tabs in Excel). The first table would have exactly one row per student, with two columns: name and favorite_food_type. The second table would have one row per student/favorite-food combination, with two columns: name and favorite_food. Now each student can have as many favorite foods as you like, or none at all. Since both tables have a name column, that is the key used to join (combine) them when needed.
Given that table design, you could have two data sources in Tableau. The first would point just at the student table and could be used for visualizations that involve only students and favorite_food_types. The second would use a (left) join to read from both tables and could be used to look at favorite foods. When working with the second data source, you would have to be careful when reporting on student names and favorite food types, to account for the duplicated information -- so use the first data source when possible. Finally, you could put both kinds of visualizations on a dashboard and use filter and highlight actions to make interaction seamless despite the two sources, getting the best of both worlds.
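Using the Charlie example, the two normalized tables might look like this (the tab names students and student_favorites are just illustrative):

students:
NAME, FAVORITE_FOOD_TYPE
Charlie, Fruit

student_favorites:
NAME, FAVORITE_FOOD
Charlie, Apple
Charlie, Pear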

Automating a data feed into a PostgreSQL table when the number of columns could change and there are duplicate names

My company uses a third-party vendor to get all of our NPS information. I'm trying to set up a data feed from this vendor into our data warehouse, which runs PostgreSQL.
The feed comes as two tab-separated text files: the "question mapping" and the responses. The question map is one row per question, with columns for question id, question text, question label, question type, etc. - straightforward. The responses are one row per survey response, with a column for each question plus things like user id. Here are the two biggest problems:
The survey questions sometimes use the same question ID for different questions, resulting in multiple columns in the response data having the same name but not being the same question.
The number of questions could change, resulting in a different number of columns in the data.
Both of these things make it a real headache to automate a data feed into a single table.
I'm afraid I don't quite know how to phrase my real question other than, "Does anyone have any ideas how I can accomplish this?" If I think of something better than that, I'll come and update this, so for now:
Does anyone have any ideas at all about how I can efficiently set up my automated data feed without having to always drop and recreate everything?
If your data is a mess and doesn't really have well-defined columns, you can use the entity-attribute-value (EAV) pattern, where you turn each fact into a row with four columns: a unique row id; an entity id, shared by every row extracted from the same response; an attribute column (holding what would have been the column name), taken from the key in the question map; and a value column holding the corresponding value. It's not that neat, but you can still query it, and you won't have to drop anything when you receive a map with a new column.
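A minimal sketch of that pattern in PostgreSQL, using hypothetical table and column names (your feed's identifiers will differ):

-- One row per (response, question) pair instead of one column per question.
CREATE TABLE response_eav (
    row_id      bigserial PRIMARY KEY,
    response_id text NOT NULL,  -- entity: which survey response the fact belongs to
    question_id text NOT NULL,  -- attribute: the key from the question map
    answer      text            -- value: the raw answer text
);

-- A new question in the feed just becomes more rows; the table never changes shape.
SELECT response_id, answer
FROM response_eav
WHERE question_id = 'q42';

Note that if the vendor reuses a question ID for different questions, you would still need to disambiguate the attribute on load (for example, by appending the column position), since the ID alone no longer identifies the question.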

elasticsearch array field of keywords - how to index it

I've got input that is analogous to tags: there are a couple of strings per record, and they should be thought of as keywords, not tokenized, broken up or analyzed in any particular way. I want them to show up in the facets as-is, including spaces, slashes, dashes and ampersands.
I don't think I need multi_field here. There is one input value per record "keyPhrases" but the input value is a simple json array of strings.
I want Elasticsearch to insert each of the values into the facets, and to tag the record with all of the phrases.
Usually there are only one, two or three phrases per record, but there could be more. The set of keyPhrases is fairly small, around 30 or at most 50. They could be thought of as "categories".
The faceting keeps breaking up the input strings and lowercasing them, even though I'm trying to specify not_analyzed, the keyword tokenizer, the keyword analyzer, and other variations along those lines.
I have other fields that keep their spacing and capitalization exactly as I want in the facets returned; however, those fields are not_analyzed and also store: true, and they have exactly one string input per record, as opposed to many.
I could just take the top 1 keyPhrase per record and flatten it, but ideally all the tags would work and be available as facets.
Any ideas on how to do this?
Well, this is embarrassing.
My strict mapping wasn't actually committed to the server at the time I was trying this.
(I was dropping the index and creating it again with each new mapping, and hadn't realized that the mapping being applied was not the final one, so my fixed mapping was getting loaded and then dropped.)
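For reference, once it is actually committed, a mapping along these lines keeps the phrases intact as single facet terms. This uses the string/not_analyzed syntax from the Elasticsearch 1.x/2.x era that the question refers to, and the index and type names here are hypothetical; on modern versions the equivalent is "type": "keyword":

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "keyPhrases": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}

A JSON array of strings needs no special mapping: each element is indexed as its own untouched term, so every phrase shows up in the facets with spaces, slashes, dashes and ampersands preserved.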