SQL - Best practice for handling mass arbitrary data [closed] - tsql

I have a massive delimited file and many normalized tables that the data needs to be loaded into. Is there a best practice for bringing in the data and inserting it into the proper fields and tables?
For instance, right now I've created a temp table that holds all the arbitrary data. Some logic runs against each row to determine which values go into which table. Without going into too many specifics, the part that concerns me looks something like:
INSERT INTO table VALUES (
(SELECT TOP 1 field1 FROM #tmpTable),
(SELECT TOP 1 field30 FROM #tmpTable),
(SELECT TOP 1 field2 FROM #tmpTable),
...
(SELECT TOP 1 field4 FROM #tmpTable))
With that, my questions are: Is it reasonable to use a temp table for this purpose? And is it poor practice to use these SELECT statements so liberally? It feels sort of hacky. Is there a better way to handle mass data importing and separation like this?

You should try SSIS.
SSIS How to Create an ETL Package
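If SSIS is not an option, the per-column scalar subqueries in the question can usually be replaced with one set-based INSERT ... SELECT per target table, reading straight from the staging temp table. A minimal sketch, assuming a hypothetical target table dbo.Person and a hypothetical mapping of field1, field30, field2, and field4 to its columns:
INSERT INTO dbo.Person (FirstName, LastName, City, Age)  -- hypothetical target table and columns
SELECT t.field1,    -- maps to FirstName
       t.field30,   -- maps to LastName
       t.field2,    -- maps to City
       t.field4     -- maps to Age
FROM #tmpTable AS t
WHERE t.field1 IS NOT NULL;   -- example row-level rule in place of per-row procedural logic
Each INSERT ... SELECT processes every staged row in a single statement, so the per-row logic becomes WHERE clauses or CASE expressions instead of row-by-row lookups.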

Related

How to write pytest functions by importing a dataframe in Palantir Foundry [closed]

I'm able to write pytest functions by manually supplying column names and values to create a data frame, then passing it to the production code to check all the transformed field values in a Palantir Foundry code repository.
Instead of manually passing column names and their respective values, I want to store all the required data in a dataset, import that dataset into the pytest function, fetch all the required values, and pass them to the production code to check all the transformed field values.
Is there any way to accept a dataset as an input to the test function in a Palantir code repository?
You can probably do something like this:
Let's say you have your CSV inside a fixtures/ folder next to your test:
test_yourtest.py
fixtures/yourfilename.csv
You can just read it directly and pass it to create a new dataframe. I didn't test this code but it should be something similar to this:
import os
from pathlib import Path

def load_file(spark_context):
    # spark_context is unused here; the function only returns the raw CSV text
    file_path = os.path.join(Path(__file__).parent, "fixtures", "yourfilename.csv")
    with open(file_path) as f:  # close the handle after reading
        return f.read()
Now you can load your CSV; it's just a matter of loading it into a dataframe and passing it into the PySpark logic that you want to test. See: Get CSV to Spark dataframe

Postgres jsonb data into Amazon Quicksight [closed]

I am looking into Amazon QuickSight as a reporting tool, and I am using data from a Postgres database which includes some jsonb columns in a few tables. Unfortunately these columns are skipped by QuickSight, because it only supports primitive types, as mentioned here: https://docs.aws.amazon.com/quicksight/latest/user/data-source-limits.html
I am looking for a solution where I can include this data together with the rest of the relational data in the same tables.
So far I cannot find anything better than making a view in my own application that exposes this data in a relational format that QuickSight can use. Is there anything else that does not pollute my original database with reporting artifacts? I also thought of having these views only in the read-only replica of my database, but that is not possible with Postgres on RDS. Athena is not an option either, and neither is choosing JSON as the data set, because I want both the relational data and the JSON for my analysis.
Any better ideas?
Created a test Postgres table with the following columns:
id integer
info jsonb
Added data to the table, with a sample value:
{ "customer": "John Doe", "items": {"product": "Beer","qty": 6}}
In QuickSight, created a data set using custom SQL, with a SQL statement (based on [1]) similar to:
select id, (info#>>'{}') as jsonb_value from "orders"
With the above data set I was able to import both the columns to QuickSight SPICE as well as directly query the data. The JSONB column gets imported as 'String' type field in QuickSight.
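If you need individual JSON fields as separate columns rather than the whole document as one string, the custom SQL can extract them directly. A sketch in the same spirit, using the key names from the sample value above:
select id,
       info ->> 'customer'             as customer,   -- top-level key as text
       info #>> '{items,product}'      as product,    -- nested key as text
       (info #>> '{items,qty}')::int   as qty         -- nested key cast to integer
from "orders"
Each extracted expression then arrives in QuickSight as a primitive (string or numeric) column.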

slick & scala : What are TableQueries? [closed]

I am a bit disappointed with Slick and its TableQueries: the model of an application can be, for example, a class Persons(tag: Tag) extends Table[Person] (where Person is a case class with some fields like name, age, address...).
The weird point is that val persons = TableQuery[Persons] contains all the records.
To get, for example, all the adults, we can use:
adults = persons.filter(p => p.age >= 18).list()
Is the content of the database loaded into the variable persons?
Or is there, on the contrary, a mechanism that allows evaluating not persons but adults (a sort of lazy variable)?
Can we say something like "at any time, persons contains the entire database"?
Are there good practices, some important ideas that can help the developer?
Thanks.
You are mistaken in your assumption that persons contains all of the records. The Table and TableQuery classes are representations of a SQL table, and the whole point of the library is to ease interaction with SQL databases by providing a convenient, Scala-like syntax.
When you say
val adults = persons.filter{ p => p.age >= 18 }
You've essentially created a SQL query that you can think of as
SELECT * FROM PERSONS WHERE AGE >= 18
Then when you call .list() it executes that query, transforming the result rows from the database back into instances of your Person case class. Most of the methods that have anything to do with Slick's Table or Query classes are focused on generating queries (i.e. "select" statements). They don't actually load any data until you invoke them (e.g. by calling .list() or .foreach).
As for good practices and important ideas, I'd suggest you read through their documentation, as well as take a look at the scaladocs for any of the classes you are curious about.
http://slick.typesafe.com/docs/

What is a 'Schema' in PostgreSQL? [closed]

As the question clearly indicates, what actually is a Schema in PostgreSQL, the one I see at the top level of the hierarchy in pgAdmin III?
OK, I'm answering my own question just to help other people (who do not have time to read the docs or want a more simplified version):
You can think of a Schema as a namespace/package (just like in Java or C++). For example, let us assume mydb is the name of our database, and A and B are the names of two different schemas present in the same database (mydb).
Now, we can use the same table name in two different schemas in the same single database:
mydb -> A -> myTable
mydb -> B -> myTable
Hope that clarifies it. For more detail: PostgreSQL 9.3.1 Documentation - 5.7. Schemas
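Here is a minimal SQL sketch of the same idea (note that PostgreSQL folds unquoted identifiers to lower case, so schema A is created as a):
CREATE SCHEMA a;
CREATE SCHEMA b;

CREATE TABLE a.mytable (id integer);   -- mydb -> A -> myTable
CREATE TABLE b.mytable (id integer);   -- mydb -> B -> myTable

SELECT * FROM a.mytable;               -- qualify the table with its schema to pick one of the two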

DB2 RECORDSET table name converted to uppercase [closed]

"SQL0204N "FUSIONDBUNIT.ACQUIREDRUN" is an undefined name. SQLSTATE=42704
The table is actually AcquireRun and not ACQUIREDRUN
The following line throws the exception:
pRecordSet->Open(CRecordset::dynaset, NULL,CRecordset::readOnly | CRecordset::skipDeletedRecords)
DB2 table names are not case-sensitive unless you define them with double-quotes around the name; e.g. a table created as CREATE TABLE "MySchema"."MyTable" (...) will only work if you query it as:
SELECT *
FROM "MySchema"."MyTable"
It won't work even if you do SELECT * FROM MySchema.MyTable, because DB2 automatically folds identifiers to upper case unless you quote them.
However, as noted by @sl0ppy, it looks like you might have a typo: AcquireRun vs. ACQUIREDRUN (no D).
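To illustrate the folding behaviour described above (the schema and table names here are only examples):
-- Unquoted identifiers are folded to upper case before lookup:
CREATE TABLE MySchema.AcquireRun (id INTEGER);    -- stored as MYSCHEMA.ACQUIRERUN
SELECT * FROM myschema.acquirerun;                -- works: also folded to MYSCHEMA.ACQUIRERUN

-- Quoted identifiers keep their exact case:
CREATE TABLE "MySchema"."AcquireRun" (id INTEGER);
SELECT * FROM "MySchema"."AcquireRun";            -- only this exact quoted form finds it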