How to create a PostgreSQL table from an XML file... - postgresql

I have an XML document file. Part of the file looks like this:
<attr>
  <attrlabl>COUNTY</attrlabl>
  <attrdef>County abbreviation</attrdef>
  <attrtype>Text</attrtype>
  <attwidth>1</attwidth>
  <atnumdec>0</atnumdec>
  <attrdomv>
    <edom>
      <edomv>C</edomv>
      <edomvd>Clackamas County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>M</edomv>
      <edomvd>Multnomah County</edomvd>
      <edomvds/>
    </edom>
    <edom>
      <edomv>W</edomv>
      <edomvd>Washington County</edomvd>
      <edomvds/>
    </edom>
  </attrdomv>
</attr>
From this XML file, I want to create a PostgreSQL table with the columns attrlabl, attrdef, attrtype, and attrdomv. I appreciate your suggestions!

While Erwin is right that this can be done with PostgreSQL tools, I would still suggest doing the translation yourself, for a few reasons.
The first is determining appropriate XML-to-PostgreSQL type conversions; you probably want to choose these yourself. But this example highlights a very different problem: what to do with nested data structures. You could, for example, store XML fragments. You could store text, json, or the like. You could create other tables and reference them with foreign keys.
In general I have almost always found the best approach is to simply create the tables manually. This substitutes human judgement for automated mappings and allows you to create better matches than a computer will.
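For illustration, here is a minimal sketch of the manual approach, assuming the fragment has been loaded into a staging table raw_xml with a single xml column doc (both names are hypothetical):

CREATE TABLE attr (
    attrlabl text,
    attrdef  text,
    attrtype text,
    attrdomv xml  -- keep the nested domain values as an XML fragment
);

-- xpath() returns an array of matches; take the first and cast as needed
INSERT INTO attr
SELECT (xpath('/attr/attrlabl/text()', doc))[1]::text,
       (xpath('/attr/attrdef/text()',  doc))[1]::text,
       (xpath('/attr/attrtype/text()', doc))[1]::text,
       (xpath('/attr/attrdomv', doc))[1]
FROM raw_xml;

Storing attrdomv as an xml fragment keeps the nested edom entries intact; the alternative is to normalize them into a child table and reference it with a foreign key.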

Related

Creating Spectrum table in Matillion for CSV file with comma inside quotes

I have a scenario for creating a Spectrum table in Redshift using Matillion.
My CSV file data is like this:
column1,column2,column3
abc,"qwerty,pqr",xyz
but in the Spectrum table I am seeing the data as
column1 column2 column3
abc     qwerty  pqr
Matillion is not treating the quoted value as one field.
Can you please suggest how to achieve this using Matillion's EXTERNAL TABLE component?
Basically you would like to specify a quote parameter for your CSV data.
Redshift has 2 ways of specifying external tables (see Redshift Docs for reference):
using the default built-in SerDes and properties like ROW FORMAT DELIMITED, FIELDS TERMINATED BY
explicitly specifying a SerDe with ROW FORMAT SERDE, WITH SERDEPROPERTIES
I don't think it's possible to specify a quote parameter using the built-in SerDes.
It is possible to specify them using org.apache.hadoop.hive.serde2.OpenCSVSerde (look here for details on its properties), but beware that there are known problems with it, such as the one described in this SO question.
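For reference, the raw DDL for an external table with a quote character would look something like this (the schema, table, and S3 location are made up for the example; note that OpenCSVSerde treats every column as a string):

CREATE EXTERNAL TABLE spectrum_schema.my_table (
    column1 varchar(100),
    column2 varchar(100),
    column3 varchar(100)
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
    'separatorChar' = ',',
    'quoteChar'     = '"'
)
STORED AS TEXTFILE
LOCATION 's3://my-bucket/my-prefix/';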
Now for Matillion:
I have never used Matillion, but looking at their Redshift External Table documentation page, it looks like it's only possible to specify the FORMAT and the FIELD TERMINATOR, but not a SerDe and its properties. Hence it's not possible to specify the quote parameters for the external table - unless there are some undocumented means to specify a custom SerDe.
Personal note:
We have experienced many problems with ingesting data stored as CSV, and we basically try to avoid it. There's no standard for CSV: each tool implements its own version of support for it, and it's very difficult to convince all your tools to see the data the same way.

Db2 for i: CPYF *NOCHK emulation

On the IBM i system there's a way to copy from a structured file to one without structure using CPYF *NOCHK.
How can it be done with SQL?
The answer may be "You can't" - not if you are using DDL-defined tables, anyway. The problem is that *NOCHK just dumps data into the file as if it were a flat file. Files defined with CRTPF, whether they have source or are program-defined, don't care about bad data until read time, so they can contain bad data. In fact, you can even read bad data out of a file if you use a program definition for that file.
But, an SQL Table (one defined using DDL) cannot contain bad data. No matter how you write it, the database validates the data at write time. Even the *NOCHK option of the CPYF command cannot coerce bad data into an SQL table.
There really isn't an easy way.
The closest would be to just build a big character string using CONCAT...
insert into flatfile
select mycharfld1
       concat cast(myvchar as char(20))
       concat digits(zonedFld3)
from mytable
That works for fixed-length, varchar (if cast to char), and zoned decimal fields...
Packed decimal would be problematic.
I've seen user-defined functions that can return the binary character string that makes up a packed decimal... but it's very ugly.
I question why you think you need to do this.
You can use QSYS2.QCMDEXC stored procedure to execute OS commands.
Example:
call qsys2.qcmdexc ( 'CPYF FROMFILE(QTEMP/FILE1) TOFILE(QTEMP/FILE2) MBROPT(*replace) FMTOPT(*NOCHK)' )

Updating the text of a large number of stored procedures

The question pretty much sums it up. I've got to replace text in a large number of stored procedures. It's not so many that doing it manually is impossible, but enough that I'm asking the question. I also prefer automation, as it reduces the chance of user error when we make the change in production.
I can identify them like this:
select OBJECT_DEFINITION(object_id), *
from sys.procedures
where OBJECT_DEFINITION(object_id) like '%''MyExampleLiteral''%'
order by name
Is there any way to mass update them all to change 'MyExampleLiteral' to 'MyOtherExampleLiteral'?
I'd even settle for a way to open all the stored procs. Just finding these stored procs in a larger list will take some time.
I thought about generating alter statements using the above select statements, but then I lose line breaks.
Thanks in advance,
This is Microsoft SQL Server.
There are different tools to use depending on the database in question. For example, Microsoft SQL Server Data Tools integrates with Visual Studio, and allows you to do these types of operations fairly easily. The database is stored in your solution as scripts, which you can then search and replace any keyword you wish. I'm assuming there would be similar tools available for other platforms.
You could do this with dynamic SQL. Query the system tables to get all the SPs containing your "MyExampleLiteral":
SELECT [object_id] FROM sys.objects o
WHERE type_desc = 'SQL_STORED_PROCEDURE'
AND is_ms_shipped = 0
AND OBJECT_DEFINITION(o.[object_id]) LIKE '%<search string>%'
Then, write a while loop to go through those object_ids. In the while loop, get the OBJECT_DEFINITION() into a string and replace the "MyExampleLiteral", then replace CREATE PROCEDURE with ALTER PROCEDURE and execute the string using sp_executesql.
When doing something this crazy, make sure you back up the database first.
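A rough sketch of that loop, using a cursor (treat this as a starting point, not production code - the CREATE-to-ALTER swap in particular may need adjusting to how your procedures are defined):

DECLARE @id int, @sql nvarchar(max);

DECLARE proc_cursor CURSOR FOR
    SELECT [object_id] FROM sys.objects o
    WHERE type_desc = 'SQL_STORED_PROCEDURE'
      AND is_ms_shipped = 0
      AND OBJECT_DEFINITION(o.[object_id]) LIKE '%''MyExampleLiteral''%';

OPEN proc_cursor;
FETCH NEXT FROM proc_cursor INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = OBJECT_DEFINITION(@id);
    SET @sql = REPLACE(@sql, 'MyExampleLiteral', 'MyOtherExampleLiteral');
    -- turn CREATE PROCEDURE into ALTER PROCEDURE so the proc is changed in place
    SET @sql = STUFF(@sql, CHARINDEX('CREATE PROC', @sql), LEN('CREATE'), 'ALTER');
    EXEC sp_executesql @sql;
    FETCH NEXT FROM proc_cursor INTO @id;
END
CLOSE proc_cursor;
DEALLOCATE proc_cursor;

Since OBJECT_DEFINITION() returns the original text verbatim, line breaks are preserved, which addresses the concern about generated ALTER statements.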

Change the generated file name in EF Power Tools Beta 3

I've searched but haven't been able to find an answer to this question. Currently our DB uses prefixes on table names - e.g. tblUsers. I've updated the EF templates to remove the "tbl" from the generated class names. However, I still can't figure out how to change the output file name to match.
Is it possible, or am I asking for the moon? I'm using EF Power Tools Beta 3 in VS 2012. Any help would be GREATLY appreciated!
Patrick, what you need is to modify the T4 templates used by the EF Power Tools. When you want to create a code-first model with all the mappings, instead of the Reverse Engineer Code First option, choose Customize Reverse Engineer Template. You should get three files:
Context.tt
Entity.tt
Mapping.tt
For example, in Mapping.tt there is a line that reads MetadataProperties from a TableSet and extracts the table name. The line looks like this:
var tableSet = efHost.TableSet;
var tableName = (string)tableSet.MetadataProperties["Table"].Value ?? tableSet.Name;
This is where you need to make changes and do something like:
var newTableName = tableName.Replace("tbl", String.Empty);
Of course, you could opt for a different strategy and use the Substring method or a regular expression to remove just the first three characters. After that you have to go through the .tt file and apply your logic on where to use the tableName variable and where the newTableName variable. You will keep tableName where the mapping is done with the table in the database, and use newTableName where you want that name for your POCO classes and file names.
Repeat the process for the other two files. For more information have a look at Rowan Miller's blog article. This should give you a pretty good idea how to proceed.

How to alter Postgres table data based on its contents?

This is probably a super simple question, but I'm struggling to come up with the right keywords to find it on Google.
I have a Postgres table that has among its contents a column of type text named content_type. That stores what type of entry is stored in that row.
There are only about 5 different types, and I decided I want to change one of them to display as something else in my application (I had been directly displaying these).
It struck me as funny that my view is being dictated by my database model, so I decided to convert the types stored in my database from strings into integers, and enumerate the possible types in my application with constants that map them to their display names. That way, if I ever get the urge to change any category names again, I can just change a single constant. I also have a hunch that storing integers might be somewhat more efficient than storing text in the database.
First, a quick threshold question of, is this a good idea? Any feedback or anything I missed?
Second, and my main question: what's the Postgres command I could enter to make an alteration like this? I'm thinking I could start by renaming the old content_type column to old_content_type and then creating a new integer column content_type. However, what command would look at a row's old_content_type and fill in the new content_type column based on that?
If you're finding that you need to change the display values, then yes, it's probably a good idea not to store them in a database. Integers are also more efficient to store and search, but I really wouldn't worry about it unless you've got millions of rows.
You just need to run an update to populate your new column:
update table_name
   set content_type = case when old_content_type = 'a' then 1
                           when old_content_type = 'b' then 2
                           else 3
                      end;
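The rename/add/drop steps you described would then wrap around that update, e.g. (a sketch; run it inside a transaction so you can roll back if something looks wrong):

BEGIN;
ALTER TABLE table_name RENAME COLUMN content_type TO old_content_type;
ALTER TABLE table_name ADD COLUMN content_type integer;
-- ... run the UPDATE above here ...
ALTER TABLE table_name DROP COLUMN old_content_type;
COMMIT;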
If you're on Postgres 8.4 then using an enum type instead of a plain integer might be a good idea.
Ideally you'd have these fields refer to a table containing the type definitions, via a foreign key constraint. That way you know your database is clean and has no invalid values (i.e. referential integrity); see the sketch after the list below.
There are many ways to handle this:
1. Having a table for each field that can contain a number of values (i.e. like an enum) is the most obvious - but it breaks down when you have a table that requires many attributes.
2. You can use the Entity-Attribute-Value model, but beware that this is easy to abuse and causes problems when things grow.
3. You can use, or refer to, my implementation solution PET (Parameter Enumeration Tables), which is a halfway house between 1 and 2.
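As a sketch of option 1 for this case (table and constraint names are illustrative):

CREATE TABLE content_types (
    id   integer PRIMARY KEY,
    name text NOT NULL UNIQUE
);

ALTER TABLE table_name
    ADD CONSTRAINT content_type_fk
    FOREIGN KEY (content_type) REFERENCES content_types (id);

The database will then reject any content_type value that has no matching row in content_types - that's the referential integrity mentioned above.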