Data Dictionary generators for PostgreSQL to Confluence Wiki - postgresql

I'm looking for a tool that takes PostgreSQL tables and outputs a Data Dictionary in a wiki format (preferably Confluence). It seems like most tools out there require a lot of manual work/multiple tools to accomplish this task (IE> SchemaSpy, DB Visual Architect, Confluence plugins to take outputted HTML DD and convert to Confluence). I'm looking for ONE tool that will scan my Postgres tables and output a wiki friendly Data Dictionary that will allow seamless maintenance as the DB changes, without having to update my database and the DB schema in the other tool.

There is the Bob Swift's Confluence SQL Plugin that allows you to display data derived from a SQL query in a Confluence Page .. e.g. as a table .. perhaps it is worth a look for you?
Confluence versions 3.1.x - 4.9.x are currently supported...
The plugin is free and can be downloaded from Atlassian's Plugin Exchange: https://plugins.atlassian.com/plugins/org.swift.confluence.sql
Additional information about the plugin can be found here:
https://studio.plugins.atlassian.com/wiki/display/SQL/Confluence+SQL+Plugin

I think you'll have to script this yourself, but it's pretty easy and fun. I'll assume Python here.
I like the Confluence XML-RPC interface. For that, see http://goo.gl/KCt3z. The remote methods you care about are likely login, getPage, setPage and/or updatePage. This skeleton will look like:
import xmlrpclib
server = xmlrpclib.Server(opts.url)
conn = server.confluence1
token = conn.login(opts.username, opts.password)
page = conn.getPage(token,'PageSpace',page_title)
page = page + table
page = conn.updatePage(token,page,update_options)
table here is the data from PG tables. We'll build that below.
For pulling simple data from PostgreSQL, I use psycopg2 most often (also consider SQLSoup). Regardless of how you fetch data, you'll end up with a list of rows as a dictionary. The database part will probably look like:
import psycopg2, psycopg2.extras
conn = psycopg2.connect("dbname=reece")
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
cur.execute('SELECT * FROM sometable')
rows = cur.fetchall()
Now you need to format the data. For simple stuff, print= statements will work. For more complicated formatting, consider a templating engine like jinja2 (http://jinja.pocoo.org/). The rendering code might look like this:
from jinja2 import Template
template = Template(open(template_path).read())
table = template.render( rows = rows )
The file template_path will contain the formatting template, which might look like:
<table>
<tr>
<th>col header 1</th>
<th>col header 2</th>
</tr>
{% for row in rows|sort -%}
<tr>
<td>{{row.col1}}</td>
<td>{{row.col2}}</td>
</tr>
{% endfor %}
</table>
Note: Confluence no longer uses the wiki markup by default. You should be writing HTML.
Finally, if you want to make a page for all tables, you can look at information_schema, which contains information about the database as tables. For example:
select table_name from information_schema.tables where table_schema = current_schema();

Related

PostgreSQL: Import columns into table, matching key/ID

I have a PostgreSQL database. I had to extend an existing, big table with a few more columns.
Now I need to fill those columns. I tought I can create an .csv file (out of Excel/Calc) which contains the IDs / primary keys of existing rows - and the data for the new, empty fields. Is it possible to do so? If it is, how to?
I remember doing exactly this pretty easily using Microsoft SQL Management Server, but for PostgreSQL I am using PG Admin (but I am ofc willing to switch the tool if it'd be helpfull). I tried using the import function of PG Admin which uses the COPY function of PostgreSQL, but it seems like COPY isn't suitable as it can only create whole new rows.
Edit: I guess I could write a script which loads the csv and iterates over the rows, using UPDATE. But I don't want to reinvent the wheel.
Edit2: I've found this question here on SO which provides an answer by using a temp table. I guess I will use it - although it's more of a workaround than an actual solution.
PostgreSQL can import data directly from CSV files with COPY statements, this will however only work, as you stated, for new rows.
Instead of creating a CSV file you could just generate the necessary SQL UPDATE statements.
Suppose this would be the CSV file
PK;ExtraCol1;ExtraCol2
1;"foo",42
4;"bar",21
Then just produce the following
UPDATE my_table SET ExtraCol1 = 'foo', ExtraCol2 = 42 WHERE PK = 1;
UPDATE my_table SET ExtraCol1 = 'bar', ExtraCol2 = 21 WHERE PK = 4;
You seem to work under Windows, so I don't really know how to accomplish this there (probably with PowerShell), but under Unix you could generate the SQL from a CSV easily with tools like awk or sed. An editor with regular expression support would probably suffice too.

Updating the text of a large number of stored procedures

The question pretty much sums it up. I've got to replace text in a large number for store procedures. Its not so many that doing it manually is impossible, but enough that I'm asking the question. I also prefer automation as it reduces the change of user error when we make the change in production.
I can Identify them like this:
select OBJECT_DEFINITION(object_id), *
from sys.procedures
where OBJECT_DEFINITION(object_id) like '%''MyExampleLiteral''%'
order by name
Is there any way to mass update them all to change 'MyExampleLiteral' to 'MyOtherExampleLiteral'?
I'd even settle for a way to open all the stored procs. Just Finding these store procs in a larger list will take some time.
I thought about generating alter statements using the above select statements, but then I lose line breaks.
Thanks in advance,
This is a Microsoft SQL Server.
There are different tools to use depending on the database in question. For example, Microsoft SQL Server Data Tools integrates with Visual Studio, and allows you to do these types of operations fairly easily. The database is stored in your solution as scripts, which you can then search and replace any keyword you wish. I'm assuming there would be similar tools available for other platforms.
You could do this with dynamic sql. Query the system tables to get all the SPs containing your "MyExampleLiteral":
SELECT [object_id] FROM sys.objects o
WHERE type_desc = 'SQL_STORED_PROCEDURE'
AND is_ms_shipped = 0
AND OBJECT_DEFINITION(o.[object_id]) LIKE '%<search string>%'
Then, write a while loop to go through those object_ids. In the while loop, get the OBJECT_DEFINITION() into a string and replace the "MyExampleLiteral", then replace CREATE PROCEDURE with ALTER PROCEDURE and execute the string using sp_executesql.
Doing something this crazy, make sure you backup the database first.

Change the generated file name in EF Power Tools Beta 3

I've searched but haven't been able to find an answer to this question. Currently our Db users prefixes of tables - e.g. tblUsers. I've updated the EF templates to remove the "tbl" from the generated class names. However I still can't figure out how to change the output file name to match.
Is it possible or am I asking for the moon? I’m using EF Power Tools Beta 3 in VS 2012. Any help would be GREATLY appreciated!
Patrick, what you need is to modify the T4 template used by the EF Power Tools. When you want to create a code-first with all the mappings, instead of Reverse Engineer Code First option, choose Customize Reverse Engineer Template. You should get three files:
Context.tt
Entity.tt
Mapping.tt
For example, in Mapping.tt there is a line that reads MetadataProperties from a TableSet, and which extracts Table name. The line looks like this:
var tableSet = efHost.TableSet;
var tableName = (string)tableSet.MetadataProperties["Table"].Value ?? tableSet.Name;
This is where you need to make changes and do something like:
var newTableName = tableName.Replace("tbl", String.Empty);
Of course, you should opt for a different strategy and use Substring method or Regular expression to read and remove the first three characters. After that you have to go through the tt file and use your logic where you want to use tableName and where to user newTableName variable. You will keep tableName where the mapping is done with the table in a database, and newTableName where you want to use that name for your POCO classes and filenames.
Repeat the process for the other two files. For more information have a look at Rowan Miller's blog article. This should give you a pretty good idea how to proceed.

how to create a PostgreSQL table from a XML file...

I have a XML Document file. The part of the file looks like this:
-<attr>
<attrlabl>COUNTY</attrlabl>
<attrdef>County abbreviation</attrdef>
<attrtype>Text</attrtype>
<attwidth>1</attwidth>
<atnumdec>0</atnumdec>
-<attrdomv>
-<edom>
<edomv>C</edomv>
<edomvd>Clackamas County</edomvd>
<edomvds/>
</edom>
-<edom>
<edomv>M</edomv>
<edomvd>Multnomah County</edomvd>
<edomvds/>
</edom>
-<edom>
<edomv>W</edomv>
<edomvd>Washington County</edomvd>
<edomvds/>
</edom>
</attrdomv>
</attr>
From this XML file, I want to create a PostgreSQL table with columns of attrlabl, attrdef, attrtype, and attrdomv. I appreciate your suggestions!
While Erwin is right that this can be done with PostgreSQL tools, I would suggest still going the custom translation yourself as there are a few reasons here.
The first is determining appropriate XML to PostgreSQL type conversions. You probably want to choose these yourself. But this example highlights a very different problem, what to do with nested data structures. You could, for example, store XML fragments. You could store text, json, or the like. You could create other tables and fkey in.
In general I have almost always found the best approach is to simply manually create the tables. This substitutes human judgement for automated mappings and allows you to create better matches than a computer will.

Using Script Task to create ADO NET (ODBC) Data Flow Source

I need some help with a SSIS Script Task (SQL 2008 R2) that dynamically creates a package. I am refining a package that copies data from a Sage Timberline (Now rebranded to Sage 300) Pervasive SQL environment to a SQL server data warehouse. I can create a package that opens the connection to Timberline and copies the data to a table in SQL Server. The problem is, for each company in timberline and each table in SQL, I need to create a separate data flow task. Given the three Timberline company folders and the number of tables in each folder, this would take a lot of time to create and be cumbersome to maintain and troubleshoot.
I am trying to create a package that uses a Foreach Loop to create a package that creates a ADO/ODBC source (Timberline), a OLE destination (SQL) and dynamically handles the column mapping. I found code here that almost does what I need.
I tested this code and it works great using OLE SQL source and destinations. What makes this script work is that it dynamically handles the column mapping. So, it you placed it into a Foreach Loop of the 100 or so tables, with each loop it could dynamically create the data flow and map the columns, then execute the new package.
My problem is that I can only connect to Timberline using ODBC. So, I need to modify the script to create the source connection with ADO NET (ODBC) instead of OLE. I’m having a lot of trouble trying to figure this out. Could someone please help me out with this?
Here the other couple of things I tried first, other than this approach:
Solution: Setup a Linked server to Timberline Pervasive SQL
Problem: SQL server is 64-bit and the Timberline driver is 32-bit. Using a linked server returns a architecture mismatch error. I called Sage and they said they have no plans to release a 64-bit drive.
Solution: Use one of the SQL Transfer tasks
Problem: Only works with SQL databases. This source is a Pervasive SQL database
Solution: Use a “INSERT … INTO …” type script
Problem: This requires a linked server. See the problem above
Here’s the section of the original VB .NET code I need help with:
'To Create a package named [Sample Package]
Dim package As New Package()
package.Name = "Sample Package"
package.PackageType = DTSPackageType.DTSDesigner100
package.VersionBuild = 1
'To add Connection Manager to the package
'For source database (OLTP)
Dim OLTP As ConnectionManager = package.Connections.Add("OLEDB")
OLTP.ConnectionString = "Data Source=.;Initial Catalog=OLTP;Provider=SQLNCLI10;Integrated Security=SSPI;Auto Translate=False;"
OLTP.Name = "LocalHost.OLTP"
'To add Load Employee Dim to the package [Data Flow Task]
Dim dataFlowTaskHost As TaskHost = DirectCast(package.Executables.Add("SSIS.Pipeline.2"), TaskHost)
dataFlowTaskHost.Name = "Load Employee Dim"
dataFlowTaskHost.FailPackageOnFailure = True
dataFlowTaskHost.FailParentOnFailure = True
dataFlowTaskHost.DelayValidation = False
dataFlowTaskHost.Description = "Data Flow Task"
'-----------Data Flow Inner component starts----------------
Dim dataFlowTask As MainPipe = TryCast(dataFlowTaskHost.InnerObject, MainPipe)
' Source OLE DB connection manager to the package.
Dim SconMgr As ConnectionManager = package.Connections("LocalHost.OLTP")
' Create and configure an OLE DB source component.
Dim source As IDTSComponentMetaData100 = dataFlowTask.ComponentMetaDataCollection.[New]()
source.ComponentClassID = "DTSAdapter.OLEDBSource.2"
' Create the design-time instance of the source.
Dim srcDesignTime As CManagedComponentWrapper = source.Instantiate()
' The ProvideComponentProperties method creates a default output.
srcDesignTime.ProvideComponentProperties()
source.Name = "Employee Dim from OLTP"
' Assign the connection manager.
source.RuntimeConnectionCollection(0).ConnectionManagerID = SconMgr.ID
source.RuntimeConnectionCollection(0).ConnectionManager = DtsConvert.GetExtendedInterface(SconMgr)
' Set the custom properties of the source.
srcDesignTime.SetComponentProperty("AccessMode", 0)
' Mode 0 : OpenRowset / Table - View
srcDesignTime.SetComponentProperty("OpenRowset", "[dbo].[Employee_Dim]")
' Connect to the data source, and then update the metadata for the source.
srcDesignTime.AcquireConnections(Nothing)
srcDesignTime.ReinitializeMetaData()
srcDesignTime.ReleaseConnections()
Thanks in advance!
The C# code here is what you need if you need a Derived Column transform between the Source and Destination...
http://bifuture.blogspot.com/2011/01/ssis-adding-derived-column-to-ssis.html
To get the Source & Destination connections working, there is some secret sauce here to get things working between COM and .Net...
http://blogs.msdn.com/b/mattm/archive/2008/12/30/api-sample-ado-net-source.aspx
There is a similar page showing what to do for OleDB connections too.
Creating the source tables is easy. The available ODBC Metadata collections accessible should be retrieved with GetSchema("MetaDataCollections"). This will return a list of the available schema collections available for that particular ODBC driver.
Next, you'll want to see the data types returned from GetSchema("DataTypes"), so you can correctly interpret the data types for each column retrieved from GetSchema("Columns") to make your SQL Server create table script (which I'm assuming you've done).
To at least figure out which tables have primary keys, you'll need to loop over each table returned from GetSchema("Tables") in order to work with GetSchema("Indexes"). There's a bug that requires you to query the Indexes one table at a time. It is easy to google this - create a string array to pass in as the 3rd parameter: GetSchema("Indexes", tblName, resultArray[])
What I did was got the Tables and Columns collections into object variables in my parent SSIS package. Because Timberline is so fast (not), it seemed more efficient to pull all the columns down and filter them locally...which I do to create the tables in SQL Server, if necessary.
Once that is done, use the local copy of Tables again to manipulate a SSIS package in a Script task in "design mode" (change source and destination target tables, and redo the column mappings), and execute the now-in-memory SSIS package.
For me it took awhile to figure out. Both above URLs were required. I found and copied the .Net 2.0 Dts.PipelineWrap and Dts.RuntimeWrap .dlls to Microsoft.Net\FrameworkV2.0xxxxx folder, then referenced these in each script task wanting to use them, before setting up my "using DtsPW = Microsoft.SqlServer.Dts.Pipeline.Wrapper", etc.
Of note, because Timberline is 32-bit ODBC, I think it's necessary to build the SSIS package to use "X86", and target the script tasks to use .Net 2.0 framework.
I used the Derived Column code because I needed to copy multiple Timberline DBs into one SQL Server DB. Derived Column adds a "CompanyID" value to the output pipeline to SQL Server.
In the end, map the Destination's Virtual Input columns to its External Metadata columns, based off of the pipeline the Destination is attached to:
foreach (DtsPW.IDTSVirtualInputColumn100 vColumn in destVirtInput.VirtualInputColumnCollection)
{
var vCol = destInst.SetUsageType(destInput.ID, destVirtInput, vColumn.LineageID, DtsPW.DTSUsageType.UT_READWRITE);
destInst.MapInputColumn(destInput.ID, vCol.ID, destInput.ExternalMetadataColumnCollection[vColumn.Name].ID);
}
Anyways, that code will make more sense in the context of the bifuture.blogspot.com page.
The EzApi library could help with this too, but the AdoNet connection source for it is coded as a virtual class, so you'd need to implement specific classes to use. My C# kungfu is not strong enough for that in the time I have...
Also, CozyRoc sells a toolset with custom SSIS controls (data flow Source and Destination controls...) that looks like it does this on the fly input-to-output column mapping as well.
My package seems to work good enough now... Oh, and one more, I did not have luck trying to use DSN-less ODBC connections to Timberline, just: Dsn=dsnname;Uid=user;Pwd=pwd;
SSIS packages running in 64-bit land cannot see 32-bit DSNs on 64-bit OS, it seems...at least, it didn't work for me (win7-64, 32-bit Text ODBC DSN).