Generating message IDs in PostgreSQL for storing Mattermost messages

We are going to migrate from Skype to Mattermost and want to copy over the messages from our group chats. We have managed to correctly parse the data from the *.db files where Skype stores all the messages, and now we want to insert that data into the PostgreSQL database that Mattermost will use.
The main question is: how does the algorithm Mattermost uses to generate message IDs work, so that we can generate these IDs correctly ourselves?

It's a base32 encoded GUID without padding. See the implementation here.
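To illustrate the general idea, here is a minimal Python sketch: base32-encode a random GUID and strip the padding. Note that Mattermost's own Go implementation may use a different base32 alphabet and casing than Python's standard encoder, so treat this as an approximation and check the linked source for the exact details.

    import base64
    import uuid

    def new_id():
        # Generate a random (version 4) GUID and base32-encode its 16 raw bytes.
        raw = uuid.uuid4().bytes
        encoded = base64.b32encode(raw).decode("ascii")
        # 16 bytes encode to 26 base32 characters plus 6 "=" padding characters;
        # stripping the padding leaves the 26-character ID.
        return encoded.rstrip("=").lower()

    print(new_id())  # prints a 26-character lowercase string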
However, if you are importing message history, have you considered using the Bulk Import CLI instead of going directly to the database? This will take care of all the "internal" things like setting IDs and ensuring the relevant table columns are populated correctly.
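For reference, the bulk import tool reads a JSONL file in which every line is a single JSON object with a type field. A rough Python sketch that writes parsed Skype messages into that shape could look like the one below; the field names (team, channel, user, message, create_at) are taken from the bulk loading documentation, so verify them against the docs for your Mattermost version, and the parsed_messages data is of course made up.

    import json

    # Hypothetical result of the Skype *.db parsing step:
    # (username, channel, message text, unix timestamp in milliseconds)
    parsed_messages = [
        ("alice", "town-square", "Hello from Skype!", 1468984800000),
    ]

    with open("import.jsonl", "w", encoding="utf-8") as f:
        # The first line declares the bulk import format version.
        f.write(json.dumps({"type": "version", "version": 1}) + "\n")
        for user, channel, text, created_at in parsed_messages:
            line = {
                "type": "post",
                "post": {
                    "team": "your-team",    # assumed team name
                    "channel": channel,
                    "user": user,
                    "message": text,
                    "create_at": created_at,
                },
            }
            f.write(json.dumps(line) + "\n")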

Related

How do I get data for more than one table in Talend using Oracle CDC?

We are trying to connect Talend to our Oracle 12c database using CDC. The tOracleCDC component uses Oracle XStream to do the actual change data capture work. The issue is that when creating the CDC endpoint in Oracle one creates an "Outbound Server" which listens for changes on a number of tables, or even a number of whole schemas.
In Talend, when configuring the tOracleCDC component, one of the required fields is "Table Using CDC", which in the generated Java code is used to filter the incoming change records using something like "TableName".equalsIgnoreCase(...).
This means that we can only get changes for a single table for a given XStream connection (and each connection will require a unique outbound server object in the database).
We must be missing something; how can we pull changes for multiple tables in Talend?
Thanks!
The solution is to use an empty string as the table name in the Table Using CDC field. This will cause the templating engine to not emit the table name check that was causing this problem.
I could not find this documented anywhere, so it might be unsupported, but examining the templates shows that it is the intended behavior.
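To illustrate the effect (this is not Talend's actual generated Java, just a conceptual Python sketch of the filtering logic described above): with a table name configured, only that table's change records pass; with an empty string, the check is effectively skipped and every record passes.

    def keep_change(record_table, configured_table):
        # Conceptual equivalent of the generated "TableName".equalsIgnoreCase(...)
        # filter: an empty configured table name means "do not filter at all".
        if configured_table == "":
            return True
        return record_table.lower() == configured_table.lower()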

Talend open studio run only created or modified records among 15k

I have a job in Talend Open Studio which is working fine; it connects a tMSSqlInput to a tMap and then a tMysqlOutput, very straightforward. My problem is that I need this job to run on a daily basis, but only handle records that have been newly created or modified... any help is highly appreciated!
It seems that you are searching for a Change Data Capture (CDC) tool for Talend.
Unfortunately, that is only available in the licensed product.
There are several ways to implement what you need; I want to show the most popular ones.
CDC from Talend
As Corentin said correctly, you could choose to use CDC (Change Data Capture) from Talend if you use the subscription version.
CDC of MSSQL
Alternatively, you can check whether you can activate or use CDC in your MSSQL server. This depends on your license. If it is possible, you can use this feature to identify new and changed rows and process them.
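If your edition allows it, enabling CDC boils down to calling the sys.sp_cdc_enable_db and sys.sp_cdc_enable_table stored procedures. A hedged sketch of driving that from Python with pyodbc (the DSN, credentials and the dbo.orders table are made-up examples) could look like this:

    import pyodbc

    # Made-up DSN/credentials; requires sysadmin for sp_cdc_enable_db and
    # db_owner for sp_cdc_enable_table, and SQL Server Agent must be running.
    conn = pyodbc.connect("DSN=SourceMSSQL;UID=user;PWD=secret", autocommit=True)
    cur = conn.cursor()

    # Enable CDC for the current database ...
    cur.execute("EXEC sys.sp_cdc_enable_db")

    # ... and for the hypothetical dbo.orders table.
    cur.execute("""
        EXEC sys.sp_cdc_enable_table
             @source_schema = N'dbo',
             @source_name   = N'orders',
             @role_name     = NULL
    """)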
Triggers
You can also create triggers on your database (if you have access to it). For example, creating triggers for INSERT, UPDATE and DELETE would help you capture the deltas. You could then store those records, or just their IDs, separately.
Software driven / API
If your database is connected to a piece of software and you have developers around, you could ask for a service which identifies records on insert / update / delete and exposes them to you, e.g. via a REST interface.
Delta via ID
If the primary key is an ID and it is set to auto-increment, you could also check your MySQL table for the biggest value and only SELECT those rows from the source which have a bigger ID than the ones you have already got, as sketched below. This of course depends on the database layout, and it only catches newly inserted rows; to pick up modifications as well you would also need something like a last-modified timestamp column.
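Here is a minimal Python sketch of that last approach, assuming a hypothetical dbo.orders source table in MSSQL and an orders target table in MySQL, both keyed by an auto-increment id column, and using the pyodbc and mysql-connector-python packages:

    import pyodbc
    import mysql.connector

    # Made-up connection details.
    mssql = pyodbc.connect("DSN=SourceMSSQL;UID=user;PWD=secret")
    target = mysql.connector.connect(host="target-host", user="user",
                                     password="secret", database="target_db")

    # 1. Find the highest ID already loaded into the MySQL target.
    target_cur = target.cursor()
    target_cur.execute("SELECT COALESCE(MAX(id), 0) FROM orders")
    max_loaded_id = target_cur.fetchone()[0]

    # 2. Pull only the rows from the source that are newer than that.
    source_cur = mssql.cursor()
    source_cur.execute(
        "SELECT id, customer, amount FROM dbo.orders WHERE id > ?", max_loaded_id)
    rows = [tuple(r) for r in source_cur.fetchall()]

    # 3. Insert the delta into the target.
    if rows:
        target_cur.executemany(
            "INSERT INTO orders (id, customer, amount) VALUES (%s, %s, %s)", rows)
        target.commit()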

Viewing tableau server when one data source is missing

I have a dashboard in Tableau which pulls data from about 10 tables in a SQL database.
These tables are refreshed at various times of day. There are occasions where one of them is not available (or has been deleted and is awaiting a rebuild).
However, when I open my Tableau dashboard on the server, it won't let me see any of it. Not seeing the data from the missing table is fine, but the majority of the data, which does not come from that table, is unavailable too.
I get this error:
An unexpected error occurred. If you continue to receive this error please contact your Tableau Server Administrator.
TableauException: [Microsoft][SQL Server Native Client 11.0][SQL Server]Invalid object name 'dbo.survey_order_info_fy16_TV_L'. The table "[dbo].[survey_order_info_fy16_TV_L]" does not exist. Unable to connect to the server "dbedwro.vistaprint.net". Check that the server is running and that you have access privileges to the requested database.
"survey_order_info_fy16_TV_L" being the missing table but not one I'm bothered about right now.
Is there an option that might help me see all the other data?
I am not sure if it's possible to avoid this behavior.
If it isn't, there is a workaround: create extracts of these tables and store them on Tableau Server. You can then use these extracts instead of the tables in the database and refresh them either on a schedule, if you know when the tables are available again, or from the SQL Server side (e.g. with SSIS, by triggering the refresh once the data is available again).
Advantages of that approach:
you can refresh the extracts independently and always have the latest data
an extract performs better than a live SQL connection
you don't jam your SQL Server with connections (in case you have a lot of users accessing the dashboard)
you can filter and select fields if you don't want your users to have access to the full dataset
Disadvantages:
you will have to create one extract per table and replace all the data sources in the workbooks you already use
It's a matter of creating a workbook, connecting to the source (adding filters or hiding fields) and publishing it to the server. Details of that can be found here:
http://onlinehelp.tableau.com/current/pro/online/mac/en-us/publish_datasources.html
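For the "trigger the refresh once the data is available" part, one option is to script Tableau's tabcmd utility. A rough sketch, assuming tabcmd is installed and already logged in to the server, and that the published data source is called, say, Survey Extract:

    import subprocess

    # Ask Tableau Server to refresh the published extract once the source
    # table has been rebuilt. The data source name is hypothetical; check the
    # tabcmd documentation for your Tableau Server version.
    subprocess.run(
        ["tabcmd", "refreshextracts", "--datasource", "Survey Extract"],
        check=True,
    )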

Adding user information to centralized logging with ELK stack

I am using the ELK stack (my first project with it) to centralize the logs of a server and visualize some real-time statistics with Kibana. The logs are stored in an ES index, and I have another index with user information (IP, name, demographics). I am trying to:
Join user information with the server logs, matching the IPs. I want to include this information in the Kibana dashboard (e.g. to show in real-time the username of the connected users).
Create new indexes with filtered and processed information (e.g. users that have visited more than 3 times certain url).
What is the best design to solve these problems (e.g. include the username in the Logstash stage through a filter, use scheduled jobs, ...)? If the processing task (2) gets more complex, would it be better to use MongoDB instead?
Thank you!
I recently wanted to cross-reference some log data with user data (containing IPs among other things) and just used Elasticsearch's bulk API. This meant extracting the data from an RDBMS, converting it to JSON, and outputting a flat file that adhered to the format expected by the bulk API (basically prefixing each document with a row that describes the index and type).
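As an illustration, the bulk API expects an action line followed by the document itself. A small Python sketch that turns user rows from an RDBMS into such a flat file (the index name and fields are made up for the example) could look like this:

    import json

    # Hypothetical user rows pulled from the RDBMS.
    users = [
        {"ip": "10.0.0.1", "name": "alice", "age": 34},
        {"ip": "10.0.0.2", "name": "bob", "age": 28},
    ]

    with open("users_bulk.json", "w", encoding="utf-8") as f:
        for user in users:
            # Action line: which index to write to (older Elasticsearch versions
            # would also take a _type here). The IP doubles as the document ID.
            f.write(json.dumps({"index": {"_index": "users", "_id": user["ip"]}}) + "\n")
            # Document line: the user record itself.
            f.write(json.dumps(user) + "\n")

    # The file can then be POSTed to the _bulk endpoint, e.g. with curl:
    #   curl -H "Content-Type: application/x-ndjson" \
    #        -XPOST localhost:9200/_bulk --data-binary @users_bulk.json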
That should work for an initial import; the deltas could then be captured using triggers in whatever stores your user data. You might simply write them to a flat file and process it like your other logs. Another option might be the JDBC River.
I am also interested to know where the data is stored originally (a database, pushed straight from a server, ...). In my case, I initially used the ELK stack to pull data back from a DB server using a batch file with BCP (run as a scheduled task) and store it in a flat file, monitor the file with Logstash, and manipulate the data inside the Logstash config (grok filter). You may also consider a simple console/web application to manipulate the data before grokking it with Logstash.
If possible, I would attempt to pull your data via SQL Server SPROC/BCP command and match the returned, complete message within Logstash. You can then store the information in a single index.
I hope this helps. I am by no means an expert, but I will be happy to answer more questions if you get a little more specific about the details of your current data storage, namely how the data is entering Logstash. RabbitMQ is another valuable tool to take a look at for your input source.

What applications do you use for data entry and retrieval via ODBC?

What apps or tools do you use for data entry into your database? I'm trying to improve our existing (cumbersome) system, which uses a PHP web-based interface for entering data one ... item ... at ... a ... time.
My current solution to this is to use a spreadsheet. It works well with text and numbers that are human-readable, but not with the foreign keys that are used to join to rows in other tables.
Imagine that I want a row of data to include what city someone lives in. The column holding this is id_city, which is keyed to the "city" table which has two columns: id (serial) and name (text).
I envision being able to extend the spreadsheet's capabilities to include dropdown menus for every row of the id_city column, which would allow the user to select a city (displaying the text of the city names) while actually storing the chosen city id. This way, the spreadsheet would:
(1) show a great deal of data on each screen and
(2) be exportable as a CSV file that can be handed to our existing scripts that manually insert rows into the database.
I have been playing around with MS Excel and Access, as well as OpenOffice's suite, but have not found something that gives me the functionality I mention above.
Other items on my wish-list:
(1) dynamically fetch the names of the cities that can be selected by the user.
(2) allow the user to push the data directly into the backend (not via external files/scripts).
(3) if any of the data gets changed in the backend, the user can refresh the data on the screen to reflect the recent changes.
Do you know how I could improve the process of data entry? What tools do you use? I use PostgreSQL for the backend and have access to MS Office, OpenOffice, as well as web based solutions. I would love a solution that is flexible, powerful, and doesn't require much time to develop or deploy (I know, dream on...)
I know that pgAdmin3 has similar functionality, but from what I have seen, it is more of an administrative tool rather than something for users to use.
As j_random_hacker noted, I've used MS Access for years (since Access 97) to connect to an ODBC Data Source.
You can do this by linking to external tables (in Access 2010):
New -> Blank Database
External Data -> ODBC Database -> Link to Data Source
Machine Data Source -> New -> System Data Source -> Select Driver (Oracle, or whatever) -> Finish
Enter a new name for your DSN, then all of the connection parameters, and click OK
Select the newly created DSN and hit OK.
You can do so much once Access sees your external table as a linked table, including sorting, filtering, etc. There's one caveat: as far as I can tell, ALL operations happen on the client side unless you're using a pass-through query. That's fine if you're looking at a table with 3000 records. With 2,000,000 records, that hurts. To be clear, all data in the table comes down to the workstation, for all tables being joined, and the join happens client-side, NOT server-side.
There are usually standalone tools for basic database management - e.g., for Oracle and MySQL a free tool called SQL Developer suffices for basic database data entry.
For more complex types (especially involving clobs) I can usually knock an application together in Java+SWT in a day if we already have the model and DAOs available on the Java side. Yeah, you have to put some effort in, but if it will be used regularly in the future then it is probably worth it.
In your case (well, the case where you have bulk imports of data) knocking up some Perl that reads from the CSV and does the city id lookup would be trivial to implement. Maybe a waste for a one-off thing? Depends on the amount of data to import.
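For example, sketched here in Python rather than Perl, and assuming a hypothetical people.csv with name and city columns plus a person table with an id_city foreign key: look up (or create) the city id, then insert the row.

    import csv
    import psycopg2  # assumed PostgreSQL driver, matching the backend above

    conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical connection string
    cur = conn.cursor()

    with open("people.csv", newline="") as f:
        for row in csv.DictReader(f):
            # Look up the city id by name; create the city first if it is new.
            cur.execute("SELECT id FROM city WHERE name = %s", (row["city"],))
            found = cur.fetchone()
            if found:
                city_id = found[0]
            else:
                cur.execute("INSERT INTO city (name) VALUES (%s) RETURNING id",
                            (row["city"],))
                city_id = cur.fetchone()[0]
            cur.execute("INSERT INTO person (name, id_city) VALUES (%s, %s)",
                        (row["name"], city_id))

    conn.commit()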
I would be surprised if MS Access can't do what you're looking for -- this is basically the exact use case for it. Namely, quickly throwing together a nice UI for a simple CRUD DB application that a spreadsheet doesn't quite stretch to.
This is an answer, technically, but not a recommendation:
I've used Excel and SSIS for importing simple data entry files into MS SQL, but it's not adequate - there's very little ability to control the data, and SSIS is so very touchy, especially when working with Excel.
MS Access does not work well with some non-Microsoft databases. There is an open-source equivalent called Apache OpenOffice Base you may want to try.