Fetching only one field using sorm framework - scala

Is it possible to fetch only one field from the database using the SORM Framework?
What I want in plain SQL would be:
SELECT node_id FROM messages
I can't seem to be able to reproduce this in sorm. I know this might be against how sorm is supposed to work, but right now I have two huge tables with different kind of messages. I was asked to get all the unique node_ids from both tables.
I know I could just query both tables using sorm and parse through all the data but I would like to put the database to work. Obviously, this would be even better if one can get only unique node_ids in a single db call.
Right now with just querying everything and parsing it, it takes way too long.

There doesn't seem to be ORM support for what you want to do, unless node_id happens to be the primary key of your Message object:
val messages = Db.query[Message].fetchIds()
In this case you shouldn't need to worry about it being UNIQUE, since primary keys are by definition unique. Alternatively, you can run a custom SQL query:
Db.fetchWithSql[Message]("SELECT DISTINCT node_id FROM messages")
Note this latter might be typed wrong: you'd have to try it against your database. You might need fetchWithSql[Int], or some other variation: it is unclear what SORM does in the eventuality that the primary key hasn't been queried.

Related

Mybatis dynamic select query with prevention of 'no column exits' errors

What's the best approch to have a dynamic query like
select $dynamic_columns from table
But also prevent error like column not found and get result with available columns. considering $dynamic_columns is given by end users.
One approach would be to store the schema in java object and filter it. Again if schema is update in DB we will need to update the schema java object cache. is there any better way to handle this?
Be careful with this as it is more vulnerable to SQL injection.
Never let the user type something into a text field, instead build a
list for them to select from.
For building the list, I think the best approach is to use the JDBC method DatabaseMetaData.getColumns(...) to retrieve a list of columns for a table. I don't think there's a need to cache anything.

How to implement a high performing non incremental ID in postgresql? [duplicate]

I would like to replace some of the sequences I use for id's in my postgresql db with my own custom made id generator. The generator would produce a random number with a checkdigit at the end. So this:
SELECT nextval('customers')
would be replaced by something like this:
SELECT get_new_rand_id('customer')
The function would then return a numerical value such as: [1-9][0-9]{9} where the last digit is a checksum.
The concerns I have is:
How do I make the thing atomic
How do I avoid returning the same id twice (this would be caught by trying to insert it into a column with unique constraint but then its to late to I think)
Is this a good idea at all?
Note1: I do not want to use uuid since it is to be communicated with customers and 10 digits is far simpler to communicate than the 36 character uuid.
Note2: The function would rarely be called with SELECT get_new_rand_id() but would be assigned as default value on the id-column instead of nextval().
EDIT: Ok, good discussusion below! Here are some explanation for why:
So why would I over-comlicate things this way? The purpouse is to hide the primary key from the customers.
I give each new customer a unique
customerId (generated serial number in
the db). Since I communicate that
number with the customer it is a
fairly simple task for my competitors
to monitor my business (there are
other numbers such as invoice nr and
order nr that have the same
properties). It is this monitoring I
would like to make a little bit
harder (note: not impossible but
harder).
Why the check digit?
Before there was any talk of hiding the serial nr I added a checkdigit to ordernr since there were klumbsy fingers at some points in the production, and my thought was that this would be a good practice to keep in the future.
After reading the discussion I can certainly see that my approach is not the best way to solve my problem, but I have no other good idea of how to solve it, so please help me out here.
Should I add an extra column where I put the id I expose to the customer and keep the serial as primary key?
How can I generate the id to expose in a sane and efficient way?
Is the checkdigit necessary?
For generating unique and random-looking identifiers from a serial, using ciphers might be a good idea. Since their output is bijective (there is a one-to-one mapping between input and output values) -- you will not have any collisions, unlike hashes. Which means your identifiers don't have to be as long as hashes.
Most cryptographic ciphers work on 64-bit or larger blocks, but the PostgreSQL wiki has an example PL/pgSQL procedure for a "non-cryptographic" cipher function that works on (32-bit) int type. Disclaimer: I have not tried using this function myself.
To use it for your primary keys, run the CREATE FUNCTION call from the wiki page, and then on your empty tables do:
ALTER TABLE foo ALTER COLUMN foo_id SET DEFAULT pseudo_encrypt(nextval('foo_foo_id_seq')::int);
And voila!
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> insert into foo (foo_id) values(default);
pg=> select * from foo;
foo_id
------------
1241588087
1500453386
1755259484
(4 rows)
I added my comment to your question and then realized that I should have explained myself better... My apologies.
You could have a second key - not the primary key - that is visible to the user. That key could use the primary as the seed for the hash function you describe and be the one that you use to do lookups. That key would be generated by a trigger after insert (which is much simpler than trying to ensure atomicity of the operation) and
That is the key that you share with your clients, never the PK. I know there is debate (albeit, I can't understand why) if PKs are to be invisible to the user applications or not. The modern database design practices, and my personal experience, all seem to suggest that PKs should NOT be visible to users. They tend to attach meaning to them and, over time, that is a very bad thing - regardless if they have a check digit in the key or not.
Your joins will still be done using the PK. This other generated key is just supposed to be used for client lookups. They are the face, the PK is the guts.
Hope that helps.
Edit: FWIW, there is little to be said about "right" or "wrong" in database design. Sometimes it boils down to a choice. I think the choice you face will be better served by leaving the PK alone and creating a secondary key - just that.
I think you are way over-complicating this. Why not let the database do what it does best and let it take care of atomicity and ensuring that the same id is not used twice? Why not use a postgresql SERIAL type and get an autogenerated surrogate primary key, just like an integer IDENTITY column in SQL Server or DB2? Use that on the column instead. Plus it will be faster than your user-defined function.
I concur regarding hiding this surrogate primary key and using an exposed secondary key (with a unique constraint on it) to lookup clients in your interface.
Are you using a sequence because you need a unique identifier across several tables? This is usually an indication that you need to rethink your table design, and those several tables should perhaps be combined into one, with an autogenerated surrogate primary key.
Also see here
How you generate the random and unique ids is a useful question - but you seem to be making a counter productive assumption about when to generate them!
My point is that you do not need to generate these id's at the time of creating your rows, because they are essentially independent of the data being inserted.
What I do is pre-generate random id's for future use, that way I can take my own sweet time and absolutely guarantee they are unique, and there's no processing to be done at the time of the insert.
For example I have an orders table with order_id in it. This id is generated on the fly when the user enters the order, incrementally 1,2,3 etc forever. The user does not need to see this internal id.
Then I have another table - random_ids with (order_id, random_id). I have a routine that runs every night which pre-loads this table with enough rows to more than cover the orders that might be inserted in the next 24 hours. (If I ever get 10000 orders in one day I'll have a problem - but that would be a good problem to have!)
This approach guarantees uniqueness and takes any processing load away from the insert transaction and into the batch routine, where it does not affect the user.
Your best bet would probably be some form of hash function, and then a checksum added to the end.
If you're not using this too often (you do not have a new customer every second, do you?) then it is feasible to just get a random number and then try to insert the record. Just be prepared to retry inserting with another number when it fails with unique constraint violation.
I'd use numbers 1000000 to 999999 (900000 possible numbers of the same length) and check digit using UPC or ISBN 10 algorithm. 2 check digits would be better though as they'll eliminate 99% of human errors instead of 9%.

Partial inserts with Cassandra and Phantom DSL

I'm building a simple Scala Play app which stores data in a Cassandra DB using the Phantom DSL driver for Scala. One of the nice features of Cassandra is that you can do partial updates i.e. so long as you provide the key columns, you do not have to provide values for all the other columns in the table. Cassandra will merge the data into your existing record based on the key.
Unfortunately, it seems this doesn't work with Phantom DSL. I have a table with several columns, and I want to be able to do an update, specifying values just for the key and one of the data columns, and let Cassandra merge this into the record as usual, while leaving all the other data columns for that record unchanged.
But Phantom DSL overwrites existing columns with null if you don't specify values in your insert/update statement.
Does anybody know of a work-around for this? I don't want to have to read/write all the data columns every time, as eventually the data columns will be quite large.
FYI I'm using the same approach to my Phantom coding as in these examples:
https://github.com/thiagoandrade6/cassandra-phantom/blob/master/src/main/scala/com/cassandra/phantom/modeling/model/GenericSongsModel.scala
It would be great to see some code, but partial updates are possible with phantom. Phantom is an immutable builder, it will not override anything with null by default. If you don't specify a value it won't do anything about it.
database.table.update.where(_.id eqs id).update(_.bla setTo "newValue")
will produce a query where only the values you've explicitly set to something will be set to null. Please provide some code examples, your problem seems really strange as queries don't keep track of table columns to automatically add in what's missing.
Update
If you would like to delete column values, e.g set them to null inside Cassandra basically, phantom offers a different syntax which does the same thing:
database.table.delete(_.col1, _.col2).where(_.id eqs id)`
Furthermore, you can even delete map entries in the same fashion:
database.table.delete(_.props("test"), _.props("test2").where(_.id eqs id)
This assumes props is a MapColumn[Table, Record, String, _], as the props.apply(key: T) is typesafe, so it will respect the keytype you define for the map column.

Discard values while inserting and updating data using slick

I am using slick with play2.
I have multiple fields in the database which are managed by the database. I don't want to create or update them, however I want to get them while reading the values.
For example, suppose I have
case class MappedDummyTable(id: Int, .. 20 other fields, modified_time: Optional[Timestamp])
which maps Dummy in the database. modified_time is managed by the database.
The problem is during insert or update, I create an instance of MappedDummyTable without the modified time attribute and pass it to slick for create/update like
TableQuery[MappedDummyTable].insert(instanceOfMappedDummyTable)
For this, Slick creates query as
Insert INTO MappedDummyTable(id,....,modified_time) Values(1,....,null)
and updates the modified_time as NULL, which I don't want. I want Slick to ignore the fields while updating and creating.
For updating, I can do
TableQuery[MappedDummyTable].map(fieldsToBeUpdated).update(values)
but this leads to 20 odd fields in the map method which looks ugly.
Is there any better way?
Update:
The best solution that I found was using multiple projection. I created one projection to get the values and another to update and insert the data
maybe you need to write some triggers in table if you don't want to write code like row => (row.id,...other 20 fields)
or try use None instead of null?
I believe that the solution with mapping non-default field is the only way to do it with Slick. To make it less ugly you can define function ignoreDefaults on MappedDummyTable that will return only non default value and function in companion object to MappedDummyTable case class that returns projection
TableQuery[MappedDummyTable].map(MappedDummyTable.ignoreDefaults).insert(instanceOfMappedDummyTable.ignoreDefaults)

duplicate primary key in return table created by select union

I have the following query called searchit
SELECT 2 AS sourceID, BLOG_COMMENTS.bID, BLOG_TOPICS.Topic_Title,
BLOG_TOPICS.LFD, BLOG_TOPICS.LC,
BLOG_COMMENTS.Comment_Narrative
FROM BLOG_COMMENTS INNER JOIN BLOG_TOPICS
ON BLOG_COMMENTS.bID = BLOG_TOPICS.bID
WHERE (BLOG_COMMENTS.Comment_Narrative LIKE #Phrase)
This query executes AND returns the correct results in the query builder!
HOWEVER, the query needs to run in code-behind, so I have the following line:
DataTable blogcomments = btad.searchit(aphrase);
There are no null fields in any row of any column in EITHER of the tables. The tables are small enough I can easily detect null data. Note that bID is key for blog_topics and cID is key for blog comments.
In any case, when I run this I get the following error:
Failed to enable constraints. One or more rows contain values
violating non-null, unique, or foreign-key constraints.
Tables have a 1 x N relationship, many comments for each blog entry. IF I run the query with DISTINCT and remove the Comment_Narrative from the return fields, it returns data correctly (but I need the other rows!) However, when I return the other rows, I get the above error!
I think tells me that there is a constraint on the return table that I did not put there, therefore it must somehow be inheriting that constraint from the call to the query itself because one of the tables happens to have a primary key defined (which it MUST have). But why does the query work fine in the querybuilder? The querybuilder does not care that bID is duped in the result (and it should not be), but the code-behind DOES care.
Addendum:
Just as tests,
I removed the bID from the return list and I still get the error.
I removed the primary key from blog_topics.bID and I get the same error.
This kinda tells me that it's not the fact that my bID is duped that is causing the problem.
Another test:
I went into the designer code (I know it's nasty, I'm just desperate).
I added the following:
// zzz
try
{
this.Adapter.Fill(dataTable);
}
catch ( global::System.Exception ex )
{
}
Oddly enough, when I run it, I get the same error as before AND it doesn't show the changes I've made in the error message:
Line 13909: }
Line 13910: BPLL_Dataset.BLOG_TOPICSDataTable dataTable = new BPLL_Dataset.BLOG_TOPICSDataTable();
Line 13911: this.Adapter.Fill(dataTable);
Line 13912: return dataTable;
Line 13913: }
I'm stumped.... Unless maybe it sees I'm not doing anything in the try catch and is optimizing for me.
Another addendum:
Suspecting that it was ignoring the test code I added to the designer, I added something to the catch. It produces the SAME error and acts like it does not see this code. (Well, okay, it DOES NOT see this code, because it prints out same as before into the browser.)
// zzz
try
{
this.Adapter.Fill(dataTable);
}
catch ( global::System.Exception ex )
{
System.Web.HttpContext.Current.Response.Redirect("errorpage.aspx");
}
The thing is, when I made the original post, I was ALREADY trying to do a work-around. I'm not sure how far I can afford to go down the rabbit hole. Maybe I read the whole mess into C# and do all the joins and crap myself. I really hate to do that, because I've only recently gotten out of the habit, but I perceive I'm making a good faith effort to use the the tool the way God and Microsoft intended. From wit's end, tff.
You don't really show how you're running this query from C# ... but I'm assuming either as a straight text in a SqlCommand or it's being done by some ORM ... Have you attempted writing this query as a Stored Procedure and calling it that way? The stored Procedure would be easier to test and run by itself with sample data.
Given the fact that the error is mentioning null values I would presume that, if it is a problem with the query and not some other element of your code, then it'd have to be on one of the following fields:
BLOG_COMMENTS.bID
BLOG_TOPICS.bID
BLOG_COMMENTS.Comment_Narrative
If any of those fields are Nullable then you should be doing a COALESCE or an ISNULL on them before using them in any comparison or Join. It's situations like these which explain why most DBAs prefer to have as few nullable columns in tables as possible - they cause overhead and are prone to errors.
If that still doesn't fix your problem, then COALESCE/ISNULL all fields that are nullable and are being returned by this query. Take all null values out of the equation and just get the thing working and then, if you really need the null values to be null, go back through and remove the COALESCE/ISNULLs one at a time until you find the culprit.
My problem came from ignorance and a bit of dullness. I did not realize that just because a field is a key in the sql table does mean it has to be a key in the tableadapter. If one has a key field defined in the SQL table and then creates a table adapter, the corresponding field in the adapter will also be a key. All I had to do was unset the key field in the tableadapter and it worked.
Solution:
Select the key field in the adapter.
Right click
Select "Delete Key" (keeps the field, but removes the "key" icon)
That's it.