How can you generate a unique value for a field that is a concatenation of certain fields plus a random number?
e.g.
First Name: Jim
Last Name: Jones
Field Value: jimjones0345
Obviously there's a need to ensure that this value has not been used before. How would one go about this?
Assuming you're using SQL Server 2005 or later...
You might try something like:
UPDATE myTable
SET myNewColumn = FirstName + LastName
    + CONVERT(varchar(20), ABS(CAST(CAST(NEWID() AS varbinary(5)) AS bigint)));
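Note that a random suffix alone doesn't strictly guarantee uniqueness; collisions are unlikely but possible. A hedged option, assuming you control the schema, is to back the column with a UNIQUE constraint (the constraint name below is an assumption) so that a rare collision fails loudly and can be retried with a fresh random number:
-- Sketch: let the database enforce uniqueness rather than hoping the random part never repeats.
ALTER TABLE myTable
    ADD CONSTRAINT UQ_myTable_myNewColumn UNIQUE (myNewColumn);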
I’m working with identifiers in a rather unusual format: every ID has the same 25-character prefix. The only unique part is the last portion of the ID string, which has a variable length of up to ten characters:
ID
----------------------------------
lorem:ipsum:dolor:sit:amet:12345
lorem:ipsum:dolor:sit:amet:abcd123
lorem:ipsum:dolor:sit:amet:efg1
I’m looking for advice on the best strategy around indexing and matching this kind of ID string in PostgreSQL.
One approach I have considered is simply cutting off the long prefix and storing only the unique suffix in the table column.
Another option that comes to mind is only indexing the suffix:
CREATE INDEX ON books (substring(book_id FROM 26));
I don’t think this is the best idea, though, as you would need to remember to always strip the prefix when querying the table. If you forgot and used a WHERE book_id = '<full ID here>' filter, the planner would simply ignore the index.
I almost always create an integer ID for my tables, even when there is already a unique string field. To recommend the best approach for you, I would need to see all the queries you run against this database. If you are already using substring(book_id FROM 26) in your WHERE clauses, then creating an expression index (function-based index) on that expression is the best option. In general: check your table-join conditions to see which fields are used in joins, and which fields appear after WHERE in your queries, and plan your indexes from that. If your joins use the unique trailing characters of the ID field, then the best approach is to extract those unique characters and either store them in an additional column or create an expression index on the extracting function.
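For reference, a minimal sketch of both options (the books table and the index names are assumptions; the generated-column variant needs PostgreSQL 12+):
-- Expression index: the planner uses it only when the query repeats the exact expression.
CREATE INDEX books_suffix_expr_idx ON books (substring(book_id FROM 26));
SELECT * FROM books WHERE substring(book_id FROM 26) = '12345';
-- Alternative: materialize the suffix in a generated column and index that.
ALTER TABLE books
    ADD COLUMN book_suffix text GENERATED ALWAYS AS (substring(book_id FROM 26)) STORED;
CREATE INDEX books_suffix_col_idx ON books (book_suffix);
SELECT * FROM books WHERE book_suffix = '12345';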
I have a text input source. This has over 100 columns, so I won't show all of them here; a cut-down view of the data would be:
CustomerNo  DOB         DOD   Status
----------  ----------  ----  ------
01418495    01/02/1940  NULL  1
01418496    01/01/1930  NULL  1
The users want to be able to update/override any of these columns during processing by providing another input text file containing the PK (CustomerNo) and the key/value pairs of the columns to be updated, e.g.:
CustomerNo  Variable  New Value
----------  --------  ----------
01418495    DOB       01/12/1941
01418496    DOD       01/01/2021
01418496    Status    0
Can this data somehow be used to create dynamic column updates, so that the customer records are updated regardless of which columns the users want to change? In the example above this would result in:
CustomerNo  DOB         DOD         Status
----------  ----------  ----------  ------
01418495    01/12/1941  NULL        1
01418496    01/01/1930  01/01/2021  0
I have looked at the documentation but don't see any examples of how something like this could be achieved. Thanks in advance for any advice.
You would use a technique similar to the one I describe in this video: https://www.youtube.com/watch?v=q7W6J-DUuJY. What I've done is create a file of rules containing expressions, and then apply those rules dynamically inside my data flow.
The key to making this work is the expr() function, which dynamically evaluates an expression read from the external file.
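If it helps to see the same key/value override idea outside of a data flow, here is a rough SQL sketch (a different technique from the video; the customers and customer_updates table names are assumptions, all staging columns are assumed to be text, and only the three sample columns are shown):
-- Pivot the key/value rows into one row per customer, then apply only the supplied overrides.
UPDATE customers c
SET dob    = COALESCE(u.dob,    c.dob),
    dod    = COALESCE(u.dod,    c.dod),
    status = COALESCE(u.status, c.status)
FROM (
    SELECT customer_no,
           MAX(new_value) FILTER (WHERE variable = 'DOB')    AS dob,
           MAX(new_value) FILTER (WHERE variable = 'DOD')    AS dod,
           MAX(new_value) FILTER (WHERE variable = 'Status') AS status
    FROM customer_updates
    GROUP BY customer_no
) u
WHERE u.customer_no = c.customer_no;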
I can easily copy the contents of one column to another using a SQL UPDATE. But I need to do it without deleting the content that is already there; in essence, I want to append one column to another without overwriting the target's original content.
I have a column called notes. For some unknown reason, after several months, I added another column called product_notes, and two days later realised that I have two sets of notes I urgently need to merge.
Usually, when making a note, we just append to any note already there via a form. I need to merge these two columns in the same way, keeping any note in the first column, e.g.
Column notes = Out of stock Pete 040618--- ordered 200 units Jade
050618 --- 200 units received Lila 080618
and
Column product_notes = 5 units left Dave 120618 --- unit 10724 unacceptable quality noted in list Dave 130618
I need to put them together with our spacer of --- without losing the first column's content so the result needs to be like this for my test case:
Column notes = Out of stock Pete 040618--- ordered 200 units Jade
050618 --- 200 units received Lila 080618 --- 5 units left Dave 120618 --- unit 10724 unacceptable quality noted in list Dave 130618
It's simple:
update table1 set notes = notes || '---' || product_notes;
The solution provided by @MaheshHViraktamath is fine, but the problem with simple string concatenation is that if either of the items being concatenated is NULL, the whole result becomes NULL.
Another potential issue is if either field is empty. In that case you might get a result like 'field a---' or '---field b'.
To guard against the first scenario (without putting checks in the WHERE clause) you can use CONCAT_WS, like so: CONCAT_WS('---', notes, product_notes). This combines the two (or however many you put in there) fields with the first parameter, i.e. '---'. If either of those two fields is NULL, the separator won't be used, so you won't get a result with the separator prepended or appended.
There are two remaining issues. First, if both fields are NULL, the result isn't NULL but an empty string. To handle this case, wrap it in NULLIF: NULLIF(CONCAT_WS('---', notes, product_notes), ''), so that NULL is returned if both fields are NULL.
Second, if either field is empty, the separator will still be used. To guard against this scenario (only you will know whether it's worth guarding against, or even desired, based on your data), put each field in a NULLIF as well: NULLIF(CONCAT_WS('---', NULLIF(notes, ''), NULLIF(product_notes, '')), '')
As a result you get: UPDATE your_table SET notes = NULLIF(CONCAT_WS('---', NULLIF(notes, ''), NULLIF(product_notes, '')), '');
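For a quick sanity check of the NULL/empty handling, these illustrative queries (expected results in the comments) show each guard doing its job:
SELECT CONCAT_WS('---', 'a', NULL);                                   -- 'a' (NULL operand skipped)
SELECT NULLIF(CONCAT_WS('---', NULL, NULL), '');                      -- NULL rather than ''
SELECT NULLIF(CONCAT_WS('---', NULLIF('', ''), NULLIF('b', '')), ''); -- 'b' (empty string treated as NULL)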
I am trying to retrieve values from a PostgreSQL database into a variable using a WHERE clause, but I am getting an error.
The query is:
select age into x from employee where name=name.GetValue()
name is the text control in which I enter a value in the wxPython GUI.
I am getting an error saying the "name" schema doesn't exist.
What is the correct method for retrieving values?
"name.GetValue()" is a literal string, you are sending that to your db which knows nothing about wxpython and nothing about the variables in your program. You need to send the value of that data to your db, probably using bound parameters. Something like:
# cur is assumed to be a database cursor, e.g. psycopg2: cur = conn.cursor()
cur.execute("select age from employee where name = %s", [name.GetValue()])
x = cur.fetchone()[0]  # first row of the result; [0] is the age column
is probably what you're after. This creates a query with a placeholder, binds the value of name.GetValue() to that placeholder, and executes the query. The next line fetches the first row of the result and assigns the first item in that row (the age) to x.
I'm not positive what you are trying to do, but I think your issue might be syntax (misuse of INTO instead of AS):
SELECT age AS x FROM employee WHERE name = ....
I want to store an ID and a date, and I want to retrieve all entries from dateA up to dateB. What exactly do I need in order to be able to perform select from my_column_family where date >= dateA and date < dateB; ?
The folks at #cassandra (IRC) helped me find a way; there are many subtle details, so I'd like to document them here.
First you need to declare a column family similar to this (examples are from cassandra-cli):
create column family users
    with comparator = UTF8Type
    and key_validation_class = UTF8Type
    and column_metadata = [
        {column_name: id, validation_class: LongType},
        {column_name: name, validation_class: UTF8Type, index_type: KEYS},
        {column_name: age, validation_class: LongType}
    ];
A few important things about this declaration:
the comparator and key_validation_class make it possible to use strings as column names and row keys
the row key (the value in users[...]) addresses each row and therefore cannot contain duplicates (an INSERT is really an UPSERT, so when a key already exists the new values overwrite the old ones)
the name column declares a "secondary index" on its values (more on that below)
dates are stored as Long values; interpretation is up to the client
Now let's add some values:
set users['1']['name'] = 'john';
set users['1']['age'] = 19;
set users['2']['name'] = 'jane';
set users['2']['age'] = 21;
set users['3']['name'] = 'john';
set users['3']['age'] = 32;
According to this post: http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/, Cassandra does not support the < and > operators on their own. What it does is manually exclude the rows that don't match, but it does that AFTER there is a result set, and it refuses to do so unless actual filtering has taken place.
What that means is that a query like get users where age > 20; will return nothing, but if we add a predicate that includes = it will magically work.
Here's where the secondary index is important: without it you can't use =, so in this example I can do get users where name = 'jane'; but I cannot ask for get users where age = 21;
The funny thing is that after using =, the < and > operators work, so having a secondary index allows you to ask for get users where name = 'john' and age > 20; and it will filter correctly.
There are a few ways to solve this. The simplest is probably the secondary-index solution with the equality limitation mentioned in your own answer. I've used this method, adding an additional column called 'valid' and setting its value to 1. The queries can then become where valid = 1 and date > nnnn
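A rough cassandra-cli sketch of that trick (the events column family and its columns are assumptions; valid carries a KEYS index so the equality predicate can drive the query, and date is a Long as in the answer above):
create column family events
    with comparator = UTF8Type
    and key_validation_class = UTF8Type
    and column_metadata = [
        {column_name: valid, validation_class: LongType, index_type: KEYS},
        {column_name: date, validation_class: LongType}
    ];
set events['a']['valid'] = 1;
set events['a']['date'] = 1303546000;
get events where valid = 1 and date > 1303540000;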
The other solutions require additional column families and additional queries.
When loading the data, create and add to a column family that uses the timestamps as keys, where each entry lists all the matching user ids as column names.
If the partitioning strategy is ordered, then a single RangeSliceQuery can specify the date range as a key range and fetch all the columns for each key. Then iterate through the result keys, using the column values for each user id and, if needed, querying the original column family for the data associated with each id. Cassandra always stores column names sorted, and they can be read in reverse order.
But, as documented, the ordered partitioner is not ideal, leading to hot spots and difficulty in load balancing the nodes.
Without the ordered partitioner, still keeping the timestamp column family, you would have to create another column family while loading the data, in which you store all the timestamps as columns under one or more known keys (e.g. 'created' or 'updated'). The first query would be a SliceQuery for a known key; the column names (timestamps) it returns would then provide the keys for a MultigetSliceQuery against the timestamp column family.
I've used variations on this, usually adding Composite keys or columns for additional flexibility.
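As a hedged illustration of that last variant (all names here are assumptions): with a LongType comparator, the timestamps used as column names come back sorted, so a slice over a known row key yields a time range.
create column family user_by_time
    with comparator = LongType
    and key_validation_class = UTF8Type
    and default_validation_class = UTF8Type;
set user_by_time['created'][1303546000] = '1';
set user_by_time['created'][1303632400] = '2';
A SliceQuery over the 'created' row between two timestamps then returns the matching user ids, which can feed the MultigetSliceQuery described above.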