Is there any way to avoid PostgreSQL placing the updated row as the last row?

Well, my problem is that each time I update a row, the row moves to the last place in the table, regardless of where it was placed before.
I've read in this post, Postgresql: row number changes on update, that rows in a relational table are not sorted. Then why, when I execute a select * from table;, do I always get the same order?
Anyway, I don't want to start a discussion about that; I just want to know whether there is any way to keep the UPDATE statement from moving the row to the last place.
Edit for more info:
I don't really want to get all the results at once. I have programmed two buttons in Java, next and previous, and, still being a beginner, the only way I had to get the next or previous row was to use select * from table limit 1 with offset num++ or offset num-- depending on the button clicked. So, when I execute the update, I lose the initial order (insertion order).
Thanks.

You could make some space in your tables for updates: change the fillfactor from the default 100% (complete packing, no space left on a page for updates) to something lower, so each page keeps room for updated rows.
From the manual (create table):
fillfactor (integer)
The fillfactor for a table is a percentage between 10 and 100. 100 (complete packing) is the default. When a smaller fillfactor is specified, INSERT operations pack table pages only to the indicated percentage; the remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page as the original, which is more efficient than placing it on a different page. For a table whose entries are never updated, complete packing is the best choice, but in heavily updated tables smaller fillfactors are appropriate. This parameter cannot be set for TOAST tables.
But without an ORDER BY in your query, there is no guarantee that a result set will be sorted the way you expect it to be sorted. No fill factor can change that.
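As a hedged illustration of both points (mytable and its id primary key are made-up names):

ALTER TABLE mytable SET (fillfactor = 70);  -- leave 30% of each page free for UPDATEs
-- this only affects future writes; VACUUM FULL mytable rewrites existing pages

-- for stable paging, make the order explicit instead of relying on physical order:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET 42;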


What kind of index should I use in postgresql for a column with 3 values

I have a table with 100Mil+ records and 237 fields.
One of the fields is a varchar(1) field with three possible values (Y, N, I).
I need to find all of the records with N.
Right now I have a b-tree index built and the query below takes about 20 min to run.
Is there another index I can use to get better performance?
SELECT * FROM tableone WHERE export_value='N';
Assuming your values are roughly evenly distributed (say, at least 15% of each value) and spread throughout the table (some physically at the beginning, some in the middle, some at the end), then no.
If you think about it you'll see why. You'll have to look up tens of millions of disk blocks in the index and then fetch them from the disk one by one. By the time you have done that, it would have been quicker to just scan the whole table and pick out the values as they match. The planner knows this and would probably not use the index at all.
However - if you only have 17 rows with "N", or they were all added to the table very recently and so physically happen to be close to each other, then yes, an index can help.
If you only had a few rows with "N" you would have mentioned it, so we can ignore that one.
If however you mostly insert to this table you might find a BRIN index helpful. That can let the planner see that e.g. the first 80% of your table doesn't have any "N" blocks and so it just needs to look at the last bit.
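As a hedged sketch (the index names are invented; tableone and export_value come from the question):

-- BRIN index: tiny, and effective when matching rows cluster physically
CREATE INDEX tableone_export_value_brin ON tableone USING brin (export_value);

-- if rows with 'N' turn out to be rare, a partial b-tree index stays small
-- and covers exactly the rows the query wants:
CREATE INDEX tableone_export_n_idx ON tableone (export_value) WHERE export_value = 'N';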

How to check if the stream of rows has ended

Is there a way for me to know if the stream of rows has ended? That is, if the job is on the last row?
What I'm trying to do is, for every 10 rows, do something; my problem is the last rows. For example, with 115 rows, the last 5 won't be processed, but I need them to be.
There is no built-in functionality in Talend which tells you if you're on the last row. You can work around this using one of the following:
- Get the row count beforehand. For instance, if you have a file, you can use tFileRowCount to count the number of rows; then, when you process your file, you keep a variable with your current row number, so you can tell when you've reached the last row. If your data come from a database, you could either issue a query that returns the total number of rows beforehand, or modify your main query to return the total number of rows in an additional column and use that (using ranking functions; see the SQL sketch after this list).
- Do some processing after the subjob has ended. There may be situations where you need special processing for the last row; you can achieve this by getting the last row processed by the previous subjob (which you have already saved, for instance, by putting a tSetGlobalVar after your target; when your subjob is done, your variable contains the last written value).
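For the database case, one hypothetical shape for such a query (mytable and id are made-up names):

SELECT t.*,
       ROW_NUMBER() OVER (ORDER BY id) AS row_num,
       COUNT(*) OVER () AS total_rows
FROM mytable t;
-- in the job, the current row is the last one exactly when row_num = total_rows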
Edit
For your use case, what you could do is first store the result of the API call in memory using tHashOutput, then read it with a tHashInput in order to process it; you'll then know how many rows you retrieved using tHashOutput's global variable tHashOutput_X_NB_LINE.

How does postgres store a row in a page when the row size exceeds the available free space in the page?

I am exploring the storage mechanism of postgres. I know that postgres uses a page-like structure (each of size 8KB) to store rows, and that one page can contain more than one row. I also know that postgres does TOASTing when a row cannot be contained in a given page.
But I am not certain about the following scenario:
There's only 1KB of space left in the current page, and the size of a newly created row exceeds 1KB. What will happen in that case? Will a new page be allocated for that row, leaving the old page with unused space? Or will the old page's remaining space be used later, when another row of size 1KB or less is created?
I am referring to TOAST. The following paragraph is a bit unclear:
When a row that is to be stored is "too wide" (the threshold for that is 2KB by default), the TOAST mechanism first attempts to compress any wide field values. If that isn't enough to get the row under 2KB, it breaks up the wide field values into chunks that get stored in the associated TOAST table. Each original field value is replaced by a small pointer that shows where to find this "out of line" data in the TOAST table. TOAST will attempt to squeeze the user-table row down to 2KB in this way, but as long as it can get below 8KB, that's good enough and the row can be stored successfully.
Why does it talk about two sizes, 8KB and 2KB? Why does postgres check against the 2KB threshold?
Thanks in advance.
First, I should clarify that “enough room in the table page” has nothing to do with whether an attribute is TOASTed or not.
The paragraph you quote describes how TOAST tries to reduce the size of a table row that exceeds 2KB by first compressing the values and then storing them “out of line” in a TOAST table.
The idea is to reduce the size such that a row does not use up more than a quarter of the space in a table block. But if that fails, and the row ends up bigger than 2KB after TOASTing, that is no problem either, as long as the resulting row fits into one 8KB block.
A table row is always stored in a single table block. If there is not enough space left in any existing block, a new table block is allocated and the existing blocks are left with some empty space. This empty space can still be used for other, smaller new rows.
The limits of 8KB for a table block and 2KB for the TOAST threshold are somewhat arbitrary and based on experience. You can change them if you are ready to recompile PostgreSQL (the table block size is a compile-time option; what you can choose at initdb time from PostgreSQL v11 on is the WAL segment size), but I have not heard any reports that changing them is a good idea.
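If you want to see this machinery from SQL, the system catalogs expose it; a small sketch (mytable is a stand-in name):

-- which TOAST table (if any) backs the table
SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'mytable';

-- per-column storage strategy: p = plain, m = main, e = external, x = extended
SELECT attname, attstorage FROM pg_attribute
WHERE attrelid = 'mytable'::regclass AND attnum > 0;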

Erroneous duplicate rows in BIRT report

I have a problem with a BIRT report I'm working on, where I have a nested table in the report. The outer table contains data to do with an item on an invoice, while the inner table contains price banding for labor charges. I've written a separate DataSet which gets the inner data, bound by parameters to data in the outer table. When I preview the inner DataSet in BIRT using the defaults I've given it, it returns two rows of data for that bill number & item number - a normal rate & an overtime rate, if you like. But when I run the report in full over the same data, the outer table is fine, while the inner table just repeats the first row twice.
This is sorta what the table looks like in layout view:
Item Description Rate Quantity Item total
[item] [desc] [rate] [quantity] [total]
...where the price & quantity are in the inner table.
I'd have expected to see something like:
Item Description Rate Quantity Item Total
1 Callout $40 1 $40
2 Labor $30 4.5 $185
$50 1
but instead I get more like:
Item Description Rate Quantity Item Total
1 Callout $40 1 $40
2 Labor $30 4.5 $185
$30 4.5
...even though querying the database & previewing the inner data set based on the same input criteria show the expected result.
Has anyone else had experience like this? I have a hunch it's to do with bindings, but not sure what.
One way to get this behavior is by accidentally replacing a table-level binding with a column-level binding.
For example, define a table by dragging a data set into the report. Select the entire table (use the outline view, or select something in the table and then click on the "Table" button that pops up just below the grid.) Then go to the Binding tab. Note that the data set and column bindings are all filled in.
Now select just one field in the Detail row. On the Binding tab, note that the Data Set is blank, and no column binding is shown. Someone who is confused by this (as I was) might then edit the column's binding and specify the same Data Set that was used to create the table. If you do this you will only see a single value repeated in that column when you run the report. (I believe the overridden column is binding to a second instance of the data set, not the one the table is iterating over.)
Not sure your question can be answered without looking at the data and the design. But it is important to note that the results you see in the dataset preview are not necessarily what you would see if the query were run fully. I have seen differences with just 7 records returned: I thought that, with only 7, a full run would be the same, but it wasn't. The preview is not just a top-500 query; it applies some other filters too (I'm not sure which).
To work out whether the problem is your query or your binding:
If you are using a SQL database, run the SQL in an SSMS query and see if you get the same results as you do in the inner table.
Alternatively, create a new test report, copy over your dataset and use it with a stand-alone table.
I think I've sorted it, and this is the most bizarre thing: in the layout view, I'd been deleting the child table's header & footer rows and leaving just the detail row. Last thing today, just before I was going to go home, I tried again - deleted the table for about the 70th time that day, replaced it, re-did the parameter bindings exactly as before, but this time I left the header & footer rows intact. Clicked the preview tab and, voila, everything shows up correctly. So, since I didn't need the header or footer on the child table, I went into properties, clicked "Hide this element", previewed again - all good. No difference to the data bindings, mappings, or data sets - the only difference was leaving the header & footer in place but hidden.
Contemplating making a bug report, tbh.

db2: select from table without replacement

Hi, I would just like to ask a simple question: is there a way in DB2 to select a row from a table (whether based on a join or selecting a random row), and then select from the same table again such that the last row, or any previously selected row, cannot be chosen?
I am thinking I would have to loop my code through each row in the table and delete each row I select, but I would be interested if anyone has an alternative solution. No code needed; just describe another approach.
Thanks,
Arron
The simplest way of doing this is to declare a cursor to select all rows from the table, then process the cursor one row at a time. Each row will be selected exactly once (this is pretty much what a cursor is all about).
I suspect that is not the answer you were looking for. You most likely have at least two other constraints on this selection problem:
- You do not want, or cannot, have a single cursor open until the entire table has been processed
- You want some sort of "randomness" with respect to the order in which rows are selected
The problem of not being able to open and process the entire table under a single cursor can be solved by maintaining some sort of "state" information between selections. The "state" can be used to determine whether a row is still eligible for selection on subsequent inquiries. You might add another column to the table to hold the "selected" state of each row. When a row is inserted, its "selected" state is set to 'no'. On each select operation, the state of the selected row is updated to 'yes'. The predicate that selects new rows then needs a WHERE SELECT_STATE = 'no' added to it to disqualify previously selected rows. If you cannot change the structure of the table you are selecting from, add a second table having the same primary key as the selection table plus the "selected" indicator, then join these tables to obtain the required state information.
Another approach is to delete a row once it has been selected.
These, or some similar kind of state management, can be used to solve the selection-eligibility problem; a sketch of the "selected" column variant follows.
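A hypothetical sketch (mytable and its id key are made-up names):

-- one-time change: a state column marking rows already handed out
ALTER TABLE mytable ADD COLUMN select_state CHAR(3) NOT NULL DEFAULT 'no';

-- pick one still-eligible row...
SELECT id FROM mytable WHERE select_state = 'no' FETCH FIRST 1 ROW ONLY;

-- ...and disqualify it from future picks (using the id just fetched)
UPDATE mytable SET select_state = 'yes' WHERE id = ?;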
If you need to introduce randomness into the selection process (i.e. make it difficult to guess which row will be selected next), then you have a very different problem to solve. If this is the case, please ask a new question outlining the approximate size of your table (how many rows) and what the key structure is (e.g. a number between 1 and 100000, a 30-character name, etc.).
You can use a cursor with the 'delete where current of' feature, called a positioned delete. For more information:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r5/topic/com.ibm.db2.luw.sql.ref.doc/doc/r0000939.html
http://mysite.verizon.net/Graeme_Birchall/cookbook/DB2V97CK.PDF page 55
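A hedged embedded-SQL sketch of such a positioned delete (mytable is a stand-in name; host-variable declarations omitted):

DECLARE c1 CURSOR FOR SELECT id FROM mytable FOR UPDATE;
OPEN c1;
FETCH c1 INTO :picked_id;                 -- consume the next row
DELETE FROM mytable WHERE CURRENT OF c1;  -- it can never be selected again
CLOSE c1;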