SWT/JFace table performance with many columns

I have a table with approximately 100 columns. The number of rows is not very large, about 300, but its performance is not very good.
Scrolling (especially horizontal) and adding new rows proceed with visible lags. I do not use SWT.VIRTUAL, as it only helps when there is a huge number of rows. I'm aware of alternative table implementations in SWT, but my question is specifically about the SWT/JFace implementation. Is there any way to improve performance for a table with a fairly large number of columns?

As you already stated, SWT.VIRTUAL only helps with a large number of rows. AFAIK there isn't much you can do to improve the handling of a large number of columns.
... other than reducing the number of columns. Performance aside, are you sure that 100 columns will lead to a good user experience? Wouldn't it be better to choose a different way of presenting the information?
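If adding rows is the part that visibly lags, one mitigation worth trying (independent of the column count, and not a cure for slow horizontal scrolling) is to suspend redraws while populating the table, so it repaints once per batch instead of once per row. A minimal sketch; the helper and its arguments are illustrative, not part of your code:

import org.eclipse.swt.SWT;
import org.eclipse.swt.widgets.Table;
import org.eclipse.swt.widgets.TableItem;

public final class TableBatchUpdate {
    // Adds rows while redraws are suspended; each inner array holds one cell text per column.
    public static void addRows(Table table, String[][] rows) {
        table.setRedraw(false);
        try {
            for (String[] row : rows) {
                TableItem item = new TableItem(table, SWT.NONE);
                item.setText(row);
            }
        } finally {
            table.setRedraw(true);
        }
    }
}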

Related

Ag-grid initializing grid with thousands of columns

I am initializing a grid that contains more than a few thousand columns. The columns are defined by a user of my system and there is no limit to what they can define.
The performance is extremely slow: approximately 5 minutes for the grid to render. Is there anything I can do to improve the rendering of the table?
It appears as though the columns go through a method called recursivelyCreateColumns which in my case gets called thousands of times.
Any help is appreciated

Create view or table

I have a table that will have about 3 × 10^12 rows (3 trillion), but with only 3 attributes.
This table will hold the IDs of 2 individuals and the similarity between them (a number between 0 and 1 that I multiplied by 100 and stored as a smallint to save space).
For a certain individual that I want to research, I need to summarize this data and return how many individuals have up to 10% similarity, 20%, 30%, and so on. These thresholds are fixed (every 10%) up to identical individuals (100%).
However, as you may know, the query will be very slow, so I thought about:
Create a new table to save summarized values
Create a VIEW to save these values.
As there are about 1.7 million individuals, the search would not be so time-consuming (if indexed, it returns quite fast). So, what should I do?
I would like to point out that my population will be almost fixed (after the DB is fully populated, almost no further growth is expected).
A view won't help, but a materialized view sounds like it would fit the bill, if you can afford a sequential scan of the large table whenever the materialized view gets updated.
It should probably contain a row per user with a count for each percentile range.
Alternatively, you could store the aggregated data in an independent table that is updated by a trigger on the large table whenever something changes there.
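For illustration, a minimal sketch of the materialized-view approach; the table and column names (similarities, id_a, similarity stored as 0-100) are assumptions based on the description:

-- One row per individual and 10%-wide similarity bucket.
CREATE MATERIALIZED VIEW similarity_summary AS
SELECT id_a AS individual,
       similarity / 10 AS decile,   -- 0 = 0-9%, ..., 10 = exactly 100%
       count(*) AS n
FROM similarities
GROUP BY id_a, similarity / 10;

CREATE INDEX ON similarity_summary (individual);

-- Re-run after the big table changes; this rescans the large table:
REFRESH MATERIALIZED VIEW similarity_summary;

-- Cumulative "up to X%" counts can then be derived with a window function:
SELECT individual,
       decile,
       sum(n) OVER (PARTITION BY individual ORDER BY decile) AS up_to
FROM similarity_summary;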

Designed PostGIS Database...Points table and polygon tables...How to make more efficient?

This is a conceptual question, but I should have asked it long ago on this forum.
I have a PostGIS database, and I have many tables in it. I have done some research on the use of keys in databases, but I'm not sure how to incorporate keys in the case of point data that is dynamic and increases with time.
I'm storing point data in one table, and this data grows each day. It's about 10 million rows right now and will probably grow about 10 million rows each year or so. There are lat, lon, time, and the_geom columns.
I have several other tables, each representing different polygon groups (converted shapefiles to tables with shp2pgsql), like counties, states, etc.
I'm writing queries that relate the point data to the spatial tables to see if points are inside of the polygons, resulting in things like "55 points in X polygon in the past 24 hours", etc.
The problem is, I don't have a key that relates the point table to the other tables. I think this is probably inhibiting query efficiency, but I'm not sure.
I know this question is fairly vague, and I'm happy to clarify anything, but I basically have a bunch of points in a table that I'm spatially comparing to other tables, and I'm trying to find the best way to design things.
Thanks for any help!
If you don't already have them, you should build a spatial index on both the point and polygon tables.
Anyway, spatial comparisons are usually slower than numerical comparisons.
So adding one or more keys to the point table referencing the other tables, and using them in your SELECT queries instead of spatial operations, will surely speed things up.
Obviously, inserts will be slower, but given the numbers you gave (10 million per year), that should not be a problem.
Probably just adding a foreign key to the smallest entities (cities, for example) and joining the others (counties, states...) to get results will be faster than spatial comparison.
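A minimal sketch of both suggestions; the table and column names (points, counties, the_geom, gid as the polygon primary key, time) are assumptions based on the question:

-- Spatial indexes on the point and polygon tables:
CREATE INDEX ON points USING gist (the_geom);
CREATE INDEX ON counties USING gist (the_geom);

-- One-off assignment of the containing polygon to each point
-- (a trigger can keep it up to date for newly inserted points):
ALTER TABLE points ADD COLUMN county_id integer REFERENCES counties (gid);
UPDATE points p
SET county_id = c.gid
FROM counties c
WHERE ST_Contains(c.the_geom, p.the_geom);
CREATE INDEX ON points (county_id);

-- Later queries join on the key instead of repeating the spatial test:
SELECT count(*)
FROM points
WHERE county_id = 42
  AND "time" > now() - interval '24 hours';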
Foreign keys (and other constraints) are not needed to query. Rather, they arise as a consequence of whatever design is appropriate to the application per principles of good design.
They just tell the DBMS that a list of values under a list of columns in a table also appears elsewhere as a list of values under a list of columns in some table. (For avoiding errors and improving optimization.)
You would still want indices on columns that will be involved in joins. E.g. you might want X coordinates in two tables to have sorted indices, in the same order. This is independent of whether one column's values form a subset of another's, i.e. whether a foreign key constraint holds between them.

Insert Performance Benchmarks [closed]

It would be extremely useful to have some idea of expected performance benchmarks for inserts in a PostgreSQL database. Typically the answers one gets on this are vague, and in many ways rightly so. For example, answers could range from "every database is different", to "it depends on the number of indexes/columns", to "hardware makes a big difference", to "DB tuning makes a big difference", etc. My goal is to know the general guidelines of insert performance, roughly at the level of an experienced SQL developer's intuition saying "this seems slow, I should try to optimize this".
Let me illustrate, someone could ask how much does it cost to buy a house? We answer, expensive! And there are many factors that go into the price such as size of the house and location in the country. BUT, to the person asking the question, they might think $20,000 is a lot of money so houses must cost about that much. Saying it's expensive and there are a lot of variables obviously doesn't help the person asking the question much. It would be MUCH more helpful for someone to say, in general the "normal" cost of houses ranges from $100K-$1M, the average middle-class family can afford a house between $200K and $500K, and a normal cost per square foot is $100/square foot.
All that to say, I'm looking for ballpark performance benchmarks on inserts for the following factors:
Inserting 1,000, 10,000, or 100,000 rows into an average table of 15 columns.
Rough effect of every additional 5 columns added to the table
Rough effect of each index on the table
Effect of special types of indexes
Any other ideas that people have
I'm fine with gut feel answers on these if you are an experienced postgresql performance tuner.
You cannot get a meaningful figure for the conditions you specified, because they do not even include the types of conditions that would have a profound effect on the speed of the INSERT command:
Hardware capabilities:
CPU speed + number of cores
storage speed
memory speed and size
Cluster architecture, in case the batch is huge and spans more than one node
Execution scenario:
text batch, with pre-generated inserts one-by-one
direct stream-based insert
insert via a specific driver, like an ORM
In addition, the insert speed can be:
maintained (consistent or average) speed
single-operation speed, i.e. for a single batch execution
You can always find a combination of such criteria so bad that you would struggle to do 100 inserts per second, and on the other side it is possible to go over 1 million inserts per second in a properly set up environment and execution plan.
So you will find the speed of your implementation somewhere in between, but given only the known conditions, the speed will be 42 :)
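That said, it is easy to get a rough number for your own environment. A minimal psql sketch (the table shape and row counts are illustrative); extend it with extra columns or indexes to see their relative cost on your hardware:

\timing on

CREATE TABLE insert_bench (
    id    bigint,
    col1  text,
    col2  numeric,
    col3  timestamptz
);

-- 100,000 rows in a single multi-row statement:
INSERT INTO insert_bench
SELECT g, md5(g::text), random() * 1000, now()
FROM generate_series(1, 100000) AS g;

-- Repeat after adding an index to measure its overhead:
CREATE INDEX ON insert_bench (col1);
INSERT INTO insert_bench
SELECT g, md5(g::text), random() * 1000, now()
FROM generate_series(100001, 200000) AS g;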

is kdb fast solely due to processing in memory

I've heard people talk quite a few times about kdb dealing with millions of rows in nearly no time. Why is it that fast? Is that solely because the data is all organized in memory?
Another thing: are there alternatives for this? Do any big database vendors provide in-memory databases?
A quick Google search came up with the answer:
Many operations are more efficient with a column-oriented approach. In particular, operations that need to access a sequence of values from a particular column are much faster. If all the values in a column have the same size (which is true, by design, in kdb), things get even better. This type of access pattern is typical of the applications for which q and kdb are used.
To make this concrete, let's examine a column of 64-bit, floating point numbers:
q).Q.w[] `used
108464j
q)t: ([] f: 1000000 ? 1.0)
q).Q.w[] `used
8497328j
q)
As you can see, the memory needed to hold one million 8-byte values is only a little over 8MB. That's because the data are being stored sequentially in an array. To clarify, let's create another table:
q)u: update g: 1000000 ? 5.0 from t
q).Q.w[] `used
16885952j
q)
Both t and u are sharing the column f. If q organized its data in rows, the memory usage would have gone up another 8MB. Another way to confirm this is to take a look at k.h.
Now let's see what happens when we write the table to disk:
q)`:t/ set t
`:t/
q)\ls -l t
"total 15632"
"-rw-r--r-- 1 kdbfaq staff 8000016 May 29 19:57 f"
q)
16 bytes of overhead. Clearly, all of the numbers are being stored sequentially on disk. Efficiency is about avoiding unnecessary work, and here we see that q does exactly what needs to be done when reading and writing a column - no more, no less.
OK, so this approach is space efficient. How does this data layout translate into speed?
If we ask q to sum all 1 million numbers, having the entire list packed tightly together in memory is a tremendous advantage over a row-oriented organization, because we'll encounter fewer misses at every stage of the memory hierarchy. Avoiding cache misses and page faults is essential to getting performance out of your machine.
Moreover, doing math on a long list of numbers that are all together in memory is a problem that modern CPU instruction sets have special features to handle, including instructions to prefetch array elements that will be needed in the near future. Although those features were originally created to improve PC multimedia performance, they turned out to be great for statistics as well. In addition, the same synergy of locality and CPU features enables column-oriented systems to perform linear searches (e.g., in where clauses on unindexed columns) faster than indexed searches (with their attendant branch prediction failures) up to astonishing row counts.
Source(s): http://www.kdbfaq.com/kdb-faq/tag/why-kdb-fast
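To see the effect on your own machine, you can time the aggregation over the column created above with q's \t command (result in milliseconds; the figure will vary with hardware):

q)\t sum t`f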
As for speed, the memory thing does play a big part, but there are several other things: fast reads from disk for HDBs, splaying, etc. From personal experience I can say you can get pretty good speeds from C++, provided you want to write that much code. With kdb you get all that and some more.
Another thing about speed is the speed of coding. Steep learning curve, but once you get it, complex problems can be coded in minutes.
For alternatives you can look at OneTick, or google in-memory databases.
kdb is fast but really expensive. Plus, it's a pain to learn Q. There are a few alternatives such as DolphinDB, Quasardb, etc.