Does writing custom SQL increase the size of a TDE?
I am working in Tableau. I have observed that when I use custom SQL instead of including the table directly, the size of the TDE increases drastically.
Is this because of the custom SQL, or could there be some other reason?
Though I have not faced this issue, you can have a look at this link, which covers things to consider when creating an extract:
http://kb.tableau.com/articles/knowledgebase/tips-working-with-extracts
Related
So, basically, I have a website where people adjust filters and then click 'download', and the resulting Excel file contains the data specified by their filters. There are about 125,000+ rows in my PostgreSQL database, and I currently load them in the background using a simple
df = pd.read_sql_query('select * from real_final_sale_data', con=engine)
The only problem is that this quickly overwhelms Heroku's memory allowance on 1 dyno (512 MB), but I do not understand why this is happening or what the solution is.
For instance, when I run this on my computer and call df.info(), it shows the DataFrame only takes about 30 MB, so why does reading it suddenly consume so much more memory?
Thank you so much for your time!
So, the solution that ended up working was to push some of the filters into the SQL query itself. I had been doing a plain select * without filtering anything in SQL, and pulling roughly 120,000 rows by 30 columns put a fair bit of strain on Heroku's dyno, so it's definitely recommended to either read in chunks or do some filtering when querying the DB.
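For anyone hitting the same limit, here is a minimal sketch of both approaches, assuming the same real_final_sale_data table; the sale_date column, the parameter values, and the process() helper are placeholders for whatever your filters and export step actually are:

import pandas as pd
from sqlalchemy import create_engine

# Placeholder connection string.
engine = create_engine('postgresql://user:password@host:5432/dbname')

# Option 1: push the user's filters into the query so far fewer rows come back.
query = ('SELECT * FROM real_final_sale_data '
         'WHERE sale_date BETWEEN %(start)s AND %(end)s')
df = pd.read_sql_query(query, con=engine,
                       params={'start': '2016-01-01', 'end': '2016-12-31'})

# Option 2: read in fixed-size chunks so only one slice is in memory at a time.
for chunk in pd.read_sql_query('SELECT * FROM real_final_sale_data',
                               con=engine, chunksize=10000):
    process(chunk)  # placeholder: e.g. append each slice to the Excel export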
I have a dashboard where all the filters used in the dashboard are set as global filters, and the most frequently used ones are set as context filters.
The problem is that computing these filters takes about 1-2 minutes. How can I reduce this time?
The extract holds about 2 million rows, sourced from Oracle, with Tableau 9.3.
Adding to Aron's point, you can also use custom SQL to select only the dimensions and measures you are actually going to use in the dashboard. I have worked with large data sets where the dashboard used to take around 5-7 minutes to load; I finally ended up using custom SQL and removing unnecessary filters and parameters. :)
There are several things you can look at to guide performance optimization, but the details matter.
Custom SQL can help or hurt performance (more often hurt because it prevents some query optimizations). Context filters can help or hurt depending on user behavior. Extracts usually help, especially when aggregated.
An extremely good place to start is the following white paper by Alan Eldridge:
http://www.tableau.com/learn/whitepapers/designing-efficient-workbooks
When creating tables in Amazon Redshift, you can specify various column encodings such as MOSTLY32, BYTEDICT, or LZO. These are the compression schemes used when storing the column values on disk.
I am wondering whether my choice of encoding is supposed to make a difference in query execution time. For example, would making a column BYTEDICT rather than LZO make a difference for SELECTs, GROUP BYs, or filters?
Yes. The compression encoding used translates directly into the amount of disk storage. Generally, the less storage a column takes, the better the query performance, since less data has to be read from disk.
But which encoding is most beneficial to you depends on your data type and its distribution. There is no guarantee that LZO will always be better than BYTEDICT or vice versa. In my experience, I usually load some sample data into the intended table, then run ANALYZE COMPRESSION, and go with whatever Redshift suggests. That has worked for me.
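If you want to script that check, here is a minimal sketch using psycopg2; the connection details and my_sample_table are placeholders, while ANALYZE COMPRESSION is the standard Redshift command:

import psycopg2

# Placeholder connection details for the Redshift cluster.
conn = psycopg2.connect(host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='dev', user='admin', password='...')
conn.autocommit = True  # avoid wrapping the analysis in an open transaction
cur = conn.cursor()

# Ask Redshift to recommend an encoding for each column based on sampled rows.
cur.execute('ANALYZE COMPRESSION my_sample_table;')
for table, column, encoding, est_reduction_pct in cur.fetchall():
    print(table, column, encoding, est_reduction_pct)

cur.close()
conn.close()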
Amazon has actually released a Python script that can apply this automatically to your database. You can find the script here: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/ColumnEncodingUtility/analyze-schema-compression.py
A bit late, but likely useful to anyone taking a look here:
Amazon Redshift can now decide on the best compression to use (see "Loading Tables with Automatic Compression") if you are using a COPY command to load your table and there is no existing compression defined on it.
You just have to add COMPUPDATE ON to your COPY command.
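As a rough sketch of what that looks like (the table name, S3 path, and IAM role below are placeholders; COMPUPDATE ON is the documented COPY option):

import psycopg2

conn = psycopg2.connect(host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='dev', user='admin', password='...')
conn.autocommit = True
cur = conn.cursor()

# COMPUPDATE ON tells Redshift to sample the incoming data and choose
# column encodings itself during the load.
cur.execute("""
    COPY my_new_table
    FROM 's3://my-bucket/my-data/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    CSV
    COMPUPDATE ON;
""")

cur.close()
conn.close()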
How do I implement Sphinx's companion Suggest feature on databases with large numbers of rows?
Right now, Sphinx's Suggest feature works fine on around 1,000 rows. But I am afraid that actually using Suggest on a database with hundreds of thousands of rows will not work out well, especially if new rows are added regularly (say, 100 at a time). My database also contains very specific and technical terms, so I need to lower the frequency limit from the default 40 to 1.
How should I approach this? My primary concern is that by using suggest.php --builddict, I am rewriting the whole suggest table from scratch. Is there a better way to do this?
I'm attempting to load large amounts of data into Sphinx directly from Mongo, and currently the best method I've found is xmlpipe2.
I'm wondering, however, whether there is a way to apply only updates to the dataset, as a full reindex of hundreds of thousands of records can take a while and be fairly intensive on the system.
Is there a better way to do this?
Thank you!
Use a main plus delta scheme, where all the updates go into a separate, smaller index, as described here:
http://sphinxsearch.com/docs/current.html#delta-updates
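As a rough illustration of the workflow (the sphinx.conf setup itself is covered in the link above), a small driver script could rebuild the delta index frequently and fold it back into the main index on a schedule; the index names 'main' and 'delta' are assumptions about your configuration:

import subprocess

def reindex_delta():
    # Rebuild only the small delta index, which covers documents added since
    # the last full build; --rotate swaps it in without stopping searchd.
    subprocess.run(['indexer', 'delta', '--rotate'], check=True)

def merge_delta_into_main():
    # Periodically (e.g. nightly) merge the delta index back into the main one
    # so the delta stays small.
    subprocess.run(['indexer', '--merge', 'main', 'delta', '--rotate'], check=True)

if __name__ == '__main__':
    reindex_delta()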