Tableau is counting too many records - tableau-api

I am new to tableau and not sure what information is needed to answer this question.
I have set up my Data Source with multiple tables and joins. All of the joins are inner joins.
I am trying to get a count of records from one of my tables.
The table has 112,000 records but tableau is returning a count of 12 million

One of your joins is multiplying the record count by 10. I suggest removing the joins 1 by 1 until you pinpoint the cause.

Related

Tableau Fixed Calculation Summing too much data

I have volume data for specific customers. The customer names come from salesforce and the volume comes from another table. When I add each in tableau, i get a nice table that seems to be working.
We can see that there are 19 values ~500 My ultimate goal is to sum these based upon filters.
A way i discovered that i can do that is to use the syntax
{ FIXED [Account Id]: count([Volume]) }
But when i do that,
I get
When I change my function to count([volume]) i get a count of all joined rows ~250k
My question is how do i make this respect indivudal entries in the database and not all the joined values? If there was a way to do the sum for distinct timestamps in another field this would also work? Any other advice would be helpful from you tableau experts.
Thanks!
I think i got it. In the table of the database that i was trying to calculate there were 20 rows that needed to be calculated. When the data was joined in SF, it duplicated the rows. The trick here was to do the sum of the max for each primary key
SUM({ FIXED [Pk], [Name1] : MAX([Volume]) })

Simple update query taking too long - Postgres

I have a table with 28 million rows that I want to update. It has around 60 columns and a ID column (primary key) with an index created on it. I created four new columns and I want to populate them with the data from four columns from other table which also has an ID column with an index created on it. Both tables have the same amount of rows and just the primary key and the index on the IDENTI column. The query has been running for 15 hours and since it is high priority work, we are starting to get nervous about it and we don't have so much time to experiment with queries. We have never update a table so big (7 GB), so we are not sure if this amount of time is normal.
This is the query:
UPDATE consolidated
SET IDEDUP2=uni.IDEDUP2
USE21=uni.USE21
USE22=uni.USE22
PESOXX2=uni.PESOXX2
FROM uni_group uni, consolidated con
WHERE con.IDENTI=uni.IDENTI
How can I make it faster? Is it possible? If not, is there a way to check how much longer it is going to take (without killing the process)?
Just as additional information, we have ran before much more complex queries for 3 million row tables (postgis) and It has taken it about 15 hours as well.
You should not repeat the target table in the FROM clause. Your statement creates a cartesian join of the consolidated table with itself, which is not what you want.
You should use the following:
UPDATE consolidated con
SET IDEDUP2=uni.IDEDUP2
USE21=uni.USE21
USE22=uni.USE22
PESOXX2=uni.PESOXX2
FROM uni_group uni
WHERE con.IDENTI = uni.IDENTI

long running queries and new data

I'm looking at a postgres system with tables containing 10 or 100's of millions of rows, and being fed at a rate of a few rows per second.
I need to do some processing on the rows of these tables, so I plan to run some simple select queries: select * with a where clause based on a range (each row contains a timestamp, that's what I'll work with for ranges). It may be a "closed range", with a start and an end I know are contained in the table, and I know no new data will fall into the range, or an open range : ie one of the range boundary might not be "in the table yet" and rows being fed in the table might thus fall in that range.
Since the response will itself contains millions of rows, and the processing per row can take some time (10s of ms) I'm fully aware I'll use a cursor and fetch, say, a few 1000 rows at a time. My question is:
If I run an "open range" query: will I only get the result as it was when I started the query, or will new rows being inserted in the table that fall in the range while I run my fetch show up ?
(I tend to think that no I won't see new rows, but I'd like a confirmation...)
updated
It should not happen under any isolation level:
https://www.postgresql.org/docs/current/static/transaction-iso.html
but Postgres insures it only in Serializable isolation
Well, I think when you make a query, that means you create a new transaction and it will not receive/update data from any other transaction until it commit.
So, basically "you only get the result as it was when you started the query"

Insert multiple records into fact table based on fields in single record

I'm working in Pentaho 4.4.1-GA (Kettle / PDI). The database is Postgres.
I need to be able to insert multiple records into a fact table based on the fields that come from a single record. The single record contains fields:
productcode1, price1
productcode2, price2
productcode3, price3
...
productcode10,price10
So if there was a value for each of the 10 productcode / prices then I'd need to insert a total of 10 records into the fact table. If there were values for 4 of the combinations, then I'd need to insert 4 records into the fact table, etcetera. All field values for the fact records would be identical except for the PK (generated by sequence), product codes, and prices.
I figure that I need some type of looping construct which would let me check whether or not a value was present for each productx field, and if so, do an insert/update step on the fact table with the desired field values. I'm just not sure how to do this in Pentaho.
Any ideas? All suggestions are welcome :)
Thank You,
Rakesh
Could you give a sample input and output for your scenario??
From your example data I can infer that if there are 10 different product codes and only 4 product prices you want to have 4 records inserted into your table. Is that so?
Well for a start you can add a constant value of 1 to those records by filtering for NOT NULL and then use an Group BY Step to count the number of 1's. This would give you the count. BTW it would be helpful if you could provide more details on what columns you would be loading as there are ways to make a PDI transformation execute multiple times

Reporting on multiple tables independently in Crystal Reports 11

I am using Crystal Reports Developer Studio to create a report that reports on two different tables, let them be "ATable" and "BTable". For my simplest task, I would like to report the count of each table by using Total Running Fields. I created one for ATable (Called ATableTRF) and when I post it on my report this is what happens:
1) The SQL Query (Show SQL Query) shows:
SELECT "ATABLE"."ATABLE_KEY"
FROM "DB"."ATABLE" "ATABLE"
2) The total records read is the number of records in ATable.
3) The number I get is correct (total records in ATable).
Same goes for BTableTRF, if I remove ATableTRF I get:
1) The SQL Query (Show SQL Query) shows:
SELECT "BTABLE"."BTABLE_KEY"
FROM "DB"."BTABLE" "BTABLE"
2) The total records read is the number of records in BTable.
3) The number I get is correct (total records in BTable).
The problems starts when I just put both fields on the reports. What happens then is that I get the two queries one after another (since the tables are not linked in crystal reports):
SELECT "ATABLE"."ATABLE_KEY"
FROM "DB"."ATABLE" "ATABLE"
SELECT "BTABLE"."BTABLE_KEY"
FROM "DB"."BTABLE" "BTABLE"
And the number of record read is far larger than each of the tables - it doesn't stop. I would verify it's count(ATable)xcount(BTable) but that would exceed my computer's limitation (probably - one is around 300k rows the other around 900k rows).
I would just like to report the count of the two tables. No interaction is needed - but crystal somehow enforces an interaction.
Can anyone help with that?
Thanks!
Unless there is some join describing the two tables' relationship, then the result will be a Cartesian product. Try just using two subqueries, either via a SQL Command or as individual SQL expressions, to get the row counts. Ex:
select count(distinct ATABLE_KEY) from ATABLE
If you're not interested in anything else in these tables aside from the row counts, then there's no reason to bring all those rows into Crystal - better to do the heavy lifting on the RDBMS.
You could UNION the two queries. This would give you one record set containing rows from each query once.