Dumping a Postgres table out by sections is yielding files that are 30 GB+ in size. The files are landing on a Windows 2008 server. I'm trying to count the rows in each CSV to confirm it holds the row count I expect (22,725,303, to be exact). I can count the rows in the table section I'm dumping, but I'm not sure whether the CSV actually got them all.
It's a 190M-row table, so dumping it in sections is the way to go.
So how can I count the rows so I know I've got the full section?
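One way to sanity-check the file itself on the Windows side, without pulling the whole 30 GB into memory, is to stream it and count lines. A rough PowerShell sketch (the path is a placeholder; subtract one line if a header row was written, and note that quoted fields containing embedded newlines would make the line count and the row count disagree):
# Stream the CSV in chunks so memory use stays flat.
$count = 0
Get-Content 'D:\dumps\section_01.csv' -ReadCount 5000 |
    ForEach-Object { $count += $_.Count }
$count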
In a PL/pgSQL function, you can get the count of rows processed by the last command - and since Postgres 9.3 that includes COPY - with:
GET DIAGNOSTICS x = ROW_COUNT;
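A minimal sketch of how that might look for a sectioned dump, assuming a placeholder table, key range, and server-side output path (writing to a server-side file needs superuser or pg_write_server_files rights):
DO $$
DECLARE
    copied bigint;
BEGIN
    -- table name, filter, and output path are placeholders
    COPY (SELECT * FROM big_table WHERE id BETWEEN 1 AND 25000000)
        TO '/srv/dumps/section_01.csv' WITH (FORMAT csv, HEADER);
    -- ROW_COUNT holds the number of data rows the COPY just wrote
    GET DIAGNOSTICS copied = ROW_COUNT;
    RAISE NOTICE 'rows copied: %', copied;
END
$$;
Because HEADER is on, the file has one line more than the NOTICE reports, so compare the NOTICE value (not the raw line count) against the expected 22,725,303.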
When I run the ingest utility with a DELETE statement, it reports the number of rows inserted as 0 and doesn't show the number of rows deleted. Is there any option to show the number of rows deleted?
I have included the output message of the ingest utility and the code below.
output
------
Number of rows read = 255
Number of rows inserted = 0
Number of rows rejected = 0
code
----
db2 "ingest from file mypipe format delimited(
$field1 CHAR(9),
$field2 DATE 'yyyy-mm-dd'
)
Delete from mytable where dob = $field2"
The documentation initially states that the summary report contains only the number of rows read, inserted, and rejected, which is probably what you are seeing.
Quote from documentation:
Messages from the INGEST command
If the utility read at least one record from the input source, the utility issues a summary of the number of rows read, inserted, and rejected (similar to the import and load utilities) and a successful completion message.
However, on the same page a later statement is:
Number of rows inserted (updated, deleted, merged)
The number of rows affected by the execution of the SQL statement against the target table and committed to the database. The message says "inserted", "updated", "deleted", or "merged", depending on the SQL statement.
So the behaviour in your case seems unhelpful, and IBM could improve it by also reporting the count of rows deleted (or updated) when the sole SQL statement is a DELETE (or UPDATE). I tested this behaviour with Db2-LUW v11.5.6.0.
Even when the DELETE statement is replaced by a MERGE with WHEN MATCHED THEN DELETE, the summary report excludes the count of deleted rows - undesirable behaviour.
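For reference, the MERGE variant I tried looked roughly like the sketch below, reusing the field definitions from your command (written from memory, so check the exact clause syntax against the INGEST documentation):
db2 "INGEST FROM FILE mypipe FORMAT DELIMITED
(
    $field1 CHAR(9),
    $field2 DATE 'yyyy-mm-dd'
)
MERGE INTO mytable
    ON dob = $field2
    WHEN MATCHED THEN DELETE"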
If you have a support contract, you could open a ticket with IBM asking for a workaround or fix as there may be some regression here.
I am writing a PowerShell script that compares the number of rows in two Parquet files that are created each hour, to monitor the row counts, etc.
I have found the Parquet.NET project, but since all I want to do is query the count of rows, I wonder if this is overkill. Is there a PowerShell module for Parquet?
I have a table, say 'T', in kdb which has over 6 billion rows. When I try to execute a query like this:
select from T where i < 10
it throws a 'wsfull exception. Is there any way I can execute queries like this on a table holding such a large amount of data?
10#T
The expression as you wrote it first builds a boolean vector marking every element where i (the row number) is less than 10 - a vector as long as one of your columns. It then applies where to that vector (which just yields til 10) and finally indexes each column with the result. You can skip straight to the last step with:
T[til 10]
but 10#T is shorter.
Assuming you have a partitioned table here, it is normally beneficial to have the partitioning column (date, int, etc.) as the first item in the where clause of your query - otherwise, as mentioned previously, you are reading a six-billion-item list into memory, which will result in a 'wsfull signal on any machine with less than the requisite amount of RAM.
Bear in mind that the virtual row index i starts at 0 for each partition, so it is not reflective of position in the overall table. The query you gave as an example in your question would return the first ten rows of each partition of table T in your database.
In order to do this without reaching your memory limit, you can try running the following (if your database is date-partitioned):
raze{10#select from T where date=x}each date
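If you only need rows from one partition, constraining on the partition column first (as described above) keeps the read to that single partition - a small sketch with a placeholder date:
/ partition column first, then the virtual row index i
select from T where date=2023.01.01, i<10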
I have a mail-merge-generated document which comprises repeats of a template, each with a fixed number of table rows.
In some of the repeats, only a few of the rows are used.
Is there a way to delete all empty rows across all tables in the document en masse?
TIA
I am using Crystal Reports Developer Studio to create a report that reports on two different tables, let them be "ATable" and "BTable". For my simplest task, I would like to report the count of each table using Running Total Fields. I created one for ATable (called ATableTRF), and when I place it on my report this is what happens:
1) The SQL Query (Show SQL Query) shows:
SELECT "ATABLE"."ATABLE_KEY"
FROM "DB"."ATABLE" "ATABLE"
2) The total records read is the number of records in ATable.
3) The number I get is correct (total records in ATable).
The same goes for BTableTRF; if I remove ATableTRF I get:
1) The SQL Query (Show SQL Query) shows:
SELECT "BTABLE"."BTABLE_KEY"
FROM "DB"."BTABLE" "BTABLE"
2) The total records read is the number of records in BTable.
3) The number I get is correct (total records in BTable).
The problem starts when I put both fields on the report. What happens then is that I get the two queries one after another (since the tables are not linked in Crystal Reports):
SELECT "ATABLE"."ATABLE_KEY"
FROM "DB"."ATABLE" "ATABLE"
SELECT "BTABLE"."BTABLE_KEY"
FROM "DB"."BTABLE" "BTABLE"
And the number of records read is far larger than either of the tables - it doesn't seem to stop. I would verify that it's count(ATable) x count(BTable), but that would exceed my computer's limits (probably - one table is around 300k rows, the other around 900k).
I would just like to report the count of each of the two tables. No interaction between them is needed - but Crystal somehow enforces one.
Can anyone help with that?
Thanks!
Unless there is some join describing the two tables' relationship, the result will be a Cartesian product. Try using two subqueries instead, either via a SQL Command or as individual SQL Expression fields, to get the row counts. Ex:
select count(distinct ATABLE_KEY) from ATABLE
If you're not interested in anything else in these tables aside from the row counts, then there's no reason to bring all those rows into Crystal - better to do the heavy lifting on the RDBMS.
You could also UNION the two queries. This would give you a single record set containing the rows from each query exactly once, rather than a cross product.
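For example, a SQL Command along these lines (schema and table names taken from the queries above) returns both counts in one small record set; UNION ALL is used since the two rows can never collide, so no duplicate-elimination step is needed:
SELECT 'ATABLE' AS TABLE_NAME, COUNT(*) AS REC_COUNT
FROM "DB"."ATABLE"
UNION ALL
SELECT 'BTABLE', COUNT(*)
FROM "DB"."BTABLE"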