Calculating timing offsets in a text file - sed

I'm using sed to process a very long PeopleSoft trace file. The trace file is full of information, but I am only interested in how long each step took to process and how many database rows each step affected.
I have a fairly ugly bash script which is basically a one-line series of sed pipes. It works pretty quickly, so it is OK. Of course I would welcome any suggestions to clarify the code.
I must say that while it is ugly, it is clear (to me) what sequential steps it goes through. Sed/awk one-liners are often completely unintelligible to a non-expert.
#!/bin/bash
sed '/./=' "$1" | sed '/./N; s/\n/ /' | sed '/--/!d' | sed '/-- B/d' | sed '/-- /d' | sed '/Instance/d' | sed '1,/Application Engine ended normally/!d' | awk '/Row/ {n = $0; getline; print $0 n; next } 1' > "$1".txt
Timings
Most lines contain timing information in the format HH.MM.SS. Many statements run in under a second, so the timing field often doesn't change. I'd like to add a new field holding the offset from the previous line. Only the seconds need to be considered, as most operations are subsecond anyway.
246 -- 14.54.43 .(TL_TIMEADMIN.MAIN.Step040) (PeopleCode)238 -- Row(s) affected: 1
247 -- 14.54.43 Program Skipping Step due to non-zero return code from PeopleCode at TL_TIMEADMIN.MAIN.Step040
249 -- 14.54.43 .(TL_TIMEADMIN.MAIN.Step050) (Call Section TL_TIMEADMIN.DISPATCH)
251 -- 14.54.45 ..(TL_TIMEADMIN.DISPATCH.Step02a) (PeopleCode)
253 -- 14.54.45 ..(TL_TIMEADMIN.DISPATCH.Step02a) (SQL)
266 -- 14.54.45 ..(TL_TIMEADMIN.DISPATCH.Step02b) (Call Section TL_TA000200.TA000200)258 -- Row(s) affected: 1
268 -- 14.54.46 ...(TL_TA000200.TA000200.Step001) (PeopleCode)
270 -- 14.54.46 ...(TL_TA000200.TA000200.Step001) (Call Section FUNCLIB_TLTA.STEPMSG)
I would like to see something like this:
246 -- 14.54.43 0 .(TL_TIMEADMIN.MAIN.Step040) (PeopleCode)238 -- Row(s) affected: 1
247 -- 14.54.43 0 Program Skipping Step due to non-zero return code from PeopleCode at TL_TIMEADMIN.MAIN.Step040
249 -- 14.54.43 0 .(TL_TIMEADMIN.MAIN.Step050) (Call Section TL_TIMEADMIN.DISPATCH)
251 -- 14.54.45 2 ..(TL_TIMEADMIN.DISPATCH.Step02a) (PeopleCode)
253 -- 14.54.45 0 ..(TL_TIMEADMIN.DISPATCH.Step02a) (SQL)
266 -- 14.54.45 0 ..(TL_TIMEADMIN.DISPATCH.Step02b) (Call Section TL_TA000200.TA000200)258 -- Row(s) affected: 1
268 -- 14.54.46 1 ...(TL_TA000200.TA000200.Step001) (PeopleCode)
270 -- 14.54.46 0 ...(TL_TA000200.TA000200.Step001) (Call Section FUNCLIB_TLTA.STEPMSG)

OK with an awk solution?
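# time.awk: print the seconds elapsed since the previous line after field 3 (HH.MM.SS)
{
    split($3, time, ".")          # time[1] = HH, time[2] = MM, time[3] = SS
    if (NR == 1) prev = time[3]   # first line has no predecessor, so its offset is 0
    delta = time[3] - prev
    if (delta < 0) delta += 60    # the seconds wrapped past a minute boundary
    $3 = $3 " " delta
    prev = time[3]
    print
}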
output:
$ awk -f time.awk input
246 -- 14.54.43 0 .(TL_TIMEADMIN.MAIN.Step040) (PeopleCode)238 -- Row(s) affected: 1
247 -- 14.54.43 0 Program Skipping Step due to non-zero return code from PeopleCode at TL_TIMEADMIN.MAIN.Step040
249 -- 14.54.43 0 .(TL_TIMEADMIN.MAIN.Step050) (Call Section TL_TIMEADMIN.DISPATCH)
251 -- 14.54.45 2 ..(TL_TIMEADMIN.DISPATCH.Step02a) (PeopleCode)
253 -- 14.54.45 0 ..(TL_TIMEADMIN.DISPATCH.Step02a) (SQL)
266 -- 14.54.45 0 ..(TL_TIMEADMIN.DISPATCH.Step02b) (Call Section TL_TA000200.TA000200)258 -- Row(s) affected: 1
268 -- 14.54.46 1 ...(TL_TA000200.TA000200.Step001) (PeopleCode)
270 -- 14.54.46 0 ...(TL_TA000200.TA000200.Step001) (Call Section FUNCLIB_TLTA.STEPMSG)
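For comparison, here is a minimal Python sketch of the same transformation. Instead of looking at the seconds alone, it converts the whole HH.MM.SS stamp to seconds, so a step that crosses a minute or hour boundary still gets a positive offset; the modulo by 86400 assumes consecutive lines are less than a day apart and covers a wrap past midnight. The regular expression assumes every timed line looks like the samples above (line number, then --, then the timestamp); the name add_offsets is hypothetical, and lines that do not match are passed through unchanged.

```python
import re

# Hypothetical pattern: "<line-no> -- HH.MM.SS<rest>", as in the sample trace lines.
TIME_LINE = re.compile(r"^(\S+ -- (\d+)\.(\d+)\.(\d+))(.*)$")


def add_offsets(lines):
    """Insert the seconds elapsed since the previous timed line right after the timestamp."""
    out = []
    prev = None  # seconds since midnight of the previous timed line
    for line in lines:
        m = TIME_LINE.match(line)
        if m is None:
            out.append(line)  # no timestamp: leave the line untouched
            continue
        h, mi, s = (int(g) for g in m.group(2, 3, 4))
        now = h * 3600 + mi * 60 + s
        # The first timed line gets 0; the modulo handles a wrap past midnight.
        delta = 0 if prev is None else (now - prev) % 86400
        prev = now
        out.append(f"{m.group(1)} {delta}{m.group(5)}")
    return out
```

It could be fed the output of the sed pipeline, for example by reading the .txt file with splitlines() and printing each returned line. Unlike the awk version, it keeps the original spacing of the rest of the line, because it does not rebuild the record from fields.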

Related

Other way to cast varbit to int? And bigint?

This function is a workaround; is there nothing with better performance?
CREATE or REPLACE FUNCTION varbit_to_int(v varbit) RETURNS int AS $f$
SELECT CASE bit_length(v)
WHEN 1 THEN v::bit(1)::int
WHEN 2 THEN v::bit(2)::int
WHEN 3 THEN v::bit(3)::int
...
WHEN 30 THEN v::bit(30)::int
WHEN 31 THEN v::bit(31)::int
WHEN 32 THEN v::bit(32)::int
ELSE NULL::int
END
$f$ LANGUAGE SQL IMMUTABLE;
Same problem for bigint:
CREATE or replace FUNCTION varbit_to_bigint(p varbit) RETURNS bigint AS $f$
SELECT CASE bit_length($1)
WHEN 1 THEN $1::bit(1)::bigint
WHEN 2 THEN $1::bit(2)::bigint
WHEN 3 THEN $1::bit(3)::bigint
...
WHEN 62 THEN $1::bit(62)::bigint
WHEN 63 THEN $1::bit(63)::bigint
WHEN 64 THEN $1::bit(64)::bigint
ELSE NULL::bigint
END
$f$ LANGUAGE SQL IMMUTABLE STRICT;
Calling this many times in loops seems CPU-wasteful, just to avoid the "cannot cast type bit varying to integer" error. Maybe an external C-language library does this and other useful casts.
Note that casting to bit(64) first zero-pads on the right, so this returns true:
select b'101'::bit(64)::bigint != b'101'::bigint;
I tested a couple of variants (for bigint only) with built-in functionality and this variant with OVERLAY() turned out fastest in my local tests on Postgres 11:
CREATE OR REPLACE FUNCTION varbit2bigint2(b varbit)
RETURNS bigint AS
$func$
SELECT OVERLAY(bit(64) '0' PLACING b FROM 65 - bit_length(b))::bigint
$func$ LANGUAGE SQL IMMUTABLE;
Other candidates:
Note the different conversion of empty bitstrings ('') to 0 vs. NULL. Adapt to your needs!
Your function:
CREATE OR REPLACE FUNCTION varbit2bigint1(b varbit)
RETURNS bigint AS
$func$
SELECT CASE bit_length($1)
WHEN 1 THEN $1::bit(1)::bigint
WHEN 2 THEN $1::bit(2)::bigint
WHEN 3 THEN $1::bit(3)::bigint
WHEN 4 THEN $1::bit(4)::bigint
WHEN 5 THEN $1::bit(5)::bigint
WHEN 6 THEN $1::bit(6)::bigint
WHEN 7 THEN $1::bit(7)::bigint
WHEN 8 THEN $1::bit(8)::bigint
WHEN 9 THEN $1::bit(9)::bigint
WHEN 10 THEN $1::bit(10)::bigint
WHEN 11 THEN $1::bit(11)::bigint
WHEN 12 THEN $1::bit(12)::bigint
WHEN 13 THEN $1::bit(13)::bigint
WHEN 14 THEN $1::bit(14)::bigint
WHEN 15 THEN $1::bit(15)::bigint
WHEN 16 THEN $1::bit(16)::bigint
WHEN 17 THEN $1::bit(17)::bigint
WHEN 18 THEN $1::bit(18)::bigint
WHEN 19 THEN $1::bit(19)::bigint
WHEN 20 THEN $1::bit(20)::bigint
WHEN 21 THEN $1::bit(21)::bigint
WHEN 22 THEN $1::bit(22)::bigint
WHEN 23 THEN $1::bit(23)::bigint
WHEN 24 THEN $1::bit(24)::bigint
WHEN 25 THEN $1::bit(25)::bigint
WHEN 26 THEN $1::bit(26)::bigint
WHEN 27 THEN $1::bit(27)::bigint
WHEN 28 THEN $1::bit(28)::bigint
WHEN 29 THEN $1::bit(29)::bigint
WHEN 30 THEN $1::bit(30)::bigint
WHEN 31 THEN $1::bit(31)::bigint
WHEN 32 THEN $1::bit(32)::bigint
WHEN 33 THEN $1::bit(33)::bigint
WHEN 34 THEN $1::bit(34)::bigint
WHEN 35 THEN $1::bit(35)::bigint
WHEN 36 THEN $1::bit(36)::bigint
WHEN 37 THEN $1::bit(37)::bigint
WHEN 38 THEN $1::bit(38)::bigint
WHEN 39 THEN $1::bit(39)::bigint
WHEN 40 THEN $1::bit(40)::bigint
WHEN 41 THEN $1::bit(41)::bigint
WHEN 42 THEN $1::bit(42)::bigint
WHEN 43 THEN $1::bit(43)::bigint
WHEN 44 THEN $1::bit(44)::bigint
WHEN 45 THEN $1::bit(45)::bigint
WHEN 46 THEN $1::bit(46)::bigint
WHEN 47 THEN $1::bit(47)::bigint
WHEN 48 THEN $1::bit(48)::bigint
WHEN 49 THEN $1::bit(49)::bigint
WHEN 50 THEN $1::bit(50)::bigint
WHEN 51 THEN $1::bit(51)::bigint
WHEN 52 THEN $1::bit(52)::bigint
WHEN 53 THEN $1::bit(53)::bigint
WHEN 54 THEN $1::bit(54)::bigint
WHEN 55 THEN $1::bit(55)::bigint
WHEN 56 THEN $1::bit(56)::bigint
WHEN 57 THEN $1::bit(57)::bigint
WHEN 58 THEN $1::bit(58)::bigint
WHEN 59 THEN $1::bit(59)::bigint
WHEN 60 THEN $1::bit(60)::bigint
WHEN 61 THEN $1::bit(61)::bigint
WHEN 62 THEN $1::bit(62)::bigint
WHEN 63 THEN $1::bit(63)::bigint
WHEN 64 THEN $1::bit(64)::bigint
ELSE NULL::bigint
END
$func$ LANGUAGE SQL IMMUTABLE; -- no STRICT modifier
Left-padding the text representation with '0':
CREATE OR REPLACE FUNCTION varbit2bigint3(b varbit)
RETURNS bigint AS
$func$
SELECT lpad(b::text, 64, '0')::bit(64)::bigint
$func$ LANGUAGE SQL IMMUTABLE;
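To see why the padding side matters, here is a small Python model of what lpad(b::text, 64, '0')::bit(64)::bigint computes, next to a plain ::bit(64) cast that pads on the right. It is an illustration only, not a replacement for the SQL; it assumes, as the bit(64) to bigint cast does, that the 64 bits are read as a two's-complement signed integer. All function names are hypothetical.

```python
def _as_signed64(value: int) -> int:
    # bit(64) -> bigint reads the leftmost bit as the sign bit (two's complement).
    return value - (1 << 64) if value >= (1 << 63) else value


def _check(bits: str) -> None:
    if len(bits) > 64 or set(bits) - {"0", "1"}:
        raise ValueError("expected at most 64 characters of '0'/'1'")


def left_padded_bigint(bits: str) -> int:
    """Model of lpad(b::text, 64, '0')::bit(64)::bigint for bits like '101'."""
    _check(bits)
    return _as_signed64(int(bits.rjust(64, "0"), 2))


def right_padded_bigint(bits: str) -> int:
    """Model of b::bit(64)::bigint, where the bit(64) cast zero-pads on the RIGHT."""
    _check(bits)
    return _as_signed64(int(bits.ljust(64, "0"), 2))
```

Left-padding keeps '101' at 5 and maps an empty bit string to 0, while right-padding shifts the same bits to the top of the word and yields an unrelated (here negative) number.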
Bit-shifting before the cast:
CREATE OR REPLACE FUNCTION varbit2bigint4(b varbit)
RETURNS bigint AS
$func$
SELECT (bit(64) '0' || b << bit_length(b))::bit(64)::bigint
$func$ LANGUAGE SQL IMMUTABLE;
db<>fiddle here
Related:
Postgresql Convert bit varying to integer
Your feedback
It is not worse, it is faster!
EXPLAIN ANALYZE select
varbit_to_bigint(osm_id::bit(64)::varbit)
from planet_osm_point limit 10000 ;
-- Planning time: 0.697 ms
-- Execution time: 1133.571 ms
EXPLAIN ANALYZE select
lpad(osm_id::bit(64)::varbit::text, 32, '0')::bit(64)::bigint
from planet_osm_point limit 10000;
-- Planning time: 0.105 ms
-- Execution time: 26.429 ms
You show a STRICT modifier with the bigint variant of the function in the question (not sure why it differs from the integer variant). If that represents the function you actually tested, I expect most of the observed performance difference is due to that added STRICT modifier preventing function inlining. Quoting the Postgres Wiki:
if the function is declared STRICT, then the planner must be able to
prove that the body expression necessarily returns NULL if any
parameter is null. At present, this condition is only satisfied if:
every parameter is referenced at least once, and all functions,
operators and other constructs used in the body are themselves STRICT.
That seems to hurt your function badly, while my winner seems unaffected and the other two variants are even about 10% faster. Same fiddle with STRICT functions:
db<>fiddle here
Related:
Function executes faster without STRICT modifier?
I suggest you re-test with and without STRICT modifier to see for yourself.

Full text search with Postgres

How do I run a full text search across all columns in Postgres without preprocessing? I found http://www.postgresql.org/docs/9.3/static/textsearch-intro.html but I'm not exactly sure what I need to do.
My initial impression is that I need to concatenate all columns automatically (how do I do that? I can't find it by Googling), put the result in a WHERE clause and match it with @@ to_tsquery.
This is for https://github.com/timwis/node-soda2-parser/issues/1; I'm not concerned about poor performance.
I tried starting with
select array_to_string(translate(string_to_array(r::text, ',')::text, '()', '')::text[], ' ')::tsvector FROM seattle_police_govqa_audit_trails as r LIMIT 1
But I get:
{"readyState":4,"responseText":"{\"error\":[\"syntax error in tsvector: \\\"1 -1 -1 -1 -1 -1 0 0 1 1 2 3 3500 5 7007 198 1264 NULL NULL \\\"Answer created by staff\\\" NULL NULL \\\"9/24/2015 16:01\\\" A000198-092415\\\"\"]}","responseJSON":{"error":["syntax error in tsvector: \"1 -1 -1 -1 -1 -1 0 0 1 1 2 3 3500 5 7007 198 1264 NULL NULL \"Answer created by staff\" NULL NULL \"9/24/2015 16:01\" A000198-092415\""]},"status":400,"statusText":"Bad Request"}
select * FROM seattle_police_govqa_audit_trails as r WHERE regexp_replace(array_to_string(translate(string_to_array(r::text, ',')::text, '()', '')::text[], ' '), '[^a-zA-Z\s]', '', 'g')::tsvector @@ 'created'::tsquery = true LIMIT 10
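Roughly speaking, the query above turns the whole row into one string, strips everything except ASCII letters and whitespace, casts the result to tsvector and tests whether the word is present. The Python sketch below models that idea for a single row, as an approximation under stated assumptions: the row is a hypothetical tuple of column values, NULLs become empty strings, and matching is exact and case-sensitive, as with the ::tsvector and ::tsquery casts (to_tsvector/to_tsquery would normalize words instead). Stripping the quote characters is likely also what avoids the "syntax error in tsvector" shown above.

```python
import re


def row_matches(row, word):
    """Rough model of the regexp_replace(...)::tsvector @@ 'word'::tsquery filter for one row."""
    # Join every column into one string; None (SQL NULL) becomes an empty string.
    text = " ".join("" if value is None else str(value) for value in row)
    # Keep only ASCII letters and whitespace, as the regexp_replace call does.
    cleaned = re.sub(r"[^a-zA-Z\s]", "", text)
    # A ::tsvector cast keeps tokens as written, so compare exact, case-sensitive words.
    return word in set(cleaned.split())
```

Because digits are stripped too, numeric columns such as dates can never match with this approach.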

Hive redirect output to file with \N for NULL value

I am running a Hive query and redirecting its output to a file:
$ hive -e "select id, age from employee" > /tmp/1
$ cat /tmp/1
1 44
2 32
3 NULL
I want NULL to be printed as \N so that I can load the file into MySQL.
This is a sample query; in reality I have more than 20 columns, and any of them can contain NULL. Writing an if() or case() for each selected column won't help. I want a generic solution.
A simple case statement should do.
$ hive -e "select id
, case when age is null then '\\\N' else age end as age from employee" > /tmp/1
$ less /tmp/1
id age
1 44
2 32
3 \N
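If writing a CASE per column is not an option, one generic alternative is to post-process the file instead of the query. The minimal Python sketch below assumes the Hive CLI output is tab-separated and that no real column value is the literal string NULL; the function names are hypothetical.

```python
def nulls_to_backslash_n(line, sep="\t"):
    """Replace every field that is exactly NULL with \\N, the NULL marker MySQL's LOAD DATA reads."""
    fields = line.rstrip("\n").split(sep)
    return sep.join("\\N" if field == "NULL" else field for field in fields)


def convert(src, dst):
    """Copy lines from one text stream to another, rewriting NULL fields."""
    for line in src:
        dst.write(nulls_to_backslash_n(line) + "\n")
```

A small script calling convert(sys.stdin, sys.stdout) could sit between hive -e "..." and the redirection to /tmp/1, so it works for any number of columns without touching the query.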

Transactional Replication: Column name or number of supplied values does not match table definition

I have set up transactional replication for some tables.
The master and slave databases are identical.
I ran this query on both master and slave and compared the results to make sure the table is identical:
select * from sys.columns c
join sys.tables t on t.object_id = c.object_id
where t.name = 'customers'
In the Replication Monitor I can find this error:
Column name or number of supplied values does not match table definition.
If I check the details I get this:
Command attempted:
if @@trancount > 0 rollback tran
(Transaction sequence number: 0x0011775200000105007600000000, Command ID: 1)
So I checked the distribution database with this query to find the command that is failing:
sp_browsereplcmds @xact_seqno_start = '0x0011775200000105007600000000',
@xact_seqno_end = '0x0011775200000105007600000000'
This is the command (it is split across 2 lines in that table):
{CALL [sp_MSins_dboCustomers] (0,'575',N'todelete','575',N'todelete',118594,118595,118596,N'10T 3% Sk 30T net.',0,'Deutschland',4,24399158193054E-314,4,24399158193054E-314,4,24399158193054E-314,4,24399158193054E-314,2,54639494915833E-313,'','','','','','TGW',N'Liefern LKW',NULL,NULL,0,0,6,79038653108887E-311,NULL,'',0,NULL,NULL,NULL,0,0,0,-1,-1,1900-01-01 00:00:00,0,1,{AEB3D911-36D1-4A8A-B713-6B2F2CCA1641},0,0,2,'de-AT',25,NULL,NULL,0,1,NULL,NULL,2014-03-07 08:57:45.727,-1,NULL,0,'','','','','','','','','','','','',
'','','','','','','','')}
This is what I have in my DB:
TypeID CustomerID Name SiteID SiteName AddressID BillAddressID ShipAddressID Terms TaxExempt TaxSchedID TaxPercent TaxPercent1 TaxPercent2 TaxPercent3 TaxPercent4 TaxTitle TaxTitle1 TaxTitle2 TaxTitle3 TaxTitle4 LocationID ShipVia PackingType PackingNoteID CutoffDay UploadAction LeadTime ExpDays Notes SalesPersonID CreditLimit OpenOrders OrderValueScheduleID OAHidePrices DefaultAckType DefaultInvType DefaultPackType UploadEmployee UploadDateTime OAHideImages MfgCustomer CustomerGUID PricingMethod DefaultCustomer EngineeringUnitSetID CurrencyCulture FamilyGroupID InvoiceMinimum InvoiceSurcharge InvoiceGroup InvoiceCopies DeliveryMinimum DeliverySurcharge CreateDate EnteredBy LanguageCulture DropShip UserDef1 UserDef2 UserDef3 UserDef4 UserDef5 UserDef6 UserDef7 UserDef8 UserDef9 UserDef10 UserDef11 UserDef12 UserDef13 UserDef14 UserDef15 UserDef16 UserDef17 UserDef18 UserDef19 UserDef20
0 575 todelete 575 todelete 118594 118595 118596 10T 3% Sk 30T net. 0 Deutschland 0 0 0 0 0 TGW Liefern LKW NULL NULL 0 0 0 NULL 302 NULL NULL NULL 0 0 0 -1 -1 1900-01-01 00:00:00 0 1 AEB3D911-36D1-4A8A-B713-6B2F2CCA1641 0 0 2 de-AT 25 NULL NULL 0 1 NULL NULL 2014-03-07 08:57:45.727 -1 NULL 0 1 2 3 4 0 1 2 3 4
As you can see, the values of the TaxPercent fields (after "Deutschland") are 0 in my DB, but in the command they are really weird (4,24399158193054E-314).
The data type is "real".
Maybe this is not the issue, but it is the only weird thing I could find.
I found my problem.
In fact, 4,24399158193054E-314 is the value sent for "0" in the real columns. The problem is that it used "," instead of "." as the decimal separator, so the procedure call had too many arguments.
What I did was change the statement delivery for INSERT, UPDATE and DELETE from "CALL" to INSERT/UPDATE/DELETE statements.
I don't know why this is not selected by default, but now it works.
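The argument-count mismatch is easy to reproduce outside SQL Server. This Python sketch uses a hypothetical, simplified parser that only understands '...' string literals, and counts the top-level arguments in a fragment of the CALL text with "." versus "," as the decimal separator.

```python
def count_call_args(arg_list: str) -> int:
    """Count comma-separated arguments, ignoring commas inside '...' string literals."""
    count, in_quote = 1, False
    for ch in arg_list:
        if ch == "'":
            in_quote = not in_quote  # a doubled '' toggles twice, so it cancels out
        elif ch == "," and not in_quote:
            count += 1
    return count


# Fragment of the replicated CALL: the country, then two real (TaxPercent) values.
with_dot = "N'Deutschland',4.24399158193054E-314,4.24399158193054E-314"
with_comma = with_dot.replace(".", ",")  # what a "," decimal separator produces
```

Each real value written with a decimal comma turns into two arguments, so the fragment grows from 3 to 5 arguments and the full call no longer matches the procedure's parameter list.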

Perl+Postgresql: a function does not return a value if RAISE NOTICE is present

I noticed that when I call a PL/pgSQL or PL/Perl function from a Perl script using DBI, it does not return a value if RAISE NOTICE or elog(NOTICE) is used in the function. To illustrate:
A simple table:
CREATE TABLE "public"."table1" (
"fld" INTEGER
) WITHOUT OIDS;
A simple function:
CREATE OR REPLACE FUNCTION "public"."function1" () RETURNS integer AS
$body$
DECLARE
myvar INTEGER;
BEGIN
SELECT INTO myvar fld FROM table1 LIMIT 1;
RETURN myvar;
END;
$body$
LANGUAGE 'plpgsql';
A piece of Perl script:
use DBI;
...
my $ref = $dbh->selectcol_arrayref('SELECT function1()');
print $$ref[0];
As it is, it prints the value from the table.
But I get no result if I add RAISE NOTICE as follows:
SELECT INTO myvar fld FROM table1 LIMIT 1;
RAISE NOTICE 'Testing';
RETURN myvar;
Am I missing something, or is this behavior by design?
Check the client_min_messages setting in your database server's postgresql.conf file. From the PostgreSQL 8.3 docs:
client_min_messages (string)
Controls which message levels are sent to the client. Valid values are DEBUG5, DEBUG4, DEBUG3, DEBUG2, DEBUG1, LOG, NOTICE, WARNING, ERROR, FATAL, and PANIC. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The default is NOTICE. Note that LOG has a different rank here than in log_min_messages.
I can't reproduce this, using Debian's Perl 5.10, DBI 1.605 and DBD::Pg 2.8.7 against PostgreSQL 8.3.7. I get the notice printed out as expected.
steve@steve@[local] =# create or replace function public.function1() returns integer language 'plpgsql' as $$ declare myvar integer; begin select into myvar fld from table1 limit 1; raise notice 'Testing'; return myvar; end; $$;
CREATE FUNCTION
steve@steve@[local] =#
[1]+ Stopped psql --cluster 8.3/steve
steve@arise:~$ DBI_TRACE=1 perl -MData::Dumper -MDBI -e '$dbh = DBI->connect(qw|dbi:Pg:dbname=steve;port=5433;host=/tmp steve steve|, {RaiseError=>1,PrintError=>0}); print Data::Dumper->new([$dbh->selectcol_arrayref("SELECT function1()")], [qw|result|])->Dump'
DBI 1.605-ithread default trace level set to 0x0/1 (pid 5739) at DBI.pm line 273 via -e line 0
Note: perl is running without the recommended perl -w option
-> DBI->connect(dbi:Pg:dbname=steve;port=5433;host=/tmp, steve, ****, HASH(0x1c9ddf0))
-> DBI->install_driver(Pg) for linux perl=5.010000 pid=5739 ruid=1000 euid=1000
install_driver: DBD::Pg version 2.8.7 loaded from /usr/lib/perl5/DBD/Pg.pm
<- install_driver= DBI::dr=HASH(0x1e06a68)
!! warn: 0 CLEARED by call to connect method
<- connect('dbname=steve;port=5433;host=/tmp', 'steve', ...)= DBI::db=HASH(0x1fd8e08) at DBI.pm line 638
<- STORE('RaiseError', 1)= 1 at DBI.pm line 690
<- STORE('PrintError', 0)= 1 at DBI.pm line 690
<- STORE('AutoCommit', 1)= 1 at DBI.pm line 690
<- STORE('Username', 'steve')= 1 at DBI.pm line 693
<> FETCH('Username')= 'steve' ('Username' from cache) at DBI.pm line 693
<- connected('dbi:Pg:dbname=steve;port=5433;host=/tmp', 'steve', ...)= undef at DBI.pm line 699
<- connect= DBI::db=HASH(0x1fd8e08)
<- STORE('dbi_connect_closure', CODE(0x1da2280))= 1 at DBI.pm line 708
NOTICE: Testing
<- selectcol_arrayref('SELECT function1()')= ( [ '2' ] ) [1 items] at -e line 1
$result = [
'2'
];
I suggest isolating your problem in a small script (like the one above), running it with DBI_TRACE set fairly high, and seeing what differences you find. It may also be worth checking the DBD::Pg release notes to see whether earlier versions were known to be confused by notices. With DBI_TRACE=10 I see this:
PQexec
Begin pg_warn (message: NOTICE: Testing
DBIc_WARN: 1 PrintWarn: 1)
NOTICE: Testing
End pg_warn
Begin _sqlstate
So you should be looking for something like that in your own output.
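If the trace output is long, a small filter helps spot these blocks. The Python sketch below assumes the trace lines look like the DBI_TRACE=10 excerpt above (the exact format may differ between DBD::Pg versions) and collects the message from every "Begin pg_warn (message: ..." line; the function name is hypothetical.

```python
def find_pg_warnings(trace: str) -> list:
    """Return the messages of all 'Begin pg_warn (message: ...' lines in a DBI trace."""
    prefix = "Begin pg_warn (message: "
    found = []
    for line in trace.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            found.append(line[len(prefix):])
    return found
```

An empty result for a run that should raise a notice would suggest the notice never reached the client, which points back at the client_min_messages setting.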