Preserving blank columns & adding delimiters when reading fixed width data - perl

I am parsing through a file.
The file format is like this:
Column1  Column2  Column3  Column4  Column5
1        2        3        4        5
6        7                 8        9
10       11       12                14
         15       16       17       18
Some of the columns are empty. I am reading two files with the same format as above, merging them, and adding a "|" between each column, so the result should look like this:
Column1 | Column2 | Column3 | Column4 | Column5
1 | 2 | 3 | 4 | 5
6 | 7 | | 8 | 9
10 | 11 | 12 | | 14
| 15 | 16 | 17 | 18
But I'm getting this instead; the empty columns are lost:
Column1 | Column2 | Column3 | Column4 | Column5
1 | 2 | 3 | 4 | 5
6 | 7 | 8 | 9
10 | 11 | 12 | 14
15 | 16 | 17 | 18
The relevant part of my code:
use feature 'say';
while (<FH>) {
    my @lines = split ' ', $_;    # splitting on whitespace loses empty columns
    say join '|', @lines;
}
I know this happens because I am splitting on whitespace. Can anyone tell me how to get the desired output?

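You can use unpack to parse fixed-width data. The A9 in the template assumes your columns are 9 characters wide; A also strips trailing spaces, so an empty column comes back as an empty string instead of disappearing. You can then use sprintf to pad the fields out again into columns of a fixed width.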
use warnings;
use strict;
while (<DATA>) {
chomp;
printf "%s\n", join '| ', map { sprintf '%-8s', $_ } unpack 'A9' x 5, $_;
}
__DATA__
Column1  Column2  Column3  Column4  Column5
1        2        3        4        5
6        7                 8        9
10       11       12                14
         15       16       17       18
This prints:
Column1 | Column2 | Column3 | Column4 | Column5
1 | 2 | 3 | 4 | 5
6 | 7 | | 8 | 9
10 | 11 | 12 | | 14
| 15 | 16 | 17 | 18
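For comparison, here is the same unpack/sprintf idea sketched in Python. The names parse_fixed_width and to_pipe_row are hypothetical, and the 9-character width and 5 columns are assumptions carried over from the A9 x 5 template. Slicing at fixed offsets and stripping trailing spaces keeps an empty column as an empty string, which is exactly what split ' ' throws away.

```python
def parse_fixed_width(line, width=9, ncols=5):
    """Split a line into ncols fields of `width` characters each.

    Mirrors Perl's unpack('A9' x 5, $line): each field is a fixed slice
    with trailing whitespace removed, so an all-blank column becomes ''.
    """
    return [line[i * width:(i + 1) * width].rstrip() for i in range(ncols)]


def to_pipe_row(line, width=9, ncols=5):
    # Pad every field to width-1 characters and join with "| ",
    # like sprintf('%-8s', ...) joined with '| ' in the Perl answer.
    fields = parse_fixed_width(line, width, ncols)
    return "| ".join(f"{field:<{width - 1}}" for field in fields)


if __name__ == "__main__":
    # Build sample rows with explicit padding so the column widths are exact.
    rows = [
        "  ".join(f"Column{i}" for i in range(1, 6)),
        "6".ljust(9) + "7".ljust(9) + " " * 9 + "8".ljust(9) + "9",
        " " * 9 + "15".ljust(9) + "16".ljust(9) + "17".ljust(9) + "18",
    ]
    for row in rows:
        print(to_pipe_row(row).rstrip())
```

The second row comes back as the fields 6, 7, (empty), 8, 9, so the blank third column survives into the output.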

If you don't need to do anything with the parsed data and just want to reformat it, you can use a regex substitution to add in the vertical bar characters.
This code adds "| " after every 9 characters, so it assumes your data is in fixed-width columns. The \K assertion means to keep (get it?) all of the text matched to its left and not replace it with the substitution text, so in effect it lets you set the point where the text from the right side of the s/// is placed. The /m option makes $ match at the end of every line instead of only at the end of the string, and /g repeats the substitution across the whole string. The (?!$) assertion means "not at the end of the line", so nothing is inserted after the final column.
I did it with all of the text in a single variable but you could do it line by line.
If the columns are variable width you can still do it with a regex but it gets more complicated. unpack/sprintf may well be simpler in that case.
$s = '
Column1  Column2  Column3  Column4  Column5
1        2        3        4        5
6        7                 8        9
10       11       12                14
         15       16       17       18
';
$s =~ s/.{9}(?!$)\K/| /gm;
print $s;
Column1 | Column2 | Column3 | Column4 | Column5
1 | 2 | 3 | 4 | 5
6 | 7 | | 8 | 9
10 | 11 | 12 | | 14
| 15 | 16 | 17 | 18
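The substitution carries over to Python's re module almost directly. Python has no \K, so this sketch writes the matched 9 characters back with \g<0> before the "| " (add_bars is a hypothetical name, and the 9-character width is the same assumption as above).

```python
import re

# Same pattern as the Perl s/.{9}(?!$)\K/| /gm. re.MULTILINE makes $ match
# at the end of every line, and "." does not match a newline, so each
# line is chopped into 9-character columns independently.
FIXED_COL = re.compile(r".{9}(?!$)", re.MULTILINE)


def add_bars(text):
    # \g<0> re-inserts the whole match, standing in for Perl's \K.
    return FIXED_COL.sub(r"\g<0>| ", text)


if __name__ == "__main__":
    sample = "\n".join([
        "  ".join(f"Column{i}" for i in range(1, 6)),
        "10".ljust(9) + "11".ljust(9) + "12".ljust(9) + " " * 9 + "14",
    ])
    print(add_bars(sample))
```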
For more information, see perlre.
Thanks.

Related

Postgres: How to delete rows from auto-generated tables? [duplicate]

This question already has answers here:
PostgreSQL "Column does not exist" but it actually does
(6 answers)
sql statement error: "column .. does not exist"
(1 answer)
Closed 10 months ago.
In Postgres, I have generated tables user and organizations. There is a relationship between them: multiple users belong to one organization.
The table user_organizations_organization was auto-generated by Postgres.
Here I have:
development=# select * from user_organizations_organization;
userId | organizationId
--------+----------------
1 | 1
2 | 1
3 | 2
4 | 2
5 | 1
6 | 1
7 | 2
8 | 2
9 | 3
10 | 3
11 | 4
12 | 4
13 | 3
14 | 3
15 | 4
16 | 4
17 | 5
18 | 5
19 | 6
20 | 6
21 | 5
22 | 5
23 | 6
24 | 6
25 | 7
26 | 7
27 | 8
28 | 8
29 | 7
30 | 7
31 | 8
32 | 8
33 | 9
34 | 9
35 | 10
36 | 10
37 | 9
38 | 9
39 | 10
40 | 10
(40 rows)
I want to delete relationships related to organizations 5,6,7,8:
development=# delete from user_organizations_organization where organizationId in (5,6,7,8);
ERROR: column "organizationid" does not exist
LINE 1: delete from user_organizations_organization where organizati...
^
HINT: Perhaps you meant to reference the column "user_organizations_organization.organizationId".
development=#
How can I delete them?
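The HINT points at the cause: PostgreSQL folds unquoted identifiers to lower case, so organizationId is read as organizationid, and a mixed-case column has to be written in double quotes (where "organizationId" in (5,6,7,8)). As a small illustration, here is a Python sketch that builds such a statement. quote_ident and delete_by_org are hypothetical helpers, and the IN list is formatted directly only because the ids are integers; for real input, use your driver's parameter binding.

```python
def quote_ident(name):
    # Hypothetical helper: wrap an identifier in double quotes and double any
    # embedded quote, so PostgreSQL keeps its exact (mixed) case.
    return '"' + name.replace('"', '""') + '"'


def delete_by_org(table, column, org_ids):
    # int() guards the IN list; only integer ids are formatted inline.
    ids = ", ".join(str(int(i)) for i in org_ids)
    return f"DELETE FROM {quote_ident(table)} WHERE {quote_ident(column)} IN ({ids});"


if __name__ == "__main__":
    print(delete_by_org("user_organizations_organization", "organizationId", [5, 6, 7, 8]))
```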

Appending datasets by matched variables

I have to append three datasets named A, B and C that contain data for various years (for example, 1990, 1991...2014).
The problem is that not all datasets contain all the survey years, so the unmatched years need to be dropped manually before appending.
I would like to know if there is any way to append three (or more) datasets while keeping only the values that match across all of them (years in this case).
Consider the following toy example:
clear
input year var
1995 0
1996 1
1997 2
1998 3
1999 4
2000 5
end
save data1, replace
clear
input year var
1995 6
1996 9
1998 7
1999 8
2000 9
end
save data2, replace
clear
input year var
1995 10
1996 11
1997 12
2000 13
end
save data3, replace
There is no option that will force append to do what you want, but you can do the following:
use data1, clear
append using data2 data3
duplicates tag year, generate(tag)
sort year
list
+------------------+
| year var tag |
|------------------|
1. | 1995 0 2 |
2. | 1995 6 2 |
3. | 1995 10 2 |
4. | 1996 9 2 |
5. | 1996 1 2 |
|------------------|
6. | 1996 11 2 |
7. | 1997 2 1 |
8. | 1997 12 1 |
9. | 1998 7 1 |
10. | 1998 3 1 |
|------------------|
11. | 1999 8 1 |
12. | 1999 4 1 |
13. | 2000 13 2 |
14. | 2000 5 2 |
15. | 2000 9 2 |
+------------------+
drop if tag == 1
list
+------------------+
| year var tag |
|------------------|
1. | 1995 0 2 |
2. | 1995 6 2 |
3. | 1995 10 2 |
4. | 1996 9 2 |
5. | 1996 1 2 |
|------------------|
6. | 1996 11 2 |
7. | 2000 13 2 |
8. | 2000 5 2 |
9. | 2000 9 2 |
+------------------+
You can also further generalize this approach by finding the maximum value of the variable tag and keeping all observations with that value:
summarize tag
keep if tag == `r(max)'
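The same logic can be sketched in Python for illustration; plain lists stand in for the Stata datasets, and append_common_years is a hypothetical name. It stacks the datasets (append), counts how many observations share each year (duplicates tag stores that count minus one), and keeps the years with the highest count (keep if tag == `r(max)').

```python
from collections import Counter

# The three toy datasets from the question, as (year, var) pairs.
data1 = [(1995, 0), (1996, 1), (1997, 2), (1998, 3), (1999, 4), (2000, 5)]
data2 = [(1995, 6), (1996, 9), (1998, 7), (1999, 8), (2000, 9)]
data3 = [(1995, 10), (1996, 11), (1997, 12), (2000, 13)]


def append_common_years(*datasets):
    # "append": stack all observations into one list.
    stacked = [row for ds in datasets for row in ds]
    # "duplicates tag year": how many observations share each year.
    counts = Counter(year for year, _ in stacked)
    # "keep if tag == r(max)": keep the years with the highest count.
    top = max(counts.values())
    kept = [row for row in stacked if counts[row[0]] == top]
    # "sort year": stable sort, so dataset order is kept within a year.
    return sorted(kept, key=lambda row: row[0])


if __name__ == "__main__":
    for year, var in append_common_years(data1, data2, data3):
        print(year, var)
```

Like the Stata version, this assumes each year appears at most once per dataset. If no year is shared by every dataset, the maximum-count rule keeps the years shared by the most datasets rather than returning nothing.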

Architecture Design for Bus Routing with Time

This is to confirm whether my design is good enough, or to get better ideas for solving the bus routing problem with time. Here is my solution, with the primary steps given below:
Have one edges table that represents all the edges (source and target are vertices, i.e. bus stops):
postgres=# select id, source, target, cost from busedges;
id | source | target | cost
----+--------+--------+------
1 | 1 | 2 | 1
2 | 2 | 3 | 1
3 | 3 | 4 | 1
4 | 4 | 5 | 1
5 | 1 | 7 | 1
6 | 7 | 8 | 1
7 | 1 | 6 | 1
8 | 6 | 8 | 1
9 | 9 | 10 | 1
10 | 10 | 11 | 1
11 | 11 | 12 | 1
12 | 12 | 13 | 1
13 | 9 | 15 | 1
14 | 15 | 16 | 1
15 | 9 | 14 | 1
16 | 14 | 16 | 1
Have a table that represents bus details such as from time, to time, edge, etc.
NOTE: I use an integer format for the "from" and "to" columns so that I can run fast integer queries, but I can switch to a better format if one is available.
postgres=# select id, "busedgeId", "busId", "from", "to" from busedgetimes;
id | busedgeId | busId | from | to
----+-----------+-------+-------+-------
18 | 1 | 1 | 33000 | 33300
19 | 2 | 1 | 33300 | 33600
20 | 3 | 2 | 33900 | 34200
21 | 4 | 2 | 34200 | 34800
22 | 1 | 3 | 36000 | 36300
23 | 2 | 3 | 36600 | 37200
24 | 3 | 4 | 38400 | 38700
25 | 4 | 4 | 38700 | 39540
Use Dijkstra's algorithm to find the shortest path.
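The shortest-path step can be sketched in plain Python. This is an illustrative in-memory version, not pgRouting's pgr_dijkstra; it assumes the edges are directed (a bus runs source to target) and uses the cost column as the weight. The BUSEDGES list and the dijkstra function are hypothetical names.

```python
import heapq

# Edges from the busedges table: (id, source, target, cost).
BUSEDGES = [
    (1, 1, 2, 1), (2, 2, 3, 1), (3, 3, 4, 1), (4, 4, 5, 1),
    (5, 1, 7, 1), (6, 7, 8, 1), (7, 1, 6, 1), (8, 6, 8, 1),
    (9, 9, 10, 1), (10, 10, 11, 1), (11, 11, 12, 1), (12, 12, 13, 1),
    (13, 9, 15, 1), (14, 15, 16, 1), (15, 9, 14, 1), (16, 14, 16, 1),
]


def dijkstra(edges, start, goal):
    """Return (total_cost, [stop, ...]) for the cheapest path, or None."""
    graph = {}
    for _id, source, target, cost in edges:
        # Treated as directed: a bus runs from source to target.
        graph.setdefault(source, []).append((target, cost))

    queue = [(0, start, [start])]
    settled = {}
    while queue:
        cost, stop, path = heapq.heappop(queue)
        if stop == goal:
            return cost, path
        if stop in settled and settled[stop] <= cost:
            continue  # already reached this stop more cheaply
        settled[stop] = cost
        for nxt, edge_cost in graph.get(stop, []):
            heapq.heappush(queue, (cost + edge_cost, nxt, path + [nxt]))
    return None


if __name__ == "__main__":
    print(dijkstra(BUSEDGES, 1, 5))
```

With this data, dijkstra(BUSEDGES, 1, 5) returns (4, [1, 2, 3, 4, 5]), and stops 1 and 9 are in separate components, so dijkstra(BUSEDGES, 1, 9) returns None.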
Get the upcoming buses from the busedgetimes table, earliest first, for the shortest path found by Dijkstra's algorithm. This leads to a somewhat complex query, though.
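The "upcoming buses" step can likewise be sketched in Python. It assumes "from" is the departure time and "to" the arrival time on that edge, in the same integer unit, and that a rider can board the next bus the moment the previous one arrives (no transfer slack). BUSEDGETIMES, EDGE_IDS and plan_trip are hypothetical names, and the path 1-2-3-4-5 stands in for the output of the Dijkstra step.

```python
import bisect

# Rows from busedgetimes: (id, busedgeId, busId, from, to).
BUSEDGETIMES = [
    (18, 1, 1, 33000, 33300), (19, 2, 1, 33300, 33600),
    (20, 3, 2, 33900, 34200), (21, 4, 2, 34200, 34800),
    (22, 1, 3, 36000, 36300), (23, 2, 3, 36600, 37200),
    (24, 3, 4, 38400, 38700), (25, 4, 4, 38700, 39540),
]

# Maps (source, target) to busedges.id for the edges on the example path.
EDGE_IDS = {(1, 2): 1, (2, 3): 2, (3, 4): 3, (4, 5): 4}


def plan_trip(path, depart_at):
    """For each hop on `path`, take the earliest departure >= the current time."""
    # Group departures per edge, sorted by "from" so bisect finds the next one.
    by_edge = {}
    for row in sorted(BUSEDGETIMES, key=lambda r: r[3]):
        by_edge.setdefault(row[1], []).append(row)

    legs, now = [], depart_at
    for source, target in zip(path, path[1:]):
        rows = by_edge.get(EDGE_IDS[(source, target)], [])
        starts = [r[3] for r in rows]
        i = bisect.bisect_left(starts, now)
        if i == len(rows):
            return None  # no later bus on this edge
        legs.append(rows[i])
        now = rows[i][4]  # we reach `target` when this bus arrives
    return legs


if __name__ == "__main__":
    for leg in plan_trip([1, 2, 3, 4, 5], 33000):
        print(leg)
```

Starting at 33000 picks rows 18, 19, 20 and 21; starting at 33100 misses the first bus and yields rows 22, 23, 24 and 25.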
Can this be improved in any way, or is there a better design?
Links to docs or articles related to this would be really helpful.
This is totally normal and the usual way to do it. See also:
PgRouting Example

How print Horizontally on ireport dynamically while printing on next page when reached column limit for single page?

I think the question could be worded better; please feel free to edit it. I've also checked other questions and answers on the internet, but they didn't help.
Below is what I'm trying to achieve. Basically, I want the columns to display horizontally, and once they reach, let's say, 3 columns (names), the next ones should start on another page. These columns have subcolumns below them.
I've already tried setting the Print Order to Horizontal and the number of columns to 3. However, it shows unexpected output.
This is my table structure (it has thousands of records). I've also tried turning it into an array, but it doesn't work.
How can I achieve the output above in the report? Documents or links about this would be really helpful. I'm using Postgres and iReport 3.7.6.
date | name
------+--------
1 | Name 1
2 | Name 1
3 | Name 1
4 | Name 1
5 | Name 1
6 | Name 1
7 | Name 1
8 | Name 1
9 | Name 1
10 | Name 1
1 | Name 2
2 | Name 2
3 | Name 2
4 | Name 2
5 | Name 2
6 | Name 2
7 | Name 2
8 | Name 2
9 | Name 2
10 | Name 2
1 | Name 3
2 | Name 3
3 | Name 3
4 | Name 3
5 | Name 3
6 | Name 3
7 | Name 3
8 | Name 3
9 | Name 3
10 | Name 3
(30 rows)
I'm not really familiar with iReport, but from the PostgreSQL perspective, I believe you're looking for crosstab. Below is an example.
(If you haven't installed the extension yet, just execute this command:)
CREATE EXTENSION tablefunc;
Considering the following table (I believe it's close to your structure):
CREATE TEMPORARY TABLE t (id INT, name TEXT,val INT);
And the following values ...
db=# INSERT INTO t VALUES (1,'Name1',10),
(2,'Name1',20),
(3,'Name1',80),
(1,'Name2',30),
(2,'Name2',52),
(3,'Name2',40);
db=# SELECT * FROM t;
id | name | val
----+-------+-----
1 | Name1 | 10
2 | Name1 | 20
3 | Name1 | 80
1 | Name2 | 30
2 | Name2 | 52
3 | Name2 | 40
(6 rows)
... you can use crosstab to display your results horizontally:
db=# SELECT *
FROM crosstab('SELECT name, id, val FROM t ORDER BY 1, 2')
AS j(name text, val1 int, val2 int, val3 int);
name | val1 | val2 | val3
-------+------+------+------
Name1 | 10 | 20 | 80
Name2 | 30 | 52 | 40
(2 rows)
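To see what crosstab does with the source query, here is a rough in-memory Python analogue. The crosstab function below is hypothetical, not the tablefunc implementation: rows are sorted the way ORDER BY 1, 2 would sort them, grouped by the first column, and the values are laid out left to right.

```python
def crosstab(rows):
    """Pivot (row_name, category, value) triples into one list per row_name."""
    table = {}
    # Sorting by (row_name, category) matches ORDER BY 1, 2 in the source query.
    for name, category, value in sorted(rows):
        table.setdefault(name, []).append(value)
    return table


if __name__ == "__main__":
    # Same data as table t, in the column order of 'SELECT name, id, val'.
    source = [
        ("Name1", 1, 10), ("Name1", 2, 20), ("Name1", 3, 80),
        ("Name2", 1, 30), ("Name2", 2, 52), ("Name2", 3, 40),
    ]
    for name, values in crosstab(source).items():
        print(name, *values)
```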

Select rows so that each value of one column is repeated only N times

My table is:
id sub_id datetime resource
---|-----|------------|-------
1 | 10 | 04/03/2009 | 399
2 | 11 | 04/03/2009 | 244
3 | 10 | 04/03/2009 | 555
4 | 10 | 03/03/2009 | 300
5 | 11 | 03/03/2009 | 200
6 | 11 | 03/03/2009 | 500
7 | 11 | 24/12/2008 | 600
8 | 13 | 01/01/2009 | 750
9 | 10 | 01/01/2009 | 760
10 | 13 | 01/01/2009 | 570
11 | 11 | 01/01/2009 | 870
12 | 13 | 01/01/2009 | 670
13 | 13 | 01/01/2009 | 703
14 | 13 | 01/01/2009 | 705
I need to select only 2 rows for each sub_id.
Result would be:
id sub_id datetime resource
---|-----|------------|-------
1 | 10 | 04/03/2009 | 399
3 | 10 | 04/03/2009 | 555
5 | 11 | 03/03/2009 | 200
6 | 11 | 03/03/2009 | 500
8 | 13 | 01/01/2009 | 750
10 | 13 | 01/01/2009 | 570
How can I achieve this result in Postgres?
Use the window function row_number():
select id, sub_id, datetime, resource
from (
select *, row_number() over (partition by sub_id order by id)
from my_table
) s
where row_number < 3;
Pay attention to the ordering column (I use id to match your sample):
t=# with data as (select *,count(1) over (partition by sub_id order by id) from t)
select id,sub_id,datetime,resource from data where count <3;
id | sub_id | datetime | resource
----+--------+------------+----------
1 | 10 | 2009-03-04 | 399
3 | 10 | 2009-03-04 | 555
2 | 11 | 2009-03-04 | 244
5 | 11 | 2009-03-03 | 200
8 | 13 | 2009-01-01 | 750
10 | 13 | 2009-01-01 | 570
(6 rows)
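The row_number() approach can be mirrored in Python to check the expected result against the sample data (first_n_per_group is a hypothetical name): sort by sub_id and id, then keep the first n rows of each sub_id group.

```python
from itertools import groupby, islice

# Rows from the question's table: (id, sub_id, datetime, resource).
ROWS = [
    (1, 10, "04/03/2009", 399), (2, 11, "04/03/2009", 244),
    (3, 10, "04/03/2009", 555), (4, 10, "03/03/2009", 300),
    (5, 11, "03/03/2009", 200), (6, 11, "03/03/2009", 500),
    (7, 11, "24/12/2008", 600), (8, 13, "01/01/2009", 750),
    (9, 10, "01/01/2009", 760), (10, 13, "01/01/2009", 570),
    (11, 11, "01/01/2009", 870), (12, 13, "01/01/2009", 670),
    (13, 13, "01/01/2009", 703), (14, 13, "01/01/2009", 705),
]


def first_n_per_group(rows, n=2):
    # Like row_number() over (partition by sub_id order by id) ... where rn <= n.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    result = []
    for _sub_id, group in groupby(ordered, key=lambda r: r[1]):
        result.extend(islice(group, n))
    return result


if __name__ == "__main__":
    for row in first_n_per_group(ROWS):
        print(row)
```

With n=2 it returns ids 1, 3, 2, 5, 8 and 10, the same rows as the query output above.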