Nested dictionary behaving in a strange way - defaultdict

I have a list of text files, each representing a year, starting from 1880 and going all the way up to 2014. Each file contains a list of names, genders and the number of occurrences of that name in that year, similar to this:
Mary,F,14406
Anna,F,5773
Helen,F,5230
Now, I want to create a function that reads all the files and returns a nested dictionary of the form:
name -> year -> count
That is, the name is the key, and its value is another dictionary in which I add all the years as keys and the number of occurrences of the name as their values.
This is what I've come up with:
import glob
import os
from collections import defaultdict

def files_to_dict(folder_name):
    dic = defaultdict(dict)
    for filename in glob.glob(os.path.join(folder_name, '*.txt')):
        with open(filename, 'r') as yearFile:
            # Each file is named yob[year].txt, e.g. yob2011.txt, hence
            # I'm slicing the filename to get just the year.
            year = int(filename[9:13])
            for line in yearFile:
                # [0] = Name, [1] = Gender, [2] = Total Occurrences of Name
                list_of_line = line.replace(',', ' ').split()
                dic[list_of_line[0]][year] = int(list_of_line[2])
    return dic
Now, if I do this:
d = files_to_dict( 'names' )
print ( d['Mary'] )
I get something like this (only showing the last 10 years):
...2004: 31, 2005: 10, 2006: 10, 2007: 10, 2008: 3490, 2009: 3154, 2010: 2862, 2011: 2701, 2012: 6, 2013: 2632, 2014: 5}
Here, 2008 to 2011 all have the correct counts as they appear in the files, but 2004 to 2007, 2012 and 2014 all have incorrect counts. The same happens throughout the output: most of the counts are wrong, and only a dozen or so contain the correct tally. I really don't understand why. Does anyone have any idea?
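One detail worth checking: the function reads the gender field ([1]) but never uses it. If a file lists the same name twice, once under F and once under M, the later line silently overwrites the earlier one in dic[name][year]. The sketch below assumes that file layout and keeps the genders apart as name -> gender -> year -> count; the function name files_to_dict_by_gender and the toy sample file are hypothetical. It also takes the year from os.path.basename, so it no longer depends on the folder name's length.

```python
import glob
import os
import tempfile
from collections import defaultdict


def files_to_dict_by_gender(folder_name):
    """Build name -> gender -> year -> count (hypothetical variant of files_to_dict).

    Assumes each line is 'Name,Gender,Count' and that a name may appear once per
    gender in the same file, so gender must be part of the key to avoid overwrites.
    """
    dic = defaultdict(lambda: defaultdict(dict))
    for filename in glob.glob(os.path.join(folder_name, "*.txt")):
        # 'yob2011.txt' -> 2011, regardless of which folder the file lives in
        year = int(os.path.basename(filename)[3:7])
        with open(filename) as year_file:
            for line in year_file:
                name, gender, count = line.strip().split(",")
                dic[name][gender][year] = int(count)
    return dic


# Usage with a made-up file that lists Mary under both genders.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "yob2012.txt"), "w") as f:
        f.write("Mary,F,100\nMary,M,5\n")
    names = files_to_dict_by_gender(tmp)

print(names["Mary"]["F"][2012], names["Mary"]["M"][2012])  # 100 5
```

If only the female counts are wanted, filtering on gender == 'F' inside the loop would be a lighter alternative to the extra nesting level.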


selecting daily revised hourly values for a month

I have two queries.
The first one returns the latest revision number for each date in a month:
select max(a.wdcm_revision_no)
from wb_declared_capacity_master a, wb_declared_capacity_det_unit b
where a.wdcm_internal_id = b.wdcd_ref_id and b.wdcm_unit = 'HSY_Unit1' and to_char(a.wdcm_date,'MM YYYY')='08 2019' group by a.wdcm_date
The second query returns all the data, irrespective of revision:
select a1.wdcm_date, b1.wdcd_block_no, c1.wdcd_capacity, c1.wdcd_approval, a1.wdcm_revision_no
from wb_declared_capacity_master a1, wb_declared_capacity_det_unit b1, wb_declared_capacity_detail c1
where a1.wdcm_internal_id = b1.wdcd_ref_id and a1.wdcm_internal_id = c1.wdcd_ref_id and b1.wdcm_unit = 'HSY_Unit1' and to_char(a1.wdcm_date,'MM YYYY')='08 2019'and b1.wdcd_block_no = c1.wdcd_block_no
Now I want to pass each date's latest revision from the first query into the WHERE condition of the second query, so that it returns only the rows matching those latest revision numbers.
You can combine them into one query: the first becomes a CTE, which is then fed into the second. I believe the following is close, but as you didn't actually indicate where you wanted to "pass in the result", I'm making an assumption that seems likely; it's still an assumption.
with latest (wdcm_date, revision) as
-- first query, also returning the date each maximum revision belongs to
( select a.wdcm_date, max(a.wdcm_revision_no)
    from wb_declared_capacity_master a
       , wb_declared_capacity_det_unit b
   where a.wdcm_internal_id = b.wdcd_ref_id
     and b.wdcm_unit = 'HSY_Unit1'
     and to_char(a.wdcm_date,'MM YYYY')='08 2019'
   group by a.wdcm_date
)
-- second query, which returns all the data irrespective of revision
select a1.wdcm_date
     , b1.wdcd_block_no
     , c1.wdcd_capacity
     , c1.wdcd_approval
     , a1.wdcm_revision_no
  from wb_declared_capacity_master a1
     , wb_declared_capacity_det_unit b1
     , wb_declared_capacity_detail c1
     , latest
 where a1.wdcm_internal_id = b1.wdcd_ref_id
   and a1.wdcm_internal_id = c1.wdcd_ref_id
   and b1.wdcm_unit = 'HSY_Unit1'
   and to_char(a1.wdcm_date,'MM YYYY')='08 2019'
   and b1.wdcd_block_no = c1.wdcd_block_no
   and a1.wdcm_date = latest.wdcm_date -- match each row to its own date's latest revision
   and a1.wdcm_revision_no = latest.revision; ---- you didn't actually say where to "pass it in" this seems likely
I've kept the join format you used, but you really should start using the modern ANSI/ISO standard join syntax ("modern" meaning it has been around ONLY since 1992 or so).
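Below is a small, runnable check of the "latest revision per date" pattern. It runs in SQLite through Python's sqlite3 module rather than Oracle, and a single hypothetical table, declared_capacity, stands in for the three joined tables; it also uses explicit JOIN ... ON syntax. Joining on both the date and the revision matters: joining on the revision alone would also return the superseded 2019-08-02 row, because revision 2 happens to be the latest for 2019-08-01.

```python
import sqlite3


def latest_revision_rows(conn):
    """Rows whose revision is the latest one recorded for their own date."""
    sql = """
        WITH latest (wdcm_date, revision) AS (
            SELECT wdcm_date, MAX(wdcm_revision_no)
            FROM declared_capacity
            GROUP BY wdcm_date
        )
        SELECT c.wdcm_date, c.block_no, c.cap_value, c.wdcm_revision_no
        FROM declared_capacity AS c
        JOIN latest AS l
          ON c.wdcm_date = l.wdcm_date            -- same date ...
         AND c.wdcm_revision_no = l.revision      -- ... and that date's top revision
        ORDER BY c.wdcm_date, c.block_no
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE declared_capacity (wdcm_date TEXT, block_no INTEGER, "
             "cap_value REAL, wdcm_revision_no INTEGER)")
conn.executemany("INSERT INTO declared_capacity VALUES (?, ?, ?, ?)", [
    ("2019-08-01", 1, 100.0, 1),
    ("2019-08-01", 1, 110.0, 2),  # latest revision for 08-01
    ("2019-08-02", 1, 90.0, 2),   # superseded: 08-02 goes up to revision 3
    ("2019-08-02", 1, 95.0, 3),
])
rows = latest_revision_rows(conn)
print(rows)  # [('2019-08-01', 1, 110.0, 2), ('2019-08-02', 1, 95.0, 3)]
```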

# in Caché between columns

I have a SQL query, and I would like to insert a hash sign (#) between each column and the next so that I can import the result into Excel using the import option for fields delimited by #. Does anyone have an idea how to do it? The query is as follows:
SELECT FC.folha, folha->folhames,folha->folhaano, folha->folhaseq, folha->folhadesc, folha->TipoCod as Tipo_Folha,
folha->FolhaFechFormatado as Folha_Fechada, folha->DataPagamentoFormatada as Data_Pgto,
Servidor->matricula, Servidor->nome, FC.rubrica,
FC.Rubrica->Codigo, FC.Rubrica->Descricao, FC.fator, FC.TipoRubricaFormatado as TipoRubrica,
FC.ValorFormatado,FC.ParcelaAtual, FC.ParcelaTotal
FROM RHFolCalculo FC WHERE folha -> FolhaFech = 1
AND folha->folhaano = 2018
and folha->folhames = 06
and folha->TipoCod->codigo in (1,2,3,4,6,9)
You are generating delimited output from the query, so the first row should be a header row and all following rows data rows. Because of the concatenation, you will really have only one column. So remove the aliases from the columns and output the first row like this (using the aliases as the header names) . . .
SELECT 'folha#folhames#folhaano#folhaseq#folhadesc#Tipo_Folha#' ||
'Folha_Fechada#Data_Pgto#' ||
'matricula#nome#rubrica#' ||
'Codigo#Descricao#fator#TipoRubrica#' ||
'ValorFormatado#ParcelaAtual#ParcelaTotal'
UNION
SELECT FC.folha || '#' || folha->folhames || '#' || folha->folhaano . . .
The UNION will give the remaining rows. Note that some conversion may be necessary on the columns' data if not all of them are strings.
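The same idea (a literal header row plus one '#'-joined string per record) can be tried out in SQLite through Python's sqlite3 module; this is not Caché, and the folha table is a hypothetical, cut-down stand-in for RHFolCalculo. One deliberate deviation: it uses UNION ALL with an explicit sort key, because plain UNION also removes duplicate rows, and without an ORDER BY nothing guarantees the header comes out first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the real payroll tables.
conn.execute("CREATE TABLE folha (folhaano INTEGER, folhames INTEGER, nome TEXT, valor TEXT)")
conn.executemany("INSERT INTO folha VALUES (?, ?, ?, ?)",
                 [(2018, 6, "Ana", "1500.50"), (2018, 6, "Bruno", "2300.00")])

sql = """
    SELECT line FROM (
        SELECT 0 AS sort_key, 'folhaano#folhames#nome#valor' AS line   -- header row
        UNION ALL
        SELECT 1, CAST(folhaano AS TEXT) || '#' || CAST(folhames AS TEXT)
                  || '#' || nome || '#' || valor                       -- one line per record
        FROM folha
    )
    ORDER BY sort_key, line
"""
lines = [row[0] for row in conn.execute(sql)]
print("\n".join(lines))
```

The CAST calls mirror the note about converting non-string columns before concatenating them.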

Select data within specific month in SELECT-OPTIONS

This ABAP code works, but only once. When I run it again with different parameters, the resulting data does not change. How can I solve this?
PARAMETERS : S_MONTH LIKE ISELLIST-MONTH OBLIGATORY.
SELECT-OPTIONS : S_DATE FOR SY-DATUM.
AT SELECTION-SCREEN ON VALUE-REQUEST FOR S_MONTH.
PERFORM GET_DATES.
FORM GET_DATES.
DATA: MONTH LIKE ISELLIST-MONTH,
FIRST_DAY LIKE SY-DATUM,
LAST_DAY LIKE SY-DATUM.
MONTH = SY-DATUM+0(6). "default
CALL FUNCTION 'POPUP_TO_SELECT_MONTH'
EXPORTING
ACTUAL_MONTH = MONTH
IMPORTING
SELECTED_MONTH = MONTH.
IF SY-SUBRC <> 0.
"put some message
ENDIF.
CONCATENATE MONTH '01' INTO FIRST_DAY.
CALL FUNCTION 'RP_LAST_DAY_OF_MONTHS'
EXPORTING
DAY_IN = FIRST_DAY
IMPORTING
LAST_DAY_OF_MONTH = LAST_DAY.
IF SY-SUBRC <> 0.
"put some message
ENDIF.
S_DATE-LOW = FIRST_DAY.
S_DATE-HIGH = LAST_DAY.
S_DATE-SIGN = 'I'.
S_DATE-OPTION = 'BT'.
APPEND S_DATE.
S_MONTH = MONTH.
ENDFORM.
Add REFRESH S_DATE. before APPEND S_DATE. At the moment you are just appending every selection you make to S_DATE, so the earlier ranges are never removed.
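To see why APPEND alone keeps the old selection, here is a Python analogue (not ABAP). The list s_date stands in for the S_DATE range table, each tuple mirrors SIGN, OPTION, LOW and HIGH, list.clear() plays the role of REFRESH, and calendar.monthrange stands in for RP_LAST_DAY_OF_MONTHS; the helper names are hypothetical.

```python
import calendar
from datetime import date


def month_range(yyyymm):
    """First and last day of a month given as 'YYYYMM' (like ISELLIST-MONTH)."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:6])
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    return date(year, month, 1), date(year, month, last_day)


def select_month(s_date, yyyymm, refresh=True):
    """Fill the range list the way GET_DATES fills S_DATE."""
    if refresh:
        s_date.clear()  # the equivalent of REFRESH S_DATE.
    low, high = month_range(yyyymm)
    s_date.append(("I", "BT", low, high))  # SIGN, OPTION, LOW, HIGH


stale = []  # without REFRESH: every selection piles up
select_month(stale, "201901", refresh=False)
select_month(stale, "201902", refresh=False)

fresh = []  # with REFRESH: only the latest selection survives
select_month(fresh, "201901")
select_month(fresh, "201902")

print(len(stale), len(fresh))  # 2 1
```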

SQL Give me the name of all the people that I sent the same file

I have a table that includes the userID that sent the file, the userID that the file was sent to, the filename and the date it was sent.
http://sqlfiddle.com/#!6/855cc6
I'm trying to write a statement that returns one row per file sent, with the names of the people I sent it to at the end of the row.
Something like this:
01/08/2014 | "main doc" | "Jon P, Mike S, Ron W"
04/04/2014 | "other doc" | "Jon P, Mike S"
10/10/2014 | "last doc" | "Ron W"
(where the date is the oldest instance of the DateSent datetime field).
Sorry, I don't know how to create functions in SQL Fiddle, so let's assume there is a scalar function named "GetName(UserID)" that returns the name of the user passed as a parameter. It returns a single value.
You can use FOR XML PATH to concatenate values like this:
SELECT DISTINCT
DateSent,
FileName,
SUBSTRING
(
(
SELECT CONCAT(',', t1.SentToUserID) --or dbo.GetName(t1.SentToUserID) for names; scalar UDFs need the schema prefix
FROM FileSent t1
WHERE t1.FileName = t2.FileName AND t1.DateSent = t2.DateSent AND t1.UserID = t2.UserID
ORDER BY t1.FileName
FOR XML PATH ('')
), 2, 1000
) [SentFiles]
FROM FileSent t2
ORDER BY DateSent
Sample SQL Fiddle (two slightly different versions).
To get just the earliest date, you can use MIN(DateSent) and GROUP BY FileName and UserID:
SELECT DISTINCT
MIN(DateSent) DateSent,
FileName,
STUFF ((SELECT CONCAT(',', t1.SentToUserID)
FROM FileSent T1
WHERE t1.FileName = t2.FileName AND t1.UserID = t2.UserID
FOR XML PATH('')
),1,1,'' ) [SentFiles]
FROM FileSent T2
GROUP BY FileName, UserID
SQL Fiddle for this.
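For reference, this is what the grouped query computes, written in plain Python rather than T-SQL: for one sender, group the rows by file name, keep the earliest DateSent, and join the recipients' names with ", ". The sample rows are made up to match the example output in the question, and the names dict is a hypothetical stand-in for GetName(UserID).

```python
from collections import defaultdict
from datetime import date

# (UserID, SentToUserID, FileName, DateSent): hypothetical sample rows
sent = [
    (1, 2, "main doc", date(2014, 8, 1)),
    (1, 3, "main doc", date(2014, 8, 2)),
    (1, 4, "main doc", date(2014, 8, 1)),
    (1, 2, "other doc", date(2014, 4, 4)),
    (1, 3, "other doc", date(2014, 4, 4)),
    (1, 4, "last doc", date(2014, 10, 10)),
]
names = {2: "Jon P", 3: "Mike S", 4: "Ron W"}  # stands in for GetName(UserID)


def files_with_recipients(rows, sender):
    """One (first DateSent, FileName, 'name, name, ...') tuple per file, by date."""
    first_sent = {}
    recipients = defaultdict(list)
    for user_id, to_id, file_name, sent_on in rows:
        if user_id != sender:
            continue
        # equivalent of MIN(DateSent) ... GROUP BY FileName
        first_sent[file_name] = min(sent_on, first_sent.get(file_name, sent_on))
        # equivalent of the FOR XML PATH string concatenation
        recipients[file_name].append(names[to_id])
    return sorted((first_sent[f], f, ", ".join(recipients[f])) for f in first_sent)


summary = files_with_recipients(sent, sender=1)
for row in summary:
    print(row)
```

On SQL Server 2017 and later, STRING_AGG is a more direct way to express the same concatenation than FOR XML PATH.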

DBI::Sybase data-conversion resulted in overflow

I am writing a Perl script that uses the DBI module to connect to a Sybase DB. I am calling a stored procedure (one that I don't have access to, so I cannot post sample code), and when I get data back I get an error that reads "error_handler: Data-conversion resulted in overflow". I still get data back, and after some intensive research it seems that some column data types (such as BigInt and nvarchar) are the culprits. Now the question is: how can I fix this? Can it be fixed on the client side, or only on the server side?
my $dbh = DBI->connect("DBI:Sybase:server=$server", $username, $password, {PrintError => 0}) or die;
$dbh->do("use $database") or die;
my $sql = &getQuery;
my $sth = $dbh->prepare($sql) or die;
$sth->execute() or die;
while ($rowRef = $sth->fetchrow_arrayref) #Error seems to occur here
{
#Parse through each row
}
Part of the FreeTDS 0.82 log that explains the problem:
_ct_bind_data(): column 7 is type 38 and has length 8
_ct_get_server_type(0)
_ct_get_client_type(type 38, user 0, size 8)
cs_convert(0x18dfed40, 0x7fff73216050, 0x18e44250, 0x7fff73215fa0, 0x18e387c0, 0x18e45a64)
_ct_get_server_type(30)
_ct_get_server_type(0)
converting type 127 (8 bytes) to type = 47 (9 bytes)
cs_convert() calling tds_convert
cs_convert() tds_convert returned 10
cs_prretcode(0)
cs_convert() returning CS_FAIL
cs_convert-result = 1
The problem is on the FreeTDS side. I've had the same problem before and successfully fixed it by converting the returned fields to varchar in the select statement.
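To make the overflow in the log concrete, here is a small Python check. It rests on my reading of the log rather than on FreeTDS documentation: I'm assuming type 127 is the 8-byte bigint and type 47 a fixed-width char buffer, here 9 bytes. A bigint's decimal text can need up to 20 characters, so any value of 10 or more characters cannot fit in 9 bytes, which would be consistent with "tds_convert returned 10" followed by CS_FAIL. That is presumably why converting to varchar on the server side avoids the error.

```python
def fits_in_char_buffer(value, buffer_len):
    """True if the decimal text of an integer fits in buffer_len bytes."""
    return len(str(value)) <= buffer_len


BIGINT_MIN, BIGINT_MAX = -2**63, 2**63 - 1

print(len(str(BIGINT_MIN)))               # 20: worst case for a bigint as text
print(fits_in_char_buffer(123456789, 9))  # True: 9 digits fit
print(fits_in_char_buffer(1234567890, 9)) # False: 10 digits overflow a 9-byte buffer
```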
Given that you don't have access to modify the original query, you can do a regex search and replace on the returned $sql variable in your code. In particular, if the original query has a part that looks like:
SELECT field1, field2, field3 FROM ...
After you retrieve the query statement, you may run:
my $new_sql = $sql;   # fall back to the original query if nothing matches
if ($sql =~ /SELECT\s+(.*?)\s+FROM/is) {   # match the selected field string
    my $field_str = $1;
    my @fields = split /,/, $field_str;                   # parse individual fields
    s/^\s+|\s+$//g for @fields;                           # trim surrounding spaces
    my $new_str = join ", ", map { sprintf("convert(varchar, %s)", $_) } @fields;   # construct new field list
    my $quoted_field_str = quotemeta($field_str);         # escape regex metacharacters
    $new_sql =~ s/$quoted_field_str/$new_str/i;           # actual replacement
}
print $new_sql;
Of course, if your original statement is more complex, you should print it out and work out how to modify it with a generic replacement in the same spirit. Alternatively, you can ask your DBA (or whoever has access to the stored procedure) to modify the actual query directly.
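For comparison, the same rewrite sketched in Python with the re module. It is a rough sketch under the same assumption as the Perl version (a simple "SELECT a, b, c FROM ..." statement), and it will mis-handle select expressions that themselves contain commas, such as function calls. The function name wrap_select_fields is hypothetical. Instead of a quotemeta-style search and replace, it splices the new list into the exact span the regex matched, so an identical substring elsewhere in the query cannot be hit by mistake.

```python
import re


def wrap_select_fields(sql):
    """Wrap each column of the first SELECT list in convert(varchar, ...)."""
    match = re.search(r"SELECT\s+(.*?)\s+FROM", sql, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return sql  # leave anything we do not recognise untouched
    fields = [f.strip() for f in match.group(1).split(",")]
    new_list = ", ".join(f"convert(varchar, {f})" for f in fields)
    # Replace only the matched field list, keeping the rest of the statement.
    return sql[:match.start(1)] + new_list + sql[match.end(1):]


print(wrap_select_fields("SELECT field1, field2, field3 FROM my_table"))
# SELECT convert(varchar, field1), convert(varchar, field2), convert(varchar, field3) FROM my_table
```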
Hope this helps.