Got wrong answer in implementing merge sort in python - mergesort

I am getting a wrong answer for particular input arrays in this merge sort implementation.
This is the Python code I tried:
a=[100,3,4]
b=[]
for i in range(len(a)):
    b.append(0)
def ms( a ,lb,ub ):
    if (lb<ub):
        mid=int((lb+ub)/2)
        ms(a, lb, mid)
        ms(a, mid+1,ub)
        merge(a,lb,mid,ub)
def merge(a,lb,mid,ub):
    i=lb
    j=mid+1
    k=lb
    while (i<=mid and j<=ub) :
        if a[i]<=a[j]:
            b[k]=a[i]
            i+=1
            k+=1
        else:
            b[k]=a[j]
            j+=1
            k+=1
    if (i>mid) :
        while j<=ub :
            b[k]=a[j]
            j+=1
            k+=1
    elif (j>ub) :
        while i<=mid :
            b[k]=a[i]
            i+=1
            k+=1
ms(a,0 , len(a)-1)
print(b)
The output is not sorted.
Please take a look.

There are several problems with this code. You don't ask for a fix, and I can imagine at least two ways to go about fixing it, so I'll leave that to you. The fundamental problem in your implementation is that merge writes its result into b but never copies it back into a, so every later merge reads the original, unsorted values from a and overwrites what the earlier merge put into b.
If you add a print statement right after ms calls merge, you'll see that the first merge turns b into the list [3, 100, 0], and the second turns it into the list [4, 100, 3]. In other words, you've lost information: the second merge starts again at k=lb=0 and compares a[0]=100 with a[2]=4, as if the first merge had never happened.
IMHO you should not try to perform merge sort using a global list in this manner.
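One possible way to avoid the global list, sketched under my own assumptions (the names merge_sort, sort and buf are not from the question): allocate the scratch buffer once inside the sort and copy each merged run back into a, so that later merges read sorted data.

```python
def merge(a, buf, lb, mid, ub):
    """Merge the sorted runs a[lb..mid] and a[mid+1..ub] through buf, then copy back."""
    i, j, k = lb, mid + 1, lb
    while i <= mid and j <= ub:
        if a[i] <= a[j]:
            buf[k] = a[i]
            i += 1
        else:
            buf[k] = a[j]
            j += 1
        k += 1
    # At most one of the two runs still has elements left.
    while i <= mid:
        buf[k] = a[i]
        i += 1
        k += 1
    while j <= ub:
        buf[k] = a[j]
        j += 1
        k += 1
    # The step missing from the question: write the merged run back into a.
    a[lb:ub + 1] = buf[lb:ub + 1]


def merge_sort(a):
    """Sort the list a in place and return it."""
    buf = [0] * len(a)  # scratch space, local instead of global

    def sort(lb, ub):
        if lb < ub:
            mid = (lb + ub) // 2
            sort(lb, mid)
            sort(mid + 1, ub)
            merge(a, buf, lb, mid, ub)

    sort(0, len(a) - 1)
    return a


print(merge_sort([100, 3, 4]))  # [3, 4, 100]
```

The other obvious fix would be to return new lists from each recursive call instead of sorting in place; the sketch above just stays closest to the original index-based structure.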


How to translate this code into python 3?

This code was originally written in Python 2 and I need to translate it to Python 3.
I'm sorry for not sharing enough information.
Here is the part where self.D is first assigned, followed by the method that fails:
def __init__(self,instance,transformed,describe1,describe2):
    self.D=[]
    self.instance=instance
    self.transformed=transformed
    self.describe1,self.describe2=describe1,describe2
    self.describe=self.describe1+', '+self.describe2 if self.describe2 else self.describe1
    self.column_num=self.tuple_num=self.view_num=0
    self.names=[]
    self.types=[]
    self.origins=[]
    self.features=[]
    self.views=[]
    self.classify_id=-1
    self.classify_num = 1
    self.classes=[]
def generateViews(self):
    T=map(list,zip(*self.D))
    if self.transformed==0:
        s= int( self.column_num)
        for column_id in range(s):
            f = Features(self.names[column_id],self.types[column_id],self.origins[column_id])
            #calculate min,max for numerical,temporal
            if f.type==Type.numerical or f.type==Type.temporal:
                f.min,f.max=min(T[column_id]),max(T[column_id])
                if f.min==f.max:
                    self.types[column_id]=f.type=Type.none
                    self.features.append(f)
                    continue
            d={}
            #calculate distinct,ratio for categorical,temporal
            if f.type == Type.categorical or f.type == Type.temporal:
                for i in range(self.tuple_num):
                    print([type(self.D[i]) for i in range(self.tuple_num)])
                    if self.D[i][column_id] in d:
                        d[self.D[i][column_id]]+=1
                    else:
                        d[self.D[i][column_id]]=1
                f.distinct = len(d)
                f.ratio = 1.0 * f.distinct / self.tuple_num
                f.distinct_values=[(k,d[k]) for k in sorted(d)]
                if f.type==Type.temporal:
                    self.getIntervalBins(f)
            self.features.append(f)
TypeError: 'map' object is not subscriptable
The snippet you have given is not enough to solve the problem. The problem lies in self.D, which you are trying to subscript using self.D[i]. Please look at where self.D is instantiated in your code and make sure that it is an array-like object, so that you can subscript it.
Edit
Based on your edit, please confirm that self.D[i] is also array-like for all i in the range used in the code. You can check that simply with
print([type(self.D[i]) for i in range(self.tuple_num)])
Share the output of this code so that I can help further.
Edit-2
As per your comments and the edited code snippet, it seems that self.D is the output of some map call. In Python 2, map is a function that returns a list. In Python 3, however, map is a class that, when invoked, creates a lazy map object, which cannot be indexed.
The simplest way to resolve this is to find the line where self.D is first assigned and wrap whatever is on the right-hand side in list(...).
Alternately, just after this line
T=map(list,zip(*self.D))
add the following
self.D = list(self.D)
Hope this will resolve the issue
We don't have quite enough information to answer the question, but in Python 3, generator and map objects are not subscriptable. I think it may be in your
self.D[i]
variable, because you claim that self.D is a list, but it is possible that self.D[i] is a map object.
In that case, to access it by index, you can convert it to a list first:
list(self.D[i])
Or use unpacking to implicitly convert to a list (this may be more condensed, but remember that explicit is better than implicit):
[*self.D[i]]
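For reference, here is a small self-contained reproduction of the error message. It assumes, and this is only my reading of the posted snippet, that the map object being indexed is T (as in T[column_id]); the rows list is a made-up stand-in for self.D.

```python
# Hypothetical stand-in for self.D: a list of rows.
rows = [(1, "a"), (2, "b"), (3, "c")]

# Python 3: map returns a lazy iterator, so indexing it fails.
lazy = map(list, zip(*rows))
try:
    lazy[0]
except TypeError as exc:
    message = str(exc)
    print(message)  # 'map' object is not subscriptable

# Wrapping the call in list(...) restores the Python 2 behaviour.
columns = list(map(list, zip(*rows)))
print(columns[0])                         # [1, 2, 3]
print(min(columns[0]), max(columns[0]))   # 1 3
```

If that assumption holds, changing the assignment to T = list(map(list, zip(*self.D))) would be the corresponding one-line change in generateViews.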

Turn off Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1) for gfortran?

I am intentionally casting an array of boolean values to integers but I get this warning:
Warning: Extension: Conversion from LOGICAL(4) to INTEGER(4) at (1)
which I don't want. Can I either
(1) Turn off that warning in the Makefile?
or (more favorably)
(2) Explicitly make this cast in the code so that the compiler doesn't need to worry?
The code will look something like this:
A = (B.eq.0)
where A and B are both size (n,1) integer arrays. B will be filled with integers ranging from 0 to 3. I need to use this type of command again later with something like A = (B.eq.1) and I need A to be an integer array where it is 1 if and only if B is the requested integer, otherwise it should be 0. These should act as boolean values (1 for .true., 0 for .false.), but I am going to be using them in matrix operations and summations where they will be converted to floating point values (when necessary) for division, so logical values are not optimal in this circumstance.
Specifically, I am looking for the fastest, most vectorized version of this command. It is easy to write a wrapper for testing elements, but I want this to be a vectorized operation for efficiency.
I am currently compiling with gfortran, but would like whatever methods are used to also work in ifort as I will be compiling with intel compilers down the road.
update:
Both merge and where work perfectly for the example in question. I will look into performance metrics on these and select the best for vectorization. I am also interested in how this will work with matrices, not just arrays, but that was not my original question so I will post a new one unless someone wants to expand their answer to how this might be adapted for matrices.
I have not found a compiler option to solve (1).
However, the type conversion is pretty simple. The documentation for gfortran specifies that .true. is mapped to 1, and .false. to 0.
Note that the conversion is not specified by the standard, and different values could be used by other compilers. Specifically, you should not depend on the exact values.
A simple merge will do the trick for scalars and arrays:
program test
  integer :: int_sca, int_vec(3)
  logical :: log_sca, log_vec(3)
  log_sca = .true.
  log_vec = [ .true., .false., .true. ]
  int_sca = merge( 1, 0, log_sca )
  int_vec = merge( 1, 0, log_vec )
  print *, int_sca
  print *, int_vec
end program
To address your updated question, this is trivial to do with merge:
A = merge(1, 0, B == 0)
This can be performed on scalars and arrays of arbitrary dimensions. For the latter, this can easily be vectorized by the compiler. You should consult the manual of your compiler for that, though.
The where statement in Casey's answer can be extended in the same way.
Since you convert them to floats later on, why not assign them as floats right away? Assuming that A is real, this could look like:
A = merge(1., 0., B == 0)
Another method, to complement @AlexanderVogt's answer, is to use the where construct.
program test
  implicit none
  integer :: int_vec(5)
  logical :: log_vec(5)
  log_vec = [ .true., .true., .false., .true., .false. ]
  where (log_vec)
    int_vec = 1
  elsewhere
    int_vec = 0
  end where
  print *, log_vec
  print *, int_vec
end program test
This will assign 1 to the elements of int_vec that correspond to true elements of log_vec and 0 to the others.
The where construct will work for any rank array.
For this particular example you could avoid the logical altogether:
A=(3-B)/3
With integer division the quotient is 1 only when B is 0, and 0 for B from 1 to 3. Of course this is not so good for readability, but it might be OK performance-wise.
Edit: in my performance tests this is 2-3x faster than the where construct, and of course it is absolutely standards conforming. In fact you can throw in an absolute value and generalize as:
integer,parameter :: h=huge(1)
A=(h-abs(B))/h
and still beat the where construct.
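To see exactly what the integer-division trick computes, here is a quick check written in Python rather than Fortran. It is only an illustration: H is my stand-in for huge(1) with 32-bit default integers, and // matches Fortran's truncating integer division here because none of the operands are negative.

```python
H = 2**31 - 1  # assumed value of Fortran's huge(1) for a 32-bit default integer


def is_zero_small(b):
    """1 if b == 0 else 0; only valid for 0 <= b <= 3."""
    return (3 - b) // 3


def is_zero_general(b):
    """1 if b == 0 else 0; valid whenever abs(b) <= H."""
    return (H - abs(b)) // H


B = [0, 1, 2, 3, 0, 2]
print([is_zero_small(b) for b in B])        # [1, 0, 0, 0, 1, 0]
print([1 - is_zero_small(b) for b in B])    # [0, 1, 1, 1, 0, 1]
print([is_zero_general(b) for b in [0, -7, 7, H]])  # [1, 0, 0, 0]
```

So the quotient itself marks the elements equal to 0, while one minus the quotient marks the elements that differ from 0. For a test such as B == 1, the same idea would apply to B - 1 with the general form.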

Scala bufferedIterator incrementing head in inner function

Hi, I'm seeing what I believe is odd behaviour in Scala. Calling head on a BufferedIterator seems to advance the iterator when it happens in an inner function. Either my expectations are wrong, in which case why is the outer output correct? Or is the output wrong?
given:
import scala.io.Source
val source = Source.fromString("abcdef")
val buff1 = source.buffered;
println("outer head 1: " +buff1.head)
println("outer head 2: " +buff1.head)
def readLine():List[String] = {
  def buffered = source.buffered
  def readLine(tokens:List[String] , partialToken:String):List[String] = {
    println("head1 " + buffered.head)
    println("head2 " + buffered.head)
    return Nil;
  }
  return (readLine(Nil, ""));
}
readLine();
The expected output of this to me is
outer head 1: a
outer head 2: a
head1: a
head2: a
actual output is as follows.
outer head 1: a
outer head 2: a
head1 b
head2 c
scala.io.Source is and behaves like an Iterator[Char]. So you must make sure not to use it in several places at once: Iterator.next is called 3 times from 3 different BufferedSource in your example, hence the different values you get out of it:
buff1.head: the buffered source has not buffered anything yet, so asking for head here calls next on the inner source, hence the first a.
buff1.head again: here the head has already been buffered, so you get a and the inner source isn't changed.
buffered.head: since buffered is a def, this is equivalent to source.buffered.head. This new buffered source has not buffered anything yet, so asking for head retrieves an element from the inner source, hence the b.
buffered.head: this creates yet another buffered source, same as above, and you get c.
The bottom line is: if you call source.buffered, never use source again directly, and do not call it several times either.
Your example can be fixed by calling buffered immediately:
val source = Source.fromString("abcdef").buffered
You could also turn def buffered = into val buffered = to make sure source.buffered is not called several times.
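The same effect can be reproduced outside Scala. The following Python sketch uses a made-up Buffered class (not any real library API) that buffers one element of an underlying iterator, which is roughly what source.buffered does; creating a fresh wrapper for every head call consumes a new element each time.

```python
class Buffered:
    """Hypothetical one-element look-ahead wrapper around an iterator."""

    _EMPTY = object()  # sentinel: nothing buffered yet

    def __init__(self, it):
        self._it = it
        self._head = Buffered._EMPTY

    @property
    def head(self):
        # Pull from the underlying iterator only on the first access.
        if self._head is Buffered._EMPTY:
            self._head = next(self._it)
        return self._head


source = iter("abcdef")

buff1 = Buffered(source)
outer = [buff1.head, buff1.head]  # one wrapper: element buffered once

# A fresh wrapper per call, like "def buffered = source.buffered".
inner = [Buffered(source).head, Buffered(source).head]

print(outer)  # ['a', 'a']
print(inner)  # ['b', 'c']
```

Keeping a single wrapper and never touching the underlying iterator directly gives the repeated 'a' that the question expected.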
Calling head on a bufferedIterator seems to be incrementing the head in a inner function.
Note (July 2016, three years later):
Commit 11688eb shows:
SI-9691 BufferedIterator should expose a headOption
This exposes a new API to the BufferedIterator trait.
It will return the next element of an iterator as an Option.
The return will be Some(value) if there is a next value, and None if there is not a next element.
That should help avoid any kind of increment.
You are right, except that what gets incremented is not a function but a simple field: see IndexedSeqLike, line 66. You can check this yourself by stepping through the execution in an IDE debugger.

Scala while loop returns Unit all the time

I have the following code, but I can't get it to work. As soon as I place a while loop inside the case, it's returning a unit, no matter what I change within the brackets.
case While(c, body) =>
  while (true) {
    eval(Num(1))
  }
}
How can I make this while loop return a non-Unit type?
I tried adding brackets around my while condition, but still it doesn't do what it's supposed to.
Any pointers?
Update
A little more background information, since I didn't really explain what the code should do, which seems useful if I want to receive some help:
I have defined an eval(exp : Exp). This will evaluate an expression.
Exp is an abstract class, extended by several classes like Plus and Minus (plus a few more basic operations) and an IfThenElse(cond : Exp, then : Exp, else : Exp). Last but not least, there's the While(cond: Exp, body: Exp).
Example of how it should be used:
eval(Plus(Num(1),Num(4))) would result in NumValue(5). (Evaluation of Num(v : Value) results in NumValue(v). NumValue extends Value, which is another abstract class.)
eval(While(Lt(Num(1),Var("n")), Plus(Num(1), Var("n"))))
Lt(a : Exp, b : Exp) returns NumValue(1) if a < b.
It's probably clear from the other answer that Scala while loops always return Unit. What's nice about Scala is that if it doesn't do what you want, you can always extend it.
Here is the definition of a while-like construct that returns the result of the last iteration (it will throw an exception if the loop is never entered):
def whiley[T](cond : =>Boolean)(body : =>T) : T = {
  @scala.annotation.tailrec
  def loop(previous : T) : T = if(cond) loop(body) else previous
  if(cond) loop(body) else throw new Exception("Loop must be entered at least once.")
}
...and you can then use it as a while. (In fact, the @tailrec annotation will make it compile into the exact same thing as a while loop.)
var x = 10
val atExit = whiley(x > 0) {
  val squared = x * x
  println(x)
  x -= 1
  squared
}
println("The last time x was printed, its square was : " + atExit)
(Note that I'm not claiming the construct is useful.)
Which iteration would you expect this loop to return? If you want a Seq of the results of all iterations, use a for expression (also called for comprehension). If you want just the last one, create a var outside the loop, set its value on each iteration, and return that var after the loop. (Also look into other looping constructs that are implemented as functions on different types of collections, like foldLeft and foldRight, which have their own interesting behaviors as far as return value goes.) The Scala while loop returns Unit because there's no sensible one size fits all answer to this question.
(By the way, there's no way for the compiler to know this, but the loop you wrote will never return. If the compiler could theoretically be smart enough to figure out that while(true) never terminates, then the expected return type would be Nothing.)
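The two choices described in this answer (keep only the last iteration's value in a variable, or collect the value of every iteration) look like this as a language-neutral sketch in Python. The helper names last_result and all_results are invented for the illustration, and the condition and body are passed as plain functions.

```python
def last_result(cond, body):
    """Run body() while cond() holds; return the last value (None if never run)."""
    last = None
    while cond():
        last = body()
    return last


def all_results(cond, body):
    """Run body() while cond() holds; return the value of every iteration."""
    results = []
    while cond():
        results.append(body())
    return results


state = {"x": 3}


def positive():
    return state["x"] > 0


def step():
    squared = state["x"] ** 2
    state["x"] -= 1
    return squared


print(last_result(positive, step))  # 1
state["x"] = 3
print(all_results(positive, step))  # [9, 4, 1]
```

Neither result is obviously the right one for a loop in general, which is the point the answer makes about why the built-in loop yields Unit.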
The only purpose of a while loop is to execute a side-effect. Or put another way, it will always evaluate to Unit.
If you want something meaningful back, why don't you consider using an if-else-expression or a for-expression?
As everyone else and their mothers said, while loops do not return values in Scala. What no one seems to have mentioned is that there's a reason for that: performance.
Returning a value has an impact on performance, so the compiler would have to be smart about when you do need that return value, and when you don't. There are cases where that can be trivially done, but there are complex cases as well. The compiler would have to be smarter, which means it would be slower and more complex. The cost was deemed not worth the benefit.
Now, there are two looping constructs in Scala (all the others are based on these two): while loops and recursion. Scala can optimize tail recursion, and the result is often faster than while loops. Or, otherwise, you can use while loops and get the result back through side effects.

Big text file processing

I need to implement lazy loading in Mathematica. I have a 600 MB CSV text file which I need to process. This file contains a lot of duplicated records:
1;0;0;13;6
1;0;0;13;6
..........
2;0;0;13;6
2;0;0;13;6
..........
etc.
So instead of loading them all into memory, I'd like to create a list containing records and the number of times this record was encountered in the file:
{{10000,{1,0,0,13,6}}, {20000,{2,0,0,13,6}}, ...}
I couldn't find a way to do it with the Import function. I'm looking for something like
Import["my_file.csv", "CSV", myProcessingFunction]
where myProcessingFunction will take one record at a time and create a dataset. Is it possible to do this with Import or any other Mathematica function?
If it were me, I'd probably do this using Unix sort and uniq, but since you ask about Mathematica... I'd use ReadList[] to read blocks of lines, and define downvalues to find the unique strings and keep track of how many we've seen before.
(* Create some test data *)
Export["/tmp/test.txt", Flatten[{Range[1000], Range[1000]}], "Lines"];
countUniqueLines[file_String, blockSize_Integer] := Module[{stream, map, block, keys, out},
  map[_]:=0;
  stream = OpenRead[file];
  CheckAbort[While[(block=ReadList[stream, String, blockSize])=!={},
    (map[#]=map[#]+1)& /@ block;];, Close[stream];Clear[map]];
  Close[stream];
  keys = Cases[DownValues[map][[All, 1, 1, 1]], _String];
  out = {#, map[#]}& /@ keys;
  Clear[map];
  out
]
countUniqueLines["/tmp/test.txt", 500]
(* Alternative implementation if you have a little more memory *)
Tally[Import["/tmp/test.txt", "Lines"]]
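For comparison, the same streaming idea (read one record at a time and keep only a counter per distinct record) can be sketched in Python. The function name count_records is invented for this example, and the semicolon-separated integer format is taken from the sample records in the question.

```python
from collections import Counter


def count_records(lines):
    """Count identical records from any iterable of lines, one line at a time."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = tuple(int(field) for field in line.split(";"))
        counts[record] += 1
    # Same shape as requested in the question: {count, record} pairs.
    return [(n, list(record)) for record, n in counts.items()]


sample = ["1;0;0;13;6\n", "1;0;0;13;6\n", "2;0;0;13;6\n"]
print(count_records(sample))  # [(2, [1, 0, 0, 13, 6]), (1, [2, 0, 0, 13, 6])]
```

Because an open file object is itself an iterable of lines, passing open("my_file.csv") instead of the sample list would process the file without loading it all into memory; only the distinct records are kept.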
I think you want the Read[] function.
Perhaps there are better alternatives than Mathematica for doing this.
A small awk script:
{a[$0]++}
END { ... print loop ... }
will accumulate the repeated records. Of course, you may run out of memory if the number of distinct records is very large.
Or sort the file first, so that only one counter is needed at a time. In awk, such a program may be something like:
BEGIN { p = ""; i = 0 }
{ if ($0 != p && i != 0) { print p, i; i = 0 } }
{ i++; p = $0 }
END { if (i != 0) print p, i }
Perhaps Perl is better, but I'm old fashioned.
HTH!
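The count-after-sorting idea carries over to other languages as well. As a rough Python sketch (count_sorted is a made-up name), itertools.groupby groups runs of equal adjacent lines, so only the current run has to be tracked:

```python
from itertools import groupby


def count_sorted(lines):
    """Yield (record, count) for each run of equal adjacent lines."""
    stripped = (line.rstrip("\n") for line in lines)
    for record, run in groupby(stripped):
        yield record, sum(1 for _ in run)


print(list(count_sorted(["a\n", "a\n", "b\n"])))  # [('a', 2), ('b', 1)]
```

This only gives totals per record if the input is already sorted; on unsorted input it reports each separate run of a record.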
I would recommend loading it into a database system such as MySQL first; you can then access it from Mathematica using DatabaseLink.