pynestkernel ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory - nest-simulator

When installing NEST 2.18 with:
cmake \
-Dwith-mpi=/usr/lib/x86_64-linux-gnu/openmpi \
-Dwith-python=3 \
-DPYTHON_EXECUTABLE=/home/robin/.pyenv/versions/3.8.6/bin/python \
-DPYTHON_LIBRARY=/home/robin/.pyenv/versions/3.8.6/lib/libpython3.8.so \
-DPYTHON_INCLUDE_DIR=/home/robin/.pyenv/versions/3.8.6/include/python3.8/ \
-DCMAKE_INSTALL_PREFIX=/home/robin/nest-install \
..
It seems that NEST 2.18 looks for libmpi_cxx.so.20 even though it doesn't exist and isn't part of the installed MPI library:
$ ldd nest-install/lib/python3.8/site-packages/nest/pynestkernel.so
linux-vdso.so.1 (0x00007fff3bb37000)
libpython3.8.so.1.0 => /home/robin/.pyenv/versions/3.8.6/lib/libpython3.8.so.1.0 (0x00007feaa1401000)
libnest.so => /nest/2.18/lib/libnest.so (0x00007feaa11c1000)
libmodels.so => /nest/2.18/lib/libmodels.so (0x00007feaa0930000)
libtopology.so => /nest/2.18/lib/libtopology.so (0x00007feaa0691000)
libnestkernel.so => /nest/2.18/lib/libnestkernel.so (0x00007feaa0335000)
librandom.so => /nest/2.18/lib/librandom.so (0x00007feaa00e9000)
libsli.so => /nest/2.18/lib/libsli.so (0x00007fea9fd9a000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fea9fba1000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fea9fb86000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fea9f994000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fea9f971000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fea9f969000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fea9f964000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fea9f815000)
libprecise.so => /nest/2.18/lib/libprecise.so (0x00007fea9f592000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fea9f587000)
libmpi_cxx.so.20 => /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.20 (0x00007fea9f567000)
libmpi.so.20 => not found
libnestutil.so => /nest/2.18/lib/libnestutil.so (0x00007fea9f363000)
libgsl.so.23 => /usr/lib/x86_64-linux-gnu/libgsl.so.23 (0x00007fea9f0e7000)
libgslcblas.so.0 => /usr/lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007fea9f0a5000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fea9f063000)
/lib64/ld-linux-x86-64.so.2 (0x00007feaa180a000)
libmpi.so.20 => not found
libmpi.so.20 => not found
libmpi.so.20 => not found
libmpi.so.20 => not found
libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x00007fea9ed39000)
libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x00007fea9ea83000)
libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x00007fea9e76b000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fea9e760000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fea9e744000)
I've tried changing all of the CMake variables using ccmake, but I can't get it to link against libmpi_cxx.so.40.
Even without MPI support it still includes this link, which seems like a bug:
robin@robin-ZenBook-UX533FN:~$ ldd nest-install/lib/python3.8/site-packages/nest/pynestkernel.so
linux-vdso.so.1 (0x00007ffe63518000)
libpython3.8.so.1.0 => /home/robin/.pyenv/versions/3.8.6/lib/libpython3.8.so.1.0 (0x00007fa1e3c37000)
libnest.so => /nest/2.18/lib/libnest.so (0x00007fa1e39f7000)
libmodels.so => /nest/2.18/lib/libmodels.so (0x00007fa1e3166000)
libtopology.so => /nest/2.18/lib/libtopology.so (0x00007fa1e2ec7000)
libnestkernel.so => /nest/2.18/lib/libnestkernel.so (0x00007fa1e2b6b000)
librandom.so => /nest/2.18/lib/librandom.so (0x00007fa1e291f000)
libsli.so => /nest/2.18/lib/libsli.so (0x00007fa1e25d0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa1e23d7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa1e23bc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa1e21ca000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa1e21a7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa1e219f000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fa1e219a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa1e204b000)
libprecise.so => /nest/2.18/lib/libprecise.so (0x00007fa1e1dc8000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fa1e1dbd000)
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libnestutil.so => /nest/2.18/lib/libnestutil.so (0x00007fa1e1bb7000)
libgsl.so.23 => /usr/lib/x86_64-linux-gnu/libgsl.so.23 (0x00007fa1e193b000)
libgslcblas.so.0 => /usr/lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007fa1e18f9000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fa1e18b7000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa1e4040000)
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
The full error when importing it is:
>>> import nest
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/robin/nest-install/lib/python3.8/site-packages/nest/__init__.py", line 26, in <module>
from . import ll_api # noqa
File "/home/robin/nest-install/lib/python3.8/site-packages/nest/ll_api.py", line 72, in <module>
from . import pynestkernel as kernel # noqa
ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory

Related

map() method works to multiple textfiles in Scala spark intellij

I want to perform some operations on text that I read from multiple text files, but the map() method processes every file separately. For example, I do:
val text = sc.wholeTextFiles("src/folder").map(a => a._2)
.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)
and the result is:
(hi , 1) //from the first file
(hi , 1) // from the second file
I want the result to be: (hi,2)
I'm thinking of a for loop, but that doesn't seem flexible because I don't know the number of text files.
I tried your code in spark-shell and these are my findings:
I have 2 files:
csv1 -> hi
csv2 -> hi hi hi
The result was OK after I removed the line endings:
val text = sc.wholeTextFiles("testSO/").map(a => a._2).flatMap(line => line.split(" ")).map(line => line.replace("\n","")).map(word => (word,1)).reduceByKey(_+_).foreach(println)
Output:
(hi,4)
This was the result without removing the line endings:
scala> val text = sc.wholeTextFiles("testSO/").map(a => a._2).flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_).foreach(println)
Output:
(hi,2)
(hi
,2)
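The duplicated key is the word that still carries a trailing newline from the end of each file. A minimal alternative sketch (assuming the same testSO/ directory as above): splitting on any whitespace instead of a single space makes the separate replace("\n","") step unnecessary.
val text = sc.wholeTextFiles("testSO/")
  .map(_._2)                  // keep only the file contents
  .flatMap(_.split("\\s+"))   // split on any whitespace, including newlines
  .filter(_.nonEmpty)         // drop empty tokens
  .map(word => (word, 1))
  .reduceByKey(_ + _)
text.foreach(println)         // (hi,4)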

Turn a bi-dimensional matrix into perl hash

I have a genotype matrix in which:
Rows represent loci
Columns represent samples
Each value represents a genotype, which can be P1/P1, P2/P2, P1/P2, or NA if the genotype is not determined.
I'd like to turn this matrix into a Perl hash of hashes (HoH) so that I can look up the genotype for a specific sample and locus.
My matrix looks like:
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
LOC105031933 P1/P1 P1/P1 P1/P1
LOC105031934 NA NA NA
LOC105031935 P1/P1 P1/P1 P1/P1
LOC105031937 NA NA NA
LOC105031938 P1/P1 P1/P1 P1/P1
As an output, the code should give:
$hash{$sample}{$locus} = P1/P1 #(for locus LOC105031935 in sample BC10-II for example)
Here's what I've tried. I can't yet figure out how to assign each locus from the first column as the second-level key of the hash. @sample_names is a list of the three samples.
open(GENOTYPE, '<', "$matrix_geno") or die ("Cannot open $matrix_geno\n");
my %hash;
while (my $line = <GENOTYPE>)
{
my @columns = split(/\s+/, $line);
@hash{@sample_names} = @columns;
#print Dumper \%hash;
}
Any help will be seriously appreciated.
PS: This example is a small part of my data; I'm actually looking for a more general solution.
Thank you very much.
Code:
#!/usr/bin/perl
use strict; use warnings; use Data::Dumper;
my $matrix_geno = 'input.io';
open ( my $GENOTYPE, '<', "$matrix_geno" ) or die ($!);
my $header = <$GENOTYPE>;
chomp($header);
my @headers = split( /\s+/, $header );
my %hash = ();
while ( my $line = <$GENOTYPE> ) {
chomp($line);
my @columns_data = split( /\s+/, $line );
$hash{$columns_data[0]}{$headers[1]} = $columns_data[1];
$hash{$columns_data[0]}{$headers[2]} = $columns_data[2];
$hash{$columns_data[0]}{$headers[3]} = $columns_data[3];
}
print Dumper(\%hash);
close($GENOTYPE);
OUTPUT:
$VAR1 = {
'LOC105031933' => {
'BC1-III' => 'P1/P1',
'BC10-II' => 'P1/P1',
'BC1-IV' => 'P1/P1'
},
'LOC105031934' => {
'BC1-III' => 'NA',
'BC10-II' => 'NA',
'BC1-IV' => 'NA'
},
'LOC105031938' => {
'BC1-IV' => 'P1/P1',
'BC1-III' => 'P1/P1',
'BC10-II' => 'P1/P1'
},
'LOC105031931' => {
'BC10-II' => 'P1/P1',
'BC1-III' => 'P1/P1',
'BC1-IV' => 'P1/P1'
},
'LOC105031937' => {
'BC1-IV' => 'NA',
'BC10-II' => 'NA',
'BC1-III' => 'NA'
},
'LOC105031935' => {
'BC1-III' => 'P1/P1',
'BC10-II' => 'P1/P1',
'BC1-IV' => 'P1/P1'
},
'LOC105031928' => {
'BC1-IV' => 'P1/P2',
'BC10-II' => 'P1/P2',
'BC1-III' => 'P1/P2'
},
'LOC105031930' => {
'BC1-III' => 'NA',
'BC10-II' => 'NA',
'BC1-IV' => 'NA'
}
};
Is this the output you wanted?
Hope this helps; please adapt it to your needs.
This seems to do what you want. I'm reading from DATA for simplicity.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;
# Read headers
chomp(my $headers = <DATA>);
my @samples = split /\s+/, $headers;
# Remove 'CDS'
shift @samples;
my %genotype;
while (<DATA>) {
chomp;
my ($locus, @genotypes) = split;
for my $x (0 .. $#samples) {
$genotype{$samples[$x]}{$locus} = $genotypes[$x];
}
}
# Display the data structure
say Dumper \%genotype;
# Simple test
say $genotype{'BC10-II'}{LOC105031935};
__DATA__
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
LOC105031933 P1/P1 P1/P1 P1/P1
LOC105031934 NA NA NA
LOC105031935 P1/P1 P1/P1 P1/P1
LOC105031937 NA NA NA
LOC105031938 P1/P1 P1/P1 P1/P1
The output is as follows:
$VAR1 = {
'BC10-II' => {
'LOC105031931' => 'P1/P1',
'LOC105031935' => 'P1/P1',
'LOC105031930' => 'NA',
'LOC105031928' => 'P1/P2',
'LOC105031937' => 'NA',
'LOC105031938' => 'P1/P1',
'LOC105031933' => 'P1/P1',
'LOC105031934' => 'NA'
},
'BC1-IV' => {
'LOC105031934' => 'NA',
'LOC105031933' => 'P1/P1',
'LOC105031938' => 'P1/P1',
'LOC105031937' => 'NA',
'LOC105031928' => 'P1/P2',
'LOC105031930' => 'NA',
'LOC105031935' => 'P1/P1',
'LOC105031931' => 'P1/P1'
},
'BC1-III' => {
'LOC105031931' => 'P1/P1',
'LOC105031935' => 'P1/P1',
'LOC105031930' => 'NA',
'LOC105031928' => 'P1/P2',
'LOC105031937' => 'NA',
'LOC105031938' => 'P1/P1',
'LOC105031933' => 'P1/P1',
'LOC105031934' => 'NA'
}
};
P1/P1

Usage example of scalaz-stream's inflate

In the following usage example of scalaz-stream (taken from the documentation), what do I need to change if the input and/or output is a gzipped file? In other words, how do I use compress?
import scalaz.stream._
import scalaz.concurrent.Task
val converter: Task[Unit] =
io.linesR("testdata/fahrenheit.txt")
.filter(s => !s.trim.isEmpty && !s.startsWith("//"))
.map(line => fahrenheitToCelsius(line.toDouble).toString)
.intersperse("\n")
.pipe(text.utf8Encode)
.to(io.fileChunkW("testdata/celsius.txt"))
.run
// at the end of the universe...
val u: Unit = converter.run
Compressing the output is easy. Since compress.deflate() is a Process1[ByteVector, ByteVector], you need to plug it into your pipeline where you are emitting ByteVectors (that is, right after text.utf8Encode, which is a Process1[String, ByteVector]):
val converter: Task[Unit] =
io.linesR("testdata/fahrenheit.txt")
.filter(s => !s.trim.isEmpty && !s.startsWith("//"))
.map(line => fahrenheitToCelsius(line.toDouble).toString)
.intersperse("\n")
.pipe(text.utf8Encode)
.pipe(compress.deflate())
.to(io.fileChunkW("testdata/celsius.zip"))
.run
For inflate you can't use io.linesR to read the compressed file. You need a process that produces ByteVectors instead of Strings in order to pipe them into inflate. (You could use io.fileChunkR for that.) The next step would be decoding the uncompressed data to Strings (with text.utf8Decode for example) and then using text.lines() to emit the text line by line. Something like this should do the trick:
val converter: Task[Unit] =
Process.constant(4096).toSource
.through(io.fileChunkR("testdata/fahrenheit.zip"))
.pipe(compress.inflate())
.pipe(text.utf8Decode)
.pipe(text.lines())
.filter(s => !s.trim.isEmpty && !s.startsWith("//"))
.map(line => fahrenheitToCelsius(line.toDouble).toString)
.intersperse("\n")
.pipe(text.utf8Encode)
.to(io.fileChunkW("testdata/celsius.txt"))
.run
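Both pipelines assume a fahrenheitToCelsius helper that isn't shown above; a minimal definition matching the usual conversion formula would be:
def fahrenheitToCelsius(f: Double): Double =
  (f - 32.0) * (5.0 / 9.0)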

Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

I need to create a class schema to support 29 fields. Due to the limit of 22 fields on case classes, I tried extending my class "sdp_d" with the Product interface as follows:
class sdp_d( WID :Option[Int], BATCH_ID :Option[Int], SRC_ID :Option[String], ORG_ID :Option[Int], CLASS_WID :Option[Int], DESC_TEXT :Option[String], PREMISE_WID :Option[Int], FEED_LOC :Option[String], GPS_LAT :Option[Double], GPS_LONG :Option[Double], PULSE_OUTPUT_BLOCK :Option[String], UDC_ID :Option[String], UNIVERSAL_ID :Option[String], IS_VIRTUAL_FLG :Option[String], SEAL_INFO :Option[String], ACCESS_INFO :Option[String], ALT_ACCESS_INFO :Option[String], LOC_INFO :Option[String], ALT_LOC_INFO :Option[String], TYPE :Option[String], SUB_TYPE :Option[String], TIMEZONE_ID :Option[Int], GIS_ID :Option[String], BILLED_UPTO_TIME :Option[java.sql.Timestamp], POWER_STATUS :Option[String], LOAD_STATUS :Option[String], BILLING_HOLD_STATUS :Option[String], INSERT_TIME :Option[java.sql.Timestamp], LAST_UPD_TIME :Option[java.sql.Timestamp]) extends Product{
@throws(classOf[IndexOutOfBoundsException])
override def productElement(n: Int) = n match
{
case 0 => WID
case 1 => BATCH_ID
case 2 => SRC_ID
case 3 => ORG_ID
case 4 => CLASS_WID
case 5 => DESC_TEXT
case 6 => PREMISE_WID
case 7 => FEED_LOC
case 8 => GPS_LAT
case 9 => GPS_LONG
case 10 => PULSE_OUTPUT_BLOCK
case 11 => UDC_ID
case 12 => UNIVERSAL_ID
case 13 => IS_VIRTUAL_FLG
case 14 => SEAL_INFO
case 15 => ACCESS_INFO
case 16 => ALT_ACCESS_INFO
case 17 => LOC_INFO
case 18 => ALT_LOC_INFO
case 19 => TYPE
case 20 => SUB_TYPE
case 21 => TIMEZONE_ID
case 22 => GIS_ID
case 23 => BILLED_UPTO_TIME
case 24 => POWER_STATUS
case 25 => LOAD_STATUS
case 26 => BILLING_HOLD_STATUS
case 27 => INSERT_TIME
case 28 => LAST_UPD_TIME
case _ => throw new IndexOutOfBoundsException(n.toString())
}
override def productArity: Int = 29
override def canEqual(that: Any): Boolean = that.isInstanceOf[sdp_d]
}
This defines the class sdp_d. However, when I try to load CSV data with this pre-defined schema and register it as a table, I get an error:
scala> import java.text.SimpleDateFormat; val sdf = new SimpleDateFormat("yyyy-mm-dd hh:mm:ss.S"); import java.util.Calendar; import java.util.Date; val calendar = Calendar.getInstance()
import java.text.SimpleDateFormat
sdf: java.text.SimpleDateFormat = java.text.SimpleDateFormat@cce61785
import java.util.Calendar
import java.util.Date
calendar: java.util.Calendar = java.util.GregorianCalendar[time=1424687963209,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Asia/Kolkata",offset=19800000,dstSavings=0,useDaylight=false,transitions=6,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2015,MONTH=1,WEEK_OF_YEAR=9,WEEK_OF_MONTH=4,DAY_OF_MONTH=23,DAY_OF_YEAR=54,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=4,AM_PM=1,HOUR=4,HOUR_OF_DAY=16,MINUTE=9,SECOND=23,MILLISECOND=209,ZONE_OFFSET=19800000,DST_OFFSET=0]
scala> sc.textFile("hdfs://CDH-Master-1.cdhcluster/user/spark/Sdp_d.csv").map(_.split(",")).map { r =>
| val upto_time = sdf.parse(r(23).trim);
| calendar.setTime(upto_time);
| val r23 = new java.sql.Timestamp(upto_time.getTime);
|
| val insert_time = sdf.parse(r(26).trim);
| calendar.setTime(insert_time);
| val r26 = new java.sql.Timestamp(insert_time.getTime);
|
| val last_upd_time = sdf.parse(r(27).trim);
| calendar.setTime(last_upd_time);
| val r27 = new java.sql.Timestamp(last_upd_time.getTime);
|
| sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim, r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim, r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim, r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim, r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim, r(25).trim, r26, r27, r(28).trim)
| }.registerAsTable("sdp")
<console>:36: error: not found: value sdp_d
sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim, r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim, r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim, r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim, r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim, r(25).trim, r26, r27, r(28).trim)
^
I am working in spark-shell, with Spark version 1.1.0 and Scala version 2.10.4.
I don't understand why I get the error not found: value sdp_d.
How am I supposed to call registerAsTable when I create my own class extending the Product interface?
Please help in resolving the error.
Did you happen to have a look at https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema
You should just instantiate the class with new:
new sdp_d(r(0).trim.toInt, r(1).trim.toInt, ...
You need to:
Instantiate with the new keyword: new sdp_d(...)
Pass Option[T] values (Some or None) for the fields you declared as Option[T], e.g. Option[Int].
new sdp_d(Try(r(0).trim.toInt).toOption, Try(r(1).trim.toInt).toOption, Option(r(2).trim), ...)
This works for me:
//AirTraffic.scala
class AirTraffic(Year:Option[Int], Month:Option[Int], DayOfMonth:Option[Int], DayOfWeek:Option[Int],
DepTime:Option[Int], CRSDepTime:Option[Int], ArrTime:Option[Int], CRSArrTime:Option[Int],
UniqueCarrier:String, FlightNum:Option[Int], TailNum:String, ActualElapsedTime:Option[Int],
CRSElapsedTime:Option[Int], AirTime:Option[Int], ArrDelay:Option[Int], DepDelay:Option[Int],
Origin:String, Dest:String, Distance:Option[Int], TaxiIn:Option[Int], TaxiOut:Option[Int],
Cancelled:Option[Boolean], CancellationCode:String, Diverted:Option[Boolean], CarrierDelay:Option[Int],
WeatherDelay:Option[Int], NASDelay:Option[Int], SecurityDelay:Option[Int], LateAircraftDelay:Option[Int]) extends Product {
// We declare a field as Option[T] to make that field nullable.
override def productElement(n: Int): Any =
n match {
case 0 => Year
case 1 => Month
case 2 => DayOfMonth
case 3 => DayOfWeek
case 4 => DepTime
case 5 => CRSDepTime
case 6 => ArrTime
case 7 => CRSArrTime
case 8 => UniqueCarrier
case 9 => FlightNum
case 10 => TailNum
case 11 => ActualElapsedTime
case 12 => CRSElapsedTime
case 13 => AirTime
case 14 => ArrDelay
case 15 => DepDelay
case 16 => Origin
case 17 => Dest
case 18 => Distance
case 19 => TaxiIn
case 20 => TaxiOut
case 21 => Cancelled
case 22 => CancellationCode
case 23 => Diverted
case 24 => CarrierDelay
case 25 => WeatherDelay
case 26 => NASDelay
case 27 => SecurityDelay
case 28 => LateAircraftDelay
case _ => throw new IndexOutOfBoundsException(n.toString)
}
override def productArity: Int = 29
override def canEqual(that: Any): Boolean = that.isInstanceOf[AirTraffic]
}
//main.scala
import scala.util.Try

val data = sparkContext.textFile("local-input/AIRLINE/2008.csv").map(_.split(","))
.map(l => new AirTraffic(Try(l(0).trim.toInt).toOption, Try(l(1).trim.toInt).toOption, Try(l(2).trim.toInt).toOption, Try(l(3).trim.toInt).toOption,
Try(l(4).trim.toInt).toOption, Try(l(5).trim.toInt).toOption, Try(l(6).trim.toInt).toOption, Try(l(7).trim.toInt).toOption,
l(8).trim, Try(l(9).trim.toInt).toOption, l(10).trim, Try(l(11).trim.toInt).toOption,
Try(l(12).trim.toInt).toOption, Try(l(13).trim.toInt).toOption, Try(l(14).trim.toInt).toOption, Try(l(15).trim.toInt).toOption,
l(16).trim, l(17).trim, Try(l(18).trim.toInt).toOption, Try(l(19).trim.toInt).toOption, Try(l(20).trim.toInt).toOption,
Try(l(21).trim.toBoolean).toOption, l(22).trim, Try(l(23).trim.toBoolean).toOption, Try(l(24).trim.toInt).toOption,
Try(l(25).trim.toInt).toOption, Try(l(26).trim.toInt).toOption, Try(l(27).trim.toInt).toOption, Try(l(28).trim.toInt).toOption)).toDF()
// register table with SQLContext
data.registerTempTable("AirTraffic")
val count = sqlContext.sql("SELECT COUNT(*) FROM AirTraffic").collect()
count.foreach(print)
If you think it's still ugly, we can tidy it up further with an implicit helper class:
implicit class StringConverter(val s: String) extends AnyVal {
def tryGetInt: Option[Int] = Try(s.trim.toInt).toOption
def tryGetString: Option[String] = {
val res = s.trim
if (res.isEmpty) None else Some(res)
}
def tryGetBoolean: Option[Boolean] = Try(s.trim.toBoolean).toOption
}
then
val data = sparkContext.textFile("local-input/AIRLINE/2008.csv").map(_.split(","))
.map(l => new AirTraffic(l(0).tryGetInt, l(1).tryGetInt, l(2).tryGetInt, l(3).tryGetInt,
l(4).tryGetInt, l(5).tryGetInt, l(6).tryGetInt, l(7).tryGetInt,
l(8).trim, l(9).tryGetInt, l(10).trim, l(11).tryGetInt,
l(12).tryGetInt, l(13).tryGetInt, l(14).tryGetInt, l(15).tryGetInt,
l(16).trim, l(17).trim, l(18).tryGetInt, l(19).tryGetInt, l(20).tryGetInt,
l(21).tryGetBoolean, l(22).trim, l(23).tryGetBoolean, l(24).tryGetInt,
l(25).tryGetInt, l(26).tryGetInt, l(27).tryGetInt, l(28).tryGetInt)).toDF()
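As a quick sanity check (assuming the StringConverter above and import scala.util.Try are in scope), the helpers return Options instead of throwing on malformed fields:
println(" 42 ".tryGetInt)      // Some(42)
println("abc".tryGetInt)       // None
println("   ".tryGetString)    // None: blank after trimming
println("OH".tryGetString)     // Some(OH)
println("true".tryGetBoolean)  // Some(true)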

clapack.so: undefined symbol: clapack_sgesv on RHEL

I'm getting this error when importing scipy.stats:
import scipy.stats
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/site-packages/scipy/stats/__init__.py", line 322, in <module>
from stats import *
File "/usr/lib64/python2.6/site-packages/scipy/stats/stats.py", line 194, in <module>
import scipy.linalg as linalg
File "/usr/lib64/python2.6/site-packages/scipy/linalg/__init__.py", line 116, in <module>
from basic import *
File "/usr/lib64/python2.6/site-packages/scipy/linalg/basic.py", line 12, in <module>
from lapack import get_lapack_funcs
File "/usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py", line 15, in <module>
from scipy.linalg import clapack
ImportError: /usr/lib64/python2.6/site-packages/scipy/linalg/clapack.so: undefined symbol: clapack_sgesv
It looks like clapack.so links to the full ATLAS version of libatlas:
ldd /usr/lib64/python2.6/site-packages/scipy/linalg/clapack.so
linux-vdso.so.1 => (0x00007fff232e6000)
liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x00007f23b8ad7000)
libptf77blas.so.3 => /usr/lib64/atlas/libptf77blas.so.3 (0x00007f23b88b7000)
libptcblas.so.3 => /usr/lib64/atlas/libptcblas.so.3 (0x00007f23b8697000)
libatlas.so.3 => /usr/lib64/atlas/libatlas.so.3 (0x00007f23b8120000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x00007f23b7d65000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f23b7a73000)
libm.so.6 => /lib64/libm.so.6 (0x00007f23b77da000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f23b75c3000)
libc.so.6 => /lib64/libc.so.6 (0x00007f23b7232000)
libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007f23b6fdb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f23b6dbd000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f23b6bb9000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f23b69b6000)
/lib64/ld-linux-x86-64.so.2 (0x00000032a2200000)
Any ideas?