pynestkernel ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory - nest-simulator
When installing NEST 2.18 with:
cmake \
-Dwith-mpi=/usr/lib/x86_64-linux-gnu/openmpi \
-Dwith-python=3 \
-DPYTHON_EXECUTABLE=/home/robin/.pyenv/versions/3.8.6/bin/python \
-DPYTHON_LIBRARY=/home/robin/.pyenv/versions/3.8.6/lib/libpython3.8.so \
-DPYTHON_INCLUDE_DIR=/home/robin/.pyenv/versions/3.8.6/include/python3.8/ \
-DCMAKE_INSTALL_PREFIX=/home/robin/nest-install \
..
It seems that NEST 2.18 links against libmpi_cxx.so.20, even though that library doesn't exist on my system and isn't part of the installed MPI library:
$ ldd nest-install/lib/python3.8/site-packages/nest/pynestkernel.so
linux-vdso.so.1 (0x00007fff3bb37000)
libpython3.8.so.1.0 => /home/robin/.pyenv/versions/3.8.6/lib/libpython3.8.so.1.0 (0x00007feaa1401000)
libnest.so => /nest/2.18/lib/libnest.so (0x00007feaa11c1000)
libmodels.so => /nest/2.18/lib/libmodels.so (0x00007feaa0930000)
libtopology.so => /nest/2.18/lib/libtopology.so (0x00007feaa0691000)
libnestkernel.so => /nest/2.18/lib/libnestkernel.so (0x00007feaa0335000)
librandom.so => /nest/2.18/lib/librandom.so (0x00007feaa00e9000)
libsli.so => /nest/2.18/lib/libsli.so (0x00007fea9fd9a000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fea9fba1000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fea9fb86000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fea9f994000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fea9f971000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fea9f969000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fea9f964000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fea9f815000)
libprecise.so => /nest/2.18/lib/libprecise.so (0x00007fea9f592000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fea9f587000)
libmpi_cxx.so.20 => /usr/lib/x86_64-linux-gnu/openmpi/lib/libmpi_cxx.so.20 (0x00007fea9f567000)
libmpi.so.20 => not found
libnestutil.so => /nest/2.18/lib/libnestutil.so (0x00007fea9f363000)
libgsl.so.23 => /usr/lib/x86_64-linux-gnu/libgsl.so.23 (0x00007fea9f0e7000)
libgslcblas.so.0 => /usr/lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007fea9f0a5000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fea9f063000)
/lib64/ld-linux-x86-64.so.2 (0x00007feaa180a000)
libmpi.so.20 => not found
libmpi.so.20 => not found
libmpi.so.20 => not found
libmpi.so.20 => not found
libmpi.so.40 => /usr/local/lib/libmpi.so.40 (0x00007fea9ed39000)
libopen-rte.so.40 => /usr/local/lib/libopen-rte.so.40 (0x00007fea9ea83000)
libopen-pal.so.40 => /usr/local/lib/libopen-pal.so.40 (0x00007fea9e76b000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fea9e760000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fea9e744000)
I've tried changing all of the CMake variables using ccmake, but I can't get it to link against libmpi_cxx.so.40.
Even when I build without MPI support, the link is still there, which seems like a bug:
robin@robin-ZenBook-UX533FN:~$ ldd nest-install/lib/python3.8/site-packages/nest/pynestkernel.so
linux-vdso.so.1 (0x00007ffe63518000)
libpython3.8.so.1.0 => /home/robin/.pyenv/versions/3.8.6/lib/libpython3.8.so.1.0 (0x00007fa1e3c37000)
libnest.so => /nest/2.18/lib/libnest.so (0x00007fa1e39f7000)
libmodels.so => /nest/2.18/lib/libmodels.so (0x00007fa1e3166000)
libtopology.so => /nest/2.18/lib/libtopology.so (0x00007fa1e2ec7000)
libnestkernel.so => /nest/2.18/lib/libnestkernel.so (0x00007fa1e2b6b000)
librandom.so => /nest/2.18/lib/librandom.so (0x00007fa1e291f000)
libsli.so => /nest/2.18/lib/libsli.so (0x00007fa1e25d0000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fa1e23d7000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fa1e23bc000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa1e21ca000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fa1e21a7000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fa1e219f000)
libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fa1e219a000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fa1e204b000)
libprecise.so => /nest/2.18/lib/libprecise.so (0x00007fa1e1dc8000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007fa1e1dbd000)
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libnestutil.so => /nest/2.18/lib/libnestutil.so (0x00007fa1e1bb7000)
libgsl.so.23 => /usr/lib/x86_64-linux-gnu/libgsl.so.23 (0x00007fa1e193b000)
libgslcblas.so.0 => /usr/lib/x86_64-linux-gnu/libgslcblas.so.0 (0x00007fa1e18f9000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007fa1e18b7000)
/lib64/ld-linux-x86-64.so.2 (0x00007fa1e4040000)
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
libmpi_cxx.so.20 => not found
libmpi.so.20 => not found
The full error when importing it is:
>>> import nest
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/robin/nest-install/lib/python3.8/site-packages/nest/__init__.py", line 26, in <module>
from . import ll_api # noqa
File "/home/robin/nest-install/lib/python3.8/site-packages/nest/ll_api.py", line 72, in <module>
from . import pynestkernel as kernel # noqa
ImportError: libmpi_cxx.so.20: cannot open shared object file: No such file or directory
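One plausible reading of the ldd output, as an aside that is not from the original report: libmpi.so.20 and libmpi_cxx.so.20 are the library names of Open MPI 2.x, while the libmpi.so.40 resolved under /usr/local/lib belongs to Open MPI 3.x/4.x, so pynestkernel.so appears to mix a stale Open MPI 2.x link (likely left over in CMake's cache) with a newer installation. A hedged sketch of a clean rebuild against the Open MPI under /usr/local follows; the mpicc/mpicxx paths and the plain ON value for with-mpi are assumptions, not details taken from the report:

# Confirm which Open MPI the /usr/local wrapper compilers belong to.
/usr/local/bin/mpicxx -showme   # Open MPI wrappers print the underlying link line

# Rebuild from an empty build tree so no cached FindMPI results survive.
rm -rf build && mkdir build && cd build
cmake \
  -Dwith-mpi=ON \
  -DMPI_C_COMPILER=/usr/local/bin/mpicc \
  -DMPI_CXX_COMPILER=/usr/local/bin/mpicxx \
  -DPYTHON_EXECUTABLE=/home/robin/.pyenv/versions/3.8.6/bin/python \
  -DCMAKE_INSTALL_PREFIX=/home/robin/nest-install \
  ..
make install

# Afterwards, ldd on pynestkernel.so should resolve libmpi.so.40 only.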
Related
map() method applied to multiple text files in Scala Spark (IntelliJ)
I want to perform some operations on text that I read from multiple text files, but the map() method processes every file separately. For example, I do:

val text = sc.wholeTextFiles("src/folder")
  .map(a => a._2)
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

and the result is:

(hi, 1) // from the first file
(hi, 1) // from the second file

I want the result to be:

(hi, 2)

I'm thinking of a for loop, but that doesn't seem flexible because I don't know the number of text files.
I tried your code in spark-shell and these are my findings. I have two files:

csv1 -> hi
csv2 -> hi hi hi

The result was correct after I removed the line endings:

val text = sc.wholeTextFiles("testSO/")
  .map(a => a._2)
  .flatMap(line => line.split(" "))
  .map(line => line.replace("\n", ""))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .foreach(println)

Output:

(hi,4)

This was the result without removing the line endings:

scala> val text = sc.wholeTextFiles("testSO/").map(a => a._2).flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_).foreach(println)

Output:

(hi,2)
(hi ,2)
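A slightly tighter variant along the same lines (an untested sketch, assuming the same testSO/ input directory as above) is to split on any run of whitespace, newlines included, so the separate replace step disappears:

val counts = sc.wholeTextFiles("testSO/")
  .map(_._2)                  // keep only the file contents, drop the paths
  .flatMap(_.split("\\s+"))   // split on spaces, tabs and newlines in one go
  .filter(_.nonEmpty)         // guard against empty tokens from leading whitespace
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.foreach(println)       // expected: (hi,4)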
Turn a two-dimensional matrix into a Perl hash
I have a genotype matrix in which rows represent loci and columns represent samples. Each value is a genotype, which can be P1/P1, P2/P2, P1/P2, or NA if the genotype is not determined. I'd like to turn this matrix into a Perl HoH so that I can look up the genotype for a specific sample and locus. My matrix looks like:

CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
LOC105031933 P1/P1 P1/P1 P1/P1
LOC105031934 NA NA NA
LOC105031935 P1/P1 P1/P1 P1/P1
LOC105031937 NA NA NA
LOC105031938 P1/P1 P1/P1 P1/P1

As output, the code should give:

$hash{$sample}{$locus} = P1/P1 # (for locus LOC105031935 in sample BC10-II, for example)

Here's what I've tried, but I can't figure out yet how to assign each locus from the first column as the second key of the hash. @sample_names is a list of the three samples.

open(GENOTYPE, '<', "$matrix_geno") or die ("Cannot open $matrix_geno\n");
my %hash;
while (my $line = <GENOTYPE>) {
    my @columns = split(/\s+/, $line);
    @hash{@sample_names} = @columns;
    #print Dumper \%hash;
}

Any help will be seriously welcome. PS: This example is a small part of my data; I'm actually looking for a more general solution. Thank you very much.
Code:

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $matrix_geno = 'input.io';
open( my $GENOTYPE, '<', "$matrix_geno" ) or die($!);

my $header = <$GENOTYPE>;
chomp($header);
my @headers = split( /\s+/, $header );

my %hash = ();
while ( my $line = <$GENOTYPE> ) {
    chomp($line);
    my @columns_data = split( /\s+/, $line );
    $hash{$columns_data[0]}{$headers[1]} = $columns_data[1];
    $hash{$columns_data[0]}{$headers[2]} = $columns_data[2];
    $hash{$columns_data[0]}{$headers[3]} = $columns_data[3];
}
print Dumper(\%hash);
close($GENOTYPE);

OUTPUT:

$VAR1 = {
    'LOC105031933' => { 'BC1-III' => 'P1/P1', 'BC10-II' => 'P1/P1', 'BC1-IV' => 'P1/P1' },
    'LOC105031934' => { 'BC1-III' => 'NA', 'BC10-II' => 'NA', 'BC1-IV' => 'NA' },
    'LOC105031938' => { 'BC1-IV' => 'P1/P1', 'BC1-III' => 'P1/P1', 'BC10-II' => 'P1/P1' },
    'LOC105031931' => { 'BC10-II' => 'P1/P1', 'BC1-III' => 'P1/P1', 'BC1-IV' => 'P1/P1' },
    'LOC105031937' => { 'BC1-IV' => 'NA', 'BC10-II' => 'NA', 'BC1-III' => 'NA' },
    'LOC105031935' => { 'BC1-III' => 'P1/P1', 'BC10-II' => 'P1/P1', 'BC1-IV' => 'P1/P1' },
    'LOC105031928' => { 'BC1-IV' => 'P1/P2', 'BC10-II' => 'P1/P2', 'BC1-III' => 'P1/P2' },
    'LOC105031930' => { 'BC1-III' => 'NA', 'BC10-II' => 'NA', 'BC1-IV' => 'NA' }
};

Is this the output you wanted? Note that the outer keys here are loci and the inner keys are samples; if you need $hash{$sample}{$locus}, swap the two subscripts in the assignments. Hope this helps; please adapt it to your needs.
This seems to do what you want. I'm reading from DATA for simplicity.

#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use Data::Dumper;

# Read headers
chomp(my $headers = <DATA>);
my @samples = split /\s+/, $headers;
# Remove 'CDS'
shift @samples;

my %genotype;
while (<DATA>) {
    chomp;
    my ($locus, @genotypes) = split;
    for my $x (0 .. $#samples) {
        $genotype{$samples[$x]}{$locus} = $genotypes[$x];
    }
}

# Display the data structure
say Dumper \%genotype;

# Simple test
say $genotype{'BC10-II'}{LOC105031935};

__DATA__
CDS BC1-III BC1-IV BC10-II
LOC105031928 P1/P2 P1/P2 P1/P2
LOC105031930 NA NA NA
LOC105031931 P1/P1 P1/P1 P1/P1
LOC105031933 P1/P1 P1/P1 P1/P1
LOC105031934 NA NA NA
LOC105031935 P1/P1 P1/P1 P1/P1
LOC105031937 NA NA NA
LOC105031938 P1/P1 P1/P1 P1/P1

The output is as follows:

$VAR1 = {
    'BC10-II' => { 'LOC105031931' => 'P1/P1', 'LOC105031935' => 'P1/P1', 'LOC105031930' => 'NA', 'LOC105031928' => 'P1/P2', 'LOC105031937' => 'NA', 'LOC105031938' => 'P1/P1', 'LOC105031933' => 'P1/P1', 'LOC105031934' => 'NA' },
    'BC1-IV' => { 'LOC105031934' => 'NA', 'LOC105031933' => 'P1/P1', 'LOC105031938' => 'P1/P1', 'LOC105031937' => 'NA', 'LOC105031928' => 'P1/P2', 'LOC105031930' => 'NA', 'LOC105031935' => 'P1/P1', 'LOC105031931' => 'P1/P1' },
    'BC1-III' => { 'LOC105031931' => 'P1/P1', 'LOC105031935' => 'P1/P1', 'LOC105031930' => 'NA', 'LOC105031928' => 'P1/P2', 'LOC105031937' => 'NA', 'LOC105031938' => 'P1/P1', 'LOC105031933' => 'P1/P1', 'LOC105031934' => 'NA' }
};
P1/P1
Usage example of scalaz-stream's inflate
In the following usage example of scalaz-stream (taken from the documentation), what do I need to change if the input and/or output is a gzipped file? In other words, how do I use compress?

import scalaz.stream._
import scalaz.concurrent.Task

val converter: Task[Unit] =
  io.linesR("testdata/fahrenheit.txt")
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("testdata/celsius.txt"))
    .run

// at the end of the universe...
val u: Unit = converter.run
Compressing the output is easy. Since compress.deflate() is a Process1[ByteVector, ByteVector], you need to plug it into your pipeline at a point where you are emitting ByteVectors, that is, right after text.utf8Encode (which is a Process1[String, ByteVector]):

val converter: Task[Unit] =
  io.linesR("testdata/fahrenheit.txt")
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .pipe(compress.deflate())
    .to(io.fileChunkW("testdata/celsius.zip"))
    .run

For inflate you can't use io.linesR to read the compressed file. You need a process that produces ByteVectors instead of Strings in order to pipe them into inflate. (You could use io.fileChunkR for that.) The next step is decoding the uncompressed data to Strings (with text.utf8Decode, for example) and then using text.lines() to emit the text line by line. Something like this should do the trick:

val converter: Task[Unit] =
  Process.constant(4096).toSource
    .through(io.fileChunkR("testdata/fahrenheit.zip"))
    .pipe(compress.inflate())
    .pipe(text.utf8Decode)
    .pipe(text.lines())
    .filter(s => !s.trim.isEmpty && !s.startsWith("//"))
    .map(line => fahrenheitToCelsius(line.toDouble).toString)
    .intersperse("\n")
    .pipe(text.utf8Encode)
    .to(io.fileChunkW("testdata/celsius.txt"))
    .run
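One caveat worth adding (an aside, not part of the original answer): compress.deflate()/inflate() speak the raw zlib/DEFLATE format, whereas a .gz file adds gzip framing (header, trailer, CRC) around a DEFLATE payload. If the input really is a gzipped file, a hedged workaround is to let java.util.zip.GZIPInputStream do the unframing and feed the resulting stream through io.chunkR, which also accepts a plain InputStream; the file name below is an assumption:

import java.io.FileInputStream
import java.util.zip.GZIPInputStream

// GZIPInputStream strips the gzip header/trailer and inflates the payload,
// so no compress.inflate() step is needed afterwards.
val gzippedLines =
  Process.constant(4096).toSource
    .through(io.chunkR(new GZIPInputStream(new FileInputStream("testdata/fahrenheit.txt.gz"))))
    .pipe(text.utf8Decode)
    .pipe(text.lines())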
Error while extending a Scala class with the Product interface to overcome the 22-field limit in spark-shell
I need to create a class schema to support 29 fields. Due to the limit of 22 fields for case classes, I tried extending my class sdp_d with the Product interface as follows:

class sdp_d(
    WID: Option[Int], BATCH_ID: Option[Int], SRC_ID: Option[String], ORG_ID: Option[Int],
    CLASS_WID: Option[Int], DESC_TEXT: Option[String], PREMISE_WID: Option[Int], FEED_LOC: Option[String],
    GPS_LAT: Option[Double], GPS_LONG: Option[Double], PULSE_OUTPUT_BLOCK: Option[String], UDC_ID: Option[String],
    UNIVERSAL_ID: Option[String], IS_VIRTUAL_FLG: Option[String], SEAL_INFO: Option[String], ACCESS_INFO: Option[String],
    ALT_ACCESS_INFO: Option[String], LOC_INFO: Option[String], ALT_LOC_INFO: Option[String], TYPE: Option[String],
    SUB_TYPE: Option[String], TIMEZONE_ID: Option[Int], GIS_ID: Option[String], BILLED_UPTO_TIME: Option[java.sql.Timestamp],
    POWER_STATUS: Option[String], LOAD_STATUS: Option[String], BILLING_HOLD_STATUS: Option[String],
    INSERT_TIME: Option[java.sql.Timestamp], LAST_UPD_TIME: Option[java.sql.Timestamp]) extends Product {

  @throws(classOf[IndexOutOfBoundsException])
  override def productElement(n: Int) = n match {
    case 0 => WID
    case 1 => BATCH_ID
    case 2 => SRC_ID
    case 3 => ORG_ID
    case 4 => CLASS_WID
    case 5 => DESC_TEXT
    case 6 => PREMISE_WID
    case 7 => FEED_LOC
    case 8 => GPS_LAT
    case 9 => GPS_LONG
    case 10 => PULSE_OUTPUT_BLOCK
    case 11 => UDC_ID
    case 12 => UNIVERSAL_ID
    case 13 => IS_VIRTUAL_FLG
    case 14 => SEAL_INFO
    case 15 => ACCESS_INFO
    case 16 => ALT_ACCESS_INFO
    case 17 => LOC_INFO
    case 18 => ALT_LOC_INFO
    case 19 => TYPE
    case 20 => SUB_TYPE
    case 21 => TIMEZONE_ID
    case 22 => GIS_ID
    case 23 => BILLED_UPTO_TIME
    case 24 => POWER_STATUS
    case 25 => LOAD_STATUS
    case 26 => BILLING_HOLD_STATUS
    case 27 => INSERT_TIME
    case 28 => LAST_UPD_TIME
    case _ => throw new IndexOutOfBoundsException(n.toString())
  }

  override def productArity: Int = 29

  override def canEqual(that: Any): Boolean = that.isInstanceOf[sdp_d]
}

This defines the class sdp_d.
However, when I try to load CSV data with this predefined schema and register it as a table, I get an error:

scala> import java.text.SimpleDateFormat; val sdf = new SimpleDateFormat("yyyy-mm-dd hh:mm:ss.S"); import java.util.Calendar; import java.util.Date; val calendar = Calendar.getInstance()
import java.text.SimpleDateFormat
sdf: java.text.SimpleDateFormat = java.text.SimpleDateFormat@cce61785
import java.util.Calendar
import java.util.Date
calendar: java.util.Calendar = java.util.GregorianCalendar[time=1424687963209,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="Asia/Kolkata",offset=19800000,dstSavings=0,useDaylight=false,transitions=6,lastRule=null],firstDayOfWeek=1,minimalDaysInFirstWeek=1,ERA=1,YEAR=2015,MONTH=1,WEEK_OF_YEAR=9,WEEK_OF_MONTH=4,DAY_OF_MONTH=23,DAY_OF_YEAR=54,DAY_OF_WEEK=2,DAY_OF_WEEK_IN_MONTH=4,AM_PM=1,HOUR=4,HOUR_OF_DAY=16,MINUTE=9,SECOND=23,MILLISECOND=209,ZONE_OFFSET=19800000,DST_OFFSET=0]

scala> sc.textFile("hdfs://CDH-Master-1.cdhcluster/user/spark/Sdp_d.csv").map(_.split(",")).map { r =>
     |   val upto_time = sdf.parse(r(23).trim);
     |   calendar.setTime(upto_time);
     |   val r23 = new java.sql.Timestamp(upto_time.getTime);
     |
     |   val insert_time = sdf.parse(r(26).trim);
     |   calendar.setTime(insert_time);
     |   val r26 = new java.sql.Timestamp(insert_time.getTime);
     |
     |   val last_upd_time = sdf.parse(r(27).trim);
     |   calendar.setTime(last_upd_time);
     |   val r27 = new java.sql.Timestamp(last_upd_time.getTime);
     |
     |   sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim, r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim, r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim, r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim, r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim, r(25).trim, r26, r27, r(28).trim)
     | }.registerAsTable("sdp")
<console>:36: error: not found: value sdp_d
       sdp_d(r(0).trim.toInt, r(1).trim.toInt, r(2).trim, r(3).trim.toInt, r(4).trim.toInt, r(5).trim, r(6).trim.toInt, r(7).trim, r(8).trim.toDouble, r(9).trim.toDouble, r(10).trim, r(11).trim, r(12).trim, r(13).trim, r(14).trim, r(15).trim, r(16).trim, r(17).trim, r(18).trim, r(19).trim, r(20).trim, r(21).trim.toInt, r(22).trim, r23, r(24).trim, r(25).trim, r26, r27, r(28).trim)
       ^

I am working in spark-shell, with Spark 1.1.0 and Scala 2.10.4. I don't understand the error "not found: value sdp_d". How am I supposed to registerAsTable when I create my own class extending the Product interface? Please help in resolving the error.
Did you happen to have a look at https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema ?
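For reference, a hedged sketch of that programmatic-schema route (an aside, not from the original answers; it assumes Spark 1.3+, newer than the 1.1.0 in the question, and spells out only two of the 29 columns for brevity):

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Build the schema as data instead of as a class, so the 22-field
// case-class limit never comes into play.
val schema = StructType(Seq(
  StructField("WID", IntegerType, nullable = true),
  StructField("SRC_ID", StringType, nullable = true)
  // ... one StructField per remaining column of Sdp_d.csv
))

val rows = sc.textFile("hdfs://CDH-Master-1.cdhcluster/user/spark/Sdp_d.csv")
  .map(_.split(","))
  .map(r => Row(r(0).trim.toInt, r(2).trim)) // one value per StructField, in order

val df = sqlContext.createDataFrame(rows, schema)
df.registerTempTable("sdp")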
You should just instantiate the class with new. Unlike a case class, a plain class gets no auto-generated companion apply method, which is why the REPL reports "not found: value sdp_d":
new sdp_d(r(0).trim.toInt, r(1).trim.toInt, ...
You may:

Instantiate with the new keyword: new sdp_d(...). Since you declared the fields as Option[T], e.g. Option[Int], you need to pass Option[T] values as parameters (Some or None), for example:

new sdp_d(Try(r(0).trim.toInt).toOption, Try(r(1).trim.toInt).toOption, Option(r(2).trim), ...)

This works for me:

// AirTraffic.scala
class AirTraffic(
    Year: Option[Int], Month: Option[Int], DayOfMonth: Option[Int], DayOfWeek: Option[Int],
    DepTime: Option[Int], CRSDepTime: Option[Int], ArrTime: Option[Int], CRSArrTime: Option[Int],
    UniqueCarrier: String, FlightNum: Option[Int], TailNum: String, ActualElapsedTime: Option[Int],
    CRSElapsedTime: Option[Int], AirTime: Option[Int], ArrDelay: Option[Int], DepDelay: Option[Int],
    Origin: String, Dest: String, Distance: Option[Int], TaxiIn: Option[Int], TaxiOut: Option[Int],
    Cancelled: Option[Boolean], CancellationCode: String, Diverted: Option[Boolean],
    CarrierDelay: Option[Int], WeatherDelay: Option[Int], NASDelay: Option[Int],
    SecurityDelay: Option[Int], LateAircraftDelay: Option[Int]) extends Product {

  // We declare a field with Option[T] type to make that field nullable.
  override def productElement(n: Int): Any = n match {
    case 0 => Year
    case 1 => Month
    case 2 => DayOfMonth
    case 3 => DayOfWeek
    case 4 => DepTime
    case 5 => CRSDepTime
    case 6 => ArrTime
    case 7 => CRSArrTime
    case 8 => UniqueCarrier
    case 9 => FlightNum
    case 10 => TailNum
    case 11 => ActualElapsedTime
    case 12 => CRSElapsedTime
    case 13 => AirTime
    case 14 => ArrDelay
    case 15 => DepDelay
    case 16 => Origin
    case 17 => Dest
    case 18 => Distance
    case 19 => TaxiIn
    case 20 => TaxiOut
    case 21 => Cancelled
    case 22 => CancellationCode
    case 23 => Diverted
    case 24 => CarrierDelay
    case 25 => WeatherDelay
    case 26 => NASDelay
    case 27 => SecurityDelay
    case 28 => LateAircraftDelay
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  override def productArity: Int = 29

  override def canEqual(that: Any): Boolean = that.isInstanceOf[AirTraffic]
}

// main.scala
import scala.util.Try // needed for the Try(...).toOption conversions below

val data = sparkContext.textFile("local-input/AIRLINE/2008.csv").map(_.split(","))
  .map(l => new AirTraffic(
    Try(l(0).trim.toInt).toOption, Try(l(1).trim.toInt).toOption, Try(l(2).trim.toInt).toOption,
    Try(l(3).trim.toInt).toOption, Try(l(4).trim.toInt).toOption, Try(l(5).trim.toInt).toOption,
    Try(l(6).trim.toInt).toOption, Try(l(7).trim.toInt).toOption, l(8).trim,
    Try(l(9).trim.toInt).toOption, l(10).trim, Try(l(11).trim.toInt).toOption,
    Try(l(12).trim.toInt).toOption, Try(l(13).trim.toInt).toOption, Try(l(14).trim.toInt).toOption,
    Try(l(15).trim.toInt).toOption, l(16).trim, l(17).trim, Try(l(18).trim.toInt).toOption,
    Try(l(19).trim.toInt).toOption, Try(l(20).trim.toInt).toOption, Try(l(21).trim.toBoolean).toOption,
    l(22).trim, Try(l(23).trim.toBoolean).toOption, Try(l(24).trim.toInt).toOption,
    Try(l(25).trim.toInt).toOption, Try(l(26).trim.toInt).toOption, Try(l(27).trim.toInt).toOption,
    Try(l(28).trim.toInt).toOption)).toDF()

// register table with SQLContext
data.registerTempTable("AirTraffic")
val count = sqlContext.sql("SELECT COUNT(*) FROM AirTraffic").collect()
count.foreach(print)

If you think it's still ugly, we can do more:

implicit class StringConverter(val s: String) extends AnyVal {
  def tryGetInt = Try(s.trim.toInt).toOption
  def tryGetString = {
    val res = s.trim
    if (res.isEmpty) None else Some(res) // wrapped in Some so the result is a consistent Option[String]
  }
  def tryGetBoolean = Try(s.trim.toBoolean).toOption
}

then

val data = sparkContext.textFile("local-input/AIRLINE/2008.csv").map(_.split(","))
  .map(l => new AirTraffic(
    l(0).tryGetInt, l(1).tryGetInt, l(2).tryGetInt, l(3).tryGetInt, l(4).tryGetInt,
    l(5).tryGetInt, l(6).tryGetInt, l(7).tryGetInt, l(8).trim, l(9).tryGetInt,
    l(10).trim, l(11).tryGetInt, l(12).tryGetInt, l(13).tryGetInt, l(14).tryGetInt,
    l(15).tryGetInt, l(16).trim, l(17).trim, l(18).tryGetInt, l(19).tryGetInt,
    l(20).tryGetInt, l(21).tryGetBoolean, l(22).trim, l(23).tryGetBoolean,
    l(24).tryGetInt, l(25).tryGetInt, l(26).tryGetInt, l(27).tryGetInt,
    l(28).tryGetInt)).toDF()
clapack.so: undefined symbol: clapack_sgesv on RHEL
I'm getting this error when importing scipy.stats:

import scipy.stats
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/scipy/stats/__init__.py", line 322, in <module>
    from stats import *
  File "/usr/lib64/python2.6/site-packages/scipy/stats/stats.py", line 194, in <module>
    import scipy.linalg as linalg
  File "/usr/lib64/python2.6/site-packages/scipy/linalg/__init__.py", line 116, in <module>
    from basic import *
  File "/usr/lib64/python2.6/site-packages/scipy/linalg/basic.py", line 12, in <module>
    from lapack import get_lapack_funcs
  File "/usr/lib64/python2.6/site-packages/scipy/linalg/lapack.py", line 15, in <module>
    from scipy.linalg import clapack
ImportError: /usr/lib64/python2.6/site-packages/scipy/linalg/clapack.so: undefined symbol: clapack_sgesv

It looks like clapack.so links against the full ATLAS version of libatlas:

ldd /usr/lib64/python2.6/site-packages/scipy/linalg/clapack.so
linux-vdso.so.1 => (0x00007fff232e6000)
liblapack.so.3 => /usr/lib64/liblapack.so.3 (0x00007f23b8ad7000)
libptf77blas.so.3 => /usr/lib64/atlas/libptf77blas.so.3 (0x00007f23b88b7000)
libptcblas.so.3 => /usr/lib64/atlas/libptcblas.so.3 (0x00007f23b8697000)
libatlas.so.3 => /usr/lib64/atlas/libatlas.so.3 (0x00007f23b8120000)
libpython2.6.so.1.0 => /usr/lib64/libpython2.6.so.1.0 (0x00007f23b7d65000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f23b7a73000)
libm.so.6 => /lib64/libm.so.6 (0x00007f23b77da000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f23b75c3000)
libc.so.6 => /lib64/libc.so.6 (0x00007f23b7232000)
libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007f23b6fdb000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f23b6dbd000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f23b6bb9000)
libutil.so.1 => /lib64/libutil.so.1 (0x00007f23b69b6000)
/lib64/ld-linux-x86-64.so.2 (0x00000032a2200000)

Any ideas?
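A hedged way to check this (an aside, not part of the original post): the clapack_* entry points, clapack_sgesv included, are provided by ATLAS rather than by the reference LAPACK, and the ldd output above resolves liblapack.so.3 from /usr/lib64 instead of /usr/lib64/atlas. Assuming the usual RHEL atlas package layout:

# The reference LAPACK should lack the symbol; the ATLAS build should have it:
nm -D /usr/lib64/liblapack.so.3 | grep clapack_sgesv        # expected: no output
nm -D /usr/lib64/atlas/liblapack.so.3 | grep clapack_sgesv  # expected: clapack_sgesv listed

# If so, putting the ATLAS directory first confirms the diagnosis:
LD_LIBRARY_PATH=/usr/lib64/atlas python -c "import scipy.stats"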