ValueError: RDD is empty #320
Description
I installed the module as suggested and ran the command:
srcml preprocrepos -m 50G,50G,50G -r siva --output ./test
Here siva is the directory containing all the siva files. Changing the memory parameters makes no difference.
My Spark is very old (1.3). Could that be the reason? Can this be run with the latest PySpark?
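One thing worth ruling out first is that the input directory simply yields no repositories. The following is a minimal sketch of such a check, assuming the files carry a .siva extension; the helper name count_siva_files is hypothetical and not part of sourced.ml, and an empty result is only one possible cause of the error below.

```python
import os
import sys


def count_siva_files(root):
    """Count files whose name ends in .siva anywhere under root."""
    total = 0
    # os.walk yields nothing for a missing directory, so the count stays 0.
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += sum(1 for name in filenames if name.endswith(".siva"))
    return total


if __name__ == "__main__":
    # Default to the directory name used in the command above.
    root = sys.argv[1] if len(sys.argv) > 1 else "siva"
    print("{} .siva file(s) found under {!r}".format(count_siva_files(root), root))
```

A count of zero would suggest a problem with the path or the file layout rather than with the Spark version.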
/usr/local/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:spark:Starting preprocess_repos-424fe007-f0db-48b7-863b-5a5b90ce5f63 on local[*]
Ivy Default Cache set to: /home/b7066789/.ivy2/cache
The jars for the packages stored in: /home/b7066789/.ivy2/jars
:: loading settings :: url = jar:file:/home/b7066789/.local/lib/python3.6/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
tech.sourced#engine added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found tech.sourced#engine;0.6.4 in central
found io.netty#netty-all;4.1.17.Final in central
found org.eclipse.jgit#org.eclipse.jgit;4.9.0.201710071750-r in central
found com.jcraft#jsch;0.1.54 in central
found com.googlecode.javaewah#JavaEWAH;1.1.6 in central
found org.apache.httpcomponents#httpclient;4.3.6 in central
found org.apache.httpcomponents#httpcore;4.3.3 in central
found commons-logging#commons-logging;1.1.3 in central
found commons-codec#commons-codec;1.6 in central
found org.slf4j#slf4j-api;1.7.2 in central
found tech.sourced#siva-java;0.1.3 in central
found org.bblfsh#bblfsh-client;1.8.2 in central
found com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 in central
found com.thesamet.scalapb#lenses_2.11;0.7.0-test2 in central
found com.lihaoyi#fastparse_2.11;1.0.0 in central
found com.lihaoyi#fastparse-utils_2.11;1.0.0 in central
found com.lihaoyi#sourcecode_2.11;0.1.4 in central
found com.google.protobuf#protobuf-java;3.5.0 in central
found commons-io#commons-io;2.5 in central
found io.grpc#grpc-netty;1.10.0 in central
found io.grpc#grpc-core;1.10.0 in central
found io.grpc#grpc-context;1.10.0 in central
found com.google.code.gson#gson;2.7 in central
found com.google.guava#guava;19.0 in central
found com.google.errorprone#error_prone_annotations;2.1.2 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found io.opencensus#opencensus-api;0.11.0 in central
found io.opencensus#opencensus-contrib-grpc-metrics;0.11.0 in central
found io.netty#netty-codec-http2;4.1.17.Final in central
found io.netty#netty-codec-http;4.1.17.Final in central
found io.netty#netty-codec;4.1.17.Final in central
found io.netty#netty-transport;4.1.17.Final in central
found io.netty#netty-buffer;4.1.17.Final in central
found io.netty#netty-common;4.1.17.Final in central
found io.netty#netty-resolver;4.1.17.Final in central
found io.netty#netty-handler;4.1.17.Final in central
found io.netty#netty-handler-proxy;4.1.17.Final in central
found io.netty#netty-codec-socks;4.1.17.Final in central
found com.thesamet.scalapb#scalapb-runtime-grpc_2.11;0.7.1 in central
found io.grpc#grpc-stub;1.10.0 in central
found io.grpc#grpc-protobuf;1.10.0 in central
found com.google.protobuf#protobuf-java;3.5.1 in central
found com.google.protobuf#protobuf-java-util;3.5.1 in central
found com.google.api.grpc#proto-google-common-protos;1.0.0 in central
found io.grpc#grpc-protobuf-lite;1.10.0 in central
found org.rogach#scallop_2.11;3.0.3 in central
found org.apache.commons#commons-pool2;2.4.3 in central
found tech.sourced#enry-java;1.6.3 in central
found org.xerial#sqlite-jdbc;3.21.0 in central
found com.groupon.dse#spark-metrics;2.0.0 in central
found io.dropwizard.metrics#metrics-core;3.1.2 in central
:: resolution report :: resolve 1148ms :: artifacts dl 44ms
:: modules in use:
com.google.api.grpc#proto-google-common-protos;1.0.0 from central in [default]
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
com.google.code.gson#gson;2.7 from central in [default]
com.google.errorprone#error_prone_annotations;2.1.2 from central in [default]
com.google.guava#guava;19.0 from central in [default]
com.google.protobuf#protobuf-java;3.5.1 from central in [default]
com.google.protobuf#protobuf-java-util;3.5.1 from central in [default]
com.googlecode.javaewah#JavaEWAH;1.1.6 from central in [default]
com.groupon.dse#spark-metrics;2.0.0 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.lihaoyi#fastparse-utils_2.11;1.0.0 from central in [default]
com.lihaoyi#fastparse_2.11;1.0.0 from central in [default]
com.lihaoyi#sourcecode_2.11;0.1.4 from central in [default]
com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from central in [default]
com.thesamet.scalapb#scalapb-runtime-grpc_2.11;0.7.1 from central in [default]
com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from central in [default]
commons-codec#commons-codec;1.6 from central in [default]
commons-io#commons-io;2.5 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
io.dropwizard.metrics#metrics-core;3.1.2 from central in [default]
io.grpc#grpc-context;1.10.0 from central in [default]
io.grpc#grpc-core;1.10.0 from central in [default]
io.grpc#grpc-netty;1.10.0 from central in [default]
io.grpc#grpc-protobuf;1.10.0 from central in [default]
io.grpc#grpc-protobuf-lite;1.10.0 from central in [default]
io.grpc#grpc-stub;1.10.0 from central in [default]
io.netty#netty-all;4.1.17.Final from central in [default]
io.netty#netty-buffer;4.1.17.Final from central in [default]
io.netty#netty-codec;4.1.17.Final from central in [default]
io.netty#netty-codec-http;4.1.17.Final from central in [default]
io.netty#netty-codec-http2;4.1.17.Final from central in [default]
io.netty#netty-codec-socks;4.1.17.Final from central in [default]
io.netty#netty-common;4.1.17.Final from central in [default]
io.netty#netty-handler;4.1.17.Final from central in [default]
io.netty#netty-handler-proxy;4.1.17.Final from central in [default]
io.netty#netty-resolver;4.1.17.Final from central in [default]
io.netty#netty-transport;4.1.17.Final from central in [default]
io.opencensus#opencensus-api;0.11.0 from central in [default]
io.opencensus#opencensus-contrib-grpc-metrics;0.11.0 from central in [default]
org.apache.commons#commons-pool2;2.4.3 from central in [default]
org.apache.httpcomponents#httpclient;4.3.6 from central in [default]
org.apache.httpcomponents#httpcore;4.3.3 from central in [default]
org.bblfsh#bblfsh-client;1.8.2 from central in [default]
org.eclipse.jgit#org.eclipse.jgit;4.9.0.201710071750-r from central in [default]
org.rogach#scallop_2.11;3.0.3 from central in [default]
org.slf4j#slf4j-api;1.7.2 from central in [default]
org.xerial#sqlite-jdbc;3.21.0 from central in [default]
tech.sourced#engine;0.6.4 from central in [default]
tech.sourced#enry-java;1.6.3 from central in [default]
tech.sourced#siva-java;0.1.3 from central in [default]
:: evicted modules:
com.google.protobuf#protobuf-java;3.5.0 by [com.google.protobuf#protobuf-java;3.5.1] in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   51  |   0   |   0   |   1   ||   50  |   0   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 50 already retrieved (0kB/18ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/10/03 15:50:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/03 15:50:55 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
18/10/03 15:50:58 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
INFO:engine:Initializing engine on siva
INFO:ParquetSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> FieldsSelector -> ParquetSaver
Traceback (most recent call last):
  File "/home/b7066789/.local/bin/srcml", line 11, in <module>
    sys.exit(main())
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/__main__.py", line 354, in main
    return handler(args)
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/utils/engine.py", line 87, in wrapped_pause
    return func(cmdline_args, *args, **kwargs)
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/cmd/preprocess_repos.py", line 24, in preprocess_repos
    .link(ParquetSaver(save_loc=args.output))
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/transformers/transformer.py", line 114, in execute
    head = node(head)
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/transformers/basic.py", line 292, in __call__
    rdd.toDF().write.parquet(self.save_loc)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 58, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 582, in createDataFrame
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 380, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 351, in _inferSchema
    first = rdd.first()
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/rdd.py", line 1364, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
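
The traceback shows where the error comes from: toDF() is called without a schema, so PySpark infers one from rdd.first(), and that call raises ValueError when the RDD has no rows. In other words, the pipeline produced nothing to save. As a rough sketch of how a caller could surface this more clearly, the guard below checks RDD.isEmpty() before converting. The function name save_parquet_if_not_empty is hypothetical and is not how sourced.ml implements ParquetSaver; it only relies on the isEmpty(), toDF() and write.parquet() calls that PySpark provides.

```python
def save_parquet_if_not_empty(rdd, save_loc):
    """Write rdd to Parquet at save_loc; return False if rdd has no rows.

    Hypothetical helper for illustration. rdd is expected to be a PySpark RDD
    of Row-like records, created while a SparkSession is active.
    """
    # isEmpty() avoids the schema inference step that fails on empty input.
    if rdd.isEmpty():
        print("Nothing to save: the pipeline produced an empty RDD "
              "(check the input directory and the earlier pipeline stages).")
        return False
    # Same call as in the traceback above, now known to have at least one row.
    rdd.toDF().write.parquet(save_loc)
    return True
```

This does not fix the underlying cause; it only turns the failure into a message that points at the input and the earlier stages of the pipeline.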