This repository was archived by the owner on May 22, 2019. It is now read-only.

ValueError: RDD is empty #320

@sakalouski

Description


I installed the module as suggested and ran the command:
srcml preprocrepos -m 50G,50G,50G -r siva --output ./test
where siva is the directory containing all the siva files. The memory parameters do not change anything.
My Spark installation is very old (1.3); could that be the reason? Is this runnable with the latest PySpark?
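Before digging into Spark versions, it is worth ruling out an empty input: the failure below ("RDD is empty") is exactly what surfaces when no siva files are actually picked up from the input directory. A minimal sketch of such a sanity check, assuming the files simply live somewhere under the `siva` directory with a `.siva` extension:

```python
# Hedged sanity check (hypothetical helper, not part of sourced-ml):
# count *.siva files under a directory before launching srcml, since an
# empty input only fails much later as "ValueError: RDD is empty".
from pathlib import Path


def count_siva_files(directory):
    """Recursively count *.siva files under `directory`."""
    return sum(1 for _ in Path(directory).rglob("*.siva"))


# Example usage (adjust the path to your layout):
# print(count_siva_files("siva"))
```

If this prints 0 for your input directory, the crash is expected regardless of the Spark version.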

/usr/local/lib64/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
INFO:spark:Starting preprocess_repos-424fe007-f0db-48b7-863b-5a5b90ce5f63 on local[*]
Ivy Default Cache set to: /home/b7066789/.ivy2/cache
The jars for the packages stored in: /home/b7066789/.ivy2/jars
:: loading settings :: url = jar:file:/home/b7066789/.local/lib/python3.6/site-packages/pyspark/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
tech.sourced#engine added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
confs: [default]
found tech.sourced#engine;0.6.4 in central
found io.netty#netty-all;4.1.17.Final in central
found org.eclipse.jgit#org.eclipse.jgit;4.9.0.201710071750-r in central
found com.jcraft#jsch;0.1.54 in central
found com.googlecode.javaewah#JavaEWAH;1.1.6 in central
found org.apache.httpcomponents#httpclient;4.3.6 in central
found org.apache.httpcomponents#httpcore;4.3.3 in central
found commons-logging#commons-logging;1.1.3 in central
found commons-codec#commons-codec;1.6 in central
found org.slf4j#slf4j-api;1.7.2 in central
found tech.sourced#siva-java;0.1.3 in central
found org.bblfsh#bblfsh-client;1.8.2 in central
found com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 in central
found com.thesamet.scalapb#lenses_2.11;0.7.0-test2 in central
found com.lihaoyi#fastparse_2.11;1.0.0 in central
found com.lihaoyi#fastparse-utils_2.11;1.0.0 in central
found com.lihaoyi#sourcecode_2.11;0.1.4 in central
found com.google.protobuf#protobuf-java;3.5.0 in central
found commons-io#commons-io;2.5 in central
found io.grpc#grpc-netty;1.10.0 in central
found io.grpc#grpc-core;1.10.0 in central
found io.grpc#grpc-context;1.10.0 in central
found com.google.code.gson#gson;2.7 in central
found com.google.guava#guava;19.0 in central
found com.google.errorprone#error_prone_annotations;2.1.2 in central
found com.google.code.findbugs#jsr305;3.0.0 in central
found io.opencensus#opencensus-api;0.11.0 in central
found io.opencensus#opencensus-contrib-grpc-metrics;0.11.0 in central
found io.netty#netty-codec-http2;4.1.17.Final in central
found io.netty#netty-codec-http;4.1.17.Final in central
found io.netty#netty-codec;4.1.17.Final in central
found io.netty#netty-transport;4.1.17.Final in central
found io.netty#netty-buffer;4.1.17.Final in central
found io.netty#netty-common;4.1.17.Final in central
found io.netty#netty-resolver;4.1.17.Final in central
found io.netty#netty-handler;4.1.17.Final in central
found io.netty#netty-handler-proxy;4.1.17.Final in central
found io.netty#netty-codec-socks;4.1.17.Final in central
found com.thesamet.scalapb#scalapb-runtime-grpc_2.11;0.7.1 in central
found io.grpc#grpc-stub;1.10.0 in central
found io.grpc#grpc-protobuf;1.10.0 in central
found com.google.protobuf#protobuf-java;3.5.1 in central
found com.google.protobuf#protobuf-java-util;3.5.1 in central
found com.google.api.grpc#proto-google-common-protos;1.0.0 in central
found io.grpc#grpc-protobuf-lite;1.10.0 in central
found org.rogach#scallop_2.11;3.0.3 in central
found org.apache.commons#commons-pool2;2.4.3 in central
found tech.sourced#enry-java;1.6.3 in central
found org.xerial#sqlite-jdbc;3.21.0 in central
found com.groupon.dse#spark-metrics;2.0.0 in central
found io.dropwizard.metrics#metrics-core;3.1.2 in central
:: resolution report :: resolve 1148ms :: artifacts dl 44ms
:: modules in use:
com.google.api.grpc#proto-google-common-protos;1.0.0 from central in [default]
com.google.code.findbugs#jsr305;3.0.0 from central in [default]
com.google.code.gson#gson;2.7 from central in [default]
com.google.errorprone#error_prone_annotations;2.1.2 from central in [default]
com.google.guava#guava;19.0 from central in [default]
com.google.protobuf#protobuf-java;3.5.1 from central in [default]
com.google.protobuf#protobuf-java-util;3.5.1 from central in [default]
com.googlecode.javaewah#JavaEWAH;1.1.6 from central in [default]
com.groupon.dse#spark-metrics;2.0.0 from central in [default]
com.jcraft#jsch;0.1.54 from central in [default]
com.lihaoyi#fastparse-utils_2.11;1.0.0 from central in [default]
com.lihaoyi#fastparse_2.11;1.0.0 from central in [default]
com.lihaoyi#sourcecode_2.11;0.1.4 from central in [default]
com.thesamet.scalapb#lenses_2.11;0.7.0-test2 from central in [default]
com.thesamet.scalapb#scalapb-runtime-grpc_2.11;0.7.1 from central in [default]
com.thesamet.scalapb#scalapb-runtime_2.11;0.7.1 from central in [default]
commons-codec#commons-codec;1.6 from central in [default]
commons-io#commons-io;2.5 from central in [default]
commons-logging#commons-logging;1.1.3 from central in [default]
io.dropwizard.metrics#metrics-core;3.1.2 from central in [default]
io.grpc#grpc-context;1.10.0 from central in [default]
io.grpc#grpc-core;1.10.0 from central in [default]
io.grpc#grpc-netty;1.10.0 from central in [default]
io.grpc#grpc-protobuf;1.10.0 from central in [default]
io.grpc#grpc-protobuf-lite;1.10.0 from central in [default]
io.grpc#grpc-stub;1.10.0 from central in [default]
io.netty#netty-all;4.1.17.Final from central in [default]
io.netty#netty-buffer;4.1.17.Final from central in [default]
io.netty#netty-codec;4.1.17.Final from central in [default]
io.netty#netty-codec-http;4.1.17.Final from central in [default]
io.netty#netty-codec-http2;4.1.17.Final from central in [default]
io.netty#netty-codec-socks;4.1.17.Final from central in [default]
io.netty#netty-common;4.1.17.Final from central in [default]
io.netty#netty-handler;4.1.17.Final from central in [default]
io.netty#netty-handler-proxy;4.1.17.Final from central in [default]
io.netty#netty-resolver;4.1.17.Final from central in [default]
io.netty#netty-transport;4.1.17.Final from central in [default]
io.opencensus#opencensus-api;0.11.0 from central in [default]
io.opencensus#opencensus-contrib-grpc-metrics;0.11.0 from central in [default]
org.apache.commons#commons-pool2;2.4.3 from central in [default]
org.apache.httpcomponents#httpclient;4.3.6 from central in [default]
org.apache.httpcomponents#httpcore;4.3.3 from central in [default]
org.bblfsh#bblfsh-client;1.8.2 from central in [default]
org.eclipse.jgit#org.eclipse.jgit;4.9.0.201710071750-r from central in [default]
org.rogach#scallop_2.11;3.0.3 from central in [default]
org.slf4j#slf4j-api;1.7.2 from central in [default]
org.xerial#sqlite-jdbc;3.21.0 from central in [default]
tech.sourced#engine;0.6.4 from central in [default]
tech.sourced#enry-java;1.6.3 from central in [default]
tech.sourced#siva-java;0.1.3 from central in [default]
:: evicted modules:
com.google.protobuf#protobuf-java;3.5.0 by [com.google.protobuf#protobuf-java;3.5.1] in [default]
---------------------------------------------------------------------
| | modules || artifacts |
| conf | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
| default | 51 | 0 | 0 | 1 || 50 | 0 |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
confs: [default]
0 artifacts copied, 50 already retrieved (0kB/18ms)
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/10/03 15:50:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/10/03 15:50:55 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
18/10/03 15:50:58 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
INFO:engine:Initializing engine on siva
INFO:ParquetSaver:Ignition -> DzhigurdaFiles -> UastExtractor -> Moder -> FieldsSelector -> ParquetSaver
Traceback (most recent call last):
  File "/home/b7066789/.local/bin/srcml", line 11, in <module>
    sys.exit(main())
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/main.py", line 354, in main
    return handler(args)
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/utils/engine.py", line 87, in wrapped_pause
    return func(cmdline_args, *args, **kwargs)
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/cmd/preprocess_repos.py", line 24, in preprocess_repos
    .link(ParquetSaver(save_loc=args.output))
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/transformers/transformer.py", line 114, in execute
    head = node(head)
  File "/home/b7066789/.local/lib/python3.6/site-packages/sourced/ml/transformers/basic.py", line 292, in __call__
    rdd.toDF().write.parquet(self.save_loc)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 58, in toDF
    return sparkSession.createDataFrame(self, schema, sampleRatio)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 582, in createDataFrame
    rdd, schema = self._createFromRDD(data.map(prepare), schema, samplingRatio)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 380, in _createFromRDD
    struct = self._inferSchema(rdd, samplingRatio)
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/sql/session.py", line 351, in _inferSchema
    first = rdd.first()
  File "/home/b7066789/.local/lib/python3.6/site-packages/pyspark/rdd.py", line 1364, in first
    raise ValueError("RDD is empty")
ValueError: RDD is empty
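The bottom frames show the mechanism: `toDF()` with no explicit schema calls `rdd.first()` to infer one, and PySpark's `RDD.first()` raises `ValueError("RDD is empty")` when there is nothing to sample. A minimal sketch of that behavior and a possible guard, using a hypothetical stand-in class (`FakeRDD`) rather than a live Spark session:

```python
# Illustration only: FakeRDD mimics the two pyspark.RDD methods involved
# in this crash. It is not part of pyspark or sourced-ml.
class FakeRDD:
    def __init__(self, items):
        self._items = list(items)

    def isEmpty(self):
        # pyspark.RDD also provides isEmpty() for exactly this kind of guard
        return not self._items

    def first(self):
        # Mirrors pyspark.RDD.first(), which raises on an empty RDD
        if not self._items:
            raise ValueError("RDD is empty")
        return self._items[0]


def safe_first(rdd):
    """Return the first element, or None instead of raising on empty input."""
    if rdd.isEmpty():
        return None
    return rdd.first()
```

A guard like this before the `rdd.toDF()` call in ParquetSaver would turn the crash into a clear "no repositories were extracted" message, but the root cause to investigate is still why the pipeline produced no rows from the siva input.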
