ZMQ recv_json_with_retry returns None on exhausted retries, but callers treat return value as a valid list — potential TypeError crash #393

@GaneshPatil7517

Problem Description

The recv_json_with_retry() method returns None after 5 failed receive attempts (concore.py line 84). The caller in read() then checks isinstance(message, list) and len(message) > 0. That check is safe for None (isinstance fails, so the branch is skipped), but the fallback return message (line 308) hands None back to the user's control loop, which typically expects a list.

Technical Analysis

ZMQ receive (concore.py, recv_json_with_retry, line 84):

logging.error("Failed to receive after retries.")
return None

Caller in read() (concore.py, lines 299–308):

message = zmq_p.recv_json_with_retry()
if isinstance(message, list) and len(message) > 0:
    first_element = message[0]
    if isinstance(first_element, (int, float)):
        simtime = max(simtime, first_element)
        return message[1:]
return message   # <-- returns None if all retries failed

A downstream controller that calls ym = concore.read(...) and then u = controller(ym) passes None into a numerical computation, which typically fails with a TypeError. The except blocks in read() (lines 309–313) catch zmq.error.ZMQError and Exception, but they cannot intercept the None-return path because no exception is raised.
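The failure mode can be reproduced without ZMQ. The sketch below is illustrative only: read_stub and controller are hypothetical stand-ins (not concore functions) for a read() call whose retries were exhausted and for a typical user controller.

```python
def read_stub():
    """Hypothetical stand-in for concore.read() after all retries failed."""
    return None


def controller(ym):
    """Hypothetical controller: applies a fixed gain to each measurement."""
    return [2.0 * y for y in ym]


error_message = None
try:
    u = controller(read_stub())
except TypeError as exc:
    # Iterating over None raises TypeError inside the controller,
    # far away from the receive call that actually failed.
    error_message = str(exc)
    print(f"controller crashed: {error_message}")
```

The traceback points at the controller, not at the ZMQ receive, which makes the root cause harder to spot.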

send_json_with_retry() has a similar issue: on failure it falls through to an implicit return None, and the caller never checks whether the send actually succeeded.
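One possible way to make a failed send visible is an explicit success flag, sketched below. This is not the current concore code. Again and FlakySocket are hypothetical stand-ins so the example runs without pyzmq, and the retries and delay parameters are assumptions. Raising on exhaustion, as proposed for the receive path, would be an equally valid choice.

```python
import logging
import time


class Again(Exception):
    """Stand-in for zmq.Again so this sketch runs without pyzmq."""


class FlakySocket:
    """Hypothetical socket whose first `failures` sends time out."""

    def __init__(self, failures):
        self.failures = failures
        self.sent = []

    def send_json(self, message):
        if self.failures > 0:
            self.failures -= 1
            raise Again("send would block")
        self.sent.append(message)


def send_json_with_retry(socket, message, retries=5, delay=0.5):
    """Return True once the send succeeds, False when retries run out."""
    for attempt in range(retries):
        try:
            socket.send_json(message)
            return True
        except Again:
            logging.warning("Send timeout (attempt %d/%d)", attempt + 1, retries)
            time.sleep(delay)
    logging.error("Failed to send after retries.")
    return False
```

A caller could then branch on the result, for example if not send_json_with_retry(sock, data): followed by whatever recovery the control loop needs.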

Proposed Improvement

def recv_json_with_retry(self):
    for attempt in range(5):
        try:
            return self.socket.recv_json()
        except zmq.Again:
            logging.warning(f"Receive timeout (attempt {attempt + 1}/5)")
            time.sleep(0.5)
    raise TimeoutError(f"ZMQ recv failed after 5 retries on {self.address}")

Change the method to raise an exception on exhaustion instead of returning None. The existing except blocks in read() will then catch it and return default_return_val, which is the correct behavior.
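The sketch below illustrates that claim under simplifying assumptions. ExhaustedPort is a hypothetical port whose receive always raises TimeoutError, and read is a reduced stand-in for the ZMQ branch of concore's read() (the simtime update is omitted and the signature is invented for the example).

```python
import logging


class ExhaustedPort:
    """Hypothetical port whose receive always runs out of retries."""

    def recv_json_with_retry(self):
        raise TimeoutError("ZMQ recv failed after 5 retries")


def read(port, default_return_val):
    """Reduced stand-in for the ZMQ branch of read()."""
    try:
        message = port.recv_json_with_retry()
        if isinstance(message, list) and len(message) > 0:
            if isinstance(message[0], (int, float)):
                return message[1:]  # drop the leading simtime value
        return message
    except Exception as exc:
        # TimeoutError is a subclass of Exception, so it lands here.
        logging.error("ZMQ read failed: %s", exc)
        return default_return_val


ym = read(ExhaustedPort(), [0.0, 0.0])
print(ym)
```

With the exception in place, the caller receives the default list instead of None, so the downstream controller keeps working with a well-typed value.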

Impact

  • Reliability: a propagated None crashes user code at runtime, and the control loop has no way to recover from it
  • Julia impl: Must decide on error signaling strategy (exceptions vs Nothing vs Result type)
  • Cross-language: Only affects Python ZMQ path; C++ has no ZMQ support yet
