
Is current megablocks compatible with distributed optimizer in Megatron-LM? #160

@Spico197

Description


Hi there, thanks for the amazing work! I found that expert parallelism is not compatible with the distributed optimizer in the forked version of Megatron-LM here:

https://github.com/stanford-futuredata/Megatron-LM/blob/85f95aef3b648075fe6f291c86714fdcbd9cd1f5/megatron/arguments.py#L352-L356
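For context, the linked lines are an argument-validation assertion of roughly the following form. This is a hypothetical sketch only, not a copy of the fork's code; the flag names `moe_expert_model_parallelism` and `use_distributed_optimizer` are assumptions about how the check is expressed:

```python
# Hypothetical sketch of the kind of validation being referenced (not the
# fork's actual code): when expert model parallelism is enabled, reject the
# distributed optimizer during argument validation.
def validate_args(args):
    if getattr(args, "moe_expert_model_parallelism", False):
        assert not args.use_distributed_optimizer, (
            "Expert parallelism is not currently supported together with "
            "the distributed optimizer.")
```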

However, there is no such validation in the open PR to Megatron-LM: NVIDIA/Megatron-LM#288

Does that mean the assertion is redundant, and that the current version of megablocks is compatible with the distributed optimizer under expert parallelism?

Thanks very much.
