Ascend / ModelZoo-PyTorch
MinerU on a 300I Duo machine fails when called via the http-client/server mode: RuntimeError: ACL stream synchronize failed, error code:507013
Status: TODO
Issue ID: #ID6HGT
Type: Defect
Reporter: 禾了个禾
Created: 2025-11-13 17:24
1. Problem description (with error log context):

Command executed: ASCEND_VISIBLE_DEVICES=7 mineru-vllm-server --port 7860 --enforce-eager

Error output:
INFO 11-13 09:11:44 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 11-13 09:11:44 [__init__.py:38] - ascend -> vllm_ascend:register INFO 11-13 09:11:44 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 11-13 09:11:44 [__init__.py:207] Platform plugin ascend is activated WARNING 11-13 09:11:49 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") 2025-11-13 09:11:49.982 | INFO | mineru.backend.vlm.utils:enable_custom_logits_processors:15 - CUDA not available, disabling custom_logits_processors start vllm server: ['/usr/local/python3.11.13/bin/mineru-vllm-server', 'serve', '/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', '--port', '7860', '--enforce-eager', '--gpu-memory-utilization', '0.5'] INFO 11-13 09:11:50 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. WARNING 11-13 09:11:50 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 11-13 09:11:50 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(APIServer pid=31822) INFO 11-13 09:11:50 [api_server.py:1839] vLLM API server version 0.11.0rc3 (APIServer pid=31822) INFO 11-13 09:11:50 [utils.py:233] non-default args: {'model_tag': '/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', 'port': 7860, 'model': '/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', 'enforce_eager': True, 'gpu_memory_utilization': 0.5} (APIServer pid=31822) INFO 11-13 09:11:50 [model.py:547] Resolved architecture: Qwen2VLForConditionalGeneration (APIServer pid=31822) `torch_dtype` is deprecated! Use `dtype` instead! (APIServer pid=31822) INFO 11-13 09:11:50 [model.py:1510] Using max model len 16384 (APIServer pid=31822) INFO 11-13 09:11:50 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048. (APIServer pid=31822) INFO 11-13 09:11:50 [__init__.py:381] Cudagraph is disabled under eager mode (APIServer pid=31822) INFO 11-13 09:11:50 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal. (APIServer pid=31822) INFO 11-13 09:11:50 [platform.py:179] Compilation disabled, using eager mode by default INFO 11-13 09:11:59 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 11-13 09:11:59 [__init__.py:38] - ascend -> vllm_ascend:register INFO 11-13 09:11:59 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 11-13 09:11:59 [__init__.py:207] Platform plugin ascend is activated WARNING 11-13 09:12:04 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") (EngineCore_DP0 pid=31961) INFO 11-13 09:12:05 [core.py:644] Waiting for init message from front-end. (EngineCore_DP0 pid=31961) INFO 11-13 09:12:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. 
(EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM. (EngineCore_DP0 pid=31961) INFO 11-13 09:12:05 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model='/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', speculative_config=None, tokenizer='/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null} INFO 11-13 09:12:18 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 11-13 09:12:18 [__init__.py:38] - ascend -> vllm_ascend:register INFO 11-13 09:12:18 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 11-13 09:12:18 [__init__.py:207] Platform plugin ascend is activated WARNING 11-13 09:12:23 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") (EngineCore_DP0 pid=31961) INFO 11-13 09:12:26 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 (EngineCore_DP0 pid=31961) INFO 11-13 09:12:29 [model_runner_v1.py:2627] Starting to load model /root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B... (EngineCore_DP0 pid=31961) INFO 11-13 09:12:30 [__init__.py:381] Cudagraph is disabled under eager mode (EngineCore_DP0 pid=31961) INFO 11-13 09:12:30 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal. 
(EngineCore_DP0 pid=31961) INFO 11-13 09:12:30 [platform.py:179] Compilation disabled, using eager mode by default Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:03<00:00, 3.82s/it] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:03<00:00, 3.82s/it] (EngineCore_DP0 pid=31961) (EngineCore_DP0 pid=31961) INFO 11-13 09:12:34 [default_loader.py:267] Loading weights took 3.94 seconds (EngineCore_DP0 pid=31961) INFO 11-13 09:12:35 [model_runner_v1.py:2661] Loading model weights took 2.4358 GB (EngineCore_DP0 pid=31961) INFO 11-13 09:12:38 [worker_v1.py:234] Available memory: 9677338112, total memory: 46431260672 (EngineCore_DP0 pid=31961) INFO 11-13 09:12:38 [kv_cache_utils.py:1087] GPU KV cache size: 787,456 tokens (EngineCore_DP0 pid=31961) INFO 11-13 09:12:38 [kv_cache_utils.py:1091] Maximum concurrency for 16,384 tokens per request: 48.06x [rank0]:[W1113 09:12:38.529853322 compiler_depend.ts:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. (function operator()) [rank0]:[W1113 09:12:38.797847485 compiler_depend.ts:57] Warning: EI9999: Inner Error! The error from device(0), serial number is 5. there is a sdma error, sdma channel is 0, the channel exist the following problems: The SMMU returns a Terminate error during page table translation.. the value of CQE status is 2. the description of CQE status: When the SQE translates a page table, the SMMU returns a Terminate error.it's config include: setting1=0xc000000880e0000, setting2=0xff009000ff004c, setting3=0, sq base addr=0x800d00001004c000[FUNC:ProcessSdmaErrorInfo][FILE:device_error_proc.cc][LINE:779] EI9999: [PID: 31961] 2025-11-13-09:12:38.658.163 Memory async copy failed, device_id=0, stream_id=53, task_id=2565, flip_num=0, copy_type=2, memcpy_type=0, copy_data_type=0, length=403177472[FUNC:GetError][FILE:stream.cc][LINE:1183] TraceBack (most recent call last): rtStreamSynchronize execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] synchronize stream failed, runtime result = 507013[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] DEVICE[0] PID[31961]: EXCEPTION STREAM: Exception info:TGID=3353395, model id=65535, stream id=53, stream phase=SCHEDULE Message info[0]:RTS_HWTS: hwts sdma error, slot_id=17, stream_id=53 Other info[0]:time=2025-11-13-09:12:37.478.650, function=int_process_hwts_sdma_error, line=1381, error code=0x20b (function copy_between_host_and_device_opapi) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] EngineCore failed to start. 
(EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] Traceback (most recent call last): (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self._initialize_kv_caches(vllm_config) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 75, in initialize_from_config (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 3121, in run_method (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] return func(*args, **kwargs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 310, in compile_or_warm_up_model (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self._warm_up_atb() (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 316, in _warm_up_atb (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] x = torch.rand((2, 4), dtype=torch.float16).npu() (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/backend_registration.py", line 148, in wrap_tensor_to (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] return self.to(device=torch.device(f'{custom_backend_name}:{device_idx}'), non_blocking=non_blocking, **kwargs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] RuntimeError: ACL stream synchronize failed, error code:507013 (EngineCore_DP0 pid=31961) Process EngineCore_DP0: (EngineCore_DP0 pid=31961) Traceback (most 
recent call last): (EngineCore_DP0 pid=31961) File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=31961) self.run() (EngineCore_DP0 pid=31961) File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=31961) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core (EngineCore_DP0 pid=31961) raise e (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core (EngineCore_DP0 pid=31961) engine_core = EngineCoreProc(*args, **kwargs) (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__ (EngineCore_DP0 pid=31961) super().__init__(vllm_config, executor_class, log_stats, (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__ (EngineCore_DP0 pid=31961) self._initialize_kv_caches(vllm_config) (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches (EngineCore_DP0 pid=31961) self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 75, in initialize_from_config (EngineCore_DP0 pid=31961) self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc (EngineCore_DP0 pid=31961) return [run_method(self.driver_worker, method, args, kwargs)] (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 3121, in run_method (EngineCore_DP0 pid=31961) return func(*args, **kwargs) (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 310, in compile_or_warm_up_model (EngineCore_DP0 pid=31961) self._warm_up_atb() (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 316, in _warm_up_atb (EngineCore_DP0 pid=31961) x = torch.rand((2, 4), dtype=torch.float16).npu() (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/backend_registration.py", line 148, in wrap_tensor_to (EngineCore_DP0 pid=31961) return self.to(device=torch.device(f'{custom_backend_name}:{device_idx}'), non_blocking=non_blocking, **kwargs) (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) RuntimeError: ACL stream synchronize failed, error code:507013 [rank0]:[W1113 09:12:40.247279995 compiler_depend.ts:528] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! 
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.108.576 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeUsedDevices) [W1113 09:12:40.249145458 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.110.685 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.250596503 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.112.194 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.757161910 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.618.466 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.758794788 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.620.293 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.760359145 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! 
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.621.859 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.762001553 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.623.458 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.763705442 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.625.094 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.765601805 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.626.876 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.767316475 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.628.706 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.769077995 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! 
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.630.465 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.770826945 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.632.171 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.772674737 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.634.021 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.774446618 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.635.786 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.776271389 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.637.622 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) [W1113 09:12:40.778058240 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! 
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.639.397 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function npuSynchronizeDevice) [W1113 09:12:40.780058025 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] EH9999: [PID: 31961] 2025-11-13-09:12:40.641.232 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] TraceBack (most recent call last): (function empty_cache) (APIServer pid=31822) Traceback (most recent call last): (APIServer pid=31822) File "/usr/local/python3.11.13/bin/mineru-vllm-server", line 10, in <module> (APIServer pid=31822) sys.exit(main()) (APIServer pid=31822) ^^^^^^ (APIServer pid=31822) File "/workspace/MinerU/mineru/model/vlm_vllm_model/server.py", line 61, in main (APIServer pid=31822) vllm_main() (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main (APIServer pid=31822) args.dispatch_function(args) (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd (APIServer pid=31822) uvloop.run(run_server(args)) (APIServer pid=31822) File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run (APIServer pid=31822) return runner.run(wrapper()) (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run (APIServer pid=31822) return self._loop.run_until_complete(task) (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=31822) File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper (APIServer pid=31822) return await main (APIServer pid=31822) ^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server (APIServer pid=31822) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker (APIServer pid=31822) async with build_async_engine_client( (APIServer pid=31822) File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__ (APIServer pid=31822) return await anext(self.gen) (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client (APIServer pid=31822) async with build_async_engine_client_from_engine_args( (APIServer pid=31822) File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__ (APIServer pid=31822) return await anext(self.gen) (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args (APIServer pid=31822) async_llm 
= AsyncLLM.from_vllm_config( (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 1571, in inner (APIServer pid=31822) return fn(*args, **kwargs) (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config (APIServer pid=31822) return cls( (APIServer pid=31822) ^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 134, in __init__ (APIServer pid=31822) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client (APIServer pid=31822) return AsyncMPClient(*client_args) (APIServer pid=31822) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 769, in __init__ (APIServer pid=31822) super().__init__( (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__ (APIServer pid=31822) with launch_core_engines(vllm_config, executor_class, (APIServer pid=31822) File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__ (APIServer pid=31822) next(self.gen) (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines (APIServer pid=31822) wait_for_engine_startup( (APIServer pid=31822) File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup (APIServer pid=31822) raise RuntimeError("Engine core initialization failed. " (APIServer pid=31822) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {} (APIServer pid=31822) [ERROR] 2025-11-13-09:12:52 (PID:31822, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

2. Software versions:
-- CANN version: 8.2.RC1
-- PyTorch version: 2.7.1
-- Python version: 3.11.13
-- OS version: Ubuntu 3.11.13

3. Steps to reproduce:
ASCEND_VISIBLE_DEVICES=7 mineru-vllm-server --port 7860 --enforce-eager
**The machine used is a 300I Duo.**
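Per the traceback above, the failure surfaces in vllm_ascend's ATB warm-up, i.e. the plain host-to-device copy `torch.rand((2, 4), dtype=torch.float16).npu()`, which hits SDMA error 507013. A minimal sketch along the following lines (assumptions: run inside the same container with torch_npu installed, the single visible card mapping to device index 0, and a hypothetical file name) may help confirm whether the copy failure reproduces outside of vLLM/MinerU, which would point to an environment, driver, or firmware problem rather than a MinerU-specific one:

```python
# repro_507013.py (hypothetical file name) - minimal sketch, not an official reproducer.
# It repeats the operation that fails in vllm_ascend/worker/worker_v1.py::_warm_up_atb:
# a small host tensor copied to the NPU, followed by a stream synchronize.
# Run e.g.: ASCEND_VISIBLE_DEVICES=7 python repro_507013.py
import torch
import torch_npu  # noqa: F401  # registers the torch.npu backend and the Tensor.npu() method


def main() -> None:
    print("NPU available:", torch.npu.is_available())
    torch.npu.set_device(0)  # assumption: the visible card maps to index 0 inside the container

    # Same call as the failing warm-up line in the traceback.
    x = torch.rand((2, 4), dtype=torch.float16).npu()

    # Force the async host-to-device copy to complete; this is where
    # "ACL stream synchronize failed, error code:507013" was raised in the log.
    torch.npu.synchronize()
    print("copy + synchronize OK:", x.shape, x.device)


if __name__ == "__main__":
    main()
```

If this small copy already fails with 507013, the next step would be collecting the Ascend device logs for that timestamp, as the DMA error message itself suggests.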
一、问题现象(附报错日志上下文): 执行命令: ASCEND_VISIBLE_DEVICES=7 mineru-vllm-server --port 7860 --enforce-eager 报错: INFO 11-13 09:11:44 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 11-13 09:11:44 [__init__.py:38] - ascend -> vllm_ascend:register INFO 11-13 09:11:44 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 11-13 09:11:44 [__init__.py:207] Platform plugin ascend is activated WARNING 11-13 09:11:49 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") 2025-11-13 09:11:49.982 | INFO | mineru.backend.vlm.utils:enable_custom_logits_processors:15 - CUDA not available, disabling custom_logits_processors start vllm server: ['/usr/local/python3.11.13/bin/mineru-vllm-server', 'serve', '/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', '--port', '7860', '--enforce-eager', '--gpu-memory-utilization', '0.5'] INFO 11-13 09:11:50 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. WARNING 11-13 09:11:50 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. WARNING 11-13 09:11:50 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. WARNING 11-13 09:11:50 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. WARNING 11-13 09:11:50 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM. 
(APIServer pid=31822) INFO 11-13 09:11:50 [api_server.py:1839] vLLM API server version 0.11.0rc3 (APIServer pid=31822) INFO 11-13 09:11:50 [utils.py:233] non-default args: {'model_tag': '/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', 'port': 7860, 'model': '/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', 'enforce_eager': True, 'gpu_memory_utilization': 0.5} (APIServer pid=31822) INFO 11-13 09:11:50 [model.py:547] Resolved architecture: Qwen2VLForConditionalGeneration (APIServer pid=31822) `torch_dtype` is deprecated! Use `dtype` instead! (APIServer pid=31822) INFO 11-13 09:11:50 [model.py:1510] Using max model len 16384 (APIServer pid=31822) INFO 11-13 09:11:50 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=2048. (APIServer pid=31822) INFO 11-13 09:11:50 [__init__.py:381] Cudagraph is disabled under eager mode (APIServer pid=31822) INFO 11-13 09:11:50 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal. (APIServer pid=31822) INFO 11-13 09:11:50 [platform.py:179] Compilation disabled, using eager mode by default INFO 11-13 09:11:59 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 11-13 09:11:59 [__init__.py:38] - ascend -> vllm_ascend:register INFO 11-13 09:11:59 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 11-13 09:11:59 [__init__.py:207] Platform plugin ascend is activated WARNING 11-13 09:12:04 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") (EngineCore_DP0 pid=31961) INFO 11-13 09:12:05 [core.py:644] Waiting for init message from front-end. (EngineCore_DP0 pid=31961) INFO 11-13 09:12:05 [importing.py:63] Triton not installed or not compatible; certain GPU-related functions will not be available. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen2VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_vl:AscendQwen2VLForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3VLMoeForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLMoeForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl_without_padding:AscendQwen3VLForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen2_5_VLForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen2_5_vl:AscendQwen2_5_VLForConditionalGeneration. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM. 
(EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM. (EngineCore_DP0 pid=31961) WARNING 11-13 09:12:05 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM. (EngineCore_DP0 pid=31961) INFO 11-13 09:12:05 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model='/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', speculative_config=None, tokenizer='/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":null,"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":0,"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":0,"local_cache_dir":null} INFO 11-13 09:12:18 [__init__.py:36] Available plugins for group vllm.platform_plugins: INFO 11-13 09:12:18 [__init__.py:38] - ascend -> vllm_ascend:register INFO 11-13 09:12:18 [__init__.py:41] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 11-13 09:12:18 [__init__.py:207] Platform plugin ascend is activated WARNING 11-13 09:12:23 [_custom_ops.py:20] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'") (EngineCore_DP0 pid=31961) INFO 11-13 09:12:26 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0 (EngineCore_DP0 pid=31961) INFO 11-13 09:12:29 [model_runner_v1.py:2627] Starting to load model /root/.cache/modelscope/hub/models/OpenDataLab/MinerU2.5-2509-1.2B... (EngineCore_DP0 pid=31961) INFO 11-13 09:12:30 [__init__.py:381] Cudagraph is disabled under eager mode (EngineCore_DP0 pid=31961) INFO 11-13 09:12:30 [platform.py:141] Non-MLA LLMs forcibly disable the chunked prefill feature,as the performance of operators supporting this feature functionality is currently suboptimal. 
(EngineCore_DP0 pid=31961) INFO 11-13 09:12:30 [platform.py:179] Compilation disabled, using eager mode by default Loading safetensors checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:03<00:00, 3.82s/it] Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:03<00:00, 3.82s/it] (EngineCore_DP0 pid=31961) (EngineCore_DP0 pid=31961) INFO 11-13 09:12:34 [default_loader.py:267] Loading weights took 3.94 seconds (EngineCore_DP0 pid=31961) INFO 11-13 09:12:35 [model_runner_v1.py:2661] Loading model weights took 2.4358 GB (EngineCore_DP0 pid=31961) INFO 11-13 09:12:38 [worker_v1.py:234] Available memory: 9677338112, total memory: 46431260672 (EngineCore_DP0 pid=31961) INFO 11-13 09:12:38 [kv_cache_utils.py:1087] GPU KV cache size: 787,456 tokens (EngineCore_DP0 pid=31961) INFO 11-13 09:12:38 [kv_cache_utils.py:1091] Maximum concurrency for 16,384 tokens per request: 48.06x [rank0]:[W1113 09:12:38.529853322 compiler_depend.ts:62] Warning: Cannot create tensor with NZ format while dim < 2, tensor will be created with ND format. (function operator()) [rank0]:[W1113 09:12:38.797847485 compiler_depend.ts:57] Warning: EI9999: Inner Error! The error from device(0), serial number is 5. there is a sdma error, sdma channel is 0, the channel exist the following problems: The SMMU returns a Terminate error during page table translation.. the value of CQE status is 2. the description of CQE status: When the SQE translates a page table, the SMMU returns a Terminate error.it's config include: setting1=0xc000000880e0000, setting2=0xff009000ff004c, setting3=0, sq base addr=0x800d00001004c000[FUNC:ProcessSdmaErrorInfo][FILE:device_error_proc.cc][LINE:779] EI9999: [PID: 31961] 2025-11-13-09:12:38.658.163 Memory async copy failed, device_id=0, stream_id=53, task_id=2565, flip_num=0, copy_type=2, memcpy_type=0, copy_data_type=0, length=403177472[FUNC:GetError][FILE:stream.cc][LINE:1183] TraceBack (most recent call last): rtStreamSynchronize execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53] synchronize stream failed, runtime result = 507013[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161] DEVICE[0] PID[31961]: EXCEPTION STREAM: Exception info:TGID=3353395, model id=65535, stream id=53, stream phase=SCHEDULE Message info[0]:RTS_HWTS: hwts sdma error, slot_id=17, stream_id=53 Other info[0]:time=2025-11-13-09:12:37.478.650, function=int_process_hwts_sdma_error, line=1381, error code=0x20b (function copy_between_host_and_device_opapi) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] EngineCore failed to start. 
(EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] Traceback (most recent call last): (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] engine_core = EngineCoreProc(*args, **kwargs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] super().__init__(vllm_config, executor_class, log_stats, (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self._initialize_kv_caches(vllm_config) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 75, in initialize_from_config (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] return [run_method(self.driver_worker, method, args, kwargs)] (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 3121, in run_method (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] return func(*args, **kwargs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 310, in compile_or_warm_up_model (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] self._warm_up_atb() (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 316, in _warm_up_atb (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] x = torch.rand((2, 4), dtype=torch.float16).npu() (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/backend_registration.py", line 148, in wrap_tensor_to (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] return self.to(device=torch.device(f'{custom_backend_name}:{device_idx}'), non_blocking=non_blocking, **kwargs) (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) ERROR 11-13 09:12:38 [core.py:708] RuntimeError: ACL stream synchronize failed, error code:507013 (EngineCore_DP0 pid=31961) Process EngineCore_DP0: (EngineCore_DP0 pid=31961) Traceback (most 
recent call last): (EngineCore_DP0 pid=31961) File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=31961) self.run() (EngineCore_DP0 pid=31961) File "/usr/local/python3.11.13/lib/python3.11/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=31961) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 712, in run_engine_core (EngineCore_DP0 pid=31961) raise e (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 699, in run_engine_core (EngineCore_DP0 pid=31961) engine_core = EngineCoreProc(*args, **kwargs) (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 498, in __init__ (EngineCore_DP0 pid=31961) super().__init__(vllm_config, executor_class, log_stats, (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 92, in __init__ (EngineCore_DP0 pid=31961) self._initialize_kv_caches(vllm_config) (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/engine/core.py", line 207, in _initialize_kv_caches (EngineCore_DP0 pid=31961) self.model_executor.initialize_from_config(kv_cache_configs) (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/v1/executor/abstract.py", line 75, in initialize_from_config (EngineCore_DP0 pid=31961) self.collective_rpc("compile_or_warm_up_model") (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 83, in collective_rpc (EngineCore_DP0 pid=31961) return [run_method(self.driver_worker, method, args, kwargs)] (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 3121, in run_method (EngineCore_DP0 pid=31961) return func(*args, **kwargs) (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 310, in compile_or_warm_up_model (EngineCore_DP0 pid=31961) self._warm_up_atb() (EngineCore_DP0 pid=31961) File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker_v1.py", line 316, in _warm_up_atb (EngineCore_DP0 pid=31961) x = torch.rand((2, 4), dtype=torch.float16).npu() (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) File "/usr/local/python3.11.13/lib/python3.11/site-packages/torch/utils/backend_registration.py", line 148, in wrap_tensor_to (EngineCore_DP0 pid=31961) return self.to(device=torch.device(f'{custom_backend_name}:{device_idx}'), non_blocking=non_blocking, **kwargs) (EngineCore_DP0 pid=31961) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=31961) RuntimeError: ACL stream synchronize failed, error code:507013 [rank0]:[W1113 09:12:40.247279995 compiler_depend.ts:528] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log. EH9999: Inner Error! 
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 31961] 2025-11-13-09:12:40.108.576 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last): (function npuSynchronizeUsedDevices)
[W1113 09:12:40.249145458 compiler_depend.ts:510] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 31961] 2025-11-13-09:12:40.110.685 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last): (function npuSynchronizeDevice)
[W1113 09:12:40.250596503 compiler_depend.ts:227] Warning: NPU warning, error code is 507013[Error]: [Error]: System Direct Memory Access (DMA) hardware execution error. Rectify the fault based on the error information in the ascend log.
EH9999: Inner Error!
rtDeviceSynchronizeWithTimeout execute failed, reason=[sdma copy error][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
EH9999: [PID: 31961] 2025-11-13-09:12:40.112.194 wait for compute device to finish failed, runtime result = 507013.[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
TraceBack (most recent call last): (function empty_cache)
[... the same 507013 / "sdma copy error" warning then repeats in alternating npuSynchronizeDevice / empty_cache pairs roughly a dozen more times while the worker shuts down ...]
(APIServer pid=31822) Traceback (most recent call last):
(APIServer pid=31822)   File "/usr/local/python3.11.13/bin/mineru-vllm-server", line 10, in <module>
(APIServer pid=31822)     sys.exit(main())
(APIServer pid=31822)              ^^^^^^
(APIServer pid=31822)   File "/workspace/MinerU/mineru/model/vlm_vllm_model/server.py", line 61, in main
(APIServer pid=31822)     vllm_main()
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 54, in main
(APIServer pid=31822)     args.dispatch_function(args)
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 57, in cmd
(APIServer pid=31822)     uvloop.run(run_server(args))
(APIServer pid=31822)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
(APIServer pid=31822)     return runner.run(wrapper())
(APIServer pid=31822)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/usr/local/python3.11.13/lib/python3.11/asyncio/runners.py", line 118, in run
(APIServer pid=31822)     return self._loop.run_until_complete(task)
(APIServer pid=31822)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=31822)   File "/usr/local/python3.11.13/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=31822)     return await main
(APIServer pid=31822)            ^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=31822)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=31822)     async with build_async_engine_client(
(APIServer pid=31822)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=31822)     return await anext(self.gen)
(APIServer pid=31822)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=31822)     async with build_async_engine_client_from_engine_args(
(APIServer pid=31822)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 210, in __aenter__
(APIServer pid=31822)     return await anext(self.gen)
(APIServer pid=31822)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=31822)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=31822)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/utils/__init__.py", line 1571, in inner
(APIServer pid=31822)     return fn(*args, **kwargs)
(APIServer pid=31822)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=31822)     return cls(
(APIServer pid=31822)            ^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=31822)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=31822)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=31822)     return AsyncMPClient(*client_args)
(APIServer pid=31822)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=31822)     super().__init__(
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=31822)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=31822)   File "/usr/local/python3.11.13/lib/python3.11/contextlib.py", line 144, in __exit__
(APIServer pid=31822)     next(self.gen)
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=31822)     wait_for_engine_startup(
(APIServer pid=31822)   File "/vllm-workspace/vllm/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=31822)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=31822) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
(APIServer pid=31822) [ERROR] 2025-11-13-09:12:52 (PID:31822, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception

2. Software versions:
-- CANN: 8.2.RC1
-- PyTorch: 2.7.1
-- Python: 3.11.13
-- OS: Ubuntu 3.11.13

3. Steps to reproduce:
ASCEND_VISIBLE_DEVICES=7 mineru-vllm-server --port 7860 --enforce-eager
**The machine in use is a 300I Duo.**
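The traceback shows the failure happens in vllm_ascend's warm-up step, on a plain host-to-NPU tensor copy (`torch.rand((2, 4), dtype=torch.float16).npu()`), before any MinerU-specific code runs. A quick way to narrow it down is to repeat that copy outside vLLM. The snippet below is a minimal diagnostic sketch, assuming `torch_npu` is installed in the same Python 3.11 environment and `ASCEND_VISIBLE_DEVICES=7` is exported in the shell; it is not part of MinerU or vllm-ascend.

```python
# Minimal sketch: re-run the same warm-up copy that worker_v1._warm_up_atb performs,
# outside vLLM, to check whether ACL error 507013 reproduces with a bare transfer.
# Assumption: torch_npu is installed and ASCEND_VISIBLE_DEVICES=7 is set before launch.
import torch
import torch_npu  # noqa: F401 -- registers the "npu" device type with PyTorch

torch.npu.set_device(0)  # logical device 0 maps to physical card 7 under ASCEND_VISIBLE_DEVICES=7
x = torch.rand((2, 4), dtype=torch.float16).npu()  # same op as the failing warm-up
torch.npu.synchronize()  # forces the stream synchronization that reports 507013
print(x.device, x)
```

If this bare copy also raises error 507013, the "sdma copy error" is reproducible at the driver/runtime level rather than being specific to mineru-vllm-server, and the Ascend host logs (plog, typically under ~/ascend/log) together with the device status from `npu-smi info` would be the next things to check on card 7.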