LLM Inference Algorithm Engineer

Zoom · careers.zoom.com

Location US
Work type Remote
Type Full time
Level Mid
Source Shazamme

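Responsibilities

Inference architecture evolution: Contribute to or lead the architectural evolution of the distributed large-model inference system; design and ship core infrastructure features such as Prefill/Decode separation (P/D Separation) and global KV cache routing (KV Router).

Low-level kernel and quantization optimization: Own the production rollout of low-bit quantized inference schemes (such as FP8 and W4A8); optimize core kernels such as Attention and MoE in depth to raise single-GPU throughput and lower per-token inference cost.

Targeted optimization for complex scenarios: For ultra-long-text scenarios such as meeting summarization, refactor and optimize the Chunked Prefill mechanism and the GPU memory swapping (Swapping) strategy to address memory fragmentation and network transfer overhead under extreme concurrency; optimize Prefix Caching for multi-turn conversations and the Speculative Decoding implementation to reduce time to first token (TTFT).

Performance profiling and tuning: Build a comprehensive inference performance observability (Profiling) system; analyze hardware and software bottlenecks at the system, kernel, and communication levels to guide iteration on system architecture and scheduling algorithms.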

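Requirements

Bachelor's degree or above in computer science or a related field, with a solid foundation in computer architecture and operating systems.

Proficiency in C/C++ and Python, solid CUDA programming skills, familiarity with Triton/CUTLASS, and hands-on experience with in-depth kernel tuning and production deployment.

Deep understanding of mainstream open-source inference frameworks (such as vLLM, TensorRT-LLM, SGLang), with the ability to do source-level secondary development, custom optimization, and debugging of complex bugs.

Familiarity with the Transformer and mainstream large language model architectures, and knowledge of mainstream compression and acceleration techniques such as quantization, pruning, and sparsification.

Knowledge of low-level distributed communication protocols (NCCL/RDMA), with hands-on experience applying parallelism strategies (TP/PP/EP) in multi-node, multi-GPU environments.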

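Preferred Qualifications

Core contributor to a large-model inference system (vLLM, SGLang, TRT-LLM, etc.) or to a high-performance computing open-source project.

Deep understanding of, and hands-on experience with, the features of the latest NVIDIA microarchitectures such as Hopper and Blackwell.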

Ways of Working
Our structured hybrid approach is centered around our offices and remote work environments. The work style of each role (Hybrid, Remote, or In-Person) is indicated in the job posting.

Benefits
As part of our award-winning workplace culture and commitment to delivering happiness, our benefits program offers a variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health; support work-life balance; and contribute to their community in meaningful ways.

About Us
Zoomies help people stay connected so they can get more done together. We set out to build the best collaboration platform for the enterprise, and today help people communicate better with products like Zoom Contact Center, Zoom Phone, Zoom Events, Zoom Apps, Zoom Rooms, and Zoom Webinars.
We’re problem-solvers, working at a fast pace to design solutions with our customers and users in mind. Find room to grow with opportunities to stretch your skills and advance your career in a collaborative, growth-focused environment.


Our Commitment

At Zoom, we believe great work happens when people feel supported and empowered. We’re committed to fair hiring practices that ensure every candidate is evaluated based on skills, experience, and potential. If you require an accommodation during the hiring process, let us know—we’re here to support you at every step.

If you need assistance navigating the interview process due to a medical disability, please submit an Accommodations Request Form and someone from our team will reach out soon. This form is solely for applicants who require an accommodation due to a qualifying medical disability. Non-accommodation-related requests, such as application follow-ups or technical issues, will not be addressed.

Our interviews are supported by BrightHire, a tool that helps us create a consistent and thoughtful interview experience and may include recordings. Please refer to our candidate privacy statement for more information on how we use your data.

Frequently asked questions

Who is hiring for the LLM Inference Algorithm Engineer role?
Zoom, a Shazamme client, is hiring for the LLM Inference Algorithm Engineer position. Apply directly on the employer's career site.
Where is the LLM Inference Algorithm Engineer job located?
The LLM Inference Algorithm Engineer role with Zoom is based in the US. The role is remote-friendly.
Is the LLM Inference Algorithm Engineer role remote?
Yes, the LLM Inference Algorithm Engineer position at Zoom is remote. Candidates based in the US are preferred.
Is the LLM Inference Algorithm Engineer role full-time or contract?
This is a full-time position at Zoom.
What experience level is the LLM Inference Algorithm Engineer role?
The LLM Inference Algorithm Engineer position is aimed at mid-level candidates.
How do I apply for the LLM Inference Algorithm Engineer role at Zoom?
Apply directly on Zoom's career page via the Apply button on this listing. ZammeJobs links straight through to the employer's ATS, with no third-party form and no resume database.