Jobs on TAL

Research Engineering, Inference

Bitdeer · Singapore · Posted 20 May 2026

Experience

Experience not specified

Function

Engineering

Work mode

Onsite, Singapore

Company

Tier 2

What you will work on

Bitdeer AI Lab is seeking a Research Engineer to focus on ML systems research for large-scale inference optimization. The role involves working with SGLang and vLLM stacks to optimize serving for LLMs and reasoning models on large GPU clusters. Responsibilities include GPU kernel-level tuning, inter-node communication, and designing end-to-end serving systems. This position offers the opportunity to work on frontier AI models within a rapidly growing global computing infrastructure company.

TAL's take

Quality 65/100 · 5/5 clarity · Tier 2 company

Solid mid-tier role focused on high-performance inference infrastructure at scale. The technical mission is clear, but the listing does not state a seniority level or required years of experience.

Very clear scope focused on SGLang/vLLM inference optimization, specific stack requirements, and clear team goals.

Must haves

  • Experience with SGLang or vLLM for large-scale model serving
  • Understanding of inference optimization for LLMs and reasoning models
  • Understanding of GPU kernel-level optimization
  • Knowledge of distributed inference communication patterns and collective operations, e.g. with NCCL
  • Understanding of latency, throughput, stability, and GPU efficiency metrics

Tools and skills

sglang, vllm, gpu kernel optimization, nccl, distributed inference communication, tensor parallelism, pipeline parallelism, nvlink, infiniband

About the company

Publicly traded company with global infrastructure, focused on specialized crypto/AI hardware and cloud, but not a FAANG-tier software brand.