Task Description
- Research, design, and implement computer vision and vision-language model (VLM) use cases tailored for MBOS.
- Train and fine-tune deep learning models focusing on multimodal fusion and efficient deployment on embedded platforms.
- Work closely with software, hardware, and product teams to integrate developed algorithms into the overall vehicle system.
- Build and maintain toolchains for fine-tuning and deploying LLMs/VLMs, manage training clusters, and ensure efficient inference on both server-side and embedded targets.
- Run experiments, evaluate results, and transfer knowledge to other team members.
Qualifications
- Master's degree or above in Computer Science, Electrical Engineering, Robotics, or a related field.
- Proven hands-on experience in developing and deploying computer vision and/or VLM algorithms, preferably in the automotive or robotics domain.
- Experience with deep learning frameworks (e.g., PyTorch, TensorFlow) and classical computer vision libraries.
- Experience with fine-tuning and adapting LLMs/VLMs (e.g., LoRA, RAG, prompt engineering).
- Familiarity with multimodal fusion techniques and with architectures such as the Transformer and BERT.
- Solid grounding in both Natural Language Processing and Computer Vision; able to design and implement solutions that leverage both modalities.
- Experience with model compression, quantization, and deployment in resource-constrained environments.
- Familiarity with dataset collection, labeling, and evaluation for multimodal tasks.
- Strong programming skills in Python and C++.
- Experience with cloud services (e.g., Azure, AWS, Tencent Cloud) is a plus.
- Outstanding analytical and problem-solving skills.
- Technical leadership and ability to make decisions based on technical facts.
- Strong sense of ownership and drive.
- Good communication skills and ability to work in a collaborative, cross-functional environment.
- Proficiency in written and spoken English.
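To illustrate the kind of parameter-efficient fine-tuning named above, here is a minimal sketch of the LoRA idea: a frozen pretrained weight matrix is augmented with a trainable low-rank update. All dimensions, the rank, and the scaling factor are illustrative assumptions, not values used in this role:

```python
import numpy as np

# Hypothetical dimensions for illustration only.
d_out, d_in, r = 64, 64, 4
alpha = 8  # LoRA scaling hyperparameter (assumed value)

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the update starts as a no-op

def lora_forward(x):
    # y = W x + (alpha / r) * B A x -- only A and B are updated during fine-tuning
    return W @ x + (alpha / r) * (B @ (A @ x))

full_params = W.size               # 4096 parameters in the full matrix
lora_params = A.size + B.size      # 512 trainable parameters with rank 4
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

Because `B` starts at zero, the adapted model initially reproduces the pretrained behavior exactly; fine-tuning then updates only `A` and `B`, here an eighth of the full parameter count.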
