AIML - Sr. Software Engineer (Large Language Models) - Machine Intelligence
Apple Inc
Seattle, WA
Job posting number: #7150434 (Ref:apl-200479320)
Posted: May 17, 2023
Application Deadline: Open Until Filled
Job Description
Summary
The Machine Intelligence, Neural Design (MIND) team employs HW/SW co-design to achieve best-in-class performance and energy efficiency for the many use cases that deploy neural networks. We seek a Sr. Software Engineer to help define and implement features that accelerate and compress large language models (LLMs) in our next-gen inference engine. Our team comprises Efficient ML experts with skillsets spanning HW, SW, and ML. Our charter is to push the frontiers of performance and power for DNNs while minimizing memory footprint.
As a SWE, you will be responsible for writing high-quality, well-tested code. Our ideal team member is courageous when it comes to trying new things, is adept at reasoning about systems performance, and is willing to iterate on ideas. We value team members with strong communication skills and experience working cross-functionally with HW, SW, and ML teams.
Key Qualifications
- Proficient in C++, working knowledge of Python
Description
As a member of this team, you will use your background to:
* Implement features that compress and accelerate LLMs in our on-device inference engine
* Convert models from a high-level framework to a target device, debugging correctness and performance issues
* Write unit and system integration tests to ensure functional correctness and to catch performance regressions
* Diagnose performance bottlenecks and minimize memory footprint of large language models
* Work with HW Arch teams to co-design solutions that further improve perf, power, and memory footprint of neural workloads
* Work with a variety of partners from all parts of the stack — from Apps to Compilation, HW Arch, and Silicon Validation
Education & Experience
Bachelor's, Master's, or PhD in Computer Science or a related field
Additional Requirements
Preferred Qualifications:
* Strong communicator with the ability to analyze complex and ambiguous problems
* Familiarity with ML model compression techniques (e.g., quantization, pruning/sparsity) and their mapping to a target backend
* Disciplined programming skillset with strong attention to detail
* Experience with backend compilation, HW/SW co-design, and/or performance optimization
* Deep understanding of computer systems and the interactions between HW and SW
* Familiarity with at least one deep learning framework (e.g., PyTorch, Keras, TensorFlow)