Job Description
ABOUT SPHERE
Every breakthrough in trade infrastructure has followed the same pattern: reduce a transaction cost, expand the market. Containerization for goods. SWIFT for money. Stripe for payments. Compliance is one of the last and largest of these costs, and the hardest to reduce, because trade rules aren't data to be looked up. They're a complex adaptive system: 190+ sovereign jurisdictions, writing rules in different languages, changing them constantly, and reacting to each other.
Sphere built the system that solves it. Our AI (TRAM) ingests global trade law, interprets it, resolves conflicts across jurisdictions, and produces compliance determinations more reliable than human experts. We handle the entire lifecycle — calculation, registration, filing, remittance — at millisecond latency with zero downtime.
Backed by a16z and YC. $21M Series A, 30%+ month-over-month growth, customers include ElevenLabs, Replit, Deel, Runway, and Lovable.
Small team, global surface area. Everyone owns a domain that would be a full team at a larger company. San Francisco, five days in office.
The problem keeps compounding. Expanding into input tax, withholding, e-invoicing, tariffs — each multiplies the complexity. Tens of millions of transactions today, billions ahead.
THE ROLE
You'll lead development of TRAM, our proprietary AI reasoning model that reads and interprets global trade law. This isn't a lookup problem, it's a reasoning problem — and it only became solvable with LLMs. You'll build the data pipelines that ingest legal sources, the model stack that produces structured evidence, the evaluation frameworks that measure accuracy, and the fine-tuning loops that improve performance. The unusual constraint: you need speed, scale, correctness, and robustness simultaneously — at millisecond latency, zero downtime, heading toward billions of transactions where a single error costs a customer $20K.
WHAT YOU'LL DO
Within weeks:
Lead development of new features aimed at increasing TRAM’s test-time accuracy
Work on the underlying data and retrieval pipelines that help power our AI workflows
Work directly with our internal tax experts to understand how TRAM can better reason like them
Within months:
Own TRAM’s eval framework and workflows
Work directly with leading frontier labs to reinforcement fine-tune models on our proprietary data
REQUIREMENTS
Prior experience building AI-enabled products, particularly RAG systems
Experience fine-tuning base models, ideally via reinforcement fine-tuning (RFT)
Willingness to dive into technical tax problems: if you aren’t willing to go deep on how the model should reason through the tax research process, you won’t be effective
A strong understanding of how LLMs and reasoning models function
NICE TO HAVES
Experience working with LLMs on legal applications
Experience with RAG data pipelines and collecting/curating data for the pipeline
WHO YOU ARE
You'll thrive here if:
You're a Dog. You've been underestimated, gone through struggle, and never stopped running. You have a chip on your shoulder and enormous drive. You look at Stripe, Deel, and Flexport all punting on compliance and think: good, that means the opportunity is ours. Hunger beats pedigree.
Early stage is in your bones. You've built things where there's no playbook and nobody handing you the answer. You define the problem instead of waiting for instructions.
You own it end to end. Given a goal, you figure out your own path. Everyone here owns a domain that would be a full team at a larger company, and no one tells you how.
You believe speed and accuracy are both possible. We're building a complex product that requires robustness and 100% uptime, and we have to build at our customers' pace. Move fast. Don't break things. Both.
Being in the room is a feature, not a cost. Five days in SF isn't a policy, it's how the work gets done. The speed and density of collaboration we need doesn't survive over video.
This won't be a fit if:
You need structure handed to you or ambiguity feels draining rather than motivating
You want to manage people more than own hard problems (we're a flat, experienced team — everyone builds)
You're used to shipping "good enough" (small errors have outsized impact here)
Being in the room five days a week feels like a cost instead of a benefit