About the role
Maincode is training Matilda, the first large language model built and trained from scratch in Australia. Our new compute cluster is live, and we are now scaling the next version.
This role sits directly inside that training stack. You will build the pipelines, infrastructure, and tooling that determine how efficiently Matilda trains, how stable long runs are, and how quickly new experiments can be run. Training runs last days or weeks. Small changes propagate through complex systems. The work requires precision and patience.
We build AI systems from first principles: designing the architectures, running the infrastructure, shaping the training process, and operating the models ourselves. Matilda is not a research prototype. It is a production system, trained at scale and served for open public access.
Maincode operates one of the largest private AI compute environments in Australia, built for a single purpose: training our own models. This is not a role that wraps external APIs or ships user-facing features. You will be working on the systems that train a large language model from scratch.
What you would actually do
You will build and maintain the systems that support large-scale model training.
This includes:
Designing and maintaining distributed training pipelines for large language models
Building data ingestion and preprocessing systems for large training datasets
Developing tooling for experiment management, checkpointing, and reproducibility
Monitoring and debugging long-running training jobs across clusters
Improving reliability and observability across the training stack
Optimising training throughput across compute, memory, and data pipelines
Working closely with researchers to translate experimental ideas into training runs
Diagnosing failures across infrastructure, training loops, and data pipelines
You will spend time inside code, logs, dashboards, and experiment outputs. The goal is simple: make large-scale training reliable.
The kind of person who does well here
We are looking for engineers early in their careers who want to understand how large models are actually trained.
You may have one or two years of experience building production software. What matters most is curiosity and the willingness to learn how these systems behave under load.
People who tend to do well here:
Care about how systems behave over long runtimes
Enjoy debugging complex distributed systems
Pay attention to logs, metrics, and system behaviour
Prefer to understand how a system works rather than relying on abstractions
Are comfortable working close to infrastructure
Have the patience to diagnose failures that appear hours into a run
Want to learn how large-scale AI training actually happens
You do not need prior experience training large language models. What matters is intellectual curiosity, persistence, and the ability to learn quickly.
How you would work
You will write production code that sits directly in the training stack.
You should be comfortable:
Working in Python
Using machine learning frameworks such as PyTorch or JAX
Writing reliable infrastructure for large compute workloads
Debugging distributed systems and long-running jobs
Collaborating closely with researchers and infrastructure engineers
Much of the work sits between research and infrastructure. Ideas move quickly, but the systems that support them must remain stable.
What this role is not
It is not primarily about building user-facing applications
It is not about prompt engineering
It is not about wrapping external APIs or third party models
You will be working on the systems that train our own models from scratch.
Why Maincode
Maincode builds AI systems end to end: we prepare the data, design the training process, run the infrastructure, and operate the models ourselves.
You will work with a small team that:
Builds the full AI stack rather than outsourcing it
Treats infrastructure as part of the intelligence system itself
Values engineers who want to understand how things actually work
Is building long-term capability in training and operating large models
If you want to work directly on the systems that train large language models from scratch, this is the only role in Australia that will put you inside that work.
Note
This is a full-time role based in Melbourne, working closely with our in-person engineering and research team. At this time we are not able to offer visa sponsorship, so applicants must have existing and unrestricted work rights in Australia.