It’s been two years since I started working as a software engineer in Tokyo.
Occasionally I tell my colleagues about a student project I did in my junior year of college,
and it was so well received that I’m going to blog about it.
Now, let me ask you a question.
Have you ever designed your own ISA, built a processor for that ISA on an FPGA, and built a compiler for it?
Furthermore, have you run an operating system on that processor?
We have.
In this post, I’m going to talk about my undergraduate days in 2015,
our four months of building a home-built CPU with a home-built RISC ISA,
building a home-built C toolchain, and porting Xv6, a Unix-like OS, to that CPU.
It was all done as a student experiment project called CPU Experiment.
So, let’s start with what the CPU experiment is.
The CPU experiment is a somewhat famous exercise held in the winter of the junior year in my department,
the Department of Information Science at the University of Tokyo.
In the experiment, students are divided into groups of four or five.
Each group designs its own CPU architecture, implements it on an FPGA,
builds an OCaml subset compiler for that CPU, and then runs a given ray-tracing program on the CPU.
Usually, one or two people are in charge of each of the CPU, FPU, CPU simulator, and compiler.
I was in charge of the CPU in my group, Crew 6.
This exercise is well known for its high expectation of self-learning.
The instructor only asks the students to “take this ray-tracing program written in OCaml and run it on your CPU implemented on an FPGA”, and the class ends.
He or she doesn’t say much about the concrete steps of how to write a CPU and a compiler.
The students learn for themselves how to turn the general knowledge of CPUs and compilers learned in previous lectures into real circuits and code.
Well, this is a really tough exercise, but a very exciting and educational one.
As some of you may have noticed, I haven’t talked about operating systems at all.
I’ll add a little explanation.
Usually, the experiment proceeds as follows.
First, you make a CPU that works reliably, no matter how slow it is.
If you can make a working CPU and successfully run the ray-tracing program, you earn the credit for the experiment.
After that, your team has free time.
The traditional way to use this free time is to further speed up the CPU.
In previous experiments, students have made out-of-order CPUs, VLIW CPUs, multi-core CPUs, and even superscalar CPUs, which is amazing.
However, some teams put extra energy into doing something fun, such as running games or playing music by connecting a speaker to their CPU.
Crew 6, to which I belonged, was a group of such people who loved entertainment,
and we decided to run an OS as our team goal.
Because other groups showed interest in this plan, a joint group of about 8 people, Crew X,
was formed, and its goal was “Let’s run an OS on our own CPU!”
Although I was in charge of making the CPU in Crew 6,
this time I chose to be the leader of the OS team in Crew X.
So this post is written mainly from the perspective of the OS team,
but of course I also introduce the whole group’s results.
As the OS to be ported, we chose Xv6, a simple Unix v6-inspired OS created by MIT for educational purposes.
Xv6 is written in ANSI C, unlike Unix v6, and it runs on x86.
Xv6 is an educational OS, so its features are rather limited, but it has enough features for a simple Unix-like OS.
You can find more information about Xv6 on Wikipedia.
In porting Xv6, there were many challenges on the software side alone because we were trying to do everything from scratch.
1. C compiler and toolchain for Xv6
In the CPU experiment, we usually make an ML compiler. Naturally, it can’t compile the C code of Xv6.
2. What kind of CPU features are required for an operating system?
Privilege protection? Virtual addresses? Interrupts?
Yes, we had an overall understanding of what an operating system does from lectures,
but at the time we didn’t have enough understanding to explain which specific CPU features make that possible.
3. What about the simulator?
We had a simulator made in the core part of the CPU experiment,
but it was a simple one that just executes instructions one by one,
and it had no interrupts and no virtual address translation.
4. Low portability of Xv6
Xv6 was not very portable.
For example, it assumes that char is 1 byte and int is 4 bytes, and it manipulates the stack heavily.
Well, I think the name “Xv6” comes from x86 and Unix “v6”, so it’s kind of natural.
We had a lot of concerns, but we started Crew X’s OS porting project in December.
From here I’m going to write about what we did in roughly chronological order.
It’s a little long, so if you want to look at our final products quickly, please jump to March.
The first problem we found an answer to was the compiler and toolchain.
Surprisingly, our decision was to build a C89 compiler from scratch.
To be honest, I hadn’t imagined that we’d settle on this plan.
I remember that at first I talked with Yuichi, who was in charge of the CPU of Crew X, about porting gcc or llvm.
However, one of the team members, Keiichi, suddenly said he had written a C compiler and showed us a prototype with a simple parser and emitter.
It seemed more fun to write the toolchain from scratch, so we decided to write a compiler ourselves.
Yuichi and Wataru from Crew 3, who had already finished the core part of the experiment that year, joined Keiichi, and the Crew X compiler team was born.
We later named our compiler Ucc.
At the beginning of December, I finished my CPU, and Crew 6 finished the core part of the CPU experiment.
So, we moved on to the fun part, Crew X’s OS porting task.
At this point, Shohei from Crew 6 and I started working in Crew X and became the OS team. Masayoshi joined at the same time.
By the way, I think not many software engineers have ever written a CPU, so let me talk a little about making a CPU as well.
These days, making a CPU doesn’t mean wiring every single jumper wire on a breadboard; you write the circuitry in a Hardware Description Language (HDL).
Then you synthesize that HDL into a real circuit using Vivado or Quartus.
This process is called logic synthesis, not compilation.
An HDL and a programming language are similar but different.
Think of it as writing a function that maps the signal state of registers to another signal state, triggered by a clock or an input signal.
If you want to experience real reactive programming, I recommend you try writing HDL.
Also, when you write HDL, always keep in mind whether the signal propagation of the circuit you describe really finishes within one clock.
Otherwise, the behavior of your circuits will be incomprehensible to humans.
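To make the “function from register state to next register state” idea concrete, here is a minimal sketch in C rather than in a real HDL. The 4-bit counter and the names regs_t, next_state, and clock_edge are made up for illustration and have nothing to do with the actual CPU; the point is only that all registers are updated together from a pure function on each clock edge.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register state of a tiny circuit: a 4-bit counter. */
typedef struct {
    uint8_t count;  /* only the low 4 bits are used */
    uint8_t carry;  /* set for one cycle when the counter wraps */
} regs_t;

/* Pure "combinational" function: current registers + inputs -> next registers.
 * In real hardware this logic has to settle within one clock period. */
regs_t next_state(regs_t cur, int enable, int reset)
{
    regs_t nxt = cur;
    if (reset) {
        nxt.count = 0;
        nxt.carry = 0;
    } else if (enable) {
        nxt.count = (uint8_t)((cur.count + 1) & 0xF);
        nxt.carry = (uint8_t)(cur.count == 0xF);
    } else {
        nxt.carry = 0;
    }
    return nxt;
}

/* One rising clock edge: every register takes its next value at once. */
void clock_edge(regs_t *regs, int enable, int reset)
{
    assert(regs != 0);
    *regs = next_state(*regs, enable, reset);
}
```

Calling clock_edge sixteen times with enable set wraps the counter back to zero and raises carry for exactly one cycle, which is roughly how a clocked always block behaves in an HDL.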
The hardest part of the actual development was that this logic synthesis took a ridiculous amount of time.
It was not unusual for us to have to wait up to 30 minutes after starting the synthesis,
so once I started the synthesis,
I often played Smash Bros. Melee with the other CPU guys who were also waiting for their synthesis to finish.
FYI, my character was Sheik.
We started to find the answer to “What kind of CPU features are required for an operating system?”
After the OS team was born, we started weekly rounds of Xv6 source code reading.
At the same time, I started porting Xv6 to MIPS.
This was partly to learn how an OS works at the implementation level, and partly because there seemed to be no Xv6 port to MIPS.
In about a week, I got the port working up to the point where the process scheduler starts.
I did a lot of research on MIPS throughout this porting process,
and on x86 to understand how Xv6 works.
Thanks to that, I understood the mechanisms around interrupts and the MMU at the implementation level.
At this stage I had a good understanding of the CPU functionality required for Xv6.
Also, in mid-January, we worked hard to get the entire Xv6 code to compile by commenting out various parts.
As a result, Xv6 on the simulator of our homebrew architecture showed the first message of the boot sequence,
xv6... cpu0: starting...
At the same time, this meant that by then Ucc had already grown enough to compile most of Xv6, which was great.[2]
In the MIPS port, I finished the initialization of the PIC, which was a real pain,
and also finished the implementation of the interrupt handler.
As a result, the port of Xv6 to MIPS was complete up to just before the first user program starts.
Based on this experience, I drafted the specifications of interrupts and virtual address translation for our homebrew CPU.
To keep it simple, we decided to omit hardware privilege mechanisms such as ring protection.
For virtual address translation, we decided to use a hardware page-walking method, just like x86.
It may seem difficult to implement in hardware, but we thought it would be more affordable if we sacrificed speed and left out a TLB.
In the end, though, Yuichi later made an excellent CPU core that had a TLB from the beginning.
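For readers who have never looked at a page walk, here is a minimal sketch in C of what the hardware has to do on every memory access when there is no TLB. It assumes the classic 32-bit x86 layout (10-bit directory index, 10-bit table index, 12-bit offset, present flag in bit 0). The real GAIA entry format is not described in this post, so phys_mem, read_phys32, and page_walk are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P      0x1u            /* present bit, as on 32-bit x86 */
#define PAGE_SHIFT 12
#define PHYS_WORDS (16u * 1024u)   /* 64 KiB of simulated memory (assumption) */

/* Simulated physical memory, stored as 32-bit words. */
uint32_t phys_mem[PHYS_WORDS];

static uint32_t read_phys32(uint32_t paddr)
{
    assert(paddr % 4 == 0 && paddr / 4 < PHYS_WORDS);
    return phys_mem[paddr / 4];
}

/* Walk a two-level page table rooted at pgdir_paddr.
 * Returns 0 and stores the physical address on success,
 * or -1 when an entry is not present (a page fault). */
int page_walk(uint32_t pgdir_paddr, uint32_t vaddr, uint32_t *paddr_out)
{
    uint32_t pdx = (vaddr >> 22) & 0x3FFu;          /* directory index */
    uint32_t ptx = (vaddr >> PAGE_SHIFT) & 0x3FFu;  /* table index */
    uint32_t off = vaddr & 0xFFFu;                  /* offset in page */

    uint32_t pde = read_phys32(pgdir_paddr + 4 * pdx);
    if (!(pde & PTE_P))
        return -1;

    uint32_t pte = read_phys32((pde & ~0xFFFu) + 4 * ptx);
    if (!(pte & PTE_P))
        return -1;

    *paddr_out = (pte & ~0xFFFu) | off;
    return 0;
}
```

Each translation costs two extra memory reads here, which is exactly the speed you give up by leaving out a TLB.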
Yuichi finished the overall design of the ISA of our CPU.
He named our CPU GAIA.
In usual CPU experiment projects, we implement neither interrupts nor an MMU.
However, Yuichi started to implement them for Xv6, based on a refactored version of the CPU of Crew 3.
From here on I’ll show the weekly records, as rapid progress begins!
Instead of just commenting out the boot sequence, Masayoshi started implementing the actual initialization of our CPU,
and Shohei rewrote the x86 assembly of Xv6 into our homebrew architecture’s assembly.
I added interrupt simulation to our simulator, which Wataru had made in the core part of the CPU experiment,
and also completed support for virtual address translation.
This gave the simulator enough functionality to run the OS.
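As a rough idea of what “interrupt simulation” means in an instruction-by-instruction simulator, here is a minimal sketch in C. It is not the actual simulator code: cpu_t, its fields, and sim_step are hypothetical, and the instruction execution is reduced to bumping the program counter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU state for a toy simulator; not the GAIA specification. */
typedef struct {
    uint32_t pc;           /* program counter */
    uint32_t epc;          /* saved pc while handling an interrupt */
    uint32_t handler;      /* address of the interrupt handler */
    int      int_enabled;  /* global interrupt-enable flag */
    int      int_pending;  /* set by a simulated device (timer, UART, ...) */
    uint64_t executed;     /* number of executed instructions */
} cpu_t;

/* Stand-in for fetch, decode, and execute of one instruction. */
static void exec_one(cpu_t *cpu)
{
    cpu->pc += 4;
    cpu->executed++;
}

/* One simulator step: take a pending interrupt first,
 * otherwise execute the next instruction. */
void sim_step(cpu_t *cpu)
{
    assert(cpu != 0);
    if (cpu->int_enabled && cpu->int_pending) {
        cpu->epc = cpu->pc;        /* remember where to resume */
        cpu->pc = cpu->handler;    /* jump to the handler */
        cpu->int_enabled = 0;      /* mask interrupts until the handler returns */
        cpu->int_pending = 0;
        return;
    }
    exec_one(cpu);
}
```

The important design point is that the interrupt check happens between instructions, so the handler always sees a consistent state and can later resume at epc.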
I made a primitive linker for our architecture to build Xv6 and its binary blobs.
Shohei was working on the implementation of the interrupt handler, which was a difficult part.
Interrupts are hard to understand, hard to follow through the control flow, hard to debug, and hard to design.
When I ported Xv6 to MIPS, I had GDB, so it was relatively OK, but our own simulator didn’t have any debug features, so it must have been very difficult to debug.
Shohei couldn’t stand the difficulty of debugging, so he added a disassembler and a debug dump function to the simulator.
After this, the simulator’s debugging features were rapidly upgraded by the OS team, and finally the simulator grew to look like the following picture.
After overcoming numerous difficulties, the porting of Xv6 progressed, but Xv6 still didn’t work.
In particular, Ucc’s specification that char and int are both 32 bits caused a lot of trouble.
That was not Ucc’s fault.
In fact, the C specification only requires sizeof(char) == 1 and sizeof(char) <= sizeof(int), so this was correct.
However, Xv6 is written for x86,
so it assumes sizeof(int) == 4 and adds constants to pointer values, which caused a lot of inconsistencies.
Since the bugs created by this were so hard to find and there were so many of them,
we decided to ask the Ucc team to make char 8 bits after all.
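To illustrate the kind of assumption that bites here, the following C sketch contrasts a hard-coded “an int is 4 bytes” computation with a portable one. These functions are invented examples, not code taken from Xv6 or Ucc.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Non-portable: hard-codes 4 bytes per int when sizing a buffer of n ints.
 * On a compiler where char and int are both 32 bits, sizeof(int) is 1,
 * so this value is wrong. */
size_t bytes_for_ints_x86(size_t n)
{
    return n * 4;
}

/* Portable: lets the compiler supply the size of int. */
size_t bytes_for_ints(size_t n)
{
    return n * sizeof(int);
}

/* Portable: advance by whole elements instead of adding a byte constant
 * to an address. */
int *nth_int(int *base, size_t n)
{
    assert(base != NULL);
    return base + n;
}

/* Width of char in bits. The C standard only guarantees at least 8. */
int char_bits(void)
{
    return CHAR_BIT;
}
```

On a typical desktop compiler both sizing functions agree, which is why such assumptions go unnoticed until the code meets a compiler with a different but still conforming choice.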
After delegating the 32-bit char problem to the Ucc team,
I wrote the initialization of paging at the first entry stage, and tried to get interrupts to work properly by trial and error.
The bottom line is that we worked hard to fix problem #4, “Low portability of Xv6”.
When I reread the Slack logs, I can see that a lot of progress was made on this day.
After the Ucc team quickly completed the change to make char 8 bits,
we worked hard on a lot of debugging.
In the end, our first user program started running.
After that, we made more and more progress in porting the user process functions that I hadn’t yet gotten to in the MIPS port.
Along the way, we found and fixed many hard-to-reproduce bugs and inadequacies in the interrupt specification,
and we got through it one way or another.
One interesting erratum we fixed is the cache alias problem.
The GAIA CPU used the virtual address instead of the physical address as the cache index.
This is because it lets you skip virtual address translation when looking up the cache.
However, because of that, we found that inconsistencies arose in the cache, because multiple cache entries for different virtual addresses could point to the same physical address.
When the cache entry for one virtual address was updated, the cache entries for the other virtual addresses pointing to the same physical address were not.
This bug was hard to fix cheaply on the hardware side, so we fixed it by introducing “page coloring” in our Xv6.
This assigns a “color” to each cache line and allocates pages so that virtual addresses pointing to the same physical address always get the same color.
As a result, virtual addresses pointing to the same physical address always share a single cache entry.
This allowed Xv6 to make sure GAIA never had multiple cache entries for the same physical address.
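Here is a minimal sketch in C of the coloring idea. The numbers are assumptions chosen for illustration (4 KiB pages and a cache index that spans 16 KiB, giving 4 colors), and matching the virtual color to the physical page’s color is just one simple policy that guarantees all aliases agree; the post does not say which sizes or policy GAIA’s Xv6 actually used.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define CACHE_SPAN (16u * 1024u)             /* bytes covered by the cache index (assumed) */
#define NUM_COLORS (CACHE_SPAN / PAGE_SIZE)  /* 4 colors under these assumptions */

/* Color = the cache index bits that lie above the page offset. */
unsigned page_color(uint32_t addr)
{
    return (addr / PAGE_SIZE) % NUM_COLORS;
}

/* Pick the first page-aligned virtual address at or above `hint` whose color
 * matches the physical page, so every alias of that page indexes the same
 * cache lines. Overflow near the top of the address space is not handled. */
uint32_t pick_vaddr(uint32_t hint, uint32_t paddr)
{
    uint32_t va = (hint + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);  /* round up */
    while (page_color(va) != page_color(paddr))
        va += PAGE_SIZE;
    assert(page_color(va) == page_color(paddr));
    return va;
}
```

If two processes map the same physical page through pick_vaddr, their virtual addresses may differ, but both fall on the same cache index, so only one cached copy can exist.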
On the 1st, the Xv6 port was complete. Xv6 was now running on the simulator…!
The Xv6 port was meant to be fun from the start, and once Xv6 started working on the simulator, we worked hard to add much more fun.
First of all, Masayoshi created a mini curses in about 4 hours, and the sl command ran on our Xv6.
Shohei kind of wanted to make a Minesweeper.
Around this time, Yuichi finished the implementation of the CPU of Crew X.
The actual CPU ran much faster than the simulator, which made the games easier to play and design.
At this point, a really high-quality application, 2048, was created.
This 2048 was very high quality.
Yuichi played it all the time.
By the way, this 2048 uses non-line-buffered input, but Xv6 originally did not have this option.
To support it, ioctl was added as a devsw function in addition to read and write, and termios-related features to control echo were added.
So the only Xv6 that can play 2048 with this level of completeness is the one on GAIA.
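For comparison, this is how non-line-buffered input without echo is usually requested through the standard POSIX termios interface. It is a minimal sketch of the general technique, not the Xv6-GAIA interface, whose exact ioctl requests are not shown in this post; make_raw_input is a made-up helper name.

```c
#include <assert.h>
#include <termios.h>

/* Return a copy of `t` with line buffering (canonical mode) and echo
 * turned off, so that read() sees each key press as soon as it arrives. */
struct termios make_raw_input(struct termios t)
{
    t.c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    t.c_cc[VMIN]  = 1;  /* read() returns once at least one byte is available */
    t.c_cc[VTIME] = 0;  /* no inter-byte timeout */
    assert((t.c_lflag & (ICANON | ECHO)) == 0);
    return t;
}
```

On a POSIX system you would fetch the current settings with tcgetattr, pass them through this helper, apply the result with tcsetattr, and restore the saved original settings before the program exits.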
By the way, for a V6-inspired Xv6, I think adding stty system calls would be the more Unix V6-like way.
However, I adopted ioctl because Xv6 doesn’t have the concept of a tty, and because ioctl was introduced in the next version, V7, which is close in history.
Now, as something a little cooler, Xv6-GAIA has a small assembler made by Keiichi.
It also has a mini vi that Shohei made.
Guess what you can do with these two.
It’s interactive programming on an FPGA!
This is a pretty impressive demo for a CPU experiment, which usually doesn’t include any interactive programs.
The original task of the CPU experiment was “Run the given ray-tracing program on your homebrew CPU”.
Now that you’ve got an operating system running on your CPU, you know what you’re supposed to do, right?
We decided to run the ray-tracing program “on the OS” on our own CPU.
We had a few bugs, but we managed to finish it an hour before the final presentation.
So, we did what every student in the history of our department has probably joked about at least once:
we ran an operating system on a CPU, and ran the ray-tracing program on top of it.
What I’ve written so far is actually a rewrite of my own blog post from 2015.
Reading it now, I can see a lot of my technical inexperience at the time, but what we did then is still really exciting.
By the way, you can see what our Xv6 looked like at the time in your browser right now from here
Let’s try our
init: starting sh
Let me also tell you that the port of Xv6 to MIPS,
which wasn’t finished at the time of the CPU experiment,
was completed a month after the experiment.
The GitHub repository is right here
After we posted a blog post about the Crew X project in 2015, later generations of students continued to take on new challenges around OSes.
In 2018, some students ran their own OS on top of a home-built CPU,
and in 2019, a group of students ran their own OS while adopting RISC-V as their home-built CPU’s ISA.
In addition, the group in 2020 finally ran Linux on top of a homebrew CPU that also adopted RISC-V as its ISA.[3]
I’m sure there will be many more reports in the future,
so everyone, please look forward to them.
Personally, I look forward to someone running Linux on their own ISA someday, or running a VM on it.
Reinventing the wheel is generally said to be something to avoid, but there’s quite a lot to learn from actually doing it.
It made me realize that I hadn’t understood these things well enough to implement them from scratch.
Plus, I recommend it because, above all, it’s fun to die for!
That’s the end of the story of our CPU experiment. If you’re interested in reinventing an exciting wheel, please try building a CPU or porting an OS to it.
Finally, I would like to list the members of Crew X.