Need a pointer or two Guys,
My work on a VHDL model of the v8 uRISC (Arclite) CPU is coming along well. I can perform all of the math operations, branching, jumps, etc.
I am a little fuzzy on what exactly the RSP (reset stack pointer) instruction actually does. I would think it allows you to set the start address of the stack back to the beginning, but why does it need 5 cycles for that? Alternately, I could see it being used to set the start address by loading in an address from memory, but that still doesn't take 5 cycles - and the instruction doesn't list any operands.
Can someone help me out on what this instruction does?
Thanks!
BillW- 07-28-2006
I would have assumed same as you - set the pointer back to the base.
This site appears to have some (VHDL?) code on it...
http://www.ece.neu.edu/info/vhdl/class/HomeworksExams/FullDocument.htm
...search for "RSP" a couple of times.
Additionally, daBass's simulator passes the V8 reference application, and agrees with our assumption. It interprets RSP by setting the stack pointer to its default address.
Naive software-guy question: could it be 5 cycles because it basically does the equivalent of 2 8-bit load and stores to reset the 16-bit stack pointer? (no VHDL-guy scorn please! ;) )
Create987- 07-28-2006
Naive software-guy question: could it be 5 cycles because it basically does the equivalent of 2 8-bit load and stores to reset the 16-bit stack pointer? (no VHDL-guy scorn please! ;) )
5 cycles is a long time for this. But the answer is probably partially explained by the fact that resetting the stack pointer is something that does not need to be optimized. It is hardly ever done. Even in a real-time OS environment, thread switches typically happen by pushing the next thread's context onto the stack and returning to it.
radarman- 07-28-2006
Yep - I found that document myself, and figured it out. I decided to add a flourish, though.
I have a generic that can allow the RSP instruction to set the stack pointer to the contents of R1:R0, instead of the precompiled defaults. This allows you to move the stack pointer during run-time.
I'm leaving the default to reset to the compiled-in start address, but if it proves useful, I may set it to allow the modification.
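A minimal behavioral sketch of the RSP options described above, written in Python rather than VHDL. The function name, the `allow_stack_move` flag (standing in for the generic), and the default address value are all assumptions for illustration, not taken from the actual core.

```python
# Behavioral sketch only; names and the default address are assumptions.
DEFAULT_STACK_ADDR = 0x007F  # hypothetical compiled-in stack start address


def rsp(regs, allow_stack_move=False, default_addr=DEFAULT_STACK_ADDR):
    """Return the new stack pointer value after an RSP instruction.

    regs is a list of eight 8-bit register values (R0..R7).
    allow_stack_move plays the role of the VHDL generic.
    """
    if allow_stack_move:
        # R1 supplies the upper byte and R0 the lower byte of the new address.
        return ((regs[1] & 0xFF) << 8) | (regs[0] & 0xFF)
    # Default behavior: go back to the compiled-in start address.
    return default_addr
```

With the flag off, the register contents are ignored, so existing firmware that executes RSP would see no change in behavior.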
Also, I turned USR into a fairly nifty little instruction. It allows you to do a jump to {Rn+1:Rn}. Essentially, you set the address up in a register pair, execute the instruction, and away you go.
I couldn't think of anything else, so I called the new opcode "BRX".
Multiply still works the same as in my last post - R1:R0 = R0 * Rn. :)
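As a rough illustration of the two instructions just described, here is a small Python sketch (not the VHDL itself). The function names and the register-list representation are assumptions; only the {Rn+1:Rn} target and the R1:R0 = R0 * Rn result come from the post.

```python
def brx(regs, n):
    """Return the jump target {Rn+1:Rn}: Rn+1 is the high byte, Rn the low byte.

    Assumes n is in 0..6 so that Rn+1 exists.
    """
    return ((regs[n + 1] & 0xFF) << 8) | (regs[n] & 0xFF)


def mul(regs, n):
    """Compute R1:R0 = R0 * Rn, updating regs in place."""
    # Two 8-bit operands give at most 255 * 255 = 65025, which fits in 16 bits.
    product = (regs[0] & 0xFF) * (regs[n] & 0xFF)
    regs[0] = product & 0xFF          # low byte of the product
    regs[1] = (product >> 8) & 0xFF   # high byte of the product
```

For example, loading 0x00 into R2 and 0x80 into R3 and then executing BRX R2 would transfer control to address 0x8000 under this model.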
The only things not working are interrupt based. I now have all of the storage instructions, branching/jumping instructions, and subroutines working.
One more quick question, since I'm at work:
When you do a JSR, does it push the lower, or upper, half of the address to the stack first? I would assume the lower half first, since you can reuse the second half of JMP after POP'ing the address off the stack. (the last thing on the data bus would be the upper-half of the address, just like with a JMP instruction)
Create987- 07-28-2006
When you do a JSR, does it push the lower, or upper, half of the address to the stack first? I would assume the lower half first, since you can reuse the second half of JMP after POP'ing the address off the stack. (the last thing on the data bus would be the upper-half of the address, just like with a JMP instruction)
And an equally valid and important question: Does it decrement the SP before or after it does the write to memory. That varies from processor to processor.
texaspyro- 07-28-2006
Also, I turned USR into a fairly nifty little instruction. It allows you to do a jump to {Rn+1:Rn}. Essentially, you set the address up in a register pair, execute the instruction, and away you go.
Once upon a before time, in the last millennium, I worked for a company that built minicomputers (actually Naked Minis). We had a neat instruction called XNX - Index Next Instruction. It fetched the next word in memory, added the contents of a register to it, then executed the resulting opcode.
There was also an instruction called SAL - Software Autoload - I invented it. It was way too cool for mere mortals to use. It took about 20% of the total microcode to implement it.
radarman- 07-28-2006
When you do a JSR, does it push the lower, or upper, half of the address to the stack first? I would assume the lower half first, since you can reuse the second half of JMP after POP'ing the address off the stack. (the last thing on the data bus would be the upper-half of the address, just like with a JMP instruction)
And an equally valid and important question: Does it decrement the SP before or after it does the write to memory. That varies from processor to processor.
I assumed that since the SP points to the next available location on the stack, that you would predecrement. At least that's the way I've implemented it in my microcode.
daBass- 07-29-2006
Radarman:
What kind of development platform are you using for your V8 experiments? And which program are you using for VHDL development?
I have a Xilinx Spartan 3 development board lying around (which comes with ISE). I could verify your design.
What are you going to do once you get your core working? Have you thought about donating it to http://opencores.org ?
radarman- 07-29-2006
DaBass,
I actually have two boards I'm testing on. An Altera DE2 (which is way overkill for this little processor), and a hacked together Xilinx setup out of freebies consisting of:
1) Xilinx Spartan 3e sample pack board with Spartan 3E 100, and 4MB of NOR Flash.
2) Xilinx/Digilent CPLD Design Kit with a Coolrunner 2 XC2C256 CPLD and a XC9572XL 5v tolerant CPLD.
3) Keypad & display from a medical drug infuser. It has 3 4-character LED displays, a few discrete LEDs, etc.
I attached the sample pack board to the CPLD board with 4 6-pin SIP sockets and a ribbon cable - so the spartan board is powered from the regulators on the CPLD board. (which are much beefier)
Right now, the 9KB of RAM in the FPGA is enough for both code & data, but there is plenty of room to add an SRAM on the CPLD baseboard.
The best part is that I only paid $20 for the keypad & display board on eBay - the rest were vendor freebies.
The DE2 has an Altera Cyclone II 2C35, an ethernet MAC+PHY, RS232, 16x2 character LCD, SD card slot, etc. On the upside, the design will run at over 120MHz on a 2c35. :)
radarman- 07-29-2006
Forgot to add:
I'm using little more than UltraEdit and ModelSim for the basic design. I'm using ISE 8.1 and Quartus II 5.1 Webpacks for P&R and programming.
I am planning on donating the core to opencores, along with a suite of other cores that would make for a useful microcontroller. I plan to integrate a serial UART, timer, and an HD44780 controller. That should all fit in even the smallest Xilinx or Altera device, and would make for a nice little 8-bitter.
Create987- 07-29-2006
I assumed that since the SP points to the next available location on the stack, that you would predecrement. At least that's the way I've implemented it in my microcode.
If that is how the documentation describes the SP, my guess is it would be the opposite. If it already points at an 'available' storage location, they are not going to waste one by decrementing away from it.
radarman- 07-29-2006
Keep in mind, that when you PUSH data to the stack, the pointer is already correct. It's only when you POP that you have to back up one. It's easier to let the stack pointer counter do a pre-decrement, then read the current location. This also means that the stack pointer is correct for the next push.
You could do it the other way, but then you would have to pre-increment for the PUSH.
I was asking because I want to replicate the basic behavior of the v8 as closely as possible. I'm not entirely sure it matters, though, since you don't have direct access to the stack pointer anyway. I believe you could implement it whichever way is easiest.
EDIT:
Let me add, I pipelined the write data & enable, so I have an extra clock cycle to use up anyway. The pipelining seriously improves routing performance when you start connecting external cores. So, I chose to predecrement on POP's, rather than preincrement on PUSH's.
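A small Python model of the stack behavior described in this post may make the trade-off easier to see. It is a sketch under stated assumptions: the class name, base address, and memory size are made up, the upward growth direction is inferred from the "predecrement on POP" wording and may not match the real V8, and the JSR byte order (low half first) is only the guess raised earlier in the thread.

```python
class Stack:
    """SP always points at the next free location (assumed model)."""

    def __init__(self, base=0x0080, size=65536):
        self.mem = bytearray(size)  # flat 64 KB memory, hypothetical
        self.sp = base              # hypothetical stack start address

    def push(self, value):
        # SP is already correct: write first, then move to the next free slot.
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp + 1) & 0xFFFF

    def pop(self):
        # Back up one location first, then read what was stored there.
        self.sp = (self.sp - 1) & 0xFFFF
        return self.mem[self.sp]

    def jsr(self, return_addr):
        # Assumed order: lower half first, so the upper half comes off last-in.
        self.push(return_addr & 0xFF)
        self.push((return_addr >> 8) & 0xFF)

    def rts(self):
        high = self.pop()
        low = self.pop()
        return (high << 8) | low
```

After any matched push/pop or jsr/rts pair, the pointer is back where it started and is ready for the next push without further adjustment.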
Create987- 07-29-2006
I was asking because I want to replicate the basic behavior of the v8 as closely as possible. I'm not entirely sure it matters, though, since you don't have direct access to the stack pointer anyway. I believe you could implement it whichever way is easiest.
I agree... You need to replicate the behavior *EXACTLY*. Even if you can't perform indexed loads off of the SP, when you get something sizable like the camera firmware, there are going to be assumptions (or dependencies) about where part of the stack frame is located.
This sounds like a fun project!
radarman- 07-30-2006
Ok guys,
I have now fully tested & debugged everything except the interrupt control code & instructions. I can branch to subroutines and return, push & pop data on a stack, and branch & jump all I want.
It's probably ready for someone else to start playing with - especially since I've never designed an interrupt controller before. If you have a background in VHDL for synthesis, PM me and I'll email you the current VHDL source.
The included test bench is incredibly simple. It simply stocks a RAM with some preloaded data, which the processor attempts to execute. It is far from a thorough test, and I probably should attempt to write more complex program code, but I believe the CPU core is mostly functional at this point.
radarman- 08-09-2006
Guys,
I believe I have the interrupt controller working. It now implements 8 interrupt signals, 7 of which are maskable. The interrupts are prioritized with 0 having the highest priority. I do not support interrupt nesting, though. (no reentrant interrupt service vectors)
Essentially, if a low-level interrupt is being processed, only a higher-level interrupt can interrupt it. This means that the worst case stacking is 8 - which was doable in hardware (vs using the stack). It actually works pretty well, and you don't lose a single "double interrupt" - the second one is still pending.
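The priority scheme described above can be sketched as a small Python model. This is an illustration only, not the actual VHDL: the class and method names are invented, and treating interrupt 0 as the one non-maskable line is an assumption (the post says only that 7 of the 8 are maskable).

```python
class InterruptController:
    """Eight prioritized interrupt lines, 0 = highest priority."""

    def __init__(self):
        self.mask = 0xFF     # bit n set = interrupt n enabled
        self.pending = 0     # bit n set = interrupt n latched and waiting
        self.active = []     # levels being serviced, innermost last

    def set_mask(self, mask):
        # Assumption: interrupt 0 is the non-maskable one, so bit 0 stays set.
        self.mask = (mask & 0xFF) | 0x01

    def request(self, n):
        # Latch the request; it stays pending until it is actually taken.
        self.pending |= 1 << n

    def next_interrupt(self):
        """Return the level to service now, or None if nothing may preempt."""
        current = self.active[-1] if self.active else 8
        ready = self.pending & self.mask
        # Only levels with a lower number than the current one may preempt.
        for n in range(current):
            if ready & (1 << n):
                self.pending &= ~(1 << n)
                self.active.append(n)
                return n
        return None

    def return_from_interrupt(self):
        self.active.pop()
```

Because each preempting level must have a strictly lower number than the one it interrupts, `active` can never hold more than 8 entries, which matches the worst-case depth mentioned above. A request that arrives while the same or a higher-priority level is running simply stays latched in `pending`.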
At this point, the processor is fairly well code-complete. I did replace the BRX instruction with the much more useful DBNZ (Decrement, and Branch if Not Zero) instruction. Essentially, the instruction performs an immediate DEC on the specified register, and if the result is non-zero (Zero flag is unset), the branch is taken. It implements DEC Rn : BNZ <offset> in the microcode.
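A rough Python model of the DBNZ behavior described above, for illustration only. The function signature is an assumption, as is the convention that `pc` is the address of the following instruction and that `offset` is a signed value added to it.

```python
def dbnz(regs, n, pc, offset):
    """Decrement Rn, then branch if the result is non-zero.

    Returns (new_pc, zero_flag). pc is assumed to be the address of the
    next instruction; offset is a signed branch displacement.
    """
    regs[n] = (regs[n] - 1) & 0xFF       # 8-bit decrement with wraparound
    zero = regs[n] == 0                  # Zero flag reflects the new value
    new_pc = pc if zero else (pc + offset) & 0xFFFF
    return new_pc, zero
```

Loading a register with a count N and ending a loop body with DBNZ on that register would run the body N times under this model, since the branch falls through on the decrement that reaches zero.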
I also added a generic to turn on another useful addition: auto-increment on indexed load & store. If this generic is enabled, attempting to use an ODD register for LDX, LDO, STX, STO will result in the next lower register pair being used, and post-incremented.
So, STX R1 would result in STX R0 ; UPP R0 - except that it occurs in the microcode. I scanned the existing firmware, and there are no references to odd indexes - which really don't make much sense anyway - so you can turn the option on, and still run current firmware with no alteration.
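The odd-register trick can be sketched in Python as follows. This is a behavioral illustration with assumed details: the function name and `auto_increment` flag (standing in for the generic) are invented, and it assumes STX stores R0 at the address held in {Rn+1:Rn} and that UPP increments that 16-bit pair.

```python
def stx(regs, mem, n, auto_increment=False):
    """Store R0 at the address held in a register pair.

    With auto_increment enabled and an odd n, the next lower even pair
    is used and post-incremented (STX Rn-1 followed by UPP Rn-1).
    Without the option, n is assumed to be in 0..6 so that Rn+1 exists.
    """
    post_inc = bool(auto_increment and (n & 1))
    base = (n & ~1) if post_inc else n
    addr = ((regs[base + 1] & 0xFF) << 8) | (regs[base] & 0xFF)
    mem[addr] = regs[0] & 0xFF
    if post_inc:
        # 16-bit increment of the pair, carrying from the low byte to the high.
        addr = (addr + 1) & 0xFFFF
        regs[base] = addr & 0xFF
        regs[base + 1] = addr >> 8
```

Even register numbers behave identically whether or not the option is on, which is why firmware that never uses odd indexes would be unaffected.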
I currently have it running on an Altera Cyclone II (2C35-6) at 110MHz (by itself). A sample system with a FIFO based UART, FIFO based LCD controller, 32kB of RAM, and 8kB of ROM (in BRAM) is running at over 80MHz.
I haven't built for a Xilinx device yet.