RetroBrew Computers Forum
Discussion forum for the RetroBrew Computers community.

Using Multi-Core Processors to build an in-circuit replacement/enhancement for vintage CPUs (The Parallax P2 Processor set me upon a design path I never expected... From "uFDC" to "uCPU")
Using Multi-Core Processors to build an in-circuit replacement/enhancement for vintage CPUs [message #9650] Thu, 20 January 2022 17:20
jayindallas
Messages: 110
Registered: June 2021
Senior Member
FOREWORD:----------------------NECESSITY BREEDS INVENTION
I started this discussion months ago when I realized that the Parallax "P2" processor would be a good chip for an in-circuit replacement/enhancement for vintage floppy disk controllers ('FDC' the chip, not the board). The P2's advantages are that it's a fast, multi-core processor with a large count of GPIO pins, with reconfigurable pins and supporting hardware to operate the pins without much software bit-banging. Its disadvantages are its price (for some hobbyists) and its chip size in relation to the profile of a vintage 40-pin dual-inline package and socket.

An FDC replacement is a useful project because it's relatively easy to restore floppy disk drives and even some floppy disks that appear unreadable. But there are probably fewer vintage hobbyists still playing around with actual floppy disks.

The in-circuit replacement/enhancement CPU is only five paragraphs away; be patient. :)

Unlike an FPGA design for a replacement FDC chip, this approach would be more powerful and more flexible. Basically it seemed like a feasible way to create a universal floppy disk controller ("uFDC"); simply pull out the old FDC chip and plug the uFDC (a DIP profile SMT pcb stack with socket pins) into the FDC socket and the uFDC would default to running just like the original FDC. BUT new commands and new control registers could be added.

For example, a READ FORMAT command could list the floppy disk's side: track: sector: density, size and sector number, to help ID an unknown format. An ID FORMAT command might use that buffered list to search its internal dictionary of floppy disk formats. Note the uFDC uses its own processor system and the vintage host only sees it as an FDC.

Many possibilities open up with a software-flexible uFDC that runs its internal operations on a system-independent, modern, multi-core processor. However, this could be done to replace many LSI components on vintage systems, as long as the modern processor could emulate its operation to maintain the original hardware interface timing. Put simply, it's fast enough to replace a vintage FDC but it's not fast enough to replace a modern CPU core. It's all about timing.

FOREWORD:----------------------IT'S THE CPU, DUH!
But it follows that this design approach could also replace a vintage CPU within its emulation interface timing capability. And making an in-circuit replacement/enhancement for a vintage CPU would be much more significant than wasting time working on a uFDC!

The CPU socket of a vintage computer system has access to most everything. It could even add a bridge to modern peripherals and information networks, which is difficult when the vintage system only has a serial port, maybe a modem and a set of floppy drives. So I dropped the uFDC and began working on the universal-CPU; but more like a tiny tiny universe of slow, old vintage system processors. I began with my vintage favorite, the Zilog Z80.

FOREWORD:----------------------HOW HAS THE PROJECT EVOLVED
I've gone silent on this thread for quite a while and advantageously let it fall to the second page where I could quietly update it now and then without slamming it back in the way of recent dynamic discussions and posts. I like it better this way, more like a personal webpage about a project without so much feedback.

This is what has been going on in the seeming silence of the months since my last dated post:

I switched the core from a Parallax P2, to a Raspberry Pi Pico or Pico W. The price was unbeatable and the pcb simplified the upper stack assembly.

The in-circuit replacement/enhancement is a two-part pcb stack assembly. The top-part consists of a Raspberry Pi Pico or Pico W joined to a 'hardware-resource' pcb that includes everything to hardware-emulate any of the targeted vintage CPU's pins, in-circuit... plus some useful components that boost the Pico's resources.

The bottom-part of the assembly is a single, mostly component-bare, circuit board. It's intended to be minimum cost, with just header pins that plug into a vintage system CPU socket and a connector to the hardware-resource pcb. This circuit board's only task is to connect its specific CPU socket pinout to designated signals available from the hardware-resource connector.

Thus a single uCPU (top-part assembly) could be selectively paired, in turn, with the bottom-part assembly (CPU plug-in) for various CPUs per this example:
(1) I could pair the uCPU with a Z80 plug-in and replace the Z80 in my Xerox 820.
(2) I could later restore the Z80 and then pair the uCPU with an 8086 plug-in and replace the 8086 in my Xerox 820 16/8.
(3) I could later restore the 8086 and then pair the uCPU with a 64180 plug-in (64-pin at 70mil centers) and replace the 64180 in my Micromint SB-180 or my MSC LAT-1.

After thinking the project through, proving its feasibility, figuring out the best pcb stack structure/connectors, solving its voltage translations, designing the hardware-resource board to support a set of vintage CPUs, and then building a rats-nest-of-wiring development prototype to replace the Z80 in a Xerox 820-II... I had to change my hardware-resource pcb design due to long lead-time component availability quotes into 2024. :S

FOREWORD:----------------------RECENT ACTIVITIES
It's now redesigned for parts in stock and I'll soon buy a small inventory for development. My vintage Xerox 820-II gets a rest for a while as I disassemble my learning-curve-prototype and switch effort to rendering the new hardware-resource pcb layout, fabricating some boards and building my new, rats-nest-free prototype to tickle the Xerox 820-II some more. :)

As the uCPU could also emulate new-retro or imaginary CPUs, if some Star Trek fan out there has dreamed of building a Klingon CPU with authentic Klingon assembly language syntax... send me the specification and I'll consider it. ;)

FOREWORD:----------------------CLOSING THIS FOREWORD BEFORE THE ORIGINAL THREAD
The original messages begin below... they show the early process of revelation, discovery, some initial thoughts and demonstrations of some of the features and options that a uCPU could perform in a vintage system.

A lot of the information has become moot over time, but it does illuminate the process, so I'll leave it and maybe edit it later to have it converge toward the current project trajectory.

FOREWORD ENDS:----------------------IT BEGINS WITH A P2 THOUGHT AS A CHIP TESTER...
-----------------------------------------------------------------------------------

A PARALLAX P2 PROCESSOR COULD MAKE AN EXCEPTIONAL VINTAGE CHIP TESTER
The P2 processor can update all of its 64 I/O pins in one tick of a test program. Its speed would be capable of testing response-signal timing with precision. Each I/O pin can be configured as input, output or tri-stated and it has many I/O modes of operation for all or most pins. The P2 processor is powered at +3.3Vdc and doesn't directly interface to +5.0Vdc of the vintage 7400 series, but a simple resistive ladder can make a simple and safe interface for all TTL type chips.

WHAT ARE COGS?
The P2 has eight 32-bit cores which they call "COGS" and that terminology will be used herein. Each COG can access all 64 I/O pins and each COG can be instructed to run an internal program from its common memory section. This means that a P2-based chip-tester could be testing several chips, asynchronously, at the same time. The limit to how many test sockets can be running at the same time, is bounded by the 64 I/O pins of the P2, minus somewhere around 8 of those I/O pins that would be reserved for system use. That leaves 56 pins as a fair assumption for now, as the available number of P2 I/O pins that can be connected to a modular test socket board.

40-PIN TEST SOCKET
With 56 I/O pins available, a 40-pin socket like the Z80, would require (40-2) or 38 I/O pins to control and monitor all of the Z80's signals other than VCC and GND. The 40-pin test socket would have 38 P2 I/O signals hard-wired to it, perhaps P2 pins 01:38. It would have (56-38) or 18 pins left over when using a 40-pin test socket with two VCC/GND pins.

FOUR 16-PIN TEST SOCKETS
If the tester supported 16-pin chips, that would be (56 / (16-2) ) or four 16-pin test sockets. With four sockets, the P2-based chip-tester could allocate four COGS to individually test the four sockets independently and asynchronously.

COGS TESTING INDEPENDENTLY
COG #1 might use 14 I/O pins (P2 pins 01:14) hard-wired to socket #1 to test the chip placed in that socket.
Likewise, COG #2 might use 14 I/O pins (P2 pins 15:28) hard-wired to socket #2 to test the chip placed in that socket.
Completing the series, COG #3 would use P2 pins 29:42 hard-wired to socket #3 and COG #4 would use P2 pins 43:56 hard-wired to socket #4.
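
The pin bookkeeping above can be sketched in a few lines of Python (host-side planning only, not P2 code). The 56 usable pins and the two power pins per socket come from the text; the function name and the contiguous 1-based numbering are assumptions made for illustration.

```python
USABLE_PINS = 56      # 64 P2 I/O pins minus roughly 8 reserved for system use
POWER_PINS = 2        # VCC and GND are not driven by P2 I/O pins


def allocate_sockets(socket_pins, usable=USABLE_PINS):
    """Return (ranges, leftover) for as many identical sockets as fit.

    Each range is an inclusive (first, last) pair of 1-based P2 pin numbers.
    """
    signals = socket_pins - POWER_PINS          # I/O pins needed per socket
    count = usable // signals                   # number of sockets that fit
    ranges = [(i * signals + 1, (i + 1) * signals) for i in range(count)]
    return ranges, usable - count * signals


print(allocate_sockets(16))   # four sockets: pins 1-14, 15-28, 29-42, 43-56
print(allocate_sockets(40))   # one socket: pins 1-38, with 18 pins left over
```

Each returned range would be handed to one COG, so no P2 pin is shared between sockets.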

TEST FOUR SAME OR DIFFERENT CHIPS CONCURRENTLY
The interesting part about the COG example above is that each COG (a core processor among eight) could be running a different test on its hard-wired test socket. Three examples: (1) All tests could be started at the same time, with all chips being the same, though each test runs on a different COG, or (2) Each test could be started asynchronously by pressing a button next to the socket, after a chip is loaded, creating a staggered-test running capability or (3) The user could test a different chip type in each socket with the associated COG-to-socket running the selected test for its socket. The Human-Machine-Interface (HMI) program in the P2 processor would be written to allow the user to make these configuration changes. The chip test library should be on a micro-SD card that the P2 can easily access.

CHIP TESTER FOR DIFFERENT USES
This gives the P2-based chip-tester more flexibility to operate in various ways that the user might need at different times. Example (1 and 2) above would be great for testing a bunch of DRams all at once; running the same test in all four sockets and the DRams all get tested in one-quarter of the time. Example (3) would allow the user to test a lot of different chips from a fully socketed board very methodically. Grab four 16-pin chips and put them in the test sockets, then assign the correct chip test to each socket via the HMI program, then start all tests. When the tests are all done, replace the chips that passed all the tests back into the fully socketed board, and then pull four other 16-pin chips for testing, reconfiguring the chip test for each socket via the HMI before starting all the tests together.

IS THE TESTED CHIP FAST ENOUGH FOR A DESIGN?
The P2 processor also can run very fast internally so it could measure response time of signals that it triggers. If a forty-plus year old chip has slowed down, the P2 can identify them by measuring the response times and listing them. If the HMI program allows the user to set a response-time threshold for passing the test, that would allow the user an easy answer to this question without pulling out additional measurement tools.
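
A minimal sketch of that pass/fail rule, written in Python for readability rather than as P2 code. The measurement values and function names are hypothetical; only the idea of a user-set response-time threshold comes from the text.

```python
def slow_pins(response_ns, threshold_ns):
    """Return the pins whose measured response time exceeds the threshold."""
    return sorted(pin for pin, t in response_ns.items() if t > threshold_ns)


def passes(response_ns, threshold_ns):
    """A chip passes only if every measured output responded in time."""
    return not slow_pins(response_ns, threshold_ns)


# Hypothetical measurements: output pin number -> response time in ns
measured = {3: 18.0, 6: 22.5, 8: 41.0, 11: 19.5}
print(passes(measured, threshold_ns=30.0))     # False
print(slow_pins(measured, threshold_ns=30.0))  # [8]
```

Listing the slow pins as well as the verdict matches the idea of reporting the measured times instead of only a pass or fail.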

These would be nice features for a chip-tester supporting vintage computer restoration work.

MODULAR TEST SOCKET EXAMPLES
The diagram below illustrates some examples of modular test socket boards that could connect to the P2 chip tester board and use its estimated 56 I/O pins to connect to its test sockets. Notice that no P2 I/O pin is shared with any other test socket pin.

Use the HMI program to select a test program compatible with the chip(s) to be tested. A library of test programs should be stored on the micro-SD that the P2 uses. The test programs should be written so that they use a test socket module configuration file to adapt their I/O pins for running the tests. That would allow someone to make their own module, create a configuration file for it, and be able to get all existing test programs to use it.
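
One way such a configuration file could look, sketched in Python with JSON as the file format. The format, the field names ("module", "sockets", "first_p2_pin") and the module itself are all assumptions for illustration; nothing here is an existing P2 tool.

```python
import json

# Hypothetical configuration for a module with two 14-pin sockets.
MODULE_CONFIG = """
{
  "module": "dual-dip14",
  "sockets": [
    {"pins": 14, "vcc": 14, "gnd": 7, "first_p2_pin": 1},
    {"pins": 14, "vcc": 14, "gnd": 7, "first_p2_pin": 13}
  ]
}
"""


def socket_map(socket):
    """Map each signal pin of a socket to a P2 I/O pin, skipping VCC and GND."""
    power = {socket["vcc"], socket["gnd"]}
    signal_pins = [p for p in range(1, socket["pins"] + 1) if p not in power]
    return {p: socket["first_p2_pin"] + i for i, p in enumerate(signal_pins)}


config = json.loads(MODULE_CONFIG)
maps = [socket_map(s) for s in config["sockets"]]
print(maps[0][8])   # socket 1, chip pin 8 -> P2 pin 7
print(maps[1][1])   # socket 2, chip pin 1 -> P2 pin 13
```

A test program written against chip pin numbers would look up the P2 pin through this map, so the same test could run on any module that ships a configuration file.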

The best cost-modularity rule would be to put all components, resistive-ladders and buttons/LEDs on the chip tester board so they can be used with any test socket module. It would be less expensive if the various test socket modules had no components other than a connector and test sockets, to minimize costs.

It would be wise to have the P2 chip tester board run the extra signals for LEDs and buttons to the connector so that a test socket module designer could add a set of LEDs and buttons next to each socket. An economy-mode module could omit the LEDs and buttons next to the sockets but let the user add them, or they could be sold already installed.
/forum/index.php?t=getfile&id=2719&private=0

[Updated on: Sun, 01 October 2023 18:19]


Re: P2 based Chip Tester [message #9656 is a reply to message #9650] Fri, 21 January 2022 15:37
cluso99
Messages: 40
Registered: June 2017
Member
This is my P2 setup. I have been testing the Z80 zexall test instructions on a real Z84C0020 for validating my P2 Z80CPM Emulation.
The Z84C00 is on a protoboard and plugs into my P2 RetroBlade2 underneath.

A similar setup could be used to test other chips. I have a W65C02 waiting for similar treatment.

BTW P2 runs at 3V3 (and 1V8 for the cores). It is not 5V tolerant so you have to implement a compatible interface. I use a resistive divider.
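
For reference, the divider arithmetic for a 5V output feeding a 3V3 input can be sketched as below. The 1.8k/3.3k resistor values are assumptions picked only to illustrate the formula; they are not the values used on the board described here.

```python
def divider_out(v_in, r_top, r_bottom):
    """Output voltage of an unloaded two-resistor divider.

    r_top sits between the source and the tap, r_bottom between the tap and GND.
    """
    return v_in * r_bottom / (r_top + r_bottom)


# Assumed values: 1.8k on top, 3.3k on the bottom, 5.0V TTL-level source
print(round(divider_out(5.0, 1800, 3300), 2))   # 3.24
```

The divider only covers the 5V-to-P2 direction; a loaded or fast signal would need the resistor values checked against the real circuit.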

/forum/index.php?t=getfile&id=2627&private=0
/forum/index.php?t=getfile&id=2628&private=0

[Updated on: Fri, 21 January 2022 15:41]


Re: Project (1): P2 based Chip Tester [message #9657 is a reply to message #9656] Fri, 21 January 2022 20:01
plasmo
Messages: 916
Registered: March 2017
Location: New Mexico, USA
Senior Member
The CMOS Z80 actually works down to about 3.3V at 7.37MHz so your setup should work without a resistor network. Similarly, WDC's W65C02 works down to 3.3V as well. The WDC datasheet shows the W65C02 can run at 14MHz at 3.3V, but it is pretty well known that it can run at 20MHz at 3.3V.
Bill

[Updated on: Thu, 31 March 2022 22:04] by Moderator


Re: P2 based Chip Tester [message #9658 is a reply to message #9657] Sat, 22 January 2022 20:32
cluso99
Messages: 40
Registered: June 2017
Member
Thanks Bill.
Since the specs didn't allow for anything below 4.5V I wasn't game to try it out because I was using this to test the Z80 instructions (modified zexall tests) for comparison with my P2 Z80 emulator.
BTW I pass all the tests :)

[Updated on: Sat, 22 January 2022 20:33]


P2 Project: (1) Chip Tester 02: VCC/GND Pin Variants [message #9663 is a reply to message #9656] Sun, 23 January 2022 14:24
jayindallas
Messages: 110
Registered: June 2021
Senior Member
CHIP TESTER: DEALING WITH VCC/GND PIN VARIANTS:

SURVEY OF VCC AND GND PIN NUMBERS IN THE 7400/5400 TTL SERIES
I'm collecting some of the VCC.GND pin numbers on the 7400/5400 series. Listed below is my first scan; later I'll tabulate how common these patterns are, compared to all the chips scanned.

This might be useful to decide when to add a convenient VCC.GND personality insert to change the VCC and GND pins. Conceptually, I'm thinking about using a square card-edge board to offer a component-side-to-solder-side translation of pins times four sides of the square shaped board.

The final rendering will be based on small size, high density, low cost connectors. Some of the DIMM type connectors are real cheap and pack a lot of pins in a small connector, and by milling the board you could cancel the normal notch that assures proper installation, to get a board that could be spun and inserted two ways. With a top and bottom board edge connection, you could implement four translations on one card (using only one half-end of the DIMM). This isn't critical for now. I'll review various connector tech for an easy solution.

It's possible to use the cheap 2xN header and make a few translation pcbs that could be soldered on a connector that inserts into the header. The advantage here is that you could always wire-wrap the connections the one time you really want to bother to test a batch of very old, strange, 7400 series chips.

But once the data is available for how common these patterns are and are not, it will be easier to know which to offer and which to punt-on.

The 14-pin and 16-pin chips have two common VCC-GND patterns. It may be that the rest of the patterns are just too obscure to worry about.

From my 1981, "The TTL Data Book for Design Engineers," Second Edition:
LEGEND: entries are VCC.GND pin numbers; VCO entries list the digital VCC.GND pins only
Package        VCC.GND                  VCO (digital VCC.GND only)
14 pin chips:  14.07, 04.11, 05.10      09.07
16 pin chips:  16.08, 05.12, 05.13      16.01, 16.09
20 pin chips:  20.10
24 pin chips:  24.12, 24.07, 24.11+13
28 pin chips:  28.14
48 pin chips:  12.36
I've got half-way through my 1988 Texas Instruments "TTL Logic :: Standard, Schottky, Low-Power Schottky" book. I'll post the latest cumulative table in a few days.

I was thinking about the "VCC/GND PIN VARIANTS" and will explore via tabular spreadsheets if the 'personality card' can be located on the main P2 Chip Tester board instead of the various test socket modules. This would be a cheaper solution (fewer connectors etc). The idea is that the P2 I/O pins that flow to the test socket modules could be signal-position-translated before they leave the Chip Tester board. The target 'trick' is to find a way to convert all test socket modules (downstream) by organizing the way they'd tap into the downstream signals.

This may rule out packing a socket board with wildly different pin sized sockets, but the advantages, if possible, would be worth that limitation. It seems possible, that a mixture of near pin-count chips would still work as long as the less pin-count chips used the rule of the higher pin-count chip.

I'll analyse this over the range of possible test socket modules and see if the compromises are worth the trouble.

Most of the VCC/GND positions take on three configurations {C1,C2,C3} and appear mostly in the early vintage chip era:
(C1) "VCC end-crossed GND" such as a 16 pin chip with VCC on pin 16, and GND on pin 8.
(C2) "VCC middle-across GND" such as a 16 pin chip with VCC on pin 5 and GND on pin 12.
(C3) "VCC middle-offset GND" such as a 16 pin chip with VCC on pin 5 and GND on pin 13.
These three configurations appear to cover most chips with the exceptions of voltage controlled oscillators which also have analog VCC and GND.
Other variants are likely to be obscure chips that would need another pin translation to support.

If it can be done, that means the P2 Chip Tester would have a C1 socket personality card installed that would put all test sockets in C1 configuration. With a C2 socket personality card installed, all test sockets would have the C2 equivalent for their socket size. If you want to support every possibility for a test socket module, it should have a header so that VCC and GND and their displaced pair of P2 I/O signals can be swapped to power a variant chip.
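
To tabulate how the surveyed VCC.GND pairs fall into {C1,C2,C3}, a small classifier could be used. This is a sketch in Python; the geometric rules below are my reading of the three 16-pin examples (on a DIP, pin p faces pin N+1-p), not a definition taken from the data books.

```python
def classify(pin_count, vcc, gnd):
    """Classify a VCC/GND pair as C1, C2, C3 or 'other'.

    Assumption: on a DIP, pins p and (pin_count + 1 - p) face each other.
    """
    if vcc == pin_count and gnd == pin_count // 2:
        return "C1"                      # VCC end-crossed GND
    if vcc != pin_count and vcc + gnd == pin_count + 1:
        return "C2"                      # VCC middle-across GND
    if abs(vcc + gnd - (pin_count + 1)) == 1:
        return "C3"                      # VCC middle-offset GND (one pin off)
    return "other"


print(classify(16, 16, 8))    # C1
print(classify(16, 5, 12))    # C2
print(classify(16, 5, 13))    # C3
print(classify(14, 4, 11))    # C2
```

Running every surveyed pair through a function like this would give the counts needed to decide which personality cards are worth making.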

That's all for now. I'll update this message when there is more information on this sub-topic.



Discussion:
Cluso99 wrote:
"...This is my P2 setup. I have been testing the Z80 zexall test instructions on a real Z84C0020 for validating my P2 Z80CPM Emulation. The Z84C00 is on a protoboard and plugs into my P2 RetroBlade2 underneath..."

jayindallas writes:
Your Z80 comparison board for the P2 RetroBlade is a GREAT IDEA!
Which implementation of the Z80 flag behavior did you use?
1). strictly as documented by the Z80 Programming Manual, or
2). including 'undocumented' instructions and additional flag behavior.

[Updated on: Fri, 22 April 2022 14:23]


Re: P2 based Chip Tester [message #9673 is a reply to message #9663] Mon, 24 January 2022 14:57
cluso99
Messages: 40
Registered: June 2017
Member
I am just using the documented flags ie I mask with $D7. However my code can inspect the undocumented ones and then test for them.
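
The $D7 mask keeps the six documented Z80 flag bits (S, Z, H, P/V, N, C) and drops bits 5 and 3. A minimal sketch of that comparison in Python follows; the function name and the sample values are made up for illustration and this is not the actual Spin2 test code.

```python
DOCUMENTED_FLAGS = 0xD7   # S Z - H - P/V N C: bits 5 and 3 are undocumented


def flags_match(emulated_f, reference_f, mask=DOCUMENTED_FLAGS):
    """Compare two Z80 F register values, ignoring bits outside the mask."""
    return (emulated_f & mask) == (reference_f & mask)


print(flags_match(0b01101100, 0b01000100))             # True: differ only in bits 5 and 3
print(flags_match(0b01101100, 0b01000100, mask=0xFF))  # False when all bits are checked
```

Passing mask=0xFF is one way the same comparison could later cover the undocumented bits.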

The only active part of my Z84C00 setup is a PN2222A transistor for the reset circuit. I didn't try a direct setup. The transistor setup biases the transistor to hold the Z84C00 in reset when not being driven by the P2.

I've looked at making a pcb for both the Z84C00 and the W65C02 to plug into my P2 RetroBlade2.

The RetroBlade2 can drive a VGA (or LCD) screen and USB Keyboard (and mouse) as well as serial or USB to PC, and of course the uSD card. I wrote the boot code for the SD that is in the P2 ROM so we don't even need the Flash chip for the P2.
Project (2): In-Circuit Z80 Emulator [message #9687 is a reply to message #9673] Thu, 27 January 2022 14:42
jayindallas
Messages: 110
Registered: June 2021
Senior Member
STUDY OF THE Z80 BUS CYCLES AND DMA
This message *originally* proposed hooking up a Z80 to a Parallax P2 to capture bus cycle characteristics, something that Cluso99 has already done for one of his projects. I too have an interest in doing that, both to get bus cycle information on some useful "undocumented Z80 instructions" and to better understand how the Z80 micro-code fetches and executes instructions.

This led to my second project in this series, an in-circuit Z80 emulator based on a Parallax P2 processor. That design is introduced several messages later (down the page).

It turned out that I didn't need to hook up any circuitry to get the information I needed on Z80 bus cycles; the following documentation and an analysis of the Z80 instructions' T-state timing revealed most everything necessary in regard to the in-circuit Z80 emulator design and firmware. The result of that is tabulated in a later message discussing the removal of "idle T-states" in regard to the vintage system bus. That said, I will still hook up a Z80 to the Parallax P2 to document the bus cycles completely, including the mentioned "undoc" Z80 instructions. I'm going to add a few snap-off printed circuit boards so I can fully document the bus cycles and T-states for several vintage processors.

For my initial bus cycle analysis, I just need to account for all the T-states and the number of bus cycles. However, there is some ambiguity as to which idle T-states are assigned to which bus cycles. This isn't important to the in-circuit emulation of a Z80, but it does reveal interesting information about the built-in instruction micro-code executing the instructions in the Z80. I found a few instructions where the inaccessible micro-code could be executing the instruction in various ways. The in-circuit Z80 emulator doesn't need this information, but it would be useful when searching for 'undoc' instructions. The complete mapping of all instruction bus cycles and T-states would add clarity.

I used these three documents for most of the information on the Z80 series:
(1) Zilog's "Z80-CPU, Z80A-CPU, Technical Manual" Copyright 1977
(2) Ampro's "Little Board User's Manual" internal Zilog "Z8400 Z80 CPU, Product Specification" authorized reprint 1983
(3) Mostek's "Z80 Programming Manual V2.0" Copyright 1978

Comments:
Document (1) is the classic Z80 manual.
Document (2) had updated information on the Z80,Z80A,Z80B and Z80H and a better section on bus diagrams and timing.
Document (3) has instruction M cycle sequences in T-states. While useful for confirmation, it's unfortunately full of careless mistakes.

Additionally, Document (3) lists each instruction's total T-states, and then the sequence of T-states per bus cycle.
Example:
                                ;Operation:     ;(nn+1) <- IY.hi, (nn) <- IY.lo
FD22    LD (nn),IY              ;M CYCLES: 6    ;T-states: 20(4,4,3,3,3,3)
It's reasonable to assume that the "Operation" description is poorly written, as it suggests IY.hi is written before IY.lo. Writing IY.lo first is more efficient, since nn is loaded from the operands and no address adjustment is required. Writing IY.hi next would just require an increment of the address value first.

This distinction wasn't necessary for vintage age firmware coders, but the P2 can methodically document all the bus cycles and T-states for every instruction and remove some of the ambiguity.
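
Because Document (3) has so many mistakes, its timing entries can at least be checked for internal consistency: the per-cycle T-states should sum to the total, and there should be one value per M cycle. A minimal sketch in Python, assuming the entry format shown in the example above (the function name is made up):

```python
import re


def check_timing(m_cycles, t_states):
    """Check a 'total(a,b,c,...)' T-state entry against its M-cycle count.

    Returns True when the per-cycle values sum to the total and the number
    of values equals the stated M-cycle count.
    """
    match = re.fullmatch(r"\s*(\d+)\s*\(([\d,\s]+)\)\s*", t_states)
    if match is None:
        raise ValueError(f"unrecognised T-state entry: {t_states!r}")
    total = int(match.group(1))
    per_cycle = [int(n) for n in match.group(2).split(",")]
    return sum(per_cycle) == total and len(per_cycle) == m_cycles


print(check_timing(6, "20(4,4,3,3,3,3)"))   # True: the LD (nn),IY entry is consistent
print(check_timing(6, "20(4,4,3,3,3)"))     # False: a dropped cycle is caught
```

A check like this only catches entries that contradict themselves; it cannot confirm that the values match real silicon, which is what the P2 capture would do.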


NEWLY DISCOVERED ERRORS:
Document (1) *incorrectly* adds RFSH# to the twenty-eight Z80 output signals that go into high impedance during external DMA transfers. Figure 4.0-4 titled "BUS REQUEST/ACKNOWLEDGE CYCLE" appends "RFSH#" to the four output controls in the last "floating" bus signal drawing. That error conflicts with Section 3.0 of the same document, where the RFSH# pin is not described as Tri-State capable like the other twenty-eight pins that go into high impedance during DMA transfers. The 1983 Zilog reprint by Ampro, document (2), has that error removed from the figure.
. . . I found this error while confirming my in-circuit Z80 emulator circuit design is now capable of either tri-stating the twenty eight Z80 output pins that are used by external DMA devices, or turning those twenty eight pins into inputs so the P2 can capture signal activity during the DMA transfers to help diagnose DMA problems. The HMI program will allow DMA.SPY mode to be enabled.

[Updated on: Wed, 25 May 2022 12:20]


Re: My P2 Learning Exercise [message #9688 is a reply to message #9687] Thu, 27 January 2022 15:01
cluso99
Messages: 40
Registered: June 2017
Member
@jayindallas

What you are proposing is what I've done. I send every clock to the Z80 and decode the M1, RD, WR, MREQ & IORQ signals and display them with the data and address pins and the Tn state decoder.
It's interesting to watch the instructions executing, but it's terribly slow displaying every clock cycle (both halves). By using a CMOS Z80 you can stop the clock when needed.
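
A sketch of how the five control signals named above map to Z80 bus cycle types, written in Python for readability (the real decoder runs in Spin2 on the P2, and this is not that code). All five lines are active low, so 0 means asserted.

```python
def decode_cycle(m1, mreq, iorq, rd, wr):
    """Name the Z80 bus cycle from its active-low control lines (0 = asserted)."""
    if not mreq and not rd:
        return "opcode fetch" if not m1 else "memory read"
    if not mreq and not wr:
        return "memory write"
    if not iorq and not m1:
        return "interrupt acknowledge"
    if not iorq and not rd:
        return "I/O read"
    if not iorq and not wr:
        return "I/O write"
    # Includes refresh (MREQ alone), which needs RFSH# to be identified
    return "idle"


print(decode_cycle(m1=0, mreq=0, iorq=1, rd=0, wr=1))   # opcode fetch
print(decode_cycle(m1=1, mreq=1, iorq=0, rd=1, wr=0))   # I/O write
```

Sampling the lines on every clock half and feeding them through a decoder like this gives the per-T-state display described above.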
Re: P2 based Chip Tester [message #9690 is a reply to message #9650] Fri, 28 January 2022 04:51
lynchaj
Messages: 1080
Registered: June 2016
Senior Member
jayindallas wrote on Thu, 20 January 2022 20:20

A Parallax P2 protoboard could make an exceptional chip tester...
...Of course the application program communicating with the P2 board would have to make all the assumed man-machine interfacing for selection of tests and results.
Hi
I find this thread confusing. It sounds like a great idea but have you or are you designing this chip tester? Are you asking to start a group project? Is there a notional design or prototype? Then the thread seems to change to Z80 emulation/testing. Is that what you meant or did the purpose of the thread get changed? I have read this several times and it is very confusing to me.

Thanks, Andrew Lynch


[Updated on: Thu, 31 March 2022 21:42] by Moderator


Discussion [message #9693 is a reply to message #9688] Fri, 28 January 2022 13:19
jayindallas
Messages: 110
Registered: June 2021
Senior Member
Cluso99 wrote:
"...What you are proposing is what I've done..."

jayindallas writes:
I realize you've done that already. Even so, it sounds like an interesting project to do while learning to write code for the P2. The advantage of working through a project oneself, as opposed to reading a report on how to do it, is that one gains deeper knowledge that reveals the possibilities in related applications. That's a valuable experience when one has the time.


Lynchaj wrote:
"...It sounds like a great idea but have you or are you designing this chip tester? Are you asking to start a group project? Is there a notional design or prototype?..."

jayindallas writes:
I thought about using the P2 for a chip tester and posted an outline for anyone that might want to do something like that. My need for a chip-tester diminished when I bought a Parallax "P2 Edge Module Breadboard". I can do a quick chip test anytime I want to on that P2 with Breadboard. As I write more chip tests, I'll be better prepared to do a chip-tester based on the P2 later on.
/forum/index.php?t=getfile&id=2722&private=0
Right now I'm working on the spin-off design, an in-circuit Z80 emulator. It has turned out to be a lot more interesting and developing a prototype would allow me to efficiently test it while restoring my vintage computer systems. If I design and program it right, I won't have to drag out a lot of test equipment and hook it all up. I'd rather ask the in-circuit Z80 emulator to just tell me what the problems are. :)

[Updated on: Sat, 23 April 2022 09:34]


Re: My P2 Learning Exercise [message #9695 is a reply to message #9693] Fri, 28 January 2022 18:02
cluso99
Messages: 40
Registered: June 2017
Member
@jayindallas

Sorry if you took my comment the wrong way. Yes, you will learn a lot. The P2 is a very powerful chip. You can safely ignore the more complex sections of the chip until you need them.

BTW I just used spin2 to do my testing. It's much slower although it's compiled using Eric's fastspin compiler. I didn't need the speed to do it in PASM. I am not sure if I will ever do it in PASM as I currently don't have the need for that at this time.
Discussions [message #9696 is a reply to message #9695] Fri, 28 January 2022 19:37
jayindallas
Messages: 110
Registered: June 2021
Senior Member
Cluso99 wrote:
"...The P2 is a very powerful chip. You can safely ignore the more complex sections of the chip until you need them..."

jayindallas writes:
I agree, but those more complex sections are what fascinate me about the P2 and keep my mind thinking of new ways I can use them. I'm on a long path to get to those complex sections.


Cluso99 wrote:
"...I just used spin2 to do my testing...I didn't need the speed to do it in PASM..."

jayindallas writes:
Exactly! You used the right tool because you're doing a one-time construct to collect information. Once you have the information, the construct has less utility. And as there is no real-time critical necessity for high speed operation, it would be a waste of time writing it in PASM. I like that Spin2 allows you to insert assembly language blocks inline where needed.

[Updated on: Fri, 22 April 2022 14:30]


Re: The P2 manipulating a Z80 [message #9702 is a reply to message #9696] Sat, 29 January 2022 17:27
cluso99
Messages: 40
Registered: June 2017
Member
@jayindallas

"I agree, but those more complex sections are what fascinates me about the P2 and keeps my mind thinking of new ways I can use them. I'm on a long path to get to those complex sections."

There are so many complex sections, as well as special instructions that reduce the instruction count in specific instances. There is a CRC calculator (bit and nibble) that is extremely useful for comms, a CORDIC solver, and so many others. It is so hard to wrap your head around what is in this chip! I lived through the whole design, and many of us on the Parallax Forum had input to the design of the P2. There is even a Monitor and a Forth language built into the 16KB ROM.

I haven't used inline PASM (yet) as I am quite at home with PASM. As always, it's horses for courses :)
Project (2): In-Circuit Z80 Emulator :: Description [message #9703 is a reply to message #9702] Sun, 30 January 2022 10:45
jayindallas
A P2-BASED IN-CIRCUIT Z80 EMULATOR FOR VINTAGE SYSTEMS?
Here's a P2 hobbyist product idea for vintage Z80 system owners:
GIVEN:
(1) A P2 can virtually emulate a Z80.
(2) A P2 is fast enough to also run an in-circuit Z80 emulation (Z80 replacement).

CONSIDER:
Put (1) and (2), above, together and you could create a Z80 replacement that could bridge to modern resources and diagnose system problems:

(boring concept foundation first)

(01) It could replace a socketed Z80 and run it at the same system speed, i.e. an in-circuit emulation Z80.
(02) Packaged as a double-stack pcb on a 40pin DIP base; a plug-in P2 replacement of the original Z80.

(now the interesting advantages part)

(03) Add modern portals on the P2 40pin DIP pcb stack to connect to modern data resources.
(04) Add local onboard resources such as microSD and Hyper-Ram to expand the system's available file capacity and speed.
(05) Add access to remote resources by adding a micro USB which could be cabled out of the vintage enclosure easily.
(06) Once outside the enclosure, the USB could bridge to all sorts of communications interfaces to access modern resources.
(07) Modern portal code runs in the P2 with some interface for the Z80 to access those functions.
(08) The Z80-replacement accesses modern resources without exceeding its designed 64KB limitations.
(09) The vintage system uses modem-like software (Modem7/MEX etc) to access the P2-based drivers.
(10) * The P2 captures DMA signal activity to help diagnose DMA problems while the Z80 sleeps.
(11) * Add a virtual memory management unit (MMU) so it can access 1MB like the 64180/Z180. The additional memory will be within the P2's memory resources.
(12) * Add a virtual DMA controller (DMAC) like the 64180/Z180 that can access the vintage system's 64KB memory/ports and additional P2 memory resources.
(13) * Perhaps several virtual serial communication interfaces patterned after the 64180/Z180 that would be used as USB portals between the P2 and the external USB resources.

* UPDATES
* The in-circuit emulator will capture DMA signal activity and timing:
A circuit design change allows the P2 to capture DMA activity to help diagnose any DMA problems without additional test equipment.
* Add Virtual MMU to the in-circuit Z80 emulator:
The P2 will add an MMU within its Z80 emulation. The MMU will treat memory beyond 64KB as memory that resides within P2 resources external to the vintage system. The MMU needs to be compatible with vintage systems that used some bank flipping, such as the Osborne and others. Those banks will probably just be treated as bank layers of the 64KB memory, as they were designed. Those banks will not be used by the MMU. MMU I/O ports will be configurable to avoid conflict.
* Add Virtual DMAC to the in-circuit Z80 emulator:
The P2 will add a DMA controller patterned after the 64180/Z180 without any additional vintage system signals. The vintage memory can be a source or destination for DMA as well as the P2 memory resources. DMAC I/O ports will be configurable to avoid conflict.
* Add Virtual Serial Communications Interfaces to the in-circuit Z80 emulator:
Patterned after the 64180/Z180 the configurable ports would be used to establish USB portals outside the vintage system.
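
The virtual MMU idea above can be sketched in a few lines. This is only a minimal Python model, assuming the emulation follows the 64180/Z180 scheme in which three registers (CBAR, BBR, CBR) map 4KB-aligned logical areas onto a 20-bit physical space. The routing rule (physical addresses below 64KB belong to the vintage board, everything above belongs to P2-side memory) and the names translate() and target() are assumptions made up for illustration, not part of the design described here.

```python
# Sketch of a 64180/Z180-style MMU translation (illustrative model only).
VINTAGE_LIMIT = 0x10000  # assumption: the first 64KB physical is vintage board memory


def translate(logical, cbar, bbr, cbr):
    """Map a 16-bit logical address to a 20-bit physical address."""
    page = (logical >> 12) & 0xF          # 4KB page number of the logical address
    bank_start = cbar & 0x0F              # low nibble: first page of the Bank Area
    common1_start = (cbar >> 4) & 0x0F    # high nibble: first page of Common Area 1
    if page >= common1_start:
        base = cbr                        # Common Area 1 uses the Common Base Register
    elif page >= bank_start:
        base = bbr                        # Bank Area uses the Bank Base Register
    else:
        base = 0                          # Common Area 0 always maps with base 0
    return (logical + (base << 12)) & 0xFFFFF


def target(physical):
    """Hypothetical routing rule: which side owns this physical address?"""
    return "vintage" if physical < VINTAGE_LIMIT else "p2"


# With the reset-style setting CBAR=0xF0 and both base registers at 0,
# every logical address maps straight onto the vintage 64KB.
print(hex(translate(0x1234, 0xF0, 0x00, 0x00)), target(translate(0x1234, 0xF0, 0x00, 0x00)))
```

With a non-zero BBR the Bank Area lands above 64KB, which in this sketch means the access would be served from P2 resources while the vintage bus stays untouched.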

[Updated on: Sun, 24 April 2022 12:36]


Project (2): Thoughts on a P2, Z80-enhancement-module [message #9741 is a reply to message #9703] Sun, 06 February 2022 14:19
jayindallas
In the previous message, a Z80 enhancement module was suggested that included emulating a Z80.

This message takes an alternate approach and considers the advantage of offering the enhancements mentioned, packaged on a socket-pcb-Z80_socket stack.

PACKAGING AN ENHANCEMENT MODULE FOR A Z80 HOST SYSTEM:
=====================================================
I like the idea of adding a rotating media replacement/augment as a Z80-local enhancement-module.

This would give a vintage system operator the option, at any time, to use their original media/drives and/or a ramdisk/romdisk/SD_card/USB resource instead. This could extend the life of both the rotating media and its drive. It could also make it easier to archive media before they are unrecoverable.

An enhancement module that fits on socket-pcb-Z80_socket assembly could reasonably feature:
(1) a hyper-ram chip used as a ram-disk(s) using its byte-wide I/O transfer (eleven signals)
(2) a hyper-rom chip used as a rom-disk(s) using its byte-wide I/O transfer (one additional signal for CHIP ENABLE)
(3) a micro SD card inserted into a SD socket on the module pcb for vast storage capacity.
(4) a micro USB connector and cable to an external USB module for external system access.
This SMT circuitry could fit on a pcb sized to fit between a Z80 and its host system socket.

One could argue that (1,2,4) would suffice as the external USB module could interface with USB memory sticks, SD cards, or any other storage/archive portals. However, the hyper-rom would not give as much storage as an SD card, which is a reason to add the SD card (3). A hyper-ram/rom combination requires just 12 signals and transfers byte-wide in auto-incrementing access.

A better solution would be to avoid the Z80 emulation and just make a Z80 system enhancement module that connects via:
(1) a low-profile socket-pcb-module that inserts between the Z80 microprocessor and its system socket *OR*
(2) a low-profile clip-on pcb-module that clips onto the Z80 DIP package.
Advantages and Disadvantages include:
(1) socket-pcb-module:
. (A) advantage: more likely to fit into the original system because it raises the Z80 minimally.
. (B) advantage: the enhancement-module could hide some Z80 signal activity from original system.
. (C) advantage: hidden activity could simplify the interfacing and make it less system dependent.
. (D) disadvantage: could require the module to maintain RFSH# activity or simply use short activity bursts.
(2) clip-pcb-module:
. (A) disadvantage: would require a high cavity above the Z80 within the system enclosure, for the clip-on assembly.
. (B) disadvantage: could not hide any Z80 signal activity from the host system.
. (C) advantage: allows the Z80 to maintain RFSH# if the module avoids long BUSREQ# activity.
. (D) disadvantage: requires the module's wake-up signaling to avoid any memory-map addresses reserved by the host system.

(1) would be a better solution where the module is intended to remain attached to the system and used whenever needed. However it requires a difficult installation for someone without technical skills.

(2) would be a better solution where the module is intended to be used temporarily to get files transferred between the host system's media/storage resources and any more modern system via a USB transfer.

Both serve a vintage system purpose.

At this point, it would likely be better to pursue a design as (1) socket-pcb-module but avoid hiding any Z80 signals from the host system. This would simplify the module pcb as no signal switching would be required to hide signals. But it means the interface to the module becomes more challenging (and interesting) as it has to be flexible to work with any host system.

Next I'll ponder the interface from the host system to activate the enhancement module and describe how it might interface with the host system...

[Updated on: Wed, 06 April 2022 10:03]


Re: P2 based Chip Tester [message #9742 is a reply to message #9650] Mon, 07 February 2022 12:12
cluso99
My P2 RetroBlade2 already has...
1. USB to PC via generic USB CP2102 or my CH340 plugin modules.
2. MicroSD socket and software supporting FAT32
3. 2x microUSB sockets and soft USB to support USB Keyboards and mouse via generic microUSB to USB socket short cable
4. VGA connector supporting up to 1920x1068 color (only limited by P2 internal memory or use external add on memory)
5. Supports LCD 480x320 (80x40 lines text supported by software) SPI module
6. Unused pins used to connect to real Z80 on protoboard or use internal P2 Z80 emulation.

Some of the above are separate pieces of software and need integration to work together.

My RetroBlade2 runs Z80 emulation running CPM2.2 using 8*8 CPM HDDs on the microSD as FAT32 files with support to transfer files between CPM and FAT32.

It would not be difficult to substitute the Z80 emulation for using the real Z80 daughterboard. This could easily be made to then plug into a Z80 socket on a retro Z80 board or system.
Re: P2 based Chip Tester [message #9743 is a reply to message #9650] Tue, 08 February 2022 08:11
lynchaj
Messages: 1080
Registered: June 2016
Senior Member
Hi
Do you have a schematic for the P2 chip tester? Do you want help making a PCB?
Re: P2 based Chip Tester [message #9746 is a reply to message #9743] Tue, 08 February 2022 16:58
jayindallas
lynchaj wrote:
"Do you have a schematic for the P2 chip tester? Do you want help making a PCB?"

Thanks, but I'm not focused on the tester at the moment. When I do one pcb in low count volume, I combine several projects on the same pcb in snap-off sections. All projects done for the price of one pcb at the total pin cost. Big savings, less headaches.

UPDATE: 2022/02/11
A SCHEMATIC FOR THE P2 CHIP TESTER:
==================================
The chip tester is a reasonably simple design so I'll try to post a description in the next few days.
I'll use the design that:
(1) uses a P2 Edge card that plugs into a connector, so the card can be used for other projects too.
(2) the 'SETS' of tester modules will be the plug-in type, instead of the board-board approach.
(2+) that minimizes the test operator errors that could occur when chips are inserted into the wrong socket.
ADDITIONAL TASKS:
(3) I'll try to make time to do at least one ascii diagram/schematic so the basics are clear.
(3+) with that, any other 'SETS' can be guided by the diagram/schematic.
(4) I'll post an image or a url link to the P2 Edge card with the posting.

=====================================================
PCB SAVINGS...USING SNAP-OFF BOARDS ON A COMBINED PCB
=====================================================
. . I learned that trick when I designed a data collection instrument that required three boards (1) displays/buttons/ID-key (2) processor/power-supply and (3) I2C RTC/Memory/Temperature for storing datalogs.
. . PCB design contractor said, "that's 3 boards so that's 3x the $2,000 design fee or $6,000" (circa 1998).
. . I said, "NO."
. . Instead I bought 'Pads' PowerLogic/PowerPCB CAD software and designed it myself (1998). I talked to several board fab houses in DFW to find the best snap-off method so that the boards were stiff enough to go through thru-hole component insert and wave-solder and then SMT pick&place then IR oven.
The best answer was 50mil non-plated drill holes spaced apart at 65mil centers.
. . That made a very stiff pcb, and the flex-cables connecting them were added before wave-solder. The whole board was completely tested before snapping it apart and mounting it as an assembly. I even added two more small snap-off boards for serial communication options. :)

UPDATE: 2022/02/14
PCB EXAMPLE OF THE DRILLED LINE SNAP-OFF BOARDS:
==================================
Below is a de-rez photo example of the snap-off board use. The horizontal GREEN lines are just BELOW the 4 sets of snap-lines that separate the 3 main boards. The 2 GREEN vertical lines are just to the left of the snap-lines that separate the two option boards.

The two main boards at the top are roughly 5" x 2.5". Note that the snap lines do not run the whole length of the line, the rest is milled out.
/forum/index.php?t=getfile&id=2656&private=0
============================================================

Now back to the subject of this message...

So... right now I'm pondering a 40pin DIP Z80 replacement pcb assembly that can be controlled over USB from a Raspberry Pi 4 running a Linux C++ application to EITHER (1) communicate with a Z80-emulating-P2 (something Cluso99 has pioneered) OR (2) as a P2 in a socket-P2-Z80 layer sandwich that cuts in via BUSREQ# to do what the RPi4 operator wants to do (test/take-over FDC chips etc).

I thought about (2) last night during a movie and was going to use 50mil headers to bridge two sandwiched boards, inlaying the headers inside the DIP holes at each end of the DIP: 3 rows of 7 = 21 at each end of the 40pin DIP, 42 total, giving two +5V and two GND pass-throughs. That left the center rectangle area for SMT components, an SD socket and a micro USB connector.

However the assembly would be hard to assemble and when I looked at the Parallax page this morning, I saw a 40pin DIP board (P1 I think) and that looked like it was a lot easier, so... it looks like the better choice is... (1) make the P2 emulate the Z80 also. More P2 programming, but I have a great data-structure driven Z80 emulator written in Linux G++, which is C++, that would be perfect for the P2 lookup table LUT-RAM. And once the emulation works, there is no need to do BUSREQ# as the USB link could control the EMU-Z80 under P2 programming.

When I get that EMU-Z80 in a DIP dual PCB sandwich board layout, I can THEN add some TTL DIP tester circuits and a P2-EDGE connector to keep the P2 cost down on the test circuits. In other words, you can have one P2-Edge card and plug it into your tester board when you want to run it, and when not, unplug the P2 Edge card and use it elsewhere.

[Updated on: Sun, 01 October 2023 18:17]


Project (2): In-Circuit Z80 Emulator :: Discussion [message #9747 is a reply to message #9742] Tue, 08 February 2022 17:44
jayindallas
Cluso99 wrote:
"...My P2 RetroBlade2 already has...1:6..."

Yeah, your RetroBlade2 is a great system. I'm familiar with the SD card in FAT-32 supporting several drives. I think WMS used that on his eZ80 design and likely Z80/Z180 versions too. One microSD socket on the in-circuit Z80 emulator would suffice for most storage for a vintage system. I also saw Eric's P2 lecture on FLEX last night. He covered the subject of Virtual File System ("_vfs_...") so there is no reason to re-invent that wheel.

I see the in-circuit Z80 emulator as a tool to keep vintage systems in good repair without wearing them out. Choose DEMO MODE when you want to demonstrate the full original system. Otherwise use the P2 resources to bridge to the modern data-stores and let your drives and floppy disks rest.

Cluso99 wrote:
"...It would not be difficult to substitute the Z80 emulation for using the real Z80 daughter-board. This could easily be made to then plug into a Z80 socket on a retro Z80 board or system..."

I think you were ahead of me in realizing that. As I stated in the prior message I wrote, I saw an easy 40pin DIP pcb on Parallax's website this morning and I had my "AHAH!" moment. Dropping the physical Z80 simplifies things.

I'm going to convert my Linux Z80 emulator code into a P2 version and ADD the necessary bus-cycle sequencing so the pins behave exactly like a Z80. I'll also add the USB interface for remote transfers or testing etc. It's a less intrusive way to test a vintage system. When it finds component problems, then you know it's time to open up the enclosure and diagnose/replace them.

To me the in-circuit Z80 emulator is viable *IF* it helps me diagnose and restore my Z80 Systems and SBCs more efficiently. The idea is less appealing if it merely allows you to run a P2 Emulator in a fragile/declining vintage system. It should be a modern equivalent of the MITS/IMSAI front panel, only now it just takes a USB communications cable.


UPDATE:
The P2 might be better powered by a power adapter or USB cable. That would allow the P2 to power up and test the vintage system's +5V rail before attempting to control the vintage system. The higher +5V voltage could be resistively divided down to measure within the P2 VCC/GND range.
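
As a rough worked example of the divider idea, here is a minimal Python sketch. The resistor values (10K over 15K) and the +/-5% acceptance window for the +5V rail are assumptions chosen for illustration; the only fixed requirement is that the divided voltage stays inside the P2's 3.3V pin range.

```python
# Resistive divider arithmetic for checking the vintage +5V rail (illustrative values).
R_TOP = 10_000      # ohms, from the +5V rail to the P2 sense pin (assumed value)
R_BOTTOM = 15_000   # ohms, from the P2 sense pin to GND (assumed value)
P2_PIN_MAX = 3.3    # volts, upper limit of the P2 pin range


def divider_out(v_rail, r_top=R_TOP, r_bottom=R_BOTTOM):
    """Voltage seen at the P2 pin for a given rail voltage."""
    return v_rail * r_bottom / (r_top + r_bottom)


def rail_from_reading(v_pin, r_top=R_TOP, r_bottom=R_BOTTOM):
    """Rail voltage implied by a measurement at the P2 pin."""
    return v_pin * (r_top + r_bottom) / r_bottom


def rail_ok(v_pin, nominal=5.0, tolerance=0.05):
    """True if the implied rail voltage is within the assumed +/-5% window."""
    v_rail = rail_from_reading(v_pin)
    return abs(v_rail - nominal) <= nominal * tolerance


# A healthy 5.0V rail shows up as 3.0V at the pin, inside the 3.3V range.
print(divider_out(5.0), rail_ok(divider_out(5.0)))
```

With these values the pin only reaches 3.3V when the rail is at 5.5V, so a rail within tolerance never overdrives the pin.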

[Updated on: Sat, 23 April 2022 10:28]


Re: P2 based Chip Tester [message #9749 is a reply to message #9747] Tue, 08 February 2022 22:57
cluso99
Just so you are aware, you will not fit a P2 in a 0.6" wide pcb because the P2 itself is almost 0.6" wide. You will also need to allow for heat dissipation as the P2 can get quite hot if there is a lack of copper.

"To me this 40pin DIP Z80-EMU is viable *IF* it helps me diagnose and restore my Z80 Systems and SBCs more efficiently. The idea is less appealing if it merely allows you to run a P2 Emulator in a fragile/declining Z80 vintage system. I kind of look at it as the modern equivalent of the MITS/IMSAI front panel, only now it just takes a USB communications cable. :)"

You can certainly run code in the P2 to drive the bus while holding the Z80 in reset. This would make a good debugger because you could see what is happening on the bus and you could drive the various devices on the bus. If they are static devices then you can slow (or stop) the bus right down to report execution as it runs.

"I didn't buy a home PC until Christmas 1988 because I had to have one for grad school starting in Summer of 1989."

I am a little older (70) than you. I finished my Electronics & Communications in 1971 and after some design work went to work on mini-computers in 1974. I bought a Motorola 6800 D1 and then D2 kits as soon as they were released. I bought my first real computer in 1977 - it was an 18 month old Singer System Ten mini-computer - and it was the length of my garage - true because that is where I housed it, complete with air conditioning and special power. A 10MB HDD was the size of a washing machine and cost $16,000 new in 1976. I had 3 on my mini. The microSDs make a mockery of these old disk drives!

[Updated on: Tue, 08 February 2022 22:58]


Project (2): In-Circuit Z80 Emulator :: DIP PCB Stack [message #9750 is a reply to message #9749] Wed, 09 February 2022 17:44
jayindallas
The LOWER PCB:
The lower pcb is rectangular, approximating a 40pin DIP profile. It forms a 40pin DIP base that can be inserted into a Z80 socket. The outline keeps the pins visible for positioning them into the Z80 socket.

The main purpose of the lower pcb is to geometrically translate the 40-pin Z80 signals into SMT buffers/transceivers that translate the P2's 3.3V signals into 5V-safe vintage TTL with fan-out driving capacity, and to route the P2 side of those chips to the upper pcb. As the P2 requires a slightly wider pcb, the traces route to four areas of the pcb where they connect to headers going to the upper pcb. Those four areas are referred to as the North-West, South-West, South-East and North-East. At the North end, between the DIP rows, 50mil headers totaling 22 pins are used to connect the signals to the upper pcb. The South end is a reflection of this.

To tighten up the layout on the upper pcb, the P2 (diagonally oriented) and the two sets of end header-matrices have to be shifted North. This gives more contiguous space on the South end of the lower and upper pcbs. The diagram below doesn't show it, but there will likely be additional headers under the P2 pad on the upper pcb to allow thermal flow away from the P2.

In the diagram below these symbols are relevant to the lower pcb:
"+" marks the Z80 pin positions into which the assembly inserts.
":" marks the 50mil 2x3 and 2x5 headers which connect signals to the upper PCB.

The UPPER PCB:
NOTE that there are no 40pin DIP obstacles on the upper PCB. The diagram's "+" mark the Z80 pins on the lower pcb only. The "o" mark the edges of the P2 on the upper pcb, top surface.

The upper PCB looks like an aircraft carrier and it has flanges out about 250mils to fit the P2 at a diagonal. The WEST flange would normally be symmetric with the EAST flange, but it's extended SOUTH to give more room for the axis of the microUSB connector and cable. This may change. The diagonally oriented P2 supports the Z80 signals as they were quartered on the lower pcb.

"u" marks the axis of the microUSB connector and cable. This angle gives the cable more room.
Other components on the upper pcb can be identified in the diagram's LEGEND.

Below is a pcb stack diagram showing how it might fit together.
/forum/index.php?t=getfile&id=2723&private=0

[Updated on: Sat, 23 April 2022 11:08]


Project (2): In-Circuit Z80 Emulator :: Discussion [message #9751 is a reply to message #9749] Thu, 10 February 2022 09:25
jayindallas
Cluso99 wrote:
"...Just so you are aware, you will not fit a P2 in a 0.6" wide pcb because the P2 itself is almost 0.6" wide. You will also need to allow for heat dissipation too as the P2 can get quite hot if there is a lack of copper..."

Yeah, it needs to be a two pcb stack for the two reasons you give:
(1) The P2 will not fit between 100mil pin rows, 700mils apart.
(2) The thermal pad layout for the P2 transfers heat to the opposite surface.

The solution to (1) above is to place the P2 on an upper pcb of the air-gapped stack, where the lower pcb geometrically transfers the Z80 40pin locations to two header-matrices at its North and South ends, so that the upper pcb has its full West-East width for component placement, the P2 in particular.

The solution to (2) requires an upper pcb too, for thermal reasons. As the in-circuit Z80 emulator is intended to be placed in vintage system enclosures, that environment is likely to be thermally challenged. So a GIVEN must be to maximize thermal protection in the stack assembly. On the upper pcb alone, that would require the recommended thermal pads with vias to the opposite surface, with a pcb thick enough to do that well. An additional advantage of the two pcb stack is that the lower pcb, which is sparse, can be used as an additional thermal radiator, using connecting headers across the stack's air-gap to allow heat to spread into the lower pcb's unused copper mass.

In addition to (2), it would be wise to cap the P2 with a moderate heatsink. I thought about putting a heatsink on the heat pad on the opposite side of the P2, but I'm leaning toward the use of headers to do that and involving the lower pcb. That allows the air-gap between the pcbs to be smaller.

The last thing that might help reduce thermal issues would be to reduce the clock speed and cog use when full performance isn't required. I'm not sure the P2 has power reduction measures; I assume it does, for now.

Cluso99 wrote:
"...You can certainly run code in the P2 to drive the bus while holding the Z80 in reset. This would make a good debugger because you could see what is happening on the bus and you could drive the various devices on the bus. If they are static devices then you can slow (or stop) the bus right down to report execution as it runs..."

The current thinking on the in-circuit Z80 emulator is to abandon the physical Z80. Removing it simplifies the circuitry and the pcb assembly, while making the P2 code bigger and longer to complete. Having the P2 emulate the Z80 is better because the P2 can provide bus cycle testing that the actual Z80 CANNOT provide. The Z80 gives instruction-step resolution; the P2 can give sub-instruction-step resolution.

As I defined my need for an in-circuit Z80 emulator replacing a 40pin DIP Z80, it would enable quicker testing/diagnosis/restoration of a vintage system. It allows the work to be done by a terminal connection over USB instead of opening the enclosure and attaching a lot of logic analyzer pins. So having the P2 emulate the Z80 is a major advantage for restoring vintage Z80 systems.

And it sounds like a LOT OF FUN! :)

UPDATE:
I changed the in-circuit Z80 emulator design so it can monitor DMA transfers, during which twenty-eight of the Z80 signals are normally tri-stated. Couple this with the Hyper-Ram addition and the P2 can now buffer a lot of DMA signal activity to troubleshoot DMA problems.
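
The count of twenty-eight matches the Z80 lines that float during a bus grant: sixteen address lines, eight data lines, and MREQ#, IORQ#, RD# and WR#. Assuming that is the set being monitored, one captured bus sample fits in a 32-bit word, which is convenient for streaming into a buffer. The bit layout and function names below are hypothetical, just one way to pack a sample, sketched in Python.

```python
# Pack one sampled DMA bus state (28 signals) into a 32-bit word (hypothetical layout).
# bits 0..15  : A0-A15
# bits 16..23 : D0-D7
# bits 24..27 : WR#, RD#, IORQ#, MREQ# as raw pin levels (active low, so 0 = asserted)


def pack_sample(addr, data, mreq, iorq, rd, wr):
    """Combine raw pin levels into a single trace word."""
    ctrl = ((mreq & 1) << 3) | ((iorq & 1) << 2) | ((rd & 1) << 1) | (wr & 1)
    return (ctrl << 24) | ((data & 0xFF) << 16) | (addr & 0xFFFF)


def unpack_sample(word):
    """Split a trace word back into its fields."""
    return {
        "addr": word & 0xFFFF,
        "data": (word >> 16) & 0xFF,
        "wr": (word >> 24) & 1,
        "rd": (word >> 25) & 1,
        "iorq": (word >> 26) & 1,
        "mreq": (word >> 27) & 1,
    }


# Example: a DMA device reading memory at 0x8000 (MREQ# and RD# low, data 0x3E on the bus).
word = pack_sample(addr=0x8000, data=0x3E, mreq=0, iorq=1, rd=0, wr=1)
print(hex(word), unpack_sample(word))
```

Four bits per word are left over in this layout, which could hold a coarse timestamp or a BUSACK# flag if that turns out to be useful.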

[Updated on: Sat, 23 April 2022 11:37]


Project (2): In-Circuit Z80 Emulator :: Beyond Z80, Hyper-Ram/Rom, Bus Timing [message #9752 is a reply to message #9751] Thu, 10 February 2022 10:58
jayindallas
BEYOND THE Z80:
This was a good time to study other vintage computer system processors that the P2 could 'in-circuit (processor) emulate'. I completed a study of the Intel 8080 and concluded that an in-circuit 8080 emulator variant is possible. It would have many of the advantages of the Z80 version, but be limited by the timing of the 8080 bus interface. But it could allow an 8080 vintage system to run most Z80 instructions; that would be the best justification for supporting the 8080.

I've also studied the Hitachi 64180 and Zilog 180. The interesting part about this study is that an in-circuit Z80 emulator could include virtual features from the 64180/Z180, including the MMU, the DMAC and several SIOs. No changes to the vintage system would be required because all the additional memory supported by a virtual MMU, would be memory in the P2 domain (its ram and memory in its peripherals).

The thought of a Z80 upgrade to a virtual 1MB MMU has me pondering how to insert a 1MB static ram chip between the P2 processor and the Z80 signal drivers near the 40-pin socket. It doesn't sound practical, but I'll play around with the idea for a while. Perhaps I'll just do that for the proof-of-concept build and then strip it down to practical application versions. My proof-of-concept build might be a taller pcb stack, good for Xerox 820 series but bad for SD Systems SBC-300 S100 boards. :)


P2 INTERFACE TO A Z80 SOCKET DONE:
I've completed the P2-to-Z80 circuitry design. I've added the ability for the P2 to 'SPY' on any DMA transfers that a Z80 is incapable of monitoring. This helpful feature allows diagnosis of DMA transfers from the Z80 socket instead of hooking up a lot of logic probes. The circuitry also has better fan-out capacity than a Z80 processor.

Now the fun part begins... what modern chips can I add to the P2 processor to put more power into this in-circuit Z80 emulator? I have about 22 unused I/O pins out of the P2 processor's 64 I/O pins that I can split between those pins reserved for the P2's use (like a micro-SD card) and those pins used for additional resources on the P2 processor's pcb.

The strategy is to let the P2 communicate with an outside USB HUB; a single cable to leave the vintage system enclosure. As the in-circuit Z80 emulator is sized to fit into a 40-pin socket, heavy cables are just likely to rip it out of the socket. Better to add some USB converters on the external HUB.

I have a Hyper-Ram interface card for my "P2 Edge Module Breadboard" so I have to put a Hyper-Ram/Rom on it. :)

HYPER-RAM/ROM
Hyper-Ram/Rom is basically a huge capacity, byte-wide ram or rom with auto-incrementing internal addressing. It is high speed with a minimal interface, packed inside a BGA surface mount component half the size of the body of a 555 timer. You load an address via a protocol interface and after that just repeatedly write data into it or read data out of it. The addressing automatically bumps for you.

It is in effect, a silicon floppy disk or hard drive, more specifically a ram-disk and rom-disk. It can also be used to slam a lot of high speed telemetry inside (like the P2 spying on DMA activity and shoving all the information into a ram-disk). It could support copious debugging information storage. With an emulated MMU unit added to the in-circuit Z80 emulator, the additional memory beyond the 64KB in-system could be a part of the Hyper-Ram capacity.
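
The "load an address, then just keep reading or writing" behavior can be modeled in a few lines of Python. This is only a toy model of the access pattern, not of the HyperBus protocol itself; the class name, the method names and the wrap-around at the end of the array are assumptions for illustration.

```python
# Toy model of a byte-wide memory with an auto-incrementing address (illustrative only).
class AutoIncrementMemory:
    def __init__(self, capacity):
        self.cells = bytearray(capacity)
        self.pointer = 0

    def set_address(self, address):
        """Load the starting address once, before a burst."""
        self.pointer = address % len(self.cells)

    def write(self, payload):
        """Write bytes one after another; the address bumps itself."""
        for byte in payload:
            self.cells[self.pointer] = byte
            self.pointer = (self.pointer + 1) % len(self.cells)  # assumed wrap-around

    def read(self, count):
        """Read bytes one after another from the current address."""
        out = bytearray()
        for _ in range(count):
            out.append(self.cells[self.pointer])
            self.pointer = (self.pointer + 1) % len(self.cells)
        return bytes(out)


# Used like a tiny ram-disk sector: store a block, rewind, read it back.
disk = AutoIncrementMemory(capacity=1024)
disk.set_address(128)
disk.write(b"DMA TRACE 0001")
disk.set_address(128)
print(disk.read(14))
```

The same pattern covers both uses mentioned above: a ram-disk reads and writes whole sectors from a start address, and a trace buffer simply keeps writing samples without ever reloading the address.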

The 'HyperBus' interface @ 3Vdc is 11 signals: {CS#,CK,RWDS,D0:7}, no differential clock used. Current capacity is {8MB,16MB,32MB}. That's Crazy-Z80 territory!
It only requires reserving 2 more pins from the P2:
  11*  ; pins for a full hyper-ram chip interface ::  * ISSI 'HyperBus' interface
 + 1   ; pin for the hyper-rom chip enable
 - 8   ; pins because the Z80 8-bit data bus, when tri-stated, can be used instead of another 8-bit bus from the P2
 - 2   ; pins by using two transceiver DIR signals, while those transceivers are tri-stated. When not tri-stated, the hyper-ram/rom are disabled.
 ====
   2   ; it would only require 2 additional pins from the P2: (1) Hyper-Ram Chip Enable and (2) Hyper-Rom Chip Enable.
Hyper-Ram can be used to save DMA transfer information for debugging, or it can be filled with debug trace information. Hyper-Ram also makes a fast, byte-wide ram-disk, and Hyper-Rom a rom-disk or hard drive.
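
The pin budget above is easy to keep honest as a small tally, which also makes it simple to re-run if the design changes. This Python sketch just restates the four line items from the list; the item labels are paraphrased and the signal names come from the HyperBus list {CS#,CK,RWDS,D0:7}.

```python
# Tally of extra P2 pins needed for the Hyper-Ram/Rom pair, using the figures listed above.
HYPERBUS_SIGNALS = ["CS#", "CK", "RWDS"] + [f"D{i}" for i in range(8)]

budget = [
    ("full hyper-ram chip interface", +len(HYPERBUS_SIGNALS)),
    ("hyper-rom chip enable", +1),
    ("data lines shared with the tri-stated Z80 data bus", -8),
    ("signals shared with two transceiver DIR lines", -2),
]

extra_pins = sum(delta for _, delta in budget)

for label, delta in budget:
    print(f"{delta:+3d}  {label}")
print(f"{extra_pins:3d}  additional P2 pins required")
```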


In-Circuit Z80 & 64180 Emulator *FUSION*:
I'm combining the Z80 and 64180 development because it's a good fit. I added an option to allow the Z80 to upgrade itself to a 64180. If enabled, the Z80 processes machine code using the same cog program that the 64180 does. The Z80 gets virtual 64180 MMU/DMAC/SIO ports and access to a 1MB SRam on the P2 internal bus so it can run like a 64180 with only 64KB on the vintage system board. The virtual MMU allows the Z80 to use MMU banking via the virtual 64180 MMU settings.

The internal bus can access P2 external resources without affecting the Z80 vintage system, and it requires fewer P2 I/O signals to support the 64pin socket of the in-circuit 64180 emulator.

Remember the virtual SIO won't have any new signals into the vintage system. It would be used as a serial data communications portal through the USB connection outside of the vintage enclosure. The virtual SIO just gives the vintage system a way to send and receive data for any local program acting like a communications portal, like modems accessing Bulletin Board Systems. It could be used to read a USB memory stick directory and select files to download into the vintage system.

My development board layout will likely use the same printed circuit boards for the P2 environment and the 1MB SRam, and the lower pcb will be either a 40pin Z80 socket or a 64pin 64180 socket. For the development board, I'll use SMT down to TSSOP and QFP for easier in-home build. I may have BGA Hyper-Ram/Rom. When the development is completed, I'll do another build, mostly in BGA, to minimize the in-circuit emulator's size.

I'm going to do the Z80 version development inside my:
(1) Xerox 820
(2) Xerox 820-II
(3) Xerox 16-8
These systems have space for a slightly-bigger development version.


A DEEPER KNOWLEDGE OF Z80 BUS CYCLES AND T-STATES:
To emulate a physical Z80, I need to be more aware of the bus cycle sequences that make up a single instruction execution. My Linux software Z80-emulator is just a virtual machine like "MYZ80"; it doesn't actually emulate Z80 bus cycle sequences.

Actual Z80 instruction fetch cycles and the external read/writes are easy to figure out. While an in-circuit emulation would not have to maintain the exact sequence or delays as the original Z80, a careful study is needed to assure that the vintage system can support any alterations from Z80 actions.

UPDATE:
Confirmed: I was able to deduce the instruction bus cycles and clock timing from the manuals, and even to identify some wasted clock cycles that the P2 could remove without adversely affecting the vintage system. In those cycles the system just had to wait through longer bus cycles while the Z80 was manipulating internal registers, doing calculations for (IX+D) addressing, or changing the program counter arithmetically instead of incrementally.

The Mostek manual is a useful reference as it lists the bus cycle sequences by the number of clock periods in each bus cycle. But be careful, as that manual has many careless errors.


BUILDING DATA TABLES FOR Z80 BUS CYCLE TIMING
I'm putting the Z80 AC Characteristics into a spreadsheet table (below), for the Z80 (2.5MHz), Z80A (4MHz), Z80B (6MHz) and the Z80H (8MHz). That will allow me to select a Z80 speed and have the spreadsheet look up the correct values for drawing that bus cycle. I'll take a shot at having the spreadsheet render the test graphics or maybe just do a signal graphic plot and use a screen image. It's just spreadsheet playing-around duty. The table's main purpose is to guide the assignment of Z80 timing tables in the P2 as it emulates Z80 pin activity. As the specification bounds the min/max timing of pin events, the P2 will have a table of pin timings within the specification bounds.

The following bus cycle symbols and timings are from a 1983 Reprint authorized by Zilog to Ampro within their Little Board Manual. It differs from the Zilog 1977 editions of the Z80-CPU, Z80A-CPU Technical Manual, in Symbol names but has more detailed bus cycle diagrams and includes the Z80B and Z80H. I'll use the Ampro Little Board manual as my source for now.
	Z80 Delay LUT								Timing Table (Z80 index=0:3)
Timing	Timing		Timing							Z80	Z80-A	Z80-B	Z80-H	
Symbol	Symbol		Symbol							2.5M	4M	6M	8M	Time
Index	Name		Description						x(y,0)	x(y,1)	x(y,2)	x(y,3)	Units
======	==============	======================================================	======	======	======	======	======
0	TcC		Time cycle CLK						400	250	165	125	ns Min
1	TwCh		Time width CLK high					180	110	65	55	ns Min
2	TwCl		Time width CLK low					180	110	65	55	ns Min
3	TfC		Time fall CLK						30	30	20	10	ns Max
4	TrC		Time rise CLK						30	30	20	10	ns Max
5	TdCr(A)		Time delay CLK rise to (Address valid)			145	110	90	80	ns Max
6	TdA(MREQf)	Time delay Address valid to (MREQ# fall)		125	65	35	20	ns Min
7	TdCf(MREQf)	Time delay CLK fall to (MREQ# fall)			100	85	70	60	ns Max
8	TdCr(MREQr)	Time delay CLK rise to (MREQ# rise)			100	85	70	60	ns Max
9	TwMREQh		Time width MREQ# high					170	110	65	45	ns Min
10	TwMREQl		Time width MREQ# low					360	220	135	100	ns Min
11	TdCf(MREQr)	Time delay CLK fall to (MREQ# rise)			100	85	70	60	ns Max
12	TdCf(RDf)	Time delay CLK fall to (RD# fall)			130	95	80	70	ns Max
13	TdCr(RDr)	Time delay CLK rise to (RD# rise)			100	85	70	60	ns Max
14	TsD(Cr)		Time setup Data to (CLK rise)				50	35	30	30	ns Min
15	ThD(RDr)	Time hold Data to (RD# rise)				0	0	0	0	ns Max
16	TsWAIT(Cf)	Time setup WAIT# to (CLK fall)				70	70	60	50	ns Min
17	ThWAIT(Cf)	Time hold WAIT# after (CLK fall)			0	0	0	0	ns Max
18	TdCr(M1f)	Time delay CLK rise to (M1# fall)			130	100	80	70	ns Max
19	TdCr(M1r)	Time delay CLK rise to (M1# rise)			130	100	80	70	ns Max
20	TdCr(RFSHf)	Time delay CLK rise to (RFSH# fall)			180	130	110	95	ns Max
21	TdCr(RFSHr)	Time delay CLK rise to (RFSH# rise)			150	120	100	85	ns Max
22	TdCf(RDr)	Time delay CLK fall to (RD# rise)			110	85	70	60	ns Max
23	TdCr(RDf)	Time delay CLK rise to (RD# fall)			100	85	70	60	ns Max
24	TsD(Cf)		Time setup Data to (CLK fall) during {M2,M3,M4,M5}	60	50	40	30	ns Min
25	TdA(IORQf)	Time delay Address stable to (IORQ# fall)		320	180	110	75	ns Min
26	TdCr(IORQf)	Time delay CLK rise to (IORQ# fall)			90	75	65	55	ns Max
27	TdCf(IORQr)	Time delay CLK fall to (IORQ# rise)			110	85	70	60	ns Max
28	TdD(MREQ:WRf)	Time delay Data stable to (MREQ#:WR# fall)		190	80	25	5	ns Min
29	TdCf(WRf)	Time delay CLK fall to (WR# fall)			90	80	70	60	ns Max
30	TwWR		Time width WR# low					300	220	135	100	ns Min
31	TdCf(WRr)	Time delay CLK fall to (WR# rise)			100	80	70	60	ns Max
32	TdD(IORQ:WRf)	Time delay Data stable to (IORQ#:WR# fall)		20	-10	-55	55	ns Min
33	TdCr(WRf)	Time delay CLK rise to (WR# fall)			80	65	60	55	ns Max
34	TdWRr(D)	Time delay WR# rise to (Data stable)			120	60	30	15	ns Min
35	TdCf(HALT)	Time delay CLK fall to (HALT# rise or fall)		300	300	260	225	ns Max
36	TwNMI		Time width NMI# low					80	80	70	60	ns Min
37	TsBUSREQ(Cr)	Time setup BUSREQ# to (CLK rise)			80	50	50	40	ns Min
38	ThBUSREQ(Cr)	Time hold BUSREQ# after (CLK rise)			0	0	0	0	ns Min
39	TdCr(BUSACKf)	Time delay CLK rise to (BUSACK# fall)			120	100	90	80	ns Max
40	TdCf(BUSACKr)	Time delay CLK fall to (BUSACK# rise)			110	100	90	80	ns Max
41	TdCr(Dz)	Time delay CLK rise to (Data float)			90	90	80	70	ns Max
42	TdCr(CTz)	Time delay CLK rise to ({MREQ#,IORQ#,RD#,WR#} float)	110	80	70	60	ns Max
43	TdCr(Az)	Time delay CLK rise to (Address float)			110	90	80	70	ns Max
44	TdCTr(A)	Time delay {MREQ#,IORQ#,RD#,WR#}rise to (Address hold)	160	80	35	20	ns Min
45	TsRESET(Cr)	Time setup RESET# to (CLK rise)				90	60	60	45	ns Min
46	ThRESET(Cr)	Time hold RESET# to (CLK rise)				0	0	0	0	ns Max
47	TsINTf(Cr)	Time setup INT# fall to (CLK rise)			80	80	70	55	ns Min
48	ThINTr(Cr)	Time hold INT# rise to (CLK rise)			0	0	0	0	ns Max
49	TdM1f(IORQf)	Time delay M1# fall to (IORQ# fall)			920	565	365	270	ns Min
50	TdCf(IORQf)	Time delay CLK fall to (IORQ# fall)			110	85	70	60	ns Max
51	TdCr(IORQr)	Time delay CLK rise to (IORQ# rise)			100	85	70	60	ns Max
52	TdCf(D)		Time delay Clock fall to (Data valid)			230	150	130	115	ns Max

DUPLICATION SYMBOL RESOLVED AS:
28: TdD(MREQ:WRf) is the new name for Memory Write Cycle TdD(WRf)
32: TdD(IORQ:WRf) is the new name for Output Write Cycle TdD(WRf)

NOTE: #51 above, TdCr(IORQr), was incorrectly listed as TdCf(IORQr) in the Z80 spec as reprinted by Ampro in their Little Board manual. They have the correct CLK rise arrow in the description but use "f" following the "TdC". It's shown on the interrupt acknowledge bus cycle, where the correction can be deduced.
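
The table above maps naturally onto a lookup structure indexed by timing symbol and Z80 speed grade. Here is a minimal Python sketch of that idea (not P2 code). Only a few rows are copied in from the table; the names `Z80_TIMING`, `SPEED_INDEX`, `lookup_ns` and `ns_to_ticks` are made up for illustration, and the 200 MHz P2 system clock is purely an assumption for converting nanoseconds into P2 clock ticks.

```python
# Column index per speed grade, matching x(y,0)..x(y,3) in the table above.
SPEED_INDEX = {"Z80": 0, "Z80A": 1, "Z80B": 2, "Z80H": 3}

# A few rows copied from the table: symbol -> (four values in ns, bound type).
Z80_TIMING = {
    "TcC":         ((400, 250, 165, 125), "min"),
    "TdCr(A)":     ((145, 110, 90, 80), "max"),
    "TdCf(MREQf)": ((100, 85, 70, 60), "max"),
    "TsD(Cr)":     ((50, 35, 30, 30), "min"),
}

P2_CLOCK_HZ = 200_000_000  # ASSUMED P2 system clock, not taken from the thread


def lookup_ns(symbol, speed):
    """Return (nanoseconds, 'min' or 'max') for one symbol and speed grade."""
    values, bound = Z80_TIMING[symbol]
    return values[SPEED_INDEX[speed]], bound


def ns_to_ticks(ns, bound, clock_hz=P2_CLOCK_HZ):
    """Convert ns to whole P2 clock ticks, staying inside the spec bound.

    A minimum is rounded up and a maximum is rounded down, using integer
    math so there is no floating-point rounding surprise.
    """
    scaled = ns * clock_hz
    if bound == "min":
        return -(-scaled // 1_000_000_000)   # ceiling division
    return scaled // 1_000_000_000           # floor division


ns, bound = lookup_ns("TdCr(A)", "Z80A")
print(ns, bound, ns_to_ticks(ns, bound))
```

Rounding toward the safe side of each bound is one possible policy; the real emulator may well pick a value somewhere inside the allowed window instead.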

[Updated on: Sat, 07 May 2022 15:58]

Re: P2 based Chip Tester [message #9753 is a reply to message #9650] Thu, 10 February 2022 13:39
cluso99
Messages: 40
Registered: June 2017
Member
When I run my setup and the tests it can go for more than a day. I have a heat sink on the P2 and another on the pair of LDOs. They are from a Raspberry Pi set on eBay or AliExpress. At the end of the pcb pair I sit a 30mm fan. This keeps the P2 cool and the regs too.
Project (2): In-Circuit Z80 Emulator :: Virtual Machine versus In-Circuit Processor Emulation [message #9765 is a reply to message #9752] Tue, 15 February 2022 10:26
jayindallas
Pondering the use of a P2 as an in-circuit processor emulator replacement opens up a range of possibilities. Using a P2 is different from using an FPGA, in the same way that using a microprocessor was different from using racks of digital logic boards to do complex functions. It is easier to change a P2 design and add complex features to it than to do the same on an FPGA design.

To illustrate the additional scope of an in-circuit processor emulator replacement, I'll contrast it to a virtual machine approach.

(0) The Virtual Machine:
This is an emulation of the original processor's internal operation coupled with an emulation of its system resources, running on a different host microprocessor and OS or firmware. An example would be "MYZ80", which emulated the Z80 on Intel 16-bit processors and DOS, giving the user a terminal experience as if a Z80 was running CP/M. This is a full emulation of just the processor's function and system resources by software; it doesn't require adherence to the vintage system's hardware timing and signal activity when accessing resources. There are no vintage-system real-time constraints in this design.

(1) The In-Circuit Processor Emulator Replacement:
This is an emulation of BOTH the internal functions of the original processor AND its pin interface to the original system. Real-time constraints are important to assure the replacement operates in the original system as if it were the original processor. FPGAs are the common design choice and typically offer faster clock speeds for updated system designs. To use an FPGA replacement in an original Z80 system, it would have to perform at the original system clock and maintain the signal timing in relation to that clock speed.

The Parallax P2's speed and multiple processors (Cogs) appear capable of doing this at original system clock speeds. The advantage is you get the flexibility of software/firmware changes as opposed to FPGA constraints. The P2 software has to accomplish both an emulation of the processor's internal operation AND an accurate real-time simulation of pin/signal activity and timing. The latter is required to operate with the resources of the original system.

Now that the processor-replacement has been introduced, the P2 I/O assignments to Z80 pins are improved for the emulation.

Logical Groups of Z80 I/O pins assigned to P2 pins:
The initial diagram quartered the 40pin DIP pins and routed them through the upper PCB to the P2 without any constraints other than the easiest CAD bias. Now that the task involves more thought about how to structure the P2 code, there are obvious advantages to putting certain Z80 signals together, to make the P2 code run more efficiently. As such, these pin assignments will take precedence over the prior CAD bias.

In all bus cycles, the Address and Data pins change the most in comparison to control signals. Address and Data are also often used in binary calculations, so it is advantageous to assign them to P2 ports in the same bit order. As the signal simulation will be dealing with Address and Data most of the time, having their pins aligned to ports in binary order avoids bit-shift realignments. In the P2 there are basically two ports of 32 I/O pins. While you can select the pins affected by a command, and there are commands to mask and shift as necessary, it's better to avoid them to gain efficiency.

So... A15:00 should be assigned to a contiguous 16-bit I/O frame on one port taking advantage of binary math positions. {A15:00}==P2_I/O_Port0_pin{p15:00} and likewise {D07:00}==P2_I/O_Port1_pin{p07:00}. No surprise; data can be directly transferred between registers and 16-bit registers can be directly written to address signals.

In contrast, if this were a design to make reverse engineering more difficult, scrambling the pin assignments and contorting how they're handled by the very capable P2 would be added as a complicating, first level of dissuasion. Yes, working backwards from the Z80 pins would identify which pins were which, by the back-path, but what you can do inside processors is how you make a reverse engineer go insane... but that's another topic. :)

When a bus cycle is started, the eight Z80 output control pins would likely be pulled from a table and written to an 8-bit frame in a P2 32-bit port. Seven of these eight signals are always OUTPUTs (though they may be in high impedance sometimes, as in following a BUSACK#). IORQ# has a role during interrupt bus cycles to grab an interrupt vector from the DATA bus.

There are six Z80 control inputs when CLK is included as an input, which the P2 monitors as a timing reference for subsequent signal transitions. The P2 follows the system clock because other hardware in the vintage system may be using that same CLK source. There are likely exceptions, but the strategy is to make the P2(Z80) broadly compatible in vintage systems.

These 38 Z80 signal assignments are shown in the following text diagram:
	P2 Port 0, 32-bits
p0b31   ...available I/O
p0b30   ...available I/O
        Z80-CONTROL-INPUT-------Order: More Frequent is Lower
p0b29   p26 RESET#      INP     
p0b28   p25 BUSREQ#     INP
p0b27   p17 NMI#        INP
p0b26   p16 INT#        INP
p0b25   p24 WAIT#       INP
p0b24   p06 CLK         INP
        Z80-CONTROL-OUTPUT------Group Position SET, Order NOT set
p0b23   p20 IORQ#       OUT
p0b22   p19 MREQ#       OUT
p0b21   p21 RD#         OUT
p0b20   p22 WR#         OUT
p0b19   p27 M1#         OUT
p0b18   p28 RFSH#       OUT
p0b17   p18 HALT#       OUT
p0b16   p23 BUSACK#     OUT
        Z80-ADDRESS-------------Group Position SET, Order SET
p0b15   p05 A15         OUT
p0b14   p04 A14         OUT
p0b13   p03 A13         OUT
p0b12   p02 A12         OUT
p0b11   p01 A11         OUT
p0b10   p40 A10         OUT
p0b09   p39 A09         OUT
p0b08   p38 A08         OUT
p0b07   p37 A07         OUT
p0b06   p36 A06         OUT
p0b05   p35 A05         OUT
p0b04   p34 A04         OUT
p0b03   p33 A03         OUT
p0b02   p32 A02         OUT
p0b01   p31 A01         OUT
p0b00   p30 A00         OUT
---------------------------
	P2 Port 1, 32-bits
p1b31   ...available I/O
...
p1b08   ...available I/O
        Z80-DATA BI-DIR---------Group Position SET, Order SET
p1b07   p13 D07         BID
p1b06   p10 D06         BID
p1b05   p09 D05         BID
p1b04   p07 D04         BID
p1b03   p08 D03         BID
p1b02   p12 D02         BID
p1b01   p15 D01         BID
p1b00   p14 D00         BID
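
To show why the grouping above helps, here is a minimal Python sketch (not P2 code) that builds a port 0 output word from a 16-bit address plus the eight active-low control outputs, using the bit positions in the diagram. The function and constant names are hypothetical, and the order inside the control-output group is the one shown above, which is marked as not yet final.

```python
# Bit positions on P2 port 0, copied from the diagram above.
ADDR_MASK = 0xFFFF                 # A15:A00 sit on p0b15:p0b00, no shift needed
CTRL_OUT_BITS = {
    "BUSACK#": 16, "HALT#": 17, "RFSH#": 18, "M1#": 19,
    "WR#": 20, "RD#": 21, "MREQ#": 22, "IORQ#": 23,
}
CTRL_OUT_MASK = 0xFF << 16         # the 8-bit control-output frame


def port0_word(address, asserted=()):
    """Build the port 0 output bits for one point in a bus cycle.

    Signals named in `asserted` are driven low (active); every other
    control output stays high (inactive).
    """
    word = (address & ADDR_MASK) | CTRL_OUT_MASK
    for name in asserted:
        word &= ~(1 << CTRL_OUT_BITS[name])
    return word & 0x00FFFFFF       # input and spare bits are left at 0 here


def data_from_port1(port1_value):
    """D07:D00 sit on p1b07:p1b00, so a simple mask recovers the byte."""
    return port1_value & 0xFF


# Opcode fetch from 1234h with M1#, MREQ# and RD# active:
print(hex(port0_word(0x1234, ("M1#", "MREQ#", "RD#"))))
```

Because the address occupies the low 16 bits in binary order, a 16-bit register value drops straight into the port word without any shifting.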

[Updated on: Sat, 23 April 2022 11:58]

Project (2): A P2 In-Circuit Z80 Emulator Can Eliminate Idle-Bus T-States [message #9786 is a reply to message #9765] Fri, 25 February 2022 12:51
jayindallas
REMOVING UNNECESSARY T-STATES FROM Z80 BUS CYCLES:
When pondering ways for a P2(Z80) to speed up a vintage system, the question is usually shot down by the existing motherboard's slow components or by wait states that are generated to slow the original Z80 down. There is no easy way to overcome these impediments, other than to remap the vintage memory into a faster memory resource.

However, there are some cases where a real Z80 consumes additional T-states just to do some internal operations while the vintage system just waits for the next bus cycle to start. These "Idle Bus T-states" slow the vintage system down a little and can be removed by a fast in-circuit emulator like the P2 without affecting the vintage system at all.

Note that the performance enhancement from removing these unnecessary T-states will be neither impressive nor noticeable to the user, because the most common (basic) instructions do not require additional T-states.


FINDING INSTRUCTIONS WITH IDLE-BUS T-STATES:
A clue to this is in the Z80 specification, which describes an Opcode Fetch (M1 cycle) as being {4,5,6} T-states long (excluding wait states). There are two phases of an opcode (and prefix-opcode) fetch: (1) the hardware memory read of the opcode into the Z80 instruction register, and (2) the phase in which the instruction register changes the microcode address, sets flags and resource selections with various internal operations, and creates additional bus cycles. Phase 1 must always be the same because the micro has no foreknowledge of the opcode being loaded. That logically means that all opcodes load into the Z80 in the same amount of time, and only phase 2, where the opcode affects the internal and external operation of the Z80 through transfers and bus cycles, can change the full instruction timing.

We know from several methods that the basic phase 1 of an instruction fetch takes 4 T-states. Three sources of this information are revealed by example:

(1) The Z80-CPU, Z80A-CPU Technical Manual states that the refresh cycle occurs without any loss of performance so that static memory and dynamic memory would be equally efficient when running the Z80. That means that the 4 T-state fetch would require 4 T-states whether or not the refresh was added to that bus cycle. Thus it takes 4 T-states to fetch an opcode into the instruction register, so why not add a refresh to the bus cycle?

(2) By minimal example, the NOP instruction is the fastest instruction with the fewest internal operations, and it takes 4 T-states because it has no additional internal operations to perform. A lot of 8-bit register to 8-bit register instructions can operate within 4 T-states too.

(3) By analyzing the necessary bus cycles for some instructions, the total bus cycle T-states add up to the minimum T-states for the series of fetches, memory reads/writes and port inputs/outputs, with no extra T-states added (the most noticeable exceptions involve stack pointer additions and (ix+d)/(iy+d) calculations).

Note that the T-state calculations above are done without wait states, as the latter are only relevant to assuring the hardware is capable of performing the necessary bus activity. If calculations are done with wait states, they're added and subtracted back out, and thus just waste calculation time and effort.

So the rule is, opcode-prefixes and opcodes all load in 4 T-states.

Z80 specified memory reads and memory writes take only 3 T-states. Port inputs and outputs take 4 T-states (remember one wait state is inserted by the Z80 itself). Any bus cycle that takes longer than these segments, has Idle-Bus T-states that can be removed.

EXAMPLE:
F9h       ld sp,hl       ;BMT= 1,1,6 where B=bytes, M=machine cycles, T= clock periods or T-states
This is a single-byte instruction that is fetched by an M1 cycle that is 6 T-states long. As a single-byte instruction fetch with refresh takes only 4T, the additional 2T is due to the extra time the Z80 needs to internally copy the HL register value into the Stack Pointer register. That's 2T that can be removed by a P2 emulating a Z80.
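
The rule can be written as a small calculation. This is only a Python sketch of the arithmetic, using the minimum bus cycle lengths stated above; the cycle names and the `idle_t_states` helper are invented for the example.

```python
# Minimum bus-cycle lengths stated above (T-states, no wait states).
MIN_T = {
    "fetch": 4,       # opcode or prefix fetch, including refresh
    "mem_read": 3,
    "mem_write": 3,
    "port_in": 4,     # includes the one wait state the Z80 inserts itself
    "port_out": 4,
}


def idle_t_states(cycles):
    """cycles: list of (kind, t_states) as the real Z80 executes them.

    Anything beyond the minimum for each bus cycle counts as idle-bus time.
    """
    return sum(t - MIN_T[kind] for kind, t in cycles)


ld_sp_hl = [("fetch", 6)]          # F9h, BMT = 1,1,6
nop = [("fetch", 4)]
print(idle_t_states(ld_sp_hl))     # 2
print(idle_t_states(nop))          # 0
```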


NOW A RADICAL REDEFINITION OF Z80 REFRESH:
The Z80 refresh method is to take advantage of an already necessary 4 T-state opcode fetch bus cycle and insert a refresh at no additional cost in timing. This method does not refresh on a minimum timing basis, but by a transparent machine-gunning approach. The refresh is transparent in the sense that adding it to the fetch bus cycle didn't make the bus cycle take any longer. Since it's transparent, it doesn't matter within the Z80 if refreshes are done an excessive number of times, because there is no time penalty.

But when a modern processor like the P2 is performing as an in-circuit Z80 emulator, that refresh is now a costly delay of 1 T-state on every opcode fetch. There is no longer a transparent hole in which to insert a refresh. The modern emulator can efficiently schedule appended refresh bus activity without machine-gunning the bus with unnecessary refreshes. Thus fewer bus T-states are used and the vintage system runs more efficiently. Opcode fetches are reduced from 4 T-states down to 3 T-states, but scheduled refreshes put some of those 4 T-state refreshes back on read or fetch bus cycles as minimally necessary.

If the modern in-circuit emulator scheduled a refresh, it could either add one 2 T-state refresh-only cycle or, more cleverly, blend a refresh into the next memory read bus cycle or the next opcode fetch bus cycle. The only difference is that these more efficient refreshes are not machine-gunned but issued by timing, when necessary, and inserted into the next (1) memory read or (2) opcode fetch at the cost of adding only 1 T-state because of the blended bus cycle construction. This requires fewer refreshes than the Z80 generated and it inserts them at a bus cycle cost of only 1 T-state. Basically it takes all the Z80 machine-gunning refreshes away, and only puts some back at the scheduled dynamic memory refresh rate.

If the vintage system uses static memory that needs no refreshes, then all refreshes can be turned off in the in-circuit Z80 emulator. If a mix of DRam and SRam is used, the emulator can be given a system map and only use refresh when accessing blocks of DRam.
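
A rough Python model of the scheduled, blended refresh described above might look like the following. It is a sketch under stated assumptions, not the emulator's design: the class name, the cycle names and the refresh interval (counted in T-states) are all hypothetical, and the real interval would come from the DRAM parts in the vintage system.

```python
class RefreshScheduler:
    """Blend a refresh into the next fetch or memory read once one is due."""

    def __init__(self, interval_t, enabled=True):
        self.interval_t = interval_t   # T-states between refreshes (assumed value)
        self.enabled = enabled         # False for all-SRam systems
        self.elapsed = 0               # T-states since the last refresh

    def bus_cycle(self, kind):
        """Return the T-states used by one bus cycle of the given kind."""
        # Fetches and memory accesses take 3T without refresh; port I/O takes 4T.
        t_states = 4 if kind in ("port_in", "port_out") else 3
        due = self.enabled and self.elapsed >= self.interval_t
        if due and kind in ("fetch", "mem_read"):
            t_states += 1              # blended refresh costs only 1 T-state
            self.elapsed = 0
        else:
            self.elapsed += t_states
        return t_states


sched = RefreshScheduler(interval_t=10)
sequence = ["fetch", "fetch", "mem_write", "fetch", "mem_write", "fetch"]
print([sched.bus_cycle(kind) for kind in sequence])
```

With the made-up interval of 10 T-states, only the last fetch in this sequence is stretched to 4 T-states; a memory write never carries a refresh, so a due refresh waits for the next fetch or read.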



Next:
I'll introduce additional T-state reductions to make bus cycle use more efficient:
(1) Some Z80 instructions read operands that they will not use; save 1 or 2 bus cycles by not reading them.
(2) Some Z80 instructions issue 3 refreshes; if the scheduled refresh is turned off, these instructions can issue 2 refreshes and be ok.
(3) The auto-loop instructions are poorly implemented when compared to what the emulator can do.

[Updated on: Wed, 06 April 2022 10:07]

Re: Project (2): P2(Z80) Can Eliminate Bus-Idle T-States For Microcoded Internal Operations [message #9788 is a reply to message #9786] Sat, 26 February 2022 17:26
cluso99
@jayindallas
Very interesting discussion and observations.

If you were to go a little deeper into the Z80 board you are connecting to, you could indeed have the memory (SRAM/DRAM/ROM) emulated from within the P2. This is what I've done although I have replaced the whole Z80 system including I/O too.

It's possible to do this in two ways
1. If there is no video memory in the ram/rom space, then the whole ram/rom can be done inside the P2 and the ram/rom fetch cycle could be discarded completely.
2. If there is video memory in the ram/rom space, then you could
a. emulate the ram/rom internally in the P2 and just execute writes to ram/rom to both internal and external, or
b. check for the hole in required ram/rom and perform this externally
c. just use the external ram/rom (ie don't emulate inside the P2)
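
Option 2a above can be pictured with a short Python sketch: reads are served from an internal copy, while every write goes to both the internal copy and the real board so that video memory stays current. The class name and the `external_write` callback are hypothetical stand-ins for whatever performs the real bus write, and the sketch assumes nothing else on the board (DMA, for instance) writes to memory behind the emulator's back.

```python
class ShadowMemory:
    """Reads come from an internal copy; writes go to both copies."""

    def __init__(self, external_write, size=0x10000):
        self.ram = bytearray(size)             # internal copy of the 64KB space
        self.external_write = external_write   # performs the real bus write

    def read(self, addr):
        # No external bus cycle is needed for a read.
        return self.ram[addr & 0xFFFF]

    def write(self, addr, value):
        addr &= 0xFFFF
        value &= 0xFF
        self.ram[addr] = value
        self.external_write(addr, value)       # keeps video ram on the board in step


bus_log = []
mem = ShadowMemory(lambda addr, value: bus_log.append((addr, value)))
mem.write(0x3C00, 0x41)
print(mem.read(0x3C00), bus_log)
```

Any ROM contents would have to be copied into the internal array once at start-up before reads could be served internally.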

[Updated on: Thu, 31 March 2022 22:21] by Moderator

Project (2): In-Circuit Z80 Emulator :: T-state Reductions [message #9792 is a reply to message #9788] Sun, 27 February 2022 18:02
jayindallas
REMOVING WASTED BUS CYCLES FROM Z80 CONDITIONAL BRANCH INSTRUCTIONS
The Z80 executes conditional branches {JP,CALL,JR,DJNZ} by fetching the entire instruction, opcode and operands. Conditional {JP,CALL} instructions fetch the opcode and read two operands designating the branch address. Conditional {JR,DJNZ} instructions fetch the opcode and read one operand designating the branch-relative address.

The condition can be tested immediately after the opcode is fetched. Before reading the operand, it can be known whether the condition is TRUE or FALSE. If the condition is TRUE the operands are needed to make the branch. If the condition is FALSE the operands are not needed because no branch is executed.

It follows that a more efficient execution of conditional branch instructions would be to only read the operands when the condition is TRUE, and to skip past the operands when the condition is FALSE.

The P2 emulation of the Z80 can take this advantage. When the condition is false, it can bump the Program Counter past the operands and end the conditional branch execution.
                        ;   |<-FALSE-->|<-------TRUE--------->|
Z80:    jp cc,nn        ;Mc=|(4f,3f,3f)|Mc=(4f,3f,3f)         | Z80: if false 10 T-states, if true 10 T-states
                        ;   |          |                      |
P2:     jp cc,nn        ;Mc=|(4f,PC+=2)|Mc=(4f,3f,3f)         | P2:  if false  4 T-states, if true 10 T-states
                        ;   |    ***** |                      |

                        ;   |<-FALSE-->|<-------TRUE--------->|
Z80:    call cc,nn      ;Mc=|(4f,3f,3f)|Mc=(4f,3f,3f,1i,3w,3w)| Z80: if false 10 T-states, if true 17 T-states (include pushing return address onto the stack)
                        ;   |          |             **       |                                                ** Z80 has 1 T-state for internal operations
P2:     call cc,nn      ;Mc=|(4f,PC+=2)|Mc=(4f,3f,3f,3w,3w)   | P2:  if false  4 T-states, if true 16 T-states (include pushing return address onto the stack)
                        ;   |    ***** |                      |
The false condition machine cycle above is listed in the "no branch" position (left) as "(4f,PC+=2)", meaning: fetch the opcode and, if false, bump the program counter two address bytes past the two operands, landing at the next instruction address.
                        ;   |<-FALSE-->|<-------TRUE--------->|
Z80:    jr cc,e         ;Mc=|(4f,3f)   |Mc=(4f,3f,5i)         | Z80: if false 7 T-states, if true 12 T-states  
                        ;   |          |          **          |                                                ** Z80 has 5 T-state for calculating program counter for jump
P2:     jr cc,e         ;Mc=|(4f,PC+=1)|Mc=(4f,3f)            | P2:  if false 4 T-states, if true 7 T-states
                        ;   |   *****  |                      |

                        ;   |<-FALSE-->|<-------TRUE--------->|
Z80:    djnz e          ;Mc=|(4f,1i,3f)|Mc=(4f,1i,3f,5i)      | Z80: if false 8 T-states, if true 13 T-states  ** Z80 has 1 T-state for internal operations after opcode
                        ;   |    **    |       **    **       |                                                ** Z80 has 5 T-state for calculating program counter for jump
P2:     djnz e          ;Mc=|(4f,PC+=1)|Mc=(4f,3f)            | P2:  if false 4 T-states, if true 7 T-states
                        ;   |    ***** |                      |
The false condition machine cycle above is listed in the "no branch" position (left) as "(4f,PC+=1)", meaning: fetch the opcode and, if false, bump the program counter one address byte past the one operand, landing at the next instruction address.
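
The totals in the listings above can be checked mechanically. The following Python sketch sums the T-state digits in each machine-cycle string and subtracts the P2 total from the Z80 total; the strings are copied from the listings, while the `t_states` and `savings` helpers are just illustrative names.

```python
import re


def t_states(mc):
    """Sum the T-state counts in a machine-cycle string like '4f,3f,3f'.

    Only numbers followed by a cycle letter (f, r, w, i) are counted, so a
    'PC+=2' entry contributes no bus time.
    """
    return sum(int(n) for n in re.findall(r"(\d+)[a-z]", mc))


# (Z80 false, Z80 true, P2 false, P2 true), copied from the listings above.
BRANCHES = {
    "jp cc,nn":   ("4f,3f,3f", "4f,3f,3f", "4f,PC+=2", "4f,3f,3f"),
    "call cc,nn": ("4f,3f,3f", "4f,3f,3f,1i,3w,3w", "4f,PC+=2", "4f,3f,3f,3w,3w"),
    "jr cc,e":    ("4f,3f", "4f,3f,5i", "4f,PC+=1", "4f,3f"),
    "djnz e":     ("4f,1i,3f", "4f,1i,3f,5i", "4f,PC+=1", "4f,3f"),
}


def savings(name):
    """Return (T-states saved when false, T-states saved when true)."""
    z80_false, z80_true, p2_false, p2_true = (t_states(mc) for mc in BRANCHES[name])
    return z80_false - p2_false, z80_true - p2_true


for name in BRANCHES:
    print(f"{name:12s} saves {savings(name)}")
```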


Removing Excessive Refreshes From Some Z80 Instructions
The following Z80 instructions are four bytes long and structured as prefix1,prefix2,operand,opcode. It's an unusual construction that pushes the opcode to the very last position. Prefix1 is DDh or FDh to make the operand apply to the IX or IY index register. Prefix2 is CBh to specify the extended instruction set in the Z80 that does {bit,res,set} single bit operations and the eight-bit rotate and shift operations. The third byte is the Delta (+d) needed by the selected IX or IY register. The last byte is the opcode that sets the operation.

These instructions are unique in having three refresh cycles in a four byte instruction. If scheduled refresh timing is disabled and regular Z80 opcode-refreshes are being used, these special instructions will drop the third refresh.

The in-circuit Z80 emulator drops the third refresh from these instructions and denotes the fetch as "3f*,":
        bit b,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r)
        bit b,r,(irr+d) ;P2+ Mc=(4f,4f,3f,3f*,3r)
        res b,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        res b,r,(irr+d) ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        set b,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        set b,r,(irr+d) ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)

        rl  (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w) 
        rl  r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        rlc (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        rlc r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        rr  (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        rr  r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        rrc (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        rrc r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)

        sla (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        sla r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        sll (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        sll r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        sra (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        sra r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        srl (irr+d)     ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
        srl r,(irr+d)   ;P2+ Mc=(4f,4f,3f,3f*,3r,3w)
More than half of the instructions above are Z80 "undocs," instructions that work but are not specified by Z80 manufacturers.
The oddest instruction is the tertiary operand like the following example:
                                  ;example res b,(ix+d) where A6 opcode specifies a 110b register not in the normal table
DDCB02A6  res     4,(ix+2)        ;bit 4 is reset to 0 in memory location (ix+2)
                                  ;
                                  ;if you change the opcode to specify a register in {a,b,c,d,e,h,l}, such as 111b for register A, it leaves a copy of the result in the selected register.
DDCB02A7  res     4,a,(ix+2)      ;bit 4 is reset to 0 in memory location (ix+2), but changing 110b to 111b in the opcode byte, specifies register A
                                  ;an undoc'er found that a copy of the result written to memory was also left in the specified register
                                  ;this is a tertiary instruction because it has three operands {bit 4,register a, memory address (ix+2)}
It's actually a very useful instruction. In the example above, it changes the value at (ix+2) to the result but leaves a copy in the specified register, so if you need to make another test on that value soon, you can reference internal register A instead of going back to the slow access memory location.
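
To make the byte layout concrete, here is a small Python decoder sketch for these four-byte instructions. It follows the prefix1,prefix2,operand,opcode structure described above and the standard CBh opcode bit fields; the function name and the way the three-operand form is printed are my own choices for illustration.

```python
ROT_OPS = ["rlc", "rrc", "rl", "rr", "sla", "sra", "sll", "srl"]
REGS = ["b", "c", "d", "e", "h", "l", None, "a"]   # 110b = no register copy


def decode_indexed_cb(code):
    """Decode a 4-byte prefix1,prefix2,operand,opcode instruction."""
    prefix1, prefix2, disp, opcode = code
    if prefix1 not in (0xDD, 0xFD) or prefix2 != 0xCB:
        raise ValueError("not a DDh/FDh + CBh instruction")
    index = "ix" if prefix1 == 0xDD else "iy"
    d = disp - 256 if disp >= 128 else disp        # delta is a signed byte
    group = opcode >> 6                            # 0=rotate/shift 1=bit 2=res 3=set
    y = (opcode >> 3) & 7                          # bit number or rotate/shift type
    reg = REGS[opcode & 7]                         # register that receives the copy
    mem = f"({index}{d:+d})"
    if group == 0:
        operands = [mem] if reg is None else [reg, mem]
        return f"{ROT_OPS[y]} " + ",".join(operands)
    mnemonic = ["bit", "res", "set"][group - 1]
    operands = [str(y)] + ([] if reg is None else [reg]) + [mem]
    return f"{mnemonic} " + ",".join(operands)


print(decode_indexed_cb(bytes.fromhex("DDCB02A6")))   # res 4,(ix+2)
print(decode_indexed_cb(bytes.fromhex("DDCB02A7")))   # res 4,a,(ix+2)
```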

However, most assemblers were written assuming at most two operands, so these instructions were probably not documented. Among documents on the "undoc" instructions, it's often said that the computer game companies took maximum use out of them to get more power in their games.


Next: Speeding Up Auto Looping Instructions
The instructions that auto-loop until a termination condition or two is met, can see big speed gains when executed by an in-circuit Z80 emulator, instead of the Z80 itself.

[Updated on: Sat, 30 April 2022 17:57]

Project (2): In-Circuit Z80 Emulator :: P2(Z80) Table of T-state savings [message #9824 is a reply to message #9650] Fri, 18 March 2022 14:31
jayindallas
This tabulation shows the T-state savings that the P2 can achieve without affecting the vintage system it runs on.

TABLE DESCRIPTION:


GROUP COLUMN: "Assembly Language"
---------------------------------
This set includes columns for:
(1a) "Pseudo Code" concisely describes a group of very similar instructions on each line.
(1b) "Instruction" lists each individual instruction, expanded from the Pseudo Code block.
(2) "Typ" for Type; an assigned class of instruction sets that mostly use the same machine cycles and T-States.
(3) "Category" divides instructions into 3 classes: Z80 only, Z80 & 8080 and Z80 undocumented.
Note that "Type" is useful in collecting sets of instructions that can be efficiently bundled in emulator code, as a first layer of grouping. And as the P2 will be emulating a Z80's operation, I add this info toward that later stage.


GROUP COLUMN: "Machine Code"
---------------------------------
This set includes columns for:
(1) "Cnt" for Count; obviously this is the number of bytes in the instruction.
(2) "B1" is the first byte and includes either its hex code or if pseudo, "pf" for prefix, "oc" for opcode, or "or" for operand.
(3) {"B2","B3","B4"} is obviously the rest of the bytes of the instruction, as described in (2)
B1:4 helps to figure necessary machine cycles reading the instruction, before applying additional machine cycles for memory or port access.


GROUP COLUMN: "Z80 Timing"
---------------------------------
This group is first split between instructions that do not branch and instructions that do branch, as their timing is usually different.
A branch could be described as an instruction that changes the program counter beyond the normal consecutive order.
Examples would be JP, JR, DJNZ, JP cc, CALL, CALL cc, RET, RET cc, RST n, and the special automatic CP/LD/IN/OUT.
Note that some timing values are undocumented and denoted by a "?" entry. Some Z80 "undocs" have timing information, some don't (yet).
This set includes columns for:
(1) "MC" for Machine Cycles; these are variable-length bus activities that access memory or ports, or move registers around internally.
(2) "TS" for T-states; these are the total number of clock cycles for the Z80 to execute the full instruction, without wait states.
As most instructions do not branch, the no-branch section is listed first. A JP nn instruction should be listed only under the branch section.


GROUP COLUMN: "P2(Z80) Timing"
---------------------------------
This group is structured like the Z80 timing just described. The only difference is that it includes a third column in each of the No Branch and Branch sections:
"Tsav" for T-states saved. If a number is provided, then the P2(Z80) can execute the instruction more efficiently by cutting out unnecessary
bus delays. These removed delays are caused by the Z80's microcode, which can take more time to complete than a minimum bus cycle requires.
These delays are hard coded in the microprocessor and cannot be hidden in available wait states. For example, any time the STACK is manipulated or an index plus delta "(irr+d)" instruction is executed, the Z80 has to take additional time to move registers around internal resources, and these delays account for bus delays that the P2(Z80) can simply remove without adversely affecting the vintage computer system.


TABLE LEGEND:
---------------------------------
"ir" = {xh,xl,yh,yl} :: pseudo code reference for a high or low 8-bit register of IX or IY, accessible in some Z80 undocumented instructions
"irr" = {ix,iy} :: pseudo code reference for any 16-bit index register
"?" :: is used when undocs have no timing info and when the "Tsav" cannot be calculated because the Z80 timing is undocumented
"Tsav" :: P2 savings over Z80 T-state; a simple subtraction. If the value is zero, no entry is listed.
"r" = 8-bit register :: It loosely defines whatever such registers are available to the particular instruction group.
"rr" = 16-bit register :: It loosely defines whatever such registers are available to the particular instruction group.
"P2 Mc=(4f,4i)" :: details how the P2 will emulate a machine cycle sequence. Suffixes used: "f" = fetch, "r"/"w" = memory read/write, "i"/"o" = port input/output.
"P2+" :: denotes a P2 savings beyond the simple removal of unnecessary bus cycle T-states that the Z80 used but the P2 doesn't need.
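
To make the "Mc" notation and the "Tsav" subtraction concrete, here is a minimal Python sketch. The function names (parse_mc, t_states, tsav) are my own and not part of any P2 tooling, and it only handles the plain comma-separated sequences, not the looped "P2+" forms.

```python
import re

# Suffix letters from the legend: f = fetch, r/w = memory, i/o = port.
KINDS = {"f": "fetch", "r": "memory read", "w": "memory write",
         "i": "port input", "o": "port output"}

def parse_mc(note):
    """Turn a note such as 'P2 Mc=(4f,4f,3f,3r)' into [(4, 'fetch'), ...]."""
    return [(int(t), KINDS[k]) for t, k in re.findall(r"(\d+)([frwio])", note)]

def t_states(note):
    """Total T-states: the sum of the machine-cycle lengths."""
    return sum(t for t, _ in parse_mc(note))

def tsav(z80_ts, note):
    """T-states saved: Z80 total minus P2 total (a simple subtraction)."""
    return z80_ts - t_states(note)

print(t_states("P2 Mc=(4f,4f,3f,3r)"))   # 14
print(tsav(19, "P2 Mc=(4f,4f,3f,3r)"))   # 5
```

The example uses the "adc a,(irr+d)" row: the Z80 needs 19 T-states, the P2 sequence adds up to 14, so Tsav is 5.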


P2+ CASES
---------------------------------
(This section has been updated in the next message)


8-BIT AND 16-BIT INSTRUCTIONS ARE GROUPED TOGETHER
---------------------------------
Another departure from Zilog documentation is the bundling of 8-bit and 16-bit operand instructions such as "add a,b" and "add hl,bc".
The reason for bundling them is that emulator firm/soft-ware can handle both with small conditionals and thus reduce redundant code.
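
As a rough illustration of that bundling (not the actual emulator code, which would be P2 assembly), a single add routine can take the operand width as a parameter. The name alu_add and its arguments are hypothetical.

```python
def alu_add(lhs, rhs, width=8, carry_in=0):
    """Add two operands of `width` bits; return (result, carry_out)."""
    mask = (1 << width) - 1
    total = (lhs & mask) + (rhs & mask) + carry_in
    return total & mask, int(total > mask)

print(alu_add(0xF0, 0x20))                 # (16, 1)  like "add a,b"
print(alu_add(0xFFF0, 0x0020, width=16))   # (16, 1)  like "add hl,bc"
```

The "small conditionals" would then cover the differences this sketch leaves out, such as which flags each form updates.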

Z80 UNDOCUMENTED INSTRUCTIONS
---------------------------------
The table includes most potentially useful 'undocs' and a few redundant ones. Most undocs don't have their Mc and Ts documented, but they can be deduced from the nature of other, similar Z80 instructions. The table update now uses likely Z80 Mc and Ts for undocs. On undoc instructions in the following table, assumed values have a "?" appended, and instead of listing the P2 Mc/Ts in the right margin notes, the assumed Z80 Mc is shown. When the P2 Tsav value is uncertain, the value is appended with a "?".


The complete table is below:
||                                  ||                || Z80 Timing        || P2(Z80) Timing              || NOTES:
||Assembly Language                 ||Machine Code    ||No Branch|Branch   ||No Branch     |Branch        ||
||PSEUDO CODE      |TYP|CATEGORY    ||CNT|B1|B2 |B3|B4||MC  |TS  |MC  |TS  ||MC  |TS  |Tsav|MC  |TS  |Tsav||
|| adc a,r         | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| adc a,ir        | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?{4f,4f} ir={xh,xl,yh,yl}
|| adc hl,rr       | 3 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 2  |  8 |  7 |    |    |    ||P2 Mc=(4f,4f) 16-bit
|| adc a,n         | 4 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| adc a,(hl)      | 5 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| adc a,(irr+d)   | 6 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| add a,r         | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| add a,ir        | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?{4f,4f} ir={xh,xl,yh,yl}
|| add hl,rr       | 3 | Z80 & 8080 || 1 |oc|   |  |  || 3  | 11 |    |    || 1  |  4 |  7 |    |    |    ||P2 Mc=(4f) 16-bit
|| add irr,rr      | 4 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 2  |  8 |  7 |    |    |    ||P2 Mc=(4f,4f) 16-bit, irr={ix,iy}
|| add a,n         | 5 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| add a,(rr)      | 6 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| add a,(irr+d)   | 7 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| and r           | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| and ir          | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?{4f,4f} ir={xh,xl,yh,yl}
|| and n           | 3 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| and (hl)        | 4 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| and (irr+d)     | 5 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| bit b,r         | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| bit b,(hl)      | 2 | Z80 only   || 2 |pf|oc |  |  || 3  | 12 |    |    || 3  | 11 |  1 |    |    |    ||P2 Mc=(4f,4f,3r)
|| bit b,(irr+d)   | 3 | Z80 only   || 4 |pf|pf |or|oc|| 5  | 20 |    |    || 5  | 17 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r)
|| bit b,r,(irr+d) | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 5  | 20?|    |    || 5  | 17 | 3? |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r) r<<-result
|| call nn         | 1 | Z80 & 8080 || 3 |oc|or |or|  ||    |    | 5  | 17 ||    |    |    | 5  | 16 |  1 ||P2 Mc=(4f,3f,3f,3w,3w)
|| call cc,nn      | 2 | Z80 & 8080 || 3 |oc|or |or|  || 3  | 10 | 5  | 17 || 1  |  4 |  6 | 5  | 16 |  1 ||P2+ Mc=(4f,PC+=2)|Mc=(4f,3f,3f,3w,3w)
|| cp r            | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| cp ir           | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| cp n            | 3 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| cp (hl)         | 4 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| cp (irr+d)      | 5 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| cpd             | 1 | Z80 only   || 2 |ED|A9 |  |  || 4  | 16 |    |    || 3  | 11 |  5 |    |    |    ||P2 Mc=(4f,4f,3r)
|| cpi             | 1 | Z80 only   || 2 |ED|A1 |  |  || 4  | 16 |    |    || 3  | 11 |  5 |    |    |    ||P2 Mc=(4f,4f,3r)
|| cpdr            | 2 | Z80 only   || 2 |ED|B9 |  |  || 4  | 16 | 5  | 21 || 3  | 11 |  5 | 1+ | 3+ | 18 ||P2+ Mc=(4f,4f,(L*(3r)+Rt*4f))
|| cpir            | 2 | Z80 only   || 2 |ED|B1 |  |  || 4  | 16 | 5  | 21 || 3  | 11 |  5 | 1+ | 3+ | 18 ||P2+ Mc=(4f,4f,(L*(3r)+Rt*4f))
|| ccf             | 1 | Z80 & 8080 || 1 |3F|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| cpl             | 2 | Z80 & 8080 || 1 |2F|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| daa             | 3 | Z80 & 8080 || 1 |27|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| di              | 4 | Z80 & 8080 || 1 |F3|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| ei              | 5 | Z80 & 8080 || 1 |FB|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| halt            | 6 | Z80 & 8080 || 1 |76|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| im0             | 7 | Z80 only   || 2 |ED|46 |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| im1             | 8 | Z80 only   || 2 |ED|56 |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| im2             | 9 | Z80 only   || 2 |ED|5E |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| neg             | 10| Z80 only   || 2 |ED|44 |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| nop             | 11| Z80 & 8080 || 1 |00|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| scf             | 12| Z80 & 8080 || 1 |37|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| dec r           | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| dec ir          | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| dec rr          | 3 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  6 |    |    || 1  |  4 |  2 |    |    |    ||P2 Mc=(4f) 16-bit
|| dec irr         | 4 | Z80 only   || 2 |pf|oc |  |  || 2  | 10 |    |    || 2  |  8 |  2 |    |    |    ||P2 Mc=(4f,4f) 16-bit, irr={ix,iy}
|| dec (hl)        | 5 | Z80 & 8080 || 1 |oc|   |  |  || 3  | 11 |    |    || 3  | 10 |  1 |    |    |    ||P2 Mc=(4f,3r,3w)
|| dec (irr+d)     | 6 | Z80 only   || 3 |pf|oc |or|  || 6  | 23 |    |    || 5  | 17 |  6 |    |    |    ||P2 Mc=(4f,4f,3f,3r,3w)
|| djnz e          | 1 | Z80 only   || 2 |10|e-2|  |  || 2  |  8 | 3  | 13 || 1  |  4 |  4 | 2  |  7 |  6 ||P2+ Mc=(4f,PC+=1)|Mc=(4f,3f)
|| ex de,hl        | 1 | Z80 & 8080 || 1 |EB|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| ex af,af'       | 2 | Z80 only   || 1 |08|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| exx             | 3 | Z80 only   || 1 |D9|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| ex (sp),hl      | 4 | Z80 & 8080 || 1 |E3|   |  |  || 5  | 19 |    |    || 5  | 16 |  3 |    |    |    ||P2 Mc=(4f,3r,3w,3r,3w)
|| ex (sp),ix      | 5 | Z80 only   || 2 |DD|E3 |  |  || 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2 Mc=(4f,4f,3r,3w,3r,3w)
|| ex (sp),iy      | 5 | Z80 only   || 2 |FD|E3 |  |  || 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2 Mc=(4f,4f,3r,3w,3r,3w)
|| in r,(n)        | 1 | Z80 & 8080 || 2 |oc|or |  |  || 3  | 11 |    |    || 3  | 11 |    |    |    |    ||P2 Mc={4f,3f,4i}
|| in r,(c)        | 2 | Z80 only   || 2 |pf|oc |  |  || 3  | 12 |    |    || 3  | 12 |    |    |    |    ||P2 Mc=(4f,4f,4i)
|| in f,(c)        | 3 | Z80-Undoc  || 2 |pf|oc |  |  || 3  | 12?|    |    || 3  | 12 | 0? |    |    |    ||P2 Mc=(4f,4f,4i) :: input renders flags
|| inc r           | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| inc ir          | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| inc rr          | 3 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  6 |    |    || 1  |  4 |  2 |    |    |    ||P2 Mc=(4f) 16-bit
|| inc irr         | 4 | Z80 only   || 2 |pf|oc |  |  || 2  | 10 |    |    || 2  |  8 |  2 |    |    |    ||P2 Mc=(4f,4f) 16-bit, irr={ix,iy}
|| inc (hl)        | 5 | Z80 & 8080 || 1 |oc|   |  |  || 3  | 11 |    |    || 3  | 10 |  1 |    |    |    ||P2 Mc=(4f,3r,3w)
|| inc (irr+d)     | 6 | Z80 only   || 3 |pf|oc |or|  || 6  | 23 |    |    || 5  | 17 |  6 |    |    |    ||P2 Mc=(4f,4f,3f,3r,3w)
|| ind             | 1 | Z80 only   || 2 |ED|AA |  |  || 4  | 16 |    |    || 4  | 15 |  1 |    |    |    ||P2 Mc=(4f,4f,4i,3w)
|| ini             | 1 | Z80 only   || 2 |ED|A2 |  |  || 4  | 16 |    |    || 4  | 15 |  1 |    |    |    ||P2 Mc=(4f,4f,4i,3w)
|| indr            | 2 | Z80 only   || 2 |ED|BA |  |  || 4  | 16 | 5  | 21 || 4  | 15 |  1 | 2+ | 7+ | 14 ||P2+ Mc=(4f,4f,(L*(4i+3w)+Rt*4f))
|| inir            | 2 | Z80 only   || 2 |ED|B2 |  |  || 4  | 16 | 5  | 21 || 4  | 15 |  1 | 2+ | 7+ | 14 ||P2+ Mc=(4f,4f,(L*(4i+3w)+Rt*4f))
|| jp nn           | 1 | Z80 & 8080 || 3 |oc|n  |n |  ||    |    | 3  | 10 ||    |    |    | 3  | 10 |    ||P2 Mc=(4f,3f,3f)
|| jp cc,nn        | 2 | Z80 & 8080 || 3 |oc|n  |n |  || 3  | 10 | 3  | 10 || 1  |  4 |  6 | 3  | 10 |    ||P2+ Mc=(4f,PC+=2)|Mc=(4f,3f,3f)
|| jp (rr)         | 3 | Z80 & 8080 || 1 |oc|   |  |  ||    |    | 1  |  4 ||    |    |    | 1  |  4 |    ||P2 Mc=(4f)
|| jp (irr)        | 4 | Z80 only   || 2 |pf|oc |  |  ||    |    | 2  |  8 ||    |    |    | 2  |  8 |    ||P2 Mc=(4f,4f) irr={ix,iy}
|| jr e            | 1 | Z80 only   || 2 |oc|or |  |  ||    |    | 3  | 12 ||    |    |    | 2  |  7 |  5 ||P2 Mc=(4f,3f)
|| jr cc,e         | 2 | Z80 only   || 2 |oc|or |  |  || 2  |  7 | 3  | 12 || 1  |  4 |  3 | 2  |  7 |  5 ||P2+ Mc=(4f,PC+=1)|Mc=(4f,3f)
|| ld r,r          | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| ld r,ir         | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| ld a,z          | 3 | Z80 only   || 2 |pf|oc |  |  || 2  |  9 |    |    || 2  |  8 |  1 |    |    |    ||P2 Mc=(4f,4f) z={i,r}
|| ld z,a          | 4 | Z80 only   || 2 |pf|oc |  |  || 2  |  9 |    |    || 2  |  8 |  1 |    |    |    ||P2 Mc=(4f,4f) z={i,r}
|| ld ir,r         | 5 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| ld ir,ir        | 6 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| ld sp,hl        | 7 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  6 |    |    || 1  |  4 |  2 |    |    |    ||P2 Mc=(4f) 16-bit
|| ld rr,irr       | 8 | Z80 only   || 2 |pf|oc |  |  || 2  | 10 |    |    || 2  |  8 |  2 |    |    |    ||P2 Mc=(4f,4f) irr={ix,iy}
|| ld r,n          | 9 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| ld ir,n         | 10| Z80-Undoc  || 3 |pf|oc |or|  || 3  | 11?|    |    || 3  | 11 | 0? |    |    |    ||Z80 Mc=?(4f,4f,3f) ir={xh,xl,yh,yl}
|| ld bc,nn        | 11| Z80 & 8080 || 3 |oc|or |or|  || 3  | 10 |    |    || 3  | 10 |    |    |    |    ||P2 Mc=(4f,3f,3f) 16-bit
|| ld irr,nn       | 12| Z80 only   || 4 |pf|oc |or|or|| 4  | 14 |    |    || 4  | 14 |    |    |    |    ||P2 Mc=(4f,4f,3f,3f) irr={ix,iy}
|| ld r,(rr)       | 13| Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| ld r,(irr+d)    | 14| Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r) irr={ix,iy}
|| ld a,(nn)       | 15| Z80 & 8080 || 3 |oc|or |or|  || 4  | 13 |    |    || 4  | 13 |    |    |    |    ||P2 Mc=(4f,3f,3f,3r)
|| ld hl,(nn)      | 16| Z80 & 8080 || 3 |oc|or |or|  || 5  | 16 |    |    || 5  | 16 |    |    |    |    ||P2 Mc=(4f,3f,3f,3r,3r) 16-bit
|| ld rr,(nn)      | 17| Z80 only   || 4 |pf|oc |or|or|| 6  | 20 |    |    || 6  | 20 |    |    |    |    ||P2 Mc=(4f,4f,3f,3f,3r,3r) 16-bit
|| ld hl,(nn)      | 18| Z80-Undoc  || 4 |pf|oc |or|or|| 6  | 20?|    |    || 6  | 20 | 0? |    |    |    ||Z80 Mc=?(4f,4f,3f,3f,3r,3r) 16-bit
|| ld irr,(nn)     | 19| Z80 only   || 4 |pf|oc |or|or|| 6  | 20 |    |    || 6  | 20 |    |    |    |    ||P2 Mc=(4f,4f,3f,3f,3r,3r) irr=(ix,iy)
|| ld (hl),r       | 20| Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3w)
|| ld (rr),a       | 21| Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3w)
|| ld (ix+d),a     | 22| Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3w)
|| ld (nn),a       | 23| Z80 & 8080 || 3 |oc|or |or|  || 4  | 13 |    |    || 4  | 13 |    |    |    |    ||P2 Mc=(4f,3f,3f,3w)
|| ld (nn),hl      | 24| Z80 & 8080 || 3 |oc|or |or|  || 5  | 16 |    |    || 5  | 16 |    |    |    |    ||P2 Mc=(4f,3f,3f,3w,3w) 16-bit
|| ld (nn),rr      | 25| Z80 only   || 4 |pf|oc |or|or|| 6  | 20 |    |    || 6  | 20 |    |    |    |    ||P2 Mc=(4f,4f,3f,3f,3w,3w) 16-bit
|| ld (nn),hl      | 26| Z80-Undoc  || 4 |pf|oc |or|or|| 6  | 20?|    |    || 6  | 20 | 0? |    |    |    ||Z80 Mc=?(4f,4f,3f,3f,3w,3w) 16-bit
|| ld (nn),irr     | 27| Z80 only   || 4 |pf|oc |or|or|| 6  | 20 |    |    || 6  | 20 |    |    |    |    ||P2 Mc=(4f,4f,3f,3f,3w,3w) irr={ix,iy}
|| ld (hl),n       | 28| Z80 & 8080 || 2 |oc|or |  |  || 3  | 10 |    |    || 3  | 10 |    |    |    |    ||P2 Mc=(4f,3f,3w)
|| ld (irr+d),n    | 29| Z80 only   || 4 |pf|oc |or|or|| 5  | 19 |    |    || 5  | 17 |  2 |    |    |    ||P2 Mc=(4f,4f,3f,3f,3w)
|| ldd             | 1 | Z80 only   || 2 |ED|A8 |  |  || 4  | 16 |    |    || 4  | 14 |  2 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| ldi             | 1 | Z80 only   || 2 |ED|A0 |  |  || 4  | 16 |    |    || 4  | 14 |  2 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| lddr            | 2 | Z80 only   || 2 |ED|B8 |  |  || 4  | 16 | 5  | 21 || 4  | 14 |  2 | 2+ | 6+ | 15 ||P2+ Mc=(4f,4f,(L*(3r+3w)+Rt*4f))
|| ldir            | 2 | Z80 only   || 2 |ED|B0 |  |  || 4  | 16 | 5  | 21 || 4  | 14 |  2 | 2+ | 6+ | 15 ||P2+ Mc=(4f,4f,(L*(3r+3w)+Rt*4f))
|| or r            | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| or ir           | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| or n            | 3 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| or (hl)         | 4 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| or (irr+d)      | 5 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| out (n),a       | 1 | Z80 & 8080 || 2 |oc|or |  |  || 3  | 11 |    |    || 3  | 11 |    |    |    |    ||P2 Mc=(4f,3f,4o)
|| out (c),r       | 2 | Z80 only   || 2 |pf|oc |  |  || 3  | 12 |    |    || 3  | 12 |    |    |    |    ||P2 Mc=(4f,4f,4o)
|| out (c),f       | 3 | Z80-Undoc  || 2 |pf|oc |  |  || 3  | 12?|    |    || 3  | 12 | 0? |    |    |    ||Z80 Mc=?(4f,4f,4o)
|| outd            | 1 | Z80 only   || 2 |ED|AB |  |  || 4  | 16 |    |    || 4  | 15 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,4o)
|| outi            | 1 | Z80 only   || 2 |ED|A3 |  |  || 4  | 16 |    |    || 4  | 15 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,4o)
|| otdr            | 2 | Z80 only   || 2 |ED|BB |  |  || 4  | 16 | 5  | 21 || 4  | 15 |  1 | 2+ | 7+ | 14 ||P2+ Mc=(4f,4f,(L*(3r+4o)+Rt*4f))
|| otir            | 2 | Z80 only   || 2 |ED|B3 |  |  || 4  | 16 | 5  | 21 || 4  | 15 |  1 | 2+ | 7+ | 14 ||P2+ Mc=(4f,4f,(L*(3r+4o)+Rt*4f))
|| pop rr          | 1 | Z80 & 8080 || 1 |oc|   |  |  || 3  | 10 |    |    || 3  | 10 |    |    |    |    ||P2 Mc=(4f,3r,3r) 16-bit
|| pop irr         | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 14 |    |    || 4  | 14 |    |    |    |    ||P2 Mc=(4f,4f,3r,3r) irr={ix,iy}
|| push rr         | 1 | Z80 & 8080 || 1 |oc|   |  |  || 3  | 11 |    |    || 3  | 10 |  1 |    |    |    ||P2 Mc=(4f,3w,3w) 16-bit
|| push irr        | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3w,3w) irr={ix,iy}
|| res b,r         | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| res b,(hl)      | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| res b,(irr+d)   | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| res b,r,(irr+d) | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-result
|| ret             | 1 | Z80 & 8080 || 1 |oc|   |  |  ||    |    | 3  | 10 ||    |    |    | 3  | 10 |    ||P2 Mc=(4f,3r,3r)
|| ret cc          | 2 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  5 | 3  | 11 || 1  |  4 |  1 | 3  | 10 |  1 ||P2 Mc=(4f)|P2 Mc=(4f,3r,3r)
|| reti            | 3 | Z80 only   || 2 |pf|oc |  |  ||    |    | 4  | 14 ||    |    |    | 4  | 14 |    ||P2 Mc=(4f,4f,3r,3r)
|| retn            | 4 | Z80 only   || 2 |pf|oc |  |  ||    |    | 4  | 14 ||    |    |    | 4  | 14 |    ||P2 Mc=(4f,4f,3r,3r)
|| rl r            | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| rl (hl)         | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| rl (irr+d)      | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| rl r,(irr+d)    | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| rlc r           | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| rlc (hl)        | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| rlc (irr+d)     | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| rlc r,(irr+d)   | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| rlca            | 1 | Z80 & 8080 || 1 |07|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| rrca            | 2 | Z80 & 8080 || 1 |0F|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| rla             | 3 | Z80 & 8080 || 1 |17|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| rra             | 4 | Z80 & 8080 || 1 |1F|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| rld             | 5 | Z80 only   || 2 |ED|6F |  |  || 5  | 18 |    |    || 4  | 14 |  4 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| rrd             | 6 | Z80 only   || 2 |ED|67 |  |  || 5  | 18 |    |    || 4  | 14 |  4 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| rr r            | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| rr (hl)         | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| rr (irr+d)      | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| rr r,(irr+d)    | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| rrc r           | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| rrc (hl)        | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| rrc (irr+d)     | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| rrc r,(irr+d)   | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| rst p           | 1 | Z80 & 8080 || 1 |oc|   |  |  ||    |    | 3  | 11 ||    |    |    | 3  | 10 |  1 ||P2 Mc=(4f,3w,3w)
|| sbc a,r         | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| sbc a,ir        | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?{4f,4f} ir={xh,xl,yh,yl}
|| sbc hl,rr       | 3 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 2  |  8 |  7 |    |    |    ||P2 Mc=(4f,4f) 16-bit
|| sbc a,n         | 4 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| sbc a,(hl)      | 5 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| sbc a,(irr+d)   | 6 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| set b,r         | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| set b,(hl)      | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| set b,(irr+d)   | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| set b,r,(irr+d) | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-result
|| sla r           | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| sla (hl)        | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| sla (irr+d)     | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| sla r,(irr+d)   | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| sll r           | 1 | Z80-Undoc  || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| sll (hl)        | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| sll (irr+d)     | 3 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| sll r,(irr+d)   | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| sra r           | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| sra (hl)        | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| sra (irr+d)     | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| sra r,(irr+d)   | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| srl r           | 1 | Z80 only   || 2 |pf|oc |  |  || 2  |  8 |    |    || 2  |  8 |    |    |    |    ||P2 Mc=(4f,4f)
|| srl (hl)        | 2 | Z80 only   || 2 |pf|oc |  |  || 4  | 15 |    |    || 4  | 14 |  1 |    |    |    ||P2 Mc=(4f,4f,3r,3w)
|| srl (irr+d)     | 3 | Z80 only   || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w)
|| srl r,(irr+d)   | 4 | Z80-Undoc  || 4 |pf|pf |or|oc|| 6  | 23 |    |    || 6  | 20 |  3 |    |    |    ||P2+ Mc=(4f,4f,3f,3f*,3r,3w) r<<-Result
|| sub a,r         | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| sub a,ir        | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?{4f,4f} ir={xh,xl,yh,yl}
|| sub a,n         | 3 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| sub a,(hl)      | 4 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| sub a,(irr+d)   | 5 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
|| xor r           | 1 | Z80 & 8080 || 1 |oc|   |  |  || 1  |  4 |    |    || 1  |  4 |    |    |    |    ||P2 Mc=(4f)
|| xor ir          | 2 | Z80-Undoc  || 2 |pf|oc |  |  || 2  | 8? |    |    || 2  |  8 | 0? |    |    |    ||Z80 Mc=?(4f,4f) ir={xh,xl,yh,yl}
|| xor n           | 3 | Z80 & 8080 || 2 |oc|or |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3f)
|| xor (hl)        | 4 | Z80 & 8080 || 1 |oc|   |  |  || 2  |  7 |    |    || 2  |  7 |    |    |    |    ||P2 Mc=(4f,3r)
|| xor (irr+d)     | 5 | Z80 only   || 3 |pf|oc |or|  || 5  | 19 |    |    || 4  | 14 |  5 |    |    |    ||P2 Mc=(4f,4f,3f,3r)
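
For anyone who wants to cross-check rows of this table by script, here is a small Python sketch that splits one row into its column groups and tests whether the P2 "No Branch" cells agree with the Mc note. The helper names are my own, and it assumes a documented, non-branching row with a plain Mc sequence (no "?" values, no "P2+" loop forms).

```python
import re

ROW = ("|| adc a,(irr+d)   | 6 | Z80 only   || 3 |pf|oc |or|  "
       "|| 5  | 19 |    |    || 4  | 14 |  5 |    |    |    "
       "||P2 Mc=(4f,4f,3f,3r)")

def split_row(row):
    """Split one table row into its column groups, then into cells."""
    groups = [g.split("|") for g in row.split("||")[1:]]
    asm, code, z80, p2, note = [[c.strip() for c in g] for g in groups]
    return {"pseudo": asm[0], "typ": asm[1], "category": asm[2],
            "cnt": code[0], "z80": z80, "p2": p2, "note": note[0]}

def check_no_branch(row):
    """True when the P2 no-branch MC/TS/Tsav cells agree with the Mc note."""
    r = split_row(row)
    cycles = [int(t) for t in re.findall(r"(\d+)[frwio]", r["note"])]
    z80_ts = int(r["z80"][1])
    p2_mc, p2_ts = int(r["p2"][0]), int(r["p2"][1])
    saved = int(r["p2"][2] or 0)          # an empty Tsav cell means zero
    return (p2_mc == len(cycles)
            and p2_ts == sum(cycles)
            and z80_ts - p2_ts == saved)

print(split_row(ROW)["pseudo"])   # adc a,(irr+d)
print(check_no_branch(ROW))       # True
```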

[Updated on: Sun, 24 April 2022 12:11]

Project (2): In-Circuit Z80 Emulator :: T-state Reductions [message #9844 is a reply to message #9824] Wed, 30 March 2022 22:07
jayindallas
Z80 MEMORY REFRESH:
From the Zilog Z80 CPU technical manual, page 4:
"5. Memory Refresh Register (R).
The Z80 CPU contains a memory refresh counter to enable dynamic memories to be used with the same ease as static memories. Seven bits of this 8-bit register are automatically incremented after each instruction fetch. The eighth bit will remain as programmed as the result of an LD R,A instruction. The data in the refresh counter is sent out on the lower portion of the address bus along with a refresh control signal while the CPU is decoding and executing the fetched instruction. This mode of refresh is totally transparent to the programmer and does not slow down the CPU operation. The programmer can load the R register for testing purposes, but this register is normally not used by the programmer. During refresh, the contents of the I register are placed on the upper 8 bits of the address bus."

The underlined sentence above reveals that the Z80, when reading an instruction prefix or opcode, requires an additional T-state before the next bus cycle can begin. The Z80 also uses that delay to conclude a refresh cycle with the fetch.

The P2 emulating a Z80 does not need that additional T-state but will use a 4 T-state fetch bus cycle when it executes a fetch with refresh. The P2 has the option to fetch in 3 T-states if a refresh is not needed.

From the Zilog Z80 CPU technical manual, page 8:
"RFSH# Output active low. RFSH# indicates that the lower 7-bits of the address bus, A0:6, contains a refresh address for dynamic memories and the current MREQ# signal should be used to do a refresh read to all dynamic memories."

The underlined Z80 signal names above are the only signals necessary to issue a refresh cycle.
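
The R register behaviour quoted above is easy to model. This is only a Python sketch with made-up helper names (bump_r, refresh_row); it covers the counter and the 7-bit refresh address, not the bus signals.

```python
def bump_r(r):
    """Advance R after an instruction fetch: low 7 bits count, bit 7 is kept."""
    return (r & 0x80) | ((r + 1) & 0x7F)

def refresh_row(r):
    """The 7-bit refresh address placed on A0:6 during the refresh."""
    return r & 0x7F

r = 0xFF                 # bit 7 set by an earlier LD R,A, counter at 0x7F
r = bump_r(r)
print(hex(r))            # 0x80
print(refresh_row(r))    # 0
```

The counter wraps from 0x7F back to 0 while the programmed eighth bit stays where LD R,A left it.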


Z80 INTERESTING OBSERVATION:
From the Zilog Z80 CPU technical manual, page 12 and 13:
Comparing the READ timing between Fig 4.0-1 "INSTRUCTION OP CODE FETCH" (page 12) and Fig 4.0-2 "MEMORY READ OR WRITE CYCLES" (page 13) shows that a fetch memory read happens in 2 T-states (when no wait states are added), instead of the standard memory read's 2.5 T-states, rounded up to 3.

The P2 emulating a Z80 could potentially reduce all memory reads from 3 T-states down to 2 T-states without adversely affecting the vintage system. Vintage Z80 systems already complete this fast read during every fetch, with whatever number of wait states they insert, so it follows that their memory can be read equally fast in an ordinary read cycle.

Note that memory writes take 3T so no similar savings there.

- - -
P2 TAKING LIBERTIES WITH A Z80 REFRESH:
It follows from above that a refresh bus cycle appended to an opcode fetch takes 4 T-states (4T). That is 2T for the fast memory read (fetch) and another 2T to transition that read into a refresh. The only signals that define a refresh cycle are RFSH# and A0:6 (7 bits from the R register), valid for 2T, and MREQ# going active for 1T in the middle of that 2T span.

So now radical bus timing will be explored for appending a refresh onto memory writes and port input/output cycles, to identify the most efficient choices.

1. Adding refresh to a normal memory read bus cycle
A normal read bus cycle takes 3T, even though the observation above shows that it *might* safely execute in 2T under P2 control. To append a refresh to a normal memory read, you would blend in +1T to add the transition from read to refresh, for 4T total. This should not adversely affect the vintage system. An appended refresh needs at least 2 clock cycles to function, hence the term "blending". Converting a 3T memory read into a 4T memory read with refresh is certain to work, as it is basically the same memory access as a fetch except for the unneeded M1 signal.

2. Adding refresh to a normal memory write bus cycle
Every write bus cycle takes 2.5T to 3T, and appending a refresh adds another 2T: 5T total if no 'blending' can be applied. This makes a memory write(3T) a secondary choice for appending a refresh, because a memory read can do it with just +1T whereas the memory write appears to need +2T. I'll dig deeper into the timing figures to see if a valid memory write with refresh can be done in 4T; I'm relying on the signal diagrams for now.

3. Adding refresh to a normal port input or output bus cycle
A normal port input or output takes 4T total: 3T for signalling plus 1T for the automatically inserted wait state. To append a refresh, another 2T is needed, for 6T total (+2T).

Thus a memory read is the most efficient place to append a refresh, adding only 1T to the bus cycle. If a refresh becomes more critical, it could be appended to a memory write (3T+2T) or port input/output (4T+2T). None of these should adversely affect a vintage system. Hopefully no systems wrongly required an M1 (fetch) in their refresh logic. If any vintage system did that, the P2(Z80) could offer an option to disable appended refreshes.
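
The three cases above can be tabulated in a few lines. This is only a sketch of the arithmetic, using the T-state figures quoted in this post; the dictionary keys and function names are made up for illustration and are not part of any P2 code.

```python
# T-state costs as quoted in the notes above; all names are illustrative.
BASE_T = {"memory read": 3, "memory write": 3, "port in/out": 4}
REFRESH_EXTRA_T = {"memory read": 1, "memory write": 2, "port in/out": 2}


def with_refresh(cycle):
    """T-states for a bus cycle with a refresh appended."""
    return BASE_T[cycle] + REFRESH_EXTRA_T[cycle]


def cheapest_refresh_host():
    """Bus cycle type that pays the fewest extra T-states for a refresh."""
    return min(REFRESH_EXTRA_T, key=REFRESH_EXTRA_T.get)


for cycle in BASE_T:
    print(f"{cycle:12s} {BASE_T[cycle]}T -> {with_refresh(cycle)}T with refresh")
print("cheapest host:", cheapest_refresh_host())
```

If the 4T memory write with refresh turns out to be valid, only the "memory write" entry in REFRESH_EXTRA_T would change.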

An option to disable all the P2 bus cycle enhancements might be useful, to test it running just like a physical Z80 through its 40-pin connection to the vintage system.


Next... a few notes on how the "P2+" designation found some other T-states to save, beyond those that the Z80 added for internal resource accesses.

There are now three Types of "P2+" T-state savings that affect certain instructions. A description of them follows:

EDITING THIS SECTION

REMOVING WASTED BUS CYCLES FROM "AUTO-LOOPING" INSTRUCTIONS:
The following instructions use an auto-looping execution until a terminating condition occurs. The Z80 loops (branches) by adding empty bus cycles while the microcode grabs the program counter and subtracts 2 from it, so that it will RE-READ that instruction AGAIN at the top of every loop, with all the refreshes included.

TYPE 1A defines an early, efficient auto-looping execution. It reads the prefix+opcode ONCE (4T+4T) and thereafter executes the loop IN-PLACE. This immediately removes 8T from every loop, but it requires that the execution add refresh cycles in the loop, as needed. Type 1A was originally assigned to all the auto-loop instructions, but Type 1B is now more efficient and is used when a memory read(3T) is inside the loop.


TYPE 1B adds a mere +1T to these instructions' loops where they contain a memory read(3T). When a refresh is needed, the memory read(3T) is turned into a memory read+refresh(4T). Type 1B is used in the following CP?R, LD?R and OT?R instruction descriptions. This is very efficient on the CP?R instructions, which will be described first. But the IN?R instruction doesn't have a memory read(3T) in the internal loop, so either Type 1A will be used or a Type 1B variant where a memory write(3T) is changed into a memory write+refresh(5T). That will be described in the last of the four sets of auto-loop instructions.
                        ;              read only   read+refresh
                        ;               |_____3r|__4r|
        cpdr            ;P2+ Mc=(4f,4f,((L-R)*3r+R*4r))
        cpir            ;P2+ Mc=(4f,4f,((L-R)*3r+R*4r))

                        ;               read only        read+refresh
                        ;               |______3r____|___4r___|
        lddr            ;P2+ Mc=(4f,4f,((L-R)*(3r+3w)+R*(4r+3w)))
        ldir            ;P2+ Mc=(4f,4f,((L-R)*(3r+3w)+R*(4r+3w)))

                        ;              read only         read+refresh
                        ;               |______3r____|___4r____|
        otdr            ;P2+ Mc=(4f,4f,((L-R)*(3r+4o)+R*(4r+4o)))
        otir            ;P2+ Mc=(4f,4f,((L-R)*(3r+4o)+R*(4r+4o)))
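
To make the "Mc" formulas concrete, here is a small sketch that evaluates them. It assumes L is the total number of loops and R is the number of those loops that carry an appended refresh, which is how I read the notation; the function names are invented for the example. The Z80 figure uses the standard 21T per repeating loop and 16T for the final, non-branching pass.

```python
# Non-refresh loop cost in T-states: 3r, (3r+3w), (3r+4o) from the listing above.
PLAIN_LOOP_T = {"cp": 3, "ld": 6, "ot": 7}
FETCH_T = 8  # prefix fetch (4f) + opcode fetch (4f), read once


def p2plus_tstates(kind, loops, refresh_loops):
    """Evaluate Mc=(4f,4f,((L-R)*plain + R*(plain+1))) for L loops, R with refresh."""
    if not 0 <= refresh_loops <= loops:
        raise ValueError("refresh_loops must be between 0 and loops")
    plain = PLAIN_LOOP_T[kind]
    return FETCH_T + (loops - refresh_loops) * plain + refresh_loops * (plain + 1)


def z80_tstates(loops):
    """Real Z80: 21T for each repeating loop, 16T for the last one."""
    if loops < 1:
        raise ValueError("loops must be at least 1")
    return 21 * (loops - 1) + 16


print(p2plus_tstates("cp", 4, 0), "vs", z80_tstates(4))   # CPIR, 4 loops
print(p2plus_tstates("ld", 3, 2), "vs", z80_tstates(3))   # LDIR, 3 loops, 2 refreshed
```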


THE CPDR and CPIR INSTRUCTIONS
In the code block below, the CPDR and CPIR auto-looping instructions are listed with the P2+ algebraic definition in the right-most column. The best refresh placement is to turn a 3T memory read into a 4T memory read with refresh. The algebraic formula would then have two loop timings, one for no refresh and another for refresh:
||                 ||                || Z80 Timing        || P2(Z80) Timing              || LOOP: 1 memory read (3T)
||Assembly Language||Machine Code    ||No Branch|Branch   ||No Branch     |Branch        ||              read only   read+refresh
||INSTRUCTION      ||CNT|B1|B2 |B3|B4||MC  |TS  |MC  |TS  ||MC  |TS  |Tsav|MC  |TS  |Tsav||               |_____3r|__4r|
|| cpdr            || 2 |ED|B9 |  |  || 4  | 16 | 5  | 21 || 3  | 11 |  5 | 1+ | 3+ | 18 ||P2+ Mc=(4f,4f,((L-R)*3r+R*4r))
|| cpir            || 2 |ED|B1 |  |  || 4  | 16 | 5  | 21 || 3  | 11 |  5 | 1+ | 3+ | 18 ||P2+ Mc=(4f,4f,((L-R)*3r+R*4r))
When the prefix+opcode is read (4T+4T), two refreshes occur. A normal branch loop on a Z80 takes 21 T-states. As long as we maintain two refreshes every 21 T-states, we match the Z80's 'machine-gunning' refresh rate, which assures perfect refresh on vintage systems.

21T Refresh Stage 1:
The two byte instruction is fetched once, the (4T+4T) above. To calculate how many loops can run under that refresh, the calculation is:
(21 T-states - 8 T-states used by the two bytes instruction fetch) / memory read(3 T-states). (21-8)/3= 4.3 or four loops.
04T     Fetch:  Prefix fetch+refresh(4T)
08T     Fetch:  Opcode fetch+refresh(4T)
11T     Loop 1: Memory read(3T)
14T     Loop 2: Memory read(3T)
17T     Loop 3: Memory read(3T)
20T     Loop 4: Memory read(3T)
21T     1T accumulated as spare
So... Stage 1 refreshed faster than the Z80 WHILE running the loop FOUR TIMES QUICKER! Remember the Z80 executes each loop in 21T.

21T Refresh Stage 2:
Hereafter, in Stage 2, the instruction is never re-read the way the Z80 does. All refreshes are now done in two loops of memory read+refresh(4T), followed by as many full non-refresh loops as can run within 21T plus the total accumulated spare T-states. The full non-refresh loop count is calculated by:
(21 T-states - 8 T-states used by two LOOPS with a memory read+refresh(4T) in each) / memory read(3T) = (21-8)/3= 4.3 or four loops.
SUM     DESCRIPTION
04T     Loop 1: Memory read+refresh(4T)
08T     Loop 2: Memory read+refresh(4T)
11T     Loop 3: Memory read(3T)
14T     Loop 4: Memory read(3T)
17T     Loop 5: Memory read(3T)
20T     Loop 6: Memory read(3T)
21T     1T accumulated spare, total now 2T to spare
So... Stage 2 is refreshing faster than the Z80 WHILE running the loop SIX TIMES QUICKER from here on.

Because CPDR/CPIR accumulates 1T in each Stage 2 pass and the non-refresh loop is only 3T, every third Stage 2 pass will have 3T spares, and be able to add a seventh loop with 0T spares afterward. This pattern is unique to CPDR/CPIR:
{Stage 1: 4L + Stage 2: 6L,7L, + M*{6L,6L,7L} } where "L" is the number of loops in each Stage 2 pass, refresh and non-refresh.
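
The pattern can be checked with a short simulation of the spare T-state bookkeeping. This is a sketch of the accounting used in the Stage 1 and Stage 2 tables above, not emulator code; the function name and parameters are hypothetical.

```python
def loops_per_pass(plain_t, stage2_passes, window=21, fetch_t=8):
    """Loop counts for Stage 1 followed by `stage2_passes` Stage 2 passes.

    plain_t is the T-state cost of one non-refresh loop (3 for CPDR/CPIR).
    Leftover T-states are carried forward as spares, as in the tables above.
    """
    counts = []
    # Stage 1: prefix+opcode fetch (two refreshes), then plain loops.
    budget = window - fetch_t
    plain = budget // plain_t
    spare = budget - plain * plain_t
    counts.append(plain)
    # Stage 2: two read+refresh loops (plain_t + 1 each), then plain loops.
    for _ in range(stage2_passes):
        budget = window + spare - 2 * (plain_t + 1)
        plain = budget // plain_t
        spare = budget - plain * plain_t
        counts.append(2 + plain)
    return counts


print(loops_per_pass(3, 8))  # CPDR/CPIR -> [4, 6, 7, 6, 6, 7, 6, 6, 7]
```

The first entry is the Stage 1 loop count and the rest are Stage 2 passes. Calling it with plain_t=6 or plain_t=7 gives the corresponding sequences for the longer LD?R and OT?R loops.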

Of course the auto-loop stops when one of the two terminating conditions becomes true, but however long it runs it will maintain the Z80 instruction's refresh rate or better.

Looking ahead... the rest of the auto-loop commands have longer loop T-states and their patterns differ. For example the OTDR/OTIR creates a reduction of spare T-states with each Stage 2 pass and eventually has to give up a non-refresh loop where the remainder creates an amount of spare T-states that again start reducing with each Stage 2 pass.


THE LDDR and LDIR INSTRUCTIONS
In the code block below, the LDDR and LDIR auto-looping instructions are listed with the P2+ algebraic definition in the right-most column. The best refresh placement is to turn a 3T memory read into a 4T memory read with refresh. The algebraic formula would then have two loop timings, one for no refresh and another for refresh:
||                 ||                || Z80 Timing        || P2(Z80) Timing              || LOOP: 1 memory read then 1 memory write (6T)
||Assembly Language||Machine Code    ||No Branch|Branch   ||No Branch     |Branch        ||               read only        read+refresh
||INSTRUCTION      ||CNT|B1|B2 |B3|B4||MC  |TS  |MC  |TS  ||MC  |TS  |Tsav|MC  |TS  |Tsav||               |______3r____|___4r___|
|| lddr            || 2 |ED|B8 |  |  || 4  | 16 | 5  | 21 || 4  | 14 |  2 | 2+ | 6+ | 15 ||P2+ Mc=(4f,4f,((L-R)*(3r+3w)+R*(4r+3w)))
|| ldir            || 2 |ED|B0 |  |  || 4  | 16 | 5  | 21 || 4  | 14 |  2 | 2+ | 6+ | 15 ||P2+ Mc=(4f,4f,((L-R)*(3r+3w)+R*(4r+3w)))
When the prefix+opcode is read (4T+4T), two refreshes occur. A normal branch loop on a Z80 takes 21 T-states. As long as we maintain two refreshes every 21 T-states, we match the Z80's 'machine-gunning' refresh rate, which assures perfect refresh on vintage systems.

21T Refresh Stage 1:
The two byte instruction is fetched once, the (4T+4T) above. To calculate how many loops can run under that refresh, the calculation is:
(21 T-states - 8 T-states used by the two bytes instruction fetch) / (memory read+memory write). (21-8)/6= 2.167 or two loops.
04T     Fetch:  Prefix fetch+refresh(4T)
08T     Fetch:  Opcode fetch+refresh(4T)
14T     Loop 1: Memory read(3T)+memory write(3T)
20T     Loop 2: Memory read(3T)+memory write(3T)
21T     1T accumulated as spare
So... Stage 1 refreshed faster than the Z80 WHILE running the loop TWO TIMES QUICKER! Remember the Z80 executes each loop in 21T.

21T Refresh Stage 2:
Hereafter, in Stage 2, the instruction is never re-read the way the Z80 does. All refreshes are now done in two loops of memory read+refresh(4T)+memory write(3T), followed by as many full non-refresh loops as can run within 21T plus the total accumulated spare T-states. The full non-refresh loop count is calculated by:
(21 T-states - 14 T-states used by two loops of memory read+refresh(4T)+memory write(3T)) / (memory read(3 T-states)+memory write(3T)). (21-14)/6= 1.167 or 1 loop.
SUM     DESCRIPTION
07T     Loop 1: Memory read+refresh(4T)+memory write(3T)
14T     Loop 2: Memory read+refresh(4T)+memory write(3T)
20T     Loop 3: Memory read(3T)+memory write(3T)
21T     1T accumulated spare, total now 2T to spare
So... Stage 2 is refreshing faster than the Z80 WHILE running the loop THREE TIMES QUICKER from here on.

Because LDDR/LDIR accumulates 1T in the Stage 1 pass and 1T in each Stage 2 pass and the non-refresh loop is 6T, the 5th Stage 2 pass will have 6T spares, and be able to add a fourth loop with 0T spares afterward. Thereafter, Stage 2 will accumulate 1T each pass and when 6T spares are available, it can add 1 additional loop in that pass. This Stage 2 only pattern repeats thereafter.

{Stage 1: 2L + Stage 2: 3L,3L,3L,3L,4L, + M*{3L,3L,3L,3L,3L,4L} } where "L" is the number of loops in each Stage 2 pass, refresh and non-refresh.
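
As a cross-check on the repeating pattern, the long-run average can be worked out directly. In every 21T window, two loops each pay 1T extra for their refresh, so 19T is left to be spent at the plain loop cost. The sketch below assumes exactly that accounting; the function names are only for illustration.

```python
from fractions import Fraction


def steady_state_loops_per_window(plain_t, window=21, refresh_loops=2):
    """Average loops per window once Stage 2 has settled into its pattern.

    window = R*(plain_t + 1) + n*plain_t, so total loops R + n = (window - R)/plain_t.
    """
    return Fraction(window - refresh_loops, plain_t)


def speedup_over_z80(plain_t, z80_loop_t=21):
    """Loops completed per Z80 loop time (the Z80 does one loop per 21T)."""
    return steady_state_loops_per_window(plain_t, window=z80_loop_t)


ld = steady_state_loops_per_window(6)
print(ld, "loops per 21T for LDDR/LDIR")        # 19/6
print(ld * 6, "loops per six Stage 2 passes")   # 19 = 3+3+3+3+3+4
print(float(speedup_over_z80(6)))               # about 3.17x the Z80 loop rate
```

The same calculation with plain_t=3 gives 19/3 loops per window for CPDR/CPIR, which agrees with the 6L,6L,7L pattern (19 loops per three passes).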

Of course the auto-loop stops when one of the two terminating conditions becomes true, but however long it runs it will maintain the Z80 instruction's refresh rate or better.

* EDITOR'S NOTE:
* I'll convert OTDR/OTIR to the table of Stage 1 and Stage 2 loop passes format.
* It's much easier to understand than the algebraic and pseudo code methods.

THE OTDR and OTIR INSTRUCTIONS
In the code block below, the OTDR and OTIR auto-looping instructions are listed with the P2+ algebraic definition in the right-most column. The best refresh placement is to turn a 3T memory read into a 4T memory read with refresh. The algebraic formula would then have two loop timings, one for no refresh and another for refresh:
||                 ||                || Z80 Timing        || P2(Z80) Timing              || LOOP: 1 memory read then port output (7T)
||Assembly Language||Machine Code    ||No Branch|Branch   ||No Branch     |Branch        ||               read only        read+refresh
||INSTRUCTION      ||CNT|B1|B2 |B3|B4||MC  |TS  |MC  |TS  ||MC  |TS  |Tsav|MC  |TS  |Tsav||               |______3r____|___4r____|
|| otdr            || 2 |ED|BB |  |  || 4  | 16 | 5  | 21 || 4  | 15 |  1 | 2+ | 7+ | 14 ||P2+ Mc=(4f,4f,((L-R)*(3r+4o)+R*(4r+4o)))
|| otir            || 2 |ED|B3 |  |  || 4  | 16 | 5  | 21 || 4  | 15 |  1 | 2+ | 7+ | 14 ||P2+ Mc=(4f,4f,((L-R)*(3r+4o)+R*(4r+4o)))

I'll finish this up before the weekend...

[Updated on: Sun, 24 April 2022 12:15]


Project (2): In-Circuit Z80 Emulator :: Bridge to 2022 [message #9845 is a reply to message #9702] Thu, 31 March 2022 23:18
jayindallas
Cluso99 wrote:
"...the design of the P2. There is even a Monitor and also a Forth language inbuilt in the 16KB ROM..."

That's interesting that it has or can have Forth in it.

I plan to add a few "new Z80 instructions": byte sequences that a real Z80 would process safely, changing essentially nothing, but that form a unique construct the P2 can recognize and interpret as an instruction bridge for external processing (i.e. outside the in-circuit Z80 emulation environment, particularly that stack). This gives the vintage system an easy method to invoke powerful external tools that access modern resources and communications without the constraints of a 64KB memory limit.

While the P2 is running these external tasks, the in-circuit Z80 emulator would just maintain DRAM memory with refreshes and be available if the external task needs to access/communicate via the vintage system resources.

For example: Opening up a CP/M Z80 assembler to create a short, temporary .COM file to load a program from a memory stick to a diskette user area:
        db      0DDh,0FDh,0C9h    ;an "ix iy twiddled-return" Z80-safe instruction-bridge to P2
        db      "load M:\Z80\CPM\BASIC\GAMES\HAMURABI.BAS B8:HAMURABI.BAS"
        db      00h               ;terminate string and instruction

Another example might be to download the file from the internet instead if the P2(Z80)'s USB link has access.

NOTE: the ix-iy-twiddle-return instruction will execute on a Z80 as effectively a NOP,NOP,RET sequence. As there is little reason to execute such a RET, it's a fair construct for a new Z80 instruction working as a P2 instruction-bridge to external tasks.

The construct above could be easily done in a CP/M CCP hack. If the CCP reads in a line with a "@" prefix character, it simply copies the line to an instruction-bridge buffer with the "@" removed, terminates the copied string with a 00h, and then calls the "ix-iy-twiddle-return" instruction preceding the instruction-bridge buffer.
BRIDGE: db 0DDh,0FDh,0C9h ;The P2 will interpret this as a command to process the following text string, then return.
BUFFER: ds 256 ;Place the text string here and terminate it with a 00h null.

With a P2(Z80) and the CCP hack installed, type a line like this at a CP/M prompt and the P2 will run the command. CP/M might give you an error message even so... but the CCP hack should include a cancellation of that entry and just return a new CP/M prompt.
A0: @load M:\Z80\CPM\BASIC\GAMES\HAMURABI.BAS B8:HAMURABI.BAS
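
Here is a rough model of both halves of that CCP hack, written in Python rather than Z80 code just to show the data layout. It is a sketch under my own assumptions: the function names, the 256-byte buffer limit check and the load address are hypothetical, and nothing here is actual CCP or P2 firmware.

```python
BRIDGE_OPCODE = bytes([0xDD, 0xFD, 0xC9])  # the "ix-iy-twiddle-return" bytes
BUFFER_SIZE = 256                          # matches "ds 256" above


def build_bridge(line):
    """CCP side: turn an '@' command line into bridge bytes, else return None."""
    if not line.startswith("@"):
        return None                        # ordinary CP/M command, not a bridge
    text = line[1:].encode("ascii")        # drop the '@' prefix
    if len(text) + 1 > BUFFER_SIZE:
        raise ValueError("command does not fit in the bridge buffer")
    return BRIDGE_OPCODE + text + b"\x00"  # 00h terminates the string


def decode_bridge(memory, pc):
    """Emulator side: if pc points at DD FD C9, return the command text after it."""
    if bytes(memory[pc:pc + 3]) != BRIDGE_OPCODE:
        return None
    start = pc + 3
    end = memory.index(0, start, start + BUFFER_SIZE)  # find the 00h terminator
    return bytes(memory[start:end]).decode("ascii")


memory = bytearray(0x10000)                # stand-in for the 64KB Z80 space
blob = build_bridge("@load M:\\Z80\\CPM\\BASIC\\GAMES\\HAMURABI.BAS B8:HAMURABI.BAS")
memory[0x0100:0x0100 + len(blob)] = blob
print(decode_bridge(memory, 0x0100))
```

On a real Z80 the same bytes simply execute as NOP,NOP,RET and the text is never reached, which is what makes the construct safe.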

Another useful construct I plan to include in the P2(Z80) would only operate inside designated vintage system memory blocks. Basically the in-circuit Z80 emulation would recognize any program counter pointing into a designated 256-byte block of memory as an instruction-bridge request and use the low address byte of (pc) as an index parameter passed to an external index table of 256 external routines. That block would be treated as entry by CALL instructions only, as the task will terminate with a RET using the stack.

This construct could allow the BIOS and other parts of CP/M to be transferred outside of the 64KB vintage system space, thus increasing the TPA area. CP/M 2.2 with 63.75KB of TPA?

Just call an address in the range of FF00:FFFFh and you have 256 intercepted vectors for the P2 to index, run, and then finish with a RET instruction.

Another way to use the 256 call-table would be to write a CP/M interface program using those calls and run a small routine to tell the P2 which available functions to put into its interpretive call-table. If you're writing SWEEP-22.com, you might just include P2 functions that traverse file resources, identify file characteristics, and make transfers between the external resource and the host vintage system.

There are endless ways to do this once you grab hold of the useful-insanity of this idea. heheh


UPDATE: 04/12/2022
I was reviewing my collection of UNDOC articles and found that someone in the past has added some 'synthetic Z80 instructions' in the unused ED range. I've marked these for possible inclusion in the P2-based in-circuit Z80 emulator. Maybe no one still uses those, but someone spent the time to think about them, so I'll see about adding them to my table.

[Updated on: Sun, 24 April 2022 12:19]


Re: Project (2) P2(Z80) Bridging the Z80 to modern resources [message #9846 is a reply to message #9845] Sat, 02 April 2022 18:02
cluso99
"Another useful construct I plan to include in the P2(Z80) would only operate inside designated vintage system memory blocks. Basically the in-circuit Z80 emulation would recognize any program counter pointing into a designated 256 block of memory as an instruction-bridge request and use the low address byte of (pc) as an index parameter passed to an external index table of 256 external routines. That block would be treated as entry by CALL instructions only as the task will terminate with a RET with the stack.

This construct could allow the BIOS and other parts of CP/M to be transferred outside of the 64KB vintage system space, thus increasing the TPA area. CP/M 2.2 with 63.75KB of TPA?"

The P1 and P2 Z80/CPM emulation basically does this to a lesser extent. The BIOS call to the I/O (terminal and disk) already does this. The code to then output to the USB serial (ie PC) or to the P1/P2 video driver is done by the P1/P2 software and does not use the Z80 64KB memory space. Of course much more of the CCP/BDOS/CBIOS could be moved into the P1/P2, or an extended Z80 RAM window.

I have often thought that the whole BDOS could be re-written to use FAT32 rather than the CPM FAT system so that files would be directly accessible under the FAT32 system on the SD card. Currently, each of my CPM HDDs are 8MB contiguous files as FAT32 files. I can transfer files to/from the CPM FAT system to FAT32 system. I haven't ever had the time to try this. I would need to see if any of the CPM programs bypassed the standard BDOS and wrote directly to the CPM FAT system on HDD - they still could be trapped but it would be harder.

[Updated on: Sat, 02 April 2022 18:07]


Project (2): In-Circuit Z80 Emulator :: Discussions :: Call to external code [message #9847 is a reply to message #9846] Sun, 03 April 2022 07:58
jayindallas
Cluso99 wrote:
"...I have often thought that the whole BDOS could be re-written to use FAT32..."

I've read those discussions. I think the solution is to just use CP/M for accessing the vintage resources and use an instruction-bridge to call external code that does the interfacing to modern resources. The ICE(Z80) using an MPU could execute FAT-32 code much faster and with fewer memory constraints than the vintage CPU.

In addition, the ICE(Z80) can add many instruction-bridges without adding any circuitry to the vintage system. This simplifies the Z80's ability to call external code in the ICE(Z80) MPU because the emulation of the Z80 can trap certain instructions and repurpose them to access external code (which is run externally by the emulator, maintaining the Z80 environment, even if it executes the external code in MPU native code instead of emulated Z80 instructions).

So, you don't have to take the time to rewrite CP/M to support FAT32. Just create an instruction-bridge construct that gives the vintage system a method to uniquely call external functions that will be executed by the ICE(Z80) MPU. The vintage system with CP/M is somewhat reduced to being a terminal/portal to and from its own vintage resources. Taking it further, the instruction-bridge could be used to run the operating system (like CP/M) externally; why put it in your meager 64KB RAM if you can export it to the MPU? Bigger TPAs allow larger Z80 code in the vintage system.

Back in the day, in a FAT32::CP/M discussion, I made the analogy that the vintage system should just behave like a vintage modem program, using the MPU to interface to the modern resources. Pick a source, pick a destination and then download or upload the file. The dial-up modem was the bridge to modern resources back then. Or to look at it another way, the new instruction-bridge just controls a virtual modem to the modern world's resources.

So a vintage system limited to 64KB just needs a concise instruction-bridge to external functions to do what it has no memory to do. Wrap it into a MODEM7.COM or MEX.COM or NEWSWEEP.COM or whatever your favorite vintage modem program was. After all, when online on a BBS, the processing load of the experience was shared between the vintage computer and the BBS host computer. That's a good model for this bridge too.

One Instruction-Bridge to MPU external functions that I've created, for example
I've refined the bridge to MPU function tables so that the 256B memory block can still be used for code, data and buffers. The MPU emulating the Z80 will only treat a call or conditional call to an address inside the designated 256 address block as a bridge-instruction. Any other instruction activity within that 256B address block is processed like a normal Z80. Note that the call to that 256B memory block can be from anywhere in the 64KB memory space.

The Bridge is recognized when the MPU processes the call instruction and finds that the destination is within the designated 256B address block. At that point the MPU emulator saves the address it would normally place on the Z80 stack, and holds it instead in the MPU, outside the emulator environment. The emulator never looks at the contents of the called address; it just keeps the low address A00:07 as an index into the MPU's external function table and starts executing the selected function as an MPU, not as a Z80. Those functions can access the MPU emulator environment to read the registers and flags in the 'Z80' as parameters to the function call. If the function returns values, they'll be placed directly into the documented Z80 registers changed by the function. When done, the saved return address is put into the program counter register of the Z80 emulator environment and the next Z80 instruction is executed by the emulation.

CORRECTION on the underlined text above:
Actually the "synthetic" call instruction is never executed inside the Z80 environment, so the return address is moot because the next address following the "synthetic" call instruction would be the next instruction per the program counter increment.
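
A minimal sketch of that interception for the unconditional CALL, with the correction applied (nothing is pushed for a bridged call, the PC just moves to the next instruction). The class, the FFh bridge page and the sample function are assumptions for illustration, not the actual emulator design.

```python
from dataclasses import dataclass, field

BRIDGE_PAGE = 0xFF   # assumed: the designated block is FF00h:FFFFh
CALL_NN = 0xCD       # unconditional CALL nn


@dataclass
class Z80State:
    """Tiny stand-in for the emulated Z80 environment."""
    memory: bytearray = field(default_factory=lambda: bytearray(0x10000))
    pc: int = 0x0100
    sp: int = 0xF000
    regs: dict = field(default_factory=lambda: {"A": 0})


def step_call(cpu, function_table):
    """Process the CALL nn at cpu.pc. Returns True if it was bridged."""
    assert cpu.memory[cpu.pc] == CALL_NN
    low = cpu.memory[(cpu.pc + 1) & 0xFFFF]
    high = cpu.memory[(cpu.pc + 2) & 0xFFFF]
    next_pc = (cpu.pc + 3) & 0xFFFF
    if high == BRIDGE_PAGE:
        function_table[low](cpu)   # native function; may read/write Z80 registers
        cpu.pc = next_pc           # no push, no RET needed inside the Z80
        return True
    # Ordinary CALL: push the return address, then jump.
    cpu.sp = (cpu.sp - 2) & 0xFFFF
    cpu.memory[cpu.sp] = next_pc & 0xFF
    cpu.memory[(cpu.sp + 1) & 0xFFFF] = next_pc >> 8
    cpu.pc = (high << 8) | low
    return False


def set_a_to_42(cpu):
    """Hypothetical external function: returns a value in register A."""
    cpu.regs["A"] = 0x42


cpu = Z80State()
cpu.memory[0x0100:0x0103] = bytes([CALL_NN, 0x05, 0xFF])   # call 0FF05h
print(step_call(cpu, {0x05: set_a_to_42}), hex(cpu.pc), hex(cpu.regs["A"]))
```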

However, some thought needs to be taken in regard to a previous conditional call modification that would skip reading the address operands when the conditional flag is already known to be false. Rather a "Catch-22", in that you ALWAYS need at least the high address byte to test if it's in the bridge instruction 256 byte block.

So the quick solution for a three byte conditional call instruction [op LL HH] (for example op = C4h for CALL NZ), when the flag is false, would be to just skip the low address operand [LL] and always read the high address operand [HH]. If [HH] matches the 256 byte block, then it becomes a "synthetic" call to a bridge instruction, and only then will it read the [LL] (out of order), which the synthetic call uses as an index into the function call table. The PC would then be restored to the address of the next instruction.

Only three byte call instructions would be used this way, when that feature is enabled.
Interesting twist... :)
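
The out-of-order operand read can be sketched as a list of bus read addresses. This only models the idea described above, assuming FFh as the bridge page; the function name and return shape are made up for the example.

```python
BRIDGE_PAGE = 0xFF  # assumed high byte of the designated 256 byte block


def conditional_call_reads(memory, pc, condition_true):
    """Return (addresses_read_in_order, bridge_index or None) for [op LL HH] at pc."""
    ll_addr = (pc + 1) & 0xFFFF
    hh_addr = (pc + 2) & 0xFFFF
    reads = [pc]                           # opcode fetch
    if condition_true:
        reads += [ll_addr, hh_addr]        # normal operand order
        is_bridge = memory[hh_addr] == BRIDGE_PAGE
        return reads, (memory[ll_addr] if is_bridge else None)
    reads.append(hh_addr)                  # flag false: skip LL, always read HH
    if memory[hh_addr] == BRIDGE_PAGE:
        reads.append(ll_addr)              # only now fetch LL, out of order
        return reads, memory[ll_addr]
    return reads, None                     # plain false call: LL is never read


memory = bytearray(0x10000)
memory[0x0200:0x0203] = bytes([0xC4, 0x12, 0xFF])   # call nz,0FF12h
reads, index = conditional_call_reads(memory, 0x0200, condition_true=False)
print([hex(a) for a in reads], index)
```

A false conditional call to a non-bridge address costs only two reads in this model, which keeps the earlier T-state saving for the common case.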


There are several ways to deal with the conditional calls, I'm still pondering choices.
(1) conditional calls that are FALSE would not invoke a bridge and the next Z80 instruction would be started.
(2) the TRUE or FALSE is passed on to the bridge function as a quick parameter.
(3) the eight types of conditional calls would pass opcode bits 5:3 into parameter values {1:8} by index lookup and unconditional call would pass {0}.

The first would be more Z80-like, i.e. if the call condition is false, don't call the address.

But the second would allow a quick parameter to be passed without preloading additional registers.

The third would be like a ONGOTO selection, expanding the potential vectors to 9x256 or 2,304 vectors without preloading additional registers. Opcode bits 5:3 designate the flag condition of the opcode:
Where 'AA' is the high address of the bridge to MPU function table. For example 'AA'=0FFh for the block FF00h:FFFFh
Where 'ij' is the 1 of 256 low addresses in the table. 'ij' from 00:FFh
76543210        Z80 ASM            Instruction Type       MACRO       ACTION
||||||||
11001101        call AAij          call unconditionally   callT0 ij    Call table 0 funct(00:FF)
||||||||
11000100        call nz,AAij       call if not zero       callT1 ij    Call table 1 funct(00:FF)
..001...        call z, AAij       call if zero           callT2 ij    Call table 2 funct(00:FF)
..010...        call nc,AAij       call if not carry      callT3 ij    Call table 3 funct(00:FF)
..011...        call c, AAij       call if carry          callT4 ij    Call table 4 funct(00:FF)
..100...        call po,AAij       call if parity odd     callT5 ij    Call table 5 funct(00:FF)
..101...        call pe,AAij       call if parity even    callT6 ij    Call table 6 funct(00:FF)
..110...        call p, AAij       call if sign positive  callT7 ij    Call table 7 funct(00:FF)
..111...        call m, AAij       call if sign negative  callT8 ij    Call table 8 funct(00:FF)
Instead of using these Z80 instructions, a macro equivalent like "callTn" would better express the special intent of the Z80 instruction as an external function call.

CLARIFICATION: The syntax above makes it appear that the nine special call instructions can only index to a single byte of Table N's function data structure. That's just a simplification to show how these nine instruction cases can vector to 2,304 choices. Each table needs to supply an address or address index larger than a single byte to fully point to an MPU function, so the [ij] index would be multiplied by the number of bytes in each vector record. The illustration just demonstrates the mapping that results in 2,304 vectors, not the actual algorithm the MPU would use in a data structure to vector to the actual MPU function's address.
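
The mapping for option (3) can be written out as follows. This is a sketch only: the 4-byte vector record size is an arbitrary assumption, and the function names are invented for the example.

```python
CALL_NN = 0xCD            # 11001101: unconditional call -> table 0
COND_CALL_MASK = 0xC7     # keep bits 7:6 and 2:0
COND_CALL_PATTERN = 0xC4  # 11ccc100: conditional call, ccc in bits 5:3


def bridge_table_number(opcode):
    """Table 0 for CALL, 1..8 for CALL NZ,Z,NC,C,PO,PE,P,M; None for other opcodes."""
    if opcode == CALL_NN:
        return 0
    if (opcode & COND_CALL_MASK) == COND_CALL_PATTERN:
        return ((opcode >> 3) & 0x07) + 1
    return None


def vector_offset(opcode, ij, record_size=4):
    """Byte offset of the vector record inside one flat 9 x 256 entry table."""
    table = bridge_table_number(opcode)
    if table is None:
        raise ValueError("not a three byte call opcode")
    if not 0 <= ij <= 0xFF:
        raise ValueError("ij must be a single byte")
    return (table * 256 + ij) * record_size


print(bridge_table_number(0xCD), bridge_table_number(0xC4), bridge_table_number(0xFC))
print(vector_offset(0xFC, 0xFF))   # last of the 2,304 vectors
```

Whether the nine tables live in one flat array or in separate structures is a design choice; the flat layout is used here only because it makes the 9 x 256 count easy to see.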

Update Note June 24, 2022
I'll be editing my previous messages to modify the term "P2" (the Parallax P-2 MPU) to a more generic term "MPU" instead. This will be more accurate since I've switched to using a very economical Raspberry Pi Pico or Pico W as the MPU.

Update Note November 03, 2022
CPU SET IN-CIRCUIT-EMULATION SYSTEM:::
CONCEPT:
The 40-pin Dual-Inline-Package (DIP) profile is a three pcb stack:
    (1) A commercial Pico/Pico W         |    The magic software is in (1), 
    (2) The Resource/Feature pcb         |       the magic hardware is in (2) and
    (3) The CPU Adapter socket pcb       |          the no-magic signal conversion to CPU pinout is in (3).

For a CPU in-circuit-emulator, you'd have the standard multi-CPU capable (1)(2) assembly...
(1) would have the software loaded for a particular vintage CPU.
(2) has the hardware that can be manipulated by (1) to emulate the signals and voltages of a set of vintage CPUs.
(3) is unique for one type of CPU. It routes signals from (2) into the CPU pinout pattern for the vintage CPU socket.

If you had a Z80(1)(2)(3) assembly and wanted to in-circuit-emulate a 6502,
A). you'd load the 6502 software into (1),
B). then unplug the Z80(3) and plug in a 6502(3)

None of these assemblies and software exist but this is the design concept I'm pursuing.
This keeps the cost of the hardware down to a minimum and maintains CPU flexibility if done well.

[Updated on: Thu, 03 November 2022 10:38]
