According to figure 2 of the SPR016 datasheet, the 16-bit address sent over the serial line starts with A15 and ends with A0. The data byte sequence is reversed: it starts with D0 and ends with D7. Why two different bit orders in the same interface? Who knows, but if there is a problem with the interface, my number one suspect is bit order.
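To make the two orders concrete, here is a rough Python sketch of how I read figure 2. The function names are mine, not from the datasheet, and it only models the order of the bits, not the clocking or handshaking.

```python
def address_bits(addr):
    """16 address bits in transmission order: A15 first, A0 last (my reading of figure 2)."""
    return [(addr >> i) & 1 for i in range(15, -1, -1)]


def data_bits(byte):
    """8 data bits in transmission order: D0 first, D7 last (my reading of figure 2)."""
    return [(byte >> i) & 1 for i in range(8)]


def byte_from_bits(bits, lsb_first=True):
    """Rebuild a value from bits listed in transmission order."""
    value = 0
    for i, b in enumerate(bits):
        # Position of this bit in the result depends on the assumed order.
        shift = i if lsb_first else len(bits) - 1 - i
        value |= (b & 1) << shift
    return value
```

If the emulated ROM shifts a data byte out in the wrong order, 0x01 arrives as 0x80, which is the kind of symptom I would look for first.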
My suggestion is to try sending the zeroth allophone to the SP0256 and see what sort of address it produces. It should be a short, silent pause.
Who knows what sequence they used to capture the SP0256 internal mask ROM, so it may just take some experimentation to get it right. Are you able to monitor the serial bits with a logic analyzer? That may be the way to figure this out.
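If you do get a capture, something like this small Python helper could speed up the experimenting. It takes the bits in the order they appeared on the wire and shows the value under both bit-order assumptions. The names are made up for illustration, and you would have to type in or export the bits from the analyzer yourself.

```python
def bits_to_int(bits, msb_first):
    """Convert bits (in the order seen on the wire) to an integer."""
    ordered = bits if msb_first else list(reversed(bits))
    value = 0
    for b in ordered:
        value = (value << 1) | (b & 1)
    return value


def candidate_values(bits):
    """Return the captured word interpreted both ways, for side-by-side comparison."""
    return {
        "msb_first": bits_to_int(bits, True),
        "lsb_first": bits_to_int(bits, False),
    }
```

For example, `candidate_values([1, 0, 0, 0, 0, 0, 0, 0])` gives 0x80 for MSB first and 0x01 for LSB first. Whichever interpretation produces addresses that make sense for the allophone you sent is probably the right one.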
The serial speech ROM interface is just weird. I think it is solvable, but it seems internally inconsistent.