Viewing Issue Advanced Details
ID 04723
Category Sound
Severity Major
Reproducibility Random
Date Submitted Mar 10, 2012, 17:41
Last Update Jun 23, 2017, 06:49
Tester Samurai Fox
View Status Public
Platform MAME (Official Binary)
Assigned To
Resolution Fixed
OS Windows Vista/7 (64-bit)
Status Resolved
Driver neodriv.hxx
Version 0.145
Fixed in Version
Build Normal
Summary 04723: sonicwi2: [possible] Music freezing up occasionally.
Description On random occasions, the music locks up and hangs for the rest of the emulation session. This also prevents any further sound effects from being played. I'm not sure what causes it. I have an .inp recorded with the latest MAME build (without NVRAM), using the default BIOS setting (which should be Europe). In my replay, the bug happens at the beginning of the 2nd boss fight.
Steps To Reproduce
Additional Information Not sure whether this should be labeled as a major bug, but seeing as how all sounds stop playing (except for the music hanging on a single note) after the bug kicks in...
Flags Possible
Regression Version
Affected Sets / Systems sonicwi2
Attached Files
fox_sonicwi2.zip (95,110 bytes) Mar 10, 2012, 17:41 Uploaded by Samurai Fox
sonicwi2.zip (25,383 bytes) Oct 1, 2012, 04:50 Uploaded by B2K24
save state
Relationships
There are no relationships linked to this issue.
Notes
11
No.08986
Reyn
Tester
Sep 29, 2012, 16:36
Tested with 0.147 up to level 9: The reported error could not be reproduced.
No.08992
B2K24
Moderator
Oct 1, 2012, 04:51
I reproduced it here. The reported error started at the beginning of Stage 4.
No.11274
project_ako
Tester
Dec 2, 2014, 10:40
I don't have a replay or anything, but I can confirm that this is still happening. I practice Sonic Wings 2 every day, and it happens at some point just about every day as well.
No.11374
project_ako
Tester
Jan 16, 2015, 18:28
edited on: Jan 16, 2015, 19:04
Is there a forum or anything where I can discuss this problem? I've been looking into it myself lately and started to reverse engineer the game's audio.

I noticed something about the problem. In z80 audio ram for this game, at F803~F822, there is a buffer for bytes representing sounds to be played. Normally it's nearly empty as there's not that many sound effects ever going in this game. But whenever the sound has crashed I noticed this buffer is totally filled up. The game stops emptying its sound buffer. Also, the z80 gets stuck in a loop at $0213. Whenever the sound crashes, it's always like this.

http://puu.sh/eBrXX/235450f3e2.png

It's stuck here forever, as reads from $F825 are always zero. The call to $08E8 is a dummied out call that contains only a ret statement (I guess it was a debug call taken out of the final game?), so there is no possible way for execution to ever leave this loop. Maybe some kind of interrupt is supposed to set $F825? I notice if I manually set it to nonzero, exactly 1 sound will play and the buffer will empty by 1 byte. A wrench is being thrown into this thing from somewhere but I'm not sure where. I assume a write to $F825 is some kind of "go now" signal from somewhere.
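
The behaviour described above (a command buffer that fills up while the main loop waits on a flag that never changes) can be illustrated with a toy model. This is only a sketch: the class, its method names, the choice to drop commands when the buffer is full, and the assumption that the loop clears the flag after consuming it are all hypothetical and not taken from the game's code.

```python
BUFFER_SIZE = 0x20  # $F803..$F822 inclusive is 32 bytes


class SoundDriverModel:
    """Toy model of the queue and wait loop; not the game's actual code."""

    def __init__(self):
        self.pending = []  # stands in for the command buffer at $F803
        self.tick = 0      # stands in for the flag byte at $F825

    def post_command(self, value):
        # The main CPU posts a sound command (assumed dropped when full).
        if len(self.pending) < BUFFER_SIZE:
            self.pending.append(value)

    def timer_interrupt(self):
        # The interrupt routine bumps the flag byte.
        self.tick = (self.tick + 1) & 0xFF

    def main_loop_step(self):
        # The loop at $0213 spins while the flag reads zero.
        if self.tick == 0:
            return None
        self.tick = 0  # assumption: the loop consumes the flag
        return self.pending.pop(0) if self.pending else None


def simulate(steps, interrupts_alive):
    """Return how many commands are still queued after `steps` iterations."""
    driver = SoundDriverModel()
    for step in range(steps):
        driver.post_command(step & 0xFF)
        if interrupts_alive:
            driver.timer_interrupt()
        driver.main_loop_step()
    return len(driver.pending)
```

With interrupts alive the queue stays drained; once they stop, it fills to all 32 slots and stays there, which matches what the memory viewer shows after the crash.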

It seems that normally, the code at $003E (which looks to be in the interrupt routine) sets $F825. But that interrupt never fires after the crash.

The last thing I can say is, if I manually take execution out of that locked loop and jump to the interrupt (set pc to 0x38), the sound effects come back, but music stays crashed until the next music change in game.

Wish I knew more about this hardware, maybe I could fix the problem on my own. I'm not able to find much information on it so I'm kind of at a loss and have to make assumptions about interrupts and such.

No.11375
Haze
Senior Tester
Jan 16, 2015, 21:37
edited on: Jan 16, 2015, 21:40
At this stage I'd be quite interested in knowing whether it's possible to crash it in the same way on real hardware. It sounds almost like it could be an original game bug, although maybe it's a little too easy to reproduce for that to be likely?

Could be that the z80 program is buggy, but on real hardware waitstates slow the 68k down to the point where it can't overload the z80?

Could be some kind of timer issue..

Could be a lot of things.

No.11376
project_ako
Tester
Jan 17, 2015, 00:28
edited on: Jan 17, 2015, 01:09
Yeah, the only things I really know are that interrupts are working in normal play in that loop, but they aren't working after the crash.

I think you're on to something about those timers... although I don't know that much about them, what I just tested seems to be leading me towards them.

What I just did: I figured "Why don't I just make sure interrupts are going in this loop"? So I made a little hex edit to my sw2 rom, and replaced the empty call with a simple ei and some nops. Interrupts should be working now for sure right? Nope. When I load up my post-crash savestate, there's STILL no interrupts going in that loop. I have no idea why that would be, as I don't know that much about how interrupts are working in this ecosystem. (There's NMIs and whatnot?)

Now, when I jump to the interrupt manually by setting the pc in the debugger, as ugly as that is, sometimes kind of like jiggering a lock the sound wakes up, so I think there must be something else in the interrupt routine fixing the interrupts. Actually, the only thing in the interrupt routine really is some code that messes with z80 ports $04 and $05.

Actually you know what, it's really short so here: http://puu.sh/eCjk1/5dda492dcf.png

Port $04 specifies the target register address between the z80 and the YM2610, and port $05 carries the data being sent to it... So we have $04 <- target address $27 (controls a bunch of stuff for the timers) and $05 <- data $35 (0011_0101). According to https://wiki.neogeodev.org/index.php?title=YM2610_registers this is indeed targeting some hairy timer stuff for the TA IRQ which I don't fully understand yet... but either way, I'm feeling confident that a wrench is being thrown into the timers somehow, because re-running this routine manually seems to fix the interrupts not firing. Like due to some weird edge case maybe the IRQ is being disabled somehow and not being re-enabled?

One more thing: I was just comparing my crashed state to a working one. In the memory viewer, the item labeled YM2610 ymsnd 0 ST->mode is 0x35 in every clean state (what it should be based on the above), but in the crashed state I have it's 0xB4. Not sure why that would be...
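
To compare the two observed values bit by bit, here is a small decoding sketch. The bit names are an assumption based on the commonly documented layout of register $27 in the YM2608/YM2610 family (bits 0-1 load the timers, bits 2-3 enable their IRQs, bits 4-5 reset their flags, bits 6-7 select a channel mode); check them against the wiki page linked above before relying on them.

```python
# Bit names for register $27, lowest bit first (assumed layout, see above).
BIT_NAMES = [
    "load_timer_a", "load_timer_b",
    "enable_irq_a", "enable_irq_b",
    "reset_flag_a", "reset_flag_b",
    "mode_bit6", "mode_bit7",
]


def decode_reg27(value):
    """Return the names of the bits that are set in `value`."""
    return [name for bit, name in enumerate(BIT_NAMES) if value & (1 << bit)]


for value in (0x35, 0xB4):
    print(f"${value:02X}: {', '.join(decode_reg27(value))}")
```

Under that assumed layout, 0x35 has the timer A load bit set and 0xB4 does not, which would be consistent with timer A no longer running in the crashed state.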

No.11377
project_ako
Tester
Jan 17, 2015, 20:33
edited on: Jan 17, 2015, 20:49
Welp. I'm PRETTY SURE the issue has been pinpointed.

I spent a lot of time today playing the game with tracelogging enabled, waiting for the sound to crash. I got it to crash twice while tracing.

In short: For some unknown reason, whoever programmed this sound driver thought it was fine to enable interrupts before finishing up writing data to z80 port $05. Maybe there's a 1 opcode grace period after an "ei" instruction where interrupts won't fire?

All over this code, in all sorts of functions, there will be stuff like this

A=08 -- 1635: out ($04),a
A=08 -- 1637: push af
A=08 -- 1638: ld a,($0000)
A=ED -- 163B: ld a,($0000)
A=ED -- 163E: ld a,($0000)
A=ED -- 1641: ld a,($0000)
A=ED -- 1644: pop af
A=08 -- 1645: ld a,e
A=08 -- 1646: ei
A=08 -- 1647: out ($05),a

with the ei coming before the out ($05). Of course, once in a while an interrupt actually does fire before the write occurs.

A=06 -- 1646: ei
A=06 -- 0066: out ($18),a
[...]
1647: out ($05),a

Now when it comes back to that "out ($05),a", the write won't be targeting the correct address anymore! The interrupts are changing the target through port $04, so the upcoming write to $05 will go to the wrong place. In fact it's going to put some garbage data into the YM2610 timer control, because the timer control is the last thing the interrupt routine selected! That's why the timers are breaking. So I'm 99% sure: the source of the crash is an interrupt edge case, when an interrupt goes off after the "ei" but before the "out ($05)". I'm not sure why the interrupt never goes off like that on hardware.

I'm doing another test as a workaround for now. I've reversed the order of these instructions everywhere in the sound rom. Gonna see if it crashes anymore. So far it seems to be working...

No.11378
AWJ
Developer
Jan 18, 2015, 04:58
On a Z80, an IRQ won't trigger immediately after an EI instruction; it is delayed until after the instruction after the EI (otherwise not even "ei; reti" would be safe). So the code you're looking at is perfectly valid, safe Z80 code.

But if it's going to 0066, then that's an NMI, not an IRQ. NMIs aren't affected by the EI instruction at all (that's what "non-maskable" means). So something else is wrong. It's possible that how we handle the (Neo-Geo specific) NMI masking through Z80 ports $08 and $18 isn't quite correct (until recently we didn't emulate the masking at all...)
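
The EI rule described in this note can be sketched with a tiny model that only tracks the maskable-interrupt enable flag. It is a deliberately simplified illustration, not MAME's Z80 core: the function name and the string-based "program" are hypothetical, and NMIs are left out entirely.

```python
def run(program, irq_pending=True):
    """Execute a list of mnemonics and record where a pending IRQ is accepted."""
    iff1 = False  # maskable interrupt enable flip-flop
    trace = []
    for op in program:
        trace.append(op)
        if op == "ei":
            iff1 = True
            continue  # no IRQ is accepted at the end of EI itself
        if op == "di":
            iff1 = False
        if irq_pending and iff1:
            trace.append("<IRQ accepted>")
            iff1 = False  # accepting an IRQ disables further IRQs
            irq_pending = False
    return trace


print(run(["ei", "out ($05),a", "nop"]))
print(run(["ei", "reti"]))
```

In this model the instruction following EI always completes before the IRQ is taken, which is why "ei; out ($05),a" and "ei; reti" are both safe against maskable interrupts.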
No.11379
project_ako
Tester
Jan 18, 2015, 05:48
edited on: Jan 18, 2015, 06:00
0066 is the NMI, yes. It's true that the 0066 NMI going off by itself isn't the problem. But let me paste the trace from shortly before my sound crashes:

A=09 -- 1635: out ($04),a
A=09 -- 1637: push af
A=09 -- 1638: ld a,($0000)
A=ED -- 163B: ld a,($0000)
A=ED -- 163E: ld a,($0000)
A=ED -- 1641: ld a,($0000)
A=ED -- 1644: pop af
A=09 -- 1645: ld a,e
A=06 -- 1646: ei
A=06 -- 0066: out ($18),a
A=06 -- 0068: push af
A=06 -- 0069: push bc
A=06 -- 006A: push hl
A=06 -- 006B: ld a,($F824)
A=C0 -- 006E: and $7F
A=40 -- 0070: out ($0C),a
A=40 -- 0072: ld a,($F801)
A=0B -- 0075: ld c,a
A=0B -- 0076: inc a
A=0C -- 0077: and $1F
A=0C -- 0079: ld ($F801),a
A=0C -- 007C: ld b,$00
A=0C -- 007E: ld hl,$F803
A=0C -- 0081: add hl,bc
A=0C -- 0082: in a,($00)
A=68 -- 0084: cp $01
A=68 -- 0086: jp z,$09A3
A=68 -- 0089: cp $03
A=68 -- 008B: jp z,$07FB
A=68 -- 008E: ld (hl),a
A=68 -- 008F: pop hl
A=68 -- 0090: pop bc
A=68 -- 0091: ld a,($F824)
A=C0 -- 0094: out ($00),a
A=C0 -- 0096: out ($0C),a
A=C0 -- 0098: out ($08),a
A=C0 -- 009A: pop af
A=06 -- 009B: retn
A=06 -- 0038: di
A=06 -- 0039: push af
A=06 -- 003A: ld a,($F825)
A=00 -- 003D: inc a
A=01 -- 003E: ld ($F825),a
A=01 -- 0041: in a,($04)
A=01 -- 0043: rlca
A=02 -- 0044: jp c,$0041
A=02 -- 0047: ld a,$27
A=27 -- 0049: out ($04),a
A=27 -- 004B: push af
A=27 -- 004C: ld a,($0000)
A=ED -- 004F: pop af
A=27 -- 0050: ld a,$35
A=35 -- 0052: out ($05),a
A=35 -- 0054: pop af
A=06 -- 0055: ei
A=06 -- 0056: reti
A=06 -- 1647: out ($05),a

The routine around 1635 is doing its thing when the NMI (0066) goes off. NMI isn't touching port $04 at all so that's fine. But once the NMI is done, 0038 is going off immediately, before the other routine got a chance to write the data. 0038 messes around with port $04, so now the data write at 1647 isn't targeting the right address anymore. This is the last activity of 0038 in both of my traced crashes, and both times 0066 goes off followed immediately by 0038, leading to garbage data being put into the timer control.
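
The effect of that ordering can be replayed with a minimal model of an address/data port pair. The class and function names are hypothetical, and the model ignores the chip's busy flag and timing; the port numbers and values are the ones from the trace above.

```python
class AddressDataPort:
    """Port $04 latches a register number; port $05 writes to that register."""

    def __init__(self):
        self.selected = None
        self.registers = {}

    def out(self, port, value):
        if port == 0x04:
            self.selected = value
        elif port == 0x05:
            self.registers[self.selected] = value


def replay(irq_intrudes):
    chip = AddressDataPort()
    chip.out(0x04, 0x09)      # 1635: main routine selects register $09
    if irq_intrudes:
        chip.out(0x04, 0x27)  # 0049: handler at 0038 selects timer control
        chip.out(0x05, 0x35)  # 0052: handler writes $35
    chip.out(0x05, 0x06)      # 1647: main routine's delayed data write
    return chip.registers


print(replay(irq_intrudes=False))  # value lands in register $09
print(replay(irq_intrudes=True))   # value overwrites register $27 instead
```

In the intruded case register $09 is never written at all, and the $35 that the handler just stored in $27 is replaced by the main routine's data byte.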

So to correct what I said, I think the problem is the NMI going off is also allowing this other interrupt to go off which probably shouldn't be able to.

I don't know anything about how the real hardware behaves compared to this - I can only tell you what patterns I've noticed in the debugger and tracelog before my sound crashes.

No.11380
AWJ
Developer
Jan 18, 2015, 07:45
I wonder if the NMI followed by an immediate IRQ does occur on real hardware, but the second write is ignored by the YM2610 because it comes too soon after the previous write. According to the YM2608 and YM2610 manuals you need to wait 83 cycles between writes to the chip (that's what those repeated ld a,($0000) instructions are for).

If that's the case then when that particular sequence of interrupts happened on real hardware you'd probably get a single bad note (due to whatever register the write was *supposed* to go to not getting written) but it wouldn't mess up the timers and hang up the sound system entirely.
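
As a rough check on the timing, the Z80 T-states between the relevant OUT instructions in the traces above can be added up. The per-instruction counts are the standard documented Z80 timings with no wait states; whether the "83 cycles" figure is measured in Z80 clocks or in the sound chip's own clock is not stated in this thread, so the comparison is only indicative.

```python
# Standard Z80 T-state counts (no wait states).
T_STATES = {
    "push af": 11,
    "pop af": 10,
    "ld a,(nn)": 13,
    "ld a,n": 7,
    "ld a,e": 4,
    "ei": 4,
    "reti": 14,
}


def gap(instructions):
    """T-states spent on the instructions executed between two OUTs."""
    return sum(T_STATES[op] for op in instructions)


# 1635 out ($04) ... 1647 out ($05): the padded main-routine sequence.
main_routine = ["push af"] + ["ld a,(nn)"] * 4 + ["pop af", "ld a,e", "ei"]
# 0049 out ($04) ... 0052 out ($05): inside the handler at 0038.
irq_handler = ["push af", "ld a,(nn)", "pop af", "ld a,n"]
# 0052 out ($05) ... 1647 out ($05): handler's write to the stray write.
after_irq = ["pop af", "ei", "reti"]

print(gap(main_routine), gap(irq_handler), gap(after_irq))
```

The stray write at 1647 follows the handler's own data write after only a few dozen T-states, far less padding than the main routine normally leaves, so it is at least plausible that real hardware would still be busy at that point.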
No.13936
Fujix
Administrator
Jun 23, 2017, 06:49
I played through two full loops of the game and did not experience any music freeze.