Networking is Alive BUT…
Progress on the networking code in A&A is coming along. I can now send packets from Classic A&A v1.02 in DOSBox to a special A&A Server and have it forward them to another instance running the Windows version of Classic A&A v1.02. I am excited by the ability to see the traffic with tools we didn’t have back then. Not only do I get multiple windows in the Windows version, but I can even dig deeper with Wireshark and look at the raw packets and timing. This is good stuff. Except for one thing … it doesn’t play the game.
Sad. And the problem is more of a devil than I would have expected.
Now, I’ll admit I know I have at least one bug running around in the system, but as I searched for that bug, I discovered one very serious flaw. The A&A networking algorithm is wrong and won’t work for more than 2 players. Let me see if I can educate everyone on what’s going on, why it’s wrong, and how I plan to fix it.
–[Programmer Speak ON]–
A quick primer about communications. In network communications, there are many layers and protocols, but many of them can be explained as two types: lossy and lossless. In lossy communications, a packet is a group of data bundled together and sent to a target address. It’s kind of like postal mail. You put your data in an envelope, write a return address on the outside, and write a destination address. You then put it on the ‘network’ and hope it gets to the target. But, like the postal mail service, your mail can get lost. Thus the “lossy” part of it. The good news is that systems (even the post office) generally get your packet to where you want it, but there just isn’t any way of telling if it makes it there or not.
Okay, what about lossless communications? Naturally, we’re almost there. The return address on your packet/mail tells the receiver who sent the packet/mail. The receiver just has to send a packet back saying, “I got it!”. This response is called an “ACK” or Acknowledgement. We can even go one step further and send a “NACK” or Negative Acknowledgement if the packet/mail was delivered but is broken/missing/corrupt. The sender then has four cases to consider. If the sender gets an ACK, great! We’re done with the transaction. If the sender gets a NACK, then oops, something went wrong, and the sender just needs to send another copy and hope it gets through the second time. But there is a third and fourth case. Can you guess what they are? Packets get dropped/lost. If the packet is sent and the receiver never gets it, no ACK or NACK response will be sent. And, likewise, the receiver may get the packet, but the ACK/NACK gets lost. So, what is a sender to do? Time it. If an ACK/NACK is not received in a short enough time, the sender will just send the packet AGAIN. The receiver might get a duplicate if its ACK/NACK was the thing that got lost, but the receiver just sends the ACK/NACK again and throws away the duplicate packet.
As you can see, this can be a fair amount of work just to send a small amount of data that may or may not make it. But dropping packets cannot be acceptable.
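To make the primer concrete, here is a minimal Python sketch of the send, wait, resend loop described above. It is not A&A code: `send_reliably` and `LossyLink` are names invented for this illustration, the "network" is an in-memory object, and a timeout is modeled simply as getting `None` back.

```python
import random


def send_reliably(payload, channel, max_tries=10):
    """Resend until an ACK comes back; return attempts used, or None if we gave up."""
    for attempt in range(1, max_tries + 1):
        reply = channel(payload)  # None models a timeout (packet or reply was lost)
        if reply == "ACK":
            return attempt
        # A NACK or a timeout both mean the same thing here: send another copy.
    return None


class LossyLink:
    """Hypothetical in-memory link that loses traffic with a fixed probability."""

    def __init__(self, loss, seed):
        self.rng = random.Random(seed)
        self.loss = loss
        self.seen = set()      # payloads the receiver has already accepted
        self.delivered = []    # accepted payloads, in order, without duplicates

    def __call__(self, payload):
        if self.rng.random() < self.loss:
            return None        # the packet itself was lost
        if payload not in self.seen:
            self.seen.add(payload)      # first copy: keep it
            self.delivered.append(payload)
        # A duplicate copy is thrown away, but it still gets an ACK.
        if self.rng.random() < self.loss:
            return None        # the ACK was lost on the way back
        return "ACK"
```

With `loss=0.0` every send succeeds on the first attempt; with a lossy link the sender may need several attempts, but the receiver still ends up with exactly one copy of each payload.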
In A&A, we chose to use UDP, which is a lossy transmission method. UDP is also very simple and fast. Actually, the original protocol used IPX, which is very similar to UDP but worked only on a LAN. What DOSBox and the Windows version of A&A do is disguise IPX packets as UDP packets. Anywho, all you need to know is that we’re sending lossy packets. So, we have to write the code to handle all the lost packets. Side note: TCP takes care of all those messy lost packets BUT has a nasty habit of making things slower than we like. TCP was designed mostly for reliable streams like big files, not lots of little time-critical transactions.
I also got a little fancy with A&A by making it use both lossy and lossless packets. In the original MMO structure of the game, the server really did all the work of tracking the game. If you moved in a room, it wasn’t a big deal if you jumped a little bit from location A->B. But if you drop an item, the server definitely needs to know that the item appeared. So, movement on a level was lossy since location information was constantly being sent. But item pick up/drop and other major actions were lossless. This allowed the game to drop packets and keep the speed up. To put this in perspective, we wanted the ability to have 32 people in a single room running on 9600 baud modems. It’d be chunky and slow, but it’d work.
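As a rough illustration of mixing the two kinds of traffic, the sketch below tags each packet with a kind and only remembers the ones that need an ACK. The packet kinds, the `Outbox` class, and the `transmit` callback are all hypothetical names for this example; the real A&A packet types are not shown in this post.

```python
from enum import Enum


class Kind(Enum):
    MOVE = "move"                # sent constantly, a lost one is soon replaced
    ITEM_PICKUP = "item_pickup"  # must not be lost
    ITEM_DROP = "item_drop"      # must not be lost


NEEDS_ACK = {Kind.ITEM_PICKUP, Kind.ITEM_DROP}


class Outbox:
    """Sends every packet once, but keeps the lossless ones until they are ACKed."""

    def __init__(self):
        self.next_id = 0
        self.unacked = {}  # packet id -> (kind, data), candidates for resending

    def send(self, kind, data, transmit):
        packet_id = self.next_id
        self.next_id += 1
        transmit(packet_id, kind, data)
        if kind in NEEDS_ACK:
            self.unacked[packet_id] = (kind, data)
        return packet_id

    def on_ack(self, packet_id):
        self.unacked.pop(packet_id, None)  # unknown or repeated ACKs are ignored

    def resend_all(self, transmit):
        # Called when the resend timer fires: only lossless packets go out again.
        for packet_id, (kind, data) in sorted(self.unacked.items()):
            transmit(packet_id, kind, data)
```

Movement packets are fire-and-forget, so they never pile up in the resend queue, which is what keeps the bandwidth down on a slow link.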
When we dropped the MMO client/server structure of the game and went to a new direct-link game for up to 4 players, we changed the whole strategy. Sort of. It’s kind of a third strategy.
In a synchronous game, we have to keep each player totally in sync on each computer. Where each player is and what each player is doing are assigned a specific point in time. When all 4 players’ data is received for that point in time, the creatures/logic of the game react and make their decisions. Because all players have exactly the same state, the creatures/logic work exactly the same way. Even the pseudo-random number generators are in sync. It’s like a turn by turn game with hundreds of predetermined rules running very fast. But back in the 1990s, the round trip time for one packet to be sent and responded to was usually 200-300 ms. That means you only get 3-5 or so updates a second. But this is an action game! Let’s get more speed!
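The lockstep idea can be shown with a toy model. In this sketch (not the game's actual code; `Sim`, `try_advance`, and the one-creature "world" are made up for illustration), each peer only advances a tick once it holds input from every player, and a shared seed keeps the random numbers identical everywhere.

```python
import random


class Sim:
    """Toy deterministic world: one creature drifting toward the players."""

    def __init__(self, seed):
        self.rng = random.Random(seed)  # every peer must start from the same seed
        self.tick = 0
        self.creature_x = 0

    def step(self, inputs):
        """Advance one tick; `inputs` maps player id -> that player's x position."""
        target = sum(inputs.values()) // len(inputs)
        direction = (target > self.creature_x) - (target < self.creature_x)
        self.creature_x += direction + self.rng.randint(-1, 1)
        self.tick += 1


def try_advance(sim, pending, players):
    """Step only if input from every player has arrived for the current tick."""
    inputs = pending.get(sim.tick, {})
    if set(inputs) != set(players):
        return False  # still waiting on at least one player
    sim.step(pending.pop(sim.tick))
    return True
```

Two peers that receive the same inputs, even in a different arrival order, end up with the same `creature_x`, because nothing in `step` depends on anything but the tick's inputs and the shared random stream.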
The key to making the game run smoothly is to allow time to slip just a little while we send more packets back and forth. In a way, the players are ahead of the game by roughly 10-500 milliseconds and the game (creatures/logic/doors, etc.) is catching up to the history of what the players did as soon as it arrives from all the other players. The amount of time slip we allow is controlled, and if it gets too big, we have a couple of choices: 1) pause the player until the network catches up or 2) allow the game to take bigger jumps in time per “turn”. Players hate #1, so #2 is executed. What happens is less and less data is sent until the bandwidth adjusts to something more agreeable. In A&A, we set a target of roughly 10 frames per second from the network, meaning we only try to send/receive packets at about 100 ms intervals to ensure a regular rate of updates is kept. We also allow up to 1 second of slip before we just stop sending any information and let net lag take over (the player can still move but the game level basically stops moving creatures). With today’s hardware and networking, we should be able to increase the speed of updates significantly.
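Here is a very small sketch of that policy. Only the two numbers come from the description above (about 100 ms between updates, and a 1 second limit); the function name and the exact rule for growing the step are assumptions made for illustration, not the game's real logic.

```python
TARGET_INTERVAL_MS = 100  # roughly 10 network updates per second
MAX_SLIP_MS = 1000        # past this, stop updating and let net lag take over


def next_step_ms(slip_ms):
    """Return how much game time the next "turn" should cover, or None to stall.

    `slip_ms` is how far the world logic is behind the local player.
    """
    if slip_ms > MAX_SLIP_MS:
        return None  # net lag: the player can still move, the creatures wait
    # Assumption: a turn never covers less than the target interval, and when
    # the backlog is larger than that, one bigger jump swallows all of it.
    return max(TARGET_INTERVAL_MS, slip_ms)
```

For example, `next_step_ms(40)` gives the normal 100 ms turn, `next_step_ms(350)` gives one 350 ms jump, and `next_step_ms(1500)` gives `None`.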
All in all, even as complex as this is, the strategy works very well. Each player may be slightly on a different time position than the other, but they’re reacting and sending their commands to everyone else as fast as possible.
So, where is the problem? Playing with more than 1 other player, and ACK packets. If I type up a message saying, “Hey! Go forward!”, it gets put into a specialized message packet, put into the output stream, and broadcast to all the members in the group. Now, the nice thing about this packet is it doesn’t affect the game, so we can just handle it whenever and it doesn’t stop game play, but we want one copy to reach each player successfully with no duplicates. But here’s the problem with the code: I only look for 1 (yes, 1) ACK packet. If I’m playing with 3 friends, the packet gets sent out once as a group broadcast packet. Then when any ONE person ACKs it, A&A considers it sent. Yep, that means that with 3 friends, 1 could have seen my message and the other 2 may have had dropped packets.
For message packets, that’s one thing, but for creating new groups and starting adventures it’s even worse. How about ending a level and going to the next one? Same problem, one person triggers and the rest get the message to jump levels. And if that doesn’t happen correctly, boom!
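The difference between "any one ACK" and "an ACK from everyone" is easy to show in code. The sketch below is only one plausible way to track it, not the fix planned for A&A; the `Broadcast` class and the player names are made up.

```python
class Broadcast:
    """Tracks ACKs for one packet that was broadcast to a group."""

    def __init__(self, recipients):
        self.recipients = set(recipients)
        self.acked = set()

    def on_ack(self, who):
        if who in self.recipients:
            self.acked.add(who)  # a repeated ACK from the same player changes nothing

    def delivered_any_ack(self):
        # The flawed rule described above: one ACK from anybody counts as "sent".
        return len(self.acked) >= 1

    def delivered_all_acks(self):
        # Stricter rule: every recipient has to ACK.
        return self.acked == self.recipients

    def still_waiting(self):
        # Who should get another copy when the resend timer fires.
        return sorted(self.recipients - self.acked)
```

With recipients `["ann", "bob", "cy"]` and a single ACK from `"bob"`, the first rule already reports success while `still_waiting()` returns `["ann", "cy"]`, which is exactly the case where two friends never saw the message.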
–[Programmer Speak OFF]–
So, with a recent bug I’m having with starting a new game, I dug into this and found that the packet messaging is just plain wrong. In fact, I’m surprised it works at all. From my perspective, it can only work for 2 players, and I don’t want that. I want more players. I want 4 players to have a good time and possibly some type of large arena combat down the road.
I’ve got to fix this. And I believe I know how, but it’s going to take some work.
Despamming a Wiki
In other news, I’ve been doing some clean up on the A&A wiki at wiki.amuletsandarmor.com. You may recall I had said in a past email that spammers were attacking the site and that I changed the wiki to require an email to get a login. Thinking I had done that correctly when I had not, I let the website sit for the next month and didn’t bother to look at the logs. Recently I took a look and lo and behold the spammers were still on there. They had created almost 5000 accounts and about as many behind-the-scenes pages. The pages are weird too. Just a bunch of words, some numbers, and links. I have two theories on what these are. First, these could be hackers making decoding dictionaries on easy to access wiki sites. Second, these could just be advertising links packed with words to help raise ratings on search engines. I’m inclined to think it’s both. There was LOTS going on there.
Luckily the server had a limit on the size of the database and was at its max of 1 GB. This had stopped the addition of new accounts, but I needed to clean it out and badly. After looking at various tools, I sat down and, without going into too much detail, managed to kill the roughly 5000 accounts and associated pages. How big is the SQL database now? About 2 Megs. Yep. In the end, I should have just created a new wiki and copied in the useful data. It would have been faster and easier, but it was a learning experience. Afterward, I closed the wiki properly and tightly. Not going to make that mistake again.
So, if you ever put a public wiki out there, make sure it’s locked tight or have better anti-spamming capabilities. Apparently CAPTCHA is the most easily hacked tool in the world. Next time, I’m going to just have people enter the answer to calculus questions or answer Zen koans (just kidding). For now, a “mother may I” approach to new accounts works for me.
So where to next? I’ll continue on the networking and let you know where I’m at. That bug with just getting a game going has blocked me, but it should be fixed soon.