|
Post by Jim Brain on Aug 17, 2005 11:54:41 GMT -5
Since last night, I've been debugging an issue with my server code.
It seems as the latency goes up, the protocol (or at least my implementation) goes bad.
When I connect to my server from the box it runs on, all is well. But if someone remote tries to connect, the packet system negotiation works fine, yet every ResetAck I send is immediately followed by another Reset request from the client.
At present, I have a loopback setup from Sweden to give me guaranteed latency, and I can reproduce the issue using the same client that was working before.
Any ideas?
|
|
|
Post by Jim Brain on Aug 17, 2005 16:09:42 GMT -5
FIXED!
It seems the client is VERY particular as it relates to packets.
I had been sending the data of the packet like this:
send(data)
send(0x0d)
On quick links, that shows up as:
data 0x0d
but on lagged links, it could show up as
data <time> 0x0d
Evidently, QLink does NOT like the latter. For now, I make sure my TCP packet contains the 0x0d and do the send in one go. Works now (Jeff, another try tonight, perhaps?)
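The one-send fix described above can be sketched roughly like this, assuming a Python TCP socket on the server side (the function name is made up; the thread doesn't say what the server is written in):

```python
import socket

def send_qlink_packet(sock: socket.socket, payload: bytes) -> bytes:
    """Frame the payload and push it out in a single send.

    Building the whole packet first avoids the send(data); send(0x0d)
    split that lagged links were turning into: data <time> 0x0d.
    """
    packet = payload + b"\x0d"   # terminator travels with the data
    sock.sendall(packet)         # one write, so one (likely) segment
    return packet

# For a real TCP socket, disabling Nagle also keeps the stack from
# holding the small packet back (hypothetical usage):
#   sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
```

This only makes a single segment likely, not guaranteed; TCP is still free to split or coalesce writes, which is what the rest of the thread goes on to discuss.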
But I don't understand why a delayed arrival of the 0x0d would cause such an issue. And I don't know how to stop TCP, in normal usage, from aggregating and splitting a message before the 0x0d.
Very strange.
Jim
|
|
|
Post by Keith Henrickson on Aug 17, 2005 16:59:31 GMT -5
I know why that would make a difference. The qlink protocol was a line-by-line protocol. That is, they arranged with their X.25 providers to give them one packet per line. This is why the packet ends with a carriage return. As far as the X.25 network sees it, it is typing lines of text.
The upshot is that either the entire packet will be received in a close group of bytes, or the entire packet will be waylaid. Thus, the client probably has a timer buried within it that says, "If you take more than x milliseconds to receive a byte while framed, fail." Now that I think about it, I swear I saw that timer during one of my recent pokes through the code. Of course, I couldn't figure out for the life of me what it was for, but it is DEFINITELY counting rasters between bytes. It counts 8 rasters per byte, and must send an error response at 500ms.
How to fix: the simple solution is to modify the redirector to use UDP. A full layer 2 implementation will handle any packet loss that arises. Not as efficiently as possible, since it's a data link protocol rather than a network protocol, but it should do just fine given the low data rates the clients are capable of.
Of course, this means you have to actually implement sliding windows and everything.
Another option is to try to set the 'do not fragment' bit and disable Nagle. Of course, that means if you hit someone with an MTU smaller than 128 bytes, they'll get NO connectivity. But TCP ought to work. It's not going to be perfect, depending on which IP stacks are encountered across the way.
Or, since you KNOW that the packets are framed with 5A and 0D at each end, write a reframer that just sifts through the data, framing up qlink packets as they're encountered. This will work fine with all implementations of TCP.
Or, finally, work out a patch to the client to disable the fragmentation check. The loop is somewhere around $B800 in memory. I just haven't yet worked out how to patch the disk. It could be patched in memory maybe. Not sure.
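The reframer option above (sifting the byte stream back into whole 5A...0D frames no matter how TCP split or merged it) could look something like this sketch in Python; the class and method names are made up, and it assumes the 5A/0D delimiters don't appear inside a frame body:

```python
class QLinkReframer:
    """Accumulate TCP bytes and yield complete 0x5A...0x0D frames.

    TCP may split or merge writes arbitrarily; this buffers the
    stream and only hands back whole frames.
    """
    START, END = 0x5A, 0x0D

    def __init__(self):
        self.buf = bytearray()

    def feed(self, data: bytes):
        """Add received bytes; return a list of complete frames."""
        self.buf.extend(data)
        frames = []
        while True:
            try:
                start = self.buf.index(self.START)
            except ValueError:
                self.buf.clear()          # no frame start yet: drop noise
                break
            try:
                end = self.buf.index(self.END, start + 1)
            except ValueError:
                del self.buf[:start]      # partial frame: keep buffering
                break
            frames.append(bytes(self.buf[start:end + 1]))
            del self.buf[:end + 1]
        return frames
```

A frame split across two TCP segments simply stays buffered until the terminating 0x0D arrives, and two frames glued into one segment come back as two separate frames.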
|
|
|
Post by Jim Brain on Aug 17, 2005 17:52:06 GMT -5
UDP. Hmm, doable, but I think it still has a fragmentation issue, no?
I thought the smallest MTU had to be 196 bytes, as per the standards.
As for the 5a/0d packets, I have no issues sending them out as one chunk and getting them back in as one chunk. It's the fragmentation in the middle that would worry me.
Since the Reset will never fragment (no way an MTU is below 50 bytes), the protocol will be set up before I fragment, so I'll just rely on layer 2 to deal with a sequence error, and wait on patching the disk to remove that timer. It may be that the patch is never needed.
Of course, this puts even more pressure on me to get my sliding window code working... Argh!
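The sliding-window bookkeeping mentioned above could be sketched as a minimal go-back-N sender; this is a generic illustration, not QLink's actual layer 2 protocol, and the window size, names, and cumulative-ack convention are all assumptions:

```python
class GoBackNSender:
    """Minimal go-back-N sliding window sender state.

    Tracks which sequence numbers are in flight; on timeout,
    everything unacked is retransmitted in order.
    """

    def __init__(self, window: int = 4):
        self.window = window
        self.base = 0        # oldest unacked sequence number
        self.next_seq = 0    # next sequence number to assign
        self.unacked = {}    # seq -> payload, kept for retransmission

    def can_send(self) -> bool:
        return self.next_seq < self.base + self.window

    def send(self, payload: bytes) -> int:
        """Assign a sequence number and remember the payload."""
        assert self.can_send(), "window full"
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += 1
        return seq

    def ack(self, ackno: int) -> None:
        """Cumulative ack: everything below ackno is delivered."""
        while self.base < ackno:
            self.unacked.pop(self.base, None)
            self.base += 1

    def on_timeout(self):
        """Return all unacked payloads, in order, for retransmission."""
        return [self.unacked[s] for s in range(self.base, self.next_seq)]
```

The receiver side would just accept in-order sequence numbers and send cumulative acks; anything out of order is dropped, which is what makes go-back-N simple enough for a constrained client.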
|
|
|
Post by Jeff Ledger on Aug 17, 2005 18:12:42 GMT -5
"Evidently, QLink does NOT like the latter. For now, I make sure my TCP packet contains the 0x0d and do the send in one go. Works now (Jeff, another try tonight, perhaps?)"
I'm game!
Jeff
|
|
|
Post by Keith Henrickson on Aug 18, 2005 0:17:09 GMT -5
"As for the 5a/0d packets, I have no issues sending them out as one chunk and getting them back in as one chunk. It's the fragmentation in the middle that would worry me."
Oh, I was just thinking of inserting an extra piece of software to absorb the fragmented TCP packets and send them out over serial to the commie. That would absorb 100% of the possible fragmentation, and is basically what Tymnet/Telenet did. They received the X.25 from the server, absorbed any fragmentation (of which there should be none), and sent the bytes at modem link speed to the client. But you're probably right. The minimum MTU is probably greater than our tiny packets. The largest packets are encountered in file transfers.
|
|
|
Post by Jim Brain on Aug 18, 2005 22:53:40 GMT -5
Ahh, I see. I was trying to minimize the amount of code on the client side, but your idea would address the issue.
Jim
|
|