07-22-2020, 02:33 PM
I am wondering if the fearful unconscious part of me could be cloaking my problem-solving skills as a strategy to keep me from succeeding, because of its fear of success and possibly of the changes that would come with it.
This thought popped into my mind recently... It seemed clever enough to write down here.
OF seems to be having a positive effect on my trading project. I have solved 2 excruciating mysteries.
1. I was seeing threads exit because of uncaught exceptions raised by some exchange REST API calls. My first attempt at addressing this was to wrap the API call in a small loop, pausing a few msecs between attempts after a failure. It didn't work. The real problem is a parameter called 'nonce', which must always be higher than the previous one sent. An easy way to meet this requirement is to base its value on the wall clock time. The catch is that I'm calling the API from my server and my client at about the same time, since they receive the same events. As I reported in my BASE journal, I had found that the server clock was badly desynchronized: it was 12 seconds ahead of real time. Well, it seems it has started to drift again. When I checked this time, it was 200 ms ahead. That was enough to get my client's nonce values rejected even when I retried 3-4 times about 5 ms apart, since those retries span only about 20 ms, far less than the 200 ms gap.
2. The other one was a bit tricky. My client crashed while it was attempting to reconnect to the exchange during a maintenance window. I didn't have many clues to work with. The core dump file was truncated. Even if it hadn't been, I had recompiled the various libs since then, so more likely than not I wouldn't have been able to extract anything useful from a full core anyway. The only resort left was static analysis of the source code, knowing approximately what the code was doing when it crashed. It must be sheer luck, but while I was doing some unrelated work with a terminal window on my second screen displaying its logs, the problem jumped out right in front of my eyes. It must be quite a rare occurrence, so I'm lucky it happened right in my face. I have WebSocket protocol handling code. The exchange sends WS messages. Depending on message length, the protocol can send them in one shot or in several small chunks. The code accumulates the small chunks into a buffer and, once the whole message has been received, passes it to the next processing stage. The problem I found was that if a connection error triggered a reconnection AND the chunk buffer was not empty, the first message received after reconnecting would be appended to the last partial (now stale) message. The fix was just to clear the buffer on reconnection.
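To make the second bug concrete, here is a minimal Python sketch of the chunk buffer and the fix. It is only an illustration: the class and method names (`ChunkAssembler`, `feed`, `on_reconnect`) and the sample payloads are hypothetical, and the real handler is not shown in this post.

```python
class ChunkAssembler:
    """Accumulates message chunks until the final chunk arrives (hypothetical names)."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes, is_final: bool):
        """Add a chunk; return the whole message once the final chunk is in, else None."""
        self._buffer += chunk
        if not is_final:
            return None
        message = bytes(self._buffer)
        self._buffer.clear()
        return message

    def on_reconnect(self):
        """The fix: drop any partial message left over from the old connection."""
        self._buffer.clear()


def first_message_after_reconnect(clear_on_reconnect: bool) -> bytes:
    """Simulate a drop in the middle of a chunked message, then a reconnection."""
    asm = ChunkAssembler()
    asm.feed(b'{"price": 1', is_final=False)  # connection dies before the last chunk
    if clear_on_reconnect:
        asm.on_reconnect()
    return asm.feed(b'{"event": "hello"}', is_final=True)


if __name__ == "__main__":
    print(first_message_after_reconnect(clear_on_reconnect=False))  # stale prefix glued on
    print(first_message_after_reconnect(clear_on_reconnect=True))   # clean message
```

Without the clear, the next stage receives the stale partial chunk glued to the front of the new message, which is the kind of malformed input that could plausibly take a parser down.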
I'm not 100% sure that this is what caused the crash I saw, but I have been using this code since last November. It is pretty robust and there can't be many problems left in it. Having found this big omission, it is very likely the culprit...
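Coming back to the first problem, the nonce rejections can be reproduced with a small Python simulation. This is a sketch under assumptions: it supposes the server and the client share one nonce sequence on the exchange side (which is what the rejections suggest), that the nonce is simply the local clock in milliseconds, and every name in it (`NonceChecker`, `clock_nonce`, `client_retries`) is made up for illustration.

```python
class NonceChecker:
    """Stand-in for the exchange: a nonce is accepted only if it exceeds the last one."""

    def __init__(self):
        self.last = 0

    def accept(self, nonce: int) -> bool:
        if nonce <= self.last:
            return False
        self.last = nonce
        return True


def clock_nonce(true_time_ms: int, clock_offset_ms: int) -> int:
    """Nonce read from a local wall clock that may run ahead of true time."""
    return true_time_ms + clock_offset_ms


def client_retries(exchange, start_ms, attempts=4, pause_ms=5):
    """Accept/reject result of each client attempt, with an accurate client clock."""
    return [
        exchange.accept(clock_nonce(start_ms + i * pause_ms, 0))
        for i in range(attempts)
    ]


exchange = NonceChecker()
t = 1_000_000  # arbitrary "true" time in ms at which both sides see the same event

server_ok = exchange.accept(clock_nonce(t, 200))    # server clock is 200 ms ahead
results = client_retries(exchange, t)               # 4 attempts covering only 15 ms
late_ok = exchange.accept(clock_nonce(t + 201, 0))  # client finally passes the server's nonce

if __name__ == "__main__":
    print(server_ok, results, late_ok)
```

Every retry lands inside the 200 ms window in which the server's nonce is still higher, so a few short pauses cannot help; only a retry later than the clock offset, or a resynchronized server clock, gets through.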