Stage 4, day 29:
After almost a full week of effort, I have finally launch the optimized server.
The result was a total disappointment. Granted when I watch the logging window it seems to scroll much faster than it used (or perhaps it is placebo effect) but it doesn't change the timing of the critical code path.
I must confess that I am shooting in the dark to find the bottleneck of that code path. For some reason, my profiler (valgrind) does not work well with my program. (I got an idea while typing this entry. It might be because it is caught offguard by my usage of io_uring...)
I did figure that performance issue was coming from the logging library. The reason being that I measure 2 different input.
A: Which is the baseline
B: Does A + a little bit more validation and about twice as much logging than when doing A
Since processing B was taking about twice as much time than processing A along with generating twice as much logs, logging code was an obvious suspect.
On top of that, upon inspecting the logging code, I spotted 2 issues.
1. Sometimes C++ can be treacherous where when you manipulate small objects (ie std:
tring), the code does plenty of expensive dynamic memory allocation without you being aware.
2. The logging library is a thin layer over sprintf. sprintf is pretty fast but it needs to scan the format string and make a buffer copy. If your string has no substitution in it, it is a wasteful scan and copy. About half of my logs are of that type
Therefore, I did the following:
1. 100% Eliminate logging related memory allocation
2. Create a fast-track path in the logging code for 0 substitution log strings
3. Deploy everywhere in my codebase the improved logging (this is what was long and painful to do)...
This is definitely a good improvement but it was just the wrong one for the current problem at hands...
Am I learning something here? I guess that I could have made a quick test to validate the concept before going all in the project and spend a week on that...
To add insult to the injury, I have commented out a simple function call (getName()) used to build one of the log string to simplify it. This simple change that took less than 5 seconds to make did improve B processing time by 10%!
I must succeed make my profiler work on my program... Otherwise, finding the bottleneck is like the proverbial needle in the stack search...
After almost a full week of effort, I have finally launch the optimized server.
The result was a total disappointment. Granted when I watch the logging window it seems to scroll much faster than it used (or perhaps it is placebo effect) but it doesn't change the timing of the critical code path.
I must confess that I am shooting in the dark to find the bottleneck of that code path. For some reason, my profiler (valgrind) does not work well with my program. (I got an idea while typing this entry. It might be because it is caught offguard by my usage of io_uring...)
I did figure that performance issue was coming from the logging library. The reason being that I measure 2 different input.
A: Which is the baseline
B: Does A + a little bit more validation and about twice as much logging than when doing A
Since processing B was taking about twice as much time than processing A along with generating twice as much logs, logging code was an obvious suspect.
On top of that, upon inspecting the logging code, I spotted 2 issues.
1. Sometimes C++ can be treacherous where when you manipulate small objects (ie std:
![Confused Confused](https://subliminal-talk.com/images/smilies/confused.gif)
2. The logging library is a thin layer over sprintf. sprintf is pretty fast but it needs to scan the format string and make a buffer copy. If your string has no substitution in it, it is a wasteful scan and copy. About half of my logs are of that type
Therefore, I did the following:
1. 100% Eliminate logging related memory allocation
2. Create a fast-track path in the logging code for 0 substitution log strings
3. Deploy everywhere in my codebase the improved logging (this is what was long and painful to do)...
This is definitely a good improvement but it was just the wrong one for the current problem at hands...
Am I learning something here? I guess that I could have made a quick test to validate the concept before going all in the project and spend a week on that...
To add insult to the injury, I have commented out a simple function call (getName()) used to build one of the log string to simplify it. This simple change that took less than 5 seconds to make did improve B processing time by 10%!
I must succeed make my profiler work on my program... Otherwise, finding the bottleneck is like the proverbial needle in the stack search...