I succeeded in enumerating all the cycles that I was looking for.
The exact answer is 12,196,648 cycles.
It took about 2.3 GB of RAM to store all the data related to those cycles, which works out to a little under 200 bytes per cycle.
On one hand, I wouldn't worry that much if I had a computer with 128 GB of RAM (I only have 12 GB). On the other hand, 200 bytes/cycle is a lot of memory for what I am storing, so I had better spend some time optimizing the memory usage of this beast. Such a big memory footprint is also bad for the cache hit rate.
What I would consider a reasonable memory usage is 50 bytes/cycle. I suspect that it isn't really my data that is using all that memory, but that a lot of it is wasted on the overhead of millions of small allocations and deallocations...
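To illustrate the suspicion about small allocations, here is a minimal sketch of one common remedy: packing all cycles into two flat arrays instead of allocating one small object per cycle. The post does not show the actual data layout or implementation language, so the `PackedCycles` class, the 2-byte node ids and the offsets index are all assumptions made for illustration.

```python
from array import array


class PackedCycles:
    """Hypothetical store: variable-length cycles kept in two flat arrays."""

    def __init__(self):
        # All node ids back to back. "H" is an unsigned 16-bit integer,
        # which assumes the graph has fewer than 65,536 nodes.
        self.nodes = array("H")
        # offsets[i] is the position in self.nodes where cycle i starts;
        # the last entry marks the end of the last cycle.
        self.offsets = array("I", [0])

    def append(self, cycle):
        self.nodes.extend(cycle)
        self.offsets.append(len(self.nodes))

    def __len__(self):
        return len(self.offsets) - 1

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.nodes[self.offsets[i]:self.offsets[i + 1]].tolist()

    def payload_bytes(self):
        # Bytes used by the stored elements, ignoring array over-allocation.
        return (self.nodes.itemsize * len(self.nodes)
                + self.offsets.itemsize * len(self.offsets))


store = PackedCycles()
store.append([0, 1, 2])
store.append([3, 4, 5, 6])
print(len(store), store[1], store.payload_bytes())
```

With this layout the per-cycle cost is the node ids plus one offset entry, with no per-object allocator header. Any per-cycle values would go in further parallel arrays, and the same idea carries over to a single large vector plus an offsets vector in C or C++.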
Update: I just managed to reduce the memory usage to about 140 bytes/cycle, for a total of 1.7 GB of RAM across the 12M+ cycles.
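The per-cycle figures quoted so far can be checked with a few lines of arithmetic. This is only a sanity check, and it assumes that "GB" means 10^9 bytes; the function names are made up for this example.

```python
TOTAL_CYCLES = 12_196_648
GB = 10**9  # assumption: decimal gigabytes, not GiB


def bytes_per_cycle(total_gb, cycles=TOTAL_CYCLES):
    """Average number of bytes per cycle for a given total footprint."""
    return total_gb * GB / cycles


def total_gb(per_cycle_bytes, cycles=TOTAL_CYCLES):
    """Total footprint in GB for a given per-cycle budget."""
    return per_cycle_bytes * cycles / GB


print(round(bytes_per_cycle(2.3)))   # initial footprint
print(round(bytes_per_cycle(1.7)))   # after the first optimization
print(round(total_gb(50), 2))        # footprint at the 50 bytes/cycle target
```

Under that assumption the two measurements come out at roughly 189 and 139 bytes per cycle, and hitting the 50 bytes/cycle target would bring the total down to about 0.61 GB.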
From now on, I'll limit the chain length to 3 and, once the concept is proven to work, slowly ramp it up...
Update #2:
By limiting the cycle length, I got interesting numbers:
Max length - # of cycles
3 - 188
4 - 3,020
5 - 16,190
6 - 152,062
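For readers who want to reproduce this kind of table on their own graph, here is a minimal sketch of counting simple directed cycles up to a maximum length. The graph behind the numbers above is not given in the post, so the toy graph, the `count_cycles` function and its `min_len` parameter are assumptions made for illustration, not the method used to get those numbers.

```python
def count_cycles(adj, max_len, min_len=1):
    """Count simple directed cycles with min_len..max_len edges.

    adj maps each node to an iterable of successor nodes. Nodes must be
    orderable: every cycle is counted once, from its smallest node.
    """
    if max_len < 1:
        return 0
    count = 0

    def dfs(start, node, visited, depth):
        # depth is the number of edges on the path from start to node.
        nonlocal count
        for nxt in adj.get(node, ()):
            if nxt == start:
                if depth + 1 >= min_len:
                    count += 1  # closed a cycle of depth + 1 edges
            elif nxt > start and nxt not in visited and depth + 1 < max_len:
                visited.add(nxt)
                dfs(start, nxt, visited, depth + 1)
                visited.remove(nxt)

    for start in adj:
        dfs(start, start, {start}, 0)
    return count


# Toy example: complete directed graph on 4 nodes, no self-loops.
toy = {u: [v for v in range(4) if v != u] for u in range(4)}
for max_len in (3, 4):
    print(max_len, "-", count_cycles(toy, max_len, min_len=3))
```

The counts are cumulative, like the table above: each row includes every cycle of that length or shorter. The search depth is capped at the maximum length, which is what keeps the number of cycles manageable compared with the unrestricted enumeration.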
With a length of 6, this requires up to a few thousand cycle value recalculations... I don't think this can be done in real time with my current hardware... If this turns out to be a profitable avenue, I'll explore doing it with CUDA or OpenCL...