Thursday 4 August 2022

Shrinking trailing latency

Foreword

In the previous post we managed to reduce initial latency in order to improve throughput.

To put everything into perspective again: our memory controller is clocked at 16.7MHz, which gives a period of roughly 60ns. A request to read from or write to memory is issued at the first clock pulse, and the data is expected at the second pulse.

In the previous post we found that requested data from memory was available shortly after the second pulse, which is simply too late.

In this post we will attempt to reduce the trailing latency even further, so that we at least have the requested data before the second pulse.

The exercise of reducing trailing latency has proven not to be so major, so you will find that this post is shorter than usual.

The Plan

Let us quickly review our plan by looking at the diagram below.

Point 1 is the first clock edge of the 16.7MHz clock signal. At this point we assert a command for memory access.

At point 2 we expect the required data to be ready. However, the data is captured only at point 3, which is well after the required second clock edge.


My plan is to capture the data instead at the negative edge of the clock that drives the output of the Iserdes block, which I have marked in the diagram as point 4. In the next section I will show how to accomplish this.

Capturing at the right moment

Let us now see how we can capture the data at the right moment, as outlined in the previous section.

From the diagram in the previous section, we saw that the capture should happen at state 2f.

The state register is maintained in the module mcontr_sequencer, and we check for state 2f:

module  mcontr_sequencer   #(
  ...
)(
 ...
);
...
// Asserted only while the sequencer is in state 0x2f (7-bit state register)
assign store_captured_data = state == 7'h2f;
...
endmodule

We can now pass this signal down through the subsequent modules until we reach the module iserdes_mem:

module  iserdes_mem #
(
   ...
) (
...
    input        store_captured_data,
...
);
...
always @(negedge oclk_div)
begin
    if (store_captured_data)
    begin
        dout_le <= {dout_le[3:0], iserdes_out};
    end
end
...
endmodule

As you can see, we do the capture on the negative edge of the clock that clocks the output of an Iserdes block.

With this logic we will be able to capture data at point 4, indicated in the diagram of the previous section.
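
To see the effect in isolation, here is a minimal simulation sketch of the same capture structure. Everything around the always block is an assumption made for illustration: the module names negedge_capture_demo and negedge_capture_tb, the 4-bit and 8-bit widths, the 10ns clock period, and the idea that iserdes_out changes on the rising edge of oclk_div. It is not the actual iserdes_mem module, only a stand-in to show that a register clocked on the falling edge picks up the data half a period after it changes.

```verilog
`timescale 1ns/1ps

// Hypothetical stand-in for the capture logic in iserdes_mem.
// Widths are assumptions; only the enable + negedge structure
// mirrors the snippet in the post.
module negedge_capture_demo (
    input  wire       oclk_div,            // clock driving the deserializer output
    input  wire       store_captured_data, // enable, asserted in the capture state
    input  wire [3:0] iserdes_out,         // assumed 4-bit parallel word
    output reg  [7:0] dout_le              // assumed 8-bit capture register
);
    initial dout_le = 8'h00;

    // Capture on the falling edge, so the data that changed on the
    // preceding rising edge is stored half a period later.
    always @(negedge oclk_div)
    begin
        if (store_captured_data)
        begin
            dout_le <= {dout_le[3:0], iserdes_out};
        end
    end
endmodule

// Small testbench: the clock period and stimulus are arbitrary.
module negedge_capture_tb;
    reg        oclk_div            = 1'b0;
    reg        store_captured_data = 1'b0;
    reg  [3:0] iserdes_out         = 4'h0;
    wire [7:0] dout_le;

    negedge_capture_demo dut (
        .oclk_div            (oclk_div),
        .store_captured_data (store_captured_data),
        .iserdes_out         (iserdes_out),
        .dout_le             (dout_le)
    );

    // Free-running clock with an arbitrary 10ns period
    always #5 oclk_div = ~oclk_div;

    // Assumption: the upstream data changes on the rising edge
    always @(posedge oclk_div)
        iserdes_out <= iserdes_out + 4'h1;

    initial begin
        $monitor("t=%0t oclk_div=%b en=%b iserdes_out=%h dout_le=%h",
                 $time, oclk_div, store_captured_data, iserdes_out, dout_le);

        // Assert the enable for two clock periods, then release it
        @(posedge oclk_div);
        store_captured_data <= 1'b1;
        repeat (2) @(posedge oclk_div);
        store_captured_data <= 1'b0;

        repeat (2) @(posedge oclk_div);
        $finish;
    end
endmodule
```

Running this in any Verilog simulator should show dout_le updating only on falling edges of oclk_div while store_captured_data is high, and holding its value once the enable is released.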

In Summary

In this post we managed to shorten the trailing latency, so that we are able to capture the required data by the second pulse of a 16.7MHz clock signal.

So far in the game, we have worked with multiple 16-bit bursts at a time with every memory access. This is not really suitable for an Amiga based design, which only works with a single 16-bit piece of data at a time from memory.

So, in the next post we will start focussing on changing our design so that it works only with one 16-bit value at a time from memory.

Till next time!