The required computation is completely contrived. Interpolation/projection from Cartesian to polar (and vice versa) is much more involved in the real world. We purposely simplified the problem so that people can focus on the more universal design problem of managing concurrency and locality. We don't want the “math” to be a barrier to entry for the challenge (which is open to all comers).
You can use any platform. An all-software platform is valid. An all-software entry almost won the “absolute performance” category in 2008. “Almost won” as in it came in 2nd, but it was 10x slower than the first-place entry (see the 2008 results). I suspect you won't win the normalized performance category with an all-software entry.
You can do anything you want as long as you produce the same results as the reference implementation, to the accuracy required by the specification, except in those points explicitly excluded by the specification. (The specification does not specify an algorithm.)
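As a rough illustration of what "same results, to the required accuracy, except at excluded points" could look like in a self-check, here is a minimal sketch. The function name count_mismatches, the flat float arrays, the exclusion mask, and the absolute-error tolerance are all assumptions made for this example; the specification defines the actual accuracy requirement and the excluded points.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the reference implementation.
 * Counts the checked points whose absolute error exceeds tol.
 * excluded may be NULL; otherwise excluded[i] != 0 skips point i. */
static size_t count_mismatches(const float *out, const float *ref,
                               const unsigned char *excluded,
                               size_t len, float tol)
{
    size_t bad = 0;
    size_t i;

    assert(out != NULL && ref != NULL && tol >= 0.0f);
    for (i = 0; i < len; i++) {
        float d;

        if (excluded != NULL && excluded[i])
            continue;               /* point excluded from comparison */
        d = out[i] - ref[i];
        if (d < 0.0f)
            d = -d;                 /* absolute error */
        if (!(d <= tol))            /* written this way so NaN also counts */
            bad++;
    }
    return bad;
}
```

A return value of 0 would mean every non-excluded point agrees with the reference within tol. The tolerance you pass in must come from the specification, not from this sketch.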
For this contest, you must report measured wall-clock time, so we cannot accept performance estimates based on simulation.
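For an all-software entry on a POSIX host, measuring wall-clock time might look like the sketch below. This is only an assumption about one possible platform: the names wall_seconds, time_run, and dummy_kernel are hypothetical, and an embedded PPC405 or MicroBlaze system would read a hardware timer instead (the reference implementation keeps its timing functions in timer.h).

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Current reading of a monotonic wall clock, in seconds. */
static double wall_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;                        /* silence unused warning under NDEBUG */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Time one call of kernel(arg) and return the elapsed seconds. */
static double time_run(void (*kernel)(void *), void *arg)
{
    double t0 = wall_seconds();

    kernel(arg);
    return wall_seconds() - t0;
}

/* Stand-in workload for illustration only; replace with the real kernel. */
static void dummy_kernel(void *arg)
{
    volatile unsigned long *sink = arg;
    unsigned long i;

    for (i = 0; i < 100000UL; i++)
        *sink += i;
}
```

Calling time_run(dummy_kernel, &acc) with an unsigned long acc returns the measured interval. A monotonic clock is used here so that adjustments to the system time do not distort the measurement.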
No. The starter project is built for EDK 9.2. If you are using an earlier EDK, you should open the starter project (mmm) from the 2007 contest, then follow the instructions below to refresh the software.
If you don't want to use the DSOCM FIFO interface (it is not required; you can use Xilinx IPs or build your own), you can very easily create a new project from scratch (in any EDK version) for a PPC405 or MicroBlaze system using EDK's Base System Builder Wizard. After you have verified that your new system works with a simple program, you will be ready to add the reference implementation's .c and .h files (in the RefSW directory) to your new project. (MicroBlaze systems would require new timing functions to be written in timer.h.)
If someone does this and wants to share it, please let me know.
The spec requires that the input array CART starts and the output array POL finishes in external off-chip memory. During benchmarking, if the input and output arrays are allowed to be contained in, for example, a processor's cache, one might be able to achieve a level of performance that is unrealizable when real data are involved.
This requirement is somewhat vague to allow for a variety of platforms to be used. Generally speaking, the input array at the start of timing should not be on the same chip as where computation is performed (e.g., not in cache for CPU, not in BRAM for Xilinx FPGA); the output array also should not be on the same chip when timing stops. Any form of standalone memory module (external DRAM or flash) should qualify.
Before timing starts, you can use the CPU or FPGA to help initialize memory content (copying bits without pre-processing or analysis). You can use the CPU or FPGA after timing to retrieve data for validation.
When in doubt, email the organizer for clarification.
The get() and set() functions allow for custom data layout. You cannot perform computation in them.
You cannot “replicate” data in the set() function. The set() function must set a unique location that the get() function can later retrieve unambiguously. If you are using a distributed platform, however, the set() and get() functions can distribute data based on its position in CART and POL and on whether it is the real or imaginary part.
The get() and set() functions are only allowed to make use of the problem parameter N. They may not take R or theta into consideration.
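To make the layout rules concrete, here is a minimal sketch of a tiled layout that would stay within them: each (row, column, real/imaginary) triple maps to exactly one location, nothing is replicated, and the mapping depends only on N. Everything named here is hypothetical (layout_index, layout_set, layout_get, enum part, TILE), as is the assumption of an N x N complex array with N a multiple of TILE; the real get() and set() signatures are those of the reference implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; not the contest's actual API. */
enum part { PART_RE = 0, PART_IM = 1 };

#define TILE 4  /* tile edge; an arbitrary choice for this sketch */

/* Map (row, col, part) of an n x n complex array to a unique offset.
 * Uses only n and the element position, never R or theta. */
static size_t layout_index(size_t n, size_t row, size_t col, enum part p)
{
    size_t tiles_per_row, tile, within;

    assert(n % TILE == 0 && row < n && col < n);
    tiles_per_row = n / TILE;
    tile = (row / TILE) * tiles_per_row + (col / TILE);
    within = (row % TILE) * TILE + (col % TILE);
    return 2 * (tile * TILE * TILE + within) + (size_t)p;
}

/* Pure store: the value is written unchanged, with no computation on it. */
static void layout_set(float *mem, size_t n, size_t row, size_t col,
                       enum part p, float v)
{
    mem[layout_index(n, row, col, p)] = v;
}

/* Pure load from the one location layout_set wrote. */
static float layout_get(const float *mem, size_t n, size_t row, size_t col,
                        enum part p)
{
    return mem[layout_index(n, row, col, p)];
}
```

Because layout_index is a one-to-one mapping onto offsets 0 through 2*n*n - 1, every value written by layout_set can be read back unambiguously by layout_get. Whether a particular layout is acceptable is still the organizer's call.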
On 3/20, we will reveal a set of testcases which will be used for validation and timing measurements. The testcases will cover a wide range of the problem parameters. Performance competitions will be based on the 9 revealed testcases plus 3 secret ones to be released after the submission. We will require additional correctness tests if you are in the running for the cash prizes.
Please review the official contest rules for submission requirements and instructions.