Software Performance Optimisation

Modern computers may be incredibly fast, but with the majority of software packages growing seemingly slower and slower with each release, the performance promised by each technological advance is rarely delivered.

Having decades of in-depth experience in writing efficient software, we understand the way systems work from the large, popular modern frameworks, down through the operating system, to the component level on the motherboard. We have system-level knowledge of servers and PCs that is rarely seen outside of the embedded software industry.

We can offer help and advice on most languages and database systems and, if required, we can recode critical routines in C, C++ or assembly language. We can also work with you to interface any low-level routines to the language of your choice.

Often, the solution to a performance issue lies not in the specific technology used but in the overall approach taken. For example:

  • SQL-based systems perform at their best when the relational engine is doing the work - processing rows one at a time is a recipe for poor performance. You'd be surprised at the amount of work that judiciously placed aggregate functions combined with CASE expressions can achieve.
  • Reducing the number of network round-trips that your software makes can result in tremendous performance increases - and the other applications on your network will benefit from the reduced contention too.
  • For very high-performance software such as map routing algorithms, memory itself is often the bottleneck in a modern machine. Simply altering the order in which you traverse an in-memory data structure, or indeed modifying the structures themselves, can yield significant performance benefits.
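
As a minimal sketch of the first point, the example below compares row-by-row processing in application code with a single set-based query that combines SUM with CASE. The orders table, its columns and the sample figures are hypothetical and exist purely for illustration; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("north", "paid", 100.0),
        ("north", "refunded", 40.0),
        ("south", "paid", 250.0),
        ("south", "paid", 50.0),
        ("south", "refunded", 10.0),
    ],
)


def totals_row_by_row(conn):
    """Slow pattern: fetch every row and accumulate in application code."""
    totals = {}
    rows = conn.execute("SELECT region, status, amount FROM orders")
    for region, status, amount in rows:
        paid, refunded = totals.get(region, (0.0, 0.0))
        if status == "paid":
            paid += amount
        elif status == "refunded":
            refunded += amount
        totals[region] = (paid, refunded)
    return totals


def totals_set_based(conn):
    """Set-based pattern: one query, and the engine does the aggregation."""
    query = """
        SELECT region,
               SUM(CASE WHEN status = 'paid' THEN amount ELSE 0 END),
               SUM(CASE WHEN status = 'refunded' THEN amount ELSE 0 END)
        FROM orders
        GROUP BY region
    """
    return {region: (paid, refunded)
            for region, paid, refunded in conn.execute(query)}
```

Both functions return the same totals per region, but the set-based version returns one row per region rather than one row per order, which also cuts the data sent back from the database.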

Case Studies

Statistical Analysis - 2,700 times faster

A client had a relatively new web application that performed large amounts of data analysis over many millions of records. Since performing these calculations was slow, the existing software carried out its number-crunching activities overnight and stored the pre-computed results ready for the following day. This approach had several problems, including:

  • The overnight process would occasionally fail, causing support issues.
  • The process itself was very disk and network intensive, taking a significant amount of resources to complete.
  • Since pre-calculated data from the night before was being used, the figures available were never completely up-to-date.

We considered the overall problem to be solved rather than focusing on individual details - having discovered what was really needed, we completely changed the way the statistics were gathered and calculated. As a result, the following benefits were realised:

  • Since the replacement component was far more efficient, we discovered that the statistics could be calculated in real time - the old system took hours to complete, whereas the replacement provided by Arcamex took only a couple of seconds on average.
  • Since the overnight process was no longer required, it was removed - as a result, the support issues surrounding this were eliminated.
  • The additional caching mechanisms were no longer required either - these were removed as well, reducing the system complexity and lowering the future software maintenance burden.
  • Since the statistics could be recalculated within a second or two, the figures provided were now generated in real time rather than taken from point-in-time snapshots as before - much to the delight of the end users.

Smartphone Application - seven times faster

A smartphone application was making web service calls over a mobile GPRS connection to a back-end server to gather historical data - the results were then displayed by the application for the user to browse at their leisure. The time taken end-to-end to achieve this was around fourteen seconds on average - a very noticeable delay.

Using a combination of multi-threading, efficient data representation and data compression, the end-to-end time taken was reduced to around two seconds. Not only did this improve the user experience a great deal, it also significantly reduced the application's GPRS data consumption, resulting in lower monthly bills for the organisation.
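
As a rough illustration of how efficient data representation and compression shrink a payload, the sketch below packs some readings into fixed-width binary records and compresses them, then compares the size with a plain JSON encoding. The record layout (a 32-bit timestamp and a 32-bit float), the sample readings and the function names are all assumptions made for this example, not the format used in the project.

```python
import json
import struct
import zlib

# Hypothetical historical readings: (unix_timestamp, value) pairs.
readings = [(1_700_000_000 + i * 60, 20.0 + (i % 10) * 0.5) for i in range(500)]


def encode_json(rows):
    """Verbose baseline: a JSON array of objects."""
    objects = [{"timestamp": t, "value": v} for t, v in rows]
    return json.dumps(objects).encode("utf-8")


def encode_compact(rows):
    """Fixed-width records (uint32 timestamp, float32 value), then zlib."""
    raw = b"".join(struct.pack("<If", t, v) for t, v in rows)
    return zlib.compress(raw, 9)


def decode_compact(payload):
    """Reverse of encode_compact: decompress, then unpack 8-byte records."""
    raw = zlib.decompress(payload)
    return [struct.unpack_from("<If", raw, offset)
            for offset in range(0, len(raw), 8)]
```

Comparing len(encode_compact(readings)) with len(encode_json(readings)) shows the saving for this sample data. Note that float32 only round-trips exactly for values it can represent, so a real format would need to choose its field widths with care.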

Business Optimisation Engine - High Performance Computing

We were approached by a client to design, build and deliver an optimisation engine to solve complex scheduling requirements. This required the very best performance possible - the more passes the engine could make, the more optimal the end results would be.

Many techniques were used to achieve maximum performance, including:

  • Writing the engine directly in C rather than taking the easier route of a more "modern" managed platform such as .NET or Java.
  • Using a custom memory allocator to yield the best performance possible.
  • Judicious use of bit manipulation meant that 32 operations could be completed in a single instruction.
  • Intel Xeon processors were specified for the hardware to run the system, and great care was taken to keep the working set of the process's data structures within the L2 processor cache.
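
The bit manipulation point can be sketched as follows, under one plausible reading: 32 yes/no flags are packed into a single 32-bit word, so that one bitwise operation examines all 32 at once. The idea of schedule "slots" and every function name here are hypothetical. Python integers are not fixed-width machine words, so the example masks to 32 bits to mimic what a single instruction would do in C.

```python
MASK32 = 0xFFFFFFFF  # emulate a 32-bit machine word


def pack_slots(busy_slots):
    """Pack a collection of slot numbers (0-31) into one 32-bit word."""
    word = 0
    for slot in busy_slots:
        word |= 1 << slot
    return word & MASK32


def clashes(a, b):
    """A single AND compares all 32 slots of two schedules at once."""
    return a & b


def free_in_both(a, b):
    """An OR followed by a NOT finds the slots free in both schedules."""
    return ~(a | b) & MASK32


def slots_of(word):
    """Unpack a word back into a sorted list of slot numbers."""
    return [i for i in range(32) if (word >> i) & 1]
```

For example, slots_of(clashes(pack_slots([0, 3, 9]), pack_slots([3, 9, 31]))) gives the slots that are busy in both schedules without looping over them individually.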

Although clustering and GPU-based computation were also considered, these were deemed unnecessary as the package met its performance targets running on simple commodity hardware. The project was a complete success and still delivers on its original promise many years later, saving its users thousands of pounds each year.


Whatever performance limitations your current system is reaching, whether it's network bandwidth or latency, CPU time, or a flaw in the overall design itself, consider using Arcamex to take your application to the next level of performance.

See Also: Bespoke Software Development