Practical CPU time performance tuning for security software: Part 2
In a previous blog, we discussed how to monitor, troubleshoot, and fix high %CPU issues. We also revealed a system API that could have an unexpected impact on CPU consumption. In this episode, we’ll discuss another time-related performance aspect that is unique to security software: application startup time. You don’t need to be a developer to benefit from this article. If you’re curious about how the applications in your environment perform, why they sometimes seem slow, and how to improve that, you might find this blog post helpful.
What is application startup time?
Security software usually sits along the critical path an application takes when it is about to launch or execute. The software makes the decision either to block or allow the launching/execution of the application. The decision-making process takes time and therefore impacts the launching speed of the application. To test application startup time, you might choose a popular application in your environment, such as Microsoft Word, Outlook, PowerPoint, Chrome, Edge, Safari, or Visual Studio, and measure the time between double-clicking the application (or invoking it from the command line) and when the application UI is fully ready for user interaction. If an application is popular and likely to be common in most customers’ environments, then it’s also worth testing the application’s startup time as part of in-house performance testing.
A variety of tools can be used to measure application startup time, ranging from a stopwatch to more sophisticated automated tools. I wrote a simple Windows program to measure how fast my Microsoft Word program launches. I encourage you to find or create your own favorite tool and start measuring.
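For illustration, here is a minimal sketch of one way such a tool could work on Windows. This is not the author’s actual program, and the Word install path is only an assumption for the example; the idea is simply to launch the process and wait until its first UI thread goes idle.

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // The path is an assumption; adjust it to the application you want to measure.
    wchar_t app[] = L"C:\\Program Files\\Microsoft Office\\root\\Office16\\WINWORD.EXE";
    STARTUPINFOW si = { sizeof(si) };
    PROCESS_INFORMATION pi = { 0 };

    ULONGLONG start = GetTickCount64();
    if (!CreateProcessW(app, NULL, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
    {
        printf("CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    // Returns once the process is waiting for user input with no input pending,
    // which is a rough approximation of "the UI is ready."
    WaitForInputIdle(pi.hProcess, 60000);
    ULONGLONG stop = GetTickCount64();

    printf("Startup took approximately %llu ms\n", stop - start);

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    return 0;
}

WaitForInputIdle is only an approximation (an application may keep loading after its message loop starts), so the absolute number matters less than comparing the same measurement with and without the security software running.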
Establish a baseline
First of all, we need to measure the application’s startup time without the security software installed. That result will be used as the baseline to compare against the startup time when the software is installed and actively running. Then, keeping everything else in the environment the same, we’ll install the security software, make sure it’s fully functional, and measure the application startup time again. This result will be compared with the baseline to get an idea of the software’s impact on the application’s startup time.
Set a goal
It shouldn’t be a surprise to find out that it always takes longer for the application to start up with security software running. If you are the developer, you want to optimize the security software so its impact is as small as possible — ideally, zero impact.
However, in the real world, trying to achieve zero impact would make performance tuning a never-ending task. You need a clear goal to know when performance is good enough to stop tuning. For example, you could set an impact threshold of 10%, which would mean that any impact on application startup time of less than 10% of the baseline value is considered acceptable. Alternatively, you could define acceptable impact in terms of human perception — if the baseline value is 10 milliseconds and the startup time with security software running is 20 milliseconds, you can probably consider this an acceptable level of impact, since a difference of 10 milliseconds likely won’t be noticed by users. There may be other criteria better suited to your product.
Isolate the problem
Let’s say that you’ve identified unacceptably long delays caused by a security product. If you are a user of the security product, it’s time to report the issue to the vendor. The developers will then try to reproduce the issue in-house; however, they may not always succeed due to unpredictable environmental factors. In such a case, the developers will need your help narrowing down the problem to one or a few features. To do this, measure application startup time with and without various features enabled until you’re able to pinpoint the specific features that have the biggest impact.
Instrumentation profilers
By isolating the problem, you have helped the developers narrow the problem area down to one or a few features. Now it’s time for the developers to go back to the source code and find the functions responsible. Instead of using a profiling tool as described in our earlier blog, this time we suggest using instrumentation to find out how fast or slow each function runs.
Instrumentation means inserting code into your program to collect information. The instrumentation code is usually small, so it doesn’t add much overhead or skew the results. One good way to limit the impact of the instrumentation code is to apply it to debug builds only — enclosing the instrumentation in #ifdef DEBUG and #endif pairs so it has no impact on your release build.
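As a minimal sketch of that pattern (assuming your debug builds define DEBUG; the macro name is just an example), the instrumentation can be wrapped in a macro that compiles away in release builds:

#include <stdio.h>

#ifdef DEBUG
// Debug builds: print an instrumentation message.
#define INSTRUMENT_LOG(msg) printf("[instrumentation] %s\n", (msg))
#else
// Release builds: the macro expands to nothing, so there is zero runtime cost.
#define INSTRUMENT_LOG(msg) ((void)0)
#endif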
To find out which part of the code introduces a long delay at runtime, we typically add code that collects timing information for particular parts of the program. There are many ways to do this. For example, on Windows, the easiest way could be to call GetTickCount64() at the entry and exit of a function and subtract the two return values to obtain the function’s approximate execution time. See the following example:
int Foo()
{
    ULONGLONG start = GetTickCount64();

    // …… Foo() function logic here

    ULONGLONG stop = GetTickCount64();
    ULONGLONG duration = stop - start;
    printf("Time interval from start to stop is: %llu(ms)", duration);
    return 0;
}
Alternatively, the C++ standard library provides std::chrono::steady_clock, which is portable across operating systems and easy to use. Here’s a minimal example (mirroring Foo() above) of using std::chrono::steady_clock to get a time duration:
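#include <chrono>
#include <cstdio>

int Foo()
{
    auto start = std::chrono::steady_clock::now();

    // …… Foo() function logic here

    auto stop = std::chrono::steady_clock::now();
    auto durationMs =
        std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    printf("Time interval from start to stop is: %lld(ms)\n",
           static_cast<long long>(durationMs));
    return 0;
}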
The checkCSInfoObserveCPU sample also demonstrates a simple example of using clock_gettime to measure how long SecStaticCodeCheckValidityWithErrors takes to run against Xcode’s code object. The instrumented code printed the following result, showing that it took the API 134 seconds to validate Xcode’s code object:
SecStaticCodeCheckValidityWithErrors took 134 seconds to finish.
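The instrumentation in that sample boils down to bracketing the API call with clock_gettime. Here is a minimal, self-contained sketch of the same technique (not the verbatim checkCSInfoObserveCPU source; the Xcode path is an assumption, and the program must be linked against the Security and CoreFoundation frameworks):

#include <CoreFoundation/CoreFoundation.h>
#include <Security/Security.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    CFURLRef url = CFURLCreateWithFileSystemPath(
        kCFAllocatorDefault, CFSTR("/Applications/Xcode.app"),
        kCFURLPOSIXPathStyle, true);

    SecStaticCodeRef code = NULL;
    if (SecStaticCodeCreateWithPath(url, kSecCSDefaultFlags, &code) != errSecSuccess)
    {
        CFRelease(url);
        return 1;
    }

    struct timespec start, stop;
    clock_gettime(CLOCK_MONOTONIC, &start);

    // Validate the code object, including all bundle resources.
    OSStatus status = SecStaticCodeCheckValidityWithErrors(
        code, kSecCSDefaultFlags, NULL, NULL);

    clock_gettime(CLOCK_MONOTONIC, &stop);

    printf("SecStaticCodeCheckValidityWithErrors took %ld seconds to finish (status %d).\n",
           (long)(stop.tv_sec - start.tv_sec), (int)status);

    CFRelease(code);
    CFRelease(url);
    return 0;
}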
Therefore, by using code instrumentation, we have successfully identified the hot spot in our sample, which is SecStaticCodeCheckValidityWithErrors — an Apple security framework API. Again, the purpose of the sample is not to question Apple’s implementation of SecStaticCodeCheckValidity* on the performance side. Instead, it is a proof-of-concept showing that code signing validation against large bundles is CPU-intensive. If such a function is called by security software inside of the critical path during an application’s launch stage, it can result in significant increases in application startup time.
Fix
In our earlier blog, we proposed a fix for the sample program: telling SecStaticCodeCheckValidityWithErrors not to validate the presence and contents of all bundle resources (if any). That indeed made the program run dramatically faster; however, it comes at the price of weakening overall bundle security. Therefore, we should carefully evaluate our needs, make tradeoffs, and choose the optimal fix. Sometimes, in order to fix a performance issue, changes must be made at the design level.
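In code, that earlier fix amounts to passing a flag that skips resource validation. The behavior described matches the kSecCSDoNotValidateResources flag, so a sketch of the call might look like this (assuming code is the SecStaticCodeRef created as in the sketch above):

// Skip validating the presence and contents of bundle resources.
// Much faster, but the bundle's resources are no longer verified to be intact.
OSStatus status = SecStaticCodeCheckValidityWithErrors(
    code,
    kSecCSDefaultFlags | kSecCSDoNotValidateResources,
    NULL,
    NULL);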
Let’s take a step back and look at what we actually need. If SecStaticCodeCheckValidity* was being used to validate a process’s code signing information, then we need to find another way to achieve the same result without calling SecStaticCodeCheckValidity*.
Luckily, we do have another option. The macOS Endpoint Security (ES) framework provides the process’s code signing information along with every event it passes to a third-party ES extension. I wrote an article about the new macOS System Extensions and the ES framework last year. The article mentions that the signing information is available for each ES event (the following example was collected by ProcessMonitor on macOS 10.15.6 when launching Xcode 12.2):
"signing info (reported)" : { "teamID" : "APPLECOMPUTER", "csFlags" : 570452481, "signingID" : "com.apple.dt.Xcode", "platformBinary" : 0, "cdHash" : "6BEB4DCEA2E64B149BB2314EB71652694DA195BE" }
The macOS ES framework puts a process’s code signing information in es_process_t, which comes with every event’s message (es_message_t), and passes that information to third-party ES extensions. The es_process_t structure is defined in Xcode’s SDK header file:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/EndpointSecurity/ESMessage.h
Please note the codesigning_flags, signing_id, and team_id fields in es_process_t, and carefully read the documentation in the header file (the header file includes more detailed explanations of es_process_t than the online documentation). codesigning_flags can be used to check the process’s code signing state, which the kernel has already validated under certain conditions. signing_id and team_id can be used to identify the process without having to call the Security framework APIs. Apple also provides a sample ES extension that demonstrates how to deny or allow an application’s execution based on the signing_id that comes with the event’s message.
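To make that concrete, here is a minimal sketch of an ES client that reads codesigning_flags, signing_id, and team_id straight from the event instead of calling SecStaticCodeCheckValidity*. This is not Apple’s sample; it assumes the com.apple.developer.endpoint-security.client entitlement, running as root, and linking against the EndpointSecurity library.

#include <EndpointSecurity/EndpointSecurity.h>
#include <dispatch/dispatch.h>
#include <stdbool.h>
#include <stdio.h>

#define CS_VALID 0x00000001  // from kern/cs_blobs.h: the kernel considers the signature valid

int main(void)
{
    es_client_t *client = NULL;
    es_new_client_result_t res = es_new_client(&client,
        ^(es_client_t *c, const es_message_t *msg) {
            if (msg->event_type != ES_EVENT_TYPE_AUTH_EXEC)
            {
                return;
            }
            const es_process_t *target = msg->event.exec.target;

            // Code signing info is already attached to the event, so no extra
            // Security framework calls are needed on the critical path.
            bool valid = (target->codesigning_flags & CS_VALID) != 0;
            printf("exec: signing_id=%.*s team_id=%.*s cs_valid=%d\n",
                   (int)target->signing_id.length, target->signing_id.data,
                   (int)target->team_id.length, target->team_id.data,
                   (int)valid);

            // Example policy: allow everything. A real product would decide
            // based on signing_id, team_id, and/or codesigning_flags.
            es_respond_auth_result(c, msg, ES_AUTH_RESULT_ALLOW, false);
        });
    if (res != ES_NEW_CLIENT_RESULT_SUCCESS)
    {
        fprintf(stderr, "es_new_client failed: %d\n", res);
        return 1;
    }

    es_event_type_t events[] = { ES_EVENT_TYPE_AUTH_EXEC };
    if (es_subscribe(client, events, 1) != ES_RETURN_SUCCESS)
    {
        fprintf(stderr, "es_subscribe failed\n");
        es_delete_client(client);
        return 1;
    }

    dispatch_main();  // keep handling events
}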
Summary
A number of things can impact security software performance. It could be just a tiny bug, but it might also be a mistake in the architectural design. Although improving performance is usually a long-term, ongoing effort, the techniques described in this blog series have proven useful. Regardless of which stage of development you’re in and what part of the product you’re working on, it’s always better to take performance into consideration as early as possible, even before development cycles begin, rather than trying to fix performance issues later.
If you’re wondering how things perform in your environment, try installing Elastic Observability. It provides metrics that can help you identify performance issues.