System resource utilization

Introduction

This article explains how system resources are allocated and used during scanning.

The focus here is the Node Agent host machine.

 

Performance generally

Card/Data/Enterprise Recon is built on a high-performance pattern-matching engine specifically designed to identify PCI & PII data.
This foundation lets our products operate very efficiently, scanning random data streams at rates in excess of 1,000 megabytes per second.

In practice, however, scanning speed will be significantly slower because of the performance limitations of an average workstation or server.

The primary factor determining scanning speed is disk access: most disk arrays provide read throughput of around 20 megabytes per second.
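As a rough illustration of why disk access dominates, the sketch below treats scan throughput as the slower of the disk read rate and the engine's raw matching rate, then estimates how long a given volume of data would take. This is a simplified model, not the product's own formula: the 20MB/sec and 1,000MB/sec defaults come from the figures in this article, the function name `estimate_scan` is hypothetical, and the extra time spent decoding complex file types is ignored.

```python
def estimate_scan(data_mb: float, disk_mb_s: float = 20.0,
                  engine_mb_s: float = 1000.0) -> tuple[float, float]:
    """Return (effective MB/sec, estimated seconds) for scanning data_mb.

    Simplified bottleneck model (an assumption): throughput is capped by
    whichever is slower, disk reads or pattern matching. Decoding overhead
    for complex file types is not included.
    """
    effective_mb_s = min(disk_mb_s, engine_mb_s)
    return effective_mb_s, data_mb / effective_mb_s


if __name__ == "__main__":
    rate, seconds = estimate_scan(100_000)  # roughly 100GB of data
    print(f"{rate:.0f}MB/sec, about {seconds / 3600:.1f} hours")
```

With the defaults, 100,000MB read at 20MB/sec works out to 5,000 seconds, so the engine's 1,000MB/sec capacity never comes into play.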

 

Resources used

CPU

The CPU is used throughout a scan to decode the contents of each file, so that the information stored in it can be examined for genuine PCI/PII data.

The scanning engine contained in Card Recon and Enterprise Recon is designed to use only a single CPU core, so that the remaining cores stay available to other applications.
When other applications request CPU time, the scanning engine releases CPU cycles to them until more idle CPU becomes available.
By default, the scanning engine runs in 'Low Priority' mode, so any other application that requires CPU resources is given priority.
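The two ideas above, confining work to one core and running it at low priority, can be sketched with standard operating-system process controls. This is not how the scanning engine itself is implemented; it is a hypothetical illustration that assumes a Linux host, where `os.nice` lowers a process's scheduling priority and `os.sched_setaffinity` restricts it to a chosen set of cores. `run_scan_process` and the probe command are stand-ins for a real scanner.

```python
import os
import subprocess
import sys


def _one_core_low_priority() -> None:
    """Runs in the child just before exec (POSIX only)."""
    # Raise niceness to the maximum (19 on Linux), i.e. lowest priority,
    # so other processes are scheduled first whenever they need the CPU.
    os.nice(19)
    # Pin the child to a single core (Linux only), leaving the rest free.
    if hasattr(os, "sched_setaffinity"):
        one_core = min(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {one_core})


def run_scan_process(args: list[str]) -> str:
    """Launch a (hypothetical) scan command on one core at low priority."""
    result = subprocess.run(
        args,
        preexec_fn=_one_core_low_priority,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in "scanner": a child Python process reporting its own settings.
    probe = "import os; print(os.nice(0), len(os.sched_getaffinity(0)))"
    print(run_scan_process([sys.executable, "-c", probe]))
```

Niceness and CPU affinity are inherited across `exec`, which is why setting them in `preexec_fn` is enough to constrain the launched program.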

Our underlying scanning engine has been benchmarked at over 1,000MB/sec.
At this speed, it required 100% utilization of a single 2.8GHz CPU core.

Because most typical storage can only provide read access at around 20MB/sec, sustained 100% CPU utilization for extended periods is unlikely, even allowing for the additional cost of decoding complex file types.
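The arithmetic behind this claim is simple: if the engine saturates one core at 1,000MB/sec, feeding it at disk speed needs only a proportional fraction of that core. The sketch below assumes CPU cost scales linearly with throughput, and introduces a hypothetical `decode_factor` multiplier to stand in for the extra work on complex file types; neither assumption comes from a published benchmark.

```python
def required_core_fraction(disk_mb_s: float, engine_mb_s: float = 1000.0,
                           decode_factor: float = 1.0) -> float:
    """Estimate the fraction of one core needed to keep up with the disk.

    Assumes CPU cost grows linearly with throughput; decode_factor is a
    hypothetical multiplier for extra decoding work (1.0 = plain matching).
    Capped at 1.0 because the engine uses only a single core.
    """
    return min(1.0, disk_mb_s / engine_mb_s * decode_factor)


if __name__ == "__main__":
    print(f"{required_core_fraction(20):.0%} of one core")  # plain matching
    print(f"{required_core_fraction(20, decode_factor=10):.0%} of one core")
```

Under this model, plain matching at 20MB/sec needs about 2% of one core, and even a hypothetical tenfold decoding overhead stays around 20%.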

Memory

Memory is used by the scanning engine throughout a scan to temporarily store data being read from disk and index complex file types when required.

Our products have been designed to read files of any size without excessive memory usage.
Large files are read incrementally in small chunks to minimize the amount of memory consumed.
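A minimal sketch of bounded-memory reading, using a simple stand-in for the real pattern matcher: it reads a stream in fixed-size chunks and counts runs of exactly 16 consecutive ASCII digits (a crude card-number-like pattern with no Luhn check). The chunk size and function names are hypothetical. The point it illustrates is that only the scanner state (here, the length of the current digit run) is carried between chunks, so a number split across a chunk boundary is still found without ever holding the whole file in memory.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size: at most this much is buffered


def read_in_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks so memory use stays bounded by chunk_size."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


def count_16_digit_runs(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> int:
    """Count runs of exactly 16 consecutive ASCII digits in a byte stream.

    The current run length is carried across chunk boundaries, so a number
    split between two chunks is still counted exactly once.
    """
    run = 0
    found = 0
    for chunk in read_in_chunks(stream, chunk_size):
        for byte in chunk:            # iterating bytes yields ints
            if 0x30 <= byte <= 0x39:  # ASCII '0'..'9'
                run += 1
            else:
                if run == 16:
                    found += 1
                run = 0
    if run == 16:  # a qualifying run that ends exactly at end of stream
        found += 1
    return found


if __name__ == "__main__":
    sample = b"card " + b"4" + b"1" * 15 + b", ref " + b"1" * 17
    # A tiny chunk size forces the numbers to straddle chunk boundaries.
    print(count_16_digit_runs(io.BytesIO(sample), chunk_size=5))
```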

The amount of memory required for a given scan typically ranges from 50MB to 100MB, and may rise to higher levels (200MB+) in rare circumstances.

Disk IO

Disk IO is the most important factor in determining the speed of a scan.
It is the rate at which the scanning engine can read data from disk while attempting to identify stored cardholder data.

Whilst a scan is being performed, every file on the target file system is opened, decoded, scanned and closed in turn.
Disk IO will rise and fall throughout a scan depending on the complexity and size of each file being scanned.

For example, on a conventional desktop system a live disk IO monitor may show read speeds fluctuating between 1MB/sec and 20MB/sec, averaging 5-8MB/sec. These read spikes reflect the types of data being read.
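The per-file cycle described above, and the fluctuating read rates it produces, can be illustrated with a short directory walk that times each file separately. This is a hypothetical sketch, not the product's scanner: `scan_tree` and the `scan_chunk` callback are invented names, format-specific decoding is left out, and MB here means 1,000,000 bytes.

```python
import os
import tempfile
import time
from typing import Callable


def scan_tree(root: str, scan_chunk: Callable[[bytes], None],
              chunk_size: int = 64 * 1024) -> list[tuple[str, int, float]]:
    """Open, read in chunks, scan and close every file under root.

    Returns (path, bytes_read, MB/sec) per file, with MB = 1,000,000 bytes.
    scan_chunk is a hypothetical callback standing in for the matcher;
    format-specific decoding is not modelled.
    """
    report = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            total = 0
            with open(path, "rb") as fh:                  # open
                while chunk := fh.read(chunk_size):       # read incrementally
                    total += len(chunk)
                    scan_chunk(chunk)                     # scan
            elapsed = time.perf_counter() - start         # file is now closed
            rate = total / 1_000_000 / elapsed if elapsed > 0 else float("inf")
            report.append((path, total, rate))
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for size in (10, 1_000_000):
            with open(os.path.join(root, f"file_{size}.bin"), "wb") as fh:
                fh.write(os.urandom(size))
        for path, nbytes, rate in scan_tree(root, scan_chunk=lambda c: None):
            print(f"{os.path.basename(path)}: {nbytes} bytes at {rate:.1f}MB/sec")
```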

 

What about proxy "Agentless" scanning?

In this scenario, a proxy Node Agent installed on one desktop is used to scan other desktops that do not have the Node Agent installed.

How Agentless scanning works

  1. The Proxy Node Agent transmits the scanning engine to the scan Target
  2. The scan commences
  3. Results and status updates are continuously transmitted back to the Proxy Node Agent host, which forwards them to the Master Server
  4. The scan completes; the scanning engine waits for all results to finish transmitting to the Proxy Node Agent host, then cleans itself up (the scanning engine self-destructs)
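The ordering in these steps, in particular that the engine only removes itself once every result has been handed off, can be modelled with two in-process queues. This is a hypothetical sketch: threads stand in for the Target, the Proxy Node Agent and the Master Server, queues stand in for the network links, and none of the names reflect the product's actual protocol.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the result stream


def target_scan(findings, to_proxy, log):
    """Runs on the scan Target (hypothetical model of steps 1, 2 and 4)."""
    log.append("engine deployed")      # step 1: engine arrives from the proxy
    for finding in findings:           # steps 2-3: stream results as found
        to_proxy.put(finding)
    to_proxy.put(DONE)
    to_proxy.join()                    # step 4: block until the proxy has handled every item
    log.append("engine removed")       # only then does the engine clean up


def proxy_relay(to_proxy, to_master):
    """Runs on the Proxy Node Agent host: forward results to the Master."""
    while True:
        item = to_proxy.get()
        if item is not DONE:
            to_master.put(item)
        to_proxy.task_done()           # mark this item as delivered
        if item is DONE:
            return


def run_agentless_scan(findings):
    to_proxy, to_master, log = queue.Queue(), queue.Queue(), []
    relay = threading.Thread(target=proxy_relay, args=(to_proxy, to_master))
    relay.start()
    target_scan(findings, to_proxy, log)
    relay.join()
    # No producers remain, so qsize() is exact here.
    received = [to_master.get() for _ in range(to_master.qsize())]
    return received, log


if __name__ == "__main__":
    print(run_agentless_scan(["match in a.txt", "match in b.csv"]))
```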

The scanning engine on the scan Target uses more resources (CPU/RAM/disk IO) than the Proxy Node Agent does.
However, significant network bandwidth is also needed to stream data from the Target to the Node Agent and on to the Master Server.
Network requirements scale linearly with the number of concurrent scans being proxied through the Node Agent.
Multiple scans directed at the same Target are run sequentially, but the Proxy Node Agent runs scans for different Targets in parallel.
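The scheduling rule above (sequential per Target, parallel across Targets) and the linear bandwidth scaling can be sketched as follows. `schedule_scans`, `run_scan`, `estimated_bandwidth` and the per-scan bandwidth figure are hypothetical; the sketch assumes each concurrently running scan streams at roughly the same rate, so bandwidth grows with the number of distinct Targets being scanned at once.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def schedule_scans(scans, run_scan):
    """scans: list of (target, scan_name) pairs.

    Scans for the same Target run one after another, in submission order;
    different Targets run concurrently, one worker thread per Target.
    Returns {target: [result, ...]}.
    """
    by_target = defaultdict(list)
    for target, name in scans:
        by_target[target].append(name)

    def run_target_queue(target):
        # Sequential: each scan on this Target starts after the previous ends.
        return target, [run_scan(target, name) for name in by_target[target]]

    with ThreadPoolExecutor(max_workers=max(1, len(by_target))) as pool:
        return dict(pool.map(run_target_queue, list(by_target)))


def estimated_bandwidth(scans, per_scan_mbit):
    """Hypothetical linear model: one concurrent stream per distinct Target."""
    return per_scan_mbit * len({target for target, _ in scans})


if __name__ == "__main__":
    jobs = [("PC-01", "full"), ("PC-02", "full"), ("PC-01", "rescan")]
    print(schedule_scans(jobs, lambda t, n: f"{n} scan of {t} done"))
    print(estimated_bandwidth(jobs, per_scan_mbit=2.0), "Mbit/s (illustrative)")
```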

 

Content types

Content (data) types are also a key factor in estimating how long a scan is likely to take.

For simplicity, content types can be grouped into the following categories:

Simple content types (Low CPU/memory required)

  • MS Office documents
  • PDF files (where the content consists mostly of plain text)
  • TXT, RTF, CSV, XML, HTML files
  • TAR and other uncompressed archive formats
  • File formats that do not store data using lookup tables or complex indexes, and do not require intensive mathematical calculation to extract the raw data they contain

Complex content types (Higher CPU/memory required)

  • ZIP, GZ, RAR, and other compressed archive formats
  • Databases
  • Email storage

 

All information in this article is accurate and true as of the last edited date.
