Some notes on “making Snort go fast under Linux”

Updated: 23rd of December 2011
Snort version: 2.9.2.0
DAQ version: 0.6.2

These are general pointers to things you want to dig into when you need to optimize Snort. If you are one of those who believe that Snort can’t go beyond 250 Mbit/s without dropping packets, you should read on. Comments, feedback, new tips and corrections on how to tune a Snort system are very welcome.

–[ Optimize the hardware ]–
This is always a moving target… You need to keep yourself updated on the topic and pay attention when you buy your hardware. If someone in the community is maintaining an updated list of such hardware, drop me a note!

Intel Network Interface Controllers (NICs) are the off-the-shelf choice of network adapters: the 825NNXX PCI Express series (chipset 82575/82576/82580 for 1 Gbit/s, or 82598/82599 for 10 Gbit/s), possibly with bypass capability if you run in-line mode/IPS. If possible, you could also buy an Intel card that works with the TNAPI drivers to get even better (best) performance.

If you want to pay someone who has already done the research (pure speculation on my side), then maybe Endace or Napatech could be a choice. But if you go that far, why not go straight to Sourcefire (the makers of Snort) and buy their appliances? (You will lose all the flexibility that comes with controlling your own OS, though.) That said, the special “super cards” are losing pace against Intel server NICs, so the extra few % of performance is probably not worth the extra bucks! Check out PF_RING first, and if that’s not good enough for you, check out the “super cards”.

(Matt Jonkman states that you can increase your Snort throughput up to 16-fold if you use the Endace platform’s acceleration features (February 2008, probably outdated). Matt is the founder of Emerging Threats, and is also deeply involved in the OISF and the Suricata project.)

At one time (early 2009), a discussion on IRC (Freenode) was summed up as something like this:
“IICH8 southbridge, and 975G north bridge performing at 1066MHz, 8GB of 1333MHz DDR2 ram on a Intel quad core 3.2Ghz 8MB L2 cache processor running at 1333 MHz FSB and Intel 825NNXX PCI Express Gigabit Ethernet Controller.” – for a high end sniffer at that time.

Your whole system would benefit greatly from fast hard drives, as hard drive I/O is generally slow and can stall the rest of the system while it waits.

To sum it up:
Fast CPUs, fast RAM (~4 GB or more), fast buses, fast hard drives (SSD) and a good Intel network adapter.

–[ Optimize the Linux kernel ]–
In the file /etc/sysctl.conf – you should consider tuning options like these (Examples):

# Just sniffing:
net.core.netdev_max_backlog = 10000
net.core.rmem_default = 16777216
net.core.rmem_max = 33554432
net.ipv4.tcp_mem = 194688 259584 389376
net.ipv4.tcp_rmem = 1048576 4194304 33554432
net.ipv4.tcp_no_metrics_save = 1
# IF also in Inline mode:
net.core.wmem_default = 16777216
net.core.wmem_max = 33554432
net.ipv4.tcp_wmem = 1048576 4194304 16777216
# Memory handling – not that important
vm.overcommit_memory=2
vm.overcommit_ratio = 50
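
After editing /etc/sysctl.conf, the settings are normally loaded with `sysctl -p` (as root). To see what the running kernel actually uses, something like the following read-only sketch may help. The function names `sysctl_path` and `show_live_values` are made up for this example; it only relies on the fact that sysctl keys map to files under /proc/sys.

```shell
# Map a sysctl key to its file under /proc/sys (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print "key = live value" for every key named in a sysctl.conf-style file,
# skipping comments and blank lines. Read-only, so no root is needed.
show_live_values() (
    conf=$1
    sed -e 's/#.*//' -e '/^[[:space:]]*$/d' "$conf" |
    while IFS='=' read -r key rest; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        path=$(sysctl_path "$key")
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$key" "$(tr '\t' ' ' < "$path")"
        else
            printf '%s = (not available on this kernel)\n' "$key"
        fi
    done
)

# Usage (after "sysctl -p" as root):
#   show_live_values /etc/sysctl.conf
```

Comparing this output with the values in the file is a quick way to catch a mistyped key, since the kernel silently keeps its old value for anything it could not apply.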

–[ Optimize your network interface card ]–
Change the RX and TX parameters for the interfaces. The following command will display the current settings and the maximum settings you can bump them up to.

# ethtool -g ethX

To change settings, the command is something like this:

# Just sniffing
ethtool -G ethX rx <value>
# and for inline mode, also add
ethtool -G ethX tx <value>

Adding the commands to /etc/rc.d/rc.local so that they are executed automatically at boot would be a good idea.
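
If you don’t want to read the maximums off the screen by hand, a small sketch like this can pull them out of the `ethtool -g` output. The helper names `ring_max` and `ring_command` are hypothetical, and the parsing assumes the usual “Pre-set maximums:” / “Current hardware settings:” layout that ethtool prints. It only prints the command, it does not run it.

```shell
# Read "ethtool -g" output on stdin and print the pre-set maximum for the
# given ring ("RX" or "TX").
ring_max() {
    awk -v ring="$1:" '
        /^Pre-set maximums:/          { in_max = 1; next }
        /^Current hardware settings:/ { in_max = 0 }
        in_max && $1 == ring          { print $2 }
    '
}

# Print (not run) the ethtool command that raises the RX ring to its maximum.
# Pass "inline" as the second argument to raise the TX ring as well.
ring_command() (
    dev=$1
    mode=$2
    out=$(ethtool -g "$dev") || exit 1
    cmd="ethtool -G $dev rx $(printf '%s\n' "$out" | ring_max RX)"
    if [ "$mode" = inline ]; then
        cmd="$cmd tx $(printf '%s\n' "$out" | ring_max TX)"
    fi
    printf '%s\n' "$cmd"
)

# Usage:
#   ring_command eth1          # sniffing only
#   ring_command eth1 inline   # inline mode
```

Printing the command instead of executing it makes it easy to review the values first and then paste the line into rc.local.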

–[ Optimize DAQ ]–
DAQ abstracts Snort from the actual packet capturing. This makes it easier to treat Snort itself and the way packets are captured as two separate things, instead of “mixing” them.

It is said that AFPacket is slightly faster than PCAP. What they have in common is that you can specify how much memory they may use as a buffer to store packets in. For AFPacket, for example:
--daq-var buffer_size_mb=512
This should reduce the total number of dropped packets if configured correctly. So if you have spare RAM on your server, this is one of the places where you should strongly consider spending it.
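
To show where the buffer option ends up on the command line, here is a minimal sketch that prints a complete Snort invocation using the afpacket DAQ. The function name `snort_afpacket_cmd`, the interface and the config path are placeholders for this example; `snort --daq-list` should tell you which DAQ modules your build actually has.

```shell
# Print (not run) a Snort command line that uses the afpacket DAQ with a
# packet buffer of the given size in megabytes.
# $1 = interface, $2 = buffer size in MB, $3 = path to snort.conf
snort_afpacket_cmd() {
    printf 'snort --daq afpacket --daq-var buffer_size_mb=%d -i %s -c %s\n' \
        "$2" "$1" "$3"
}

# Usage:
#   snort_afpacket_cmd eth1 512 /etc/snort/snort.conf
```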

–[ Optimize Snort ]–
Snort’s performance depends on several factors:
1 – YOUR network!
2 – How snort is compiled
3 – Preprocessors enabled
4 – Rules
5 – Snort in general and snort.conf

–[ 1. YOUR network! ]–
Your network is most likely unlike any other network. The number of concurrent connections, the packet rate and the packet sizes flowing through it are most likely unique. Depending on the payload in your packets, Snort will also perform differently. And if your $HOME_NET is a complex list of “networks” and “!networks” rather than one single host, Snort will spend more time figuring out what to do.

–[ 2. How snort is compiled ]–
First, I recommend compiling Snort with only the options that you need. I used to compile Snort in two different ways, one with options such as “--enable-ppm” and “--enable-perfprofiling” and one without. But as my sensors are not suffering enough at the moment, I include both compile options by default, for easy access to preprocessor and rule performance data if I need to.

Also, I have not confirmed this, because it is beyond my budget, but rumor has it that Snort performs up to 30% better if it is compiled with an Intel C compiler (and probably run on pure Intel hardware).

With older versions of Snort, you could compile against Phil Wood’s mmap libpcap and get better performance in the packet capture, giving you fewer dropped packets. A nice writeup/howto is found here. But newer versions of libpcap (1.0.0 or newer) ship with this, so there should be no need to fiddle with Phil Wood’s libpcap version. Also, Snort now uses DAQ to retrieve its packets, so compiling Snort against Phil Wood’s mmap libpcap would not make any sense.

–[ 3. Preprocessors enabled ]–
How many and which preprocessors you have enabled also plays a role in the total performance of your system. So if you can, reduce the number of preprocessors to a minimum. You also need to read the Snort documentation and figure out the best settings you can live with for each preprocessor that takes configuration options. The flow_depth parameter in the http_inspect preprocessor is a good example.
That said, don’t throw out a preprocessor just because you need more performance… They are there for a good reason, and if you disable one without understanding it, you might lose visibility and false negatives might become a big issue.

Here are two settings/views I switch between when profiling preprocessors:

config profile_preprocs: print 20, sort avg_ticks, filename /tmp/preprocs_20-avg_stats.log append
# And
config profile_preprocs: print all, sort total_ticks, filename /tmp/preprocs_All-total_stats.log append

You should now review the *stats.log files and make changes based on your interpretation, and profile again to see if things get better or worse.

If you have memory to spare (after you have added memory to the DAQ buffer), letting Stream5 and Frag3 use more memory might be a good choice (in that order). Snort should output some info from the preprocessors if
For Frag3, look into prealloc_nodes.

–[ 4. Rules ]–
The number of rules also affects the performance of Snort. So tuning your rule set to enable just the ones that you need is essential when aiming for performance.
Also, how a rule performs on your network might differ from how it performs on mine… So you need to profile your set of rules, and tweak or disable them so your system uses fewer “ticks” overall.

Here are two settings/views I switch between when profiling rules:

config profile_rules: print 20, sort avg_ticks, filename /tmp/rules_20-avg_stats.log append
# And
config profile_rules: print all, sort total_ticks, filename /tmp/rules_All-total_stats.log append

You will get a fairly good view of which rules would benefit from tuning or disabling.

–[ 5. snort in general and snort.conf ]–
* search-method
You should look into which search-method Snort is using.
By default, Snort uses ac-bnfa-nq. This is probably the best overall search method. If you have very little memory and need Snort to use as little as possible, you should look at lowmem-nq and at trimming your ruleset to a minimum.

If you have the RAM for it, the best performance supposedly comes from ac or ac-q. Personally I use ac-split at the moment, as it seems to give a good trade-off between performance and memory usage.
To enable ac-split, add something like:

config detection: search-method ac-split, max-pattern-len 20, search-optimize

* Latency-Based Packet Handling
This option can help you control the latency of packets in Snort (if you run Snort inline, this is important for real-time applications such as video and voice services).
Also, if you have a problem with dropped packets (non-inline), say over 1% on average, I would recommend enabling Latency-Based Packet Handling. You should run some tests in your environment to find a value that works for you, but the general situation is like this:
If your Snort “Packet Performance Summary” tells you that your “avg pkt time is 10 usecs”, then Snort can inspect about 1000 packets in 10000 usecs. If one packet for some reason takes 10000 usecs to get through Snort, you may have dropped/sacrificed 1000 other packets in that time frame, just to inspect this one packet. If you configure max-pkt-time to be 1000, Snort will stop inspecting packets that take more than 1000 usecs, which in this basic example leaves you with 100 dropped packets instead of 1000. You choose! (The example is not technically correct, as a packet can take over 10000 usecs without Snort dropping any packets at all (imagine if there is only one packet going through Snort that day…), but in my tests this is more or less the real-world outcome of enabling Latency-Based Packet Handling.) Beware of the possibility of false negatives on the fastpathed packets, but if you’re dropping packets anyway you will still have false negatives, and this might even lessen the total number of false negatives!
Example:

config ppm: max-pkt-time 10000, fastpath-expensive-packets, pkt-log

Playing with the above configuration option, and monitoring your log file (syslog) and the dropped packets in the stats file, will give you an idea of how your tuning is going and how Snort behaves. From the Snort doc regarding max-pkt-time: “reasonable starting defaults: 100/250/1000 for 1G/100M/5M nets”. Also, if you don’t specify “fastpath-expensive-packets”, Snort won’t interfere with your current setup; it will just print the stats to syslog etc.
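
The back-of-the-envelope arithmetic behind the max-pkt-time reasoning can be sketched like this. It uses the same simplification as the text (every microsecond spent on a slow packet is a microsecond not spent on average packets), and the function names are made up for the example.

```shell
# Packets that could have been inspected while one slow packet held Snort up.
# $1 = average packet time (usecs), $2 = time spent on the slow packet (usecs)
sacrificed_packets() {
    echo $(( $2 / $1 ))
}

# With fastpathing enabled, inspection of a slow packet is cut off at
# max-pkt-time, so the time lost on it is capped at that value.
# $1 = average packet time, $2 = slow packet time, $3 = max-pkt-time (usecs)
sacrificed_with_ppm() {
    if [ "$2" -gt "$3" ]; then
        sacrificed_packets "$1" "$3"
    else
        sacrificed_packets "$1" "$2"
    fi
}

# Usage, with the numbers from the text:
#   sacrificed_packets 10 10000         # no limit configured
#   sacrificed_with_ppm 10 10000 1000   # max-pkt-time 1000
```

Plugging in your own “avg pkt time” from the Packet Performance Summary gives a rough feel for how aggressive a given max-pkt-time is on your network.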

* Latency-Based Rule Handling
This option can also help control latency in Snort (inline), by suspending expensive rules for a period of time or by disabling them completely for the rest of the time Snort is running. Again, you should play with the settings and get a feeling for how this works in your setup (again, watch your log files etc.). In non-inline mode, this can also help Snort avoid dropping packets, much in the same way that Latency-Based Packet Handling does. The threshold is just a simple counter: if it is set to 5, and a rule is considered expensive 4 times in your network right now, then being expensive just one more time some time next week will make the counter reach 5, and the rule will be suspended according to your settings. This can free up “ticks” and let Snort drop fewer packets in total. It can introduce false negatives for the suspended rule, but may still lower the overall number of false negatives!
Example:

config ppm: max-rule-time 4096, threshold 5, suspend-expensive-rules, suspend-timeout 10, rule-log log

From the snort doc regarding max-rule-time: “reasonable starting defaults: 100/250/1000 for 1G/100M/5M nets”

–[ PF_RING ]–
If you really need the speed, PF_RING + TNAPI seems to be the way to go. I have not yet had the need to try this out myself, but different sources speak well of this solution. PF_RING alone will get you to the next level, and if you balance the traffic over several Snort instances across several CPU cores, you’ll get to a whole new level of high-speed snorting!

–[ Additional notes ]–
Obviously, if you need to go as fast as possible, your system should not be used for lots of other stuff. So keep your running processes/services to a minimum.

Snort is also, as far as I can tell, single-threaded when it comes to packet inspection. There is a PDF here from Intel, explaining how Sensory Networks Software Acceleration Solutions boost the performance of Snort and similar tools by making them multi-core enabled/aware.

That said, Snort benefits from sticking to one CPU, so using schedtool properly might help Snort perform better overall. If you are running multiple instances of Snort on one multi-CPU server, you should use schedtool to pin each Snort process to its own physical CPU. Example:

$ man schedtool # and read about “AFFINITY MASK” and understand the difference between cpu-cores and hyper-threading etc.
$ schedtool <pid of snort> # Displays current settings
$ schedtool -a 0x01 <pid of snort> # Pin the snort process to one CPU (The first)
$ schedtool -M 2 -p 10 <pid of snort> # Change the policy to SCHED_RR and set the static priority to 10 (range 1-99, a higher number means higher priority)
$ schedtool <pid of snort> # to verify your changes

That said, I’m confident that modern kernels handle this stuff well, and I have not been able to measure and document any real benefit from this. And I don’t have time to set up an environment just for testing it.
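
If you do want to experiment with pinning, the affinity mask passed to `schedtool -a` is just a bitmask with one bit per CPU (bit 0 is the first CPU). A minimal sketch for building it, where the function name `affinity_mask` is made up for this example:

```shell
# Build a hex affinity mask with one bit set for each listed CPU number
# (0-based), e.g. CPUs 0 and 2 give binary 101.
affinity_mask() (
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '0x%x\n' "$mask"
)

# Usage: pin one Snort process to the third CPU (CPU number 2)
#   schedtool -a "$(affinity_mask 2)" <pid of snort>
```

Keep in mind that with hyper-threading two CPU numbers can be siblings on the same physical core, so check the CPU topology before deciding which numbers to hand out to which Snort instance.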

Whenever you optimize a system, you should have some sort of measuring system. I use Munin, and I wrote some basic Munin plugins for Snort which monitor the most important stuff.

And as always,
“Measure, don’t speculate” — Unknown
“Premature optimization is the root of all evil” — Tony Hoare

4 thoughts on “Some notes on “making Snort go fast under Linux”

  1. This article is awesome! I think the author stands on the perspective of holism to figuring out what part(hardware, driver, userspace config) may be affecting Snort’s performance. Thanks anyway!

  2. LV says:

    Metaflows put together a pf_ring snort cluster of 8 on just an i7 950 and the results suggest hyper-threading was beneficial. I’m trying to find other data to back that up, found this page in my quest…

  3. Nicholas says:

    If you are using an Intel Xeon-based system or newer, AND your motherboard supports it (and I have yet to see a Xeon-based system board not support it,) using Intel IOAT/DMA helps with disk IO issues when using snort in an environment where it is using a database as a backend. Other examples would be Snorby or SIEMs such as OSSIM.

    IOAT/DMA needs to be turned on in the BIOS of the system, and kernel module ioatdma needs to be loaded when the linux system is running. Without any fancy manipulation, IOATDMA will speed up CPU to disk operations, and I think CPU to RAM operations. With command arguments, IOATDMA can enhance capture performance because it creates multiple data paths between the NIC and the CPU. In the kernel and to snort it looks like multiple RX and TX rings.

    Last time I tried it, doing IOATDMA for the NIC is not recommended because it confuses snort/DAQ.

    IOAT/DMA is an Intel technology, and is not available for AMD platforms.
