So you have a network forensics solution that can capture and store terabytes of packets? Presumably, this represents days of network traffic. Tell me, how long will it take to find the needle in the haystack? No, don’t rely on the product’s marketing literature; those figures are usually contrived to represent a best-case scenario. Instead, give it a real-world test and see how the numbers work for your situation.
For example, assuming you have a few days of recorded packets from a 10 Gig network, how long will it take to find the number of times an obscure IP address traversed your network in the last 72 hours? If your solution takes as long (or longer) to find the packets as it took to capture them, is it really a solution? More importantly, if you are an incident responder, would you find that response time acceptable?
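To see why this query can take so long, consider what a brute-force search has to do: touch every captured packet, one after another. The sketch below (a minimal illustration in plain Python, assuming a classic little-endian pcap file with an Ethernet link layer; `count_ip_hits` is a hypothetical name, not any vendor's API) counts how many packets carry a given IPv4 address as source or destination:

```python
import io
import socket
import struct

def count_ip_hits(pcap_bytes, target_ip):
    """Linearly scan a classic little-endian pcap (Ethernet link type)
    and count packets whose IPv4 source or destination is target_ip."""
    target = socket.inet_aton(target_ip)
    buf = io.BytesIO(pcap_bytes)
    # 24-byte global header; the magic number identifies the format.
    magic = struct.unpack("<I", buf.read(24)[:4])[0]
    if magic != 0xA1B2C3D4:
        raise ValueError("expected a little-endian classic pcap")
    hits = 0
    while True:
        rec_hdr = buf.read(16)
        if len(rec_hdr) < 16:
            break  # end of capture
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack("<IIII", rec_hdr)
        frame = buf.read(incl_len)
        # Ethernet header is 14 bytes; EtherType 0x0800 marks IPv4.
        if len(frame) >= 34 and frame[12:14] == b"\x08\x00":
            src = frame[26:30]  # IPv4 source: 14-byte Ethernet + offset 12
            dst = frame[30:34]  # IPv4 destination: offset 16
            if src == target or dst == target:
                hits += 1
    return hits
```

Note that the work is strictly proportional to the amount of data captured: terabytes of packets mean terabytes read from disk per query. A solution that answers in seconds instead of hours must be doing something smarter, such as indexing flows at capture time so a query only touches the packets that match.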
One of my favorite explanations about Big Data goes something like this:
Big Data is about strategy. It is about being able to process a colossal amount of data within a desirable time frame.
I couldn’t agree more — especially when it comes to capturing terabytes, even petabytes, of network traffic. Packet capture is the epitome of Big Data. By definition it is high velocity, high volume, and high variety. Therefore, finding the “needle in the haystack” is a colossal challenge that requires a solid strategy.
We recently had a large enterprise customer place one of our packet capture probes alongside their legacy solution. They wanted to compare the relative performance of the two “Big Data” solutions by applying the use case mentioned above. The incumbent took over 16 hours to complete the query. In contrast, our solution took less than a minute to complete the same query — while capturing at 10Gbps.
Yep, not all Big Data solutions are created equal. It really is about strategy. How good is your vendor’s Big Data strategy? A quick way to find out is to apply a “needle in the haystack” test.