In the world of log parsing, strange names are used for products (Splunk, punch) and for operators (grok). At least dissect is more explicit: it lets you efficiently cut a string into its interesting sub-parts, which is the basic task you perform to parse and normalise your data.

Elastic recently introduced a new dissect operator. Check out this blog, which explains it all. A similar dissect punch operator is now supported as well. Here is how it looks when used in a punchlet:

dissect("%{clientip} %{ident} %{agent} [%{timestamp} %{+timestamp}]").on([logs][log]).into([logs][result])

It benefits from the compact and powerful punch syntax, including the method chaining style. In a single operation, it dissects your input data and stores the sub-matches neatly (here under [logs][result]). Refer to the punch documentation for details.
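To make the idea concrete, here is a minimal Python sketch of a split-based dissect. It is not the punch or Elastic implementation: the function names, the sample log line, and the choice to join appended values (`%{+timestamp}`) with a single space are assumptions made for illustration only.

```python
import re

# Used only to read the dissect pattern itself; the log line is cut with str.find.
_FIELD = re.compile(r"%\{(\+?)(\w+)\}")


def compile_dissect(pattern):
    """Turn a pattern into a literal prefix and a list of (append, name, delimiter) steps."""
    matches = list(_FIELD.finditer(pattern))
    prefix = pattern[:matches[0].start()] if matches else pattern
    steps = []
    for i, m in enumerate(matches):
        # The delimiter is the literal text between this field and the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(pattern)
        steps.append((m.group(1) == "+", m.group(2), pattern[m.end():end]))
    return prefix, steps


def dissect(pattern, text):
    """Return a dict of fields, or None if a delimiter is not found in text."""
    prefix, steps = compile_dissect(pattern)
    if not text.startswith(prefix):
        return None
    pos = len(prefix)
    result = {}
    for append, name, delimiter in steps:
        if delimiter:
            end = text.find(delimiter, pos)  # single forward scan, no backtracking
            if end < 0:
                return None
        else:
            end = len(text)  # simplification: a field with no delimiter takes the rest
        value = text[pos:end]
        if append and name in result:
            result[name] += " " + value  # assumption: appended parts are space-joined
        else:
            result[name] = value
        pos = end + len(delimiter)
    return result


PATTERN = "%{clientip} %{ident} %{agent} [%{timestamp} %{+timestamp}]"
LINE = "192.168.0.1 - Mozilla [10/Oct/2017:13:55:36 +0200]"  # hypothetical log line
print(dissect(PATTERN, LINE))
```

Each field is found with one forward search for the next delimiter, so the cost grows linearly with the length of the line, and a line that does not fit the pattern is rejected as soon as a delimiter is missing.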

The rationale for introducing dissect is twofold: performance and code clarity. The grok operator works with regexes, which makes it inherently less efficient than a split-based operator. In addition, regexes can turn into a nightmare when they fail to match your data: a backtracking regex engine may try a very large number of alternatives before giving up. When such inputs are sent deliberately, this is known as regular expression denial of service (ReDoS). We experienced this behaviour in production when receiving erroneous data, which slowed some punchlets down from several thousand logs per second to several hundred.
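The cost of a failed regex match can be illustrated with a small Python experiment. The pattern below is a textbook example with nested quantifiers, not one taken from a real grok parser, and the input sizes are arbitrary; Python's `re` module is used here only because it is a backtracking engine.

```python
import re
import time

# Illustrative pattern with nested quantifiers; not taken from a real parser.
EVIL = re.compile(r"^(a+)+$")


def time_failed_match(n):
    """Return the seconds spent failing to match n 'a' characters followed by 'b'."""
    text = "a" * n + "b"  # the trailing 'b' guarantees that the match fails
    start = time.perf_counter()
    matched = EVIL.match(text)
    elapsed = time.perf_counter() - start
    assert matched is None
    return elapsed


# Each extra character roughly doubles the work before the engine gives up.
for n in (10, 14, 18):
    print(n, f"{time_failed_match(n):.6f}s")
```

A split-based search for a delimiter on the same input fails after a single pass over the string, which is why malformed lines do not hurt a dissect-style operator in the same way.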

Before we had the dissect operator at hand, we (the punchers in charge of writing log parsers) took care of this by not overusing grok. We even designed some specific punch operators for common use cases such as parsing syslog headers.

Now we simply dissect rather than grok. The performance gain is easy to see: use the punchplatform log injector to stress your punchlet and measure the benefits for yourself.


