Leveraging the Elastic Common Schema

Introduction

The Elastic Common Schema (ECS) is a normalized format recently proposed by the Elastic community. Although still in beta, it is already usable, concrete and, more importantly, promising. The idea behind ECS is simple: rely on a common specification to structure the data indexed in Elasticsearch. Such data normalization makes it simple and efficient to process data coming from various sources (firewalls, systems, servers, applications, sensors, and so on).
What are the key strengths of ECS?

  • An identical naming convention for all Elasticsearch-based projects
  • The entire Elastic suite is adopting this nomenclature (Beats, Logstash, APM, …)
  • The proposed field naming is both simple and clear
  • It makes it easier to design common Kibana dashboards acting on different sources of data
  • It simplifies the user experience when navigating from one domain to another: cybersecurity, network, application, cloud resources, etc.
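To illustrate the naming convention, here is a small, hypothetical ECS event (the field values are made up for the example; they are not taken from a real parser). ECS dotted field names simply map to nested JSON objects in Elasticsearch, so the same document can be viewed either way:

```python
import json

# A minimal, hypothetical event using ECS field names.
# Related fields are grouped under common prefixes: ecs.*, event.*, source.*, http.*
ecs_event = {
    "ecs": {"version": "1.0.0-beta2"},
    "event": {"action": "Not Found", "type": "web"},
    "source": {"ip": "189.134.68.95", "user": {"name": "alice"}},
    "http": {"response": {"status_code": 404}},
}

def flatten(doc, prefix=""):
    """Return the dotted-field view of a nested ECS document."""
    out = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            out.update(flatten(value, name + "."))
        else:
            out[name] = value
    return out

print(json.dumps(flatten(ecs_event), indent=2))
```

Flattening yields the dotted names used throughout this post, such as `source.user.name` and `http.response.status_code`.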
To learn more, check out Elastic's excellent webinar on ECS. The source code is directly available on GitHub.



ECS and the Punch

We did not wait for the Elastic team to come up with a standard taxonomy to tackle this kind of problem. Several nomenclatures for naming log data fields already exist. We based ours on Open XDAS, a standard model for reasoning about events related to network components and security data analysis. Because the Punch has been used in production for several years now, we have enriched and adapted this format to our additional functional needs.


What, then, is the interest for the Punch in moving from Open XDAS to ECS? Simply put, it is strategic. The Punch is above all an Elasticsearch-centric solution, and ECS precisely aims at unifying the various data source formats so that it becomes simple and efficient to process that data as one well-normalized dataset. In turn, such a common schema will allow the emergence of a marketplace providing Elasticsearch users with processors, parsers and dashboards.


In addition, some of the Beats already rely on that schema, and the upcoming version 7 of the Elastic Stack will use it for the majority of its components. In short: we have been waiting for such a standard for a long time.



Punch ECS Integration

The Punch is particularly modular by design. To let our users easily switch to ECS, we simply provide a new punchlet. A punchlet is a micro-processing module that users can deploy anywhere in their pipelines. This new punchlet is named "ecs-convertor.punch" and is now available on the Thales inner source Punch marketplace, and of course to our external customers.


This is yet another example where the Punch pipeline modularity excels. With a single punchlet, our 70 standard log parsers can now generate ECS-conforming data through a mere configuration change. You switch from the original pipeline:

[Figure: the original pipeline configuration]

to the same pipeline extended with the ECS punchlet:

[Figure: the pipeline with the ecs-convertor punchlet, depicted in red, appended]

It cannot be simpler.
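The idea can be sketched as follows. This is a plain Python analogy, not actual Punch configuration syntax, and the stage functions and field values are hypothetical: a pipeline is a chain of small processing functions, and ECS support amounts to appending one extra stage.

```python
# Sketch of pipeline modularity: each stage is a function taking and
# returning a document; adding ECS support means appending one stage.

def parse_apache(doc):
    # Hypothetical parser stage: extracts fields into the Punch taxonomy.
    doc["web.request.rc"] = 404
    doc["init.usr.name"] = "alice"
    return doc

def ecs_convertor(doc):
    # Hypothetical conversion stage mirroring the ecs-convertor punchlet:
    # renames Punch fields to their ECS equivalents, leaves the rest alone.
    renames = {
        "web.request.rc": "http.response.status_code",
        "init.usr.name": "source.user.name",
    }
    return {renames.get(key, key): value for key, value in doc.items()}

def run(pipeline, doc):
    for stage in pipeline:
        doc = stage(doc)
    return doc

legacy = run([parse_apache], {})                 # original pipeline
ecs = run([parse_apache, ecs_convertor], {})     # same pipeline + one stage
```

The parser itself is untouched; only the pipeline composition changes.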


ECS in Action

Let us see ECS in action. Once ECS data has been ingested into Elasticsearch, here is a standard and simple dashboard (available on the Punch marketplace). It illustrates how log data coming from different sources can all be visualised on a common dashboard. Note that this is ready to be demonstrated on a Punch standalone distribution.

[Figure: a common Kibana dashboard built on ECS data from multiple sources]

Let us focus on the details to better understand what ECS is about. Consider an Apache HTTP access log. Here is a comparison of the final document obtained before and after the conversion to ECS format. The original Apache log appears, truncated, in the event.original row of the table below.
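For reference, extracting such fields from an Apache "combined" access log line can be sketched as follows. This is a simplified stand-in for the real apache_httpd parser, with the log line reconstructed (without its syslog header) from the field values shown below:

```python
import re

# Simplified pattern for an Apache "combined" access log line.
# The real apache_httpd parser handles many more variants.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<urn>\S+) HTTP/(?P<version>\S+)" '
    r'(?P<rc>\d{3}) (?P<bytes>\d+)'
)

line = ('189.134.68.95 - alice [31/Dec/2012:03:00:00 +0100] '
        '"GET /software/winvn/index.php?q=3#article HTTP/1.0" 404 8368')

match = LOG_PATTERN.match(line)
fields = match.groupdict()
```

Each named group becomes one field of the parsed document, which the Punch parser then maps onto its taxonomy.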

Here are the fields:

Converted to ECS              Original Punch field           Value
ecs.version                   –                              v1.0.0-beta2
labels.channel                channel                        apache
labels.tenant                 tenant                         mytenant
labels.vendor                 vendor                         apache_httpd
event.type                    type                           web
event.action                  action                         Not Found
event.created                 obs.ts                         2012-12-31T03:00:00.000+01:00
event.original                message                        Feb 21 10:48:35 host2 189.134.68.95 – alice [31/Dec/2012:03:00:00…
event.alarm.id                alarm.id                       160018
event.severity                alarm.sev                      2
source.address                –                              189.134.68.95
source.ip                     init.host.ip                   189.134.68.95
source.geo.city_name          init.usr.loc.cty_short         Mexico City
source.geo.country_iso_code   init.usr.loc.country_short     MX
source.geo.country_name       init.usr.loc.country           Mexico
source.geo.location.lat       init.usr.loc.geo_point[1]      19.4342
source.geo.location.lon       init.usr.loc.geo_point[0]      -99.1386
source.user.name              init.usr.name                  alice
observer.hostname             obs.host.name                  host2
http.request.method           web.request.method             GET
http.request.referrer         web.header.referer             http://www.example.com/start.html
http.response.body.bytes      session.out.byte               8368
http.response.status_code     web.request.rc                 404
http.version                  web.header.version             1.0
url.path                      target.uri.urn                 /software/winvn/index.php?q=3#article
url.query                     –                              q=3
url.fragment                  –                              article
user_agent.original           web.header.user_agent          Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv …

The remaining fields have no listed ECS equivalent and are kept as-is:

parser.name                   apache_httpd
parser.version                1.2.0
col.host.name                 punch-elitebook
lmc.parse.host.ip             127.0.0.1
lmc.parse.host.name           punch-elitebook
lmc.parse.ts                  2019-03-06T14:30:05.552+01:00
obs.ts                        2012-12-31T03:00:00.000+01:00
rep.host.name                 host2
rep.ts                        2019-02-21T10:48:35.000+01:00
size                          325
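The renaming shown in the table is mechanical enough to sketch in a few lines of Python. This is a simplified illustration of what the ecs-convertor punchlet does, using only mappings listed above; the mapping dictionary and helper names are ours, not the actual punchlet code:

```python
from urllib.parse import urlsplit

# A few of the Punch-to-ECS renames taken from the table above.
PUNCH_TO_ECS = {
    "init.host.ip": "source.ip",
    "init.usr.name": "source.user.name",
    "obs.host.name": "observer.hostname",
    "web.request.method": "http.request.method",
    "web.request.rc": "http.response.status_code",
    "session.out.byte": "http.response.body.bytes",
    "target.uri.urn": "url.path",
}

def to_ecs(doc):
    out = {PUNCH_TO_ECS.get(key, key): value for key, value in doc.items()}
    # url.query and url.fragment are derived by splitting the URN,
    # as in the url.* rows of the table.
    urn = out.get("url.path")
    if urn:
        parts = urlsplit(urn)
        out["url.path"] = parts.path
        if parts.query:
            out["url.query"] = parts.query
        if parts.fragment:
            out["url.fragment"] = parts.fragment
    return out

doc = {"target.uri.urn": "/software/winvn/index.php?q=3#article",
       "web.request.rc": 404,
       "init.usr.name": "alice"}
ecs = to_ecs(doc)
```

Fields with no ECS equivalent (the `lmc.*`, `col.*`, `rep.*` entries above) pass through unchanged.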

 

What Next?

The Punch parsers are provided as modules that can be deployed in various pipelines. By leveraging the ECS format, the Punch encourages its users to pay much more attention to their data normalization, and in turn to immediately benefit from the Elastic ecosystem. This is actually already the case: for various Thales applications, we now deploy solutions based on Beats rather than on other metric or log agents.

And because their data is well normalized, they can now execute machine learning processing on top of it using the Punch PML feature.

Stay tuned for more news on this soon.

Guillaume Fayemi, Dimitri Tombroff

Thanks for reading our blog. 

Questions ? contact@punchplatform.com