Leveraging the Elastic Common Schema

Introduction: The Elastic Common Schema (ECS) is a new normalized format proposed by the Elastic community. Although still in beta, it is already usable, concrete and, more importantly, promising. The idea behind ECS is simple: rely on a common specification to structure the data indexed in Elasticsearch. Such data normalization makes it simple and Read more about Leveraging the Elastic Common Schema[…]
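To make the idea of a common specification more concrete, here is a minimal sketch of renaming the fields of a raw log record to ECS field names. The input record, its keys (`ts`, `src`, `dst`, and so on) and the `to_ecs` helper are hypothetical and only serve the illustration; the output uses a handful of ECS fields (`@timestamp`, `message`, `source.ip`, `source.port`, `destination.ip`, `destination.port`, `event.action`), not the complete schema.

```python
import json


def to_ecs(raw: dict) -> dict:
    """Map a hypothetical raw firewall record onto ECS-style nested fields."""
    return {
        "@timestamp": raw["ts"],
        "message": raw["msg"],
        "source": {"ip": raw["src"], "port": raw["sport"]},
        "destination": {"ip": raw["dst"], "port": raw["dport"]},
        "event": {"action": raw["action"]},
    }


# Hypothetical input: vendor-specific key names, as a parser might produce them.
raw_record = {
    "ts": "2019-10-15T09:30:00Z",
    "msg": "connection denied",
    "src": "10.0.0.1",
    "sport": 51234,
    "dst": "192.168.1.20",
    "dport": 443,
    "action": "denied",
}

ecs_doc = to_ecs(raw_record)
print(json.dumps(ecs_doc, indent=2))
```

Once every log source is mapped to the same field names, a single query on `source.ip` can cover all of them, whatever the original vendor format was.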

Introducing OmniSci

Before diving into OmniSci, let us quickly recap the well-known database story. Relational databases have long been the standard for storing pre-processed data. Although they cannot keep up with the requirements of today’s applications in terms of raw performance and data volume, they remain by far the most popular and most widely used databases. Their low Read more about Introducing OmniSci[…]

News From Logstash

In the punch, we do not use Logstash for high-performance log parsing. Why is that? Mainly because Logstash neither scales easily nor provides an end-to-end acknowledgment pattern. This is a serious shortcoming because we cannot afford to lose logs, whatever happens. We thus selected an alternative technology (Apache Storm) to run our equivalent input/filter/output processors, Read more about News From Logstash[…]

Scaling with PML

When processing and analyzing data for typical analytics or machine learning use cases, it is common to run into storage and processing power issues. If the dataset is too big to be stored on your laptop drive, or if it cannot be entirely loaded into memory, you must find practical solutions. It is, of course, possible to Read more about Scaling with PML[…]

Punch Data Science Meetup

When: on the 15th of October.
Where: in Rennes at the Google atelier numérique.
Organised as part of the Rennes Machine Learning Meetup / French Tech.
Speaker: Simon Grah from Thales Theresis Research Lab.
Topic: Maritime Traffic Anomaly Detection.

We are very happy to have Simon lead our next meetup in Rennes. Many thanks to Camille Saumard (from lumenai), to invite Read more about Punch Data Science Meetup[…]

Punchplatform Machine Learning (PML) for platform monitoring

Punchplatform periodically collects and stores metrics that characterize the health of the platform. It gathers both system metrics (CPU, RAM) and application metrics such as the tuple travel time through a Storm topology. Since its latest version, Punchplatform contains a specific module dedicated to machine learning based on Apache Spark: Punchplatform Machine Learning (check out this Read more about Punchplatform Machine Learning (PML) for platform monitoring[…]

Kafka Streams Integration

Context: While we were working on a surveillance solution, an interesting problem arose: we had to build an analysis component that was light, robust and highly adaptable. This component consists of a set of sequential or parallel tasks, each performing a single processing step. The solution is built on top of the Punchplatform the use of Read more about Kafka Streams Integration[…]