Packaging Python

Dealing With Python Apps. The Punch story with Python is an old one. We implemented an Elasticsearch aggregator tool in the Brad release (now deprecated), and we also leveraged the Elasticsearch Curator application that we provide as a deployable administrative service in the Craig release. More importantly, we decided to fully support PySpark and to make Read more about Packaging Python[…]

The Punch at Big Data Paris 2019

Big Data Paris 2019 Workshop. Tuesday 12th of March, 14H00. Machine Learning for Critical Systems. Damien Fontanes, Dimitri Tombroff. Abstract: In real-life critical systems and applications, accessing the data to test, tune and run artificial intelligence algorithms involves several underestimated problems. The first challenge is mainly to provide consistent tools and execution engines usable for Read more about The Punch at Big Data Paris 2019[…]

News From Logstash

In the Punch, we do not use Logstash for high-performance log parsing. Why is that? Mainly because Logstash is neither easily scalable nor does it provide an end-to-end acknowledgment pattern. This is a serious shortcoming because we cannot afford to lose logs, whatever happens. We thus selected an alternative technology (Apache Storm) to run our equivalent input/filter/output processors, Read more about News From Logstash[…]

Punch Data Science Meetup

When: on the 15th of October. Where: in Rennes at the Google atelier numérique. Organised as part of the Rennes Machine Learning Meetup / French Tech. Speaker: Simon Grah from Thales Theresis Research Lab. Topic: Maritime Traffic Anomaly Detection. We are very happy to have Simon lead our next meetup in Rennes. Many thanks to Camille Saumard (from lumenai), for inviting Read more about Punch Data Science Meetup[…]

(PP-1703) additional carriage return make input topologies very slow

Description: If logs sent over TCP contain extra carriage returns, the read throughput of the input topologies drops to a low rate. The reason is that these carriage returns are silently dropped (which is correct), but in a way that causes the Storm engine to think it should wait before fetching the next logs. This (Storm) behaviour is configured using Read more about (PP-1703) additional carriage return make input topologies very slow[…]

Posted in bug

(PP-1841) Kafka memory leak

Description: Kafka brokers suffer from a memory leak (https://github.com/apache/kafka/pull/4307) related to the metric reporting feature. This is unfortunately not a feature that can be disabled. This leak affects the 0.10.x and 0.11.x releases and has been fixed in release 1.x only. Impact: This leak will slowly fill the heap of Kafka broker processes. It is easily identifiable Read more about (PP-1841) Kafka memory leak[…]

Posted in bug