GSoC 2017 Final Report

Introduction

The last week of GSoC is approaching. We have achieved a lot, but there is still room for improvement, and there are new technologies worth trying. In this blog post, I will quickly summarize what we have done so far, what I worked on in particular, and what can be improved or explored further.

What was done

Overall, the biggest news is that we managed to remove JMS completely and use Apache Kafka instead. That was our main goal when we started the project. Goal set, goal met :smiley: If a push message is sent via the UPS REST API, a Kafka producer handles it and adds it to a topic. A Kafka stream is initialized based on this topic. It routes the push messages to 6 different output topics based on their variant type (Android, iOS, Windows, etc.). These messages then go through further, more complicated processing, which involves different Kafka consumers and producers, until the final push message sending is triggered.
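To make the routing step a bit more concrete, here is a minimal, dependency-free sketch of how an output topic could be picked from the variant type. It is not the actual UPS code: the class name `VariantTopicRouter`, the `VariantType` enum values, and the `push-messages-` topic prefix are all assumptions made up for illustration, and only the three variant types named above are included. In the real pipeline this decision would sit inside the Kafka stream rather than in a standalone class.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names are taken from the UPS code base.
final class VariantTopicRouter {

    // Only the variant types named in this post; the real server has more.
    enum VariantType { ANDROID, IOS, WINDOWS }

    // Assumed naming scheme: one output topic per variant type.
    private static final String TOPIC_PREFIX = "push-messages-";

    private static final Map<VariantType, String> TOPICS =
            new EnumMap<>(VariantType.class);

    static {
        for (VariantType type : VariantType.values()) {
            TOPICS.put(type, TOPIC_PREFIX + type.name().toLowerCase(Locale.ROOT));
        }
    }

    private VariantTopicRouter() {
        // static helper, not meant to be instantiated
    }

    // Returns the name of the output topic a message of this variant type goes to.
    static String topicFor(VariantType type) {
        if (type == null) {
            throw new IllegalArgumentException("variant type must not be null");
        }
        return TOPICS.get(type);
    }
}
```

With this sketch, `VariantTopicRouter.topicFor(VariantTopicRouter.VariantType.ANDROID)` gives the topic name for Android messages. Keeping the mapping in one place means that a new variant type only needs a new enum value.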

Push sending is not the only part that can move to Kafka; the collection of push metrics is also meant to be implemented with Kafka. We have already started by having

As a continuation of GSoC, more metrics can be collected using Apache Kafka. Hopefully, one day, sooner or later, our GSoC Kafka POC will be part of the master branch.

UnifiedPush Server pull requests

Kafka CDI library pull requests

Problems

Improvements

Other assignees’ tasks

Overall Stats

TODO List

There are three major topics that we started but where things can still be added or improved. The first is a stress/performance test of UPS after replacing JMS with Kafka, in other words, a measurement of how the usage of Kafka improves throughput. The second is the collection of push metrics: we did some initial tasks, but we believe that more is yet to come. The last one is Kafka security. The research was done and concrete Jira tasks were created. Though it was not the highest priority for us, there is no way for the GSoC POC project to be merged to master and used if security is not implemented.

Jira epics for these topics can be found here:

Future Work

Initially, when I applied for GSoC, integration of HBase was part of the plan. Unfortunately, we did not have the time to do it. HBase is the Hadoop database, a distributed, scalable, big data store. It provides linear and modular scalability, strictly consistent reads and writes (no “eventually consistent”), automatic and configurable sharding of tables, automatic failover support between RegionServers, and an easy-to-use Java API for client access. HBase can improve read/write access in the UPS, so we will leave it as an idea for GSoC AeroGear UnifiedPush Server 2018 :wink:

Share your feedback and stay tuned for my next post.
Do not overthink, be happy and just keep smiling…
Polina