# pafka
**Repository Path**: zhanghao_4pd/pafka
## Basic Information
- **Project Name**: pafka
- **Description**: Pafka: Persistent Memory (PMem) Accelerated Kafka
- **Primary Language**: Java
- **License**: Apache-2.0
- **Default Branch**: bugfix/remove_legacy_code
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-06-09
- **Last Updated**: 2021-06-09
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
[Slack](https://join.slack.com/t/memarkworkspace/shared_invite/zt-o1wa5wqt-euKxFgyrUUrQCqJ4rE0oPw) | [Releases](https://github.com/4paradigm/pafka/releases) | [Docker Hub](https://hub.docker.com/r/4pdopensource/pafka-dev) | [Stars](https://github.com/4paradigm/pafka/stargazers) | [Forks](https://github.com/4paradigm/pafka/network/members) | [License](https://github.com/4paradigm/pafka/blob/main/LICENSE)
Pafka: Persistent Memory (PMem) Accelerated Kafka
===
## Introduction
Pafka is an evolved version of Apache Kafka developed by [MemArk](https://memark.io/).
Kafka is an open-source distributed event streaming/message queue system for handling real-time data feeds efficiently and reliably.
However, its performance (e.g., throughput) is constrained by the disk bandwidth, which further deteriorates due to the file system overhead.
Pafka equips Kafka with Intel® Optane™ Persistent Memory (PMem) support, which relies on the native PMDK libraries
rather than treating PMem as a normal disk device.
With careful design and implementation, Pafka achieves 7.5 GB/s write throughput and 10 GB/s read throughput on a single server. Furthermore, it can reduce the total hardware cost to 9% of that of a Kafka-based solution.
## Pafka vs Kafka
### Performance
We conducted some preliminary experiments on our in-house servers.
One server is used as the Kafka broker,
and another two servers are used as clients.
Each client server runs 16 clients to saturate the broker's throughput.
We use the `ProducerPerformance` and `ConsumerPerformance` tools shipped with Kafka
and a record size of 1024 bytes for the benchmark.
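As a rough sanity check on this setup, the sketch below works out how many records per second the broker would have to handle if the 7.5 GB/s write figure from the introduction were reached with 1024-byte records and 32 clients (2 client servers with 16 clients each). It assumes 1 GB = 10^9 bytes and ignores Kafka's batch and protocol overhead; the class and method names are illustrative only.

```java
// Back-of-the-envelope arithmetic for the benchmark setup described above.
// Assumptions: 1 GB = 10^9 bytes, every record is exactly 1024 bytes, and
// Kafka batch/protocol overhead is ignored. Names are hypothetical.
public class BenchmarkArithmetic {
    static final int CLIENT_SERVERS = 2;
    static final int CLIENTS_PER_SERVER = 16;
    static final long RECORD_SIZE_BYTES = 1024;

    /** Total number of clients driving the broker. */
    static int totalClients() {
        return CLIENT_SERVERS * CLIENTS_PER_SERVER;
    }

    /** Records per second needed to sustain the given throughput in GB/s. */
    static double recordsPerSecond(double gigabytesPerSecond) {
        return gigabytesPerSecond * 1e9 / RECORD_SIZE_BYTES;
    }

    public static void main(String[] args) {
        double total = recordsPerSecond(7.5); // headline single-server write throughput
        double perClient = total / totalClients();
        System.out.printf("clients=%d, total=%.0f rec/s, per client=%.0f rec/s%n",
                totalClients(), total, perClient);
    }
}
```

Under these assumptions the total comes to about 7.3 million records/s, or roughly 229,000 records/s per client.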
#### Server Specification
The server spec is as follows:
|Item|Spec|
|---|----|
|CPU|Intel(R) Xeon(R) Gold 6252 Processor (24 cores/48 threads) * 2|
|Memory|376 GB|
|Network|Mellanox ConnectX-5 100 Gbps|
|PMem|128 GB x 6 = 768 GB|
The storage spec and performance:
|Storage Type|Write (MB/s)|Read (MB/s)|
|---|---|---|
|HDD|32k: 5.7<br>320k: 37.5<br>3200k: 78.3|86.5|
|HDD RAID|530|313|
|SATA SSD|458|300|
|NVMe SSD|2,421|2,547|
|PMem|9,500|37,120|
For `HDD`, we report write throughput for batch sizes of 32k, 320k and 3200k, while read throughput does not change much as the batch size increases.
For the other storage types, we use a batch size of 32k, as larger batch sizes do not improve the performance much.
For `PMem`, we use `PersistentMemoryBlock` of [pmdk llpl](https://github.com/4paradigm/llpl) for the performance benchmark.
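The tool used to measure the non-PMem rows is not stated. As a hedged illustration of why batch size matters, the sketch below measures sequential write throughput for the same three batch sizes using plain `java.nio` `FileChannel` with `DSYNC` writes, so each batch is persisted before the next one starts. It is not the llpl `PersistentMemoryBlock` benchmark used for PMem; it assumes 1k = 1024 bytes and 1 MB = 10^6 bytes, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Rough sequential-write throughput check with a fixed batch size.
// Uses java.nio FileChannel, NOT the llpl PersistentMemoryBlock API.
public class BatchWriteBench {

    /** Writes at least totalBytes in batchSize chunks and returns MB/s (1 MB = 10^6 bytes). */
    static double writeThroughputMBps(Path file, int batchSize, long totalBytes) throws IOException {
        ByteBuffer batch = ByteBuffer.allocateDirect(batchSize);
        while (batch.hasRemaining()) {
            batch.put((byte) 'x'); // fill the batch with dummy payload
        }
        long written = 0;
        long start = System.nanoTime();
        // DSYNC: each write() returns only after the data reaches the device,
        // so every batch pays the per-I/O cost of the storage.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.DSYNC)) {
            while (written < totalBytes) {
                batch.rewind();
                while (batch.hasRemaining()) {
                    written += ch.write(batch);
                }
            }
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return written / 1e6 / seconds;
    }

    /** Convenience wrapper: benchmarks a temporary file and deletes it afterwards. */
    static double benchTempFile(int batchSize, long totalBytes) {
        try {
            Path file = Files.createTempFile("batch-bench", ".dat");
            try {
                return writeThroughputMBps(file, batchSize, totalBytes);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (int kb : new int[] {32, 320, 3200}) {
            System.out.printf("batch=%dk: %.1f MB/s%n", kb, benchTempFile(kb * 1024, 64L * 1024 * 1024));
        }
    }
}
```

`benchTempFile` writes to the default temporary directory, which may sit on a different device (or on tmpfs) than the one you want to measure; call `writeThroughputMBps` with a path on the target disk instead. Results depend on the hardware and are not expected to match the table.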
#### Performance Results
## Get Started
For the complete Kafka documentation, refer to [README.kafka.md](README.kafka.md).
### Docker Image
The easiest way to try Pafka is to use the docker image: https://hub.docker.com/r/4pdopensource/pafka-dev
```
docker run -it -v $YOUR_PMEM_PATH:/mnt/mem 4pdopensource/pafka-dev bash
```
where `$YOUR_PMEM_PATH` is the mount point of PMem (a DAX file system) on the host system.
If you use the docker image, you can skip the following `Compile` step.
### Compile
#### Dependencies
- [pmdk pcj](https://github.com/4paradigm/pcj)
- [pmdk llpl](https://github.com/4paradigm/llpl)
> :warning: **We have made some modifications to the original PMDK source code.
> Please download the source code from the two repositories provided above.**
**We already ship prebuilt pcj and llpl jars in the `libs` folder of the Pafka repository.
They are compiled with Java 8 and g++ 4.8.5. In general, you do not need to compile the two libraries
yourself. However, if you encounter any compilation or runtime errors caused by these two libraries,
you can download the source code and compile it in your own environment.**
##### Compile pmdk libraries
After cloning the source code of both libraries, build each one and copy its jar into `$PAFKA_HOME/libs`:
```
# compile pcj
cd pcj
make && make jar
cp target/pcj.jar $PAFKA_HOME/libs
cd ..

# compile llpl
cd llpl
make && make jar
cp target/llpl.jar $PAFKA_HOME/libs
```
#### Build Pafka jar
```
./gradlew jar
```
### Run
#### Environmental setup
To check whether Pafka works, you can use any file system on a normal hard disk.
For the best performance, Pafka requires PMem hardware mounted as a DAX file system.
#### Config
To support PMem storage, we add the following config fields to the Kafka [server config](config/server.properties):
|Config|Default Value|Note|
|------|-------------|----|
|storage.pmem.path|/tmp/pmem|PMem mount path|
|storage.pmem.size|21,474,836,480|PMem size in bytes (20 GiB)|
|log.pmem.pool.ratio|0.8|Proportion of the total PMem size that is pre-allocated as a pool of log segments. Pre-allocation increases the first startup time, but eliminates the dynamic allocation cost when serving requests.|
|log.channel.type|file|Log file channel type. Options: "file", "pmem".|
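The sketch below shows how these fields might be set in `server.properties` and how the pre-allocated pool size follows from `storage.pmem.size` and `log.pmem.pool.ratio`. It is an illustration, not Pafka's own config code: it assumes the pool is simply size times ratio and, to count segments, that each pooled segment is `log.segment.bytes` long (Kafka's default, 1073741824 bytes). The class name is hypothetical, and the size is written without thousands separators because `Long.parseLong` rejects them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: reads the PMem fields with the documented defaults
// and estimates the pre-allocated segment pool.
// Assumption: pool = storage.pmem.size * log.pmem.pool.ratio, split into
// log.segment.bytes-sized segments (Kafka default 1 GiB).
public class PmemConfigSketch {

    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Bytes reserved for the pre-allocated pool, using the table's defaults when unset. */
    static long poolBytes(Properties props) {
        long size = Long.parseLong(props.getProperty("storage.pmem.size", "21474836480"));
        double ratio = Double.parseDouble(props.getProperty("log.pmem.pool.ratio", "0.8"));
        return (long) (size * ratio);
    }

    /** Number of whole segments that fit in the pool (segment size is an assumption). */
    static long pooledSegments(Properties props) {
        long segmentBytes = Long.parseLong(props.getProperty("log.segment.bytes", "1073741824"));
        return poolBytes(props) / segmentBytes;
    }

    public static void main(String[] args) {
        Properties props = load(String.join("\n",
                "log.channel.type=pmem",
                "storage.pmem.path=/mnt/mem",
                "storage.pmem.size=21474836480",
                "log.pmem.pool.ratio=0.8"));
        System.out.println("channel=" + props.getProperty("log.channel.type", "file")
                + " path=" + props.getProperty("storage.pmem.path", "/tmp/pmem")
                + " pool=" + poolBytes(props) + " bytes"
                + " segments=" + pooledSegments(props));
    }
}
```

With the default values, the pool comes to 17,179,869,184 bytes (16 GiB), which is 16 segments of 1 GiB under the segment-size assumption.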