rsync or cp over net based file systems


Question

Skimming the net, I found some passionately led discussions and interesting articles comparing cp and rsync performance. Foremost, LWN-CP-RSYNC caught my attention, as it promises, in a technically sound manner, more than a doubling of throughput when opting for coreutils' cp instead of rsync in default local data migration setups. Being a data centre engineer, I was wondering whether cp would still outperform rsync when data is to be pushed over non-local, network-based file systems. Although the topic is vividly discussed, I could not find any recent and reliable data on it online. I expected the findings measured for local migration to hold for network-based file systems because of the minimalistic data-pushing machinery of cp and the somewhat more complex architecture and algorithms of rsync. I did not consider cat or cpio here since both seem to show more or less the same performance level as cp.

Setup

I chose a minimal but still lifelike enough setup to compare the two tools. All components were based on KVM/Qemu and were controlled and run with libvirt mechanisms, especially virsh. I won't go into detail about how that is done, as the respective docs cover it more exhaustively than I could. What I want to point out here is virt-builder from the libguestfs package, which made creating the VMs quite comfortable and quick.
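
For illustration only, a minimal sketch of how such a guest can be built and imported; the template name, image path, password and --os-variant value are assumptions, not a record of the exact commands used:


# build a base image from a virt-builder template
virt-builder opensuse-42.1 \
    --size 10G --format qcow2 \
    --hostname suse42_n1 \
    --root-password password:changeme \
    -o /var/lib/libvirt/images/suse42_n1.qcow2

# import it as a libvirt guest (memory in MB, matching the 2024 MB used here)
virt-install --import --name suse42_n1 \
    --memory 2024 --vcpus 1 \
    --disk path=/var/lib/libvirt/images/suse42_n1.qcow2,format=qcow2 \
    --os-variant opensuse42.1 --noautoconsole
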

[Figure: test setup]

I opted for NFS as the network-based FS, mainly because it is open and quick and easy to configure. Moreover, it is still quite widespread in data centres.

Basically, the two openSUSE 42.1 based VMs were the NFS-serving machines, while the Fedora 24 Server VM took the NFS client, data-pushing role. Fedora ran on kernel 4.5.7-300.fc24.x86_64, the SUSE machines on 4.1.12-1-default. All machines had 2024 MB of RAM and one processor.
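
A rough sketch of the server side; the export options and service handling are assumptions, only the export paths, the hosts and the default libvirt subnet 192.168.122.0/24 follow from the mount output further down:


# on suse42_n1; analogous for /sink on suse42_n2
mkdir -p /source
echo '/source 192.168.122.0/24(rw,sync,no_subtree_check)' >> /etc/exports
exportfs -ra                        # (re)export everything in /etc/exports
systemctl enable --now nfs-server   # service name may differ per distribution
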

I tested the virtual net, brought up by virsh and not manually tweaked, with:


[root@suse42_n1 ~]# iperf3 -s

[root@fedora24 ~]# iperf3 -c suse42_n1

[...]

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval Transfer Bandwidth Retr

[ 4] 0.00-10.00 sec 2.78 GBytes 2.39 Gbits/sec 15859 sender

[ 4] 0.00-10.00 sec 2.78 GBytes 2.39 Gbits/sec receiver


So not the fastest connection one sees in data centres, but representative enough for the comparison.

The data source and sink were 5 GB virtio disks with BTRFS on them. Going for BTRFS was fairly arbitrary, since the local backend FS does not matter much for the comparison.
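
A minimal sketch of how such a disk can be provisioned and prepared; the image path, target device and mount point are assumptions:


# on the KVM host: create a 5 GB image and attach it as a virtio disk
qemu-img create -f qcow2 /var/lib/libvirt/images/suse42_n1-data.qcow2 5G
virsh attach-disk suse42_n1 /var/lib/libvirt/images/suse42_n1-data.qcow2 vdb \
    --subdriver qcow2 --persistent

# inside the guest: format and mount it
mkfs.btrfs /dev/vdb
mkdir -p /source
mount /dev/vdb /source
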

The mounts:


[root@fedora24 ~]# nfsstat -m
/source from suse42_n1:/source
 Flags: rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.89,local_lock=none,addr=192.168.122.158

/sink from suse42_n2:/sink
 Flags: rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.122.89,local_lock=none,addr=192.168.122.52
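
For completeness, a sketch of what the corresponding client-side mounts on fedora24 could look like; the NFS version is taken from the output above, everything else is left at its defaults:


mkdir -p /source /sink
mount -t nfs4 -o vers=4.2 suse42_n1:/source /source
mount -t nfs4 -o vers=4.2 suse42_n2:/sink /sink
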


Approach

All the action happened on the fedora24 node, since it was the data-pushing entity. I decided to have several runs with different data constellations in /source. genbackupdata was my tool of choice for data generation, although it is not fully working: it promises files of varying sizes, but it can only produce uniformly sized files, and its --delete, --rename and --modify flags were somehow not implemented. Still, it is a good choice to create random data.

I went for 6 runs and generated the data for each with:


genbackupdata --create=SIZE [--chunk-size=C_SIZE] [--file-size=F_SIZE] --depth=3 /source/

SIZE   C_SIZE   F_SIZE   Resulting filesize
100M   -        -        2M
100M   80000    80000    10M
100M   160000   160000   20M
500M   -        -        2M
500M   80000    80000    10M
500M   160000   160000   20M

For instance, for genbackupdata --create=100M we get:

matthias@suse42_n1:~> du -h /source/
2,1M    /source/0/0/0/0/0
2,0M    /source/0/0/0/0/1
2,1M    /source/0/0/0/0/6
2,1M    /source/0/0/0/0/2
[...]
17M    /source/0/0/0
17M    /source/0/0
17M    /source/0
100M    /source/


Before every run I diligently cleaned up all the caches:


sync                                # flush dirty pages first so they can be dropped
swapoff -a                          # keep swap out of the picture
echo 1 > /proc/sys/vm/drop_caches   # free the page cache
echo 2 > /proc/sys/vm/drop_caches   # free dentries and inodes
echo 3 > /proc/sys/vm/drop_caches   # free both (supersedes the two lines above)


I tracked every run:


/usr/bin/time -a -o measurem -f "real %e user %U sys %S avg-io-pg-fault %F fs-in %I fs-out %O avg-mem %K max-resident %M avg-res %t cpu %P% " <CMD>
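
Plugged together with one of the copy commands below, a single measured invocation then looks roughly like this:


/usr/bin/time -a -o measurem \
    -f "real %e user %U sys %S avg-io-pg-fault %F fs-in %I fs-out %O avg-mem %K max-resident %M avg-res %t cpu %P% " \
    rsync -aHv --no-whole-file --progress /source/ /sink
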


The CMD itself and the intermediate steps, abstractly:

  1. run copy
  2. pollute source to make delta sync necessary
  3. run sync
  4. cleanup /sink

and in practice then:


  • RSYNC:
    1. rsync -aHv --no-whole-file --progress /source/ /sink
    2. find /source/ -type f | sort -R | head -n 50 | xargs -I{} sh -c 'head -c 20 /dev/urandom >> "{}"'
    3. rsync -aHv --no-whole-file --progress /source/ /sink
    4. rm -r /sink/*
  • CP:
    1. cp -au /source/* /sink/
    2. find /source/ -type f | sort -R | head -n 50 | xargs -I{} sh -c 'head -c 20 /dev/urandom >> "{}"'
    3. rsync -aHv --no-whole-file --progress /source/ /sink
    4. rm -r /sink/*
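
Put together, one measurement round (here for the RSYNC case) could be scripted roughly as follows. This is only a sketch of the steps above, not the exact harness:


#!/bin/bash
# one measurement round: copy, pollute, delta sync, cleanup (rsync case)

measure() {
    # append the timing line for the wrapped command to "measurem"
    /usr/bin/time -a -o measurem \
        -f "real %e user %U sys %S fs-in %I fs-out %O max-resident %M cpu %P% " \
        "$@"
}

drop_caches() {
    sync                                # flush dirty pages first
    swapoff -a                          # keep swap out of the picture
    echo 3 > /proc/sys/vm/drop_caches   # drop page cache, dentries and inodes
}

pollute_source() {
    # append 20 random bytes to 50 randomly picked files under /source
    find /source/ -type f | sort -R | head -n 50 | \
        xargs -I{} sh -c 'head -c 20 /dev/urandom >> "{}"'
}

drop_caches
measure rsync -aHv --no-whole-file --progress /source/ /sink   # 1. run copy
pollute_source                                                 # 2. pollute source
drop_caches
measure rsync -aHv --no-whole-file --progress /source/ /sink   # 3. run delta sync
rm -r /sink/*                                                  # 4. cleanup /sink
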

Outcome

The table does not show CPU usage, since it was below 5% for cp and near 20% for rsync in all runs. So the network was the main bottleneck, as expected.

Per data constellation and tool, "copy" is the initial full copy (step 1) and "sync" the subsequent delta sync after polluting /source (step 3); times are in seconds as reported by time.

Data Mass  Filesize  Tool   Run   Elapsed (s)  Sys (s)  User (s)
100M       2M        RSYNC  copy       229.15     2.91      0.80
                            sync         1.34     0.20      0.02
                     CP     copy       272.85     1.86      0.26
                            sync         1.72     0.25      0.02
           10M       RSYNC  copy        51.02     0.79      0.51
                            sync         0.26     0.03      0.00
                     CP     copy        53.00     0.38      0.04
                            sync         0.26     0.04      0.00
           20M       RSYNC  copy        30.42     0.52      0.47
                            sync         0.15     0.02      0.00
                     CP     copy        28.59     0.20      0.03
                            sync         0.14     0.02      0.00
500M       2M        RSYNC  copy      1022.57    13.85      4.17
                            sync         6.48     0.96      0.11
                     CP     copy       957.04     7.73      0.99
                            sync         6.22     0.95      0.09
           10M       RSYNC  copy       244.52     3.63      2.54
                            sync         1.26     0.02      0.19
                     CP     copy       243.73     1.81      0.21
                            sync         1.30     0.19      0.01
           20M       RSYNC  copy       142.88     2.69      2.30
                            sync         0.62     0.09      0.01
                     CP     copy       132.73     0.95      0.08
                            sync         0.67     0.10      0.00

For illustration, an nfsiostat snapshot on the client during one of the runs:

[root@fedora24 ~]# nfsiostat

suse42_n2:/sink mounted on /sink:

ops/s rpc bklog

101.887 0.000

read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)

3.138 51.153 16.300 0 (0.0%) 0.549 0.560

write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)

6.654 109.051 16.389 0 (0.0%) 34.072 34.098

suse42_n1:/source mounted on /source:

ops/s rpc bklog

79.758 0.000

read: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)

5.091 82.983 16.300 0 (0.0%) 0.730 0.740

write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)

4.714 77.251 16.389 0 (0.0%) 25.937 25.960


I stopped here because I figured no further insights would come from more runs. I am prepared to be corrected on that.

Conclusion

Some of the insights I got from this were as expected; others quite surprised me. I did not expect the delta sync after a cp to be only ~10-15% less performant than the sync after a preceding rsync. Moreover, I expected both tools to have a hard time migrating small files, but I honestly assumed cp would clearly outstrip rsync here. It does not. cp is ahead when it comes to larger files, but even there the difference is not astoundingly significant. That may change with really huge files, which I may look into more deeply. What I mainly take away is that in the average real-world case it does not really matter which tool you choose for migrating data over a network-based file system as far as performance is concerned. Coreutils' cp does not necessarily outperform rsync over network-based file systems.
