LocalGridBonding (06 Oct 2011, Lawrence Lowe)
---+ Local Grid Bonding

---++ General DPM performance issues

The rate at which data can be read from or written to a disk pool node is limited by several factors:

   * The speed at which data can be read from or written to the DPM disk areas. This is limited by the intrinsic speed of the disk devices in the RAID, and by the method of connection of the RAID units to the node, which in our case is a dual SCSI interface, each channel running at 320 MBytes/sec.
   * The speed at which data can pass through the network between client and server. This depends on the number and speed of the network interfaces, and on the network switch setup.
   * In both these aspects there is contention between the different clients accessing the SE or pool nodes simultaneously: each receives a share of the available bandwidth. This favours a setup with multiple disk pool nodes, either each exclusively handling a reasonably small quantity of data, or each able to have uncontended access to all the storage.
   * There is also contention within a worker node (WN) between the different clients (e.g. rfcp) on the same node, as these share the same WN network interface(s).

---++ Implementing network bonding

I've implemented [[http://en.wikipedia.org/wiki/Channel_bonding][network bonding]] on the disk pool node epgsr1, to relieve a bottleneck which became clear during the running of STEP 09. Subsequently we have implemented bonding on all our disk pool nodes.

---+++ Switch Setup

The two gigabit interfaces on epgsr1 are now both connected to the dLink 48-port switch, on ports 17 and 18. In readiness, those two ports were declared as _trunked_, using the switch GUI, and the switch has been physically labelled accordingly. Trunking means that the switch is aware that outgoing packets (from epgsr1) on those two ports can have the same source MAC address, and that incoming packets (to epgsr1) are to be _distributed_ between the two ports. In practice, the port offset (0, 1, ...)
within the trunk-set is given by (source MAC address XOR destination MAC address) modulo (number of trunked ports). This has been confirmed by tests on several combinations of nodes. It is a form of load sharing which generally works quite well on average. It's an unfortunate fact, though, that the MAC addresses of all our Supermicro twin nodes are even, as is the MAC address of the edge-switch gateway that communicates with the outside world (including the BlueBEAR WNs). So for incoming data, one switch port is used predominantly, rather than the load being shared across both ports in the trunk. (Possible solutions: use the eth1 port on half the workers in place of eth0, or increase the number of gigabit interfaces on epgsr1 to three.) This is not a big issue for us, because most of the benefit of trunking is in _reading_ from the disk pool node, which is outgoing traffic, not incoming. The port used for outgoing traffic is determined by the _bonding_ module in the Linux system on epgsr1: see the next section.

Note added 2011: for some time now we have been making use of 4-way bonding on all our disk pool nodes. This improves the sharing of incoming data between the eth0-eth3 ports.

---+++ Pool node Setup

A good reference for the kernel bonding module is /usr/share/doc/kernel-doc- _version_ /Documentation/networking/bonding.txt, which is also [[http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driver-howto.php][on the web]]. I've chosen the balance-rr mode of distributing packets, which means that packets are transmitted in round-robin (sequential) order across the ports in the bond. As the documentation says, this provides load balancing and fault tolerance.
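The switch's trunk-port hash described in the Switch Setup section above can be sketched in a few lines (Python purely for illustration; the switch does this in hardware, and other switches may hash differently). It shows why a trunk of two ports degenerates when every MAC address is even: for a 2-port trunk only the lowest bit of the XOR matters, so even/even pairs always land on port offset 0.

```python
def trunk_port(src_mac: str, dst_mac: str, n_ports: int = 2) -> int:
    """Port offset within the trunk-set:
    (source MAC XOR destination MAC) modulo (number of trunked ports)."""
    src = int(src_mac.replace(":", ""), 16)
    dst = int(dst_mac.replace(":", ""), 16)
    return (src ^ dst) % n_ports

# Two even MACs (last bit 0) XOR to an even value, so with a
# 2-port trunk the pair always maps to port offset 0:
print(trunk_port("00:30:48:7e:aa:10", "00:30:48:7e:bb:2c"))  # 0
# An odd/even pair maps to the other port:
print(trunk_port("00:30:48:7e:aa:11", "00:30:48:7e:bb:2c"))  # 1
```

(The MAC addresses here are made up; the point is only the parity behaviour, which matches the test observations reported above.)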
In brief:

   * /etc/modprobe.conf has two lines added, to allow loading of the bonding driver when the interface is referenced: <verbatim>
alias bond0 bonding
options bond0 mode=balance-rr miimon=100</verbatim>
   * In the /etc/sysconfig/network-scripts directory, the files ifcfg-eth0, ifcfg-eth1 and ifcfg-bond0 were modified or created as required. In practice, this was first done in a duplicate directory network-scripts.bonding, with a copy of the original in directory network-scripts.normal, to make it easy to move between the two scenarios.
   * ifcfg-eth0: <verbatim>
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no</verbatim>
   * ifcfg-eth1 to ifcfg-eth3: same as above, except DEVICE=eth1 to DEVICE=eth3 respectively.
   * ifcfg-bond0: <verbatim>
DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
IPADDR=147.188.xx.8
GATEWAY=147.188.xx.1
NETWORK=147.188.xx.0
NETMASK=255.255.255.0
USERCTL=no
TYPE=Ethernet</verbatim>
   * The gigabit ports were connected to the newly trunked pair of ports on the dLink switch.
   * A =service network restart= was done from the node console.
   * It took several seconds before network connectivity resumed, while the switch worked out where to send packets in the new scenario. This was done on a live system, and network connections that were already in place continued without a break.

---+++ Monitoring of the interfaces

My ifrate command gives output like this, in a busy environment with lots of outgoing data and a little incoming data: <pre><verbatim>
bond0:   8 MB/s in   245 MB/s out
eth0:    0 MB/s in   122 MB/s out
eth1:    8 MB/s in   123 MB/s out</verbatim></pre>

---+++ Effect on Pool node CPU utilisation

The rfiod daemons are more active now that they can deliver data faster, but they are still well within the capabilities of the AMD quad-core processor on this node. Note added 2011: the disk pool nodes now use dual quad-core Intel processors.

-- Originally created by Main.LawrenceLowe - 11 Jun 2009
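ifrate is a local script and its source is not reproduced here; for readers without it, the underlying idea can be sketched as follows (function names are hypothetical, not the actual ifrate code). On Linux, per-interface byte counters live in /proc/net/dev; taking two snapshots and dividing the difference by the interval gives the MB/s figures shown above.

```python
def parse_proc_net_dev(text):
    """Return {iface: (rx_bytes, tx_bytes)} from /proc/net/dev contents.
    After the interface name and colon, field 0 is received bytes and
    field 8 is transmitted bytes."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the two header lines
        iface, rest = line.split(":", 1)
        fields = rest.split()
        counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def rates_mb_per_s(sample1, sample2, interval):
    """MB/s in and out per interface, from two counter snapshots
    taken `interval` seconds apart."""
    rates = {}
    for iface, (rx1, tx1) in sample1.items():
        rx2, tx2 = sample2[iface]
        rates[iface] = ((rx2 - rx1) / interval / 1e6,
                        (tx2 - tx1) / interval / 1e6)
    return rates

# Example: eth0 sent 122 MB and received nothing over a 1-second interval
before = {"eth0": (0, 0)}
after = {"eth0": (0, 122_000_000)}
print(rates_mb_per_s(before, after, 1.0))  # {'eth0': (0.0, 122.0)}
```

A real monitor would read /proc/net/dev itself, sleep for the interval between the two snapshots, and loop; counter wrap-around on 32-bit kernels would also need handling.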