---+ Local Grid Journal 2013

This is a reverse-order diary of events, without retrospective editing (so keep it raw and short, max ~ 3 lines). See other pages like LocalGridInternals for more carefully considered documentation.

| 20130429 | MWS | Problem found with right (1) channel on f14. Swapped into one daisy chain with other RAIDs on epgsr1 but missed out this channel on f14. Also, switched all LUNs to the one channel in f14 config. |
| 20130405 | LSL | Tidying epgd workers. Moved eight 160GB disks from epgd21-24 to epgd01-08, and two 160GB disks from epgd20 to epgd12 and epgd15 (each had one bad disk). Installed ten new 500GB disks in epgd20-24. So epgd01-19 now have two 160GB, epgd20-24 now have two 500GB. |
| 20130405 | MWS | Upgraded CREAM, Torque and BDII to EMI2 and jobs seem to be going through OK. Still have ARGUS and APEL to go but I'm waiting for the accounting to sort itself out first. |
| 20130405 | MWS | Applied the security kernel upgrade to all nodes and rebooted. |
| 20130220 | MWS | Added another QOS group of 'high' for all sgm/ops jobs as setting the priority on the groups didn't seem to be working. |
| 20130220 | MWS | Changed the sysctl network parameters for the pool nodes as these apparently stop the slow transfer to BNL problem (https://ggus.eu/ws/ticket_info.php?ticket=86105). This is still not understood! |
| 20130219 | MWS | Disabled the cache in Argus as I've heard this will stop the argus service from failing once a week. |
| 20130131 | MWS | Set the parameter FSQOSWEIGHT and removed user and group ones in the maui.cfg to actually make it register the new QOS fairshares. |
| 20130130 | MWS | After running process accounting on epgf01 for 24 hours and checking the log files didn't blow up, I've enabled process accounting (psacct) on all machines through puppet. |
| 20130128 | MWS | Changed the MAUI config to use QOS Fairshare targets for the various 'groups': Alice(60):Atlas(30):LHCb(5):Others(5). Blanked all the others. |
| 20130125 | MWS | Added the OpenIPMI-tools to puppet so all machines will now have this installed. |
| 20130124 | MWS | Released the limit of ALICE jobs in MAUI so they are now only controlled by fairshare. I'll see if we get any more problems with too much load on the epf* nodes. |
| 20130122 | MWS | Changed the fair share for ATLAS jobs to 15% from 23% as the combined amount of pilot and prod jobs was ~50% (i.e. too high) and blocking the site. Need to improve the fairshare to take account of multiple groups... |
| 20130121 | MWS | Installed VomsSnooper (and the required java-1.7.0-openjdk) on epgpe03 to ease keeping the VOMS info up to date. Go to /opt/GridDevel/vs_scripts, run set_rpm_paths.sh, go to usecases/newVomsRecsForMySite, run voidRun.sh and then use void/xml/site-info.def to update the site-info.def template. |
| 20130121 | LSL | C6145: Dell informs that C6145 batteries for RAID units have a short life and can be proactively replaced. |
| 20130108 | LSL | BB2 nodes: Investigate 10GbE driver: SL5 built-in version doesn't work, so download mlnx4_en driver version 1.5.9 from Mellanox website, and get it working. |
| 20130108 | LSL/MWS | BB2 nodes: decide it best not to ask ITS to replace Mellanox !ConnectX-3 firmware for PXE as this will make swap of a failing node much more difficult. |
| 20130104 | MWS/LSL | SE (implemented on epgpe10) using EMI2, previously flakey, is now performing reliably since nscd service put into use, caching DNS requests. |
| 20130103 | LSL | RAID f25 switched on, logical drives already in place since before Christmas, now logical volumes created and then partitioned, as required for ESDS raids. |

Previous journals: LocalGridJournal2012, LocalGridJournal2011, LocalGridJournal2010, LocalGridJournal2009.

Created Main.LawrenceLowe - 07 Jan 2013
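
For reference, the QOS fairshare settings mentioned in the 20130128 and 20130131 entries can be sketched roughly as below. This is only an illustration, not the site's actual maui.cfg, which is not recorded in this journal: the QOS names (alice, atlas, lhcb, others), the weight value of 1 and the check that the targets sum to 100 are all assumptions. Only the 60:30:5:5 targets and the use of FSQOSWEIGHT come from the entries above.

```python
# Sketch: render Maui QOS fairshare lines from a table of targets.
# The targets are taken from the 20130128 journal entry; the QOS names
# and the FSQOSWEIGHT value are assumptions made for illustration.
QOS_FAIRSHARE_TARGETS = {"alice": 60, "atlas": 30, "lhcb": 5, "others": 5}


def render_maui_fairshare(targets, qos_weight=1):
    """Return maui.cfg-style lines for the given QOS fairshare targets."""
    total = sum(targets.values())
    # Sanity check chosen for this sketch: treat the targets as percentages.
    if total != 100:
        raise ValueError(f"fairshare targets sum to {total}, expected 100")
    # FSQOSWEIGHT makes the QOS fairshare component count towards priority.
    lines = [f"FSQOSWEIGHT {qos_weight}"]
    for name, percent in targets.items():
        lines.append(f"QOSCFG[{name}] FSTARGET={percent}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_maui_fairshare(QOS_FAIRSHARE_TARGETS))
```

Jobs would still need to be mapped onto each QOS (for example through the group or class configuration), which is not covered here.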
Topic revision: r13 - 29 Apr 2013 - /C=UK/O=eScience/OU=Birmingham/L=ParticlePhysics/CN=mark slater
Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.