New hardware
While we prepare the new and improved hardware to host our databases and parallel filesystems, we have been using a temporary system provided to us by the data center. All data is confirmed intact, and there has been no data loss as we continue to recover. The recovery system is a stand-in for the storage server that failed, selected for hardware compatibility so that the data could be recovered. We will not keep using the recovery system indefinitely; it will be retired only once the new storage system has been fully installed and synced with it, for a smooth handoff.
BOINC database is UP
The BOINC database is now up and running, joining the website/forums database, which has been up since last week. However, upload and download of workunits remain paused until we restore the parallel filesystem that supports the workunit management stack to the state it was in at the time of the hardware failure.
Deadlines have been extended and valid results computed during this pause will be credited when we resume.
Website crashes
During the hardware recovery process, the website has been crashing intermittently. Looking into the cause, we identified bugs that only appear when the BOINC database is offline or other resources are unavailable, as has been the case while we recover the system. The website will now remain available to users in these cases, or restart automatically if it does crash.
In the meantime, we have posted research updates from the ARP and MCM teams. We are planning on sharing more updates soon.
If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.
WCG team