WCG problems

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
I've noticed for a couple of days or so that results haven't been uploading. I tried to find out from WCG's website what was going on, and it seems only the front page is working.
Clicking on multiple links there, they all came up with errors, including their forums.

Anyone know what's going on?

>>>>[edit] Their forums are back up now, latest info here (or at the twitter link provided below by Skillz)<<<<
 

Skillz

Senior member
Feb 14, 2014
933
959
136

crashtech

Lifer
Jan 4, 2013
10,530
2,116
146

waffleironhead

Diamond Member
Aug 10, 2005
6,920
431
136
That's why my potato fleet was inactive. Had to switch them over to asteroids for the time being. Hopefully they get it sorted.
 

crashtech

Lifer
Jan 4, 2013
10,530
2,116
146
The most recent news on WCG's storage array problems, most recently edited (update #5, presumably) 2 days ago:
Hello everyone, hope you had a great weekend. We are still working with data centre to resolve the hardware failure so we can restart the storage, BOINC and website ASAP. We will post updates as we receive them. Thank you for your patience.
Update: Unfortunately, additional hardware problems on the storage server besides the RAID card are preventing us from restarting. Working with the data center on alternative solutions.
Update #2: Unfortunately, the RAID controller was not the root cause of our storage system failure; the PCI bus failed. The data center is in the process of moving the disks to an alternate system and we will post updates as we progress. Once again, thank you for your patience.
Update #3: As of this morning, the data center continues to work on booting the temporary replacement DSS 7000 storage system. They are attempting multiple alternative strategies to resolve current failures.
Update #4: The "new" system did recognize the data hardware RAIDs. All have been rebuilt, and the data center is attempting to repair the OS drives/RAID.
Update #5: The storage server was revived yesterday late afternoon. Both database filesystems mounted as before, but the science filesystem did not. It needs a repair; erasing the old log first.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,599
14,575
136
So, yesterday, I checked and it looked like it was back up, so I added a project to BOINC using WCG. But it's still waiting for a unit??? Still a problem.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,599
14,575
136
They've put some info up on the WCG site:

Thanks. When it first went down, there was no way to get to that forum, so I was using Facebook updates. Those stopped a few days ago. This is really a pisser.
 

crashtech

Lifer
Jan 4, 2013
10,530
2,116
146
Ran across this today, I may have missed some previous updates:
New hardware
While we prepare the new and improved hardware to host our databases and parallel filesystems, we have been using a temporary system provided to us by the data center. All data is confirmed intact and there has been no data loss as we continue to recover. The recovery system is a stand-in for the storage server that failed, selected for hardware compatibility to recover the data. We will not be continuing with the recovery system indefinitely, and it will be discontinued only once the new storage system has been fully installed and synced with the recovery system for a smooth handoff.

BOINC database is UP
The BOINC database is now up and running, joining the website/forums database which has been up since last week. However, upload/download of workunits is paused until we restore the parallel filesystem that supports the workunit management stack, to the state it was in at the time of the hardware failure. Deadlines have been extended and valid results computed during this pause will be credited when we resume.

Website crashes
During the hardware recovery process the website has been intermittently crashing. Looking into the cause we identified bugs that only present themselves in such cases as the BOINC database being offline, and other resources unavailable as we recover the system. The website will now remain available to users in these cases or restart automatically after crashing.

In the meantime, we have posted research updates from the ARP and MCM teams. We are planning on sharing more updates soon.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,599
14,575
136
Yeah, and my uploads are still stuck. It doesn't appear that this project has the budget or sense of urgency at Krembil that would be necessary to return it to its former glory...
Due to 1) my electric bill and 2) the lack of cancer research places that I will support (nothing to do with Russia), I am down to 8 computers running. But at least I have 60-70 mill ppd going for F@H: 4 5950x and 4 7950x, with the 7950x configured for 142 watts, the same as the 5950x. One 5950x will soon be replaced by a 7950x@142 watts, so it will be 3 and 5, with 2 idle 5950x's for sale!
 

cellarnoise

Senior member
Mar 22, 2017
716
396
136
Well maybe more medical than cancer (and with little Russia ties that I know of), but TN-Grid and Denis have tasks occasionally. They have been more on than off recently....

Not so much for you Mark, but for others that have not run these projects before:
Denis

TN-Grid:

And Mark continues to run Rosetta of course :

Carry ON!
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,599
14,575
136
Yup, at the moment, it's Rosetta and F@H.

I do miss the competition, but I don't miss the "let's massacre your teammates using the cloud (in an inter-team competition)".
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,599
14,575
136
It didn't happen last year either.
Mr know-it-all, if you want to keep arguing, maybe I should quit the team altogether, since you seem to speak for it now.

Edit: and YOU are the reason that I quit competition.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
I don't believe Skillz has ever claimed to speak for the team, and there are plenty of us with our own voices, so I'm not sure where you get that idea from.

I don't know (or recall) what the situation was regarding your statement that the cloud was used in competition; IIRC it isn't (or wasn't at least) a banned practice in races.
That said, I'm on the fence as to whether it's fair. Is it any different than buying new computer gear? I suspect many of us haven't commented on it because of these uncertainties.
I vaguely recall that some of us discussed this in another thread, perhaps someone could link me to it if all angles were covered there? (was it the F@H race thread?[edit] Yep, from here, at a glance [edit 2] Nope, that wasn't the whole conversation, but more here). Failing that, I'd rather we didn't carry it on in this thread, but perhaps start a new thread if wanted?

Btw, I think it's rather off of you that you threaten to leave the team because of a disagreement with a single (?) team member.

*****************************************

Going back to the WCG problems, I've still got a bunch of tasks to upload, I assume they're not fully back up yet? Or is it just me? lol
 

crashtech

Lifer
Jan 4, 2013
10,530
2,116
146
@Assimilator1 and all, here's the latest:
March 27, 2023 update

Data transfer to new storage system
We have started transferring all data from the recovery storage unit to our new storage system on Friday. Based on the current rate of transfer, we expect to have all data transferred/verified later this week. We will then download all processed WUs, after which we can resume sending work units to volunteers. We plan to start with MCM and OPN/OPNG; followed by ARP and then the new SCC work units.

In the meantime, we have confirmed that our daily database backups for BOINC and for the website/forums are working. The databases have been recovered and transferred to the new, faster storage already. Incremental backup to tape archive has been implemented on the new storage.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Yep, mine are still stuck too.
From their forum :-

March 31, 2023 update

Data transfer update & maintenance check
Earlier in the week, we ran into HDD failures while transferring the data from recovery storage to the new storage system. The issue has since been resolved and we have resumed transferring data, expecting to finish it by 5pm today. At 4pm, we will be conducting a brief maintenance on the website and forums to transfer the DB2 filesystems to our new storage system, which will result in restricted access to the website for up to 30 minutes. If all goes well, this could be the final step towards the full storage system upgrade.

We evaluated the possibility of starting download of processed WUs, while not sending new WUs out. It was determined that the risk of complications that might result from doing this with incomplete information available to our scheduler and BOINC or any other unforeseen issues is too high. We have extended the deadlines for workunits that were processed and await upload to WCG.

While we wait for the data transfer to finish, we are working on resolving other long standing issues such as device recognition.
 