WCG problems


crashtech

Lifer
Jan 4, 2013
10,530
2,116
146
Latest:
WU Distribution Update

We are working towards resuming a consistent WU supply similar to what we had before the storage system failure. The recent scarcity of OPN1 WUs was caused by a batch that blocked the create-work process for all other projects. We have found and fixed the glitch, and the system is busy creating work for OPN1 right now. We still have an ARP1 backlog of unsent results (see ARP project update), but we now have spare capacity for a larger backlog. After the OPN1 work units are prepared, the system will prepare ARP1 work units.

On the back end, we still had to finalize setup of the new storage as there was a networking issue that was preventing us from accessing the tape archive. Data center admins have helped to fix it, and the production system on the new storage is being backed up.

We continue to investigate the errors in the BOINC system services, specifically assimilators and validators. Unfortunately, the application is written such that an unexpected error halts the service (which happened when our storage system failed). We are attempting to clear out the problematic data to allow the applications to continue processing other results, but BOINC doesn't seem to have an easy method of flushing specific workunits or results out of its system.

If you have any comments or questions, please leave them in this thread for us to answer. Thank you for your support, patience and understanding.

WCG team

 

StefanR5R

Elite Member
Dec 10, 2016
5,539
7,879
136
I received ARP1 work (it's the only WCG subproject which I have selected currently), but its large result data are uploading very slowly — considerably slower than my own slowish internet uplink would allow. It's a combination of generally low transfer rate with occasional transient HTTP errors.

It's a good thing that the client eventually stops requesting more work when there are too many uploads in progress (per project). Right now, though, this client-side stopper isn't even triggered, as the server-side ARP1 work supply appears to have dried up again already. (This subproject has always submitted work in waves, not continuously.)
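
To illustrate the idea of that client-side stopper, here is a minimal sketch. It is not BOINC's actual code: the function name `may_request_work` and the threshold of twice the CPU count are assumptions made up for this example, so check the client source for the real rule.

```python
def may_request_work(uploads_in_progress: int, ncpus: int, factor: int = 2) -> bool:
    """Return True if the client should still ask this project for work.

    Hypothetical rule (an assumption, not taken from the BOINC source):
    stop fetching once more than `factor * ncpus` uploads are pending
    for the project.
    """
    limit = factor * ncpus
    return uploads_in_progress <= limit


# A 4-core host with 3 pending uploads keeps fetching work;
# with 9 pending uploads it holds off until some uploads finish.
print(may_request_work(3, 4))
print(may_request_work(9, 4))
```

The point of a gate like this is that a host with a slow uplink does not keep piling up finished results that it cannot return before their deadlines.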
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,599
14,575
136
WCG login gives:

503 Service Unavailable

No server is available to handle this request.


And after waiting a while, I get:

System error

 

StefanR5R

Elite Member
Dec 10, 2016
5,539
7,879
136
I am idly wondering how much of the downgrade of WCG by its transition from IBM to Krembil might be due to loss of knowledge and experience, due to downsized manpower, or due to downsized equipment (servers including networking, storage etc.) — if any of it actually was downsized, that is.
 

Assimilator1

Elite Member
Nov 4, 1999
24,120
507
126
Website working for me (although it loaded slowly), but I've got no WUs.
I can't see a link for server status
 

Kiska

Golden Member
Apr 4, 2012
1,015
290
136
I am idly wondering how much of the downgrade of WCG by its transition from IBM to Krembil might be due to loss of knowledge and experience, due to downsized manpower, or due to downsized equipment (servers including networking, storage etc.) — if any of it actually was downsized, that is.

I believe that all of that was downsized...

We know that the site was moved to the Krembil compute cluster, which has seen network performance degradation... and perhaps compute degradation as well. WCG runs on IBM WebSphere, and last I knew the Krembil team had no experience with that piece of software.

IBM WebSphere gets mentioned here https://web.archive.org/web/20220513180748/https://www.worldcommunitygrid.org/news/0512 and https://web.archive.org/web/20220511184724/https://www.worldcommunitygrid.org/news/0510

From those 2 news posts, I'll speculate that both knowledge and experience were lost. And from this tweet I would say the compute environment has also been reduced. At least when it was on IBM systems, they could just spin up more if needed, I believe.

Website working for me (although it loaded slowly), but I've got no WUs.
I can't see a link for server status
They never had a server status page to begin with
 
Reactions: cellarnoise

Kiska

Golden Member
Apr 4, 2012
1,015
290
136
I am idly wondering how much of the downgrade of WCG by its transition from IBM to Krembil might be due to loss of knowledge and experience, due to downsized manpower, or due to downsized equipment (servers including networking, storage etc.) — if any of it actually was downsized, that is.

Found some more info about the reduction:

Jurisica:
Bare minimum operation is 2 technical and 1 communication staff.
And the WCG budget:


From: https://www.cs.toronto.edu/~juris/jlab/wcg.html
 

Exascaletech

Junior Member
Dec 31, 2023
19
3
16
For those who browse their website, there are not a lot of happy campers when it comes to how the program is being run. Funding will always be an issue with DC, but communication is free, and that is something they lack too. And yeah, regarding the comment that these issues almost never get fixed if they happen on a Friday: most likely there is no staffing on weekends. I get the feeling WCG is going to have to make some big changes one way or another; it can't survive like this.
 
Reactions: Markfw