SoFiA Production Update 7 { Cancelled - See Forum for Details }

Message boards : News : SoFiA Production Update 7 { Cancelled - See Forum for Details }

Sam
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert
Joined: 9 Feb 17
Posts: 239
Credit: 7,636
RAC: 0
Message 1152 - Posted: 24 Jan 2018, 5:09:09 UTC

So the SoFiA devs managed to work out a fix for those extremely large memory requirements I mentioned last week, and I've been able to drop the VM memory requirement down to 2GB per VM. I know that's still a lot more than before, but it doesn't look like I'll be able to go any lower without a huge performance hit (due to disk swapping).

The VM has been updated with this new version of SoFiA, and I've spent some extra time compacting it as far as possible. I managed to get the VM size below 500MB compressed!

The workunits from last week didn't release properly due to an issue with the workunit generator that I couldn't get around to fixing until now, but the test WUs should be flowing as I write. I'm interested to see how the system handles them.
ID: 1152
Kevin K
Joined: 15 Dec 17
Posts: 1
Credit: 123
RAC: 0
Message 1153 - Posted: 24 Jan 2018, 7:19:17 UTC - in response to Message 1152.  

I may be having a problem with these new workunits. So far, tasks are not finishing, unless there's just a bug in the percentage reading.

The old SoFiA tasks finished in 30 minutes for me. These new ones reach 50% in about 3 minutes, then slowly decelerate until they hit 100.000% after 45 minutes or so, and then just sit there for ages. I have not had a task finish yet; all have been stuck at 100% for a couple of hours now.
ID: 1153
gary19821119
Joined: 14 Dec 17
Posts: 1
Credit: 5,724
RAC: 13
Message 1154 - Posted: 24 Jan 2018, 7:39:00 UTC

same here
ID: 1154
Michael H.W. Weber
Joined: 5 Jun 17
Posts: 5
Credit: 133,831
RAC: 24
Message 1155 - Posted: 24 Jan 2018, 8:46:11 UTC

Same here.

Michael.
Promote, cooperate and construct instead of demand, compete and consume.


President of Rechenkraft.net.
ID: 1155
Gap Filler
Joined: 3 Nov 17
Posts: 5
Credit: 91,710
RAC: 751
Message 1159 - Posted: 25 Jan 2018, 2:39:34 UTC - in response to Message 1155.  

To add to the noise, me too.

I aborted 3 jobs after 14,8500 and 8,600/8,600 seconds (WUs 27158, 271600 and 271601).
3 more have failed with "Error while computing" after 23,350 seconds (WUs 27601, 271604 and 271605)
WUs 271599 (ready to start) and 271602 (running) are still "In progress", but I expect "Eic"s in the next 24 hours.

All jobs start with an estimated time of 3:53, but reach 0 secs remaining after about 30 minutes, then show "---" remaining for the next several hours before ending in "Error in computing".

I started with Vbox 5.1.26 but upgraded to 5.2.6 to see if it helped. It didn't.
BOINC 7.8.3 (x64)

Hope this helps.
djc
ID: 1159
G_UK
Joined: 7 May 17
Posts: 22
Credit: 74,799
RAC: 14
Message 1160 - Posted: 25 Jan 2018, 3:42:43 UTC

I gave this batch a good whack, but I get the same as everyone else.

My VBoxSVC.log file has a lot of these errors in it.

ERROR [COM]: aRC=E_ACCESSDENIED (0x80070005) aIID={4afe423b-43e0-e9d0-82e8-ceb307940dda} aComponent={MediumWrap} aText={The object is not ready}, preserve=false aResultDetail=0

Gridcoin: Rx5iQUC9fdZkYuxrjW6ySV6Jfttsw5Ub2L
Bitshares: g-uk https://wallet.bitshares.org
Ethereum: 0x418025088C2638c6816561fC2e026AE78daa2c8c
Storj: 0x734E41c433DE29383957A80dc57B8D025dd326b5
ID: 1160
Gap Filler
Joined: 3 Nov 17
Posts: 5
Credit: 91,710
RAC: 751
Message 1161 - Posted: 25 Jan 2018, 7:47:07 UTC - in response to Message 1159.  

From the BOINC log for WU 271602, task 668302:
25/01/2018 5:04:22 PM | duchamp | Aborting task sofia_20_askap_cube_1_10_39_0: exceeded elapsed time limit 23321.30 (100000.00G/4.29G)
25/01/2018 5:04:42 PM | duchamp | Computation for task sofia_20_askap_cube_1_10_39_0 finished
25/01/2018 5:04:42 PM | duchamp | Output file sofia_20_askap_cube_1_10_39_0_r1949228642_0 for task sofia_20_askap_cube_1_10_39_0 absent
25/01/2018 5:04:42 PM | duchamp | Starting task sofia_20_askap_cube_1_10_36_0
25/01/2018 5:06:07 PM | duchamp | Sending scheduler request: To report completed tasks.
25/01/2018 5:06:07 PM | duchamp | Reporting 1 completed tasks
25/01/2018 5:06:07 PM | duchamp | Not requesting tasks: "no new tasks" requested via Manager
25/01/2018 5:06:15 PM | duchamp | Scheduler request completed

Another few hours down the drain :(
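The abort line in this log shows how the limit was reached: the client appears to divide the task's floating-point bound (the `100000.00G` term) by its estimate of the host's speed (`4.29G`), which gives about 23,300 seconds, or roughly 6.5 hours. That is close to the ~23,350 seconds after which the earlier "Error while computing" failures were reported, suggesting those tasks hit the same limit rather than crashing. A minimal sketch of the arithmetic, assuming that reading of the log format; `elapsed_time_limit` and the constant names are hypothetical:

```python
def elapsed_time_limit(fpops_bound: float, host_flops: float) -> float:
    """Seconds a task may run before being aborted: FLOP bound / host speed."""
    if host_flops <= 0:
        raise ValueError("host_flops must be positive")
    return fpops_bound / host_flops


# Values as printed in the log line ("G" = 1e9); names are illustrative only.
FPOPS_BOUND = 100_000.00e9  # the "100000.00G" numerator
LOGGED_LIMIT = 23_321.30    # seconds, as reported by the client

# The log rounds the host speed to "4.29G"; recover the unrounded figure.
implied_flops = FPOPS_BOUND / LOGGED_LIMIT
limit = elapsed_time_limit(FPOPS_BOUND, implied_flops)

print(f"host speed ~ {implied_flops / 1e9:.2f} GFLOPS")
print(f"limit = {limit:.2f} s ({limit / 3600:.2f} h)")
# Plugging in the rounded 4.29G directly gives a slightly smaller limit.
print(f"with rounded speed: {elapsed_time_limit(FPOPS_BOUND, 4.29e9):.2f} s")
```

Since the numerator is the same for every host, a task that never finishes is always cut off eventually; only the speed estimate in the denominator changes how long each machine waits before giving up.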
ID: 1161
TIQA
Joined: 21 Jan 18
Posts: 1
Credit: 0
RAC: 0
Message 1162 - Posted: 25 Jan 2018, 11:23:18 UTC

I aborted after 8 hours of running.
ID: 1162
Michael H.W. Weber
Joined: 5 Jun 17
Posts: 5
Credit: 133,831
RAC: 24
Message 1163 - Posted: 25 Jan 2018, 13:42:27 UTC

I aborted all tasks and won't resume contributing to this project unless the project admin resolves and explains the issues.
I had also previously criticised the unacceptable I/O load on the hard drives when running multiple tasks from this project on the same machine. No reply there, either.

Michael.
Promote, cooperate and construct instead of demand, compete and consume.


President of Rechenkraft.net.
ID: 1163
Gap Filler
Joined: 3 Nov 17
Posts: 5
Credit: 91,710
RAC: 751
Message 1164 - Posted: 26 Jan 2018, 2:51:14 UTC - in response to Message 1163.  

Re I/O usage with multiple tasks running - this may be a paging issue.
How much physical memory do you have in your system?
Does the I/O reduce when only running one task?
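The paging question comes down to simple arithmetic: with each VM now needing 2GB, running more tasks at once than the machine's RAM can hold forces the host to swap. A minimal sketch, treating 2GB as 2 GiB and assuming a hypothetical 2 GiB reserved for the host OS, with VM memory as the only other significant consumer; the function names are illustrative only:

```python
GIB = 1024 ** 3


def max_concurrent_vms(physical_ram: int, per_vm: int = 2 * GIB,
                       reserve: int = 2 * GIB) -> int:
    """VMs that fit in RAM after setting aside `reserve` bytes for the host OS."""
    usable = physical_ram - reserve
    return max(usable // per_vm, 0)


def likely_to_page(tasks: int, physical_ram: int, per_vm: int = 2 * GIB,
                   reserve: int = 2 * GIB) -> bool:
    """True if running `tasks` VMs at once would exceed the RAM budget."""
    return tasks > max_concurrent_vms(physical_ram, per_vm, reserve)


# Example: an 8 GB machine.
ram = 8 * GIB
print(max_concurrent_vms(ram))
print(likely_to_page(4, ram))
```

Under these assumptions an 8 GB host fits three VMs; a fourth concurrent task would push it into swapping, which would show up as the kind of heavy disk I/O described above.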
ID: 1164
siliconpop
Joined: 26 Jan 18
Posts: 1
Credit: 0
RAC: 0
Message 1165 - Posted: 26 Jan 2018, 4:40:41 UTC - in response to Message 1164.  

I am now also unable to process any tasks. I get very, very long run times, and even when a task reaches 100% it does not upload.

To give you some background, I have 10 computers running BOINC, from Raspberry Pis to a Ryzen 7 1700X, but Sourcefinder will only run properly on an old Dell 3770 (Windows 7, 8 GB RAM) that I picked up used. I'm not sure why, but only this machine worked really well with Sourcefinder, and it did so right up until this last batch (it was in the top 10 over Xmas).

But even the Dell will not run this new batch of applications.

As an aside I also think greatly increasing the file size and/or greatly increasing the memory usage will further limit the number of computers that will be able to help you out.

Regards

Kirk
ID: 1165
tullio
Joined: 9 Jan 18
Posts: 5
Credit: 115
RAC: 0
Message 1166 - Posted: 26 Jan 2018, 6:48:28 UTC

I aborted two tasks on my Linux box that were still sitting at 100% after 11 hours. I now have one running, but I've set NNT so as not to download any more tasks.
Tullio
ID: 1166
adrianxw
Joined: 4 May 17
Posts: 55
Credit: 45,194
RAC: 0
Message 1167 - Posted: 26 Jan 2018, 8:12:28 UTC
Last modified: 26 Jan 2018, 8:24:15 UTC

I'm curious about their testing procedures; my report is in another thread. How can such a serious fault across architectures and operating systems not be seen and corrected before release? After the last problems I'd set NNT, but allowed the all-new shiny version to download a single workunit and saw exactly this issue: it ran to completion in 45 minutes and then just went on and on and ..... until aborted. It is the entire BOINC system they are risking with this kind of thing: inexperienced users connect, this happens, and they figure BOINC is not for them.

I'd first reported this issue last July.
ID: 1167
LumenDan
Joined: 9 Feb 17
Posts: 101
Credit: 182,042
RAC: 518
Message 1173 - Posted: 27 Jan 2018, 0:15:46 UTC - in response to Message 1167.  

Is the new version of SoFiA chirping the data?
I remember that bypassing the chirping step is what resolved the time limit exceeded issues we had with the initial release.
ID: 1173
TorqueMaster
Joined: 13 Jul 17
Posts: 1
Credit: 151
RAC: 1
Message 1177 - Posted: 29 Jan 2018, 5:11:38 UTC

I was thrilled to get some work on this project, but the two of my machines that are getting work are crashing/hanging. Didn't put it together until the 2nd one started doing it. I'll check back someday to see if the bugs are worked out, until then, adios!
ID: 1177
Yavanius
Volunteer moderator
Joined: 12 Feb 17
Posts: 146
Credit: 169,205
RAC: 98
Message 1190 - Posted: 31 Jan 2018, 2:00:55 UTC
Last modified: 31 Jan 2018, 2:10:40 UTC

Cross-post from Sam in Number Crunching:


Message 1186 - Posted: 31 Jan 2018, 1:14:24 UTC

This latest set of WUs seems to have been a bit of a disaster.
The requirement to up the VM memory size to 2GB, along with the attempt to run all parameters in each workunit instead of splitting them, has just caused the whole system to fall over. Honestly, this is my fault and I apologise.

I'm cancelling the current set of workunits because I doubt any of them are going to run to completion, then I'm going to create a sofia_beta application that people can opt into to test the system with these new 100MB workunits.

Unfortunately I can't do much about the new memory requirement. One of the SoFiA developers was nice enough to provide a fix to reduce the application's memory requirements, but it's still not nearly as low as I'd like.

I'm beginning to suspect that getting workunits this big to play nicely on BOINC is going to be a lot more work.


---/---
If you want to comment more on the issues, please do so in the Number Crunching forum under Endless WUs; https://sourcefinder.theskynet.org/duchamp/forum_thread.php?id=244

Thank you!

~Yav
ID: 1190

©2018 ICRAR