SoFiA Beta Update 1

Sam
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 9 Feb 17
Posts: 216
Credit: 7,636
RAC: 0
Message 746 - Posted: 6 Sep 2017, 3:07:54 UTC

Removed the CNHI filter from all test parameters. It was causing huge work unit runtimes, ultimately resulting in work units timing out.

Added a per-parameter timeout of 30 mins. The VM will now attempt to process each parameter set for a max of 30 mins before timing out. In my local tests, most parameters only take around 2 - 5 mins to run, so this should be more than enough time. I'll keep an eye on whether a 30 min timeout becomes an issue in the future.
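A per-parameter timeout of this kind can be sketched as follows. This is only an illustration, not the code that runs inside the VM: the function names and the idea that each parameter set is launched as a separate command are assumptions.

```python
import subprocess

TIMEOUT_SECONDS = 30 * 60  # the 30 minute per-parameter limit described above


def run_with_timeout(command, timeout=TIMEOUT_SECONDS):
    """Run one parameter set; return True if it finished, False if it timed out."""
    try:
        subprocess.run(command, timeout=timeout, check=False)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising,
        # so the next parameter set starts from a clean state.
        return False


def run_all(commands, timeout=TIMEOUT_SECONDS):
    """Run every parameter set in turn and return the indices that timed out."""
    timed_out = []
    for index, command in enumerate(commands):
        if not run_with_timeout(command, timeout):
            timed_out.append(index)
    return timed_out
```

With typical runs of 2 - 5 mins, a 30 min ceiling leaves roughly a sixfold margin before a parameter set is skipped.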

Dropped VM RAM size to 128MB from 1024MB. This should significantly reduce the memory requirements of the host system.

Updated the vboxwrappers for Apple (x86 and x64) to version 26199 from version 26196.


A new set of test work units is available now.
ID: 746 · Rating: 0
Phil
Joined: 9 Feb 17
Posts: 15
Credit: 155,143
RAC: 1,723
Message 747 - Posted: 6 Sep 2017, 7:35:24 UTC - in response to Message 746.  

Dropped VM RAM size to 128MB from 1024MB. This should significantly reduce the memory requirements of the host system.
I'm seeing very heavy disk access. Has the VM run out of memory and started furiously swapping? (A Linux top display would be useful here.)

The VM will now attempt to process each parameter set for a max of 30 mins before timing out. In my local tests, most parameters only take around 2 - 5 mins to run, so this should be more than enough time. I'll keep an eye on whether a 30 min timeout becomes an issue in the future.
My BOINC tasks are taking about 40 mins but only consume about 6 mins of CPU. Again, I'm wondering if all the spare time is just waiting for the pagefile.
ID: 747 · Rating: 0
redtiger
Joined: 9 Feb 17
Posts: 30
Credit: 160,591
RAC: 599
Message 748 - Posted: 6 Sep 2017, 9:22:17 UTC - in response to Message 747.  
Last modified: 6 Sep 2017, 9:38:13 UTC

Yes, I think the 128MB RAM limit is a little low.
My system (Linux) usually runs with no swap used and both processors at about 99% usage.
The system is now using 168MB of swap, and one processor keeps dipping below 50%, giving it an average of about 47%.

I think increasing the RAM limit to 256MB would considerably improve the performance.

Note: htop is showing VIRT at 1449M for all running VBoxHeadless tasks.
ID: 748 · Rating: 0
Dave Peachey
Joined: 9 Feb 17
Posts: 46
Credit: 767,358
RAC: 0
Message 749 - Posted: 6 Sep 2017, 13:26:44 UTC
Last modified: 6 Sep 2017, 13:30:27 UTC

Reasonably successful here - averaging around 10 minutes of CPU time against 13-15 minutes of elapsed time for each WU (15 in all before I ran out of work). This is on a Win7 machine running BOINC 7.6.33 with VBox 5.1.26.

I didn't notice significant swap file usage (but, then again, I'm running Win7 not Linux ... unless this is a reference to the internals of the virtual machine which I didn't interrogate) and I do have 8GB RAM in this machine so increasing the RAM usage shouldn't be an issue if that were to be done.

I would also comment that the SoFiA vdi file (at 1.8GB) is quite a bit larger than the one for SourceFinder (1.1GB) which, if this is indicative of future activity, means that multiple instances of SoFiA will require that much more hard disk space to run!
ID: 749 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 750 - Posted: 6 Sep 2017, 18:54:22 UTC - in response to Message 746.  

Strangely, I got the new work units to test out and they are still running after 9 hours.

Some of stderr.txt:

2017-09-06 02:09:49 (6063): Setting CPU throttle for VM. (100%)
2017-09-06 02:09:49 (6063): Setting checkpoint interval to 1000000 seconds. (Higher value of (Preference: 60 seconds) or (Vbox_job.xml: 1000000 seconds))
2017-09-06 03:43:29 (6063): Status Report: Elapsed Time: '6000.317496'
2017-09-06 03:43:29 (6063): Status Report: CPU Time: '5562.680000'
2017-09-06 05:23:31 (6063): Status Report: Elapsed Time: '12001.898253'
2017-09-06 05:23:31 (6063): Status Report: CPU Time: '11137.550000'
2017-09-06 07:03:29 (6063): Status Report: Elapsed Time: '18002.384977'
2017-09-06 07:03:29 (6063): Status Report: CPU Time: '16733.660000'
2017-09-06 08:43:29 (6063): Status Report: Elapsed Time: '24004.232098'
2017-09-06 08:43:29 (6063): Status Report: CPU Time: '22348.550000'
2017-09-06 10:23:32 (6063): Status Report: Elapsed Time: '30005.934255'
2017-09-06 10:23:32 (6063): Status Report: CPU Time: '27954.780000'
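The status report lines above can be turned into a CPU-to-elapsed ratio, which helps separate a VM that is busy computing from one that is mostly waiting on disk. A minimal sketch, assuming only the line format shown in this excerpt (the function name is made up for illustration):

```python
import re

# Matches the "Status Report" lines as they appear in the stderr.txt excerpt.
STATUS_LINE = re.compile(r"Status Report: (Elapsed|CPU) Time: '([0-9.]+)'")


def cpu_efficiency(stderr_text):
    """Return (elapsed, cpu, cpu / elapsed) using the last reported values."""
    latest = {}
    for kind, value in STATUS_LINE.findall(stderr_text):
        latest[kind] = float(value)
    elapsed, cpu = latest["Elapsed"], latest["CPU"]
    return elapsed, cpu, cpu / elapsed


sample = """\
2017-09-06 10:23:32 (6063): Status Report: Elapsed Time: '30005.934255'
2017-09-06 10:23:32 (6063): Status Report: CPU Time: '27954.780000'
"""
elapsed, cpu, ratio = cpu_efficiency(sample)
print(f"{elapsed / 3600:.1f} h elapsed, CPU busy {ratio:.0%} of the time")
```

For the last pair of lines this gives a ratio of about 93%, so this task appears to have been computing almost the whole time. By contrast, the 6 mins of CPU in a 40 min task reported earlier in the thread works out to about 15%.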


No updates in VBox.log since the start:

00:10:38.080696 VBVA: InfoScreen: [0] @0,0 800x600, line 0xc80, BPP 0, flags 0x5
00:10:38.080739 Display::handleDisplayResize: uScreenId=0 pvVRAM=00007f94e0596000 w=800 h=600 bpp=0 cbLine=0xC80 flags=0x5

vbox_checkpoint.xml:

<vbox_checkpoint>
<elapsed_time>35359.988713</elapsed_time>
<cpu_time>32818.260000</cpu_time>
<webapi_port>0</webapi_port>
<remote_desktop_port>0</remote_desktop_port>
</vbox_checkpoint>
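For reference, the checkpoint values are in seconds and can be read with a few lines of Python. This is a small sketch that uses the XML exactly as pasted above; the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

checkpoint_xml = """<vbox_checkpoint>
<elapsed_time>35359.988713</elapsed_time>
<cpu_time>32818.260000</cpu_time>
<webapi_port>0</webapi_port>
<remote_desktop_port>0</remote_desktop_port>
</vbox_checkpoint>"""


def checkpoint_hours(xml_text):
    """Return (elapsed_hours, cpu_hours) from a vbox_checkpoint document."""
    root = ET.fromstring(xml_text)
    elapsed = float(root.findtext("elapsed_time"))
    cpu = float(root.findtext("cpu_time"))
    return elapsed / 3600, cpu / 3600


elapsed_hours, cpu_hours = checkpoint_hours(checkpoint_xml)
print(f"elapsed: {elapsed_hours:.1f} h, cpu: {cpu_hours:.1f} h")
```

The elapsed figure comes out at about 9.8 hours, far beyond anything a 30 min per-parameter timeout should allow if the job inside the VM were progressing normally.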

I don't know where else to get information for you.
ID: 750 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 751 - Posted: 6 Sep 2017, 21:02:21 UTC - in response to Message 750.  

they finally kicked it:

Wed 06 Sep 2017 12:00:20 PM PDT | duchamp | Aborting task 4_askap_cube_24_15_37_1: exceeded elapsed time limit 35582.52 (100000.00G/2.81G)
Wed 06 Sep 2017 12:00:32 PM PDT | duchamp | Computation for task 4_askap_cube_24_15_37_1 finished
Wed 06 Sep 2017 12:00:32 PM PDT | duchamp | Output file 4_askap_cube_24_15_37_1_r43155850_0 for task 4_askap_cube_24_15_37_1 absent

9.8 hours in.
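The two figures in brackets on the abort line look like the work unit's floating-point operation bound and the speed the client assumes for this host, both in units of 10^9. That reading is an assumption based on the numbers in the log, but dividing one by the other does land very close to the printed limit:

```python
def elapsed_limit_seconds(fpops_bound, flops):
    """Elapsed-time ceiling implied by an operation bound and an assumed speed."""
    return fpops_bound / flops


# Values as printed in the log line, both rounded to two decimals there.
limit = elapsed_limit_seconds(100000.00e9, 2.81e9)
print(f"{limit:.0f} s, or {limit / 3600:.1f} h")
```

The small gap between this result and the logged 35582.52 s is consistent with the 2.81G figure having been rounded for display. If this interpretation is right, the task was only stopped by the client's generic safety limit at just under 10 hours, not by anything inside the VM.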
ID: 751 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 752 - Posted: 6 Sep 2017, 21:03:08 UTC - in response to Message 751.  

Note: I'm on Linux.
ID: 752 · Rating: 0
LumenDan
Joined: 9 Feb 17
Posts: 88
Credit: 164,599
RAC: 610
Message 753 - Posted: 6 Sep 2017, 23:36:22 UTC - in response to Message 750.  

Strangely, I got the new work units to test out and they are still running after 9 hours.


The first two SoFiA 1.1 units that I got ran long as well. VirtualBox showed the VMs as inaccessible and reported a file creation error.
I restarted the computer and cleared the broken VMs, and the next unit worked fine.
The affected machine is still running VirtualBox v5.1.22; my other computer with v5.1.26 ran 14 work units with no issue.
ID: 753 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 754 - Posted: 7 Sep 2017, 4:02:49 UTC - in response to Message 753.  

So you're suggesting I just remove duchamp completely and start fresh? It did download a new VM image.
ID: 754 · Rating: 0
LumenDan
Joined: 9 Feb 17
Posts: 88
Credit: 164,599
RAC: 610
Message 755 - Posted: 7 Sep 2017, 7:28:41 UTC - in response to Message 754.  
Last modified: 7 Sep 2017, 7:36:46 UTC

So you're suggesting I just remove duchamp completely and start fresh? It did download a new VM image.

I opened VirtualBox and removed the broken VMs from there. Each task creates a unique Virtual Machine and some of the early tasks had been created with errors.
Here is what I did:
-Use BOINC Manager to suspend the Source Finder project.
-Use BOINC Manager to abort the long running Source Finder work units.
-Open VirtualBox user interface.
-Delete all of the broken BOINC virtual machine instances.
-Restart the computer.
-Use BOINC Manager to resume the Source Finder project.
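The steps above can also be sketched for the command line, which may help on a Linux host without the VirtualBox GUI. `boinccmd` and `VBoxManage` are the standard BOINC and VirtualBox command-line tools; the project URL below is a placeholder, and the idea that the BOINC-created VMs can be recognised by a `boinc_` name prefix is an assumption. The script only builds the commands so they can be reviewed before anything is run.

```python
import re

# One line of `VBoxManage list vms` output looks like: "name" {uuid}
VM_LINE = re.compile(r'^"(?P<name>.*)" \{(?P<uuid>[0-9a-fA-F-]+)\}$')


def parse_vm_list(output):
    """Return (name, uuid) pairs from `VBoxManage list vms` output."""
    vms = []
    for line in output.splitlines():
        match = VM_LINE.match(line.strip())
        if match:
            vms.append((match.group("name"), match.group("uuid")))
    return vms


def cleanup_commands(project_url, task_names, vm_list_output, prefix="boinc_"):
    """Build the suspend / abort / delete / resume commands in order."""
    commands = [["boinccmd", "--project", project_url, "suspend"]]
    for task in task_names:
        commands.append(["boinccmd", "--task", project_url, task, "abort"])
    for name, uuid in parse_vm_list(vm_list_output):
        if name.startswith(prefix):  # assumed naming of BOINC-created VMs
            commands.append(["VBoxManage", "unregistervm", uuid, "--delete"])
    # The restart in the list above happens here, before resuming.
    commands.append(["boinccmd", "--project", project_url, "resume"])
    return commands


if __name__ == "__main__":
    listing = '"boinc_example" {11111111-2222-3333-4444-555555555555}\n'
    for command in cleanup_commands(
        "https://example.org/duchamp/",  # placeholder, not the real project URL
        ["4_askap_cube_24_15_37_1"],
        listing,
    ):
        print(" ".join(command))
```

VirtualBox keeps its list of registered VMs per user account, so `VBoxManage list vms` would need to be run as the account that runs the BOINC client. On Linux installs where BOINC runs as a separate service user, that could be one reason the VirtualBox window opened from a desktop login shows nothing.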
ID: 755 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 756 - Posted: 7 Sep 2017, 22:40:37 UTC - in response to Message 755.  

I'll figure out how to use VirtualBox on Linux to do that. I figured removing the project would have removed all of those, but it seems to have removed all the files from BOINC without stopping the task from running for 9 hours. I'll try now.
ID: 756 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 757 - Posted: 7 Sep 2017, 22:42:57 UTC - in response to Message 755.  

When I open VirtualBox in Linux it shows nothing at all. I aborted the tasks, reset the project, and will remove and re-add the project.
ID: 757 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 758 - Posted: 7 Sep 2017, 22:52:08 UTC - in response to Message 755.  

One more thing: on all the tasks I crunched, it was only Windows users who completed them successfully. Linux had errors.
ID: 758 · Rating: 0
Sam
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 9 Feb 17
Posts: 216
Credit: 7,636
RAC: 0
Message 759 - Posted: 8 Sep 2017, 22:58:07 UTC

It seems interesting that Windows users are having an alright time, but Linux users are seeing errors.

I'll take that recommendation and up the VM RAM size to 256MB, and hopefully we'll see fewer disk writes on Linux in the next beta update.

I'll see if I can compact the VM down to something below 1.8GB. You should expect the image to be larger than Duchamp, but certainly not 700MB larger.

Also, it will probably help to reset the project after each beta update. I know that can get a bit annoying, but I've found during my own testing that the project isn't very happy when I update the VM (I have to force a new app version for each VM change).
ID: 759 · Rating: 0
Yavanius
Volunteer moderator
Joined: 12 Feb 17
Posts: 121
Credit: 163,211
RAC: 2,542
Message 760 - Posted: 9 Sep 2017, 15:05:45 UTC - in response to Message 759.  

You might chat with the LHC folks. They have multiple projects that are running on Vbox that they (relatively) recently combined under the main LHC banner.
ID: 760 · Rating: 0

©2017 ICRAR