The future of Sourcefinder

Sam
Volunteer moderator
Project administrator
Project developer
Project tester
Volunteer developer
Volunteer tester
Project scientist
Help desk expert

Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 688 - Posted: 19 Jul 2017, 0:14:08 UTC
Last modified: 2 Aug 2017, 5:31:09 UTC

As some of you will remember from my previous post, Sourcefinder is going to see some significant changes coming in the next few months. I thought it was time I properly outlined exactly what's happening.

The introduction of SoFiA
I'm currently working on integrating the SoFiA sourcefinding application into Sourcefinder. To integrate SoFiA cleanly, I'm making a fairly significant overhaul of the Sourcefinder backend systems so that they support multiple sourcefinding applications. My aim is to make it as easy as possible to add new sourcefinding applications to the system in the future. If anyone is interested, you can see the changes I'm making in the module_rework branch of our git repository.
I'll most probably be sending out quite a few test work units while working on integrating SoFiA, so you'll probably get odd bursts of work until it's integrated properly.
Once SoFiA is working correctly, we'll be processing all of the work units in the simulated cube again, but this time using SoFiA instead of Duchamp.
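
For anyone curious what "support for multiple sourcefinding applications" could look like in practice, here is a minimal sketch of a registry pattern. It is only an illustration: the names APPLICATIONS, register_application and make_work_unit_name, and the work unit naming scheme, are assumptions and are not taken from the module_rework branch.

```python
from typing import Callable, Dict

# Hypothetical registry: application name -> function that names a work unit.
# None of these identifiers come from the real Sourcefinder code base.
WorkUnitNamer = Callable[[str, int], str]
APPLICATIONS: Dict[str, WorkUnitNamer] = {}


def register_application(name: str, namer: WorkUnitNamer) -> None:
    """Add a sourcefinding application; refuse duplicate names."""
    if name in APPLICATIONS:
        raise ValueError(f"application already registered: {name}")
    APPLICATIONS[name] = namer


def make_work_unit_name(app: str, cube_id: str, parameter_set: int) -> str:
    """Build a work unit name using the namer registered for `app`."""
    if app not in APPLICATIONS:
        raise KeyError(f"unknown application: {app}")
    return APPLICATIONS[app](cube_id, parameter_set)


# Adding a new application is then a single registration call.
register_application("duchamp", lambda cube, p: f"duchamp_{cube}_{p}")
register_application("sofia", lambda cube, p: f"sofia_{cube}_{p}")

print(make_work_unit_name("sofia", "cube42", 3))
```

The point of the pattern is that the rest of the pipeline only talks to the registry, so a third sourcefinder would not require changes anywhere else.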

SoFiA vs Duchamp research paper
The scientists who will be using the data from this project plan on writing a research paper comparing the performance of Duchamp and SoFiA as sourcefinders. From what I've been told, the data analysis side of this project is most likely to be performed by an ICRAR studentship student either at the end of this year or the end of next year.
I plan on ensuring that as many people as possible who contributed to Sourcefinder will have their names/usernames listed in the research paper before it's published.
You'll hear more about this paper in a few months once SoFiA is integrated properly into Sourcefinder.

Real data from ASKAP
As I stated in the previous post, we should have some real data from ASKAP to process on Sourcefinder in the coming months. The moment this data becomes available to me, I'll be sending out work units for Duchamp, and later for SoFiA. I don't have a timeframe on when this data will be available aside from "soon", but I'm hoping we'll see it within a few months.

Visualisation of Sourcefinder results
I plan on developing a little web applet that will probably live on http://www.theskynet.org to allow anyone to view the sources found by Duchamp and SoFiA. My current plan for this applet is to display an image of the cube slice that the source was found in, a small highlight indicating the source, and a list of the users who contributed to finding the source.
I'll be starting work on this applet after SoFiA is integrated into Sourcefinder.

Workunits
The workunits that were lost in the back end storage issue that I spoke about in the last post have all been reprocessed (thank you!). This means there won't be a significant number of workunits for Sourcefinder for a little while. I'll try to make this period as short as possible (hopefully a month or two at most), but it really depends on how easy it is to integrate SoFiA.

Project URL change
At some point I plan on changing the project URL from https://sourcefinder.theskynet.org/duchamp to https://sourcefinder.theskynet.org/sourcefinder. The original name 'duchamp' was a carry-over from before I inherited this project. I didn't think we'd be running multiple applications, so I just left it. Obviously once SoFiA is working, the 'duchamp' part of the URL won't make much sense, so I'll be changing it to the more generic 'sourcefinder'. I'll give everyone a week's notice before I change anything, so you should have time to change over easily. I'll also ensure the old URL still works, but simply redirects to the new one.
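
As a rough illustration of the redirect behaviour described here, the sketch below maps any URL under the old prefix to the same path under the new one. The two prefixes are the ones given in this post; the function name redirect_target and the example path are assumptions, and the real redirect would most likely be done in the web server configuration rather than in Python.

```python
OLD_PREFIX = "https://sourcefinder.theskynet.org/duchamp"
NEW_PREFIX = "https://sourcefinder.theskynet.org/sourcefinder"


def redirect_target(url: str) -> str:
    """Return the new-style URL for an old-style one; leave others unchanged."""
    # Match the prefix exactly or followed by "/", so that an unrelated
    # path such as ".../duchamp2" is not rewritten by accident.
    if url == OLD_PREFIX or url.startswith(OLD_PREFIX + "/"):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


# Hypothetical path, used only to show that the rest of the URL is preserved.
print(redirect_target(OLD_PREFIX + "/some/page"))
```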

Gridcoin whitelist
There's currently a poll up for adding Sourcefinder to the gridcoin whitelist. If you're interested in voting yes or no, please check out the post Erkan made about it.

I think that's about everything I have for now. I'll try to keep everyone as updated as possible on all of these issues.
Thank you again for helping out with Sourcefinder!

Edit: Additional Information as of 2nd August, 2017
The ASKAP data is still a work in progress, and I've been given an ETA of "before the end of this year". Data measurements on ASKAP have been taken at different rotations of the Earth, and so need to be Doppler corrected before they can be stacked into a cube appropriately. This process is still being worked on, but they expect to make significant progress on finalising it in September.

SoFiA work units will have to be around 100 MB, as opposed to the 10 MB of Duchamp work units. I've been told that this is because SoFiA requires a larger cube to develop a source reliability estimate. The 10 MB cubes that Duchamp used simply aren't large enough to develop a meaningful reliability measure.
In order to not reduce the number of work units by a factor of 10, I plan on releasing the same cube multiple times with a different parameter set for each work unit.
Originally with Duchamp, each cube was released as one work unit with 176 different parameters to run on that cube.
With SoFiA, each cube will be released in multiple work units, with a smaller number of parameters per work unit.
Ultimately, this will result in a set of work units that are larger and run slightly longer than the Duchamp ones.
ID: 688 · Rating: 0
Gunde
Joined: 12 Jul 17
Posts: 1
Credit: 88
RAC: 0
Message 690 - Posted: 19 Jul 2017, 16:28:05 UTC - in response to Message 688.  

Thanks for the info Sam
ID: 690 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 693 - Posted: 23 Jul 2017, 2:19:45 UTC - in response to Message 688.  

I'm on board. Get the work going :) Cheers and welcome. It looks like you're going to make it onto the whitelist so far.
ID: 693 · Rating: 0
G_UK
Joined: 7 May 17
Posts: 3
Credit: 8,509
RAC: 0
Message 694 - Posted: 29 Jul 2017, 3:25:00 UTC - in response to Message 688.  
Last modified: 29 Jul 2017, 3:25:32 UTC

I am happy to say that it looks like the project has been whitelisted. I am sure project stats will be included shortly in a coming SuperBlock.

We stand ready to crunch when the new data becomes available. I've already re-attached the project ready to go.

Edit: typo
ID: 694 · Rating: 0
Sam
Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 695 - Posted: 31 Jul 2017, 0:04:21 UTC - in response to Message 694.  
Last modified: 31 Jul 2017, 0:09:49 UTC

Excellent, I'm really glad to hear it. When I get the chance, I'll ask around to see if the ASKAP data is ready yet.

EDIT: Also, I estimate that SoFiA should be ready for test work units within the next few weeks (maybe 3 weeks or so).
ID: 695 · Rating: 0
Mercosity
Joined: 27 May 17
Posts: 2
Credit: 0
RAC: 0
Message 696 - Posted: 1 Aug 2017, 13:49:09 UTC

Looking forward to 'crunching' this project.

Mercosity -Team Gridcoin
ID: 696 · Rating: 0
Sam
Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 697 - Posted: 2 Aug 2017, 5:30:18 UTC

I have a few pieces of additional information to add after a meeting today:

The ASKAP data is still a work in progress, and I've been given an ETA of "before the end of this year". Data measurements on ASKAP have been taken at different rotations of the Earth, and so need to be Doppler corrected before they can be stacked into a cube appropriately. This process is still being worked on, but they expect to make significant progress on finalising it in September.

SoFiA work units will have to be around 100 MB, as opposed to the 10 MB of Duchamp work units. I've been told that this is because SoFiA requires a larger cube to develop a source reliability estimate. The 10 MB cubes that Duchamp used simply aren't large enough to develop a meaningful reliability measure.
In order to not reduce the number of work units by a factor of 10, I plan on releasing the same cube multiple times with a different parameter set for each work unit.
Originally with Duchamp, each cube was released as one work unit with 176 different parameters to run on that cube.
With SoFiA, each cube will be released in multiple work units, with a smaller number of parameters per work unit.
Ultimately, this will result in a set of work units that are larger and run slightly longer than the Duchamp ones.
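
To make the splitting idea concrete, here is a small sketch that divides one cube's parameter sets into several work units. The 176 figure is the Duchamp count mentioned above and is reused purely as an example; the batch size of 16 and the function name split_parameters are assumptions, since the actual number of SoFiA parameters per work unit hasn't been decided in this post.

```python
from typing import List


def split_parameters(parameter_ids: List[int], per_work_unit: int) -> List[List[int]]:
    """Split a cube's parameter sets into consecutive batches, one per work unit."""
    if per_work_unit < 1:
        raise ValueError("per_work_unit must be at least 1")
    return [
        parameter_ids[start:start + per_work_unit]
        for start in range(0, len(parameter_ids), per_work_unit)
    ]


# 176 parameter sets (the Duchamp figure), 16 per work unit (assumed).
batches = split_parameters(list(range(176)), 16)
print(len(batches))     # number of work units generated for this one cube
print(len(batches[0]))  # parameter sets in the first work unit
```

With these example numbers one cube yields 11 work units instead of 1, which is how a tenfold drop in the number of cubes could be offset.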
ID: 697 · Rating: 0
iFoggz.canada
Joined: 17 May 17
Posts: 10
Credit: 0
RAC: 0
Message 699 - Posted: 3 Aug 2017, 2:15:59 UTC - in response to Message 697.  

Excellent, can't wait to start. My client has been attached for a while now, waiting to begin. Hopefully I'll get some units in the near future :)
ID: 699 · Rating: 0
G_UK
Joined: 7 May 17
Posts: 3
Credit: 8,509
RAC: 0
Message 700 - Posted: 3 Aug 2017, 19:47:22 UTC

Sourcefinder is now being included in the Gridcoin Superblock.

https://www.gridcoinstats.eu/project/sourcefinder
ID: 700 · Rating: 0
hsdecalc
Joined: 29 Apr 17
Posts: 2
Credit: 61,518
RAC: 104
Message 701 - Posted: 5 Aug 2017, 8:33:51 UTC

Gridcoin new poll:
Whitelist poll: Sourcefinder with zero workunits for at least three weeks. Remove from whitelist?
The poll will be closed on the 10th of August.
ID: 701 · Rating: 0
LumenDan
Joined: 9 Feb 17
Posts: 73
Credit: 125,827
RAC: 188
Message 702 - Posted: 5 Aug 2017, 10:53:17 UTC - in response to Message 697.  
Last modified: 5 Aug 2017, 10:54:47 UTC

In order to not reduce the number of work units by a factor of 10, I plan on releasing the same cube multiple times with a different parameter set for each work unit.

This sounds like a good way to go, but please make sure the cube data files remain cached on the client machines until all of the child work units have been processed, so that the 100 MB data files don't have to be downloaded multiple times on the same machine. Also try to ensure that a single machine can receive multiple work units utilising the same (sub)cube data when requesting work, thus reducing the average amount of data downloaded per work unit. You can probably use the <sticky/> tag to keep the data files cached (similarly to the parameters file), but you will need to clean up the obsolete data regularly so as not to bloat file storage on client machines.
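
A quick back-of-the-envelope sketch of the saving from caching: if one host processes several work units that share the same cube, the cube only has to be downloaded once. The 100 MB size is from Sam's post; the figure of 10 work units per cube and the function name average_download_mb are assumptions for illustration, and the sketch assumes all of those work units land on the same host.

```python
def average_download_mb(cube_size_mb: float, work_units_on_host: int, cached: bool) -> float:
    """Average cube data downloaded per work unit on one host.

    With caching the cube is fetched once and shared by every work unit on
    the host; without caching it is fetched again for each work unit.
    """
    if work_units_on_host < 1:
        raise ValueError("work_units_on_host must be at least 1")
    if cached:
        return cube_size_mb / work_units_on_host
    return cube_size_mb


print(average_download_mb(100, 10, cached=False))  # every work unit pays the full download
print(average_download_mb(100, 10, cached=True))   # the download is shared across 10 work units
```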
ID: 702 · Rating: 0
G_UK
Joined: 7 May 17
Posts: 3
Credit: 8,509
RAC: 0
Message 703 - Posted: 6 Aug 2017, 21:38:48 UTC - in response to Message 701.  

The new poll was created by a user due to concerns that a handful of people running Gridcoin still had Recent Average Credit (RAC) on the project and therefore were getting unearned rewards.

As a compromise, the project has now been temporarily removed from the whitelist to stop people from being unfairly rewarded. It will be re-added when work is available.

In my opinion the current poll has now been invalidated.
ID: 703 · Rating: 0
Sam
Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 704 - Posted: 9 Aug 2017, 0:12:27 UTC - in response to Message 702.  

Certainly, I was already thinking that a 100 MB work unit is pretty large to expect people to download more than once.
It looks like I'm able to make all cube files sticky, and use DeleteFile to explicitly inform hosts to remove cube files once all work units associated with that cube file have been completed.

Alternatively, this page on BOINC file management says "On the client, input files are deleted when no workunit refers to them, and output files are deleted when no result refers to them."
This seems to imply that the client will automatically keep files locally, then delete them when all associated work units are complete.
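
The behaviour quoted from the BOINC page amounts to reference counting: a file stays on disk while at least one work unit still refers to it. The toy model below shows that idea only. It is not the BOINC client's implementation, and the class name FileCache, its methods and the file names are all made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class FileCache:
    """Toy model: keep an input file while any work unit still refers to it."""

    def __init__(self) -> None:
        # file name -> names of the unfinished work units that use it
        self._refs: Dict[str, Set[str]] = defaultdict(set)

    def add_work_unit(self, work_unit: str, files: Iterable[str]) -> None:
        for name in files:
            self._refs[name].add(work_unit)

    def complete_work_unit(self, work_unit: str) -> List[str]:
        """Drop the work unit's references; return files that can now be deleted."""
        deleted = []
        for name in list(self._refs):
            self._refs[name].discard(work_unit)
            if not self._refs[name]:
                del self._refs[name]
                deleted.append(name)
        return sorted(deleted)

    def cached_files(self) -> List[str]:
        return sorted(self._refs)


cache = FileCache()
cache.add_work_unit("wu_1", ["cube_a.fits", "params_1.par"])
cache.add_work_unit("wu_2", ["cube_a.fits", "params_2.par"])
print(cache.complete_work_unit("wu_1"))  # only wu_1's parameter file is freed
print(cache.complete_work_unit("wu_2"))  # the shared cube is freed with the last user
```

Note that this only keeps a cube while work units for it are already on the host, so whether it helps depends on the scheduler handing one host several work units for the same cube.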
ID: 704 · Rating: 0
Yavanius
Joined: 12 Feb 17
Posts: 53
Credit: 29,911
RAC: 51
Message 705 - Posted: 12 Aug 2017, 17:57:46 UTC - in response to Message 688.  

Hi Sam,

In order to effectively compare Duchamp and SoFiA, wouldn't you need to run both sourcefinders on the new data coming in? Additionally, or alternatively, wouldn't you need to run SoFiA on the past data?

Then there would be an exact comparison, and you might even see whether one or the other excels in certain types of comparisons, and possibly integrate its strengths into the other or create a whole new sourcefinder.

Best,

~Y
ID: 705 · Rating: 0
Sam
Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 707 - Posted: 14 Aug 2017, 0:02:52 UTC - in response to Message 705.  

Yup, that's the current plan.
When the ASKAP data is available, we'll be running it through both Duchamp and SoFiA.
Once I get SoFiA up and running, we'll also be running the simulated data through it, the same way we did for Duchamp.
ID: 707 · Rating: 0
frolo
Joined: 9 Feb 17
Posts: 2
Credit: 188
RAC: 0
Message 710 - Posted: 15 Aug 2017, 20:54:58 UTC

Can you update the SSL certificate for the site?
ID: 710 · Rating: 0
Sam
Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 712 - Posted: 15 Aug 2017, 23:39:23 UTC - in response to Message 710.  

Yup, done.
ID: 712 · Rating: 0
Sam
Joined: 9 Feb 17
Posts: 167
Credit: 7,636
RAC: 1
Message 713 - Posted: 23 Aug 2017, 0:25:43 UTC

Just a short development progress update:

The newly restructured workunit generation pipeline is working for both SoFiA and Duchamp. I'm currently working on writing the SoFiA validator and restructuring the Duchamp validator. After that, the only remaining item is the assimilator.
When all of the server apps are finished, I'll be pushing out some small batches of SoFiA and Duchamp work units just to ensure the new system works fully (you'll still get credit for these workunits!).

I also said in a previous post in this thread that SoFiA would need 100 MB cubelets as opposed to the 10 MB ones we'd been using so far. I've decided that I'm going to run SoFiA on BOTH the 10 MB cubelets and the 100 MB cubelets. This not only gives us a pile more workunits to run, it'll also allow us to determine the influence of the cube size on SoFiA's sourcefinding accuracy. We may find that the 10 MB cubelets work just as well as the 100 MB ones, or we may find that the 10 MB cubelets don't work at all. Regardless, it gives us the ability to say that we've tried both and have the data to back up our findings.
ID: 713 · Rating: 0


©2017 ICRAR