
BOINC

FAQ Service (English)

 

boinc_graphics_make_shmem failed: 2

Title: boinc_graphics_make_shmem failed: 2
Author: Jorden
Views: 28479
Category: 08. Project Application Errors
Available in: English
Created: 07/02/2010 00:50:41
Last Modified: 07/02/2010 00:50:41

Contents:

Despite its name, this is a science or project application error.

`boinc_graphics_make_shmem()` (called from the main science application) creates a shared memory segment of a predefined size. When this shared memory segment cannot be made, e.g. due to out-of-memory problems, the application will fail to start up and fail the task.

boinc_graphics_make_shmem failed: 2 means literally:
ENOENT No such file or directory. A component of a specified pathname did not exist, or the pathname was an empty string.
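The trailing 2 is a standard C errno value, so it can be looked up on any system. As a quick illustration (this is not part of BOINC, and `describe_errno` is a hypothetical helper name), Python's standard `errno` and `os` modules translate the number into its symbolic name and message:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for a C errno value."""
    # errno.errorcode maps numbers to symbolic names such as ENOENT.
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the platform's message text for the number.
    return f"{name}: {os.strerror(code)}"


print(describe_errno(2))
```

The same lookup works for other numeric codes that a project application reports, as long as they really are errno values.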
 

Breakpoint Encountered (0x80000003) at address 0x77F767CD

Title: Breakpoint Encountered (0x80000003) at address 0x77F767CD
Author: Jorden
Views: 34097
Category: 08. Project Application Errors
Available in: English
Created: 12/01/2009 15:40:30
Last Modified: 13/01/2009 12:38:33

Contents:

This error can happen when a task has exceeded the maximum CPU time or maximum disk space. The abort message causes the science application to call DebugBreak() which causes the 0x80000003 error code.

The idea is that if either the memory or CPU limit has been exceeded, there is either a memory leak or an infinite loop that the project needs to debug. DebugBreak() causes the Windows debugger to start up, dump a stack trace of what the application was doing, and report it back to the project.

Most of the time, however, you will come across this error message after manually aborting one or more tasks that were running at that time. Aborting running tasks also calls the DebugBreak() routine and produces the same dump of information.

Address 0x77F767CD
This doesn't mean you have a hardware error. The address 0x77F767CD is the user breakpoint set in ntdll.dll which will allow for a graceful break of your software, rather than dump you to a blue screen of death.
 

Can’t create shared memory: system shmget (Macintosh)

Title: Can’t create shared memory: system shmget (Macintosh)
Author: Odysseus
Views: 38573
Category: 08. Project Application Errors
Available in: English
Created: 16/09/2007 22:28:07
Last Modified: 16/09/2007 22:28:07

Contents:

BOINC science applications use shared memory to communicate with the core client, and a certain amount is reserved for each current task, whether running or waiting. The default configuration of a multi-CPU Mac (regardless of how much RAM is installed) is sometimes inadequate to support several projects at once.

If you get this error message, which is usually followed by a “Couldn't start or resume: -144” error in the current computation, it may help to set your BOINC general preference “Leave applications in memory while suspended?” to no; however, the problem will probably just become less frequent rather than being solved, and moreover some projects’ applications, those that save checkpoints infrequently (or erratically), may not take well to this setting.

A better solution is available—if you don’t mind reconfiguring your system’s kernel a little. The method, accompanied by a more detailed description of the issue, is outlined in Configuring Shared Memory on Mac OS X from Spy Hill Research (hosts of the BOINC Pirates@home project). It involves creating a text file “/etc/sysctl.conf” that contains the following commands:

Code:
kern.sysv.shmmax=16777216
kern.sysv.shmmin=1
kern.sysv.shmmni=128
kern.sysv.shmseg=32
kern.sysv.shmall=4096

and rebooting the computer. This will quadruple the default allocation of shared memory.
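To see what these numbers amount to, the sketch below converts them to MiB. It assumes that kern.sysv.shmmax is counted in bytes and kern.sysv.shmall in pages of 4096 bytes; the page size is an assumption, not something stated above, and `total_shared_bytes` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; shmall is counted in pages

settings = {
    "kern.sysv.shmmax": 16777216,  # largest single segment, in bytes
    "kern.sysv.shmmin": 1,         # smallest segment, in bytes
    "kern.sysv.shmmni": 128,       # system-wide number of segments
    "kern.sysv.shmseg": 32,        # segments per process
    "kern.sysv.shmall": 4096,      # total shared memory, in pages
}


def total_shared_bytes(conf, page_size=PAGE_SIZE):
    """Total shared memory allowed by shmall, converted to bytes."""
    return conf["kern.sysv.shmall"] * page_size


shmmax_mib = settings["kern.sysv.shmmax"] / 2**20
shmall_mib = total_shared_bytes(settings) / 2**20
print(f"shmmax = {shmmax_mib:.0f} MiB, shmall = {shmall_mib:.0f} MiB")
```

Under that assumption both limits come out at 16 MiB, which is consistent with a fourfold increase over a default of about 4 MiB. If you raise one value further, keep shmall times the page size at least as large as shmmax.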

Note that the problem may manifest with slightly different error messages: “Can't create shared memory: system shmat” followed by “Couldn't start or resume: -146”.
 

CreateProcess() failed - Access is denied. (0x5)

Title: CreateProcess() failed - Access is denied. (0x5)
Author: Jorden
Views: 36701
Category: 08. Project Application Errors
Available in: English
Created: 07/08/2008 09:04:24
Last Modified: 05/12/2009 20:49:43

Contents:

You start to see errors like this one:
8/6/2008 2:14:18 AM|Einstein@Home|Starting h1_0228.00_S5R4__46_S5R4a_0
8/6/2008 2:16:19 AM|Einstein@Home|[error] Process creation failed: Access is denied. (0x5)
8/6/2008 2:18:19 AM|Einstein@Home|[error] Process creation failed: Access is denied. (0x5)
8/6/2008 2:20:20 AM|Einstein@Home|[error] Process creation failed: Access is denied. (0x5)
8/6/2008 2:22:21 AM|Einstein@Home|[error] Process creation failed: Access is denied. (0x5)
8/6/2008 2:24:21 AM|Einstein@Home|[error] Process creation failed: Access is denied. (0x5)
8/6/2008 2:24:22 AM|Einstein@Home|Computation for task h1_0228.00_S5R4__46_S5R4a_0 finished
8/6/2008 2:24:22 AM|Einstein@Home|Output file h1_0228.00_S5R4__46_S5R4a_0_0 for task h1_0228.00_S5R4__46_S5R4a_0 absent

This is caused by something blocking BOINC from starting up the science application. Always check that you allowed BOINC through your firewall and exclude both the BOINC and BOINC Data directories from actively being scanned by your anti virus and anti spyware product(s).

Put BOINC (boinc.exe and boincmgr.exe) in the trusted zone of your firewall, and scan the BOINC directory or directories by hand with your anti-virus and other anti-malware software only after you have closed down or suspended BOINC.
 

Einstein - Exit code 10

Title: Einstein - Exit code 10
Author: Jorden
Views: 36889
Category: 08. Project Application Errors
Available in: English
Created: 12/08/2007 23:59:53
Last Modified: 12/08/2007 23:59:53

Contents:

Exit code 10: This means that the app could not resume from a previously written checkpoint. The output listed in stderr out of the result should give a hint why. Most of the errors we get of this type are apparently due to a broken hard disk sector or even a broken filesystem (e.g. some have the checkpoint file pointing to what looks like a portion of client_state.xml). There's one error of this type we are trying to understand better in order to do something about it: an empty checkpoint file, in which case there will be an "EOF encountered" listed at the bottom of stderr out.
 

Einstein - Exit Code 99

Title: Einstein - Exit Code 99
Author: Jorden
Views: 38042
Category: 08. Project Application Errors
Available in: English
Created: 12/08/2007 23:58:28
Last Modified: 12/08/2007 23:58:28

Contents:

I asked Bernd Machenschalk of Einstein what is causing the Exit Code 99 that is hampering the S5R2 run at this moment.

It means that the HierarchicalSearch's main() exited with -1, which is an error code. There should be a dump of the LALStatus structure in stderr. In most cases you'll find an "Input Domain Error" in the stderr output, which means that something is broken in the input data representation in memory, although nothing apparently went wrong while reading the data. We are still looking into this (actually I'm getting increasingly desperate about it).

So this could be caused by bad data or just a typo in a file. They are looking into it.

Resetting the project and forcing BOINC to download either a new datafile, or redownloading the old one should fix this.

Additional info from this thread:
Exit code 99: This means that the app terminated because an internal check failed. Again, there should be something at the end of stderr out that allows the problem to be diagnosed further. If stderr out lists "file SFTfileIO.c" at the bottom, the check that failed was a sanity check of the data read from the input files. Resetting the project and thus downloading a fresh set of data files might help. There is one type of error we are working to understand better in order to prevent it from happening again. In these cases the following lines are shown at the bottom of stderr out:
[CRITICAL]: Required frequency-bins [-8, 8] not covered by SFT-interval [...]
XLAL Error - LocalXLALComputeFaFb (LocalComputeFstat.c:536): Input domain error
 

Einstein - Exit codes -1,0,1

Title: Einstein - Exit codes -1,0,1
Author: Jorden
Views: 37029
Category: 08. Project Application Errors
Available in: English
Created: 13/08/2007 00:01:35
Last Modified: 13/08/2007 00:01:35

Contents:

Exit codes -1,0,1: These look like a program other than the BOINC Client (such as a malware scanner) terminated the App in the middle of crunching. The stderr out doesn't show anything helpful in these cases. Again, Bernd (and probably a lot of participants) would be thankful for a hint why these are happening.
 

Error 107

Title: Error 107
Author: Jorden
Views: 37786
Category: 08. Project Application Errors
Available in: English
Created: 19/12/2006 11:30:26
Last Modified: 19/12/2006 11:30:26

Contents:

Anyone getting work unit crashes with 107 error codes, typically caused by graphics problems, would do well to disable the screen saver and avoid maximizing the graphics window that's accessed through the BOINC Manager button.

There's a lot of self-help available in the READ ME posts here:

http://www.climateprediction.net/board/viewforum.php?f=36&sid=6fba3f1d34bb9c0971d14f3a0eccdee2

They were compiled for ClimatePrediction members, but if you think of a 'model' as a 'work unit', most of the advice is useful for crunchers of any project.
 

exit code -1 (0xffffffff)

Title: exit code -1 (0xffffffff)
Author: Jorden
Views: 35565
Category: 08. Project Application Errors
Available in: English
Created: 09/05/2008 23:49:52
Last Modified: 09/05/2008 23:49:52

Contents:

Exit code -1 (0xffffffff) seems to be an incompatibility between the science application and your video card. You'll run into it when you use the graphics or screen saver: tasks running at the time either is activated will error out with the above error message.

It's always good to report these errors on the project forums, especially when the project doesn't have graphics.

Mostly found on ATI video cards.
Try updating your video card drivers and DirectX version.

But the only sure workaround in case the project does have graphics is not to look at the graphics or to use the screen saver.
 

exit code -12 (0xfffffff4)

Title: exit code -12 (0xfffffff4)
Author: Joe W. Segur
Views: 9141
Category: 08. Project Application Errors
Available in: English
Created: 24/03/2012 07:20:54
Last Modified: 24/03/2012 10:26:49

Contents:

Specifically in the S@H code it means "Unsupported function". BOINC doesn't know that so just calls it "unknown".

For the CUDA GPU code for triplets it is one of two conditions:

1. More peaks above threshold in one array than the code is prepared to handle. Stock builds allow 10, optimized builds 11.

2. More triplets in one array than the code is prepared to handle. Stock CUDA builds quit when a second triplet is found, optimized builds handle two but quit if a third is found.

Both of those are based on how much memory would need to be set aside to handle more. The GPU is doing very many triplet searches simultaneously in parallel and each search needs separate space to store that kind of information. It adds up to quite a lot.
 

exit code -4 (0xfffffffc)

Title: exit code -4 (0xfffffffc)
Author: Jorden
Views: 34606
Category: 08. Project Application Errors
Available in: English
Created: 25/06/2008 15:33:21
Last Modified: 25/06/2008 15:33:50

Contents:

This error notifies you of problems with your page file. It may be accompanied by
"Project Application Name" error -4 Can't allocate memory

It happens when your page file is full or trying to grow as big as all the free space on your drive.
Make sure you have free disk space, otherwise clean out the drive the page file lives on, or move your page file to a drive that is big enough to hold it.

Information about and how to move the page file:
in Windows Vista
in Windows XP
in Windows NT and 2000
in Windows 9x

Calculating page file sizes in 64bit Windows.
 

exit code 1073807364 (0x40010004) (on Windows Vista)

Title: exit code 1073807364 (0x40010004) (on Windows Vista)
Author: Jorden
Views: 42255
Category: 08. Project Application Errors
Available in: English
Created: 22/05/2008 11:10:36
Last Modified: 22/05/2008 11:10:36

Contents:

This error has the same cause as exit code -1073741510 (0xc000013a) (on Windows Vista) in the BOINC client errors.

When you log off of Windows the application gets terminated quite abruptly. Apparently BOINC/the science applications don't like that too much. Vista can shut down in 2 seconds... that's not enough time for BOINC to stop.

So before you shut down your computer next time, exit BOINC.
- If you run as a normal user install, it's done through Boinc Manager, File->Exit.
- If you run as a service install, you need to stop the service first. Start->Run, type net stop boinc and hit enter.

Or:
Copy the following text, paste it into Notepad, and save the file as WaitToKill.reg:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control]
"WaitToKillServiceTimeout"="20000"


The 20000 is in milliseconds, i.e. 20 seconds. Increase or decrease it to your liking.
To add this to the registry, double-click the file. The UAC prompt will come up: press Continue, press Yes on the next window and OK on the one thereafter. You need to reboot for the change to take effect.

You can also use the following bit of code in a batchfile (.bat).
Put it in Notepad, save it as Shutdown.bat (make sure the extension is .bat, not .txt)
Change the path after %PROGRAMFILES% if your BOINC lives elsewhere.

Code:
cd /d "%PROGRAMFILES%\BOINC"
boinccmd --quit
shutdown -s -t 20


Updating to BOINC 5.10.45 can also fix this error. It includes special code for Vista that counts this as a non-error.
 

Exit Codes

Title: Exit Codes
Author: Jorden
Views: 34891
Category: 08. Project Application Errors
Available in: English
Created: 21/12/2006 13:44:38
Last Modified: 21/12/2006 13:44:38

Contents:

0. SUCCESS

Not an error message. It's a success message. Rejoice!

-1. CANT_CREATE_FILE



-2. READ_FAILED



-3. WRITE_FAILED



-4. MALLOC_FAILED

Memory allocation failed.

-5. FOPEN_FAILED

File Open has failed. Make sure you have read/write rights to the whole BOINC directory and sub directories, that all files are not read-only. Also make sure your BOINC directory isn't hidden.

-6. BAD_HEADER



-7. BAD_DECODE



-8. BAD_BIN_READ



-9. RESULT_OVERFLOW

In the Seti project this is a benign error. It means that more radio signals were found than there is space for in the output file. This is usually caused by extra terrestrial RFI signals. Not aliens.

-10. UNHANDLED_SIGNAL



-11. FP_ERROR



-12. ATEXIT_FAILURE
(Use WinError.h to trouble shoot.)
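Elsewhere in this FAQ the same negative codes appear in hexadecimal, for example -12 as 0xfffffff4. That is simply the 32-bit two's complement form of the number. The sketch below shows the conversion in both directions and attaches the names from the list above; the function names are hypothetical and chosen only for illustration.

```python
# Names copied from the list above; this is not an official BOINC table.
EXIT_CODES = {
    0: "SUCCESS", -1: "CANT_CREATE_FILE", -2: "READ_FAILED",
    -3: "WRITE_FAILED", -4: "MALLOC_FAILED", -5: "FOPEN_FAILED",
    -6: "BAD_HEADER", -7: "BAD_DECODE", -8: "BAD_BIN_READ",
    -9: "RESULT_OVERFLOW", -10: "UNHANDLED_SIGNAL", -11: "FP_ERROR",
    -12: "ATEXIT_FAILURE",
}


def as_unsigned_hex(code: int) -> str:
    """Show a signed 32-bit exit code the way the messages print it."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def as_signed(value: int) -> int:
    """Turn an unsigned 32-bit value back into a signed exit code."""
    return value - 2**32 if value >= 2**31 else value


def describe(code: int) -> str:
    name = EXIT_CODES.get(code, "unknown")
    return f"exit code {code} ({as_unsigned_hex(code)}): {name}"


print(describe(-12))
print(as_signed(0xC000013A))
```

The large Windows values work the same way: 0xc000013a read as a signed number is -1073741510, while 0x40010004 stays positive (1073807364) because its top bit is not set.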

 

Incorrect function. (0x1) - exit code 1 (0x1)

Title: Incorrect function. (0x1) - exit code 1 (0x1)
Author: Jorden
Views: 33338
Category: 08. Project Application Errors
Available in: English
Created: 15/04/2009 09:42:03
Last Modified: 13/12/2009 17:21:29

Contents:

I'll be adding possible causes when I see them.

1. When running into this error on CUDA, check your videocard driver. For most CUDA projects the absolute minimum driver version is 177.35, anything below it can cause this error.

2. When running into this error on CUDA and your driver is above the minimum of 177.35, but your BOINC version is below 6.6.20, this may be a stuck task in the video-memory. Do a full power-cycle, aka reboot your computer, to clear up anything stuck in memory.

3. When running into this error on CUDA and your driver is 195.62, you may want to downgrade to a previous stable version. Lots of people report problems with the 195.62 driver. Just remember that the latest isn't always the greatest.

4. It happens on occasion that people see this error when they run a screen saver as well as use the GPU for calculations. Please don't use the (BOINC) screen saver. Your GPU is already busy doing many calculations; it cannot show intricate 3D patterns on your monitor at the same time. Using a screen saver will also use lots of video memory that you cannot do without when doing CUDA calculations.

5. You may have a memory problem on the card. Use either of the below testers to check the memory and logic of the card. Note: This memtest version is for Nvidia cards only, it does not work on ATIs.

MemtestG80. Copy the cudart.dll file from C:\Windows\system32\ to the directory you installed this application in. Run it from a command-line window (Start->Run, type cmd, click OK). In the command-line window, use DOS commands to navigate to the correct directory.

Code:
e.g. when installed in C:\cudatest
cd\ {Enter}
cd\cudatest {Enter}
memtestg80 {Enter}

You need to enter the graphics card memory speed and GPU core speed. If you don't know what they are you can get and run GPU-Z to find out.

OCCT Perestroika 3.0.1, choose the CUDA MemTest tool in the \bin\CUDAMemTest\ directory.

Post the results in your project forum of choice.
 

Maximum disk space exceeded

Title: Maximum disk space exceeded
Author: Jorden
Views: 35492
Category: 08. Project Application Errors
Available in: English
Created: 19/05/2008 02:34:43
Last Modified: 19/05/2008 02:34:43

Contents:

Maximum disk space exceeded is an error you will get when the amount of disk space that the task uses exceeds the limit specified in the <rsc_disk_bound>n</rsc_disk_bound> value given to the task.

This number is set by the project. You cannot change the number without immediately crashing the task. Always report this error on the project forums.
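The check itself is a plain comparison. The sketch below illustrates it on a made-up workunit fragment: the XML, the workunit name and the `disk_bound_exceeded` helper are all hypothetical, and it assumes the bound is given in bytes. It is meant for reading the value, not for editing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; real entries contain many more fields.
workunit_xml = """
<workunit>
    <name>example_wu</name>
    <rsc_disk_bound>100000000</rsc_disk_bound>
</workunit>
"""


def disk_bound_exceeded(xml_text: str, bytes_used: float) -> bool:
    """True if the task's disk usage is above its rsc_disk_bound."""
    root = ET.fromstring(xml_text.strip())
    bound = float(root.findtext("rsc_disk_bound"))
    return bytes_used > bound


print(disk_bound_exceeded(workunit_xml, 150_000_000))
print(disk_bound_exceeded(workunit_xml, 50_000_000))
```

A task that writes large temporary or checkpoint files can cross the bound long before it finishes, which is why the project needs to hear about it.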
 

Maximum elapsed time exceeded

Title: Maximum elapsed time exceeded
Author: Jorden
Views: 36734
Category: 08. Project Application Errors
Available in: English
Created: 19/05/2008 02:32:16
Last Modified: 11/04/2009 13:22:07

Contents:

Maximum elapsed time exceeded is an error you will get when the task runs on the CPU or GPU for longer than the limit derived from the <rsc_fpops_bound>n</rsc_fpops_bound> value given to the task.

This number is set by the project. You cannot change the number without immediately crashing the task. Always report this error on the project forums.
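The bound is a number of floating-point operations, not a number of seconds. A rough way to picture the resulting time limit, assuming the client divides the bound by its speed estimate for the application (the function name and the example figures below are illustrative, not taken from BOINC):

```python
def max_elapsed_seconds(rsc_fpops_bound: float, estimated_flops: float) -> float:
    """Approximate time limit: operation bound divided by estimated speed."""
    if estimated_flops <= 0:
        raise ValueError("estimated_flops must be positive")
    return rsc_fpops_bound / estimated_flops


# Example figures only: a bound of 3.6e14 operations at 1 GFLOPS.
limit = max_elapsed_seconds(3.6e14, 1e9)
print(f"{limit:.0f} s = {limit / 3600:.0f} h")
```

Under this assumption, an overestimated speed shortens the limit, which would explain why a task that is running normally can still hit it.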
 

process exited with code 193 (0xc1)

Title: process exited with code 193 (0xc1)
Author: Jorden
Views: 39699
Category: 08. Project Application Errors
Available in: English
Created: 26/08/2007 23:08:38
Last Modified: 26/08/2007 23:08:38

Contents:

Code 193 is a segmentation violation error.

You either have problems with your memory or swap file, or the application attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).

Use a memory checking program like memtest86+ to rigorously test your memory.
Whenever you have this error, always report it on the forums of the project whose application it happens with. It may well be an error in the application's code.
 

process exited with code 22 (0x16, -234)

Title: process exited with code 22 (0x16, -234)
Author: Jorden
Views: 39715
Category: 08. Project Application Errors
Available in: English
Created: 26/09/2007 15:45:17
Last Modified: 26/09/2007 15:45:17

Contents:

When running a 64-bit Linux on a project that sends 32-bit applications only, you can run into results erroring out with process exited with code 22.

The explanation for this is that 32-bit binaries don't just work on every 64-bit Linux. If for example you install a fresh Ubuntu 6.10 or 7.04, 32-bit binaries won't work. They are not even recognized as valid executables. You first have to install the ia32 package and dependent packages. Further, for programs that link with the graphics library, you will have to manually copy a 32-bit libglut library to the /usr/lib32 directory.

If you still get client errors after this, run ldd on the executable in the projects directory to see which libraries are missing. Then post on the forums of the project that you have this problem, list the missing libraries, and ask for instructions on how to get them.
 

process exited with code 4

Title: process exited with code 4
Author: Jorden
Views: 38088
Category: 08. Project Application Errors
Available in: English
Created: 02/10/2007 00:22:41
Last Modified: 04/01/2011 08:26:36

Contents:

With thanks to Bernd Machenschalk of Einstein for this explanation.

In general there are two reasons for an exit code 4:

- a signal 4 (illegal instruction) happens when the application tries to execute commands the CPU isn't capable of (e.g. AltiVec code running on a G3 Mac)

- an error in the command-line that is passed from the Client to the application. Typically something went wrong in the communication of the Core Client, talking to either the server, the application or the file-system of the host.
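For the first case, the signal number can be mapped to its name with Python's standard `signal` module. This is only a lookup aid (`signal_name` is a hypothetical helper), and it cannot tell you which of the two causes applies to a given task.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number, or 'unknown'."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Raised when the number is not a signal on this platform.
        return "unknown"


print(4, signal_name(4))
```

If the stderr output of the task mentions an illegal instruction, the first cause is the likely one; otherwise look at the command line passed to the application.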

Always report this error on the project's forums. The developers will want to know about it and fix it.
 

Result '(result)' exited with zero status but no 'finished' file/Task (task) exited with a DLL initialization error/Unrecoverable error for result (task) (too many exit(0)s)

Title: Result '(result)' exited with zero status but no 'finished' file/Task (task) exited with a DLL initialization error/Unrecoverable error for result (task) (too many exit(0)s)
Author: Jorden
Views: 40221
Category: 08. Project Application Errors
Available in: English
Created: 19/05/2008 12:53:00
Last Modified: 06/10/2010 06:42:20

Contents:

This is a difficult one to diagnose, so the developers have added a couple of different error messages over time, in different BOINC versions. Let's first split them up and explain what the separate errors mean.

Version changes:
1. In BOINC 5.2.x to 5.8.16 you can come across the "Result exited with zero status but no 'finished' file" error.

2. In the BOINC 5.10 series, the "Task (task) exited with a DLL initialization error. If this happens repeatedly you may need to reboot your computer." error has been added.

3. In BOINC 6, "Unrecoverable error for result (task) (too many exit(0)s)" is added, so all three are possible.

So what do they mean?
1. The "Result exited with zero status but no 'finished' file" error can mean that the science application is unable to find the last checkpoint that it wrote. For some reason, the latest checkpoint wasn't written to disk, perhaps due to a corruption of the task, the disk or because the directory was locked (possibly due to an anti virus scan or anti spyware scan).

It will try to write and read the checkpoint, but gets stuck in an endless loop trying to do so. Exiting BOINC and restarting it may help. Rebooting the computer may also help. Usually, though, the task will error out after you have restarted.

2. The "Task (task) exited with a DLL initialization error. If this happens repeatedly you may need to reboot your computer." error is still a big unknown on many projects. We don't know exactly which DLL file causes this problem. A restart of BOINC or a reboot of the computer may fix it, although it also happens that the task immediately errors out when you do restart.

3. The "Unrecoverable error for result (task) (too many exit(0)s)" error is added to the first error. In the past, the "Result exited with zero status but no 'finished' file" could loop forever, which was a problem on unsupervised PCs. Now a count is started at the first sign of this error.
When the count reaches 100, the task is unceremoniously discarded with the too many exit(0)s error.
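The counting behaviour described above can be pictured with a small simulation. This is a sketch of the idea only, not the client's actual code: the `Task` class and the function name are hypothetical, and only the threshold of 100 and the two messages come from the text.

```python
MAX_ZERO_EXITS = 100  # threshold quoted in the text above


class Task:
    """Hypothetical stand-in for a task tracked by the client."""

    def __init__(self, name: str):
        self.name = name
        self.zero_exit_count = 0
        self.failed = False


def handle_exit_without_finish_file(task: Task) -> str:
    """Count one premature exit(0) and return the message to log."""
    task.zero_exit_count += 1
    if task.zero_exit_count >= MAX_ZERO_EXITS:
        task.failed = True
        return f"Unrecoverable error for result {task.name} (too many exit(0)s)"
    return f"Result '{task.name}' exited with zero status but no 'finished' file"


task = Task("example_task")
messages = [handle_exit_without_finish_file(task) for _ in range(100)]
print(messages[0])
print(messages[-1])
```

The first 99 exits only log the older message and let the task restart; the hundredth marks the task as failed, so an unattended machine no longer loops forever.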

What can I do to prevent them from happening?
Until it's known what is exactly causing them, not much. There are some tips, though:

1. Make sure you exclude the BOINC directory and all subdirectories (or the BOINC Data directory and all subdirectories in BOINC 6) from being actively scanned by anti-virus and anti-spyware software. Only scan when you have exited BOINC.

2. Don't defrag your disk with BOINC on.

3. Don't run Scandisk with BOINC on.

4. Disable Drive Indexing.

5. Update your motherboard chipset drivers, specifically those for your IDE or SATA controllers.

6. Disable the Time synchronization in Windows XP/Vista. Normally found under the clock (double click it in the system tray), third tab (Internet in English), uncheck the sync option.

7. When you use BOINC's CPU throttling function, you can run into the too many exit(0)s error. The advice here is to disable the BOINC throttling (set it to 100%) and reduce the number of CPUs/cores for BOINC to use.
** Use at most 100.0 percent of CPU time.
* In BOINC 5.10 and before, this is done through the option On multiprocessors, use at most x processors.
* In BOINC 6.1 and above, this is done through the option On multiprocessors, use at most xxx% of the processors.

8. In BOINC 6.10 and above, the "Suspend work if CPU usage is above X%" preference can trigger this error. Sometimes BOINC science applications and the screen saver are seen as non-BOINC applications and are affected by this setting. This should be fixed in BOINC 6.12.

9. Always report these errors on the forums of the project you see this problem on. It may be that the science application is out-of-date and the developers need to know about it.
 

SETI@Home Informational message -9 result overflow

Title: SETI@Home Informational message -9 result overflow
Author: Jorden
Views: 35678
Category: 08. Project Application Errors
Available in: English
Created: 19/05/2008 02:39:52
Last Modified: 19/05/2008 02:39:52

Contents:

A -9 result overflow error is a Seti specific error. It means that the science application was stopped because it found too much noise in the task. Trying to filter it all out would mean you needed a lot of disk space and memory.

These are specifically flagged with this error so the project can redo them, by resplitting these tasks. Although they error out, you will get a minute amount of credit for them.
 

Signal 8 / Error 38 / Linux

Title: Signal 8 / Error 38 / Linux
Author: Jorden
Views: 35734
Category: 08. Project Application Errors
Available in: English
Created: 06/06/2008 16:30:05
Last Modified: 14/08/2008 14:37:06

Contents:

When you're running into constant signal 8 / error 38 problems with a project and your operating system is Linux with a Kernel version of between 2.6.20 and 2.6.27, then read on.

The problem is that the Kernel is compiled with CONFIG_PREEMPT=Y while it should be =N
The CONFIG_PREEMPT option preempts any running task in memory, by permitting a low priority process to be preempted involuntarily even if it is in kernel mode executing a system call and would otherwise not be about to reach a natural preemption point. This wrecks the FPU stack, which in return gives you a signal 8 error outcome.

To check what the status of CONFIG_PREEMPT is in your Kernel, check the .config file in the /usr/src/linux directory, or similar directory, so do:
grep PREEMPT /usr/src/linux/.config

Or using /proc/config.gz do:
cat /proc/config.gz | gunzip - | grep PREEMPT

You'll get a list of PREEMPT options, the ones that your Kernel can do and which ones are set.
You'll have to make sure that CONFIG_PREEMPT isn't set to Yes, but preferably that PREEMPT_VOLUNTARY is set.
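If you prefer to inspect the output programmatically, the sketch below parses kernel config text and reports the PREEMPT options. The sample text is made up for illustration and `preempt_options` is a hypothetical helper; feed it the contents of your own .config or the decompressed /proc/config.gz.

```python
def preempt_options(config_text: str) -> dict:
    """Map each PREEMPT-related option to its value ('n' if not set)."""
    options = {}
    suffix = " is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            # Disabled options appear as '# CONFIG_X is not set'.
            name = line[2:-len(suffix)]
            if "PREEMPT" in name:
                options[name] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if "PREEMPT" in name:
                options[name] = value
    return options


# Illustrative sample, not taken from a real machine.
sample = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
"""

opts = preempt_options(sample)
risky = opts.get("CONFIG_PREEMPT") == "y"
print(opts, "affected" if risky else "not affected")
```

With the sample above the kernel uses voluntary preemption, which is the preferred setting described here.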

You can do this by recompiling your kernel. The how and what on that depends on your distro and your skill to roam around in Linux.

There are also runtime /proc/sys knobs and boot-time flags to turn voluntary preemption (CONFIG_VOLUNTARY_PREEMPT) and kernel preemption (CONFIG_PREEMPT) on/off:

# turn on/off voluntary preemption (if CONFIG_VOLUNTARY_PREEMPT)
echo 1 > /proc/sys/kernel/voluntary_preemption
echo 0 > /proc/sys/kernel/voluntary_preemption

# turn on/off the preemptible kernel feature (if CONFIG_PREEMPT)
echo 1 > /proc/sys/kernel/kernel_preemption
echo 0 > /proc/sys/kernel/kernel_preemption

The 'voluntary-preemption=0/1' and 'kernel-preemption=0/1' boot options can be used to control these flags at boot-time.

Not all distros allow for the latter use of startup flags, though.

For more information on this bug in Linux, please read this thread at Einstein for the whole discussion that led to this discovery.

PREEMPT options and what they do:

PREEMPT_NONE — No forced preemption (server)

This is the traditional Linux preemption model, geared toward maximizing throughput. It still provides good latency most of the time, but occasional longer delays are possible.

Select this option if you are building a kernel for a server or scientific/computation system, or if you want to maximize the raw processing power of the kernel, irrespective of scheduling latencies.

PREEMPT_VOLUNTARY — Voluntary kernel preemption (desktop)

This option reduces the latency of the kernel by adding more "explicit preemption points" to the kernel code. These new preemption points have been selected to reduce the maximum latency of rescheduling, which provides faster response to applications at the cost of slightly lower throughput.

This option speeds up reaction to interactive events by allowing a low-priority process to voluntarily preempt itself even if it is in kernel mode executing a system call. This allows applications to appear to run more smoothly even when the system is under load.

Select this if you are building a kernel for a desktop system.

PREEMPT — Preemptible kernel (low-latency desktop)

This option reduces the latency of the kernel by making all kernel code (except code executing in a critical section) preemptible. This allows reaction to interactive events by permitting a low priority process to be preempted involuntarily even if it is in kernel mode executing a system call and would otherwise not be about to reach a natural preemption point. This allows applications to appear to run more smoothly even when the system is under load, at the cost of slightly lower throughput and a slight runtime overhead to kernel code.

Select this if you are building a kernel for a desktop or embedded system with latency requirements in the milliseconds range.

Fix
Recently this bug was fixed in the Kernel. You will need a kernel of 2.6.25.6 or higher for this fix.
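A kernel release string can be compared against that version as a tuple of numbers. The sketch below is a rough check only: the helper names are hypothetical, and distribution kernels may carry the fix under an older version number (or lack it under a newer one), so treat the result as a hint.

```python
FIXED_IN = (2, 6, 25, 6)  # version quoted in the text above


def parse_kernel_version(release: str) -> tuple:
    """'2.6.24-19-generic' -> (2, 6, 24)"""
    numeric = release.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split(".") if part.isdigit())


def has_fix(release: str) -> bool:
    """True if the release is at or above the version with the fix."""
    return parse_kernel_version(release) >= FIXED_IN


for release in ("2.6.24-19-generic", "2.6.25.6", "2.6.27-7-generic"):
    print(release, has_fix(release))
```

On a running system you can pass in the value of `platform.release()` from Python's standard library instead of the example strings.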
 

Some CUDA questions and answers

Title: Some CUDA questions and answers
Author: Jorden
Views: 35628
Category: 08. Project Application Errors
Available in: English
Created: 07/01/2009 18:43:25
Last Modified: 14/01/2009 01:15:49

Contents:

In between reporting a whole lot of erroneous tasks from Seti to the Nvidia developer, I also asked him some other useful questions.

Is there an easy way for people to check for memory leaks when using the Seti CUDA application?
The application primarily uses video memory for data. There is no straightforward method to check for leaks, but it can be done by hand-instrumenting the source code with a global list or something similar. I’m not sure of the behavior on freeing VRAM if the application or data hangs or catches an exception. That is something we have to check into.

Weird one maybe, but what's the preferred screen resolution to use with the CUDA application?
It doesn’t really matter as long as there is enough video memory available. Even on a 256MB GPU card, a 1920x1200x32bpp resolution would consume only about 10MB of VRAM, and any extra off-screen GDI memory is likely no more than another 10MB. That still gives S@h CUDA enough to do its thing, provided no other app is running and consuming more VRAM. Just realize that on a single primary GPU, CUDA and GDI (Desktop) have to share the same device, so running Photoshop or playing a YouTube video will rob CUDA of some crunching performance and, vice versa, running SETI@home will rob performance from your desktop app. It’s best to just run with a blank screen saver (or the BOINC screen saver).

What about the colour depth? Would that make a difference? 8bit, 16bit, (24bit), 32bit?
If available VRAM is a problem on a 256MB configured GPU and it is being used as a primary display, changing to a lower bpp value (16 or 8) will help reduce GDI's usage of the VRAM heap so that the application can run.
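
The figures in the two answers above can be checked with simple arithmetic: width times height times bytes per pixel. The sketch below assumes a single, uncompressed desktop buffer. Drivers may keep additional buffers, so treat the results as a lower bound rather than an exact figure.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of one uncompressed frame at the given resolution and depth."""
    return width * height * bits_per_pixel // 8

for bpp in (32, 16, 8):
    size = framebuffer_bytes(1920, 1200, bpp)
    print(f"1920x1200 at {bpp} bpp: {size / 2**20:.1f} MiB")
```

At 32 bpp this comes to 9,216,000 bytes (about 8.8 MiB), in line with the "only 10MB" quoted above. Dropping to 16 or 8 bpp halves or quarters that.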

Can the GPU be benchmarked and if yes, which application would be preferred?
Ideally, another CUDA app or a test that uses the cuFFT library would be the best way to test GPU compute, rather than a third-party graphics app or game. There are plenty of sample apps in the publicly available CUDA SDK that may suffice as a way to benchmark any given GPU.

Does the Seti application put the GPU under continuous load, or does it do it in bursts?
The CUDA code kernels sent to the GPU put it under a heavy continuous load. The work does come in bursts, but the period between bursts is probably insignificant. We scale the work to the capabilities of the GPU, which means we try to keep it saturated with computing tasks.

Seti specific: Is it known for certain that Very Low Angle Range (VLAR) or Very High Angle Range (VHAR) tasks will always error?
The problem stems from the CUDA pulse detection code path in the GPU taking way too long to complete on some VLARs. This can cause the GPU to time out towards the OS and driver. The driver instability may be a result of those long execution times. We are investigating the problem and will try to fix it as soon as possible.

Can the GPU be throttled in another way? (BOINC uses a pause system to throttle CPU calculations, if set by the user. It then pauses all of BOINC for the duration of a second or more. I'm wondering what effect that has on the GPU's lifespan.) In other words, can the GPU be set to use only half its capacity (50% comparable CPU cycles) or not?
I don't know of any supervisory ways to assign a CUDA app to limit to a percentage of the GPU throughput. You could always throttle the CPU thread that feeds the GPU to effectively limit the GPU.

Ah, but the problem here is that it uses so little CPU already. What does it take, 3 to 4% of the CPU? And then it only uses the CPU when data is transferred from the GPU's memory to disk and from the disk to the GPU's memory. The rest is done solely by the GPU.
The CPU usage is fairly small in SETI, but that’s only because we sleep in the driver waiting for the GPU to complete its current task at hand. Programmatically pausing the execution of the CPU thread that feeds the CUDA kernel functions will effectively reduce the GPU usage rate, because you’re starving the GPU of data to crunch. The downside is that it will slow down the app.

But: The values set by the drivers in combination with the VBIOS should already monitor temperatures and regulate the fan and GPU clocks accordingly. This may not work on a deliberately overclocked GPU.
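
The throttling idea described above (pausing the CPU thread that feeds the GPU) amounts to a simple duty cycle: after each piece of work, idle long enough that the busy share of wall-clock time matches the target. The sketch below is an assumption about how one might do this, not how BOINC or the Seti application does it. The `submit` callback is hypothetical and stands for "launch the GPU work and wait for it to finish".

```python
import time

def pause_for(busy_seconds, duty_fraction):
    """Sleep time that keeps the busy share of wall time at duty_fraction."""
    if not 0 < duty_fraction <= 1:
        raise ValueError("duty_fraction must be in (0, 1]")
    return busy_seconds * (1 - duty_fraction) / duty_fraction

def run_throttled(work_items, submit, duty_fraction):
    """Submit each item, then idle long enough to hit the target duty cycle."""
    for item in work_items:
        start = time.monotonic()
        submit(item)  # hypothetical: launch the work and block until it is done
        busy = time.monotonic() - start
        time.sleep(pause_for(busy, duty_fraction))

# At a 50% duty cycle, 2 seconds of work is followed by a 2 second pause.
print(pause_for(2.0, 0.5))
```

Because the pause scales with the time each work item actually took, the GPU sits idle for the chosen share of the time regardless of how fast the card is. As noted above, the price is that tasks take correspondingly longer.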
 

Unrecoverable error for result "result" (- exit code -1073741819 (0xc0000005))

FAQ: Unrecoverable error for result "result" (- exit code -1073741819 (0xc0000005))
Title: Unrecoverable error for result "result" (- exit code -1073741819 (0xc0000005))
Author: Jorden
Views: 43152
Category: 08. Project Application Errors
Available in: German English
Created: 14/06/2007 21:40:42
Last Modified: 14/08/2008 11:48:21

Contents:

This is a very difficult error to trace correctly. It could be any of the causes listed below, or none of them. Please be patient in tracing this error. Do report it on the project's forums and let the people there guide you along as well.

1. Your system is over-clocked to the max. Clock back to a more normal level for troubleshooting.

2. You use corrupt system drivers. Update your motherboard chipset drivers.

3. You use outdated/corrupt graphics drivers. Update the drivers for your graphics card or do not use the screen saver/graphics in BOINC.

4. Your DirectX version is outdated. Update DirectX. See Self Help: Links to Drivers for the link.

5. The image link(s) in BOINC are broken. Uninstall BOINC and reinstall it/upgrade it to the latest available version.

6. The image link(s) in the project's application are broken. Reset the project from within BOINC.

7. Since the error is based on the Windows 0xc0000005 error (Access Violation), it could also be:
- faulty RAM. --> test with memtest86+
- an incorrect/corrupt device driver. --> check for updates
- poorly written/updated software.
- malware/adware installations.

Check your system with Prime95. If it can survive this stress test, the problem is probably not hardware related. Don't run BOINC and Prime95 at the same time.
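
The two numbers in the error message are the same value written in two ways: -1073741819 is the exit code read as a signed 32-bit integer, and 0xc0000005 is the same 32 bits read as unsigned and shown in hexadecimal. A small sketch of the conversion (the function names are made up for illustration):

```python
def to_unsigned32(code):
    """Reinterpret a (possibly negative) exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF

def to_signed32(code):
    """Reinterpret a 32-bit value as a signed integer (two's complement)."""
    code &= 0xFFFFFFFF
    return code - 0x100000000 if code & 0x80000000 else code

print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
print(to_signed32(0xC0000005))          # -1073741819
```

This helps when searching for an error: logs may show only the decimal form, while Windows documentation usually lists the hexadecimal status code.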
 

Unrecoverable error for result "result" (There are no child processes to wait for. (0x80) - exit code 128 (0x80))

FAQ: Unrecoverable error for result "result" (There are no child processes to wait for. (0x80) - exit code 128 (0x80))
Title: Unrecoverable error for result "result" (There are no child processes to wait for. (0x80) - exit code 128 (0x80))
Author: Jorden
Views: 45293
Category: 08. Project Application Errors
Available in: German English
Created: 21/12/2006 13:07:49
Last Modified: 10/11/2009 13:00:30

Contents:

If you encounter an error called Unrecoverable error for result "result" (There are no child processes to wait for. (0x80) - exit code 128 (0x80)), your DirectX version is out of date.

Please update your DirectX version through Windows Update, or get it from here. (August 2009 redistributable edition, multi-language, 103.3MB)

One other possible cause is that the programmer has used .NET technology to make the application. In that case you need to install .NET on your computer. It is only available for Windows, and then only for Windows 2000 and above.

So if you encounter this error after you updated your DirectX to the latest available version, post about it on that project's forums and ask if they used .NET to make their application.

Quote: skildude
With ATI GPU usage, the drivers needed to run Milkyway and Collatz are converted from an amdxxxxx.dll to ATIxxxxx.dll. These changes occur with the newer and latest driver updates. A copy and rename in the system32 folder is all that it takes to fix this problem.
 

Copyright © Neil Munday 2008