Divide by zero error

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Divide by zero error

John Rigg-3
I've just upgraded from 0.101.1 to 0.102.20, and I've noticed that a
problem I reported about 10 months ago is still there. It's also worse
than I suspected; it's not specific to x86_64 like I thought, but affects
x86 too. (Original thread is here:
http://sourceforge.net/mailarchive/forum.php?thread_id=9661504&forum_id=3040 )

Basically the problem is a floating point exception that kills jackd at
seemingly random times. I tracked it down to the "post x-run handling"
section starting at line 2066 in jackd/engine.c:
               
                /* post xrun-handling */
               
                jack_nframes_t period_size_guess =
                        engine->control->current_time.frame_rate *
                           ((timer->next_wakeup - timer->current_wakeup) / 1000000.0);

                timer->frames +=
                        ((engine->driver->last_wait_ust -
                         engine->control->frame_timer.next_wakeup) /
                        period_size_guess) *
                        period_size_guess;

Unfortunately period_size_guess always evaluates to zero,
so timer->frames ends up being (somevalue/0.0)*0.0 . This usually also
evaluates to zero, so most of the time it doesn't have a visible effect,
but every once in a while it produces a floating point exception.

It is caused by the implicit cast to a float when dividing by 1000000.0.
The expression only works if the cast is to a double (on both x86_64 and x86).
This can be done very simply by changing the type declaration of
period_size_guess from jack_nframes_t (which is uint_32_t) to double,
as in the patch below.

This cures the problem on my AMD64 system, and also works on my spare
x86 system.

John
________________________________________________________________

diff -uprN j-0.102.20/jackd/engine.c j2-0.102.20/jackd/engine.c
--- j-0.102.20/jackd/engine.c 2006-08-01 04:26:36.000000000 +0100
+++ j2-0.102.20/jackd/engine.c 2006-12-31 01:03:45.000000000 +0000
@@ -2065,7 +2065,7 @@ jack_run_cycle (jack_engine_t *engine, j
 
  /* post xrun-handling */
 
- jack_nframes_t period_size_guess =
+ double period_size_guess =
  engine->control->current_time.frame_rate *
    ((timer->next_wakeup - timer->current_wakeup) / 1000000.0);
 

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Jackit-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jackit-devel
Reply | Threaded
Open this post in threaded view
|

Re: Divide by zero error

John Rigg-3
On Sun, Dec 31, 2006 at 11:09:03AM +0000, John Rigg wrote:
> This can be done very simply by changing the type declaration of
> period_size_guess from jack_nframes_t (which is uint_32_t) to double,

Oops.
s/uint_32_t/uint32_t

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Jackit-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jackit-devel
Reply | Threaded
Open this post in threaded view
|

Re: Divide by zero error

John Rigg-3
In reply to this post by John Rigg-3
On Sun, Dec 31, 2006 at 11:09:03AM +0000, John Rigg wrote:

> jack_nframes_t period_size_guess =
> engine->control->current_time.frame_rate *
>   ((timer->next_wakeup - timer->current_wakeup) / 1000000.0);
>
> timer->frames +=
> ((engine->driver->last_wait_ust -
> engine->control->frame_timer.next_wakeup) /
> period_size_guess) *
> period_size_guess;
>
> Unfortunately period_size_guess always evaluates to zero,
> so timer->frames ends up being (somevalue/0.0)*0.0 .

Correction:
timer->frames ends up having (somevalue/0.0)*0.0 added to it.

> This usually also
> evaluates to zero, so most of the time it doesn't have a visible effect,
> but every once in a while it produces a floating point exception.

John

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Jackit-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jackit-devel
Reply | Threaded
Open this post in threaded view
|

Re: Divide by zero error

John Rigg-3
In reply to this post by John Rigg-3
On Sun, Dec 31, 2006 at 11:09:03AM +0000, John Rigg wrote:

> /* post xrun-handling */
>
> jack_nframes_t period_size_guess =
> engine->control->current_time.frame_rate *
>   ((timer->next_wakeup - timer->current_wakeup) / 1000000.0);
>
> timer->frames +=
> ((engine->driver->last_wait_ust -
> engine->control->frame_timer.next_wakeup) /
> period_size_guess) *
> period_size_guess;
>
> Unfortunately period_size_guess always evaluates to zero,
> so timer->frames ends up being (somevalue/0.0)*0.0 . This usually also
> evaluates to zero, so most of the time it doesn't have a visible effect,
> but every once in a while it produces a floating point exception.
>
> It is caused by the implicit cast to a float when dividing by 1000000.0.
> The expression only works if the cast is to a double (on both x86_64 and x86).
> This can be done very simply by changing the type declaration of
> period_size_guess from jack_nframes_t (which is uint_32_t) to double

A further note. I believe this has gone unreported for so long (except
by me AFAIK) because the post xrun-handling code doesn't run very often
on the average system. I'm using multiple sound cards with ALSA pcm_multi,
which causes this code to run fairly frequently even though alsa_pcm
doesn't report any xruns and nothing is audibly wrong. On my system
the fp exception occurred frequently enough to be a real problem.

John

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Jackit-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jackit-devel
Reply | Threaded
Open this post in threaded view
|

Re: Divide by zero error

Fons Adriaensen-2
On Tue, Jan 02, 2007 at 02:55:12PM +0000, John Rigg wrote:

> On Sun, Dec 31, 2006 at 11:09:03AM +0000, John Rigg wrote:
> > /* post xrun-handling */
> >
> > jack_nframes_t period_size_guess =
> > engine->control->current_time.frame_rate *
> >   ((timer->next_wakeup - timer->current_wakeup) / 1000000.0);
> >
> > timer->frames +=
> > ((engine->driver->last_wait_ust -
> > engine->control->frame_timer.next_wakeup) /
> > period_size_guess) *
> > period_size_guess;
> >
> > Unfortunately period_size_guess always evaluates to zero,
> > so timer->frames ends up being (somevalue/0.0)*0.0 . This usually also
> > evaluates to zero, so most of the time it doesn't have a visible effect,
> > but every once in a while it produces a floating point exception.
> >
> > It is caused by the implicit cast to a float when dividing by 1000000.0.
> > The expression only works if the cast is to a double (on both x86_64 and x86).
> > This can be done very simply by changing the type declaration of
> > period_size_guess from jack_nframes_t (which is uint_32_t) to double
>
> A further note. I believe this has gone unreported for so long (except
> by me AFAIK) because the post xrun-handling code doesn't run very often
> on the average system. I'm using multiple sound cards with ALSA pcm_multi,
> which causes this code to run fairly frequently even though alsa_pcm
> doesn't report any xruns and nothing is audibly wrong. On my system
> the fp exception occurred frequently enough to be a real problem.

Two further notes / questions raised :

1. Why is it necessary to "guess" the period size after an xrun ?
   It shouldn't have changed from what is was before. So either the
   nominal value or a measured one (availble from the DLL state)
   should do.

2. The second statement increments timer->frames by a multiple of
   period_size_guess. Since we had an xrun in the first place, and
   the alsa driver will have been restarted, is there any reason to
   believe that timer->frames needs to be incremented in this way ?
   The underlying assumption is that the regular sequence of driver
   wakeups continues without a 'phase jump', but is this really the
   case ?

--
FA

Lascia la spina, cogli la rosa.


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Jackit-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jackit-devel
Reply | Threaded
Open this post in threaded view
|

Re: Divide by zero error

John Rigg-3
In reply to this post by John Rigg-3
On Tue, Jan 02, 2007 at 04:02:29PM +0100, Fons Adriaensen wrote:
> 1. Why is it necessary to "guess" the period size after an xrun ?
>    It shouldn't have changed from what is was before. So either the
>    nominal value or a measured one (availble from the DLL state)
>    should do.
>
> 2. The second statement increments timer->frames by a multiple of
>    period_size_guess. Since we had an xrun in the first place, and
>    the alsa driver will have been restarted, is there any reason to
>    believe that timer->frames needs to be incremented in this way ?

Not being a JACK dev I don't know enough to answer that, but I do
note that this code was added to CVS on 2004-12-11. Apart from the
occasional fp exception, nobody seems to have noticed any ill effects
from period_size_guess always being zero and no increment occurring.

John

>    The underlying assumption is that the regular sequence of driver
>    wakeups continues without a 'phase jump', but is this really the
>    case ?

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Jackit-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/jackit-devel