[Jack-Devel] Netjack crashes, alignment broken recently

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[Jack-Devel] Netjack crashes, alignment broken recently

Xavier Mendez
Just compiled the latest jackd2, and when using Netjack both manager and
driver crash after sending parameters. I traced back to this change from
02f74a659, which modifies the network parameter struct:

--- a/common/JackNetTool.h
+++ b/common/JackNetTool.h
@@ -94,9 +94,9 @@ namespace Jack
-        char fName[JACK_CLIENT_NAME_SIZE];
-        char fMasterNetName[JACK_SERVER_NAME_SIZE];
-        char fSlaveNetName[JACK_SERVER_NAME_SIZE];
+        char fName[JACK_CLIENT_NAME_SIZE+1];
+        char fMasterNetName[JACK_SERVER_NAME_SIZE+1];
+        char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];

That extra byte for the NUL terminator breaks the alignment, and somehow
causes some ints to be sent as little-endian:

     Sample rate : 12288000 frames per second
     Period size : 131072 frames per period

Reverting the three lines seems to resolve the problem. I don't really
see the use for having the NUL terminator go through the network...

What do you think?
Xavi
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Filipe Coelho
On 10.03.2016 18:41, Xavier Mendez wrote:

> Just compiled the latest jackd2, and when using Netjack both manager
> and driver crash after sending parameters. I traced back to this
> change from 02f74a659, which modifies the network parameter struct:
>
> --- a/common/JackNetTool.h
> +++ b/common/JackNetTool.h
> @@ -94,9 +94,9 @@ namespace Jack
> -        char fName[JACK_CLIENT_NAME_SIZE];
> -        char fMasterNetName[JACK_SERVER_NAME_SIZE];
> -        char fSlaveNetName[JACK_SERVER_NAME_SIZE];
> +        char fName[JACK_CLIENT_NAME_SIZE+1];
> +        char fMasterNetName[JACK_SERVER_NAME_SIZE+1];
> +        char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];
>
> That extra byte for the NUL terminator breaks the alignment, and
> somehow causes some ints to be sent as little-endian:
>
>     Sample rate : 12288000 frames per second
>     Period size : 131072 frames per period
>
> Reverting the three lines seems to resolve the problem. I don't really
> see the use for having the NUL terminator go through the network...
>
> What do you think?

I can confirm this is an issue for me.
The "-d net" option stopped working some time ago and I've been
wondering why, that change seems to be the reason.

Reverting it in both client&server makes netmanager work again.


Please do a pull request against jack2's github repo.
That would be the fastest way to get this fixed upstream.

_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Stéphane Letz

Le 11 mars 2016 à 12:05, Filipe Coelho <[hidden email]> a écrit :

> On 10.03.2016 18:41, Xavier Mendez wrote:
>> Just compiled the latest jackd2, and when using Netjack both manager and driver crash after sending parameters. I traced back to this change from 02f74a659, which modifies the network parameter struct:
>>
>> --- a/common/JackNetTool.h
>> +++ b/common/JackNetTool.h
>> @@ -94,9 +94,9 @@ namespace Jack
>> -        char fName[JACK_CLIENT_NAME_SIZE];
>> -        char fMasterNetName[JACK_SERVER_NAME_SIZE];
>> -        char fSlaveNetName[JACK_SERVER_NAME_SIZE];
>> +        char fName[JACK_CLIENT_NAME_SIZE+1];
>> +        char fMasterNetName[JACK_SERVER_NAME_SIZE+1];
>> +        char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];
>>
>> That extra byte for the NUL terminator breaks the alignment, and somehow causes some ints to be sent as little-endian:
>>
>>    Sample rate : 12288000 frames per second
>>    Period size : 131072 frames per period
>>
>> Reverting the three lines seems to resolve the problem. I don't really see the use for having the NUL terminator go through the network...
>>
>> What do you think?
>
> I can confirm this is an issue for me.
> The "-d net" option stopped working some time ago and I've been wondering why, that change seems to be the reason.
>
> Reverting it in both client&server makes netmanager work again.
>
>
> Please do a pull request against jack2's github repo.
> That would be the fastest way to get this fixed upstream.
>

It would be better to understand why changing the  data structure (which is used both side of the connection… ) caused the crash. Are you sure the code is correctly recompiled everywhere?

Stéphane

_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Xavier Mendez
In reply to this post by Filipe Coelho
El 11/03/16 a les 12:05, Filipe Coelho ha escrit:

> On 10.03.2016 18:41, Xavier Mendez wrote:
>> Just compiled the latest jackd2, and when using Netjack both manager
>> and driver crash after sending parameters. I traced back to this
>> change from 02f74a659, which modifies the network parameter struct:
>>
>> --- a/common/JackNetTool.h
>> +++ b/common/JackNetTool.h
>> @@ -94,9 +94,9 @@ namespace Jack
>> -        char fName[JACK_CLIENT_NAME_SIZE];
>> -        char fMasterNetName[JACK_SERVER_NAME_SIZE];
>> -        char fSlaveNetName[JACK_SERVER_NAME_SIZE];
>> +        char fName[JACK_CLIENT_NAME_SIZE+1];
>> +        char fMasterNetName[JACK_SERVER_NAME_SIZE+1];
>> +        char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];
>>
>> That extra byte for the NUL terminator breaks the alignment, and
>> somehow causes some ints to be sent as little-endian:
>>
>>     Sample rate : 12288000 frames per second
>>     Period size : 131072 frames per period
>>
>> Reverting the three lines seems to resolve the problem. I don't really
>> see the use for having the NUL terminator go through the network...
>>
>> What do you think?
>
> I can confirm this is an issue for me.
> The "-d net" option stopped working some time ago and I've been
> wondering why, that change seems to be the reason.
>
> Reverting it in both client&server makes netmanager work again.
>
>
> Please do a pull request against jack2's github repo.
> That would be the fastest way to get this fixed upstream.

Done. https://github.com/jackaudio/jack2/pull/195
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Xavier Mendez
In reply to this post by Stéphane Letz
El 11/03/16 a les 12:17, Stéphane Letz ha escrit:

>
> Le 11 mars 2016 à 12:05, Filipe Coelho <[hidden email]> a écrit :
>
>> On 10.03.2016 18:41, Xavier Mendez wrote:
>>> Just compiled the latest jackd2, and when using Netjack both manager and driver crash after sending parameters. I traced back to this change from 02f74a659, which modifies the network parameter struct:
>>>
>>> --- a/common/JackNetTool.h
>>> +++ b/common/JackNetTool.h
>>> @@ -94,9 +94,9 @@ namespace Jack
>>> -        char fName[JACK_CLIENT_NAME_SIZE];
>>> -        char fMasterNetName[JACK_SERVER_NAME_SIZE];
>>> -        char fSlaveNetName[JACK_SERVER_NAME_SIZE];
>>> +        char fName[JACK_CLIENT_NAME_SIZE+1];
>>> +        char fMasterNetName[JACK_SERVER_NAME_SIZE+1];
>>> +        char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];
>>>
>>> That extra byte for the NUL terminator breaks the alignment, and somehow causes some ints to be sent as little-endian:
>>>
>>>     Sample rate : 12288000 frames per second
>>>     Period size : 131072 frames per period
>>>
>>> Reverting the three lines seems to resolve the problem. I don't really see the use for having the NUL terminator go through the network...
>>>
>>> What do you think?
>>
>> I can confirm this is an issue for me.
>> The "-d net" option stopped working some time ago and I've been wondering why, that change seems to be the reason.
>>
>> Reverting it in both client&server makes netmanager work again.
>>
>>
>> Please do a pull request against jack2's github repo.
>> That would be the fastest way to get this fixed upstream.
>>
>
> It would be better to understand why changing the  data structure (which is used both side of the connection… ) caused the crash. Are you sure the code is correctly recompiled everywhere?

Yes, I cloned and compiled 7bdad4966b29 on both ends, and even made sure
to configure with the same options. The driver end is a Raspberry Pi
with Raspbian, the manager end is my laptop (x64).

This is the discovery packet I get from the Pi, running "jackd -dnet -n
raspberrypi -C1 -P1 -i1 -o1":

00000000  70 61 72 61 6d 73 00 00  00 00 00 08 00 00 00 00
|params..........|
00000010  72 61 73 70 62 65 72 72  79 70 69 00 00 00 00 00
|raspberrypi.....|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
*
00000150  00 00 73 61 6c 61 00 00  00 00 00 00 00 00 00 00
|..sala..........|
00000160  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
*
00000250  00 00 00 00 00 00 05 dc  00 00 00 00 00 00 00 00
|................|
00000260  00 00 00 01 00 00 00 01  00 00 00 01 00 00 00 01
|................|
00000270  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
|................|
00000280  00 00 00 00 00 00 00 05                           |........|
00000288

As you can see, we have:

fMtu: 00 00 00 05 (correct)
fID: DC 00 00 00
fTransportSync: 00 00 00 00

And then, instead of the expected 00 00 00 01 for fSendAudioChannels, we
have 00 00 00 00 01, which suggests the compiler has added a byte of
padding to align the int? I don't know.

> Stéphane
>
> _______________________________________________
> Jack-Devel mailing list
> [hidden email]
> http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
>
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Adrian Knoth
On 03/11/16 12:50, Xavier Mendez wrote:

>> It would be better to understand why changing the  data structure
>> (which is used both side of the connection… ) caused the crash. Are
> And then, instead of the expected 00 00 00 01 for fSendAudioChannels,
> we have 00 00 00 00 01, which suggests the compiler has added a byte
> of padding to align the int? I don't know.

I didn't closely look into this, but I guess this is what's happening:

https://github.com/jackaudio/jack2/blob/8b8be738805ae1c5fe3688a49fd696a9f080b59e/common/jack/systemdeps.h#L123

has

#if defined(__arm__) || defined(__ppc__) || defined(__powerpc__)
     #undef POST_PACKED_STRUCTURE
     #define POST_PACKED_STRUCTURE
#endif /* __arm__ || __ppc__ || __powerpc__ */

which disables

    #define POST_PACKED_STRUCTURE __attribute__((__packed__))

on the Raspberry (since it's ARM), but not on the amd64.

We then have

     struct _session_params
     {
         char fPacketType[8];                        //packet type ('param')
         uint32_t fProtocolVersion;                  //version
         int32_t fPacketID;                          //indicates the
packet type
         char fName[JACK_CLIENT_NAME_SIZE+1];          //slave's name
         char fMasterNetName[JACK_SERVER_NAME_SIZE+1]; //master hostname
(network)
         char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];  //slave hostname
(network)
         uint32_t fMtu;                              //connection mtu
         uint32_t fID;                               //slave's ID
         uint32_t fTransportSync;                    //is the transport
synced ?
         int32_t fSendAudioChannels;                 //number of
master->slave channels
         int32_t fReturnAudioChannels;               //number of
slave->master channels
         int32_t fSendMidiChannels;                  //number of
master->slave midi channels
         int32_t fReturnMidiChannels;                //number of
slave->master midi channels
         uint32_t fSampleRate;                       //session sample rate
         uint32_t fPeriodSize;                       //period size
         uint32_t fSampleEncoder;                    //samples encoder
         uint32_t fKBps;                             //KB per second for
CELT encoder
         uint32_t fSlaveSyncMode;                    //is the slave in
sync mode ?
         uint32_t fNetworkLatency;                   //network latency
     } POST_PACKED_STRUCTURE;

from
   <https://github.com/jackaudio/jack2/blob/master/common/JackNetTool.h#L91>


Since we're talking arm<->amd64 here, alignment and/or byte ordering
is different. Been there in 2004 for powerpc<->i386<->amd64, but forgot
most of it. ;)

JACK_CLIENT_NAME_SIZE+1 is most likely the absolute minimum, since we
need the space, at least on the host. Not on-wire, though (see (3)
below).

Options that might work:

(1) Reorder the struct and put char fName, fMasterNetName and
fSlaveNetName to the end, thus cheating a bit on the alignment issues.
The resulting strings would most likely be crippled. Just as a first
test. Or let's not do this, see (2) instead.

(2) Instead of +1, add +8. Since JACK_CLIENT_NAME_SIZE==64 and
JACK_SERVER_NAME_SIZE==256, we'd be on 8-byte boundaries. This is a
trivial fix that doesn't rely on (1). Please try and report back.

(3) Define a wire-format that doesn't rely on POST_PACKED_STRUCTURE but
absolute offsets. Probably not too much work, just get the alignment
right and remove POST_PACKED_STRUCTURE from all on-wire structs. Use
union and byte positions instead.

(4) In conjunction with (3), use pahole(1) to clean up the codebase.
Investigate if dropping POST_PACKED_STRUCTURE entirely is possible.


3+4 are obviously heavy, so I'd go for (2) for now and see if it fixes
the problem.


There's another dirty alternative: if you don't want to add +8, your
only other minimally invasive option is to add +0, effectively reverting
the commit. For this to work, all write access to the three char fields
must make sure not to overrun the array and if in doubt drop the last
character in favour of the terminating NUL byte. It's error-prone and
should probably only be done via setter functions.


Again, my gut feeling from just looking at the struct is that adding +8
is a quick, safe and moderately clean solution. But don't publically
quote me on that. ;)

Disclaimer: it's been more than a decade now that I last dealt with that
stuff. Take everything above with a grain of salt.



HTH
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Xavier Mendez
El 11/03/16 a les 13:49, Adrian Knoth ha escrit:

> On 03/11/16 12:50, Xavier Mendez wrote:
>
>>> It would be better to understand why changing the  data structure
>>> (which is used both side of the connection… ) caused the crash. Are
>> And then, instead of the expected 00 00 00 01 for fSendAudioChannels,
>> we have 00 00 00 00 01, which suggests the compiler has added a byte
>> of padding to align the int? I don't know.
>
> I didn't closely look into this, but I guess this is what's happening:
>
> https://github.com/jackaudio/jack2/blob/8b8be738805ae1c5fe3688a49fd696a9f080b59e/common/jack/systemdeps.h#L123
>
>
> has
>
> #if defined(__arm__) || defined(__ppc__) || defined(__powerpc__)
>      #undef POST_PACKED_STRUCTURE
>      #define POST_PACKED_STRUCTURE
> #endif /* __arm__ || __ppc__ || __powerpc__ */
>
> which disables
>
>     #define POST_PACKED_STRUCTURE __attribute__((__packed__))
>
> on the Raspberry (since it's ARM), but not on the amd64.

Ah! That seems to explain everything...

> We then have
>
>      struct _session_params
>      {
>          char fPacketType[8];                        //packet type
> ('param')
>          uint32_t fProtocolVersion;                  //version
>          int32_t fPacketID;                          //indicates the
> packet type
>          char fName[JACK_CLIENT_NAME_SIZE+1];          //slave's name
>          char fMasterNetName[JACK_SERVER_NAME_SIZE+1]; //master hostname
> (network)
>          char fSlaveNetName[JACK_SERVER_NAME_SIZE+1];  //slave hostname
> (network)
>          uint32_t fMtu;                              //connection mtu
>          uint32_t fID;                               //slave's ID
>          uint32_t fTransportSync;                    //is the transport
> synced ?
>          int32_t fSendAudioChannels;                 //number of
> master->slave channels
>          int32_t fReturnAudioChannels;               //number of
> slave->master channels
>          int32_t fSendMidiChannels;                  //number of
> master->slave midi channels
>          int32_t fReturnMidiChannels;                //number of
> slave->master midi channels
>          uint32_t fSampleRate;                       //session sample rate
>          uint32_t fPeriodSize;                       //period size
>          uint32_t fSampleEncoder;                    //samples encoder
>          uint32_t fKBps;                             //KB per second for
> CELT encoder
>          uint32_t fSlaveSyncMode;                    //is the slave in
> sync mode ?
>          uint32_t fNetworkLatency;                   //network latency
>      } POST_PACKED_STRUCTURE;
>
> from
>
> <https://github.com/jackaudio/jack2/blob/master/common/JackNetTool.h#L91>
>
>
> Since we're talking arm<->amd64 here, alignment and/or byte ordering
> is different. Been there in 2004 for powerpc<->i386<->amd64, but forgot
> most of it. ;)
>
> JACK_CLIENT_NAME_SIZE+1 is most likely the absolute minimum, since we
> need the space, at least on the host. Not on-wire, though (see (3)
> below).
 >

> Options that might work:
>
> (1) Reorder the struct and put char fName, fMasterNetName and
> fSlaveNetName to the end, thus cheating a bit on the alignment issues.
> The resulting strings would most likely be crippled. Just as a first
> test. Or let's not do this, see (2) instead.
>
> (2) Instead of +1, add +8. Since JACK_CLIENT_NAME_SIZE==64 and
> JACK_SERVER_NAME_SIZE==256, we'd be on 8-byte boundaries. This is a
> trivial fix that doesn't rely on (1). Please try and report back.
>
> (3) Define a wire-format that doesn't rely on POST_PACKED_STRUCTURE but
> absolute offsets. Probably not too much work, just get the alignment
> right and remove POST_PACKED_STRUCTURE from all on-wire structs. Use
> union and byte positions instead.
>
> (4) In conjunction with (3), use pahole(1) to clean up the codebase.
> Investigate if dropping POST_PACKED_STRUCTURE entirely is possible.
>
>
> 3+4 are obviously heavy, so I'd go for (2) for now and see if it fixes
> the problem.
>
>
> There's another dirty alternative: if you don't want to add +8, your
> only other minimally invasive option is to add +0, effectively reverting
> the commit. For this to work, all write access to the three char fields
> must make sure not to overrun the array and if in doubt drop the last
> character in favour of the terminating NUL byte. It's error-prone and
> should probably only be done via setter functions.

That's what I had in mind when I made the PR. As you said it'd be error
prone so yeah, I'd vote for (2), at least for now.

> Again, my gut feeling from just looking at the struct is that adding +8
> is a quick, safe and moderately clean solution. But don't publically
> quote me on that. ;)
>
> Disclaimer: it's been more than a decade now that I last dealt with that
> stuff. Take everything above with a grain of salt.
>
>
>
> HTH
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Adrian Knoth
On 03/11/16 14:43, Xavier Mendez wrote:

>> (2) Instead of +1, add +8. Since JACK_CLIENT_NAME_SIZE==64 and
>> JACK_SERVER_NAME_SIZE==256, we'd be on 8-byte boundaries. This is a
>> trivial fix that doesn't rely on (1). Please try and report back.
> That's what I had in mind when I made the PR. As you said it'd be error
> prone so yeah, I'd vote for (2), at least for now.

Are you going to test (2) and maybe send/update a pull request once
successful?


Cheers
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Xavier Mendez
El 11/03/16 a les 14:58, Adrian Knoth ha escrit:

> On 03/11/16 14:43, Xavier Mendez wrote:
>
>>> (2) Instead of +1, add +8. Since JACK_CLIENT_NAME_SIZE==64 and
>>> JACK_SERVER_NAME_SIZE==256, we'd be on 8-byte boundaries. This is a
>>> trivial fix that doesn't rely on (1). Please try and report back.
>> That's what I had in mind when I made the PR. As you said it'd be error
>> prone so yeah, I'd vote for (2), at least for now.
>
> Are you going to test (2) and maybe send/update a pull request once
> successful?

I'm on it.

> Cheers
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
Reply | Threaded
Open this post in threaded view
|

Re: Netjack crashes, alignment broken recently

Xavier Mendez
El 11/03/16 a les 15:05, Xavier Mendez ha escrit:

> El 11/03/16 a les 14:58, Adrian Knoth ha escrit:
>> On 03/11/16 14:43, Xavier Mendez wrote:
>>
>>>> (2) Instead of +1, add +8. Since JACK_CLIENT_NAME_SIZE==64 and
>>>> JACK_SERVER_NAME_SIZE==256, we'd be on 8-byte boundaries. This is a
>>>> trivial fix that doesn't rely on (1). Please try and report back.
>>> That's what I had in mind when I made the PR. As you said it'd be error
>>> prone so yeah, I'd vote for (2), at least for now.
>>
>> Are you going to test (2) and maybe send/update a pull request once
>> successful?
>
> I'm on it.

Done, everything seems to be working as expected.
I'll update the PR and bump the protocol version to 9.

>> Cheers
> _______________________________________________
> Jack-Devel mailing list
> [hidden email]
> http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org
_______________________________________________
Jack-Devel mailing list
[hidden email]
http://lists.jackaudio.org/listinfo.cgi/jack-devel-jackaudio.org