Header::{De, }Serialize with Buffer::Iterator copies ?!

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Header::{De, }Serialize with Buffer::Iterator copies ?!

Natale Patriciello

Hi,

I lost the entire day in looking for a bug.

Can someone explain the rationale behind the signature

  virtual void Serialize (Buffer::Iterator start) const = 0;
  virtual uint32_t Deserialize (Buffer::Iterator start) = 0;

in the class Header? It is *clear* that "start" will be modified, but
still, we are modifying a copy. This, along with the typical
*undocumented* line you'd find on various Header subclasses

  Buffer::Iterator i = start;

made me looking like a stupid with my colleagues. Here is the code, if
you want to look at it [1].

If you'd ask me, the correct signature for all the Serialize/Deserialize
function MUST be:

  virtual void Serialize (Buffer::Iterator *start) const = 0;
  virtual uint32_t Deserialize (Buffer::Iterator *start) = 0;

and all the classes, struct, whatever, that wants to be serializable
(including header) MUST implement as well:

  uint32_t  GetSerializedSize () const;
  void Print (std::ostream &os) const;

To overcome this limitation, some people had to manually call Next() on
the buffer:

  void
  Icmpv4DestinationUnreachable::Serialize (Buffer::Iterator start) const
  {
    NS_LOG_FUNCTION (this << &start);
    start.WriteU16 (0);
    start.WriteHtonU16 (m_nextHopMtu);
    uint32_t size = m_header.GetSerializedSize ();
    m_header.Serialize (start);
    start.Next (size);  // Manually advancing start, as the previous Serialize didn't do nothing to it
    start.Write (m_data, 8);
  }

while others invented their own Serialize() method:

  Buffer::Iterator Serialize (Buffer::Iterator start) const;

(that allowed them to write:)

  Buffer::Iterator i = start;
  i = m_capability.Serialize (i);
  i = m_code.Serialize (i);
  i.WriteHtolsbU16 (m_aid);
  ...

With pointers, the correct code and usage of it would have been more recognizable.
With references, the effect would be the same, but there will be less
visibility:

  void foo(Buffer::Iterator &i)
  {
    m_something.Serialize (i); // is i gonna be modified?
  }

  versus

  void foo(Buffer::Iterator *i)
  {
    m_something.Serialize (i); // i is a pointer, the name is serialize..
                               // definitely will be changed.
  }

Have a nice day
Nat

[1] https://pastebin.com/pagDkvyb
Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Tommaso Pecorella
I totally feel Nat's pain, because it's mine as well.

This is the very same reason tor updating Serialize and Deserialize from Ipv[4,6] addresses (as a side note, please comment this:
https://gitlab.com/nsnam/ns-3-dev/-/merge_requests/290

About the serialization, Peter mentioned that he was unhappy with it as well, because GetSerializedSize isn't really "necessary", as the buffer should work more like an output stream.

I'd say that we should put a priority effort to "fix" this - perhaps as a main goal for 3.32.

The (only) problem is that we'll break a lot of existing code. However, you don't make an omelette without breaking some eggs, isn't it ?

T.
 

> On 22 May 2020, at 16:46, Natale Patriciello <[hidden email]> wrote:
>
>
> Hi,
>
> I lost the entire day in looking for a bug.
>
> [...]

--------------------------------------------------------------

Che cos'è il genio?
È fantasia, intuizione, colpo d'occhio e velocità d’esecuzione.
—- Il Perozzi

--------------------------------------------------------------

Tommaso Pecorella - Ph.D.

Assistant professor
Dpt. Ingegneria dell'Informazione
Università di Firenze

CNIT - Università di Firenze Unit

via di S. Marta 3
50139, Firenze
ITALY

email: [hidden email]
       [hidden email]

phone : +39-055-2758540
mobile: +39-320-4379803
fax   : +39-055-2758570




Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Tom Henderson-2
On 5/23/20 6:54 AM, Tommaso Pecorella wrote:

> I totally feel Nat's pain, because it's mine as well.
>
> This is the very same reason tor updating Serialize and Deserialize from Ipv[4,6] addresses (as a side note, please comment this:
> https://gitlab.com/nsnam/ns-3-dev/-/merge_requests/290
>
> About the serialization, Peter mentioned that he was unhappy with it as well, because GetSerializedSize isn't really "necessary", as the buffer should work more like an output stream.
>
> I'd say that we should put a priority effort to "fix" this - perhaps as a main goal for 3.32.
>
> The (only) problem is that we'll break a lot of existing code. However, you don't make an omelette without breaking some eggs, isn't it ?

I am not convinced that changing the API here is worth the breakage; it
is a style change, not a 'fix', in my opinion.

- Tom

Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Natale Patriciello
In reply to this post by Natale Patriciello
On 22/05/20 at 04:46pm, Natale Patriciello wrote:
[cut]
> Can someone explain the rationale behind the signature
>
>   virtual void Serialize (Buffer::Iterator start) const = 0;
>   virtual uint32_t Deserialize (Buffer::Iterator start) = 0;
>
> in the class Header? It is *clear* that "start" will be modified, but
> still, we are modifying a copy.


I went to a journey that took days of normal living time, but years for
my mind. I explored places that I'd never go rationally, and after all
this time, the answer to the question above is:

    To allow the implementation of "PeekHeader" inside Packet.

The Header class does not know if it's being Removed, or Peek'd. So, it
works on a copy of the original buffer.

This opens other questions: by using a reference, or a pointer, in the
signature, we are "disabling" the possibility to Peek() an header. On
other classes, Serializing and Deserializing with a reference or a
pointer means it will be impossible to check for the header existence
with Peek. Anyhow, I wonder how is it possible to check the header
existence with Peek, as it is possible that two (or more) headers share
the first n-bit representation (just with different meaning...)


Ciao,
Nat
Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Tom Henderson-2
On 5/27/20 12:48 AM, Natale Patriciello wrote:

(snip)

>  Anyhow, I wonder how is it possible to check the header
> existence with Peek, as it is possible that two (or more) headers share
> the first n-bit representation (just with different meaning...)
>

That limitation is what the PacketMetadata system is intended to remedy.

In relation to your previous comment on the API and how it led you to
waste a day, I would propose that the best course of action in that
instance would be to document better how the API works, perhaps by
adding your test program with some comments about not forgetting to use
Next() when it is needed.

- Tom
Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Tommaso Pecorella
In reply to this post by Tom Henderson-2
I agree that it's a style change. However, style is there to prevent mistakes.

About the specific issue in Ipv[4,6] addresses, please comment on the merge request.

Personally, I'm more than convinced that the patch is wrong and we should instead remove Serialize and Deserialize from Ipv[4,6]addresses - because they're not headers, they're data types, and from a style point of view, users do not expect to have a Serialize in a data type (and not in other ones).

As for the code "beautifulness", I'll wait until we'll have a serialize / deserialize that works like a (normal) i/o function.
The (only) real pro of having a Serialize in Ipv4Address is to be able to use Read32NtoH transparently - but that can be achieved by having a specialized ReadIpv4Address / ReadIpv6Address in the buffer iterator - which is perhaps the most logical thing to have.

Cheers,

T.


> On 23 May 2020, at 16:06, Tom Henderson <[hidden email]> wrote:
>
> On 5/23/20 6:54 AM, Tommaso Pecorella wrote:
>> I totally feel Nat's pain, because it's mine as well.
>>
>> This is the very same reason tor updating Serialize and Deserialize from Ipv[4,6] addresses (as a side note, please comment this:
>> https://gitlab.com/nsnam/ns-3-dev/-/merge_requests/290
>>
>> About the serialization, Peter mentioned that he was unhappy with it as well, because GetSerializedSize isn't really "necessary", as the buffer should work more like an output stream.
>>
>> I'd say that we should put a priority effort to "fix" this - perhaps as a main goal for 3.32.
>>
>> The (only) problem is that we'll break a lot of existing code. However, you don't make an omelette without breaking some eggs, isn't it ?
>
> I am not convinced that changing the API here is worth the breakage; it is a style change, not a 'fix', in my opinion.
>
> - Tom
>

--------------------------------------------------------------

``... anyone can do any amount of work, provided it isn't the
  work he is supposed to be doing at that moment.''
-- Robert Benchley, in Chips off the Old Benchley, 1949

--------------------------------------------------------------

Tommaso Pecorella - Ph.D.

Assistant professor
Dpt. Ingegneria dell'Informazione
Università di Firenze

CNIT - Università di Firenze Unit

via di S. Marta 3
50139, Firenze
ITALY

email: [hidden email]
       [hidden email]

phone : +39-055-2758540
mobile: +39-320-4379803
fax   : +39-055-2758570










Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Barnes, Peter D.

> On May 27, 2020, at 8:20 AM, Tommaso Pecorella <[hidden email]> wrote:
>
> I agree that it's a style change. However, style is there to prevent mistakes.
>
> About the specific issue in Ipv[4,6] addresses, please comment on the merge request.
>
> Personally, I'm more than convinced that the patch is wrong and we should instead remove Serialize and Deserialize from Ipv[4,6]addresses - because they're not headers, they're data types, and from a style point of view, users do not expect to have a Serialize in a data type (and not in other ones).

The main reason we want to revisit Get/De/Serialize is to support dynamic load balancing, which requires moving (migrating) the state of an entire Node between processes on different processors.  For this case all data types will need De/Serialize functions, not just packet headers.  We’re investigating ways to automate the production of those functions...

> As for the code "beautifulness", I'll wait until we'll have a serialize / deserialize that works like a (normal) i/o function.
> The (only) real pro of having a Serialize in Ipv4Address is to be able to use Read32NtoH transparently - but that can be achieved by having a specialized ReadIpv4Address / ReadIpv6Address in the buffer iterator - which is perhaps the most logical thing to have.

Yuck.  Expanding a broken API with specializations for specific types?  

Take a look at boost::serialization[1].  Note that there is only one function: serialize<>(), which serves both serialization and deserialization.  There is no GetSerializedSize().  serialize<>() is templated on the “archive” type, so it can be specialized for input/output, of course, but also ascii (logging, tracing, human-readable), binary (checkpoints, migration), and even network <—> host order.  


Peter

[1] https://www.boost.org/doc/libs/1_73_0/libs/serialization/doc/index.html


_____________________________________________________________
Dr. Peter D. Barnes, Jr. CASC Division, B451 R2035
Lawrence Livermore National Laboratory Computation Directorate
7000 East Avenue, L-561 email:  [hidden email]
P. O. Box 808 Voice:  (925) 422-3384
Livermore, California 94550



Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Tommaso Pecorella
Hi Peter,

if there's a significant work on this, I'll avoid to add "noise" to the process.

My rationale in killing Serialize and Deserialize from Ipv[4,6]Address is that it's not following any convention - especially not  Buffer::Iterator ones.
However, if you're working on an alternative solution with a short-medium horizon, I can also avoid to add noise on this particular point.

Cheers,

T.


> On 27 May 2020, at 19:21, Barnes, Peter D. <[hidden email]> wrote:
>
>
>> On May 27, 2020, at 8:20 AM, Tommaso Pecorella <[hidden email]> wrote:
>>
>> I agree that it's a style change. However, style is there to prevent mistakes.
>>
>> About the specific issue in Ipv[4,6] addresses, please comment on the merge request.
>>
>> Personally, I'm more than convinced that the patch is wrong and we should instead remove Serialize and Deserialize from Ipv[4,6]addresses - because they're not headers, they're data types, and from a style point of view, users do not expect to have a Serialize in a data type (and not in other ones).
>
> The main reason we want to revisit Get/De/Serialize is to support dynamic load balancing, which requires moving (migrating) the state of an entire Node between processes on different processors.  For this case all data types will need De/Serialize functions, not just packet headers.  We’re investigating ways to automate the production of those functions...
>
>> As for the code "beautifulness", I'll wait until we'll have a serialize / deserialize that works like a (normal) i/o function.
>> The (only) real pro of having a Serialize in Ipv4Address is to be able to use Read32NtoH transparently - but that can be achieved by having a specialized ReadIpv4Address / ReadIpv6Address in the buffer iterator - which is perhaps the most logical thing to have.
>
> Yuck.  Expanding a broken API with specializations for specific types?  
>
> Take a look at boost::serialization[1].  Note that there is only one function: serialize<>(), which serves both serialization and deserialization.  There is no GetSerializedSize().  serialize<>() is templated on the “archive” type, so it can be specialized for input/output, of course, but also ascii (logging, tracing, human-readable), binary (checkpoints, migration), and even network <—> host order.  
>
>
> Peter
>
> [1] https://www.boost.org/doc/libs/1_73_0/libs/serialization/doc/index.html
>
>
> _____________________________________________________________
> Dr. Peter D. Barnes, Jr. CASC Division, B451 R2035
> Lawrence Livermore National Laboratory Computation Directorate
> 7000 East Avenue, L-561 email:  [hidden email]
> P. O. Box 808 Voice:  (925) 422-3384
> Livermore, California 94550
>
>
>

--------------------------------------------------------------

``... anyone can do any amount of work, provided it isn't the
  work he is supposed to be doing at that moment.''
-- Robert Benchley, in Chips off the Old Benchley, 1949

--------------------------------------------------------------

Tommaso Pecorella - Ph.D.

Assistant professor
Dpt. Ingegneria dell'Informazione
Università di Firenze

CNIT - Università di Firenze Unit

via di S. Marta 3
50139, Firenze
ITALY

email: [hidden email]
       [hidden email]

phone : +39-055-2758540
mobile: +39-320-4379803
fax   : +39-055-2758570










Reply | Threaded
Open this post in threaded view
|

Re: Header::{De, }Serialize with Buffer::Iterator copies ?!

Barnes, Peter D.
Not short or medium term, unfortunately.  I thought I should share our long-term goals, in order to forestall lots of special cases which would have to be refactored later.  (I would call new Buffer API for specific data types a special case: it puts the implementation (in Buffer) far from where the details are known (in Ipv4Address, for example).  Keeping Ipv4Address::De/Serialize does follow a pattern: the pattern used by things serializable into a Packet Buffer.)

But you have a problem now, which needs a reasonable solution.

IMO the best option at the moment is to support De/Serialize for data types which commonly appear in Packets.  This ensures that it’s done right (field sizes, byte ordering…) and only once, not forcing everyone to roll their own.

P

On May 27, 2020, at 2:26 PM, Tommaso Pecorella <[hidden email]<mailto:[hidden email]>> wrote:

Hi Peter,

if there's a significant work on this, I'll avoid to add "noise" to the process.

My rationale in killing Serialize and Deserialize from Ipv[4,6]Address is that it's not following any convention - especially not  Buffer::Iterator ones.
However, if you're working on an alternative solution with a short-medium horizon, I can also avoid to add noise on this particular point.

Cheers,

T.


On 27 May 2020, at 19:21, Barnes, Peter D. <[hidden email]<mailto:[hidden email]>> wrote:


On May 27, 2020, at 8:20 AM, Tommaso Pecorella <[hidden email]<mailto:[hidden email]>> wrote:

I agree that it's a style change. However, style is there to prevent mistakes.

About the specific issue in Ipv[4,6] addresses, please comment on the merge request.

Personally, I'm more than convinced that the patch is wrong and we should instead remove Serialize and Deserialize from Ipv[4,6]addresses - because they're not headers, they're data types, and from a style point of view, users do not expect to have a Serialize in a data type (and not in other ones).

The main reason we want to revisit Get/De/Serialize is to support dynamic load balancing, which requires moving (migrating) the state of an entire Node between processes on different processors.  For this case all data types will need De/Serialize functions, not just packet headers.  We’re investigating ways to automate the production of those functions...

As for the code "beautifulness", I'll wait until we'll have a serialize / deserialize that works like a (normal) i/o function.
The (only) real pro of having a Serialize in Ipv4Address is to be able to use Read32NtoH transparently - but that can be achieved by having a specialized ReadIpv4Address / ReadIpv6Address in the buffer iterator - which is perhaps the most logical thing to have.

Yuck.  Expanding a broken API with specializations for specific types?

Take a look at boost::serialization[1].  Note that there is only one function: serialize<>(), which serves both serialization and deserialization.  There is no GetSerializedSize().  serialize<>() is templated on the “archive” type, so it can be specialized for input/output, of course, but also ascii (logging, tracing, human-readable), binary (checkpoints, migration), and even network <—> host order.


Peter

[1] https://www.boost.org/doc/libs/1_73_0/libs/serialization/doc/index.html


_____________________________________________________________
Dr. Peter D. Barnes, Jr. CASC Division, B451 R2035
Lawrence Livermore National Laboratory Computation Directorate
7000 East Avenue, L-561 email:  [hidden email]<mailto:[hidden email]>
P. O. Box 808 Voice:  (925) 422-3384
Livermore, California 94550




--------------------------------------------------------------

``... anyone can do any amount of work, provided it isn't the
  work he is supposed to be doing at that moment.''
-- Robert Benchley, in Chips off the Old Benchley, 1949

--------------------------------------------------------------

Tommaso Pecorella - Ph.D.

Assistant professor
Dpt. Ingegneria dell'Informazione
Università di Firenze

CNIT - Università di Firenze Unit

via di S. Marta 3
50139, Firenze
ITALY

email: [hidden email]<mailto:[hidden email]>
       [hidden email]<mailto:[hidden email]>

phone : +39-055-2758540
mobile: +39-320-4379803
fax   : +39-055-2758570











_____________________________________________________________
Dr. Peter D. Barnes, Jr. CASC Division, B451 R2035
Lawrence Livermore National Laboratory Computation Directorate
7000 East Avenue, L-561 email:  [hidden email]<mailto:[hidden email]>
P. O. Box 808 Voice:  (925) 422-3384
Livermore, California 94550