Author Topic: Heterogeneous saving and loading objects in C++ (Read 32340 times)

Krice · « **on:** April 05, 2016, 09:42:44 AM »

Has anyone done this? I was thinking of first saving the type of each object in a list, because you have to know an object's actual type before you can create it and load its data. With homogeneous lists this is easy, because you simply create one instance of the same class for each list item. The reason I'm planning this is that I want to extend inheritance of some object types. I don't even know if it is a good idea, but it has started to annoy me that some classes have "empty" components that could be moved into inherited (extended) classes, where in theory they should be placed anyway.
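A minimal sketch of the "type first, then data" idea, assuming a plain text stream and two hypothetical classes (Item and Paper) standing in for real game objects. Each object writes a type tag before its own fields; the loader reads the tag, creates an empty object of that type, and lets it load the rest:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical type tags; every concrete class writes its tag before its data.
enum class TypeTag : std::uint8_t { Item = 1, Paper = 2 };

struct Item {
    int id = 0;
    explicit Item(int i = 0) : id(i) {}
    virtual ~Item() = default;
    virtual TypeTag tag() const { return TypeTag::Item; }
    virtual void save(std::ostream& os) const { os << id << ' '; }
    virtual void load(std::istream& is) { is >> id; }
};

struct Paper : Item {
    int scroll = 0;  // only papers carry scroll data
    Paper(int i = 0, int s = 0) : Item(i), scroll(s) {}
    TypeTag tag() const override { return TypeTag::Paper; }
    void save(std::ostream& os) const override { Item::save(os); os << scroll << ' '; }
    void load(std::istream& is) override { Item::load(is); is >> scroll; }
};

// Create an empty object of the right dynamic type from its tag.
std::unique_ptr<Item> make_item(TypeTag t) {
    switch (t) {
        case TypeTag::Item:  return std::make_unique<Item>();
        case TypeTag::Paper: return std::make_unique<Paper>();
    }
    throw std::runtime_error("unknown type tag");
}

void save_list(std::ostream& os, const std::vector<std::unique_ptr<Item>>& items) {
    os << items.size() << ' ';
    for (const auto& it : items) {
        os << static_cast<int>(it->tag()) << ' ';  // type first...
        it->save(os);                               // ...then the data
    }
}

std::vector<std::unique_ptr<Item>> load_list(std::istream& is) {
    std::size_t n = 0;
    is >> n;
    std::vector<std::unique_ptr<Item>> items;
    for (std::size_t k = 0; k < n; ++k) {
        int tag = 0;
        if (!(is >> tag)) throw std::runtime_error("truncated save data");
        auto it = make_item(static_cast<TypeTag>(tag));
        it->load(is);
        items.push_back(std::move(it));
    }
    return items;
}
```

Because each derived class writes only its own extra fields after calling the base version, plain items never pay for a scroll component in the save file.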

Skullcoder · « **Reply #1 on:** April 05, 2016, 06:31:20 PM »

Java also uses a similar method under the hood for its serialization protocol, whereby the type is written before the data. But type alone is not the only consideration if you're talking about generic object storage and instantiation...

I did something similar back in C99. When porting my OOP system to C++, and before the introduction of run-time type information (RTTI) to C++, I included a type ENUM and a macro which, when inserted into each new type, would add a new entry to the OB_TYPE_ID enumeration and assign that as the static ID of the type.
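A rough C++ analogue of that enum-plus-macro scheme, using an "X-macro" list so that the enumeration and each class's static ID come from one place. OB_TYPE_LIST, OB_DECLARE_TYPE and the classes are hypothetical names, not the original code:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical central list of serializable types (an "X-macro").
#define OB_TYPE_LIST(X) \
    X(Item)             \
    X(Paper)            \
    X(Bag)

// Expand the list once to build the enumeration...
enum class ObTypeId : std::uint16_t {
    None = 0,
#define OB_ENUM_ENTRY(name) name,
    OB_TYPE_LIST(OB_ENUM_ENTRY)
#undef OB_ENUM_ENTRY
};

// ...and give each class a static ID plus a virtual accessor for run-time use.
#define OB_DECLARE_TYPE(name)                                  \
    static constexpr ObTypeId static_type_id = ObTypeId::name; \
    ObTypeId type_id() const override { return static_type_id; }

struct Object {
    virtual ~Object() = default;
    virtual ObTypeId type_id() const = 0;
};

struct Item : Object { OB_DECLARE_TYPE(Item) };
struct Paper : Item { OB_DECLARE_TYPE(Paper) };
struct Bag : Item { OB_DECLARE_TYPE(Bag) };
```

New types are added by extending the central list. Unlike the compiler's own type data, these numeric IDs stay the same across compilers as long as the list order does not change, which matters once they end up in save files.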

At the time I had created a scripting language with tight bindings to C, which allowed C code to transparently read/write properties and call functions on script objects using some preprocessor and macro magic (rather than eval("script code") or similar, as seen in Lua and other embedded scripting languages). In constructing my Object class (building custom OOP within C), I included the run-time type ID by default, and a registration system which allowed objects to register a default instance of themselves with a factory. Then factory->OB_Instantiate( dataptr ) would read a type ID from the dataptr, advance the pointer, look up the object registered for that ID, then call the (script provided) function Object->clone()->load( dataptr ). In addition to polymorphism, this allows "prototypical" behavior, since scripts could replace objects with new objects that provide more or new functionality by registering a compatible subclass for a TypeID. However, that form of dynamism is purely optional. Of course, the gnarly calling semantics that scripts required in the host language were hidden behind a macro...
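A sketch of the prototype registry described above, assuming native byte order and a 32-bit type ID at the front of each record. Factory, Object and Coin are hypothetical; instantiate() reads the ID, advances the cursor, clones the registered prototype and has the clone load itself:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <map>
#include <memory>
#include <stdexcept>

using TypeId = std::uint32_t;

// Every serializable object can copy itself and fill itself from a byte cursor.
struct Object {
    virtual ~Object() = default;
    virtual std::unique_ptr<Object> clone() const = 0;
    virtual void load(const unsigned char*& cursor) = 0;  // advances the cursor
};

// Hypothetical factory: maps a type ID to a registered prototype instance.
class Factory {
public:
    void register_prototype(TypeId id, std::unique_ptr<Object> proto) {
        protos_[id] = std::move(proto);  // re-registering replaces the prototype
    }
    // Read a type ID, advance the cursor, clone the prototype, then load it.
    std::unique_ptr<Object> instantiate(const unsigned char*& cursor) const {
        TypeId id;
        std::memcpy(&id, cursor, sizeof id);
        cursor += sizeof id;
        auto found = protos_.find(id);
        if (found == protos_.end()) throw std::runtime_error("unregistered type");
        auto obj = found->second->clone();
        obj->load(cursor);
        return obj;
    }
private:
    std::map<TypeId, std::unique_ptr<Object>> protos_;
};

struct Coin : Object {
    std::int32_t value = 0;
    std::unique_ptr<Object> clone() const override { return std::make_unique<Coin>(*this); }
    void load(const unsigned char*& cursor) override {
        std::memcpy(&value, cursor, sizeof value);
        cursor += sizeof value;
    }
};
```

Registering a different prototype under an existing ID swaps in replacement behavior without touching the loader.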

When registering a type with the factory, one also registered its property definition. Of particular interest were the offsets of pointers within the object data that could reference other objects which also needed to be saved or loaded. In the OB_Instantiate() function the property definition for the clonee would be queried to determine whether there were any references to other objects that the object needed to load. It would then "recursively" (via an iterative trampoline) load all of the objects referenced by this object. This allowed me to save and load the entire game state via a single save( game ) or load( game ).
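A sketch of a property definition that lists only the byte offsets of object pointers, plus an iterative walk (a worklist instead of recursion) that finds every object reachable from a root. It assumes standard-layout records that start with a common header; TypeDef, ObjHeader, Creature and Weapon are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <unordered_set>
#include <vector>

// Hypothetical "property definition": where the object pointers live in a record.
struct TypeDef {
    std::size_t pointer_count;
    const std::size_t* pointer_offsets;  // byte offsets from the start of the record
};

// Every record starts with a header that names its type definition.
struct ObjHeader { const TypeDef* def; };

// Example record holding two object references (either may be nullptr).
struct Creature {
    ObjHeader hdr;
    int hp;
    ObjHeader* weapon;
    ObjHeader* target;
};
constexpr std::size_t kCreaturePtrs[] = {offsetof(Creature, weapon), offsetof(Creature, target)};
const TypeDef kCreatureDef{2, kCreaturePtrs};

struct Weapon { ObjHeader hdr; int dmg; };
const TypeDef kWeaponDef{0, nullptr};

// Visit every object reachable from root, reading each record's pointer
// fields through its offset table. The seen-set stops shared and cyclic refs.
std::vector<ObjHeader*> reachable(ObjHeader* root) {
    std::vector<ObjHeader*> order, work{root};
    std::unordered_set<ObjHeader*> seen;
    while (!work.empty()) {
        ObjHeader* obj = work.back();
        work.pop_back();
        if (obj == nullptr || !seen.insert(obj).second) continue;
        order.push_back(obj);
        // The header is the first member, so its address is the record's address.
        const auto* base = reinterpret_cast<const unsigned char*>(obj);
        for (std::size_t k = 0; k < obj->def->pointer_count; ++k) {
            ObjHeader* ref;
            std::memcpy(&ref, base + obj->def->pointer_offsets[k], sizeof ref);
            work.push_back(ref);
        }
    }
    return order;
}
```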

One issue I ran into (but you may not, since my implementation was meant to serialize general purpose language objects rather than a specific set of pre-known classes) was that several objects could reference the same other object. This means that I needed to record not only the type of the object ahead of its data, but also a globally unique identifier for the object instance; otherwise, when saving or loading data, a single object that is referenced by several others may get turned into multiple separate instances. Since benchmarks showed saving to be more frequent than loading, I optimized for save speed: the unique ID was implemented simply by storing the integer value of the object's pointer ahead of its type ID and property data in the stream. I also marked each object as "saved" in the memory management bookkeeping header that precedes all object instances... more about this later. If you're not using a custom allocator and can't rely on bookkeeping metadata for the object instances, then you can store the "saved" flag in the lowest or highest bit of the type ID, or use a map as a key / value store to keep track of which object pointers have been saved.

When storing an object I first determine whether it has already been stored by querying its status bit (or by querying the "saved" map for its pointer key). If the object has already been saved in this "batch", then I don't need to write the type ID or any instance data; I only store the pointer to the object data in the output stream (or some other unique per-instance number). If the object pointer to be stored is NULL, then only the NULL pointer value is written and the storage function does not recurse.
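A sketch of the saving side, assuming a simple hypothetical Obj with one reference and a vector of 64-bit words standing in for the output stream. The old pointer value is the instance ID, a set plays the role of the "saved" bit, and NULL is written as 0 without recursing:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical object: a type ID, some data, and one reference to another object.
struct Obj {
    std::uint32_t type_id;
    std::int64_t value;
    Obj* ref;  // may be nullptr, may be shared, may form cycles
};

using Stream = std::vector<std::uint64_t>;  // stand-in for a binary output stream

// Write one reference. The pointer value serves as the instance's unique ID;
// type ID and data follow only the first time an instance is seen in this batch.
void save_ref(Stream& out, const Obj* obj, std::unordered_set<const Obj*>& saved) {
    out.push_back(obj ? reinterpret_cast<std::uintptr_t>(obj) : 0);
    if (obj == nullptr || !saved.insert(obj).second) return;  // null or already saved
    out.push_back(obj->type_id);
    out.push_back(static_cast<std::uint64_t>(obj->value));
    save_ref(out, obj->ref, saved);  // recurse into the referenced object
}
```

Marking the object as saved before recursing is what keeps cyclic references from looping forever.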

When loading the data, the pointer values loaded do not reflect the actual memory location of an object, of course. However, they are unique and thus allow me to build a key / value store when loading (I use a hashmap, but a treemap or other map will do). To perform loading I first read the object's unique instance ID (its old pointer) from the data stream, then I test for the existence of that key in this batch's "loaded" map. If the key does not exist, then the object is instantiated using the type ID and data from the stream, and the new object instance pointer is added to the map as the value for the key of its "unique ID" (the old pointer loaded from the stream). The new instance pointer is returned from the instantiation factory function (where it may be returned directly to the caller, or used to set a property of another object being loaded). If the key does exist, then its value (an instance pointer created during this loading batch) is returned instead of instantiating a new object from the data stream. The special case of NULL is addressed by adding "NULL:NULL" (0:0) to the key:value map prior to loading the batch of objects. Thus if an object has a pointer property that may reference another object type but is set to NULL, then no new object of that referenced type is instantiated; the pointer gets set to NULL as intended.
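The matching loading side, as a sketch under similar assumptions (a hypothetical Obj, a word vector as the stream, layout: old ID, then type ID and data only on first appearance). The map is seeded with 0 -> nullptr so NULL references need no special case. This sketch registers each new instance before loading its references, so that cycles resolve as well:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Obj {
    std::uint32_t type_id = 0;
    std::int64_t value = 0;
    Obj* ref = nullptr;
};

using Stream = std::vector<std::uint64_t>;

// Per-batch loader state: old ID -> new instance.
struct Loader {
    const Stream& in;
    std::size_t pos = 0;
    std::unordered_map<std::uint64_t, Obj*> loaded;
    std::vector<std::unique_ptr<Obj>> owned;  // keeps the new instances alive

    explicit Loader(const Stream& s) : in(s) { loaded.emplace(0, nullptr); }  // NULL:NULL

    Obj* load_ref() {
        std::uint64_t old_id = in.at(pos++);
        auto hit = loaded.find(old_id);
        if (hit != loaded.end()) return hit->second;  // null, or already loaded
        owned.push_back(std::make_unique<Obj>());
        Obj* obj = owned.back().get();
        loaded[old_id] = obj;  // register before loading refs so cycles resolve
        obj->type_id = static_cast<std::uint32_t>(in.at(pos++));
        obj->value = static_cast<std::int64_t>(in.at(pos++));
        obj->ref = load_ref();
        return obj;
    }
};
```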

A note on bit-flag optimization: I was able to avoid placing a "saved" flag in the TypeID or using a map structure during saving, since I had a custom allocator and garbage collector which provided some metadata ahead of the object's pointer. In C and C++, when you call malloc() or operator new, the pointer value returned is typically advanced just beyond some record keeping data that the allocator needs in order to locate its memory management structures when the data is freed. The free() or operator delete function typically subtracts some constant value from the pointer passed in, which gets you to the memory management header, which in turn allows the standard library (or kernel) to return the memory to the process's (or system's) memory pool. Since all of my allocation sizes were rounded up to the platform's word size (16, 32 or 64 bit = 2, 4 or 8 bytes), and I recorded the size of each allocation for range checking and to determine which size-range specific allocator to use, I was able to use the lowest bit of the "size" field of the allocation's GC header as the "saved" bit; the low bit was simply masked off during free(). So, if you request 7 bytes you'll get 8 usable bytes (rounded up to the nearest word boundary) plus some header data (a "size" word, in this case), and the pointer returned will point to the usable data just beyond the memory management header. That's simply the overhead of dynamic memory allocation. Knowing this, however, allows you to place global record keeping information (such as a "stored" or "loaded" flag, or even run-time type IDs) outside of the "user's" object definitions. C++ compilers keep their own run-time type data out-of-band in a similar spirit (typically reached through the object's vtable rather than an allocator header), but beware that the details are not standardized across compilers, which makes them of limited use here and necessitates a "roll your own anyway" approach (which I find, unfortunately, quite a common occurrence in C++).
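A sketch of the low-bit trick, assuming a hypothetical allocator header whose size is always rounded up to a multiple of at least 2 bytes, so bit 0 of the size field is free to act as the "saved" flag:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical allocator header. Sizes are rounded up to a multiple of the word
// size (>= 2), so the lowest bit of the stored size is always 0 and can be reused.
struct AllocHeader {
    std::size_t size_and_flag;
};

constexpr std::size_t kSavedBit = 1;

// Round n up to a multiple of align (align must be > 0).
constexpr std::size_t round_up(std::size_t n, std::size_t align) {
    return (n + align - 1) / align * align;
}

// Mask the flag off whenever the real size is needed (e.g. in free()).
inline std::size_t size_of(const AllocHeader& h) { return h.size_and_flag & ~kSavedBit; }
inline bool is_saved(const AllocHeader& h) { return (h.size_and_flag & kSavedBit) != 0; }
inline void set_saved(AllocHeader& h, bool on) {
    h.size_and_flag = on ? (h.size_and_flag | kSavedBit) : (h.size_and_flag & ~kSavedBit);
}
```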

You can provide your own compiler-agnostic global instance data by overloading operator new (or wrapping malloc()) and requesting a few bytes more from the underlying functions in order to store your record keeping data. Before you return the pointer, just be sure to advance the value returned from the underlying function beyond your record keeping header. You'll also have to overload operator delete (or wrap free()) and manually adjust (and reinterpret_cast) the pointer value so that it actually points to the start of your header+object. Users of the returned value thus remain unaware that there's extra data before the instance pointer. Using this, and potentially a macro or two in your object definition, you can keep the clutter to a minimum rather than requiring each Object class to have unique code that explicitly performs serialization of its data. Note that if you decide to extend all objects from a root GameObject class, including the typeID field in the root object may quickly slam your head into the diamond inheritance problem. This is due to an absurd deficiency in C++, whereby the "virtual" keyword is applicable to methods but not to instance variables... If only variables could be declared "virtual" (with their position data added to the same vtable that "virtual" functions use), then C++ would be far less of a hindrance to implementing advanced functionality, as "pure virtual" classes could then contain variables (and templates wouldn't have to take up so much of the slack).
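A sketch of the per-class operator new / operator delete approach: the class asks the global allocator for a little extra room, stores a hypothetical Bookkeeping header in front of the object, and hands out the address just past it. The header size is rounded up to alignof(std::max_align_t) so the object stays aligned; GameObject, Bookkeeping and header_of are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical bookkeeping that lives *before* each object, outside its class.
struct Bookkeeping {
    std::uint32_t type_id;
    bool saved;
};

// Round the header up so the object that follows keeps maximal alignment.
constexpr std::size_t kHeaderSize =
    (sizeof(Bookkeeping) + alignof(std::max_align_t) - 1) / alignof(std::max_align_t) *
    alignof(std::max_align_t);

struct GameObject {
    // Allocate extra room, construct the header, and return the address past it.
    static void* operator new(std::size_t n) {
        void* raw = ::operator new(kHeaderSize + n);
        new (raw) Bookkeeping{0, false};
        return static_cast<unsigned char*>(raw) + kHeaderSize;
    }
    // Step back over the header before returning the block to the global heap.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        ::operator delete(static_cast<unsigned char*>(p) - kHeaderSize);
    }
    virtual ~GameObject() = default;
};

// Only valid for objects created with GameObject's operator new.
inline Bookkeeping& header_of(const GameObject* obj) {
    // dynamic_cast<const void*> yields the most-derived object's address,
    // which is exactly the address operator new returned.
    const void* complete = dynamic_cast<const void*>(obj);
    auto* raw = static_cast<const unsigned char*>(complete) - kHeaderSize;
    return *reinterpret_cast<Bookkeeping*>(const_cast<unsigned char*>(raw));
}
```

Derived classes inherit both operators, so they get the header automatically. Array new/delete would need the same treatment and is left out of this sketch.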

Of course this is only one way to achieve generic global object save / load functionality, and each approach has its pros and cons. I primarily posted this to bring up the issue of deduplication and instance resolution. Look into the custom Allocator facilities that C++ provides, and especially per-class overriding of the new operator.

A word on the type property definition: For the purpose of object serialization one only needs to record the number of object pointers within the type data record, and the offset of each pointer within the type record. This can be trivially constructed by using a macro to create your type definition and a macro to declare pointers to objects within said type definition macro (or crazily constructed using a set of C++ template functions which abuse the preprocessor to perform addition and address-of operations to construct a "class_def" symbiote to make up for the lack of proper introspection, and once again "roll your own anyway").
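A sketch of the "macro builds the type definition" variant, assuming standard-layout records so that offsetof is valid. OB_PTR and OB_TYPEDEF are hypothetical macros; a trailing 0 sentinel keeps the offset array non-empty and is excluded from the count:

```cpp
#include <cassert>
#include <cstddef>

struct TypeDef {
    const char* name;
    std::size_t pointer_count;
    const std::size_t* pointer_offsets;
};

// Hypothetical macros: list a record's object-pointer fields once and get a
// TypeDef holding their count and byte offsets.
#define OB_PTR(type, field) offsetof(type, field),
#define OB_TYPEDEF(type, ...)                                                \
    constexpr std::size_t type##_ptr_offsets[] = {__VA_ARGS__ 0};            \
    const TypeDef type##_def{#type,                                          \
                             sizeof(type##_ptr_offsets) / sizeof(std::size_t) - 1, \
                             type##_ptr_offsets};

struct Creature {
    int hp;
    Creature* target;
    void* weapon;
};
OB_TYPEDEF(Creature, OB_PTR(Creature, target) OB_PTR(Creature, weapon))
```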

Omnivore · « **Reply #2 on:** April 05, 2016, 08:22:59 PM »

Here's a complete library for C++ serialization/deserialization, part of an object oriented database toolkit I wrote in 2001: https://sourceforge.net/projects/cpolib/files/cpolib/initial_alpha/.

It should have everything you need.

Hope this helps,
Brian aka Omnivore

PS: some of the dependencies are no longer available and the source at that link for the multithreaded version is crap. Still it has everything you need to do serialization in a number of ways, along with options to work with DLLs and it handles circular references, is STL compatible, etc.

Krice · « **Reply #3 on:** April 06, 2016, 06:50:11 AM »

Quote from: Skullcoder on April 05, 2016, 06:31:20 PM

A word on the type property definition: For the purpose of object serialization one only needs to record the number of object pointers within the type data record, and the offset of each pointer within the type record.

etc.. You wrote a lot of text not related to this problem. Why? What you are using is "ridiculous C tricks with C++ hater attitude". Didn't you see the 'C++' in the title of this thread?

Krice · « **Reply #4 on:** April 06, 2016, 06:51:31 AM »

Quote from: Omnivore on April 05, 2016, 08:22:59 PM

Hope this helps,

It didn't help, because I didn't check it out. If it's too difficult to express your opinions about this problem in this thread using words then do not participate.

Omnivore · « **Reply #5 on:** April 06, 2016, 08:09:03 AM »

Quote from: Krice on April 06, 2016, 06:51:31 AM

Quote from: Omnivore on April 05, 2016, 08:22:59 PM
Hope this helps,

It didn't help, because I didn't check it out. If it's too difficult to express your opinions about this problem in this thread using words then do not participate.

I simply don't care enough to write an explanation only to be accused of whatever you're blathering about in regards to Skullcoder's reply.

Omnivore

Krice · « **Reply #6 on:** April 06, 2016, 08:47:31 AM »

Quote from: Omnivore on April 06, 2016, 08:09:03 AM

I simply don't care

Well it's good that I do. When I thought about this more, I figured out something that might just work.

I'm saving objects in a list. The first data saved is simply the number of objects in the list. Then each object's data is saved in a special save game format, via a "container" class used for saving. The save game structure stores the size of the next chunk as binary data, and that chunk is then loaded into the container class. What I realized is that I can "peek" ahead in the data to find out the type of the object (which is saved anyway, because I'm keeping the type of object in the base class (I guess it's "bad" design in the OOP sense, but who cares, not you at least)). So, it looks like I don't even have to save a separate list of object types.
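A sketch of the size-prefixed chunk layout with a "peek" at the type, assuming each chunk is [32-bit size][32-bit type][payload] in native byte order, where the size covers the type field plus the payload. The helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<unsigned char>;

void append_u32(Bytes& out, std::uint32_t v) {
    unsigned char b[4];
    std::memcpy(b, &v, 4);
    out.insert(out.end(), b, b + 4);
}

std::uint32_t read_u32(const Bytes& in, std::size_t at) {
    if (at + 4 > in.size()) throw std::out_of_range("truncated chunk");
    std::uint32_t v;
    std::memcpy(&v, in.data() + at, 4);
    return v;
}

// [size][type][payload], where size = 4 (type field) + payload length.
void write_chunk(Bytes& out, std::uint32_t type, const Bytes& payload) {
    append_u32(out, static_cast<std::uint32_t>(4 + payload.size()));
    append_u32(out, type);
    out.insert(out.end(), payload.begin(), payload.end());
}

// Look at the type of the chunk starting at pos without consuming anything.
std::uint32_t peek_type(const Bytes& in, std::size_t pos) { return read_u32(in, pos + 4); }

// Offset of the chunk that follows the one at pos.
std::size_t next_chunk(const Bytes& in, std::size_t pos) { return pos + 4 + read_u32(in, pos); }
```

The loader can call peek_type() first, construct the matching derived class, and then hand it the whole chunk, so no separate list of types is needed.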

There are other problems with this type of inheritance, but I think they are easier to solve than saving and loading. Take the Item class as an example. I'm planning to inherit Paper and Container_Item from Item. Paper is something with a scroll component (book, scroll, note, etc.) and Container_Item of course has a container feature. Item lists are of type Item, but I guess that's not a problem, because you can store pointers to derived items as well and then cast them to the derived type when you need to access them. The reason I'm doing this is that I feel it's a good kind of "premature optimization": some features of items need to be in derived classes, because only a small number of items are of type Paper or Container. It saves both memory and, perhaps more importantly, save game size.
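A sketch of "store derived items through Item pointers, cast when needed", with hypothetical Item and Paper classes. Using dynamic_cast (which requires at least one virtual function in Item) turns a wrong guess into a null pointer instead of undefined behavior:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical classes: only Paper carries the scroll text, so plain items
// don't pay for it in memory or in the save file.
struct Item {
    int type = 0;
    virtual ~Item() = default;  // polymorphic, so dynamic_cast works
};

struct Paper : Item {
    std::string text;
};

// Returns the text if `item` really is a Paper, or an empty string otherwise.
std::string read_if_paper(const Item& item) {
    if (const auto* p = dynamic_cast<const Paper*>(&item)) return p->text;
    return {};
}
```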

Cfyz · « **Reply #7 on:** April 06, 2016, 09:56:59 AM »

This Krice has broken down, get us another one.

Skullcoder · « **Reply #8 on:** April 06, 2016, 06:54:52 PM »

Quote from: Krice on April 06, 2016, 06:50:11 AM

Quote from: Skullcoder on April 05, 2016, 06:31:20 PM
A word on the type property definition: For the purpose of object serialization one only needs to record the number of object pointers within the type data record, and the offset of each pointer within the type record.

etc.. You wrote a lot of text not related to this problem. Why? What you are using is "ridiculous C tricks with C++ hater attitude". Didn't you see the 'C++' in the title of this thread?

It's foolish to interpret criticism as hate.

I did see the C++ and, as it turns out, C++ added facilities expressly to pull off "C tricks" because they're so damn useful, such as allowing you to overload the new operator rather than dick around with macros that wrap the malloc() calls. The designers of C++ realized that you may also need to do things the C way sometimes in order to route around a language edge case (or add new features to C++ itself), and so they provide full compatibility with C.

I don't hate C++, I just recognize that it could be much better than it currently is. A decade ago I helped petition to have hashmaps added to the STL; it was denied... Now we finally have hashmaps in the new C++ STL (after having rolled our own for a decade). Our constant criticism of C++ for not standardizing on a hashmap implementation, and instead necessitating many separate fractured libraries, eventually improved C++. Perhaps some day we'll have other improvements such as "virtual variables", which I mentioned because there is a common pitfall when implementing object hierarchies that need per-instance state (called the diamond inheritance problem) which, in the absence of virtual variables, can be avoided by moving that global state into your fully C++ sanctioned custom Allocator bookkeeping data. I then explained the lower level implementation details (call it C if you like).

I fail to see where most of what I said is unrelated. What you think is not related to the issue is actually related, in my experience. For example, if you want to create a generic parent (template) class which has the ability to recursively store itself and any pointers to other objects its subclasses (or template instantiations) may contain, you will need some type introspection, AKA a "property definition". C++ has traditionally had very poor support for type introspection, and so we perform "tricks" via macros and template functions to make up for the deficiency. Once C++ had no run time type ID at all, and now most compilers support it, largely thanks to criticism. Because no particular C++ version was mentioned, I described these methods, including one that works with any C++ compiler, whereby you resort to adding your own run time type information. Note: manually calling save( this->propertyptr ) on each property in a .save() method is functionally equivalent to providing your own type information; you'll just be doing so explicitly in source code rather than implicitly via the language's metadata (if it provides such).

It's a rather involved topic. It would be beneficial if C++ had serialization standardized in its STL, but it does not have that yet, and so we've delved into the deep implementation details of such a low level language feature as introspection (that most other high level languages have standardized support for today). As it turns out that brushes into memory allocation. In fact, the method for saving and loading this data that I posted previously is essentially the same as a generational garbage collector algorithm, and once implemented can be used as such. The naive approach would be: When memory gets fragmented, save the game state, then load it, and you've just performed a generational GC pass (or a mark/sweep pass, depending on how you look at it).

And, that's why the heterogeneous (de)serialization of object hierarchies touches on a wide range of topics. Feel free to ignore any information provided. Consider that I didn't write it just for you...

Also of note: C++ frequently takes implementations and/or hints on where to go from the popular Boost C++ library.
Boost was one alternative to rolling your own hashmap back in the day, and as it turns out the Boost library also has generic serialization support. In other words, while waiting for C++ to get around to adopting new standard features, they rolled their own so you don't have to.

Quote

- Code portability - depend only on ANSI C++ facilities.
- Code economy - exploit features of C++ such as RTTI, templates, and multiple inheritance, etc. where appropriate to make code shorter and simpler to use.
- Independent versioning for each class definition. That is, when a class definition changed, older files can still be imported to the new version of the class.
- Deep pointer save and restore. That is, save and restore of pointers saves and restores the data pointed to.
- Proper restoration of pointers to shared data.
- Serialization of STL containers and other commonly used templates.
- Data Portability - Streams of bytes created on one platform should be readable on any other.
- Orthogonal specification of class serialization and archive format. That is, any file format should be able to store serialization of any arbitrary set of C++ data structures without having to alter the serialization of any class.
- Non-intrusive. Permit serialization to be applied to unaltered classes. That is, don't require that classes to be serialized be derived from a specific base class or implement specified member functions. This is necessary to easily permit serialization to be applied to classes from class libraries that we cannot or don't want to have to alter.
- The archive interface must be simple enough to easily permit creation of a new type of archive.
- The archive interface must be rich enough to permit the creation of an archive that presents serialized data as XML in a useful manner.

I bolded the part in which Boost's implementation overlaps specifically with the example you gave, indicating that both myself and Boost believe this is relevant to object serialization. Indeed this list of goals from their library covers many (if not all) of the implementation details I touch on, such as proper resolution of nested data structures ("Proper restoration of pointers to shared data" and "Deep pointer save and restore"), or "don't require that classes to be serialized be derived from a specific base class or implement specified member functions" (which is what my "C tricks" using C++ Allocator overloading and macros or complex templating shenanigans allows).

There is now a detailed description of the full implementation details of such a system in this thread, and links to two separate implementations, one of which may be adopted by C++ some day, if only we criticize the standards makers long enough for not having it.

Krice · « **Reply #9 on:** April 06, 2016, 09:28:53 PM »

I don't really understand why people like you work with C++ (or C). Why can't you pick a modern language and program with it? I think even some changes they made to C++11 were forced modern features and the development of C++ is out of control, a compromise driven by people who really don't understand the language. C++ was never supposed to be a high level language.

Also it's not a surprise that I don't have a clue what you are explaining. It must be some kind of esoteric programming idea that I just can't understand. But I want to give a hint: I'm not interested in your ideas. They suck, whatever they may be.

sokol815 · « **Reply #10 on:** April 06, 2016, 10:46:10 PM »

3 stars to Skullcoder! I'm sure your input will help future motivated devs who are working on difficult serialization problems in C++. The highly academic language of your responses can make it hard for non-native English speakers to clearly grasp what you are saying, unfortunately... nothing can be done about that. Don't mind Krice; he satisfies his need for human interaction by trolling on the internet. Welcome to the forums!

Tzan · « **Reply #11 on:** April 06, 2016, 11:31:05 PM »

I've been having a hard time following along too, but I think that's because I'm firmly in the C# camp since 2009.
Before that was Java back to 1999.
I did do a significant amount of C around 1991, but no saving, so I never needed to figure it out.

tuturto · « **Reply #12 on:** April 07, 2016, 04:33:55 AM »

I don't do C/C++, but I found Skullcoder's explanation interesting to read and I learned something while doing so (surprisingly, serialization isn't in the STL yet). Thanks for taking the time to write that.

Krice · « **Reply #13 on:** April 07, 2016, 08:06:51 AM »

The first part of the problem is solved, because I found out it wasn't too difficult to inherit from Item and then use the Paper class for scrolls etc. Rather than using "raw" new I made a factory type class that returns Item, but creates the actual object type depending on the item type. Then I had to change only one (cast from Item to Paper) line to read the scroll. I still need to pass some extra parameters to Get_New_Item for scrolls, but since that routine is called only in three places in the entire source code I think I can live with it.

Code:

```cpp
// Create the concrete item class for item type i, returned through the U_Item base.
U_Item *Object_Factory::Get_New_Item(int i, int a)
{
	U_Item *rv;

	O_Item_Type it(i);
	const int imt=it.Get_Main_Type();

	if (imt==imtScroll)
	{
		rv=new U_Paper(i, it.Get_Scroll_Type(), -1, -1);
	}
	else if (imt==imtContainer)
	{
		//return ordinary item for now
		rv=new U_Item(i, a);
	}
	else rv=new U_Item(i, a);

	return rv;
}

void U_Avatar::Read_Scroll(U_Item *i)
{
	// Callers only pass scroll items here, which Get_New_Item creates as U_Paper.
	U_Paper *p=static_cast<U_Paper*>(i);
	const int m=p->Read(Is_Blind());
	if (m!=-1) Noun_Message(m, i);
}
```

Krice · « **Reply #14 on:** April 08, 2016, 09:18:13 AM »

Container was a more interesting case, because it has a virtual base class implementation. I made a class called Bag (or U_Bag, to follow the notation for game object classes in Kaduria) that inherits from Item. In this case the virtual mechanism proved to be epic, because you can simply move everything related to containers from Item to the derived Bag and it just works. There were also a couple of casts from Item to Bag, but those were of course not related to virtual functions, because they always work no matter the class structure.
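One way to keep casts out of the calling code, sketched with hypothetical classes: Item exposes a virtual accessor that returns nullptr by default, and Bag overrides it to return its wrapped Container component:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical composite: the container behavior itself.
class Container {
public:
    void Add(int item_id) { items_.push_back(item_id); }
    std::size_t Count() const { return items_.size(); }
private:
    std::vector<int> items_;
};

class Item {
public:
    virtual ~Item() = default;
    // Plain items have no container; callers test for nullptr instead of casting.
    virtual Container* Get_Container() { return nullptr; }
};

class Bag : public Item {
public:
    Container* Get_Container() override { return &container_; }
private:
    Container container_;  // only bags pay for the container's storage
};
```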

Container functionality (just as Scroll) is a composite class wrapped inside Bag. It would be nice to have Container as an intermediate class to inherit from, but I think it's impossible to do in the kind of class hierarchy where other object types can be containers as well. You get the diamond problem as a result. I guess multiple inheritance could solve that, but I'm afraid to even try it.
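For the diamond case, C++'s virtual inheritance lets Item and Container share a single GameObject base when a class derives from both. A minimal sketch with hypothetical classes:

```cpp
#include <cassert>
#include <vector>

// Hypothetical hierarchy. Both Item and Container derive *virtually* from
// GameObject, so a class that inherits from both gets one shared GameObject.
struct GameObject {
    int id = 0;
    virtual ~GameObject() = default;
};

struct Item : virtual GameObject {
    int weight = 1;
};

struct Container : virtual GameObject {
    std::vector<int> contents;  // IDs of stored objects, for brevity
    void put(int object_id) { contents.push_back(object_id); }
};

struct Bag : Item, Container {};             // an item that is also a container
struct Chest : Container { int lock = 0; };  // a container that is not an item
```

Whether that beats the composite approach is a judgment call: virtual bases add some layout and construction complexity, and the most-derived class is always responsible for initializing the shared base.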
