Author Topic: Map storage compression techniques (Read 16823 times)

sokol815 · « **on:** February 20, 2013, 09:04:55 PM »

I am currently writing a game that is a real-time 2d sandbox action/rpg with rogue-like elements (See 'Euthenia' on YouTube). The intent of this post is to document a successful compression technique I was able to implement to store a 10,000 x 2,500 block world (or any other size) into a file that then stores the world for later play, or prepares to send it across the network (for multiplayer servers).

each block consists of 4 physical bytes. the first 2 bytes are signed and the last two are normal bytes.

before any compression techniques, writing this world to disk resulted in a 93 Megabyte file. (And took approximately 20 seconds to write) Sending that much data across a network would take at least a few minutes, if not, longer. Applying a zipping algorithm through 7-zip brought the file size down to around 300K. Using this as a benchmark, I knew that creating any algorithm to store this would have to give me similar results.

the nature of such a 2d world is that many horizontally-adjacent blocks will be of the same type. This is where the compression algorithm could come in.

the 4 bytes per block I was writing was not fully utilized:
the 3rd byte was only using the first 2 bits (leaving 6 unused bits)
the 4th byte was using the first 7 bits. It has a max in-game value of 100 (leaving another unused bit)

these 7 unused bits could be used to store a series of blocks in a row. (127 blocks)
but because of the possibility of thousands of blocks in a row being the same (e.g. empty sky tiles)
I opted for a different method of bit utilization.

byte 4: the 8th bit is simply used as a boolean flag. When on, this says there are identical blocks after me.
byte 3: bits 3 to 7 are used as block counters, and can store a value up to 31 (meaning 31 blocks in a row)
byte 3: bit 8 is also used as a boolean flag. When on, there is a byte written after this block that also details a number of blocks in a row, except * 32 (because 31 blocks can be held in bits 3 to 7 of byte 3)

This allows me to store 31 blocks together without any additional space requirements, and I can store up to 8191 blocks (255 * 32 + 31) utilizing an additional byte.

This method of compression resulted in the same world being written to a 272KB file. So I am overall quite pleased with it (0.3% of the original filesize!) And a nice side-effect is saving the world back to a file (and reading it in) now takes less than a second.

Of course a world file will tend to creep up in size as people build objects and change the world through playing, but it is not very sizable. Someone creating a "checkerboard pattern" (this algorithm's worst enemy) of 32 x 32 blocks would result in a file size increase of merely 4KB.

Reading the world into memory is just a matter of running the algorithm the other way. How have you dealt with reading a large world in[to] a file?

Omnomnom · « **Reply #1 on:** February 20, 2013, 11:26:29 PM »

This is pretty neat. That's a 25 million tile world compressed to 272kb, which is what about 0.01 bytes per tile.

based on that I ran some numbers (which are probably wrong) for fun.

100gb of disk space = 9,000,000,000,000 tiles (*)
400gb of disk space = a world the size of the moon at 1x1 meter per tile resolution
5.5 terabytes of disk space = a world the size of the Earth again at 1x1 meter resolution

Have you thought about breaking your map into chunks and virtualizing the data? That would help in almost all aspects of performance if you only loaded, saved, transmitted and held in memory the parts of the map around the player (or players).

Youtube video looks good. Reminds me of terraria, but I prefer your tile graphic style for a game like this.

(*) Even this is already too much. Even if the player could see a unique 80x40 portion of the 9 trillion tile map every second it would take them almost 90 years to see the whole map. And the Earth map isn't so large with 7 billion players. But with a handful of players it is waaay too big. Even minecraft servers I played on which were much smaller (4000x4000) the maps were too big for multiplayer player vs player, couldn't find people! Even in coop play just building stuff where people would cluster together, 99% of the map was just unused (and largely unseen) wilderness.

sokol815 · « **Reply #2 on:** February 21, 2013, 06:50:43 AM »

Thanks for you comments, Omnomnom! It is fun to see what dizzying numbers such an algorithm could obtain.

The in-world resolution is 1 block = 2 feet x 2 feet. (because my tiles are 24 pixels x 24 pixels, this results in 1 pixel becoming 1 square inch. cool, eh?)

At this resolution, a world 10,000 blocks wide is 20,000 feet, or about 3.8 miles. The default character can run at a speed of 300 pixels/second (about 17 miles/hour) so....

it would take approximately 13 minutes to circumnavigate the world (and end up right where you started!)

I anticipate a small world being something closer to 4,000 x 1,600 pixels, though. 10,000 ends up being truly huge. (not to mention a world 3x10^6 by 3x10^6 in size!!)

Initially, the entire map is transferred over the network connection, and thereafter, individual batched block updates are sent for the currently "active" map areas. When a player moves sufficiently to a spot where another map area must be activated, it is retrieved from the server. I anticipate a chunk being about 100 x 50 (a little bit bigger than a 1080p screen could hold) and this size will generally be around .9 - 1.5KB.

I currently load the entire map into memory (a 10,000 x 2,500 map is about 1 GB in memory) Perhaps I will return to look at how to just load parts of the map into memory at a single time. That could be quite a nice technical feat.

I am a one-man team, and I have recently picked up a love of pixel arting, so all of the graphics in the game are done by me. (not all necessarily pixel art, though... that planet would have been insane to do pixel art style.) I'm glad you like the tiles.

This game right now does largely act like Terraria. I haven't added quite a few things (barely halfway through the 2nd month of development). I am planning on a unique crafting system that actually lets you assemble parts together to make items (and store blueprints for future reference, of course). Even a crafting system that lets you build vehicles that can fly, swim, dive, drill.... the possibilities are endless. But one step at a time for now, eh?

Also, I just realized that I was actually writing out the blocks top to bottom then left to right.... switched it to left to right before top to bottom and got the same world down to 163KB.

I'm very excited about this project because, after all, it is the game I have always wanted to play.

Quendus · « **Reply #3 on:** February 21, 2013, 07:19:29 AM »

A nice way to do map compression is to interpret the map as a bitmap and feed it into the PNG compressor. No need to reinvent the compression/decompression wheel, and you get a minimap for free.

sokol815 · « **Reply #4 on:** February 21, 2013, 03:33:14 PM »

because I am using XNA with a reach profile, doing a png compression of the map (at least using XNA's tools) is not possible: maximum texture sizes are 2048 x 2048. A png handled by C# instead of XNA would be able to solve this problem. (And I just realized hi-def only goes up to 4096x4096 anyways... so C# functions would really be the only way to do this.)

I'm using reach because I want this to be able to run on laptops with older graphics cards/without graphics cards altogether.

Sounds like a good way of loading in only the portions of the map which are close-by. Did a test creating a 10000 x 2500 png and saving it... a world with resource distributions and caves on-par with my generated world ends up being 199 KB after being saved as a png and 138KB when zipped. I cannot use this as a 1 to 1 comparison, but it seems a compressed png would have a slightly better compression ratio vs. my method. If I find I need to revisit this I will definitely give the png method a second glance. Thanks for you input, Quendus.

Quendus · « **Reply #5 on:** February 21, 2013, 10:15:28 PM »

XNA would probably want to put images it handles on the graphics card - not really what you want to do, as the raw map data will not be rendered to the screen and a minimap would have to be smaller than 2048x2048 anyway. C# functions or C# binding of libpng should handle it fine.
What was the bitdepth of the png format you used, and what sizes did you get on zipping your own compression format?

sokol815 · « **Reply #6 on:** February 21, 2013, 11:59:34 PM »

My png bit-depth was 32bit: corresponding to the 32bit/block mapping that would have been used.

the zipped worlds come in at about 1/2 to 1/3 of my original size.

Using a mini-map, although possible, is not a part of the intended implementation... I am planning on using a HUD that shows longitude and current elevation(If one acquires gear that displays this information). Homing beacons can also be dropped(as this is intended to be a tech-based game).

Krice · « **Reply #7 on:** February 23, 2013, 01:00:04 PM »

What is the data structure of a tile? 2 unsigned and 2 signed bytes are for what? Maybe you need only a byte to represent tile data and then additional layers of that other data which is only in some tiles. It's possible that you haven't thought of that because the way you talk about bits.

sokol815 · « **Reply #8 on:** February 23, 2013, 09:14:37 PM »

In-game, a single block is represented by 30 (or so) bytes. All of that can be extracted from the 4 bytes stored in the file. This includes things like tracking current lighting, is it a light, djikstra light calculation node status, liquid update status, simulated liquid pressure, passability, tile version (to create a smooth connection between tiles of the same type), liquid body identity (also part of simulating pressure), oscillation checking (prevent liquids from just oscillating between 2 blocks), physical draw color of tile (derived from the lighting level-- too expensive to calculate every frame, though... hence inclusion), furniture pointer, and a few other odds and ends.

as far as the stored map goes per-tile:
byte 1 is the foreground tile which can range from -128 to 127. the entire positive range is needed for future tiles, as it is quite possible there will be over 120 different types of tiles. a negative value indicates the tile is empty.

byte 2 is the background tile, same range, same meanings, just for the background.

byte 3 is the liquid ID, 0 means no liquid, 1 - 3 each mean: water, oil and lava

byte 4 is the liquid level which can be 0 to 100.

There are some things I could change (like using a byte instead of sbyte for the tile backgrounds and foregrounds and using 127 as an empty tile, freeing up 1 bit on each of those bytes) and instead check every single additional bit from bytes 3 and 4(retaining only the extra byte flag), which would allow me to increase the amount of compression in the file. (this would result in a notable 8x in-tile value increase up to 255 and a ushort(65535) for a single tile with the extra byte).

If I ever need to come back and revisit this I will, but for now I am implementing other exciting things like trees, furniture, multiplayer, graphics resources, more efficient lighting code, a scripting system for item interaction, AI, npcs, item crafting, vehicle crafting, leveling and stats, and eventually (if I ever have time to get back to it, or if it becomes a problem) better map compression.

I have been developing this game for nearly 2 months, now (started January 7). Although lines of code means nothing if they are not well put together, I am currently up to 10K lines across 25 classes. I anticipate many more. (Already having done many rewrites to things like the liquid system, collision detection, and lighting).

developing my own game when it is just me as designer, graphics artist, net programmer, main programmer, sound engineer and tester, I seem to keep moving to put out the brightest burning fire. And I wouldn't have it any other way. This topic of map compression (as of right now) is barely smoking.

Thanks for your input!

Krice · « **Reply #9 on:** February 24, 2013, 08:02:31 AM »

Quote from: sokol815 on February 23, 2013, 09:14:37 PM

byte 1 is the foreground tile which can range from -128 to 127. the entire positive range is needed for future tiles, as it is quite possible there will be over 120 different types of tiles. a negative value indicates the tile is empty.

byte 2 is the background tile, same range, same meanings, just for the background.

Why do you need -128 negative values if only a negative is empty tile? Is that for some reason? You could make 0 an empty tile and divide the rest of values for foreground/background tiles. One byte saved per tile!

Quote

byte 3 is the liquid ID, 0 means no liquid, 1 - 3 each mean: water, oil and lava

byte 4 is the liquid level which can be 0 to 100.

How much of the map is usually covered by liquid? It could be possible to reduce the size of saved map by saving terrain map and "liquid map" separately.

sokol815 · « **Reply #10 on:** February 25, 2013, 04:49:18 AM »

Quote

Why do you need -128 negative values if only a negative is empty tile? Is that for some reason? You could make 0 an empty tile and divide the rest of values for foreground/background tiles. One byte saved per tile!

I would love to learn how it is possible to store 2 values of up to 127 in a single byte... my calculations say this is impossible. (1 byte has 8 bits... and can be 0-255; splitting a byte evenly between 2 values results in 2 values of 0-15 each. 2^4-1 & 2^4-1)

As for storing liquid in the world separately... this could be quite possible, but I would unfortunately have to create an addressing system for the water. This addressing system would have to use 2 shorts(for x and y), so every repeatable instance of liquid would need to have a header of 4 bytes on top of the 2 bytes for liquid type and liquid level...

Due to the nature of the liquid system, adjacent liquid tiles on the surfaces of liquid bodies can end up being 1 liquid unit +- compared to their neighbors, so I would have to either restart the addressing there, or create another algorithm that interprets these differences somehow.

The quantity of liquid present in the world will be enough to make gains created by storing it separately little to none. I plan on there being oceans and frequent underground pockets of water. Also pockets of oil, and towards the bottom of the world, a giant lake of lava. (or at least pockets thereof... how else is one supposed to forge the one ring?)

Thanks to everyone who has participated in this discussion thus-far. I like seeing technical posts on this forum every now and then.

I have been a lurker here for at least 2 years and am happy to finally be interacting with members of the roguelike development community.

Krice · « **Reply #11 on:** February 25, 2013, 08:25:27 AM »

Quote from: sokol815 on February 25, 2013, 04:49:18 AM

I would love to learn how it is possible to store 2 values of up to 127 in a single byte... my calculations say this is impossible.

0-127 are regular values. Then 128 could be interpreted as zero if you decrease 128 from the values that go over 128, and so on. That way you get another set of 127 values. It's actually somewhat less if you count zero as empty tile, but you should get the idea.

Nymphaea · « **Reply #12 on:** February 25, 2013, 03:28:01 PM »

I think the problem is a tile could have both a foreground and background tile.

Although, if you switch to using 0 as an empty tile, you could use the extra bit from each of those to store the fluid type, or as a flag saying "The next byte says how many of this tile are next".

I agree with storing liquid in another file, and I think you're over thinking it. Just store it exactly the same way you are storing the normal map, tile by tile. The great thing with this is that a large majority of the tiles(especially higher up) will all be the same(0) so you could compress entire rows of tiles to very little data. You could still store liquid data in the same tile structure in game, this is just for the saving. You could have a single file that is broken into sections: "Tiles", "Fluids", "Entities", etc.

sokol815 · « **Reply #13 on:** February 25, 2013, 07:54:15 PM »

Yes, there can be (and most frequently is) a foreground tile and a background tile at the same time (of different types).

storing the liquid in another file is viable, but then I lose the compression that was stored in-tile for the background and foreground tile structure combined with the liquids... the liquid provides 7 bits to store data in-tile which actually accounts for a significant portion of the compression.

The 2 unused bits in the foreground and background together would be of little use except as a flag denoting an included byte and storing 1 extra tile in-data (although that byte now doesn't mean much (up to 511 including the bit not being used as a flag)... I would have to switch to a short for any significant amount of compression.

I ran some tests and came up with interesting results:

All files represent identical data created from a world with a static seed (this means we are working with the same data for each compression type)

The lakes being added to the world are 252(or so) small lakes throughout the map randomly placed. height of a lake is 50, width is: half below 31, half above 31)

Default Methods:

default method of writing 4 bytes with an additional possible byte. 31 in-tile storage, 8,191 with additional byte:
272,430 bytes (266 KB)

modified default method by using an additional short instead of a byte. 31 in-tile storage, 2,097,151 with additional short:
300,981 bytes (293 KB) Entire sky is stored in about 12 bytes!!!

Separated methods:

There are 2 options for writing out liquids and 2 options for writing out tiles. (each is short or byte):
Liquids alone:
liquid short alone comes out to be 86,863 bytes (84.8 KB) up to 2,097,151 per tile group
liquid byte alone comes out to be 68,807 bytes (67.1 KB) up to 8191 per tile group

Tiles alone:
tiles byte alone comes out to be 445,805 bytes (435 KB) (only 511 with byte per tile group) major performance penalty from losing those 7 bits in the liquid
tiles short alone comes out to be 227,171 bytes (221 KB) (131,071 with short per tile group)

This means that combining writing the tiles out with an additional short, and the water out with an additional byte becomes the optimal file-type when separating the two. that file would end up being 295,978 bytes (289 KB) Unfortunately, this is still larger than my current algorithm which gave a result of 266 KB. the two algorithms may scale differently when world composition is changed, but I do not believe it is more efficient, as in a large world, the majority of it will always remain in it's default state.

The performance penalty from removing the extra 7 bits for the tiles to store duplicates is just too great to make separating the two useful.
Thanks for the interesting ideas, everyone.

News: Read the RULES before posting.

Author Topic: Map storage compression techniques (Read 16823 times)

sokol815

Map storage compression techniques

Omnomnom

Re: Map storage compression techniques

sokol815

Re: Map storage compression techniques

Quendus

Re: Map storage compression techniques

sokol815

Re: Map storage compression techniques

Quendus

Re: Map storage compression techniques

sokol815

Re: Map storage compression techniques

Krice

Re: Map storage compression techniques

sokol815

Re: Map storage compression techniques

Krice

Re: Map storage compression techniques

sokol815

Re: Map storage compression techniques

Krice

Re: Map storage compression techniques

Nymphaea

Re: Map storage compression techniques

sokol815

Re: Map storage compression techniques