Splitting document storage files

Jan 9, 2015 at 2:57 PM
What are the benefits of having separate data files, other than the obvious IO benefits of smaller files? As far as RaptorDB performance is concerned, is it better to have, say, one 2 GB data file or four 500 MB files?

We currently have performance issues after a server has been running for more than a few days, and have had to schedule nightly maintenance on our RaptorDB servers to clean out our data files. We save the most recent copy of our data objects from RaptorDB to temporary storage, stop RaptorDB, delete the data and view files, restart the server, and then save the objects back. Any thoughts on whether splitting up the data files might let us stop doing this?
Jan 9, 2015 at 3:21 PM
Hi Jason,

The main benefit is smaller daily incremental backups, since only the last file is being written to and the other files are read-only (this is more a feature for devops and admins).

Performance-wise, it's better to have fewer files.
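To make the backup point concrete, here is a minimal in-memory sketch of an append-only store split into fixed-size segments. It is not RaptorDB's storage code: `SegmentedLog`, its members, and the byte-size rollover rule are hypothetical. It only illustrates why sealed segments never change, so an incremental backup just has to copy the segments created since the last run.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not RaptorDB's implementation.
// Records are appended to the last (active) segment only. When a record would
// overflow the active segment, that segment is sealed and a new one is started,
// so sealed segments are effectively read-only from then on.
public sealed class SegmentedLog
{
    private readonly long _maxSegmentBytes;
    private readonly List<List<byte[]>> _segments = new List<List<byte[]>>();
    private readonly List<long> _segmentSizes = new List<long>();

    public SegmentedLog(long maxSegmentBytes)
    {
        if (maxSegmentBytes <= 0) throw new ArgumentOutOfRangeException(nameof(maxSegmentBytes));
        _maxSegmentBytes = maxSegmentBytes;
        StartNewSegment();
    }

    public int SegmentCount => _segments.Count;

    // Appends a record and returns the index of the segment it was written to.
    public int Append(byte[] record)
    {
        if (record == null) throw new ArgumentNullException(nameof(record));
        int last = _segments.Count - 1;
        // Roll over only if the active segment already holds data and this record would overflow it.
        if (_segmentSizes[last] > 0 && _segmentSizes[last] + record.Length > _maxSegmentBytes)
        {
            StartNewSegment();
            last++;
        }
        _segments[last].Add(record);
        _segmentSizes[last] += record.Length;
        return last;
    }

    // Indexes of sealed segments: every segment except the active last one.
    // An incremental backup only needs the sealed segments it has not copied yet.
    public IEnumerable<int> SealedSegments()
    {
        for (int i = 0; i < _segments.Count - 1; i++) yield return i;
    }

    private void StartNewSegment()
    {
        _segments.Add(new List<byte[]>());
        _segmentSizes.Add(0);
    }
}
```

For example, with a 500-byte limit, two 250-byte records fill segment 0 and a third rolls over into segment 1, leaving segment 0 sealed.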

I'm intrigued to know how you are using RaptorDB and what your problems are; I may be able to help and change things.
Jan 9, 2015 at 4:53 PM
Thanks for the information.

For one application, we are storing large data objects (containing many lists of sub-objects nested many levels deep), and because of the "Append Only" nature of RaptorDB, re-saving them multiple times causes the data files to grow large quickly. We have tried to keep saving to a minimum but still run into performance issues once the Raptor server process has been running for more than a couple of days. The server hardware is good, with plenty of cores and memory, but the Raptor process seems to reach a point where it is constantly running at 50+% CPU and using a lot of memory. It seems unable to keep up with the view rebuilds on top of the incoming queries, fetches, and saves. Since we started doing the nightly maintenance described above, we haven't had any of these issues.

Unfortunately, even the smaller Raptor servers seem to get slower and use more and more system resources after running for a while, maybe a week or two. Again, they seem to get stuck constantly rebuilding views. For these, a simple nightly or weekly restart of the Raptor process brings them right back to normal performance. In most cases, after stopping the Raptor service we also delete the views folder before restarting it. This seems to help as well, and clears up some of the view corruption issues that crop up once in a while (e.g. a query showing multiple versions of the same document).

We understand that the "Append Only" nature is what makes RaptorDB as fast as it is. Fetching and saving the complex objects is so fast that it beats getting the same data from SQL by a factor of 20 or more.

We know that querying is faster than fetching, but in this case, because of the complexity of the nested objects, querying our defined view schemas at each level becomes inefficient and just fetching the whole object is much quicker. That said, these data objects are not normal-sized objects. We have other applications that store more basic objects, and for those querying works much better.

All in all, I really like Raptor and have been using it since v1.8, mostly with small apps as an alternative to SQL. We appreciate your dedication to the software.
Jan 9, 2015 at 5:10 PM
I have had other people use RaptorDB to store the same entity many times and run into the same problem of growing storage size. I may change things to handle this scenario (it will take some time to come up with a solution).

You should not be seeing view rebuilds unless there are unclean shutdowns or you change the view version numbers; check the logs to see if this is the case.

Try tweaking the Global.FreeMemoryTimerSeconds to control the GC cycle.

What is the update frequency and size of your objects?
What is the size of your mgdat files and your views?
Jan 9, 2015 at 5:36 PM
View "rebuilds" was probably the wrong term. I think it's the view "re-indexing" that is getting hung up. We know that views get corrupted if the Raptor process isn't shut down properly, and that they then need to be "rebuilt" with a new view/schema version.

We will try changing the FreeMemoryTimerSeconds value.

The update frequency varies, but it averages around 15 saves an hour with objects of about 6 to 10 MB.

The mgdat file is around 2 GB and the view files are only a couple of MB.
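As a rough back-of-the-envelope check, assuming the 15 saves/hour average holds around the clock and every save appends a full copy of the object (the append-only behaviour described earlier in the thread), the daily growth of the data file would be roughly:

```latex
\begin{aligned}
15\ \tfrac{\text{saves}}{\text{h}} \times 24\ \tfrac{\text{h}}{\text{day}} &= 360\ \tfrac{\text{saves}}{\text{day}} \\
360 \times 6\ \text{MB} &= 2160\ \text{MB} \approx 2.2\ \text{GB/day} \\
360 \times 10\ \text{MB} &= 3600\ \text{MB} \approx 3.6\ \text{GB/day}
\end{aligned}
```

That is the same order of magnitude as the ~2 GB mgdat size, which fits the picture of repeated saves of large objects driving the file growth.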
Jan 10, 2015 at 6:41 AM
Would something like the following work for you?
bool SetObjectKV(Guid key, object data);
object GetObjectKV(Guid key);
This would be an optimized key/value store without history, which overwrites the old values, intended for high-frequency update items.
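To show the difference in storage behaviour, here is a hypothetical in-memory comparison between an append-only store, where every save adds a new version, and an overwrite store shaped like the proposed `SetObjectKV`/`GetObjectKV`. This is only an illustration, not RaptorDB's implementation; the class names are made up, and `byte[]` is used instead of `object` purely so the stored size can be counted.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not RaptorDB code.
// AppendOnlyStore keeps every saved version, so storage grows with each save.
public sealed class AppendOnlyStore
{
    private readonly List<KeyValuePair<Guid, byte[]>> _log = new List<KeyValuePair<Guid, byte[]>>();

    public void Save(Guid key, byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        _log.Add(new KeyValuePair<Guid, byte[]>(key, data));
    }

    // The latest version wins: scan the log from the end.
    public byte[] Fetch(Guid key)
    {
        for (int i = _log.Count - 1; i >= 0; i--)
            if (_log[i].Key == key) return _log[i].Value;
        return null;
    }

    public long StoredBytes
    {
        get { long total = 0; foreach (var e in _log) total += e.Value.Length; return total; }
    }
}

// OverwriteKVStore keeps only the latest value per key (no history),
// mirroring the proposed SetObjectKV / GetObjectKV shape.
public sealed class OverwriteKVStore
{
    private readonly Dictionary<Guid, byte[]> _items = new Dictionary<Guid, byte[]>();

    public bool SetObjectKV(Guid key, byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        _items[key] = data; // the old value is replaced
        return true;
    }

    public byte[] GetObjectKV(Guid key) => _items.TryGetValue(key, out var data) ? data : null;

    public long StoredBytes
    {
        get { long total = 0; foreach (var v in _items.Values) total += v.Length; return total; }
    }
}
```

Saving the same 1,000-byte object 15 times leaves 15,000 bytes in `AppendOnlyStore` but only 1,000 bytes in `OverwriteKVStore`. The trade-off is that the overwrite store keeps no history.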

PS: View corruption is something I want to fix as soon as possible, so any details would be appreciated (from v3.1.5 the engine waits for rebuilds to finish before shutting down, which should help).
Jan 12, 2015 at 3:21 PM
The SetObjectKV and GetObjectKV methods would be great in situations where an object's historical data is not needed, which is the case for us in most of our RaptorDB implementations.

We haven't had a chance to upgrade to v3.1.5 (now v3.1.6) yet but will be doing so soon. Rebuilds not finishing before shutdown in previous versions could be the source of our problems with views.

Our main concern is that over time the more heavily used RaptorDB server processes become increasingly slow (hogging CPU) and need to be restarted periodically. Updating large objects many times is probably the cause. Using SetObjectKV and GetObjectKV instead of Save might stop this from happening.

The problem doesn't affect all of our RaptorDB implementations; for instance, we have a lightly used server that has been running for months now and is still perfectly stable.
Jan 12, 2015 at 3:28 PM
The CPU hogging probably has to do with large object heap fragmentation and the GC being triggered by RaptorDB's Global.FreeMemoryTimerSeconds timer; try increasing the timer to 30*60 seconds (30 minutes).
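For reference, the suggested setting as a C# fragment. It assumes `Global` is RaptorDB's static settings class in the `RaptorDB` namespace, that `FreeMemoryTimerSeconds` is a settable integer field, and that it is assigned at startup before the engine is opened; check the `Global` class in your RaptorDB version and how your server host loads it.

```csharp
// Assumption: RaptorDB.Global exposes FreeMemoryTimerSeconds as a static, settable field.
// Fire the free-memory (GC) timer every 30 minutes instead of more often.
RaptorDB.Global.FreeMemoryTimerSeconds = 30 * 60; // 1800 seconds
```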

I'm chuffed that you are using RaptorDB in production like this! Might I ask what kind of application you have?
Jan 12, 2015 at 4:08 PM
OK, we will try increasing the GC timer. When you first mentioned "tweaking" the value, I was thinking of decreasing it, but increasing it makes much more sense: it's running too often.

We started using it for a few test projects at first but moved to a production setting this past summer. We use it in two major in-house applications. The first uses it as storage for metadata associated with our CAD data and for logging changes to the CAD models themselves (PDM-type functionality). The second uses it as the store for a dashboard-type application that pulls data from multiple SQL databases into a single object store (which we keep in RaptorDB).

We also use it for a few other small apps, for instance a .NET MVC web app. We have found that RaptorDB fits perfectly with the MVC pattern: the controller code can return the RaptorDB query results straight to the MVC view, which makes for nice thin controller methods!
Jan 24, 2015 at 7:03 AM
Hi Jason,

Check out v3.2.0 with the new high-frequency update storage file; I'm working on the docs for this version in the meantime...
Jan 26, 2015 at 2:48 PM
Thanks for the update.
Jan 30, 2015 at 6:51 AM
How is the new version?

Is it working as expected for you?
Jan 30, 2015 at 2:20 PM
We haven't had a chance to implement the new version yet but we will be working on it soon. We will let you know how it works when we get it running.
Feb 24, 2015 at 10:12 AM
Hi Jason,

Any news about testing the new version?
Feb 24, 2015 at 2:21 PM
We are using the new version and it seems to be more stable, with no issues so far with memory leaks or high processor use. We haven't implemented the new high-frequency storage functions yet, but plan on doing so in the future.

We are still having the issue of queries returning multiple versions (including deleted versions) of the same data, which seems to happen after the RaptorDB server has been running for several days. As before, when we stop the server, delete the views folders, and restart, the queries return correct results again (only the latest version of the data).
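To pin down the expected behaviour (one row per document, the latest version only, and nothing for documents whose latest version is a delete), here is a minimal client-side sketch. The `DocRow` fields and the `LatestOnly.Filter` helper are hypothetical; this is not how RaptorDB's views work internally and is not a substitute for the engine fix.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical row shape, not RaptorDB's view row type.
public sealed class DocRow
{
    public Guid DocId { get; }
    public int Version { get; }
    public bool Deleted { get; }

    public DocRow(Guid docId, int version, bool deleted)
    {
        DocId = docId;
        Version = version;
        Deleted = deleted;
    }
}

public static class LatestOnly
{
    // Keeps the highest-version row per document, then drops documents
    // whose latest version is marked as deleted.
    public static List<DocRow> Filter(IEnumerable<DocRow> rows)
    {
        var latest = new Dictionary<Guid, DocRow>();
        foreach (var r in rows)
        {
            if (!latest.TryGetValue(r.DocId, out var current) || r.Version > current.Version)
                latest[r.DocId] = r;
        }

        var result = new List<DocRow>();
        foreach (var r in latest.Values)
            if (!r.Deleted) result.Add(r);
        return result;
    }
}
```

For example, given versions 1 and 2 of document A, and versions 1 and 2 of document B where B's version 2 is a delete, the filter returns only A's version 2.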
Feb 24, 2015 at 2:40 PM
Thanks, great to hear!

I'm investigating the query problem since someone else is also experiencing a similar issue.
Feb 27, 2015 at 11:10 AM
Check out v3.2.6.
Feb 27, 2015 at 5:23 PM
The duplicates problem seems to be fixed in v3.2.7
Feb 27, 2015 at 5:32 PM
Excellent, we'll give it a go. Thanks for your work and dedication to the software.
Mar 3, 2015 at 2:16 PM
Finally! I squashed the duplicates bug in v3.2.8; check it out.
Mar 3, 2015 at 2:30 PM
That's great news, thanks again.