Null Pointer exception

Mar 22, 2016 at 3:39 PM
Hi,

I keep getting this exception. It is not consistent: it appears in only about 1 out of 10 runs, and not while indexing the same document each time.

The location of the exception is in RaptorDB\Helper\WAHBitarray2.cs file, in method:
        private void Resize(int index)
        {
            if (_state == TYPE.Indexes)
                return;
            int c = index >> 5;
            c++;
            if(_uncompressed == null)
            {
                _uncompressed = new uint[c];
                return;
            }
            if (c > _uncompressed.Length)
            {
                uint[] ar = new uint[c];
                _uncompressed.CopyTo(ar, 0);
                _uncompressed = ar;
            }
        }
on the line with _uncompressed.CopyTo. I checked, and the _uncompressed buffer is null (even though it is verified earlier in the method).
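A field that passes a null check and is null a few lines later is what you would expect if another thread sets it to null between the check and the copy. The following is only a minimal sketch of that idea, assuming a hypothetical GuardedBuffer class with a hypothetical Release method. It is not RaptorDB's actual code. The point is that the check, the copy, and any code that clears the buffer all take the same lock.

```csharp
using System;

// Hypothetical stand-in for a bit array whose buffer can be released by another thread.
// This is an illustration only, not RaptorDB's implementation.
public class GuardedBuffer
{
    private readonly object _lock = new object();
    private uint[] _uncompressed;

    // Grow the buffer so that bit `index` fits.
    // The null check and the copy happen under one lock, so the field
    // cannot change between them.
    public void Resize(int index)
    {
        int c = (index >> 5) + 1; // number of 32-bit words needed
        lock (_lock)
        {
            if (_uncompressed == null)
            {
                _uncompressed = new uint[c];
                return;
            }
            if (c > _uncompressed.Length)
            {
                uint[] ar = new uint[c];
                _uncompressed.CopyTo(ar, 0);
                _uncompressed = ar;
            }
        }
    }

    // Release the buffer. It takes the same lock, so it cannot run
    // between the check and the copy in Resize.
    public void Release()
    {
        lock (_lock)
        {
            _uncompressed = null;
        }
    }

    // Current size in 32-bit words (0 when the buffer is released).
    public int WordCount
    {
        get { lock (_lock) { return _uncompressed == null ? 0 : _uncompressed.Length; } }
    }
}
```

A lock only helps if every method that reads or writes the field uses it. A single unlocked writer is enough to bring the exception back.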

The call stack is:

>   RaptorDB.dll!RaptorDB.WAHBitArray.Resize(int index) Line 445    C#
    RaptorDB.dll!RaptorDB.WAHBitArray.Set(int index, bool val) Line 112 C#
    RaptorDB.dll!RaptorDB.BitmapIndex.SetDuplicate(int bitmaprecno, int record) Line 110    C#
    RaptorDB.dll!RaptorDB.IndexFile<short>.SetBitmapDuplicate(int bitmaprec, int rec) Line 103  C#
    RaptorDB.dll!RaptorDB.MGIndex<short>.Set(short key, int val) Line 161   C#
    RaptorDB.dll!RaptorDB.TypeIndexes<short>.Set(object key, int recnum) Line 22    C#
    RaptorDB.dll!RaptorDB.Views.ViewHandler.IndexRow(System.Guid docid, object[] row, int rownum) Line 1006 C#
    RaptorDB.dll!RaptorDB.Views.ViewHandler.InsertRowsWithIndexUpdate(System.Guid guid, System.Collections.Generic.List<object[]> rows) Line 953    C#
    RaptorDB.dll!RaptorDB.Views.ViewHandler.SaveAndIndex(System.Collections.Generic.Dictionary<System.Guid,System.Collections.Generic.List<object[]>> rows) Line 234    C#
    RaptorDB.dll!RaptorDB.Views.ViewHandler.Insert<CWK.Advanced.Transport.Index.IndexedData>(System.Guid guid, CWK.Advanced.Transport.Index.IndexedData doc) Line 230   C#
    RaptorDB.dll!RaptorDB.Views.ViewManager.Insert<CWK.Advanced.Transport.Index.IndexedData>(string viewname, System.Guid docid, CWK.Advanced.Transport.Index.IndexedData data) Line 78 C#
    RaptorDB.dll!RaptorDB.RaptorDB.SaveInPrimaryView<CWK.Advanced.Transport.Index.IndexedData>(string viewname, System.Guid docid, CWK.Advanced.Transport.Index.IndexedData data) Line 891  C#
    RaptorDB.dll!RaptorDB.RaptorDB.Save<CWK.Advanced.Transport.Index.IndexedData>(System.Guid docid, CWK.Advanced.Transport.Index.IndexedData data) Line 191    C#

It looks to me like this might be some kind of race condition ... any idea of a fast-fix?

Thanks,
Dragos.
Coordinator
Mar 22, 2016 at 3:47 PM
Did it happen in previous versions?
Coordinator
Mar 22, 2016 at 3:58 PM
Try putting a lock around Length set (that's the only place that doesn't have one!).
Mar 22, 2016 at 8:43 PM
In fact, I was using an older version. I upgraded, but now I ran into a new problem ... This is the data I am indexing (it used to work fine in older version):
    public class IndexData
    {
        #region ctor

        public IndexData()
        {
        }

        #endregion

        #region public properties

        public Guid ID;
        public int IdData;
        public int IdToken;
        public string LngCode;
        
        #endregion
    }

    public class IndexDataSchema : RDBSchema
    {
        public int IdData;
        public int IdToken;
        public string LngCode;
    }

    [RegisterView]
    public class IndexDataView : RaptorDB.View<IndexData>
    {
        #region internal structures

        #endregion

        #region constructor

        public IndexDataView()
        {
            this.Name = "IndexDataView";
            this.Description = "A storage for elements";
            this.isPrimaryList = true;
            this.BackgroundIndexing = Properties.Settings.Default.RAPTORDB_BACKGROUND_INDEXING;
            this.ConsistentSaveToThisView = true;
            this.Schema = typeof(IndexDataSchema);
            this.TransactionMode = false;
            this.Version = 1;
            this.DeleteBeforeInsert = false;

            this.Mapper = (api, docid, doc) =>
            {
                api.EmitObject(docid, doc);
            };
        }

        #endregion
    }
The indexing code runs smoothly and finishes fast. Then, when I query for something like:
Result<IndexDataSchema> resWA = this.raptor.Query<IndexDataSchema>(x => x.IdData == someOtherIntVariable);
It crashes with a cast exception in RaptorDB\Views\ViewHandler.cs, in OutputRow, on _rowfiller:
        private bool OutputRow<T>(List<T> rows, int i)
        {
            byte[] b = _viewData.ViewReadRawBytes(i);
            if (b != null)
            {
                object o = FastCreateObject(_view.Schema);
                object[] data = (object[])fastBinaryJSON.BJSON.ToObject(b);

                try
                {
                    rows.Add((T)_rowfiller(o, data));
                }
                catch (Exception ice)
                {
                    throw;
                }

                return true;
            }
            return false;
        }
I checked the data array and it seems to be OK, except that instead of 4 elements (the Schema object has 3 properties?!) it has 5: the guid as ID, the 3 properties, and a final element that is NULL. The passed type is IndexDataSchema. I am not playing with references (the indexed object is always a new instance), and to exclude corruption while indexing, I now guarantee that each document receives a new Guid ID. Any idea where I should look to fix this? It would be a pity to roll back to the older version.
Coordinator
Mar 23, 2016 at 6:05 AM
Could you test from scratch with no prior data?
Mar 23, 2016 at 8:27 AM
The data was fresh: created with the new version, read with the new version. I suppose it was because I did not adapt the views code, which was still written in the old style. Anyway, I rolled back and am using the old version; I just added the lock as you suggested. Seems to be fine. I'll try to upgrade the code later, when I have a little more time.
Coordinator
Mar 23, 2016 at 2:13 PM
The problem may be in ExtractRow(), let me know as soon as you can.
Mar 25, 2016 at 1:02 PM
The race condition is still there. I ran the code on a faster machine and it happened again. It was not at the same point, so the input data should not be the culprit. There are locks on WAHBitArray.Set and on the WAHBitArray.Length getter and setter (just to make sure). I compared the latest version of WAHBitArray with the version I have (yes, my build is a little old), but they're identical. Any hints/ideas?
Coordinator
Mar 25, 2016 at 1:53 PM
Try putting a breakpoint on ViewHandler.ExtractRow(). It should extract 3 properties, and later, in InsertRowsWithIndexUpdate(), a guid for the doc should be added to make 4.
Mar 25, 2016 at 1:57 PM
IMO, FreeMemory should be also encapsulated within the locks. Inside the _state check - not sure if this is the correct approach, but I did it like this:
        public void FreeMemory()
        {
            if (_state == TYPE.Bitarray)
            {
                lock (_lock)
                {
                    if (_uncompressed != null)
                    {
                        Compress(_uncompressed);
                        _uncompressed = null;
                        _state = TYPE.WAH;
                    }
                }
            }
        }
The only thing is ... now it's slow, even slower than a drunk snail ...
Coordinator
Mar 25, 2016 at 2:03 PM
Does this fix your problem?
Mar 25, 2016 at 5:11 PM
Yes, there is no longer any NullPointerException. But before, I had around 3M documents to index and it took around 15 minutes. Now it has been running for 2 hours and is only about half done.
Coordinator
Mar 25, 2016 at 5:20 PM
The extra property in the row was troubling for me, has that gone?

If the lock on the FreeMemory() fixed things I will try to make it faster.

Let me know.
Mar 25, 2016 at 5:26 PM
I put a breakpoint in ExtractRows on int colcount = _schema.Columns.Count; and the value is 4. It has all the fields defined in the class, including docid. In InsertRowsWithIndexUpdate, the rows variable contains 4 elements, all with values except the last one (docid), which is null. I assume this is kind of expected, as the ID is passed in the other parameter of InsertRowsWithIndexUpdate. Does this help you?
Coordinator
Mar 25, 2016 at 5:38 PM
Edited Mar 25, 2016 at 5:39 PM
Thanks.

Try commenting out the bmp.FreeMemory(); line in BitmapIndex.Commit() and see if you get better performance and also check the memory usage difference. (on the current build version)
Mar 26, 2016 at 5:07 PM
Edited Mar 26, 2016 at 5:11 PM
I think I might have found how to avoid this NullPointerException ... I don't really know why, but this is what I did and now it's gone.

First - the project I am working on is composed out of 2 doc-repositories:
  • the first one has around 5M records and is created once per month and then only read;
  • the second one has around 1-2M records, is created per run - in a similar regime: insert all at once, then random read - and then dropped;
I noticed that in time, if I "push" the reads with 3-4 parallel threads, the memory starts to grow, ending up with more than 10-15GB. Therefore, I thought it might be a good idea to tweak the settings for the first repo with these values:
  • FreeBitmapMemoryOnSave - true;
  • SaveIndexToDiskTimerSeconds - 10;
  • FlushStorageFileImmediately - true;
  • TaskCleanupTimerSeconds - 1;
  • SplitStorageFilesMegaBytes - 1024;
  • FreeMemoryTimerSeconds - 512;
  • MemoryLimit - 2048;
  • CompressDocumentOverKiloBytes - 512;
I made those changes after I created the first repository.

After a while, I finished the work on the second repo and started to test it. I didn't realize that those statics were "reset" after the "first" read and somehow carried over to the second repository. When I realized this, I cleaned up everything ... and ... it behaved nicely again!

I don't really understand what I did wrong here, but these settings caused a lot of small files to be created on disk (which explains the slow behavior). What is not clear to me is why I get the NullPointerException with these values, while there is no crash with the default values.
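For anyone else tripping over this: static settings in C# are shared by the whole process, so a value changed for one repository is also seen by every other repository in the same process. A minimal sketch of that behavior, using hypothetical EngineSettings and Repository classes (the names and the default of 60 are made up for illustration and are not RaptorDB's API):

```csharp
using System;

// Hypothetical settings holder; it only illustrates how static fields behave.
public static class EngineSettings
{
    // Assumed default, chosen for illustration only.
    public static int SaveIndexToDiskTimerSeconds = 60;
}

// Hypothetical repository that reads its interval from the static settings.
public class Repository
{
    public string Name { get; }

    public Repository(string name)
    {
        Name = name;
    }

    // Every instance reads the same process-wide value.
    public int SaveInterval
    {
        get { return EngineSettings.SaveIndexToDiskTimerSeconds; }
    }
}

public static class SettingsDemo
{
    // Changes the setting "for" the first repository and returns
    // what the second repository sees afterwards.
    public static int TweakFirstAndReadSecond()
    {
        var first = new Repository("first");
        var second = new Repository("second");
        EngineSettings.SaveIndexToDiskTimerSeconds = 10; // intended only for `first`
        return second.SaveInterval;                       // `second` sees the change too
    }
}
```

If two repositories need different values, the setting has to live per instance (or the repositories have to run in separate processes), otherwise the last write wins for both.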

Anyway, any idea on how to tweak the settings in a safe manner to increase the read speed (and memory)?

Thanks,
Dragos.
Coordinator
Mar 26, 2016 at 7:06 PM
Hmmm, you are really pounding the engine :)

I am finalizing a major update (added a Web Studio UI), and will post soon.

After that I will take a serious look at memory usage in the bit array as that is the major hog in the engine.
Coordinator
May 20, 2016 at 6:36 AM
Check out v3.3.4 memory usage