Q-gram Based Encrypted Codeword Dictionary for Fast Searches Over a Large Collection of Encrypted Unstructured Documents
Abstract
With the advent of cloud computing, many businesses prefer to store their unstructured documents in the cloud, and for security they prefer to store those documents in encrypted form. In most of these cases, a main requirement is to support fast searches without any form of decryption. It is thus important to develop methods and architectures that can perform fast searches without compromising security and return ranked results for a client query. Our technique uses an enhanced version of a symmetric encryption algorithm for unstructured documents, develops a novel secure, searchable, hierarchical in-memory indexing scheme for each encrypted document using multiple Bloom filters, and constructs a dictionary over a large collection of encrypted unstructured documents. The paper also proposes a dynamic index construction method, based on the hierarchical in-memory index, to perform fast, parallel ranked searches over such a collection. To the best of our knowledge, this is a novel contribution: a methodology for constructing a dictionary from a hierarchical in-memory index in order to perform fast, parallel ranked searches over a large collection of encrypted unstructured documents. We introduce the concept of the Q-gram for building the encrypted searchable index, and we provide multiple Bloom filters for a given encrypted unstructured document or chunk, using a separate Bloom filter for each set of bytes. Our proposed construction enables fast ranked searches over encrypted unstructured documents. A detailed study of 44 billion codewords, carried out on off-the-shelf servers, demonstrates the effectiveness of the Layer Indexing method.
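To make the Q-gram and Bloom filter idea concrete, the following is a minimal sketch and not the construction from the paper. It assumes overlapping q-grams of plaintext words, a keyed hash (HMAC-SHA256) standing in for the encrypted codeword generation, and one Bloom filter per document or chunk. All names (`qgrams`, `BloomFilter`, `build_index`, `score`, `rank`) and parameters (`num_bits`, `num_hashes`) are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac


def qgrams(word: str, q: int = 2) -> list[str]:
    """Return the overlapping q-grams of a word (the word itself if shorter than q)."""
    if len(word) < q:
        return [word]
    return [word[i:i + q] for i in range(len(word) - q + 1)]


class BloomFilter:
    """Bloom filter whose bit positions come from a keyed hash (assumption: HMAC-SHA256)."""

    def __init__(self, key: bytes, num_bits: int = 1024, num_hashes: int = 3):
        self.key = key
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: str):
        # Derive num_hashes positions by prefixing the item with a counter.
        for i in range(self.num_hashes):
            digest = hmac.new(self.key, f"{i}:{item}".encode(), hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_index(words: list[str], key: bytes, q: int = 2) -> BloomFilter:
    """Index one document (or chunk) by inserting the q-grams of each word."""
    bloom = BloomFilter(key)
    for word in words:
        for gram in qgrams(word.lower(), q):
            bloom.add(gram)
    return bloom


def score(bloom: BloomFilter, query: str, q: int = 2) -> float:
    """Fraction of the query's q-grams that the filter reports as present."""
    grams = qgrams(query.lower(), q)
    return sum(bloom.might_contain(g) for g in grams) / len(grams)


def rank(indexes: dict[str, BloomFilter], query: str, q: int = 2) -> list[tuple[str, float]]:
    """Order documents by descending q-gram match score."""
    scored = [(doc_id, score(bloom, query, q)) for doc_id, bloom in indexes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch the server would hold only the filters, and a client holding `key` could compute the bit positions for a query. Because each filter is independent, the per-document scoring inside `rank` could be run in parallel, which is the property the abstract relies on for fast ranked searches; the hierarchical layering and dictionary construction described in the paper are not modeled here.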