Why propose this method?

On June 2, 2015 I attended a meeting of Seattle citizens concerned with privacy and individual-rights issues. One issue raised prominently was that a citizen who applies to a government agency for assistance of any sort is required to submit large volumes of personal information to that agency.

There are several problems with this situation:

  • Not all agencies have a published privacy policy, and even those that do generally give the client no way to verify that the information is handled responsibly; that is, there is no way to verify a chain of custody.

  • Agencies frequently require 're-certification', which obliges the client to submit all the same information again. This wastes both government workers' paid time (which feeds into the political problem of 'waste, fraud and abuse' in government agencies) and citizens' time. The latter is an unaccounted externality, part of a large category of governmental costs that are borne by citizens and therefore go untracked. This is a waste problem of unknown, but certainly very large, magnitude.

  • When a citizen needs help from more than one agency, each agency requires more redundant paperwork rather than sharing information in a responsible fashion.

  • Since clients speak many different languages, internationalization of forms and storage methods is necessary. When each agency generates its own forms, these efforts are duplicated.

The proposed method offers these advantages:

  • Stored information is readable by no one but the client: compromise of the database does not automatically lead to compromise of the data in it.

  • To access the information, each agency must obtain permission (in the form of a digital signature) from the client, who owns the information.

  • When information is released to an agency, a record is created. If information is later found to have leaked, there is a 'chain of custody' pointing to the parties that accessed the information, providing a starting point for investigation.

The method

The method uses the following techniques:

  • Public-key encryption

  • A data store capable of holding records serializable to lists of key-value pairs. It may be centralized under the control of a responsible curator, or decentralized in the form of (for example) a publicly distributed file in any data-serialization format. The data store must be accessible to all parties involved in the data-sharing program.

  • An agent for clients' use. The clients' agent may take the form of a software program running on a computer owned by the client, or of a Web-based or other remote program running on a computer owned by a responsible party. The only secret the client must keep is a long and memorable passphrase, known to no one else. If the client forgets the passphrase, the client's data will have to be re-entered, as it cannot be recovered.

Disambiguation Warning

In this article, the word "key" is used in two different contexts. This is an unfortunate consequence of a namespace collision in the common usage. A "cryptographic key", "public key" and "private key" are cryptographic entities. A "key" in a "key-value pair" is a tag which refers to a data value.

Initial Setup

Keys and Identity Management

Each requesting agency needs at least one cryptographic key-pair associated with it; for finer-grained control individual requesting agents could have one or more key-pairs for different uses.

In order to participate in the method, each party must possess a key-pair. Requesting-agency keys must be signed by a party trusted by all parties to each information-sharing transaction. In practice it will be sufficient for an authorized representative of a requesting agency to assert, in person, the agency's ownership of its private key, so that the client can sign the agency's key. The actual signing can take place later, provided the client has a copy of the necessary fingerprint, but for simplicity's sake the assertion and the signing should take place concurrently. Signing the agency's key means the client trusts that the key belongs to the agency. Those who don't trust this assertion should not use this method.
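Because the signing may happen later, the client's agent needs some way to check that the fingerprint the agency's representative presented in person matches the key actually downloaded from the key-server. The Python below is a minimal sketch under stated assumptions, not part of the proposal: the function names are hypothetical, fingerprints are assumed to be transcribed as hexadecimal with optional spaces or colons, and only the 40-digit length of OpenPGP v4 fingerprints or the 64-digit length of newer key versions is accepted. The signing itself would still be done by GnuPG.

```python
import re

def normalize_fingerprint(text: str) -> str:
    """Strip spaces and colons from a transcribed fingerprint and upper-case it."""
    fpr = re.sub(r"[\s:]", "", text).upper()
    # 40 hex digits: OpenPGP v4; 64 hex digits: newer key versions.
    if len(fpr) not in (40, 64) or re.fullmatch(r"[0-9A-F]+", fpr) is None:
        raise ValueError("not a 40- or 64-digit hexadecimal fingerprint")
    return fpr

def fingerprints_match(asserted: str, downloaded: str) -> bool:
    """Compare the fingerprint presented in person with the downloaded key's."""
    return normalize_fingerprint(asserted) == normalize_fingerprint(downloaded)

def display(fpr: str) -> str:
    """Format a fingerprint in groups of four for reading aloud or printing."""
    fpr = normalize_fingerprint(fpr)
    return " ".join(fpr[i:i + 4] for i in range(0, len(fpr), 4))
```

The grouped form produced by `display` is meant for something like a printed card handed over at the in-person meeting, which the client can compare against the downloaded key at leisure.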

Data Store

The data store contains a standard set of key-value pairs such that the key contents are readable but the value contents are encrypted. For example:

Client Name #VN8gq^V67
Address g~VWB,*8
City #?'2,@4)R

To avoid information leakage, no two values may be identical. For empty fields this may be achieved by, for example, appending a random, nonconsecutive integer to each NULL value before encrypting it. After encryption no value should be identifiable as to type or contents. Every record must contain exactly the same set of keys.
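A client's agent could apply this padding rule before encryption roughly as follows. This is a sketch, assuming a hypothetical standard key set (the three keys above plus a telephone field) and using Python's `secrets` module for the random integers; the real key set would have to be agreed on by the participating agencies.

```python
import secrets
from typing import Optional

# Hypothetical standard key set; every record must contain exactly these keys.
STANDARD_KEYS = ("Client Name", "Address", "City", "Telephone")

def pad_nulls(record: dict[str, Optional[str]]) -> dict[str, str]:
    """Return a copy of `record` with every standard key present, replacing
    each missing or None value with a unique 'NULL{integer}' marker."""
    unknown = set(record) - set(STANDARD_KEYS)
    if unknown:
        raise ValueError(f"non-standard keys: {sorted(unknown)}")
    used: set[int] = set()
    padded: dict[str, str] = {}
    for key in STANDARD_KEYS:
        value = record.get(key)
        if value is None:
            # Draw a random 64-bit integer, redrawing on the (very unlikely)
            # collision so that no two markers in the record are identical.
            n = secrets.randbits(64)
            while n in used:
                n = secrets.randbits(64)
            used.add(n)
            value = f"NULL{n}"
        padded[key] = value
    return padded
```

Random 64-bit integers are nonconsecutive in practice, and the loop guarantees uniqueness within a record. The integers vary in length, so ciphertext length can still hint at a value's size; this sketch does not address that.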


Data entry

  1. If the client does not already have a cryptographic key-pair of sufficient strength signed by the requesting agency:

    1. If the client does not possess a key-pair, the client's agent generates one and posts the public key to a key-server. The client's agent stores the private key encrypted symmetrically against a long and memorable passphrase known only to the client. With full Unicode support, the client may compose the passphrase in any script Unicode can represent. The data-entry front-end program should provide a Character Map style interface for this purpose.

    2. If the client possesses a key not yet signed by the requesting agency, the agency downloads the public key, the client proves ownership of the key and their personal identity, and the agency signs the key and uploads the resulting signature.

    3. If the client possesses a key signed by the requesting agency, then proceed to step 2.

  2. The public-key fingerprint serves as the unique identifier of the client's record in the data store (though not as an index field: fingerprints are long and therefore slow to sort). If a record with this identifier already exists, data is entered into that record; otherwise a new record is created.

  3. The client may now enter values to any key in the record identified by the client's public key.

  4. The form processor encrypts the data against the client's public key and transmits it into the database.
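Because the passphrase may be written in any script, the same visible text can reach the agent as different code-point sequences (for example, "é" as one precomposed character or as "e" plus a combining accent). Below is a minimal sketch of how a client's agent might derive its symmetric key-encryption key, assuming NFC normalization and PBKDF2-HMAC-SHA256 with a random salt; the function names and iteration count are illustrative choices, not part of the proposal. If GnuPG holds the private key, it applies its own OpenPGP string-to-key (S2K) derivation instead.

```python
import hashlib
import secrets
import unicodedata

def new_salt() -> bytes:
    """Random per-client salt, stored alongside the encrypted private key."""
    return secrets.token_bytes(16)

def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte symmetric key-encryption key from a Unicode passphrase."""
    # NFC makes precomposed and combining-character spellings of the same
    # visible text identical before hashing.
    normalized = unicodedata.normalize("NFC", passphrase)
    return hashlib.pbkdf2_hmac(
        "sha256", normalized.encode("utf-8"), salt, iterations, dklen=32
    )
```

The salt need not be secret. NFC leaves compatibility variants such as full-width Latin letters distinct; NFKC would merge them, at the cost of altering more of what the client typed.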

Data request

  1. The agency requests a list of keys from the data store, and chooses from this list the keys needed.

  2. The system retrieves the selected key-value pairs, whose values remain encrypted against the client's public key, and encrypts the resulting report against the agency's public key. If the data store is managed by a third party, the report is signed before encryption.

  3. The system sends the resulting ciphertext to the agency for review of its request.

  4. The agency decrypts the message, verifying the data store's signature if applicable. The result is a cleartext list of key names with their still-encrypted values.

  5. The agency signs this result with its private key, encrypts it against the client's public key, and sends the resulting ciphertext to the client's agent.

  6. The client's agent decrypts the message and verifies the agency's signature; if the signature is correct, the client receives the list of requested key names and their associated encrypted values. (At this point the values are encrypted twice, once individually and once as part of the message, but that's OK.)
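The nesting of signatures and encryption layers in these steps is easy to get wrong, so here is a structural model of it in Python. It performs no cryptography at all: the hypothetical `Encrypted` and `Signed` classes merely record who could open, or who produced, each layer, standing in for real OpenPGP operations. It assumes that the agency checks the data store's signature once it has decrypted the report.

```python
from dataclasses import dataclass
from typing import Any

# Structural model only: these classes stand in for OpenPGP operations
# and perform no cryptography.
@dataclass(frozen=True)
class Encrypted:
    recipient: str  # whose public key the payload is encrypted against
    payload: Any

@dataclass(frozen=True)
class Signed:
    signer: str  # whose private key produced the signature
    payload: Any

def decrypt(msg: Any, me: str) -> Any:
    if not isinstance(msg, Encrypted) or msg.recipient != me:
        raise PermissionError(f"{me} cannot decrypt this layer")
    return msg.payload

def verify(msg: Any, expected_signer: str) -> Any:
    if not isinstance(msg, Signed) or msg.signer != expected_signer:
        raise ValueError("missing or unexpected signature")
    return msg.payload

def store_report(record: dict[str, Encrypted], wanted: list[str], third_party: bool) -> Encrypted:
    # Step 2: select the requested pairs (values stay encrypted to the client),
    # sign if a third party runs the store, then encrypt to the agency.
    report: Any = {k: record[k] for k in wanted}
    if third_party:
        report = Signed("store", report)
    return Encrypted("agency", report)

def agency_forward(msg: Encrypted, third_party: bool) -> Encrypted:
    # Step 4: decrypt, checking the store's signature if applicable.
    report = decrypt(msg, "agency")
    if third_party:
        report = verify(report, "store")
    # Step 5: sign with the agency's key, then encrypt to the client.
    return Encrypted("client", Signed("agency", report))

def client_receive(msg: Encrypted) -> dict[str, Encrypted]:
    # Step 6: decrypt the outer layer and verify the agency's signature.
    return verify(decrypt(msg, "client"), "agency")
```

In the model, the values that pass through the agency's hands are still wrapped for the client, and `decrypt(value, "agency")` raises `PermissionError`, which is the point of keeping them encrypted against the client's key throughout.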

Data fulfillment

The requesting agency must contact the client to 'unlock' the information. The request must contain the identifiers for the database keys relevant to the agency's information needs. 

  1. The client reviews the list of requested keys and their associated encrypted values.

  2. The client enters the passphrase, unlocking the private key.

  3. The client's agent decrypts the values using the client's private key.

  4. The client's agent replaces padding values of the form "NULL{integer}" with "NULL" or similar.

  5. The client verifies that the requested information values correctly correspond to the associated keys.

  6. The client's agent encrypts the resulting message with the agency's public key and signs it with the client's private key.

  7. The client's agent sends the resulting message to the agency, retaining a copy of the transmitted message as part of the chain-of-custody record.

  8. If the data store is managed by a third party, the client's agent may also send the data store a copy of the transmitted message, encrypted against the data store's public key and signed with the client's private key. If the data store keeps this copy, it serves as part of the chain-of-custody record: the data store can remove its own layer of encryption, but the message inside can be decrypted only with the agency's private key.
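Steps 4 and 7 could look roughly like this on the client's side. The marker pattern assumes the "NULL" plus ASCII-digits padding described under Data Store. The log format is an assumption of this sketch rather than part of the proposal: each hypothetical entry records the recipient's fingerprint and a SHA-256 digest of exactly what was sent, and includes the previous entry's hash, so altering an earlier entry breaks the chain.

```python
import hashlib
import json
import re
import time

# "NULL" followed by ASCII digits only, as produced by the padding step.
NULL_MARKER = re.compile(r"NULL[0-9]+")

def strip_null_markers(values: dict[str, str]) -> dict[str, str]:
    """Step 4: replace 'NULL{integer}' padding with a plain 'NULL'."""
    return {k: "NULL" if NULL_MARKER.fullmatch(v) else v for k, v in values.items()}

def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def custody_entry(agency_fingerprint: str, message: bytes, prev_hash: str) -> dict:
    """Step 7: a hypothetical hash-chained chain-of-custody log entry."""
    entry = {
        "recipient": agency_fingerprint,
        "sent_at": int(time.time()),
        "message_sha256": hashlib.sha256(message).hexdigest(),
        "prev": prev_hash,
    }
    entry["hash"] = _digest(entry)  # covers every field above
    return entry

def chain_is_intact(log: list[dict]) -> bool:
    """Recompute every entry's hash and check each link to its predecessor."""
    prev = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True
```

A genuine value that happens to match the pattern would also be replaced, so a reserved marker less likely to occur in real data may be preferable. A chain kept only by the client can still be rewritten wholesale; sending the latest hash to the data store along with the step 8 copy would make that detectable.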


This method automates data exchanges that would be familiar to many users of email encryption using PGP. It might very well use SMTP to transport messages, along with existing PGP implementations integrated into current email client programs; a simple plug-in could provide the rest of the functionality that the clients' agent needs. This may not be the optimal implementation, since it imposes the additional learning curve of the email client software, but it could serve as a quick reference implementation for testing.
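For an email-based reference implementation, the encrypted message would travel in the PGP/MIME structure defined by RFC 3156. The sketch below builds that structure with Python's standard `email` package. The armored text is a placeholder for real ciphertext, which would come from GnuPG (for example `gpg --armor --encrypt --recipient` with the agency's key ID), and the function name and addresses are hypothetical.

```python
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

def pgp_mime_message(sender: str, recipient: str, subject: str, armored: str) -> MIMEMultipart:
    """Wrap an ASCII-armored OpenPGP message in the RFC 3156
    multipart/encrypted structure that PGP-aware mail clients understand."""
    outer = MIMEMultipart("encrypted", protocol="application/pgp-encrypted")
    outer["From"] = sender
    outer["To"] = recipient
    outer["Subject"] = subject

    # Part 1: control information, which RFC 3156 requires to be "Version: 1".
    control = MIMEBase("application", "pgp-encrypted")
    control.set_payload("Version: 1\n")
    outer.attach(control)

    # Part 2: the encrypted data itself.
    body = MIMEBase("application", "octet-stream")
    body.set_payload(armored)
    outer.attach(body)
    return outer
```

Only the body is encrypted: the From, To and Subject headers travel in clear, so the subject line should never carry personal information.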

As with all security-sensitive applications, the implementation must be distributed under an appropriate Free Software license such as the GNU GPL, so that interested parties may inspect the implemented code for flaws, submit corrections, and propose improvements. Many Free Software components that could be adapted to implement a scheme like this already exist, for example:

Public Key Encryption

This proposal specifies an implementation of public-key encryption, such as GnuPG, that relies on a Web of Trust for authentication of keys rather than on a centralized PKI such as the one used for HTTPS-enabled websites. This is because the trust relationship between agencies and their clients differs from that between websites and visitors in the following ways.

Websites have no expectation of meeting their visitors in person to verify their identity, and most websites don't particularly care about the identities of arbitrary visitors: all are usually welcome to visit the public-facing parts of the site, and parts meant for viewing only by certain people are protected by other means, such as logins with session tracking. Agencies will generally expect to see their clients in person at least once, and it's not unreasonable to expect a governmental agency, or an agency that trusts the government, to accept a government-issued ID as proof of identity before signing a client's key.

Most websites are willing to trust a centralized authority to authenticate their key to visitors, and browser vendors have been willing to trust those authorities as well. Still, several Certificate Authorities have been compromised, and vulnerabilities have been found in implementations of SSL. If these vulnerabilities can be mitigated and the single point of failure (SPOF) that the CA represents removed, then a level of security more appropriate to the exchange of personal information may be achieved. The Web of Trust is a better solution to this specific problem, even though it has its own drawbacks.

The primary drawback of the Web of Trust in this context is that one or more parties may lose control of their private keys. We can hope that agencies would prove able to manage their keysets properly, since they can employ people to perform this management. Clients won't necessarily have this ability and must manage their keys themselves, which increases the likelihood of losing access to a private key or having it compromised. This may be mitigated by a clients' agent that does not store the private key solely on a local device but instead distributes it in some way. However, this leaves the private key open to brute-force and other attacks unless further steps are taken to secure it, so the value of a distributed key may be outstripped by the cost of maintaining its security. For this reason the clients' agent should store the key locally, using symmetric encryption, and should run in an environment entirely under the client's control.

This would appear to be an obstacle to adoption in a world where not every potential client owns, has access to, or knows how to use his or her own computing device; however, in a world where powerful computers can be had for less than $10, all but the last of these problems are easily solved. A client who is incapable of using a self-owned computing device is likely to be one who requires assistance in dealing with agencies in any case, and hopefully an accommodation involving a trusted human agent can be made in such cases.

Client Agent

As stated above, the clients' agent is responsible for securely storing the client's private key and for presenting the client with a comprehensible user interface. To this end the agent UI must be thoroughly internationalized, so that messages appear in the user's native language and other UI elements are not hard-coded to a particular cultural norm but are configurable per locale. The agent should incorporate a text-entry interface driven by a character map such as gucharmap, so that users may enter characters in their preferred language's glyphs regardless of the type of physical keyboard available (if any).

In order for this internationalization to function, the fields in the Data Store need to be specified properly.

Data Store

The data store may take the form of a centralized database managed by database-management software such as MariaDB, or of a distributed file in a serializable data-representation format such as XML. In the latter case, care must be taken to keep the copies in sync, perhaps using a publicly accessible version-control system such as Git. The entire revision history would then be stored on each server, with changes propagating across all participating servers as they are made. Such a system would constitute a visible part of the chain-of-custody record and would be very robust against data loss or corruption.
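If the data store is an XML file kept in Git, deterministic serialization keeps diffs small: the same records should always produce the same bytes, so a commit shows only what actually changed. Here is a minimal sketch using Python's `xml.etree.ElementTree` (Python 3.9 or later, for `ET.indent`), assuming records are keyed by the client's fingerprint as in Data entry step 2 and that values are already ciphertext; the element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

def records_to_xml(records: dict[str, dict[str, str]]) -> str:
    """Serialize {fingerprint: {key: ciphertext}} with records and keys sorted."""
    root = ET.Element("datastore")
    for fingerprint in sorted(records):
        rec = ET.SubElement(root, "record", fingerprint=fingerprint)
        for key in sorted(records[fingerprint]):
            field = ET.SubElement(rec, "field", key=key)
            field.text = records[fingerprint][key]
    ET.indent(root)  # one element per line, for line-based Git diffs
    return ET.tostring(root, encoding="unicode") + "\n"

def xml_to_records(text: str) -> dict[str, dict[str, str]]:
    """Parse the format written by records_to_xml."""
    root = ET.fromstring(text)
    return {
        rec.get("fingerprint"): {f.get("key"): f.text or "" for f in rec.findall("field")}
        for rec in root.findall("record")
    }
```

Sorting both records and keys is the design choice that matters here; without it, two servers writing the same data in different orders would produce spurious, conflicting commits.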

Regardless of the form it takes, the data store must be properly internationalized, with Unicode support for all fields that may contain values in more than one language. Not every field needs this: telephone numbers, for example, will nearly always be written in Western Arabic numerals (0-9) and can safely be constrained to the appropriate type. Clients' names, by contrast, may contain glyphs from any language used by the agencies' served population. Note that the system as proposed requires the data store to support all languages used by all parties that use the system.
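Since the data store only ever holds ciphertext, type constraints like these would have to be enforced by the client's agent before encryption. Below is a sketch with hypothetical field rules. Note that Python's `str.isdigit()` and the regex class `\d` both accept digits from other scripts, such as Arabic-Indic "٣", so the phone check uses an explicit `[0-9]` class and first converts any Unicode decimal digits the client typed.

```python
import re
import unicodedata

# Hypothetical rule: an optional "+", then a digit followed by at least five
# more ASCII digits, spaces, parentheses or hyphens.
PHONE = re.compile(r"\+?[0-9][0-9 ()-]{5,}")

def to_ascii_digits(value: str) -> str:
    """Map every Unicode decimal digit (e.g. Arabic-Indic) to 0-9."""
    return "".join(
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch for ch in value
    )

def validate_phone(value: str) -> bool:
    """True if `value` is a plausible phone number written with 0-9."""
    return PHONE.fullmatch(value) is not None

def clean_phone(value: str) -> str:
    ascii_value = to_ascii_digits(value.strip())
    if not validate_phone(ascii_value):
        raise ValueError("not a telephone number")
    return ascii_value

def clean_name(value: str) -> str:
    # Names may use any script, so nothing is rejected: the text is only
    # normalized to NFC and trimmed.
    return unicodedata.normalize("NFC", value).strip()
```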

The pitch

I could implement a system of this type myself, if anyone cared enough about it to pay me to do it. I would need about a year and about a million dollars. It's possible that someone more competent than I could implement it cheaper and quicker, but no such person is available as far as I know. Also, I could use the money.

Page generated Oct. 20th, 2017 03:56 pm
Powered by Dreamwidth Studios