2016-10-02 09:59 pm

When you're not yourself, who are you?

There is a level of pain which can turn a person into a dumb, suffering animal. In fact there is more than one such level, approached in different ways. A person can be worn down by pain into a dull, unresponsive lump, or suddenly transformed by pain into a writhing, stiff body which can only make incoherent sounds. A person was there, and then just gone. When and if the pain again recedes, a person may again inhabit that animal, but it might not be the same person.



Maybe you don't believe that this is true in general, or you believe that it can't happen to you. It's difficult for a person, or for a personality, to imagine one's own annihilation. I have experienced this level of suffering at first hand, and have also observed it in others.
It's not a binary phenomenon. Smaller pains can alter one's personality to a smaller extent. If you have ever spent time around someone who suffers chronic pain, or if you have suffered from chronic pain while monitoring your own behavior, you will have noticed personality changes, sometimes drastic ones.

Pain isn't the only thing which can alter one's personality (and by "pain" I mean more than just physical pain: suffering in general is personality-altering). Some personal changes affect the person (and the animal in which the person inheres) positively, and some negatively. Some changes conduce to the development of personality, and some to its disruption.

Your personality, any human personality, isn't real in the way that stones or houses or Anthropogenic Global Warming are real. A personality is real in the same sort of way that Granny Weatherwax is real. Granny Weatherwax is a fictional character in a series of fantasy novels by Terry Pratchett. She appears in many of the stories and is alluded to in many others. She is an established character who, while capable of occasionally surprising the reader, can nonetheless be depended upon to act in certain ways in certain situations.

You are a fictional character in the stories you tell yourself (and others) about the world. Your personality is the character you have chosen to inhabit. It is a work of art informed by itself in that your personality determines your ideas about what your personality should be.

Your personality is also the interface which others use to model your behavior and to determine how they will act toward you. Without personality you are just a dumb animal. The use of language implies personality: an entity without personality has nothing to say. This means that human personalities are necessary for social function. Without personality, animals can't model one another's behavior, and without that modeling there is no basis for interaction. If you can't predict the actions of another then there is no point in risking interaction; no common ground can be found except by accident, and all cases of resource contention can only be resolved by the most naive solution: conflict.

Some social institutions make a practice of 'tearing down' personalities in order to 'build them back up' into a more institutionally-acceptable shape. This is usually accomplished through hazing and indoctrination. This is a form of remedial social engineering in that the personalities being 'broken down' were formed by other social institutions in the first place: by families, neighborhoods, communities and so on. This can lead to difficulties if and when the people who have been 'reformed' by the former institutions return to the latter context. Remedial measures such as this are an indication of poor design, or more probably a lack of design.

The process of 'breaking down' personalities is a dangerous one in cases where the results are unpredictable. An institutional 'reform' process designed to produce tractable subjects of authority can instead produce resentment, rebellion and resistance to that authority. This can produce a wasteful cycle in which the institution escalates the abuse which is meant to produce the intended effect, while instead increasing resistance to itself. Even if the proportion of rebellious elements produced under this escalation is the same as or less than that which occurs prior to escalation, the overall number of rebellious elements increases, making institutional disruption more likely. Inasmuch as social interactions depend on consent and consensus (interactions which don't depend on these things are called "fights"), these cycles erode the legitimacy of the institution within the larger context of society.

The case of unpredictable results from personality breakdown is the common case. There is an entire, burgeoning industry dedicated to repairing the personal and interpersonal breakage caused by such breakdown. If a science of personality existed that was sophisticated enough to engineer the breakdown and rebuilding of personalities, it would not be necessary to do so, since such engineering could be accomplished in situ without the imposition of undue suffering upon the subject.

The aim of institutional hazing and suffering aside, the effect is the destruction of personality. An animal without personality is a thing. Turning people into things is the beginning of evil. Granny would not approve.
2016-07-30 07:31 pm

Talk proposal for Intro to Libreboot

Libreboot is a Free Software BIOS replacement based on coreboot. Coreboot is an Open Source project to replace BIOS. BIOS is the vendor-provided software that provides initial hardware information to the Operating System. BIOS implementations are, with a sole exception, non-Free software. BIOS is available only as a binary blob. Coreboot is Open Source software, but it contains blobs for various hardware. Libreboot respects your freedoms and contains enhancements to make it easier for newcomers to install and use.

Topics


What libreboot is


A free-software version of coreboot. It supports only hardware that can run without blobs. It lives in the boot flash chip, initializes RAM, builds a table called coreboot-table, then launches a payload. The default payload is GRUB2; SeaBIOS and the Linux kernel are also available.

Why you should care about it



Libreboot represents one way to eliminate (or at least minimize) opaque software blobs from your computing. Blobs represent potential security vulnerabilities and other problems.

Relationship to coreboot



Deliberately non-forking. Tracks main branch of coreboot (or some branch). If you want to add hardware support to libreboot, get it into coreboot first and then Free it.
Talk will include a demonstration of installing or upgrading (depending on the state of available target hardware) libreboot using software flashing or an external programmer.
The talk also introduces SILLY, a local project to provide Libre laptops for activists and training in Libreboot and related things.
2016-07-30 04:56 pm

Proposal for Hillary Clinton: a new contract for America?

Hillary Clinton is the Democratic Party's nominee for President. She is running against Donald Trump, who regularly (but not consistently) beats her in the polls. The outcome is uncertain, because many people who are against Trump are also against Hillary. A large subset of these people are supporters of Bernie Sanders' campaign for president, and are strongly coupled with the issues addressed in the Sanders campaign platform as promulgated at https://berniesanders.com/issues/ and the pages linked from that one.

Bernie Sanders and his campaigners have repeatedly stated that the campaign is not about Bernie Sanders but about the issues linked above. Hillary Clinton may or may not support the policies promulgated by the Sanders campaign which address these issues. There is no way for an outside observer, that is to say, anyone other than Hillary Clinton herself, to tell. 

It's reasonable to suppose that, if Hillary Clinton's position on these policies were sufficiently similar to that of the Sanders campaign's supporters, those supporters would then support Hillary Clinton. To do otherwise would undermine the claim that the campaign itself is not about Bernie Sanders but about the relevant issues.

This is a classic game-theoretic scenario. Hillary Clinton has signalled a limited willingness to co-operate with the Left (used herein as a catchall term for those who support the relevant issues as described above) but the Left has no way in which to gauge the probability of defection. Insufficient information exists to determine either the actual extent of the proffered co-operation or the extent to which the proffer is binding. As a result, members of the class I'm calling "the Left" have no reason to co-operate with Ms Clinton by voting for or otherwise supporting her campaign.

In order for "the Left" to have enough information to decide whether to support Ms Clinton's campaign, the campaign could provide an unambiguous and binding signal of co-operation. In general such a signal could take many forms. I propose that of a contract.

The two parties in question, the Clinton campaign and the "Revolution" frequently invoked by the Sanders campaign and for which authoritative members of that campaign may be considered able to speak, could mutually and publicly negotiate a contract specifying policies which the prospective President will support to address the issues linked above, actions to be taken to implement those policies by a specific time, and penalties for failure to perform the contract's terms.

It doesn't appear that people in general are willing to trust either candidate. In the absence of trust, unambiguous and verifiable signals may serve to indicate willingness to co-operate. Hillary Clinton needs the co-operation of as many people as possible in order to win the Presidential race against Trump. A clear set of signals exists that could potentially win the co-operation of a very large number of people. If the Clinton campaign can secure this co-operation then victory will be much more likely than otherwise.
2016-07-30 03:46 pm

HOWTO use the Force

Source Code Literacy for Padawans



Preintroduction



This article is a long-term work in progress; the reader might find it worthwhile to check back occasionally to see if anything useful has been added. I started it as a set of notes for a talk of the same name at SeaGL 2016.

Talks on this topic have been done before, no doubt better. The following list contains at least one such talk, and much further material on the subject.


    Resources
  • https://github.com/aredridel/how-to-read-code/blob/master/how-to-read-code.md

  • Blog post pretty-printed version of above: http://aredridel.dinhe.net/2015/03/29/how-to-read-source-code/

  • https://blog.codinghorror.com/learn-to-read-the-source-luke/

  • https://news.ycombinator.com/item?id=3769446

  • http://www.tutorialspoint.com/developers_best_practices/handy_tools_techniques.htm

  • http://www.gigamonkeys.com/code-reading/

  • http://himmele.blogspot.com/2012/01/how-do-you-read-source-code.html

  • http://matthieu.io/blog/2016/10/23/debugging-101/


    Introduction



    I've been reading source code, or attempting to, for over 20 years. It's hard to get started, and I never found a good guide on how to do it, so I'm going to attempt to write one even though I'm not remotely qualified to do so. I'm writing this in the hope that it may be useful to those who haven't yet learned the tricks I know, and that others with greater knowledge of the subject, or at least additional knowledge besides that which is presented here, will speak up either in the comments here or elsewhere.



    Why is it so hard to read source code?





    Programming is an act of translation. Someone gets an idea in the meat head and then translates it into a language. Typically the idea is translated first into a human-intelligible language, and then into a machine-recognizable one.

    This is an iterative process. An author will write a fragmentary first draft of part of the program, and concurrently revise already-written parts while adding new parts. By 'concurrently' I mean 'within the space of a commit'. A commit is a discrete set of changes pushed to a version-control system. Not every programmer uses a formal VCS, but most do, and if you are reading source code of a project where VCS is not used then you won't have as much to look at.

    Because the writing of source is an iterative process, multiple versions exist of nearly every project. Typically there will be a 'release version' which is regarded as stable, and one or more 'development versions' where new features are added. At any given time there will be a particular version or branch which is most appropriate for checking-out by those who would work on a particular aspect of the program. If you are just reading the code for edification or bug-hunting, then you will want to acquire the most recent stable version, or inquire among the project's developers as to which version is the canonical one for this purpose.

    The source code is the One True Source for documentation of the program to which it compiles. It contains machine-readable parts and human-readable parts (README files and comments, or other inline documentation). The machine-readable parts are those which the compiler takes as input, which input results in the running code. For this reason the machine-readable parts are the real documentation in that they describe in a non-ambiguous way what the software actually does, while the human-intelligible part is some human's idea of what the software is supposed to do.

    Frequently authors will skip the human-intelligible part. In my opinion this is a Bad Idea. This is because machine-recognizable languages fall into a category of languages which is more restrictive than human-intelligible languages are: any machine-recognizable language must at minimum be recursively enumerable, so that any given statement in that language can be represented by a tree whose nodes represent legal tokens in the language, and whose edges represent the operation of valid generator rules. It's very easy for humans to describe things in human languages, and difficult to describe things in machine languages. IMO it's easier to translate from human language to machine language than vice versa; this is one reason why documentation is difficult to write.

    What 'Machine Readable' means


    WIP

    What can a machine read?



    Formal languages are sets of strings built from an alphabet of symbols according to rules (a grammar). A certain type of machine can recognize whether a given string belongs to the language.

    There are four known types of formal language, organized in a nested hierarchy called the Chomsky Hierarchy. The two simplest types, Regular and Context-free languages, can be recognized respectively by a finite automaton and a push-down automaton.

    Now you know as much as you did before.
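To make the two simplest machine types a little more concrete, here is a minimal sketch in Python. The two languages chosen, (ab)* and balanced parentheses, and the function names are my own illustrative assumptions; they stand in for "a regular language" and "a context-free language" respectively. The first recognizer needs only a fixed number of states; the second also needs a stack, which is what makes it a push-down automaton.

```python
def accepts_regular(text):
    """Finite automaton for the regular language (ab)*: zero or more 'ab' pairs."""
    state = 0  # state 0: expecting 'a' (also the accepting state); state 1: expecting 'b'
    for ch in text:
        if state == 0 and ch == "a":
            state = 1
        elif state == 1 and ch == "b":
            state = 0
        else:
            return False  # no transition is defined for this input: reject
    return state == 0


def accepts_balanced(text):
    """Push-down automaton for balanced parentheses, a context-free language."""
    stack = []
    for ch in text:
        if ch == "(":
            stack.append(ch)  # remember an opening parenthesis
        elif ch == ")":
            if not stack:
                return False  # a closing parenthesis with nothing to match
            stack.pop()
        else:
            return False  # symbol outside the alphabet
    return not stack  # accept only if every opening parenthesis was closed
```

No fixed number of states is enough to count arbitrarily deep nesting, which is why the second language needs the stack.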

    Acquiring Source Code



    If you are using Debian then acquiring the source code of any program in Debian's archive is as easy as using the `apt-get source` command, which will fetch the source code from the archive and unpack it into a child of the current directory. If you are using an operating system other than Debian you can get the source code of any Debian package via packages.debian.net. Source code for software which is not packaged by Debian is available from other sources. The GNU project maintains source code repositories of many programs; others are available from version-control systems of various kinds provided by the authors or by others.

    git clone



    git clone is the customary way of acquiring code from GitHub and other git repositories. It downloads the repository's history, including its branches, and checks out the current state of the default branch into a new directory.

    tarballs



    Lots of source code is available for download as archives with the *.tar file extension. This extension indicates that the archive is in TAR format. Use the tar -xf command to extract it (see man tar for details before proceeding: tar is a very complex program with many switches.)

    Many tar files are also compressed. Gzip, bz2 and LZMA are all popular compression formats used with tarfiles. Usually the compression format will be reflected in the file extension: *.tar.gz, *.tar.bz2 and the like. Sometimes the format is not reflected in the file extension; use the file command to discover it.
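If you would rather peek inside a tarball before extracting it, Python's standard tarfile module can do that. This is only a sketch: the function name is hypothetical, and the archive path is whatever file you downloaded. Opening with mode "r:*" lets the module work out the compression format for itself, much as the file command would.

```python
import tarfile


def list_members(archive_path):
    """Return the names of the files inside a (possibly compressed) tar archive."""
    # "r:*" means: open for reading and detect gzip, bzip2 or LZMA transparently.
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()
```

Listing first is a cheap way to check that everything unpacks into a single top-level directory rather than scattering files over the current one.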

    What you get



    When you download and extract a source code archive, you get a file tree. It will look something like the following imaginary ls -R output:

    sourcedir:
    README main.lang includes/ resources/

    sourcedir/includes:
    library.lib

    sourcedir/resources/:
    media.ogg

    The README file is arguably the most important, especially when it is absent.
    README files often contain build instructions, configuration instructions, and information on how to contact the developers.

    The code provided by the project is often in the root directory ('sourcedir' in the example), though sometimes it is in one or more subdirectories.

    Most non-trivial programming projects will #include or use code from other projects, in order to save time and prevent re-inventing wheels. This re-used code will often take the form of "libraries" of code snippets written expressly for re-use. For example, jQuery is a JavaScript library.
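In a large tree the README files are not always at the top. Here is a minimal sketch, in Python, of walking an extracted source tree and collecting them. The function name and the rule "anything whose name starts with README" are my own assumptions, not a convention any project is obliged to follow.

```python
import os


def find_readmes(root):
    """Collect paths of files named README, README.md, README.txt and so on."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.upper().startswith("README"):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

Calling find_readmes("sourcedir") on the imaginary tree above would return the path of the top-level README.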

    Tools



    It's dangerous to go alone. Take these.

    Shell



    Your shell is the program that intermediates your computing experience. When you are using the command line, the shell is the program that writes the prompt, receives your input, and returns the results. When you use a graphical interface, the shell receives input about program interactions from the window manager and sends data about program output to the window manager so that it can be drawn into the appropriate window.

    Shells typically implement a command scripting language, allowing the user to automate tasks by placing lists of commands and arguments in a file to be invoked as a unit.

    Text Editor



    A text editor is similar to a word processor, but different in that it is optimized for syntax transformation (for example, searching and replacing text) rather than for producing pretty-looking documents which look the same when printed out. Text editors are never WYSIWYG, because the whole point of source code is to be transformed into something else.

    Editor vs. IDE



    Some text editors are 'lightweight', in that they offer a small 'footprint' in the computer's resources and don't offer a great many features. For this reason some editors of this type might be considered useful only for working with configuration files (or as a fancy pager) and not useful for programming. This is entirely a matter of opinion (I have successfully edited source code using only less and sed), but it serves to illustrate a distinction between 'lightweight' editors and "programmer's" editors.

    A programmer's editor is one which is optimized for programming. It will offer features like syntax highlighting, code folding, or auto-completion. Emacs is an example of a "programmer's editor", although some people might consider it an IDE.

    An Integrated Development Environment or IDE is a group of programs which are designed to work together to make programming easier. Typically such a collection of programs will include an editor, an interactive interpreter, and a debugger among other components. Eclipse is an example of an IDE.

    Searching and Text Processing


    Text editors frequently offer search-and-replace facilities of varying sophistication.

    In the shell


    Grep and Regular Expressions



    Parser searching



    Understand the Model



    If source code is a representation of an idea, then it may be regarded as a model of that idea. The idea is itself frequently a model of something that exists in the "real world". For example, some computer games feature a character which the player navigates through an imaginary world. The world in these games is frequently called a "map", because that's what it is: a map is a model of a place, in this case a model of a (usually) imaginary place which may or may not contain maps of maps.

    Read the Comments



    Not here. These comments are garbage. I should know, I write them myself.

    The comments in the code are the ones you should read. Each language has its own comment syntax; here are some reasonably common ones.



    // One line comment in C, PHP, Javascript
    /* This is a
    multi-line comment
    in C, PHP, Javascript
    the extra whitespace at the beginning of lines
    is optional.
    */

    ;; One line comment in Lisp or .ini files

    # One line comment in many config files



    Comments are there to illustrate the intent of the author, to provide context for code, links to outside resources, or source code for automatically-generated documentation.
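When a file is long, it can help to pull out just the comments and read them first. The following Python sketch does this for the C-style syntax shown above. It is deliberately naive: the pattern and the function name are my own, and it will be fooled by comment markers that appear inside string literals, so treat the output as a reading aid and not as a parse.

```python
import re

# Either a // comment running to the end of the line, or a /* ... */ block.
# re.DOTALL lets the block form span several lines.
C_COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def c_style_comments(source):
    """Return every C-style comment found in the given source text, in order."""
    return C_COMMENT.findall(source)
```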

    Cheat Code: talk to the Developers



    The people best qualified to discuss the mapping between a program and its underlying model are those who wrote the program. These people may or may not have time to talk to you or to answer your questions. Attempting conversation with strangers, even over the internet, may be difficult for you, and it's possible that the developers themselves might be difficult. Fortunately, it's possible to ask without asking: lurk on project IRC channels, watch presentations, peruse mailing-list archives. Once you have a feel for how people on the list respond to emails, you might feel more comfortable contacting them, or you might not. Either way you will learn something.

    Know the Language


    Obviously it's easier to read the source code if you already know the language. Well-written source code is often easy to read for those who know the language. Other sources are less easy to read.

    Some languages are designed for ease of reading, and some programming styles such as Literate Programming or Document-Driven Development emphasize it.

    Some programming languages which are widely considered easy to read:


    • Python

    • Lua

    • Ruby



    Some programming languages which are considered hard to read:


    • Assembly language

    • C or C++

    • PostScript


    My own opinions don't necessarily accord with these lists.

      Specific things to know about your language



      Constants, variables and arrays



      Each language handles these things differently. Some languages don't have variables at all, only constants. Sometimes arrays are a subtype of variable, but not always.

      In this article I'm using the word "array" to describe something which has different (but not unique) names in different languages, and the names mean different things depending on which language it is.

      In general there are a few different kinds of arrays. All of them represent some form of key-value tuple. "Lists" usually comprise an index and a series of values like so:


      1. one value

      2. another value

      3. still another value

      4. foo

      5. fnord

      6. bar



      Some languages start their indices at 1 like the list above, and some start at 0.
      It's very important to keep track of this in order to avoid a class of errors called "off-by-one".
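Python is one of the languages that starts at 0, so it makes a handy illustration. In this sketch the list holds the six values from the numbered list above; the helper item_number is hypothetical and exists only to show the translation between 1-based counting and 0-based indexing.

```python
values = ["one value", "another value", "still another value", "foo", "fnord", "bar"]


def item_number(n):
    """Return the n-th item counting from 1, as in the numbered list above."""
    if not 1 <= n <= len(values):
        raise IndexError("item numbers run from 1 to %d" % len(values))
    return values[n - 1]  # subtract one to get Python's zero-based index
```

Here values[0] is the first item and values[5] is the last; asking for values[6] raises an IndexError, which is the off-by-one mistake in its most common form.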

      Types



      Some languages are "strongly typed", meaning that every value expressed in that language has a type, whether it's an integer, a TRUE or FALSE, a character, a string of characters or whatever. Many languages have facilities for users to define their own types. It's important to know this syntax so that you can understand the constraints that apply to things named in the program.

      Types generally constrain the type of value that can be bound to a name and define the operations that can be performed on those values. For example, it doesn't make sense to "add" one character to another; each language defines differently what "a" + "b" is equal to.
      It could be "ab" (concatenation) or it could be "undefined"/NULL/nil or whatever; it could be the ASCII values of each character added together, or something entirely unexpected depending on the language and environment.
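As one data point, here is what Python does with these cases. This is a sketch of one language's choices only; the variable names are mine, and other languages will answer differently.

```python
# Python treats "+" between two strings as concatenation.
concatenated = "a" + "b"

# To add the character codes you must ask for them explicitly with ord().
code_point_sum = ord("a") + ord("b")

# Mixing a string and an integer is refused outright with a TypeError.
try:
    mixed = "a" + 1
except TypeError:
    mixed = None
```

After this runs, concatenated is "ab", code_point_sum is 195 (97 + 98), and mixed is None because the addition was rejected.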

      What to do if you don't know the language



      If you want to learn the language then perhaps https://koanhead.dreamwidth.org/2335.html can be of help.

      Quite often well-written source code is intelligible even to those who don't know the language. When it isn't, hopefully some of the pointers below can assist you.

      The trick to understanding source is in understanding the underlying model it describes, and that holds true whether you know the language or don't.

      If you are familiar with BNF or other methods for specifying context-free grammars, then it can be useful to skim the formal specification of the language you are dealing with (if one exists). If you are unfamiliar with CFGs then the ECMAScript (ECMA-262) specification has a nice introduction.

      Language specifications tend to be quite long, but they contain long stretches of formal production rules that you can skip over if they are not describing the parts of the language currently of interest to you.


      What to Look For



      Entry Points



      C programs usually have a function called `main()`. The main function is the starting point for execution, so it makes sense to start reading C programs at that function, then read the header files mentioned there (if any) and then proceed from there.

      Unfortunately, not all programming languages feature such a convenient convention. When there's no obvious starting point, one can sometimes find a useful starting point by reasoning about the context of the program. For example, PHP is a language usually used for generating Web pages. It's embeddable into the pages themselves, and the Web server interprets the embedded code 'on-the-fly'. In such a situation, if you know that your Web server will serve a page called 'index.html' by default, it makes sense to look for a file called 'index.html' or 'index.php'.
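For the C case, finding the entry point can be automated after a fashion. The Python sketch below looks for a line that begins a definition of main and reports its line number. The pattern is an assumption about common coding style (a return type of int or void at the start of a line); it is not a C parser and will miss unusual layouts.

```python
import re

# A line starting (after optional indentation) with "int" or "void",
# followed by "main" and an opening parenthesis.
MAIN_DEF = re.compile(r"^[ \t]*(?:int|void)\s+main\s*\(", re.MULTILINE)


def main_line_number(c_source):
    """Return the 1-based line number where main() appears to be defined, or None."""
    match = MAIN_DEF.search(c_source)
    if match is None:
        return None
    # Count the newlines before the match to turn an offset into a line number.
    return c_source.count("\n", 0, match.start()) + 1
```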

      Names



      Quite often you'll be looking in the source code for something particular, for example to answer the question, "How does this program do $this_thing?"

      A naive but effective tactic is to simply search the tree for $this_thing in order to find the places in the code where $this_thing is addressed. There are many ways to accomplish this: see How To Find What You're Looking For below.
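The naive tactic looks something like this in Python. It is a rough stand-in for a recursive grep; the function name and the shape of the returned tuples are my own choices.

```python
import os
import re


def search_tree(root, pattern):
    """Return (path, line number, line) for every line under root matching pattern."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps binary or oddly-encoded files from stopping the search
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        if regex.search(line):
                            hits.append((path, number, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it and carry on
    return hits
```

The pattern is a regular expression, so characters such as parentheses need escaping: search for r"this_thing\(" if you want only the places where the name is followed by an opening parenthesis.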

      Kinds of names



      There are many different kinds of names, and it can be helpful to know how the language in question treats them. If your reasoning about $this_thing leads you to believe that you're looking for a function that takes no arguments, and the code is in C, then it might be worthwhile for you to search for "$this_thing()" for example.

      Rules about names



      Programming languages have rules about what constitutes a valid identifier. For example, in PHP variable names must begin with $ followed by a letter or underscore, and then any number of letters, digits or underscores. A language may have different rules about different kinds of identifiers, which can help you to determine what sort of thing a particular name refers to.
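Identifier rules translate directly into search patterns. As a sketch, here is a Python regular expression for PHP variable names: a $, then a letter or underscore, then any number of letters, digits or underscores. I have simplified it to ASCII only (PHP itself also accepts some non-ASCII bytes), and the function name is hypothetical.

```python
import re

# "$", then a letter or underscore, then letters, digits or underscores.
PHP_VARIABLE = re.compile(r"\$[A-Za-z_][A-Za-z0-9_]*")


def php_variables(source):
    """Return every substring of source that looks like a PHP variable name."""
    return PHP_VARIABLE.findall(source)
```

Given the text "$count = $_total + $9lives;" this finds $count and $_total but not $9lives, because a digit may not follow the $ directly.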

      Inclusions



      Lots of files "include" other files by some mechanism or other. Reading those included files will make it easier to understand the one you are looking at.

      Include graphs and call graphs



      An include graph is a directed graph in which the nodes represent files in the source tree and the arrows point from nodes that include others to the included ones.

      A call graph is a directed graph in which the nodes represent functions and the edges point from a calling function to the one which is called.

      Doxygen can generate both kinds of graphs.
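For a small tree you can build a crude include graph by hand. The Python sketch below is not how Doxygen does it; it is a naive substitute that only understands the quoted form of C's #include, and it takes the file contents as a dictionary so that the example stays self-contained. All names in it are my own.

```python
import re

# Matches lines such as:  #include "util.h"   (but not the <stdio.h> form)
INCLUDE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)


def include_graph(sources):
    """Map each file name to the list of files it includes.

    `sources` is a dict of {file name: file contents}.
    """
    return {name: INCLUDE.findall(text) for name, text in sources.items()}
```

Each key is a node and each entry in its list is an arrow pointing at an included file, which matches the definition of an include graph given above.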

      How to Find What you're Looking For




      Using grep


      Using a linker


      Using Doxygen


      Using Egypt


      How to Find only what you're looking for



      If only I had an answer to this.

      Examples



      The Linux kernel


      FreeSwitch

2016-07-27 04:32 pm

How to learn a new programming language

Introduction



This entry is a set of preliminary notes for something more fleshed-out in the future; in particular I intend that the eventual work should contain material apropos to learning styles which are different from mine. For the time being, this is a record of my recommendations for myself (and for those who learn in the same way as I do) about how to learn a new language.

Paradigms



First of all it's important to realize that programming languages all follow a certain set of common paradigms. Here are the paradigms I know about, with poor examples:

Imperative (COBOL)
Procedural (C)
Object-oriented (Smalltalk, C++, JavaScript)
Functional (Lisp, Haskell, Erlang)

Many languages, such as Python, are designed in such a way as to implement all or part of more than one paradigm. Some are explicitly called "multi-paradigm" languages, and some make you guess about it. In general, if a language doesn't specify its paradigm, then it's a multi-paradigm language (or so vaguely specified that you may wish to think twice about using it.)

Within each paradigm are certain patterns which are characteristic of languages which occupy the paradigm. For example:

Imperative: duh, iunno
Procedural: Named subroutines, flow control, iterators
Object oriented: Classes, subclasses, object instances, inheritance
Functional: Map / reduce, currying
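Since Python implements parts of several paradigms, the same small task can be written three ways in it. This is only a sketch: the task (summing squares) and every name in it are my own illustrative choices, and functools.partial is partial application, a close relative of currying, not currying proper.

```python
from functools import partial, reduce

numbers = [1, 2, 3, 4]


# Procedural: a named subroutine with explicit flow control and iteration.
def sum_of_squares(values):
    total = 0
    for value in values:
        total += value * value
    return total


# Object-oriented: a class, a subclass, inheritance and an object instance.
class Accumulator:
    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value


class SquareAccumulator(Accumulator):
    def add(self, value):
        super().add(value * value)  # reuse the parent's behaviour


accumulator = SquareAccumulator()
for number in numbers:
    accumulator.add(number)

# Functional: map and reduce, with no variable being updated in place.
functional_total = reduce(lambda a, b: a + b, map(lambda v: v * v, numbers), 0)


# Partial application: fix one argument of a function to get a new function.
def power(exponent, base):
    return base ** exponent


square = partial(power, 2)
```

All three approaches arrive at 30 for this list, and square(3) gives 9.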

Mathematical Patterns



Languages also are based on mathematics. In order to represent mathematics in a programming language, one must start with a theory from which one can derive arbitrary mathematical statements. Set theory and type theory are two examples of such a foundation. Most programming languages may be regarded as set-theoretical languages, particularly object-oriented ones. Others, notably Haskell, are type-theoretical languages (in particular, Haskell uses a Hindley-Milner type system, which is something I don't understand particularly well, so I won't say much about it.)
As with the language's paradigm, the underlying mathematics determines certain patterns which recur in languages founded thereon.

->I should put some examples here, but I can't think of any at the moment. Derp.

A knowledge of the basic common structures in languages of certain types comes more-or-less naturally with experience, but I hope to short-cut this process for others (and for myself when I inevitably forget this information).

Programming has been said to consist of Algorithms and Data Structures.
->I can't be arsed to attribute that quote just now.

Data Structures



Simple



A variable is a key-value pair; the key is generally of a uniform type (called an "identifier" in most languages and constrained in the same way as function names, etc.) and the value is generally of a type specified by the programmer or inferred by the compiler.
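Python makes the key-value view unusually literal, because a namespace there can be an ordinary dictionary. The sketch below is specific to Python, and the names in it are hypothetical; other languages keep their variables in quite different structures.

```python
# A hypothetical, empty namespace: just a dictionary.
namespace = {}

# Running an assignment inside that namespace binds the identifier
# "answer" (the key) to the integer 42 (the value).
exec("answer = 42", namespace)

value = namespace["answer"]
inferred_type = type(value).__name__  # nobody declared a type; Python worked it out
```

Afterwards value is 42 and inferred_type is "int".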

Complex



Arrays

Classes and Objects

Types and Typeclasses

Functions and Functors and Monads, oh my!


Algorithms



Iterators



Flow control





idk wut



Tricks



Tutorial videos



There are ever so many programming language tutorial videos on the Web in various spots. If you can find one on the main web site for the language in question, then it's likely to be pretty good.

In my experience it's good to watch these tutorials (provided they are short) even if you can't really follow what's going on, because you will retain information you don't yet understand but which will make sense later in another context.
2016-06-22 04:48 pm

Internet of Things as cognitive namespace collision

Recently unifex of firmwaresecurity.com sent me a link to a tweet on InternetOfShit, which in turn links to reddit emoting over a bad security story: a Netgear customer returned their wireless-enabled camera and later discovered that they had access to (and were receiving notifications from) the camera after it had been resold. unifex cited this as "one of the joys of lack of refurbishability of IoT". Refurbishing old computer equipment is something I'm known for, so an inability to do so is of interest to me.

I'm not so sure that this device is non-refurbishable, though. As far as I can tell it's a little box with a camera, a wifi setup, and a webserver inside. I haven't had the opportunity to crack one open to see if there's a SOIC or a JTAG interface inside. With a little bit of equipment (a selectable power supply, a computer with a GPIO header, and some wires and clips) one could dump the contents of the device's storage and potentially reflash it with something less malign than that provided by the vendor. It's also possible that one could re-flash it using a vendor-provided upgrade path in software, something that's present in many router boxes by the same vendor. The effort and skill required to refurbish such a device might prove beyond my poor ability, but there are many more skilled hackers than I out there. I feel sure that if someone fished one of these bad boys (and they are bad) out of the bin or received it as a donation to be refurbished and passed along, then that could be brought to pass.

The reason for this is that the device in question isn't an appliance suitable only for the task the vendor printed on its box. Like most IoT devices, set-top boxes and network equipment, this is a general-purpose computer soldered on to some basic peripherals and unleashed haphazardly upon an unsuspecting world. It's not a camera, it has a camera. It's a little computer.

IoT vendors want their customers to think of these devices, these products, as unitary, single-purpose, single-use devices. What they are is intentionally crippled general-purpose computing devices. Vendors make more money (or think they will) selling multiple single-purpose devices than they will selling a smaller number of multi-purpose devices. This is deceitful and an act of poor faith, but it is par for the course in commerce where caveat emptor is the ruling principle (as opposed to commerce where a third party acts as a regulator or guarantor).

A "camera connected to the Internet" would be more like a peripheral for an existing computer inside your existing network: what we now call a webcam. You can't connect to the Internet without a computer. How would that even work? I'm sure there are wireless webcams out there, but I don't know that it's possible to manage a wifi or Bluetooth link without a computer either. For that matter, I don't know that it's possible to pull images from a CCD without a general-purpose computer, although I have taken apart a USB webcam and did not find anything inside that looked like one.

If you buy a box with a computer in it, then you are buying a computer. The computer is the boss of the device, and "general purpose computer" means that it can do anything. The usual caveat applies: If you don't own your computer, then someone else does. You can be held responsible for the things done by the computers under your control, so you had better do your best to make sure you are the one controlling them.

Vendors sell on features. It's certainly possible to sell a dingus that "does anything"- every PC is such a dingus. "Does whatever" feels like only one feature though, while "unlimited cloud storage", "integrates with IFTTT" and "will make you a sandwich" are multiple features. "Does anything including stuff you don't want" is an anti-feature, and every general-purpose computer comes with it. Vendors don't want to talk about this (or, apparently, even think about it) but customers must be aware of it or it will bite them, and the rest of us, in their personal and our collective asses.


2016-06-01 05:28 pm

Security, Privacy, Freedom: Any Turn, All Scold and Berate

The title is a lame pun on a book called Gödel, Escher, Bach: An Eternal Golden Braid.
That book is good enough that if you haven't read it, then you should quit reading this and go read it instead.

This article is about the interactions of Security, Privacy and Freedom, which is more of a tangle than a braid.
The title is also an allusion to in-group social pressure, one of the trust-enforcing mechanisms outlined in Bruce Schneier's Liars and Outliers, another book which is more worth your time than this blog is.

This article deals with abstract entities which can be people or groups thereof. Therefore I'm using the pronoun "it" to refer to these entities. In the event of the involvement of actual people, I will use more suitable pronouns. 

Security, Privacy and Freedom are all hot issues today, and they all kind of evaluate to the same issue: that of agency.

Security seeks to keep the Bad Guys (that is, other agents who you, the securing agent, want to prevent) from taking away your Good Stuff. Privacy seeks to keep certain information away from all but a certain set of eyes. Freedom means your Good Stuff is actually yours to do with as you like (within the limits of living in a society, i.e. your Freedom to swing your Stuff ends where my nose begins, et cetera.)

An infringement of any of these also robs you of agency in that choices which should be yours to make are instead made for you by others. These others may or may not have your best interests at heart. It doesn't matter if they do or don't: either you are an adult citizen entitled and required to make your own decisions, or you are a ward of such a citizen. Either way if anyone other than you or your designated guardian is making decisions about who gets to touch your stuff or your information, that entity is usurping your agency.

Freedom in the Software sense is defined by the Free Software Foundation https://www.gnu.org/philosophy/free-sw.html as software that ensures its users have the freedoms to use, modify, and distribute the software and modifications thereto. Software is data; the only thing that separates software from data is the fact that a computer can execute it. Any data at all can be executed by some computer, as long as you don't care about the result: any data can be represented by a number, and all numbers are computable as long as you don't care whether they terminate or what the result is. In this sense, all data is software, and to the extent that the *user* cares whether computations involving that data terminate or what the result is, both that data and the computations must be Free. Otherwise the user's agency is compromised.

This assumes that the user of the data is the same as the owner of the data. Ownership of data (in the legal sense rather than the RBAC sense) is very tricky, so much so that I expect that it will eventually have to be abandoned as a concept that society honors. Society will have to find some other way of rewarding "creators" of patterns than enforcing their sole control of the patterns in question. Anything which can be represented by a number can be copied at a cost approaching zero as the number of copies increases, and that number can be arbitrarily large. In order to exploit such patterns at all, copies need to be made. If the user and the owner of the data are not the same entity, then there's conflict over who bears responsibility for securing all these copies. The entity bearing this responsibility must also possess sufficient agency. This means that, if the owner is responsible for securing the copies, it must compromise the Freedom of the user, because it must have the ability to effect changes in the user's software which are uncalled-for by the user. Even if the user wishes to comply with the wishes of the owner, the fact that the necessary changes are initiated by a party other than the user constitutes a compromise of the user's agency and therefore of its Freedom.

This is fundamentally a Security issue, since it's all about who gets to access this data. In some cases it will also be a Privacy issue. Every Privacy issue is also a Security issue. Privacy issues occur only when the "owner" and the user are distinct. The owner of the data wishes not to show the data to any but a whitelisted class of users. 

2016-05-31 05:58 pm
Entry tags:

Boiling Frogs?

Another project I probably won't do:

Every day there are reports of incidents which militate against individual freedom or for it. The 'Boiling Frog' analogy is a popular one for people talking about Creeping Fascism or whatever their personal police-state bugbear happens to be.

The thing about boiling is that it happens gradually and then very quickly. The temperature rise is approximately linear until the medium approaches its boiling point, at which time the rise slows. Once the boiling point is reached the temperature remains constant until the boiling material has gone.

So, if the 'boiling frog' analogy were accurate one could make a 'heat map' of such incidents with an assigned weight (perhaps based on number of reports). You wouldn't expect such a map to be very useful, necessarily: it would show a linearly increasing number of events with linearly-increasing heat, up to some point where the reports would remain constant or taper off.

In this case what's 'boiling off' is not water but human beings. As the temperature increases people become unable to tolerate the infringements of their personal agency. The ones with sufficient remaining resources leave the area controlled by the polity; the others leave the polity through marginalization, imprisonment or death.

One might hope for a better indicator of 'political temperature', so that one might be able to make an informed decision about when to leave the country or to take some other action which one imagines might preserve one's personal agency. To do so, however, is to trust one's neighbors, family, friends and other potential allies to the same polity which one has already decided is untrustworthy in one's own case. This is a clearly anti-social act, and its reputational repercussions are likely to make the contemplated action too costly for most.
koanhead: (fgsea)
2016-04-29 05:08 pm

SILLY on savannah

http://git.savannah.gnu.org/cgit/silly.git

Because https://libreboot.org/github/

Since I'm hoping that SILLY will eventually be useful to the libreboot project, I don't want to host it on github.

Since pretty much all my software is Free stuff anyway, I might wind up moving everything to Savannah. We'll see.

Callpipe has its own GitLab instance, which is nice, but I'm not sure yet whether I like GitLab. For my own purposes it would be preferable to simply use the facilities that come with git (possibly including gitweb, which iirc is part of the git source distribution) and have my repositories also on a public-facing computer.
2016-04-27 03:12 pm

Proposed method for secured sharing of personal information



Introduction






Why propose this method?


On 20150602 I attended a meeting of Seattle citizens concerned with privacy and individual-rights issues. One of the issues prominently mentioned was that, when a citizen goes to a government agency in order to apply for assistance of any sort, they are required to submit large volumes of personal information to that agency.





There are several problems with this situation:



  • Not all agencies have a published privacy policy, and even if they do there is generally no way for the client to verify that the information is handled responsibly- that is, there is no way to verify a chain of custody.

  • Frequently agencies require 're-certification' which requires the client to redundantly submit all the same information again. This is inefficient and a waste of both government workers' paid time (which feeds into the political problem of 'waste, fraud and abuse' in government agencies) and of citizens' time. The latter is an unaccounted externality, part of a large category of governmental expenses which are borne by citizens and therefore untracked. This is a waste problem of unknown, but certainly very large, magnitude.

  • When a citizen needs help from more than one agency, each agency requires more redundant paperwork rather than sharing information in a responsible fashion.

  • Since clients speak many different languages, internationalization of forms and storage methods is necessary. When each agency generates its own forms, these efforts are duplicated.




The proposed method offers these advantages:



  • Stored information is not readable by any party without the client's private key: compromise of the database does not automatically lead to compromise of the data in it.

  • In order to access the information, each agency must obtain permission (in the form of a digital signature) from the client and owner of the information.

  • When information is released to an agency, a record is created. If information is later found to have leaked, there is a 'chain of custody' pointing to the parties that accessed the information, providing a starting-point for investigation.




The method





The method uses the following techniques:



  • Public-Key encryption

  • A data store capable of storing records serializable to lists of key-value pairs. It can be centralized and under the control of a responsible curator, or it may be decentralized in the form of (for example) a publicly-distributed file in any data-serialization format. The data store must be accessible to all parties involved in the data-sharing program.

  • An agent for clients' use. The clients' agent may take the form of a software program running on a computer owned by the client, or of a Web-based or other remote program running on a computer owned by a responsible party. The only sensitive piece of information is a long and memorable passphrase known only to the client. If the client forgets the passphrase then the client's data will have to be re-entered, as it cannot be retrieved.



Disambiguation Warning


In this article, the word "key" is used in two different contexts. This is an unfortunate consequence of a namespace collision in the common usage. A "cryptographic key", "public key" and "private key" are cryptographic entities. A "key" in a "key-value pair" is a tag which refers to a data value.





Initial Setup






Keys and Identity Management


Each requesting agency needs at least one cryptographic key-pair associated with it; for finer-grained control individual requesting agents could have one or more key-pairs for different uses.

In order for parties to participate in the method, each party must possess a key-pair. Requesting agency keys must be signed by a party trusted by all parties to each information-sharing transaction: in practice it will be sufficient for an authorized representative of a requesting agency to assert the agency's ownership of its private key in person for signing by the client. The actual signing can take place later, provided the client has a copy of the necessary fingerprint, but for simplicity's sake the assertion and the signing should take place concurrently. Signing the agency's key means the client trusts that the key belongs to the agency. Those who don't trust this assertion should not use this method.




Data Store


The data store contains a standard set of key-value pairs such that the key contents are readable but the value contents are encrypted. For example:

Key            Value
Client Name    #VN8gq^V67
Address        g~VWB,*8
City           #?'2,@4)R



To avoid information leakage all values must be populated with non-identical data. This may be achieved by, for example, appending a random nonconsecutive integer to each NULL value before encrypting it. After encryption no value should be identifiable as to type or contents. All key contents must be identical for all records.
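The padding rule can be sketched as follows. This is only an illustration in Python under my own assumptions: the field list `FIELD_NAMES`, the function name `pad_record` and the size of the random range are all hypothetical, and encryption (which would happen after this step) is left out entirely.

```python
import secrets

# Hypothetical standard key set; every record carries exactly these keys.
FIELD_NAMES = ["Client Name", "Address", "City", "Telephone"]

def pad_record(values):
    """Return a record with every standard key present and no two empty values alike."""
    used = set()
    padded = {}
    for name in FIELD_NAMES:
        value = values.get(name)
        if value:
            padded[name] = value
            continue
        # Draw random integers until we get one not yet used in this record.
        while True:
            n = secrets.randbelow(10**9)
            if n not in used:
                used.add(n)
                break
        padded[name] = f"NULL{n}"
    return padded

record = pad_record({"Client Name": "Ada Example", "City": "Seattle"})
```

Because each empty field gets a different placeholder, two empty fields in one record never share a plaintext, which matters if the encryption in use maps equal plaintexts to equal ciphertexts.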




Usage



Data entry



  1. Determine whether the client already has a cryptographic key-pair of sufficient strength signed by the requesting agency:


    1. If the client does not possess a key-pair, the client's agent generates one and posts the public key to a key-server. The client's agent stores the private key encrypted symmetrically against a long and memorable pass-phrase known only to the client. With full Unicode support the client may compose a passphrase in any language supported by Unicode glyphs. The data-entry front-end program should provide a Character Map style interface for this purpose.

    2. If the client possesses a key unsigned by the requesting agency, the agency downloads the public key, the client verifies ownership of the key and personal identity, and the agency signs the key, uploading the resulting signature.

    3. If the client possesses a key signed by the requesting agency, then proceed to step 2.


  2. The public key fingerprint is used as a unique identifier in a record in the data store (not an index field, as fingerprints are very long, so they are slow to sort.) If this identifier already exists then data is entered in the record it identifies.

  3. The client may now enter values to any key in the record identified by the client's public key.

  4. The form processor encrypts the data against the client's public key and transmits it into the database.
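Steps 2 through 4 might look roughly like this in Python. Everything named here is an assumption made for illustration: `fingerprint` hashes raw bytes as a stand-in (real OpenPGP fingerprints are derived differently), and `placeholder_encrypt` is not encryption at all, only a marker for where a real public-key implementation such as GnuPG would be called.

```python
import hashlib

def fingerprint(public_key: bytes) -> str:
    # Stand-in only: a real OpenPGP fingerprint is computed from the key
    # packet, not by hashing arbitrary bytes like this.
    return hashlib.sha256(public_key).hexdigest()

def enter_data(store, public_key, values, encrypt):
    """Steps 2-4: find or create the client's record, then store encrypted values."""
    record = store.setdefault(fingerprint(public_key), {})
    for name, value in values.items():
        # Key names stay readable; only the values are made opaque.
        record[name] = encrypt(public_key, value)
    return record

def placeholder_encrypt(public_key, value):
    # NOT encryption. A real agent would call a public-key implementation here.
    digest = hashlib.sha256(public_key + value.encode("utf-8")).hexdigest()
    return "<encrypted:" + digest[:12] + ">"

store = {}
key = b"hypothetical public key bytes"
enter_data(store, key, {"Client Name": "Ada Example"}, placeholder_encrypt)
enter_data(store, key, {"City": "Seattle"}, placeholder_encrypt)
```

The second call lands in the same record as the first, since both are identified by the same fingerprint.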





Data request



  1. The agency requests a list of keys from the data store, and chooses from this list the keys needed.

  2. The system retrieves the key-value pairs and encrypts them against the agency's public key. If the data store is managed by a third party then the report is signed before encryption.

  3. The system sends the resulting ciphertext to the agency for review of its request, first verifying the signature if applicable.

  4. The agency decrypts the message, resulting in a cleartext list of names with encrypted values.

  5. The agency signs this result with its private key and sends the resulting ciphertext to the client's agent encrypted to the client's public key.

  6. The client's agent verifies the signature, and if it is correct then the client receives the list of requested names and their associated encrypted values. (In this method the values are encrypted twice, but that's OK.)




Data fulfillment


The requesting agency must contact the client to 'unlock' the information. The request must contain the identifiers for the database keys relevant to the agency's information needs. 




  1. The client reviews the list of requested keys and their associated encrypted values.

  2. The client enters the passphrase, unlocking the private key.

  3. The client's agent decrypts the values using the client's private key.

  4. The client's agent removes values like "NULL{integer}" and replaces them with "NULL" or similar. 

  5. The client verifies that the requested information values correctly correspond to the associated keys.

  6. The client's agent encrypts the resulting message with the agency's public key and signs it with the client's private key.

  7. The client's agent sends the resulting message to the agency, retaining a copy of the transmitted message as part of the chain-of-custody record.

  8. If the data store is managed by a third party, then the client's agent may also send a copy of the transmitted message to the data store encrypted against its public key and signed with the client's private key. If the data store keeps this message then it serves as part of the chain-of-custody record, as it may only be decrypted by the agency's private key.
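Two of the non-cryptographic steps above are easy to sketch: removing the padding (step 4) and keeping a copy of what was sent (step 7). This is a hedged Python illustration, not a reference implementation. The names `strip_padding` and `custody_entry`, the `released_to` and `sent_at` fields, and the JSON message format are my assumptions; decryption, encryption and signing are omitted.

```python
import json
import re
from datetime import datetime, timezone

# Matches padded placeholders of the form "NULL{integer}", e.g. "NULL48213".
NULL_PADDING = re.compile(r"NULL\d+")

def strip_padding(values):
    """Step 4: replace padded placeholders with plain 'NULL'."""
    return {
        name: "NULL" if NULL_PADDING.fullmatch(value) else value
        for name, value in values.items()
    }

def custody_entry(agency_fingerprint, message, when=None):
    """Step 7: keep a copy of the transmitted message, plus who received it and when."""
    when = when or datetime.now(timezone.utc)
    return {
        "released_to": agency_fingerprint,  # hypothetical field name
        "sent_at": when.isoformat(),        # timestamp is an added assumption
        "message": message,
    }

decrypted = {"Client Name": "Ada Example", "Address": "NULL48213"}
cleaned = strip_padding(decrypted)

# In the real flow this message would be encrypted and signed before sending.
message = json.dumps(cleaned, sort_keys=True).encode("utf-8")
entry = custody_entry("hypothetical-agency-fingerprint", message)
```

A genuine value that happens to look like `NULL123` would be clobbered by this pattern, so a real implementation would want a placeholder format that cannot collide with legitimate data.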




Implementation




This method automates data exchanges that would be familiar to many users of email encryption using PGP. It might very well use SMTP for transport of messages and use existing PGP implementations integrated with current email client programs. A simple plug-in could provide the rest of the functionality that the clients' agent needs. This may not be the optimal implementation, since it imposes the additional learning-curve of the email client software, but it could serve as a quick reference implementation for testing.




As with all security-sensitive applications, it's necessary that the implementation be distributed under an appropriate Free Software license such as the GNU GPL in order that interested parties may inspect the implemented code for flaws, submit corrections, and propose improvements. Many Free software components already exist which can be adapted to implement a scheme like this, for example:




Public Key Encryption



This proposal specifies an implementation of public-key encryption such as GnuPG which relies on a Web of Trust for authentication of keys, rather than a centralized PKI such as is used for HTTPS-enabled websites and the like. This is because the trust relationship between agencies and their clients is different from that between websites and visitors in the following ways. 





Websites have no expectation of meeting their visitors in person in order to verify their identity, and most websites don't particularly care about the identities of arbitrary visitors- all are usually welcome to visit the public-facing parts of the site, and parts of the site which are meant for viewing only by certain people are protected by other means, such as logins with session-tracking. Agencies will generally expect to see their clients in person at least once, and it's not unreasonable to expect a governmental agency, or an agency which trusts the government, to accept a government-issued ID as proof of identity in order to sign a client's key.





Most websites are willing to trust a centralized authority to authenticate their key to visitors, and browser vendors have been willing to trust those authorities as well. Still there have been several compromises of Certificate Authorities, and vulnerabilities in implementations of SSL. If these vulnerabilities can be mitigated and the single point of failure (SPOF) of the CA removed, then a level of security more appropriate to the exchange of personal information may exist. The Web of Trust is a better solution to this specific problem, even though it has its own drawbacks.



The primary drawback of the Web of Trust in this context is that one or more parties may lose control of their private keys. We can hope that agencies would prove able to properly manage their keysets given the ability to employ people to perform this management. Clients won't necessarily have this ability and must manage the keys themselves, increasing the likelihood of losing access to or experiencing compromise of a private key. This may be mitigated by having a clients' agent which does not store the private key on a local device solely, but instead distributes the key somehow. This leaves the private key open to brute force attacks and other attacks unless further steps are taken to secure the key. In this way the value of a distributed key may be outstripped by the cost of maintaining its security. For this reason the clients' agent should store the key locally, using symmetric encryption, and should run in an environment entirely under the client's control. This would appear to be an obstacle to adoption in a world where not every potential client owns, has access to, or knows how to use his or her own computing device; however, in a world where powerful computers can be had for less than $10 all but the last of these problems are easily solvable. A client who is incapable of using a self-owned computing device is likely to be one who requires assistance in dealing with agencies in any case, and hopefully an accommodation involving a trusted human agent can be made in such cases.




Client Agent



As stated above, the clients' agent is responsible for secure storage of the client's private key and for presenting a comprehensible user interface to the user. To this end the agent UI must be thoroughly internationalized so that messages appear in the user's native language and other UI elements are not hard-coded to conform to a particular cultural norm but are configurable per-locale. The agent should incorporate a text-entry interface driven by a character-map like gucharmap so that users may enter characters in their preferred language's glyphs regardless of the type of physical keyboard available (if any).





In order for this internationalization to function, the fields in the Data Store need to be specified properly.





Data Store




The data store may take the form of a centralized database managed by database-management software such as MariaDB, or of a distributed file in a serializable data-representation format such as XML. If the latter, care needs to be taken to keep the files in sync, perhaps using a publicly-accessible version-control system such as Git. In this way the entire revision history would be stored on each server, with changes propagating across all participating servers as they are made. Such a system would constitute a visible part of the chain-of-custody record and would be very robust against data loss or corruption.


Regardless of the form it takes, the data store must be properly internationalized with Unicode support for all fields which may contain values in more than one language. For example, telephone numbers will nearly always be encoded in Arabic numerals and can safely be constrained to the appropriate type. Clients' names may contain glyphs from any language used by the agencies' served population. Note that the system as proposed requires that the data store support all languages used by all parties that use the system.
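As a rough illustration of treating the two kinds of field differently, here is a small Python sketch. The function names are hypothetical, and normalizing names to NFC is my own assumption rather than something the proposal requires; it is shown because the same name can otherwise be stored as different code point sequences.

```python
import unicodedata

def normalize_name(name: str) -> str:
    """Store names in one canonical Unicode form (NFC) so equal names compare equal."""
    return unicodedata.normalize("NFC", name.strip())

def is_plain_phone_number(text: str) -> bool:
    """Accept only the digits 0-9.

    str.isdigit() alone would also accept digits from other scripts,
    so the ASCII check is what enforces the constraint.
    """
    return text.isascii() and text.isdigit()

# 'e' followed by a combining acute accent becomes the single character U+00E9.
composed = normalize_name("Jose\u0301")
```

A real telephone field would also need to decide what to do with separators and country prefixes, which this check simply rejects.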





The pitch



I could implement a system of this type myself, if anyone cared enough about it to pay me to do it. I would need about a year and about a million dollars. It's possible that someone more competent than I could implement it cheaper and quicker, but no such person is available as far as I know. Also, I could use the money.





2016-04-27 03:04 pm

Projects

Here's a list of stuff that's going on so that folk who are interested can ping me if they care about something.
Later I will add links so that this list is more nearly meaningful.
  • SILLY
  • Rail trail trip
  • FGSEA schemes
  • xlr8
  • local setup (does this ever end?)