Wednesday, February 11, 2009

Develop openly to benefit upfront from your project's community

One of the many advantages of performing free software development out in the open (releasing early and often) rather than behind closed doors (releasing "when its ready") is that you get the opportunity to immediately draw upon the rich set of skills and experience held by your project's community that might otherwise remain untapped.

For example, in March last year I created the initial implementation of a QR Code encoder for BWIPP. It wasn't complete since it lacked an important optimisation known as mask selection, but it worked so I put it out into the trunk of the project and tagged a release to the applause of many expectant users.

Then during April this new code was thoroughly tested by Jean-François Barbeau of Objectif Lune, who incorporate BWIPP into their commercial offerings. Our many commercial integrators are valuable members of the community since, for the most part, they are very generous and provide exhaustive testing that results in new code becoming enterprise-class in very little time at all.

Eventually in November I added the missing mask selection code, although this was very inefficient and based on my best guess at the interpretation of ISO/IEC 18004:2006 section 6.8.2.1. This section is very ambiguous, open to a large number of interpretations and more closely resembles something you might expect to find on a single slide from a PowerPoint presentation than a part of an internationally reviewed normative specification.

A disconcerting issue was that my implementation calculated a different optimal mask pattern to the (unworked) example provided by the specification. However, all of the other Open Source implementations that I tried also disagreed with the provided example - and unfortunately they mostly differed from each other as well!

Actually, it does not really matter which mask is chosen from the point of view of the correctness of the QR Code symbol. The purpose is simply to pick the mask that reduces the number of occurrences of the undesirable features that increase the likelihood of a misread. However it's always nice to know that you are doing something right by producing the best possible output.

After pointing out this discrepancy on the project mailing list I was helpfully referred by an individual that assisted with the QR Code standardisation process to the editor of the ISO standard - and there really isn't anybody much more qualified to speak authoritatively about such subtle and detailed matters! So luckily in late December he confirmed that my initial implementation was indeed correct.

At this point we had a highly-inefficient, correct implementation. However in December, whilst my implementation was still being verified I received a welcome surprise. An expert on the PostScript language developed some highly-optimised code for the mask selection process which produced equivalent results to my own verified code in just a fraction of the runtime. The result was that we now had an implementation that was both efficient and correct.

So this was a clear example where the involvement of a community in terms of testing, influence, connections, authority and coding turned something that was "good enough" (and that I may have been content to leave as was) into something that was better than any other implementation that I have seen.

Well done guys!

Sunday, January 11, 2009

n choose r

Just for fun I cooked up a little bit of PostScript to calculate nCr that uses purely stack-based storage - quite efficient.


/nCr { % n r
2 copy sub 2 copy lt {exch} if % n r maxd mind
1 1 5 3 roll % mind j=1 v=1 n maxd
1 add -1 exch { % for i : n -> maxd+1 step -1
mul % mind j v*i
1 index 3 index le { % j > mind ?
1 index idiv exch 1 add exch % mind j+1 v/j
} if
} for
{ % mind j v
1 index 3 index gt {exit} if % j > mind ?
1 index idiv exch 1 add exch % mind j+1 v/j
} loop
exch pop exch pop % v
} bind def

Wednesday, June 25, 2008

Upgrading from PostgreSQL 7.4 to 8.1 in Debian Etch

Debian allows both versions 7.4 and 8.1 of PostgreSQL to coexist on the same system which somewhat simplifies the upgrade procedure.

Assuming that the PostgreSQL 7.4 packages are currently installed with a fairly standard configuration, we start by installing PostgresSQL 8.1.

apt-get install postgresql-8.1 postgresql-client-8.1

The newly installed system will be started on a port 5433 whilst the old system continues to run on standard port 5432. For the time being it is best to leave both instances running on these ports which will allow you to migrate the data via a simple pipe.

Amend the configuration files for the new instance by hand, based on the existing configuration and restart the instance.

/etc/init.d/postgresql-8.1 restart

Populate the new instance with the existing databases:

sudo su - postgres
/usr/lib/postgresql/7.4/bin/pg_dumpall -p 5432 | /usr/lib/postgresql/8.1/bin/psql -p 5433

Stop the old database instance and use the new client.

/etc/init.d/postgresql-7.4 stop

Switch the new instance to using standard port 5432.

vi /etc/postgresql/8.1/main/postgresql.conf [set port = 5432]
/etc/init.d/postgresql-8.1 restart

Use the new psql client and your own applications to ensure that the new installation is operating correctly and that the data import was successful.

sudo -u postgres /usr/lib/postgresql/8.1/bin/psql -p 5432

If all is well then remove the old package, maybe backing up the old data.

tar cvzf ~/oldpgdata.tgz /var/lib/postgresql/7.4/data
apt-get purge postgresql-7.4 postgresql-client-7.4

Sunday, April 13, 2008

Catching up with Barcode Writer in Pure PostScript

It has been a while since I last wrote about Barcode Writer in Pure PostScript. The project has been far from dormant in recent months so here is a chance to catch up with what's been keeping me busy in my “spare” time. (And even very busy at other times!)

It's hard to recall from memory all of the improvements that have been made to the project over the last year but thankfully the commit logs do not forget! There have been the usual bug fixes, some code optimisations and new miscellaneous features, but the main highlight has to be the inclusion of support for 2D barcodes which went in to the mix as follows:

MaxiCode (June to July 2007)

MaxiCode is an irregular matrix symbologies whose symbols consist of a hexagonal grid of dots around a bullseye finder pattern. The sequencing of these dot positions does not follow any regular pattern and so unfortunately the mapping matrix must be hard-coded into the software. MaxiCode also has various different "modes" of operation, some of which impose a strict format on the initial part of the data which makes the input encoding quite complicated.

PDF417 (Boxing Day to New Year's Day)

Technically speaking, you might refer to PDF417 as a "stacked-linear" symbology, however BWIPP renders it using a grid of tall, rectangular cells. The worst thing about this symbology is that it requires a set of lookup tables that contain the "cluster sets" - three groups of 930 numbers used to convert from codewords to bar/space widths. The sequencing of the numbers within these sets appears to be quite random (if you know otherwise then please let me know) and so they must be hard-coded into the software which leads to a lot of uninteresting code – ouch!

Data Matrix (early- to mid-January)

The is a matrix symbology that can be rendered using a grid of squares through which the data zig-zags in eight-module, L-shaped clusters. Whilst the ordering of the modules within the grid is reasonably complicated, it can nevertheless be determined algorithmically for only a small amount of computational cost and requires only some minor tweaking to fix up the corner cases for matrices that do not contain some multiple of eight modules. So overall this symbology can be coded very nicely. We can generate both the standard square symbols types as well as the optional rectangular symbols.

Aztec Code (early- to mid-February)

This is a matrix symbology that can be rendered using a grid of squares with the data wrapping clockwise in two-module wide layers around a square finder pattern in the centre of the symbol. Whilst there are a few different types of symbols it is possible to fold the implementation for each of these into a single relatively sophisticated but direct algorithm that does containing excessive branching. So again, this symbology can be coded quite cleanly.

QR Code (February to late-March)

This is a matrix symbology that can be rendered using a grid of squares with the data vertically meandering in two-modules columns from right to left. With respect to implementation this symbology is quite hideous with its one saving grace being that is does not require the inclusion of hard-coded lookup tables for module placement. Firstly, in certain symbols the final data codeword is defined to be four bits wide rather than the usual eight which results in an awkward bit shift having to be applied to the trailing codewords in order to avoid propagating the exceptional processing required for these specials cases throughout the remainder of symbol generation process. Secondly, the "drunken walk" algorithm for placing the modules within the symbol (whilst avoiding the pre-defined static feature placeholders) has an unexplained inconsistency in the way that you perform the hop over the vertical timing pattern. Thirdly, the format and version information functions are unnecessarily complicated, however since their domain is very small it is possible to use a small set of pre-calculated lookup tables for these in order to avoiding using a significant amount of complex code. But finally, the worst aspect of this symbology is the optional, but recommended, process of apply eight distinct mask patterns to the candidate symbol in turn and then to evaluate these in order to select the one that would produce an output symbol with the fewest undesirable properties. To perform the evaluation algorithm as given by the specification turns out to be significantly more operationally expensive that the entire remainder of the symbol generation process! So for the time being we always select one particular mask.

Going Forward

So, we presently support all major 2D barcode formats, but with one major caveat - the user (or application developer) that is working with BWIPP has to do some preparative work to process the barcode data into the particular intermediate format required by each encoder for which they require the corresponding specification. This is a small task compared to the sometimes sophisticated numeric manipulation involved in the remainder of the symbol generation process. However it does involve extensive string manipulation which is a task for which PostScript is definitely not well suited whilst purpose-built application development languages (such as Perl and C++) have much better support for this task either natively or through libraries.

So the next major set of challenges on the BWIPP roadmap is to integrate the high-level encoding routine for each 2D symbology that convert from a user-supplied ASCII string to the intermediate format that is required by the encoders at present. The result will be that the novice user can simply enter the data that they require to place into a barcode, with only the minimal restrictions as necessarily imposed by each symbology, and our code will create the most optimal encoding that produces the best symbol for the given data, thereby making the system much easier to use for the uninitiated user.

Lastly, but by no means least, an extremely useful component in the implementation of support for 2D barcode generation has been the extensive testing performed by Jean-François Barbeau. He has helped detect and fix a number of bugs, some obvious, and some much more subtle so that we can place much greater confidence in the correctness of the output – so a big thank you on behalf of the PostScript barcoding community!

Saturday, January 12, 2008

Toying with Barcodes - a talk from 24C3

Interesting and entertaining talk by FX of Phenoelit from 24C3 about hacking real-world systems that (mis)use barcodes. Doesn't mention BWIPP though :-P