One of the best things about not being a web developer is that I don’t have to deal with web stuff. Open source web development is just endlessly frustrating to me. It drives me bonkers. Every single time I think I have a problem solved, I’m confronted with another one that takes an unpredictable amount of time to solve: anywhere from a few minutes of googling to hours or even days. I suppose web developers get through this with lots of googling or help from co-workers, and eventually settle into a daily flow with their technology. They certainly aren’t getting through these problems thanks to good documentation; there ain’t none. But as someone who just builds my own website, I go through all that front-end pain for one site and can’t see it as a long-term investment. It’s just pain. I’m not going to pore over all the source code just so the virtually meaningless error messages make some sense (that is, when there even is an error message).

Argh! I hate it.

I use Jekyll for this blog. I wanted the security of a static HTML site, and WordPress seems to have its share of security issues. So I started my Oregon Trail-ish journey by creating a Linux Mint virtual machine in VirtualBox and installing Jekyll.

The reason I chose Jekyll was that I wanted to create custom Pygments syntax colorizers. I started out knowing that I’d have to learn a little Python for that, and that’s fine. It’s nice to add a language to my repertoire; I actually looked forward to it. I realized that there would be a little Ruby in there too. Fine: Python and Ruby.

This post discusses some of my frustrations.

Version Hell

First, there is the issue of versions. Things are constantly being improved, versions change, and compatibility falls apart. When you google things, you may find that the answers apply to previous versions. I got the latest Mint ISO for my virtual machine. Shouldn’t it have the latest stable versions of everything?

Turns out, maybe not. I had Ruby 1.9.1 vs. 2.0 problems early on. I didn’t find out from a descriptive error message; I found out by googling some meaningless one.
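
Asking the interpreter which Ruby the shell is actually running is quicker than decoding an error message. The following is a minimal Python sketch. It assumes “ruby --version” prints a banner that begins with “ruby X.Y.Z”, which is what MRI releases print, though you should treat it as an assumption. The function names are illustrative.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Matches the start of `ruby --version` output, e.g. "ruby 2.0.0p... [x86_64-linux]".
_BANNER = re.compile(r"ruby (\d+)\.(\d+)\.(\d+)")


def parse_ruby_version(banner: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a `ruby --version` banner."""
    match = _BANNER.match(banner.strip())
    if match is None:
        raise ValueError(f"unrecognized ruby banner: {banner!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def active_ruby_version() -> Optional[Tuple[int, int, int]]:
    """Version of whichever `ruby` is first on PATH, or None if there is none."""
    if shutil.which("ruby") is None:
        return None
    result = subprocess.run(["ruby", "--version"],
                            capture_output=True, text=True, check=True)
    return parse_ruby_version(result.stdout)


if __name__ == "__main__":
    version = active_ruby_version()
    if version is None:
        print("no ruby on PATH")
    elif version < (2, 0, 0):
        print("ruby", ".".join(map(str, version)), "is older than 2.0")
    else:
        print("ruby", ".".join(map(str, version)), "is 2.0 or newer")
```

Run as a script, it reports on whichever ruby comes first on PATH. When two versions are installed side by side, that is the one that matters.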

So, install Ruby 2.0. Done. But the system still didn’t use 2.0. I tried to change the symlink, but that didn’t seem to work. I tried to uninstall 1.9.1, but that was a no-go. I reverted to a previous VM snapshot and let Google come up with the answer: I needed something called rvm, which stands for, I presume, Ruby Version Manager.

OK, fair enough. Let the versions coexist and use rvm to tell the shell which one to use.

I typed in the rvm command. Not installed. So I did apt-get install rvm. Not there, and no suggestions as to where I might find it. What? Why isn’t it available through apt-get? It seems sort of important, given the versioning issues and the fact that Mint ships with an old Ruby.

OK, so how to install it? I had to use something called curl. So I tried that. Didn’t work: curl wasn’t installed either. I had to apt-get install curl. Finally, I got rvm.

Now, after running “source {path to rvm}”, I was able to use rvm to switch Ruby to 2.0.
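
rvm has to be sourced into the shell, rather than simply run, because switching Rubies works by changing the current shell’s environment. The selected Ruby’s bin directory goes to the front of PATH (rvm also sets gem-related variables), so the first ruby found wins. The Python sketch below models only that PATH lookup, using shutil.which. It does not call rvm, and the directory names in the demo are hypothetical stand-ins.

```python
import os
import shutil
import stat
import tempfile
from typing import Optional


def resolve_after_prepend(cmd: str, new_dir: str, path: str) -> Optional[str]:
    """Return the file `cmd` resolves to once `new_dir` is placed in front of `path`.

    Models what a version manager does to PATH; os.environ is left untouched.
    """
    return shutil.which(cmd, path=os.pathsep.join([new_dir, path]))


def make_fake_ruby(directory: str) -> str:
    """Create a placeholder executable named `ruby` in `directory` (demo only)."""
    exe = os.path.join(directory, "ruby")
    with open(exe, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(exe, os.stat(exe).st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return exe


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        system_bin = os.path.join(root, "usr", "bin")           # stand-in for /usr/bin
        rvm_bin = os.path.join(root, "rvm", "ruby-2.0", "bin")  # hypothetical rvm layout
        for directory in (system_bin, rvm_bin):
            os.makedirs(directory)
            make_fake_ruby(directory)
        print("before:", shutil.which("ruby", path=system_bin))
        print("after: ", resolve_after_prepend("ruby", rvm_bin, system_bin))
```

Prepending a directory changes which ruby resolves without touching the system install, which is why the two versions can coexist.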

Whew!

Ruby Gem Hell

Now I can get a minimal website up with Jekyll. Yay!

So, on to Pygments, the reason I’m using Jekyll.

But first, I need to learn enough Python. Python evangelists tout it as super easy. Their argument seems to be that you don’t need to go through the drudgery of typing curly brackets like in C#. At first Python does seem easy. But then you start finding that you need all of these things that are prefixed and suffixed with two underscores, and without them you’re pretty much relegated to doing not much that’s useful. So fewer curly brackets, but more underscores. I’d say it’s a wash. I also had to learn tuples, which were a new concept for me. Not hard, but new.
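
Both ideas show up in the lexer code at the end of this post. Every Pygments rule is a tuple such as (r';', Comment, 'comment'), and names like __all__ use the double-underscore convention. Here is a small plain-Python illustration. The Rule class is hypothetical and not part of Pygments; the token type is written as a string only to keep the example dependency-free.

```python
class Rule:
    """Hypothetical stand-in for one lexer rule: (regex, token type, next state)."""

    def __init__(self, regex, token, new_state=None):  # called by Rule(...)
        self.regex = regex
        self.token = token
        self.new_state = new_state

    def __repr__(self):  # what print() and the REPL show
        return f"Rule({self.regex!r}, {self.token!r}, {self.new_state!r})"

    def __iter__(self):  # lets a Rule be unpacked like a tuple
        return iter((self.regex, self.token, self.new_state))


comment_rule = (r';', 'Comment', 'comment')  # a plain tuple, shaped like a Pygments rule
regex, token, state = comment_rule           # tuple unpacking

rule = Rule(*comment_rule)                    # unpack the tuple into __init__'s arguments
print(rule)                                   # Rule(';', 'Comment', 'comment')
print(tuple(rule) == comment_rule)            # True
```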

Subclassing Pygments’ classes to make my own syntax highlighters was pretty straightforward. I made a new Z-80 assembly lexer derived from Pygments’ RegexLexer, an override of the C++ lexer that adds Arduino keywords and functions, and one for C from the SDCC toolchain that hands the strings inside __asm__() over to the Z-80 assembly lexer.
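
The C and C++ overrides rely on Pygments’ inherit marker. When a subclass defines a state that contains inherit, the parent’s rules for that state are spliced in at that position. States the subclass doesn’t mention are taken from the parent unchanged. Because the first matching rule wins, anything placed before inherit takes priority over the parent’s rules. The sketch below models only that merging rule in plain Python, using a hypothetical INHERIT sentinel and made-up state contents. It is not Pygments’ implementation.

```python
from typing import Dict, List

INHERIT = object()  # hypothetical stand-in for pygments.lexer.inherit

State = List[object]


def merge_states(parent: Dict[str, State], child: Dict[str, State]) -> Dict[str, State]:
    """Combine two token tables following the documented `inherit` behaviour."""
    merged = {name: list(rules) for name, rules in parent.items()}
    for name, rules in child.items():
        spliced: State = []
        for rule in rules:
            if rule is INHERIT:
                spliced.extend(parent.get(name, []))  # parent's rules go here
            else:
                spliced.append(rule)
        merged[name] = spliced  # a child state without INHERIT replaces the parent's
    return merged


c_like = {"root": ["c-keywords", "c-identifiers"], "string": ["c-string-body"]}
sdcc = {"root": ["__naked/__critical/__interrupt", INHERIT]}

merged = merge_states(c_like, sdcc)
print(merged["root"])    # ['__naked/__critical/__interrupt', 'c-keywords', 'c-identifiers']
print(merged["string"])  # ['c-string-body']
```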

And it even seemed to work for some of the files. It needs a little refinement as I write this, but it’s good enough…

So, how to use it? I thought all I had to do was make sure my pygmentize was executable and available. So I made a symbolic link to my pygmentize and ran Jekyll on my code with the highlight tags.

Nothing doing. It did not work. Grrrr… It’s like Jekyll uses its own pygmentize. So I did some googling, and sure enough, it uses a Pygments gem (pygments.rb) that carries its own copy of Pygments. Oh great. Now I gotta figure out gems…
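
To check that two tools are using different copies of Pygments, you can ask a given Python interpreter where it would import pygments from. The following is a minimal standard-library sketch. It reports on the interpreter that runs it, which is not necessarily the interpreter or the Pygments copy that Jekyll’s gem uses. The helper name module_origin is illustrative.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """File a top-level module would be loaded from, or None if it isn't importable."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    origin = module_origin("pygments")
    print(origin or "pygments is not importable from this interpreter")
```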

OK, I can do this… I googled a little and found that I could use Mercurial to get the source code. Easy enough, once I had apt-get installed Mercurial (the hg command). Now I had the source of the Ruby pygments gem.

I looked through the source and there was a directory for custom lexers. Yay!!! I put my custom lexer in there, ran make mapfiles, and then did the gem install. Then jekyll build, and crash. My lexers were not recognized. Obviously I did something wrong. I moved my file from the custom lexers folder to the lexers folder, where I probably should have put it in the first place. Now it was picked up.
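
This behavior is consistent with how Pygments finds a lexer by alias. It does not scan every file. Instead it consults a generated table, LEXERS in pygments/lexers/_mapping.py, which make mapfiles rebuilds. Each entry in that table is a tuple of module, display name, aliases, filename patterns, and MIME types. Real Pygments also checks plugin entry points. The sketch below models only the table lookup in plain Python. The module path pygments.lexers.sdcc is hypothetical, because the post doesn’t name the file.

```python
from typing import Dict, Optional, Tuple

# class name -> (module, display name, aliases, filename globs, MIME types)
MapEntry = Tuple[str, str, Tuple[str, ...], Tuple[str, ...], Tuple[str, ...]]

LEXERS: Dict[str, MapEntry] = {
    # Hypothetical module path; names and aliases mirror the lexer code below.
    "AsmSdccZ80Lexer": ("pygments.lexers.sdcc", "AsmSdccZ80", ("asm-sdcc-z80",), (), ()),
    "CSdccZ80Lexer": ("pygments.lexers.sdcc", "C-Sdcc-Z80", ("c-sdcc-z80",), (), ()),
    "CppArduinoLexer": ("pygments.lexers.sdcc", "Cpp-Arduino", ("cpp-arduino",), (), ()),
}


def find_lexer_by_alias(alias: str,
                        table: Dict[str, MapEntry] = LEXERS) -> Optional[Tuple[str, str]]:
    """Return (module, class name) for an alias, or None if the table doesn't list it."""
    for class_name, (module, _name, aliases, _globs, _mimes) in table.items():
        if alias.lower() in aliases:
            return module, class_name
    return None
```

A lexer file that exists on disk but has no entry in the table is invisible to this kind of lookup. That is why regenerating the map matters.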

But my lexer imports a class called words from the pygments.lexer module, and that version of the module doesn’t have words. Ugh! Version hell again. I started down the road of changing my code, but then I decided I should try to change the Pygments code instead. So I swapped the gem’s bundled copy of Pygments for the later Pygments I had been using.
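
The “change my code” route could have been a small fallback for older Pygments releases that lack words. The sketch below assumes that a plain regex string is an acceptable substitute in a rule tuple. That holds for RegexLexer, but the fallback skips the regex optimization the real words helper performs. The name words_fallback is hypothetical.

```python
import re
from typing import Iterable


def words_fallback(wordlist: Iterable[str], prefix: str = "", suffix: str = "") -> str:
    """Build a regex string matching any of `wordlist`, like pygments.lexer.words.

    Unlike the real helper, this does no regex optimization; it just joins an
    alternation, longest words first so that 'rla' is tried before 'rl'.
    """
    ordered = sorted(set(wordlist), key=lambda w: (-len(w), w))
    return prefix + "(?:" + "|".join(re.escape(w) for w in ordered) + ")" + suffix


try:
    from pygments.lexer import words  # present in newer Pygments releases
except ImportError:
    words = words_fallback  # older (or missing) Pygments: plain regex string instead
```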

Then I ran gem build (or whatever it was), and it told me that all these files weren’t files. Huh? Yes they are… aren’t they? I poked around the code to see where it was getting this file list. Turns out, it was getting it from git. I can’t just place a file there; I have to place it and check it in. OK, easy enough.
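
Many gemspecs build their file list from the output of git ls-files. A file that is on disk but not tracked by git is therefore simply not part of the gem. The Python sketch below lists such files before building. It assumes git is on PATH and that repo_dir is inside the gem’s repository. Both function names are illustrative.

```python
import subprocess
from typing import List


def untracked_files(repo_dir: str = ".") -> List[str]:
    """Files git sees under repo_dir but doesn't track (ignored files are skipped)."""
    result = subprocess.run(
        ["git", "ls-files", "--others", "--exclude-standard"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def missing_from_listing(tracked_listing: str, wanted: List[str]) -> List[str]:
    """Paths in `wanted` that a `git ls-files`-style listing doesn't include."""
    tracked = set(tracked_listing.splitlines())
    return [path for path in wanted if path not in tracked]
```

Anything untracked_files() reports needs a git add and a commit before a git-driven gem build will see it.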

Now the gem built, and Jekyll used my custom Pygments classes!!! Yay!!!

Hurdle after hurdle… but I’m getting through them. People actually enjoy this kind of development? Ugh! Give me embedded systems.

BOM and Hex Editor Hell

So I started turning some things I had written earlier into content. Of course, they used Unicode symbols, and since I had originally written them on Windows, I copied them over to Notepad and saved them as Unicode.

But when I Jekylled it, the YAML front matter wasn’t being processed. Because the extension is .html instead of .md? Nope. .markdown? Nope. I was stumped. Then I noticed a warning on the Jekyll front matter documentation page that reads:

UTF-8 Character Encoding Warning
If you use UTF-8 encoding, make sure that no BOM header characters exist in
your files or very, very bad things will happen to Jekyll. This is especially
relevant if you’re running Jekyll on Windows.

Other than my YAML front matter not being recognized, I wouldn’t say that “very, very bad things” happened to Jekyll. What is a “very, very bad thing” anyway? Did I have a BOM header? What is a BOM, besides a bill of materials? So I googled. It’s a byte order mark, and Windows apparently inserts one at the beginning of some files. Did it put one in mine?
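
A BOM is a fixed byte sequence at the very start of a file, and the specific sequence tells you the encoding:

- EF BB BF for UTF-8
- FF FE for little-endian UTF-16, which is what Windows Notepad writes when you choose “Unicode”
- FE FF for big-endian UTF-16

Here is a short Python check that uses the standard codecs constants. UTF-32 BOMs are deliberately ignored for brevity. The function names are illustrative.

```python
import codecs
from typing import Optional, Tuple

# None of these three byte sequences is a prefix of another, so order is safe.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def detect_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, BOM length in bytes) if data starts with a known BOM, else None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None


def file_bom(path: str) -> Optional[Tuple[str, int]]:
    """Read just enough of a file to check for a BOM."""
    with open(path, "rb") as handle:
        return detect_bom(handle.read(4))
```

A two-byte result means a UTF-16 file, which matches the two bytes that needed deleting here.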

Well, let’s see… surely there must be some nice, useful hex editor. So I found hexedit. I think I had to apt-get install it. I fired it up. Yep, I guess I had a BOM. Now to delete it. Surely hexedit must let me delete two bytes… You’d think so, wouldn’t you? Besides being pretty ugly, hexedit didn’t seem to have a way to delete. The man page was little help. I googled and found a man page online that lied about the commands (i.e., it described a different version). So I googled good hex editors for Linux. Someone highly recommended one called ht, so I apt-get installed ht. Man oh man, I didn’t think I could see a hex editor worse than hexedit, but ht was. Finally, I apt-get installed ghex, thinking anything starting with a g would at least let me do what everyone should be able to do with a hex editor besides just view the damn file.

ghex allowed me to delete the BOMs. Whew!!! Why was that so fricking hard? How dare a program call itself a hex “editor” if you can’t edit the hex!

Anyway, jekyll serve served up… unprocessed YAML front matter again. Was it because the file was Unicode? I made a test file in ASCII, and its front matter was processed.
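
The ASCII test file works because Jekyll decides whether a file has front matter by looking at its raw opening bytes for three dashes followed by a line break. The exact pattern below is an approximation of that check, not Jekyll’s code. In UTF-16, every ASCII character is followed by a zero byte, so the dashes never line up. A UTF-8 BOM also gets in the way. The sample page content is made up.

```python
import re

# Approximation (an assumption) of the front matter test: "---", optional
# trailing spaces or tabs, then a newline, at the very start of the raw bytes.
_FRONT_MATTER_START = re.compile(rb"\A---[ \t]*\r?\n")


def starts_with_front_matter(data: bytes) -> bool:
    """True if the raw file bytes open with a YAML front matter fence."""
    return _FRONT_MATTER_START.match(data) is not None


page = "---\ntitle: Example\n---\nBody text\n"
print(starts_with_front_matter(page.encode("ascii")))      # True
print(starts_with_front_matter(page.encode("utf-16-le")))  # False: b'-\x00-\x00-\x00...'
print(starts_with_front_matter(page.encode("utf-8-sig")))  # False: the BOM comes first
```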

So I suspected the file was not being interpreted because it was Notepad’s “Unicode” (which is really UTF-16) instead of UTF-8. I looked, to no avail, for a Unicode-to-UTF-8 converter; nothing seemed to exist. So I resaved the files in Windows Notepad as UTF-8 and removed the BOMs again.
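
The conversion itself takes only a few lines of Python: decode using whatever encoding the BOM indicates, then write plain UTF-8 with no BOM. This is a minimal sketch with two assumptions. UTF-32 is not handled, and a file without a BOM is assumed to be UTF-8 already. The names to_utf8_without_bom and convert_file are illustrative.

```python
import codecs

_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE (Notepad's "Unicode")
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def to_utf8_without_bom(data: bytes) -> bytes:
    """Re-encode BOM-prefixed UTF-8/UTF-16 bytes as BOM-less UTF-8."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            text = data[len(bom):].decode(encoding)
            break
    else:
        text = data.decode("utf-8")  # assumption: no BOM means already UTF-8
    return text.encode("utf-8")


def convert_file(path: str) -> None:
    """Rewrite a file in place as UTF-8 without a BOM."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_utf8_without_bom(data))
```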

And… it worked!!!! Yay!!! Not in hell right now; maybe purgatory.

Now that I actually have it working, I just have to work on the content, iron out the kinks in the lexers, and put this site up…

Pygments Custom Lexers

Oh, and just in case you’re interested, here is my latest set of custom lexers:

    # -*- coding: utf-8 -*-
    """
        Custom Pygments lexers for SDCC Z-80 assembly, SDCC C with inline
        Z-80 assembly, and Arduino C++, built on pygments.lexers.c_cpp.

        :copyright: Copyright 2006-2015 by the Pygments team, see AUTHORS.
        :license: BSD, see LICENSE for details.
    """

    from pygments.lexer import RegexLexer, include, bygroups, using, inherit, words
    from pygments.token import Text, Comment, Operator, Keyword, Name, String, \
        Number, Punctuation
    from pygments.lexers.c_cpp import CLexer, CppLexer

    __all__ = ['CSdccZ80Lexer', 'CppArduinoLexer', 'AsmSdccZ80Lexer']


    class AsmSdccZ80Lexer(RegexLexer):
        """Z-80 assembly as written for the SDCC toolchain's assembler."""
        name = 'AsmSdccZ80'

        aliases = ['asm-sdcc-z80']

        tokens = {
            'whitespace': [
                (r'[ \t]+', Text),
            ],
            # Everything from ';' to the end of the line
            'comment': [
                (r'[^\n]*', Comment, '#pop'),
            ],
            'root': [
                include('whitespace'),
                (r';', Comment, 'comment'),
                (r'\n', Text),
                # Assembler directives (lines starting with '.')
                (r'\.', Comment.Preproc, 'directive'),
                # Labels at the start of a line: "name:" or "name::"
                (r'^([A-Za-z1-9_$][A-Za-z0-9_$]*)(::?)', bygroups(Name.Label, Punctuation)),
                # Instruction mnemonics; the operands are lexed in 'args'
                (words((
                    'adc', 'add', 'and', 'bit', 'call', 'ccf', 'cp', 'cpd', 'cpdr',
                    'cpi', 'cpir', 'cpl', 'daa', 'dec', 'di', 'djnz', 'ei', 'ex', 'exx',
                    'halt', 'im', 'in', 'inc', 'ind', 'indr', 'ini', 'inir', 'jp', 'jr',
                    'ld', 'ldd', 'lddr', 'ldi', 'ldir', 'neg', 'nop', 'or', 'out', 'outd',
                    'otdr', 'outi', 'otir', 'pop', 'push', 'res', 'ret', 'reti', 'retn',
                    'rla', 'rl', 'rlca', 'rlc', 'rld', 'rra', 'rr', 'rrca', 'rrc',
                    'rrd', 'rst', 'sbc', 'scf', 'set', 'sla', 'sra', 'sll', 'srl', 'sub',
                    'xor'), suffix=r'\b'),
                 Operator, 'args'),
            ],
            'directive': [
                include('whitespace'),
                (r';', Comment, 'comment'),
                (r'\n', Text, '#pop'),
                (r'[^;\n]+', Comment.Preproc),
            ],
            # Instruction operands, up to the end of the line
            'args': [
                include('whitespace'),
                (r';', Comment, 'comment'),
                (r'\n', Text, '#pop'),
                (r',', Punctuation),
                (r'[~+-]', Operator),
                (words(('a', 'b', 'c', 'd', 'e', 'h', 'l', 'i', 'r',
                        'af', 'bc', 'de', 'hl', 'ix', 'iy', 'sp'), suffix=r'\b'), Name.Variable),
                (words(('z', 'nz', 'c', 'nc', 'pe', 'po', 'p', 'n'), suffix=r'\b'), Name.Property),
                (r'\'', Name),  # shadow register pair, e.g. af'
                (r'[A-Za-z1-9_][A-Za-z0-9\$_]*[:]*', Name.Label),
                (r'\(', Punctuation, 'z80Reg16'),
                (r'#?((0x[0-9A-Fa-f]+)|([0-9]+))', Number),
                (r'([A-Za-z_$][A-Za-z_$0-9]*)', Name.Label),
            ],
            # Indirect operands such as (hl) or (ix+5)
            'z80Reg16': [
                include('whitespace'),
                (words(('af', 'bc', 'de', 'hl', 'ix', 'iy', 'sp'), suffix=r'\b'), Name),
                (r'[+-]', Operator),  # index offsets: (ix+d), (iy-d)
                (r'#?((0x[0-9A-Fa-f]+)|([0-9]+))', Number),
                (r'#?[A-Za-z_$][A-Za-z_$0-9]*[:]*', Name.Label),
                (r'\)', Punctuation, '#pop'),
            ],
        }


    class CSdccZ80Lexer(CLexer):
        """C for SDCC, with __asm__("...") bodies highlighted as Z-80 assembly."""
        name = 'C-Sdcc-Z80'

        aliases = ['c-sdcc-z80']

        tokens = {
            'root': [
                (r'Z80_IO_PORT\b', Keyword, 'macro'),
                (words(('naked', 'critical', 'interrupt'), prefix=r'__', suffix=r'\b'), Keyword),
                inherit,
            ],
            'statements': [
                (words(('naked', 'critical', 'interrupt'), prefix=r'__', suffix=r'\b'), Keyword),
                (r'__asm__\b', Keyword, 'asm'),
                inherit,
            ],
            # After __asm__: wait for the opening parenthesis
            'asm': [
                (r'[(]', Punctuation, 'asmstr'),
                include('whitespace'),
            ],
            # Inside __asm__( ... ): each string literal is lexed as Z-80 assembly;
            # the closing parenthesis pops both 'asmstr' and 'asm'
            'asmstr': [
                (r'[)]', Punctuation, '#pop:2'),
                include('whitespace'),
                (r'(")([^"]*)(")', bygroups(String, using(AsmSdccZ80Lexer), String)),
            ],
        }


    class CppArduinoLexer(CppLexer):
        """C++ with the Arduino core API's constants and functions highlighted."""
        name = 'Cpp-Arduino'

        aliases = ['cpp-arduino']

        tokens = {
            'statements': [
                (words((
                    'HIGH', 'LOW', 'INPUT', 'OUTPUT', 'INPUT_PULLUP', 'LED_BUILTIN',
                    'pinMode', 'digitalWrite', 'digitalRead',
                    'analogReference', 'analogRead', 'analogWrite',
                    'tone', 'noTone',
                    'shiftOut', 'shiftIn', 'pulseIn',
                    'millis', 'micros', 'delay', 'delayMicroseconds',
                    'min', 'max',
                    'abs', 'constrain', 'map', 'pow', 'sqrt',
                    'sin', 'cos', 'tan',
                    'isAlphaNumeric', 'isAlpha', 'isAscii', 'isWhitespace',
                    'isControl', 'isDigit', 'isGraph', 'isLowerCase',
                    'isPrintable', 'isPunct', 'isSpace', 'isUpperCase',
                    'isHexadecimalDigit',
                    'randomSeed', 'random',
                    'lowByte', 'highByte', 'bitRead', 'bitWrite', 'bitSet', 'bitClear', 'bit',
                    'attachInterrupt', 'detachInterrupt', 'interrupts', 'noInterrupts',
                    'Serial',
                    'Stream',
                    'Keyboard',
                    'Mouse'), suffix=r'\b'), Keyword.Pseudo),
                inherit,
            ],
        }