You are here: Home / weblog

Dominic Cronin's weblog

Showing blog entries tagged as: Docker

Docker from the powershell, take two

Posted by Dominic Cronin at Mar 29, 2019 02:50 PM |
Filed under: ,

Take one

Back in 2016, I posted a quick and dirty technique for parsing the output from the docker CLI utilities. I recently returned to this looking for a slightly more robust approach. Back then I'd also pointed out the existence of a PowerShell module from Microsoft that made things a bit easier, but this has now been deprecated with a recommendation to use the docker cli directly or to use Docker.DotNet. This latter is a full-blown API that might be really handy for some tasks, but not for what I had in mind. The Docker CLI, OTOH is the problem I'm trying to solve. Yes, sure, I might get further with spending more time figuring out filters and formatting, but the bottom line is that the PowerShell has made me lazy, and I plan to stay that way. A quick Google turns up any number of people hacking away at Bash to solve docker's CLI inadequacies. Whatever. This is Windows, and I expect to see a pipeline full of objects with properties. :-)

Take two

It turns out that a reasonably generic approach to parsing docker's output gives you exactly that: a pipeline full of objects. I've just added a couple of functions to my $profile, which means I can do something like

docker ps -a | parseColumns

and get the columns from ps in my pipeline as object properties. Here is the code:

function parseColumnsFromHeader ($line){
    $cols = @()
    # Lazily match chunks of text followed by at least two whitespace or a line end
    $re = [regex]"((.*?)(\s{2,}|$))*"
    $matches = $re.Match($line)
    # Group 0 is the whole match, then you count left parens, so
    # group 1 is ((.*?)(\s{2,}|$)),
    foreach ($capture in $matches.Groups[1].Captures){
        if ($capture.Length -gt 0) {
            # The captures are therefore both the chunk of text and the following whitespace
            # so the length is right, but we trim for the name
            $col = @{
                Name = $capture.Value.Trim()
                Index = $capture.Index
                Length = $capture.Length
            }
            $cols += New-Object PSObject -Property $col;
        }
    }
    return $cols
}

filter parseColumns {

    if (-not $headerDone) {
        $cols = parseColumnsFromHeader $_;
        $headerDone = $true;
    } else {
        $propertiesHash = [ordered]@{}        
        foreach($col in $cols) {                
            $propertiesHash.Add($col.Name, ([string]$_).Substring($col.Index, $col.Length))            
        }
        New-Object PSObject -Property $propertiesHash
    }

}

This works equally well for other docker commands, such as 'docker images'. The only proviso is that the output begins with a single line with column headers. As long as the headers themselves have at least two white space characters between them, and no more than one in the header text itself, it should be fine.

OK - it's fairly generic. Maybe it's better than my previous approach. That said, I'm irritated by the reliance on having to parse columns based on white space. This could break at any moment. Should I be looking for something better?

Not good enough

The thing is, I'm always on the lookout for techniques that will work everywhere. The reason I'm reasonably fluent in the vi editor today is that some years ago, I consciously chose to stop learning Emacs. This isn't a holy wars thing. Emacs was probably better. Maybe... but it wasn't everywhere. Certainly at the time, vi was available on every Unix machine in the world. (These days I have to type "apt-get update && apt-get install -y vim && vi foobar.txt" far more often than I'd like, but on those machines, there's no editor installed at all, and I understand why.)

One of the reasons I never really got along with the Powershell Module is that on any given day, I can't guarantee I'll be able to install modules on the system I'm working on. I probably can paste some code into my $profile, or perhaps even more commonly, grab a one-liner from this blog and paste it directly in my shell. But general hacks in your fingertips FTW.

A better way?

So maybe I just have to learn to love the lowest common denominator. So if I'm on a bare windows Machine doing Docker, can I get the pain threshold down far enough? Well just maybe!

If you look at the Docker CLI documentation, you'll see that pretty much every command takes a --format flag, which allows you to pass a GO template. If you want to output the results as json, it's fairly simple, and then of course, the built-in ConvertFrom-Json cmdlet will get you the rest of the way.

I'm reasonably sure that before too long I'll be typing this kind of thing instead of using the functions above:

docker ps -a --format '{{json .}}' | ConvertFrom-Json 

Using environment variables to configure the Tridion microservices

Within a day of posting this, Peter Kjaer informed me that the microservices already support environment variables, so this entire blog post is pointless. So my life just got simpler, but it cost me a blog post to find out. Oh well. I'm currently trying to decide whether to delete the post entirely or work it into something useful. In the meantime at least be aware that it's pointless! :-) Anyway - thanks Peter.

When setting up a Tridion content delivery infrastructure, one of the most important considerations is how you are going to manage all the configuration values. The microservices have configuration files that look very similar to those we're familiar with from versions of Tridion going back to R5. Fairly recently, (in 8.5, I think) they acquired a "new trick", which is that you can put replacement tokens in the files, and these will be filled in with values that you can pass as JVM parameters when starting up your java process. Here's an example taken from cd_discovery_conf.xml

<ConfigRepository ServiceUri="${discoveryurl:-http://localhost:8082/discovery.svc}"
ConnectionTimeout="10000"
    CacheEnabled="true"
    CacheExpirationDuration="600"
    ServiceMonitorPollDuration="10"
    ClientId="registration"
    ClientSecret="encrypted:HzfQh9wYwAKShDxCm4DnnBnysAz9PtbDMFXMbPszSVY="
    TokenServiceUrl="${tokenurl:-http://localhost:8082/token.svc}">

Here you can see the tokens "discoveryurl" and "tokenurl" delimited from the surrounding text with ${} and followed by default values after the :- symbol.

This is really handy if you are doing any kind of managed provisioning where the settings have to come from some external source. One word of warning, though. If you are setting up your system by hand and intending to maintain it that way, it's most likely a really bad idea to use this technique. In particular, if you are going to install the services under Windows, you'll find that the JVM parameters are stored in a deeply obscure part of the registry. More to the point, you really don't want two versions of the truth, and if you have to look every time to figure out whether tokenurl is coming from the default in your config or from deep underground, I don't hold out much hope for your continued sanity if you ever have to troubleshoot the thing.

That said, if you do want to provision these values externally, this is the way to go. Or at least, in general, it's what you want, but personally I'm not really too happy with the fact that you have to use JVM parameters for this. I've recently been setting up a dockerised system, and I found myself wishing that I could use environment variables instead. That's partly because this is a natural idiom with docker. Docker doesn't care what you run in a container, and has absolutely no notion of a JVM parameter. On the other hand, Docker knows all about environment variables, and provides full support for passing them in when you start the container. On the command line, you can do this with something like:

> docker run -it -e dbtype=MSSQL -e dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource -e dbhost=mssql -e dbport=1433 -e dbname=Tridion_Disc
-e discoveryurl=http://localhost:8082/discovery.svc -e tokenurl=http://localhost:8082/token.svc discovery bash

I'm just illustrating how you'd pass command-line environment arguments, so don't pay too much attention to anything else here, and of course, even if you had a container that could run your service, this wouldn't work. It's not very much less ugly than constructing a huge set of command parameters for your start.sh and passing them as a command array. But bear with me; I still don't want to construct that command array, and there are nicer ways of passing in the environment variables. For example, here's how they might look in a docker-compose.yaml file (Please just assume that any YAML I post is accompanied by a ritual hawk and spit. A curse be on YAML and it's benighted followers.)

   environment: 
      - dbtype=MSSQL
      - dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource
      - dbhost=mssql
      - dbport=1433
      - dbname=Tridion_Discovery
      - dbuser=TridionBrokerUser
      - dbpassword=Tridion1
      - discoveryurl=http://localhost:8082/discovery.svc
      - tokenurl=http://localhost:8082/token.svc

This is much more readable and manageable. In practice, rather than docker-compose, it's quite likely that you'll be using some more advanced orchestration tools, perhaps wrapped up in some nice cloudy management system. In any of these environments, you'll find good support for passing in some neatly arranged environment variables. (OK - it will probably degenerate to YAML at some point, but let's leave that aside for now.)

Out of the box, the Tridion services are started with a bash script "start.sh" that's to be found in the bin directory of your service. I didn't want to mess with this: any future updates would then be a cause for much fiddling and cursing. On top of that, I wanted something I could generically apply to all the services. My approach looks like this:

#!/bin/bash
# vim: set fileformat=unix

scriptArgs=""
tcdenvMatcher='^tcdconf_([^=]*)=(.*)'
for tcdenv in $(printenv); do
    if [[ $tcdenv =~ $tcdenvMatcher ]]; then
        scriptArgs="$scriptArgs -D${BASH_REMATCH[1]}=${BASH_REMATCH[2]}"
    fi
done

script_path="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"
$script_path/start.sh $scriptArgs

(I'm sticking with the docker-compose example to illustrate this. In fact, with docker-compose, you'd also need to script some dependency-management between the various services, which is why you'd probably prefer to use a proper orchestration framework.)

The script is called "startFromEnv.sh". When I create my docker containers, I drop this into the bin folder right next to start.sh. When I start the container, the command becomes something like this, (but YMMV depending on how you build your images).

command: "/Discovery/bin/startFromEnv.sh"

instead of:

command: "/Discovery/bin/start.sh"

And the environment variables get some prefixes, so the relevant section of the setup looks like this:

    environment: 
      - tcdconf_dbtype=MSSQL
      - tcdconf_dbclass=com.microsoft.sqlserver.jdbc.SQLServerDataSource 
      - tcdconf_dbhost=mssql
      - tcdconf_dbport=1433
      - tcdconf_dbname=Tridion_Discovery
      - tcdconf_dbuser=TridionBrokerUser
      - tcdconf_dbpassword=Tridion1
      - tcdconf_discoveryurl=http://localhost:8082/discovery.svc
      - tcdconf_tokenurl=http://localhost:8082/token.svc

The script is written in bash, as evidenced by the hashbang line at the top. (Immediately after is a vim modeline that you can ignore or delete unless you happen to be using an editor that respects such things and you are working on a Windows system. I've left it as a reminder that the line endings in the file do need to be unix-style.)

The rest of the script simply(!) loops through the environment variables that are prefixed with "tcdconf_" and converts them to -D arguments which it then passes on to script.sh (which it looks for in the same directory as itself).

I'm still experimenting, but for now I'm assuming that this approach has improved my life. Please do let me know if it improves yours. :-)

If you think the script is ugly, apparently this is a design goal of bash, so don't worry about it. At least it's not YAML (hack, spit!)

System refresh: new architecture for www.dominic.cronin.nl

It's taken a while, and the odd skinned knuckle and a bit of cursing, but I can finally announce that this site is running on...erm.. the other server. Tada! Ta-ta-ta-diddly.... daaahhhh!!!!

Um yeah - I get it. it's not so exciting is it really? The blog's still here, and it's got more or less the same content. It doesn't look any different. Maybe it's a tiny smidgin faster, but even that's more likely to do with the fact that we switched over to an ISP that actually makes use of the glass that runs in to our meter cupboard. 

But I'm excited. Just a bit, anyway. Partly because it's taken me months. It needn't have, but it's the usual question of squeezing it into the cracks between all the other things that need to get done in life. That and the fact that I'm an utter cheapskate and I don't want to pay for anything. There's also plenty not to be excited about. As I said, the functionality is exactly as it was. The benefits I get from it are mostly about the ability to do things better going forward. 

So what have I done? Well it all started an incredibly long time ago when I started tinkering with docker. I figured that the whole containerisation technology thing had such a lot of potential that I ought at least to run docker on my own server. After all, over the years, I'd always struggled with Plone needing to have a different version of Python than the one available in the current Gentoo ebuilds. I'd attempted a couple of things, including I think an early version of what became LXC, but then along came virtualenv, which made the whole thing moot. 

Yeah, well - until I wanted to play with docker for itself. At this point, I just thought I'd install it on my server, and get going, but I immediately discovered, that the old box I was running was 32-bit, and docker is just far too hip to run on anything so old-fashioned. So I needed a new server, and once I'd realised that, that's when the whole thing started. If I was going to have a new server, why didn't I just containerise everything? It's at this point that someone inevitably chips in with a suggestion that if I weren't such a dinosaur, I'd run it on the cloud, wouldn't I? Well yes - sure! But I told you - I'm a cheapskate, and apart from that, I don't want anyone's soul-less reliability messing with my carefully constructed one-nine availability commitment. 

Actually I like cloud tech, but frankly, when you look at the micro-budget that supports this site, I'd have spent all my time searching out a super-cheap host, and even then I'd have begrudged it. So my compromise with myself was that I'd build it all very cloudy, and then the world's various public clouds would be my disaster recovery plan. And so it is. If this server dies, I can get it all up in the cloud with a fairly meagre effort. Still not going to two-nines though.

So I went down to my local high street where there's a shop run by these Indian guys. They always have a good choice of "hardly used" ex-business computers. I think I shelled out a couple of hundred Euros, and then I had something with an i5 and enough memory, and a couple of stupidly big disks to make a raid. Anyway - more than enough for a web server - which is just as well, because pretty soon it ends up just being "the server", and it'll get used for all sorts of other things. All the more reason to containerise everything. 

I got the thing home, and instead of doing what I've done many times before, and installing Gentoo linux, I poked around a bit on the Internet and found CoreOS. Gentoo is a masochist's delight. I mean - it runs like a sports car, but you have to own a set of spanners. CoreOS, on the other hand, is more or less maintenance free. It's built on Gentoo's build system, so it inherits the sports car mentality of only installing things you are going to use, but then the guys at CoreOS do that, and their idea of "things you are going to use" is basically everything that it takes to get containers up and keep them running, plus exactly nothing else. For the rest, it's designed for cloud use, so you can install it from bare metal to fully working just by writing a configuration file, and it knows how to update itself while running. (It has a separate partition for the new version, and it just switches over.) 

So with CoreOS up and running, the next thing was to convert all the moving parts over to Docker containers. As it stands now, I didn't want to change too much of the basics, so I'm running Plone on a Gentoo container. That's way too much masochism though. I'd already been thinking I'd do a fresh one with a more generic out-of-the-box OS, and I've just realised I can pull a pre-built Plone image based on Debian (or Alpine). This gets better and better. And I can run it all up side-by-side in separate containers until I'm ready to flip the switch. Just great! Hmm... maybe my grand master plan was just to get to Plone 5! 

The Gentoo container I'm using is based on one created by the Gentoo community, which you can pull from the Docker hub. Once I found this, I thought I was home and dry, but it's not really well-suited to just pulling automatically from a docker file. What they've done is to separate out the portage tree into a separate container. This is smart, because you are unlikely to want the whole of portage in your container for any given purpose that makes you want to run Gentoo. What you do instead is mount the portage data using docker's --volumes-from argument. With it mounted, you can run emerge and install whatever packages you need, and then at runtime you get to run a much slimmer system. Which is great, but it means you have to create and store your own image manually rather than using a dockerfile. (At least, that's how it ended up for a noob like me, once I realised that dockerfile doesn't have an equivalent of --volumes-from.) 

My goal was to set up CoreOs to automatically pull the docker images it needed, and run some setup commands. This meant that I'd need to have my personalised Gentoo image available somewhere. Some of the data in there was sensitive, so I went looking for a private Docker registry that I could upload it to. There are plenty of private registries, but most of them aren't free. (If you don't mind the whole world pulling your containers, then free registries abound.) I eventually found https://canister.io/, which suited my needs. That said, my needs aren't much. If I ever need an alternative to canister, I'll probably look at Google Cloud Platform, which isn't free but has a private container registry where you only pay for storage and data egress, at pretty reasonable rates. Or I could just host it myself, but that's maybe too many eggs in the same basket. 

Meanwhile, my very next step ought most probably be to get backups sorted out. The "Dockerish" way to do this is to run up yet another dedicated container to deal with just this concern. Then if I want to host it separately, and my backup approach changes, nothing else needs to. Once I have the backups sorted out, it will definitely be worth the while to tidy things up so that I really can just push to the cloud if needs be. The way it's set up now, I could be up and running again very quickly but we're probably talking hours rather than seconds. 

I'm really enjoying the flexibility that containerisation gives me, although it's definitely important to get into the right mindset. Being able to build containers that will run on a really generic platform is quite liberating.

Using the Powershell to parse columns out of strings

Posted by Dominic Cronin at Jul 30, 2016 02:15 PM |
Filed under: ,

I've been kicking the tyres on Docker, and after a fairly short while I noticed that my list of containers was getting a little full. I decided to clean up, and after a quick look at the documentation, realised that I'd first have to run "docker ps -a" to get a list of all my containers, and then filter the list to get the ones I wanted to delete. (The alternative, was to read through the list, and manually execute "docker rm" on each one that I wanted to delete, and I'm far too lazy for that.)

Here's what the output from "docker ps -a" looks like

CONTAINER ID        IMAGE                  COMMAND                  CREATED             STATUS                         PORTS               NAMES
f7a3b9bb073c        dominiccronin/gentoo   "/bin/bash"              33 minutes ago      Exited (127) 33 minutes ago                        adoring_bell
2ec710c32df0        dominiccronin/gentoo   "/bin/bash"              16 hours ago        Exited (0) About an hour ago                       hungry_pare
7805ed925e51        gentoo/portage         "sh"                     16 hours ago        Created                                            portage
43c207846b56        dominiccronin/gentoo   "/bin/bash"              16 hours ago        Exited (127) 16 hours ago                          big_goodall
bbcc2e6d87d1        dominiccronin/gentoo   "/bin/bash"              18 hours ago        Exited (0) 18 hours ago                            infallible_mayer
f710c351291d        ubuntu:14.04           "C:/Program Files/Git"   8 months ago        Created                                            hopeful_archimedes
94acf6155aba        ubuntu:14.04           "C:/Program Files/Git"   8 months ago        Created                                            drunk_mahavira
e5bf3c39aa9e        ubuntu:14.04           "C:/Program Files/Git"   8 months ago        Created                                            desperate_pasteur
22ace2ca4ba1        ubuntu                 "C:/Program Files/Git"   8 months ago        Created                                            furious_brattain
a20746611b7b        67af10dd2984           "/bin/sh -c '/usr/gam"   9 months ago        Exited (0) 9 months ago                            berserk_goodall
398be811cb6a        67af10dd2984           "/bin/sh -c '/usr/gam"   9 months ago        Exited (0) 9 months ago                            fervent_torvalds
6363467ab659        67af10dd2984           "/bin/sh -c '/usr/gam"   9 months ago        Exited (0) 9 months ago                            grave_bardeen
b21bbf5103f0        67af10dd2984           "/bin/sh -c '/usr/gam"   9 months ago        Exited (0) 9 months ago                            ecstatic_feynman
56f1700ba2ca        67af10dd2984           "/bin/sh -c '/usr/gam"   9 months ago        Exited (0) 9 months ago                            elated_elion
0d41f9675f61        docker/whalesay        "cowsay boo-boo"         9 months ago        Exited (0) 9 months ago                            hopeful_brown
7309c5215e9f        docker/whalesay        "cowsay fooobar"         9 months ago        Exited (0) 9 months ago                            berserk_payne
23c1b894cec2        docker/whalesay        "whalesay fooobar"       9 months ago        Created                                            lonely_jones6
6a8c27a31740        docker/whalesay        "cowsay boo"             9 months ago        Exited (0) 9 months ago                            mad_jones
e5ca9dec78bc        docker/whalesay        "cowsay boo"             9 months ago        Exited (0) 9 months ago                            sleepy_ardinghelli
43c4d5c7a996        hello-world            "/hello"                 9 months ago        Exited (0) 9 months ago                            cocky_khorana
cbfe9e33af32        hello-world            "/hello"                 9 months ago        Exited (0) 9 months ago                            mad_leakey

The "hello, world" examples for Docker are all based on Docker's "theme animal", which is a whale, so if I could identify all the items where the image name contained the string "whale", I'd be on to a good thing. The only problem was that when you run a docker command like this in the powershell, all you get back is a list of strings. The structure of the columns is lost. A quick google showed that there is a Powershell module that might allow me to be even more lazy in the future but the thought of not being able to do it directly from the shell irritated me. So... here goes... this is how you do it:

docker ps -a | %{,@($_ -split ' {2,}')} | ?{$_[1] -match 'whale'} | %{docker rm $_[0]}

Yes, yes, I get it. That looks like the aftermath of an explosion in the top row department of a keyboard factory, so let's take it down a bit.

The interesting part is probably the second element in the pipeline. So after "docker ps -a" has thrown a list of strings into the pipeline, the second element is where I'm deconstructing the string into its constituent columns. The '%' operator is shorthand for 'foreach', so every line will be processed by the script block between the braces, and the line itself is represented by the built-in variable '$_'. (In the third, element you can see a similar construction but with a '?', so instead of a 'foreach', it's a 'where'.)

You can use a Regex with the split operator, and here I've used ' {2,}' to indicate that if there are 2 or more spaces together, I wish to use that as a column separator. Some of the columns are free text, with spaces in them, so I'm taking this pragmatic approach to avoid matching on a single space. Of course, there will be edge cases that break this, so I heartily recommend that you test the results first before actually doing 'docker rm'. Just replace the last element with something like "%{$_[1]}".

Having got the line split into columns, the next challenge is the PowerShell itself. If you throw anything that looks like a collection into the pipeline, it will get automatically unwrapped, and each item will be processed separately in the next block. So here, I'm wrapping the split in an array expression @(), and then preceding that with a comma. The comma operator is used to join a list of items into an array. Usually, this is something like 'a','b','c' - but it works just as well with a single operand, and so ,@(...) gets us an array containing an array. Then when it gets unwrapped by the pipeline, we have just the array containing the split fields. This means that in the third pipeline element we can filter on the value of $_[1] which is the IMAGE field. The fourth element actually invokes "docker rm" using the CONTAINER ID ($_[0]).

I've used Docker as the basis for this example. Just for the record, using the Docker Powershell module I mentioned,  I managed to remove all my Ubuntu containers like this:

Get-Container | ?{$_.Image -match 'bun'} | Remove-Container

 But as, I said, I'm just using Docker as an example. This PowerShell technique will also help you in many situations where there isn't a module available for the task at hand.