• 4 min read
  • Provisioning is an often overlooked aspect of architecting a SaaS solution, but it’s one of the most important ones. How do you make sure new users and tenants can be created automatically and are ready to use?

    Recently, I worked on a provisioning API for Azure SQL Managed Instance and ran into some interesting quirks.

    Database restores

    When restoring databases, SQL Managed Instance does not surface any STATS messages or restore completion events to the user. This is because, internally, SQL Managed Instance has a background worker that performs the actual restore, as Brent Ozar has documented on his blog. You can see this in action by restoring a large backup (say, >2 GB) and then querying the active restores:

    -- Query active restores
    SELECT query = a.text, start_time, percent_complete,
           eta = dateadd(second, estimated_completion_time / 1000, getdate())
    FROM sys.dm_exec_requests r
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) a
    WHERE r.command IN ('RESTORE DATABASE', 'DROP DATABASE');

    For example:

    | name   | database_id | query | percent_complete |
    |--------|-------------|-------|------------------|
    | master | 1 | RESTORE DATABASE [9f288db4-12af-49a6-ab9d-b414afaf467a] FROM URL = N'https://STORAGEACCOUNT.blob.core.windows.net/CONTAINER/mydb.bak' WITH STATS=10, BUFFERCOUNT=8, MAXTRANSFERSIZE=3145728, NORECOVERY, REPLACE, MOVE N'v3starterdb_data' TO N'C:\WFRoot\DB16C.2\Fabric\work\Applications\Worker.CL_App15\work\data\9f288db4-12af-49a6-ab9d-b414afaf467a.mdf', [truncated for brevity] | 37.5055 |
    | NULL | 32763 | RESTORE DATABASE [mydb] FROM URL = 'https://ACCOUNT.blob.core.windows.net/CONTAINER/mydb.bak'; | 0 |

    In short, SQL MI accepts a restore command and begins restoring into a database named with a random GUID. Once the restore completes, it renames that database to the user-requested name.

    If you need to map the physical database names to the logical (user-facing) database names, check the physical_database_name column in sys.databases:

    SELECT name, physical_database_name FROM sys.databases

    Note that this value only becomes available after the restore is complete. As a result, detecting restore progress programmatically is impractical (theoretically possible if you’re OK with throwing a bunch of string-matching kludges into the query above… but I’m not).

    Detecting restore completion

    As a result of the physical vs logical DB name mappings, detecting database restore progress or completion is non-trivial and can be a blocker if you have downstream tasks to perform, e.g. running some T-SQL against your newly restored DB to prepare it for use.

    When a restore is completed and the physical (GUID) database is renamed to the user-requested one, it appears to be online:

    SELECT
        name,
        DATABASEPROPERTYEX(name,'Updatability') AS 'Updatability',
        DATABASEPROPERTYEX(name,'Status') AS 'Status'
    FROM dbo.sysdatabases;

    | name | Updatability | Status |
    |------|--------------|--------|
    | mydb | READ_WRITE   | ONLINE |

    However, attempting to query that database (either read or write) will fail with “Unable to access database 'mydb' because it lacks a quorum of nodes for high availability. Try the operation again later.” So what gives?

    SQL MI’s Business Critical tier uses something similar to Always On availability groups to prevent data loss. These availability groups are configured for you automatically, but while initial replication occurs and until quorum is established, the database is present in master but not queryable!

    You can use the following query to determine whether a DB is still replicating, and therefore whether it is ready to use, on the Business Critical tier (adjust the WHERE clause to match your own database naming convention):

    -- Query synchronization state
    SELECT sdb.database_id, sdb.name, sdb.physical_database_name, t.is_ready FROM sys.databases sdb
    LEFT JOIN (
        SELECT r.database_id, is_ready = CASE WHEN COUNT(CASE WHEN r.synchronization_state = 2 THEN 1 ELSE NULL END) > 1 THEN 1 ELSE 0 END
        FROM sys.dm_hadr_database_replica_states r
        GROUP BY r.database_id
    ) t ON t.database_id = sdb.database_id
    WHERE name LIKE 'Works_tid%'

    Once is_ready = 1, your database is good to go and you can safely execute additional queries against it. Note that this applies during normal operation; if you are doing something fancy like resizing your instance, the readiness detection may not work as expected.
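
    In a provisioning workflow you can poll that query until the target database reports ready before kicking off any downstream T-SQL. Here’s a minimal sketch in C#, assuming Microsoft.Data.SqlClient (System.Data.SqlClient works the same way); the class and method names are illustrative, not part of any SQL MI API:

    using System;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    public static class RestoreReadiness
    {
        public static async Task WaitUntilReadyAsync(string connectionString, string dbName)
        {
            // Same readiness logic as the query above, filtered to a single database.
            const string readinessQuery = @"
                SELECT t.is_ready
                FROM sys.databases sdb
                LEFT JOIN (
                    SELECT r.database_id,
                           is_ready = CASE WHEN COUNT(CASE WHEN r.synchronization_state = 2 THEN 1 ELSE NULL END) > 1
                                      THEN 1 ELSE 0 END
                    FROM sys.dm_hadr_database_replica_states r
                    GROUP BY r.database_id
                ) t ON t.database_id = sdb.database_id
                WHERE sdb.name = @dbName;";

            while (true)
            {
                // New connection per poll; see the SqlConnection caveats below.
                using (var conn = new SqlConnection(connectionString))
                using (var cmd = new SqlCommand(readinessQuery, conn))
                {
                    cmd.Parameters.AddWithValue("@dbName", dbName);
                    await conn.OpenAsync();
                    var result = await cmd.ExecuteScalarAsync();
                    if (result != null && result != DBNull.Value && Convert.ToInt32(result) == 1)
                        return; // synchronized: safe to run downstream T-SQL
                }
                await Task.Delay(TimeSpan.FromSeconds(10));
            }
        }
    }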

    One last word of caution regarding SqlConnection and SMO

    This is not specific to SQL Managed Instance, but a word of caution for those automating query execution with the C# SqlConnection class from System.Data.SqlClient: it does not necessarily open a new physical connection (see connection pooling for details) and it can only support one command at a time.

    Attempting to re-use a SqlConnection (e.g. via multiple RestoreAsync() calls in SMO) before it is ready can result in very bizarre behavior (the SqlConnection closes itself and the active thread may stop executing). Best practice is to:

    1. Avoid SMO altogether in favor of plain T-SQL, which is what SMO produces anyway
    2. Create a new SqlConnection instance for every T-SQL command you wish to execute, and close the connection afterwards. Wrapping the creation in a using statement will do this for you.
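
    Here’s a minimal sketch of that second point, again assuming Microsoft.Data.SqlClient (the System.Data.SqlClient class discussed above behaves the same); the helper name and connection string are placeholders, not anything prescribed by SMO or SQL MI:

    using Microsoft.Data.SqlClient;

    public static class ProvisioningSql
    {
        // One SqlConnection per command: the using blocks close the connection
        // (returning it to the pool) even if the command throws.
        public static void Execute(string connectionString, string commandText)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(commandText, conn))
            {
                cmd.CommandTimeout = 0; // restores and long prep scripts; 0 = no timeout
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }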
  • 3 min read
  • I love writing tests for my code, but whenever I start with a new language or framework it’s such a pain, since getting the mocks and fixtures just right tends to be very language- and framework-specific. I searched and searched for a good pytest + Flask configuration that would let me write unit and integration tests for the app, and found some good pieces but nothing holistic.

    Thorough testing of Flask apps

    I wanted a pytest configuration that would give no side-effects between tests and provide a solid foundation for writing everything from unit to integration to database tests:

    • Mocked (monkey-patched) methods and classes for unit testing
    • Per-test Flask application context, letting you test things like oauth-filter-for-python-flask
    • Fast, in-memory SQLite database that is torn down between each test
    • REST client to test Flask APIs or views

    pytest configuration

    The code samples below are pretty generic but may require minor customization for your Flask application. I highly recommend taking a look at the flask-bones sample, which contains many best practices; this sample works with it out of the box.

    It assumes the use of the following modules available via pip: pytest, pytest-flask and pytest-mock
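
    If you don’t already have them, all three can be installed in one go:

    pip install pytest pytest-flask pytest-mock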

    pytest’s conftest.py:

    import pytest
    
    from yourflaskmodule import create_app
    from yourflaskmodule.config import test_config
    from yourflaskmodule import db as _db
    
    
    @pytest.fixture(scope="session")
    def app(request):
        """Test session-wide test `Flask` application."""
        app = create_app(test_config)
        return app
    
    
    @pytest.fixture(autouse=True)
    def _setup_app_context_for_test(request, app):
        """
        Given the app is session-wide, set up an app context per test to ensure
        that the app and request stacks are not shared between tests.
        """
        ctx = app.app_context()
        ctx.push()
        yield  # tests will run here
        ctx.pop()
    
    
    @pytest.fixture(scope="session")
    def db(app, request):
        """Returns session-wide initialized database"""
        with app.app_context():
            _db.create_all()
            yield _db
            _db.drop_all()
    
    
    @pytest.fixture(scope="function")
    def session(app, db, request):
        """Creates a new database session for each test, rolling back changes afterwards"""
        connection = _db.engine.connect()
        transaction = connection.begin()
    
        options = dict(bind=connection, binds={})
        session = _db.create_scoped_session(options=options)
    
        _db.session = session
    
        yield session
    
        transaction.rollback()
        connection.close()
        session.remove()

    Here’s an example of a base config class with the SQLite in-memory override:

    class test_config(base_config):
        """Testing configuration options."""
        ENV_PREFIX = 'APP_'
    
        TESTING = True
        SQLALCHEMY_DATABASE_URI = 'sqlite:///:memory:'  # in-memory SQLite database

    Here’s an example of a test making use of all the different features:

    import json

    import pytest
    
    try:
        from flask import _app_ctx_stack as ctx_stack
    except ImportError:
        from flask import _request_ctx_stack as ctx_stack
    
    class TestCase:
        # set a Flask app config value using a pytest mark
        @pytest.mark.options(VERIFY_IDENTITY=True)
        def test_foo(self, client, session, mocker):
            # set user identity in app context
            ctx_stack.top.claims = {'sub': 'user1', 'tid': 'expected-audience'}
    
            # mock a class
            mocked_batch_client = mocker.patch('backend_class.BackendClient')
            mocked_batch_client.return_value.list.return_value = ['a', 'b']
    
            # test a view - it uses BackendClient (mocked now)
            resp = client.get('/api/items')
            data = json.loads(resp.data)
            assert(len(data['results']) > 0)
            
            # insert data into the database - will get rolled back after test completion
            item = YourModel()
            item.foo = "bar"
            session.add(item)
            session.commit()
  • 4 min read
  • My home server is multi-homed (multiple outgoing network interfaces), which a lot of the time is more trouble than it’s worth… This time around, I needed to route a specific Docker container’s traffic through a non-default outgoing interface (i.e. an OpenVPN interface, to access secured resources). Below I’ll show you how I made that happen.

    Primer on multi-homed networking

    Controlling incoming connections is generally easier than controlling outgoing ones. Listening sockets can be set up on all interfaces or on a specific IP, which binds them to a specific incoming interface.

    Outgoing connections, on the other hand, are a routing decision performed by the kernel. So regardless of the incoming connection, data flows out (generally) through the default route.

    Policy-based routing customizes these routing decisions so that the route (that is, the outgoing interface) can be determined by a set of rules, such as the source IP or a packet mark set by iptables.

    A bit about Docker networking

    Docker is a marvel of technology but at times feels very user-hostile due to its rigidity - it makes a lot of assumptions and doesn’t often communicate them well in documentation.

    So, to the point: Docker supports adding multiple network interfaces to containers, great! I can have my specific container continue to join the default Docker network and talk to my other containers, and create a new network specifically for this container that maps to my target outgoing interface on the host.

    However, the user-hostility: Docker doesn’t let you customize which network is the container’s default route. Normally I wouldn’t care and we’d just use policy-based routing to fix that, but remember how it’s the kernel’s routing decision? Well containers don’t have their own kernel. Docker is all NAT magic under the hood, and the actual routing decision is done on the host.

    Turns out, you can influence the default route in a container… It’s just that Docker uses the first network added to a container as the default, and from testing it appears to also add networks to containers alphabetically. OK then.

    Putting it all together

    Our recipe will leverage three key components:

    1. A custom Docker network named such that Docker adds it to the container first, making it the default route
    2. An IP tables rule to mark packets coming out of that Docker network
    3. Policy-based routing on the host to route marked packets through the non-default interface

    Here we go:

    # create a new Docker-managed, bridged connection
    # 'avpn' because docker chooses the default route alphabetically
    DOCKER_SUBNET="172.57.0.0/16"
    docker network create --subnet=$DOCKER_SUBNET -d bridge -o com.docker.network.bridge.name=docker_vpn avpn
    
    # mark packets coming from the docker_vpn interface during prerouting, so a non-default routing decision can be applied to them
    # 0x25 is arbitrary, any hex (int mask) should work
    firewall-cmd --permanent --direct --add-rule ipv4 mangle PREROUTING 0  -i docker_vpn ! -d $DOCKER_SUBNET -j MARK --set-mark 0x25
    
    # alternatively, for regular iptables:
    #iptables -t mangle -I PREROUTING -i docker_vpn ! -d $DOCKER_SUBNET -j MARK --set-mark 0x25
    
    # create new routing table - 100 is arbitrary, any integer 1-252
    echo "100 vpn" >> /etc/iproute2/rt_tables
    
    # configure rules for when to route packets using the new table
    ip rule add from all fwmark 0x25 lookup vpn
    
    # setup a different default route on the new routing table
    # this route can differ from the normal routing table's default route
    ip route add default via 10.17.0.1 dev tun0 table vpn
    
    # connect the container to the new network ('avpn' is the Docker network name;
    # docker_vpn is just the name of the underlying host bridge interface)
    docker network connect avpn mycontainer

    That’s it! You should now see outgoing traffic from mycontainer going through tun0 with gateway 10.17.0.1. To get all of this set up programmatically on boot, I recommend looking into docker-compose to automatically attach the avpn network, and creating the files route-docker_vpn and rule-docker_vpn in /etc/sysconfig/network-scripts to configure the routing rules.
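
    As a sketch of that last step (the values match the recipe above, but the file formats and compose layout are assumptions worth double-checking against your distribution), the two network-scripts files simply hold the arguments that get passed to ip rule add and ip route add when the interface comes up:

    # /etc/sysconfig/network-scripts/rule-docker_vpn
    from all fwmark 0x25 lookup vpn

    # /etc/sysconfig/network-scripts/route-docker_vpn
    default via 10.17.0.1 dev tun0 table vpn

    And a docker-compose fragment (service and image names are placeholders) can attach the container to both the project’s default network and avpn; since avpn sorts first alphabetically, it becomes the default route while the container can still talk to its peers:

    # docker-compose.yml (fragment)
    services:
      mycontainer:
        image: someimage:latest
        networks:
          - avpn      # the pre-existing network created above
          - default   # regular compose network for talking to other containers
    networks:
      avpn:
        external: true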

    Note that you may need an add-on package for NetworkManager to trigger the network-scripts - on Fedora, it’s called NetworkManager-dispatcher-routing-rules.

  • 3 min read
  • A few of my tech projects experience occasional hiccups and need to be soft-reset from my Linux host (e.g. a Wi-Fi SSID routed through VPN, or a Windows gaming VM with hardware passthrough). This was annoying, as it meant having a machine nearby to SSH in and execute a few simple commands, often just a systemctl restart foo. Fortunately, homebridge-cmdswitch2 can expose arbitrary commands as switches, so I can easily bounce these services from my phone.

    First, since Homebridge should be running as its own system user, we need to give it permission to restart services (as root). We don’t want to grant it access to all of /bin/systemctl, so a wrapper script placed at /usr/local/bin/serviceswitch will encapsulate the desired behavior. Grant the homebridge user permission to run it with sudo:

    # /etc/sudoers.d/homebridge-cmdswitch
    homebridge ALL = (root) NOPASSWD: /usr/local/bin/serviceswitch

    Next, let’s create that /usr/local/bin/serviceswitch script with service status, start and stop commands - using such a wrapper also has the benefit that complex checks consisting of several commands can be performed. Keep in mind these are now being run as root from Homebridge!

    #!/bin/sh
    
    if [ "$(id -u)" -ne 0 ];then
      echo "You must run this script as root."
      exit 1
    fi
    
    usage() {
      error="$1"
      if [ ! -z "$error" ];then
        echo "Error: $error"
      fi
      echo "Usage: $0 [action] [service]"
    }
    
    action="$1"
    service="$2"
    if [ -z "$action" ] || [ -z "$service" ];then
      usage
      exit 1
    fi
    
    case $action in
      start|stop|status) ;;
      *) usage "invalid action, must be one of [start, stop, status]"; exit 1;;
    esac
    
    case $service in
      vm-guests)
        [ "$action" == "start" ] && (systemctl start libvirt-guests)
        [ "$action" == "stop" ] && (systemctl stop libvirt-guests)
        [ "$action" == "status" ] && { systemctl -q is-active libvirt-guests; exit $?; }
        ;;
      fileserver)
        [ "$action" == "start" ] && (systemctl start smb;systemctl start nmb;systemctl start netatalk)
        [ "$action" == "stop" ] && (systemctl stop smb;systemctl stop nmb;systemctl stop netatalk)
        [ "$action" == "status" ] && { (systemctl -q is-active smb && systemctl -q is-active nmb && systemctl -q is-active netatalk); exit $?; }
        ;;
      web)
        [ "$action" == "start" ] && (systemctl start httpd)
        [ "$action" == "stop" ] && (systemctl stop httpd)
        [ "$action" == "status" ] && { systemctl -q is-active httpd; exit $?; }
        ;;
      *) usage "invalid service"; exit 1;;
    esac
    exit 0
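
    Before wiring this into Homebridge, it’s worth sanity-checking both the script and the sudoers entry by impersonating the homebridge user (run as root; web is just one of the services defined above):

    # -n makes sudo fail instead of prompting if the sudoers entry is wrong
    sudo -u homebridge sudo -n /usr/local/bin/serviceswitch status web; echo "exit code: $?"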

    Finally, here is the relevant platform section from the homebridge config:

    {
      "platforms": [{
        "platform": "cmdSwitch2",
        "name": "Command Switch",
        "switches": [{
           "name" : "vm-guests",
            "on_cmd": "sudo /usr/local/bin/serviceswitch start vm-guests",
            "off_cmd": "sudo /usr/local/bin/serviceswitch stop vm-guests",
            "state_cmd": "sudo /usr/local/bin/serviceswitch status vm-guests",
            "polling": false,
            "interval": 5,
            "timeout": 10000
        },
        {
           "name" : "fileserver",
            "on_cmd": "sudo /usr/local/bin/serviceswitch start fileserver",
            "off_cmd": "sudo /usr/local/bin/serviceswitch stop fileserver",
            "state_cmd": "sudo /usr/local/bin/serviceswitch status fileserver",
            "polling": false,
            "interval": 5,
            "timeout": 10000
        },
        {
           "name" : "web",
            "on_cmd": "sudo /usr/local/bin/serviceswitch start web",
            "off_cmd": "sudo /usr/local/bin/serviceswitch stop web",
            "state_cmd": "sudo /usr/local/bin/serviceswitch status web",
            "polling": false,
            "interval": 5,
            "timeout": 10000
        }]
      }]
    }
  • 2 min read
  • I recently ran into an issue where a bug in one of my Docker containers would intermittently chew through CPU until restarted. I wanted Monit to automatically restart the service when it was eating CPU (which ordinarily is trivial to do), but due to the mapped volume, I only wanted it to stop & start the service if it was already running. Otherwise, Monit would proceed to start the container on boot prior to the mounted drive being present, resulting in a bunch of headaches.

    The “if already running” part turned out to be a little more complicated than I expected. Monit doesn’t have a good answer for this built in, so the key is to override the start action by executing a no-op when the service isn’t running:

    only check process home-assistant MATCHING ^python.+homeassistant
       start program = "/usr/bin/docker start home-assistant"
       stop  program = "/usr/bin/docker stop home-assistant"
       if cpu usage > 13% for 3 cycles then restart
       if does not exist then exec /bin/true

    Monit considers 100% CPU usage to be full utilization across all cores, which is why you see 13% (you can also verify the current service CPU usage by checking the output of monit status). In my case, 13% works out to roughly 65% CPU on a single core, which, sustained over 3 minutes, I deemed enough to recognize that the bug had occurred.

    Here you see I’m also using the MATCHING syntax because the entrypoint for Docker containers may change in the future (I don’t maintain this one myself).

    The only downside to this method is that Monit will repeatedly log that the service is not running until it is started. In my case, since I start my Docker services on boot, it wasn’t an issue.