Za - Reference Page

Introduction

Za is an interpreted language designed to interoperate with the shell in an efficient manner and to perform common system tasks. Some general use may also be found for other scripting tasks. The goal of the design is to bring together a variety of common tasks for which a shell language would normally make external calls and to provide cleaner, more maintainable scripts.

Overview

Both integer and floating point operations are supported within the language as well as some other basic types such as booleans and strings. Type operations are dynamic and type safety can generally be enforced at run-time, with a few minor exceptions. Some efforts have been made to allow for easy interaction between the language and the traditional shell environment.

The syntax of Za allows largely unmodified shell commands to be executed by prefixing them with a single character: |

In order to assign the results of such commands to Za variables a two character operator is used: =|

Assignment with =| produces a structure with the following fields: .out, .err, .code and .okay. These fields contain, respectively, the command's standard output, its standard error output, its status code and a boolean status flag indicating success or failure (true when the status code is zero).

Additionally, there are some other syntax forms available for command execution, which will be detailed later.

Similarly, the method for passing information to shell commands is by the substitution of Za variables into the commands using the syntax {variable}

You may also use the form {=expr} for the substitution of expressions. This includes user-defined and Za library functions. When a Za variable does not exist or a Za expression cannot be evaluated on inspection of an {expr} element, then the phrase with the braces is treated literally.

This form should also be used for accessing structure fields. When a calculated substitution occurs outside of a command or string literal then the substituted value retains the type of the expression’s evaluation.

As a contrived example:

        dn=1
        path="~"
        files = {find {path} -maxdepth {dn}}
        foreach fn in files
          s={stat -c '%a %s' {fn}}
          list=list.append("{fn} {s}")
          table+="{fn} {s}\n"
        endfor
        println "Entry count : {=list.count}" 
        foreach e in list 
          println "List [{key_e}] : {e}"
        endfor
        sizes=table.col(3," ")
        t=sizes.list_float.sum
        println "column 3 (sizes) : ",sizes
        println format("total size = %10.0f",t)
    

Example output:

        Entry count : 27
        List [0] : . 755 4096 
        List [1] : ./initrd.img.old 777 31 
        ... cut for brevity ...
        List [25] : ./initrd.img 777 30 
        List [26] : ./home 755 4096
        column 3 (sizes) : [4096 31 4096 12288 4096 4096 4096 4096 16384 0 0 27 4096 2980 4096 4096
        4294967296 4096 4096 4096 4096 28 4096 1140 4096 30 4096]
        total size = 4295065740
    

The language is not intended to out-perform other languages nor to replace them. It may be seen as an attempt to clean up the syntax of common scripting tasks which must be maintained over time.

The primary dependency of the language is the availability of, preferably, Bash on the system path.  Other shells may be used by setting a command line argument, but support for them is limited.

The -S flag may also be used at startup to disable the child shell process and instead pass shell commands to the parent process. It is also entirely possible to set the shell (with the -s flag) to something like /bin/false should you not require shell interaction.

A more limited test build for Windows is also possible, but some features are currently removed until they prove to be viable.

Information regarding this should be in the INSTALL document. However, a VT-capable terminal, such as ConEmu, should be used for ANSI-based i/o. ANSI output is disabled by default on Windows. It can be re-enabled at startup using the -C flag or during run-time with the ansi() function.

This default will be reviewed periodically. Za also compiles under FreeBSD and is largely complete, other than BSD-specific peculiarities.

The language design is opinionated. The author considered the many tasks he has routinely needed to perform with a mixture of other scripting tools, and we have tried to unify those which made sense to co-opt.

This is not to say that the shell should be avoided, only that we have tried to adopt common small tasks which generally require sub-shell instantiation or for which the syntax is generally unclear.

There is no intention to make breaking changes to the language after version 2.0.0. There are still a few issues with the syntax to resolve before then, but they are quite minor.

The provided standard library, however, will evolve over time as new functions are added. Existing library calls will not be removed after version 2, except in the rarest of cases. Where removal is unavoidable, we would always look to deprecate the functionality gracefully.

Should you ever need the updated library features it should be a simple task to replace the Za binary in your installation repositories for deployment.

The binary contains both the language and its standard library as well as the supervisory capability over a single child shell.

Source Credits

Za has been written using the Go language and much of its standard library implementation.

Some third-party libraries and modules have been used which require credit:

Area      Library         Author                      Licence                                 Source
Database  go-sql-driver   various                     Mozilla Public License 2.0              https://github.com/go-sql-driver/mysql
String    RenderFloat Fn  gorhill                     DO WHAT THE F*** YOU WANT TO            https://gist.github.com/gorhill/5285193
JSON      gojq            itchyny                     MIT License                             https://github.com/itchyny/gojq
Image     svgo            ajstarks                    Creative Commons 3.0                    https://github.com/ajstarks/svgo
Text      go-pcre         Pavel Gryaznov (current),   see PCRE-COPYRIGHT-FILE in the source   https://github.com/GRbit/
                          Florian Weimer (historic)


Should the above form of credit be insufficient for any third-party dependency then please contact the Za maintainer to arrange an alternative form.

The current maintainer is listed in the INSTALL document.

The source file CREDITS has the full details of authors.

Usage

        za [-v] [-h] [-i] [-b] [-m] [-c] [-C] [-Q] [-S] [-W] [-a] [-D] \
            [-s path] [-V varname]                                     \
            [-t] [-O tval] [-N name_filter]                            \
            [-G group_filter] [-o output_file]                         \
            [-r] [-F "sep"] [-e program_string]                        \ 
            [-T time-out] [-U sep] [[-f] input_file]
        -v : Version 
        -h : Help 
        -f : Process script input_file 
        -e : Provide source code in a string for interpretation. Stdin becomes available for data input 
        -S : Disable the co-process shell 
        -s : Provide an alternative path for the co-process shell 
        -i : Interactive mode 
        -b : Bypass startup script
        -c : Ignore colour code macros at startup
        -C : Enable colour code macros at startup
        -r : Wraps a -e argument in a loop iterating standard input. Each line is automatically split into fields
        -F : Provides a field separator character for -r
        -t : Test mode
        -O : Test override value tval
        -o : Name the test file output_file
        -G : Test group filter group_filter
        -N : Test name filter name_filter
        -a : Enable assertions. Default is false, unless -t is specified
        -D : Enable line debug output
        -T : Sets the time-out duration, in milliseconds, for calls to the co-process shell
        -W : Emit errors when addition contains strings mixed with other types
        -V : Find all references to a variable
        -m : Mark co-process command progress
        -U : Specify system command separator byte
        -Q : Show shell command options

Most of the time you will likely invoke a script with za script_name or include a shebang interpreter directive with the location of your za binary file, much like any other scripting language.

If you invoke your script with za without a shebang then Za arguments should be processed out of the argument list before your script starts executing, leaving only arguments for the script to process in argc()/argv() and INPUT statements.

As an example:

        # za -D -e 'println argv(); println argc()' arg1 arg2
        main: 1 : println argv();
        [arg1 arg2]
        main: 1 : println argc()
        2

 

Language and Type Behaviour

Error Handling

As this is an early release, there are still unhandled error cases. Please report these. It is important for us to present errors correctly.

There are no exception handling capabilities within the language. This is likely to remain the case. There is not enough justification for them.

In general, functions should return error codes or an additional error value in the form of a second variable or a struct field.

Shell commands have their last error status available through the last() library call. However this should probably be avoided in favour of other methods shown below.

On a program error which causes termination you should receive information regarding the nature of the fault and the location it occurred. There may be additional information depending on context.

We may introduce more error handling features later if these prove insufficient.

Code Blocks

Text encapsulated within the { and } tokens will be sent as lines of code to the shell.

This is treated as an expression and returns a struct containing the stdout and stderr output (minus trailing newline characters), the status code of the last executed shell command and a status flag. The relevant struct fields are .out, .err, .code and .okay respectively.

If, instead, the ${...} form is used to encapsulate the code block, only the .out field is returned as a string.

There is a certain level of impact in the use of shell-specific symbols using these constructs. Some of that may be avoided by using alternatives such as:

| command
var =| command
var =< command
-or-
system(command_string)
... depending on your requirements.

You may also use the | command form as an expression much like the ${...} form.

Example:

        foreach f in ${ls -1} # -or- | `ls -1`
            println "File : ",f 
        endfor 
    

Use of {...} form:

        if {find . -maxdepth 1 | wc -l}.out.as_int > 1 
            println "Multiple files in directory." 
        endif
    

Multiple commands can be processed free-form using the {...} type constructs, whereas with the |, =| and =< constructs it is necessary to add semi-colons instead of newlines, which sometimes necessitates quoting the command to avoid conflict with Za tokenisation. This isn't as complex as it sounds! It becomes clear with some use of the language.

Data Types

The language supports integer, float, string and boolean types in function calls, expressions and assignment expressions. Type width is 64-bit for numeric values.

Arbitrary-length integers and floats are also supported, but are obviously slower than the fixed-length standards.

There are currently only a few compound types as described below. Complex compound types are not necessary given the caveats below.

Za treats multi-line string values as lists for iteration purposes and during standard library calls, where appropriate.

A typical snippet of code is shown below to illustrate this:

        # read username, gid and description from password file for all users 
        foreach l in read_file("/etc/passwd") 
            fields(l,":") 
            println F[1], ",", F[4], ",", F[5]
        endfor
    

In this example, the returned value of read_file is a multi-line string. This is automatically split by FOREACH for line-by-line processing.

The fields() library call creates the F variable containing the individual words of a string. The second argument to fields() is an optional field separator string. The default separator is " ".

It should be obvious what is going on in the snippet to most readers with any familiarity with basic Unix scripting tools: fields() mimics an aspect of AWK, FOREACH is a typical construct in many languages, and read_file() returns the contents of the named file.

Fields, F[] and NF

'F' and 'NF' are reserved variable names. A large portion of use cases for this language involve text processing. Much like in shell scripts and languages like AWK and Perl, it was felt there was a justification to lay claim to some positional variables.

The F array is generated local to its scope. It differs from a normal system variable in that system variables are generated at the global scope and are normally read-only through library calls. NF holds the number of fields extracted.

To further illustrate the use:

        z="example input text" 
        fields(z)
        print z
        example input text

        print F[1]
        example

        print F[2]
        input

        print F[3]
        text

        print F 
        [example input text]

        F[2]="manipulated"
        print F
        [example manipulated text]

        print NF
        3
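
The behaviour of fields(), F and NF shown above can be modelled outside Za. The following Python sketch is not Za's implementation; the Fields class and the fields() helper are hypothetical names, and the handling of empty or repeated separators is an assumption (the text does not describe it). It only illustrates 1-based field access and the field count.

```python
class Fields:
    """A 1-based list of fields, loosely modelled on Za's F array."""

    def __init__(self, words):
        self._words = list(words)

    @staticmethod
    def _offset(position):
        # Field positions start at 1, unlike ordinary array indices.
        if position < 1:
            raise IndexError("field positions start at 1")
        return position - 1

    def __getitem__(self, position):
        return self._words[self._offset(position)]

    def __setitem__(self, position, value):
        self._words[self._offset(position)] = value

    def __len__(self):
        return len(self._words)

    def __str__(self):
        return "[" + " ".join(self._words) + "]"


def fields(text, sep=" "):
    """Split text on sep and return the pair (F, NF)."""
    f = Fields(text.split(sep))
    return f, len(f)


F, NF = fields("example input text")
first = F[1]
F[2] = "manipulated"

# The /etc/passwd example: field 4 is the gid, field 5 the description.
passwd, _ = fields("root:x:0:0:root:/root:/bin/bash", ":")
```

Returning F and NF as a pair is a choice made for this sketch. In Za they are created as reserved variables in the calling scope.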

Variable Interpolation

As mentioned above, encapsulating a Za variable name in curly braces {} inside a string will cause the variable to be substituted at run-time with its current value. This can be done in a few places, including `` or "" quoted strings and anywhere in a co-process call statement ( | / =| ).

Expressions may also be evaluated (including standard library and user-defined functions) using the alternative syntax {=expr}

Using the second form (for expressions) is the only way to interpolate global variables. The first form will only substitute variables in the local function scope.

These limitations may be lifted later.
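
The {variable} substitution rule, including the Overview's note that an unresolved name is left in place literally, can be sketched as follows. This is a Python model under stated assumptions, not Za's implementation: interpolate() and PLACEHOLDER are hypothetical names, only the plain {variable} form is covered (not {=expr}), and the set of characters allowed in a variable name is assumed.

```python
import re

# Assumed identifier syntax: letters, digits and underscores.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(text, local_vars):
    """Replace {name} with the value of a local variable.

    Names that cannot be resolved are kept as written, braces included.
    """
    def substitute(match):
        name = match.group(1)
        if name in local_vars:
            return str(local_vars[name])
        return match.group(0)  # keep "{name}" literally

    return PLACEHOLDER.sub(substitute, text)


command = interpolate("ls -lh {dir_param} {missing}", {"dir_param": "/etc"})
```

Only local_vars is consulted, which mirrors the restriction that the first form substitutes variables from the local function scope only.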

Arrays

Simple types need not be declared and the best fit should be chosen on assignment. Internally, variables are stored in appropriate types, but their use should be transparent. However, standard library calls should fail if incorrect types are used as arguments.

When an array is required, there are various supporting tools to deal with them.

In expressions, array elements may be directly addressed using the ary[element] syntax. This has 0-based indexing.

Optionally, you may specify an element range using the syntax ary[s:e], where s is the start (inclusive) and e is the end (exclusive) of the range of elements. Both s and e are optional. [] is the same as [:].

When assigning from a range, for example:

a=[1,2,3,4,5,6]
b=a[2:3]

... this would assign a reference to the elements in a as a slice. In order to return a copy, the dup() function should be used, i.e.:

b=a[2:3].dup

... so that elements of b become mutable without interfering with a.

This may not be apparent when using the range operator on a string. As strings are immutable, the range operation always works against a copy.
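
The difference between a range that refers to the original elements and a duplicated range can be illustrated in Python. This is an analogy rather than Za code. Ordinary Python list slices always copy, so the sketch uses an array with a memoryview to obtain shared storage. The variable names are for illustration only.

```python
from array import array

a = array("q", [1, 2, 3, 4, 5, 6])

# A memoryview slice shares storage with a, like b=a[2:3] in the text.
shared = memoryview(a)[2:3]
shared[0] = 99          # also changes a[2]

# Slicing the array itself makes an independent copy, like b=a[2:3].dup.
copied = a[2:3]
copied[0] = 7           # a is not affected by this write

contents = list(a)      # a now holds 1, 2, 99, 4, 5, 6
```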

Also, there are a number of standard library calls for dealing with arrays. The standard library calls cover treating internal arrays as lists, stacks and queues in a very basic manner.

They also allow for splitting and joining lists and match finding.

Some library calls use 1-based indexing where it is more meaningful to do so. E.g. field()

Variable-length and Associative Arrays

Arrays are generally instantiated using the VAR command. Please see the Initialisation Statements section for syntax.

The array key for associative arrays must be a string.

The value type is in line with whatever is assigned on the right-hand side.

Arrays are referenced using the typical ary[key/index] format.

Only 1-dimensional fixed-length arrays are currently supported.  This is likely to remain true as there is not much justification for n-dimension arrays for our use case.

You may still reference multi-dimensions in expressions, using syntax such as ary[a][b]… but the VAR command only handles single dimensions currently.

There is effectively pseudo-support for 2 dimensions, as the data type of the arrays can be untyped (i.e. “any” as the VAR type). This allows the assignment of other arrays/lists to the array, but it is of limited flexibility.

Also, a compound type stored inside an array may be assigned to another variable then processed as a separate array should the need arise.

Please note! If an attempt is made to write to an array element above the length of an array, the array is resized dynamically to accommodate the write. The new length after such a write will be double the previous length. If this happens, it is up to the user to scale the size back down when required.

For write operations on string indices above the length boundary, this resize does not occur and a fatal error is currently reported.

As usual, given the expected use cases for the language, this was deemed an acceptable trade-off. It is up to the programmer to ensure such resizes do not happen by correctly checking array access bounds, but for ad hoc scripting it should not be an issue.
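
The effect of the resize on an array's length can be sketched in Python. This is a model under assumptions, not Za's implementation: write_element() is a hypothetical helper, the fill value for new elements is assumed to be 0, and the text does not say what happens when the index lies beyond twice the current length, so the sketch simply repeats the doubling.

```python
def write_element(items, index, value, fill=0):
    """Write items[index], growing the list by doubling when needed."""
    new_length = max(len(items), 1)
    while index >= new_length:
        new_length *= 2   # repeated doubling is an assumption of this sketch
    items.extend([fill] * (new_length - len(items)))
    items[index] = value


numbers = [0] * 4
write_element(numbers, 5, 42)   # index 5 is past the end of a length-4 list
grown_length = len(numbers)     # the list now has 8 elements
```

A script that wants to avoid carrying the extra elements would check the index against the array length before writing.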

Structures

Simple structured data types can be defined using a combination of the STRUCT, ENDSTRUCT and VAR commands.  Please see the language reference further below for syntax.

Currently, only scalars and the mixed array type ( [] ) may be set as struct fields. This may be altered in later versions, but it should be enough for getting config to disk and to ease passing larger amounts of data between functions.

It is possible to create both arrays of structs and structs of arrays, but the latter will get a bit (read: very) unwieldy in this language.

Example:

        struct eg_str ; a int; b bool; endstruct 
        var ar [10]any
        ar[0]=eg_str()
        print ar 
        [{0 false}] 

You can also directly access a field as part of an array reference in the expected manner, i.e.

    var ar [10]any
    ar[expr].fieldname=other_ar[expr].fieldname 

We are very unlikely to add further levels of nested access on the left-hand side of assignments beyond the level shown above.

Struct Literals

Rather than populating structure fields in individual statements, they may also be initialised using this compound form:

struct_name(field_value_list)

For example:

struct person
    name     string
    age      int
endstruct

p1=person()                # {name:"" age:0} - initialise with default values
p2=person("santa",123)     # {name:"santa",age:123} - set values, in order of field declaration in person struct

Literal Field Naming

You may also optionally initialise fields by name, where this makes things easier for you. You must name all of the fields or none.

For example:

p3=person(
      .age   42,
      .name  "zaphod"
)

When using named fields the order does not matter, but incorrect field names, missing field names and incorrect value types all raise an error. You might primarily use this form for clarity.

Function Call Notes

It is also possible to use named arguments when calling user-defined functions with the standard call syntax. For example:


define swap(first,second)
    return second,first
end

println swap(.first 0,.second 42)
[42 0]

When using named arguments, you must either name all arguments or none.

This does not work when using result chaining, as described below.

Result Chaining 

When an expression such as a.b does not refer to a structure field an attempt is made to evaluate it as a call to function b using a as the function's first argument.

Consequently, it is also possible to add parameters in the form a.b(x,y) which would call the function b with the arguments (a,x,y).

With this in mind you can chain this form multiple times. For example:

        define double(x) 
            return x+x 
        end

        println "abc".double 
        abcabc

        println "abc".double.len.double
        12

Another example:


        define boolneg(a) 
            return !a
        end

        println boolneg(true)
        false

        println true.boolneg
        false

        println boolneg(boolneg(true)).boolneg.boolneg.boolneg
        false

    

This is a similar behaviour to UFCS in some other languages.
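
The rewriting rule, where a.b(x,y) becomes b(a,x,y), can be modelled in Python. The chain() helper and the pad() function are hypothetical names introduced for illustration. Za performs this rewriting in its expression evaluator, whereas the sketch does it with an explicit helper.

```python
def chain(value, *calls):
    """Apply each call in turn, passing the running result as first argument.

    Each call is either a function, or a tuple of (function, extra arguments...).
    """
    for call in calls:
        if isinstance(call, tuple):
            func, *extra = call
            value = func(value, *extra)
        else:
            value = call(value)
    return value


def double(x):
    return x + x


def pad(text, width, fill):
    return text.rjust(width, fill)


result = chain("abc", double, len, double)   # models "abc".double.len.double
padded = chain("7", (pad, 3, "0"))           # models "7".pad(3,"0"), i.e. pad("7",3,"0")
```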

Calls may be split over multiple lines. You can place sub-expressions in parentheses to achieve this, or, alternatively, you may also split lines on the dot operator.

Multiple Return Values

When a RETURN command contains multiple, comma-separated return values, they can either be assigned to a single left-hand side variable or a matching length list of variables.

Additionally, multiple values may be assigned from a matching length list. A run-time error is generated when the length of the list does not match the number of left-hand-side assignee variables. Also, if any of the assignees have been pre-declared using VAR then their types must match.

For example:

        define swap(x,y)
            return y,x
        end

        a=swap(3,10)
        print a
        [10 3]

        a,b=swap(3,10)
        print a 
        10 
        print b
        3

        a,b,c=[1,2,3]
        print a
        1
        print b
        2
        print c
        3

Typing

It is also possible to optionally declare a type for simple variables with the following syntax:

            VAR name [ [ array_size ] ] type [ = expression ] 
    

… where type is one of int, uint, float, string or bool. any is also a valid type when specifying an array type.

This is optional for simple types but mandatory for compound types. When an assignment occurs against name, types will be checked and a failure will occur if they are not compatible types.

If a VAR statement is executed against a name which is already instantiated then a failure will occur.

This is not a full type system by any means! It is more for adding a few safeguards around scripts.

It should be noted that the VAR keyword also initialises the named variable to the default value (i.e.: 0,0,0.0,"",false) when an assignment is not included for simple types.

Iteration

When the expression result to be iterated over in a FOREACH loop is a string value, iteration is line-by-line. To iterate over a string byte-wise, a normal FOR loop should be used.

Should an array/list be iterated over, it should act, as expected, upon each element of the type.

It should also be noted that the normal FOR loop construction only supports integer values for the iterator. Should another type be required, you may use the C-like FOR variant.

N.B. On entering a FOREACH loop another variable is created with the name key_X, where X is the name of the loop variable.

When looping through associative arrays it holds the key of the current key-value pair. In other loop types it is an index counter.
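
The pairing of each loop item with its key_X value can be sketched in Python. This is a model rather than Za code: foreach() is a hypothetical generator name, and the counter is assumed to start at 0, as suggested by the example output in the Overview.

```python
def foreach(value):
    """Yield (key, item) pairs in the way the text describes for FOREACH."""
    if isinstance(value, str):
        # Strings are iterated line by line; the key is an index counter.
        yield from enumerate(value.splitlines())
    elif isinstance(value, dict):
        # Associative arrays: the key is the key of each key-value pair.
        yield from value.items()
    else:
        # Arrays and lists: the key is an index counter.
        yield from enumerate(value)


lines = list(foreach("first line\nsecond line"))
pairs = list(foreach({"host": "vm0"}))
items = list(foreach([10, 20]))
```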

Library Calls

For many of the library calls, for example, the lines() function, the input string is treated like an array. For some other functions, the behaviour is input-dependent. Notes should be present in documentation where this occurs.

Shell I/O Behaviour

Before Za begins to process a script an instance of Bash (or an alternate shell) is executed concurrently, which is terminated at the same time as Za. This behaviour can be changed with CLI arguments currently. The pipe command ( | ) may be used to feed commands into the co-process for execution.

For example:

            | ls -lh /etc
    

… would, as expected, execute a directory listing, with options, of the /etc directory. The standard output of the co-process is captured and presented on the standard output device of the Za process. Anything caught on the standard error channel of the shell, during a command execution, is captured and made available through the library call last_out(). Normal Za variables may be used in forming shell commands and are interpolated before execution.

For example:

            dir_param="/etc" 
            | ls -lh {dir_param} 
    

… to achieve the same result as before. The output of a command may be captured instead of printed by using the command assignment construct ( =| ).

For example:

            listing =| ls -lh /etc
    

The error code returned by the command will be captured and will be accessible through the last() library call. The stderr channel output is accessible through the last_out() library call.

Alternatively, and preferably, you can access the status code using the .code field of the returned value and the error output using the .err field. I.e. listing.code and listing.err of the above command.

The normal captured output is available in the .out field and a status summary flag is set in .okay. The status summary flag is set to true when the return code is zero.
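
The relationship between the four fields can be modelled outside Za. The following Python sketch is not how Za runs commands: ShellResult and run_command() are hypothetical names, a POSIX shell is assumed to be available, and each command runs in a fresh shell rather than in a long-lived co-process. Stripping trailing newlines follows the description given for the {...} form and is assumed to apply here too.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class ShellResult:
    out: str    # captured standard output
    err: str    # captured standard error
    code: int   # status code of the command
    okay: bool  # True when the status code is zero


def run_command(command):
    """Run a command line in the system shell and capture its results."""
    proc = subprocess.run(command, shell=True, capture_output=True, text=True)
    return ShellResult(
        out=proc.stdout.rstrip("\n"),
        err=proc.stderr.rstrip("\n"),
        code=proc.returncode,
        okay=proc.returncode == 0,
    )


listing = run_command("echo hello")
failed = run_command("echo oops 1>&2; exit 3")
```

Checking .okay first and only then reading .out is usually enough for simple scripts. The .code and .err fields are there for when the reason for a failure matters.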

Care should be taken due to the above behaviours to formulate commands allowing for the handling of stderr and stdout. You may optionally quote your command if required to clarify syntax in multi-statement lines, e.g. listing=|`ls -l`

Working Directory

It should be noted that Za maintains a coalesced current working directory between itself internally and a launched sub-shell, if present.

The current value of that path will also determine calculated file paths, when an absolute path is not used, in various file reads/writes performed during execution.

The path should generally default to the current working directory at the moment the Za binary was executed. That path is tracked by the Za system variable @cwd and can be read using the call cwd().

For example, in interactive mode:

          prompt="[#3]{@user}@{@hostname}[#-]:[#6]{@cwd}[#-] > "
          # this is one of the few places you can directly access the system variables!

          cd("/")
          # visible prompt will change, assuming you didn't start in the root directory. 

          | cd $HOME 
          # prompt will change again to indicate the new location.

          print cwd() 
          /home/daniel 
          # ... or whatever your home is.
    

Statement Separation

Statements are terminated by Unix line-feed characters (\n).

Carriage-return characters (\r) found in token space (not strings) during parsing are discarded.

It is also possible to use the alternative statement end token, a semi-colon, to terminate a statement. This allows for multiple statements per line.

Variable Scope

Please note, all variables have local scope only, except for a few system variables which will be detailed later.  

The range of local scope is the entire function, and is not further restricted in any more granular way.

Accessing Global Variables

As this is the kind of activity we want to (passively) discourage, it is not possible to directly assign to global variables from within a function call.

In order to write to a global from outside of global scope you must use the @ (SETGLOB) statement, e.g. @x=42

This is only possible on the left-hand side of assignments.

However, you may read from global variables from within function scopes without limitation, on the understanding that local variable names take priority over global variable names, e.g.:

        define q()
            # x=10
            print x
        end

        x=42
        q()
    

... will print 42, unless the x=10 line is uncommented, in which case 10 will be printed.

Pane Support

At launch, all PROMPT and PRINT activity is performed in the global pane. This is a structure describing the entire terminal window dimensions. Through use of variants of the PANE command additional pseudo-windows can be created and switched to.

The AT command is always relative to the top-left of the current window pane. Row and column numbering starts at 1.

Example:

        pane define "envs", 2, 10, 12, 50, " [#6][#i1]Environment[#-][##] ", "double"
        pane select "envs" 
        at 2,3, "Bash Version         : "+bash_version()
        at 3,3, "Bash Major Version   : "+bash_versinfo() 
        at 4,3, "User                 : "+user() 
        at 5,3, "OS                   : "+os() 
        at 6,3, "Home                 : "+home() 
        at 7,3, "Locale               : "+lang() 
        at 8,3, "Distribution         : "+release_name()
        at 9,3, "Distribution ID      : "+release_id()
        at 10,3,"Distribution Version : "+release_version()
    

Example output:

 ╔══ Environment ═════════════════════════════════╗ 
 ║                                                ║ 
 ║ Bash Version         : 5.1.16(1)-release       ║ 
 ║ Bash Major Version   : 5                       ║ 
 ║ User                 : daniel                  ║ 
 ║ OS                   : linux                   ║ 
 ║ Home                 : /home/daniel            ║ 
 ║ Locale               : en_GB.UTF-8             ║ 
 ║ Distribution         : Ubuntu                  ║ 
 ║ Distribution ID      : ubuntu                  ║ 
 ║ Distribution Version : 22.04                   ║ 
 ║                                                ║ 
 ╚════════════════════════════════════════════════╝ 

 

Case Statement

The CASE statement is the Za variant of the switch/case construct. It is currently rather limited but this may be expanded upon in future versions.

CASE [expression]
    [ IS expression
       action ]
    [ HAS condition
       action ]
    [ CONTAINS regex_condition
       action ]
    [ OR
       default_action ]
ENDCASE

Any number of IS, HAS and CONTAINS clauses may be used. The first matching clause is actioned and any clauses after it are ignored, i.e. there is no fall-through.

A BREAK statement may optionally be used to end the CASE block. On BREAK, execution continues from the ENDCASE statement. CASE statement blocks may be nested, within pre-defined limits. (All nesting is currently limited to a depth of 1000).
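
The first-match, no-fall-through rule can be modelled in Python. This is a sketch under assumptions, not Za's implementation: case() is a hypothetical helper, IS is treated as an equality test, HAS as a predicate on the subject, and CONTAINS as a regular expression search. The exact matching semantics of each clause are not spelled out in the text above.

```python
import re


def case(subject, clauses, default=None):
    """Return the action of the first matching clause, or the default.

    clauses is a list of (kind, test, action) tuples, where kind is
    "is", "has" or "contains". The default plays the role of the OR clause.
    """
    for kind, test, action in clauses:
        if kind == "is" and subject == test:
            return action
        if kind == "has" and test(subject):
            return action
        if kind == "contains" and re.search(test, str(subject)):
            return action
    return default


clauses = [
    ("is", "stop", "stopping"),
    ("has", lambda s: len(s) > 10, "long input"),
    ("contains", r"^st", "starts with st"),
]

first = case("stop", clauses, default="no match")    # also matches ^st, but IS wins
second = case("start", clauses, default="no match")
third = case("x", clauses, default="no match")
```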

Interactive Mode

Za may be started in interactive mode using the -i argument. This mode is intended for debugging and single-line task activities.

In interactive mode, a prompt is presented for user input commands. This mode currently treats each line as part of a single function in global namespace internally.

There is limited support for multi-line statements. When an open syntactical element has been input, further input will be accepted until the open element resolves itself. When this happens you should be shown a continuation prompt ( "--" ) instead of the normal prompt ( ">>" ).

The prompt may be changed using the syntax:

    prompt="new_prompt" 

... and the assigned string may include interpolated information and Za colour codes.

E.g.

                         >> prompt="[#b5][#1]{=date_human()}[#-][##] > "
    09 Oct 20 10:34 +0000 >
    09 Oct 20 10:35 +0000 > prompt="{@bashprompt}"
    daniel@vm0:/usr/local > prompt="{@startprompt}" 
                         >>

Another difference in interactive mode is that PRINT commands have an additional line-feed character appended to their output.

The entire interactive mode is experimental at this stage.

Startup Scripts (for interactive mode)

A user may also create a startup script named .zarc at the top of their home directory. For example:

    # this file: ~/.zarc
    doc `
            example startup script
    `
    prompt="{@bashprompt}"
    help
    enum s3sum ( okay=0, warn, file, sum )

Modules

The MODULE command has been included to permit separating source into workable chunks and allow for re-use.

By default, Za will look for modules in the path $HOME/.za/modules. This may be overridden by setting the environment variable ZA_MODPATH.

If the path provided to the MODULE command contains forward slashes and is absolute, it will be used instead of the ZA_MODPATH or home path to retrieve the module.

If the path is relative, it is joined onto the absolute path of the source file which was executed.
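
The lookup rules above can be summarised in a short Python model. It is a sketch, not Za's implementation: resolve_module() is a hypothetical name and the example paths are invented. Two points are assumptions: that a relative path is joined onto the directory of the executed source file, and that a name without slashes is always looked up in the module path.

```python
import os


def resolve_module(name, source_file, env):
    """Return the path at which a MODULE name would be looked up."""
    if "/" in name:
        if os.path.isabs(name):
            return name
        # Assumption: "joined onto the path of the source file" means its directory.
        source_dir = os.path.dirname(os.path.abspath(source_file))
        return os.path.normpath(os.path.join(source_dir, name))
    default_dir = os.path.join(env.get("HOME", ""), ".za", "modules")
    return os.path.join(env.get("ZA_MODPATH", default_dir), name)


env = {"HOME": "/home/daniel"}
from_home = resolve_module("menu", "/opt/app/main", env)
relative = resolve_module("mods/menu/test", "/opt/app/main", env)
```

The environment is passed in as a dictionary so that the function can be exercised without changing the real process environment.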

N.B. In general, only function, struct and enum definitions and perhaps test code should be placed in modules. Access to variables in the caller's scope is not permitted. A module is treated as a separate execution space when it is read in and executed. You will only have access to local variables and globals. You may still write code outside of functions within the module scope; it simply acts like a function call without a definition or return arguments.

Any variables used will be discarded at the end of module scope. Obviously, any structs, enums or functions defined within the module will persist beyond the end of the module execution.

Any test blocks within the module will be executed as they are reached.

Namespaces

We have introduced a very simple namespace binding approach for functions. When called with the syntax module_name::function_name the module-specific version will be called. If called using just the plain function_name then the version defined in the current module will be used. For this to work, the module must either have no filename extension, have the extension .mod, or be called using the module's AS alias.

Additionally, the module name must follow the same syntax rules as identifiers.

Module Aliasing

You can optionally specify an alias to be used to refer to a module's functions using the AS clause of the MODULE command. E.g.:

    module "mods/menu/test" as menu

You may also specify a namespace qualifier for struct and enumeration definitions. This is in the same form:

    namespace::struct_name
     -and-
    namespace::enum_name

The namespace equates to the alias name of the module in which the struct or enum was defined. If no alias was given in the module statement, then the namespace is the base filename of the module, not including its parent path. E.g. for the module statement above the namespace would be "test" if the "menu" alias was not present.
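
As a minimal sketch of an aliased module in use, assuming the module file defines a function called show_menu (a hypothetical name chosen for illustration, not part of any real module):

    module "mods/menu/test" as menu
    # show_menu is a hypothetical function defined inside the module.
    menu::show_menu()

Following the rule above, if the AS clause were omitted the same call would be qualified with the base filename instead, i.e. test::show_menu().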

If a struct or enum name is unqualified, then, internally, the current namespace is provided as the default value. The default namespace name, at the top-level, is main. However, you should never need to specify the main namespace under normal circumstances. 

We may provide keywords for manually specifying the namespace in a later version of Za.

Calls to standard library functions are always unqualified.

We do not currently support multiple namespace separators chained together to form a path. That is very unlikely to change without considerable persuasion.

 

Language Syntax

Function Statements

Functions may be defined using the syntax below. Arguments presented during a call are evaluated before the function call is executed.

DEFINE name (arg1,...,argN)
    [ RETURN [retval1[,...,retvalN]] ]
END

ASYNC handle_map f(...) [handle_id]
-or-
ASYNC nil f(...)
- run a function asynchronously.
  - handle_map is an array which holds references to currently active asynchronous threads.
  - if handle_map does not exist it will be created. You just name it according to how you wish to group your parallel tasks together.
  - f(…) is the name (and arguments) of a function which you wish to launch.
  - (optional) handle_id defines the name to be used as a key in handle_map for this thread.
  - if handle_id is not provided then a random, unique name is constructed instead.
  - ASYNC nil is used for launching throwaway background processes which do not return state.

[ r1[,...,rN] = ] function_name( [p1 [ ,p2,…,pX] ] )
- call a function, with parameters p1[..pX].
- assigned values of r1...rN are typed according to the RETURN’ed values.
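
The definition and call forms above might be combined as in the following sketch. The function name add_pair and the variable names are hypothetical, and the example assumes that multiple return values are received in the order they are listed after RETURN:

    # add_pair is a hypothetical function used only for illustration.
    define add_pair(a,b)
        return a+b, a*b
    end

    # receive both returned values in order.
    total, prod = add_pair(3,4)
    println total, " ", prod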

Selection Statements

IF condition
    statements
ELSE
    statements
ENDIF

ON expression DO command
- the equivalent of IF expression==true; command; ENDIF for single commands.
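
A short sketch of both forms, using a hypothetical variable disk_used:

    # disk_used is a hypothetical value used only for illustration.
    disk_used = 93

    if disk_used > 90
        println "disk nearly full"
    else
        println "disk usage is fine"
    endif

    # single-command form of the same test.
    on disk_used > 90 do println "disk nearly full"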

CASE [value_expression]
    IS boolean_expression
        statements
    CONTAINS regex_expression
        statements
    HAS expression
        statements
    OR
        statements
ENDCASE
- switch-like structure. value_expression defaults to true if not present.
- IS: clause statements execute when value_expression matches boolean_expression
- CONTAINS: clause statements execute when pattern regex_expression matches value_expression
- HAS: clause statements execute when expression is true.
- OR: default case.
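
The clause types might be combined as in the sketch below. The variable answer and its value are hypothetical, and the example assumes that clauses are tested in the order written:

    # answer is a hypothetical value used only for illustration.
    answer = "yes please"

    case answer
        has answer == ""
            println "no reply given"
        is "no"
            println "declined"
        contains "^yes"
            println "accepted"
        or
            println "unrecognised reply"
    endcase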

WITH var AS filevar
    statements
ENDWITH
- This construct allows you to present a Za variable as a temporary file to other commands.
- The commands may be either Za or co-process based.
- The temporary file will be removed at the end of the block.
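
A minimal sketch of WITH feeding a Za variable to a shell command. It assumes that filevar holds the path of the temporary file and can be substituted into a command with the {variable} syntax; the variable names are hypothetical:

    # capture some text in a Za variable.
    listing =| ls /tmp
    content = listing.out

    # expose the variable's content to a shell command as a temporary file.
    with content as tmpfile
        | wc -l {tmpfile}
    endwith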

Iteration Statements

WHILE condition
    statements
ENDWHILE

FOR [init_assignment] , [condition] , [iteration_post_assignment]
    statements
ENDFOR
- C-like for clause.
- e.g.:
    a=1
    FOR x=0 , x<20 , x+=a++
        print x," "
    ENDFOR
    # returns:
    0 1 3 6 10 15

FOR var = start TO end [STEP step]
    statements
ENDFOR
- integer iteration only.
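
A brief sketch of WHILE and the FOR..TO form, using hypothetical variable names. Whether the TO bound is inclusive is not stated above, so no output is shown:

    # count down with WHILE.
    n = 3
    while n > 0
        println n
        n -= 1
    endwhile

    # integer iteration with an explicit step.
    for i = 0 to 10 step 5
        println i
    endfor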

FOREACH var IN expr
    statements
ENDFOR
- iterate over variable content lines.
- the loop iteration index is accessible through the name key_var, where var is the name of the loop variable.
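
As a sketch, the loop below walks over the lines of a command's output. It assumes the index variable takes the loop variable's name prefixed with key_ (here key_line); the variable names are hypothetical:

    # capture the output of a shell command.
    listing =| ls /tmp

    # visit each line of the captured output in turn.
    foreach line in listing.out
        println key_line, " : ", line
    endfor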

Flow Control Statements

BREAK [ count ]
- exit a loop or CASE clause immediately.
- an optional construct level depth integer (count) may be provided to break out of surrounding constructs

CONTINUE
- proceed to next loop iteration immediately.

EXIT [ code [ , string ] ]
- exit script, with optional status code integer.
- the exit code defaults to zero.
- when a string is provided as the second argument it is output to the console (and to the log, if enabled) as a termination reason.

FIX [ label ]
    statements
    [ RESUME ]
- When an expression resolves to nil and is followed by the try operator ? then execution will jump ahead to a FIX statement, if one is present: either to a default FIX statement, or to the first subsequent FIX statement with a matching label if the try operator is followed by a matching string literal.
- If no subsequent, matching FIX statement is present in scope then the function (or program) will terminate without error.
- FIX is intended for abnormal operation corrections. It is never going to approach try..catch functionality.
- E.g.:

    fh=fopen("/tmp/i-dont-exist")?
    # do stuff
    fh.close
    fix
        println "deal with the failure here."
    resume
    fix different_error_label
        println "deal with an alternate fault here, then exit."
    fix another_one
        # you get the idea
        resume

- Inside a FIX stanza, the local variables _try_line and _try_info are available. Others may be added later.
    - _try_line : (int)      : source line fault encountered on.
    - _try_info : (string) : fault string (not always populated, depends on nature of error.)

- The RESUME command returns program execution to the statement following the location of the fault.
- Fault information does not propagate back up the call chain in any form.
- Other functionality may be added later.

Output Control Statements

PRINT[LN] expression [,…,exprN]
- local echo.
- PRINTLN appends a newline to the final output.
- ANSI codes are interpolated as part of output expression evaluation.
- Please see the appendices for a list of available code representations.

LOG expression
- local echo plus pre-named destination log file.

LOGGING OFF|ON name
- disable or enable logging and specify the log file name.

LOGGING QUIET|LOUD
- stops (quiet) or enables (loud) local echo of LOG commands.

LOGGING SUBJECT subject_string
- set the start-of-line “subject” in log entries. Defaults to an empty string.

LOGGING WEB OFF
- disable the web server log file.

LOGGING WEB ON
- enable the web server log file.

LOGGING ACCESSFILE path
- sets the location of the web server log file.

LOGGING TESTFILE path
- sets the file location to which test mode output is directed.

CLS
- clear console screen, or the current pane.

AT row,column[,output_string]
- move cursor to row,column.
- optionally output output_string after the cursor has been moved.

PANE OFF
- disable windows.

PANE SELECT name
- change the current window.

PANE DEFINE name, x, y, w, h, title [, type ]
- define a new window.
- type may be none, rounddot, round, square, double, topline or sparse. Defaults to round.

PANE REDRAW
- redraw the current window.

Input Control Statements

INPUT id PARAM|OPTARG position [ IS "hint" ]
INPUT id ENV name [ IS "hint" ]
- set variable id from external value or exit.
- hint is an optional string for describing a missing parameter in error messages.
- the source keyword may be one of the following:
  - PARAM  : for mandatory positional CLI arguments
  - OPTARG : for optional positional CLI arguments
  - ENV    : for mandatory environment variables

PROMPT var prompt [validator] [ IS "default_string" ]
- set var from stdin. loops until regex validator, if present, is satisfied.
- The IS clause is used to specify a default value for the input prompt.
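
The input statements might be used as sketched below. All variable names, hints and prompt strings are hypothetical, and the example assumes that argument positions start at 1 and that the validator is written as a quoted regex string:

    # mandatory first CLI argument, with a hint for the error message.
    input target param 1 is "target host name"

    # optional second CLI argument.
    input retries optarg 2

    # mandatory environment variable.
    input home env HOME is "the HOME environment variable"

    # ask on stdin until the reply matches the validator; default is "n".
    prompt reply "Continue? (y/n) " "^[yn]$" is "n"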

Assignment

var = value
- assign to variable.

var =| expression
- store result of a local shell command to variable.

var += expr
var -= expr
var *= expr
var /= expr
var %= expr

- These are currently only usable in the most simple manner with scalar variables.
- Pre- and post-increment and -decrement are not supported in SETGLOB calls.

SETGLOB var=expression
-or-
@ var=expression
- Assign the result of expression to the global variable var
- use the SETGLOB alias if you wish to make this more visible in code.
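
A short sketch of the assignment forms, using hypothetical variable names. The .okay, .out and .code fields are those described in the Overview for =| assignments:

    # capture the result of a shell command.
    res =| uname -a
    if res.okay
        println res.out
    else
        println "command failed with code ", res.code
    endif

    # simple compound assignment on a scalar.
    count = 10
    count += 5

    # store a value in a global variable.
    @ last_status = res.code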

Initialisation Statements

STRUCT name
    fieldname1 fieldtype1 [ = default_value ]
    .     .
    fieldnameN fieldtypeN [ = default_value ]
ENDSTRUCT

- Defines a structure type.
- These can be initialised using VAR.

ENUM enum_name ( label1[=value1] [ , label2[=value2] , ... , labelN[=valueN] ] )
- Define an enumeration.
- Values can be dereferenced with the syntax enum_name.enum_label
- If a value is not assigned, then it assumes a value and type according to the label immediately prior to it, incremented by one.
- If incrementing is not possible (string type), then an error occurs.
- example:
  - enum colours ( red=10, green, blue )
  - the example above assigns 10, 11, 12 to red, green and blue respectively.

VAR var type
VAR var MAP
VAR var struct_type
- Initialise a new fixed-length array (when type and size specified)
- type can be int, float, bool, string, map, bigi, bigf or any.
  - "any" allows for any type.
- the MAP variant initialises an empty associative array.
- Initialise a new STRUCT variable (when type is a defined STRUCT type)

VAR var_name var_type [ = expression ]
- initialises a simple variable of type int, float, bool, uint, bigi, bigf or string.
- ensures only the declared type is assignable to this variable.
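
A minimal sketch of declaring and using a structure and a typed variable. The type name server and the variable names are hypothetical, and the example assumes that struct fields can be assigned with the same dot syntax used to read them:

    # 'server' is a hypothetical structure type used only for illustration.
    struct server
        name string
        port int = 22
    endstruct

    # create a variable of the struct type and fill in a field.
    var host server
    host.name = "vm0"
    println host.name, ":", host.port

    # a simple typed variable with an initial value.
    var retries int = 3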

Miscellaneous

MODULE modname [ AS alias ]
- reads in and executes source from a module file.

REQUIRE feature [ num ]
REQUIRE semantic.version.number
- assert feature availability or version level in the library, or exit.
- assert running language interpreter version (when semantic number given)

ASSERT expression
- assert expression is true, or exit.
- assert can be set to not exit, but only inside test blocks
  - this is done using the ASSERT CONTINUE clause of the test statement

DOC [function_name] comment
- documentation comment. How it is treated depends on the context in which it is processed.
- When inside a TEST..ENDTEST block it will be included in the test output file.

TEST name GROUP gname [ASSERT FAIL|CONTINUE]
    statements
ENDTEST
- These blocks are ignored during normal execution.
- They must however still be processed by the interpreter. Don’t leave them inside loops!
- Use the CLI flag -t to enable test mode.
- Use the CLI flag -o to direct test output to a specific file.
  - Output normally goes to a file named za_test.out in the current directory.
  - This may also be changed using the LOGGING TESTFILE command.
- Use -G to filter the performed tests by their group name.
- Use -O to override each test’s ASSERT value with a global value.
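
A test block might look like the sketch below. The test and group names are hypothetical, and the example assumes they are given as quoted strings and that DOC accepts the backtick-delimited form shown in the startup script example:

    # hypothetical test and group names.
    test "addition" group "arithmetic" assert continue
        doc `
            integer addition should behave as expected
        `
        assert 2+3 == 5
    endtest

The block is skipped during normal execution and only runs when test mode is enabled with -t.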

PAUSE timer_ms
- delay timer_ms milliseconds.

NOP
- dummy 1 microsecond delay command.

| command
- execute shell command.

VERSION
- show Za version.

HIST
- in interactive mode, displays the command history.
- command history is navigable using the up/down cursor keys.

- other line editing options:

    -  ctrl-a : start of line (or home)
    -  ctrl-e : end of line (or end)
    -  tab    : enters/leaves completion mode
    -  ctrl-c : break
    -  ctrl-d : end input/quit
    -  ctrl-u : delete to start of line

# comment
- comment to end of line.

Standard Library

Currently, there are approximately 360 library functions, of varying degrees of utility. Please see the library function list for the current set of functions.

 

Appendices

Security

This language is used at your own risk.

Use a more suitable tool if this is unacceptable to you.

Having said that, there are some things you can do to mitigate the inherent risks involved in using a dynamically typed, interpreted language, with some very unsafe features.

- The PROMPT statement will automatically remove nestings of {} from input.
  * This should deny attempts to smuggle interpolated commands for that statement.

- Any user input which may be assigned or otherwise interpolated should be run through the clean() lib call, to do the same.
  * i.e. input_var.clean

- Any standard input should be treated similarly if appropriate.

- Any other form of data input mechanism should be considered for cleansing before allowing people, or other systems, to submit data through it.
  * Measures should be taken to cleanse unformatted input.

- Other calls exist for cleansing inputs:
  * stripquotes, stripcc, stripansi, addansi, clean, html_escape, trim, filter
  * the ~, ~i and ~f regex operators
  * PROMPT also has the optional regex validation feature.

- Verification and validation are not hard to write for yourself.

- The language is inherently unsafe: if you must accept user input, then verify and validate it.

- permit() options exist for stopping some bad stuff:

  * permit("eval",bool) : enable/disable the eval() lib call
  * permit("shell",bool) : enable/disable shell command execution
  * permit("interpol",bool) : enable/disable string interpolation
  * permit("uninit",bool) : enable/disable termination when uninitialised variable is encountered.
  * permit("dupmod",bool) : enable/disable execution error on duplicate module imports.
  * permit("exitquiet",bool) : enable/disable shorter error messages.
  * permit("permit",false) : disable further execution of the permit library call.

  * these are global values, so need treating carefully if using async methods.
  * The permit options can be turned on and off whenever required so can be used to wrap around some of the more dangerous things you do.
    * Ideally though, input should be cleansed instead or as well.

- The onus is on the programmer to do sane things.
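
The points above might be combined as in the following sketch, which cleans an externally supplied value and wraps its handling in permit() calls. The variable names raw and safe are hypothetical, and the example assumes that clean returns the cleansed string:

    # 'raw' and 'safe' are hypothetical variable names.
    input raw param 1 is "value to process"
    safe = raw.clean

    # switch off features that are not needed while handling the value.
    permit("eval", false)
    permit("shell", false)

    println "received: ", safe

    # re-enable shell access once the risky section is over.
    permit("shell", true)

Because the permit options are global, this pattern needs extra care if async functions are running at the same time.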

The language should be safe to use with adequate care and forethought.

That does not include exposing it to unsanitised input and hoping for the best. If used for its intended purpose of processing well understood inputs and automating drudge work then everything should be fine. If used more speculatively, at its own limits, rather than working with a more correct tool, then you should expect failure. I.e.:

- don't expose the web server to the public without understanding its limitations.
- don't add eval() everywhere.
- don't move user input straight into system() calls or the | operator (or similar).
- minimise input variability to a well understood set of data that may be sanitised.

ANSI mappings

The presence of a term such as [#code] in an output statement causes Za to replace the [#…] term during output with an ANSI control sequence. The actual representation on screen may differ due to colour mappings within your terminal package.

The list of replacement codes is shown below:

    #bdefault   or bd  : Return background to default colour.
    #bblack     or b0  : Set background colour to black.
    #bblue      or b1  : Set background colour to blue.
    #bred       or b2  : Set background colour to red.
    #bmagenta   or b3  : Set background colour to magenta.
    #bgreen     or b4  : Set background colour to green.
    #bcyan      or b5  : Set background colour to cyan.
    #byellow    or b6  : Set background colour to yellow.
    #bbgray            : Set background colour to bright gray.
    #bgray             : Set background colour to gray.
    #bbred             : Set background colour to bright red.
    #bbgreen           : Set background colour to bright green.
    #bbyellow          : Set background colour to bright yellow.
    #bbblue            : Set background colour to bright blue.
    #bbmagenta         : Set background colour to bright magenta.
    #bbcyan            : Set background colour to bright cyan.
    #bwhite            : Set background colour to white.
    #fdefault   or fd  : Return the foreground colour to the default.
    #fblack     or 0   : Set foreground colour to black.
    #fblue             : Set foreground colour to blue.
    #fbblue     or 1   : Set foreground colour to bright blue.
    #fred              : Set foreground colour to red.
    #fbred      or 2   : Set foreground colour to bright red.
    #fmagenta          : Set foreground colour to magenta.
    #fbmagenta  or 3   : Set foreground colour to bright magenta.
    #fgreen            : Set foreground colour to green.
    #fbgreen    or 4   : Set foreground colour to bright green.
    #fcyan             : Set foreground colour to cyan.
    #fbcyan     or 5   : Set foreground colour to bright cyan.
    #fyellow           : Set foreground colour to yellow.
    #fbyellow   or 6   : Set foreground colour to bright yellow.
    #fgray             : Set foreground colour to gray.
    #fwhite     or 7   : Set foreground colour to white.
    #-                 : Reset all colour styling to the default.
    ##                 : Set background colour to the default.
    #default           : Turn off all currently raised codes.
    #bold              : Enable bold text.
    #dim               : Enable low-lighting of text.
    #i1                : Enable italicised text.
    #i0                : Disable italicised text.
    #underline         : Enable underlined text.
    #blink             : Enable flashing text (where supported.)
    #invert            : Enable reverse video text. (where supported.)
    #hidden            : Enable hidden text. (where supported.)
    #crossed           : Enable single strike-through text. (where supported.)
    #framed            : Enable framed text. (where supported.)


You may also use the bg256, fg256, bgrgb and fgrgb library calls to generate ANSI terminal strings for higher colour bit depths.
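
As a brief sketch of the codes in use, the lines below rely only on codes from the list above; the message text is hypothetical:

    # bold, bright red label, then reset before the rest of the line.
    println "[#bold][#2]Warning:[#-] disk nearly full"

    # white text on a blue background, then reset.
    println "[#b1][#7] status [#-] all services running"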

Supported Operators

The following operators are supported by the language:

Prefix Operators
--n           pre-decrement
++n           pre-increment
sqr n         square (n*n)
sqrt n        square root
-n            unary negative
+n            unary positive
!b            boolean negation ( or not b )
$uc s         upper case string s
$lc s         lower case string s
$lt s         left trim leading whitespace from string s [\t\ \n\r]
$rt s         right trim trailing whitespace from string s
$st s         trim whitespace from both sides of string s
$pa s         absolute path from string s
$pp s         parent path of string s
$pb s         base file name from string s
$pn s         base file name without extension
$pe s         extension only from string s
$in f         read file 'f' in as string literal
| s           return successful command output (of s) as a string

Infix Operators
a - b         subtraction
a + b         addition
a * b         numeric multiplication
str_a * b     string repetition
a / b         division
a % b         modulo
a ** b        power
a -= b        subtractive assignment
a += b        additive assignment
a *= b        multiplicative assignment
a /= b        divisive assignment
a %= b        modulo assignment
a || b        boolean OR ( or a or b )
a && b        boolean AND ( or a and b )
a << b        bitwise left shift
a >> b        bitwise right shift
a | b         bitwise OR
a & b         bitwise AND
a ^ b         bitwise XOR
a ~f b        array of matches from string a using regex b
s.f           struct field access
a.f           UFCS-like function call of f, with first argument a
s .. e        builds an array of values in the range s to e, inclusive
s $out f      write string 's' to file 'f'
b ? t : f     if expression b is a boolean value and is true then t else f

ary[start:end]
array subscript operator. start and end are optional.
end is not inclusive.

number[start:end]
number clamping operator. restricts lower and upper limit of the number to between start and end.
start and end are both optional. the returned number type is that of the clamping limit's type when clamping is applied.

expr ? [ label_string ]
if expression expr results in a nil value then end scope execution.
scope termination can be avoided using the FIX/RESUME statements.

array|map ?> "bool_expr"
array|map -> "expression"
Filters (?>) or maps (->) matches of 'bool_expr' (?>) or values of 'expression' (->) against elements in an array/map to return a new array/map. Each # in bool_expr/expression is replaced by each array/map value in turn.

Comparisons
a == b        equality
a != b        inequality
a < b         less than
a > b         greater than
a <= b        less than or equal to
a >= b        greater than or equal to
a ~ b         string a matches regex b
a ~i b        string a matches regex b (case insensitive)
a in b        array b contains value a
a is <type>   Expression a has underlying type of: bool|int|uint|float|bigi|bigf|number|string|map|array|nil

Postfix Operators
n--           post-decrement (local scope only, command not expression)
n++           post-increment (local scope only, command not expression)
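
A few of the less familiar operators are sketched below. The variable names are hypothetical, and the example assumes that arrays can be passed directly to PRINTLN; no output is shown because the printed form of an array is not specified here:

    # build an array with the range operator.
    nums = 1 .. 5

    # filter: keep the values for which the expression is true.
    evens = nums ?> "# % 2 == 0"

    # map: transform each value.
    squares = nums -> "# * #"

    println evens
    println squares

    # clamp a number to the range 0 to 10.
    level = 15
    println level[0:10]

    # ternary selection.
    label = (level > 10) ? "high" : "low"
    println label

    # prefix string operator.
    name = "report"
    println $uc name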