Tech Intro


What you need to know

Welcome to the Adkodas™ technical introduction. Here you can learn the nuts and bolts of what it takes to prepare your data and use our system to get the data analysis results you're looking for.

If you're looking for a broader overview of what makes Adkodas technology unique in today's fast-paced information age, please see our general introduction and discussion of rule extraction. This overview is intended to help you prepare your data for our initial analysis (for both full AI machine learning and the resulting rule extraction).

Nuts & Bolts

If you're looking to try out Adkodas' neural networks and rule extraction technology, both of which use supervised learning, here are the essentials you need to get going.

We accept .csv and .dat files: respectively, data in comma separated columns and data in space or tab separated columns. (See file types below.) Output should be listed in the rightmost columns, so that each row consists of an input vector of values followed by its associated output. For examples of this standard separated-column format, see the UC Irvine Machine Learning Repository.
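As a minimal sketch of the row layout just described, the following Python splits one row into its input vector and its outputs. The function `split_row` and its `n_outputs` parameter (how many rightmost columns hold outputs) are hypothetical; this page does not say how the number of output columns is communicated to the service.

```python
def split_row(fields: list[str], n_outputs: int) -> tuple[list[str], list[str]]:
    """Split one row into (inputs, outputs); outputs are the rightmost columns.

    Hypothetical helper: n_outputs is assumed to be known in advance.
    """
    if n_outputs < 1:
        raise ValueError("there must be at least 1 output field")
    if len(fields) <= n_outputs:
        raise ValueError("a row needs at least 1 input field before its outputs")
    return fields[:-n_outputs], fields[-n_outputs:]


# One .csv row with four input values followed by one output value.
inputs, outputs = split_row("1,0,0,1,1".split(","), n_outputs=1)
print(inputs, outputs)  # ['1', '0', '0', '1'] ['1']
```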

For our neural networks, we have an all-purpose translation interface for two types of data: 'spatial' and 'classification'.

'Classification' data is composed of generic columns of 'independent' data; that is, the output in each row should be independent of the ordering of the input columns. (For example, if we are classifying prospective loan applicants by credit risk, interchanging the order of the applicant's age and employment status in the input vector should not affect whether or not the person is judged a good credit risk.)

Our rule extraction service works on Classification data.

'Spatial' data is the opposite: not only the contents but also the ordering of the input columns matters for the output. (For example, if we are testing for symmetry, 1 0 0 0 0 1 should yield an output of 1, indicating that the input is symmetrical, whereas 0 0 1 0 0 1 should yield an output of 0, since it is not symmetrical.)
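To make the distinction concrete, the sketch below labels a bit vector as symmetric (1) or not (0), as in the symmetry example, and shows that swapping two input columns, which keeps the same values but changes their order, can change the label. The function name `is_symmetric` is illustrative only.

```python
def is_symmetric(bits: list[int]) -> int:
    """Return 1 if the bit vector reads the same forwards and backwards, else 0."""
    return 1 if bits == bits[::-1] else 0


print(is_symmetric([1, 0, 0, 0, 0, 1]))  # 1
print(is_symmetric([0, 0, 1, 0, 0, 1]))  # 0

# Swap input columns 0 and 2 of the symmetric vector: same values, new order.
v = [1, 0, 0, 0, 0, 1]
v[0], v[2] = v[2], v[0]
print(v, is_symmetric(v))  # [0, 0, 1, 0, 0, 1] 0
```

For 'classification' data, by contrast, the same kind of column swap (applied consistently to every row) should leave each row's output unchanged.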

In essence, spatial data is visual data: is this a picture of a sports car or not?

If all the input columns are binary (each column contains only values of 0 or 1), there is, for the current release, a maximum of 180 input fields. Likewise, if the outputs are purely binary, there is currently a maximum of 30 output fields. There must be at least 1 output field.

For our neural networks, output fields can also be 'symbolic': taken together, the output fields can contain up to 30 distinct non-binary values. For example:

Horsepower   Top speed   MSRP   Vehicle type   Manufacturer
...          ...         ...    Sports car     Porsche
...          ...         ...    Sports car     Lotus
...          ...         ...    Car            Mercedes
...          ...         ...    Sports car     Mercedes
...          ...         ...    Motorcycle     Aprilia
...          ...         ...    Car            Mazda
...          ...         ...    Motorcycle     Harley-Davidson

In the first output field there are 3 distinct values; in the second output field, 6 distinct values. This example therefore has a total of 3 + 6 = 9 distinct output values (out of a maximum of 30).

Because the data files may contain only numeric values, symbolic outputs must be encoded as integers. Here, vehicle type can be represented as 1, 2, 3 (or any three distinct integers), and manufacturer as 1, 2, ..., 6 (or any six distinct integers).
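A minimal sketch of that encoding, using the vehicle table: each symbolic output column gets its own mapping from value to a distinct integer (here 1, 2, 3, ... in order of first appearance, which is an arbitrary choice), and the distinct values are totalled across columns against the limit of 30. The function `encode_symbolic` is hypothetical, not part of our service.

```python
def encode_symbolic(rows: list[list[str]]) -> tuple[list[list[int]], list[dict[str, int]]]:
    """Replace each symbolic output value with an integer code, per column."""
    codes: list[dict[str, int]] = [{} for _ in range(len(rows[0]))]
    encoded = []
    for row in rows:
        encoded_row = []
        for col, value in enumerate(row):
            # A new value gets the next unused integer (1, 2, 3, ...) in its column;
            # a value already seen keeps its existing code.
            encoded_row.append(codes[col].setdefault(value, len(codes[col]) + 1))
        encoded.append(encoded_row)
    return encoded, codes


outputs = [
    ["Sports car", "Porsche"],
    ["Sports car", "Lotus"],
    ["Car", "Mercedes"],
    ["Sports car", "Mercedes"],
    ["Motorcycle", "Aprilia"],
    ["Car", "Mazda"],
    ["Motorcycle", "Harley-Davidson"],
]
encoded, codes = encode_symbolic(outputs)
total = sum(len(c) for c in codes)
print([len(c) for c in codes], total)  # [3, 6] 9
assert total <= 30, "too many distinct symbolic output values"
```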

For mixed input, we accept between 1 and 20 input fields, each of which may be floating point, integer or binary, in any combination.

The only characters we accept in values are digits and the letters e, E, g and G (for exponents). '.' is always used as the decimal point in floating point numbers, and ',' is used only in .csv files, to separate fields.
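A simple pre-upload check can catch illegal characters, such as a European decimal comma, before a file is submitted. This is a sketch built only from the characters listed above; the function `is_legal_value` is hypothetical. The list above does not mention '+' or '-' signs, so whether they are accepted is an assumption left to the `allow_signs` flag.

```python
def is_legal_value(token: str, allow_signs: bool = False) -> bool:
    """Check one field value against the characters listed above.

    ASSUMPTION: '+' and '-' are not mentioned in the list of accepted
    characters; set allow_signs=True only if the service accepts them.
    """
    allowed = set("0123456789eEgG.")
    if allow_signs:
        allowed |= set("+-")
    return token != "" and all(ch in allowed for ch in token)


print(is_legal_value("3.14"))   # True
print(is_legal_value("2.5E3"))  # True
print(is_legal_value("3,14"))   # False: European decimal comma
```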

We currently accept up to 5000 input vectors (i.e. up to 5000 rows).
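The neural network limits stated above can be checked before uploading. In this sketch, `check_limits`, `all_binary` and the constant names are hypothetical; the numbers are the current-release limits given above. Choosing between the 180-field and 20-field input limits according to whether every input value is 0 or 1 is our reading of the text, and symbolic outputs (limited to 30 distinct values) are not checked here.

```python
MAX_ROWS = 5000            # input vectors (rows)
MAX_BINARY_INPUTS = 180    # when every input column is binary
MAX_MIXED_INPUTS = 20      # floating point / integer / binary mix
MAX_BINARY_OUTPUTS = 30    # when every output column is binary


def all_binary(rows: list[list[float]]) -> bool:
    """True if every value in every row is 0 or 1."""
    return all(v in (0, 1) for row in rows for v in row)


def check_limits(inputs: list[list[float]], outputs: list[list[float]]) -> list[str]:
    """Return a list of problems; an empty list means the limits look satisfied."""
    problems = []
    if len(inputs) > MAX_ROWS:
        problems.append(f"{len(inputs)} rows exceeds the {MAX_ROWS}-row limit")
    n_in = len(inputs[0]) if inputs else 0
    n_out = len(outputs[0]) if outputs else 0
    if n_out < 1:
        problems.append("there must be at least 1 output field")
    # All-binary inputs allow up to 180 fields; otherwise up to 20 mixed fields.
    in_limit = MAX_BINARY_INPUTS if all_binary(inputs) else MAX_MIXED_INPUTS
    if not 1 <= n_in <= in_limit:
        problems.append(f"{n_in} input fields; expected 1 to {in_limit}")
    if all_binary(outputs) and n_out > MAX_BINARY_OUTPUTS:
        problems.append(f"{n_out} binary output fields exceeds {MAX_BINARY_OUTPUTS}")
    return problems


print(check_limits([[0.5, 1, 3]], [[1]]))  # []
print(check_limits([[0.5] * 21], [[1]]))   # ['21 input fields; expected 1 to 20']
```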

Both our rule extraction and neural networks handle input and output the same way.

Rule extraction currently accepts roughly 10000 data points, 50 output fields and 100 input fields. We are working on raising these limits for both our neural networks and our rule extraction.

Please contact us if you have any queries.

How much data is necessary

Naturally, each data set is different; in particular, the amount of data required depends on having enough examples to represent each class of desired output well. In general, aim for a good mix of input vectors showing both what makes a good example of a class and what does not.
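One quick way to judge the mix is to count how many rows carry each output value. The sketch below flags classes with few examples; the function name `underrepresented` and the `min_count` threshold are illustrative assumptions, since no minimum is specified here. The labels used are those of the 6-bit symmetry problem: 56 negative and 8 positive examples.

```python
from collections import Counter


def underrepresented(labels: list[int], min_count: int) -> list[int]:
    """Return the output classes that have fewer than min_count examples."""
    counts = Counter(labels)
    return sorted(cls for cls, n in counts.items() if n < min_count)


labels = [0] * 56 + [1] * 8       # 6-bit symmetry: 8 positives out of 64
print(Counter(labels))            # Counter({0: 56, 1: 8})
print(underrepresented(labels, min_count=10))  # [1]
```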

Perhaps the best way to address this question is by example. As one instance of a 'spatial' data set for our neural networks, consider the 6-bit symmetry problem, which has a single output and only 64 input vectors, of which only 8 are positive examples. The tests below show how well the network performed when a random fraction of the input vectors was removed from the file before training and reserved as test vectors. Naturally, the results vary somewhat depending on which vectors are randomly removed (especially as there are only 8 positive examples).

% of data used for training   Number of test vectors   % of test vectors correct
60%                           24                       70%
70%                           18                       77%
80%                           11                       81%
90%                           5                        100%
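The data side of this experiment can be sketched as follows: generate all 64 six-bit input vectors with their symmetry labels, then randomly reserve a fraction of them as test vectors. The function names are hypothetical, and the rounding rule for the training count is our own choice; this illustrates the data set and a random hold-out only, not the training runs behind the table.

```python
import itertools
import random
from typing import Optional


def symmetry_dataset(n_bits: int = 6) -> list[tuple[tuple[int, ...], int]]:
    """All n-bit vectors, each paired with 1 if it is symmetric, else 0."""
    return [(bits, int(bits == bits[::-1]))
            for bits in itertools.product((0, 1), repeat=n_bits)]


def holdout_split(data, train_fraction: float, seed: Optional[int] = None):
    """Shuffle the rows and reserve the remainder after training as test vectors."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train_fraction)  # rounding rule is an assumption
    return rows[:n_train], rows[n_train:]


data = symmetry_dataset()
print(len(data), sum(label for _, label in data))  # 64 8
train, test = holdout_split(data, train_fraction=0.8, seed=0)
```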

In general, the more input vectors available for training a neural network, the better the results.

Rule extraction can explain the results of training a neural network. How much data it needs depends on the data set and how complex that data is.

File types

Files uploaded for training a network or running on a network may be in .dat or .csv format.

We accept .csv files with one input/output vector per line (row). The comma character ',' is illegal except as a field separator in .csv files; consequently, we cannot accept the European convention of using ',' as the decimal point. The file must follow the usual .csv convention for separating columns, and each row must be terminated by a standard end-of-line sequence: [lf], [cr] or [crlf].

For .dat files, we accept space- or tab-separated files with one input/output vector per line: columns are separated by a space or tab character, and each row is terminated by a standard end-of-line sequence: [lf], [cr] or [crlf].
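For checking a file locally before upload, both formats can be read with the Python standard library, which recognizes all three end-of-line conventions. The function `parse_rows` is hypothetical, and treating a run of several spaces or tabs in a .dat file as a single separator is an assumption rather than a stated rule.

```python
import csv
import io


def parse_rows(text: str, file_type: str) -> list[list[str]]:
    """Split file text into rows of fields for a .csv or .dat file."""
    if file_type == ".csv":
        # With newline="", the stream yields lines ending in \n, \r or \r\n
        # untranslated, and csv.reader strips those terminators itself.
        reader = csv.reader(io.StringIO(text, newline=""))
        return [row for row in reader if row]
    if file_type == ".dat":
        # splitlines() accepts [lf], [cr] and [crlf]; split() with no argument
        # treats any run of spaces or tabs as one separator (an assumption).
        return [line.split() for line in text.splitlines() if line.strip()]
    raise ValueError(f"unsupported file type: {file_type}")


print(parse_rows("1,0,1\r\n0,1,0\r\n", ".csv"))  # [['1', '0', '1'], ['0', '1', '0']]
print(parse_rows("1 0\t1\r0 1\t0\r", ".dat"))    # [['1', '0', '1'], ['0', '1', '0']]
```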

Legal disclaimer

Once we receive the data, we reserve the right to modify it and to add vectors to it or remove vectors from it during training.
