Building Semantic Web reasoning engines

2022-08

https://josd.github.io/Talks/2022/08swig/

by Jos De Roo of KNoWS office of IDLab - Ghent University - imec.

Context is Semantic Web (some slides from Tim Berners-Lee, Doerthe Arndt and William Van Woensel)

Subject, verb and object

All knowledge is just a set of statements

<#pat> <#knows> <#jo> .

### in classical logic: knows(pat,jo)

Rules Are Just Statements


#   subject        verb        object
#=============  ==========    ==============
{ ?x :son ?y }      =>        { ?y a :Male }.
{ ?x :son ?y }  log:implies   { ?y a :Male }.

### in classical logic: ∀x ∀y son(x,y) ⇒ male(y)

The terms in braces { } are formulas.

The rule statement relates two formulas.

Queries and Closure

Queries and Closure (cont)

Layer Cake

sweb-stack

We want inter-engine interop which requires proof: SemWave diagram from Tim Berners-Lee

SemWave

We need the full potential of the Semantic Web Databus and Proofbus from Tim Berners-Lee

sweb-bus

EYE, an open source reasoning engine

Detailed design of EYE

Basic EAM (Euler Abstract Machine)

This is what the basic EAM (Euler Abstract Machine) does in a nutshell:

  1. Select rule P => C
  2. Prove P & NOT(C) (backward chaining) and if it fails backtrack to 1/
  3. If P & NOT(C) assert C (forward chaining) and remove brake
  4. If C = answer(A) and tactic limited-answer stop, else backtrack to 2/
  5. If brake or tactic linear-select stop, else start again at 1/

Forward vs. Backward Chaining

In N3 you can indicate how a rule should be applied:

Proof generation

[] a r:Proof, r:Conjunction;
  r:component <#lemma1>;
  r:component <#lemma2>;
  r:gives {
    :Socrates a :Human.
    :Socrates a :Mortal.
  }.

...

Proof generation (cont)

<#lemma5> a r:Inference;
  r:gives { :Socrates a :Mortal };
  r:evidence ( <#lemma6> <#lemma3> );
  r:binding [ r:variable [ n3:uri "http://josd.github.io/var#x_0"]; r:boundTo [ n3:uri "http://example.org/socrates#Human"]];
  r:binding [ r:variable [ n3:uri "http://josd.github.io/var#x_1"]; r:boundTo [ n3:uri "http://example.org/socrates#Mortal"]];
  r:binding [ r:variable [ n3:uri "http://josd.github.io/var#x_2"]; r:boundTo [ n3:uri "http://example.org/socrates#Socrates"]];
  r:rule <#lemma7>.

<#lemma6> a r:Extraction;
  r:gives { :Human rdfs:subClassOf :Mortal };
  r:because [ a r:Parsing; r:source <http://josd.github.io/eye/reasoning/socrates/socrates.n3>].

<#lemma7> a r:Extraction;
  r:gives {
    @forAll var:x_0, var:x_1, var:x_2. {var:x_0 rdfs:subClassOf var:x_1.
     var:x_2 a var:x_0} => {var:x_2 a var:x_1}.
  };
  r:because [ a r:Parsing; r:source <http://josd.github.io/eye/reasoning/socrates/socrates.n3>].

Deep just-in-time indexing

Built-ins

  • Predicates with special meanings
  • "Traditional" N3 built-ins: https://www.w3.org/2000/10/swap/doc/CwmBuiltins
  • "New" built-ins are currently discussed in the W3C N3 community group
  • List of EYE built-ins: https://github.com/josd/eye/blob/master/eye-builtins.n3
  • EYE command line interface

    Usage: eye <options>* <data>* <query>*
    
    <data>
        [--n3] <uri>                    N3 triples and rules
        --blogic <uri>                  RDF surfaces
        --n3p <uri>                     N3P intermediate
        --proof <uri>                   N3 proof lemmas
        --turtle <uri>                  Turtle triples
    <query>
        --entail <rdf-graph>            output true if RDF graph is entailed
        --not-entail <rdf-graph>        output true if RDF graph is not entailed
        --pass                          output deductive closure
        --pass-all                      output deductive closure plus rules
        --pass-all-ground               ground the rules and run --pass-all
        --pass-only-new                 output only new derived triples
        --query <n3-query>              output filtered with filter rules
    

    EYE command line interface (cont)

    <options>
        --csv-separator <separator>     CSV separator such as , or ;
        --debug                         output debug info on stderr
        --debug-cnt                     output debug info about counters on stderr
        --debug-djiti                   output debug info about DJITI on stderr
        --debug-pvm                     output debug info about PVM code on stderr
        --help                          show help info
        --hmac-key <key>                HMAC key used in e:hmac-sha built-in
        --ignore-inference-fuse         do not halt in case of inference fuse
        --image <pvm-file>              output all <data> and all code to <pvm-file>
        --intermediate <n3p-file>       output all <data> to <n3p-file>
        --license                       show license info
        --multi-query                   go into query answer loop
        --no-distinct-input             no distinct triples in the input
        --no-distinct-output            no distinct answers in the output
        --no-numerals                   no numerals in the output
        --no-qnames                     no qnames in the output
        --no-qvars                      no qvars in the output
        --no-ucall                      no extended unifier for forward rules
        --nope                          no proof explanation
        --profile                       output profile info on stderr
        --quantify <prefix>             quantify uris with <prefix> in the output
        --quiet                         quiet mode
        --random-seed                   create random seed for e:random built-in
        --restricted                    restricting to core built-ins
        --rule-histogram                output rule histogram info on stderr
        --skolem-genid <genid>          use <genid> in Skolem IRIs
        --source <file>                 read command line arguments from <file>
        --statistics                    output statistics info on stderr
        --strings                       output log:outputString objects on stdout
        --tactic limited-answer <nr>    give only a limited number of answers
        --tactic limited-brake <nr>     take only a limited number of brakes
        --tactic limited-step <nr>      take only a limited number of steps
        --tactic limited-witness <nr>   use only a limited number of witnesses
        --tactic linear-select          select each rule only once
        --version                       show version info
        --warn                          output warning info on stderr
        --wcache <uri> <file>           to tell that <uri> is cached as <file>
    

    Use case GPS4IntegratedCare

    The GPS4IntegratedCare project objective is Automatic generation of dynamic and personalized care workflows

    Technologies used:

    Goal driven Parallel Sequences (GPS):

    Use case GPS4IntegratedCare lessons learned

    The project worked out fine, but the architecture is centralized around a single smart workflow engine and that is really not scalable.

    The proposal is to address the scalability with MAP

    Multi-Agent Proofs (MAP):

    This is just a proposal with a proof of concept in which agent1 and agent2 are GPS agents and agent2 reaches his goal thanks to the lemmata made by agent1.

    Multi-Agent Proofs: Agent2 reaches his goal thanks to the lemmata made by Agent1

    Agent2-proof makes use of lemma9 from Agent1-proof

    <#lemma13> a r:Extraction;
      r:gives {
        <http://josd.github.io/eye/reasoning/map/agent1-proof.n3#lemma9> a r:Inference.
      };
      r:because [ a r:Parsing; r:source <http://josd.github.io/eye/reasoning/map/agent1-proof.n3>].
    
    <#lemma14> a r:Extraction;
      r:gives {
        <http://josd.github.io/eye/reasoning/map/agent1-proof.n3#lemma9> r:gives {
          :map-BE gps:description ({:i1 :location :Gent} true {:i1 :location :Brugge} :drive_gent_brugge 1500.0 0.006 0.96 0.99)
        }.
      };
      r:because [ a r:Parsing; r:source <http://josd.github.io/eye/reasoning/map/agent1-proof.n3>].
    

    Reverse the burden of proof

    The burden of proof is now on the server:

    Reverse the burden of proof:

    This is much more scalable but requires client side reasoning + proof generation

    The proof could be provided as HTTP GET payload

    Testing: Examples and Test Cases

    Advanced Clinical Applications

    ACAS

    Basic MONADIC Benchmark

           triples   cycles |         cwm         eye        jena
                            |       [sec]       [sec]       [sec]
    ------------------------|------------------------------------
            10,000    1,518 |      15.400       0.420       3.200
           100,000      645 |      99.860       2.190       9.310
         1,000,000      380 |   1,094.860      21.170      75.380
        10,000,000      480 |  (OutOfMem)     231.610     853.060
       100,000,000      480 |               2,640.580   9,217.800
     1,000,000,000      431 |              32,474.540  (OutOfMem)
    
    Test environment:
    
    Linux 4.0.5 x86_64
    processor	: 0
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 1
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 2
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 3
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    MemTotal:       264163268 kB
    
    
    Reference:
    
    https://github.com/josd/bmb
    

    Deep taxonomy benchmark

             depth |         cwm         eye      hermit       jdrew        jena
                   |       [sec]       [sec]       [sec]       [sec]       [sec]
    ---------------|------------------------------------------------------------
                10 |       0.160       0.018       0.055       0.130       0.047
               100 |       1.050       0.022       1.040       0.200       0.422
             1,000 |      65.930       0.063       3.580       0.870       9.302
            10,000 |   7,298.000       0.482     310.510      18.680   2,597.242
           100,000 | 732,974.070       4.808  (OutOfMem)   1,875.000  (OutOfMem)
         1,000,000 |  (848 days)      48.434              (OutOfMem)
    
    
    Test environment:
    
    Linux 4.0.5 x86_64
    processor	: 0
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 1
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 2
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 3
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    MemTotal:       264163268 kB
    
    
    Reference:
    
    http://ruleml.org/WellnessRules/files/WellnessRulesN3-2009-11-10.pdf
    

    RESTdesc Composition Benchmark

            length |         cwm         eye
                   |       [sec]       [sec]
    ---------------|------------------------
                 2 |       0.188       0.029
                 4 |       0.371       0.032
                 8 |       1.004       0.038
                16 |       3.504       0.053
                32 |      13.968       0.085
                64 |      58.689       0.157
               128 |     251.361       0.344
               256 |   1,081.179       0.936
               512 | (MaxRecurs)       2.894
             1,024 |                   9.764
    
    
    Test environment:
    
    Linux 4.0.5 x86_64
    processor	: 0
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 1
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 2
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    processor	: 3
    model name	: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
    MemTotal:       264163268 kB
    
    
    Reference:
    
    https://github.com/RubenVerborgh/RESTdesc-Composition-Benchmark
    

    Online N3 editor by William Van Woensel

    http://ppr.cs.dal.ca:3002/n3/editor/

    Online course by Ruben Verborgh

    Semantic Web Reasoning With EYE

    Thank You

    Thank you for your attention

    https://josd.github.io/Talks/2022/08swig/