Practical Statistical Relational AI
Overview
Logical and Statistical AI
We Need to Unify the Two
Goal and Progress
Overview
Markov Networks (2 slides)
Hammersley-Clifford Theorem
Markov Nets vs. Bayes Nets
Inference in Markov Networks
MCMC: Gibbs Sampling
Other Inference Methods
Overview
Learning Markov Networks
Generative Weight Learning
Pseudo-Likelihood
Discriminative Weight Learning
Structure Learning
Overview
First-Order Logic
Inference in First-Order Logic
Satisfiability
Backtracking
Overview
Rule Induction
Learning a Single Rule
Learning a Set of Rules
First-Order Rule Induction
[Issues in learning first-order rules]
Overview
[Combinations of first-order and statistical learning methods]
Markov Logic (2 slides)
Markov Logic: Intuition
Markov Logic: Definition
Example: Friends & Smokers (8 slides)
Markov Logic Networks
Relation to Statistical Models
Relation to First-Order Logic
MAP/MPE Inference (4 slides)
The MaxWalkSAT Algorithm
But… Memory Explosion
Computing Probabilities
Ground Network Construction
But… Insufficient for Logic
Learning
Weight Learning
Structure Learning (2 slides)
Alchemy (2 slides)
Overview
Applications
Running Alchemy
Uniform Distribution
Binomial Distribution
Multinomial Distribution (3 slides)
Logistic Regression
Text Classification (2 slides)
Hypertext Classification
Information Retrieval
Entity Resolution (2 slides)
Hidden Markov Models
Information Extraction (3 slides)
Statistical Parsing (2 slides)
Semantic Processing (2 slides)
Bayesian Networks
Relational Models (2 slides)
Other Applications

1 Practical Statistical Relational AI

Pedro Domingos Dept. of Computer Science & Eng. University of Washington

2 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

3 Logical and Statistical AI

Field                       | Logical approach             | Statistical approach
Knowledge representation    | First-order logic            | Graphical models
Automated reasoning         | Satisfiability testing       | Markov chain Monte Carlo
Machine learning            | Inductive logic programming  | Neural networks
Planning                    | Classical planning           | Markov decision processes
Natural language processing | Definite clause grammars     | Prob. context-free grammars

4 We Need to Unify the Two

The real world is complex and uncertain
Logic handles complexity
Probability handles uncertainty

5 Goal and Progress

Goal: Make statistical relational AI as easy as purely statistical or purely logical AI
Progress to date:
Burgeoning research area
We're close enough to the goal
Easy-to-use open-source software available
Lots of research questions (old and new)

6 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

7 Markov Networks

Undirected graphical models
Potential functions defined over cliques
[Figure: Markov network over Smoking, Cancer, Asthma, and Cough]
Potential over the clique {Smoking, Cancer}:
Smoking | Cancer | Φ(s,c)
False   | False  | 4.5
False   | True   | 4.5
True    | False  | 2.7
True    | True   | 4.5

8 Markov Networks

[Figure: Markov network over Smoking, Cancer, Asthma, and Cough]
Undirected graphical models
Log-linear model: P(x) = (1/Z) exp( Σ_i w_i f_i(x) )
(w_i: weight of feature i; f_i(x): feature i)

9 Hammersley-Clifford Theorem

If the distribution is strictly positive (P(x) > 0) and the graph encodes its conditional independences, then the distribution is a product of potentials over cliques of the graph. The converse also holds. (Markov network = Gibbs distribution)

10 Markov Nets vs. Bayes Nets

Property        | Markov Nets        | Bayes Nets
Form            | Prod. of potentials | Prod. of potentials
Potentials      | Arbitrary          | Cond. probabilities
Cycles          | Allowed            | Forbidden
Partition func. | Z = ? (global)     | Z = 1 (local)
Indep. check    | Graph separation   | D-separation
Indep. props.   | Some               | Some
Inference       | MCMC, BP, etc.     | Convert to Markov

11 Inference in Markov Networks

Goal: Compute marginals and conditionals of P(x)
Exact inference is #P-complete
Conditioning on the Markov blanket is easy: P(x | MB(x)) involves only the potentials containing x
Gibbs sampling exploits this

12 MCMC: Gibbs Sampling

state ← random truth assignment
for i ← 1 to num-samples do
  for each variable x
    sample x according to P(x | neighbors(x))
    state ← state with new value of x
P(F) ← fraction of states in which F is true
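The loop above can be run directly on the two-variable Smoking–Cancer potential from slide 7. A minimal sketch (variable names are illustrative; this toy network has a single clique, so each variable's Markov blanket is just the other variable):

```python
import random

# Pairwise potential phi(Smoking, Cancer) from the slide's table
phi = {(False, False): 4.5, (False, True): 4.5,
       (True, False): 2.7, (True, True): 4.5}

def conditional(var, state):
    """P(var = True | other variable), from the single clique potential."""
    other = state[1 - var]
    pair = lambda v: (v, other) if var == 0 else (other, v)
    p_true, p_false = phi[pair(True)], phi[pair(False)]
    return p_true / (p_true + p_false)

def gibbs(num_samples, seed=0):
    rng = random.Random(seed)
    state = [rng.random() < 0.5, rng.random() < 0.5]  # random truth assignment
    cancer_true = 0
    for _ in range(num_samples):
        for var in (0, 1):  # resample each variable given its neighbors
            state[var] = rng.random() < conditional(var, state)
        cancer_true += state[1]
    return cancer_true / num_samples  # fraction of samples with Cancer = True

# Exact marginal by enumerating the four states, for comparison
z = sum(phi.values())
exact = (phi[(False, True)] + phi[(True, True)]) / z

estimate = gibbs(100_000)
print(round(exact, 3), round(estimate, 3))
```

Here the exact marginal P(Cancer) = 9/16.2 ≈ 0.556, and the Gibbs estimate converges to it; in a larger network the conditional would sum over all potentials touching the resampled variable.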

13 Other Inference Methods

Many variations of MCMC
Belief propagation (sum-product)
Variational approximation
Exact methods

14 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

15 Learning Markov Networks

Learning parameters (weights): generatively or discriminatively
Learning structure (features)
In this tutorial: assume complete data
(If not: EM versions of algorithms)

16 Generative Weight Learning

Maximize likelihood or posterior probability
Numerical optimization (gradient or 2nd order); no local maxima
Gradient: ∂/∂w_i log P(x) = n_i(x) − E_w[n_i(x)] (observed minus expected feature counts)
Requires inference at each step (slow!)

17 Pseudo-Likelihood

Likelihood of each variable given its neighbors in the data
Does not require inference at each step
Consistent estimator
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for long inference chains

[Which can lead to disastrous results]
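On the toy Smoking–Cancer network from slide 7, the pseudo-likelihood is easy to write out and compare against the true likelihood. A minimal sketch (function names are illustrative):

```python
import math

# Pairwise potential phi(Smoking, Cancer) from slide 7
phi = {(False, False): 4.5, (False, True): 4.5,
       (True, False): 2.7, (True, True): 4.5}

def log_pseudo_likelihood(s, c):
    """log PL(s, c) = log P(s | c) + log P(c | s) for the single-clique net."""
    p_s = phi[(s, c)] / (phi[(True, c)] + phi[(False, c)])
    p_c = phi[(s, c)] / (phi[(s, True)] + phi[(s, False)])
    return math.log(p_s) + math.log(p_c)

# Compare with the true log-likelihood log P(s, c) = log phi(s, c) - log Z
Z = sum(phi.values())
def log_likelihood(s, c):
    return math.log(phi[(s, c)]) - math.log(Z)

for state in phi:
    print(state, round(log_pseudo_likelihood(*state), 3),
          round(log_likelihood(*state), 3))
```

Neither conditional requires the partition function Z, which is what makes pseudo-likelihood training cheap; the two quantities differ, which is why PL weights can misbehave on long inference chains.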

18 Discriminative Weight Learning

Maximize conditional likelihood of query (y) given evidence (x)
Gradient: ∂/∂w_i log P(y|x) = n_i(x,y) − E_w[n_i(x,y)]
(n_i(x,y): no. of true groundings of clause i in data; E_w[n_i(x,y)]: expected no. of true groundings according to model)
Approximate expected counts by counts in the MAP state of y given x

19 Structure Learning

How do we learn the structure of a Markov network? It is not too different from learning structure for a Bayes network: discrete search through the space of possible graphs, trying to maximize the probability of the data.

20 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

21 First-Order Logic

Constants, variables, functions, predicates
E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: Predicate or its negation
Clause: Disjunction of literals
Grounding: Replace all variables by constants
E.g.: Friends(Anna, Bob)
World (model, interpretation): Assignment of truth values to all ground predicates

22 Inference in First-Order Logic

Traditionally done by theorem proving (e.g.: Prolog)
Propositionalization followed by model checking turns out to be faster (often a lot)
Propositionalization: Create all ground atoms and clauses
Model checking: Satisfiability testing
Two main approaches:
Backtracking (e.g.: DPLL)
Stochastic local search (e.g.: WalkSAT)

23 Satisfiability

Input: Set of clauses (convert KB to conjunctive normal form (CNF))
Output: Truth assignment that satisfies all clauses, or failure
The paradigmatic NP-complete problem
Solution: Search
Key point: Most SAT problems are actually easy
Hard region: Narrow range of #Clauses / #Variables

24 Backtracking

Assign truth values by depth-first search
Assigning a variable deletes false literals and satisfied clauses
Empty set of clauses: Success
Empty clause: Failure
Additional improvements:
Unit propagation (unit clause forces truth value)
Pure literals (same truth value everywhere)
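The slide's procedure, including both improvements, fits in a short recursive function. A minimal DPLL sketch (literals are signed integers, a common convention; not a tuned solver):

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict var -> bool) or None.
    Literals are nonzero ints: v means var v is True, -v means False."""
    if assignment is None:
        assignment = {}
    # Simplify: drop satisfied clauses, delete false literals
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(l)) == (l > 0) for l in clause):
            continue  # clause already satisfied
        rest = [l for l in clause if abs(l) not in assignment]
        if not rest:
            return None  # empty clause: failure
        simplified.append(rest)
    if not simplified:
        return assignment  # empty set of clauses: success
    # Unit propagation: a unit clause forces a truth value
    for clause in simplified:
        if len(clause) == 1:
            l = clause[0]
            return dpll(simplified, {**assignment, abs(l): l > 0})
    # Pure literal: a variable occurring with one sign everywhere
    lits = {l for c in simplified for l in c}
    for l in lits:
        if -l not in lits:
            return dpll(simplified, {**assignment, abs(l): l > 0})
    # Branch on the first unassigned variable
    v = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, v: value})
        if result is not None:
            return result
    return None

# (x1 v x2) ^ (!x1 v x3) ^ (!x2 v !x3)
print(dpll([[1, 2], [-1, 3], [-2, -3]]))
```

Production solvers add clause learning and good branching heuristics, but the skeleton (simplify, propagate, branch) is exactly the slide's depth-first search.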

25 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

26 Rule Induction

Given: Set of positive and negative examples of some concept
Example: (x1, x2, …, xn, y)
y: concept (Boolean)
x1, x2, …, xn: attributes (assume Boolean)
Goal: Induce a set of rules that cover all positive examples and no negative ones
Rule: xa ^ xb ^ … ⇒ y (xa: literal, i.e., xi or its negation)
Same as Horn clause: Body ⇒ Head
Rule r covers example x iff x satisfies the body of r
Eval(r): Accuracy, info. gain, coverage, support, etc.

27 Learning a Single Rule

head ← y
body ← ∅
repeat
  for each literal x
    rx ← r with x added to body
    Eval(rx)
  body ← body ^ best x
until no x improves Eval(r)
return r

[For Eval(r): something like a one-sided version of information gain works pretty well; see Quinlan's FOIL.]

28 Learning a Set of Rules

R ← ∅
S ← examples
repeat
  learn a single rule r
  R ← R ∪ { r }
  S ← S − positive examples covered by r
until S contains no positive examples
return R
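The two procedures above (greedy literal addition, then sequential covering) can be sketched together in a few lines. This is an illustrative implementation with Eval(r) = accuracy and hypothetical Boolean data, not the exact algorithm from any particular system:

```python
from itertools import product

def covers(body, x):
    """A rule covers x iff x satisfies every literal (attr, value) in its body."""
    return all(x[a] == v for a, v in body)

def accuracy(body, examples):
    covered = [(x, y) for x, y in examples if covers(body, x)]
    if not covered:
        return 0.0
    return sum(y for _, y in covered) / len(covered)

def learn_one_rule(examples, n_attrs):
    """Greedily add the literal that most improves Eval (here: accuracy)."""
    body = []
    while True:
        candidates = [body + [(a, v)]
                      for a in range(n_attrs) for v in (True, False)
                      if all(a != b for b, _ in body)]
        best = max(candidates, key=lambda b: accuracy(b, examples))
        if accuracy(best, examples) <= accuracy(body, examples):
            return body
        body = best

def learn_rule_set(examples, n_attrs):
    """Sequential covering: learn a rule, remove covered positives, repeat."""
    rules, remaining = [], list(examples)
    while any(y for _, y in remaining):
        rule = learn_one_rule(remaining, n_attrs)
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining
                     if not (y and covers(rule, x))]
    return rules

# Hypothetical target concept: y = x0 AND x1, over all 3-bit inputs
data = [((a, b, c), a and b) for a, b, c in product([True, False], repeat=3)]
rules = learn_rule_set(data, 3)
print(rules)
```

On this data a single rule with body x0 ^ x1 covers all positives and no negatives, so covering terminates after one iteration.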

29 First-Order Rule Induction

y and the xi are now predicates with arguments
E.g.: y is Ancestor(x,y), xi is Parent(x,y)
Literals to add are predicates or their negations
Literal to add must include at least one variable already appearing in the rule
Adding a literal changes the # of groundings of the rule
E.g.: Ancestor(x,z) ^ Parent(z,y) ⇒ Ancestor(x,y)
Eval(r) must take this into account
E.g.: Multiply by # of positive groundings of the rule still covered after adding the literal

30 [Issues in learning first-order rules]

First-order rules can be expensive to evaluate
E.g.: Circuit(x,n) ⇐ Edge(x,z1), Edge(z1,z2), …, Edge(zn,x), z1 != z2, …, z1 != zn, z2 != z3, …, z{n-1} != zn.
First-order theories can have long proofs
E.g., Ackermann's function
First-order rules can be very expressive
E.g.: F(a,b,c,d,y) ⇐ Nand(a,b,z1), Nor(c,d,z2), Xor(z1,z2,z3), Not(z3,y)

31 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

32 [Combinations of first-order and statistical learning methods]

Stochastic logic programs: Horn clause programs + probabilities
Probabilistic relational models (PRMs): Bayes networks defined by frames
Relational Markov networks (RMNs): Markov networks defined by SQL queries
Bayesian logic (BLOG), Hierarchical Bayesian Compiler (HBC): Bayes networks defined by special languages
Markov logic networks

33 Markov Logic

Logical language: First-order logic
Probabilistic language: Markov networks
Syntax: First-order formulas with weights
Semantics: Templates for Markov net features
Learning:
Parameters: Generative or discriminative
Structure: ILP with arbitrary clauses and MAP score
Inference:
MAP: Weighted satisfiability
Marginal: MCMC with moves proposed by SAT solver
Partial grounding + lazy inference

34 Markov Logic

Most developed approach to date
Many other approaches can be viewed as special cases
[Main focus of the rest of this lecture]

35 Markov Logic: Intuition

A logical KB is a set of hard constraints on the set of possible worlds
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
Give each formula a weight (higher weight → stronger constraint)

36 Markov Logic: Definition

A Markov Logic Network (MLN) is a set of pairs (F, w) where
F is a formula in first-order logic
w is a real number
Together with a set of constants, it defines a Markov network with
One node for each grounding of each predicate in the MLN
One feature for each grounding of each formula F in the MLN, with the corresponding weight w

37 Example: Friends & Smokers

[The formulas on these slides appeared as images in the original deck: Smokes(x) ⇒ Cancer(x) and Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y)), each with an attached weight]

38 Example: Friends & Smokers

39 Example: Friends & Smokers

40 Example: Friends & Smokers

Two constants: Anna (A) and Bob (B)

41 Example: Friends & Smokers

Two constants: Anna (A) and Bob (B)

[Figure: ground atoms Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

42 Example: Friends & Smokers

Two constants: Anna (A) and Bob (B)

[Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

43 Example: Friends & Smokers

Two constants: Anna (A) and Bob (B)

[Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

44 Example: Friends & Smokers

Two constants: Anna (A) and Bob (B)

[Figure: ground network over Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B), Smokes(A), Smokes(B), Cancer(A), Cancer(B)]

45 Markov Logic Networks

MLN is a template for ground Markov nets
Probability of a world x: P(x) = (1/Z) exp( Σ_i w_i n_i(x) )
(w_i: weight of formula i; n_i(x): no. of true groundings of formula i in x)
Typed variables and constants greatly reduce the size of the ground Markov net
Functions, existential quantifiers, etc.
Infinite and continuous domains
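For two constants the ground network is small enough to evaluate P(x) by brute force. A sketch using the two Friends & Smokers formulas (Smokes(x) ⇒ Cancer(x), Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))) with illustrative weights; a real MLN engine would never enumerate worlds like this:

```python
import math
from itertools import product

consts = ["A", "B"]
w_sc, w_fr = 1.5, 1.1  # illustrative formula weights

def n_true_groundings(world):
    """Count true groundings of each formula. world = (smokes, cancer, friends)."""
    smokes, cancer, friends = world
    # Formula 1: Smokes(x) => Cancer(x)
    n1 = sum((not smokes[x]) or cancer[x] for x in consts)
    # Formula 2: Friends(x,y) => (Smokes(x) <=> Smokes(y))
    n2 = sum((not friends[(x, y)]) or (smokes[x] == smokes[y])
             for x in consts for y in consts)
    return n1, n2

def all_worlds():
    pairs = [(x, y) for x in consts for y in consts]
    for sm in product([False, True], repeat=2):
        for ca in product([False, True], repeat=2):
            for fr in product([False, True], repeat=4):
                yield dict(zip(consts, sm)), dict(zip(consts, ca)), dict(zip(pairs, fr))

def weight(world):
    n1, n2 = n_true_groundings(world)
    return math.exp(w_sc * n1 + w_fr * n2)  # exp(sum_i w_i * n_i(x))

Z = sum(weight(w) for w in all_worlds())            # partition function
p_cancer_a = sum(weight(w) for w in all_worlds() if w[1]["A"]) / Z
print(round(p_cancer_a, 4))
```

With positive weight on Smokes(x) ⇒ Cancer(x), the marginal P(Cancer(A)) ends up between 0.5 (its value if Anna surely did not smoke) and e^1.5/(1+e^1.5) ≈ 0.82 (its value if she surely did).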

46 Relation to Statistical Models

Special cases: Markov networks, Markov random fields, Bayesian networks, log-linear models, exponential models, max. entropy models, Gibbs distributions, Boltzmann machines, logistic regression, hidden Markov models, conditional random fields
Obtained by making all predicates zero-arity
Markov logic allows objects to be interdependent (non-i.i.d.)

47 Relation to First-Order Logic

Infinite weights → first-order logic
Satisfiable KB, positive weights → satisfying assignments = modes of distribution
Markov logic allows contradictions between formulas

48 MAP/MPE Inference

Problem: Find most likely state of world given evidence: arg max_y P(y | x)
(y: query; x: evidence)

49 MAP/MPE Inference

Problem: Find most likely state of world given evidence: arg max_y (1/Z_x) exp( Σ_i w_i n_i(x,y) )

50 MAP/MPE Inference

Problem: Find most likely state of world given evidence: arg max_y Σ_i w_i n_i(x,y)

51 MAP/MPE Inference

Problem: Find most likely state of world given evidence
This is just the weighted MaxSAT problem
Use a weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997])
Potentially faster than logical inference (!)

52 The MaxWalkSAT Algorithm

for i ← 1 to max-tries do
  solution ← random truth assignment
  for j ← 1 to max-flips do
    if Σ weights(sat. clauses) > threshold then
      return solution
    c ← random unsatisfied clause
    with probability p
      flip a random variable in c
    else
      flip the variable in c that maximizes Σ weights(sat. clauses)
return failure, best solution found
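A runnable sketch of the inner loop above (one "try", tracking the best assignment seen; clause representation and parameter values are illustrative, not the original implementation):

```python
import random

def cost(clauses, assign):
    """Sum of weights of unsatisfied clauses (what MaxWalkSAT minimizes)."""
    return sum(w for w, clause in clauses
               if not any(assign[abs(l)] == (l > 0) for l in clause))

def maxwalksat(clauses, n_vars, max_flips=10_000, p=0.5, target=0.0, seed=0):
    """Weighted clauses: list of (weight, [literals]). Returns (assign, cost)."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}
    best, best_cost = dict(assign), cost(clauses, assign)
    for _ in range(max_flips):
        if best_cost <= target:
            break
        unsat = [c for w, c in clauses
                 if not any(assign[abs(l)] == (l > 0) for l in c)]
        if not unsat:
            break
        clause = rng.choice(unsat)          # pick a random unsatisfied clause
        if rng.random() < p:
            v = abs(rng.choice(clause))     # random-walk step
        else:                               # greedy step: best flip in clause
            def flip_cost(var):
                assign[var] = not assign[var]
                c = cost(clauses, assign)
                assign[var] = not assign[var]
                return c
            v = min((abs(l) for l in clause), key=flip_cost)
        assign[v] = not assign[v]
        c = cost(clauses, assign)
        if c < best_cost:
            best, best_cost = dict(assign), c
    return best, best_cost

# Soft, contradictory constraints: x1 (weight 2) vs !x1 (weight 1), plus x1 => x2
clauses = [(2.0, [1]), (1.0, [-1]), (1.5, [-1, 2])]
solution, c = maxwalksat(clauses, n_vars=2)
print(solution, c)  # best achievable cost is 1.0 (violate only !x1)
```

Note the clauses are mutually unsatisfiable, exactly the situation Markov logic allows: the solver settles for the assignment (x1 and x2 true) that violates the least total weight.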

53 But… Memory Explosion

Problem: If there are n constants and the highest clause arity is c, the ground network requires O(n^c) memory
Solution: Exploit sparseness; ground clauses lazily → LazySAT algorithm [Singla & Domingos, 2006]

54 Computing Probabilities

P(Formula | MLN, C) = ?
MCMC: Sample worlds, check formula holds
P(Formula1 | Formula2, MLN, C) = ?
If Formula2 = conjunction of ground atoms:
First construct the minimal subset of the network necessary to answer the query (generalization of KBMC)
Then apply MCMC (or other)
Can also do lifted inference [Braz et al., 2005]

55 Ground Network Construction

network ← ∅
queue ← query nodes
repeat
  node ← front(queue)
  remove node from queue
  add node to network
  if node not in evidence then
    add neighbors(node) to queue
until queue = ∅
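The loop above is a breadth-first expansion that stops at evidence nodes. A minimal sketch (the chain network and node names are hypothetical):

```python
from collections import deque

def construct_network(query_nodes, evidence, neighbors):
    """Minimal subnetwork needed to answer the query (KBMC-style BFS).
    Expansion stops at evidence nodes, whose fixed values block influence."""
    network = set()
    queue = deque(query_nodes)
    while queue:
        node = queue.popleft()
        if node in network:
            continue
        network.add(node)
        if node not in evidence:  # evidence separates the rest of the graph
            for n in neighbors.get(node, []):
                if n not in network:
                    queue.append(n)
    return network

# Hypothetical chain A - B - C - D with evidence on C:
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(construct_network({"A"}, evidence={"C"}, neighbors=neighbors))
```

Querying A with evidence on C keeps only A, B, C: node D is irrelevant once C is observed, which is exactly the saving this construction buys before running MCMC.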

56 But… Insufficient for Logic

Problem: Deterministic dependencies break MCMC; near-deterministic ones make it very slow
Solution: Combine MCMC and WalkSAT → MC-SAT algorithm [Poon & Domingos, 2006]

57 Learning

Data is a relational database
Closed world assumption (if not: EM)
Learning parameters (weights)
Learning structure (formulas)

58 Weight Learning

Parameter tying: Groundings of the same clause
Gradient: ∂/∂w_i log P(x) = n_i(x) − E_w[n_i(x)]
(n_i(x): no. of times clause i is true in data; E_w[n_i(x)]: expected no. of times clause i is true according to MLN)
Generative learning: Pseudo-likelihood
Discriminative learning: Conditional likelihood; use MC-SAT or MaxWalkSAT for inference

59 Structure Learning

Generalizes feature induction in Markov nets
Any inductive logic programming approach can be used, but:
Goal is to induce any clauses, not just Horn
Evaluation function should be likelihood
Requires learning weights for each candidate
Turns out not to be the bottleneck
Bottleneck is counting clause groundings
Solution: Subsampling

60 Structure Learning

Initial state: Unit clauses or hand-coded KB
Operators: Add/remove literal, flip sign
Evaluation function: Pseudo-likelihood + structure prior
Search: Beam, shortest-first, bottom-up
[Kok & Domingos, 2005; Mihalkova & Mooney, 2007]

61 Alchemy

Open-source software including:
Full first-order logic syntax
Generative & discriminative weight learning
Structure learning
Weighted satisfiability and MCMC
Programming language features
alchemy.cs.washington.edu

62 Alchemy

               | Alchemy                  | Prolog          | BUGS
Representation | F.O. logic + Markov nets | Horn clauses    | Bayes nets
Inference      | Model checking, MC-SAT   | Theorem proving | Gibbs sampling
Learning       | Parameters & structure   | No              | Params.
Uncertainty    | Yes                      | No              | Yes
Relational     | Yes                      | Yes             | No

63 Overview

Motivation
Foundational areas: probabilistic inference, statistical learning, logical inference, inductive logic programming
Putting the pieces together
Applications

64 Applications

Basics, Logistic regression, Hypertext classification, Information retrieval, Entity resolution, Hidden Markov models, Information extraction, Statistical parsing, Semantic processing, Bayesian networks, Relational models, Robot mapping, Planning and MDPs, Practical tips

65 Running Alchemy

Programs: infer, learnwts, learnstruct (plus options)
MLN file: types (optional), predicates, formulas
Database files

66 Uniform Distribution: Empty MLN

Example: Unbiased coin flips
Type: flip = { 1, …, 20 }
Predicate: Heads(flip)

67 Binomial Distribution: Unit Clause

Example: Biased coin flips
Type: flip = { 1, …, 20 }
Predicate: Heads(flip)
Formula: Heads(f)
Weight: Log odds of heads, w = log( p / (1 − p) )
By default, the MLN includes unit clauses for all predicates (captures marginal distributions, etc.)
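The "weight = log odds" claim is easy to check by brute force: under a single unit clause Heads(f) with weight w, each flip is independent with P(Heads) = e^w / (1 + e^w). A small sketch (flip count reduced to 3 to keep enumeration trivial):

```python
import math
from itertools import product

def flip_marginal(w, n_flips=3):
    """P(Heads(1) = True) under an MLN with one unit clause Heads(f), weight w.
    Brute force: P(world) is proportional to exp(w * #true groundings)."""
    weights = {world: math.exp(w * sum(world))
               for world in product([0, 1], repeat=n_flips)}
    z = sum(weights.values())
    return sum(v for world, v in weights.items() if world[0]) / z

p = 0.8
w = math.log(p / (1 - p))  # weight = log odds of heads
print(round(flip_marginal(w), 6))  # recovers p = 0.8
```

Because the world weight factorizes over flips, each flip's marginal is exactly e^w / (1 + e^w), so setting w to the log odds reproduces the desired bias; w = 0 (or an empty MLN) gives the uniform case of the previous slide.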

68 Multinomial Distribution

Example: Throwing a die
Types: throw = { 1, …, 20 }, face = { 1, …, 6 }
Predicate: Outcome(throw,face)
Formulas:
Outcome(t,f) ^ f != f' => !Outcome(t,f').
Exist f Outcome(t,f).
Too cumbersome!

69 Multinomial Distribution: ! Notation

Example: Throwing a die
Types: throw = { 1, …, 20 }, face = { 1, …, 6 }
Predicate: Outcome(throw,face!)
Formulas: (none needed)
Semantics: Arguments without ! determine arguments with !
Also makes inference more efficient (triggers blocking)

70 Multinomial Distribution: + Notation

Example: Throwing a biased die
Types: throw = { 1, …, 20 }, face = { 1, …, 6 }
Predicate: Outcome(throw,face!)
Formulas: Outcome(t,+f)
Semantics: Learn a weight for each grounding of the arguments with +

71 Logistic Regression

Logistic regression: log [ P(C=1|F=f) / P(C=0|F=f) ] = a + Σ_i b_i f_i
Type: obj = { 1, …, n }
Query predicate: C(obj)
Evidence predicates: F_i(obj)
Formulas: C(x) (weight a); F_i(x) ^ C(x) (weight b_i)
Resulting distribution: P(c, f) = (1/Z) exp( a c + Σ_i b_i f_i c )
Therefore: log [ P(C=1|f) / P(C=0|f) ] = a + Σ_i b_i f_i
Alternative form: F_i(x) => C(x)
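The equivalence on this slide can be verified numerically: conditioning the MLN distribution on the evidence and enumerating the two values of C gives exactly the sigmoid of the linear score. A sketch with illustrative weights:

```python
import math

def mln_conditional(a, b, f):
    """P(C = True | evidence f) for an MLN with unit clause C(x) (weight a)
    and clauses F_i(x) ^ C(x) (weights b_i). Condition by enumerating C."""
    def score(c):
        # a * c from the unit clause, b_i when both F_i and C are true
        return a * c + sum(bi * (fi and c) for bi, fi in zip(b, f))
    num = math.exp(score(1))
    return num / (num + math.exp(score(0)))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

a, b = -1.0, [2.0, 0.5]  # illustrative weights
for f in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    lhs = mln_conditional(a, b, f)
    rhs = sigmoid(a + sum(bi * fi for bi, fi in zip(b, f)))
    print(f, round(lhs, 6), round(rhs, 6))  # identical by construction
```

Since score(0) = 0, the conditional is e^{score(1)} / (e^{score(1)} + 1) = sigmoid(a + Σ b_i f_i), which is the logistic regression model.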

72 Text Classification

page = { 1, …, n }
word = { … }
topic = { … }
Topic(page,topic!)
HasWord(page,word)
!Topic(p,t)
HasWord(p,+w) => Topic(p,+t)

73 Text Classification

Topic(page,topic!)
HasWord(page,word)
HasWord(p,+w) => Topic(p,+t)

74 Hypertext Classification

Topic(page,topic!)
HasWord(page,word)
Links(page,page)
HasWord(p,+w) => Topic(p,+t)
Topic(p,t) ^ Links(p,p') => Topic(p',t)
Cf. S. Chakrabarti, B. Dom & P. Indyk, "Hypertext Classification Using Hyperlinks," in Proc. SIGMOD-1998.

75 Information Retrieval

InQuery(word)
HasWord(page,word)
Relevant(page)
InQuery(+w) ^ HasWord(p,+w) => Relevant(p)
Relevant(p) ^ Links(p,p') => Relevant(p')
Cf. L. Page, S. Brin, R. Motwani & T. Winograd, "The PageRank Citation Ranking: Bringing Order to the Web," Tech. Rept., Stanford University, 1998.

76 Entity Resolution

Problem: Given a database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r'') => SameRecord(r,r'')
Cf. A. McCallum & B. Wellner, "Conditional Models of Identity Uncertainty with Application to Noun Coreference," in Adv. NIPS 17, 2005.

77 Entity Resolution

Can also resolve fields:
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)
HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') <=> SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r'') => SameRecord(r,r'')
SameField(f,r,r') ^ SameField(f,r',r'') => SameField(f,r,r'')
More: P. Singla & P. Domingos, "Entity Resolution with Markov Logic," in Proc. ICDM-2006.

78 Hidden Markov Models

obs = { Obs1, …, ObsN }
state = { St1, …, StM }
time = { 0, …, T }
State(state!,time)
Obs(obs!,time)
State(+s,0)
State(+s,t) => State(+s',t+1)
Obs(+o,t) => State(+s,t)

79 Information Extraction

Problem: Extract database from text or semi-structured sources
Example: Extract database of publications from citation list(s) (the CiteSeer problem)
Two steps:
Segmentation: Use HMM to assign tokens to fields
Entity resolution: Use logistic regression and transitivity

80 Information Extraction

Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c'') => SameField(f,c,c'')
SameCit(c,c') ^ SameCit(c',c'') => SameCit(c,c'')

81 Information Extraction

Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(".",i,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c'') => SameField(f,c,c'')
SameCit(c,c') ^ SameCit(c',c'') => SameCit(c,c'')
More: H. Poon & P. Domingos, "Joint Inference in Information Extraction," in Proc. AAAI-2007. (Tomorrow at 4:20.)

82 Statistical Parsing

Input: Sentence
Output: Most probable parse
PCFG: Production rules with probabilities
E.g.: 0.7 NP → N, 0.3 NP → Det N
WCFG: Production rules with weights (equivalent)
Chomsky normal form: A → B C or A → a

83 Statistical Parsing

Evidence predicate: Token(token,position), e.g.: Token("pizza", 3)
Query predicates: Constituent(position,position), e.g.: NP(2,4)
For each rule of the form A → B C: clause of the form B(i,j) ^ C(j,k) => A(i,k), e.g.: NP(i,j) ^ VP(j,k) => S(i,k)
For each rule of the form A → a: clause of the form Token(a,i) => A(i,i+1), e.g.: Token("pizza", i) => N(i,i+1)
For each nonterminal: hard formula stating that exactly one production holds
MAP inference yields the most probable parse

84 Semantic Processing

Weighted definite clause grammars: straightforward extension
Combine with entity resolution: NP(i,j) => Entity(+e,i,j)
Word sense disambiguation: use logistic regression
Semantic role labeling: use rules involving phrase predicates
Building meaning representation: via weighted DCG with lambda calculus (cf. Zettlemoyer & Collins, UAI-2005)
Another option: rules of the form Token(a,i) => Meaning and MeaningB ^ MeaningC ^ … => MeaningA
Facilitates injecting world knowledge into parsing

85 Semantic Processing

Example: John ate pizza.
Grammar: S → NP VP, VP → V NP, V → ate, NP → John, NP → pizza
Token("John",0) => Participant(John,E,0,1)
Token("ate",1) => Event(Eating,E,1,2)
Token("pizza",2) => Participant(pizza,E,2,3)
Event(Eating,e,i,j) ^ Participant(p,e,j,k) ^ VP(i,k) ^ V(i,j) ^ NP(j,k) => Eaten(p,e)
Event(Eating,e,j,k) ^ Participant(p,e,i,j) ^ S(i,k) ^ NP(i,j) ^ VP(j,k) => Eater(p,e)
Event(t,e,i,k) => Isa(e,t)
Result: Isa(E,Eating), Eater(John,E), Eaten(pizza,E)

86 Bayesian Networks

Use all binary predicates with the same first argument (the object x)
One predicate for each variable A: A(x,v!)
One clause for each line in the CPT and value of the variable
Context-specific independence: one Horn clause for each path in the decision tree
Logistic regression: as before
Noisy OR: deterministic OR + pairwise clauses

87 Relational Models

Knowledge-based model construction:
Allow only Horn clauses
Same as Bayes nets, except arbitrary relations
Combining function: logistic regression, noisy-OR, or external
Stochastic logic programs:
Allow only Horn clauses
Weight of clause = log(p)
Add formulas: head holds => exactly one body holds
Probabilistic relational models:
Allow only binary relations
Same as Bayes nets, except the first argument can vary

88 Relational Models

Relational Markov networks:
SQL → Datalog → first-order logic
One clause for each state of a clique
* syntax in Alchemy facilitates this
Bayesian logic:
Object = cluster of similar/related observations
Observation constants + object constants
Predicate InstanceOf(Obs,Obj) and clauses using it
Unknown relations: second-order Markov logic
More: S. Kok & P. Domingos, "Statistical Predicate Invention," in Proc. ICML-2007.

89 Other Applications

Transfer learning: L. Mihalkova & R. Mooney, "Mapping and Revising Markov Logic Networks for Transfer Learning," in Proc. AAAI-2007. (Tomorrow at 3:20.)
CALO project: T. Dietterich, "Experience with Markov Logic Networks in a Large AI System," in Proc. PLRL-2007.
Etc.
