4. VSM Examples
Here are many interactive examples, each with an explanation.
Make sure you have at least read the ‘VSM in a nutshell’ summary first:
- How to use VSM-boxes
- Short Story’s examples
- Full Story’s and Various examples
- Bidents: more examples
- Bidents: with numbers, and more on ‘implicit vs. explicit context’
- Tridents: various examples to give you some more inspiration
- Tridents: biological examples
- List connector
- Coreference connector
- Interchangeable vs. non-interchangeable connections
- Head: semantic point of view
- General and Data term types
- Further examples
- Various biological use cases
- Various use cases: inspired by you
1. How to interact with the VSM-box (prototype version)
- To enter a VSM-term:
- Just type and select an item from the Autocomplete list.
- Only a few ‘toy’ terms are preloaded in this page’s dictionaries. But you can type, and then select Autocomplete’s last item to add new terms!
- Double-click an existing VSM-term to edit it. Press Esc twice to cancel.
- To add a VSM-connector:
- Trident: 1) click above the term that functions as the subunit’s subject, 2) then above the one that functions as its relation, 3) and then its object. This assigns a ‘triple’.
- Bident with no subject: 1) click twice above the relation term (this skips the subject), 2) click above the object.
- Bident with no relation: 1) click above the subject, 2) click twice above the object (skips the relation).
- Bident with no object: 1) click above the subject, 2) click above the relation, 3) press Esc or click outside the VSM-box (skips the object).
- List-connector: 1) Shift + click above the list-relation, 2) click above each list-element, 3) press Esc or click outside the VSM-box (this finishes the list).
- Coreference: 1) Ctrl + click above the child term, 2) click above the parent term.
- To remove:
- VSM-term: press Backspace to remove the last one, or hover and select Remove from the menu.
- VSM-connector: mouse-hover it and click the x that appears at its top right.
- To reorder:
- You can drag a VSM-term around (e.g. after adding a new one at the end). Each connector leg stays attached to its own VSM-term. Connector stacking order is auto-reoptimized.
- To assign a Head:
- Alt + click above the VSM-term.
- To change term type (specific/general/data):
- Ctrl + click on the VSM-term itself.
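The click sequences for tridents and bidents listed above can be summarized as a small decision rule. The following is only a sketch of that rule, not the prototype's actual code: the function name `connector_from_clicks`, the `ESC` marker, and the (subject, relation, object) tuple with `None` for a skipped leg are all assumptions made for illustration. It also assumes that, for a bident with no object, the second click is the one above the relation.

```python
ESC = "Esc"  # hypothetical marker for pressing Esc or clicking outside the VSM-box


def connector_from_clicks(clicks):
    """Turn three click events into a (subject, relation, object) tuple.

    Each click is the position of the term that was clicked above, or ESC.
    A skipped leg is returned as None. This is an illustrative model only.
    """
    if len(clicks) != 3:
        raise ValueError("a trident or bident needs exactly three events")
    first, second, third = clicks
    if ESC in (first, second):
        raise ValueError("only the final event may be Esc")
    if third == ESC:
        if first == second:
            raise ValueError("a connector needs at least two legs")
        return (first, second, None)   # bident with no object
    if first == second == third:
        raise ValueError("a connector needs at least two different terms")
    if first == second:
        return (None, first, third)    # second click on the same term skipped the subject
    if second == third:
        return (first, None, second)   # second click on the same term skipped the relation
    return (first, second, third)      # trident


# Terms at positions 0, 1, 2, e.g. "John eats chicken".
print(connector_from_clicks([0, 1, 2]))    # trident
print(connector_from_clicks([1, 1, 2]))    # bident with no subject
print(connector_from_clicks([0, 2, 2]))    # bident with no relation
print(connector_from_clicks([0, 1, ESC]))  # bident with no object
```

Representing a skipped leg as `None` keeps all four connector shapes in one tuple layout, which makes it easy to see which leg is missing.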
2. Short Story’s examples, and more
Tridents: John and the chicken
- Connector 1: John eats chicken.
- Connector 2: That ‘eating’ (of chicken, by John): it happens with a fork.
So it’s specifically the eating, that happens ‘with’/‘by use of’ a fork!
It’s not the chicken that would happen to be using the fork:
… nor John who is using the fork:
… using it for something else, as in this example:
- Extra connector 1 = “the-use-of (…) has-purpose to-hold (…)”. (Hover over the VSM-term “for”.)
- Extra connector 2 = “to-hold tomato”.
- “with” (from the previous example) and “using” (here) have the same identifier! They represent the same ‘concept’ or idea or ‘thing’.
- Mouse-hover over them both, and see that they have the same Class ID.
- Remember: a VSM-term is a coupled literal term + an ID.
- In our prototype, the coupled ID is represented by a long number called Class ID,
- and the coupled term is simply a piece of text, but in our prototype it is also represented by its own number: its term ID.
- So here, “with” and “using” have the same Class ID, but a different Term ID.
- (As said before: these IDs are only stored in a temporary example-dictionary, embedded in this web page, and not in an online database).
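The coupling of a literal term with IDs can be pictured as a small record. This is a minimal sketch, not the prototype's data model: the class name `VsmTerm`, the field names, and the ID values are all made up for illustration (the prototype uses long numbers as IDs).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VsmTerm:
    label: str      # the literal text shown to the reader
    class_id: str   # identifies the concept
    term_id: str    # identifies this particular label for the concept


# Made-up IDs, for illustration only.
WITH = VsmTerm(label="with", class_id="CLASS:0001", term_id="TERM:0101")
USING = VsmTerm(label="using", class_id="CLASS:0001", term_id="TERM:0102")


def same_concept(a: VsmTerm, b: VsmTerm) -> bool:
    """Two terms denote the same concept when their Class IDs match."""
    return a.class_id == b.class_id


print(same_concept(WITH, USING))      # the two labels share one concept
print(WITH.term_id == USING.term_id)  # but they are different terms
```

Comparing on `class_id` rather than on the label is what lets “with” and “using” be treated as synonyms.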
Two chickens and more context
- We have two different types of “chicken” here: the type of person, and the type of animal.
Mouse-hover each to verify.
- Also, we have two different meanings for “with” here: the first one means ‘accompanied by’, and the second one means ‘using’. (So we could also have used the synonym “using”, like earlier).
= Key demo: You can always add more context details, to any VSM-term. You do this by adding more terms,
and by attaching them with a tri/bident. And you can do this recursively!
Like: “chicken” → “burnt chicken” (or: “chicken [is / has_attribute] burnt”), and “burnt” → “very burnt”. Likewise you could say something more about “newspaper”; or about any other term.
Tridents: a simple biological case
…as on the VSM summary page.
This is the very same example, only to
demonstrate that the preposition in was just an easier-to-read ‘avatar’ for the same concept is-located-in.
(Again, you can mousehover to verify that they have the same Class ID).
Here we added a little more context to ‘A’.
The biological showcase
- To spell this out, it says: “The being-twisted-of-a-leaf-blade pertains-to Rice, which was modified to underexpress the YAB3 gene,
performed by using an RNAi experiment type.”
(Hover over each term to see their definitions!)
- By capturing context information like this, one can e.g. use particular contexts (like experimental evidence) to calculate a confidence level, and then biologists can rank or filter query results by that confidence level!
- Note: here, the term in means ‘pertains to’, not ‘is located in’.
Mousehover the ‘in‘ here and the one above, to see that they have a different definition and Class ID, but same Term ID.
- Also try Autocomplete to select the intended meaning of a term ‘in‘.
(Click to the right of the last term to enter an extra term).
3. Full Story’s and Various examples
The trident can miss any of its three legs, and become a bident:
● Bident without Object leg: (for relations that have no object):
● Bident without Subject leg. – And compare the structural similarity with the case where a Subject is known:
…and how artificial it would be, to capture this if bidents were not available:
● Bident without Relation leg: (or: with an implicit relation):
…from which the computer can easily infer: (once it knows that the term ‘white’ (or its ID) is classified as a Color):
Note: we indicated the head VSM-term in the above two VSM-phrases with dashes above it, because “mouse” is the concept that we ‘focus on’, and its meaning is equivalent in both examples.
Bidents: more examples
This shows a bident that specifies: “to-strike a-match”. (Or more precisely: “a-striking(-of) a-match”).
It is not said who strikes the match, so this subunit has no Subject term.
Two implicit relations made explicit, based on the fact that “big” represents a ‘size’, and “very” represents a ‘qualifier’. As explained in the main text.
Here (as mentioned in the main text), the made-explicit relation is associated with the attribute “gene”‘s base concept “expression” instead. For this to be a valid sentence, the software that supports the interpretation of this type of deduction should know about such a rule, though.
An example with two types of bidents.
Bidents: with numbers
The implicit relation in the 1st sentence was made explicit in the second sentence: it is “having count”, based on the fact that “5” represents a numeric concept.
The term “approximately” modifies the term “5”.
Note: this illustrates that “5” is not always a concept that represents ‘an exact 5’!
• It is the idea of a ‘Five’, which can be placed in a certain context. This context can be “exactly …”, and one could make that the default assumed context. Or it could for example be “approximately …”, or a ‘five’ with some specified error margin (see a bit further).
• (This is also why we represent “5” as a ‘specific concept’ here (blue VSM-term), and not as a red ‘Data’ term; see later.)
• One could quip: « It’s Five , Jim, but not as we know it! »
An implicit-vs-explicit example of measurement units and values, expressed with VSM.
- The simple explanation (this is enough for most people):
This says: “X” is declared to have a certain concentration, given in “mg/l”; and that concentration is declared to have the amount “5”.
- A deeper explanation of the semantics behind it. And thoughts on implicit vs. explicit context.
(See also VSM Principle 4, later on the VSMGraphs page).
Conceptually, this phrase is built up like this:
- Step 1. Consider the VSM-term “X”, before it is connected, i.e. while it stands on its own. At that point it is not yet stated explicitly how many / how much / what kind / etc. of “X” we are dealing with here. Because it is not yet placed in (connected with) any explicit context. So it could be anything. Still, it is a blue VSM-term and thus it is a specific “X”, meaning that it comes embedded in some implicit context. So it is not “X” in general, but some specific (amount/kind/…) “X” that we have in mind. – So next, we will connect other VSM-terms to it, and make some of that implicit context explicit.
- Step 2. By connecting the rightmost bident, we specify (make explicit) that it’s an “X” that comes in some concentration, and that it’s one that will be given in a unit: “mg/l”.
- Step 3. By connecting the leftmost bident, we specify that this concentration (which also could have been anything up till then) has the amount of “5”.
- Note: any term always has more implicit context, much of which we may never even be able to make explicit.
For example: What temperature was X at? Which other molecules or impurities interacted with X during the experiment?
Did X come from a particular manufacturer? Was it pre-processed? How, and by whom?
And even: what isotopes did X’s atoms consist of? Etc. – So there is always more ‘context’ present.
All these things are what make up any specific VSM-term’s implicit context. They are absent details, but they are still what make the VSM-term a ‘specific thing’ in our minds.
It is straightforward to represent such things with VSM. – Though still:
- It could be helpful to have software that automatically suggests simple connectors like these.
- Software should be able to generate numeric concepts in its internal dictionaries, right when needed, because a dictionary can never be pre‑loaded with all possible numbers. (This is implemented in the vsm-dictionary module now, but not yet in the VSM-box prototype used in this page.) This holds all the more for numbers with a decimal separator, like here:
This illustrates an error margin given to a measurement.
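Generating numeric concepts on demand could look roughly like the sketch below. It is not how the vsm-dictionary module works internally: the class `ToyDictionary`, the `NUM:` ID scheme, and the choice to give “5” and “5.0” the same ID are assumptions made for this illustration.

```python
from decimal import Decimal, InvalidOperation


class ToyDictionary:
    """A toy term dictionary that creates numeric concepts on demand."""

    def __init__(self, preloaded):
        self.entries = dict(preloaded)  # label -> concept ID

    def lookup(self, label):
        if label in self.entries:
            return self.entries[label]
        try:
            value = Decimal(label)
        except InvalidOperation:
            return None  # neither a known term nor a number
        if not value.is_finite():
            return None  # reject inputs such as "NaN" or "Infinity"
        # Hypothetical ID scheme: equal values share one ID ("5.0" and "5").
        concept_id = "NUM:" + format(value.normalize(), "f")
        self.entries[label] = concept_id  # remember the generated entry
        return concept_id


toy = ToyDictionary({"mg/l": "UNIT:0001"})  # made-up preloaded entry
print(toy.lookup("5.25"))    # generated when first requested
print(toy.lookup("banana"))  # unknown non-numeric label
```

Using `Decimal` instead of `float` avoids binary rounding when the generated ID is derived from the number's text.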
Some percentage examples. (Note again that “percentage” and “%” have the same Class ID).
Tridents: various examples
- Here, e.g. “2018” is not a pure number concept, but a concept that represents a year. “2018” would be a shorthand for “year-2018-CE”.
Note: just like in the ‘numeric concepts’ discussion earlier, one cannot generate a dictionary with all possible years. Still, one could have a dictionary with commonly used year concepts, e.g. 1900 through 2100. For other years, as long as the ‘VSM-dictionary’ module can generate pure numbers on the fly, we can always represent them with a VSM-phrase like “year having-number 42000”.
- Notice how “August” is connected to “year-2008”.
(This is similar to the “start-of S-phase” case, which is explained in more detail a few examples below here!)
The “year-2008” is the main (initial) concept that represents time, and it gets narrowed down to its “August” month. We could phrase the bident-unit explicitly as something like: “year-2008 [time-interval-limited-to-subinterval] August”, or just: “year-2008 [specified-down-to-an] August”. The same principle applies to e.g. “John ❤ 14 February” :
Here of course, the bident’s implicit relation could be automatically deduced to be something like: “timespan-of-month-<subject>-limited-to-day-given-by-number-<object>”.
Remember that any VSM-term represents a ‘noun’-concept. So here, “reacts-to” (after being connected with A and B like that) represents: “the reaction of A to B”.
- This sentence is based on the Conceptual Graph (CG) in this W3C example figure, which represents: ‘Tom believes that Mary wants to marry a sailor’.
- Note: a “she” was inserted (in both CG and VSM-sentence) to make the subject of “to-marry” clear for computation.
- Note: A CG would require the curator to insert meta-information around relations, like the ‘Agent’, ‘Theme’, ‘Instrument’, ‘Destination’, which VSM does not require. Because in VSM such semantics would be encapsulated in (/could be inferred from) the relation-term that the curator selected.
Tridents: biological examples
The last three VSM-terms here represent the natural language phrase ‘TuMV-infected Arabidopsis’.
Notice the difference between the previous example and this one.
In the previous one, we used the three terms “expression of PAD4“, and we connected “is” to the “expression” term.
That’s because it is the “expression” that is increased, not the gene PAD4.
So here at first sight, one might want to use three terms “start of S-phase” and similarly connect “at” to “start”, to say that the ‘translocation happens-at some start’. But somehow it feels like “S-phase” is the main concept here, or at least, after we simply specified it to a more specific part of itself.
So instead, it’s better to consider “start-of” as an attribute that has the same meaning as “initial”, and use it with a bident, in a sense of further narrowing down the meaning of “S-phase”, i.e.: “initial S-phase”. This is also consistent with similar structures like “5 dogs”, or “half(-of) dogs”.
So we created a bident unit that can be read as “S-phase [further-specified-to] start-part” here,
(or as a more specific “S-phase [interval-limited-to-subinterval-as] start-part”).
In contrast, in the previous example we have a trident unit “expression of(=pertaining-to) PAD4“. If we would use a bident there, it would be something like “PAD4 [limited-to-attribute?] expression”. But it does not make semantic sense to narrow down the meaning of PAD4 to a property that it is associated with (instead of an own property that specifies what it can be, like in “mutated gene”). It does not change that PAD4 still represents a gene. – So there we must further connect to “expression”.
This explanation can be summarized by using this insight: ‘adding context‘ means ‘narrowing down the meaning of a VSM-term, as to what range of possible meanings it may represent‘. Therefore:
“start (of)“ narrows down the meaning of “S-phase” itself, and then we further connect to this narrowed-down “S-phase” concept itself; while “expression” is a distinct concept related to “PAD4“, so we further connect to “expression”.
(A first shot at creating one line of some protocol description).
The bident that connects ‘drug X‘ and ‘mg/l‘ has the implicit relation ‘having concentration‘, because the attribute term ‘mg/l‘ would have been classified as a ‘concentration’.
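The inference of an implicit relation from the class of the attribute term can be sketched as a simple lookup. This is an illustrative assumption about how such a rule table might look, not an existing VSM component: the names `IMPLICIT_RELATION_BY_CLASS`, `TERM_CLASS`, and `make_explicit` are hypothetical, and so is the “having-color” relation name. The “having-count” and “having-concentration” pairs follow the examples on this page.

```python
# Attribute class -> implicit relation. The first two pairs follow the text;
# "having-color" is an assumed name added for illustration.
IMPLICIT_RELATION_BY_CLASS = {
    "numeric": "having-count",
    "concentration": "having-concentration",
    "color": "having-color",
}

# Toy classification of attribute terms (hypothetical data).
TERM_CLASS = {"5": "numeric", "mg/l": "concentration", "white": "color"}


def make_explicit(subject, attribute):
    """Expand a relation-less bident into an explicit (s, r, o) triple."""
    term_class = TERM_CLASS.get(attribute)
    relation = IMPLICIT_RELATION_BY_CLASS.get(term_class)
    if relation is None:
        return None  # no rule known: leave the bident implicit
    return (subject, relation, attribute)


print(make_explicit("drug X", "mg/l"))
print(make_explicit("mouse", "fluffy"))  # no classification, so no inference
```

Returning `None` when no rule applies keeps the original bident untouched instead of guessing a relation.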
(Note: The connector order, which is automatically generated by the prototype, may not be optimal here. The “… containing …” connector would better have been placed under the “… placed-in …” one. Maybe that more intuitive ordering has something to do with “tumor-cells” actually being the Head of the VSM-phrase).
- The position of the list-relation does not matter.
- The order of the list-elements may matter, depending on whether the list-relation is one for which order is important. E.g. “(ordered-list-)And” vs. “(unordered-set-)And”. – Therefore curation-software based on VSM should store the order of the list-elements as given by the curator. See:
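Storing the curator's order while still comparing unordered lists correctly could be sketched as follows. The class `ListConnector`, its `ordered` flag, and the relation name “and-then” are assumptions for illustration; they are not part of the prototype.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ListConnector:
    relation: str              # the list-relation term, e.g. "and"
    elements: Tuple[str, ...]  # always kept in the curator's order
    ordered: bool              # does this list-relation care about order?

    def equivalent(self, other: "ListConnector") -> bool:
        if (self.relation, self.ordered) != (other.relation, other.ordered):
            return False
        if self.ordered:
            return self.elements == other.elements
        # Unordered: same elements in any order.
        return sorted(self.elements) == sorted(other.elements)


a = ListConnector("and", ("A", "B", "C"), ordered=False)
b = ListConnector("and", ("C", "A", "B"), ordered=False)
print(a.equivalent(b))  # same set of elements
print(b.elements)       # the stored order is still the curator's
```

The order is always stored; only the comparison decides whether it matters.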
Another list-relation. Since all VSM-terms should be thought of as nouns, or ‘thing’-concepts, this one should be thought of as something like ‘the either-or-ness of …’.
Here, “between” is linked to the meaning “to-be-located-between”, or in noun-form: “the-being-located-between”. It could be read as: “the-moving (by X) is-located-between the-and’ness-of (…)”.
Another way. This “between” is linked to the meaning “the-location-between (a given list of items)”, which can be used as a list-relation.
It could be read as: “the-moving (by X) is-located-at the-betweenness-of: A, B, C”.
Still, it may be best for a curation system to work with a vocabulary that only includes one of these two meanings for “between”, and to make it easy for a user to enter phrases like this uniformly.
(New to biology? Hover the “ubiquitinates” term to see its description).
This example was explained in detail in the Full Story – read it!
In the following two VSM-sentences, we will make coreferences to terms in the above sentence.
The “it” here refers to the Instance ID of the first occurrence in the previous sentence. (You can mouse-hover to check
that the “it” term here has the same Class ID as the “device” term above), …
(Note: the demo here does not yet support adding inter-sentence links manually).
… while the “device” term here, refers to the Instance ID of the second occurrence (the “device” that also beeps) (see Full Story).
- It doesn’t matter if the child term (surrounded in dashes) was given the label “it”, “device”, or anything else. It is only a label, for making a sentence more easily human-readable.
- Each child term (“it”/“device”/…) gets its own Instance ID and can be referred to again, in the same or in next sentences.
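The relation between child terms, parent terms, and Instance IDs can be sketched as below. This is a hypothetical model, not the prototype's storage format: the names `TermInstance`, `parent_id`, and `resolve` are assumptions, and following a chain of references back to its first term is a design choice of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TermInstance:
    inst_id: int                     # unique per occurrence
    class_id: str                    # the concept
    label: str                       # human-readable text only
    parent_id: Optional[int] = None  # set when this term is a coreference child


def resolve(inst_id: int, instances: Dict[int, TermInstance]) -> TermInstance:
    """Follow parent links until reaching a term that refers to nothing."""
    seen = set()
    current = instances[inst_id]
    while current.parent_id is not None:
        if current.inst_id in seen:
            raise ValueError("coreference cycle")
        seen.add(current.inst_id)
        current = instances[current.parent_id]
    return current


# Made-up IDs: a "device", an "it" that refers to it, and a later "device"
# that refers to that "it".
instances = {
    1: TermInstance(1, "CLASS:device", "device"),
    2: TermInstance(2, "CLASS:device", "it", parent_id=1),
    3: TermInstance(3, "CLASS:device", "device", parent_id=2),
}
print(resolve(3, instances).inst_id)
```

Note that `label` plays no role in `resolve`: whether the child is shown as “it” or “device” does not change what it refers to.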
Note that here, the term “device” (and not “activates”) is connected to “in China”. So this sentence does not necessarily express that Eve is in China too (she could activate it remotely, via the internet), nor necessarily that the activation is in China (the activation-switch could happen anywhere in cyberspace). – Sure, one may argue that the activation is in China too (it could be inferred), but this sentence definitely does not say that Eve is there too. – This illustrates again why VSM connectors connect to single terms specifically, and not to entire ‘triples’ as in RDF.
These show that VSM can be used to store any ‘information’, in the sense of: ‘anything one can think of’, or ‘any idea that you can form in your mind’.
The second sentence shows that one can express that another sentence, or even something specific in another sentence, is untrue…
(Mouse-hover to check that “that”‘s Referring-InstanceID refers to “sees”‘s InstanceID above).
… or one can assign a likelihood to the above, because (as in real life) things are rarely black and white.
Interchangeable vs. non-interchangeable connections
Advanced topic. See the main text for a detailed explanation!
These are the main text’s examples, but as interactive VSM-boxes:
… which just uses another label for the child concept, and adds some extra specifying context (“female”) to the parent concept.
Filling it in:
The user can still extend the template, with extra terms and connectors.
A second example of the same template filled in.
Another template. Click on the second empty field to see that autocomplete can really make life easy for curators. Here it would immediately suggest three often-used terms. Any other term can still be entered by typing it as usual.
Head: semantic point of view
Written in natural language like that, “eats” would be the ‘head’ or focal concept of the sentence: the one you’d really refer to if you’d follow it up with e.g. ‘I saw that‘.
But also “John” can be the head concept. The sentence here says:
‘The John, who eats a chicken with a fork’ (e.g. as answer to a question).
It conveys the same information, but stresses another one of the five concepts.
And the same goes for the other three terms. E.g.: ‘The chicken, that is eaten by John with a fork’, etc.:
This illustrates that yes, it can make sense to focus on the ‘with’ (=‘using’) concept too. Just like you can focus on any concept in a VSM-sentence.
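That the head only changes the focus, and not the information, can be sketched by comparing two sentences while ignoring the head. The dictionary layout with `terms`, `connectors`, and `head` keys is an assumption made for this illustration, not the format used by the VSM-box.

```python
def same_information(a, b):
    """Compare two sentences while ignoring which term is the head."""
    return a["terms"] == b["terms"] and a["connectors"] == b["connectors"]


terms = ["John", "eats", "chicken", "with", "fork"]
# (subject, relation, object) positions: "John eats chicken", "eats with fork".
connectors = [(0, 1, 2), (1, 3, 4)]

eats_as_head = {"terms": terms, "connectors": connectors, "head": 1}
john_as_head = {"terms": terms, "connectors": connectors, "head": 0}

print(same_information(eats_as_head, john_as_head))
```

Keeping the head as a separate field makes it possible to switch focus without touching terms or connectors.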
This rephrases part of the earlier sentence, saying that ‘The use of the fork, by John, resembles abuse’.
General and Data term types (experimental ideas)
These sentences will be clear once you understand VSMGraphs, which I may write more about in the future. As mentioned before, the blue / yellow / red colored VSM-terms represent Specific (=default) / General / Data terms, respectively. (And black / white / rectangular nodes in VSMGraphs).
A classification as in an ontology, or concept hierarchy. Between general concepts, not specific ones.
A classification, with as context the taxonomy version that declares this relation.
A description of a general concept.
A general concept has a label. It could be connected to many different labels like this.
It could be connected to a standardized name for it, with a “has-main-alias” relation.
Try it yourself! Because it’s easy and fun to add VSM-connectors (if you know a bit of the biology).
It’s like diagramming sentences in high school, but simpler:
(Remember to click a second time above the same term,
to skip the relation-leg and make a bident).
Compacted terms, semantically equivalent structures: paraphrases
These should be mappable onto each other using some graph-equivalency algorithm.
Mapping sentences like these, for which the determination of equivalence needs a form of automated reasoning, may be more programming work. Compare to the three ‘Levels’ of the SBML language.
Unwrapping a GO term, leading to Semantic Structure Transparency
Some of the long Gene Ontology (GO) terms could be shown using structurally more transparent VSM structures:
Because that’s what a subject-omitting bident does: it makes phrases like: “Relation-(of) Object”.
So here it made: “regulation-(of) (Object-term)”. (As in: “[unknown] regulates (Object)”).
By using GO terms’ existing OWL-definitions, and so analyzing their internal conceptual structure, it should be possible to show GO annotations (as well as larger protein interaction models) as simpler, readable VSM-sentences.
English special terms
Statements like ‘if … then …’ have to be reformulated for VSM, by using a single “if-then” ‘relation’ VSM-term.
Past tense of verbs. A similar structure could be for negations (“does‑not buy”). Or for modal verbs (“may buy”), although one could also make a structure with “buy” as object of the “may” relation.
The relation “‘s” is the same as the possessive “of”. The trident is only added in inverse order, because the terms are ordered like that too in readable English.
Quantifiers for logics and mathematics (experimental section)
One should be able to express anything in VSM, so logical expressions are a valid subset too. E.g.:
The leftmost triple can be read as: “the-being-smaller is-valid-for-all x”, which explains the peculiar order of the trident’s relation/object/subject-legs. – Still, the trident was added with three clicks in the normal order of Subject, Relation, and Object.
The same example, now with a quantifier symbol.
- Both are the same; the second one is just more wordy.
- It expresses: ‘For all x, there exists y, such that x < y’.
- (Please forgive the unnatural looking ordering of the connectors here. It would be clearer if the leftmost trident were placed above the other two. Here the current connector-ordering algorithm still works suboptimally).
The VSM-sentence looks weird (perhaps), but that’s because we’re used to seeing mathematical expressions in that order. Here is the exact same sentence but with terms reordered (and same connections).
It says: ‘x is smaller than y, and that is valid for some y, and that second thing is valid for all x’.
It should be possible to make an example for integrals, or any other mathematical formula too.
Just to show that, unlike a normal text-based controlled language, the auto-completed VSM-terms can apply proper styling:
Like superscript text for ions; or for showing human gene names in their standard, italicized way.
4. Various biological use cases
Binding of two proteins which then together activate a third
If you’d want to represent this on the biochemical-reaction level, then you may actually be dealing with two units of information here:
“A and B bind-to-form some-complex”, and “that-complex activates C”.
This could be captured with an overarching ‘and’ list-connector on top of both units. However, it is probably better to limit one VSM-sentence to one unit of information. So it could be represented by two VSM-sentences then, whereby the VSM-term “that-complex” in the second VSM-sentence would refer to the “some-complex” term in the first one.
(Curation software should be able to handle inter-sentence references then).
More simply, however, it can be represented with a single VSM-sentence too. If you do not need to explicitly focus on the preparatory step of “A binds B”, then you can just assume that a molecular complex of some A bound to some B already exists, and you just express that that activates C. We can use a list-connector, with a list-relation that expresses “molecularly-bound-unit-of …”, see:
Or, if you’d want to represent this on the biological-process level instead, then you can make this:
Note that it would be interesting if there existed an algorithm that could map variants like these onto each other… e.g. before a user runs a query from either a reaction-level or a process-level perspective…
Proteins and cofactors examples
= ‘The fact that a protein A is bound to a ‘cofactor’ molecule B, is required for that specific protein A (bound to B) to bind to protein C’.
= ‘Some protein A that was not bound to some cofactor molecule, did not bind to protein C’.
- Note: this only states this co-occurrence of not-events. If you’d want to express causation, then you can insert a “causes” relation and a referring “it”: “… causes it not binds-to C”.
- Pro tip (advanced topic): a reflection on specific vs. general concepts (or VSM-terms):
that causation would also just be an observation for one (or several) specific A and C. – If you’d want to express that this happens for any and all ‘A’s in general, then you could make “A” a general concept (Ctrl-click it).
But: usually in a scientific report, even though such a result may be based on many observations, it is only valid for the range of specific “A”s, within the context of the study (and its possibly many experiments). This still does not qualify it as a general “A”. A general “A” would be used e.g. for saying that “A[-in-general] is-classified-as protein”.
BEL conversion examples
This is a literal conversion of statements in BEL (Biological Expression Language):
SET CellLine = "U266"
proteinAbundance(HGNC:IL6) increases rnaAbundance(HGNC:ENO1)
Or line 2 in short-form: p(HGNC:IL6) -> r(HGNC:ENO1)
BEL uses two lines for this. The second line is conveniently short when written in short-form.
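The relation between the short form and the long form of the second line can be sketched with a small expansion function. This is not a BEL parser: the function `expand_bel` and its lookup tables are hypothetical, and they cover only the abbreviations that appear in the statement above.

```python
import re

# Only the abbreviations that appear in the statement above.
FUNCTIONS = {"p": "proteinAbundance", "r": "rnaAbundance"}
RELATIONS = {"->": "increases"}

# Matches statements of the shape: f(arg) relation g(arg)
PATTERN = re.compile(r"(\w+)\((.+?)\)\s+(\S+)\s+(\w+)\((.+?)\)")


def expand_bel(short_form: str) -> str:
    """Expand a short-form 'f(x) rel g(y)' statement to long form."""
    match = PATTERN.fullmatch(short_form.strip())
    if match is None:
        raise ValueError(f"unrecognized statement: {short_form!r}")
    f1, arg1, rel, f2, arg2 = match.groups()
    # Unknown abbreviations raise KeyError; this sketch does not handle them.
    return f"{FUNCTIONS[f1]}({arg1}) {RELATIONS[rel]} {FUNCTIONS[f2]}({arg2})"


print(expand_bel("p(HGNC:IL6) -> r(HGNC:ENO1)"))
```

A conversion to a VSM-sentence would start from the same three parts (source, relation, target) and add the `SET` context as extra connected terms.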
The VSM-sentence may be quick to construct when using VSM-templates, which in addition support the curator with convenient autocomplete functionality. And the VSM-sentence is easier to read as one quasi-natural-language sentence, as one unit of information.
Also, compared to controlled languages (like BEL), VSM does not need to be regularly updated with new rules to gain more power of expression. Only the controlled vocabularies that are plugged into the VSM-box need to be updated, externally, and that happens to them all the time anyway.
This corresponds to the following three BEL statements:
SET Anatomy = "cardiovascular system"
SET MeSHDisease = "Stroke"
abundance(CHEBI:corticosteroid) decreases biologicalProcess(MESHD:Inflammation)
A VSM-sentence represents this as one fluently readable unit of information.
Types/classifications (CHEBI, MeSH Disease, etc) automatically got linked to what the curator chose via autocomplete, and are all embedded in VSM-terms.
Just mouse-hover them to see.
In addition and importantly: VSM makes it possible to specify more precisely how the ‘“stroke” context’ relates to the rest of the information:
with the relation “after” (found in the plain English text).
(Note that the relation could have been “before” too, i.e. describing a preventative measure for at-risk patients. But a simple ‘MeSHDisease = …’ context with BEL does not capture this level of detail. It is easy to specify this with VSM).
- Maybe some SBML example.
- Maybe some SBGN graph, written as a number of VSM-sentences.
- Maybe an ‘experiment protocol’ step-by-step procedure, as a ‘story’ with inter-sentence coreferences. Or see also the ‘500 tumor cells’ example earlier.
5. Various use cases
If you have interesting examples, let me know! I may add them here.
Read about VSM‘s implications and roll-out on the Discussion page