ChatCalledQuest demo¶

The ccq module is designed as the backend engine of Geografia. It translates written Italian into LIS-glossed text (comprensione) and LIS-glossed text back into written Italian (produzione), where LIS is Lingua dei Segni Italiana, the Italian Sign Language.

Sign languages have no exact written counterpart, because signs combine gestures and facial expressions simultaneously and non-linearly. For this reason the sign message is represented at gloss level, an intermediate form that is essentially a simplified version of the spoken language.

LIS <--> ITA scheme, courtesy of Ilenia¶

schema_lis_ilenia-2.png

Comprensione : spoken2gloss examples¶

The input is a plain string.

In [ ]:
import os
os.environ['TOKENIZERS_PARALLELISM']='false'
import spacy
import spacy_transformers
import ccq, config
import importlib
In [20]:
importlib.reload(ccq)
engine = ccq.CCQ()
In [19]:
affirmative_spoken = 'Luca va in Spagna la prossima estate'
engine.translate(affirmative_spoken, direction='spoken2gloss')
Out[19]:
(['prossimo', 'estate', 'luca', 'spagna', 'andare'], True, '')
In [4]:
affirmative_spoken2 = 'Stasera voglio bere una birra'
engine.translate(affirmative_spoken2, direction='spoken2gloss')
Out[4]:
(['stasera', 'birra', 'volere', 'bere'], True, '')
In [5]:
negative_spoken = 'Luca non va in Spagna la prossima estate'
engine.translate(negative_spoken, direction='spoken2gloss')
Out[5]:
(['prossimo', 'estate', 'luca', 'spagna', 'andare', 'no'], True, '')
In [6]:
closed_interrogative_spoken = 'Luca va in Spagna la prossima estate?'
engine.translate(closed_interrogative_spoken, direction='spoken2gloss')
Out[6]:
(['prossimo', 'estate', 'spagna', 'andare', '?'], True, '')
In [7]:
open_interrogative_spoken =  'Dove andrà Luca la prossima estate?'
engine.translate(open_interrogative_spoken, direction='spoken2gloss')
Out[7]:
(['prossimo', 'estate', 'luca', 'andere', 'dove', '?'], True, '')
spoken2gloss failure tests¶
In [8]:
fail_spoken = open_interrogative_spoken*3
engine.translate(fail_spoken, direction='spoken2gloss')
Out[8]:
('', False, "failed ['max_tokens', 'valid_tokens']")
In [9]:
fail_spoken = 'ciao Lucia, sei strana con questa punteggiatura.'
engine.translate(fail_spoken, direction='spoken2gloss')
Out[9]:
('', False, "failed ['punct']")
In [10]:
fail_spoken = 'Luca, Antonio, Paolo e Marco vanno al mare'
engine.translate(fail_spoken, direction='spoken2gloss')
Out[10]:
('', False, "failed ['punct', 'subject']")
In [11]:
fail_spoken = 'Luca, Paolo e Marco vanno al mare'
engine.translate(fail_spoken, direction='spoken2gloss')
Out[11]:
('', False, "failed ['subject']")
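
The outputs above suggest that translate returns a 3-tuple: the result, a success flag, and a message listing the failed checks. The following is a minimal sketch of how a caller might consume that tuple, assuming the shape seen in these cells holds in general. The helper name tokens_or_raise is hypothetical and is not part of ccq.

```python
from typing import List, Tuple, Union

# Shape inferred from the cell outputs above: (result, success flag, message).
# This is an assumption, not a documented ccq contract.
TranslationResult = Tuple[Union[List[str], str], bool, str]


def tokens_or_raise(result: TranslationResult) -> List[str]:
    """Return the token list, or raise ValueError carrying the failure message."""
    tokens, ok, message = result
    if not ok:
        raise ValueError(message)
    return list(tokens)


# Tuples copied from Out[19] and Out[9] above.
success = (['prossimo', 'estate', 'luca', 'spagna', 'andare'], True, '')
failure = ('', False, "failed ['punct']")

print(tokens_or_raise(success))
try:
    tokens_or_raise(failure)
except ValueError as err:
    print("translation rejected:", err)
```

Raising on failure keeps the calling code from silently treating the empty string as a valid translation.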

Produzione : gloss2spoken examples¶

Here the input is assumed to be a list of (token, attribute) pairs. Tokens are words split on spaces and punctuation, and the attribute may be an empty string.

The attributes currently supported (and required whenever a corresponding token is present) are:

  • subject
  • verb
  • time
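
Before calling translate, the input list can be checked against the format described above. This is only a sketch: the helper check_gloss_input and the constant ALLOWED_ATTRIBUTES are hypothetical names, and the check covers the shape of the list only. It cannot tell whether a token that needs an attribute is missing one.

```python
from typing import List, Sequence, Tuple

# Attribute names taken from the list above; "" marks a token with no attribute.
ALLOWED_ATTRIBUTES = {"subject", "verb", "time", ""}


def check_gloss_input(gloss: Sequence[Tuple[str, str]]) -> List[str]:
    """Return a list of problems; an empty list means the shape looks valid."""
    problems = []
    for position, item in enumerate(gloss):
        if not (isinstance(item, tuple) and len(item) == 2):
            problems.append(f"item {position} is not a (token, attribute) pair")
            continue
        token, attribute = item
        if not isinstance(token, str) or not token:
            problems.append(f"item {position} has an empty or non-string token")
        if not isinstance(attribute, str) or attribute not in ALLOWED_ATTRIBUTES:
            problems.append(f"item {position} has unknown attribute {attribute!r}")
    return problems


print(check_gloss_input([('luca', 'subject'), ('andare', 'verb')]))
print(check_gloss_input([('luca', 'soggetto'), 'andare']))
```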
In [12]:
affirmative_gloss = [('prossimo','time'),('estate','time'),('luca','subject'),('spagna',''),('andare','verb')]

engine.translate(affirmative_gloss, direction='gloss2spoken')
Out[12]:
(['luca', 'andare', 'prossimo', 'estate', 'spagna'], True, '')
In [13]:
engine.translate(closed_interrogative_spoken, direction='spoken2gloss')
Out[13]:
(['prossimo', 'estate', 'spagna', 'andare', '?'], True, '')
In [14]:
negative_gloss = [('prossimo','time'),('estate','time'),('luca','subject'),('spagna',''),('andare','verb'),('no','')]

engine.translate(negative_gloss, direction='gloss2spoken')
Out[14]:
(['luca', 'andare', 'prossimo', 'estate', 'spagna', 'no'], True, '')
In [15]:
open_interrogative_gloss = [('prossimo','time'),('estate','time'),('luca','subject'),('andare','verb'),('dove',''),('?','')]

engine.translate(open_interrogative_gloss, direction='gloss2spoken')
Out[15]:
(['luca', 'dove', 'andare', 'prossimo', 'estate', '?'], True, '')
In [16]:
closed_interrogative_gloss = [('prossimo','time'),('estate','time'),('spagna',''),('andare','verb'),('luca','subject'),('?','')]

engine.translate(closed_interrogative_gloss, direction='gloss2spoken')
Out[16]:
(['luca', 'andare', 'prossimo', 'estate', 'spagna', '?'], True, '')

Zero-shot learning¶

In [17]:
from transformers import pipeline
classifier = pipeline("zero-shot-classification", model="Jiva/xlm-roberta-large-it-mnli", use_fast=True, multi_label=True)
In [18]:
# we will classify the following Wikipedia entry about Sardinia
sequence_to_classify = "La Sardegna è una regione italiana a statuto speciale di 1 592 730 abitanti con capoluogo Cagliari, la cui denominazione bilingue utilizzata nella comunicazione ufficiale è Regione Autonoma della Sardegna / Regione Autònoma de Sardigna."
# we can specify candidate labels in Italian:
candidate_labels = ["geografia", "politica", "macchine", "cibo", "moda"]
classifier(sequence_to_classify, candidate_labels)
Out[18]:
{'sequence': 'La Sardegna è una regione italiana a statuto speciale di 1 592 730 abitanti con capoluogo Cagliari, la cui denominazione bilingue utilizzata nella comunicazione ufficiale è Regione Autonoma della Sardegna / Regione Autònoma de Sardigna.',
 'labels': ['geografia', 'macchine', 'politica', 'cibo', 'moda'],
 'scores': [0.38871291279792786,
  0.22633220255374908,
  0.1939833015203476,
  0.13735689222812653,
  0.13708347082138062]}
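
With multi_label=True each label is scored independently, so more than one label can be kept for a sentence. The sketch below filters the labels of a result dictionary by score. The helper labels_above and the 0.2 threshold are illustrative assumptions, and the scores are rounded copies of Out[18].

```python
from typing import Dict, List, Tuple


def labels_above(result: Dict[str, list], threshold: float) -> List[Tuple[str, float]]:
    """Keep the (label, score) pairs whose score reaches the threshold."""
    return [
        (label, score)
        for label, score in zip(result["labels"], result["scores"])
        if score >= threshold
    ]


# Labels and rounded scores copied from Out[18] above.
result = {
    "labels": ["geografia", "macchine", "politica", "cibo", "moda"],
    "scores": [0.3887, 0.2263, 0.1940, 0.1374, 0.1371],
}

print(labels_above(result, threshold=0.2))
```

With this threshold only geografia and macchine are kept. A suitable value would have to be tuned on real sentences.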

todos¶

  • extend and test failure handling for gloss2spoken translation

  • polish the gloss2spoken output: add articles, conjunctions, and verb conjugation

  • the backend will be linked to some kind of database storing images indexed by glosses. The database will most probably be limited in size and general in its semantics (an image could have more than one index, and an index could probably have more than one image), so a word-vector similarity engine (for synonyms) and a zero-shot transformer (to exploit the contextual meaning of a sentence when representing each word token) will also be needed

  • cleaner code
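
The word-vector similarity engine mentioned in the todos could start from something like the sketch below: when a word has no image of its own, fall back to the indexed gloss whose vector is closest by cosine similarity. Everything here is an assumption made for illustration: the function names, the 0.7 cutoff, and the three-dimensional toy vectors. In practice the vectors would more likely come from a pretrained model that ships word vectors.

```python
import math
from typing import Dict, List, Optional


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_indexed_gloss(
    word: str,
    vectors: Dict[str, List[float]],
    indexed_glosses: List[str],
    min_similarity: float = 0.7,  # hypothetical cutoff
) -> Optional[str]:
    """Return the word itself if indexed, else its closest indexed gloss, else None."""
    if word in indexed_glosses:
        return word
    if word not in vectors:
        return None
    best, best_score = None, min_similarity
    for gloss in indexed_glosses:
        if gloss in vectors:
            score = cosine(vectors[word], vectors[gloss])
            if score >= best_score:
                best, best_score = gloss, score
    return best


# Toy vectors, invented for this example only.
TOY_VECTORS = {
    "mare": [0.9, 0.1, 0.0],
    "oceano": [0.85, 0.15, 0.05],
    "birra": [0.0, 0.1, 0.9],
}
INDEXED = ["mare", "birra"]  # glosses that have an image in the hypothetical db

print(nearest_indexed_gloss("oceano", TOY_VECTORS, INDEXED))
print(nearest_indexed_gloss("vino", TOY_VECTORS, INDEXED))
```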

useful links¶

  • nlp cheat sheet

    https://github.com/janlukasschroeder/nlp-cheat-sheet-python

  • spacy

    https://spacy.io/models & https://spacy.io/models/it

  • transformers

    • italian zero-shot : https://huggingface.co/Jiva/xlm-roberta-large-it-mnli

    • italian fill mask : https://huggingface.co/Musixmatch/umberto-wikipedia-uncased-v1

  • Stanford NLP --> stanza

    https://github.com/stanfordnlp/stanza

  • transformer + spacy : italian NER

    https://huggingface.co/bullmount/it_nerIta_trf

    pip install https://huggingface.co/bullmount/it_nerIta_trf/resolve/main/it_nerIta_trf-any-py3-none-any.whl
