With the aim of developing a workflow and data model for representing the content of historical texts and enriching it with annotations, we use Larramendi's [[Item:Q453|Azkoitiko Sermoia]] as a showcase. On Wikiteka (the Basque Wikisource), we store the manuscript facsimile and its transcription. Here, on this MLV Wikibase, we store the tokens of the transcription. Tokens are the segments produced by putting each word and each punctuation mark into a segment of its own. This segmentation can be represented vertically, one token per line, as has long been customary (see, e.g., the [https://universaldependencies.org/format.html CoNLL-U format]). The table displayed by the SPARQL query interface is backed by Linked Data, that is, semantic triples. In the model we propose for representing corpus data as Linked Data, we follow recent proposals from the field of Linguistic Linked Open Data (see [[Item:Q1260|Stanković, Chiarcos et al. 2023]]).

[https://doi.org/10.13140/RG.2.2.30500.86400 See the poster presented in December 2023] (in Basque).
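The following minimal Python sketch illustrates vertical tokenization. It is not the project's import script. The regular expression used as a tokenizer, the function names tokenize and to_vertical, and the sample string are all illustrative assumptions. The sketch splits a string into word and punctuation tokens and writes them one per line. Of the ten CoNLL-U columns, it fills only ID and FORM and leaves the other columns as "_".

```python
import re

# Illustrative tokenizer (an assumption, not the project's rule set):
# a word is a run of Unicode word characters, optionally joined by internal
# hyphens or apostrophes; every other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split a transcription string into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def to_vertical(tokens: list[str]) -> str:
    """Render tokens one per line in a CoNLL-U-like layout.

    CoNLL-U has ten tab-separated columns; only ID (1-based) and FORM
    are filled here, the remaining eight are left as '_'.
    """
    rows = []
    for i, form in enumerate(tokens, start=1):
        rows.append("\t".join([str(i), form] + ["_"] * 8))
    return "\n".join(rows)


if __name__ == "__main__":
    # Sample string chosen for illustration; it is not taken from the manuscript.
    print(to_vertical(tokenize("Kaixo, mundua!")))
```

A real pipeline for an 18th-century manuscript would need rules for its spelling, abbreviations and line-break hyphenation. A generic regular expression like this one is only a starting point.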
In this first experiment, the import from Wikisource is done with [https://github.com/dlindem/wikibase/blob/main/mlv/wikisource-to-wikibase.py this script].
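The linked script performs the actual import and is not reproduced here. This hedged Python sketch illustrates only the general idea that the SPARQL results table is backed by semantic triples. It expresses a token sequence as subject-predicate-object triples and then rebuilds the (position, form) table from them. The identifiers (T1, T1_tok1, ...) and predicate names (partOf, position, form, nextToken) are hypothetical placeholders. On a Wikibase, they would be items (Q-ids) and properties (P-ids) defined by the data model.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def tokens_to_triples(text_id: str, tokens: list[str]) -> list[Triple]:
    """Express a token sequence as triples.

    Identifiers and predicate names are hypothetical placeholders; on a
    Wikibase they would be Q-items and P-properties.
    """
    triples: list[Triple] = []
    for i, form in enumerate(tokens, start=1):
        tok = f"{text_id}_tok{i}"
        triples.append(Triple(tok, "partOf", text_id))   # token belongs to the text
        triples.append(Triple(tok, "position", str(i)))  # 1-based order in the text
        triples.append(Triple(tok, "form", form))        # surface form as transcribed
        if i > 1:
            # Explicit link from the previous token, so order can also be traversed.
            triples.append(Triple(f"{text_id}_tok{i - 1}", "nextToken", tok))
    return triples


def token_table(triples: list[Triple], text_id: str) -> list[tuple[int, str]]:
    """Rebuild (position, form) rows for one text, sorted by position,
    roughly what a SPARQL SELECT over such triples would return."""
    members = {t.subject for t in triples if t.predicate == "partOf" and t.obj == text_id}
    position = {t.subject: int(t.obj) for t in triples
                if t.predicate == "position" and t.subject in members}
    form = {t.subject: t.obj for t in triples
            if t.predicate == "form" and t.subject in members}
    return sorted((position[s], form[s]) for s in members)


if __name__ == "__main__":
    triples = tokens_to_triples("T1", ["Kaixo", ",", "mundua", "!"])
    for row in token_table(triples, "T1"):
        print(*row, sep="\t")
```

In this sketch, storing both the explicit position value and the nextToken link is a deliberate redundancy. The position value makes ordering in a query trivial, and the nextToken link lets a query walk the text token by token.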
See the [https://zenodo.org/records/12078616 poster presented in June 2024] (English version).
== SPARQL ==