nanopubs-way-to-create-even-more-silos.html (references)

References for: nanopubs-way-to-create-even-more-silos.html

Full identifier: https://iphylo.blogspot.com/2024/06/nanopubs-way-to-create-even-more-silos.html

Nanopublication	Part	Subject	Predicate	Object	Published By	Published On
https://w3id.org/np/RA0KOq2jIn... RA0KOq2jIn	links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion	nanopubs-way-to-create-even-more-silos.html	http://purl.org/dc/terms/title title	Nanopubs, a way to create even more silos (with 3 comments)	Tobias Kuhn	2024-06-24T12:17:19.601Z
https://w3id.org/np/RAs5C1SOd5... RAs5C1SOd5	links a nanopublication to its assertion http://www.nanopub.org/nschema#hasAssertion assertion	nanopubs-way-to-create-even-more-silos.html	http://www.w3.org/2000/01/rdf-schema#comment comment	Hi Roderic, Great to see that you decided to look into nanopublications and get to the bottom of it. Let me respond to some of the points you raised and clarify a few misunderstandings and inaccuracies. > Nanopubs, a way to create even more silos The title is highly misleading in my view. You can use any kind of URI-based identifiers in nanopublications, so any silo-ness on the conceptual level is user-generated and not a shortcoming of the nanopublication technology. And on the data storage level, nanopublications are born in the global decentralized nanopublication network and never live in just one location, so they are as anti-silo as you can be in that respect. The nanopublication you mention can be found at all these places: - https://server.np.trustyuri.net/RAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig - http://130.60.24.146:7880/RAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig - https://server.np.dumontierlab.com/RAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig - http://rdf.disgenet.org/nanopub-server/RAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig - https://np.knowledgepixels.com/RAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig And you can query it at all these places (and more): - https://query.np.trustyuri.net/tools/full/yasgui.html#query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B+GRAPH+%3Chttps%3A%2F%2Fw3id.org%2Fnp%2FRAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig%23assertion%3E+%7B+%3Fs+%3Fp+%3Fo+%7D+%7D+&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql-results%2Bjson&endpoint=%2Frepo%2Ffull&requestMethod=POST&tabTitle=Query&headers=%7B%7D&outputFormat=table - 
https://nanopub.sdsc.edu/tools/full/yasgui.html#query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B+GRAPH+%3Chttps%3A%2F%2Fw3id.org%2Fnp%2FRAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig%23assertion%3E+%7B+%3Fs+%3Fp+%3Fo+%7D+%7D+&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql-results%2Bjson&endpoint=%2Frepo%2Ffull&requestMethod=POST&tabTitle=Query&headers=%7B%7D&outputFormat=table - https://query.np.kpxl.org/tools/full/yasgui.html#query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B+GRAPH+%3Chttps%3A%2F%2Fw3id.org%2Fnp%2FRAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig%23assertion%3E+%7B+%3Fs+%3Fp+%3Fo+%7D+%7D+&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql - https://query.knowledgepixels.com/tools/full/yasgui.html#query=SELECT+%3Fs+%3Fp+%3Fo+WHERE+%7B+GRAPH+%3Chttps%3A%2F%2Fw3id.org%2Fnp%2FRAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig%23assertion%3E+%7B+%3Fs+%3Fp+%3Fo+%7D+%7D+&contentTypeConstruct=text%2Fturtle&contentTypeSelect=application%2Fsparql Can you point me to any technology that is less siloed than this? > there are reasons not to be optimistic about nanopubs (or text-mining in general) Nanopublications are not a special case of text mining. In fact, they were conceived to make sure we don't need text mining in the first place ("Why bury it first and then mine it again?"; see https://doi.org/10.1186/1471-2105-6-142 ). > In other words, > [Helictopleurus dorbignyi] -> [isSynonymOf] -> [Helictopleurus halffteri] > This seems a fairly simple thing to say, ... Yes, it always looks simple until you dig into the details, and then it gets more complicated. This modeling was the result of extended discussions with people at Pensoft and other biodiversity experts. I might not be using the right domain terms here, but as you know, taxa have names, and these names even have URI-based identifiers, but the names have often been used in different ways. So the same name (even with an identifier) can refer to different taxon concepts.
For these concepts, there aren't universally available and acceptable identifiers yet, unfortunately. But they can be constructed from a taxon name and a reference to a publication or "treatment" where this interpretation is explained. There are others who can explain the domain side of this reasoning better, and you can probably piece it together from your own expertise in the domain, but that was how we arrived at this semantic model. > indeed we could say it with a single triple, but the corresponding nanopub requires 33 RDF triples to say this. 33 doesn't seem like a very large number to me. And I can explain in detail the purpose of every single one of these triples. Provenance and metadata are important, and they need a bit of space, that's all. > By itself this isn’t terribly useful because neither of the two taxa are “things” that have identifiers, they are blank nodes. Yes, but that's a result of the modeling decision above. Nanopublications as a technology don't force you to do it this way. > cannot have interoperability unless you use the same identifiers for the same things No, that's not true. We should indeed minimize the use of different identifiers for the same thing, but we can very well achieve interoperability with multiple identifiers. Approaches like linksets and scientific lenses, among others, can do this. We are working on concrete solutions for this within the nanopublication ecosystem too. On an open and distributed system like the Web, it's in fact impossible to enforce unique identifiers for the same things. You cannot prevent different people from defining different identifiers for the same thing at different ends of the Web, and it has actually happened many, many times, so we have to deal with this whether we want to or not. > That means persistent identifiers, identifiers that you have some confidence will be around in ten, 20, or 50 years (at least). That's a completely separate issue from the previous sentence.
Yes, but what does "to be around" mean? Will catalogueoflife.org still be up in 2074? We don't know! But we can design systems where this doesn't even matter. What matters is that we can keep using the identifier and have an agreement on what it means. And you can do that with nanopublications and their ecosystem, and we are working on the concrete methods and tools. > I find it alarming that the link to the source of the statement that these two names are synonyms is not the DOI for the paper 10.3897/BDJ.12.e120304, No need to be alarmed. But yes, it should refer to the DOI and this should be fixed. We are looking into it with Pensoft. The reason is that this nanopublication was created before the DOI was minted. And the last step of making final versions of nanopublications upon formal article publication isn't fully developed yet. We have a long journey ahead of us, so we decided to move fast and allow ourselves to make mistakes on the way. This is one of them and it will be fixed. > The taxon names have as their identifiers https://www.checklistbank.org/dataset/9880/taxon/3K9T4 and https://www.checklistbank.org/dataset/9880/taxon/3K9ST. These identifiers are also local to a particular dataset. Why not use identifiers such as the Catalogue of Life entries for these names (i.e., e.g. https://www.catalogueoflife.org/data/taxon/3K9T4, Checklistbank has broader coverage and is therefore more universal, and includes Catalogueoflife. To me Checklistbank seemed at least as persistent and long-term as Catalogueoflife, so overall preferable, but I am not familiar with the details and policies behind these platforms. In any case, this is just a modeling decision we made, which we could just as well have made differently. It just shows we can use any kind of URI-based identifier, and consequently some tools/people will sometimes make modeling mistakes, but that's part of the process.
> This is nice, but where is the equivalent for linking the publication to the nanopub via its DOI, or the taxon names to the nanopub? It should and can be there. See my answer above. > For example, http://purl.obolibrary.org/obo/NOMEN_0000285 is used to define the relation between. I confess it’s unclear to me why NOMEN_0000285 isn’t used to directly link the two ChecklistBank records, rather than the indirection via #subjtaxon and #objtaxon, given that is a relationship between names (isn’t it?). See above. > It amazes me how readily people create new ontologies We just defined terms where needed, and didn't necessarily group them together in ontologies, applying the thinking of "breaking things into nano-pieces" to ontologies and their terms, in a sense. > especially as in the wider world there is a trend towards one vocabuary to rule them all (schema.org). Not all trends are good. Just saying :) > I find it disheartening that the bulk of the information in a nanopub is administrivia about that nanopub. Again, it all has a purpose, which I can explain in more detail if you like. > I understand the desire to establish provenance and to cryptographically sign the information, but all this is of limited use if the actual scientific information is poorly expressed. This is a false dichotomy. Of course we want both: provenance/validity, as well as properly expressed assertions. Nanopublications don't tell you upfront which ontologies to use (so no conceptual silo) or how to formulate your statements, but then of course modeling mistakes can happen. There is no way to prevent that, but that doesn't mean the technology or the ecosystem is flawed. I hope these responses and clarifications are helpful. Tobias	Tobias Kuhn	2024-06-19T12:09:27.022Z
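
To make the "found at all these places" and "query it at all these places" points in the comment above concrete, here is a minimal Python sketch of how the mirror URLs and the assertion query appear to be put together. The server list and the SPARQL text are taken from the comment; the function names (`mirror_urls`, `assertion_query`, `encoded_query`) are hypothetical helpers introduced only for illustration. The sketch makes no network requests and does not check whether any of the listed servers is still online.

```python
from urllib.parse import quote_plus

# Identifier part of the nanopublication URI quoted in the comment.
ARTIFACT_CODE = "RAXCvEZfCcjYuH5DWOIujBehGQt61y_nRHWssw9u6aYig"

# Server base URLs exactly as listed in the comment (availability not verified).
SERVERS = [
    "https://server.np.trustyuri.net/",
    "http://130.60.24.146:7880/",
    "https://server.np.dumontierlab.com/",
    "http://rdf.disgenet.org/nanopub-server/",
    "https://np.knowledgepixels.com/",
]


def mirror_urls(artifact_code, servers=SERVERS):
    """Return one URL per server; the same identifier is appended to each base."""
    return [base + artifact_code for base in servers]


def assertion_query(artifact_code):
    """Build the SPARQL query from the comment's links: all triples in the assertion graph."""
    graph = f"https://w3id.org/np/{artifact_code}#assertion"
    return f"SELECT ?s ?p ?o WHERE {{ GRAPH <{graph}> {{ ?s ?p ?o }} }}"


def encoded_query(artifact_code):
    """URL-encode the query the way it appears in the query= part of the links."""
    return quote_plus(assertion_query(artifact_code))


if __name__ == "__main__":
    for url in mirror_urls(ARTIFACT_CODE):
        print(url)
    print(assertion_query(ARTIFACT_CODE))
```

The only thing that changes between locations is the server base URL; the identifier and the named graph `...#assertion` stay the same, which is the property the comment relies on when arguing that the data does not live in a single place.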