KBDB?

webwit
Wild Duck

30 Jul 2013, 01:09

Muirium wrote:Embrace extend … and extinguish, only works in closed source. No one's trying to take the Wiki away. The more organised data we have the better, surely.
You're arguing a side. Propaganda is boring, and predictable. I could argue your side for you, black and white is easy. But I'd rather intellectually weigh all the issues, rather than wishing away anything that isn't on "my side". That would include engaging in the idea that people take the info from the wiki to extend on this data but not merge it back, and whether that is good or bad for the wiki. It is open source after all, so one can argue any use is good. On the other side, would people still bother adding to the wiki after they added their keyboard elsewhere in a UI which is suboptimized for this goal? Also one could entertain the idea that a couple of people put incredible effort into the wiki, hoping for others to add their share, and now they might see others taking that info but adding new info elsewhere. I hope this will work out great. That there is an easy to query relational database, which links and feeds back to the wiki, and vice versa. In order to make that happen, you need to discuss and understand and respect the issues, so you can make the right choices and pick the right solutions. The phrase "I hope we can find a way to exchange data" instead of "I will feed the data back in OUR wiki" on the other hand worries me that the leprechauns are expected to feed data back. That is why I have a bad feeling about this.

matt3o
-[°_°]-

30 Jul 2013, 08:56

webwit wrote:The phrase "I hope we can find a way to exchange data" instead of "I will feed the data back in OUR wiki" on the other hand worries me that the leprechauns are expected to feed data back. That is why I have a bad feeling about this.
The optimum would be that all the structured data (the info that you find in the upper right table of each KB sheet) were scraped by the website to create an easy to access search tool (and keyboard collection manager and rating system and whatever).

My impression is that kbdb could do wonders to help feeding the wiki, so making the db read only would be a pity (for the sake of the wiki completeness too). An alternative would be to let the DB exclusively handle the structured data that is continuously pushed to the wiki. Keeping both the db AND the wiki read/write would be quite an effort, you know, perfect syncing is never an easy task.

But I have no control over the wiki, its code or the direction the admins or the community want for it, hence "I hope we can find a way to exchange data". That you can read "I hope we can find a way to collaborate".

If I didn't care about DT wiki I would have already released a beta of kbdb and advertised on GH :P
Attachment: wiki-to-kbdb.jpg

webwit
Wild Duck

30 Jul 2013, 10:37

matt3o wrote:But I have no control over the wiki, its code or the direction the admins or the community want for it, hence "I hope we can find a way to exchange data".
I don't remember that your wiki extension or other contribution was turned down by the Demongolator. :mrgreen:
I'm glad this wasn't Beardsmore's opinion when he wanted to research switches and keyboards, to proceed and start his own thing, after scraping the wiki. Or of all the contributors to the open source software we run. The wiki software and the wiki data is open source. There is currently no official control or admin. It is what the contributors make of it. I just update the software and hand people access or copies if they need it.

Let's face it. It's not that you don't have control or whatever, you didn't try, but that you don't want to deal with the complexities of the wiki and integration with it, and it's easier to get the data and take it from there. This is where I worry that such complexities will be left to the leprechauns to handle. They will get data. Again the wiki is what the contributors make of it. If someone contributes a tool which takes and expands on data but it doesn't integrate back, there is no magic wiki team who will do what the contributor didn't want to get into, it would just be a half-arsed contribution.

matt3o
-[°_°]-

30 Jul 2013, 10:54

I feel mildly offended by your words, but okay I take them.

Do you think it's possible to create a 2-way system between the db and the wiki? Honestly I don't think so. One has to be the master.

Even if I could put my hands on wiki's code I think it would be very complex to reach what I have in mind (instant search for example, or rating or collection management).

Also I have no intention to scrape the wiki if I can't give something back, even if it would be "legally" possible.

ne0phyte
Toast.

30 Jul 2013, 11:25

matt3o wrote:Do you think it's possible to create a 2-way system between the db and the wiki? Honestly I don't think so. One has to be the master.
And in that case letting the Wiki with a more strict rule on the detail fields be the master is definitely the sane way.

After looking at it again it seems like a good alternative to use the http://deskthority.net/wiki/Special:Export page, fetch the wiki data as XML and write a little parser for the Wiki markup (at least for getting the Infobox data).

If we have some well defined rules on mandatory fields, formatting and how to enumerate things, it shouldn't be too hard to generate a db out of that. Imo the data should still be accessible through the wiki. After all the kbdb should allow us to search for keyboards with filters by switch/year/price/manufacturer and so on.
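A rough sketch of the first half of that idea, in Python: pulling article titles and raw markup out of an export dump. The sample XML is hand-written and only assumes the usual page/title/revision/text layout of a MediaWiki export; the namespace version and the article in it are made up, and actually downloading the dump from Special:Export is left out.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of a MediaWiki export dump.
# Assumption: each article sits in <page> with a <title> and a
# <revision><text> holding the raw wiki markup.
SAMPLE_EXPORT = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
  <page>
    <title>Example Keyboard</title>
    <revision>
      <text>{{infobox dkeyboard | name = Example Keyboard | pn = 12345 }}
Some article text.</text>
    </revision>
  </page>
</mediawiki>"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def pages_from_export(xml_text):
    """Return {article title: raw wiki markup} for every <page> found."""
    pages = {}
    root = ET.fromstring(xml_text)
    for page in root.iter():
        if local_name(page.tag) != "page":
            continue
        title, text = None, ""
        for node in page.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "text":
                # If several revisions are present, the last one wins.
                text = node.text or ""
        if title is not None:
            pages[title] = text
    return pages


pages = pages_from_export(SAMPLE_EXPORT)
```

Matching on the local tag name keeps the sketch independent of whichever export schema version the wiki happens to emit.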

Icarium

30 Jul 2013, 12:13

Just to point out this option: http://semantic-mediawiki.org/

Halvar

30 Jul 2013, 14:35

I'd say let's stay with the bazaar model instead of the cathedral model.

It's not like there is a fixed amount of time that keyboard enthusiasts spend documenting keyboards, and that fixed amount of time would need to be divided between wiki and database. It's more like this: different kinds of people respond to different ways to do things, and whatever gets people to contribute is good, as long as the database content is truly open and exportable.

Even with Semantic MediaWiki and/or custom programming, I don't see a really good solution to include an easy to use and flexible database in the wiki. Semantic MediaWiki does include a function to create input forms like this:

http://www.placeography.org/index.php?t ... n=formedit

But first someone needs to create a form like that for keyboard entries, and then the way to add new keyboards and variants will still never be as intuitive and fool-proof as in a standalone database -- which is very important when trying to attract more than a few contributors.

Let matt3o do his thing, let's see how it goes, let's be happy about one more (independent) source of keyboard wisdom, even if there are contradictions and duplicates. If it thrives we have a great new source of information that for the most part wouldn't be there at all without the database. I don't see too much of a competition there. If it should begin to stall some time down the road and not proceed any more, we can still think about ways to integrate the data into the DT wiki. Most important thing will be that all content (including photos, videos etc.) is under a free license when it is posted in either medium, so that it isn't lost if matt3o or webwit for some reason choose to stop maintaining the technical basis.

As I said, I personally do not really believe in the appeal of crowdsource databases to contributors, in spite of what Muirium wrote. There would be more of those on the web if this principle worked. Look at how many wikis there are vs. how many databases. Ready to be surprised though.

edit: my English sucks
Last edited by Halvar on 30 Jul 2013, 17:17, edited 2 times in total.

Muirium
µ

30 Jul 2013, 14:50

Halvar, you ninja, you beat me to it! These projects aren't taking slices out of the same pie. And ultimately they're the same project, if the DB is proven to work.

The existence proof of many crowdsourced Wikis vs. few if any comparable databases does indeed say something. I suspect it's the existence of Wikipedia and Wikia as models that helped Wikis take off. But there's no way to know whether it's a supply or demand thing until we try.

Something I must point out is that Wikis everywhere have a consistent pattern of few members providing the overwhelming majority of content. Our own is sadly highly representative. Even ubiquitous Wikipedia is a tough place to contribute. I remember trying a few times, back in the day, then giving up. A common story, explored at length on Hypercritical. I'd love to see a database get a chance.


Original post:

There is an ultimate goal we've all got our eyes on: a unified Wiki and database which presents itself in which ever way the users and contributors want, but whose data store is agnostic. Or that's my read anyway. So long as there is a canonical data store we're good. Semantic MediaWiki is a likely part of it.

The argument I think (and I've raised Webwit's ire already so I'll explain myself) is about how we get there. Whether it's better to do all the heavy duty architecture up front, or to see if KBDB takes off first and only do that work if it does. The downside to a quick and dirty launch is that if the DB gets real interest, all the new contributions directly to it will be invisible to the Wiki until the sync mechanism is cooked up. That raises obvious sync issues for as long as it takes; with Webwit's understandable skepticism that it'll be a sufficient priority. The downside to doing all the design and implementation work up front is the chance the new project burns out before it sees the light of day.

The Wiki is a deep resource we all respect. But many of us seldom contribute to it. This isn't through lack of regard for its value, present or future. It's because Wiki editing is a learned craft, much more so than simpler forms of humble data entry. Guarding against fractured efforts is one thing, but to me at least the presence of a DB enables more contribution, from a wider pool of people. This is not a zero sum game. Nor is it a rival, if done right, when it counts.

webwit
Wild Duck

30 Jul 2013, 16:45

In all the time you took in this thread to argue that the wiki takes time to learn, you could have added a couple of keyboards to the wiki.

Muirium
µ

30 Jul 2013, 16:48

But I learned how to make circuitous arguments already! It's all free…

Anyway, I want to run arbitrary queries. That's the fun part Wikis don't provide.

ne0phyte
Toast.

30 Jul 2013, 17:25

Don't want to add fuel to the fire but after looking at the markup of the infoboxes... I think that parsing the wiki export is more than viable.

Fields mostly reference the manufacturer, switch type and so on. We are already dealing with relational data and the wiki article/category names can be used as primary keys for a database. By including the actual article text (stripped from wiki markup tags) a full text search combined with conditions like switch, branding, etc. should be possible.

e.g. SELECT name, wikiurl FROM Keyboards WHERE switch = 'MX Red' AND article LIKE '%ergonomic%'
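To make that query concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout follows the query above, with the article name as primary key; the three rows are invented placeholders, not real wiki entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Keyboards ("
    " name TEXT PRIMARY KEY,"  # wiki article name doubles as the key
    " wikiurl TEXT,"
    " switch TEXT,"
    " article TEXT)"           # article text with wiki markup stripped
)

# Placeholder rows, made up purely to exercise the query.
rows = [
    ("Example Board A", "http://deskthority.net/wiki/Example_Board_A",
     "MX Red", "An ergonomic split board."),
    ("Example Board B", "http://deskthority.net/wiki/Example_Board_B",
     "MX Red", "A plain full-size board."),
    ("Example Board C", "http://deskthority.net/wiki/Example_Board_C",
     "MX Black", "Another ergonomic board."),
]
conn.executemany("INSERT INTO Keyboards VALUES (?, ?, ?, ?)", rows)

# Same conditions as the SQL above, with the values bound as parameters.
matches = conn.execute(
    "SELECT name, wikiurl FROM Keyboards "
    "WHERE switch = ? AND article LIKE ?",
    ("MX Red", "%ergonomic%"),
).fetchall()
```

LIKE is only a stand-in for real full text search here; it scans every row, which is fine for a few hundred keyboards but not much more.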

For those who didn't check, this is what the infobox looks like.

Code:

{{ infobox dkeyboard |
	name = Accodata Keycat Mouse Keyboard |
	pn = 65840 |
	fcc = HMN2YZKEYCAT |
	branding = Accodata |
	manufacturer = Powersource Computer Systems Inc (?) |
	switch = [[Alps_CM#Blue | Blue Alps CM]] |
	layouts = |
	interface = DIN, Serial |
	weight = Unknown |
	years = ~1988 |
	price = }}
EDIT: I know that matt3o wants to add more details and more specific data combined with an easy mask to enter/update entries. For simple queries this approach should suffice, though.
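A rough sketch of such a parser in Python, assuming infoboxes look like the one above. The function names are made up for illustration. The one subtle part is that a value such as the switch link contains a pipe of its own, so the split only happens on pipes outside [[links]] and nested {{templates}}.

```python
import re

SAMPLE = """{{ infobox dkeyboard |
    name = Accodata Keycat Mouse Keyboard |
    pn = 65840 |
    switch = [[Alps_CM#Blue | Blue Alps CM]] |
    interface = DIN, Serial |
    price = }}"""


def split_top_level(body):
    """Split on '|' but not inside [[links]] or nested {{templates}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("[[", "{{"):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("]]", "}}") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_infobox(markup, template="infobox dkeyboard"):
    """Return the first matching infobox as a dict, or None if absent."""
    start = re.search(r"\{\{\s*" + re.escape(template), markup, re.IGNORECASE)
    if start is None:
        return None
    # Walk forward to the '}}' that closes this particular template.
    depth, i = 0, start.start()
    while i < len(markup):
        pair = markup[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                break
        else:
            i += 1
    else:
        return None  # never found the closing braces
    body = markup[start.end():i - 2]
    fields = {}
    for part in split_top_level(body):
        if "=" in part:
            key, value = part.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


fields = parse_infobox(SAMPLE)
```

It only picks up the first infobox on a page, so pages with several would need a loop around the search.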

webwit
Wild Duck

30 Jul 2013, 17:52

It's probably more stable to parse the export instead of the pages, where the html output might subtly change between updates.

Daniel Beardsmore

30 Jul 2013, 20:58

Halvar wrote:Even with Semantic MediaWiki and/or custom programming, I don't see a really good solution to include an easy to use and flexible database in the wiki.
I grant you, it's tricky. A wiki's flexibility is also its weakness. Infoboxes don't exist. They are literally nothing more than a markup expando that puts extra markup around what you typed in, e.g. adding line breaks, static text, some HTML etc. You can put them anywhere on the page, and as many as you want. (SIIG MiniTouch for example has two, one each for the two sub-ranges.)

One workaround would be to write a MediaWiki extension as follows:
  • When editing an article, delete any known infobox types {{infobox dkeyboard/dswitch/dmouse/…}} from the markup
  • Add an infobox editor toolbar above the MediaWiki markup box, where you can list, create, edit and remove infoboxes
  • When saving an article, add the infobox markup code (back) to the start of the page¹
Infoboxes could simply be presented in DHTML dialog boxes, with form controls.

Saving changes to a page would:

a) write the live data into the database for aggregation purposes
b) generate new MediaWiki markup to represent this data in the wiki
c) mistakenly lose all the customisation in historical transclusions (all pages created prior to this new system) that falls outside of the acceptable parameters ;-)

It's not easy, not even close, because the next issue is categorisation and sub-types. Where does the infobox system get its knowledge from? When selecting a switch for your new keyboard entry, does the infobox editor:

a) retrieve all pages within Category:Keyboard switches, or
b) retrieve all the aggregation database rows for infobox dswitch (when you have two or more per page, for example)
c) panic, because …

… we've forgotten that people might want to use terms such as "Alps CM blue" when there is no such page! Many switch families and keyboard families are collected onto pages that use all sorts of methods (tables, lists, multiple infoboxes, colour templates) to describe all the variants, and none of this will be accessible to the software.

This is not insurmountable. There's no reason why variant tables and colour tables cannot be replaced in the markup during editing with something like:

{{__!!_DO_NOT_TOUCH_!!__|segment=parts_table|ID=90210}}

Then you'd be able to graphically edit the table, and have the resultant markup data written back in at the correct place when saving the page.
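As a sketch of that swap (in Python rather than as a real MediaWiki extension, purely to show the idea): tables in {| ... |} syntax are stashed under a numbered placeholder while the page is being edited and written back on save. The helper names and the simple counter used for the ID are assumptions, and nested tables or cells containing "|}" are not handled.

```python
import re

# MediaWiki tables open with '{|' and close with '|}'.
TABLE_RE = re.compile(r"\{\|.*?\|\}", re.DOTALL)
PLACEHOLDER = "{{__!!_DO_NOT_TOUCH_!!__|segment=parts_table|ID=%d}}"
PLACEHOLDER_RE = re.compile(
    r"\{\{__!!_DO_NOT_TOUCH_!!__\|segment=parts_table\|ID=(\d+)\}\}"
)


def hide_tables(markup):
    """Swap each table for a numbered placeholder; return (markup, stash)."""
    stash = {}

    def swap(match):
        ident = len(stash) + 1  # hypothetical ID scheme: a running counter
        stash[ident] = match.group(0)
        return PLACEHOLDER % ident

    return TABLE_RE.sub(swap, markup), stash


def restore_tables(markup, stash):
    """Write the stashed (or regenerated) table markup back in place."""
    return PLACEHOLDER_RE.sub(lambda m: stash[int(m.group(1))], markup)


page = 'Intro\n{| class="wikitable"\n! Colour\n|-\n| Blue\n|}\nOutro'
hidden, stash = hide_tables(page)
restored = restore_tables(hidden, stash)
```

A graphical table editor would replace the entry in the stash before restore_tables runs; the surrounding text never sees the table markup at all.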

MediaWiki's flexibility is its weakness.

TL;DR
From a practicality standpoint, I would suggest that a read-only scraper for aggregation purposes is the most realistic option. Leave the wiki engine as-is, and just collect the data from the pages and cache them into a database.

This would require a convention be declared for infobox and variant table formats and a lot of page editing to ensure 100% compliance is both attained and maintained, but it's much easier than trying to combine such disparate worlds.
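To give an idea of what maintaining compliance could mean in practice, here is a small Python sketch of a checker that a scraper might run over each parsed infobox. The list of mandatory fields and the accepted format for "years" are pure assumptions for illustration; the real convention would have to be agreed first.

```python
import re

# Hypothetical convention: which fields must be filled in, and what a
# valid "years" value looks like (e.g. "1988", "~1988" or "1988-1992").
REQUIRED = ("name", "manufacturer", "switch")
YEARS_RE = re.compile(r"^~?\d{4}(\s*-\s*\d{4})?$")


def check_infobox(fields):
    """Return a list of human-readable problems for one infobox dict."""
    problems = []
    for key in REQUIRED:
        if not fields.get(key, "").strip():
            problems.append("missing field: " + key)
    years = fields.get("years", "").strip()
    if years and not YEARS_RE.match(years):
        problems.append("unrecognised years format: " + years)
    return problems


good = check_infobox({
    "name": "Example Keyboard",
    "manufacturer": "[[Cherry]]",
    "switch": "[[Cherry MX Black]]",
    "years": "~1988",
})
bad = check_infobox({"name": "Example Keyboard", "years": "late eighties"})
```

The output could be posted as a list of pages needing attention, rather than rejecting anything outright.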

¹ There may need to be some heuristics to determine other markup which must always precede this, e.g. {{DISPLAYTITLE:…}}

JBert

30 Jul 2013, 23:18

Daniel Beardsmore wrote:I grant you, it's tricky. A wiki's flexibility is also its weakness. Infoboxes don't exist. They are literally nothing more than a markup expando that puts extra markup around what you typed in, e.g. adding line breaks, static text, some HTML etc. You can put them anywhere on the page, and as many as you want. (SIIG MiniTouch for example has two, one each for the two sub-ranges.)
This might actually make it easier to wrap the items in the infobox in a Microformat - basically HTML with strictly defined CSS class naming rules to let a scraper distinguish between fluff and data.

We'd only need to agree upon a format, which might take some time though...
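A minimal sketch of what the scraper side might look like in Python, using only the standard library. The "kbdb-" class prefix and the sample HTML are invented for illustration, since no format has been agreed. It would trip over unclosed tags such as a bare br inside a field.

```python
from html.parser import HTMLParser

PREFIX = "kbdb-"  # hypothetical class prefix marking a data field

SAMPLE_HTML = (
    '<table class="infobox">'
    '<tr><th>Switch</th><td class="kbdb-switch">'
    '<a href="/wiki/Alps_CM">Blue Alps CM</a></td></tr>'
    '<tr><th>Interface</th><td class="kbdb-interface">DIN, Serial</td></tr>'
    '</table>'
)


class FieldScraper(HTMLParser):
    """Collect the text of every element carrying a 'kbdb-*' class."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # name of the field being collected
        self._depth = 0       # open tags inside that field's element

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            self._depth += 1  # nested tag such as a link inside a cell
            return
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls.startswith(PREFIX):
                self._current = cls[len(PREFIX):]
                self._depth = 1
                self.fields[self._current] = ""
                break

    def handle_endtag(self, tag):
        if self._current is not None:
            self._depth -= 1
            if self._depth == 0:
                value = self.fields[self._current].strip()
                self.fields[self._current] = value
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


scraper = FieldScraper()
scraper.feed(SAMPLE_HTML)
scraped = scraper.fields
```

Everything outside the marked elements (table headers, layout, decoration) is ignored, which is the whole point of the class naming rules.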

Daniel Beardsmore

30 Jul 2013, 23:45

That's pretty much what I was thinking for the data tables, in addition to selecting a schema for the contents. Infoboxes are template-driven and it's trivial to alter the infobox template.

ne0phyte
Toast.

31 Jul 2013, 00:56

This is what we need: http://wiki.dbpedia.org/About
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data.
Here are some example queries: http://wiki.dbpedia.org/OnlineAccess#h28-6

Otherwise there is a huge API http://en.wikipedia.org/w/api.php. There's sadly no way to obtain expanded (templates applied) plain text. So we can't just request a key-value list for the infobox of a specific article or something like that.

Examples:
http://deskthority.net/w/api.php?format ... 0keyboards

http://deskthority.net/w/api.php?&actio ... age=ABS_M1
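For reference, a sketch of how requests like these can be put together in Python. It uses standard MediaWiki API parameters (action=query with list=categorymembers, and action=parse with prop=wikitext); these are not necessarily the exact parameters behind the shortened links above, and the function names are made up. It only builds the URLs, it does not fetch anything.

```python
from urllib.parse import urlencode

API = "http://deskthority.net/w/api.php"  # endpoint taken from the links above


def category_members_url(category, limit=50):
    """URL listing the pages in a category, as JSON."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def page_wikitext_url(page):
    """URL returning the raw (unexpanded) wiki markup of one page, as JSON."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "wikitext",
        "format": "json",
    }
    return API + "?" + urlencode(params)


list_url = category_members_url("Category:Keyboard switches")
page_url = page_wikitext_url("ABS_M1")
```

The second call returns the markup with templates still unexpanded, so the infobox would still have to be parsed out of it afterwards.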

Halvar

31 Jul 2013, 07:41

@ne0phyte: this only tries to make the most of what's in the wiki, without any help for wiki authors to enter the information in a form that is helpful for the database. It just expects wiki authors to know what they should and what they shouldn't do. Which is bad for reasons discussed earlier in this thread. It just means fewer contributors will be able to maintain the wiki.

@Daniel: "MediaWiki's flexibility is its weakness.": Making wiki pages the master for a structured database is just a huge stretch of the ideas behind wikis. One might get it to work in one way or the other, but one can easily break the ideas behind the wiki design when doing so, especially the point that it is easy to make changes, create new pages, add information, review what others have changed etc. I guess we all have seen software that was changed in a way that it still works but doesn't adhere to the ideas behind its design any more. This software is then broken, even if it still works technically.

Daniel Beardsmore

31 Jul 2013, 20:50

OK — we need something to function as a master. A read-only scraper is a quick and effective way to achieve the primary goal, which is to aggregate the data for listing and searching. The wiki is genuinely not difficult to work with.

However, there's a strong voice in this discussion that the wiki is too hard to use, and that we need lots of knobs and sliders for providing guided data entry. Any attempt to appease this voice leads to a path filled with pain, with the likely end result being scrapping the whole wiki and replacing it with a database-driven system with some form of article editor (perhaps a reimplementation of MediaWiki syntax, or a rich text editor like TinyMCE), and the article editor will need placeholder support for data generated from the database, such as {{table:variants}}, together with URL-free relative links such as [[SMK second generation]].

I think that such a database-driven system, if properly implemented, would be a fantastic system with the potential for use by many other groups, but unless a sufficiently similar product already exists, then this risks being a huge effort on the part of a small number of people solely to counteract the laziness of the other people who refuse to learn a bit of MediaWiki syntax and a few fairly obvious conventions.

There's a wiki subforum here and several of us are more than happy to answer any questions people have about working with the wiki with tips and advice. Do you personally feel that implementing a complete replacement for the wiki is worth the effort? If enough people are truly committed to the idea, then fine: I agree that MediaWiki is truly sub-par for what we as a community need, and a database-driven system would work wonders.

However, I am not sure that this burden is justifiable considering that we already have evidence that a few lines of trivial JavaScript code already bring up the only data we need!

Obviously, if software matching these requirements already exists, then great. Don't forget that we'll be replacing all the pages, including the glossary sections such as Keyboard terms and Keyboard keys, the keyboard mod area, etc, so it will need to support functioning as a conventional wiki as well.

Halvar

01 Aug 2013, 00:04

OK, if I accept as the most important requirement that we need something to function as a master because we can't afford split efforts, then I agree with your way to do it, namely develop wiki infoboxes into something that can be searched and, as far as it is possible with a justifiable effort, be easily input and modified.

BTW, you are also right that a wiki is really easy to contribute to as long as it's not overcharged with conventions or hard-to-use templates (like Wikipedia is). Especially if you're a writer type... ;)

What I'm afraid of is that we end up
a) suffocating matt3o's enthusiasm to contribute something new and valuable
b) and making some half-hearted scraper and changes to the infobox templates that lead to lots of conventions and scare people off from contributing to the wiki
c) and spending valuable time adding some half-hearted search mechanisms based on the scraped wiki pages that nobody uses because they are not detailed enough.
In short, pretty much everything would actually come out worse than it is now instead of better.

I would prefer a "split efforts" scenario to that outcome. Let matteo & some friends make a good structured database, leave structured searches to them entirely, and keep maintaining the DT wiki as if nothing happened, quite like you did when ripster started his wiki on reddit. Webwit is right though -- if matt3o decided to design his DB in a way so that it could replace the wiki, then we would have a situation of competition that would be quite destructive.

It's not easy ...
Last edited by Halvar on 01 Aug 2013, 00:27, edited 1 time in total.

Daniel Beardsmore

01 Aug 2013, 00:22

Who do you imagine point b will scare off, exactly?

Halvar

01 Aug 2013, 00:38

well even what we have now:

Code:

{{infobox dkeyboard
| image name   = 
| pn           = 
| fcc          = 
| branding     = 
| manufacturer = [[Cherry]]
| switch       = [[Cherry MX Black]]
| layouts      = Modified [[ISO]]
| features     =
| interface    = [[AT keyboard interface|AT]]
| weight       = 
| years        = 
| price        = 
| switch mount = [[Plate mount]]
}}
would need a lot of explanation. What is "pn"? What is "years" exactly? What goes into "features" and how do I format it? Am I supposed to use tags from a list for features? How do I format manufacturer and switches? OK, you can just copy that from another keyboard and fill the fields that you think you understand and leave everything else free, but you won't get a good database if new contributors don't understand exactly what the fields are supposed to mean and how they're supposed to be used.

ne0phyte
Toast.

01 Aug 2013, 01:16

Did you check the "infobox dkeyboard" template? I think there are some comments on the fields (but I might be wrong and I don't want to check on my phone).

EDIT: in = on. Typing English with German autocorrect doesn't work :mrgreen:
Last edited by ne0phyte on 01 Aug 2013, 10:28, edited 1 time in total.

Daniel Beardsmore

01 Aug 2013, 01:22

What's your point?

http://deskthority.net/wiki/Template:Infobox_dkeyboard

I've even gone to the trouble of documenting the template fields for people. Did you seriously not think to ask someone? I would have readily informed you of the way to access the template documentation. Why has it taken so long for someone to raise this point?

The documentation could do with more detail, but you only need ask and I'll amend the templates to clarify anything as needed. I am not responsible for creating these templates, so I cannot authoritatively state how all the fields should be used, but if it's an issue, ask in the wiki forum and we can make a decision.

If you create an isolated database, people will only be able to amend a few facts and figures. If the wiki is the master data source, people will absorb the syntax and they'll also spot opportunities to make minor changes, and then later major changes.

That's why they're called "wikis" — you can quickly, easily edit any information at will, and pick up the finer points as you go along. If you fluff up, no big deal, just undo the change. Everything is comprehensively audited. Found a spelling mistake? Just click "Edit" above the affected section, correct it, and commit the change.

There's no great secret, no mystery. No-one is alone. No damage is permanent. There's a whole subforum dedicated to discussing the wiki if people need to ask questions. But the idea that people are just stewing in silence for year after year hoping that leprechauns are going to come around and inject knowledge into their brains at night while they sleep is just absurd.

Halvar

01 Aug 2013, 01:52

You're right, that's not too hard to find if you know what you're looking for. The fact that I didn't find that in spite of using MediaWiki quite regularly as an author at work and at home speaks of my blindness as well as the fact that sometimes people just don't work like you expect them to.

matt3o
-[°_°]-

01 Aug 2013, 16:46

I'm not implying that the wiki is hard to use, just that I believe there *might* be a better way. Think of MP3. Apple made such an easy and appealing tool that people preferred to buy tracks rather than download them.

If I'm wrong, oh well, who cares. If I'm right we might have found a way for everyone to enjoy browsing/collecting/adding keyboards. Isn't this what we all want?

webwit
Wild Duck

01 Aug 2013, 16:50


Halvar

01 Aug 2013, 17:07

I've worked with the FCKEditor plugin for MediaWiki, and that one is actually a very nice help -- especially as you can easily toggle between Wiki markup and WYSIWYG. It's also quite stable and pretty fast.

Muirium
µ

01 Aug 2013, 17:10

Wikipedians Reject Change. We dedicate the entirety of tonight's program to cover this earth shattering revelation.

ne0phyte
Toast.

01 Aug 2013, 17:16

Muirium wrote:Wikipedians Reject Change. We dedicate the entirety of tonight's program to cover this earth shattering revelation.
Thousands of editors/contributors didn't mind figuring out how the wiki markup works and they even prefer the straight and easy way of entering data in plain text. *cough* *cough*

Daniel Beardsmore

01 Aug 2013, 19:08

Muirium wrote:Wikipedians Reject Change. We dedicate the entirety of tonight's program to cover this earth shattering revelation.
Have you tried it? My first impression (on a Core i7 PC, no less) was "WTF this is incredibly slow — what have they done?" and it's reassuring to know that it's not just me. It really is slow. Also, because it lets you just type directly onto the page, it's quite disconcerting, although that's not a fault. The speed, though, is.

Don't get me wrong. In an ideal world we would replace MediaWiki with another software system that is more suited.

However, it seems apparent to me that people are trying to find ways to replace a system they won't use and don't understand, and therefore cannot reliably comment on its strengths and weaknesses. Likewise, they cannot conceive exactly how much work is involved in solving the stated problems.

Take-home questions:

1) How does creating a separate database stimulate wiki growth?

2) Are we satisfied that the database and wiki will remain in sync?
