Keyboard database

mecano · 28 Feb 2017, 00:25

Mind you I think I understand perfectly what's going in here, you should really have a look at Drupal, Taxonomy, CCK, Views, Apache Solr, autocompletion on prepopulated and editable dropdown menu and so on, basically everything can be linked to anything at different levels wether you want a content to be part of a thesaurus or being a more evolve type (with lots of particular data fields, thumbs, images gallery, etc.), you can crosslink contents and terms and make them one as long as you either link them by hand (the hard way) or give them the same name (think id here, the smart way).
For this project I suggest having two main content types, keyboard and switch.
I don't see the problem with the many NULL fields, because with this model you only need a keyboard or a switch id, all other fields can be null. The system won't shoke (until you want and mark a field as required) it will just output nothing for these fields (or output whatever you want on conditional if NULL). Just think that you have an internal id (abstracted or meaningfull as you wish) that will be hidden to the visitor so you can have duplicates with same name/family if you have to.
You can even allow creation of new terms or term families, moderate them or not, have revisions or not, etc.
The late PHP7 looks great in many aspects, and it is way faster than before, lower footprint as well.
Maybe you should drop MySQL and start using MariaDB, it's a drop in replacement for some versions, cli commands are the same, not really faster but back to open source and started by original MySQL programmer as a fork. Postgre is really neat, but both maria and mysql are coming close nowadays and benefit for better integration with third parties.

Daniel Beardsmore · 28 Feb 2017, 00:55

Choosing an entire stack is only valid if I own the server/VPS, and right now I don't have that, so I'd need to justify that option (or find a non-Nazi host that has the required services). If we use the Deskthority server, the options are constrained there too. (Although it's true Deskthority does use MariaDB, MySQL is perfectly capable of running a keyboard database, although the wiki is horribly slow even on MariaDB.)

I'd like to see your schema for this. For example, how would you store multiple years per keyboard? Each keyboard can have in effect an unlimited number of associated years, simply because each given year type may occur more than once, e.g. oldest and newest IC dates, oldest and newest case dates, or multiple conflicting internal labels (such as found in Maxi Switch keyboards). Likewise, you can have effectively unlimited serial and part numbers, and unlimited PCB codes. I don't know if you've looked around Keycombo but I'm also storing both the internal and external serial and part numbers and PCB codes, to help identify dating schemata and brand-specific PCB codes. In essence, no code in a keyboard is meaningless, and having them on record allows this data to be mined.

Most of these fields are restricted in Keycombo to avoid writing the one-to-many association, and no-one has yet presented a clear opinion on what the schema should be. Having true one-to-many years makes the SQL much easier (only one column to process for min/max) but makes the UI more challenging. Note that years are not a simple property: each year is directly linked to what specific characteristic bears that year, e.g. the year moulded into the case, or the year printed onto the rear label.

Likewise, keyboards are not associated with switches, but with an unlimited number of (switch type, purpose) pairs that define the type of switch and what it is being used for within that keyboard. Here, I did make the effort to support true one-to-many association.

Trying to get this to work with taxonomies sounds like a nightmare, like trying to hammer in a nail with a mouse. Does it actually seem like a reasonable way to assemble this kind of data, and what would the UI be like trying to deal with this?

elecplus · 28 Feb 2017, 17:56

Have you taken a look at any Bill of Materials (BOM) programs? There are several open source ERP programs that include a BOM, and they seem to handle all these problems well, as long as you choose one that also does a reverse BOM. The basic idea is that you create the parts that make up the whole unit, and then the program will tell you how many complete units you can make. The reverse is that you input a whole unit, and then start adding the parts that make this unit. Yes, you can add parts on the fly in most of the programs. You specify how many of each part is required for the build. You can also specify alternatives for each part. This would be useful for Cherry black vs Cherry blue, different types of caps, alternate circuit boards or cases (a Dell case on an IBM buckling spring). You can add notes to each part or alternate part (controller board X only works with PCB Y).

Daniel Beardsmore · 28 Feb 2017, 19:18

My feeling is that the data model and objectives are just too specific for any off-the-shelf finished product. The time needed to tear apart the program and remove all the extraneous parts (probably 99% of it would have to be ripped out) and then build in all the missing parts isn't likely to be faster or easier than building the whole thing from scratch. The single specific gotcha is the data entry workflow — it's far more than just a few HTML forms to fill in.

This gives a sort of chicken-and-egg problem in that the choice of UI framework can limit you to how you build the software (platform/language, data model etc), but the nature of the software limits you in terms of UI frameworks (they have to be able to cope with the demands, and since we have no idea who might be involved, they may have language preferences, such as hating PHP even more than I do).

It kind of feels like there are two camps: people who understand what Keycombo is, why it exists and what it seeks to achieve, but are non-technical, and technical people who have no idea what Keycombo seeks to achieve or how it works, or how their favourite framework would truly handle it. Whatever else people want this database to be, it must at a bare minimum replace Keycombo in its entirety.

However, you get another chicken-and-egg with the notion of extending the site to cover private collections and various other things, as how far you want to take this also factors into how you'd want to build, extend and maintain it. An off-the-shelf product for one aspect may not cover another.

What I also found telling was that there was very little interest in Keycombo in the first place. You have value-by-society and absolute value. The expression "ahead of its time" more commonly describes the gap between absolute value and the combination of poor marketing and unenlightened people who simply can't see the value of something handed to them: no matter how good something is, if people refuse to accept it, its effective value is reduced to their level. (Apple are very good at the marketing side of this: they readjust societal valuation regardless of absolute value.)

Regardless of what value Keycombo has to me personally in terms of its data mining capabilities, if its value-by-society is low, it will still fail regardless.

mecano · 28 Feb 2017, 20:11

Daniel Beardsmore wrote: ↑Choosing an entire stack is only valid if I own the server/VPS, and right now I don't have that, so I'd need to justify that option (or find a non-Nazi host that has the required services). If we use the Deskthority server, the options are constrained there too. (Although it's true Deskthority does use MariaDB, MySQL is perfectly capable of running a keyboard database, although the wiki is horribly slow even on MariaDB.)

I can provide such host.
Yes Deskthority's server is handling so much things already…
As I said MariaDB is generally no faster than MySQL. Moving from closed sources and have faster patches may be the real motivation.

Daniel Beardsmore wrote: ↑ I'd like to see your schema for this. For example, how would you store multiple years per keyboard? Each keyboard can have in effect an unlimited number of associated years, simply because each given year type may occur more than once, e.g. oldest and newest IC dates, oldest and newest case dates, or multiple conflicting internal labels (such as found in Maxi Switch keyboards). Likewise, you can have effectively unlimited serial and part numbers, and unlimited PCB codes. I don't know if you've looked around Keycombo but I'm also storing both the internal and external serial and part numbers and PCB codes, to help identify dating schemata and brand-specific PCB codes. In essence, no code in a keyboard is meaningless, and having them on record allows this data to be mined.

To answer your question two solutions :
- create/duplicate the first keyboard, change the year in the new one.
- allow multiple values for fields in keyboard content. Link values to display data (ie images or other contextual data).
But as your model is like quicksands ie one year, several codes, different pictures, seriously : each physical different keyboard is a new entry in the database approach seems like the more reasonable one. Then allow creation of new fields on the fly.

Daniel Beardsmore wrote: ↑ Most of these fields are restricted in Keycombo to avoid writing the one-to-many association, and no-one has yet presented a clear opinion on what the schema should be. Having true one-to-many years makes the SQL much easier (only one column to process for min/max) but makes the UI more challenging. Note that years are not a simple property: each year is directly linked to what specific characteristic bears that year, e.g. the year moulded into the case, or the year printed onto the rear label.
Likewise, keyboards are not associated with switches, but with an unlimited number of (switch type, purpose) pairs that define the type of switch and what it is being used for within that keyboard. Here, I did make the effort to support true one-to-many association.

Yes, you only need the one-to-many if the type is worthy, I mean 'year' you won't have nothing in year, though you can use it to display all keyboards from a particular year. So how many types? Keyboard, switch, and?
Then indeed all happen in the controller/view.

Daniel Beardsmore wrote: ↑ Trying to get this to work with taxonomies sounds like a nightmare, like trying to hammer in a nail with a mouse. Does it actually seem like a reasonable way to assemble this kind of data, and what would the UI be like trying to deal with this?

How so? It gives you relations and bridges for mining/organizing/displaying things, it may look convoluted but it is not.

webwit · 28 Feb 2017, 20:34

mecano wrote: ↑Yes Deskthority's server is handling so much things already…

Nah it has over 90% resources left. Probably over 95%. It just runs a little hobby site and its wiki. It can also run a little keyboard database. With little I mean it won't have millions of records or visitors.

In general you guys seem to be vastly over-complicating things.

Unless someone here is unemployed, who on earth is going to make your Apache Solr over NodeJS linked with AngularJS and bootstrapped Laravel over Oracle solution (slightly exaggerating there). There must be many front-end packages using basic stuff like MySQL/MariaDB and PHP for handling entities with attributes and relations, and with a users/permissions system. I think the interesting work lies in designing the data model.

Daniel Beardsmore · 28 Feb 2017, 21:27

mecano wrote: ↑ To answer your question two solutions :
- create/duplicate the first keyboard, change the year in the new one.

Where's the data normalisation?

mecano wrote: ↑ Yes, you only need the one-to-many if the type is worthy, I mean 'year' you won't have nothing in year, though you can use it to display all keyboards from a particular year.

The same keyboard may have more than one year recorded against it. Take for example the Ortek keyboards:

http://telcontar.net/KBK/Keycombo/brand.php?id=33

This helps guard against misreading dates: if several dates are listed together and they agree with each other, then it indicates that the dates are more likely to be correct. If the dates differ, this may be a cause for reinspection of the evidence later. Essentially, instead of choosing an arbitrary date from an arbitrary component, I'm recording all the data to be sure and safe.

If you have specific arguments for or against the data model, that's different, but please base such arguments on an examination of the existing site and the existing data model.

A fairly good example subset of data is the Alps page:

http://telcontar.net/KBK/Keycombo/brand.php?id=12

All the Alps keyboard codes are directly visible.

Brands are not a taxonomy; they self-construct this page and could be used for more data in future:

http://telcontar.net/KBK/Keycombo/fcc.php

Also, I'm doing more than listing keyboards by year. Take a look at the Alps SKCL/SKCM preliminary timeline:

http://telcontar.net/KBK/Keycombo/switc ... .php?id=25

You can then counter-bias this against the distribution of data in general:

http://telcontar.net/KBK/Keycombo/mastertimeline.php

The Cherry MY timeline is muddled and odd, and really needs a lot more data:

http://telcontar.net/KBK/Keycombo/switc ... .php?id=49

At first, Types 1, 2 and 3 were clearly separated, but as I added more data, it became increasingly blurred!

mecano · 28 Feb 2017, 22:02

webwit wrote: ↑
mecano wrote: ↑Yes Deskthority's server is handling so much things already…
Nah it has over 90% resources left. Probably over 95%. It just runs a little hobby site and its wiki. It can also run a little keyboard database. With little I mean it won't have millions of records or visitors.

In general you guys seem to be vastly over-complicating things. Unless someone here is unemployed, who on earth is going to make your Apache Solr over NodeJS linked with AngularJS and bootstrapped Laravel over Oracle solution (slightly exaggerating there). There must be many front-end packages using basic stuff like MySQL/MariaDB and PHP for handling entities with attributes and relations, and with a users/permissions system. I think the interesting work lies in designing the data model.

Taking how long images from Round5 and 6 are loading and because the forum is working great here, I thought the wiki under ressources quota moderation and so that Deskthority server had not so much left to share, but must admit I didn't investigate much, and these images are numerous and not tiny.

Yes there are surely many (or not so many modular, ready to ship) CMS, that is what I am talking about oversimplifying things, but if you want to find things in the keyboard jungle database you need filters and facets, hence solr if you don't want to eat all the server ressources, there may be other solutions I'm not aware of, of course.
I, for myself, think the model can be really simple and the work is on how you will have user create, view and search the data.

webwit · 28 Feb 2017, 22:44

Round5/6 probably loads slowly because web browsers tend to limit the number of simultaneous requests to a single domain. The wiki is not the quickest when you're logged in (when not logged in it serves cached pages and is much quicker). Possibly it could be optimised because (except for the MyISAM search table) it uses Innodb with default cache/memory settings, while we have plenty of spare ram (32GB total). Not sure if that helps much for dynamic pages though.

webwit · 28 Feb 2017, 22:55

I once did a project with Solr on a vast nosql database. That works great. But I think it would be overkill for something like this, without millions of records. And how many people know Solr? This is the advantage of PHP/MySQL, while not being the sexiest. Both their strong and weak point is that even your dog could submit a patch to fix or add something.

Daniel Beardsmore · 28 Feb 2017, 23:09

SQL Server behaves as if it owns the server, while MySQL pretends it's not even there. It came as a shock to discover just how little cache it allows itself (zero or close to that) unless you specifically tell it that it's allowed to play with all that RAM you bought it! Now I understand that database corruption error that only occurred when I was logged in — it makes sense now.

Solr seems a bit overkill. The whole database is 137 kB of code and templates (not counting a few central library files) and 544 kB of MySQL tables including the indexes. Nothing Enterprisey is needed to make this work: it works perfectly without it.

webwit · 28 Feb 2017, 23:21

The problem with optimising Innodb is it takes experimenting, and that those innodb files need to be rebuild (unless one limits on optimisations without that), which means downtime, and because it's difficult to guess the best settings, it means multiple downtimes. Unless that experimenting is taken to another, similar server. For example, in some cases adding too much cache makes it slower because of the penalty of cache management if the queries are pretty unique. Shit like that.

mecano · 01 Mar 2017, 00:27

Daniel Beardsmore wrote: ↑
mecano wrote: ↑ To answer your question two solutions :
- create/duplicate the first keyboard, change the year in the new one.
Where's the data normalisation?

If you have no relation between keyboards years and switch colors other than static attributes why would you need that?

There wiil be no update anomaly as each keyboard is no unique, not each keyboard model but each keyboard. Of course things like serial numbers and so on for a specific keyboard goes to multiple values fields.
So after defining main content types (keyboard, switches), you need to define multiple values fields : who are they? Or more exactly what are the less significant datas in the chain, here serial numbers are a good guess I think, years are not as you can use these for advanced data mining while serial numbers will only be of use in simple search field (simple doesn't mean you can't use wilcards or regex).

There will be no insertion anomaly as you can have NULL value fields.

I don't see ways of deletion anomaly either.

You'll have normalization through the CMS anyway (via id and fields tables).

Daniel Beardsmore wrote: ↑
mecano wrote: ↑ Yes, you only need the one-to-many if the type is worthy, I mean 'year' you won't have nothing in year, though you can use it to display all keyboards from a particular year.
The same keyboard may have more than one year recorded against it. Take for example the Ortek keyboards:

http://telcontar.net/KBK/Keycombo/brand.php?id=33

This helps guard against misreading dates: if several dates are listed together and they agree with each other, then it indicates that the dates are more likely to be correct. If the dates differ, this may be a cause for reinspection of the =evidence later. Essentially, instead of choosing an arbitrary date from an arbitrary component, I'm recording all the data to be sure and safe.

Here I don't see a year field with multiple values or a special need for a special relation, I see four fields of type, let's call it 'year_manufactured' with a date year format of name (field relation here, id) 'ic', 'case', 'label' and 'pcb'.

Daniel Beardsmore wrote: ↑ If you have specific arguments for or against the data model, that's different, but please base such arguments on an examination of the existing site and the existing data model.

A fairly good example subset of data is the Alps page:

http://telcontar.net/KBK/Keycombo/brand.php?id=12

All the Alps keyboard codes are directly visible.

Brands are not a taxonomy; they self-construct this page and could be used for more data in future:

http://telcontar.net/KBK/Keycombo/fcc.php

Also, I'm doing more than listing keyboards by year. Take a look at the Alps SKCL/SKCM preliminary timeline:

http://telcontar.net/KBK/Keycombo/switc ... .php?id=25

You can then counter-bias this against the distribution of data in general:

http://telcontar.net/KBK/Keycombo/mastertimeline.php

The Cherry MY timeline is muddled and odd, and really needs a lot more data:

http://telcontar.net/KBK/Keycombo/switc ... .php?id=49

At first, Types 1, 2 and 3 were clearly separated, but as I added more data, it became increasingly blurred!

I thought you wanted to remodel, sorry

Looks like all you need is a front-end and a search/sorting facility as Webwit said.
Great job Daniel by the way

Daniel Beardsmore · 01 Mar 2017, 00:52

OK, so …

Here's NTC KB-615X family:

http://telcontar.net/KBK/Keycombo/family.php?id=14

You'll notice that the family has a name, plus its manufacturer, FCC ID, and notes. Same for G81-3000:

http://telcontar.net/KBK/Keycombo/family.php?id=42

So keyboards are attached to families. Or on the other hand, they may be attached to a product:

http://telcontar.net/KBK/Keycombo/product.php?id=1

In fact, I just corrected "AWC" to "Wong's Electronics" because I've since learned that AWC is Atari Wong, which is why it only appears on Atari PCBs.

Here's one of the oldest Alps SKCL keyboards known:

http://telcontar.net/KBK/Keycombo/keyboard.php?id=88

That's attached to its manufacturer without going through a family.

So how would you taxonomise the link between a family and its manufacturer?

I don't object to a better data model, but you seem determined to tear apart every link between records in the existing database until it's nothing more than a huge spreadsheet, and I have no idea what anyone would gain by taking a working database and reducing it to rubble.

Does nobody understand what this database even does?!

Daniel Beardsmore · 01 Mar 2017, 01:01

Ugh, just forget I ever asked.

mecano · 01 Mar 2017, 12:06

Daniel Beardsmore wrote: ↑ So how would you taxonomise the link between a family and its manufacturer?

At this point I don't, are you lost in UI and want hints or? Because the UI is the problem here, right?

mecano · 21 Mar 2017, 14:06

Took the time to read properly and must admit I was walking out of my shoes, sorry for the noise and to Daniel, I was seeing it as refactoring through a particular CMS, and taking the CMS/UI approach hence so many misunderstanding I guess. Now to add some fuel, another option for a search engine would be Sphinx, which is C++ and probably a far more easier to deploy.

Daniel Beardsmore · 21 Mar 2017, 19:42

There are a lot of related issues here, as it's a very open-ended situation.

The starting point is my existing keyboard database, which is already live and operational.

Right now, I don't consider search to be a concern. There's not a lot to search for, and not enough data to warrant anything more than a brief SQL query to look for it.

What we don't know is: what's it going to be called, where is it going to live, and what are the optimal Web frameworks (front and back end, and usage thereof) to construct an admin interface for adding fresh data. The admin interface should result in the fewest duplicate records and the most streamlined data entry.

That is all I believe to be necessary: to just have a maintainable and manageable replacement to the existing database.

However, it does seem reasonable to add extra facilities, and the choice of framework, schema, etc will have an effect on how easily this can be achieved.

mecano · 22 Mar 2017, 11:31

Not sure if this has already been suggested but for more flexibility with what is already done, maybe have a look at Django?
When you say 'what's going to be called' you mean what people will be looking for first?
Do you have stats enabled for now? Like awstats or google ones? The telcontar site is already indexed by google and you'll get feedback through these.
Talking about duplicates, I guess this is something you'll always have to fight with as it depends on contributors, even if the schema and interfacess are optimized and as long as you allow humans a certain amount of liberty in the data creation. How many contributors are there yet?

Daniel Beardsmore · 22 Mar 2017, 22:50

"Keycombo" was my lousy and misleading attempt at a name, on the basis that it holds keyboard/switch/year combinations. It's not totally awful, but it will be no good as a name if more features are added.

I only have Webalizer. Publicly, my site is free of cookies and mostly free of JavaScript. (Behind the scenes pages use both.) My site is so boring, there's nothing to measure, but I will never implement any kind of cookie-based tracking or, if I can help it, public-facing JavaScript. (Even focusing the first form field can be done in CSS now, though I've never tried that.) What I don't have is SSH, and I'm used to having a shell active for development, including my Apache error log filter/formatter.

There are zero contributors. Not even me. The UI is too bad to let anyone use it, and I've not updated it in months. I don't foresee many people actually contributing, not least because every entry requires photos as proof.

"Maybe look at" isn't sufficient — I've wasted enough years on poorly-chosen technology and I'd rather have no site at all than destroy the last shreds of sanity on even more garbage.