Written by Josep Llort on August 24, 2018
When purchasing document management software, the future users of these tools too often focus all their efforts on the features that will enhance their user experience. Some may also be concerned with the technology used to build their knowledge base, as well as with the costs, but leave aside a subject that we consider key: how to get our data out of the system in the future.
We can extend the previous statement to other types of software such as ERP or CRM, among many other management systems and their community editions. In general, when we acquire software, a relationship is established which in some cases we could call a "marriage," almost for life. This can go unnoticed at first, but it becomes more evident as time passes and the knowledge base grows, because the software quickly comes to occupy a central position in the company. That position creates a strong dependence on it, one that is neither simple nor convenient to get rid of.
In the end, that simple choice made years ago can become a prison: the software holds "our information or knowledge base", which we can access and manage, and little else. To use a metaphor: "We put the money in a bank, we can take out small amounts through the ATM, but we cannot move all our capital to another entity."
We will focus mainly on document management systems, open source or otherwise, and their community editions, although, as I said earlier in this article, the same applies to most software solutions.
In the KnowledgeTree community edition we found a paradigmatic, not to say extreme, case: an open source document management system in the form of a web application, built on the open source PHP language and a MySQL database (the classic LAMP architecture).
Thanks to its simplicity and ease of installation, it was well received by the open source user community, and the application became quite successful. Up to this point, everything was perfect.
At the end of 2012, the company behind KnowledgeTree decided to pull the plug on the community edition of its open source document management software, ending its distribution and removing the source code from SourceForge and other free software servers.
In this scenario, users had two options: continue with an open source document management application without any support, for which no new releases or patches would be published, or switch to a SaaS (Software as a Service) model. Sure enough, it was time to make cash; no matter how bad the reputation it gives you, for some the dollar goes before everything else. This action was completely legal, although to the displeasure of many, and ethically debatable at the very least.
An unpleasant story, which some have suffered directly. Whatever the reason, in the end we are left with a web application for content management that holds one of our most important assets: electronic documents. We can reach this point by different paths: cases like the one just described, companies that disappear from one day to the next, applications that become obsolete because they could not be renewed, or companies that decide to freeze their applications. For the latter case I will name Activiti, a BPM in which Alfresco actively participates, or was involved (it is not very clear), but the facts are the facts. Everything seems to indicate that over the last year all efforts have gone into the paid version: in a year and a half, the open source version has been unable to get beyond a beta that seems eternal. A word of warning: the developers who created the software, seeing the state of affairs, have created their own project called Flowable, which is, at the very least, disturbing. It is quite likely that someone is thinking about how to fill the box with dollars.
One day we get up and find that our web-based document management system is obsolete, with no patches for its bugs (with the risk that this implies), that our information is imprisoned, and that we do not know how to get it out. There is a lesson to learn here: when selecting software, it is essential to consider not only functionality and cost but also the fact that we may have to migrate away from it in the future. It is therefore quite sensible to evaluate this scenario and to keep in mind what these migration processes look like and what facilities the manufacturer gives us to carry them out.
In OpenKM we have always begun with the premise that the information belongs to the user and that our software must be the custodian of it, but not the jailer. For this reason, since the first version of OpenKM, we have by default a system for importing and exporting files and metadata. Also, as part of the documentation of the application, we explain the basic structure of the database, in case someone needs to access the application at the database level. Additionally, with the complete Webservices API, it is possible to implement the necessary and customized logic to export the data in a straightforward way.
In general terms, the migration of a document management system can be approached in at least three ways:
At OpenKM it is quite usual for users of other document managers or community edition management systems to ask us to migrate their data to OpenKM; in other articles I will expand on other cases. In most cases, users do not know whether their document manager has an API, and if it does, they have never used it. In absolutely no case, except with OpenKM, have we found a manufacturer that offers a simple, integrated export utility that lets you extract all the data. By all the data, I mean not only the documents but also their properties: all the document versions, as well as the related metadata, the applied security, which user created the document and when, notes, etc.
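To make the idea of "all the data" concrete, here is a minimal Python sketch of what one exported document record might need to carry. Every class and field name below is hypothetical and chosen only for illustration; it is not the OpenKM export format nor the format of any other product.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class Version:
    """One stored revision of a document (hypothetical fields)."""
    name: str        # version label
    author: str      # user who created this revision
    created: str     # ISO 8601 timestamp
    file_path: str   # where the exported binary content was written


@dataclass
class ExportedDocument:
    """Everything a complete export should keep for one document."""
    path: str
    author: str
    created: str
    versions: List[Version] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)
    permissions: Dict[str, List[str]] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)


def to_json(doc: ExportedDocument) -> str:
    """Serialize the record, including nested versions, to JSON."""
    return json.dumps(asdict(doc), indent=2, sort_keys=True)


def missing_parts(doc: ExportedDocument) -> List[str]:
    """Name the optional parts that are empty, as a hint that the
    export may have lost information along the way."""
    parts = {
        "versions": doc.versions,
        "metadata": doc.metadata,
        "permissions": doc.permissions,
        "notes": doc.notes,
    }
    return [name for name, value in parts.items() if not value]
```

A check such as `missing_parts` is useful during a trial export: if every record comes back without versions or permissions, the export route being tested probably cannot reach that information.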
I remember testing an Alfresco utility a couple of years ago to export the repository. The export generated a completely unreasonable 20 GB ZIP file, containing the documents with their respective XML, and, to make things worse, it stored that file inside the document manager itself. Someday, someone will have to explain to me the logic behind an export system in which the export file ends up inside the document management system itself. If this becomes commonplace, I think we will need to work on banishing the trend.
Returning to the primary topic: a DER (entity-relationship diagram) of the database from the manufacturer, or knowledge of the main tables needed to identify your data within the application, will generally not be provided. Most software licenses, if not all, explicitly forbid reverse engineering the application. To be clear, however, these restrictions exist so that the competition cannot simply incorporate third-party functionality without any agreement, which unfortunately also happens. Another word of warning: I would distrust those who are more dedicated to copying their neighbours than to implementing their own solutions.
In the case of the KnowledgeTree community edition document management system, there is no export utility. It does have an API that offers some services based on CMIS and REST: http://orion.KnowledgeTree.com/KnowledgeTree_API_Documentation
If we want to migrate all the information, meaning not only the documents but also their metadata, we will see that the API implementation is insufficient for this purpose in both cases, REST and CMIS. Personally, I find CMIS disappointing. In another article I will try to talk about our experience with this API, which aims to be a mechanism for interoperating between different document management applications but which, for me, is insufficient and not without significant problems.
At this point, we have no alternative but to investigate the structure of the database to identify where the information is located, in what formats, and how its export should be performed.
At OpenKM we have carried out migrations from different document management applications, ranging from KnowledgeTree to Docuware, Cannon Document Management or IBM Content Management, among others. In the case of KnowledgeTree, it has been possible to develop a DMS-to-OpenKM migration tool that automates this process. Unfortunately, for many reasons, it is not possible to create a single migration process that fits all cases.
In the end, as in many other applications, the retrieval of electronic documents from a document manager will depend on our ability to identify, at the database level, where the information is located.
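A first step in this kind of investigation is usually to survey which tables exist and how many rows each one holds, since the largest tables tend to be where documents, versions and metadata live. The following is a minimal sketch of that idea. It uses Python's built-in `sqlite3` module only so that it is self-contained; KnowledgeTree runs on MySQL, where the table list would instead come from `information_schema.tables` through a MySQL driver. The table names in the demo database are invented for illustration and are not KnowledgeTree tables.

```python
import sqlite3


def survey_tables(conn):
    """Return {table_name: row_count} for every user table,
    ordered from the largest table to the smallest."""
    cur = conn.cursor()
    cur.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    names = [row[0] for row in cur.fetchall()]
    counts = {}
    for name in names:
        # Quote the identifier so unusual table names do not break the query.
        quoted = '"' + name.replace('"', '""') + '"'
        cur.execute(f"SELECT COUNT(*) FROM {quoted}")
        counts[name] = cur.fetchone()[0]
    return dict(sorted(counts.items(), key=lambda kv: kv[1], reverse=True))


def build_demo_db():
    """Create an in-memory database with made-up tables (assumption:
    these names are placeholders, not a real DMS schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_notes (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("CREATE TABLE demo_documents (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO demo_documents (name) VALUES (?)",
        [("a.pdf",), ("b.pdf",), ("c.pdf",)],
    )
    conn.execute("INSERT INTO demo_notes (body) VALUES ('first note')")
    conn.commit()
    return conn


if __name__ == "__main__":
    for table, rows in survey_tables(build_demo_db()).items():
        print(f"{table}: {rows} rows")
```

Row counts alone do not explain a schema, but they quickly separate the handful of tables worth studying from lookup and configuration tables.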
Most, at this point in the article, would leave the reader to their fate with the KnowledgeTree document management application and its database. We will try to provide all the clues we have for interpreting this database, based on the latest available version of KnowledgeTree, specifically version 3.7.
The following tables appear to hold the comments made on the nodes (in general, this information is probably not very relevant):
We will find the types of documents in the table:
The documents are in the table (the application works with document versions):
The metadata of the documents will be found in the tables (if we are lucky, our repository will not have metadata, because this part of the structure is somewhat tedious):
The relationships between documents can be found in the tables:
The users subscribed to documents and folders can be found in the tables:
The keywords or tags (depending on what we want to call them) can be found in the table:
Titles in the folders, in the table:
The privileges in the table:
The privileges are complex and difficult to understand. There are more tables in the database, but those that we really consider relevant are those that we have detailed above.
When acquiring an application, it is very helpful to have at least a minimal description of the DER of the database. If we think of an application as an organism, we could identify the database as the part of the human brain where information is stored. Poorly organized storage will hamper not only access to the data but also the performance of the data management process.
When looking at some databases, we find many rather curious things that would most likely make us rethink our choices. In general, a DER (entity-relationship diagram) of a database that is incredibly complex and not easily understood should arouse at least some alarm. In some cases the complexity may be justified; in others, from my point of view, it is likely the result of incorrect ideas or poor execution. While it is true that databases evolve over time, it is also true that serious design errors at the database level are not at all easy to solve, and time will only make them more evident.