Comments on: Data management in SIM/SEM systems
http://blog.amber.org/2007/08/28/data-management-in-simsem-systems/
Thoughts of a minor lunatic
Wed, 21 Oct 2009 01:55:36 +0000

By: petrilli
http://blog.amber.org/2007/08/28/data-management-in-simsem-systems/comment-page-1/#comment-49820
Wed, 29 Aug 2007 13:59:29 +0000

Having worked on large data warehousing applications in Oracle, Sybase, Informix, DB2, PostgreSQL and MySQL, I’d say that if every single implementation suffers from similar issues, then whether or not the concept of relational theory works is really irrelevant. Additionally, and this is something I should have addressed directly, the primary goals of ACID compliance are simply not interesting in certain applications. The overhead of ACID compliance is huge, and while it is absolutely critical to the historical mainstay of RDBMS, it’s nothing but needless cruft for some applications.

As for using an RDBMS as a “flat file,” I can tell you that benchmarks on commodity hardware show a hit of roughly 30-50% on sequential scan rates. While this is not an issue in OLTP applications, as you will no doubt be aware, in data warehousing sequential scans are the rule, not the exception. Because of that, traditional RDBMSs are not overly interesting for large data scans.

When your data access model is largely pointer-driven (such as in a network database), the performance difference is gigantic. In some rough comparisons between PostgreSQL and AllegroCache, I observed differences of an order of magnitude or more on graph workloads. No relational database is designed to handle massive graph-style queries like reachability, condensation, cycle removal, etc. This is especially painful with digraphs that have cycles in them.
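To make "reachability" on a digraph with cycles concrete, here is a minimal sketch of a pointer-driven traversal in Python. It is only an illustration: the `reachable` function and the sample `graph` are hypothetical, not taken from AllegroCache or from the comparison mentioned above, and it says nothing about relative performance.

```python
from collections import deque

def reachable(graph, start):
    """Return the set of nodes reachable from `start` by following edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Following an edge is a direct lookup from a node to its successors.
        for succ in graph.get(node, ()):
            if succ not in seen:  # the visited set stops cycles from looping forever
                seen.add(succ)
                queue.append(succ)
    return seen

# Hypothetical digraph with a cycle a -> b -> c -> a, plus an edge from x into it.
graph = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": [], "x": ["a"]}
```

Calling `reachable(graph, "a")` walks the cycle once and stops, because every node is recorded in `seen` before it is queued.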

I understand that for many applications an RDBMS is an acceptable solution, but the fact that you’ve never seen it collapse into a gigantic morass of tangled query planner attempts doesn’t mean it doesn’t happen.

By: Leandro Guimarães Faria Corcete DUTRA
http://blog.amber.org/2007/08/28/data-management-in-simsem-systems/comment-page-1/#comment-49819
Wed, 29 Aug 2007 12:51:10 +0000

Ðe problem is that you don’t really understand databases. You identify Oracle woes as inherent to RDBMSs, while ðey are Oracle or even SQL particularities.

When you propose flat files for reporting, you forget RDBMSs can be used in exactly ðe same way, while also making normalised data available for more dynamic uses.

When you talk about forensic analysis proposing object or graph DBMSs, you forget you are fixing a few data acceß paþs to ðe detriment of all others. It may be nice for reporting, but you can do ðe same with foreign keys on the logical side, and indexing and materialised views on ðe physical one; when you talk about linking, you are mixing up the logical and physical levels.
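As a rough sketch of the relational side of this argument, the following uses SQLite through Python's standard `sqlite3` module. Everything in it is an assumption made for illustration: the `node` and `edge` tables, the `edge_src` index and the `reachable_sql` helper are hypothetical, and neither commenter mentions SQLite or recursive queries. Foreign keys express the links at the logical level, the index is a purely physical choice, and a recursive common table expression follows the links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id TEXT PRIMARY KEY);
    -- Logical level: links are foreign keys, not stored pointers.
    CREATE TABLE edge (
        src TEXT NOT NULL REFERENCES node(id),
        dst TEXT NOT NULL REFERENCES node(id)
    );
    -- Physical level: an index to speed up "successors of a node" lookups.
    CREATE INDEX edge_src ON edge (src);
""")
conn.executemany("INSERT INTO node VALUES (?)", [(n,) for n in "abcdx"])
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("x", "a")],
)

def reachable_sql(conn, start):
    """Return the set of node ids reachable from `start`, including `start`."""
    rows = conn.execute(
        """
        WITH RECURSIVE reach(id) AS (
            SELECT ?
            UNION  -- UNION drops rows already produced, so cycles terminate
            SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.id
        )
        SELECT id FROM reach
        """,
        (start,),
    )
    return {row[0] for row in rows}
```

This only shows that such a query can be expressed relationally; it does not settle how it performs against a pointer-driven store on large graphs.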

Same for real-time analysis. In fact, due to normalisation, RDBMSs are uniquely suited for in-memory databases.

Don’t let current implementations blind you to concepts and poßibilities.
