|
|
Rank: Newbie
Joined: 10/8/2007 Posts: 17
|
I'm trying to troubleshoot someone's Umbraco site.
The main problem is that it's running slowly and the database seems to be increadibly large (around 5.5GB). I ran some sql to find where this space is going and most of it is in the 'cmsPropertyData' table which contains nearly 14 million records (it takes up 2.5GB space alone, 1.5GB actual data). It's a fairly large umbraco instance with about 10 sites and each site is in 4 different languages (effectively four copies of each site). But does 14 million records seem a bit abnormally large to you?
I tried running the following SQL against the database...
SELECT COUNT(dataNvarchar) FROM cmsPropertyData WHERE dataNvarchar IS NOT NULL AND dataNvarchar <> ''
...to get the number of records that are using the nvarchar data type (a lot of the page property nodes are using this type) and the count is 5'121'451. However, when I run this sql...
SELECT COUNT(DISTINCT dataNvarchar) FROM cmsPropertyData WHERE dataNvarchar IS NOT NULL AND dataNvarchar <> ''
...(notice the 'distinct' syntax) to get unique values the count is 28'787 so it seems there's a hell of a lot of duplicate data in there. What's more it seems to keep on climbing, everytime I do a total count of that table it goes up a little more, and the database has continued to gradually increase in size. The 28'787 value seems to never go up however.
I'm completely lost as to why it's doing this and what is going on. Can anyone suggest any reason as to why it could be doing this? Areas I could investigate further? Thanks for any help!
Dominic Pettifer
|
|
 Rank: Addict
Joined: 7/19/2006 Posts: 790 Location: Århus, Denmark
|
I don't know if this is the source for your problem, but could it possibly be related to this bug, that has just been fixed in the 3.0.5 release? Release/Expire dates can cause versions numbers to go highThere is not a lot of description on the reported bug, but if the site uses the release/expire features, then that might be it? Umbraco saves a new record each time a document is saved/published, so if some of the documents are publishing themselves all the time, that could cause the numbers to blow up. So maybe doing an update to 3.0.5 could help. Also you might want to check out the Client Tools for Umbraco to do a cleanup of the existing database, to bring down the number of existing versions?
|
|
 Rank: Fanatic
Joined: 7/20/2006 Posts: 408 Location: Amsterdam
|
Don't know or its related, but a few months ago we saw a dramatic load on our (load balanced) database envirenment, we are running a cluster of 6 heavey SQL server machines, and one umbraco process did use 20% of CPU power.
After a small research we found out that one document was published over and over, and resulted in:
select count(*)as aantal, nodeId from cmsDocument group by nodeId order by aantal desc aantal nodeId 1414664 2736 811279 2822 93 1272
It was caused because there was:
- 1 record Latest=1, Published=1, + older VersionId - 1 record Latest=0, Published=0, + newer VersionId.
So every minut the update process entered new data in the database and cluthering everything up :(
After manually updating we resolved the problem, we have still 4 gigs of umbraco crap in our database ;)
|
|
 Rank: Addict
Joined: 7/19/2006 Posts: 649 Location: Preston, UK
|
Ironfoot, Try http://www.codeplex.com/ClientTools4Umbraco and maybe clean out some stuff dont forget db backup just in case
Level 2 certified. If it aint broke dont fix.
|
|
Rank: Newbie
Joined: 10/8/2007 Posts: 17
|
Thanks for the suggestions, that does help a lot.
One thing I forgot the mention, was that data was manually imported directly into the umbraco database (via TSQL scripts) instead of via the umbraco API. I've heard somewhere that this is a bad idea. Is there a chance that doing this has caused some data to get out of sync somewhere, causing various documents and nodes to get republished over and over?
Dominic Pettifer
|
|
 Rank: Umbracoholic
Joined: 7/20/2006 Posts: 1,074 Location: Charleston, West Virginia, United States
|
Ironfoot, It is possible that if you imported using SQL scripts, that the page in question did get screwed up somehow. If it is a single node that is the culprit, you may want to try deleting that node, and adding it back through the interface. Of course, if you have links pointing to the old ID, you will need to update those. If it is more than a few pages causing the problem, then things get a bit more tedious. When deleting the node, you will have to delete it from the trashcan as well, but, it should allow you to clear up some db space.
• 2007/2008 MVP • 2008/2009 MVP • Certified • Licensing • Support • Development • Hosting •
|
|
 Rank: Administration
Joined: 7/25/2006 Posts: 415 Location: vipperoed, denmark
|
I'm trying to clean a massive database for a 3.03 solution. The db is 8gb and the reason for this is a known bug that results in creating a new document version every minute or so. I've tried Thomas H's client tool and it worked like supposed, removing a lot of versions. No node in the cmscontent has more than 3 versions. But still I'm having a massive 8gb database with a table cmspropertyData containing over 30.000.000 records. I'm suspecting that cmspropertydata contains a LOT of records that isn't referenced at all since the version they relate to is long gone. I think the reason Thomas H's client tools can't get to these since the client tools works by traversing existing node's versions. My assumtions:1) If table cmsPropertyData contains data related to notexisting versions then these will not be deleted. 2) Also if table cmsContentVersion contains data related to notexisting versions then these will not be deleted. Some numbers:cmsdocuments: 18.752 records cmscontent: 10.995 records cmscontentversion: 973.692 records cmspropertydata: 31.273.844 records If any of you are sitting with the sql script skills to clean this, then please through it after me. I'll wrap it in a usercontrol and share it immediatly. Second best: If anyone can provide me with the exact database structure then I just might be able to create the sql statements... (I hate it but ...) Please mail me at jesper(at)jesper.com kindly, Jesper webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
|
|
 Rank: Administration
Joined: 7/25/2006 Posts: 415 Location: vipperoed, denmark
|
I have tested the below statements in a healthy solution. They all returned 0 rows. In the 8GB database, the first gives me loads of rows. -- check cmscontent select * FROM cmscontent WHERE NOT EXISTS (SELECT * FROM cmsDocument WHERE cmsDocument.nodeId = cmsContent.nodeId ) -- check cmscontentversion select * FROM cmscontentversion WHERE NOT EXISTS (SELECT * FROM cmscontent WHERE cmscontent.nodeId = cmscontentversion.ContentId ) -- check the cmspropertydata table select * FROM cmspropertydata WHERE NOT EXISTS (SELECT * FROM cmscontentversion WHERE cmscontentversion.versionId = cmspropertydata.versionId ) I'm that close to changing the above to delete from statements. Any thoughts? Other tables to include? kindly, Jesper webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
|
|
 Rank: Umbracoholic
Joined: 9/8/2006 Posts: 1,698 Location: KY, USA
|
Hi, Jesper, Scary thought to go in and blast a bunch of data with direct sql calls... but that's probably the only solution at this point. Need I mention... BACKUPS! In fact, any chance you can create a duplicate site with a copy of your database? Then you can run your scripts and see what happens to the site, rollbacks, the ability to create and publish new content, etc. without impacting the live site. When you're satisfied... back up the live site one more time (just to be on the double-safe-side) and do it. Let us know how it works out! cheers, doug.
MVP 2007-2009 - Official Umbraco Trainer for North America - Percipient Studios
|
|
 Rank: Administration
Joined: 7/25/2006 Posts: 415 Location: vipperoed, denmark
|
Hi Doug, It's always good to mention backups. I promise that I work on a duplicate version. I executed the scripts 1 1/2 hour ago and the last statement is still running. Kindly, Jesper webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
|
|
 Rank: Addict
Joined: 7/19/2006 Posts: 597 Location: Bad Homburg, Germany
|
First of all: There should be no cmsPropertyData or cmsDocument or cmsContent row without a cmsConentVersion I think. Also for deleted nodes (which are in the waste) there are content and versions which are all not published. So you are right with your assumptions because the Client Tools are going from cmsVersions to the Content and the documents... And the question is where do theses rows come from? But I am with Doug to create backups and run the scripts... And If this comes from a bug in 3.0.3 we should integrate these options to the Client Tools Cheers, Thomas
• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
|
|
 Rank: Administration
Joined: 7/25/2006 Posts: 415 Location: vipperoed, denmark
|
Hi Thomas, Fantastic profile picture :-) The database has a very very long history. V2.0 upgraded to 3.0 alpha and so forth. I think that at some point someone has tried to cleanup the database manually. Kindly, J webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
|
|
 Rank: Addict
Joined: 7/19/2006 Posts: 597 Location: Bad Homburg, Germany
|
jesper wrote:Fantastic profile picture :-) Yap, I love it... jesper wrote:The database has a very very long history. V2.0 upgraded to 3.0 alpha and so forth. I think that at some point someone has tried to cleanup the database manually. Just check the updateDate in the cmsDocument rows which doesn't have a cmsContentVersion. Perhaps you get a glue if this problem still exists? Thomas PS: Why am I not getting any mail from this forum???
• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
|
|
 Rank: Administration
Joined: 7/25/2006 Posts: 415 Location: vipperoed, denmark
|
Stay away from the scripts below as they do not work ! jesper wrote:
-- check cmscontent
select * FROM cmscontent WHERE NOT EXISTS (SELECT * FROM cmsDocument WHERE cmsDocument.nodeId = cmsContent.nodeId )
-- check cmscontentversion
select * FROM cmscontentversion WHERE NOT EXISTS (SELECT * FROM cmscontent WHERE cmscontent.nodeId = cmscontentversion.ContentId )
-- check the cmspropertydata table
select * FROM cmspropertydata WHERE NOT EXISTS (SELECT * FROM cmscontentversion WHERE cmscontentversion.versionId = cmspropertydata.versionId )
webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
|
|
 Rank: Umbracoholic
Joined: 9/8/2006 Posts: 1,698 Location: KY, USA
|
Hi Jesper, I take it that you didn't have success after all. Here's an idea that came to me late in the evening... what if you took the good published content of the site and pushed it into a new database, rather than trying to clean the massive/existing database? You'd lose your rollback history, but that might be a small price to pay? As always, let us know what you try and how it works out. You are surely not the only one who will eventually walk this path. cheers, doug.
MVP 2007-2009 - Official Umbraco Trainer for North America - Percipient Studios
|
|
 Rank: Administration
Joined: 7/25/2006 Posts: 415 Location: vipperoed, denmark
|
drobar wrote: I take it that you didn't have success after all. .
Nope .. you're quite right there. When using the above statements (changed "select *" to "delete") I found that I deleted way to much data. The result was that the media section was useless. drobar wrote: Here's an idea that came to me late in the evening... what if you took the good published content of the site and pushed it into a new database, rather than trying to clean the massive/existing database? You'd lose your rollback history, but that might be a small price to pay?
I wouldnt mind that approach .. my problem is that I don't know which data not to copy (yet). Please help me validate the following. I believe that in a healthy db the following should return 0 (zero). Do you agree? Want to post your result? select count(*) FROM cmspropertydata WHERE NOT EXISTS (SELECT * FROM cmscontentversion WHERE cmscontentversion.versionId = cmspropertydata.versionId ) If the above is true then I would conclude that the following is valid: (DON'T RUN THIS) delete FROM cmspropertydata WHERE NOT EXISTS (SELECT * FROM cmscontentversion WHERE cmscontentversion.versionId = cmspropertydata.versionId ) I'll continue today and let you know . If ( and I really hope that ) anyone (Ruben, Thomas ?) who has a clue on how to clean up "cmspropertydata" would post anything that could help (and email, since forum notifications are dead). kindly, J webbureau jesper.com doing webdesign / development / umbraco implementations / 2007&2008 MVP
|
|
 Rank: Addict
Joined: 7/19/2006 Posts: 597 Location: Bad Homburg, Germany
|
I have registered via 'watch this topic'. Now I have received five mails for your post, but no mail from the post from doug (really weird). I first have to do some immediate work. After this I will take a look at our test system and will come back to you this morning.
• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
|
|
Rank: Newbie
Joined: 9/5/2008 Posts: 9 Location: Chattanooga, TN
|
Is there a way to disable rollback history or anything?
Also, has anyone had success finding a SQL command to scrub the history? Otherwise I'll post something as soon as I get it.
I'm working with an extremely small DB limit (Bad Discountasp.net) and need to figure out something.
|
|
 Rank: Addict
Joined: 9/27/2007 Posts: 977 Location: Belgium
|
Hi jonhoye, Check out this thread on the same subject: http://forum.umbraco.org/yaf_postst6564_305-Database-Huge--cmsPropertyData--392MB.aspxHope that helps. Regards, /Dirk
level 1 certified - umbraco blog at netaddicts.be - working on an integrated forum4umbraco
|
|
 Rank: Addict
Joined: 7/19/2006 Posts: 597 Location: Bad Homburg, Germany
|
In this db size the ClientTool will run into timeouts. One of the problems are the indexes for this table so that deleting that many entries will run hours and will boost you database log file. I made a script for a friend removing the indexes, deleting the rows and recreating the indexes. I have it at home and will post it tonight... Thomas
• 2007/2008 MVP • www.thoehler.com • Bad Homburg, Germany
|
|
|
Guest |