[Resolved] Performance Issues: Media Items Options
agrath
Posted: Thursday, August 28, 2008 1:13:31 AM
Rank: Newbie

Joined: 12/7/2007
Posts: 8
Hi Guys

I'm Gareth, Paul Foley's business partner (silverbug on here).
One of our umbraco installations has been acting up, so much as that the media section takes almost a minute to load up.
The first thing I did was check the CPU on the server, recycle the app pool etc - nothing I tried
fixed it.
I thought what the heck, and fired up SQL Profiler, and started observing the SQL:BatchStarting event.
For those of you who haven't used it, SQL Profiler lets you see the database queries being submitted to the database server, their text, SPID and various other great bits of data.
Usually you use it when you want to see what's happening behind the scenes or if you want to optimise an application by reducing the number of database queries etc.
It doesn't come as part of the workgroup sql installation, or express edition but you can get it in the developer, standard or enterprise editions.

When I click the blue media icon in the bottom of the umbraco admin menu, I see approximately 20,000 queries hit my database server.
Another example, is loading the homepage of the site, I see 34 queries hit the database server.
This system has approximately 2000 media items of various types, but on average, they have 4+ properties.
From looking at the profiler output, I see this for loading the media list:

First, _for each element_ in the tree (3 queries each):

--Find the id based on the guid
Select id from umbracoNode where uniqueID = 'cc07b313-0843-4aa8-bbda-871c8da728c8'
--get some info from the UmbracoNode table
Select createDate, trashed, parentId, nodeObjectType, nodeUser, level, path, sortOrder,
uniqueId, text from UmbracoNode where id = 1032
--Get some more info from the cmsContentType table
Select Alias,icon,thumbnail,description from cmsContentType where nodeid=1032

This could easily be optimised into a single query with a single inner join.

Then it gets scary:
For each media item in each folder of the tree:

Select createDate, trashed, parentId, nodeObjectType, nodeUser, level, path, sortOrder, uniqueId, text from UmbracoNode where id = 1032
Select Alias,icon,thumbnail,description from cmsContentType where nodeid=1032
Select top 1 VersionId from cmsContentVersion where contentID = 1377 order by id desc
Select dataNvarchar from cmsPropertyData where id = 8472
Select dataNvarchar from cmsPropertyData where id = 8473
Select dataNvarchar from cmsPropertyData where id = 8474
Select dataNvarchar from cmsPropertyData where id = 8475
Select dataNvarchar from cmsPropertyData where id = 8476

Okay, so here's the root of the problem - and as far as i can see, it's a problem thoughout, we've only just noticed it in the media items due to the sheer number of them and the number of properites on each. I think you'll agree that these numbers multiply out real fast - a good example is the homepage of 34 queries. 1 Visitor is 34 queries, 2 is 68, 4 is 136...

For each media item, first the base data for the node is retrieved, such as it's parent, the node object type etc.
Then the icon details, description etc, presumably for the UI. (again a join could be used)
Then the most recent version guid for what I think is this node (though the data gets a little jumbled and out of order in profiler and it's hard to tell)
Then _for each property_ of the media type, the data is retreived from cmsPropertyData.

Okay, so a lot of complaining is going on here, I have read parts of the umbraco source, and i'm a fairly competent programmer with a fair bit of experience in application development - it seems to me that the core of the problem is that the database abstraction layer / orm / whatever you want to call it, is a little too broken down.

What this means, is that each little bit retreives it's own data when it needs it and returns this. While this is fine for a small scale, it won't scale well.
There seemed to be small things, like when you have two constructors for getting an item by either a guid or an id, the guid one will retreive the id and then call the id one - rather than abstracting the code that reads the returned results and feeding it two different data sources (one selected by guid and the other by id) but these will compound when applied en-masse, for example loading the media tree.
The real problem with loading the tree (I can deal with 3 queries per media item, though this is still going to be several thousand queries) is the property system.
As I understand it, looking at the output of the data, you're using a structure which i've used in a few systems and that's storing individual properties vertically rather than horizontally - this obviously has huge customisation benefits for the adminstration - one of the main features of umbraco of course is it's property system.
The problem is in the implementation of retrieving the data.

From looking at the output, I can't see where the actual property ids for a given media type are stored and then used, I assume they're retrieved much earlier in the batch and cached.

I can see several solutions to the problem.

1) Use a UDF to retrieve each property and rotate the data -
create function dbo.getVarcharProperty (@nodeId as int)
returns varchar(255)
begin
declare @dataNvarchar varchar(255)
select
@dataNvarchar = dataNvarchar
from
cmsPropertyData
where
id = @nodeId
return @dataNvarchar
end

Select dbo.getVarcharProperty(8472), dbo.getVarcharProperty(8473)

This will work and slightly optimise but still wont be very efficent since the UDF performs a seperate select anyway

2) Build a temporary table (rotated) and populate it using cursors or similar in a stored procedure - as far as I can see umbraco doesn't use stored procs or udfs, so this is probably not an option.

3) Select all the properties as a single batch to a cache and use a sqlcachedependency - while it's okay, it's sql 2005 only and from experience quite difficult to get working - this wouldn't work for an application such as umbraco where the deployment isn't strictly controlled.

4) Be sneaky and do some of this in the application: an example

Presently we do this to get the properties:
Select dataNvarchar from cmsPropertyData where id = 8472
Select dataNvarchar from cmsPropertyData where id = 8473
Select dataNvarchar from cmsPropertyData where id = 8474
Select dataNvarchar from cmsPropertyData where id = 8475

What if this was:
Select id, dataNvarchar, dataNText, dataInt, dataDate from cmsPropertyData where id in (8472,8473,8474,8475)
(or better yet where propertyTypeId = ??
This data would then come back to the application (you're getting all the properties anyway, you obviously know the property ids, so then iterate, once, over that result set and perform your class population.
//pseudo code, haven't checked source
Enum propertyTypeMapping
{
dataNvarchar = 1, dataNText = 2, dataInt = 3, dataDate = 4
}
Media m = new Media();
int[] cmsNodePropertyIds = new int[] { 8472, 8473,8474,8475 };
string sql = string.format("Select id, propertyTypeId, dataNvarchar, dataNText, dataInt, dataDate from cmsPropertyData where id in ({0}), String.Join(",",cmsNodePropertyIds)
DataTable props = getDataTable(sql)
foreach(DataRow row in props.Rows)
{
int propertyId = (int)row["id"]
Property p = m.properties.find(delegate(Property prop) { return prop.id == propertyId });
if(p!=null)
p.setValue(row[Enum.GetName(typeof(propertyTypeMapping), (int)row["propertyTypeId"])]);
}


Anyway, if someone else has a better solution than changing the code (that isn't removing properties or decreasing the number of media items) let me know - these are just ideas based on experience.
I don't want to start digging into the umbraco source and recompiling since this is clearly an issue that needs to be resolved in the main code base - it's a scalability nightmare waiting to happen.

Happy to discuss over chat mediums or just post here, i'll check it a few times today, i'm available on googletalk: agrath@gmail.com, and msn: agrath@orcon.net.nz

Gareth Evans
Sniper Systems Ltd
hartvig
Posted: Thursday, August 28, 2008 4:59:13 AM

Rank: Addict

Joined: 3/17/2008
Posts: 982
Location: Nyborg, Denmark
From my point of view it mostly seems like you're using umbraco in a wrong way. Normally, you wouldn't have any request to the db from the homepage due to the way umbraco caches published data.

In the same way optimizing the querying of data hasn't been a top priority as they're only accessed when editing data from the back office. But nevertheless it's obvious that something could be done and it would be great to collaborate (however direct sql optimizations like UDFs or temp. tables are not an option as we'd like to keep the logic in the business layer and not in the db to ensure db independence).

But with 20k queries to the db server just when activating the media library, you're either using the media library in a very special way (one way that it wasn't designed for) or you have a *very* corrupt installation. I've never heard that people should have those problems (and there's far bigger installations out there).

Cheers,
Niels...


Jeeeez, did I really start this :-)
agrath
Posted: Friday, August 29, 2008 12:13:04 AM
Rank: Newbie

Joined: 12/7/2007
Posts: 8
Hi Niels,
Sorry it took me a while to get back to you.

I'm pretty sure we're using umbraco correctly. The only thing we've done differently on this system was the media items were created via a .net script, but we used the umbraco apis to do so.

Regarding collaboration, i'd be happy to chat. I agree about the db independence - especially if you want to support other dbms' like mysql etc. My suggestions were just thinking out loud really.

I wouldn't say we're using the media library in any special way, most of those items are PDF files, which are linked to from the site. The files themselves have their own media type but I don't think that's likely to be the cause.

I guess if it's a corrupt installation, we can rebuild it from scratch but i'd rather get to the bottom of this - I don't see what would have corrupted it though.

I guess the question is, where to from here? What steps do we take to resolve the issue as we can't give the site to the client if he goes to the media library and it cripples our server.

Gareth
hartvig
Posted: Friday, August 29, 2008 2:02:47 AM

Rank: Addict

Joined: 3/17/2008
Posts: 982
Location: Nyborg, Denmark
I'm mostly confused that you have 20.000 db queries by simply opening the media library. Are all the media stored in the root folder?

Jeeeez, did I really start this :-)
agrath
Posted: Friday, August 29, 2008 2:17:33 AM
Rank: Newbie

Joined: 12/7/2007
Posts: 8
Hi Niels,

Nope they're all broken into folders.
I've done a bit more investigation and profiled a few other sites.
Out of the 3 i've profiled (all of which we've set up), 2 have this problem.
The main issue seems to be selection of the property values from cmsProperty when the tree is getting loaded - the other site isn't as bad, only 2500 queries, but still getting up there.

The only thing the 2 sites have in common is the way the items were batch uploaded, but i've checked and rechecked the upload code and it's just creating the items using:

new umbraco.cms.businesslogic.media.Media(parentMediaID);
and then
umbraco.cms.businesslogic.property.Property extnProperty = media.getProperty("umbracoExtension");
if (extnProperty != null) extnProperty.Value = extension;

I'll flick you an email using the contact form on the umbraco site with some database credentials and login details for the umbraco admin if you'd like to have a look for yourself.

We're at a bit of a loss as to what to do next. I can be reached on msn or googletalk on the addresses above to speed this up, if you have time
hartvig
Posted: Friday, August 29, 2008 2:42:16 AM

Rank: Addict

Joined: 3/17/2008
Posts: 982
Location: Nyborg, Denmark
Actually, I think you have a valid point. The media aren't loaded optimized for the tree like the documents are, which means that at least all root media + children are loaded :-/

I'll make sure that get fixed for Umbraco 4.

Jeeeez, did I really start this :-)
hartvig
Posted: Friday, August 29, 2008 2:45:59 AM

Rank: Addict

Joined: 3/17/2008
Posts: 982
Location: Nyborg, Denmark
(I'm still wondering about your home page, getting 34 hits on the db server. Are you calling GetMedia helper methods in the XSLT or using the api in anyway to display content? Normally umbraco cache all published content for the nodefactory+xslt as well as all meta data for templates, etc. (so that should only cause db req. the first time the app loads), so it shouldn't really query the db unless you call the media/member helper methods).

Jeeeez, did I really start this :-)
agrath
Posted: Friday, August 29, 2008 2:48:18 AM
Rank: Newbie

Joined: 12/7/2007
Posts: 8
Yes, we use the GetMedia helper methods from XSLT, so that explains the database queries on the homepage, but the queries seem to occur every time.

Gareth
agrath
Posted: Friday, August 29, 2008 5:35:23 AM
Rank: Newbie

Joined: 12/7/2007
Posts: 8
Niels,

I had a user control that was on the dashboard which had a line that should have been commented.
This line made a call to a function which, using the umbraco apis, got all of the media items and all their properties.

Right now, i'm extremely embarrassed that I even made this mistake, and the amount of time it took before I realised where the problem was - from my fiddler output, I could see a page request to /umbraco/TreeInit.aspx?app=media which was taking 33 seconds, but in actual fact, it was a request to /umbraco/dashboard.aspx?app=media which was loading my user control which was taking 33 seconds and this was due to the GetMedias call. Since the call for TreeInit was made at the same time, this also took 33 seconds (as it waited for the other page i guess, or database resources)

A sidenote here, the code i'm using for GetMedias was based on the Client Tools for Umbraco .net application from here: http://www.codeplex.com/ClientTools4Umbraco and this is likely to suffer from the same database hammering.

There is no problem with the umbraco tree code, or the loading of the media nodes from the tree.

There is however, still some performance optimisation that could be done regarding the loading of the properties and possibly media items/types in general, however this is not really a major issue.

Quote:

Actually, I think you have a valid point. The media aren't loaded optimized for the tree like the documents are, which means that at least all root media + children are loaded :-/


This doesnt seem to be the case, it seems to me that this code is fine, and is already optimised.

The actual problem here was the call to new Media(id) causing multiple queries (i count 3 + 1 for properties the first time,cached + 1 for each property the first time,cached) + 1 every time for the data of each property) to be executed, which of course, when called in a loop, causes the behaviour I was seeing on the database.
It should be fine, provided only a single media item is loaded at once.

Thank god it's friday.

Gareth
Users browsing this topic
Guest


You cannot post new topics in this forum.
You cannot reply to topics in this forum.
You cannot delete your posts in this forum.
You cannot edit your posts in this forum.
You cannot create polls in this forum.
You cannot vote in polls in this forum.