Web Q&A: Data Shredding, Updating the Status Bar, and More

10/22/2019

Edited by Nancy Michell

Q Last month you mentioned data shredding in a question about converting from relational data formats to XML. What exactly is data shredding and what do I need to know about it?

A Here, shredding refers to extracting relevant data out of hierarchical XML and putting it into flat relational storage.

Shredding is an open problem with many different solutions, but in general, the more hierarchical the XML is (that is, the more deeply nested the data), the more joins are required to turn the flat, shredded data back into XML. Joins are expensive, so the more deeply nested the original XML, the more expensive the data's reconstitution becomes. XML also has other context information, like in-scope namespaces, that accumulates on the stack during parsing and therefore also contributes to performance slowdown.

For example, given very simple, flat XML like this

<x y="a" z="b c"/>

you could choose several different relational layouts. Here are three examples. A single X table with y and z columns:

X    y     z
     a     b c

Separate X, Y, and Z tables joined together:

X    ykey zkey
      0    0
 
Y    xkey value
      0    a
 
Z    xkey value
      0    b
      0    c

Separate tables for attributes, elements, and names, and a pivot table to describe which attributes go with which elements:

Elements    pk  namekey
             0   0
 
Attributes  pk  namekey     value
             0   1           a
             1   2           b c
 
Names       pk  local-name  namespace-uri
             0   x           NULL
             1   y           NULL
             2   z           NULL
 
AttributeList  elementkey  attributekey
                0           0
                0           1

The problem with the first two approaches is that you have to know the structure of the XML beforehand (to describe the tables or columns that are used). The third approach is much more general, but requires many joins even for simple XML data. And all three approaches invariably require joins the minute that the XML becomes nested (when the x element can contain other elements); you'll generally need at least one join per level of depth in the XML. The XML-to-relational mapping problem is something that could fill a whole book. Microsoft and many other companies are working on this problem.

The previous examples illustrate the variety of solutions. They also show that no matter what method you choose, the shredded model becomes complex and expensive to reconstruct. Many companies use hybrid approaches in which they shred the data to query it efficiently (using Microsoft® SQL Server™) but never reconstruct it; instead, they also store a copy of the original XML. This solution introduces its own problems (like keeping the two copies in sync), but performs better than pure XML or pure relational data alone.
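To make the third, generic layout concrete, here is a minimal sketch in JavaScript that shreds the sample element into the Elements, Attributes, Names, and AttributeList tables and then rebuilds the markup. The function names (shred, reconstruct), the in-memory arrays standing in for tables, and the pre-parsed input object are all assumptions made for illustration. The sketch handles one unnested element and does not escape attribute values.

```javascript
// Hypothetical sketch: arrays of row objects stand in for relational tables.
function shred(element) {
  var tables = { Elements: [], Attributes: [], Names: [], AttributeList: [] };

  // Return the key of a name row, adding the row if it does not exist yet.
  function nameKey(localName) {
    for (var i = 0; i < tables.Names.length; i++) {
      if (tables.Names[i].localName === localName) return tables.Names[i].pk;
    }
    tables.Names.push({ pk: tables.Names.length, localName: localName, namespaceUri: null });
    return tables.Names.length - 1;
  }

  var elementKey = tables.Elements.length;
  tables.Elements.push({ pk: elementKey, namekey: nameKey(element.name) });

  for (var i = 0; i < element.attributes.length; i++) {
    var attr = element.attributes[i];
    var attributeKey = tables.Attributes.length;
    tables.Attributes.push({ pk: attributeKey, namekey: nameKey(attr.name), value: attr.value });
    tables.AttributeList.push({ elementkey: elementKey, attributekey: attributeKey });
  }
  return tables;
}

// Rebuild the markup for one element. Each findByPk call stands in for a join.
function reconstruct(tables, elementKey) {
  function findByPk(rows, pk) {
    for (var i = 0; i < rows.length; i++) {
      if (rows[i].pk === pk) return rows[i];
    }
    return null;
  }

  var element = findByPk(tables.Elements, elementKey);
  var text = "<" + findByPk(tables.Names, element.namekey).localName;
  for (var i = 0; i < tables.AttributeList.length; i++) {
    var link = tables.AttributeList[i];
    if (link.elementkey !== elementKey) continue;
    var attr = findByPk(tables.Attributes, link.attributekey);
    text += " " + findByPk(tables.Names, attr.namekey).localName + '="' + attr.value + '"';
  }
  return text + "/>";
}

// The sample element <x y="a" z="b c"/>, already parsed into a plain object.
var tables = shred({
  name: "x",
  attributes: [{ name: "y", value: "a" }, { name: "z", value: "b c" }]
});
var rebuilt = reconstruct(tables, 0);
```

Even for this single flat element, reconstruction touches all four tables, which is the join cost the generic layout trades for not needing to know the XML structure in advance.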

Q My program, written in Visual Basic® .NET, copies files inside a loop. For each file copied, I update the status bar text. This works fine as long as my form has the focus. The minute a user brings another window into focus, the text no longer updates. I've tried using me.update and mystatusbar.update without any luck. Even after I return focus to the form, the text still does not update. What's the secret? Unfortunately, these don't work:

Application.DoEvents()
Statusbar.Update()
Statusbar.Refresh()

A All you really need is a call to Application.DoEvents right after you update the status bar. This can really bog down the code though, so you might prefer to set up a Timer control that monitors a custom property's value. If it is anything other than "Ready" then the code displays and updates the status bar. This allows it to be multithreaded and not slow down other running programs. Here's a quick and dirty, but slow-running, example:

Statusbar.Text = myvalue
Application.DoEvents()

And the following is an example using a timer monitor:

Private Sub tmStatusMonitor_Tick(ByVal sender As System.Object, _
    ByVal e As System.EventArgs) Handles tmStatusMonitor.Tick
    If strStatus <> "Ready" Then 'strStatus is the custom property
        sbStatus.Text = strStatus
        Application.DoEvents()
    End If
End Sub

In the main code, the strStatus string is updated to state whatever text is appropriate and then set back to "Ready" when finished.

Q I need some help using resources from within Visual Studio® .NET. I understand the resource story from the command line, but I have had no luck from within the IDE.

A First of all, have a look at the SDK tutorial installed in the Framework SDK's tutorials\ResourcesAndLocalization folder. The key point is that you can easily localize a Windows® Form by selecting the form in Visual Studio .NET and setting its Localizable property to True. When you do, Visual Studio .NET automatically pushes all the localizable properties of your form (including the controls on it) into a .resx file and adds the necessary code to your form to load those dynamically. These are your default resources. If you then change the Language property of your form, Visual Studio .NET makes a new .resx file for that culture, and any changes you make for that language will be written to it. If you change Language back to the default, you will notice that the form reverts to its original state. You can add as many cultures as you like; just remember to save your work regularly. There will be one .resx file per language.

Note that you need to add a line of code for the other languages to get picked up:

Thread.CurrentThread.CurrentUICulture =   
   Thread.CurrentThread.CurrentCulture

Add this line just before the ResourceManager object is created in the Windows Forms-generated code section. The designer added the construction of the ResourceManager when you set the Localizable property of the form to True.

This will ensure that if you change your language in Windows, it will affect the resources available inside the project. You shouldn't add this line of code for MUI versions of Windows. (MUI is the Windows Multilingual User Interface, which supports user-activated language switching of the UI. The system can have resources for several languages at once and the user can switch among languages from the Control Panel.) Once you have added a valid .resx file to a project, you can double-click the file in the project window (if you are viewing all files) to edit the entries. This is how you can add new values or alter existing ones.

Q I have a question about JScript® performance when doing string concatenation. I realize that, generally speaking, doing a lot of concatenation with large strings is a performance killer. However, I'm concerned about seeing different behavior with different browsers. My current test is tracing the problem to string-handling code. Are there better ways to handle large string concatenation in JScript?

A The code you sent (see Figure 1) runs much more slowly than the modified script in Figure 2. When concatenating strings of normal length, performance is not great, but it's reasonable. ASP pages tend to build up many long strings using the += operator (where "long" means tens of kilobytes, not ridiculously long, as in the example).

Figure 2 Modified Script

<HTML>
<BODY>
<SCRIPT LANGUAGE="JSCRIPT">
<!--
function runtest()
{
var oldTime = new Date();
var tempStr = new String();
var tempArr = new Array(10000);

alert("Starting test");

for(var i=0; i<10000;++i)
{
    tempArr[i] = "This is big String that we are using for manipulation.";
} 

tempStr = tempArr.join("");

var newTime = new Date(); 
alert("string size=" + tempStr.length + " time taken = " + (newTime - oldTime));
}
-->
</SCRIPT>
<input type="button" value="start test" onclick="runtest()">
</BODY>
</HTML>

Figure 1 Original String Handling Code

<HTML>
<BODY>
<SCRIPT LANGUAGE="JSCRIPT">
<!--
function runtest()
{
var oldTime = new Date();
var tempStr = new String();

alert("Starting test");

for(var i=0; i<10000;++i)
{
    tempStr += "This is big String that we are using for manipulation."; 
} 

var newTime = new Date(); 

alert("string size=" + tempStr.length + " time taken = " + (newTime - oldTime));
}
-->
</SCRIPT>
<input type="button" value="start test" onclick="runtest()">
</BODY>
</HTML>

It is often true that algorithms that in theory ought to greatly speed up string concatenation (through lazy concatenation) in fact make performance much worse; the cost of maintaining the extra memory for the lazy concatenation trees eats into the savings.

Rather than attempting to eliminate contention by eliminating allocations, you could make contention less likely by implementing a better heap algorithm for small allocations. This will lead to a huge increase in performance.

That said, doing lots of += on strings is considered bad programming. It is an O(n²) operation: every time you double the number of strings concatenated, you quadruple the runtime. It does not matter how fast the heap is; the problem here is one of memory moves rather than contention.
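The quadratic growth can be checked with a small cost model. The sketch below assumes that each += allocates a new string and copies everything built so far, which is the naïve behavior described here; a real script engine may behave differently. The function name charsCopiedByAppending is hypothetical.

```javascript
// Cost model only: counts characters written, assuming every += copies
// the whole string built so far into a freshly allocated result.
function charsCopiedByAppending(pieceLength, pieceCount) {
  var total = 0;
  var currentLength = 0;
  for (var i = 0; i < pieceCount; i++) {
    currentLength += pieceLength; // length of the newly allocated result
    total += currentLength;       // each character of that result is written once
  }
  return total;
}

var pieceLength = "This is big String that we are using for manipulation.".length;
var cost10k = charsCopiedByAppending(pieceLength, 10000);
var cost20k = charsCopiedByAppending(pieceLength, 20000);
var ratio = cost20k / cost10k; // how much more work twice as many appends cost
```

Under this model, doubling the number of appends from 10,000 to 20,000 gives a ratio just under 4, matching the "double the strings, quadruple the runtime" rule of thumb.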

No one should use a language like JScript to do manipulation of multi-megabyte strings. The fact that JScript performs poorly when asked to manipulate these strings is unfortunate, but the only people who will notice are benchmark writers.

You should try not to waste time worrying about the performance of unrealistic programs. The time you spend tracking down and optimizing performance problems ought to be focused on making things faster for real-world programs.

Q If you have to concatenate large strings in JScript, what would be the best approach to take?

A Well, the join method on arrays gets around the O(n²) nature of the naïve string concatenation algorithm.
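For code that builds output in many small steps, the join technique can be wrapped in a small helper. This is a sketch only; the StringBuffer name and its append method are hypothetical and not part of JScript.

```javascript
// Hypothetical helper: collect pieces in an array and join them once at the end,
// instead of growing one string with += on every step.
function StringBuffer() {
  this.pieces = [];
}

StringBuffer.prototype.append = function (text) {
  this.pieces.push(String(text));
  return this; // returning the buffer allows chained calls
};

StringBuffer.prototype.toString = function () {
  return this.pieces.join("");
};

// Usage: build a small table fragment.
var buffer = new StringBuffer();
for (var row = 1; row <= 3; row++) {
  buffer.append("<tr><td>").append(row).append("</td></tr>");
}
var html = "<table>" + buffer.toString() + "</table>";
```

The only large concatenation happens inside join, so the intermediate strings never grow beyond the size of a single piece.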

As for other techniques, suppose you want to build up the string "XXXXXXXX". The naïve algorithm shown here would build up:

X + X = 
XX + X = 
XXX + X =
XXXX + X =
XXXXX + X =
XXXXXX + X =
XXXXXXX + X =
XXXXXXXX

Notice that this allocates 1 + 2 + 3 + ... + 8 = 36 characters to build an eight-character string, or (n × n + n) / 2 characters for an n character final result. A better method would be

X + X =
XX + XX = 
XXXX + XXXX = 
XXXXXXXX

which allocates 2 × n - 1 characters for an n-character final result. If you try this program

var tempStr = "This is big String that we are using for manipulation.";
for (var i = 0; i < 18; ++i)
{
    tempStr += tempStr; // double the string on each pass
}

you'll see that it builds up a much longer string in much less time. The key is to eliminate as many memory allocations as possible. Concatenating large strings is really not much more expensive than concatenating small strings—but doing 10,000 allocations of large strings will be extremely costly.
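The doubling idea can be generalized to any repeat count, not just powers of two. The sketch below is one way to do it; repeatByDoubling and the two counting functions are hypothetical names, and the counting functions simply restate the allocation totals given above.

```javascript
// Build `count` copies of `piece` using roughly log2(count) doublings
// instead of `count` separate appends.
function repeatByDoubling(piece, count) {
  var result = "";
  var chunk = piece; // holds 1, 2, 4, 8, ... copies of piece
  while (count > 0) {
    if (count % 2 === 1) result += chunk; // this power of two is part of count
    count = Math.floor(count / 2);
    if (count > 0) chunk += chunk;        // double the chunk for the next bit
  }
  return result;
}

// Characters allocated to build an n-character string one character at a time.
function naiveAllocatedChars(n) {
  return (n * n + n) / 2;
}

// Characters allocated by pure doubling; n is assumed to be a power of two.
function doublingAllocatedChars(n) {
  return 2 * n - 1;
}

var eight = repeatByDoubling("X", 8);
var naiveCost = naiveAllocatedChars(8);
var doublingCost = doublingAllocatedChars(8);
```

For the eight-character example, the naïve total is 36 characters and the doubling total is 15, and the gap widens quickly as n grows.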

Q I'm thinking about adding tooltips to all the links on my Web site's navigation menu. My thought is that this will help users get a better understanding of what they're about to click on without having to go to the actual page. This seems like a good idea to me, but I don't see this too often on other sites. It doesn't take DHTML or anything fancy to do this; in fact, it's simply an attribute in the anchor tag. Why aren't more Web developers using it?

A You should ask yourself if you're convinced that people aren't understanding the links. If you're not convinced, you could be trying to solve a problem that doesn't exist.

Assuming you're certain that there is a problem, there are a few reasons not to solve it through tooltips. First, the links themselves should be self-sufficient and make sense in the context of your Web site. There shouldn't be a need for any description at all. Think of stop or yield signs on the street—if you have to read a sentence to understand what the sign is telling you to do, the sign has failed in its purpose. With menus of any kind, you're aiming for a similar level of direct communication. If your Web site needs a user's manual, then you have a problem.

Second, tooltips do not stay visible for very long, only about six seconds. For many people, this isn't quite enough time to read a paragraph of text, assuming they were truly interested in doing so. Because the tooltip is auto-generated, it appears on every mouseover. If you move your mouse across the list, you'll see tooltips appearing and disappearing for each one, and this makes for lots of visual noise. Compared to just letting users click on the link to see what it actually is, the overhead of trying to explain everything beforehand can create more problems than it solves. By the way, should you go this route, there are two attributes to know about: TITLE, which you set on the anchor (<A>) tag and which has spotty support across browsers, and ALT, which is intended to provide information for screen readers. ALT belongs on images rather than on the anchor itself, so it is relevant only when a link contains an image.

Q In the December 2002 Web Q&A, you address a reader's problem concerning catching a C# exception in JScript. The reader did not want to have to change the JScript code, but the only methods you were able to suggest to access the C# exception properties worked "on the catch side, not on the throw side."

Wouldn't it be possible in the C# code to catch the C# exception, create a new instance of a JScriptException, assign properties appropriately to reflect the relevant information from the C# exception, and then throw the JScriptException? Or would the newly thrown JScriptException then be wrapped in a further instance of JScriptException?

A There isn't a JScriptException class in .NET. This "class" is specific to the JScript .NET language and is not accessible by any C# component. With ActiveX® components you can implement IErrorInfo and JScript will use this information to raise the exception. Accessing the InnerException property is the only way to get to the original exception.

Got a question? Send questions and comments to WebQA@microsoft.com.

Thanks to the following Microsoft developers for their technical expertise: Mike Blaszczak, Scott Berkun, Michael Brundage, Mark Davis, Seth Demsey, Kit George, Francois Liger, Eric Lippert, Sanjay Manghnani, Gray McDonald, Bill Metters, Eric Smith, and Maxim Stepin.
