NewView Design Notes
--------------------

Like all good hobbyist apps NewView didn't really get an overall
design. Overall it is largely based on Windows HTMLHelp, although
arguably that is a fairly obvious design these days.

Some aspects of it are somewhat non-obvious.

NewView.exe is a Speedsoft Sibyl application. The original Sibyl
libraries have been significantly bug-fixed, and slightly enhanced for
performance etc. They were originally Sibyl Fixpack 3, as I had
difficulties with Fixpack 4 (FP4 introduced Linux support, so changed
many things and broke some).

The new HelpMgr.dll is written using Open Watcom C++
  http://openwatcom.org
Actually it's straight C, for some reason I did not use C++...

===============================================================================

Sibyl Problems
--------------

- NewView is near a limit on the total number of Units that can be
  linked. I've removed unneeded dependencies, we must be down to about
  6 remaining...
  The max I could use had 28 files in the project plus 108 dependencies

- As always you can't use optimisation. Most of it probably works but
  especially the bitmap decompression was going wrong when I
  accidentally turned optimisation on.

- I/O checking is a funny thing. The original Sibyl code turned it off
  explicitly in system.pas so it was useless. I changed that.

- Assembler: Sibyl doesn't recognise some valid 386 instructions and
  certainly can't do any Pentium/SSE/SSE2/3dNow instructions :)

===============================================================================

Strings
-------

Unfortunately Sibyl is still stuck in the Delphi 1.0 (?) days when
strings were at most 255 bytes. This makes them useless for a lot of
the stuff NewView does (such as decoding help topics).

Sibyl has AnsiStrings, which are large, reference counted strings.
I think they do work, but unfortunately most of Sibyl's libraries do
not use them, and without optimisation there is no efficient way to
concatenate onto ansistrings(!).
As a result of all this I use pchars (zero terminated strings), or my
TAString class. This is incredibly tedious compared to automatically
managed strings, but no worse than any other objects in Object Pascal,
and these strings are pretty efficient for concatenation, and have no
length limits. I made many conversion functions such as AsString to
make it easier to combine with other types of strings.

===============================================================================

Performance
-----------

Currently, TAString has a cunning piece of code that explicitly checks
for invalid object references in *every method*. While very cunning,
this is kind of ridiculous performance wise, because it sets up
exception handling even for addition of a single character. Probably,
it should be compiled out for a release mode.

Sibyl's compiler is unfortunately completely broken in optimising
mode. How a commercial compiler could be released like this is an
interesting question.

Suffice to say, Sibyl gives you fast-ish compiled code, but it is
probably less than half as fast as a properly optimised C/C++ program,
maybe much worse. This is primarily because it does not make much use
of registers, instead using stack for nearly everything.

In any case, I have done a few key things:

- const string parameters wherever possible.
  This avoids the large overhead of copying the entire string each call

- assembly for a few key things
  Sibyl already has some assembly for certain string ops, I added a few
  more. More could be used, but there are diminishing returns.
  Sibyl does have a highly optimised Move (memory copy) so that is used
  e.g. in TAString

- Minimising string operations
  Some bits of the help topic decoding are a bit longer to avoid so
  much copying of strings.

Of course, algorithms have been optimised carefully, especially in the
area of file I/O: I changed the help file loader over from loading the
entire file to just loading pieces as needed.
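
For illustration only, here is roughly what "loading pieces as needed" can look like in C (the language HelpMgr is written in), instead of reading the whole file up front. read_piece is a hypothetical helper, not the actual NewView loader, and nothing about the help file layout is assumed: the caller supplies the offset and size.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read `size` bytes starting at byte `offset`.
   Returns a malloc'd buffer (caller frees), or NULL if the seek fails
   or the file does not contain that many bytes at that position. */
void *read_piece(FILE *f, long offset, size_t size)
{
    void *buf;

    assert(f != NULL);
    if (size == 0)
        return NULL;
    if (fseek(f, offset, SEEK_SET) != 0)
        return NULL;

    buf = malloc(size);
    if (buf == NULL)
        return NULL;

    if (fread(buf, 1, size, f) != size) {
        free(buf);              /* short read: treat as failure */
        return NULL;
    }
    return buf;
}
```

A caller would keep the file open and fetch only the pieces it needs, e.g. read_piece( f, TopicOffset, TopicSize ) when a topic is first displayed, where TopicOffset and TopicSize stand for whatever the file's own tables provide.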
On the whole this achieves decent performance.

Unfortunately, the SPCC component library seems somewhat sluggish - not
to the level of interpreted code, but definitely much less snappy than
compiled C apps. Here, I think optimisation is desperately needed, to
speed up all the repetitive loops in e.g. loading forms from SCU
(resource) data. Short of re-writing the whole thing in C/C++ this
cannot be improved much.

===============================================================================

Multi-lingual Support
---------------------

The goals of the language code are

a) Load languages at runtime, reasonably fast
b) Be human editable, so that people can contribute without any tools
   required or even contacting the author
c) Be able to update an existing language file with missing items to
   avoid manual tracking of which strings have been translated
d) Minimal specific code

b and c are to encourage open source contributions.

Resources or msg files would satisfy a, but neither is human editable,
and both are difficult to update at runtime.

Instead, I implemented the following design.

Language Files

These files are simply text files, each line consisting of:

  <name> "<string>"

<name> is the name of various text strings from the application.
The names are defined by the application code.

<string> is the string that should be used for this language. Strings
can be DBCS for e.g. Japanese.

Forms

When created, each form registers a method of itself for language
events, which looks like this:

  Procedure OnLanguageEvent( Language: TLanguageFile;
                             const Apply: boolean );

The registration is usually done in OnCreate

  RegisterForLanguages( OnLanguageEvent );

This callback is called in one of two cases:

1) Apply = true: A new language is being loaded, or
2) Apply = false: A language file is being updated/saved. (more later).

In either case, the form is expected to load (or pretend to load) all
its strings from the language file it is given.
Specifically, it should:

1) Call Language.LoadComponentLanguage( self, Apply )

   This loads strings for any static control elements e.g. menu and
   button captions and hints. Parts of controls that are normally
   variable e.g. listbox items are not loaded.

2) Load any strings to be used in code ("code strings")

   It should do this by calling Language.LL:

     Language.LL( Apply, <variable>, '<name>', '<default>' );

   for each string to be found. LL will look up the given name in the
   language file, then store the value into the string variable, or
   <default> if not found.

When the form first registers itself, the language callback will be
called immediately, so the form can load the current language.

Note: Even if the default language is being used (ie. no language file
used, ie. English) this still happens so that default values can be
loaded for code strings (e.g. prompts).

If Apply is false then no strings will actually be loaded but the
references (names) will be recorded.

Strings Outside of Forms

Code strings that are needed but aren't related to a form can be
handled by registering a procedure at unit initialization:

  Initialization
    RegisterProcForLanguages( OnLanguageEvent );

These language events look the same as for forms but are not methods:

  Procedure OnLanguageEvent( Language: TLanguageFile;
                             const Apply: boolean );

This procedure should

1) Set the prefix:

     Language.Prefix := '<prefix>.';

   This is mostly a convenience, allowing codestrings to be grouped
   together. <prefix> could be unit name, for instance.

   NOTE: LoadControlLanguage sets prefix automatically to the form name
   being loaded, which allows related codestrings to automatically
   appear under the form name. However, this prefix remains in effect
   until changed or another form is loaded, so you MUST set it if
   loading strings outside a form.

2) Call LL for each needed string, as before

Loading Languages

  var
    Language: TLanguageFile;

  try
    Language := TLanguageFile.Create( Filename );
  except
    ...
  end;
  ApplyLanguage( Language );

This will call all registered methods and procedures telling them that
a new language is to be loaded.

Updating Language Files

  try
    Language := TLanguageFile.Create( Filename );
    UpdateLanguage( Language );
    Language.AppendMissingItems;
  except
    ...
  end;
  Language.Destroy;

UpdateLanguage calls all registered methods and procedures, but with
Apply set to false so that required items are searched for, but not
actually applied to components or code string variables.

After UpdateLanguage, the TLanguageFile contains a list of missing
items and their default values. AppendMissingItems writes this list to
the end of the file called filename.

When a new language file is being saved, none of the required items
will be found, so all of them will be stored.

Note that the strings saved to the file will be the default value for
codestrings, but the last loaded language strings for forms.

Dynamically Loaded Forms

In order that language files can include all required strings, all
forms must be loaded before updating a language file.

This can be semi-automated by registering an "ensure loaded" procedure
for each form:

  procedure EnsureMyFormLoaded;
  begin
    if MyForm = nil then
      MyForm := TMyForm.Create( nil );
  end;

then registering this procedure in the UNIT initialization:

  Initialization
    ...
    RegisterUpdateProcForLanguages( EnsureMyFormLoaded );

UpdateLanguage will call all these procedures before accessing the
language file, ensuring that all forms are loaded.

Performance of String Lookups

Requirement (a) included the desire to be "reasonably fast". Well,
looking up names in a string list is not the fastest thing to do. I've
made this use a binary search, so each lookup is O( log N ) and loading
all N strings is O( N log N ), which is generally considered
acceptable. I haven't done any timings though.

Testing ... there are currently 443 strings required for NewView ...

In practice it seems to take no detectable time to load a language.
That's on my Duron 1.1G, 256MB RAM.
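
A minimal sketch of that kind of lookup, written in C rather than Sibyl Pascal. LangItem and lang_lookup are made-up names for illustration, not the TLanguageFile code, and the sketch assumes the items are already sorted by name in strcmp order. Like LL, it falls back to the default when the name is not found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one language file line. */
typedef struct {
    const char *name;   /* item name, e.g. "MainForm.Caption" */
    const char *value;  /* string to use for this language */
} LangItem;

/* Binary search over `items`, which must be sorted by name using
   strcmp order. Returns the stored value, or `default_value` if the
   name is not present. */
const char *lang_lookup(const LangItem *items, size_t count,
                        const char *name, const char *default_value)
{
    size_t lo = 0;
    size_t hi = count;              /* search window is [lo, hi) */

    assert(name != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, items[mid].name);

        if (cmp == 0)
            return items[mid].value;
        if (cmp < 0)
            hi = mid;               /* continue in the lower half */
        else
            lo = mid + 1;           /* continue in the upper half */
    }
    return default_value;
}
```

With 443 items each lookup needs at most 9 comparisons, which fits with the load time being undetectable.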
Multilingual stuff could be further optimised:

- Optimise for sequential lookup. Most of the time, the next item will
  be immediately after the previous item, so long as the language file
  has not been rearranged. If not, fall back to normal search.
  This is probably the simplest and most beneficial.

- change to pstrings: instead of copying strings from the language
  file (codestrings only), store pointers to them. This would save
  time (since not copying so many strings) and memory (since there
  would not be two copies of all strings).
  The time is probably not significant because there is not a lot of
  data. The memory would be nice to save, but a lot of the strings are
  component properties which are generally allocated dynamically
  anyway, and we cannot pass a pstring to component string properties.

- could free strings from the language file after using them. This
  would save memory, but might have side effects, and would be no
  faster. OTOH it would save a lot of memory since even component
  properties would not be duplicated.

- lookup strings hierarchically (e.g. first search for formname in a
  small list)
  This would be a huge improvement in speed for random access, but is
  probably not needed if sequential access is optimised (see above)

===============================================================================

Exception Logging
-----------------

I added code to the Sibyl libraries to store a callstack into global
variables. Performance wise this is outrageously inefficient since it
happens on every exception, but then exceptions should not occur in
normal processing.

This global callstack can be accessed in a method specified in
Application.OnException.

This code doesn't currently work properly from other threads - ie. the
search thread.

===============================================================================

Help Manager
------------

Replacing HELPMGR.DLL (c:\os2\dll\helpmgr.dll).

HlpMgr2.pas contains a DLL implementation with the same interface.
Use
  DLLRNAME target.exe HELPMGR=HLPMGR2
for testing.

Initially, I was going to put all the code in HLPMGR2, loading
mainform etc.

Problem is, that programs using help manager may have very small stack
space, e.g. ICONEDIT.EXE has 24kB. NewView overflows this and crashes.

Also, as a principle, I feel uncomfortable loading all my unreliable
code into an existing, possibly very robust application's process
space, potentially crashing it if anything goes wrong.*

The third problem was the need to implement 16-bit entry points for
certain older applications, such as PMCHKDsk.

Therefore, instead we launch a NewView session (and relaunch if needed)

The filename is passed as normal. In addition these parameters are
passed:

  /hm:xxx     indicating "helpmanager mode"
              xxx is the HelpManager window, for NewView to send
              messages to.
  /owner:yyy  yyy is the application window using helpmanager

* HelpMgr is small, tight C code with lots of validation and no GUI
  code. It's much simpler than NewView itself.

Who Owns the Help Window
------------------------

In original helpmgr, the help window is set to be owned by EITHER
a) the active top-level window when help requested (by whatever means)
OR, if set:
b) the "relative window" set by a HM_SET_ACTIVE_WINDOW message.

In NewView I've currently disabled the ownership; because it's usually
more annoying than helpful.

Associations
------------

One or more windows can be associated with a help instance. I suspect
the original HelpMgr stores a window's associated help instance
somewhere in window words, but this is not documented. Instead, I
store a list of associated windows with each help instance.

?? Is this section out of date?

===============================================================================

ViewStub
--------

ViewStub opens up shared memory and examines a linked list there, that
specifies all files that any instance of NewView has open.
If it finds the same set of files as is being specified in its
parameters, already open in one newview instance, then it just
activates that instance (WinSetFocus; the list contains the main HWND).
It then passes any important parameters (e.g. search text) to the
existing instance with window messages referring to shared memory, as
for help manager.

If not found, then it runs a new instance of NewView (WinStartApp),
passing all the parameters unchanged.

On eCS 1.1 we have the problem that there is already a copy of
newview.exe in x:\ecs\bin. So the installer knows about that and a
full install replaces that copy. Similarly the help file in
x:\ecs\book.

===============================================================================

Searching
---------

The IPF compiler creates a search index table of all the "words" in
the text of help topics. This does not include the titles (displayed
in contents) of topics, nor does it include index words.

Each entry in the search table is either a series of alphanumeric
characters, or a single non-alphanumeric symbol.

So for example if the text of a topic includes __OS2__ then the search
table will have entries for "_" and "OS2".

However it is often useful to be able to search for "words" containing
symbols, so the program must look for consecutive strings of search
table words.

NewView does this by this method:

For each search term (TSearchTerm):

  1. Separate into IPF words (symbols and alphanumeric strings)
     (TSearchTerm.Parts)

  2. Search file dictionary:

     for starting words, search for finishing matches.
       e.g. searching for "if(" we would look for words ending in "if"

     for middle words, search for exact matches (case insensitive).
       e.g. searching for "_os2_" we would look for words matching
       "os2". There could only be "os2" "Os2" "OS2" and "oS2".

     for end words, search for beginning matches.
       e.g. searching for "if(" we would look for words starting
       with "("

  3.
     Matched topics are where there is one or more matches for all
     words (This is obtained by looking up matching topics for each
     word and ANDing the results together.)
     AND a search thru the actual topic text shows one or more
     occurrences of a matching sequence.
     e.g. we don't want _os2_ to match a topic that just contains an
     underscore somewhere and "os2" elsewhere.
     This is complicated by the fact that each word can match more
     than 1 dictionary word.

Search logic is:

  topic match if:
        (   ( matches all required terms )
         or ( there are no required terms ) )
    and (   ( matches any optional term )
         or ( there are no optional terms ) )
    and (   ( doesn't match any excluded term )
         or ( there are no excluded terms ) )

Algorithmically this can be done sequentially:

  if no terms
    nothing matches, abort

  if ( any optional terms )
    match each term, or results
  else
    all topics match by default

  if ( any required terms )
    match each term, and results

  if ( any excluded terms )
    match each term, and ( not results )

... this has the benefit of not requiring keeping 2 flags per topic
but who cares... that's not a big cost

  Keep an "allowed" (=not excluded) flag
       a "matched" flag

  set allowed to true for all
  set matched to false for all

  if optional term
    or results -> matched flags

  if required term
    and results -> allowed flags
    or results -> matched flags

  if excluded term
    and not results -> allowed flags

In practice, for speed this is implemented with arrays of integers
acting as flags, one for each topic. This is very susceptible to
optimisation, even e.g. SSE! but I have not done any since it seems
pretty fast as is.

Highlighting matching search terms.

To handle multi-part search terms, the results are stored as something
called a "word sequence". Each step of the sequence is a mask for the
entire dictionary, indicating which words can match at that step. The
steps are arrays and the sequence is a TList.

To handle multi-term searches, we then store each of these word
sequences in another list.
Finally, because each file has its own dictionary and search table, we
store a final list (AllFilesWordSequences) of the results for each
file.

  A list of results for open files
    For each file, a list of word sequences
      For each term, a word sequence
        For each term part, a mask of matching words

When displaying a topic, we first select a list of word sequences,
based on the file the topic is from.

Then, as we go through the topic, at each word we look to see if we
have started one of the word sequences, by looking at each sequence in
turn and seeing if the word is allowed in the first step.

If we find a match for the start of one of the sequences, we look
ahead to see if it is a complete match. If it is, we start
highlighting, then continue decoding and counting down until the
sequence is finished.

Startup Topic Search
--------------------

When running
  view <file> <topic>
view does some arbitrary, undocumented search. Seems to be matching,
in order:

- topic title starts with <topic>
- index entry starts with <topic>
- topic title contains <topic>
- index contains <topic>

Test case - view cmdref <topic>

topic       match?  topic shown
-----------------------------------------------------------------
dir         y       DIR - Displays Files in a Directory
device      y       DEVICE (OPTICAL.DMD) - Installs Optical Device Driver
optical     y       DEVICE (OPTICAL.DMD) - Installs Optical Device Driver
drivers     y       BASEDEV - Installs Base Device Drivers
syntax      y       Syntax and Parameters [about TRSPOOL]
saves       y       BACKUP - Saves Files
determine   y       CHKDSK - Checks File System Structure
deter       y       Problem Determination
determines  y       BUFFERS - Determines Number of Disk Buffers
divided     y       "topic not found"
                    [note: divided occurs in the text of the first topic]
expiration  y       NET ACCOUNTS - Administers Network Accounts
expir       y       NET ACCOUNTS - Administers Network Accounts

Help On Help
============

1.
   Standalone; invoke help
     old helpmgr: shows the topic itself
     new helpmgr: first looks for a help window which has marked
                  itself as showing newview help
                  if so activates that
                  else passes [OWNHELP] to NV as filename

2. App help NOT loaded; invoke help on help from app

3. App help loaded; invoke help on help from app

4. App help loaded, help on help loaded; invoke help on help from app

5. App help loaded, invoke help from NV

6. App help loaded, help on help loaded, invoke help from NV