Monday, May 4, 2009

My code can't be migrated to Unicode!

Do you have any snippets of unbelievably convoluted code that you wrote and/or inherited, and no idea how you could possibly migrate them to Delphi 2009 and Unicode?

I believe that 99% of all of your Delphi code migrates to Unicode simply by hitting the build button in Delphi 2009, and that the rest is handled by hints, warnings and errors.

If you disagree, then I want to hear all about it. Send me a code snippet that you're banging your head against, or simply have a question about, and we'll see what happens. :)

74 comments:

  1. Of course there would never be any problems with just a snippet... You can easily rewrite the entire snippet! Can I send you a large project that is a mixture of C++ and Delphi and see how that goes? :P... Sorry, I'm still a little bitter with how the transition went down. I'm hoping 32 to 64 bit goes better (i.e. we can continue to build both). That said, I really do appreciate having Unicode for my new projects :)

  2. Same thing here: we have a lot of D2007 code interfacing/linking with COBOL DLLs. Every Char, String, PChar, array of Char, etc. must be rewritten to ensure the correct size/type.
    Bye.

  3. Every customer I've talked to says that it was way easier than they thought it would be.

    Nick

  4. Nick,

    It took over 3 months of hard work, going over the roughly 400k LOC in NexusDB line by line, to find all the little gotchas and problems that the compiler did NOT find and mark with a hint, warning or error.

    While there is no question that D2009 is a fantastic release (the move to Unicode was way overdue, the IDE is a lot more stable than any since D7, and the new compiler and runtime features are great), the effort required to port D2007 or older code to D2009 is huge, especially if the code communicates over the network and has to read/write files in a way that keeps compatibility with existing files and allows communication with clients compiled under D2007 or older.

    But even simple code that calls Windows API functions is strongly affected in ways that the compiler doesn't recognize at all.

    Take for example:

    function SomeAPIWrapper: string;
    var
      Buffer: array[0..63] of Char;
    begin
      SomeAPI(Buffer, SizeOf(Buffer));
      Result := Buffer;
    end;

    Char is now a WideChar instead of AnsiChar, and SomeAPI now maps to SomeAPIW instead of SomeAPIA. Looks fine, right?

    Wrong. SomeAPI expects the length in characters, not in bytes. The above code doesn't trigger any compiler hints, warnings or errors, but now SizeOf(Buffer) returns 128, so the API function thinks it has space for 128 characters of 2 bytes each, and it can potentially write past the end of the actual buffer, which holds only 64 characters.

    This is just one of many types of potential problems that really require going over the source line by line and fully understanding what's going on if you want to port an application from D2007 to D2009.

  5. Thorsten, with respect, this is precisely why I don't have naked constants in my code. Program defensively!

  6. @DanB: A project is a collection of snippets. Just kidding! :)
    But in all seriousness, if you have any feedback on what your experience was like, as in "this particular idiom had to be rewritten this way" that would be useful.

    @Batman: Is this something you've already tried? Are there cases that Delphi 2009 doesn't give you warnings, hints or errors for? If you have unit tests for all these entry points it should be a piece of cake to find any issues. ;)

    @Thorsten: 3rd parties will always be very special (in all aspects of the word). The SizeOf thing is interesting. I wonder why a hint or warning isn't thrown there. It should be easy to say "hey, use Length here"... I would love to see an article about your experience covering the gotchas you ran into.

  7. @Mark Andrews:
    If you look closely you will see that the constant in the variable declaration _is not the issue_ at all.

    The problem stems from the fact that it was a very common pattern in the past (pre-D2009) to use SizeOf for getting the size of a buffer (specifically to _avoid_ using a fixed constant in multiple places). This always worked perfectly because the size in chars (expected by the API function) matched the size in bytes (returned by SizeOf). It doesn't work in D2009 because the size in bytes and the size in chars now differ (so using Length() would be the correct solution in this case, as Anders said).

    @Anders Ohlsson:

    The problem is that the compiler has no idea if that particular function wants the size in chars or bytes.

    Suppose you are doing something like:

    Move(Buffer[0], p^, SizeOf(Buffer));

    _here_ the SizeOf is correct. But how would the compiler know the difference between the 2 cases?

    So if you decided to have the compiler emit a hint, you would end up with false positives one way or another, which are going to cause just as many problems for people as not giving any hint at all.

    Also, this example is just scratching the tip of the iceberg. There are many, many more cases and patterns that require very close attention, but most of them are more specialized than this general example I've given.
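
    The two cases discussed above can be put side by side in a minimal sketch. It assumes Delphi 2009 on Windows; the program and wrapper names are made up for illustration, and GetModuleFileName stands in for "SomeAPI" because its size parameter is a character count.

    ```delphi
    program BufferSizeSketch;

    {$APPTYPE CONSOLE}

    uses
      Windows;

    // Hypothetical wrapper: GetModuleFileName takes its size in characters.
    function ModuleFileName: string;
    var
      Buffer: array[0..MAX_PATH - 1] of Char;
      Len: DWORD;
    begin
      // Length(Buffer) is the element count, so it stays correct whether
      // Char is one byte (D2007) or two bytes (D2009).
      Len := GetModuleFileName(0, Buffer, Length(Buffer));
      SetString(Result, Buffer, Len);
    end;

    var
      Source, Dest: array[0..63] of Char;
    begin
      Writeln(ModuleFileName);

      // Move wants a byte count, so SizeOf is the right call here.
      FillChar(Source, SizeOf(Source), 0);
      Move(Source[0], Dest[0], SizeOf(Source));

      Writeln('Length: ', Length(Source)); // elements
      Writeln('SizeOf: ', SizeOf(Source)); // bytes
    end.
    ```

    Both calls compile cleanly either way, which is the point: only the documentation of the callee says whether it wants characters or bytes.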

  8. @Thorsten: Great 2nd example, and you're right, the compiler wouldn't know without knowing the intent of the code, which of course is impossible. I would love to see more examples!

  9. I find your 99% statement to be quite optimistic.

    There is a mix of snippet corner cases (like Thorsten's) to be dealt with, and there is the sheer size issue (like DanB). We have more than 1 million LOC there, it took months to get them to compile without warnings, it'll take months still before all the changes have been tested & stressed enough to be called "stable".

    I guess that 99% figure may be true for code that never dealt with more than ASCII before Delphi 2009, and doesn't deal with more than ASCII outside of demos after D2009, as IME that 1% can be awfully expensive to fully sort out (but we needed Unicode, so we don't complain).

    An issue we've been battling at some point was with the compiler trying to be smart about figuring out the type of constant strings and characters on its own (and failing miserably at it). This led to some constants being replaced by raw byte arrays and UTF8 constants entered as hex strings, just to be sure the compiler wouldn't turn the Euro symbol f.i. into something else depending on the machine the code was compiled on (even if the source was a Unicode file). Yes, they were QC'ed.

  10. @Eric: Apart from the bugs you QC'd (thanks), how could the IDE be improved to help your migration experience?

  11. I have the task of conversion to 2009 still in front of me.

    A big help would be a collection of hints/gotchas/Best practices when you convert Delphi projects to Unicode.

  12. @Christian: Does this help?

    Marco's white paper - http://edn.embarcadero.com/article/38980
    Nick's series of articles - http://blogs.embarcadero.com/nickhodges/2008/11/20/39149

  13. Personally, I found the biggest issue was the tens of thousands of lines of third party code where people freely mixed PAnsiChar and PChar. Of course, that was poor style before and is dead wrong now. I just gave up porting until we were able to get updates for all the third party stuff.

    We have a lot of C++ code that we interface with (similar to Batman's scenario I suppose) but all of that was in a specific set of files in a specific package, so most of the work was done with simple Find-Replace. And Find in Files and strong coffee is all you'd need for the SizeOf problem before.

    I've had bigger issues migrating to Delphi 4, and Delphi 6's platform hints are still switched off, so I honestly think the migration of our several million lines went quite well.

    One thing that bugged me endlessly: replacing set compares with CharInSet. It seems like a change that is always done in the same way under the same circumstances. Compiler magic anyone?
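
    For readers who have not hit this yet, the mechanical rewrite looks roughly like the sketch below. It assumes Delphi 2009, where CharInSet lives in SysUtils; IsDigitChar is a made-up helper for illustration.

    ```delphi
    program CharInSetSketch;

    {$APPTYPE CONSOLE}

    uses
      SysUtils;

    // Hypothetical helper showing the before/after of a set comparison.
    function IsDigitChar(C: Char): Boolean;
    begin
      // D2007 idiom, which draws a compiler warning in D2009 because a
      // WideChar is reduced to a byte char in the set expression:
      //   Result := C in ['0'..'9'];
      Result := CharInSet(C, ['0'..'9']);
    end;

    begin
      Writeln(IsDigitChar('7'));
      Writeln(IsDigitChar('x'));
    end.
    ```

    The set itself is still a set of AnsiChar, so this only helps for characters that fit in one byte.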

  14. Has anyone tried converting the TurboPower Orpheus libs, esp. the text editor?

  15. @Ken: Some TurboPower update work was done here:
    http://www.songbeamer.com/delphi/

    Paul Breneman keeps some TurboPower news here:
    http://www.turbocontrol.com/TPSupport.htm

    @Anders: While we're on the topic, the TurboPower libraries would make a great migration case study:
    http://qc.embarcadero.com/wc/qcmain.aspx?d=71972

  16. I'm pretty sure that there are many projects which will not be convertible to Unicode seamlessly.
    I'm still waiting to go for D2009 for one of my main projects and can't because one of the components is still not available for D2009.

    The author of this component still seems to have big troubles to convert his product for D2009 because of some Unicode annoyances. I'm talking about Contextsoft's DB Designer and DB Extension.
    Two really excellent products I really like but it seems there are some problems in getting a D2009 compatible version out. My project depends on DB Extension and I simply can't drop it.

    So I'm still waiting and can't really use D2009 for over half a year now.

    Michael

  17. Has anyone tried to convert Bold to D2009?

  18. Hi,
    I would like to know if the solution I conceived to a problem (I encountered yesterday) is considered "best practice".

    The code interfaced with WinINet.dll

    StrStr := TStringStream.Create(stringparam);
    .
    . Some StrStr manipulation
    .
    HttpSendRequest(Request,nil,0,@StrStr.DataString[1],Length(StrStr.DataString))

    It worked with Delphi 2007 but in 2009 it produced wrong result on the server. So I did the following

    StrStr := TStringStream.Create(stringparam);
    .
    . Some StrStr manipulation
    .
    tmpStr := AnsiString(StrStr.DataString);
    HttpSendRequest(Request,nil,0,@tmpStr[1],Length(tmpStr))

    What do you think?

    Thanks,
    Yoni.

  19. @Yoni: If a PAnsiChar around the @StrStr.DataString[1] doesn't do it, then I would assume your solution is the one to go with.

  20. @Bruce: It sure would... I downloaded AsyncPro and migrated it to Delphi 2009. It's about 145,000 LOC... After a couple of hours and 566 edits, I have it compiled and installed. 516 warnings still... I haven't got the slightest idea how to test it though... Trying the FTP example now.

  21. >> I believe that 99% of all of your Delphi code migrates
    >> to Unicode simply by hitting the build button in Delphi 2009,
    >> and that the rest is handled by hints, warnings and errors.

    This statement is so arrogant and patronising, it really pisses me off. I have been developing and maintaining a tri-lingual application since the very first version of Turbo Pascal. And due to the lack of Unicode support, I have been forced into hundreds of tricks and kludges (as well as using third party controls) to keep it working. The application is around 200k LOC and frankly I don't even dare to touch it with D2009.

    Furthermore, perhaps you would admit to us how many man hours of recoding went into converting the Delphi IDE to unicode? The IDE is by far not an international application (indeed it is about as pure American as you could get); I would bet good money that you did not just press F9 and go...

  22. @Anders Ohlsson: Sure, it may be a piece of cake like you said; nevertheless, it's a modification we have to make, and therefore time-consuming. We have a lot of code to revisit. I suppose we could just apply a batch process to turn all String, Char, etc. into AnsiString, AnsiChar, etc., or redeclare these types as Ansi globally, since we don't care about Unicode, at least for now. I think we all agree that a modification has to happen to make our code work, and a lot of testing too. Another, much simpler option is to stay with D2007, but at the same time we like the new VCL look and the components available only in D2009.

  23. >Apart from the bugs you QC’d (thanks), how could the IDE be improved
    >to help your migration experience?

    The first thing would be to have an option to disable the compiler "smart guessing" of string and character constants: no guessing, just use the specified or default types.
    For instance, an unqualified string should be a Unicode string. Always. An unqualified character should always be a UCS-2 character, not sometimes UCS-2 and sometimes ANSI depending on the character.
    Similarly, a UTF8String should be UTF8, always, no questions asked, etc.
    Fixing the CharInSet kludge by extending 'in' to 16bits would also help.

    Also eliminate kludges (TStringStream, I'm looking at you), a String should be a unicode string always, all the time.
    Taking the encoding variability out of String, and having it only for AnsiString would be nice too.
    Getting warnings for all the implicit automagic conversions for constants strings/characters would be a boon (even if they're safe).

    That would eliminate the guessing and head scratching on runtime behaviors, and introduce predictability: currently there are situations where looking at the code is insufficient to know what will actually happen.

    As for the IDE itself, first-class support for source files encoding would be nice, if a source is UTF8, make it obvious in the editor (in the status line f.i.), and properly support all international characters in the IDE (some unicode characters will garble the code editor view, there is at least one QC for that), you could also take care of the clipped characters at the end of an italic line while you're at it.

  24. Thorsten, here's what I mean:

    function SomeAPIWrapper: string;
    const
      NumChar = 64;
    var
      Buffer: array[0..NumChar-1] of Char;
    begin
      SomeAPI(Buffer, NumChar);
      Result := Buffer;
    end;

    I accept your comment about the use of SizeOf and should have been clearer. SizeOf is a common pattern, but not one that I use for the above reason. That's what I meant about programming defensively. Always assume the other guy is out to ruin your code :-)

  25. Andreas Hausladen, May 5, 2009 at 5:02 PM

    @Eric: "you could also take care of the clipped characters at the end of an italic line while you’re at it."

    I don't see any letter clipped in an italic line in Delphi 2009. I see this in Delphi 7 but not in Delphi 2009 (Courier New, ClearType)

  26. @AndrewFG: Oops... The 99% is a typo. It should be 90%. Sorry if it's pissing you off. That wasn't the intent.

    I understand if you don't have the extra time to make a copy of your application and test out a migration, but don't let FEAR hold you back.

    Of course some effort went into Unicode enabling the IDE. It's one of the most WinAPI intensive applications out there. Some of the code is 15 years old.

    To your point about not being international - we sell English, French, German and Japanese versions...

    @Batman: I would not recommend a global search and replace, and certainly not a redeclaration. That will come back and bite you.

    @Eric: Great feedback. Please QC all of those if you haven't already. The only issue I've seen with the editor is with Thai characters - Thai strings work as they should at runtime, but they are not showing up properly in the code editor.

    @Mark: Trying to foresee the future is always a good thing.

  27. Ironically, it is the projects that were "Unicode enabled" prior to D2009 using WideStrings, hard-coded W versions of the Windows API, and 3rd party unicode GUI controls, that are proving to be far more work to port.

    Since so much of the VCL was unusable before (because it made ASCII assumptions), these projects re-invented much of the VCL functionality (just one simple example: ini-file handling). With less work, we could just fix these now-redundant functions (and gui components), but the right-way long-term is to replace them with built-in VCL functionality, which requires a ton of development and testing resources.

    All that said, I am VERY happy that Unicode support finally made it in (just like our current anticipation for 64 bit, and cross-our-fingers support for cross-platform compilation), it is just far more work than naive comments by those working with small codebases seem to imply.

  28. Are there any guidelines or best practices for migrating our code to Unicode?

    We have tried to build our D5/D7 projects on the D2009 trial. OMG, we don't even know where to start the migration, and we don't know how much time it will cost. We know the migration is the right thing, but we don't know how to do the thing right.

    Would you consider taking some such projects, especially multi-lingual projects, migrating them, and giving us some guidelines for doing the thing right?

  29. @yc: Does this help?

    Marco’s white paper - http://edn.embarcadero.com/article/38980
    Nick’s series of articles - http://blogs.embarcadero.com/nickhodges/2008/11/20/39149

  30. Wonderful post!
    I found the conversion not hard but boring: scanning through millions of lines of code for SizeOf etc.
    I especially loved Cobus Kruger's comment: “so most of the work was done with simple Find-Replace. And Find in Files and strong coffee is all you’d need...” so TRUE (at least for me).
    Anyway for my question:
    In the documentation for GetLongPathName it states:
    “In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend "\\?\" to the path...”
    Does that mean that when I call the above function I need to give it a buffer of 32,767 characters?
    What about calls to GetModuleFileName? Do they need such a big buffer too?
    I’m sure I misunderstood this documentation, so if you can help....
    Thanks
    BH

  31. Has anyone tried to port...

    http://snoopspy.springnote.com/pages/592081/attachments/936818

    It's a WinPcap wrapper, so lots of nice pointers to play with.

    Pretty sure this is an example which will not be click and compile.

    Regards
    Andrew

  32. Wouter van Nifterick, May 14, 2009 at 1:01 PM

    This compiles in older versions of Delphi, but D2009 tells me that the data type is too large: "exceeds 2 GB". Ran into this one several times.

    type TextAr=Array[0..MaxInt-1] of Char;

    I wonder what happens with a 64bit version of Delphi :-)

  33. From AsyncPro...

    TOBuffer = array[0..pred(High(Integer))] of Char;

    Ugh...
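
    One commonly used way around the "exceeds 2 GB" error for this kind of overlay type is to derive the upper bound from SizeOf(Char), so the byte size of the type stays under the limit at either character width. A minimal sketch, assuming the type is only ever used through a pointer and never allocated (the PTextAr name and the test string are made up for illustration):

    ```delphi
    program BigArraySketch;

    {$APPTYPE CONSOLE}

    type
      // Upper bound scaled by the character size, so the total byte size
      // no longer doubles when Char becomes two bytes wide.
      TextAr = array[0..(MaxInt div SizeOf(Char)) - 1] of Char;
      PTextAr = ^TextAr;

    var
      S: string;
      P: PTextAr;
    begin
      S := 'Unicode';
      // Overlay the array type on the string's characters for indexed access.
      P := PTextAr(PChar(S));
      Writeln(P^[0], P^[Length(S) - 1]);
    end.
    ```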

  34. See the thread: Unicode incompatible types error .... using winsock.pas shown at: https://forums.codegear.com/thread.jspa?threadID=17676&tstart=0


    Thanks.

  35. I myself make use of TPApro & TPSystool in most of my projects. If I see those migrated to D2009, well... but I bet it ain't just a recompile, for sure. Maybe you should take over those nice libraries that have followed Delphi from the beginning and include them in the product, since it's as easy as recompiling ;)

  36. Like many developers, I would love to upgrade to Delphi 2009. However, I am reluctant to do so because I have developed a very large Delphi application that is highly dependent on third party utilities like SysTools, Orpheus, InfoPower, and ChartFX. The TurboPower utilities are especially problematic because they are effectively orphaned. Is there any thought at Embarcadero of OFFICIALLY AND ACTIVELY assisting those developers who are willing and able to modify these utilities so that they can be run in Delphi 2009? Alternatively, could someone at Embarcadero take on this responsibility?

    I've talked to many other Delphi developers who are in the same situation. We are all holding off on the purchase of Delphi 2009 until this problem is resolved. I'm also concerned that these same third party utilities will not be compatible with the 64-bit Delphi compiler when it is released.

    Thanks for hearing my comments. I really hope that there is some way the Delphi community is able to resolve this problem. When there is a resolution, I will be one of the biggest and most vocal proponents of Delphi 2009.

  37. This guided me to complete my job with ease :D

  38. I had to do the same trick as in Yoni's post on:
    May 5th, 2009 at 10:32 am
    Casting as PAnsiString does not work. Can anyone explain why the tmp AnsiString type variable is necessary?
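
    One likely explanation, offered as a sketch rather than a diagnosis of the code in question: in Delphi 2009 a string holds UTF-16 data, and a pointer cast only reinterprets those bytes, while AnsiString(...) actually converts them. The program below assumes Delphi 2009 on Windows; all names in it are made up for illustration.

    ```delphi
    program AnsiCastSketch;

    {$APPTYPE CONSOLE}

    var
      U: string;          // UnicodeString in D2009: two bytes per character
      A, B: AnsiString;
      P: PAnsiChar;
    begin
      U := 'abc';

      // Pointer cast: no conversion happens, the UTF-16 bytes are simply
      // read as ANSI. The high byte of 'a' is zero, so the data looks
      // like a string that ends after its first character.
      P := PAnsiChar(Pointer(U));
      A := P;
      Writeln('Reinterpreted length: ', Length(A));

      // String cast: the characters are converted to the ANSI code page,
      // giving one byte per character that a byte-oriented API can use.
      B := AnsiString(U);
      Writeln('Converted length: ', Length(B));
    end.
    ```

    Holding the converted value in a variable also keeps the bytes alive for the duration of the API call, instead of pointing into a temporary.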

