State of the union, version 1.2
With SyncEvolution 1.2 released and work on 1.3 under way it is a good time to take a small break and reflect on the state of the SyncEvolution project.
This section introduces the key ideas, some of the supported protocols + backends and how they work in combination with specific peers.
The main purpose of SyncEvolution is the synchronization of Personal Information Management (PIM) data (contacts, events, tasks, notes). This intentionally does not include email and messages in general because those are sufficiently different to require different solutions. SyncEvolution provides real synchronization of PIM data:
- Items (= contact, event, ...) may have different properties and representations on both sides of a sync (heterogeneous environment).
- Items can be modified at any time, even while offline (more than just online access to a server).
- Also works between devices, without involving a server.
- A database (= a set of items) can be synchronized to one or more peers (other devices, servers). For example, the same address book on a desktop can be synchronized with a mobile phone and a laptop.
- SyncEvolution supports an arbitrary number of databases. However, many peers (in particular SyncML servers) are more limited and only provide access to one address book, calendar, etc.
For each peer, SyncEvolution needs to remember the previous state so that the next sync can be limited to only those items which have changed. The underlying assumption is that there is no cycle in the connections between multiple sync peers. For example, "phone <-> laptop, laptop <-> server, phone <-> server" is bad because a new item created on the phone will go to the laptop, from there to the server and then come back to the phone as a new item, resulting in a duplicate which then repeats the process ad infinitum. Each SyncEvolution instance only knows the peers it is set up to talk with and thus cannot detect the problem.
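The "no cycle" assumption can be checked mechanically if someone happens to know the whole topology, which an individual SyncEvolution instance does not. The following is only an illustrative sketch, not part of SyncEvolution: the function name `has_cycle` and the pair-based input format are assumptions made for this example. It uses a union-find structure over the undirected peer connections.

```python
def has_cycle(connections):
    """Return True if the undirected sync topology contains a cycle.

    `connections` is an iterable of (peer_a, peer_b) pairs (hypothetical
    input format). Listing the same connection twice also counts as a cycle.
    """
    parent = {}

    def find(node):
        # Each unknown node starts out as its own root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in connections:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both peers are already connected via some other path.
            return True
        parent[root_a] = root_b
    return False
```

With this sketch, the three connections from the example above are reported as a cycle, while dropping any one of them makes the topology acceptable.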
To break this cycle reliably, each item must have a unique ID that is assigned to it once when the item gets created. For ad-hoc synchronization to work without constraints on the topology, this ID must be supported by all involved peers. Unfortunately, calendar and contact data either doesn't have such an ID or it is not supported. Without such an ID, each request to add a new item must be checked against all existing items based on some key properties to find out whether it is a duplicate. This is slow and error-prone ("Is 'Doe, John' the same person as 'John Doe', or is one the father and the other a child or namesake?").
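To make the difference between the two approaches concrete, here is a small sketch that first tries an exact match on a unique ID and only then falls back to comparing content. The dictionary layout (`uid`, `name`) and the name normalization are assumptions chosen for this example; they are not how SyncEvolution or the Synthesis engine actually compares items. The fallback deliberately shows the weakness described above: it cannot tell a father from a namesake.

```python
def normalize_name(name):
    """Map 'Doe, John' and 'John Doe' to the same key (a crude heuristic)."""
    parts = name.replace(",", " ").lower().split()
    return tuple(sorted(parts))


def find_duplicate(new_item, existing_items):
    """Return (matching_item, how) where how is 'uid', 'content' or None."""
    uid = new_item.get("uid")
    if uid:
        # Reliable path: the ID was assigned once when the item was created.
        for item in existing_items:
            if item.get("uid") == uid:
                return item, "uid"
    # Fallback: compare key properties of every existing item (slow, error-prone).
    key = normalize_name(new_item["name"])
    for item in existing_items:
        if normalize_name(item["name"]) == key:
            return item, "content"
    return None, None
```

A content match found this way is only a guess, so a real implementation would probably want to compare more properties before merging.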
TODO "ad-hoc synchronization between SyncEvolution instances":
- Teach Evolution Data Server to create and preserve a unique ID for contacts (the vCard 3.0 UID gets overwritten at the moment).
- Improve the SyncEvolution<->SyncEvolution sync such that
- it uses the unique ID to speed up sync between peers which sync against each other for the first time and
- checks the ID to find duplicates in following syncs.
TODO "ad-hoc synchronization between arbitrary peers":
- Do the duplicate detection based on item content for each add request.
SyncEvolution has an extensive test suite which is run regularly (used to be nightly, currently triggered manually). It contains unit tests for various aspects of SyncEvolution and does real interoperability testing with different SyncML, CalDAV and CardDAV servers. See the SyncEvolution 1.2 test report for an example. More CalDAV/CardDAV servers were added in the 1.3 branch.
Keeping this testing going and analyzing/reporting problems is an on-going activity. It is needed to avoid regressions and achieve higher quality in the releases, which are built as part of these test runs. There are additional ideas.
TODO "improve nightly testing"
- Only the sync code is currently checked under valgrind. Also run syncevo-dbus-server under valgrind.
- Run tests in MeeGo (or soon Tizen) chroot.
- Automatically integrate and test branches which are considered ready for integration.
- Test syncevo-http-server. Could do suspend/resume tests against that.
SyncML + Engine
The Synthesis engine which provides SyncML is one of the best in the industry. It has extensive support for data modeling/conversion and supports suspending a session and resuming it later.
The same engine is also used to synchronize between two backends internally. This is how synchronization was added for CalDAV/CardDAV, protocols which themselves only provide online access. This works reasonably well, but there are also quite a few limitations.
- make data conversion available outside of a sync session
- decouple sync engine from SyncML and SyncML message encoding/decoding
- support items which are a set of items: important for CalDAV + ActiveSync, because those combine all VEVENTs with the same UID in one item; certain transformations cannot be done in the engine at the moment because they depend on access to all related items at once and that is not how they are handled at the moment
- more flexible session handling: instead of sending changes in one direction, then back and then stopping, allow the session to continue until both sides are in sync; required for CalDAV where storing an event might lead to further changes that have to be sent back
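The "set of items" idea from the list above can be illustrated as follows: all events with the same UID are collected into one merged item, with the parent event first and the detached recurrences after it. This is a simplified sketch; the dictionary keys `uid` and `recurrence_id` are stand-ins for the iCalendar UID and RECURRENCE-ID properties, and the function is not taken from the SyncEvolution or Synthesis code.

```python
from collections import defaultdict


def merge_by_uid(events):
    """Group events by UID; the parent (no recurrence_id) sorts first."""
    groups = defaultdict(list)
    for event in events:
        groups[event["uid"]].append(event)

    merged = {}
    for uid, members in groups.items():
        # Parent events have no recurrence_id; detached recurrences are
        # ordered by their recurrence_id string.
        members.sort(key=lambda e: (e.get("recurrence_id") is not None,
                                    e.get("recurrence_id") or ""))
        merged[uid] = members
    return merged
```

A transformation that needs the parent and all of its detached recurrences at once could then operate on one entry of the returned dictionary.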
TODO "push sync":
- react to local or remote changes immediately (instead of polling at long intervals) and/or
- make sync sessions without changes more efficient (in particular when polling)
TODO "better credential handling"
- When creating multiple configurations which need the same credentials (Google CalDAV and SyncML, for example), the username/password needs to be set separately. Replace with a mechanism where both configs only contain a pointer to shared credentials.
- Also support an external system component which does the authentication without ever returning username/password to SyncEvolution. Depends on having such a component.
This could be used to use a CalDAV/CardDAV server as backend for a SyncML server.
- In the backend, check databaseuser/password before falling back to the context's username/password.
- Support shared credentials, to avoid having to configure them in each source.
At the moment there are situations where SyncEvolution cannot determine what the right resolution for a failed synchronization is. It has a built-in backup mechanism and can ask the user for assistance, but ultimately it would be better to not bother the user. This can be achieved in some cases by simplifying the problem:
- The peer (most likely a service on the Internet) must be able to store all data.
- There is a separate local database for each database on the peer.
In that case it is acceptable to wipe out the local data and restart with the data stored on the peer.
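A "peer wins" recovery along these lines could look roughly like the sketch below. It models both databases as plain dictionaries, which is an assumption for the sake of the example, and it keeps a backup of the local data before wiping it, in the spirit of the built-in backup mechanism mentioned above.

```python
import copy


def recover_peer_wins(local_db, peer_db):
    """Replace the local data with the peer's data; return a backup of the old data."""
    backup = copy.deepcopy(local_db)
    local_db.clear()
    # Copy so that later local edits cannot alter the peer's data by accident.
    local_db.update(copy.deepcopy(peer_db))
    return backup
```

This is only safe under the two conditions listed above: the peer stores all data, and the local database is used for this one peer only.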
TODO "automatic error recovery"
- Implement the necessary policy in SyncEvolution ("peer wins").
- Check whether this interferes with libfolks (local IDs will change, data added by libfolks to a contact might get lost).
Works reasonably well and passes automated testing against a variety of servers (Apple Calendar Server, DAViCal, Google Calendar, Yahoo). But there are some known limitations, like meeting invitations being sent by both Evolution and the CalDAV server.
- Find a way to suppress sending of meeting invitations on the CalDAV server when Evolution already sent one, or
- suppress sending of meeting invitations in Evolution for calendars which are mirrored in a CalDAV server.
- When storing a meeting and/or when the server does not return an ETag, retrieve the possibly modified item from the server and store it locally. Right now, data which the server may have modified automatically gets ignored. Depends on the engine improvements mentioned above.
TODO "handle concurrent changes"
- Use ETags to avoid modifying more recent data on the server.
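The general technique is a conditional write: the client sends the ETag it last saw (the HTTP If-Match header), and the server refuses the write if the item has changed since then. The sketch below simulates this with an in-memory stand-in for a server; `FakeCollection` and `PreconditionFailed` are made-up names for this example and do not correspond to any SyncEvolution class.

```python
import hashlib


class PreconditionFailed(Exception):
    """Stands in for an HTTP 412 response."""


class FakeCollection:
    """In-memory stand-in for a WebDAV collection (illustration only)."""

    def __init__(self):
        self._items = {}  # path -> (etag, body)

    @staticmethod
    def _etag(body):
        return hashlib.sha1(body.encode("utf-8")).hexdigest()

    def get(self, path):
        return self._items[path]

    def put(self, path, body, if_match=None):
        current = self._items.get(path)
        # Refuse the write if the caller's view of the item is outdated.
        if if_match is not None and (current is None or current[0] != if_match):
            raise PreconditionFailed(path)
        etag = self._etag(body)
        self._items[path] = (etag, body)
        return etag
```

A client that gets the failure would have to fetch the newer item, merge or resolve the conflict, and retry with the new ETag.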
TODO "CalDAV attachments"
- At the moment, attachments are not supported at all and may even get lost. Need to preserve them and perhaps even support the more efficient "managed attachments" that are currently being discussed by the CalConnect consortium.
TODO "WebDAV: use sync extension"
- Each sync session must list the entire collection (= retrieve path names and ETag) to determine new and modified items. This can be done more efficiently by using the sync extension defined and implemented in the Apple Calendar Server.
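For comparison, the current full-listing approach boils down to diffing two path-to-ETag maps: the one remembered from the previous sync and the one just retrieved. A minimal sketch, with the function name and the dictionary representation being assumptions for this example:

```python
def diff_listing(previous, current):
    """Compare two {path: etag} maps and return (added, modified, removed) paths."""
    added = sorted(p for p in current if p not in previous)
    removed = sorted(p for p in previous if p not in current)
    modified = sorted(p for p in current
                      if p in previous and current[p] != previous[p])
    return added, modified, removed
```

The cost of this approach is retrieving `current` in the first place, which grows with the size of the collection even when nothing has changed; that is the part a sync extension would avoid.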
TODO "CalDAV tasks = VTODO"
- The CalDAV backend is limited to exchanging VEVENTs. It might also not handle collections well (or at all) that contain both VEVENTs and VTODOs.
This was not included in SyncEvolution 1.2 although it was already quite usable. Work on it is still going on. The main issue with ActiveSync is the limited data model specified as part of the protocol.
Contacts are only allowed to have a fixed number of certain phone numbers and addresses, which is a limitation that neither Evolution nor Google Contacts have. There is no good way to handle these limitations except educating the user about them, or enforcing the same constraints locally by modifying the app which creates and modifies contacts.
The ActiveSync calendar format does not support detached recurrences properly ("you are invited to a specific instance of a meeting series"), although Exchange internally does.
TODO "finish initial ActiveSync support"
- detached recurrences without parent:
- investigate receiving multiple detached recurrences in multiple items with the same UID in each (the Exchange workaround for the ActiveSync calendar format limitation): might break SyncEvolution
- implement the "stand-alone detached recurrence" support (either do it like Exchange does or better, create a fake parent event)
- test concurrent item changes while a sync runs (supposed to work, but without a test case it is hard to be sure)
[updated] TODO "ActiveSync performance improvement"
- Writing changes to the server is done one change at a time. Could be improved considerably by batching changes. Depends on core engine improvements.
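Batching itself is simple; the hard part is the engine support it depends on. As a rough illustration of the idea only, the helper below splits a list of pending changes into groups that could each be sent in one request. The function name and the notion of a fixed batch size are assumptions for this example, not something defined by activesyncd or SyncEvolution.

```python
def batches(changes, size):
    """Yield consecutive slices of `changes` with at most `size` entries each."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(changes), size):
        yield changes[start:start + size]
```

With five pending changes and a batch size of two, this results in three requests instead of five.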
TODO "ActiveSync push sync"
- enhance activesyncd and SyncEvolution (see engine above) to react to server-side changes with minimal delay
- At the moment, attachments are not supported at all and may even get lost. Need to preserve them. Not sure how to do it efficiently.
Overall Google Calendar syncing works well with CalDAV (I'm using it myself) as long as one does all meeting scheduling in the Google web interface. But there are some known issues, most of them on the server side:
- stand-alone detached recurrences cannot be accessed, SyncEvolution workaround only works for some cases
- meetings removed by SyncEvolution - not sure yet whether this is better fixed on the server or client side
- cannot update/delete meetings despite being the organizer: SyncEvolution is allowed to create an event, but then cannot update or remove it
- cannot remove detached recurrence: breaks the testLinkedItemsRemoveNormal tests
Works via SyncML. Google's support for SyncML is very incomplete (many properties not supported, for example birthday).
TODO "better Google Contacts support"
- Test ActiveSync and/or
- write backend based on Google Data Protocol - might be the best solution, because ActiveSync also has limitations (see above).
Apple Calendar Server, DAViCal
Done via CalDAV/CardDAV. No known issues with these peers.
Yahoo Calendar + Contacts
Works well, when it works; unfortunately the number of requests per 24 hour period is so limited that the tests cannot complete without running into a 503 "Service Unavailable" error.
TODO "Yahoo token authentication"
- Currently SyncEvolution uses normal HTTP authentication. Yahoo also supports another, token-based authentication mechanism for approved apps. Get SyncEvolution approved and implement that other authentication mechanism. Hopefully that'll avoid the 503 error.
SyncEvolution can synchronize against phones if those phones support SyncML via Bluetooth. This is a common feature among older feature phones but most (all?) Android phones and iPhones don't support it. Testing of this feature is limited and there is no list of phones which are guaranteed to work.
[updated] Synchronization with Android/iOS is possible by installing a third-party SyncML client (like the ones from Synthesis) and configuring the phones to use a SyncEvolution HTTP SyncML server.
TODO "contact sync via PBAP"
- Write a backend using PBAP, the only (?) protocol supported by Android and iOS for address book access. Not very good for real syncing, but at least one-way sync should be possible.
The GTK "sync UI" is the main user interface for SyncEvolution in MeeGo and Linux in general. It is included in the source and binary distribution archives. It supports configuring syncing against SyncML services (extensible via configuration templates) and phones which support SyncML. Configuring the latter is integrated into the GNOME Bluetooth applet, which invokes the sync UI. The UI also has "emergency recovery" support which allows the user to restore from the automatic backup and/or choose between different recovery operations after a failed sync:
- continue with local data
- continue with remote data
- try to merge both ("slow sync")
TODO "port to GTK3"
- The sync UI is currently using GTK2. It needs to be ported to GTK3. The goal is to put the GTK2 version into maintenance and continue with the GTK3 version.
- CalDAV/CardDAV and ActiveSync cannot be configured yet with the UI. In this context it becomes important to let the user choose local and remote databases. Would be useful to have, although there is a certain overlap with the Evolution integration.
This happens to be the main backend for storing data locally. Other backends could be supported just as well (there's nothing technical which favors Evolution) if there were developers motivated enough to implement and test them. KDE/Akonadi has come a long way, but seems less active (see KDE/Akonadi and Community below).
TODO "Evolution backend improvements"
- use new APIs in EDS 3.2/3.4 for more efficient change detection
- enable creating databases from inside SyncEvolution again (depends on EDS 3.4 (?))
- Evolution typically supports offline read access with some of its backends. It does not support write access. SyncEvolution addresses that, but needs to be configured and invoked separately. It would be nice to have seamless and transparent syncing from inside Evolution. ActiveSync is getting integrated like that at the moment -> extend that to CalDAV/CardDAV?
There is a generic file backend with a 1:1 mapping between a single item and a file in a directory. The format of the local data can be configured. vCalendar 1.0, iCalendar 2.0, vCard 2.1/3.0 and plain text notes are supported already.
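The 1:1 mapping between items and files can be pictured with the following sketch. It is not the actual SyncEvolution file backend (which is not written in Python); the class name `FileStore` and its methods are invented for this example, and the item's file name simply doubles as its local ID.

```python
import os


class FileStore:
    """Store each item as one file in a directory (illustration only)."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, luid):
        return os.path.join(self.directory, luid)

    def list_items(self):
        # The file names are the local IDs of the items.
        return sorted(os.listdir(self.directory))

    def read(self, luid):
        with open(self._path(luid), encoding="utf-8") as f:
            return f.read()

    def write(self, luid, data):
        with open(self._path(luid), "w", encoding="utf-8") as f:
            f.write(data)

    def delete(self, luid):
        os.remove(self._path(luid))
```

The store does not look at the content at all, which matches the idea that the data format of the files is a matter of configuration rather than of the storage layout.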
TODO "iCalendar 2.0 .ics file"
- Some people synchronize an .ics file by pointing the Evolution backend to it (no longer works after EDS removed support for the file:// URI) or manipulating the ~/.evolution/calendar data (a hack which only works when being very careful). Write a SyncEvolution backend which reads/writes an iCalendar 2.0 file using libical.
The Akonadi backend is available in the source code. Support for KWallet instead of GNOME keyring is also there. A KDE GUI is in development.
TODO "test and release binaries"
- nightly builds need to be reconfigured to enable building the KDE support
- nightly testing needs to include KDE backends, both for unit testing the backends and real combinations with peers; it depends on the KDE developers to do something if issues are found in those tests (if any are found)
Manipulating PIM data
SyncEvolution is also a very capable tool for manipulating databases via the command line. It can list, import/export and delete items. Converting between formats (for example, mirroring Evolution contacts as files in vCard 2.1 format) is possible by setting up synchronization with a file backend as peer.
TODO "improve item manipulation"
- the command line syntax is a bit awkward and depends on dummy configurations: make config+source names optional
- convert between formats (depends on libvxx engine improvements)
Lots of users. The feedback from users is often very helpful for improving the software.
Not so many developers, though, except those paid to work on SyncEvolution. Ove Kaaven (port+UI for Maemo/N900 and MeeGo Harmattan/N9/N950) and Frederik Elwert (Genesis UI) are the notable exceptions. Sascha Peilicke/Dinesh/Rohan Garg have done some work for KDE/Akonadi, but are not very active in the SyncEvolution project itself.
I can only speculate about the reasons for the lack of external contributors:
- The number of people who care about PIM sync and storage is small to start with. It's not one of those sexy areas that gets a lot of attention, although it is arguably very important.
- OpenSync still seems to be considered "the" open source sync solution, despite not having a stable release available or even in sight anytime soon.
- Perhaps developers also get the impression that problems will be solved anyway, without having to get involved. That is true for some aspects, but definitely not for all.
- SyncEvolution development has a certain learning curve (although Ove and Franz Knipp managed to write their backends with very little assistance) and often happens at a rapid pace, which makes it hard for people to contribute small improvements.
- SyncEvolution was focused on Evolution and SyncML initially, which might make it seem too limited in scope for some use cases. But the scope has already increased and will increase further, so now would be a good time to check it out again.
Many of the TODOs above will not get worked on unless some external developer picks up the mantle and contributes patches. I'd be more than happy to help someone get started. If there is interest, I can also tag bugs in Bugzilla as "easy fixes". Right now I am not doing that because it would create additional work which would not be justified if there is no-one interested - I know, chicken and egg...