  • Ensure that liveconfig has been enabled by looking for the string "LIVECONFIG=true" in the startup log. It's best to enable it when calling runAssembler
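Enabling liveconfig at assembly time can be done with runAssembler's -liveconfig switch. A minimal invocation sketch (the EAR and module names here are placeholders; substitute your own):

```sh
# Assemble a standalone, liveconfig-enabled EAR.
# "MyApp.ear" and "MyStore.Commerce" are hypothetical names.
runAssembler -standalone -liveconfig MyApp.ear -m MyStore.Commerce
```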
  • Consider enabling liveconfig in all remote pre-production environments
  • The EAR should be deployed to all remote pre-production environments in standalone mode (look at server startup for "standalone=true")
  • Verify that all of the data source components (for instance, /atg/dynamo/service/JTDataSource) have a class of atg.nucleus.JNDIReference or atg.service.jdbc.WatcherDataSource. If atg.service.jdbc.WatcherDataSource is used, logging should be disabled. The class atg.service.jdbc.MonitoredDataSource should never be used
  • Ensure that loggingDebug is disabled for all components. After a load test, search the logs for "**** debug"
  • Disable OOTB ATG Logging (debug.log, info.log, warning.log, error.log), as these are redundant and logged into the application server's stdout log. Create the property localconfig/atg/dynamo/service/logging/LogQueue.properties with "logListeners^=Constants.null" (no quotes)
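The resulting properties file is a one-liner:

```
# localconfig/atg/dynamo/service/logging/LogQueue.properties
logListeners^=Constants.null
```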
  • [ATG 10+] Disable the OOTB dynamo.log, as it is also redundant. This is accomplished with the Java argument: -Ddisable.atg.dynamo.log=true (see MOS Article: https://support.oracle.com/rs?type=doc&id=1362731.1)
  • Check that SQLRepositoryEventServer starts up properly if distributed caching is being used. Look at the startup log
  • Ensure that selective cache invalidation is properly enabled. Test it thoroughly
  • Ensure that ServerLockManager is not running on an instance that also uses DAF.Deployment (for instance CA, or Search Admin). To find out which modules are running, look at the "Running Applications" page on /dyn/admin. To see if ServerLockManager is running, grep the startup logs for "ServerLockManager"
  • If locked caching is enabled, ensure that two ServerLockManagers are running per commerce cluster - a primary and a backup. All instances in the cluster should point to the same two ServerLockManagers
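Pointing every instance in the cluster at the same primary/backup pair is done through the ClientLockManager component. A sketch of the wiring, assuming hypothetical hostnames and the default lock server port:

```
# localconfig/atg/dynamo/service/ClientLockManager.properties
# Hostnames and ports below are illustrative assumptions.
useLockServer=true
lockServerAddress=lm-primary.example.com,lm-backup.example.com
lockServerPort=9010,9010
```

Every instance in the commerce cluster should carry an identical copy of this configuration so failover behaves consistently.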
  • Verify that URL rewriting is being handled: all links (including document.location.href) should have ;jsessionid=x appended for cookieless users. <dsp:a tags append jsessionid automatically; standard <a href tags do not. If URL rewriting is not done properly for all links, cookieless users should be prevented from accessing the site. Otherwise search engine bots, which do not use cookies, will end up creating hundreds of thousands of sessions with each crawl
  • Check that at least two global scenario server (GSS) instances are running per environment. The number of GSS instances is highly dependent on the number and profile of global scenarios. These GSS instances should ideally not be running on an instance that handles end-user sessions. Larger environments (> 100 instances) only need a handful of GSS instances
  • Ensure that all important/relevant patches, fixpacks and hotfixes have been applied. Log in to support.oracle.com, click on the "Patches & Updates" tab, click on the "Product or Family (Advanced)" sub-tab, enter "Oracle ATG Web Commerce" in the box, select your major product version below (e.g. 10.x, 10.1.x, etc) and hit the "Search" button
  • Ensure that only one process editor server (PES) is running per cluster. If a second process editor starts up, it could replace all of the currently running scenarios with the new ones (if there are any - sometimes there aren't). This can cause major problems. Run the query "select distinct s.machine from v$session s where s.USERNAME = '<CORE_SCHEMA>'" against your Oracle DB to find all boxes connected. Run "select count(*) from v$session s where s.USERNAME ='<CORE_SCHEMA>' and MACHINE='<HOST>';" for each result to count how many ATG instances are connected. On each host, run "ps -ef | grep java | grep atg" to find each ATG process. Go to /dyn/admin/nucleus/atg/scenario/ScenarioManager on each instance. The processEditorServer flag should be false on all but one instance
  • If repository items are imported, be sure to reset the seeds of the affected idspaces appropriately (das_id_generator.seed and das_secure_id_gen.seed). There should be no risk of the ID generators handing out an ID that's already in the database
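A seed reset is a simple UPDATE against the generator tables. A hedged example, assuming an id space named "order" and an arbitrary target value; pick a seed safely above the highest imported ID for that space:

```sql
-- Illustrative only: id space name and seed value are assumptions.
UPDATE das_id_generator
   SET seed = 2000000
 WHERE id_space_name = 'order';

-- Repeat the same check/update against das_secure_id_gen
-- for any secure id spaces affected by the import.
```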
  • Check through logs from load tests looking for poor transaction demarcation (grep the logs for "atg.dtm.TransactionDemarcationException"). If poor transaction demarcation is found, it needs to be fixed
  • Ensure that nobody changed the $class of /atg/dynamo/transaction/TransactionManager unless there was a good reason to do so. ATG sets this automatically and it should generally not be overwritten
  • When using BigEars (where you provide -Datg.dynamo.modules=x to the JVM), ensure that all modules are started in order from more generic (left) to more specific (right). Verify using this JSP
  • Ensure that static 404, 500, etc pages have been created and are working. Enter gibberish URLs (when connected through the load balancer) and see what happens
  • Verify that the default passwords have been changed for all accounts in das_account
  • Verify that the default passwords for all users in /atg/userprofiling/InternalProfileRepository have been changed
  • Check that $DYNAMO_HOME/localconfig and $DYNAMO_HOME/home/servers/*/localconfig are empty and non-writable to developers. Developers on a dev box can add seemingly innocuous settings here that end up resulting in major issues later because these directories are rolled up into EARs that get deployed to all environments
  • Verify that startup logs are clean. There shouldn't be errors
  • Make sure that the ATG performance monitor has been enabled during load tests (and only during load tests) and its output studied
  • Ensure that the proper ATG repository cache modes are being used. See http://docs.oracle.com/cd/E23095_01/Platform.93/RepositoryGuide/html/s1003cachingmodes01.html. Avoid using locked caching where possible. Consider having someone from Oracle review the settings
  • Ensure that ATG repository cache size tuning has been performed. If the caches are full, the heap should not be full. See this document for more information
  • Verify that adminPort, drpPort, httpPort, rmiPort, siteHttpServerName, siteHttpServerPort properties of /atg/dynamo/Configuration have been properly set on an instance-by-instance basis. Each JVM should have its own unique ATG server (-Datg.dynamo.server.name=X where X = /export/stor07/Oracle/Middleware/user_projects/domains/customer_domain/ATG-Data/servers/X)
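Per-instance settings live in that server's localconfig. A sketch for one hypothetical server named "commerce01" (all port values below are illustrative; each JVM needs its own unique set):

```
# <ATG-Data>/servers/commerce01/localconfig/atg/dynamo/Configuration.properties
# Port values are assumptions; assign unique ports per JVM.
httpPort=8080
drpPort=8850
rmiPort=8860
adminPort=8870
siteHttpServerName=www.example.com
siteHttpServerPort=80
```

The matching JVM would then be started with -Datg.dynamo.server.name=commerce01.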
  • Ensure that the catalog maintenance service (CMS) is running on one instance in the commerce cluster. The unessential functions should be removed. See http://docs.oracle.com/cd/E23095_01/Platform.93/ATGCommProgGuide/html/s0502batchservices01.html
  • Ensure that all System.out.println statements are removed from the Java code. Be careful when removing them, as code could break (e.g. Iterator.next() calls)
  • Check that the ScenarioManager, WorkflowProcessManager (in CA, CSC, Search Admin etc), InternalScenarioManager (in CA, CSC, Search Admin, etc) all start up properly. A successful startup message looks like "22:00:17,250 INFO [ScenarioManager] Initializing Process Editor Server X:20150." If you see a message like "Initializing individual process server X:8851. Current configured Process Editor Server is set to X:8851", ensure that the value configured in <server-name> for scenarioManager.xml on all cluster instances is equal to the value of /atg/dynamo/service/ServerName.serverName on the designated PES/WPM/ISM instance and restart
  • If price lists are being used and the number of lists == 1, consider setting /atg/commerce/pricing/priceLists/PriceListManager.usePriceCache=false
  • If custom catalogs are used, ensure that DCS.DynamicCustomCatalogs is not running on the agent-side. Verify using the "Running Applications" page on /dyn/admin
  • If internationalized content is stored in repositories backed by Oracle, make sure that all such repositories have useSetUnicodeStream=true. Don't forget the _production and _staging repositories. Put in GLOBAL.properties
  • If using SQLServer, ensure that useSetUnicodeStream=false for all repositories. Put in GLOBAL.properties
  • Look through log files for sensitive data. For instance, make sure nobody printed out a user's credit card number, expiration date, and CVV2 number in a logInfo(). There should be no personally identifiable information contained in logs, with the exception of session IDs
  • Verify that "<distributable>" has been added to the web.xml of each web app using session failover or that -distributable is being passed to runAssembler
  • Ensure that the taglibs (specifically the DSP taglib) in the custom web apps match the patch level that ATG is at. When applying ATG patches, the taglibs in the custom web apps often have to be manually updated
  • Verify that JavaScript includes have the defer="defer" attribute where possible. If that attribute is not present, the entire JavaScript file must be downloaded (by itself; nothing else may be downloaded in parallel) and fully parsed before client-side downloading/rendering can continue. Where possible, the defer attribute should also be used for 3rd party libraries. See http://www.schillmania.com/content/entries/2009/defer-script-loading/. There should always be a way to disable JavaScript includes quickly and at runtime
  • Be sure that there are no plans to use the ACC in production unless it is absolutely required
  • Make sure that PMDL caching is enabled. maximumCacheEntries, maximumEntryLifetime, maximumEntrySize and maximumCacheSize of /atg/commerce/pricing/PMDLCache should all be set to a large value
  • Check that idspaces in das_secure_id_gen and das_id_generator don't have numeric prefixes or suffixes. You could run into primary key constraint violations otherwise (e.g. prefix of "1" and seed of "001" - you'll run into a conflict with ID "1001")
  • Ensure that all of the batch sizes in das_secure_id_gen are > 997. The batch sizes can default to 7, which leads to heavy contention on specific records in das_secure_id_gen. That can lead to a cascading failure of a production environment
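A quick way to audit this is a query against the generator table (column names per the standard das_secure_id_gen schema):

```sql
-- List id spaces whose batch size is dangerously small.
SELECT id_space_name, batch_size
  FROM das_secure_id_gen
 WHERE batch_size < 997;
```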
  • Be sure that all comments are JSP comments instead of HTML comments. HTML comments can give hackers valuable information about the construction of your site
  • Be sure that only forms containing non-sensitive data use HTTP GET. All sensitive data should be submitted using HTTP POST
  • Ensure that all sensitive data is always sent over HTTPS. With the exception of a user's HTTP session ID, nothing personally identifiable should be sent over HTTP
  • Ensure that all components (e.g. JavaScript, CSS, images) of pages sent over HTTPS are accessed over HTTPS
  • If transient users or orders are persisted, make sure the consequences are well understood
  • Consider using repository cache groups, which will cut down on the number of individual queries to the database. When cache groups are not used, the contents of each auxiliary table are retrieved using individual SQL queries, as opposed to using one join
  • Consider changing request-handling thread names. This greatly aids with troubleshooting. See MOS Note 1486700.1 - Improve Thread Names with ThreadNamingPipelineServlet to Aid in Hung Thread Analysis for more information
  • Ensure that the browsers used for accessing all internally-facing applications comply with Oracle's supported environments matrix. The browser versions do matter, especially with the liberal use of Flex, JavaScript and other front-end technology
  • Consider using the transaction droplet if you see an excessive number of transactions in the database and/or high database CPU utilization. Be sure to thoroughly test this before deploying to production. Additional information can be found in MOS Note 1038076.1 - How to Use Transactions in Request Processing to Improve Database Performance.
  • Verify that the PageFilter in web.xml is bound to the right extension. Typically it's *.jsp, not /*. If it's bound to /*, all HTTP requests (including those for images, CSS, JavaScript, etc) will pass through the servlet pipeline
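The correct mapping in web.xml typically looks like the following (the filter-name shown is an assumption; match whatever name your web app declares for atg.filter.dspjsp.PageFilter):

```xml
<filter-mapping>
  <filter-name>PageFilter</filter-name>
  <url-pattern>*.jsp</url-pattern>
</filter-mapping>
```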
  • Verify that the item-cache-size and query-cache-size item descriptor attributes set in repository definitions match what's reported in /dyn/admin. There are situations (e.g. certain super/subtype relationships) where the values set in the repository definition XML are not the values that are used at runtime
  • If you're using ATG's REST functionality, be sure that only the bare minimum amount of required functionality/data is exposed in restSecurityConfiguration.xml. Test that nobody from the outside can execute methods and retrieve/manipulate data by arbitrarily entering REST URLs
  • Check the code for ArrayLists, HashMaps, and other collections that are not thread-safe but are accessed in a multi-threaded environment (any component marked as "global" in scope). Synchronize or switch to equivalents that are thread safe. Problems resulting from this are notoriously difficult to troubleshoot and can lead to data inconsistency
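The fix is usually mechanical: swap the unsafe collection for a concurrent equivalent (or wrap it with Collections.synchronizedMap/List). A minimal sketch of a globally scoped component's cache field; the class and field names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a cache field on a globally scoped component. A plain
// HashMap here can corrupt its internal state when written by multiple
// request-handling threads; ConcurrentHashMap is safe for this use.
public class SkuLookupCache {
    // Safe for concurrent reads and writes from request threads.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    public String get(String skuId) {
        return cache.get(skuId);
    }

    public void put(String skuId, String displayName) {
        cache.put(skuId, displayName);
    }
}
```

ConcurrentHashMap is generally preferred over a synchronizedMap wrapper for read-heavy caches because reads do not contend on a single lock.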
  • Avoid using MD5 as the password hasher. Use SHA-256 with strong passwords. The default is now SHA-256, but it could still be MD5 in an old implementation. See /atg/dynamo/security/DigestPasswordHasher.algorithm
  • Make sure that all caches have bounds, particularly instantiations of atg.service.cache.Cache, like /atg/commerce/pricing/priceLists/PriceCache. Many caches have no bounds out of the box. Caches without bounds are effectively memory leaks
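Bounding an atg.service.cache.Cache instance is a matter of setting its size properties. A sketch for PriceCache (the values below are illustrative assumptions; tune them to your catalog size and heap):

```
# localconfig/atg/commerce/pricing/priceLists/PriceCache.properties
# Values are illustrative; size to your data and memory budget.
maximumCacheEntries=100000
maximumEntryLifetime=3600000
```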
  • Verify that the key repositories are pointing to the proper ID generators. For example, the profile repository using the obfuscated ID generator would not be a good idea
  • Make sure that ATG is not actually installed in a production environment. The EAR should be fully self-contained (e.g. not in "development" mode where it contains pointers to the ATG installation)